FSD Videos Are for Entertainment Only
Evaluating the progress of Tesla FSD basically means understanding the reliability of the Full Self-Driving system. You can write a program within days that drives you from A to B if there is no traffic and there are no people on the road, but it took 10 years for us to deploy our first Robotaxi.
Humans are bad at probability. We debate the solution to the Monty Hall problem forever, and we don't grasp how likely it is that two people in a classroom share a birthday. It's super hard for a human to estimate the probability of an event reliably. We use intuition and examples, not statistics and math, to understand probability, and that hurts us when we try to estimate the progress of FSD.
At the beginning of each sub-version, influencers always post zero-disengagement videos and hype everybody up. After the release reaches the public, we see more disengagements, and people start reporting disengagement data to the FSD Tracker. Some influencers aren't happy with the data and start attacking FSD Tracker for being biased or manipulated.
Let's assume both parties are honest. Why does this happen? According to FSD Tracker, the average distance is 220 miles per critical (safety) disengagement and 50 miles per disengagement of any kind. If disengagements occur at a roughly constant rate per mile, then for a short drive (5 miles) the probability of a drive with no critical disengagement is about 98%, and the probability of a drive with no disengagement at all is about 90%. For a longer drive (30 miles), the probability of no critical disengagement is about 87%, and the probability of no disengagement at all is about 55%.
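Here is a minimal sketch of where numbers like these come from. It assumes disengagements arrive as a Poisson process with a constant rate per mile, which is a modeling assumption of this sketch and not something FSD Tracker states. The function name p_clean_drive is made up for illustration.

```python
import math

def p_clean_drive(trip_miles, miles_per_disengagement):
    """Probability of zero disengagements on one trip.

    Assumes a Poisson process with a constant rate of
    1 / miles_per_disengagement events per mile.
    """
    return math.exp(-trip_miles / miles_per_disengagement)

# Averages quoted in the text: 220 miles per critical disengagement,
# 50 miles per disengagement of any kind.
for trip in (5, 30):
    no_critical = p_clean_drive(trip, 220)
    no_any = p_clean_drive(trip, 50)
    print(f"{trip:>2} mile drive: "
          f"no critical disengagement {no_critical:.0%}, "
          f"no disengagement at all {no_any:.0%}")
```

The point of the sketch is the shape of the curve: even with a mediocre average, short drives come out clean most of the time.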
When you watch one of these one-hour no-disengagement videos and are impressed by the progress of FSD, you are probably not witnessing an improvement in FSD. Rather, you have found an influencer who flipped heads in a coin toss.
Can you evaluate the progress of FSD by watching videos? Not really. When we talk about evaluation, we are talking about confidence. The question becomes: how many hours of video do we need to watch to be confident that FSD 13.2 is 2x better than 12.5?
It depends on two factors: the confidence level you want and the number of failures you can tolerate during the test. The higher the confidence you want and the more failures you allow, the longer the test has to run.
For example, to have high confidence (90%) that FSD 13.2 averages more than 400 miles per critical disengagement, you need a continuous run of 921 city miles without any critical disengagement. At 30 miles per hour, that's about 31 hours of video.
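The 921-mile figure can be reproduced with a short calculation. This is a sketch under the assumption that critical disengagements follow a constant-rate (exponential) model. The function name zero_failure_test_miles is hypothetical.

```python
import math

def zero_failure_test_miles(target_miles_per_failure, confidence):
    """Failure-free miles needed to claim, at the given confidence,
    that the true average exceeds target_miles_per_failure.

    Under a constant failure rate, a system sitting exactly at the
    target survives T miles with probability exp(-T / target).
    We need that probability to be at most 1 - confidence.
    """
    return -math.log(1.0 - confidence) * target_miles_per_failure

miles = zero_failure_test_miles(400, 0.90)
hours = miles / 30  # 30 mph average city speed, as in the text
print(f"{miles:.0f} failure-free miles, about {hours:.1f} hours of video")
```

Raising the confidence to 95% in the same formula multiplies the target by about 3 instead of about 2.3, so the required run gets longer, not shorter.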
The zero-failure test is the fastest way to gain confidence. However, it is pretty unlikely that you will get one. Even if the true average is exactly 400 miles per critical disengagement, there is only about a 10% chance of getting such a run. To have a realistic chance of passing, you need to tolerate more failures, which means a longer test.
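To see how much longer the test gets once failures are tolerated, here is a rough sketch using only the standard library. It assumes the same constant-rate (Poisson) model, and the helper names p_at_most and required_test_miles are invented for this example.

```python
import math

def p_at_most(failures, expected):
    """Poisson probability of seeing `failures` or fewer events
    when `expected` events are expected on average."""
    term = total = math.exp(-expected)
    for k in range(1, failures + 1):
        term *= expected / k
        total += term
    return total

def required_test_miles(target_miles_per_failure, confidence, allowed_failures):
    """Test length such that a system sitting exactly at the target
    would show `allowed_failures` or fewer failures with probability
    of only 1 - confidence. Solved by bisection on expected failures."""
    lo, hi = 0.0, 100.0
    while p_at_most(allowed_failures, hi) > 1.0 - confidence:
        hi *= 2.0  # widen the bracket for very large allowed_failures
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if p_at_most(allowed_failures, mid) > 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return hi * target_miles_per_failure

for allowed in range(4):
    miles = required_test_miles(400, 0.90, allowed)
    print(f"{allowed} failures allowed: {miles:.0f} miles, "
          f"about {miles / 30:.0f} hours at 30 mph")
```

With zero failures allowed this matches the 921-mile run above, and each extra tolerated failure adds several hundred more miles of driving.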
What can we get from these FSD videos? Probably nothing. It's nice to see new functionality, but the videos give us basically zero confidence about the safety and progress of the FSD system.