
When the Regulator Calls: Lessons from Tesla's FSD Data Challenge

Tesla has secured a second deadline extension from NHTSA in the agency's ongoing investigation into traffic violations by vehicles running Full Self-Driving (FSD). The deadline for delivering crash data, originally January 19, was first pushed to February 23 and now to March 9. Whatever you think about Tesla or FSD, the underlying challenge deserves attention, because the entire autonomous vehicle industry will face it.

NHTSA opened Preliminary Evaluation PE25012 in October 2025 after linking 58 incidents to vehicles operating with FSD, including crashes involving red-light violations and lane departures into oncoming traffic. By December, the count had risen to 80 documented violations across a fleet of 2.88 million vehicles. The agency's Information Request covers consumer complaints, field reports, crash data, lawsuits, and internal assessments. It is a sweeping ask, but not an unusual one for an investigation of this scale.

The data production challenge

On paper, what NHTSA wants is straightforward. For each incident, it asks for CAN bus data, event data recorder files, video, and performance anomaly reports; a 30-second timeline leading up to the traffic violation; the FSD software version that was running; whether the driver received warnings; and whether a crash, injury, or fatality resulted. For anyone who has worked with vehicle data systems, this is a reasonable, if labor-intensive, set of requirements.
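As a rough illustration of why the 30-second requirement becomes mechanical once telemetry is time-indexed, here is a minimal Python sketch that slices a pre-violation window out of a time-sorted log and reports which software versions were active in it. The `Sample` type, its fields, and `pre_incident_window` are hypothetical names chosen for this example; they do not reflect Tesla's logging format or NHTSA's required file formats.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical telemetry sample; field names are illustrative only.
    t_us: int          # timestamp in microseconds
    speed_mps: float   # vehicle speed, meters per second
    fsd_version: str   # software version active when the sample was logged


def pre_incident_window(samples: list[Sample], incident_t_us: int,
                        window_s: float = 30.0) -> list[Sample]:
    """Return samples with incident_t_us - window_s <= t <= incident_t_us.

    Assumes `samples` is already sorted by t_us.
    """
    start_us = incident_t_us - int(window_s * 1_000_000)
    times = [s.t_us for s in samples]  # O(n); a real store would keep an index
    lo = bisect_left(times, start_us)        # first sample at or after start
    hi = bisect_right(times, incident_t_us)  # one past last sample at or before incident
    return samples[lo:hi]


# One sample per second for 100 seconds; violation at t = 60 s.
log = [Sample(i * 1_000_000, 10.0 + i, "v-hypothetical") for i in range(100)]
window = pre_incident_window(log, incident_t_us=60_000_000)
print(len(window), window[0].t_us, window[-1].t_us)  # 31 30000000 60000000
print(sorted({s.fsd_version for s in window}))        # ['v-hypothetical']
```

The binary search keeps each lookup cheap once the log is sorted; the hard part in practice is getting every source into one sorted, consistently timestamped log in the first place.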

Tesla told NHTSA it has 8,313 records requiring manual review and the capacity to process about 300 per day. The company also noted that responding to several federal probes at once, including investigations into delayed crash reporting and inoperative door handles, compounds the burden. In requesting the latest extension, Tesla explained that it could not determine the total number of relevant files until it had completed its incident list; only then could it begin querying for the associated data and generating the required file formats.
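A quick back-of-the-envelope check on the figures Tesla gave NHTSA, assuming the stated throughput of about 300 records per day holds constant and the review runs without interruption (both simplifying assumptions, since parallel probes compete for the same effort):

```latex
\frac{8{,}313\ \text{records}}{300\ \text{records/day}} \approx 27.7\ \text{days}
\quad\Rightarrow\quad
\left\lceil 27.7 \right\rceil = 28\ \text{days of review}
```

That is roughly four weeks of manual review alone, before the querying and file-format generation Tesla also described.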

Here's what's interesting: Tesla almost certainly has all of this data. The company has logged billions of FSD miles and runs one of the most data-intensive driving programs on the planet. The challenge isn't collection; it's retrieval and production. The distance between "we captured it" and "we can hand it to a regulator in a structured, timely format" turns out to be enormous when the data infrastructure wasn't built with regulatory production in mind. This isn't a uniquely Tesla problem. It is an architectural gap that most AV companies will confront as regulatory scrutiny intensifies.

A growing industry challenge

The timing underscores the urgency. Tesla launched unsupervised Robotaxi rides in Austin on January 22, and the same FSD software stack that is under investigation sits at the core of that service. Across the industry, the tension between moving fast on deployment and maintaining robust data practices is real. Waymo, for instance, has invested heavily in proactive transparency: it publishes peer-reviewed safety data, and it filed a voluntary recall within weeks when a school-bus-passing issue surfaced. Companies sit at different points on this spectrum, but the regulatory direction is clear: the era of "trust us" is giving way to "show us."

Every AV company will eventually face its own version of this moment (a federal information request, a compliance audit, a liability proceeding) in which the question isn't "did you collect the data?" but "can you produce it?"

Designing for the question you know is coming

This is the problem PhyWare is built to solve, not just for Tesla but for any company operating autonomous systems at scale.

PhyTrace continuously captures telemetry from autonomous systems (CAN bus data, sensor readings, AI decision traces, speed, location, operational mode, and safety events) and normalizes it into a structured, queryable format we call the Unified Data Model (UDM). The UDM treats autonomous vehicles as a first-class source type, with dedicated data domains that map directly to the kinds of things regulators ask for: motion (speed, acceleration, braking), perception (sensor health, object detections), safety (violations, emergency stops, proximity events, time-to-collision), AI decisions (model version, confidence, alternatives considered), and operational state (autonomous vs. manual mode). Every event carries a microsecond-precision timestamp and a software version tag.
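A minimal sketch, assuming a much-simplified schema, of what an event record with a domain tag, a microsecond timestamp, and a software version tag might look like, plus a filter by domain and time range. `Domain`, `Event`, and `select` are hypothetical names invented for this example; the post does not publish the UDM's actual schema or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Domain(Enum):
    # Hypothetical domain tags mirroring the categories listed above.
    MOTION = "motion"            # speed, acceleration, braking
    PERCEPTION = "perception"    # sensor health, object detections
    SAFETY = "safety"            # violations, emergency stops, time-to-collision
    AI_DECISION = "ai_decision"  # model version, confidence, alternatives
    OPERATIONAL = "operational"  # autonomous vs. manual mode


@dataclass(frozen=True)
class Event:
    source_id: str          # identifier of the vehicle or robot
    t_us: int               # timestamp, microseconds since the Unix epoch
    software_version: str   # version tag carried by every event
    domain: Domain          # which data domain the payload belongs to
    payload: dict[str, Any] = field(default_factory=dict)


def select(events: list[Event], domain: Domain, start_us: int, end_us: int) -> list[Event]:
    """Return events of one domain whose timestamps fall in [start_us, end_us]."""
    return [e for e in events if e.domain is domain and start_us <= e.t_us <= end_us]


events = [
    Event("veh-001", 1_000_000, "sw-1.2", Domain.MOTION, {"speed_mps": 12.4}),
    Event("veh-001", 1_500_000, "sw-1.2", Domain.SAFETY, {"type": "red_light_violation"}),
    Event("veh-001", 2_000_000, "sw-1.2", Domain.OPERATIONAL, {"mode": "autonomous"}),
]
print([e.payload["type"] for e in select(events, Domain.SAFETY, 0, 3_000_000)])
# ['red_light_violation']
```

Because the version tag travels with each event rather than living in a separate deployment log, "which software was running?" reduces to reading a field on the events already selected.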

PhyCloud stores that data immutably with cryptographic provenance: records are hash-chained, tamper-evident, and independently verifiable. When a regulator asks for 30 seconds of pre-incident data across 80 incidents, the answer is a query, not a months-long manual review, because the data is already structured, indexed, and export-ready. PhyComp, our compliance automation layer, maps regulatory requirements to data queries and generates audit-ready evidence packages, turning an information request into a report you export rather than a project you staff up for.
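Hash chaining is a standard way to make a log tamper-evident: each entry stores the hash of its predecessor, so editing any stored record breaks verification from that entry onward unless every later hash is also recomputed. The sketch below uses SHA-256 over canonical JSON. `Entry`, `append`, and `verify` are hypothetical names, and this illustrates the general technique, not PhyCloud's implementation, which the post does not describe.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


@dataclass(frozen=True)
class Entry:
    record: dict     # the telemetry event being stored
    prev_hash: str   # digest of the previous entry (GENESIS for the first)
    digest: str      # SHA-256 over prev_hash + canonical JSON of record


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(chain: list[Entry], record: dict) -> None:
    """Append a record, linking it to the digest of the current last entry."""
    prev = chain[-1].digest if chain else GENESIS
    chain.append(Entry(record, prev, _digest(prev, record)))


def verify(chain: list[Entry]) -> bool:
    """Recompute every link; any edited record or broken link fails the check."""
    prev = GENESIS
    for entry in chain:
        if entry.prev_hash != prev or entry.digest != _digest(prev, entry.record):
            return False
        prev = entry.digest
    return True


chain: list[Entry] = []
append(chain, {"t_us": 1_000_000, "event": "red_light_violation"})
append(chain, {"t_us": 2_000_000, "event": "driver_warning"})
print(verify(chain))               # True
chain[0].record["event"] = "none"  # simulate an after-the-fact edit
print(verify(chain))               # False
```

A chain by itself is tamper-evident, not tamper-proof: anyone able to rewrite the entire log can recompute every hash. Independent verification therefore usually depends on anchoring the latest hash somewhere outside the writer's control, for example by signing it or sharing it with a third party.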

The goal isn't to second-guess any particular company's data practices. It's to make sure the answer to "can you produce your safety data?" is always, simply, "yes."

Looking ahead

The regulatory environment for autonomous systems is tightening worldwide: the EU AI Act, the EU Machinery Regulation, updated ISO standards, and increasingly assertive NHTSA investigations. Companies that build data infrastructure for regulatory transparency now will have a significant advantage over those forced to retrofit it later under pressure.

PhyWare is in active development. If you're building or operating autonomous systems and you're thinking about these challenges, we'd like to hear from you.

www.phyware.io · LinkedIn · business@phyware.io