ARTICLE
16 March 2026

The Black Box Problem In AI Trade Secret Litigation: How Do You Prove Use?

Beck Reed Riden


By Sarah Tishler, Beck Reed Riden

The dismissal of xAI's trade secret claims against OpenAI earlier this month meant that the court never reached what will surely be one of the thorniest questions in AI trade secret litigation: at the end of the day, how would xAI be able to prove that OpenAI actually used any stolen trade secrets? My earlier coverage discussed Judge Lin's order dismissing the case, finding that xAI had failed to plead facts connecting OpenAI's own conduct to the alleged misappropriation by its former employees. There was no plausible inference that OpenAI induced the theft, and no allegation that it ever received or incorporated what was stolen.

However, had xAI survived the motion to dismiss, it would have eventually faced a second and perhaps harder problem: how do you demonstrate (particularly before discovery) that a specific stolen file or methodology shaped anything inside a frontier AI model? That is the question this post examines.


The Traditional Playbook Falls Short

In a conventional trade secret case, proving use can be tricky, but it is achievable. A former employee takes a customer list and joins a competitor. Six months later, the competitor is calling your customers. The causal inference is not difficult to draw. A departing engineer takes manufacturing specifications to a rival. The rival's next product incorporates design features it had no prior capability to produce. Again, the inference is visible from the outside.

Courts have built a substantial body of case law around these kinds of observable signals. For example, in Applied Biological Laboratories, Inc. v. Diomics Corp., the defendant had no prior experience in the relevant industry before allegedly obtaining the plaintiff's trade secrets and suddenly releasing a competing product. No. 20-cv-02500-AJB-LL, 2021 WL 4060531 (S.D. Cal. Sept. 7, 2021) (denying motion to dismiss trade secret claims). In Autodesk, Inc. v. ZWCAD Software Co., the court denied a motion to dismiss trade secret claims where defendant's "products display identical idiosyncrasies and bugs that could have been introduced only through the wholesale copying of significant portions of misappropriated Autodesk code." No. 14-cv-01409-EJD, 2015 WL 2265479 (N.D. Cal. May 13, 2015). And in Yeiser Research & Development LLC v. Teknor Apex Co., the defendant had no prior capability to build a compact hose before receiving the plaintiff's confidential designs, then released one that incorporated the plaintiff's concept. 281 F. Supp. 3d 1021 (S.D. Cal. 2017) (denying motion to dismiss trade secret claims). In each case, the signal of use was observable from outside the defendant's systems.

The Black Box Problem: Even the Builders Don't Know

What makes AI trade secret cases unique is that even the people who build these systems openly admit they do not fully understand how they work.

For example, at the International Telecommunication Union's AI for Good Global Summit in May 2024, OpenAI CEO Sam Altman was asked directly how his company's large language models function. "We certainly have not solved interpretability," he said, acknowledging that the company has yet to figure out how to trace back its AI models' output to the decisions that produced it.

Anthropic CEO Dario Amodei has been even more direct. In an April 2025 essay on interpretability, he wrote that "people outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work," and that "this lack of understanding is essentially unprecedented in the history of technology." He went further, describing how even the basic architecture of these systems produces cognitive mechanisms that emerge organically from training in ways that researchers struggle to explain: "the model's actual cognitive mechanisms emerge organically from these ingredients, and our understanding of them is poor."


These are admissions from the CEOs of the two most prominent frontier AI companies in the world. The significance for trade secret law is clear: if the people who build these systems cannot fully explain how they work or how specific inputs influence specific outputs, how is a plaintiff supposed to plead that a specific stolen file contributed to a specific capability in a deployed model?

This opacity is not incidental—it is structural. As one scholar has observed, AI-based inventions are "even more difficult to reverse engineer" than traditional software "because they are neither explainable nor scrutable."¹ The same inscrutability that frustrates would-be reverse engineers also frustrates potential plaintiffs trying to trace stolen information through a model's training pipeline.

What the Black Box Means for Plaintiffs

Large language models, training pipelines, and proprietary AI architectures are not like customer lists or manufacturing processes. They are extraordinarily complex systems whose internal workings are, by design, largely opaque. Whether a specific piece of stolen source code contributed to a specific capability in a deployed model is a question that may be genuinely unanswerable without deep access to the defendant's internal systems, training data, model weights, and development history.

Consider the specific allegations in the xAI case. Li allegedly uploaded xAI's entire source code base to a personal cloud account. Fraiture allegedly copied source code and internal materials to his personal device before joining OpenAI. Assuming for the sake of argument that those allegations are true and that the materials constituted protectable trade secrets, how would xAI demonstrate that any of that information made its way into OpenAI's models or systems? The source code for a frontier AI model runs to millions of lines. Training pipelines involve complex interdependencies. Even if a specific piece of xAI's code appeared somewhere in OpenAI's development environment, tracing its influence on a deployed model's capabilities would require the kind of forensic access that simply is not available before discovery (and it is hard to imagine how it would be outwardly observable).

This challenge is compounded by the pleading standards plaintiffs already face. Courts—including a growing number of federal courts—require that misappropriation complaints identify the alleged trade secret with "sufficient particularity" to allow the defendant to understand what specific information is at issue and to respond.² For an AI algorithm whose very operation may be opaque even to its own designers, meeting that standard while simultaneously showing how the stolen information was incorporated into a frontier model creates a burden with no clear analogue in traditional trade secret litigation.

Despite this challenge, prior cases in analogous technology contexts offer some instructive lessons about how courts have approached the problem, and what strategies have worked.

How Courts Have Handled Analogous Complexity

The black box problem is not entirely new. Courts have encountered versions of it in prior cases involving complex software and autonomous systems, and their approaches offer a roadmap, imperfect but useful, for AI trade secret plaintiffs.

WeRide Corp. v. Kun Huang, 379 F. Supp. 3d 834 (N.D. Cal. 2019)

The WeRide litigation arose when the company's former CEO and Head of Hardware Technology allegedly copied proprietary autonomous vehicle source code and founded a competing company called AllRide. On WeRide's motion for preliminary injunction, the core evidentiary challenge was proving that AllRide's self-driving capabilities actually incorporated WeRide's stolen code rather than being independently developed. The defendant's systems were complex, and direct code comparison was unavailable before discovery.

The court's solution was to reason from impossibility rather than from direct evidence. WeRide's expert opined that it would have been impossible to independently develop the advanced driving capabilities AllRide publicly demonstrated just ten weeks after the former employee's last day at WeRide. The court found this sufficient to support a preliminary injunction, noting that implausibly fast development of technology can itself contribute to a finding of misappropriation. The court also pointed to a hardware configuration detail that reinforced the inference: AllRide positioned its radar component on the front center of the vehicle roof, just like WeRide, rather than on the front bumper or rear view mirror like most competitors. WeRide's expert testified that this placement was consistent with use of WeRide's source code, which would only be useful with the radar in that specific location.

The WeRide case offers two practical lessons for AI plaintiffs. First, the speed-of-development inference is a powerful tool when a defendant demonstrates capabilities that would have required substantial independent development time that it demonstrably did not have. Second, observable product-level details that are consistent with use of specific stolen information, and inconsistent with independent development, can bridge the gap between theft and incorporation even without direct code comparison. For AI cases, the analogue could be a capability, architecture choice, or benchmark performance that reflects specifically what was stolen in ways that cannot be explained by independent development. That may be a harder case to make, but the analytical framework is the same.


What Has Worked So Far

When no smoking gun is available, several categories of circumstantial evidence have proven effective in trade secret disputes, and offer a template for AI trade secret plaintiffs doing pre-filing investigation.

Of course, the clearest signal of use is a product capability that mirrors the plaintiff's alleged trade secrets and that the defendant had no prior ability to produce independently. The speed-of-development inference from WeRide is particularly powerful when it can be quantified. If a defendant can be shown, by credible expert analysis, to have demonstrated capabilities that would have required more time or resources than it actually had, that gap is difficult to explain without misappropriation.

Patent filings are another potentially useful signal. If a defendant files patents in the period following the alleged misappropriation that cover technical ground closely related to the plaintiff's alleged trade secrets, that is observable from outside the defendant's systems and can support a plausible inference of use. 3D Systems, Inc. v. Wynne, No. 21-cv-01141-LAB, 2022 WL 21697345 (S.D. Cal. Mar. 9, 2022), turned in part on exactly this kind of allegation.

The challenge for AI plaintiffs is that all of these signals are harder to read in the AI context. AI companies release products with new capabilities constantly. It would be genuinely difficult to distinguish a capability jump that results from misappropriation from one that results from independent research and development, particularly in a field where progress is rapid across the entire industry. And the sheer complexity of frontier AI systems makes product-level comparison far more difficult than comparing two pieces of software with identical interfaces.

There is also an important threshold question that pre-filing investigation must address: the protectability of the stolen information itself. Not every category of information related to a frontier AI system qualifies as a trade secret, even if kept confidential.³ Plaintiffs who fail to distinguish protectable trade secrets—such as proprietary training data, novel architecture choices, and non-public source code—from information that is generally known or readily ascertainable in the field risk dismissal on grounds wholly separate from the use-proof problem.

What Plaintiffs' Counsel Should Be Doing

Given this landscape, there are several practical steps that trade secret plaintiffs in AI cases should consider before filing.

The most important is pre-filing technical investigation. This means engaging forensic experts not just to document what was taken, but to analyze the defendant's publicly available products, papers, and patent filings for signs that the stolen information was put to use. The WeRide approach of quantifying development timelines and identifying product-level details inconsistent with independent development is a useful template. If the misappropriated materials related to a specific technical capability, model architecture, or training methodology, the investigation should focus on whether the defendant's public outputs reflect that capability in ways that would be surprising absent access to the plaintiff's information. As in the 3D Systems case, patent filings are another useful area of research.

Early preservation demands are also essential. The WeRide litigation is a powerful reminder that in complex technology cases, the most probative evidence of incorporation—specifically internal engineering records, development histories, and communications about technical decisions—is precisely what defendants are most motivated to destroy. A preservation demand issued at or before the time of filing is not a formality; it is a substantive litigation strategy.

Counsel should also think carefully about the limits of what can be proven. In AI trade secret cases, the question of use may ultimately be unanswerable through circumstantial evidence alone, no matter how skillfully assembled. The goal of pre-filing investigation is not to achieve certainty but to build a plausible inference strong enough to survive a motion to dismiss and reach discovery—where the evidence needed to answer the use question, if it exists, will actually be found.⁴

Looking Ahead

As talent continues to move rapidly between AI companies and as the competitive stakes in the industry grow, we can expect to see more disputes that raise the black box issue. The cases discussed above offer a consistent lesson: where direct evidence of use is unavailable, courts look for circumstantial evidence such as product-level similarities, implausibly fast development, and observable details that can only be explained by access to the stolen information.

We will continue to monitor developments in this area and will report on any significant rulings as they emerge.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

