Understanding AI Distillation In The Trade Secret Context

Sarah Tishler, Beck Reed Riden

For most of the past year, when AI companies have talked about distillation, they have talked about it as a Chinese problem. In February 2026, OpenAI told the House Select Committee on China that DeepSeek employees had developed methods to circumvent OpenAI’s access restrictions and programmatically harvest outputs to train a competing model. Anthropic’s public technical disclosure went further, naming DeepSeek, Moonshot AI, and MiniMax, and quantifying the activity: roughly 24,000 fraudulent accounts, more than 16 million exchanges, focused on extracting Claude’s reasoning, coding, and agentic capabilities.

By April, the White House Office of Science and Technology Policy had adopted a national-security framing, accusing China of running “deliberate, industrial-scale campaigns” to distill American frontier models. Throughout this timeline, the narrative was that of foreign actors using proxy infrastructure to extract capabilities the United States had spent billions to build.

Recently, however, that narrative got more complicated. On the stand in Musk v. Altman in the Northern District of California, Elon Musk admitted under cross-examination that his AI company, xAI, had “partly” distilled OpenAI’s models. “It is standard practice,” Musk said, “to use other AIs to validate your AI.” There were, by reporters’ accounts, audible gasps in the courtroom.

Whether or not it is in fact standard practice, distillation is now a topic that trade secret practitioners should understand. The legal frameworks for how courts will treat it are not yet settled, and the questions it raises map onto familiar trade secret doctrine in ways that are sometimes intuitive and sometimes not.

What Is Distillation?

In the technical literature, knowledge distillation is a method developed in the mid-2010s for transferring capabilities from a large, expensive model (the “teacher”) into a smaller, cheaper model (the “student”). The student is trained to reproduce the teacher’s outputs across a broad range of inputs, with the result being a model that approximates the teacher’s behavior at a fraction of the size and cost. As Anthropic’s own disclosure acknowledges, distillation is “widely used and legitimate,” and frontier labs routinely distill their own models to create smaller, cheaper versions for production deployment.
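For readers who want to see what “trained to reproduce the teacher’s outputs” can mean in practice, the following is a minimal sketch of one common formulation from the technical literature: the student is penalized by the divergence between its output probabilities and the teacher’s, both softened by a temperature. The function names, the temperature value, and the example numbers are illustrative assumptions, not a description of any particular lab’s pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Assumption: the distiller can see the teacher's raw scores, which is
    typical only when a lab distills its own model.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
untrained_student = [0.2, 0.1, 0.3]

loss_before = distillation_loss(teacher, untrained_student)
loss_when_matched = distillation_loss(teacher, teacher)
```

Training drives this loss toward zero, which is the sense in which the student comes to approximate the teacher’s behavior. A distiller with API access only would ordinarily see generated text rather than raw scores, and would train on that text instead.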

The contested form of distillation, sometimes called adversarial or unauthorized distillation, takes the same technique and applies it to someone else’s model. The distiller does not need to steal weights or breach servers. Access to the teacher model’s API is sufficient. The distiller submits a large volume of carefully constructed queries, captures the outputs, and uses those outputs as training data for a competing student model. The technique converts what would otherwise be a public-facing inference service into a training corpus for a rival system.
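The mechanics described above are simple enough to sketch in a few lines, which is part of why the practice is hard to police. The example below is a toy illustration only: `stub_teacher` is a hypothetical stand-in that returns canned text and does not call any real service, and `harvest` is an invented name for the capture step.

```python
def stub_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model; returns canned text."""
    return f"answer to: {prompt}"

def harvest(prompts, teacher):
    """Pair each query with the captured output, as training records."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

# Two placeholder queries; a real campaign would involve millions.
corpus = harvest(["What is 2 + 2?", "Summarize the DTSA."], stub_teacher)
```

Each record pairs a query with the teacher’s response, which is the form in which captured outputs would be fed to a student model as training data.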

The economics of this technique are compelling, to say the least. To take one example, GPT-5 reportedly cost more than $2 billion to develop. DeepSeek’s R1, which OpenAI alleges was built in part through distillation of ChatGPT outputs, reportedly cost approximately $6 million of marginal training compute.

The Trade Secret Theory

The question for trade secret lawyers is whether what gets extracted through distillation can be a trade secret, and whether the extraction can constitute misappropriation.

It is worth starting from first principles. A trade secret must be information that derives independent economic value from not being generally known or readily ascertainable, and that the owner has taken reasonable measures to keep secret. 18 U.S.C. § 1839(3). Frontier model weights and the proprietary training methodologies that produced them comfortably satisfy these criteria, just as compiled software and machine-learning architectures have for years. See, e.g., Camilla A. Hrdy, Trade Secrecy Meets Generative AI, 100 Chi.-Kent L. Rev. 317 (2025). The harder questions are whether the outputs of a model, accessed through a public API, can carry trade secret status and, if they can, whether systematically harvesting them to reconstruct the underlying capabilities is misappropriation or merely a novel form of reverse engineering.

Of course, that distinction matters. Reverse engineering of a publicly available product is a textbook proper means of acquisition under both the DTSA and the UTSA. See 18 U.S.C. § 1839(6)(B). If a frontier model is made available to the public through an API, and a competitor pays the access fees and queries it to learn how it behaves, the surface analogy to traditional reverse engineering is obvious: buy the product, study how it works, and build a competitor.

On the other hand, traditional reverse engineering targets a finished good, the embodied capability. Distillation targets the model’s underlying reasoning processes, learned representations, and capability distributions. As one recent analysis framed it, the trade secret on this theory is “the aggregate of learned representations that required billions of dollars in compute, proprietary training data, and years of research to develop, and that the owner has chosen to make available only through controlled inference access rather than” by distributing the model itself. That looks less like buying a product to take apart and more like extracting something the owner specifically chose not to release.

The terms-of-service overlay further complicates the analysis. OpenAI’s terms expressly prohibit using outputs to develop “imitation frontier AI models.” Anthropic’s terms contain similar restrictions. A user who agrees to those terms and then engages in mass output extraction is not an arm’s-length reverse engineer, but a contracting party using a service contrary to its agreed restrictions. Whether this transforms the conduct from “proper means” reverse engineering into misappropriation by improper means is, to my knowledge, an open question.

While there is no published decision directly on point, prior cases provide useful guidance. For example, in Compulife Software, Inc. v. Newman, 959 F.3d 1288 (11th Cir. 2020), defendants used a bot to submit automated queries to a public insurance-quote website, harvesting more than 43 million quotes in four days, a feat the court recognized would have taken a human “thousands of man-hours” to replicate. Id. at 1310. The Eleventh Circuit held that even though each individual quote was publicly available and not itself a trade secret, the compilation obtained at machine scale could be misappropriated through “improper means.” Id. at 1313–14. The court wrote:

“Nor does the fact that the defendants took the quotes from a publicly accessible site automatically mean that the taking was authorized or otherwise proper. Although Compulife has plainly given the world implicit permission to access as many quotes as is humanly possible, a robot can collect more quotes than any human practicably could. So, while manually accessing quotes from Compulife’s database is unlikely ever to constitute improper means, using a bot to collect an otherwise infeasible amount of data may well be—in the same way that using aerial photography may be improper when a secret is exposed to view from above.”

Id. at 1314. Applying the Compulife framework to distillation, it is easy to see how a court could likewise find that adversarial distillation constitutes misappropriation.
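The scale point in Compulife can be made concrete with simple arithmetic on the figures cited in this article. The sketch below is illustrative only: the function names are invented, and the “human ceiling” of one query every ten seconds is an assumption chosen for the example, not a figure drawn from the opinion.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption for illustration: a person submitting one query every ten seconds.
ASSUMED_HUMAN_QPS = 0.1

def queries_per_second(total_queries, days):
    """Average sustained query rate over a period of round-the-clock activity."""
    return total_queries / (days * SECONDS_PER_DAY)

def is_machine_scale(rate, human_ceiling=ASSUMED_HUMAN_QPS):
    """True when the rate exceeds what the assumed human user could sustain."""
    return rate > human_ceiling

# Compulife: more than 43 million quotes in four days.
compulife_rate = queries_per_second(43_000_000, 4)  # roughly 124 per second

# Anthropic disclosure: about 16 million exchanges across about 24,000 accounts.
exchanges_per_account = 16_000_000 / 24_000  # roughly 667 per account

flagged = is_machine_scale(compulife_rate)
```

On these numbers, the Compulife bot averaged roughly 124 queries per second around the clock, which is over a thousand times the assumed human rate. The per-account average in the Anthropic disclosure is far less striking on its face, which suggests why spreading activity across many accounts makes this kind of extraction harder to spot.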

Three Hypotheticals

For practitioners advising clients on either side of these issues, three scenarios illustrate where the doctrine may soon be tested in a courtroom near you.

Hypothetical One: The departing employee. A senior research engineer at a frontier lab leaves to join a competitor, taking with him knowledge of the prompt strategies and query patterns that most efficiently elicit the teacher model’s distinctive capabilities. He then directs his new employer’s distillation pipeline using that knowledge, achieving extraction efficiency the new employer could not have achieved on its own. Even if every individual query is permissible, and even if the employee discloses no model weights or training data, the prompt engineering methodology itself may be a protectable trade secret. This fits neatly into traditional employee mobility doctrine, with the wrinkle that the misappropriated information is a method of extracting third-party model capabilities rather than information about the former employer’s own operations.

Hypothetical Two: The validation defense. A company is sued by a frontier lab on the theory that it engaged in unauthorized distillation. The defendant responds: it was using the teacher model to validate its own model’s outputs, not to train on them. Validation is a real and legitimate practice in AI development. Distillation and validation can also look very similar, with both involving large volumes of structured queries and capture of outputs. The forensic question becomes: how does a plaintiff prove that captured outputs ended up in a training pipeline rather than a benchmarking dashboard? (This is a particular flavor of the black box problem that I have written about before.)

Hypothetical Three: The downstream user. A startup builds a product on top of an open-source model that, it later emerges, was itself distilled from a frontier model in violation of the frontier lab’s terms of service. The startup did not perform the distillation, did not know the upstream provenance, and is many steps removed from the original conduct. Does the DTSA’s “knew or had reason to know” standard, 18 U.S.C. § 1839(5)(B), reach the downstream user? At what point does the diligence obligation attach? The “innocent acquirer” framework was designed for a world in which trade secrets traveled through identifiable human or corporate intermediaries. It is not obvious how it applies when the trade secret allegedly travels through model weights that have been released to the public on Hugging Face.

The Bottom Line

As the AI industry continues its arms race, it seems likely that distillation is here to stay.

Companies whose AI offerings are accessible through APIs should be evaluating their terms-of-service architecture, their detection capabilities for anomalous query patterns, and their internal documentation of the resources invested in developing the underlying capabilities, because all of those will matter for any misappropriation case down the line. Companies that build on top of third-party models should be developing diligence practices around the provenance of those models, because the DTSA’s constructive-knowledge standard may reach them. And practitioners on both sides should be watching the Musk v. Altman docket with interest. Formally, the case is a charitable trust dispute. But the testimony it has produced may turn out to be the most candid public record we are going to get, for some time, of how distillation (and other AI-specific practices) actually works among sophisticated AI developers.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
