Is Training AI On Copyrighted Content An Infringement?

By Akul Chauhan, Khurana and Khurana

Introduction

Any time you type a query into ChatGPT or prompt any AI to write something, you are interacting with a model that has been trained on a massive corpus of text: books, articles, web pages, and more. The vast majority of that text belongs to somebody else. Authors, journalists, publishers, and artists all hold copyright in their work. In nearly every instance, they were never asked for permission before their work was fed to an AI model.

This has sparked a flurry of lawsuits around the world, each raising essentially the same question: does training an AI system on copyrighted data amount to copyright infringement? Courts in the U.S. have begun to answer this question, with India not far behind.

This piece examines the current state of the law and its implications for creators and AI developers.

What Happens When AI Is “Trained”?

To understand the legal problem, it helps to understand the technology, briefly. An AI language model is trained by processing massive datasets of text. During this process, the AI system makes a copy of the material, reads through it, and learns patterns of language. The original text is not stored word for word inside the model, but a copy of the material is made at some stage in order to process it.

This is where the legal issue starts. In the majority of jurisdictions, copyright legislation gives the creator of a work the exclusive right to reproduce it. Making any copy of the work (even a transient one, or a copy made so that a machine can read and learn from the data) may infringe that right unless one of the explicit exceptions covers that particular situation. AI companies argue their use is transformative, like a student learning from books. Rights holders argue that students do not build billion-dollar businesses from books they never paid for.

What Courts Around the World Have Said

United States

The United States has been the stage for most of the cases on this topic. The first significant decision came in February 2025, when the court in Thomson Reuters v. ROSS Intelligence ruled that the unauthorized use of Westlaw’s copyrighted editorial materials to train a competing AI legal application was not fair use.¹ The court reasoned that the AI developer was building a product designed to compete directly with Westlaw, so its use of Westlaw’s content was commercial and harmful to Westlaw’s business rather than transformative.

However, later that year, in June 2025, two California federal courts reached a different result on the use of books to train AI.² One held that Anthropic had made fair use of books in training an AI model. The other, involving Meta, also found fair use, holding that there was insufficient evidence that the AI’s output would substitute for the books or cause them economic harm. The lesson: the same activity can be fair use or copyright infringement depending on how the AI is intended to function and how it affects the market for the source material.

The U.S. Copyright Office issued a report in May 2025 indicating that commercial-scale AI training on copyrighted content, especially where the AI produces content that competes with the originals, is unlikely to qualify as fair use.

European Union

The EU AI Act, adopted in 2024, requires AI companies to comply with copyright law when training their models. Under EU law, commercial AI training on copyrighted works is permitted only where the copyright holder has not opted out. The onus falls on AI developers to ensure that content owners have not objected to the use of their work, which is the opposite of the take-first, ask-forgiveness-later approach some tech companies have taken.
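As a rough illustration of what checking for an objection might look like in practice, the sketch below assumes a publisher expresses its reservation through a robots.txt rule aimed at a training crawler. This is an assumption for illustration only: the crawler name ExampleTrainingBot, the publisher URL, and the robots.txt content are all hypothetical, and whether a robots.txt rule is a legally sufficient opt-out is not something this article settles.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
# It blocks one named training crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on the publisher's site.
url = "https://publisher.example/articles/story-1"

training_allowed = may_collect(ROBOTS_TXT, "ExampleTrainingBot", url)
other_allowed = may_collect(ROBOTS_TXT, "SomeOtherBot", url)
print(training_allowed, other_allowed)
```

A developer following this approach would skip any page for which the check returns False before the content ever reaches a training dataset. A technical check of this kind would be only one part of showing that rights holders' objections were respected.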

Other Jurisdictions

Japan and Singapore have taken the most permissive approach, allowing AI training on copyrighted content for data analysis purposes without requiring a licence. In Germany, on the other hand, a Munich court ruled in November 2025 that OpenAI had infringed copyright by reproducing in its outputs song lyrics on which its model had been trained, a stark reminder that training-side infringement can surface in the output.

The Indian Legal Position

The Copyright Act, 1957 and Its Gaps

In India, copyright is governed by the Copyright Act, 1957. Section 51 of the Act defines infringement as doing anything that only the copyright owner is allowed to do, such as reproducing or adapting their work, without a licence.⁷ The main exception is Section 52, which permits “fair dealing” for purposes like private study, research, criticism, and news reporting.

The problem is that Section 52 was written in 1957 and last updated in 2012. It contains no mention of text and data mining, AI training, or computational analysis. Nowhere does it say that feeding an entire library of books into a machine for profit qualifies as “research” or “private study.” The Department for Promotion of Industry and Internal Trade (DPIIT) has already indicated that commercial AI training does not fall within the fair dealing exceptions.⁹

ANI Media v. OpenAI: India’s Landmark Case

India’s most significant case on this issue is ANI Media Pvt. Ltd. v. OpenAI Inc., filed before the Delhi High Court. ANI, one of India’s largest news agencies, alleged that OpenAI scraped and stored its copyrighted news articles to train ChatGPT, without permission and without any payment.

The Delhi High Court is examining four key questions: whether storing ANI’s content for training is infringement; whether using it to generate responses is infringement; whether any such use qualifies as fair dealing under Section 52; and whether Indian courts have jurisdiction over a foreign AI company at all.

No final judgment has been delivered yet, but this case has already had a real-world impact. NDTV, The Indian Express, Hindustan Times, and the Federation of Indian Publishers (representing publishers like Bloomsbury and Penguin Random House) have filed similar suits. The momentum is unmistakable.

What Needs to Change

In May 2025, the Government of India formed an expert committee to examine whether the Copyright Act, 1957 is fit for the age of AI. Early indications suggest the panel is considering a separate chapter for AI-generated works, clearer definitions of authorship, and potentially a royalty or licensing mechanism under which AI developers could use available content in return for payment collected by a copyright society, as is currently done for musical works. Such a mechanism would appear to balance two equally valid interests: promoting the development of AI and compensating creators. Whether this ultimately becomes a statute or an amendment remains to be seen.

Conclusion

The law on AI training and copyright is being made in courtrooms in Delhi, California and Munich, and in committees in Brussels and New Delhi. What we do know is this: the premise that using publicly available content to train AI is always permissible no longer holds, especially not in India, where DPIIT has already signalled that any commercial use of copyrighted content for AI training must be licensed. For the creator, the message is that your work has value and the law is catching up. For the AI developer, the message is simple: implement licensing frameworks now, lest the courts do it for you. For the Indian legislator, there is an open window of opportunity to set out unambiguous and fair rules. That window, however, will not remain open for long.

References/End-Notes

1. Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., No. 20-613-MN (D. Del. Feb. 11, 2025).
2. Bartz et al. v. Anthropic PBC, No. C 24-05417 WHA (N.D. Cal. June 23, 2025); Kadrey et al. v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025).
3. U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3: Generative AI Training (May 2025).
4. Regulation (EU) 2024/1689 (EU AI Act), arts. 53(1)(c), 53(2); Directive (EU) 2019/790 on Copyright in the Digital Single Market, art. 4.
5. Japan: Article 30-4, Copyright Act of Japan (as amended 2018); Singapore: Section 244, Copyright Act 2021.
6. GEMA v. OpenAI GmbH (Landgericht Munich I, Nov. 2025).
7. Copyright Act, 1957 (India), § 51.
8. Copyright Act, 1957 (India), § 52.
9. Department for Promotion of Industry and Internal Trade (DPIIT), PIB Press Release (2024), available at pib.gov.in/PressReleasePage.aspx?PRID=2004715.
10. ANI Media Pvt. Ltd. v. OpenAI Inc., 2024 SCC OnLine Del 8120 (Del. HC).
11. Federation of Indian Publishers v. OpenAI Inc. (New Delhi, 2025) (filed); suits by NDTV, The Indian Express and Hindustan Times (2024-2025).
12. Ministry of Commerce and Industry, Government of India, Constitution of Expert Panel on Copyright and AI (May 2025).

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.