ARTICLE
27 November 2025

AI And Copyright – Part II – The European And US Approach

Hannes Snellman Attorneys Ltd


The Challenge: AI vs. Copyright

General-purpose artificial intelligence ("GPAI") models are rapidly transforming the creative and technological landscape, but their reliance on vast datasets — often including copyright-protected works — has ignited complex legal debates. Companies developing these models frequently use web crawlers to collect data from the internet, raising concerns among rightholders about the loss of control over their content and the adequacy of current copyright laws.

GPAI models often use copyright-protected material for training, sparking debates over fair use and infringement. The difficulty of tracing sources and attributing original creators in AI-generated outputs further complicates matters. The ethical and, above all, economic stakes are profound: human creativity risks being devalued and creative industries disrupted, even as the rapidly growing AI industry offers enormous opportunities for financial gain.

We wrote about this topic in our previous blog, where we considered these questions primarily from a national perspective. In this post, we turn to Europe and the US to review how they have dealt with the issue. The two regions have adopted fairly different approaches, with the EU opting for a closed list of exceptions and the US applying a more flexible case-by-case assessment. There is also a noteworthy recent ruling from the UK, Getty Images v. Stability AI, which raises the question of whether the EU and the UK are in fact moving in different directions.

The European Approach: Specific Text and Data Mining (TDM) Exceptions

The EU Directive on Copyright in the Digital Single Market (2019/790) introduced exceptions for "text and data mining" ("TDM"), allowing certain uses of protected works for AI training under specific conditions. TDM is defined as any automated analytical technique aimed at analysing text and data in digital form in order to generate information, including patterns, trends, and correlations. These exceptions permit automated analysis of digital text and data, but only where the user has lawful access and, in the case of the general exception in Article 4, rightholders have not opted out.
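To illustrate what a "machine-readable" rights reservation can look like in practice, the sketch below shows two mechanisms commonly discussed in this context: blocking known AI training crawlers via a site's robots.txt file, and publishing a reservation under the W3C TDM Reservation Protocol (TDMRep) in a /.well-known/tdmrep.json file. The crawler names and the policy URL are illustrative examples only; the Directive itself does not mandate any particular technical mechanism, and whether a given approach satisfies Article 4(3) remains legally untested.

```
# robots.txt: example directives refusing known AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

```
[
  {
    "location": "/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

Here "tdm-reservation": 1 expresses that TDM rights are reserved for all content under the given location, with an optional link to the rightholder's licensing policy.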

Rightholders can opt out of the TDM exception by using machine-readable or other appropriate means, but practical and legal ambiguities persist. The EU Artificial Intelligence Act (2024/1689) imposes obligations on providers of GPAI models to comply with Union law on copyright and related rights, including identifying and respecting reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790. Providers must also publish a sufficiently detailed summary about the content used for training their models.

A pending case before the CJEU, Like Company v. Google Ireland (Case C-250/25), is expected to clarify whether training an LLM on protected works amounts to an act of reproduction and whether GPAI training is covered by the TDM exception. The dispute arose after Google's Gemini chatbot displayed responses partially identical to content from Like Company's news website. The chatbot can generate summaries and reproduce content from press publications when prompted by users. Like Company alleges that between June 2023 and February 2024, Google reproduced and made available its press publications electronically without consent, exceeding the permitted "use of individual words or very short extracts of a press publication". Google denies infringement, arguing that the chatbot is a creative tool rather than an information database and that any reproduction falls under the exceptions for temporary acts or TDM.

The following questions (abbreviated form) have been referred to the CJEU:

  1. Does displaying text in an LLM-based chatbot's response that partially matches the press publishers' content and is long enough to be protected under Article 15 of Directive 2019/790 amount to "communication to the public"? If so, does the fact that the text is generated through predictive modelling matter?
  2. Does training an LLM-based chatbot by observing and matching linguistic patterns constitute "reproduction" under Article 15(1) of Directive 2019/790 and Article 2 of Directive 2001/29?
  3. If training qualifies as reproduction, does reproducing lawfully accessible works fall under the text and data mining exception in Article 4 of Directive 2019/790?
  4. When a user prompts an LLM-based chatbot with text matching or referring to a press publication, and the chatbot outputs part or all of that content, does this constitute reproduction by the chatbot service provider under Article 15(1) of Directive 2019/790 and Article 2 of Directive 2001/29?

A ruling for Like Company could require GPAI developers to obtain licences for both training and deployment, increasing legal and financial risks, whereas a ruling for Google could enable broader use of public data for GPAI training without compensation. The decision is expected sometime in 2027, and it will be very interesting to see where the CJEU lands in this matter.

There are also a few interesting recent rulings from national courts in Europe. On 11 November 2025, the Munich Regional Court issued a judgement in GEMA v. OpenAI (Case No. 42 O 14139/24). GEMA, the German collecting society, sued OpenAI for using song lyrics from nine German artists to train ChatGPT without a licence. The court found that both the memorisation of lyrics in the AI models and their reproduction in ChatGPT's outputs infringed copyright. Importantly, these acts were not protected by the TDM exception under German law. The court emphasised that training language models that can reproduce copyrighted works goes beyond mere analysis and directly affects the rightholders' ability to exploit their works. OpenAI's argument that outputs are user-generated and that training does not store specific works was rejected. The decision signals that GenAI developers in Germany will need to obtain licences when training models on copyrighted material, as memorisation and reproduction are not covered by the existing exceptions, although we can expect OpenAI to appeal.

Furthermore, the UK High Court decision of 4 November 2025 in Getty Images v. Stability AI [2025] EWHC 2863 (Ch) is also relevant to the European debate on AI and copyright. Interestingly, the UK court reached a markedly different conclusion from the Munich Regional Court in GEMA v. OpenAI. Getty Images alleged that Stability AI scraped millions of its photographs without consent to train its GenAI model, and brought claims for copyright infringement, trademark infringement, database right infringement, and passing off. The court held that training a GenAI model on copyrighted images — including where such training involved infringing acts abroad — does not, in itself, make the resulting AI model an "infringing copy" under UK law, as long as the model does not store or reproduce those works. The court did find limited trademark infringement where AI-generated images included Getty Images or iStock watermarks, but only where there was real-world evidence of confusion. The judgement highlights the importance of concrete evidence and sets boundaries on copyright and trademark liability for AI training in the UK, offering some contrast to the ongoing debate on the topic within the EU.

The US Approach: The Fair Use Doctrine

In the US, the Fair Use Doctrine under 17 US Code § 107 is the primary defence relied upon by GPAI companies in copyright litigation. Fair use is determined by courts using four factors: purpose and character of the use, nature of the copyrighted work, amount and substantiality of the portion used, and the effect of the use upon the potential market for the original work. Outcomes can be unpredictable and depend on the facts of each case.

In its report concerning generative AI training from earlier this year, the US Copyright Office (USCO) found that using copyrighted works to train GPAI models may constitute an infringement, especially if outputs are substantially similar to the training data inputs. Whether AI training is fair use depends on the degree of transformation and the purpose of the outputs — training that produces competing content is "at best, modestly transformative". Implementing technical safeguards to prevent infringing outputs can support a fair use defence. Knowingly using pirated or illegally accessed works for training counts against fair use, though it is not automatically disqualifying. The USCO report takes a broad view of market harm, including lost sales, market dilution, and lost licensing opportunities for rightholders. The report encourages the development of voluntary licensing markets for training data.

Recent US cases illustrate the complexity of applying fair use to GPAI training. In Authors Guild v. Google, the United States Court of Appeals for the Second Circuit found Google's use transformative, enabling search and discovery without substituting for the original works, and supported the argument that copying entire works for transformative, non-consumptive uses can be fair use.

In its judgement of 11 February 2025 in Thomson Reuters v. ROSS, the Delaware District Court found that ROSS's use of Westlaw headnotes was not transformative and served the same purpose as the original, creating a direct competitor and threatening Westlaw's market and future licensing opportunities. The court emphasised that fair use cannot be invoked when AI training materials are used to create a competing product serving the same function as the originals.

Kadrey v. Meta Platforms and Bartz v. Anthropic further highlight the unsettled legal landscape. In Kadrey v. Meta Platforms, the District Court for the Northern District of California issued an order on partial summary judgement on 25 June 2025 and ruled in favour of Meta, finding the use "highly transformative" and noting that demonstrable market harm is crucial to defeating a fair use defence in GPAI training cases. The case appears to have turned on the plaintiffs' failure to provide any meaningful evidence of market dilution; had they done so, the judgement could well have gone the other way. While the decision does not establish that all GPAI training with copyrighted works is lawful, it highlights that the outcome in future cases may differ where stronger evidence is presented.

In Bartz v. Anthropic, the District Court for the Northern District of California granted summary judgement for Anthropic on 23 June 2025, accepting transformative fair use for GPAI training with lawfully acquired works but not for training based on pirated data. On 25 September 2025, the court granted preliminary approval of the proposed settlement, under which Anthropic agreed to pay a minimum of USD 1.5 billion for its past use of pirated books, making it the largest US copyright settlement in history.

Comparing the Different Approaches

The EU has adopted a copyright protection regime with a closed list of exceptions, meaning GPAI developers may need to license every piece of content used for training unless an exception applies. The US applies a flexible, case-by-case fair use doctrine, but outcomes are unpredictable and depend on the facts of each case. It is too early to say how courts in the different EU Member States will rule in these matters, and their outcomes may well diverge; in the EU, we are awaiting CJEU decisions to set more specific guidelines for the whole EU market. Some US judges have accepted fair use arguments from tech companies, but no clear precedents from higher courts exist yet.

EU developers may face high compliance burdens, including documenting training datasets and meeting strict governance standards under applicable law, while the US system creates more legal uncertainty but fewer up-front compliance requirements. Stricter rules in the EU may slow AI development but provide more certainty, whereas the flexible US system may foster innovation but create uncertainty for both developers and rightholders.

Key Points and Questions for the Future

The legality of AI training appears to depend on several factors: whether the material was lawfully accessed, whether there is reproduction, how much material is copied, whether the use is "transformative", and whether rightholders have opted out. These conflicting interests will need to be balanced going forward, but for now, significant legal uncertainty remains on both sides of the Atlantic.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

