
The AI reckoning for publishers and platforms   

Two recent legal developments have major implications for the use of media companies’ content to train AI

February 27, 2025 | By Jason Kint, CEO – DCN (@jason_kint)

The publishing industry has been of two minds on AI’s rapid advancements – optimistic and cautious – sometimes within the same company walls. Business development teams explore much-needed new revenue opportunities while legal teams work to protect their companies’ work and existing rights. However, two major legal developments, the Thomson Reuters v. Ross Intelligence ruling and shocking new revelations in Kadrey v. Meta, expose the fault lines in AI’s unchecked expansion and set the stage for publishers to negotiate fair value for their investments. 

One case confirms that publishers have a right to license their content for AI training and that tech advocates’ tortured analysis of fair use doesn’t throw out rights enshrined in the U.S. Constitution or require publishers to opt in to retain them. The other case suggests that Meta may have knowingly pirated books in its high-stakes race to keep up with OpenAI and that Meta’s notorious growth-at-all-costs playbook is more exposed than ever. 

AI companies can no longer operate in a legal gray zone, scraping content as if laws don’t apply to them. Courts, lawmakers, researchers, and the public are taking notice. For publishers, the priority is clear: AI must respect copyright from the beginning, including for training purposes, and the media industry must ensure it plays an active role in shaping AI’s future rather than being exploited by it. 

Thomson Reuters v. Ross: A win for AI licensing, a loss for those who intentionally avoid it 

In a landmark decision, a federal judge ruled this month in favor of Thomson Reuters against Ross Intelligence, a startup that trained its AI model without rights or permission using Thomson Reuters’ Westlaw legal database. 

Judge Stephanos Bibas’ ruling in the Delaware district court is notable because he explicitly recognized the emerging market for licensing AI training data. This undercuts the argument that AI developers can freely use copyrighted works under a claim of “fair use.” And, consistent with the position of DCN’s policy team, it also highlights the importance of the fourth factor of fair use – the effect on the market for the original work – which publishers have been demonstrating with the signing of each new licensing deal.  

For publishers, this is a crucial precedent for two reasons: 

  • AI training is not automatically fair use. Content owners have the right to be paid when their work is being used to train AI.   
  • A market for AI licensing is forming – this is the fourth fair use factor at work. Publishers should define and monetize it before platforms dictate the terms.   

This decision marks a turning point, ensuring that AI development doesn’t come at the expense of the people and companies producing high-quality content. Sam Altman of OpenAI and other leaders across the powerful AI industry have attempted to invent a “right to learn” for their machines. That’s an absurd argument on its face, yet it is regularly repeated in high-profile interviews, as if the technocrats might will it into reality. 

Kadrey v. Meta: Pirated books, torrenting, and a familiar playbook 

While the Reuters ruling validates AI licensing, Kadrey v. Meta reveals how some AI developers have worked to avoid it. 

Recently unsealed court documents suggest that Meta employees knowingly pirated books to train its LLaMA AI models, including its first commercial version (LLaMA 2). Significantly, Meta’s fair use analysis shifted from “research” to making bank – a lot of it. 

The evidence revealed demonstrates this knowing, strategic shift: 

  • Meta employees downloaded pirated book datasets from LibGen, a massive repository of pirated works, even using torrenting technology to pull them down.   
  • They may have “seeded” and distributed this pirated content to others, a potential criminal violation that their own employees worried about, asking, “What is the probability of getting arrested for using torrents in the USA?”  
  • Meta worried that licensing even one book would weaken its fair use argument, so it didn’t license any at all. 
  • Some employees explicitly avoided normal approval processes to keep leadership from having to formally sign off.   
  • Some documents suggest Mark Zuckerberg himself may have been aware of these tactics, with references to escalations to “MZ.” 
  • Meta appears to have stopped using this material ahead of LLaMA 3, possibly signaling awareness that its actions were legally indefensible.   

Making matters worse for Meta, the case is being overseen by Judge Vince Chhabria in the Northern District of California. This is the same judge who sanctioned Facebook’s lawyers in the massive privacy litigation tied to record-breaking settlements approaching $6 billion with the FTC, SEC, and private plaintiffs. In that case, Facebook was accused of stalling, misleading regulators, and withholding evidence related to its user data practices. In other words, Judge Chhabria knows Meta’s playbook: delay, deny, deflect.   

Now, Meta faces a crime-fraud doctrine claim, meaning that some currently sealed legal advice could be unsealed if it was given in furtherance of a crime. If proven, this would not be a simple copyright dispute; it could lead to criminal liability and further regulatory scrutiny. The court is ordering Meta to unseal more documents this week. 

Move fast, break things… again: Meta’s AI strategy mirrors its past scandals 

The Kadrey case’s revelations closely resemble Meta’s past data controversies, particularly those lumped together under the Cambridge Analytica scandal. Details of the cover-up of that scandal are still emerging today. Unfortunately, they have been mostly overlooked by a tech press corps that has not been tuned in to these issues for far too long.  

For years, Facebook pursued a strategy of aggressive data harvesting to accelerate its growth in mobile, where it risked being supplanted by new platforms. The company:   

  1. Scraped vast amounts of publisher and user data without clear consent.   
  2. Shared this data widely with developers in exchange for reciprocal access to their user data – fueling Facebook’s mobile market share grab.   
  3. Ultimately settled with regulators for billions after repeated privacy violations. 

Now, in Kadrey v. Meta, history appears to be repeating itself. Internal documents show that Meta feared falling behind OpenAI and felt pressured to take outsized risks to accelerate its AI development. Meta’s approach to AI training follows a similar pattern:   

  1. Acquire the best data – legally or not.   
  2. Use it to gain an edge over AI competitors.  
  3. Deal with legal and regulatory fallout later, if necessary. 

Recently unsealed documents even reveal a documented mitigation strategy: 

  1. Remove data clearly marked as pirated (but only if “pirated” appeared in the filename, even as engineers stripped copyright information out of the actual content). 
  2. Don’t let anyone know which datasets are being used (including illegal datasets). 
  3. Do whatever possible to suppress prompts that spit out IP violations. 

Key takeaways for publishers and media companies 

The Thomson Reuters and Kadrey cases demonstrate both the risks and the opportunities for publishers in the AI era. Courts are starting to push back on AI’s unlicensed use of copyrighted content. But it’s up to the publishing industry to define what comes next.   

Here are the big issues we must address: 

  1. AI models need high-quality data. And publishers must ensure they’re compensated for it. The Reuters ruling proves that a growing licensing market for AI exists.   
  2. Litigation is working. The unsealed evidence in the Kadrey case suggests that even AI giants like Meta know they’ve crossed legal lines. Facebook isn’t dumb; evidence from peer companies may be even more damaging. The press needs to shine a light on these wrongs, as national security isn’t an excuse for AI companies to break copyright law. 
  3. Publishers must be proactive in shaping AI policy. Big Tech will push its own narrative. Meta and Google pay front groups like Chamber of Progress to stretch the meaning of fair use both in the U.S. and across the pond. Media companies must work together to establish AI licensing frameworks and legal protections and to reinforce existing copyright law.  
  4. Regulatory scrutiny on AI will intensify. If Meta is found to have used pirated data, it will accelerate AI regulation. This will not likely be confined to copyright but could extend across tech policy, as it did in 2018, when one scandal exposed larger problems and led to Facebook being dragged before parliaments around the globe.  

The future of AI depends on trust, ethics and media leadership 

The past year has shown that AI is both a disruptor and an opportunity. The Reuters ruling confirmed that publishers can and should demand licensing deals. The Meta revelations prove why that’s so necessary.   

AI is reshaping media, but it must be built ethically. The publishing industry has both the legal and ethical high ground. And media companies must use it to define the next phase of AI’s evolution. The future of AI isn’t just about innovation. It’s about who controls the data and the IP – and whether the people who create it are respected or exploited.