The New York Post printing plant in the Bronx. Photo Credit: Jim Henderson
In a case that could establish precedent relevant to multiple music industry lawsuits against generative AI companies, the owners of the Wall Street Journal and the New York Post are suing Perplexity for copyright infringement.
Dow Jones & Company and NYP Holdings submitted the copyright complaint to a New York federal court, naming San Francisco-based Perplexity as the lone defendant. Billing itself as today’s “most powerful answer engine,” the startup counts Jeff Bezos and Nvidia among its stakeholders.
Against the backdrop of sizable funding rounds and massive valuations in the AI space, the just-filed action points to a possible $3 billion valuation for Perplexity, though reports today suggested that the business is looking to raise $500 million at a whopping $8 billion valuation.
In other words, ample cash is floating around the AI world. But according to the corporate entities behind the Journal and the Post, Perplexity in particular owes its success to a “brazen scheme to compete for readers while simultaneously freeriding on the valuable content” at hand.
As recounted in the 42-page suit, the plaintiffs reached out to the defendant in July of 2024 with a letter describing infringement concerns and “offering to discuss a potential licensing deal.” (Separately, the New York Times recently sent Perplexity a cease-and-desist letter concerning alleged infringement, Reuters reported.)
Predictably, given the fresh complaint, the plaintiffs say they never received a response from Perplexity. Their parent company had previously finalized a licensing pact with ChatGPT developer OpenAI.
Shifting to the actual copyright claims, the complaint differs from previously filed actions against generative AI companies (including Amazon-backed Anthropic, OpenAI, and more) by accusing Perplexity of infringement at several stages.
First, the platform, often used to summarize news, allegedly “copied hundreds of thousands” of copyrighted Journal and Post articles without permission for its retrieval-augmented generation (RAG) database. Taking aim at arguments made by other AI giants, the action rather directly claims the alleged practice isn’t transformative and doesn’t constitute fair use.
In a nutshell, the RAG database, distinct from the much-discussed training process for large language models, is said to house a collection of information that is continually updated via web scraping and used in AI-generated answers to user questions (including requests for breakdowns of articles).
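For readers unfamiliar with the term, the general RAG pattern can be sketched in a few lines of Python. This is a generic illustration only, not a depiction of Perplexity’s system or of anything described in the complaint. Every name here (`Document`, `RagStore`, `upsert`, `retrieve`, `build_prompt`) is hypothetical, and the keyword-overlap scoring is an assumption made for brevity; real systems typically rely on vector search.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    url: str
    text: str


@dataclass
class RagStore:
    """Hypothetical in-memory store standing in for a RAG database."""

    # Keyed by URL, so re-adding a page replaces the stored copy.
    docs: Dict[str, Document] = field(default_factory=dict)

    def upsert(self, doc: Document) -> None:
        # Stand-in for the "continually updated" step; no scraping happens here.
        self.docs[doc.url] = doc

    def retrieve(self, question: str, k: int = 2) -> List[Document]:
        # Naive relevance score: number of words shared with the question.
        q_words = set(question.lower().split())
        scored = []
        for doc in self.docs.values():
            overlap = len(q_words & set(doc.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Sort by score only; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


def build_prompt(question: str, sources: List[Document]) -> str:
    # The retrieved text is placed in the prompt at answer time,
    # which is separate from anything the model saw during training.
    context = "\n".join(
        f"[{i}] {d.url}: {d.text}" for i, d in enumerate(sources, start=1)
    )
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"
```

The point of the sketch is the separation of steps: stored copies are refreshed independently of the model, and the relevant ones are looked up and handed to the model only when a user asks a question.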
(Incidentally, at the time of this writing, the AI platform was declining to use the Post article about the lawsuit to create a summary of the matter, even when asked to do so. Citations are featured prominently beside Perplexity answers but, according to the plaintiffs, render “users less inclined to visit the original content source” and generate “virtually no click-through traffic” in any event.)
Next, Perplexity’s “full or partial verbatim reproductions of” copyrighted articles allegedly constitute independent instances of copyright infringement. That includes detailed, quote-heavy summaries of paywall-protected Journal coverage as well as entire Post pieces.
Furthermore, the AI defendant allegedly makes additional unauthorized copies of “articles to preserve the outputs it generates in another database that it uses for analytical and other purposes.” The exact quantity of alleged copies is unclear, but the plaintiffs say “each individual electronic copy constitutes its own infringement subject to statutory damages under the Copyright Act.”
Lastly, Perplexity allegedly produces “made-up text (hallucinations) in its outputs” and then falsely attributes said text, sometimes alongside genuine quoted materials, to specific articles and authors from the plaintiff publications. Among other things, the alleged practice is “likely to cause confusion or mistake,” according to the suit.
“This conduct likewise harms the news-consuming public,” the complaint sums up towards its end. “Generating content for advertisement or subscription revenue is unsustainable if the content is taken en masse and reproduced by bad-faith actors for substitutive commercial purposes.”
All told, the plaintiffs are seeking substantial damages and a number of orders – one barring the unauthorized copying of protected materials and another calling for the “destruction of any index or database created by Perplexity that contains” the same materials, to name a couple.