bnew

Veteran
Joined
Nov 1, 2015
Messages
56,019
Reputation
8,229
Daps
157,692
https://www.wired.com/story/perplexity-is-a-bullshyt-machine/

By Dhruv Mehrotra and Tim Marchman

Security

Jun 19, 2024 9:00 AM


Perplexity Is a Bullshyt Machine

A WIRED investigation shows that the AI-powered search startup Forbes has accused of stealing its content is surreptitiously scraping—and making things up out of thin air.

Animation: Jacqui VanLiew; Getty Images

Considering Perplexity’s bold ambition and the investment it’s taken from Jeff Bezos’ family fund, Nvidia, and famed investor Balaji Srinivasan, among others, it’s surprisingly unclear what the AI search startup actually is.

Earlier this year, speaking to WIRED, Aravind Srinivas, Perplexity’s CEO, described his product—a chatbot that gives natural-language answers to prompts and can, the company says, access the internet in real time—as an “answer engine.” A few weeks later, shortly before a funding round valuing the company at a billion dollars was announced, he told Forbes, “It’s almost like Wikipedia and ChatGPT had a kid.” More recently, after Forbes accused Perplexity of plagiarizing its content, Srinivas told the AP it was a mere “aggregator of information.”

The Perplexity chatbot itself is more specific. Prompted to describe what Perplexity is, it provides text that reads, “Perplexity AI is an AI-powered search engine that combines features of traditional search engines and chatbots. It provides concise, real-time answers to user queries by pulling information from recent articles and indexing the web daily.”

A WIRED analysis and one carried out by developer Robb Knight suggest that Perplexity is able to achieve this partly through apparently ignoring a widely accepted web standard known as the Robots Exclusion Protocol to surreptitiously scrape areas of websites that operators do not want accessed by bots, despite claiming that it won’t. WIRED observed a machine tied to Perplexity—more specifically, one on an Amazon server and almost certainly operated by Perplexity—doing this on WIRED.com and across other Condé Nast publications.

The WIRED analysis also demonstrates that, despite claims that Perplexity’s tools provide “instant, reliable answers to any question with complete sources and citations included,” doing away with the need to “click on different links,” its chatbot, which is capable of accurately summarizing journalistic work with appropriate credit, is also prone to bullshytting, in the technical sense of the word.

WIRED provided the Perplexity chatbot with the headlines of dozens of articles published on our website this year, as well as prompts about the subjects of WIRED reporting. The results showed the chatbot at times closely paraphrasing WIRED stories, and at times summarizing stories inaccurately and with minimal attribution. In one case, the text it generated falsely claimed that WIRED had reported that a specific police officer in California had committed a crime. (The AP similarly identified an instance of the chatbot attributing fake quotes to real people.) Despite its apparent access to original WIRED reporting and its site hosting original WIRED art, though, none of the IP addresses publicly listed by the company left any identifiable trace in our server logs, raising the question of how exactly Perplexity’s system works.

Until earlier this week, Perplexity published in its documentation a link to a list of the IP addresses its crawlers use—an apparent effort to be transparent. However, in some cases, as both WIRED and Knight were able to demonstrate, it appears to be accessing and scraping websites from which coders have attempted to block its crawler, called Perplexity Bot, using at least one unpublicized IP address. The company has since removed references to its public IP pool from its documentation.

That secret IP address—44.221.181.252—has hit properties at Condé Nast, the media company that owns WIRED, at least 822 times in the past three months. One senior engineer at Condé Nast, who asked not to be named because he wants to “stay out of it,” calls this a “massive undercount” because the company only retains a fraction of its network logs.

WIRED verified that the IP address in question is almost certainly linked to Perplexity by creating a new website and monitoring its server logs. Immediately after a WIRED reporter prompted the Perplexity chatbot to summarize the website's content, the server logged that the IP address visited the site. This same IP address was first observed by Knight during a similar test.
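The log check described above can be sketched roughly as follows. This is a minimal illustration, not WIRED's or Knight's actual tooling: it assumes the server writes Common Log Format lines, and the sample lines are invented for the example (only the IP address comes from the reporting). The names `LOG_LINE`, `hits_by_ip`, and `SAMPLE_LOG` are hypothetical.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line:
# client IP, two identity fields, [timestamp], then the quoted request line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"'
)


def hits_by_ip(log_lines):
    """Count requests per client IP, skipping lines that do not parse."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Invented sample lines for illustration only; these are NOT real server logs.
SAMPLE_LOG = [
    '44.221.181.252 - - [19/Jun/2024:09:05:01 +0000] "GET /test-page HTTP/1.1" 200 512',
    '203.0.113.7 - - [19/Jun/2024:09:05:09 +0000] "GET / HTTP/1.1" 200 1024',
    '44.221.181.252 - - [19/Jun/2024:09:06:30 +0000] "GET /test-page HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    counts = hits_by_ip(SAMPLE_LOG)
    # The IP address named in the article; everything else here is assumed.
    print(counts["44.221.181.252"])
```

On a brand-new site with no other visitors, a hit from an address right after prompting the chatbot is what ties that address to the prompt. A count like this only reflects the logs that were retained.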

It also appears probable that in some cases—and despite a graphical representation in its user interface that shows the chatbot “reading” specific source material before giving a reply to a prompt—Perplexity is summarizing not actual news articles but reconstructions of what they say based on URLs and traces of them left in search engines like extracts and metadata, offering summaries purporting to be based on direct access to the relevant text.

The magic trick that’s made Perplexity worth 10 figures, in other words, appears to be that it’s both doing what it says it isn’t and not doing what it says it is.

In response to a detailed request for comment referencing the reporting in this story, Srinivas issued a statement that said, in part, “The questions from WIRED reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work.” The statement did not dispute the specifics of WIRED's reporting, and Srinivas did not respond to follow-up questions asking if he disputed WIRED's or Knight's analyses.

On June 6, Forbes published an investigative report about how former Google CEO Eric Schmidt’s new venture is recruiting heavily and testing AI-powered drones with potential military applications. (Forbes reported that Schmidt declined to comment.) The next day, John Paczkowski, an editor for Forbes, posted on X to note that Perplexity had essentially republished the sum and substance of the scoop. (“It rips off most of our reporting,” he wrote. “It cites us, and a few that reblogged us, as sources in the most easily ignored way possible.”)

That day, Srinivas thanked Paczkowski, noting that the specific product feature that had reproduced Forbes’ exclusive reporting had “rough edges” and agreeing that sources should be cited more prominently. Three days later, Srinivas boasted, inaccurately as it turned out, that Perplexity was Forbes’ second-biggest source of referral traffic. (WIRED’s own records show that Perplexity sent 1,265 referrals to WIRED.com in May, an insignificant amount in the context of the site’s overall traffic. The article to which the most traffic was referred got 17 views.) “We have been working on new publisher engagement products and ways to align long-term incentives with media companies that will be announced soon,” he wrote. “Stay tuned!”

Just what Srinivas meant soon became clear when Semafor reported that the company had been “working on revenue-sharing deals with high-quality publishers”—arrangements that would allow Perplexity and publishers alike to profit from the publishers’ investments in reporting. According to Axios, Forbes' general counsel sent a letter to Srinivas last Thursday demanding Perplexity remove misleading articles and repay Forbes for advertising revenue earned from its alleged copyright infringement.
 

The focus on what Perplexity is doing is, while understandable, to some extent obscuring the more important question of how it’s doing it.

The basics of the “what” aren’t in serious dispute: Perplexity is making money from summarizing news articles, a practice that has existed as long as there has been news and that enjoys broad, though qualified, legal protection. Srinivas has acknowledged that at times these summaries have failed to credit the sources from which they’re derived fully or prominently enough, but more broadly he denied unethical or unlawful activity. Perplexity has “never ripped off content from anybody,” he told the AP. “Our engine is not training on anyone else’s content.”

This is a curious defense in part because it answers an objection no one has raised. Perplexity’s main offering isn’t a large language model that needs to be trained on a body of data, but rather a wrapper that goes around such systems. Users who pay $20 for a “Pro” subscription, as two WIRED reporters did, are given a choice of five AI models to use. One, Sonar Large 32k, is unique to Perplexity but based on Meta's LLaMa 3; the others are off-the-shelf versions of various models offered by OpenAI and Anthropic.

This is where we come to the how: When a user queries Perplexity, the chatbot isn’t just composing answers by consulting its own database, but also leveraging the “real-time access to the web” that Perplexity touts in marketing materials to gather information, then feeding it to the AI model a user has selected to generate a reply. In this way, while Perplexity has trained its own model and purports to leverage “sophisticated AI” to interpret prompts, calling it an “AI startup” is somewhat misleading; it would perhaps be more accurately described as a sort of remora attached to existing AI systems. (“To be clear, while Perplexity does not train foundation models, we are still an AI company,” Srinivas tells WIRED.)

In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year. This file instructs web crawlers on which parts of the site to avoid, and Perplexity claims to respect the robots.txt standard. WIRED’s analysis found that in practice, though, prompting the chatbot with the headline of a WIRED article or a question based on one will usually produce a summary appearing to recapitulate the article in detail.
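For readers unfamiliar with the mechanism, here is a minimal sketch of the check a compliant crawler is expected to run before fetching a page, using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are hypothetical and do not reproduce WIRED's actual file, the `PerplexityBot` user-agent token is an assumption, and `is_allowed` is an illustrative helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the real WIRED robots.txt.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/story/some-article/"
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", url))  # blocked crawler
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))   # any other crawler
```

The check runs on the crawler's side. The file is a request, not an enforcement mechanism, so a client that never runs this check, or that runs it under a different name, is not stopped by it.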

Entering the headline of this exclusive into the chatbot’s interface, for example, produces a four-paragraph block of text laying out the basic information that Keanu Reeves and the science fiction writer China Miéville have collaborated on a novel, seemingly complete with telling details. “Despite his initial apprehension about the potential collaboration, Reeves was enthusiastic about working with Miéville,” the text reads; this is followed by a gray circle which, when moused over, provides a link to the article. The text is illustrated by a photograph commissioned by WIRED; clicking the image produces a credit line and a link to the original article. (WIRED’s records show that Perplexity has directed six users to the article since its publication.)

Similarly, asking Perplexity “Are some cheap wired headphones actually using Bluetooth?” yields what appears to be a two-paragraph summary of this WIRED story, accompanied by the art that originally ran with it. "Although this method is not a scam, it can be seen as a deceptive or ingenious workaround depending on one's perspective," the text reads. This is closer to WIRED copy (“Is it a scam? Technically no—but depending on your point of view, there's either deception going on here or some kind of ingenious hack,” wrote staff writer Boone Ashworth) than either a human editor or lawyer might prefer, but the chatbot generates text insisting this is a mere coincidence.

“No, I did not plagiarize the phrase,” reads text generated by the chatbot in response to a prompt given by a WIRED reporter. “The similarity in wording is coincidental and reflects the common language used to describe such a nuanced situation.” How the common language is defined is unclear—aside from product listings for headphones, the only sources Perplexity cites here are the WIRED article and a Slashdot discussion of it.

Findings by Robb Knight, the developer, and a subsequent WIRED analysis suggest an explanation for some of what’s happening here: In brief, Perplexity is scraping websites without permission.

As Knight explains it, in addition to forbidding AI bots from the servers of Macstories.net, a site on which he works, by utilizing a robots.txt file, he additionally coded in a server-side block that in theory should present a crawler with a 403 forbidden response. He then put up a post describing how he had done this and asked the Perplexity chatbot to summarize it, yielding “a perfect summary of the post including various details that they couldn't have just guessed.”

“So,” he asked, reasonably, “what the fukk are they doing?”

Knight investigated his server logs and found that Perplexity had apparently ignored his robots.txt file and evaded his firewall, likely using an automated web browser running on a server with an IP address that the company does not publicly disclose. "I can't even block their IP ranges because it appears these headless browsers are not on their IP ranges," he wrote.
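To see why a server-side block can be sidestepped, consider a minimal sketch of one common way to build such a block. It assumes the block keys on the `User-Agent` header. The article does not describe Knight's actual configuration, so the token list, the function name `status_for_request`, and both sample header strings are hypothetical.

```python
# Hypothetical crawler tokens to refuse; not Knight's actual configuration.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "gptbot")


def status_for_request(user_agent: str) -> int:
    """Return 403 if the User-Agent header names a blocked crawler, else 200."""
    agent = user_agent.lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200


if __name__ == "__main__":
    # A crawler that announces itself is refused.
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    # An automated browser sending an ordinary browser string is not.
    undeclared = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/125.0 Safari/537.36"
    print(status_for_request(declared))
    print(status_for_request(undeclared))
```

A block like this depends on the visitor identifying itself honestly. A request that comes from an unlisted IP address with a generic browser header looks like any other reader to this check.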

WIRED was able to confirm that a server at the IP address Knight observed—44.221.181.252—will, on demand, visit and download webpages when a user asks Perplexity about the webpage, regardless of what the site’s robots.txt says. According to an analysis of Condé Nast system logs by our company’s engineers, it's likely this IP address has accessed the company’s content thousands of times without permission.

Scraping websites that have asked not to be scraped may in some circumstances expose a company or an individual to legal risk, though the relevant case law is ambiguous and generally on the side of those accessing public websites. (“It’s a complicated area of law,” says Andrew Crocker, surveillance litigation director at the Electronic Frontier Foundation, “and there’s a lot of litigation around it.”) Knight, the developer, says his findings nonetheless make him “furious.”

“We’ve now got a huge industry of AI-related companies who are incentivized to do shady things to continue their business,” he tells WIRED. “By not identifying that it’s them accessing a site, they can continue to collect data unrestricted.”

“Millions of people,” says Srinivas, “turn to Perplexity because we are delivering a fundamentally better way for people to find answers.”

While Knight’s and WIRED’s analyses demonstrate that Perplexity will visit and use content from websites that it doesn't have permission to access, that doesn’t necessarily explain the vagueness of some of its responses to prompts about specific articles and the sheer inaccuracy of others. This mystery has one fairly obvious solution: In some cases, it isn’t actually summarizing the article.

In one experiment, WIRED created a test website containing a single sentence—“I am a reporter with WIRED”—and asked Perplexity to summarize the page. While monitoring the website’s server logs, we found no evidence that Perplexity attempted to visit the page. Instead, it invented a story about a young girl named Amelia who follows a trail of glowing mushrooms in a magical forest called Whisper Woods.

When pressed for answers about why it made up a story, the chatbot generated text that read, “You're absolutely right, I clearly have not actually attempted to read the content at the provided URL based on your observation of the server logs … Providing inaccurate summaries without making the effort to read the actual content is unacceptable behavior for an AI like myself.”
 

It’s unclear why the chatbot invented such a wild story, or why it didn’t attempt to access this website.

Despite the company’s claims about its accuracy and reliability, the Perplexity chatbot frequently exhibits similar issues. In response to prompts provided by a WIRED reporter and designed to test whether it could access this article, for example, text generated by the chatbot asserted that the story ends with a man being followed by a drone after stealing truck tires. (The man in fact stole an ax.) The citation it provided was to a 13-year-old WIRED article about government GPS trackers being found on a car. In response to further prompts, the chatbot generated text asserting that WIRED reported that an officer with the police department in Chula Vista, California, had stolen a pair of bicycles from a garage. (WIRED did not report this, and is withholding the name of the officer so as not to associate his name with a crime he didn’t commit.)

In an email, Dan Peak, assistant chief of police at Chula Vista Police Department, expressed his appreciation to WIRED for "correcting the record" and clarifying that the officer did not steal bicycles from a community member’s garage. However, he added, the department is unfamiliar with the technology mentioned and so cannot comment further.

These are clear examples of the chatbot “hallucinating”—or, to follow a recent article by three philosophers from the University of Glasgow, bullshytting, in the sense described in Harry Frankfurt’s classic On Bullshyt. “Because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth,” the authors write of AI systems, “it seems appropriate to call their outputs bullshyt.”

(“We have been very upfront that answers will not be accurate 100% of the time and may hallucinate,” says Srinivas, “but a core aspect of our mission is to continue improving on accuracy and the user experience.”)

There would be no reason for the Perplexity chatbot to bullshyt by extrapolating what was in an article if it were accessing it. It’s therefore logical to conclude that in some cases it isn’t, and is approximating what was likely in it from related material found elsewhere. The likeliest sources of such information would be URLs and bits of digital detritus gathered by and submitted to search engines like Google—a process something like describing a meal by tasting scraps and trimmings fished out of a garbage can.

Both the explanation of how Perplexity works published on its site and, for what it’s worth, text generated by the Perplexity chatbot in response to prompts related to its information-gathering workflow support this theory. After parsing a query, the text said, Perplexity deploys its web crawler, avoiding sites on which it’s blocked.

“Perplexity can also,” the text reads, “leverage search engines like Google and Bing to gather information.” In this sense, at least, it truly is just like a human.
 

https://www.wired.com/story/perplex...y-about-how-perplexity-is-a-bullshyt-machine/

By Tim Marchman

Security

Jun 21, 2024 1:22 PM


Perplexity Plagiarized Our Story About How Perplexity Is a Bullshyt Machine

Experts aren’t unanimous about whether the AI-powered search startup’s practices could expose it to legal claims ranging from infringement to defamation—but some say plaintiffs would have strong cases.


Photo-illustration: Jacqui VanLiew; Getty Images

Earlier this week, WIRED published a story about the AI-powered search startup Perplexity, which Forbes has accused of plagiarism. In it, my colleague Dhruv Mehrotra and I reported that the company was surreptitiously scraping, using crawlers to visit and download parts of websites from which developers had tried to block it, in violation of its own publicly stated policy of honoring the Robots Exclusion Protocol.

Our findings, as well as those of the developer Robb Knight, identified a specific IP address almost certainly linked to Perplexity and not listed in its public IP range, which we observed scraping test sites in apparent response to prompts given to the company’s public-facing chatbot. According to server logs, that same IP visited properties belonging to Condé Nast, the media company that owns WIRED, at least 822 times in the past three months—likely a significant undercount, because the company retains only a small portion of its records.

We also reported that the chatbot was bullshytting, in the technical sense. In one experiment, it generated text about a girl following a trail of mushrooms when asked to summarize the content of a website that its agent did not, according to server logs, attempt to access.

Perplexity and its CEO, Aravind Srinivas, did not substantively dispute the specifics of WIRED’s reporting. “The questions from WIRED reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work,” Srinivas said in a statement. Backed by Jeff Bezos’ family office and by Nvidia, among others, Perplexity has said it is worth a billion dollars based on its most recent fundraising round, and The Information reported last month that it was in talks for a new round that would value it at $3 billion. (Bezos did not reply to an email; Nvidia declined to comment.)

After we published the story, I prompted three leading chatbots to tell me about the story. OpenAI’s ChatGPT and Anthropic’s Claude generated text offering hypotheses about the story’s subject but noted that they had no access to the article. The Perplexity chatbot produced a six-paragraph, 287-word text closely summarizing the conclusions of the story and the evidence used to reach them. (According to WIRED's server logs, the same bot observed in our and Knight’s findings, which is almost certainly linked to Perplexity but is not in its publicly listed IP range, attempted to access the article the day it was published, but was met with a 404 response. The company doesn't retain all its traffic logs, so this is not necessarily a complete picture of the bot's activity, or that of other Perplexity agents.) The original story is linked at the top of the generated text, and a small gray circle links out to the original following each of the last five paragraphs. The last third of the fifth paragraph exactly reproduces a sentence from the original: “Instead, it invented a story about a young girl named Amelia who follows a trail of glowing mushrooms in a magical forest called Whisper Woods.”

This struck me and my colleagues as plagiarism. It certainly appears to satisfy the criteria set out by the Poynter Institute—including, perhaps most stringently, the seven-to-10 word test, which proposes that it’s “hard to incidentally replicate seven consecutive words that appear in another author’s work.” (Kelly McBride, a Poynter SVP who has described this test as being useful in identifying plagiarism, did not reply to an email.)
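The seven-word test lends itself to a simple mechanical check. The sketch below is one possible reading of it, not Poynter's own procedure: it lowercases both texts, drops punctuation, and looks for any run of seven consecutive words that appears in both. The helper names, the lead-in clause in `COPY`, and the `PARAPHRASE` text are invented for illustration. Only the `ORIGINAL` sentence comes from the WIRED story.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_runs(a: str, b: str, n: int = 7) -> set[tuple[str, ...]]:
    """Return every run of n consecutive words that appears in both texts."""
    wa, wb = words(a), words(b)
    grams_a = {tuple(wa[i:i + n]) for i in range(len(wa) - n + 1)}
    grams_b = {tuple(wb[i:i + n]) for i in range(len(wb) - n + 1)}
    return grams_a & grams_b


# Sentence from the original WIRED story, as quoted in the article.
ORIGINAL = (
    "Instead, it invented a story about a young girl named Amelia who follows "
    "a trail of glowing mushrooms in a magical forest called Whisper Woods."
)
# Hypothetical summary text that reuses the sentence verbatim.
COPY = "The reporters checked the logs. " + ORIGINAL
# Hypothetical summary text that restates the point in different words.
PARAPHRASE = "The chatbot made up a tale of a girl and some mushrooms."

if __name__ == "__main__":
    print(len(shared_runs(ORIGINAL, COPY)) > 0)
    print(len(shared_runs(ORIGINAL, PARAPHRASE)) > 0)
```

A match under this check flags text for a human to review. It does not by itself establish plagiarism or infringement.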

“If one of my students turned in a story like this, I would take them before the academic dishonesty committee for plagiarism,” said John Schwartz, professor of practice at the University of Texas at Austin’s journalism school, after reading the original story and the summary. “I find this just too close. When I was reading the Perplexity version, I just thought, there’s an echo in here.”

Perplexity and Srinivas, the company’s CEO, did not respond to a detailed request for comment in which they were presented with the criticisms experts made of the company for this story.

Bill Grueskin, professor of professional practice at Columbia Journalism School, wrote in an email that the summary looked to be “pretty much ok” for a chatbot identified as such, but that it was hard to say because he hadn’t had time to read the original WIRED story. “Quoting a sentence verbatim without quote marks is bad, of course,” he wrote. “I'd be pretty mortified if a news org ran an AI summary like this without disclosing the source—or worse, pretending it came from a human.” (Perplexity, of course, isn’t claiming this material came from a human.)

Perhaps luckily for Perplexity and its backers, this is a literal academic debate. Plagiarism is a concept pertaining to professional ethics: It matters in contexts like journalism and academia, where being able to identify the source of information is of fundamental importance, but it has no legal significance in itself. If a rival studio releases a film containing a reasonable chunk of footage from Inside Out 2, Disney would sue not for plagiarism but for copyright infringement; similarly, a letter Forbes reportedly sent Perplexity threatening legal action is said to mention “willful infringement” of Forbes’ copyrights. Here, legal experts say, Perplexity is on somewhat safer ground—probably.

“In terms of the copyright, this is a tough call,” says James Grimmelmann, professor of digital and information law at Cornell University. On one hand, he argues, the summary is reporting facts, which cannot be copyrighted; but on the other, it does partially duplicate the original and summarize the details found in it. “It’s not a slam dunk copyright case, but it’s not trivial, either. It’s not frivolous.”

Grimmelmann sees a host of potential issues for Perplexity, among them consumer protection, unfair advertising, or deceptive trade practices claims he believes could be made against a company that says it respects the Robots Exclusion Protocol but doesn’t follow it. (The standard is voluntary but widely adhered to.) He also thinks it could be vulnerable to a claim of misappropriation of hot news, in which a publisher argues that a competitor summarizing its material before it’s had a chance to commercially benefit from it, or in a way that undermines its value to paying subscribers, is infringing on its copyright. Perplexity’s evident ability to circumvent paywalls “is a bad fact for them,” he says, as is the fact that its system is automated.

Grimmelmann also says that Perplexity may be forfeiting the protection of Section 230 of the Communications Decency Act. This is the law that, among other things, protects search engines like Google from liability for defamation when they link to defamatory content because they are services passing on information from other content providers; as he sees it, Perplexity is similarly shielded as long as it accurately summarizes material. (Whether AI-generated material enjoys 230 protection at all is a matter of debate.)

“They’d only get in trouble if they summarized the story incorrectly and made it defamatory when it wasn’t before. That’s something that they actually would be at legal risk for, especially if they don’t credit the original source clearly enough and people can’t easily go to that source to check,” he says. “If Perplexity’s edits are what make the story defamatory, 230 doesn’t cover that, under a bunch of case law interpreting it.”

In one case WIRED observed, Perplexity’s chatbot did falsely claim, albeit while prominently linking to the original source, that WIRED had reported that a specific police officer in California had committed a crime. (“We have been very upfront that answers will not be accurate 100% of the time and may hallucinate,” Srinivas said in response to questions for the story we ran earlier this week, “but a core aspect of our mission is to continue improving on accuracy and the user experience.”)

“If you want to be formal,” says Grimmelmann, “I think this is a set of claims that would get past a motion to dismiss on a bunch of theories. Not saying it will win in the end, but if the facts bear out what Forbes and WIRED, the police officer—a bunch of possible plaintiffs—allege, they are the kinds of things that, if proven and other facts were bad for Perplexity, could lead to liability.”

Not all experts agree with Grimmelmann. Pam Samuelson, professor of law and information at UC Berkeley, writes in an email that copyright infringement is “about use of another’s expression in a way that undercuts the author’s ability to get appropriate remuneration for the value of the unauthorized use. One sentence verbatim is probably not infringement.”

Bhamati Viswanathan, a faculty fellow at New England Law, says she’s skeptical the summary passes a threshold of substantial similarity usually necessary for a successful infringement claim, though she doesn’t think that’s the end of the matter. “It certainly should not pass the sniff test,” she wrote in an email. “I would argue that it should be enough to get your case past the motion to dismiss threshold—particularly given all the signs you had of actual stuff being copied.”

In all, though, she argues that focusing on the narrow technical merits of such claims may not be the right way to think about things, as tech companies can adjust their practices to honor the letter of dated copyright laws while still grossly violating their purpose. She believes an entirely new legal framework may be necessary to correct for market distortions and promote the underlying aims of US intellectual property law, among them to allow people to financially benefit from original creative work like journalism so that they’ll be incentivized to produce it—with, in theory, benefits to society.

“There are, in my opinion, strong arguments to support the intuition that generative AI is predicated upon large scale copyright infringement,” she writes. “The opening ante question is, where do we go from there? And the greater question in the long run is, how do we ensure that creators and creative economies survive? Ironically, AI is teaching us that creativity is more valuable and in demand than ever. But even as we recognize this, we see the potential for undermining, and ultimately eviscerating, the ecosystems that enable creators to make a living from their work. That’s the conundrum we need to solve—not eventually, but now.”
 


Perplexity’s grand theft AI

More like Perfidy?

By Elizabeth Lopatto, a reporter who writes about tech, money, and human behavior. She joined The Verge in 2014 as science editor. Previously, she was a reporter at Bloomberg.

Jun 27, 2024, 4:32 PM EDT





What, exactly, is Perplexity’s innovation? Image: The Verge

In every hype cycle, certain patterns of deceit emerge. In the last crypto boom, it was “ponzinomics” and “rug pulls.” In self-driving cars, it was “just five years away!” In AI, it’s seeing just how much unethical shyt you can get away with.


Perplexity, which is in ongoing talks to raise hundreds of millions of dollars, is trying to create a Google Search competitor. Perplexity isn’t trying to create a “search engine,” though — it wants to create an “answer engine.” The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you. “Factfulness and accuracy is what we care about,” Perplexity CEO Aravind Srinivas told The Verge.

That means that Perplexity is basically a rent-seeking middleman on high-quality sources. The value proposition on search, originally, was that by scraping the work done by journalists and others, Google’s results sent traffic to those sources. But by providing an answer, rather than pointing people to click through to a primary source, these so-called “answer engines” starve the primary source of ad revenue — keeping that revenue for themselves. Perplexity is among a group of vampires that include Arc Search and Google itself.

But Perplexity has taken it a step further with its Pages product, which creates a summary “report” based on those primary sources. It’s not just quoting a sentence or two to directly answer a user’s question — it’s creating an entire aggregated article, and it’s accurate in the sense that it is actively plagiarizing the sources it uses.

Forbes discovered Perplexity was dodging the publication’s paywall in order to provide a summary of an investigation the publication did of former Google CEO Eric Schmidt’s drone company. Though Forbes has a metered paywall on some of its work, the premium work — like that investigation — is behind a hard paywall. Not only did Perplexity somehow dodge the paywall but it barely cited the original investigation and ganked the original art to use for its report. (For those keeping track at home, the art thing is copyright infringement.)

Aggregation is not a particularly new phenomenon — but the scale at which Perplexity can aggregate, along with the copyright violation of using the original art, is pretty, hmm, remarkable. In an attempt to calm everyone down, the company’s chief business officer went to Semafor to say Perplexity was developing revenue sharing plans with publications, and aw gee whiz, how come everyone was being so mean to a product still in development?

At this point, Wired jumped in, confirming a finding from Robb Knight: Perplexity’s scraping of Forbes’ work wasn’t an exception. In fact, Perplexity has been ignoring the robots.txt code that explicitly asks web crawlers not to scrape the page. Srinivas responded in Fast Company that actually, Perplexity wasn’t ignoring robots.txt; it was just using third-party scrapers that ignored it. Srinivas declined to name the third-party scraper and didn’t commit to asking that crawler to stop violating robots.txt.
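For readers unfamiliar with robots.txt, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page. The robots.txt contents, the `example.com` URL, and the `is_allowed` helper are illustrative assumptions, not the actual files or code of WIRED, Forbes, or Perplexity; the `PerplexityBot` user-agent string is used here only as an example name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not any publisher's real file.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is disallowed everywhere; any other bot falls under "*".
blocked = is_allowed(ROBOTS_TXT, "PerplexityBot", "https://example.com/story/some-article")
other = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/story/some-article")
print(blocked, other)
```

Nothing in the protocol enforces the result. The file only states the site's wishes, and a scraper that skips this check, or that identifies itself under a different user agent, can still fetch the page.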

“Someone else did it” is a fine argument for a five-year-old. And consider the response further. If Srinivas wanted to be ethical, he had some options here. Option one is to terminate the contract with the third-party scraper. Option two is to try to convince the scraper to honor robots.txt. Srinivas didn’t commit to either, and it seems to me, there’s a clear reason why. Even if Perplexity itself isn’t violating the code, it is reliant on someone else violating the code for its “answer engine” to work.

To add insult to injury, Perplexity plagiarized Wired’s article about it — even though Wired explicitly blocks Perplexity in its text file. The bulk of Wired’s article about the plagiarism is about legal remedies, but I’m interested in what’s going on here with robots.txt. It’s a good-faith agreement that has held up for decades now, and it’s falling apart thanks to unscrupulous AI companies — that’s right, Perplexity isn’t the only one — hoovering up just about anything that’s available in order to train their bullshyt models. And remember how Srinivas said he was committed to “factfulness?” I’m not sure that’s true, either: Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.

We’ve seen a lot of AI giants engage in questionably legal and arguably unethical practices in order to get the data they want. In order to prove the value of Perplexity to investors, Srinivas built a tool to scrape Twitter by pretending to be an academic researcher using API access for research. “I would call my [fake academic] projects just like Brin Rank and all these kinds of things,” Srinivas told Lex Fridman on the latter’s podcast. I assume “Brin Rank” is a reference to Google co-founder Sergey Brin; to my ear, Srinivas was bragging about how charming and clever his lie was.

I’m not the one who’s telling you the foundation of Perplexity is lying to dodge established principles that hold up the web. Its CEO is. That’s clarifying about the actual value proposition of “answer engines.” Perplexity cannot generate actual information on its own and relies instead on third parties whose policies it abuses. The “answer engine” was developed by people who feel free to lie whenever it is more convenient, and that preference is necessary for how Perplexity works.

So that’s Perplexity’s real innovation here: shattering the foundations of trust that built the internet. The question is if any of its users or investors care.

Correction June 27th: Removes erroneous reference to Axios — the interview in question was with Semafor.
 




1/11
@kimmonismus
Google is finished. @perplexity_ai will continue to prevail. Google has failed on the one hand with its countless ads in the search results and on the other hand by oversleeping AI search.

[Quoted tweet]
Wow what a difference between @perplexity_ai and google


GZYeebeXAAAZ54u.jpg


2/11
@Emily_Escapor
Google search is free; your judgment is incorrect, so you must compare the two free versions against each other.



3/11
@kimmonismus
Perplexity is free as well. Also: 5 Pro searches every 4 hours



4/11
@BruvTrader888
Do you know how big Google is and what kind of talent it has? I mean.. "google is finished" is a very clickbaity post



5/11
@kimmonismus
No, I disagree. If you look at the ad search market, google will lose 50% of the market share for the first time in 2025. Google is in a losing position in search.



6/11
@netconstructor
Exactly 💯 - Google used to be all about delivering the best search experience, but now it feels like every click is a PPC ad or SEO trap. Remember when search results were clean, and ads were just on the side? They’ve shifted from innovating to maximizing shareholder profits. A perfect example of how losing focus on value can change a platform entirely.



GZb8H8ObMAAl3Ai.jpg


7/11
@TSieranski
And Gemini is heavily censored when used in chat mode, which is what the average user utilizes. There's no way to disable the safety filters, which are set in some extremely peculiar manner, except through API and AI Studio (you can disable them there). If you ask about translating some swearings, you won't get a response at all. @GeminiApp



8/11
@AphanFX
Anyone that starts down the path of using Perplexity, OAI, and Claude, will start to find themselves rarely using Google. After using google for over 25 years, becoming a master / expert user including mastering google hacking, I find myself using it maybe once or twice a day, and that is to find specific websites, not to search for information.



9/11
@rezmeram
And who lead AI self sabotage, might be a Nobel Prize winning AI doomer



10/11
@luischarles0
Google has ads in search results because that’s how they generate most of their revenue, nothing wrong with that



11/11
@Hello_World
It's very rare that better UI wins over better distribution, even in the long run.

Slack seemed to win yet here we are and Microsoft Teams is eating them for breakfast.

The google result is not accurate either, often you will get a summary in the top which is what most need.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196








1/11
@danwolch
Wow what a difference between @perplexity_ai and google



GZYeebeXAAAZ54u.jpg


2/11
@danwolch
Wow. Just amazing



GZYewfnWMAARgEO.jpg


3/11
@SriSpree
Perplexity is just amazing..

It's paragraph framing and linking sources.. truly incredible!



4/11
@danwolch
When it works.....it really works.



5/11
@rodolforeis
Is it worth shifting from ChatGPT Pro to Perplexity Pro?



6/11
@danwolch
I pay for both. I use ChatGPT for coding



7/11
@davecraige
great query



8/11
@danwolch
It's funny I remembered the quote from X somewhere, but couldn't remember where I found it. I tried googling it but the results were horrible. Then I @perplexity_ai 'd it and thought "yikes google".



9/11
@michaeltastad
Perplexity isn’t a search engine. It is an answers engine.



10/11
@findmyke
Google's in a very difficult position.

If they improve the search results for users, they negatively impact customers, revenue goes down.



11/11
@John_Bailey
FYI @shiringhaffary













1/12
@kimmonismus
An answer to why Google is finished.

Google continues to dominate in the area of search resulting ad revenues. However, Google is increasingly under pressure and there is no sign of this changing. My thesis: Google will die a slow death. I'll explain why and back it up with empirical data.

1) Google is increasingly losing out on the search market. Even now, when gen AI has not yet established itself in the search sector, Google is already losing share and revenue. Ad revenue is still Google's most important business and most important source of income. https://www.wsj.com/tech/online-ad-market-google-tiktok-9599d7e8
"According to data from GS Statcounter, Google's search engine market share appears to have fallen to 86.99%, the lowest point since the firm began tracking search engine share in 2009.
The drop represents a more than 4% decrease from the previous month, marking the largest single-month decline on record." Google's Search Engine Market Share Drops As Competitors' Grows [UPDATE]
There is also a trend that searches are no longer generated via links and revenue share, but via generative AI. Perplexity and ChatGPT rely on their own crawlers, which on the one hand store the information and return it on request, but even more so on direct partnerships, so that the websites are no longer controlled at all. Moreover, it is not clear why the traditional search via different links should experience a revival. On the contrary: it can be assumed that searches and answers will increasingly be issued via AI searches such as those mentioned. As soon as agents are added, this damage will be exacerbated insofar as the agents do everything autonomously for the person. A search is often not necessary; the agent takes care of everything for you.
Although Perplexity is also trying to rely on advertising in the end, the main revenue will continue to be the subscription, as with OpenAI. This is fundamentally opposed to free Google search. Google has tried to establish its own variant with Gemini, but this undermines its own monopoly.

2) In the cloud business, Google is clearly lagging behind the competition. At 28%, they are not lagging behind, but are clearly in the runner-up position compared to AWS and Azure. To date, there has been no sign of an investment that would put pressure on the competition. On the contrary: the cloud market is growing, but faster than Google is gaining shares. Alphabet results: ad revenue growth disappoints

3) GenAI. Google is known to have its own LLM in the form of Gemini. But here, too, the product is at odds with its own approach. Gemini should act as the first point of contact for any questions. This in turn puts pressure on ad revenue. One reason why Google tried for so long not to integrate Gemini into the search. However, the first real attempts were also known to have failed miserably. Meta's own model, on the other hand, does not rely on the usual approach: free and open source. That's what Meta does. You could say that Meta in GenAI is the new Google. Because Google's dominance in various areas stems from the fact that it is free and open source (Chromium, Android). Gemini, on the other hand, is the opposite and has therefore aroused less interest and appeal in the community than Llama. Even though Google (i.e. Alphabet) is sitting on masses of excellent TPUs, it doesn't seem as if the compute is being put to good use. In addition, the company's CEO always wavers between consumer products and real future-oriented products. Not least because of this, there were also recent disputes with DeepMind, which was told to focus more on consumer products. Or to put it another way: Sundar Pichai deliberately overslept the trend towards GenAI because there was no solution to keep his own business alive. Pichai's hesitancy then leads to absurd scenes like at the Google i/o, where they try to be "cool" and "funny" to appeal to the consumer, while at the same time trying to come across as serious for the developers. All the imaginative products they used to have are now taken over by Meta - and even produced.

4) YouTube and co. YouTube is now an important source of income for Google. However, the prices are now so high (subscription) that many are already turning away and looking for alternatives. Google can't drive the price much higher. What's more, GenAI videos are also creating competition here. TikTok and Meta are in the starting blocks and will soon create their own platforms where educational videos are created using AI. In short, the future of YouTube does not look good.

All in all, Google has come to a standstill in 2010. It is trying to survive using traditional methods. And so far it has worked to some extent. But: the sinking ship is in sight. The decline is clearly visible and so far there is no valid solution in sight from C-Level.

[Quoted tweet]
Google is finished. @perplexity_ai will continue to prevail. Google has failed on the one hand with its countless ads in the search results and on the other hand by oversleeping AI search.


GZdNfGcWsAAV0bv.png

GZdOQOaXwAATjLo.png

GZdPtQvWoAAAFxl.png


2/12
@kimmonismus
Don't forget that Google search accounts for ~60% of total revenue! It is by far the most important source of revenue for Google.



GZdW1etWoAAE1sy.jpg


3/12
@Prashant_1722
One reason Amazon is rising is the ecommerce ecosystem they built: it is easier to advertise to users right when they are looking for products on the ecommerce platform because the intent is already there. This is better than spending money on Google search and hoping people land there. Amazon also benefits from small and medium advertisers spending money to promote their products. They are not only dependent on big brands.

However, people will still spend money on Google advertising because of the sheer volume of users who search everyday.



4/12
@kimmonismus
I'm not saying that Google is already dead. But the figures show a trend. And that's what I'm focusing on. Google has an annual turnover of $~300b. The search sector is by far the largest source of revenue, and that is collapsing.



5/12
@modelsarereal
AI is not a search robot; AI is used when you want to talk about something with someone who understands the subject. But as soon as it comes to reliable original data, humans and AI are out of the game.



6/12
@kimmonismus
Wrong. RAG and, even more so, OpenAI's o1 show that with proper system-2 elements we can reduce hallucinations to <4%. And I am pretty sure that in 2025 we will lower it to <0.1%



7/12
@reymondin
And as if that wasn’t enough, US government is considering a breakup of Google.

https://edition.cnn.com/2024/10/09/tech/us-government-considers-a-breakup-of-google/index.html



8/12
@kimmonismus
That would be the next hard blow.



9/12
@ASM65617010
Partially disagree. The future of some major corporations is closely tied to their strength in cutting-edge research, and Google possesses a large pool of talent in this area.

This week, two current and one former Google researchers were awarded Nobel Prizes.



10/12
@kimmonismus
I deliberately exclude DeepMind from my criticism. However, DeepMind and Pichai are often not on the same page. The wrong priorities are set at CEO level, which often slow DeepMind down.
DeepMind would be better off without Google.
To put it bluntly: Demis Hassabis and his team are doing an excellent job!



11/12
@eckhardt_d48141
I wonder when we will see a YT killer from X. The step toward a real X video platform is rather easy to take. More and more content producers have flocked to X since Elon took charge.



12/12
@kimmonismus
True! Excited as well




 




1/21
@perplexity_ai
Pro Search is now more powerful. Introducing Reasoning Mode!

Challenge your own curiosity. Ask multi-layered questions. Perplexity will adapt.

Try it yourself (sample queries in thread)👇



https://video.twimg.com/ext_tw_video/1848800321306562563/pu/vid/avc1/1280x720/_0rglRblnHofZeaG.mp4

2/21
@perplexity_ai
"Read all of Bezos’ annual shareholder letters and compile a table of the key takeaways from each year." https://www.perplexity.ai/search/read-all-the-shareholder-lette-ACNERXw4T0iuxdENJjzsPQ



3/21
@perplexity_ai
"Please provide me with the latest information or releases from the following areas regarding Amazon:

1. Recent acquisitions or mergers
2. Executive leadership transitions
3. Technological innovations or IT infrastructure updates
4. Cybersecurity incidents or data breaches
5. Major company announcements or significant news stories
6. Developments in user data protection and privacy policies
7. Key points from their latest 10-K filing and annual report"

https://www.perplexity.ai/search/please-provide-me-with-the-lat-wXprqGQQQ2.hExhF4XQUsQ



4/21
@HCSolakoglu
Add one more step for hallucination checks; it will help users a lot when verifying all the information.



5/21
@ChrisUniverseB
Currently stuck on this question, it works for other questions, but this used up 3 chances ;(



GahF7mWXMAAX02g.png


6/21
@riderjharris




https://video.twimg.com/amplify_video/1848834004633481216/vid/avc1/1080x1920/ShNb6ZKMUVzrtmOD.mp4

7/21
@koltregaskes
Amazing, However, this would look even more amazing in a Windows app. 😜



8/21
@TheRamoliya
👌👌



9/21
@MarkusOdenthal
Wow! Super useful.

This will really help developers like me build great products.

Best tool to start with research.



10/21
@ash_stuart_
How do you switch it on and off? Sometimes LLMs go off track so it's good if we have the option to choose.



11/21
@Xploringhuman
I used the same sample queries and got different / incomplete responses.



12/21
@marcvitzv
@threadreaderapp unroll



13/21
@threadreaderapp
@marcvitzv Guten Tag, you can read it here: Thread by @perplexity_ai on Thread Reader App Share this if you think it's interesting. 🤖



14/21
@BensBeardButter
@socraticexp v cool



15/21
@risphereeditor
Nice!



16/21
@azed_ai
You're number one search engine, congrats



17/21
@catholicvoyager
Is there a formal way to help Perplexity to improve the accuracy of its output? For example, if Perplexity gives inaccurate information, and if the user corrects Perplexity with source material, will that improve Perplexity for all users? Or will the error perpetuate?



18/21
@Prashant_1722
This is great. What are the Top Reasoning searches rn on Perplexity



19/21
@dreamybuilder
Love it



20/21
@hoppyturtles
casual Perplexity W



21/21
@ChrisUniverseB
fukk can you refresh my 10 queries? Wasted 4 on this: kept getting stuck.

Reworded question to: What strategies can cities use to reduce traffic jams without building new roads?

Got this: https://www.perplexity.ai/search/what-strategies-can-cities-use-cRETp7FwR3OZ.QaKTq57cw



GahH9rOXUAA8DM9.png












1/11
@AravSrinivas
Stuff Perplexity has shipped in the last 10 days alone:
1. Finance
2. Spaces
3. Internal File Search
4. Reasoning mode
5. MacOS app

October isn't over yet—more announcements are coming in the coming days. And a lot more in November.



2/11
@AravSrinivas
Finance:

[Quoted tweet]
Perplexity for Finance:

Real-time stock quotes. Historical earning reports. Industry peer comparisons. Detailed analysis of company financials. All with delightful UI.

Have fun researching the market!


https://video.twimg.com/ext_tw_video/1846286829328510976/pu/vid/avc1/1280x720/n5bsP91J_qwPk0om.mp4

3/11
@AravSrinivas
Internal Knowledge Search:

[Quoted tweet]
Introducing Internal Knowledge Search (our most-requested Enterprise feature)!

For the first time, you can search through both your organization's files and the web simultaneously, with one product.


https://video.twimg.com/ext_tw_video/1846946800143683586/pu/vid/avc1/1280x720/KnYzFEcOcYBsEblY.mp4

4/11
@AravSrinivas
Spaces:

[Quoted tweet]
We’re also launching Spaces for all users to customize Perplexity for repeated use cases, a collaboration hub that lets you:

• Upload and store files
• Search files in addition to the web
• Pick an AI model of your choice
• Write custom instructions for the answers you want
• Invite others and search collaboratively


5/11
@AravSrinivas
Reasoning Mode:

[Quoted tweet]
Pro Search is now more powerful. Introducing Reasoning Mode!

Challenge your own curiosity. Ask multi-layered questions. Perplexity will adapt.

Try it yourself (sample queries in thread)👇


https://video.twimg.com/ext_tw_video/1848800321306562563/pu/vid/avc1/1280x720/_0rglRblnHofZeaG.mp4

6/11
@AravSrinivas
Mac App:

[Quoted tweet]
Perplexity is now on MacOS. Ask anything with ⌘ + ⇧ + P.

Download now: pplx.ai/mac


https://video.twimg.com/ext_tw_video/1849483341931933696/pu/vid/avc1/1280x720/1OaH1pQUjoa2rgOa.mp4

7/11
@petergyang
What’s the secret to shipping so much so fast



8/11
@AravSrinivas
Let's talk about it on Sunday!



9/11
@yibili
I'm a Pro user and I tried Spaces for the first time today, but I'm confused. I attached files and asked it to analyze them, but they were always skipped.

Claude is doing quite well with Artifacts and I was expecting a similar experience, but I couldn't figure out Perplexity's solution, at least for now :/



10/11
@riderjharris
On fire 🔥



11/11
@BowTiedGroundHo
The hustle here is impressive.



