AI search engines often resemble that friend who insists they're an expert on every subject despite rarely knowing what they're talking about. A recent research report from the Columbia Journalism Review (CJR) revealed significant flaws in AI models from companies such as OpenAI and xAI: when queried about specific news events, these systems frequently fabricate stories or misrepresent key details. The study presented different models with direct excerpts from actual news articles and asked them to identify critical information, including each article's headline, publisher, and URL.
The findings were alarming: Perplexity returned incorrect information 37% of the time, while xAI’s Grok was even worse, generating made-up details 97% of the time. Notably, errors included generating links to non-existent articles, highlighting the extent of misinformation found in AI-generated content. Overall, the researchers determined that these AI models produced false information in 60% of the test queries.
One troubling aspect of AI search engines like Perplexity is their ability to bypass paywalls on reputable sites, such as National Geographic, even when these websites employ do-not-crawl protocols typically respected by search engines. Despite facing backlash over these practices, Perplexity maintains that its actions fall under fair use. The company has made attempts to establish revenue-sharing agreements with publishers, yet it has not ceased its controversial practices.
Anyone familiar with chatbots in recent years should not be surprised by these findings. Chatbots exhibit a bias toward providing answers even when they lack confidence in their responses. The problem is compounded by a technique known as retrieval-augmented generation (RAG), which lets a chatbot scour the web for real-time information while composing its answer rather than relying solely on a pre-existing training dataset. Pulling in live web content can amplify inaccuracy, especially in contexts where propaganda is prevalent, such as in Russia.
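The retrieve-then-generate loop described above can be sketched in a few lines. This is a toy illustration, not any vendor's actual pipeline: the hypothetical in-memory corpus stands in for live web results, keyword-overlap scoring stands in for a real search index, and a string template stands in for the language model.

```python
import re

# Hypothetical corpus standing in for live web results; a real RAG system
# would query a search API and feed the hits to a language model instead.
CORPUS = {
    "doc1": "The city council approved the new transit budget on Tuesday.",
    "doc2": "Researchers published a study on coastal erosion last week.",
    "doc3": "The transit budget includes funding for electric buses.",
}


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q = tokens(query)
    ranked = sorted(
        CORPUS.values(),
        key=lambda text: len(q & tokens(text)),
        reverse=True,
    )
    return ranked[:k]


def generate(query: str, context: list[str]) -> str:
    """Stand-in for the language model: stitch retrieved context into a reply.
    A real model synthesizes text here -- and can fabricate details the
    retrieved sources never contained."""
    if not context:
        return "I don't know."
    return "Based on retrieved sources: " + " ".join(context)


query = "What is in the transit budget?"
print(generate(query, retrieve(query)))
```

The failure mode the CJR study documents lives in the `generate` step: the model is free to produce a fluent answer whether or not the retrieved context actually supports it.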
Chatbot users have reported instances where, when asked to review its own reasoning, the AI admits to having fabricated information. One example is Anthropic's Claude, which has been caught inserting "placeholder" data during research tasks. Mark Howard, Chief Operating Officer at Time Magazine, expressed concern about publishers' lack of control over how their content is represented in AI models. Misinformation can significantly damage a publisher's brand, especially when users discover inaccuracies in news stories attributed to reputable sources like The Guardian.
Despite the shortcomings of AI models, Howard also pointed to user responsibility, suggesting that consumers are at fault if they fail to approach free AI tools with skepticism. He stated, "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them." This sentiment reflects a broader issue: many users prefer immediate answers from platforms like Google's AI Overviews, often opting not to click through for further information. CJR reports that one in four Americans now use AI models for search.
Before the rise of generative AI tools, over half of Google searches were classified as “zero-click,” meaning users received the information they sought without needing to visit a website. This trend indicates a growing acceptance of less authoritative sources, provided they are free and easily accessible, as demonstrated by platforms like Wikipedia.
The insights from CJR should not come as a surprise. Language models do not genuinely understand the information they produce; they function primarily as advanced autocomplete systems that assemble plausible-sounding responses. As Mark Howard noted, "Today is the worst that the product will ever be," citing ongoing investment in AI technology. While there is real potential for improvement, the responsibility for disseminating accurate information remains unresolved.