Microsoft's AI Bing also factually wrong, fabricated text during launch demo

Redmond's hype box and Google's Bard just as bad as each other

Microsoft's new AI-powered Bing search engine generated false information about products and places, and could not accurately summarize financial documents, according to the promo video the company used to launch the product last week.

After months of speculation, CEO Satya Nadella finally confirmed rumors that Microsoft was going to revamp Bing with an OpenAI chatbot, reportedly more powerful than ChatGPT. New capabilities showcasing Bing's potential to make web search more flexible and efficient were demoed at a private, invite-only press event held at the company's headquarters in Washington.

The following day, Google launched its own rival AI-search chatbot, Bard, and was heavily criticized when it made a factual error about the James Webb Space Telescope. The market value of Google's parent biz Alphabet temporarily dropped by 9 percent shortly afterwards, a decline worth over $120 billion, prompting investors and analysts to debate whether it was losing to Microsoft.

In reality, Microsoft's Bing and Google's Bard are just as bad as each other. Both companies launched shoddy AI chatbots that generated text containing false information, but Microsoft's mistakes were not immediately caught. Now, some of its errors have been spotted by Dmitri Brereton, a search engine researcher.

Brereton pointed out that Bing claimed a specific pet hair vacuum cleaner had a "short cord length of 16 feet", despite it being a handheld machine, and that it may be too noisy. The website Bing cited as the source of its summary, however, said the vacuum is actually quiet and cordless.

When Yusuf Mehdi, Microsoft's Corporate Vice President, Modern Life, Search, and Devices, asked Bing about the nightlife in Mexico, it fabricated some details about existing bars and clubs. The opening hours for one listing were wrong, for example, whilst it claimed another had a website for users to browse when it did not. Bing also missed vital information, failing to mention that El Almacen was, in fact, one of Mexico's oldest gay bars.

Microsoft also touted a feature where Bing could summarize information from financial documents, but the software made glaring errors here too. A demonstration of Bing generating key takeaways from the financials of clothing retailers Gap and Lululemon shows it quoting wrong numbers and figures that don't appear in the original documents at all.

"Bing AI got some answers completely wrong during their demo. But no one noticed. Instead, everyone jumped on the Bing hype train," Brereton wrote in a blog post on Substack. "Google's Bard got an answer wrong during an ad, which everyone noticed. Now the narrative is 'Google is rushing to catch up to Bing and making mistakes!'. That would be a fine narrative if Bing didn't make even worse mistakes during its own demo."

None of this is surprising. The language models powering the new Bing and Bard are prone to fabricating text that is often false. They learn to generate text by predicting which words should come next given an input query, drawing on the tons of data scraped from the internet that they ingested during training, yet with little understanding of that data. Experts even have a word for it: hallucination.

If Microsoft and Google can't fix their models' hallucinations, AI-powered search is not to be trusted no matter how alluring the technology appears to be. Chatbots may be easy and fun to use, but what's the point if they can't give users useful, factual information? Automation always promises to reduce human workloads, but current AI is just going to make us work harder to avoid making mistakes.

"I think everyone can see the amazing potential for LLM powered search engines," Brereton told The Register. "It feels like we're so close to having it, and it's a huge shift from the past, and a much smoother user experience. Everyone wants it to happen right now. Some people are already using ChatGPT as their main search engine, even though the answers may not be accurate. It's just such a superior experience that people can't help but hop on the hype train."

The Register asked Microsoft for comment, and the company told us it is aware of this report and has "analyzed its findings in our efforts to improve this experience."

"It's important to note that we ran our demo using a preview version," a Microsoft spokesperson added. "Over the past week alone, thousands of users have interacted with our product and found significant user value while sharing their feedback with us, allowing the model to learn and make many improvements already. We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better." ®