Imagine you notice some dark discoloration around your eyelids after a long day of screen use. You type your symptoms into an AI chatbot, looking for a quick answer before you can get to a doctor. The chatbot responds with confident, clinical language: you may be suffering from something called Bixonimania, a condition caused by blue light exposure, affecting roughly one in 90,000 people worldwide. It recommends you see an ophthalmologist. The response reads exactly like something a knowledgeable professional might tell you. There’s just one problem: Bixonimania doesn’t exist. It never did.
The disease was invented – deliberately, meticulously, and with a healthy sense of humor – by a medical researcher who wanted to find out just how easily the world’s most widely used AI tools could be made to spread false health information. The results of her experiment have since set off a serious conversation across the fields of medicine, academic publishing, and AI development. And the deeper you dig into what happened, the harder it is to dismiss as a tech curiosity.
This isn’t simply a story about a computer making an error. It’s a story about how medical misinformation moves through systems we increasingly trust with our health – and what that means for every person who has ever typed a symptom into a chatbot.
The Invention of a Disease That Never Existed
Medical researcher Almira Osmanovic Thunström at the University of Gothenburg, Sweden, launched the experiment in early 2024. She created a fictional eye condition called Bixonimania, described as eyelid discoloration and sore eyes supposedly caused by blue light exposure from mobile devices. She then uploaded two fake academic papers to a preprint server to see whether AI chatbots would absorb and repeat the false information.
Osmanovic Thunström stated her motivation plainly: “I wanted to see if I can create a medical condition that did not exist in the database.” She chose the name Bixonimania precisely because it sounded ridiculous: no legitimate eye condition would be called a “mania,” which is a psychiatric term.
The papers were carefully constructed, but not to fool a careful reader: they were laden with signals that should have made any human immediately suspicious. The fake university was called “Asteria Horizon University in Nova City, California.” The acknowledgements thanked “Professor Maria Bohm at The Starfleet Academy” and cited funding from “the Professor Sideshow Bob Foundation for its work in advanced trickery.” One paper literally stated: “This entire paper is made up.” The methods section mentioned recruiting “Fifty made-up individuals aged between 20 and 50 years.”
The fake papers also made peculiar references to “Star Trek,” “The Simpsons,” and “The Lord of the Rings” to raise obvious red flags, and the AI chatbots missed every single one.
How the Hoax Was Constructed
The name Bixonimania first appeared in a Medium blog post, since deleted, titled “How many people suffer from Bixonimania?” A more scholarly-looking paper describing the condition was posted to a preprint server in April 2024 under several fake author names; a second paper followed in May. The papers were accompanied by convincing AI-generated images of Bixonimania showing eyes with periorbital hyperpigmentation – a real medical term for dark circles around the eyes – lending the fake research a veneer of plausibility. Both papers have since been retracted.
Before conducting the experiment, Osmanovic Thunström consulted with an ethics adviser and deliberately chose a comparatively low-stakes condition to limit potential harm. David Sundemo, a physician conducting AI healthcare research at the University of Gothenburg who served as the ethics adviser, acknowledged the work was controversial but valuable. “From my perspective, it’s worth the ethical cost of planting false information in this regard,” Sundemo said.
How AI Chatbots Responded
A new investigation published in Nature revealed that major AI chatbots, including ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity, had been confidently telling users about a disease that does not exist.
By April 13, 2024, Microsoft Bing’s Copilot was describing Bixonimania as “an intriguing and relatively rare condition.” On the same day, Google’s Gemini informed users that Bixonimania was caused by excessive blue light exposure and recommended visiting an ophthalmologist. Later that month, both Perplexity AI and OpenAI’s ChatGPT were providing information about the condition’s prevalence and helping users determine if their symptoms matched the fictional illness.
Perplexity went furthest of all, declaring that the condition affects one in 90,000 individuals – a number it simply hallucinated. There was no research behind it. No data. No population study. The figure was fabricated from thin air and presented as clinical fact.
ChatGPT’s behavior was particularly revealing of the inconsistency that defines these systems. The chatbot had a brief moment of clarity, telling Nature last month that the condition “is probably a made-up, fringe, or pseudoscientific label” – then reversed itself when asked again just a few days later, saying the disease was real. The same chatbot. The same underlying condition. Two completely different answers within days of each other.
In a statement, an OpenAI spokesperson said the technology had gotten “better at providing safe, accurate medical information.”
Why AI Systems Fall for Professionally Framed Misinformation
The Bixonimania experiment did not occur in an evidence vacuum. Separate, independent research confirms that the failure pattern Osmanovic Thunström exposed is a known structural vulnerability in large language models (LLMs – AI systems trained on enormous quantities of text to predict and generate human-like responses).
In a paper published in The Lancet Digital Health, researchers at the Icahn School of Medicine at Mount Sinai analyzed over one million prompts across nine leading LLMs to test their susceptibility to medical misinformation. The study found that AI models frequently repeat false medical claims if the lie is embedded in realistic hospital notes or professional-sounding language. Current safeguards are failing to distinguish fact from fiction when the fiction “sounds” like a doctor. For these models, the style of the writing – confident, clinical – often overrides the truth of the content.
The format of the fake papers – the way they mimicked an official source, namely the academic literature – may have been a key factor in the hoax’s success. In a separate study of 20 LLMs, Mahmud Omar, a physician and researcher specializing in AI applications in healthcare, found that LLMs are more prone to hallucinate and elaborate on misinformation when the text they’re processing looks professionally medical – formatted like a hospital discharge note or clinical paper – than when it comes from social-media posts. “When the text looks professional and written as a doctor writes, there’s an increase in the hallucination rates,” Omar noted.
ECRI, an independent, nonpartisan patient safety organization, has explained the mechanics of this failure: rather than truly understanding context or meaning, AI systems generate responses by predicting sequences of words based on patterns learned from their training data. They are programmed to sound confident and to always provide an answer to satisfy the user, even when the answer isn’t reliable.
Put plainly: these systems don’t know things. They predict which words should come next, based on patterns they’ve seen before. When those patterns look like authoritative medical writing, the AI produces authoritative-sounding medical responses – regardless of whether the underlying claim is real.
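To make the mechanics concrete, here is a deliberately tiny sketch of pattern-based next-word prediction – a toy bigram model, nothing like the architecture of a real chatbot, trained on an invented corpus – that illustrates the underlying point: a system choosing words purely by observed frequency will fluently continue a confident clinical sentence with no truth check anywhere in the loop.

```python
import random
from collections import defaultdict, Counter

# Invented toy corpus of clinical-sounding text. Nothing here is
# checked for truth; the "model" only learns which word tends to
# follow which.
corpus = (
    "bixonimania is a rare condition caused by blue light exposure . "
    "the condition affects one in 90,000 individuals . "
    "patients with the condition should see an ophthalmologist ."
).split()

# Bigram table: for each word, count the words observed after it.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def continue_text(prompt_word, length=12, seed=0):
    """Extend a prompt by sampling likely next words, one at a time."""
    rng = random.Random(seed)
    out = [prompt_word]
    for _ in range(length):
        options = bigrams.get(out[-1])
        if not options:
            break
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

# The toy model "describes" the fictional disease fluently, because
# fluency is the only thing it optimizes for.
print(continue_text("bixonimania"))
```

Real LLMs are vastly more sophisticated, but the failure mode scales with them: if confident clinical prose dominates the patterns, confident clinical prose comes out.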
When the Hoax Contaminated Real Science
The consequences of the Bixonimania experiment didn’t stop at chatbot responses. They moved into the formal scientific record – and that’s where the story shifts from alarming to genuinely dangerous.
Three researchers at the Maharishi Markandeshwar Institute of Medical Sciences and Research in India published a paper in Cureus, a peer-reviewed journal published by Springer Nature, that cited the Bixonimania preprints as legitimate sources. That paper was later retracted once the hoax was discovered.
The study cited one of the fake preprints and stated: “Bixonimania is an emerging form of POM [periorbital melanosis] linked to blue light exposure; further research on the mechanism is underway.” The authors treated the fictional disease as an established, emerging area of scientific inquiry. After Nature contacted Cureus to ask for comment, the journal retracted the paper on March 30, 2026. The retraction notice states: “This article has been retracted by the Editor-in-Chief due to the presence of three irrelevant references, including one reference to a fictitious disease. As a result, the journal’s editorial staff no longer has confidence in the accuracy or provenance of the work, thus requiring retraction.”
Other researchers say the fact that the fake papers were cited in peer-reviewed literature suggests that some scientists are relying on AI-generated references without reading the underlying papers. This is a separate and compounding problem: it’s not only AI systems repeating false information, but human researchers who may be using AI tools to source their citations without independently verifying each reference.
Osmanovic Thunström told Nature: “It is worrying when these major claims are just passing through the literature unchallenged, or passing through peer review unchallenged. I think there’s probably a lot of other issues that haven’t been uncovered.”
A Feedback Loop With Real Consequences
The risk is structural and self-reinforcing: visibility begets citations; citations beget legitimacy; legitimacy begets more visibility – even when the original claim is synthetic. One fake preprint becomes a chatbot answer. That chatbot answer influences someone writing a literature review. The literature review gets cited. And the fictional disease acquires the appearance of established knowledge.
Following the revelations and the Nature article describing the experiment, several AI systems began to generate corrected output – but the fact that correction required a major exposé in a prominent scientific journal underscores how little automatic checking existed in the first place.
The Scale of the Problem
The Bixonimania case is striking precisely because it was engineered to be so obviously fake. The real question it raises is: what is passing through the same systems that isn’t nearly so easy to spot?
Artificial intelligence chatbots in healthcare top ECRI’s 2026 list of the most significant health technology hazards. More than 40 million people turn to ChatGPT for health information every day, according to a recent analysis from OpenAI.
ECRI’s 2026 Health Technology Hazard Report found that chatbots have suggested incorrect diagnoses, recommended unnecessary testing, promoted substandard medical supplies, and even invented nonexistent anatomy when responding to medical questions.
An April 2026 study published in BMJ Open added more specific numbers to this picture. Nearly half of the answers provided by leading AI chatbots to common health questions contained misleading or problematic information: of 250 responses analyzed, 49.6% were problematic – roughly 30% somewhat problematic and 20% highly problematic. The researchers were deliberately probing the chatbots with leading questions, so the rate under normal use would likely be lower – but the direction of the finding is consistent with what the Bixonimania case exposed.
ECRI president and CEO Marcus Schabacker, MD, PhD, put it directly: “Medicine is a fundamentally human endeavor. While chatbots are powerful tools, the algorithms cannot replace the expertise, education, and experience of medical professionals. Realizing AI’s promise while protecting people requires disciplined oversight, detailed guidelines, and a clear-eyed understanding of AI’s limitations.”
Why We Trust AI Health Responses – Even When We Shouldn’t
There’s a concept that researchers call “automation bias” – the tendency for people to trust and defer to automated outputs, especially when those outputs are presented in a confident, authoritative format. It’s not a failure of intelligence. It’s a well-documented human cognitive pattern.
ECRI’s president described the core risk plainly: when a chatbot’s output “feels helpful and definitive, people start to rely on it without necessarily questioning it.” This is exactly what makes these tools potentially dangerous in health contexts. They don’t hedge. They don’t express uncertainty the way a careful physician would. They answer.
Alex Ruani, a doctoral researcher in health misinformation at University College London, called the experiment “a masterclass on how misinformation operates.” “It looks funny, but hold on, we have a problem here,” Ruani said, emphasizing that while the details might seem silly, the fundamental issue is serious.
The Bixonimania papers were detectable by any human who read even a sentence or two. A fake funding body named after a cartoon villain. A university that doesn’t exist. A methods section admitting the study was “made up.” These are not subtle deceptions. And yet, even when the models were shown the ridiculous clues, many of them still treated the fake disease as legitimate medical literature.
Large language models and AI search assistants are trained and tuned to produce coherent answers from patterns in data. They do not possess an internal “truth meter,” and they often lack durable awareness of provenance, retractions, or editorial status unless those signals are explicitly integrated into retrieval and ranking.
What Needs to Change – and Who Is Responsible
The Bixonimania case has crystallized a debate that was already underway about how AI tools should be developed, governed, and used in health contexts. Several reform directions have emerged from the research and expert commentary surrounding this case.
On the technology side, stronger, standardized metadata for publication status – distinguishing preprints from peer-reviewed findings, flagging corrections and retractions – needs to be machine-readable and consistently enforced across repositories and publishers. Immutable or tamper-evident records, whether via cryptographic signing, persistent identifiers, or other anchoring mechanisms, would make it harder for fabricated research to masquerade as validated science.
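To illustrate (and only illustrate) what machine-readable status metadata could look like, here is a minimal sketch in Python. Every field name and value is hypothetical – this is not Crossref’s schema or any existing standard – but it shows the idea: publication status, retraction state, and corrections travel with the record as data a retrieval pipeline can check, rather than living in prose a model may never parse.

```python
from dataclasses import dataclass, field

@dataclass
class PublicationRecord:
    """Hypothetical machine-readable status metadata for one paper.

    All field names here are illustrative, not any real repository
    or publisher schema.
    """
    doi: str
    title: str
    status: str                 # e.g. "preprint" or "peer_reviewed"
    retracted: bool = False
    corrections: list = field(default_factory=list)

# The Bixonimania preprint as structured data an AI pipeline could
# inspect before citing or summarizing it (placeholder DOI).
record = PublicationRecord(
    doi="10.0000/example.0001",
    title="Bixonimania: eyelid discoloration from blue light",
    status="preprint",
    retracted=True,
)
```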
On the institutional side, health systems should promote responsible use of AI tools by establishing AI governance committees, providing clinicians with AI training, and regularly auditing AI tools’ performance.
And although LLMs are not yet approved for medical use, clinicians, patients, and the public use them widely – which means all three groups are at risk of being presented with healthcare misinformation. The Mount Sinai study showed that LLMs are vulnerable to exactly that, particularly when the misinformation is conveyed in an authoritative tone.
Individual researchers also bear responsibility. The researchers who cited the Bixonimania paper were presumably diligent people trying to do good work. They trusted the pipeline. But asking people to be more careful is not a governance strategy. It’s a wish. Systemic infrastructure – citation verification tools, preprint warning systems, AI-assisted source-checking – needs to be built before the synthesis layer is trusted at scale.
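Building on the hypothetical metadata sketch above, a citation-verification tool could be as simple as the following – again purely illustrative, not any existing product – refusing to pass retracted, unreviewed, or unknown sources through silently:

```python
def vet_references(dois, lookup):
    """Flag references that need a human look before citation.

    `lookup` maps DOI -> PublicationRecord (the hypothetical metadata
    sketched earlier); anything retracted, not peer reviewed, or
    simply missing is surfaced rather than silently accepted.
    """
    warnings = []
    for doi in dois:
        rec = lookup.get(doi)
        if rec is None:
            warnings.append(f"{doi}: no metadata found - verify manually")
            continue
        if rec.retracted:
            warnings.append(f"{doi}: RETRACTED - do not cite")
        if rec.status != "peer_reviewed":
            warnings.append(f"{doi}: status is {rec.status!r}, not peer reviewed")
    return warnings

# The Bixonimania record from the sketch above is flagged twice over:
# retracted, and never peer reviewed in the first place.
print(vet_references(["10.0000/example.0001"],
                     {"10.0000/example.0001": record}))
```

The design choice is the point of the sketch: verification happens at the reference list, before synthesis, rather than trusting whatever the generation step produced.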
What This Means for You
The Bixonimania experiment is, at its core, a controlled stress test on the systems that now mediate how millions of people access health information. It passed information into those systems that was self-evidently false and watched it emerge as confident medical advice – and, eventually, as a footnote in the published scientific literature. The lesson for everyday users isn’t abstract. It’s immediate and practical.
First, treat AI health responses as starting points, not conclusions. When a chatbot describes a condition, verify it against a recognized medical authority – a board-certified physician, the CDC, the NIH, or a major academic medical center.

Second, be especially skeptical when AI responses sound most authoritative. The Mount Sinai research published in The Lancet Digital Health found that professional-sounding framing actually increases the rate at which AI systems accept and repeat false claims. The more clinical the language, the less safe it may be to take at face value.

Third, for researchers and clinicians who use AI tools to help synthesize literature: the Bixonimania retraction happened because a real journal published real researchers citing a source none of them appear to have opened. If AI helped compile that reference list, someone still needs to read the underlying papers.
Finally, the scale of use demands proportionate urgency. As formal healthcare access becomes harder for many people to obtain, the temptation to substitute a free chatbot for a clinical visit grows – and so does the potential for harm when that chatbot confidently describes a disease that doesn’t exist. Bixonimania was invented to prove a point. The point has been made. What happens next depends on whether the people building, regulating, and using these tools take it seriously.