Generative AI Lies

Examples of generative AI making stuff up

Posts

  • AI in healthcare

    “Artificial intelligence systems [in healthcare contexts] require consistent monitoring and staffing to put in place and to keep them working well.”

    “Evaluating whether these products work is challenging. Evaluating whether they continue to work — or have developed the software equivalent of a blown gasket or leaky engine — is even trickier.”

    “‘Even in the best case, the [LLMs] had a 35% error rate’”


    (To be clear: Some of this article is about LLMs, and some of it is about predictive algorithms that I assume are old-fashioned non-generative AI. So this is partly an LLM issue, but also partly a non-LLM issue.)


    (Original Facebook post.)


  • Expert testimony

    “A federal court judge has thrown out expert testimony from a Stanford University artificial intelligence and misinformation professor[, Jeff Hancock], saying his submission of fake information made up by an AI chatbot ‘shatters’ his credibility.”

    “At Stanford, students can be suspended and ordered to do community service for using an AI chatbot to ‘substantially complete an assignment or exam’ without instructor permission. The school has repeatedly declined to respond to questions […] about whether Hancock would face disciplinary measures.”

    (Original Facebook post.)


  • Identifying sources

    Researchers asked ChatGPT’s search tool to identify the source of excerpts from a couple hundred online articles.

    The result: ChatGPT made up answers. (Not always, but often.)

    Gasp! Shock! Surprise!

    (Original Facebook post.)


  • We don’t like to talk about that

    If your ChatGPT prompt includes certain not-uncommon names of humans, ChatGPT says “I’m unable to produce a response” and ends the session.

    Turns out that those names are names of some people who have prominently reported that ChatGPT was making up lies about them.

    So apparently, on learning that ChatGPT is lying about specific people, OpenAI has decided to prevent ChatGPT from responding to any prompt that mentions those people’s names.

    Of course, usually there’s more than one human who has a particular name, so OpenAI is also preventing ChatGPT from talking about anyone who has the same name as someone who ChatGPT has previously prominently lied about.

    (Original Facebook post.)


  • Album release date

    Today I did a Google search for [“field of stars” mccutcheon] and I forgot to append “-AI” to leave out the AI Overview. When I forget to leave out the Overview, I normally try to not even look at the Overview; but this time the Overview caught my eye. It starts out:

    “Field of Stars is an album by American folk singer-songwriter John McCutcheon. The album was released on January 10, 2024.”

    McCutcheon had an online concert today to celebrate the release of the album, so I spent several seconds wondering why he waited 10+ months after its release to have the concert. And then I realized that of course the AI Overview is just wrong, once again. The album will be officially released on January 10, 2025. (But it’s available now in various pre-official-release contexts.)

    But this is one of the reasons that I usually try not to even look at the Overview: Overviews often read to me as so authoritative that even though I know they include false information, I still sometimes believe them.

    (Original Facebook post.)


  • Kosher bacon

    If you do a Google search for [salt pork substitute kosher], the AI Overview tells you to try pancetta or bacon as a kosher substitute for salt pork.

    Yet another example of why you should never believe anything that generative AI tells you.

    (Original Facebook post.)

    (Update: Sometime in the year after I posted this, Google stopped returning an AI Overview in response to that query.)


  • Transcription

    “Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said”

    This is about Whisper, which I’ve heard praised in other contexts. 🙁

    “Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.”

    “medical centers [are using] Whisper-based tools to transcribe patients’ consultations with doctors”

    “While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.”

    (Original Facebook post.)


  • Scaling and reliability

    “A common assumption is that scaling up [LLMs] will improve their reliability—for instance, by increasing the amount of data they are trained on, or the number of parameters they use to process information. However, more recent and larger versions of these language models have actually become more unreliable, not less, according to a new study.”

    “This decrease in reliability is partly due to changes that made more recent models significantly less likely to say that they don’t know an answer, or to give a reply that doesn’t answer the question. Instead, later models are more likely to confidently generate an incorrect answer.”

    (Article from Oct. 3.)

    (Original Facebook post.)


  • Apple Intelligence

    At Apple’s launch event for the iPhone 16 yesterday, I was not thrilled with the amount of emphasis they put on the new “Apple Intelligence” features. But I did think that if those features work well, some of them could be pretty useful.

    Unfortunately, this review makes me even more dubious.

    “In the preview I’m using, Apple Intelligence does an uncomfortable amount of making things up.”

    “like the time it alerted me that Donald Trump had endorsed Tim Walz for president. (Ha.) And the time it made up the idea that I’m teaching at UC Berkeley. (No.) And the time it elevated an obvious Social Security scam to my ‘priority’ inbox. (Yikes). And the time it edited a selfie to make me bald. (Double yikes.)”

    “it feels weird […] to see fabrications and misinterpretations of your life appear on your lock screen, inbox and other core parts of your iPhone.”

    “I told Apple about the many times I saw Apple Intelligence get facts wrong (I’ve had at least five to 10 laugh-out-loud moments per day). It says it is working to improve accuracy. But so is every other AI company — and that has proved to be a giant challenge.”

    (Original Facebook post.)


  • Fact-checking

    Email from a political organization:

    “As we navigate new waters and integrate cutting-edge AI technology into our fact-checking processes, we understand that occasional hurdles are inevitable.”

    …I sure hope that they’re not talking about using generative AI as a fact-checker. I’ve now sent them an email to point out what a terrible idea that would be, just in case.

    (Original Facebook post.)