Generative AI Lies

Examples of generative AI making stuff up

Posts

  • More financial advice

    Yet another article that talks about using ChatGPT for your finances. As usual for these articles, it both explicitly says that ChatGPT gets things ruinously wrong, and frames things as if using ChatGPT for your finances is a perfectly reasonable thing to do.

    “Make sure to triple check all the numbers and suggestions it gives you, because some of the suggestions we got made no sense and were totally incorrect — if I had followed what ChatGPT suggested, it could have seriously wrecked my credit score and got me in even more debt.”

    “Here’s what I did and how you can do the same”

    I continue to be baffled by this kind of take. If you went to a human financial adviser and they gave you advice that would wreck your credit score, you wouldn’t say “So when you go to this adviser, be sure to check their numbers!”; you would say “Don’t go to this adviser.”

    (Original Facebook post.)


  • Medical-journal image

    Obviously AI-generated image from an article in the journal Medicine.

    The article has now been retracted “after concerns were raised over the integrity of the data and an inaccurate figure,” but you can still view the whole article or just the image.

    The AI-generated image is figure 2 in the article, labeled “Mechanism diagram of alkaline water treatment for chronic gouty arthritis.” It seems clear that nobody reviewed that image in any way.

    Medicine’s website says: “The Medicine® review process emphasizes the scientific, technical and ethical validity of submissions.”

    (via Mary Anne)

    (Original Facebook post.)


  • Human nonsense generator

    “Listen, you don’t need a large language model like chatGPT. You can ask me questions and I’ll generate confident sounding nonsense at a fraction of the resource consumption. Half the time, I don’t even remember to drink water!”

    —existennialmemes

    (Original Facebook post.)


  • Retrieval Augmented Generation

    Article (from May 4) about feeding relevant information to a generative AI to reduce the likelihood of it making stuff up, an approach known as “RAG,” or “Retrieval Augmented Generation.”

    “RAG can help reduce a model’s hallucinations — but it’s not the answer to all of AI’s hallucinatory problems. Beware of any vendor that tries to claim otherwise.”
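
    To make the "feeding relevant information" idea concrete, here is a minimal sketch of the retrieval half of RAG. It is not taken from the article. The documents, the `retrieve` and `build_prompt` names, and the word-overlap scoring are all assumptions made for illustration; real systems typically use embedding-based search, and the resulting prompt would then be sent to a model, which can still make things up.

```python
import re


def words(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question.

    Word overlap is a toy stand-in for a real similarity search.
    """
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Put the retrieved documents in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical documents, invented for this example.
DOCS = [
    "The library opens at 9 a.m. on weekdays.",
    "The cafe sells coffee and tea.",
    "The library is closed on public holidays.",
]

QUESTION = "When does the library open on weekdays?"

if __name__ == "__main__":
    print(build_prompt(QUESTION, DOCS))
```

    The model only sees what the retrieval step hands it, so a bad retrieval step means bad context, and nothing here forces the model to stick to the context anyway.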

    (Original Facebook post.)


  • Khan Academy

    Khan Academy pilot/beta program: using GPT to teach kids.

    “Simple concepts such as multiplication and division of integers and decimals repeatedly offer incorrect answers, and will even sometimes treat students’ wrong inputs as the correct solutions.”

    (Article from August, 2023.)

    (Original Facebook post.)


  • Eclipse

    “Elon Musk’s Grok [AI] Creates Bizarre Fake News About the Solar Eclipse Thanks to Jokes on X”

    Grok’s headline: “Sun’s Odd Behavior: Experts Baffled.”

    Gizmodo article explains: “The tweets that apparently prompted this bizarre proclamation from Grok included joke after joke of people wondering where the sun had gone. It appears that so many people making the same joke confused Grok, which tried to turn the jokes into something we should all be worried about.”

    (Original Facebook post.)


  • Who are we talking to?

    Long but excellent article about generative AI (LLM) chatbots: “Who are we talking to when we talk to these bots?”, by Colin Fraser.

    (Article from early 2023; I’m sure some details of LLM behavior have changed since then, but I think the core ideas here are still valid and relevant.)

    Fraser’s central points are that LLMs are designed to create sequences of words, that a chatbot persona is a fictional character overlaid (by means of training) on that word-sequence-creation software, and that our experience of having a conversation with an LLM chatbot is partly shaped by the user interface and by our expectations about how conversations go.

    A few quotes:

    “At least half of the reason that interacting with the bot feels like a conversation to the user is that the user actively participates as though it is one.”

    “the first-person text generated by the computer is complete fiction. It’s the voice of a fictional ChatGPT character, one of the two protagonists in the fictional conversation that you are co-authoring with the LLM.”

    “A strong demonstration that the LLM and the ChatGPT character are distinct is that it’s quite easy to switch roles with the LLM. All the LLM wants to do is produce a conversation transcript between a User character and the ChatGPT character, it will produce whatever text it needs to to get there.”

    “my interlocutor is not really the ChatGPT character, but rather, an unfeeling robot hell-bent on generating dialogues.”

    “The real story isn’t that the chat bot is bad at summarizing blog posts—it’s that the chat bot is completely lying to you”

    “[AI companies] actually have very little control what text the LLM can and can’t generate—including text that describes its own capabilities or limitations.”

    “The LLM [is never truthfully refusing to answer a question]. It’s always just trying to generate text similar to the text in its training data”

    (Original Facebook post.)


  • Reverse centaur

    Cory Doctorow on how several recent AI failures illuminate the idea of a “reverse centaur”:

    “This turns AI-‘assisted’ coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can’t be prevented from inserting into its code.”

    (Original Facebook post.)


  • US military

    Pentagon explores military uses of large language models

    “‘Everybody wants to be data-driven,’ Martell said. ‘Everybody wants it so badly that they are willing to believe in magic.’”

    (Article from Feb. 20.)

    (Original Facebook post.)


  • Task list

    I just saw a recommendation for an online task-management-helper tool called “Magic ToDo” that looks kinda interesting.

    The idea is that you enter a task and click a button, and it breaks the task into steps. You can then have it break down any of those steps into substeps, and so on.

    On the plus side, I tried it on a few tasks, and it seemed to work OK on most of them. (But see an example of it going wrong in comments on this post.) And the tool is free, and it’s “designed to help neurodivergent people with tasks they find overwhelming or difficult.”

    On the minus side, it uses generative AI to do the task breakdown, so of course it sometimes gets things significantly wrong. (And it has one of those everything-here-could-be-wrong disclaimers that I was complaining about the other day.)

    So … I don’t know whether any of y’all would find it useful, but I thought it was worth linking to.

    https://goblin.tools

    The first task that I entered was “Bake a cake.”

    It gave me what I (a non-baker) think was a reasonably good overview of the steps to follow. The first one was:

    >Gather all necessary ingredients and materials

    So I asked it to break that into substeps. Those, too, were more or less reasonable. The first one said:

    >>Check pantry for ingredients

    I asked it to break that into substeps. The first one was:

    >>>Check pantry for flour

    I asked it to break that into substeps too. Here were the steps for how to check the pantry for flour:

    >>>>Check pantry for flour

    >>>>Gather all necessary ingredients and materials

    >>>>Bake a cake

    I didn’t try any other multi-level step breakdowns, so I don’t know whether it produces similarly incoherent results if you go too many levels down in other tasks.
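
    For anyone who wants to check other breakdowns for the same problem, here is a minimal sketch of one way to spot it. This is not how Magic ToDo works internally, which I don't know. The `Step` class and `find_repeats` function are hypothetical names. The sketch stores a breakdown as a tree and flags any substep whose text repeats the step it belongs to or one of that step's ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node in a task breakdown; `substeps` holds the next level down."""
    text: str
    substeps: list = field(default_factory=list)


def find_repeats(step, ancestors=()):
    """Return (path, repeated_text) pairs where a substep repeats
    the step it belongs to or any step above that one."""
    problems = []
    path = ancestors + (step.text,)
    for sub in step.substeps:
        if sub.text in path:
            problems.append((path, sub.text))
        else:
            problems.extend(find_repeats(sub, path))
    return problems


# The breakdown quoted above, following only the first substep at each level.
flour = Step("Check pantry for flour", [
    Step("Check pantry for flour"),
    Step("Gather all necessary ingredients and materials"),
    Step("Bake a cake"),
])
pantry = Step("Check pantry for ingredients", [flour])
gather = Step("Gather all necessary ingredients and materials", [pantry])
cake = Step("Bake a cake", [gather])

if __name__ == "__main__":
    for path, repeated in find_repeats(cake):
        print(" > ".join(path), "->", repeated)
```

    On the cake example, all three of the substeps for checking the pantry for flour get flagged, because each one repeats a step that is already higher up in the same chain.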

    (Original Facebook post.)