Generative AI Lies

Examples of generative AI making stuff up

Posts

  • Retrieval Augmented Generation

    Article (from May 4) about feeding relevant information to a generative AI to reduce the likelihood of it making stuff up, an approach known as “RAG,” or “Retrieval Augmented Generation.”

    “RAG can help reduce a model’s hallucinations — but it’s not the answer to all of AI’s hallucinatory problems. Beware of any vendor that tries to claim otherwise.”
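
    A minimal sketch of the RAG idea in code (illustrative only; the document store, the word-overlap scoring, and the call_llm stand-in are all hypothetical, not any vendor's actual API): fetch the documents most relevant to the question, paste them into the prompt, and ask the model to answer from that supplied context rather than from memory alone.

        # Minimal RAG sketch (illustrative): retrieve relevant documents,
        # then put them in the prompt so the model answers from supplied
        # text instead of relying only on what it "remembers."

        DOCUMENTS = [
            "RAG stands for Retrieval Augmented Generation.",
            "Retrieval can reduce, but not eliminate, hallucinations.",
        ]

        def retrieve(question, docs, k=2):
            # Toy relevance score: count shared words. Real systems use
            # embeddings and a vector index, but the overall shape is the same.
            q_words = set(question.lower().split())
            return sorted(
                docs,
                key=lambda d: len(q_words & set(d.lower().split())),
                reverse=True,
            )[:k]

        def build_prompt(question, docs):
            context = "\n".join(f"- {d}" for d in docs)
            return (
                "Answer using only the context below. "
                "If the context is not enough, say so.\n\n"
                f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
            )

        def answer(question, call_llm):
            # call_llm is a placeholder for whatever text-generation API you use.
            prompt = build_prompt(question, retrieve(question, DOCUMENTS))
            return call_llm(prompt)  # the model can still ignore or misread the context

    Even with the retrieved context in the prompt, nothing forces the model to stick to it, which is why the article warns against treating RAG as a complete fix.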

    (Original Facebook post.)


  • Khan Academy

    Khan Academy pilot/beta program: using GPT to teach kids.

    “Simple concepts such as multiplication and division of integers and decimals repeatedly offer incorrect answers, and will even sometimes treat students’ wrong inputs as the correct solutions.”

    (Article from August, 2023.)

    (Original Facebook post.)


  • Eclipse

    “Elon Musk’s Grok [AI] Creates Bizarre Fake News About the Solar Eclipse Thanks to Jokes on X”

    Grok’s headline: “Sun’s Odd Behavior: Experts Baffled.”

    Gizmodo article explains: “The tweets that apparently prompted this bizarre proclamation from Grok included joke after joke of people wondering where the sun had gone. It appears that so many people making the same joke confused Grok, which tried to turn the jokes into something we should all be worried about.”

    (Original Facebook post.)


  • Who are we talking to?

    Long but excellent article about generative AI (LLM) chatbots: “Who are we talking to when we talk to these bots?”, by Colin Fraser.

    (Article from early 2023; I’m sure some details of LLM behavior have changed since then, but I think the core ideas here are still valid and relevant.)

    Fraser’s central points are that LLMs are designed to create sequences of words, and that a chatbot persona is a fictional character overlaid (by means of training) on that word-sequence-creation software. And that our experience of having a conversation with an LLM chatbot is partly shaped by the user interface and by our expectations about how conversations go.

    A few quotes:

    “At least half of the reason that interacting with the bot feels like a conversation to the user is that the user actively participates as though it is one.”

    “the first-person text generated by the computer is complete fiction. It’s the voice of a fictional ChatGPT character, one of the two protagonists in the fictional conversation that you are co-authoring with the LLM.”

    “A strong demonstration that the LLM and the ChatGPT character are distinct is that it’s quite easy to switch roles with the LLM. All the LLM wants to do is produce a conversation transcript between a User character and the ChatGPT character, it will produce whatever text it needs to to get there.”

    “my interlocutor is not really the ChatGPT character, but rather, an unfeeling robot hell-bent on generating dialogues.”

    “The real story isn’t that the chat bot is bad at summarizing blog posts—it’s that the chat bot is completely lying to you”

    “[AI companies] actually have very little control what text the LLM can and can’t generate—including text that describes its own capabilities or limitations.”

    “The LLM [is never truthfully refusing to answer a question]. It’s always just trying to generate text similar to the text in its training data”
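
    One way to picture Fraser's framing in code (a hypothetical sketch, not how any real product is built; complete() stands in for a raw text-completion model): the chat interface just keeps appending to a transcript and asking the model to continue it, so "ChatGPT" is only a character label inside that transcript.

        # Hypothetical sketch of a chat interface wrapped around a raw
        # text-completion model: the "conversation" is a transcript that
        # the model is repeatedly asked to continue in character.

        def chat_loop(complete):
            # complete(prompt, stop=...) is a stand-in for a next-token model.
            transcript = "A conversation between User and ChatGPT.\n"
            while True:
                transcript += f"User: {input('You: ')}\nChatGPT:"
                # Stopping at "User:" keeps the model from also writing the
                # user's next line, which it would otherwise happily do,
                # since all it is doing is extending the transcript.
                reply = complete(transcript, stop=["User:"])
                transcript += reply + "\n"
                print("Bot:" + reply)

    The stop string is doing a lot of the work here: without it, the model would keep writing both characters' lines, which is Fraser's point about how easy it is to switch roles with the LLM.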

    (Original Facebook post.)


  • Reverse centaur

    Cory Doctorow on how several recent AI failures illuminate the idea of a “reverse centaur”:

    “This turns AI-‘assisted’ coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can’t be prevented from inserting into its code.”

    (Original Facebook post.)


  • US military

    Pentagon explores military uses of large language models

    “‘Everybody wants to be data-driven,’ Martell said. ‘Everybody wants it so badly that they are willing to believe in magic.’”

    (Article from Feb. 20.)

    (Original Facebook post.)


  • Task list

    I just saw a recommendation for an online task-management-helper tool called “Magic ToDo” that looks kinda interesting.

    The idea is that you enter a task and click a button, and it breaks the task into steps. You can then have it break down any of those steps into substeps, and so on.

    On the plus side, I tried it on a few tasks, and it seemed to work OK on most of them. (But see an example of it going wrong in comments on this post.) And the tool is free, and it’s “designed to help neurodivergent people with tasks they find overwhelming or difficult.”

    On the minus side, it uses generative AI to do the task breakdown, so of course it sometimes gets things significantly wrong. (And it has one of those everything-here-could-be-wrong disclaimers that I was complaining about the other day.)

    So … I don’t know whether any of y’all would find it useful, but I thought it was worth linking to.

    https://goblin.tools

    The first task that I entered was “Bake a cake.”

    It gave me what I (a non-baker) think was a reasonably good overview of the steps to follow. The first one was:

    >Gather all necessary ingredients and materials

    So I asked it to break that into substeps. Those, too, were more or less reasonable. The first one said:

    >>Check pantry for ingredients

    I asked it to break that into substeps. The first one was:

    >>>Check pantry for flour

    I asked it to break that into substeps too. Here were the steps for how to check the pantry for flour:

    >>>>Check pantry for flour

    >>>>Gather all necessary ingredients and materials

    >>>>Bake a cake

    I didn’t try any other multi-level step breakdowns, so I don’t know whether it produces similarly incoherent results if you go too many levels down in other tasks.
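
    For what it's worth, here's a guess at the general shape of a tool like this (purely hypothetical; I have no idea how goblin.tools is actually built): each breakdown is just another independent prompt to the model, so nothing in the loop remembers the parent tasks, and deep in the recursion the model can wander back to "Bake a cake."

        # Hypothetical sketch of an LLM-backed task breakdown tool (not
        # goblin.tools' actual implementation). Each level is a fresh,
        # independent prompt, so nothing stops the model from returning an
        # ancestor task ("Bake a cake") as one of the substeps.

        def break_down(task, call_llm):
            # call_llm is a placeholder for whatever text-generation API is used.
            prompt = f"Break this task into 3-6 concrete substeps, one per line:\n{task}"
            return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

        def expand(task, call_llm, depth=0, max_depth=3):
            # Recursively expand every substep down to max_depth, printing an
            # indented outline. Note that the prompt never mentions the ancestors.
            print("    " * depth + task)
            if depth < max_depth:
                for step in break_down(task, call_llm):
                    expand(step, call_llm, depth + 1, max_depth)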

    (Original Facebook post.)


  • NYC business chatbot

    NYC’s business chatbot is reportedly doling out ‘dangerously inaccurate’ information

    “An investigation by The Markup found the chatbot sometimes gets city policies wrong in its responses.”

    A “spokesperson for the NYC Office of Technology and Innovation” says this isn’t a problem because there’s a disclaimer that says not to rely on the info given by the chatbot. (I’m paraphrasing.)

    (Edited to add, because someone asked: This isn’t an April Fool’s joke.)

    (Original Facebook post.)


  • Malware vector

    A vector for malware, enabled by generative AI (LLMs):

    If you ask an LLM to write code for you, the resulting code may include the names of software packages that don’t exist.

    In theory, that might not be a big deal. If a human tries to run that code, they’ll find that the packages don’t exist.

    But a security researcher has now found that sometimes LLMs repeatedly make up the same names for nonexistent software packages. And he created and published a real software package with one of those recurring names.

    And that package has now been downloaded over 15,000 times.

    The real package didn’t contain malware, but the researcher’s point is that it could have.

    So if you’re a software developer, and you’re using code written by an LLM, maybe check that all of the dependencies that it tells you to rely on are legitimate.
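
    If you want to automate that sanity check for Python dependencies, a rough sketch follows (it assumes the public PyPI JSON API; other ecosystems have their own registries). Note that this only catches names that don't exist at all: a hallucinated name that an attacker has since registered will look "legitimate" to this check, which is exactly the researcher's scenario.

        # Rough sketch: flag requirements that don't exist on PyPI at all.
        # An existing-but-attacker-registered package will still pass, so
        # treat this as a first filter, not a guarantee.

        import urllib.error
        import urllib.request

        def exists_on_pypi(package):
            url = f"https://pypi.org/pypi/{package}/json"
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    return resp.status == 200
            except urllib.error.URLError:
                # A 404 (package not found) or a network error lands here.
                return False

        def check(requirements):
            for name in requirements:
                status = "OK" if exists_on_pypi(name) else "NOT FOUND on PyPI"
                print(f"{name}: {status}")

        check(["requests", "some-package-an-llm-just-made-up"])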

    (Original Facebook post.)


  • Cohen legal filing

    “Michael Cohen [(Trump’s former lawyer)] used fake cases created by AI in bid to end [Cohen’s] probation”

    “In the filing, Cohen wrote that he had not kept up with ‘emerging trends (and related risks) in legal technology and did not realize that Google Bard was a generative text service that, like ChatGPT, could show citations and descriptions that looked real but actually were not.’ To him, he said, Google Bard seemed to be a ‘supercharged search engine.’”

    (Original Facebook post.)