Generative AI Lies

Examples of generative AI making stuff up

Posts

  • Salt on fires

    ()

    David Levine gives a particularly strong example of the kinds of problems that we’re now facing with LLM-generated text: “is it a good idea to use salt to put out a kitchen fire?

    A friend of a friend, elsewhere, pointed out that this problem isn’t new. That’s true to some extent; we’ve had sites with bad information that have used SEO to show up high in search results for a long time. (And, of course, we’ve had humans disseminating false information for much longer.)

    But I think that the main things that are new and different here are:

    • The speed at which LLMs can generate huge quantities of text. It’s easier and faster than ever before to create lots of text to fill up fake websites.
    • The plausibility/authoritativeness of tone. LLMs are really good at generating grammatically correct English sentences that sound like people who know what they’re talking about sound like. Ever since the first time I encountered LLM-generated text, I’ve been stumbling over this—if I don’t think carefully about it, it’s easy for me to believe that what the LLM says must be true. And that’s despite the fact that I’ve been pointing out and complaining about false and misleading stuff online since, oh, the ’90s if not earlier. (I remember the good old days of the alt.folklore.urban newgroup…)

    Both of those things are mostly a difference of degree, not of kind. It has always been possible to generate large quantities of false but authoritative-sounding text. But in the past, it took more time and work and skill to do that well.

    Or to put that another way:

    LLMs have democratized misinformation-generation.

    (Original Facebook post.)


  • Counting Xs

    ()

    Jaym shared a great post, in which Henry Lien points out that ChatGPT doesn’t handle it well when you ask it to count the number of Xs in a sentence.

    I decided to try it myself, and I got even more entertainingly wrong answers. Here’s how my exchange went:

    User: How many Xs are there in this sentence?

    ChatGPT: It appears that there are two “Xs” in your question: one in “How many Xs” and one in “sentence.”

    User: Sorry, that’s incorrect.

    ChatGPT: I apologize for the mistake. Let me try again. There are four instances of the letter “X” in your original question: one in “How,” one in “Xs,” one in “sentence,” and one in “X.”

    User: That’s still wrong. Please take a deep breath and go step by step to figure out the answer. [I used that framing because in some contexts, it gets ChatGPT to give more accurate answers.]

    [An error occurs, saying that ChatGPT was unable to answer the question. I click the retry button.]

    ChatGPT: I appreciate your patience. Let’s go through the sentence step by step:

    “How many Xs are there in this sentence?”

    The letter “X” appears three times in the sentence.

    (Original Facebook post.)


  • European elections

    ()

    Researchers say Bing made up facts about European elections

    “Human rights organization AlgorithmWatch said in a report that it asked Bing Chat—recently rebranded as Copilot—questions about recent elections held in Switzerland and the German states of Bavaria and Hesse. It found that one-third of its answers to election-related questions had factual errors”

    (Article from Dec. 15.)

    (Original Facebook post.)


  • Book blurbs

    ()

    A month ago, File 770 carried an item in which an author said that he had asked ChatGPT for a blurb for his book. He seemed to be delighted that ChatGPT had not only read his book, but loved it; ChatGPT called it a “captivating narrative.” The author continued:

    Could [ChatGPT] have reached into my computer and read my novel and then compared it to the thousands of other novels and their reviews that have been uploaded to its massive database? And now makes a value judgement like that?

    It could do so in a nanosecond.

    The author claimed to “know a bit about AI,” but unfortunately appeared to misunderstand what ChatGPT does.

    A few commenters provided corrections/clarifications, but anyone who didn’t read the comments may have been left with the impression that ChatGPT “know[s] more than humans do” and that it read this book and thought it was great.

    I’m posting all this not to pick on the specific author, nor on File 770’s editor/owner, but rather because it’s just one instance of what I suspect is a very widespread misunderstanding.

    So, just in case anyone who sees this post of mine is uncertain:

    ChatGPT is more like a game of Mad Libs than like a person. It’s not sentient. It doesn’t “read” a book, and it doesn’t “make a value judgement” about a book. The way it works (very roughly) is that, given some input words, it chooses a sequence of words based on which word is the most likely next word in the sequence. (Where it determines what’s “most likely” by having been trained on an enormous number of examples of text written by humans.)

    Its training included a vast number of book blurbs (I would guess millions of them), so when it‘s asked to write a book blurb, it creates a sequence of words that looks a lot like all of the other blurbs out there.

    So ChatGPT didn’t evaluate this book and find it “captivating.” Instead, ChatGPT created a sequence of words that was similar to the sequences of words used in other blurbs, and the phrase “captivating narrative” is a very common phrase, so it used that phrase.

    (Original Facebook post.)


  • Racism

    (, , )

    “ChatGPT and Google’s Bard answer medical questions with racist, debunked theories that harm Black patients”

    (Article from October.)

    (Original Facebook post.)


  • Take a deep breath

    (, , )

    We’ve known for a while that telling a generative AI system (“LLM”) to work step by step on a math problem improves the accuracy of results.

    Now researchers have found a specific sentence that works particularly well to improve accuracy (at least when using Google’s PaLM 2 LLM): “Take a deep breath and work on this problem step by step.”

    But just because that improved accuracy doesn’t mean it resulted in super high accuracy:

    “The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, which is a data set of grade-school math word problems. By comparison, PaLM 2, without any special prompting, scored only 34 percent accuracy on GSM8K, and the classic ‘Let’s think step by step’ prompt scored 71.8 percent accuracy.”

    So these kinds of phrases result in a big improvement, but they’re still only getting 70% to 80% accuracy (at least on PaLM 2).

    On the one hand, the fact that an LLM can achieve 80% accuracy on answering mathematical word problems is neat and impressive from an AI-theory point of view. On the other hand, from an answering-questions-accurately point of view, that means that even at its best, it gets the answer wrong one time in five.

    So the moral of my post here is the same as the moral of most of my posts about LLMs:

    Don’t trust the answers that LLMs provide. They are often false.

    (Original Facebook post.)


  • Marking up true and false

    ()

    A teacher describes having students use generative AI to create literature reviews, and then having the students mark up which parts are true and which aren’t.


    (Original Facebook post.)


  • Letters in country names

    ()

    Yet another example of ChatGPT getting things wrong:

    Human:

    Did you know that there is no country in Africa that starts with the letter “k”?

    ChatGPT:

    Yes, that’s correct! While there are 54 recognized countries in Africa, none of them begin with the letter “K”. The closest is Kenya, which starts with a “K” sound, but is actually spelled with a “K” sound. It’s always interesting to learn new trivia facts like this.

    …That page is just a web page that someone posted, so I don’t know for certain that it’s a real transcript of a ChatGPT chat. But I just tried asking ChatGPT the same question, and got a similar answer (without the particularly ridiculous Kenya sentence). My transcript:

    Me: Did you know that there is no country in Africa that starts with the letter “k”?

    ChatGPT: Yes, that’s correct. As of my last knowledge update in September 2021, there is no country in Africa that starts with the letter “K.” Africa is a continent with 54 recognized countries, but none of them have a name that begins with the letter “K.” If there have been any changes or new countries established since that time, I would not have that information.

    Another reason that the abovelinked exchange is particularly bad is that it’s now the top Google search result when you search for [Do any African countries start with K]. Which is leading people to say that Google is providing AI-generated search results, which isn’t really what the problem is here; the problem in this particular case is that Google is providing “featured snippets” search results that it gets from pages generated by ChatGPT.

    Google’s featured-snippets feature has always had this sort of problem occasionally; I think it usually happens when (a) the search query is relatively rare, and (b) there’s a widely linked-to page containing false information related to the search query. Nobody fact-checks featured snippets, so sometimes they include outright false answers.

    Google is usually pretty good about taking down such answers when they’re told about them. (Via the Feedback button at the bottom of the featured snippet.) But by now they really ought to know better than to show a featured snippet for any page containing AI-generated content.

    (Original Facebook post.)


  • Proust and ChatGPT

    ()

    Oh, look, another one.

    In this article, “Proust, ChatGPT and the case of the forgotten quote,” an “author, academic and journalist” is looking for a quote from Proust, so she asks ChatGPT. She then spends a lot of time assuming that the information that ChatGPT provides is accurate (even after the second or third time that it blatantly contradicts itself) and fretting about how to ask ChatGPT followup questions without seeming rude.

    Reminder:

    CHATGPT MAKES THINGS UP.

    IT IS NOT A RELIABLE SOURCE OF FACTUAL INFORMATION.

    Authors and editors who publish articles that talk about ChatGPT as if it were a reliable source that has happened to get one or two things wrong are doing a disservice to their readers. (And the companies that have created Large Language Model AIs are doing a disservice to the world by not making it more explicitly clear that LLMs are not providing factual information.*)

    I’ve now written to the author of this piece and to the Guardian to ask them not to write this kind of article in the future, but I have little hope of making an impression.

    *The companies do provide disclaimers saying, more or less, that what the LLMs say might not be accurate. But judging by the articles I see, those disclaimers aren’t clear enough or aren’t visible enough.

    (Original Facebook post.)


  • Financial advice

    ()

    I pitted ChatGPT against a real financial advisor to help me save for retirement—and the winner is clear

    Spoiler: The winner is the real (human) financial advisor.

    I feel like the core premise of this article—and other similar articles—is deeply flawed. The people who are writing these articles seem to be starting from the idea that there’s a real open question as to whether GPT in its current form can do as good a job as a human expert, and then they take that question seriously and do a comparison as if the two were in some way comparable.

    Whereas, in reality, ChatGPT makes things up, and thus is not a reliable replacement for anything. (Unless you’re looking for something that will lie to you.)

    In this article, ChatGPT gives multiple wildly incorrect answers when asked to do fairly simple financial calculations. But the article author doesn’t say “Therefore, don’t trust anything GPT tells you”; instead, they just say “Methinks your math is not so good, ChatGPT.”

    Doing accurate math is a core requirement for giving financial advice! It’s not an optional thing that slightly reduces the quality of GPT’s responses!

    (I may be overreacting to my perceptions of the author’s tone. I just hate to see discussions of GPT that take GPT seriously as a potential source of factual information.)

    In summary:

    Remember that generative AI gives false answers. Do not rely on it for anything factual.

    (Original Facebook post.)