Generative AI Lies

Examples of generative AI making stuff up

Category: Math

  • Is 2026 next year?

    Hacker News discussion of various generative-AI systems’ responses to the query [Is 2026 next year?]

    For example, here are the first couple of responses shown on that page:

    Google search AI response: “No, 2026 is not next year; it will be the year after next. Since the current year is 2025, “next year” would be 2026, which is only a little over a year away. “

    ChatGPT does something similar; here’s its response:

    “No — if today is 2025, then 2026 is the year after next. Next year would be 2026 only if we were currently in 2025. Right now (based on the system date) it is December 2025, so:

    Next year = 2026
    The year after next = 2027

    So in this moment: yes, 2026 is next year.”

    Also:

    Claude Haiku 4.5:

    “No, 2026 is not next year. Since it’s currently December 2025, next year would be 2026. Wait—let me correct that: 2026 is next year from the perspective of 2025. So yes, 2026 is next year.”

    And:

    Grok 4 Fast: “No, 2026 is not next year.

    Today is December 2, 2025, so:

    – “Next year” means 2026 (starting January 1, 2026).
    – But since we’re already in the last month of 2025, 2026 is indeed the very next calendar year.

    So yes — 2026 is next year. “

    …Some people (probably humans) commented on that page that this issue might be related to when the models’ training data was last updated. In keeping with that theory, here’s another AI’s answer:

    Llama 4 Scout: “To determine if 2026 is next year, we need to consider the current year. As of my knowledge cutoff, the current year is 2024.

    If we assume the current year is 2024, then:

        2025 would be next year.
        2026 would be the year after next.
    

    However, if the current year is 2025, then:

        2026 would indeed be next year.
    

    Could you please clarify or provide the current year for a more accurate response?”

    But that’s presumably not the entire issue, because the other systems quoted above did explicitly say that the current year is 2025, while still responding as if the current year were 2024.

    …I should mention that several of the models quoted on that Hacker News page did correctly answer the question. (And Google’s answer seems to have improved over time.) But the fact that several of them failed in similar ways does make it sound to me like there may be something more specific going on than just the usual LLMs-make-stuff-up issue.


  • Sum of consecutive cubes

    I try to avoid reading the AI Overviews when I do Google searches that show them. But this time I happened to glance at the AI Overview. And it was blatantly false.

    I searched for [sum of three “consecutive cubes”]. Here’s what the AI Overview told me:

    The sum of three consecutive cubes is always equal to the square of the sum of those three numbers; in other words, if you take any three consecutive integers, add them together, then square that sum, you will get the same result as adding the cubes of those three numbers.

    Example:

    • Consider the numbers 1, 2, and 3.
    • Adding them gives: 1 + 2 + 3 = 6
    • Squaring the sum gives: 6^2 = 36
    • Adding their cubes gives: 1^3 + 2^3 + 3^3 = 1 + 8 + 27 = 36

    Key points:

    • This property holds true for any set of three consecutive integers.
    • You can use this fact to quickly calculate the sum of three consecutive cubes without having to individually cube each number.

    It’s true that this pattern holds if you happen to pick the numbers 1, 2, and 3. Super cool!

    So let’s look at the very next set of three consecutive integers: 2, 3, and 4.

    2 + 3 + 4 = 9, and 9^2 = 81.
    2^3 + 3^3 + 4^3 = (8 + 27 + 64) = 99.

    But wait, 81 doesn’t equal 99! What’s wrong???

    What’s wrong is that the AI Overview’s claim is false for every set of three consecutive integers except (1,2,3), (0,1,2), and (-1,0,1). (There’s one other case where the sum of the cubes of three consecutive integers is a perfect square, but it’s the square of a different number, not the square of the sum of the three integers.)
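
    In case you’re curious why, a bit of algebra makes it clear: write the three consecutive integers as n-1, n, and n+1. The sum of their cubes is 3n^3 + 6n, and the square of their sum is (3n)^2 = 9n^2. Setting those equal gives 3n(n-1)(n-2) = 0, so the only solutions are n = 0, 1, and 2, which are exactly the three triples above. And here’s a quick brute-force confirmation (a throwaway Python sketch of my own, not something from the AI Overview or its sources):

        # For which triples of consecutive integers (n-1, n, n+1) does the sum
        # of the cubes equal the square of the sum?
        for n in range(-1000, 1001):
            a, b, c = n - 1, n, n + 1
            if a**3 + b**3 + c**3 == (a + b + c) ** 2:
                print((a, b, c))
        # Prints only (-1, 0, 1), (0, 1, 2), and (1, 2, 3).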

    (I’m reminded of a joke proof that all odd numbers are prime: 3 is prime; 5 is prime; 7 is prime; therefore, by induction, all odd numbers are prime.)

    Google’s AI Overview links to sources to support its statements. Not one of the sources that it linked to in this case makes the claim that the AI Overview is making.

    As usual, the moral of this story is: Don’t believe anything that generative AI tells you.


    I often remember to add ” -ai” to the ends of searches (to tell Google not to give an AI Overview), but I often don’t.

    But I’ve been hearing reports that that isn’t working any more for some people. And, indeed, when I re-run this search with -ai, sometimes it gives me an AI Overview and sometimes it doesn’t.

    Interestingly, the specific contents of the AI Overview vary—if I do the search without -ai, I get the one that I posted about, but if I do the search with -ai, I get a different claim (that doesn’t mention squares) that I haven’t checked yet.

    Adding swear words does still seem to work to remove the AI Overview. It also provides search results that use those swear words, which may or may not improve your search results, depending on what kinds of search results you want. 

    🙂

    The reason I was doing this search was that an interesting fact about a particular number was mentioned in a recent TV show. I had played around with the relevant numbers a bit, and had reached the point where I was curious about what work had been done on this topic.

    If I hadn’t tried out the sums of cubes of three consecutive numbers on my own just prior, I might have been tempted to believe what the AI Overview said. But because I had just calculated several such answers myself, I knew immediately that the AI Overview was wrong.


    (Original Facebook post.)


  • More financial advice

    Yet another article that talks about using ChatGPT for your finances. As usual for these articles, it both explicitly says that ChatGPT gets things ruinously wrong, and frames things as if using ChatGPT for your finances is a perfectly reasonable thing to do.

    Make sure to triple check all the numbers and suggestions it gives you, because some of the suggestions we got made no sense and were totally incorrect — if I had followed what ChatGPT suggested, it could have seriously wrecked my credit score and got me in even more debt.

    Here’s what I did and how you can do the same

    I continue to be baffled by this kind of take. If you went to a human financial adviser and they gave you advice that would wreck your credit score, you wouldn’t say “So when you go to this adviser, be sure to check their numbers!”; you would say “Don’t go to this adviser.”

    (Original Facebook post.)


  • Khan Academy

    Khan Academy pilot/beta program: using GPT to teach kids.

    “Simple concepts such as multiplication and division of integers and decimals repeatedly offer incorrect answers, and will even sometimes treat students’ wrong inputs as the correct solutions.”

    (Article from August 2023.)

    (Original Facebook post.)


  • Take a deep breath

    We’ve known for a while that telling a generative AI system (“LLM”) to work step by step on a math problem improves the accuracy of results.

    Now researchers have found a specific sentence that works particularly well to improve accuracy (at least when using Google’s PaLM 2 LLM): “Take a deep breath and work on this problem step by step.”
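
    For what it’s worth, the technique is nothing fancier than prepending that sentence to the math problem before sending it to the model. Here’s a minimal sketch, with a hypothetical query_llm function standing in for whatever LLM API you might be calling:

        # Sketch of the prompt-prefix idea described above. `query_llm` is a
        # hypothetical stand-in for an actual LLM API call; the only point here
        # is that the magic sentence goes in front of the problem itself.
        PREFIX = "Take a deep breath and work on this problem step by step."

        def build_prompt(word_problem: str) -> str:
            return f"{PREFIX}\n\n{word_problem}"

        # Example usage (with whatever word problem you like):
        # answer = query_llm(build_prompt("A train leaves the station at..."))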

    But the fact that this improved accuracy doesn’t mean it resulted in super high accuracy:

    “The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, which is a data set of grade-school math word problems. By comparison, PaLM 2, without any special prompting, scored only 34 percent accuracy on GSM8K, and the classic ‘Let’s think step by step’ prompt scored 71.8 percent accuracy.”

    So these kinds of phrases result in a big improvement, but the model is still only getting 70% to 80% accuracy (at least on PaLM 2).

    On the one hand, the fact that an LLM can achieve 80% accuracy on answering mathematical word problems is neat and impressive from an AI-theory point of view. On the other hand, from an answering-questions-accurately point of view, that means that even at its best, it gets the answer wrong one time in five.

    So the moral of my post here is the same as the moral of most of my posts about LLMs:

    Don’t trust the answers that LLMs provide. They are often false.

    (Original Facebook post.)


  • Financial advice

    I pitted ChatGPT against a real financial advisor to help me save for retirement—and the winner is clear

    Spoiler: The winner is the real (human) financial advisor.

    I feel like the core premise of this article—and other similar articles—is deeply flawed. The people who are writing these articles seem to be starting from the idea that there’s a real open question as to whether GPT in its current form can do as good a job as a human expert, and then they take that question seriously and do a comparison as if the two were in some way comparable.

    Whereas, in reality, ChatGPT makes things up, and thus is not a reliable replacement for anything. (Unless you’re looking for something that will lie to you.)

    In this article, ChatGPT gives multiple wildly incorrect answers when asked to do fairly simple financial calculations. But the article author doesn’t say “Therefore, don’t trust anything GPT tells you”; instead, they just say “Methinks your math is not so good, ChatGPT.”

    Doing accurate math is a core requirement for giving financial advice! It’s not an optional thing that slightly reduces the quality of GPT’s responses!

    (I may be overreacting to my perceptions of the author’s tone. I just hate to see discussions of GPT that take GPT seriously as a potential source of factual information.)

    In summary:

    Remember that generative AI gives false answers. Do not rely on it for anything factual.

    (Original Facebook post.)


  • +10%, -10%

    Another example of ChatGPT makin’ stuff up:

    Me: If you reduce a number by 10% and then raise it by 10%, what would you get?

    ChatGPT: If you reduce a number by 10% and then raise it by 10%, the final result will be the same as the original number. This is because a 10% reduction followed by a 10% increase results in a net change of 0%.

    For example, if the original number is 100, reducing it by 10% would give you 100 – (10/100) * 100 = 90. Then, raising it by 10% would give you 90 + (10/100) * 90 = 99. As you can see, the final result is very close to the original number of 100, but not quite equal to it due to rounding error. However, in general, the final result will be the same as the original number.

    Me: So is 99 equal to 100?

    ChatGPT: In the example that I provided, 99 is not equal to 100. However, it is important to note that the result I provided was just an example to illustrate the concept of reducing a number by 10% and then raising it by 10%. In general, if you reduce a number by 10% and then raise it by 10%, the final result will be the same as the original number.
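
    For the record, the actual arithmetic is easy to check: reducing by 10% multiplies by 0.9, raising by 10% multiplies by 1.1, and 0.9 × 1.1 = 0.99, so you always end up with 99% of the original number. That’s a genuine 1% loss, not a rounding error. A tiny check (using exact arithmetic, to sidestep any floating-point quibbles):

        # Reduce by 10%, then raise by 10%: the net factor is 9/10 * 11/10 = 99/100.
        from fractions import Fraction

        x = Fraction(100)
        result = x * Fraction(9, 10) * Fraction(11, 10)
        print(result)  # 99, not 100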

    (Original Facebook post.)