Generative AI Lies

Examples of generative AI making stuff up

Decline in accuracy

“Over just a few months, [GPT-4] went from correctly answering a [particular] math problem 98% of the time to just 2%, study finds”

More specifically:

“in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time—while the June version was consistently right, answering correctly 86.8% of the time.”

Also, it looks like they asked GPT-4 to give step-by-step reasoning for the primes question; in March, it gave good step-by-step answers, but in June, it ignored the step-by-step part of the prompt.
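
For what it's worth, the question itself is easy to check mechanically: 17077 really is prime. Here's a minimal sketch in Python using plain trial division. The function name `is_prime` is just illustrative, not something from the paper or the article, and this isn't meant to represent how the study's authors scored the answers.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # 2 is the only even prime
    d = 3
    # Only odd divisors up to the square root need to be tried.
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


print(is_prime(17077))  # prints True
```

Since the square root of 17077 is a bit under 131, the loop only has to try the odd numbers from 3 through 129, which is the kind of step-by-step check the prompt was asking for.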

Here’s the paper that the article is talking about (not yet peer-reviewed, I think).

(Original Facebook post.)
