“Over just a few months, [GPT-4] went from correctly answering a [particular] math problem 98% of the time to just 2%, study finds”
More specifically:
“in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time—while the June version was consistently right, answering correctly 86.8% of the time.”
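As a sanity check on the arithmetic (my own verification, not part of the study): 17077 really is prime, since it has no divisor up to its square root (~130.7). A few lines of trial division confirm it:

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division: test odd divisors up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2  # only odd candidates need checking
    return True

print(is_prime(17077))  # True
```

So the March GPT-4 answers were correct, and the June ones were not.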
Also, it looks like they asked GPT-4 to give step-by-step reasoning for the primality question; the March version gave good step-by-step answers, while the June version ignored the step-by-step part of the prompt.
Here’s the paper that the article is talking about (not yet peer-reviewed, I think).