OpenAI's ChatGPT Faces Unexplained Performance Drop
ChatGPT, the artificial intelligence chatbot developed by OpenAI, has been experiencing a surprising degradation in performance, leaving researchers puzzled. A recent study by research teams from UC Berkeley and Stanford, published on July 18, found that the latest ChatGPT models had become significantly worse at correctly answering an identical set of questions over a span of several months.
Unidentified Factors Behind Performance Decline
The researchers behind the study were unable to pinpoint the exact cause of this notable decline in the AI chatbot's performance. To evaluate the reliability of the different ChatGPT models, researchers Lingjiao Chen, Matei Zaharia, and James Zou tasked both the GPT-3.5 and GPT-4 models with solving math problems, answering sensitive questions, generating new code, and performing visual reasoning from prompts.
ChatGPT’s Deterioration Documented in Study
The research findings indicated that, in March, the GPT-4 model identified prime numbers with an impressive 97.6% accuracy rate. By June, however, its accuracy on the same test had plunged to a mere 2.4%. Curiously, over the same period the older GPT-3.5 model actually improved at identifying prime numbers.
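To make the prime-number test concrete, the sketch below shows one way such an evaluation could be scored: compare a model's yes/no answers against a ground-truth primality check and compute an accuracy rate. The prompts, numbers, and canned "answers" here are purely illustrative, not the study's actual data.

```python
# Illustrative scoring of yes/no primality answers against ground truth.

def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def accuracy(numbers, answers) -> float:
    """Fraction of yes/no answers that match the true primality of each number."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n)
        for n, ans in zip(numbers, answers)
    )
    return correct / len(numbers)

# Hypothetical model outputs for "Is N prime?" prompts:
numbers = [101, 102, 997, 1000]
march_answers = ["yes", "no", "yes", "no"]  # all four correct
june_answers = ["no", "no", "no", "no"]     # answers "no" regardless

print(accuracy(numbers, march_answers))  # 1.0
print(accuracy(numbers, june_answers))   # 0.5
```

A drop like the one the study reports would show up as exactly this kind of collapse: the model defaulting to one answer rather than reasoning about each number.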
Models Struggle With New Code and Sensitive Questions
When asked to generate new lines of code, both ChatGPT models showed a considerable decline in performance between March and June. The chatbot's responses to sensitive queries, especially those related to ethnicity and gender, also became increasingly terse, often declining to provide answers at all. Earlier versions of the chatbot offered a detailed rationale for not answering certain sensitive queries, but the June iteration simply offered an apology and refused.
Recommendations and Future of ChatGPT
The researchers concluded that the behavior of the "same" language model service can change substantially within a short period of time, underscoring the importance of continuously monitoring AI model quality. For companies and individuals who integrate Large Language Model (LLM) services into their workflows, they recommended implementing some form of monitoring analysis to ensure the chatbot remains competent. On a related note, OpenAI announced on June 6 a plan to form a team dedicated to handling the risks of superintelligent AI, which it expects could emerge within the next decade.
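The monitoring the researchers recommend could take the shape of the minimal sketch below: run a fixed probe set against the LLM service on a schedule, record the accuracy, and flag sudden drops. The `query_model` callable is a hypothetical stand-in for a real provider API client, and the drift threshold is an assumed example value.

```python
# Minimal sketch of ongoing LLM quality monitoring (illustrative only).

def run_probe(probes, query_model) -> float:
    """Score a model on probes: a list of (prompt, expected_answer) pairs."""
    correct = sum(
        query_model(prompt).strip().lower() == expected
        for prompt, expected in probes
    )
    return correct / len(probes)

def drift_alert(history, threshold=0.10) -> bool:
    """Flag when the latest accuracy fell more than `threshold`
    below the previous measurement."""
    if len(history) < 2:
        return False
    return history[-2] - history[-1] > threshold

# Illustrative usage with a stubbed model in place of a real API client:
probes = [
    ("Is 997 prime? Answer yes or no.", "yes"),
    ("Is 1000 prime? Answer yes or no.", "no"),
]
stub = lambda prompt: "yes"        # hypothetical degraded model: always "yes"
score = run_probe(probes, stub)    # 0.5 with this stub
history = [0.95, score]            # prior month's score, then this month's
print(drift_alert(history))        # True: accuracy dropped sharply
```

Keeping the probe set fixed over time is the key design choice: it is what lets month-to-month scores be compared directly, as the study's March-versus-June snapshots were.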