Only 2.4% in math: Is ChatGPT turning dumb?

By Andrew McCollum On Jul 20, 2023

Why is ChatGPT in the news?

Recently, researchers Lingjiao Chen and James Zou from Stanford University, and Matei Zaharia from UC Berkeley, tested GPT-3.5 and GPT-4 on four tasks: solving math problems, answering sensitive and dangerous questions, generating code, and visual reasoning. The conclusion: the “performance and behaviour” of both these large language models (LLMs) “can vary greatly over time”. The March version of GPT-4 identified prime numbers with 97.6% accuracy. In the June version, accuracy collapsed to 2.4%. Both models made “more formatting mistakes in code generation in June than in March”.

How did other experts react?

When the findings were published, AI expert Gary Marcus tweeted that “this instability will be LLMs’ undoing”. Jim Fan, senior scientist at Nvidia, opined that in a bid to make GPT-4 “safer”, OpenAI could have made it less useful, “leading to a possible degradation in cognitive skills”. He added that in a bid to cut costs, OpenAI could have reduced the parameters. Princeton professor of computer science Arvind Narayanan and a PhD student at the same university co-authored a response in which they argue, among other things, that variance in behaviour does not suggest a degradation in capability.

How is OpenAI reacting to this controversy?

Reacting to user criticism, Peter Welinder, vice-president of OpenAI, which owns ChatGPT, said GPT-4 was getting smarter with each new version. “When you use it more heavily, you start noticing issues you didn’t see before.” Logan Kilpatrick, lead of developer relations at OpenAI, tweeted: “We are actively looking into the reports people shared.”

What does this mean for users and companies?

Human resources tasks like onboarding, training, performance management, and employee queries and complaints can be automated using ChatGPT. But to integrate OpenAI’s application programming interfaces (APIs) with the business workflows of companies, one has to continuously monitor, retrain and fine-tune the models to ensure that they continue to produce accurate output and stay up-to-date. Variance in AI model behaviour only makes it a bigger challenge.
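To make the monitoring point concrete, here is a minimal sketch of a regression check that scores two model snapshots on the same fixed set of prime-number questions and flags a drop in accuracy. Everything in it is an assumption for illustration: `snapshot_a` and `snapshot_b` are hypothetical stand-ins for calls to a real model, the test set and the 5-point tolerance are arbitrary, and none of it reflects how the researchers or OpenAI actually evaluate models.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth, computed by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model: Callable[[int], bool], numbers: Iterable[int]) -> float:
    """Fraction of numbers for which the model's answer matches ground truth."""
    numbers = list(numbers)
    correct = sum(model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def has_regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when accuracy fell by more than the tolerance (an arbitrary choice here)."""
    return baseline - current > tolerance


# Hypothetical stand-ins for two snapshots of the same model.
def snapshot_a(n: int) -> bool:
    return is_prime(n)  # always answers correctly


def snapshot_b(n: int) -> bool:
    return False  # always answers "not prime"


test_set = range(2, 102)  # fixed test set, reused for every snapshot
baseline = accuracy(snapshot_a, test_set)
current = accuracy(snapshot_b, test_set)
print(f"baseline={baseline:.3f} current={current:.3f} "
      f"regressed={has_regressed(baseline, current)}")
```

In practice the stand-in functions would be replaced by calls to the deployed model, and the check would run on a schedule so that a change in behaviour is noticed soon after a new version ships.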

Is it a boost for open-source LLMs?

On the day the paper was released, Meta also released Llama 2, the second version of its free open-source LLM, for research and commercial use. It offers an alternative to pricey proprietary products such as OpenAI’s ChatGPT Plus and Google’s Bard. Interestingly, Databricks Inc., whose CTO is Zaharia (one of the paper’s authors), has open-sourced its own LLM, Dolly 2.0. Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), too, is open for researchers to run.


Updated: 20 Jul 2023, 11:46 PM IST
