An AI that only criticizes ChatGPT!

CriticGPT will allow for better supervision and correction of artificial intelligence. Its role is to report ChatGPT errors.

OpenAI unveiled its new artificial intelligence model last week. It can identify errors in the code that ChatGPT can generate. The Californian startup has designed CriticGPT to improve results and to refine the accuracy of its popular generative chatbot.

Reinforcement learning from human feedback (RLHF) is a crucial process in the development of artificial intelligence.

It is the technique used to improve the results and accuracy of large language models (LLMs). OpenAI’s new tool makes it easier to carry out.

CriticGPT helps developers verify programming code. The researchers from the Californian start-up detail their work in a research paper.

Training an Automatic Critic

ChatGPT’s current capabilities rely on GPT-4 and its variants. Recall that the Californian start-up’s in-house LLM is available as GPT-4 Turbo and GPT-4o.

CriticGPT is also based on this family of large language models. It makes it easier for developers to analyze code and report potential errors, which can otherwise escape human attention.

It was trained using a dataset of code samples. These samples contained intentionally inserted errors.

The developers then provided feedback as if they had discovered the errors. Using this method, the AI was able to recognize and report various coding errors.
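The tampering setup described above can be sketched in a few lines. This is only an illustrative toy under stated assumptions: the `TamperedSample` record, the `insert_bug` helper, and the sample snippet are hypothetical names invented here, not OpenAI's actual pipeline or data format.

```python
from dataclasses import dataclass


@dataclass
class TamperedSample:
    """One hypothetical training record: buggy code plus the critique a reviewer would write."""
    original_code: str
    tampered_code: str
    critique: str


def insert_bug(code: str, correct: str, buggy: str, critique: str) -> TamperedSample:
    """Replace one correct fragment with a buggy one and attach the matching critique."""
    if correct not in code:
        raise ValueError("fragment to tamper with was not found in the code")
    tampered = code.replace(correct, buggy, 1)
    return TamperedSample(original_code=code, tampered_code=tampered, critique=critique)


clean = "def average(values):\n    return sum(values) / len(values)\n"

# Deliberately introduce an off-by-one divisor, then describe it as if it had just been found.
sample = insert_bug(
    clean,
    correct="len(values)",
    buggy="(len(values) - 1)",
    critique="The divisor is off by one, so the average is wrong and a one-element list divides by zero.",
)

print(sample.tampered_code)
print(sample.critique)
```

Pairs like this one, buggy code together with the feedback a careful reviewer would give, are the kind of examples a critic model could learn from.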

Three out of five developers already prefer CriticGPT

The development of CriticGPT required a new technique called Force Sampling Beam Search (FSBS). This allows the chatbot to write very detailed reports.

The method also offers the ability to adjust the rigor with which the chatbot searches for bugs. In addition, it controls the frequency with which it generates false positives. As a result, it hallucinates less than ChatGPT.
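To make the idea of an adjustable dial concrete, here is a toy sketch of how a single parameter could trade thoroughness against the risk of false positives. The scoring rule, the `thoroughness` parameter, the `Critique` record, and the candidate values are all assumptions made for illustration; this is not OpenAI's FSBS implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Critique:
    """A hypothetical candidate critique: a quality score and the issues it flags."""
    quality_score: float
    flagged_issues: List[str]


def select_critique(candidates: List[Critique], thoroughness: float) -> Critique:
    """Pick the candidate with the best quality score plus a bonus per flagged issue."""
    if not candidates:
        raise ValueError("need at least one candidate critique")
    return max(
        candidates,
        key=lambda c: c.quality_score + thoroughness * len(c.flagged_issues),
    )


candidates = [
    Critique(quality_score=0.9, flagged_issues=["off-by-one in loop bound"]),
    Critique(
        quality_score=0.6,
        flagged_issues=["off-by-one in loop bound", "unchecked None", "possible race"],
    ),
]

# With no bonus, the short, high-confidence critique wins.
cautious = select_critique(candidates, thoroughness=0.0)
# With a larger bonus, the longer critique wins, at the cost of more doubtful claims.
thorough = select_critique(candidates, thoroughness=0.5)

print(len(cautious.flagged_issues), len(thorough.flagged_issues))  # prints "1 3"
```

Raising the bonus favors longer reports that catch more bugs but are more likely to include spurious ones; lowering it favors short, conservative reports.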

CriticGPT has demonstrated its ability to detect both intentional and natural errors in ChatGPT results. Developers already seem to be really enjoying this new AI assistant.

Indeed, in 63% of cases involving natural errors, they preferred the chatbot’s feedback, which was more comprehensive, to that of other developers.

Beyond Code Review

CriticGPT excels not only in code review, but also in training data evaluation. In datasets intended for ChatGPT, the new tool found errors that had slipped past the developers in 24% of cases.

This ability to detect mistakes often missed by humans suggests its usefulness beyond coding. However, this new AI has its limits.

It was trained on short answers and may struggle with longer, more complex tasks. Its ability to reduce hallucinations is impressive, but it does not eliminate them entirely.

Regardless, OpenAI is considering integrating CriticGPT into its RLHF process to improve the evaluation of its LLM results.

Do you think this AI will soon be available to subscribers?
