OpenAI has introduced CriticGPT, a GPT-4-based model designed to spot mistakes in ChatGPT's code outputs, with the aim of making the training and evaluation of AI models more accurate.

This development represents a significant step towards improving the Reinforcement Learning from Human Feedback (RLHF) process used to train and align advanced AI systems.

According to OpenAI's research, human reviewers who use CriticGPT to assess ChatGPT's code outperform unassisted reviewers 60% of the time. That gain addresses a growing challenge in AI development: as models become more capable, their mistakes become subtler and harder for human trainers to spot.

CriticGPT was trained with RLHF, much like ChatGPT itself, but on a task focused on finding and critiquing errors in code. In evaluations on naturally occurring bugs, trainers preferred CriticGPT's critiques over ChatGPT's own critiques 63% of the time, partly because CriticGPT raises fewer trivial complaints ("nitpicks") and hallucinates fewer problems.
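
To make the training setup concrete, here is a minimal sketch of what pairwise preference data for a critique reward model might look like. Everything in it (the data class, the stubbed critique sampler, the randomized "human" label) is an illustrative assumption, not OpenAI's actual pipeline.

```python
# Illustrative sketch of RLHF-style preference data for critiques.
# All names here are assumptions made for the example, not OpenAI's code.
import random
from dataclasses import dataclass

@dataclass
class CritiquePreference:
    code: str        # the ChatGPT-written code under review
    critique_a: str  # first candidate critique
    critique_b: str  # second candidate critique
    preferred: str   # "a" or "b", as judged by a human trainer

def sample_critiques(code: str) -> tuple[str, str]:
    """Stand-in for drawing two critiques from the critic model."""
    return (
        "Line 1: the function subtracts instead of adding.",
        "Line 1: consider renaming the parameters.",
    )

def label_preference(code: str) -> CritiquePreference:
    """A human trainer compares the two critiques and picks the better one.
    The random choice below merely simulates that judgment."""
    a, b = sample_critiques(code)
    return CritiquePreference(code, a, b, preferred=random.choice(["a", "b"]))

example = label_preference("def add(x, y): return x - y")
print("preferred:", example.preferred)
```

A reward model trained on comparisons like these is what lets the critic learn which critiques human trainers actually find useful.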

OpenAI researchers have also developed a way to generate longer, more comprehensive critiques by running additional test-time search against the critique reward model. This gives them a dial to trade off how aggressively the model hunts for problems against how often it flags issues that aren't really there.
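
OpenAI's paper describes a beam-search variant, but the core idea can be sketched with a simpler best-of-n search: sample several critiques, score each with the reward model, and subtract a penalty for aggressiveness. The callables and the penalty term below are illustrative placeholders, not OpenAI's implementation.

```python
# Simplified best-of-n approximation of test-time search against a
# critique reward model. This only illustrates the precision/
# comprehensiveness dial, not OpenAI's exact search procedure.
import random
from typing import Callable

def select_critique(
    code: str,
    sample_critique: Callable[[str], str],      # draws one critique from the critic
    reward_model: Callable[[str, str], float],  # scores a (code, critique) pair
    n_samples: int = 8,
    aggressiveness_penalty: float = 0.1,        # higher = fewer, safer claims
) -> str:
    """Keep the sampled critique with the highest penalty-adjusted reward."""
    best_critique, best_score = "", float("-inf")
    for _ in range(n_samples):
        critique = sample_critique(code)
        n_claims = critique.count("\n") + 1     # crude proxy for number of claims
        score = reward_model(code, critique) - aggressiveness_penalty * n_claims
        if score > best_score:
            best_critique, best_score = critique, score
    return best_critique

# Toy usage with stub components standing in for real models:
chosen = select_critique(
    "def mean(xs): return sum(xs) / len(xs)",
    sample_critique=lambda code: random.choice([
        "Line 1: division by zero when xs is empty.",
        "Line 1: add type hints.\nLine 1: rename xs.\nLine 1: add a docstring.",
    ]),
    reward_model=lambda code, critique: 1.0 if "zero" in critique else 0.6,
)
print(chosen)
```

Raising the penalty pushes the search toward conservative critiques with fewer false alarms; lowering it favors more comprehensive, but riskier, bug hunting.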

While CriticGPT is a notable advance, OpenAI acknowledges several limitations: methods are still needed for handling longer and more complex tasks, for catching errors dispersed across many parts of an answer, and for reining in the model's own hallucinations.
