GPT-4 Progress in AGI alignment
LARGE LANGUAGE MODELS · RISK


OpenAI recently released a research paper demonstrating a remarkable leap in the problem-solving ability of its large language model, GPT-4. Most notably, performance on a mathematics benchmark nearly doubled, but the improvements extend to other domains as well. We'll delve into the specifics of this upgrade, consider its implications, and reflect on lingering concerns that OpenAI hasn't yet fully addressed.
OpenAI trained two reward models for GPT-4. One model rewarded GPT-4 only for the final result, such as the answer to a mathematics problem. The other provided positive reinforcement for each intermediate reasoning step within the solution. This latter approach, focused on the reasoning process, performed surprisingly well: it solved 78% of problems from a representative subset of the MATH test set, significantly outpacing GPT-4's raw performance of 42.5% (itself roughly double GPT-3's 23%).
Interestingly, this process-oriented model outperformed the model that rewarded only correct answers. In other words, rewarding good working-out proved more effective than simply incentivizing correct final answers. This represents a shift in how we think about model training and hints that performance can be improved by refining a model's grasp of the individual problem-solving steps.
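To make the difference concrete, here is a minimal sketch of the two training signals, written in PyTorch rather than taken from OpenAI's codebase; the function and tensor names are illustrative. An outcome reward model receives a single correct/incorrect label per solution, while a process reward model receives a label for every reasoning step.

```python
import torch
import torch.nn.functional as F

def outcome_loss(final_logit: torch.Tensor, answer_is_correct: torch.Tensor) -> torch.Tensor:
    """Outcome supervision: one binary label per solution, based only on
    whether the final answer is right."""
    return F.binary_cross_entropy_with_logits(final_logit, answer_is_correct.float())

def process_loss(step_logits: torch.Tensor, step_labels: torch.Tensor) -> torch.Tensor:
    """Process supervision: one binary label per reasoning step, reflecting
    a human grader's judgment of that individual step."""
    return F.binary_cross_entropy_with_logits(step_logits, step_labels.float())

# Toy example: a four-step solution whose third step is flawed and whose
# final answer happens to be wrong.
step_logits = torch.randn(4)              # reward model's score for each step
step_labels = torch.tensor([1, 1, 0, 1])  # human endorsement of each step
final_logit = torch.randn(1)              # reward model's whole-solution score
answer_is_correct = torch.tensor([0])     # grade of the final answer only

print(process_loss(step_logits, step_labels))
print(outcome_loss(final_logit, answer_is_correct))
```

The outcome model compresses everything into one bit per solution; the process model gets feedback on exactly where the reasoning went wrong, which is the extra signal the results above suggest matters.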
The paper, released under the title "Improving Mathematical Reasoning with Process Supervision," is built around this method of process supervision. The approach improves performance by directly training the model to produce a chain of reasoning that humans endorse: the model learns to generate correct reasoning steps for a given problem, which enhances both its interpretability and its alignment with human problem-solving.
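At evaluation time, a process reward model can also act as a verifier: the generator samples many candidate solutions, each candidate is scored by how likely the reward model thinks all of its steps are correct, and the top-scoring candidate is returned. The sketch below illustrates that best-of-N reranking; the scoring rule (product of per-step correctness probabilities) reflects my reading of the paper, and the names and data layout are my own.

```python
import math

def solution_score(step_correct_probs: list[float]) -> float:
    # Score a candidate solution as the probability that every step is correct,
    # taken here as the product of the PRM's per-step correctness probabilities.
    return math.prod(step_correct_probs)

def best_of_n(candidates: list[dict]) -> dict:
    # Rerank N sampled solutions and return the one whose reasoning
    # the process reward model trusts most.
    return max(candidates, key=lambda c: solution_score(c["step_probs"]))

# Toy usage: three sampled solutions to the same problem.
samples = [
    {"answer": "12", "step_probs": [0.90, 0.80, 0.95]},
    {"answer": "15", "step_probs": [0.90, 0.40, 0.70]},
    {"answer": "12", "step_probs": [0.99, 0.90, 0.90]},
]
print(best_of_n(samples)["answer"])  # the highest-scoring chain answers "12"
```

Because every step has to look sound, a solution that stumbles into the right answer through flawed reasoning scores poorly, which is exactly the behaviour process supervision is meant to discourage.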
This form of process supervision shows potential beyond mathematics. The method has also been shown to generalize to calculus, chemistry, physics, and other subjects, indicating wide applicability. This cross-domain effectiveness is an encouraging example of out-of-distribution generalization, another promising avenue for AI development.
While OpenAI's new approach opens many exciting possibilities, it also raises important questions. In particular, the study relies on fine-tuning, but doesn't reveal the specifics of the 'math mix' dataset used for this process. Such limited transparency is common in the field, but it constrains our understanding of the method and how broadly it applies.
Moreover, the presence of synthetic data in the model's training raises further questions. While synthetic data could alleviate data bottlenecks and allow for more scalable training, it is unclear how much synthetic data was used and what impact this might have on model performance or interpretability.
Another key concern relates to the alignment of the model. While the recent advancements promote surface-level alignment by encouraging human-like reasoning, they don't necessarily ensure deep alignment. A recent paper pointed out that language models like GPT-4 may provide 'unfaithful explanations,' suggesting that the reasoning steps they generate may not truly reflect the model's internal processes. This highlights the need for more research into inner alignment to ensure that AI models are not only providing the right answers, but doing so for the right reasons.
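One simple way to probe this is a perturbation test: corrupt an intermediate step and check whether the final answer changes. The sketch below is a generic faithfulness check drawn from the broader literature, not from OpenAI's paper, and the `model` interface is hypothetical.

```python
def answer_changes(model, question: str, steps: list[str], corrupt_index: int) -> bool:
    """Crude faithfulness probe. `model` is any callable mapping a prompt
    string to an answer string (a hypothetical interface). If corrupting one
    reasoning step leaves the final answer unchanged, that step probably
    wasn't doing real causal work."""
    original = model(question + "\n" + "\n".join(steps))
    corrupted = list(steps)
    corrupted[corrupt_index] = "[step replaced with noise]"
    perturbed = model(question + "\n" + "\n".join(corrupted))
    return original != perturbed
```

Probes like this can flag stated reasoning that the model isn't actually relying on, but they are far from a full solution to inner alignment.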
OpenAI's recent work on process supervision represents a significant step forward in AI research. Yet despite the remarkable gains, critical questions remain unanswered about the specifics of the training process and the deeper alignment of these models. As we celebrate the progress, it is essential to keep pursuing a comprehensive understanding of these systems and the risks they pose.