September 13, 2024 | 13:51
READING TIME: 4 minutes
OpenAI has announced the release of o1, the first in a series of “reasoning” LLMs trained to answer complex questions faster than a human. Along with o1, it is also releasing o1-mini, a smaller, cheaper version. For OpenAI, o1 represents a step toward its long-term goal of human-like AI. In practical terms, the model can write code and solve multi-step problems more efficiently than previous models. However, it is also more expensive and slower to operate than GPT-4o. OpenAI is calling this release of o1 a preview to underscore its early stage of development.
Access to o1-preview and o1-mini opened on September 12 for ChatGPT Plus and Team users, with Enterprise and Edu users getting access early next week. OpenAI plans to make o1-mini available to all free ChatGPT users, but has not yet set a release date. Developer access to o1 is notably expensive: in the API, o1-preview costs $15 per million input tokens and $60 per million output tokens. By comparison, GPT-4o costs $5 per million input tokens and $15 per million output tokens.
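To make those prices concrete, here is a minimal sketch of calling o1-preview through the standard OpenAI Python SDK and estimating the cost of a single request from the per-token prices quoted above. The prompt text is arbitrary, and the cost arithmetic is our own illustration rather than anything from OpenAI's documentation.

```python
from openai import OpenAI

# Prices quoted in the article, converted to dollars per token.
O1_PREVIEW_INPUT_PRICE = 15 / 1_000_000   # $15 per 1M input tokens
O1_PREVIEW_OUTPUT_PRICE = 60 / 1_000_000  # $60 per 1M output tokens

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

# Estimate the bill for this one request from the reported token usage.
usage = response.usage
cost = (usage.prompt_tokens * O1_PREVIEW_INPUT_PRICE
        + usage.completion_tokens * O1_PREVIEW_OUTPUT_PRICE)

print(response.choices[0].message.content)
print(f"Estimated cost of this request: ${cost:.4f}")
```

Because o1's hidden reasoning is billed as output tokens, output-heavy requests dominate the cost, which is part of why the $60-per-million output rate matters so much in practice.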
The training behind o1 is fundamentally different from that of its predecessors, explains Jerry Tworek, OpenAI’s research lead. While the company is keeping the exact details secret, Tworek says that o1 “was trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.” Previous GPT models were trained to mimic patterns in their training data; with o1, OpenAI trained the model to solve problems on its own using reinforcement learning, a technique that teaches a system through rewards and penalties. The model then uses a “chain of thought” to work through queries, similar to how humans reason through problems step by step.
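OpenAI has not disclosed its algorithm, but the reward-and-penalty idea behind reinforcement learning can be shown with a toy sketch. Below, a simple epsilon-greedy agent learns which of three hypothetical answering strategies earns the most reward; the strategy names and accuracy numbers are invented purely for illustration and have nothing to do with OpenAI's actual training setup.

```python
import random

# Toy reinforcement learning: the agent is rewarded for correct answers
# and penalized for wrong ones, and gradually learns which strategy pays.
STRATEGIES = ["guess", "one-step", "chain-of-thought"]
# Hypothetical probability that each strategy produces a correct answer.
ACCURACY = {"guess": 0.2, "one-step": 0.5, "chain-of-thought": 0.9}

values = {s: 0.0 for s in STRATEGIES}  # estimated value of each strategy
counts = {s: 0 for s in STRATEGIES}
EPSILON, REWARD, PENALTY = 0.1, 1.0, -1.0

for step in range(5000):
    # Explore occasionally; otherwise exploit the best-known strategy.
    if random.random() < EPSILON:
        strategy = random.choice(STRATEGIES)
    else:
        strategy = max(values, key=values.get)
    reward = REWARD if random.random() < ACCURACY[strategy] else PENALTY
    counts[strategy] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[strategy] += (reward - values[strategy]) / counts[strategy]

print(max(values, key=values.get))  # converges to "chain-of-thought"
```

Even in this toy setting, the agent settles on the most deliberate strategy simply because it maximizes reward over time, which is the same pressure OpenAI describes applying during training.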
With this new training methodology, OpenAI says the model should be more accurate. “We’ve noticed that this model makes fewer mistakes,” Tworek says, though the errors haven’t disappeared entirely: “We can’t say we’ve completely solved every comprehension problem.” The main feature that sets the new model apart from GPT-4o is its ability to tackle complex problems, such as coding and math, far better than its predecessors, while also explaining its reasoning. “The model is definitely better at solving AP math problems than I am, and I took college math,” says Bob McGrew, OpenAI’s chief research officer. McGrew says OpenAI also tested o1 on a qualifying exam for the International Mathematical Olympiad: while GPT-4o solved only 13 percent of the problems correctly, o1 scored 83 percent.
On Codeforces, a platform for online programming competitions, the new model scored in the 89th percentile of participants, and OpenAI says the next update will perform “similar to doctoral students on challenging physics, chemistry, and biology tasks.” At the same time, o1 isn’t as capable as GPT-4o in many areas: its factual knowledge of the world is weaker, and it can’t browse the web or process files and images. Still, the company believes it represents a new beginning for AI, which is why it’s named o1: “reset the counter to 1.”
While we haven’t yet been able to test o1 directly, McGrew and Tworek demonstrated it during a live presentation. They asked the model to solve a complex mathematical puzzle, and it arrived at a correct answer after about 30 seconds of processing. The interface is designed to show the steps of reasoning as the model thinks. What’s striking is not so much that it shows its work, but how deliberately o1 seems to mimic human thinking. Phrases like “I’m curious about,” “I’m thinking through,” and “OK, let me see” create the illusion of a step-by-step thought process.