OpenAI launches GPT-4o mini, which will replace GPT-3.5 in ChatGPT
OpenAI has announced the introduction of GPT-4o mini, a new, compact version of its latest GPT-4o AI language model, which will take the place of GPT-3.5 Turbo in ChatGPT, according to reports from CNBC and Bloomberg. This model will be accessible today for free users as well as those with ChatGPT Plus or Team subscriptions, and it is expected to be available for ChatGPT Enterprise next week.
GPT-4o mini is said to be multimodal, similar to its larger counterpart (released in May), with image input capabilities currently enabled in the API. OpenAI states that in the future, GPT-4o mini will be able to understand images, text, and audio, and will also have the capability to generate images.
The model supports an input context of 128K tokens and has a knowledge cutoff date of October 2023. It is also quite affordable as an API product, priced at 60% less than GPT-3.5 Turbo, costing 15 cents per million input tokens and 60 cents per million output tokens. Tokens are segments of data that AI language models utilize to process information.
Importantly, OpenAI mentions that GPT-4o mini will be the company's first AI model to implement a new technique known as "instruction hierarchy," which will enable the AI model to prioritize certain instructions over others. This may make it more challenging for individuals to execute prompt injection attacks, jailbreaks, or system prompt extractions that undermine the built-in fine-tuning or directives provided by a system prompt.
Once the model is released to the public (GPT-4o mini is currently not available in our version of ChatGPT), it is likely that users will test this new protective mechanism.
Performance
As expected, OpenAI claims that GPT-4o mini excels in various benchmarks, such as MMLU (which assesses undergraduate-level knowledge) and HumanEval (focused on coding). However, the issue is that these benchmarks often lack significant value, and very few truly reflect the practical utility of the model. This is because the perceived quality of a model's output often relates more to its style and structure rather than just its factual or mathematical accuracy. This subjective evaluation, often referred to as "vibemarking," is currently one of the more frustrating aspects of the AI landscape.
To provide some insight, OpenAI reports that the new model surpassed last year's GPT-4 Turbo on the LMSYS Chatbot Arena leaderboard, which evaluates user ratings by comparing the model against another randomly selected model. Nevertheless, this metric has proven less useful than previously anticipated within the AI community. Observers have noted that while GPT-4o (the larger version) consistently outperforms GPT-4 Turbo on the Chatbot Arena, it often yields significantly less useful outputs overall. For instance, its responses can be overly verbose or may address tasks that were not requested.
The value of smaller language models
OpenAI is not the first organization to introduce a smaller version of an existing language model; this practice is quite prevalent in the AI industry, with companies like Meta, Google, and Anthropic also following suit. These smaller language models are typically designed for simpler tasks at a reduced cost, such as creating lists, summarizing information, or suggesting words, rather than engaging in complex analyses.
Generally, these smaller models target API users, who pay a fixed fee based on the number of tokens they input and output when utilizing these models in their applications. In this instance, offering GPT-4o mini for free as part of ChatGPT could also be a cost-saving measure for OpenAI.
Olivier Godement, OpenAI’s head of API product, mentioned to Bloomberg, "In our mission to enable cutting-edge advancements and to develop the most powerful and useful applications, we certainly aim to continue creating frontier models that push boundaries. However, we also strive to offer the best small models available."
Smaller large language models (LLMs) typically have fewer parameters compared to their larger counterparts. Parameters are numerical values within a neural network that retain learned information. A model with fewer parameters has a smaller neural network, which usually restricts its ability to understand context deeply. Models with a higher number of parameters tend to be "deeper thinkers," benefiting from a greater number of connections between concepts represented by those numerical values.
However, it is important to note that there is not always a straightforward relationship between the size of parameters and the model's capabilities. Factors such as the quality of training data, the effectiveness of the model architecture, and the training methodology itself can significantly influence a model's performance. This has been demonstrated by the recent success of smaller models like Microsoft Phi-3.
Having fewer parameters results in fewer calculations needed to operate the model, which means that either less powerful (and less costly) GPUs can be used, or existing hardware can perform fewer calculations. This leads to reduced energy costs and a lower overall expense for the end user.