A comprehensive new study has found that open-source artificial intelligence models consume far more computing resources than their closed-source counterparts when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies.
The research, conducted by the AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens (the basic units of AI computation) than closed models such as those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.
Measuring reasoning efficiency in reasoning models: the missing benchmark https://t.co/b1e1rjx6vz

We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models for identical tasks, but with huge variance depending on the task (up to … pic.twitter.com/ly1083won8
– Nous Research (@noussresearch) August 14, 2025
“Open weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote in their report.
The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models generally cost less per token, the study suggests that this advantage “can be easily offset if they require more tokens to reason about a given problem.”
The research examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency” (how many computational units models use relative to the complexity of their solutions), a metric that has received little systematic study despite its significant cost implications.
“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open weight models may be cheaper, this cost advantage can be easily offset if they require more tokens to reason about a given problem.”
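The cost trade-off described above can be made concrete with a bit of arithmetic. The following is a minimal sketch, with entirely hypothetical prices and token counts (not figures from the study), showing how a lower per-token price can be eroded by higher token usage:

```python
# Hedged sketch: effective cost per query when per-token price and token
# usage differ. All prices and token counts below are hypothetical
# illustrations, not the study's actual figures.

def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Cost of one query = tokens consumed * unit price."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Hypothetical scenario: an open model costs 5x less per token but uses
# 4x more tokens on the same task (the upper end of the study's range).
closed_cost = cost_per_query(tokens_used=500, price_per_million_tokens=10.0)
open_cost = cost_per_query(tokens_used=2000, price_per_million_tokens=2.0)

print(f"closed: ${closed_cost:.4f}  open: ${open_cost:.4f}")
# The open model's 5x price advantage shrinks to 1.25x because it
# needed 4x the tokens; at 5x the tokens, the advantage vanishes.
```

With slightly different assumptions (say, a 10x token gap on simple knowledge questions), the open model ends up strictly more expensive per query, which is exactly the scenario the researchers warn about.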
The inefficiency is particularly pronounced in large reasoning models (LRMs), which use extended “chains of thought” to solve complex problems. These models, which reason through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.
For basic knowledge questions like “What is the capital of Australia?”, the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.
The research revealed stark differences between model providers. OpenAI's models, particularly its o4-mini and the newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.
Among open-source options, Nvidia's llama-3.3-nemotron-super-49b-v1 emerged as “the most token-efficient open weight model across all domains,” while newer models such as Mistral's Magistral showed “exceptionally high token usage” as outliers.
The efficiency gap varied considerably by task type. While open models used about twice as many tokens for mathematical and logical problems, the difference ballooned for simple knowledge questions, where efficient reasoning should be unnecessary.
The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements of real-world tasks.
“The better token efficiency of closed weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.
The study also revealed that closed-source model providers appear to be actively optimizing for efficiency. “Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”
The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.
To address this, the researchers used completion tokens (the total computing units billed for each query) as a proxy for reasoning effort. They found that “most recent models will not share their raw reasoning traces,” instead using smaller language models to transcribe the chain of thought into summaries or compressed representations.
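The proxy described above is straightforward to apply in practice: even when a provider hides the raw chain of thought, the billed completion-token count still reflects how much computation a query consumed. A minimal sketch, using made-up model names and numbers shaped like the `usage` field of an OpenAI-style API response (an assumption for illustration, not the study's actual harness):

```python
# Hedged sketch: comparing models by billed completion tokens when raw
# reasoning traces are hidden. Model names and token counts are invented;
# the dict shape mirrors a typical OpenAI-style `usage` field.

sample_responses = [
    {"model": "closed-model-a",
     "usage": {"prompt_tokens": 40, "completion_tokens": 180}},
    {"model": "open-model-b",
     "usage": {"prompt_tokens": 40, "completion_tokens": 720}},
]

def completion_tokens(response: dict) -> int:
    # Billed output tokens include any hidden reasoning tokens, so they
    # serve as a proxy even when the chain of thought is not exposed.
    return response["usage"]["completion_tokens"]

ratio = completion_tokens(sample_responses[1]) / completion_tokens(sample_responses[0])
print(f"open/closed token ratio: {ratio:.1f}x")
```

Run over identical prompts, a ratio like this is what produces the study's headline 1.5–4x figures.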
The study's methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in math competition problems from the American Invitational Mathematics Examination (AIME).
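The variable-altering idea above can be sketched in a few lines: rewrite a known problem with fresh constants so a model cannot fall back on a memorized answer. The template and number ranges here are invented for illustration and are not taken from AIME or the study's actual generator:

```python
# Hedged sketch: perturbing a known problem's constants to defeat
# memorization. The template and ranges are hypothetical examples.
import random

TEMPLATE = ("Find the sum of all positive integers n such that "
            "n^2 + {a}n + {b} has integer roots.")

def perturb(template: str, seed: int) -> str:
    # Seeded RNG makes each variant reproducible for re-evaluation.
    rng = random.Random(seed)
    return template.format(a=rng.randint(2, 50), b=rng.randint(2, 50))

variant_1 = perturb(TEMPLATE, seed=1)
variant_2 = perturb(TEMPLATE, seed=2)
# Same problem structure, different constants: a memorized answer to
# the original no longer applies, but reasoning effort is comparable.
```

Because every variant demands the same kind of reasoning, token counts across variants measure efficiency rather than recall.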
The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development. “A more densified CoT will also allow for more efficient context usage and may counter context degradation during challenging reasoning tasks,” they wrote.
The release of OpenAI's open-source gpt-oss models, which demonstrate state-of-the-art efficiency with a “freely accessible CoT,” could serve as a reference point for optimizing other open-source models.
The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the results. As the AI industry races toward more powerful reasoning capabilities, this study suggests that the real competition may not be about who can build the smartest AI, but who can build the most efficient one.
In a world where every token counts, the most wasteful models may price themselves out of the market, no matter how well they can think.