
TreeQuest by Sakana AI: Deploy multi-model teams that outperform individual LLMs by 30%




Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a “dream team” of AI agents. The method, called Multi-LLM AB-MCTS, enables models to perform trial and error and to combine their unique strengths to solve problems that are too complex for any individual model.

For enterprises, this approach offers a way to build more robust and capable AI systems. Instead of being locked into a single provider or model, companies can dynamically leverage the best aspects of different frontier models, assigning the right AI to the right part of a task to achieve superior results.

The power of collective intelligence

Frontier AI models are evolving rapidly. However, each model has distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding while another shines at creative writing. Sakana AI’s researchers argue that these differences are not a bug, but a feature.

“We see these biases and varied aptitudes not as limitations, but as valuable resources for creating collective intelligence,” the researchers write in their blog post. They believe that, just as humanity’s greatest achievements stem from diverse teams working together, AI systems can also achieve more through collaboration. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking longer at inference time

Sakana AI’s new algorithm is an “inference-time scaling” technique (also referred to as “test-time scaling”), an area of research that has become very popular over the past year. While most of the focus in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model has been trained.

One common approach is to use reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, in which the model is given the same prompt multiple times to generate a variety of potential solutions, similar to a brainstorming session. Sakana AI’s work combines and advances both ideas.
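To make the repeated-sampling baseline concrete, here is a minimal best-of-N sketch: the same prompt is sent to a model several times and the highest-scoring candidate is kept. The `call_llm` and `score` functions are hypothetical placeholders for an LLM API call and a task-specific evaluator; they are not part of any Sakana AI code.

```python
import random

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a single LLM API call."""
    return f"candidate answer {random.randint(0, 9999)}"

def score(answer: str) -> float:
    """Hypothetical evaluator, e.g. unit tests or a verifier model (higher is better)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> tuple[str, float]:
    """Repeated sampling: draw n independent answers, keep the best-scoring one."""
    best_answer, best_score = "", float("-inf")
    for _ in range(n):
        answer = call_llm(prompt)
        s = score(answer)
        if s > best_score:
            best_answer, best_score = answer, s
    return best_answer, best_score

print(best_of_n("Solve this puzzle ...", n=8))
```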

“Our framework offers a smarter, more strategic version of Best-of-N (a.k.a. repeated sampling),” Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. “It complements reasoning techniques such as long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls and delivers better results on complex tasks.”

How adaptive branching search works

At the core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to perform trial and error effectively by intelligently balancing two different search strategies: “searching deeper” and “searching wider.” Searching deeper means taking a promising answer and refining it repeatedly, while searching wider means generating completely new solutions from scratch. AB-MCTS combines both approaches, allowing the system to improve a good idea but also to pivot and try something new when it hits a dead end or discovers another promising direction.

To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind’s AlphaGo. At each step, AB-MCTS uses probability models to decide whether it is more strategic to refine an existing solution or to generate a new one.
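As a loose illustration of that choice, the sketch below keeps candidate solutions in a small tree and, at each step, either adds a brand-new solution under the root (“wider”) or refines the best existing one (“deeper”). The node structure, the hypothetical `generate` call, and the fixed prior for new solutions are illustrative assumptions; the actual AB-MCTS replaces them with probability models fitted during the search.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A candidate solution in the search tree (illustrative, not Sakana AI's code)."""
    answer: str
    score: float
    children: List["Node"] = field(default_factory=list)

def generate(prompt: str, parent: Optional[Node] = None) -> Node:
    """Hypothetical LLM call: refine the parent if given, otherwise answer from scratch."""
    text = f"refined({parent.answer})" if parent else f"fresh answer {random.randint(0, 99)}"
    return Node(answer=text, score=random.random())

def ab_mcts_step(root: Node, prompt: str) -> None:
    """One simplified step: weigh 'wider' against 'deeper', then act on the better option."""
    value_wider = 0.5  # crude prior for how good a brand-new solution might be
    best_child = max(root.children, key=lambda n: n.score, default=None)
    value_deeper = best_child.score if best_child else float("-inf")

    if value_wider >= value_deeper:
        root.children.append(generate(prompt))                           # search wider
    else:
        best_child.children.append(generate(prompt, parent=best_child))  # search deeper

root = Node(answer="<root>", score=0.0)
for _ in range(10):
    ab_mcts_step(root, "Solve this puzzle ...")
```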

Different test-time scaling strategies (source: Sakana AI)

The researchers went one step further with multi-LLM-AB-MCTs, which not only decides “what” (generate refinement vs.), but also “which” LLM should do so. At the beginning of a task, the system does not know which model is best suited for the problem. It begins to try out a balanced mix of available LLMs and to learn in the course of progress which models are more effective and assign them more of the workload over time.
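This “learn which model to trust as you go” behaviour is closely related to a multi-armed bandit problem. The sketch below uses Thompson sampling over Beta posteriors to choose a model per call; it is an analogy only, not the paper’s exact probability model, and the model names and success rates are purely illustrative.

```python
import random

# Per-model Beta(successes + 1, failures + 1) posterior over "did this call help?"
stats = {"o4-mini": [1, 1], "gemini-2.5-pro": [1, 1], "deepseek-r1": [1, 1]}

def pick_model() -> str:
    """Thompson sampling: draw once from each model's posterior, pick the largest draw."""
    draws = {m: random.betavariate(a, b) for m, (a, b) in stats.items()}
    return max(draws, key=draws.get)

def update(model: str, helped: bool) -> None:
    """Record whether the chosen model's contribution improved the search."""
    stats[model][0 if helped else 1] += 1

# Simulated outcomes: models that help more often end up being chosen more often.
true_rates = {"o4-mini": 0.6, "gemini-2.5-pro": 0.7, "deepseek-r1": 0.5}
for _ in range(200):
    m = pick_model()
    update(m, helped=random.random() < true_rates[m])
print(stats)
```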

Putting the AI dream team to the test

The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, which makes it notoriously difficult for AI.

The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.

The collective of models found correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model to a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

AB-MCTS vs. individual models (source: Sakana AI)

The team also observed cases in which the models solved problems that had previously been impossible for any single one of them. In one case, a solution generated by the o4-mini model was wrong. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which analyzed the error, corrected it, and ultimately produced the right answer.

“This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers write.

AB-MCTS can select different models at different stages of solving a problem (source: Sakana AI)

“In addition to each model’s individual pros and cons, the tendency to hallucinate can differ significantly among them,” Akiba said. “By building an ensemble with a model that is less prone to hallucination, it could be possible to get the best of both worlds: powerful logical capabilities and strong grounding. Since hallucination is a major issue in business contexts, this approach could be valuable for mitigating it.”

From research to real-world applications

To help developers and companies apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license (usable for commercial purposes). TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their own tasks with custom scoring and logic.
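As a rough sketch of what that might look like in practice, users would supply one generation function per model, each returning a new candidate state and a score, and let the search decide which function to call at every step. The names below (`ABMCTSA`, `init_tree`, `step`, `top_k`) are assumptions about TreeQuest’s interface recalled from its documentation and may not match the released API exactly; the LLM-calling and scoring helpers are hypothetical placeholders.

```python
import random
import treequest as tq  # assumed import name for the TreeQuest package

def evaluate(state: str) -> float:
    """Hypothetical task-specific scorer returning a value in [0, 1]."""
    return random.random()

def generate_with_o4_mini(parent_state=None):
    """User-supplied generator: produce a candidate (optionally refining parent_state)."""
    new_state = f"o4-mini answer refining [{parent_state}]"  # stand-in for a real LLM call
    return new_state, evaluate(new_state)

def generate_with_deepseek_r1(parent_state=None):
    new_state = f"deepseek-r1 answer refining [{parent_state}]"  # stand-in for a real LLM call
    return new_state, evaluate(new_state)

algo = tq.ABMCTSA()        # AB-MCTS search policy (assumed class name)
tree = algo.init_tree()
for _ in range(50):        # budget of 50 LLM calls
    tree = algo.step(tree, {
        "o4-mini": generate_with_o4_mini,
        "deepseek-r1": generate_with_deepseek_r1,
    })

best_state, best_score = tq.top_k(tree, algo, k=1)[0]
print(best_state, best_score)
```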

“While we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research shows significant potential in several areas,” Akiba said.

Beyond the ARC-AGI-2 benchmark, the team has successfully applied AB-MCTS to tasks such as complex algorithmic coding and improving the accuracy of machine learning models.

“AB-MCTS could also be highly effective for problems that require iterative trial and error, such as optimizing the performance metrics of existing software,” Akiba said. “For example, it could be used to automatically find ways to improve the response latency of a web service.”

The release of a practical open-source tool could pave the way for a new class of more powerful and more reliable enterprise AI applications.

