
Weibo’s new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 with a post-training budget of $7,800



Another day at the end of 2025, another impressive result from a Chinese company in the field of open source artificial intelligence.

Chinese social networking company Weibo’s AI division recently released its open source VibeThinker-1.5B – a 1.5-billion-parameter large language model (LLM) that is a fine-tuned variant of rival Chinese tech company Alibaba’s Qwen2.5-Math-1.5B.

It is now available for free download and use by researchers and corporate developers – including for commercial purposes – under a permissive MIT license on Hugging Face, GitHub, and ModelScope, with a technical report on the open-access science publishing site arxiv.org.

And yet, despite its compact size, VibeThinker-1.5B achieves benchmark reasoning performance on math and programming tasks, matching or outperforming models hundreds of times its size – even beating Chinese competitor DeepSeek’s famous R1, the 671-billion-parameter model that went viral earlier this year, on the AIME25 formal math reasoning benchmark.

It also beats Mistral AI’s Magistral Medium and holds its own against Anthropic’s Claude Opus 4 and OpenAI’s gpt-oss-20B Medium, while requiring only a fraction of the infrastructure and investment.

Notably, it achieves this with a post-training compute budget of just $7,800 (3,900 GPU-hours on Nvidia H800s) – far less than the tens or even hundreds of thousands of dollars typically required to fine-tune models of similar or larger scale.
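As a quick back-of-envelope check, the two figures quoted together imply a rental rate of about $2 per H800 GPU-hour; the arithmetic below uses only the numbers from the release:

```python
# Sanity check on the quoted post-training cost: $7,800 over
# 3,900 Nvidia H800 GPU-hours implies roughly $2 per GPU-hour.
gpu_hours = 3_900
budget_usd = 7_800
implied_rate = budget_usd / gpu_hours  # $/GPU-hour
print(f"Implied rate: ${implied_rate:.2f}/GPU-hour")  # → Implied rate: $2.00/GPU-hour
```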

However, note that this is not the total cost of the model’s development: LLMs are trained in stages. First comes pre-training, where the model learns basic language structure and general knowledge by predicting the next word across huge amounts of text from the internet, books, and articles. This gives it fluency in language, but not much sense of how to follow instructions or hold a conversation.

Next comes post-training, which uses much smaller, higher-quality datasets – typically collections of sample questions, prompts, and expert-written answers – to teach the model how to respond helpfully, reason through problems, and align with human expectations. Even so, the post-training cost efficiency Weibo achieved with VibeThinker-1.5B is remarkable.

The open-source release upends assumptions about parameter scale, computational intensity, and the minimum viable size for a high-performance LLM.

Another training approach: spectrum-to-signal

VibeThinker-1.5B owes its performance not to scaling, but to the training framework behind it: the spectrum-to-signal principle (SSP).

Instead of optimizing a model solely for the correctness of a single answer (Pass@1), the SSP framework decouples supervised fine-tuning (SFT) and reinforcement learning (RL) into two distinct phases with different goals:

  • SFT (“spectrum phase”): The model is trained to maximize the variety of potentially correct answers and thus improve its Pass@K score. This creates a wide range of plausible solutions.

  • RL (“signal phase”): A second-stage reinforcement learning system (called MaxEnt-Guided Policy Optimization or MGPO) is used to identify and reinforce the most correct paths from this diverse solution pool. MGPO prioritizes problems where the model is most uncertain and uses entropy-based weighting to focus learning.
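Pass@K, the metric the spectrum phase targets, is commonly computed with the unbiased combinatorial estimator popularized by OpenAI’s HumanEval work; a minimal sketch of that standard formula (the VibeThinker paper may use a different estimator):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, solves the problem."""
    if n - c < k:
        return 1.0  # too few incorrect generations to fill a k-draw
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 2 generations, 1 correct: a single draw passes half the time.
print(pass_at_k(2, 1, 1))  # → 0.5
```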

The authors argue that this separation allows small models to explore the reasoning space more effectively, achieving signal amplification without relying on massive parameter counts.
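The paper’s exact MGPO objective is not reproduced here, but the entropy-guided weighting idea it describes can be sketched as follows: problems where the model’s empirical pass rate sits near 50% (maximum uncertainty) receive the largest training weight, while problems it always or never solves contribute little.

```python
import math

def pass_rate_entropy(p: float) -> float:
    """Binary entropy of a problem's empirical pass rate p; maximal at
    p = 0.5, where the model is most uncertain about the problem."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def mgpo_weight(p: float) -> float:
    """Illustrative entropy-based training weight, normalized so that
    maximum uncertainty (p = 0.5) maps to weight 1.0."""
    return pass_rate_entropy(p) / math.log(2.0)

# A problem solved half the time dominates the update; near-certain
# problems (solved ~never or ~always) are heavily down-weighted.
for p in (0.05, 0.5, 0.95):
    print(p, round(mgpo_weight(p), 3))
```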

VibeThinker-1.5B makes a compelling case that the industry’s reliance on parameter scaling as the only path to better reasoning performance may be outdated.

By introducing a diversity-first training pipeline, WeiboAI has shown that smaller, more accessible models can rival and even outperform billion-dollar systems on logic-heavy tasks.

Low resource consumption is one of the most important aspects of VibeThinker-1.5B. At under $8,000, its post-training cost is 30 to 60 times lower than that of models like DeepSeek R1 and MiniMax-M1, whose post-training reportedly cost between $294,000 and $535,000.

Performance across domains

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning performance that outperforms many larger open-source and commercial models:

| Model | AIME25 | LiveCodeBench v6 | GPQA Diamond |
| --- | --- | --- | --- |
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |

VibeThinker was compared against both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). On structured reasoning benchmarks, the model consistently beat the non-reasoning models, regardless of their size:

  • On AIME24 (Math) it beat Kimi K2 (1.09T) by over 10 points (80.3 vs. 69.6).

  • On LiveCodeBench v6 it outperformed Claude Opus 4 (51.1 vs. 47.4).

  • On GPQA, its score trailed GPT-4.1 and Claude, but still nearly tripled that of its base model (from 16.4 to 46.7).

This supports the authors’ claim that size is not the only route to reasoning ability – with the right training design, smaller models can match or even exceed the performance of much larger systems on targeted tasks.

Notably, it achieves parity with models hundreds of times larger in math and code, although it lags behind in general knowledge reasoning (GPQA), where larger models maintain an edge.

This suggests a possible trade-off in specialization: while VibeThinker excels at structured logic tasks, it has less capacity for long-range encyclopedic retrieval, a known limitation of smaller architectures.

Enterprise deployment guidance

The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, maximum tokens = 40960).
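Expressed as keyword arguments for Hugging Face transformers’ `generate()`, those recommended settings would look like this; the repo id in the usage comment is an assumption, so check the model card for the published path:

```python
# Recommended decoding settings from the release, as generate() kwargs.
# do_sample=True is implied by the temperature/top_p sampling settings.
GEN_SETTINGS = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "max_new_tokens": 40960,
}

# Usage sketch (downloads the weights; repo id is assumed, not confirmed):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-1.5B")
#   model = AutoModelForCausalLM.from_pretrained("WeiboAI/VibeThinker-1.5B")
#   out = model.generate(**tok("2+2=?", return_tensors="pt"), **GEN_SETTINGS)
```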

The model is small enough to be deployed on edge devices, including mobile phones and in-vehicle embedded systems, while inference costs are estimated to be 20 to 70 times lower than with large models.

This positions VibeThinker-1.5B not just as a research achievement, but as a potential basis for cost-effective, locally deployable reasoning systems.

Weibo’s strategy and market position

Launched in 2009 by Sina Corporation, Weibo remains a cornerstone of China’s social media ecosystem. The platform is often described as China’s version of X (formerly Twitter).

Despite its 600 million monthly active users (more than twice as many as X), investors are not optimistic about near-term growth in advertising revenue, and Weibo faces increasing competition from video-first platforms like Douyin, which are drawing younger users and a growing share of time spent.

In response, Weibo has focused on creator economy monetization, live streaming and vertical video, and added tools for influencer engagement, e-commerce integration and deeper analytics for brands.

The platform’s role as a digital public square also makes it a focus of regulatory scrutiny. Chinese authorities continue to apply pressure on issues ranging from content moderation to data security. In September 2025, Weibo was among the platforms named in official warnings, indicating that it remains exposed to political risk.

Weibo’s push into AI research and development – exemplified by the release of VibeThinker-1.5B – signals a change in ambition. Beyond being a media platform, Weibo is positioning itself as a player in the next phase of China’s AI development, leveraging its capital reserves, user behavior data, and internal research capabilities to pursue adjacent technical areas.

What it means for technical decision makers in companies

For technical leaders and enterprise AI teams, the release of VibeThinker has practical implications for everything from orchestration pipelines to cost modeling.

A 1.5B-parameter model that outperforms models 100 times its size on math and programming tasks not only saves compute – it shifts the architectural calculus. It enables LLM inference on constrained infrastructure, reducing latency at the edge and lowering the barrier to entry for applications that would otherwise require API access to closed frontier-scale models.

This is important for enterprise ML leaders looking to deploy reasoning agents in existing systems, or for platform owners tasked with integrating LLMs into automated workflows.

It also appeals to those running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization in hybrid cloud environments.

The model’s post-training methodology – particularly its entropy-focused reinforcement learning approach – provides a roadmap for teams that want to refine smaller control points rather than relying on large-scale pre-training.

VibeThinker’s benchmark transparency and data-decontamination steps also address another emerging priority in enterprise AI: auditability. While its performance on general-knowledge tests still lags behind large frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where correctness matters more than coverage.

In short, VibeThinker-1.5B is not just a research milestone – it is a strong candidate for practical use, deployment and learning in enterprises. This suggests that a new class of compact, reasoning-optimized models is suitable for enterprise use cases previously reserved for much larger systems. For companies looking to balance cost, latency, interpretability and control, this is a good new option in the long, growing list of Chinese open source offerings.
