Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124


Just hours after OpenAI updated its flagship foundation model GPT-5 to GPT-5.1, which promises lower overall token usage and a more pleasant personality with more preset options, Chinese search giant Baidu introduced its next-generation foundation model, ERNIE 5.0, alongside a range of AI product upgrades and strategic international expansions.
The goal: to position itself as a global competitor in the increasingly competitive market for enterprise AI.
Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively omnimodal model designed to jointly process and generate content across text, images, audio and video.
Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under the business-friendly, permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model available only through Baidu’s ERNIE Bot website (I had to manually select it from the model selection dropdown) and, for enterprise customers, the Qianfan cloud platform application programming interface (API).
In parallel with the model launch, Baidu introduced major updates to its digital human platform, no-code tools and general-purpose AI agents – all with the aim of expanding its AI presence beyond China.
The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, alongside the general preview model that balances all modalities.
Baidu emphasized that ERNIE 5.0 represents a shift in the way intelligence is deployed at scale, with CEO Robin Li stating, “When you internalize AI, it becomes a native capability, transforming intelligence from a source of cost to a source of productivity.”
The ERNIE 5.0 benchmark results indicate that Baidu has achieved parity – or near parity – with leading Western foundation models across a broad range of tasks.
In public benchmark slides shared during the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding and image-based question answering, while also demonstrating strong language modeling and code execution skills.
The company emphasized the model’s ability to jointly handle inputs and outputs across modalities rather than relying on post-hoc modality fusion, which it described as a technical differentiator.
In visual tasks, ERNIE 5.0 achieved top scores on OCRBench, DocVQA and ChartQA, three benchmarks that test document recognition, understanding and reasoning in structured data.
Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro on these document- and chart-based benchmarks, which it describes as core to enterprise applications such as automated document processing and financial analysis.
In image generation, ERNIE 5.0 met or exceeded Google’s Veo3 in all categories, including semantic alignment and image quality, according to Baidu’s internal GenEval-based evaluation. Baidu claimed that the model’s multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models based on modality-specific encoders.
In audio and speech tasks, ERNIE 5.0 demonstrated competitive results on the MM-AU and TUT2017 audio comprehension benchmarks, as well as in answering questions from spoken language input. While audio performance is not emphasized as heavily as image or text performance, the results suggest a broad capability intended to support full-spectrum multimodal applications.
In language tasks, the model demonstrated strong results in following instructions, answering factual questions, and mathematical reasoning – core areas that define the business value of large language models.
The Preview 1022 variant of ERNIE 5.0, tailored for text performance, showed even stronger language-specific results in early developer access. While Baidu doesn’t claim superiority in general language comprehension, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top-tier English-language models and outperforms them in Chinese.
Although Baidu did not publicly release full benchmark details or raw results, its performance positioning suggests a conscious attempt to define ERNIE 5.0 not as a niche multimodal system but as a flagship model that competes with the largest closed models in the general reasoning space.
Baidu claims a clear lead in structured document understanding, visual diagram reasoning, and integration of multiple modalities in a single, native modeling architecture. Independent verification of these results is still pending, but the breadth of claimed capabilities makes ERNIE 5.0 a serious alternative in the multimodal foundation modeling landscape.
ERNIE 5.0 is positioned at the premium end of Baidu’s model pricing structure. The company has published specific pricing for API usage on its Qianfan platform, bringing costs in line with other top offerings from Chinese competitors such as Alibaba.
| Model | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) |
|---|---|---|
| ERNIE 5.0 | $0.00085 (¥0.006) | $0.0034 (¥0.024) |
| ERNIE 4.5 Turbo (ex.) | $0.00011 (¥0.0008) | $0.00045 (¥0.0032) |
| Qwen3 (Coder ex.) | $0.00085 (¥0.006) | $0.0034 (¥0.024) |
The cost difference between ERNIE 5.0 and previous models such as ERNIE 4.5 Turbo highlights Baidu’s strategy to differentiate between large-volume, low-cost models and high-performance models designed for complex tasks and multimodal thinking.
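To put a number on that gap, the per-1,000-token prices quoted for the two Baidu models can be converted to per-million figures and compared directly. The following is a minimal sketch: the dictionary and function names are illustrative, and the only inputs are the USD prices listed above.

```python
# USD prices per 1,000 tokens, copied from the Qianfan figures quoted above.
PRICES_PER_1K = {
    "ERNIE 5.0": {"input": 0.00085, "output": 0.0034},
    "ERNIE 4.5 Turbo": {"input": 0.00011, "output": 0.00045},
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1,000-token price to a per-1,000,000-token price."""
    return price_per_1k * 1000


def price_ratio(model_a: str, model_b: str, kind: str) -> float:
    """Return how many times more expensive model_a is than model_b."""
    return PRICES_PER_1K[model_a][kind] / PRICES_PER_1K[model_b][kind]


if __name__ == "__main__":
    for name, price in PRICES_PER_1K.items():
        print(
            f"{name}: ${per_million(price['input']):.2f} input / "
            f"${per_million(price['output']):.2f} output per 1M tokens"
        )
    for kind in ("input", "output"):
        ratio = price_ratio("ERNIE 5.0", "ERNIE 4.5 Turbo", kind)
        print(f"ERNIE 5.0 vs ERNIE 4.5 Turbo ({kind}): {ratio:.1f}x")
```

On these figures, ERNIE 5.0 costs roughly seven to eight times as much as ERNIE 4.5 Turbo for both input and output tokens.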
Compared to US alternatives, it remains mid-range in price:
| Model | Input (per 1 million tokens) | Output (per 1 million tokens) |
|---|---|---|
| GPT-5.1 | $1.25 | $10.00 |
| ERNIE 5.0 | $0.85 | $3.40 |
| ERNIE 4.5 Turbo (ex.) | $0.11 | $0.45 |
| Claude Opus 4.1 | $15.00 | $75.00 |
| Gemini 2.5 Pro | $1.25 (≤200,000) / $2.50 (>200,000) | $10.00 (≤200,000) / $15.00 (>200,000) |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 |
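For a sense of what these list prices mean in practice, the sketch below estimates the bill for a single workload on a few of the flat-rate models in the comparison. The workload size (50 million input tokens, 10 million output tokens) is a made-up example, the `Price` class and `workload_cost` function are illustrative names, and Gemini 2.5 Pro is left out because its tiered pricing needs extra logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1 million input tokens
    output_per_m: float  # USD per 1 million output tokens


# A subset of the flat-rate list prices from the comparison above.
PRICES = {
    "GPT-5.1": Price(1.25, 10.00),
    "ERNIE 5.0": Price(0.85, 3.40),
    "ERNIE 4.5 Turbo": Price(0.11, 0.45),
    "Grok 4": Price(3.00, 15.00),
}


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the given list price."""
    return (
        input_tokens * price.input_per_m + output_tokens * price.output_per_m
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens and 10M output tokens.
    input_tokens, output_tokens = 50_000_000, 10_000_000
    for name, price in sorted(
        PRICES.items(),
        key=lambda item: workload_cost(item[1], input_tokens, output_tokens),
    ):
        cost = workload_cost(price, input_tokens, output_tokens)
        print(f"{name:<16} ${cost:,.2f}")
```

Because output tokens are priced several times higher than input tokens on every model listed, the ranking can shift for workloads that generate much more text than they read.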
In parallel with the model release, Baidu is expanding internationally:
GenFlow 3.0, now with more than 20 million users, is the company’s largest general-purpose AI agent and offers enhanced memory and multimodal task processing.
Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available by invitation.
MeDo, the international version of Baidu’s no-code builder Miaoda, is available worldwide via medo.dev.
Oreate, a productivity workspace with document, slide, image, video and podcast support, has reached over 1.2 million users worldwide.
Baidu’s digital human platform, which has already been launched in Brazil, is also part of the global push. According to company data, 83% of livestreamers during this year’s Double 11 shopping event in China used Baidu’s digital human technology, contributing to a 91% increase in GMV.
Baidu’s Apollo Go autonomous ride service has now surpassed 17 million rides, operating driverless fleets in 22 cities and securing the title of the world’s largest robotaxi network.
Two days before the flagship ERNIE 5.0 event, Baidu also released a multimodal open source model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.
As reported by my colleague Michael Nuñez at VentureBeat, the model activates only 3 billion parameters out of a total of 28 billion, using a Mixture of Experts (MoE) architecture for efficient inference.
The most important technical innovations include:
“Thinking with images,” which enables dynamic, zoom-based visual analysis
Support for diagram interpretation, document understanding, visual grounding and temporal perception in video
The ability to run on a single 80GB GPU, making it accessible to medium-sized businesses
Full compatibility with Transformers, vLLM, and Baidu’s FastDeploy toolkits
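The single-GPU claim is plausible on a back-of-the-envelope basis. In a Mixture of Experts model, all 28 billion parameters must be held in memory even though only about 3 billion are active for any given token. The sketch below assumes 16-bit weights (2 bytes per parameter), which is an assumption rather than something stated here, and it ignores the KV cache, activations and framework overhead, so real headroom will be smaller.

```python
TOTAL_PARAMS = 28e9    # total parameters, per the article
ACTIVE_PARAMS = 3e9    # parameters activated per token, per the article
BYTES_PER_PARAM = 2    # assumption: bf16/fp16 weights
GPU_MEMORY_GB = 80     # nominal capacity of a single 80GB GPU


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


if __name__ == "__main__":
    weights_gb = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
    print(f"Weights: {weights_gb:.0f} GB of {GPU_MEMORY_GB} GB")
    print(f"Headroom before cache and activations: {GPU_MEMORY_GB - weights_gb:.0f} GB")
    print(f"Active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Under these assumptions the weights take about 56 GB, leaving roughly 24 GB for everything else, while only around a tenth of the parameters do work on each token.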
This release increases pressure on closed-source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable base model for commercial applications without license restrictions – something that few high-performance models in this class offer.
Following the launch of ERNIE 5.0, developer and AI evaluator Lisan al Gaib (@scaling01) posted a mixed review on X. Although they were initially impressed with the model’s benchmark performance, they reported a persistent issue where ERNIE 5.0 repeatedly invoked tools during SVG generation tasks – even when not specifically instructed to do so.
“The ERNIE 5.0 benchmarks looked crazy until I tested them…unfortunately RL is brain damaged or they have a serious problem with their chat platform/system prompt,” Lisan wrote.
Within a few hours, Baidu’s developer-focused support account, @ErnieforDevs, replied:
“Thanks for the feedback! It’s a known bug – certain syntax can repeatedly trigger it. We’re working on a fix. You can try rewording or changing the prompt to avoid it for now.”
The quick turnaround reflects Baidu’s increasing focus on developer communications, particularly as the company courts international users through both proprietary and open source offerings.
Baidu’s ERNIE 5.0 marks a strategic escalation in the global race for foundation models. With performance claims that put it on par with the most advanced systems from OpenAI and Google, and a mix of premium pricing and open access alternatives, Baidu signals its ambition to become not just a domestic AI leader, but a credible global infrastructure provider.
At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing, and deployment efficiency, Baidu’s two-pronged approach – hosted premium APIs and open source releases – can broaden its appeal to both enterprise and developer communities.
It remains to be seen whether the company’s performance claims will stand up to third-party tests. But in a landscape characterized by rising costs, model complexity and computational bottlenecks, ERNIE 5.0 and the supporting ecosystem give Baidu a competitive position in the next wave of AI deployment.