Artificial Intelligence 13/02/2026 12 min read

MiniMax M2.5: the Chinese AI model that rivals Claude and GPT-5

A full analysis of MiniMax M2.5, the Chinese open-weight AI model. MoE architecture, SWE-Bench benchmarks, pricing 20x cheaper than Claude Opus 4.6, and use cases for agents and office productivity.

On February 11, 2026, Chinese startup MiniMax released M2.5, an open-weight language model that shakes up the ranking of the world's top-performing AI models. With a score of 80.2% on SWE-Bench Verified (the reference benchmark for software bug fixing), M2.5 lands just 0.6 points behind Anthropic's Claude Opus 4.6 and ahead of OpenAI's GPT-5.2, all at a cost roughly twenty times lower.

This release is part of an unprecedented wave of launches on the Chinese side: in the span of a single week, Zhipu AI (GLM-5), Moonshot AI (Kimi K2.5) and MiniMax all shipped frontier-class models. The competition between American and Chinese AI labs has never been so intense. Let's break down what M2.5 brings to the table and why this model deserves your attention.

MiniMax: from quiet startup to publicly listed giant

Before diving into the technical details of M2.5, it helps to understand where MiniMax comes from. Founded in late 2021 in Shanghai by Yan Junjie and Zhou Yucong, two former SenseTime engineers (the Chinese computer vision giant), the company is one of the six Chinese "AI tigers" identified by investors.

Yan Junjie, born in 1989 in Henan province, earned his PhD at the Institute of Automation of the Chinese Academy of Sciences before becoming SenseTime's youngest vice-president. Building on that experience, he co-founded MiniMax with the ambition of building a complete ecosystem of AI products.

In January 2026, MiniMax completed its IPO on the Hong Kong stock exchange (ticker 00100.HK), raising roughly 620 million dollars. The share price doubled on the first day of trading, pushing the market capitalization beyond 13 billion dollars. The company's investors include Alibaba, Tencent, MiHoYo (the publisher of Genshin Impact) and Hillhouse Capital.

Beyond language models, MiniMax develops an ecosystem of consumer products:

Hailuo AI: an AI video generator comparable to OpenAI's Sora (I analyzed one of its direct competitors in my article on ByteDance's Seedance 2.0)
Talkie: a conversational chatbot that reached 11 million monthly active users, more than half of them in the United States
MiniMax Audio: speech synthesis and music generation models

The company claims a presence in more than 200 countries, with over 70% of its revenue generated internationally — a figure that contrasts with the classic perception of Chinese tech companies focused on their domestic market.

Technical architecture: MoE and Lightning Attention

M2.5 is built on a Mixture of Experts (MoE) architecture of 230 billion total parameters, but with only 10 billion active parameters per token. That characteristic is what explains the model's exceptional performance-to-cost ratio.

The Mixture of Experts principle

In a classic dense model (like GPT-3), every input token passes through the entire network of parameters. With a MoE architecture, the model contains several specialized sub-networks (the "experts") and a routing mechanism that dynamically selects the relevant experts for each token. The result: you get the reasoning capacity of a massive model while only mobilizing a fraction of the compute on each inference.
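
To make the routing idea concrete, here is a minimal sketch of top-k expert selection for a single token. It is an illustration under my own assumptions, not MiniMax's implementation: the four toy experts, the router scores and the choice of `top_k=2` are all hypothetical, and a real router is a learned layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(token, experts, router_scores, top_k=2):
    """Weighted sum of the outputs of the selected experts only."""
    gates = route_token(router_scores, top_k)
    output = [0.0] * len(token)
    for idx, weight in gates.items():
        expert_out = experts[idx](token)  # the other experts are never evaluated
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, gates


# Toy experts (hypothetical): each one scales the input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
token = [0.5, -1.0]
router_scores = [0.1, 2.0, -1.0, 1.5]  # would come from a learned router in practice

output, gates = moe_layer(token, experts, router_scores, top_k=2)
print("experts used:", sorted(gates))
print("layer output:", output)
```

Only two of the four experts run for this token, which is the whole point: compute scales with the experts selected, not with the total parameter count. With 10 billion active parameters out of 230 billion, M2.5 touches roughly 4% of its weights per token.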

MiniMax uses the CISPO algorithm to stabilize the training of large-scale MoE models, a notoriously difficult problem that held back adoption of this architecture for years. DeepSeek, another Chinese lab I've covered in detail, also uses a MoE architecture for its most recent models.

Lightning Attention: the secret behind the context window

The key technical innovation inherited from MiniMax's previous models is the Lightning Attention mechanism. It is an optimized implementation of linear attention that drastically reduces computational complexity compared with the classic softmax attention of Transformers.

In practice, the architecture alternates 7 layers of linear (Lightning) attention for every 1 layer of classic softmax attention. This hybrid approach preserves the quality of classic attention for the most critical dependencies while delivering the efficiency of linear attention for the rest of the processing.
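
To visualize the 7-to-1 interleaving, here is a tiny sketch that lays out the layer types. The function name and the total of 16 layers are placeholders of mine, not M2.5's actual configuration.

```python
def hybrid_schedule(num_layers, linear_per_softmax=7):
    """Return layer types: `linear_per_softmax` linear layers, then one softmax layer."""
    block = linear_per_softmax + 1
    return ["softmax" if (i + 1) % block == 0 else "linear" for i in range(num_layers)]


schedule = hybrid_schedule(16)  # 16 is an arbitrary depth chosen for the example
for index, kind in enumerate(schedule):
    print(index, kind)
```

In this layout every eighth layer is a softmax layer, so the expensive quadratic attention is paid on only one layer in eight.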

The result is a context window of 205,000 tokens for M2.5, with the technical ability to extend well beyond that thanks to sequence parallelism. For reference, the previous MiniMax-M1 model natively supported a 1 million token context window using this same technology.

Lightning Attention in a nutshell: The classic attention of Transformers has quadratic complexity O(n²) relative to sequence length, which makes very long contexts extremely expensive. Linear attention reduces this complexity to O(n), allowing far longer sequences to be processed without an explosion in cost. The trade-off is a slight loss of precision on some long-distance dependencies, offset by the interleaved softmax layers.
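
The O(n²) versus O(n) difference is easier to see in code. The sketch below is generic, simplified linear attention (no softmax, no feature map, no normalization), not MiniMax's Lightning Attention kernel. It shows that once the softmax is dropped, the causal attention output can be computed either with a double loop over positions or with a single pass that carries a small running state.

```python
import random


def quadratic_causal_attention(Q, K, V):
    """out[t] = sum over s <= t of (q_t . k_s) * v_s, with a double loop: O(n^2)."""
    d_v = len(V[0])
    out = []
    for t in range(len(Q)):
        acc = [0.0] * d_v
        for s in range(t + 1):
            score = sum(q * k for q, k in zip(Q[t], K[s]))
            acc = [a + score * v for a, v in zip(acc, V[s])]
        out.append(acc)
    return out


def linear_causal_attention(Q, K, V):
    """Same result with a running d_k x d_v state (sum of k_s v_s^T): O(n)."""
    d_k, d_v = len(K[0]), len(V[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]  # fold the new key/value pair into the state
        out.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


random.seed(0)
n, d = 6, 3  # tiny sizes chosen for the example
Q = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
K = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
V = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

slow = quadratic_causal_attention(Q, K, V)
fast = linear_causal_attention(Q, K, V)
max_gap = max(abs(a - b) for row_a, row_b in zip(slow, fast) for a, b in zip(row_a, row_b))
print("max difference between the two forms:", max_gap)
```

The work per token in the second version depends only on the head dimensions, not on how many tokens came before. Softmax attention cannot be rewritten this way, which is why the hybrid design keeps a few softmax layers for precision.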

Two variants: Standard and Lightning

M2.5 comes in two versions that share the same weights and capabilities but differ in throughput:

M2.5 Standard: 50 tokens per second, optimized for cost
M2.5 Lightning: 100 tokens per second, optimized for speed

Both versions are open-weight under the MIT license, which permits commercial use as long as the copyright and license notice are retained. The weights are available on Hugging Face and the model is already integrated into Ollama for local execution.

Benchmarks: where M2.5 stands against the competition

Benchmarks should always be taken with caution (labs naturally optimize for the most publicized tests), but M2.5's results are consistent across several independent evaluations:

Coding and software engineering

SWE-Bench Verified: 80.2% (Claude Opus 4.6: 80.8% — GPT-5.2: 80.0%)
Multi-SWE-Bench: 51.3% (Claude Opus 4.6: 50.3%)
SWE-Bench Pro: 55.4%

On Multi-SWE-Bench, which evaluates bug fixing across complex multi-file codebases, M2.5 takes the lead over Claude Opus 4.6. That's a significant result, because this benchmark reflects the reality of professional software development better than SWE-Bench Verified does.

Agentic capabilities and web navigation

BrowseComp: 76.3% (web search and context synthesis)
BFCL Multi-Turn: 76.8% (multi-turn function calling)
MEWC: 74.4% (multi-expert workflow coordination)

These scores position M2.5 as a model particularly well suited to agentic workflows, where the model has to chain actions together, call tools and navigate complex environments.
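
To make "multi-turn function calling" concrete, here is a minimal sketch of the dispatch side of an agent loop. Everything in it is hypothetical: the JSON call format, the two tools and the hard-coded model turns are stand-ins I made up for illustration, not MiniMax's API or the BFCL test format.

```python
import json


def get_weather(city):
    """Hypothetical tool: a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


def add(a, b):
    """Hypothetical tool: adds two numbers."""
    return a + b


TOOLS = {"get_weather": get_weather, "add": add}


def run_tool_call(raw_call):
    """Parse one JSON tool call emitted by a model and execute the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        return {"name": name, "error": "unknown tool"}
    return {"name": name, "result": TOOLS[name](**args)}


# Stand-ins for what a model might emit over successive turns (assumed format).
model_turns = [
    '{"name": "add", "arguments": {"a": 2, "b": 3}}',
    '{"name": "get_weather", "arguments": {"city": "Shanghai"}}',
    '{"name": "send_email", "arguments": {}}',
]

history = [run_tool_call(turn) for turn in model_turns]
for entry in history:
    print(entry)
```

In a real agent, each result would be appended to the conversation and sent back to the model, which then decides on the next call. A multi-turn benchmark measures how reliably the model keeps emitting valid calls across that loop.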

Execution speed

On SWE-Bench Verified, M2.5 completes the evaluation 37% faster than its predecessor M2.1, with an average of 22.8 minutes per task — almost identical to Claude Opus 4.6's 22.9 minutes. The difference plays out on cost per task: roughly $0.15 for M2.5 versus about $3 for Claude Opus 4.6.
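
Here is the per-task arithmetic spelled out, using the approximate figures quoted above. The batch size of 1,000 tasks is an arbitrary number I picked to show how the gap scales.

```python
# Approximate cost per SWE-Bench Verified task, as quoted in this article.
COST_PER_TASK_USD = {"MiniMax M2.5": 0.15, "Claude Opus 4.6": 3.00}


def batch_cost(model, tasks):
    """Total cost in dollars for running `tasks` tasks with `model`."""
    return COST_PER_TASK_USD[model] * tasks


ratio = COST_PER_TASK_USD["Claude Opus 4.6"] / COST_PER_TASK_USD["MiniMax M2.5"]
print(f"cost ratio: about {ratio:.0f}x")

for model in COST_PER_TASK_USD:
    print(f"{model}: ${batch_cost(model, 1000):,.0f} for 1,000 tasks")
```

At equal time per task, the gap on a large batch is a matter of hundreds versus thousands of dollars.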

A model designed for office productivity

One of M2.5's distinctive traits compared with its competitors is its explicit specialization in office tasks. MiniMax trained the model to master the manipulation of Office documents:

Word: creating, editing, formatting and restructuring complex documents with table and style management
Excel: building formulas, analyzing data, generating pivot tables, financial modeling
PowerPoint: building presentations from specifications, adding charts and layouts

In MiniMax's internal evaluations on advanced office tasks, M2.5 achieved a 59% win rate in direct comparison with competing models. The MiniMax agent's "MAX" mode automatically loads the Office skills suited to the type of file being processed.

This office orientation is strategic: it's a massive enterprise use case that most competing models don't directly address. If you already use tools like Claude Code for development, imagine an equivalent for Office productivity at a fraction of the cost.

Pricing: the value for money that changes the game

This is probably the most disruptive aspect of M2.5. Here are the API rates:

M2.5 Standard: $0.15/million input tokens — $1.20/million output tokens
M2.5 Lightning: $0.30/million input tokens — $2.40/million output tokens

To put these figures in perspective: running M2.5 Lightning continuously for one hour at 100 tokens per second costs about 1 dollar. The Standard version at 50 tokens per second works out to $0.30 an hour. MiniMax calculated that you could run four instances of M2.5 continuously for a year for 10,000 dollars.
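
The hourly figures can be checked with a few lines of arithmetic. This sketch counts output tokens only, at the listed rates. It lands slightly below the rounded figures quoted above, presumably because those also account for input tokens (that is my assumption, not something MiniMax states here).

```python
# Throughput and output price per variant, as listed in this article.
VARIANTS = {
    "standard": {"tokens_per_second": 50, "usd_per_million_output": 1.20},
    "lightning": {"tokens_per_second": 100, "usd_per_million_output": 2.40},
}


def output_cost_per_hour(variant):
    """Cost in dollars of one hour of continuous generation, output tokens only."""
    spec = VARIANTS[variant]
    tokens = spec["tokens_per_second"] * 3600
    return tokens / 1_000_000 * spec["usd_per_million_output"]


def yearly_cost(hourly_usd, instances):
    """Cost of running `instances` instances around the clock for 365 days."""
    return hourly_usd * 24 * 365 * instances


for name in VARIANTS:
    print(f"{name}: ${output_cost_per_hour(name):.3f} per hour (output tokens only)")

# Using the rounded $0.30/hour figure for Standard quoted above:
print(f"four Standard instances for a year: ${yearly_cost(0.30, 4):,.0f}")
```

Four Standard instances at the quoted $0.30 an hour come to a little over 10,000 dollars a year, which is consistent with MiniMax's claim.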

By comparison, American frontier models like Claude Opus 4.6 charge between 15 and 75 dollars per million output tokens. The ratio is on the order of 1 to 20 in M2.5's favor. This is exactly the kind of cost disruption that propelled DeepSeek into the spotlight in early 2025.

Local hosting with Ollama

Since M2.5 is open-weight under the MIT license, it's possible to run it locally via Ollama. With only 10 billion active parameters per token, quantized versions of the model are accessible on reasonable hardware. The GGUF-format weights are available on Hugging Face.

# Install via Ollama (quantized versions)
ollama run minimax-m2.5

# Or directly from Hugging Face
huggingface-cli download MiniMaxAI/MiniMax-M2.5 \
  --local-dir MiniMax-M2.5

For companies that don't want to send their data to an external API, this local hosting option is a major competitive advantage over closed proprietary models.
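
Once the model is running under Ollama, it can be queried from code through Ollama's local HTTP API. The sketch below uses only the Python standard library and assumes Ollama's default address (`localhost:11434`) and the `minimax-m2.5` model tag shown above. The helper names are mine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model="minimax-m2.5"):
    """Build a non-streaming generation request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="minimax-m2.5", timeout=300):
    """Send the prompt and return the generated text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this changelog in two sentences: ...")` returns the model's answer as a string, and nothing leaves the machine. The generous timeout is a guess on my part: a large model on local hardware can take a while to produce its first tokens.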

Geopolitical context: the China vs United States AI race

The release of M2.5 comes against a tense geopolitical backdrop around artificial intelligence. American restrictions on the export of GPU chips to China (notably the NVIDIA H100) were supposed to slow down the progress of Chinese labs. The opposite appears to be happening: the constraints push Chinese researchers toward more efficient architectures.

The fact that MiniMax ran the reinforcement learning phase of its previous M1 model for roughly 535,000 dollars (512 H800 GPUs for three weeks) illustrates this trend of doing more with less. The MoE architecture with only 10 billion active parameters out of 230 billion is a perfect example of optimization under constraint.
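
Taking the figures quoted above at face value (512 H800 GPUs, three weeks, roughly 535,000 dollars), a quick back-of-the-envelope calculation gives the implied price per GPU-hour. The assumption that the GPUs ran around the clock for exactly 21 days is mine.

```python
GPUS = 512
DAYS = 21  # "three weeks", assumed to mean 21 full days of continuous use
TOTAL_COST_USD = 535_000

gpu_hours = GPUS * DAYS * 24
implied_rate = TOTAL_COST_USD / gpu_hours

print(f"GPU-hours: {gpu_hours:,}")
print(f"implied cost per GPU-hour: ${implied_rate:.2f}")
```

That works out to about 258,000 GPU-hours at roughly two dollars each, which is in line with cloud rental prices rather than the cost of owning a cluster.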

As of February 2026, China now fields several frontier-class models:

DeepSeek: open-source models (the general-purpose V3 and the R1 reasoning model)
MiniMax: M2.5 for coding and agents
Zhipu AI: GLM-5 for multimodal tasks
Moonshot AI: Kimi K2.5
Alibaba: Qwen 3 (235 billion parameters)

This proliferation of competitive, often open-weight models is accelerating the democratization of AI and putting considerable pressure on API prices. What vibe coding began to transform in development practices, affordable Chinese models could amplify massively.

Concrete use cases: who is M2.5 for?

Developers and engineering teams

With its SWE-Bench scores on par with Claude Opus 4.6, M2.5 is a serious candidate for automated development pipelines: code review, bug fixing, test generation. Its value for money makes it a particularly relevant choice for teams running development agents continuously.

Office automation in the enterprise

M2.5's Office specialization opens up use cases for teams that handle large volumes of documents: report generation, data transformation between formats, building presentations from raw data.

Autonomous agents and complex workflows

The high scores in BFCL (function calling) and BrowseComp (web navigation) make M2.5 an excellent engine for AI agents that need to interact with external tools. The article I devoted to OpenClaw shows how these autonomous agents are starting to transform professional workflows.

Startups and SMEs on a limited budget

For organizations that can't afford the rates of American frontier models, M2.5 offers a credible alternative. The MIT license permits all commercial uses without restriction, and local hosting eliminates dependency on a third-party API.

Limitations and points to watch

M2.5 isn't without weaknesses, and it would be dishonest not to mention them:

General reasoning: on pure reasoning benchmarks (GPQA, ARC-AGI), M2.5 still lags behind Claude Opus 4.6 and specialized reasoning models like DeepSeek R1
Training data: as with all Chinese models, transparency about training data remains limited
Ecosystem: integration into Western development tools (IDEs, CI/CD) is less mature than for OpenAI or Anthropic models
French language: performance in French is not specifically documented, even though the model is multilingual
Local hosting: despite the 10 billion active parameters, the full 230 billion model requires substantial hardware for performant local hosting
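
To put a number on that last point, here is a rough estimate of the memory needed just to hold the weights at different precisions. It ignores the KV cache, activations and runtime overhead, so real requirements are higher. The quantization levels listed are common choices, not a statement about which builds MiniMax publishes.

```python
TOTAL_PARAMS = 230e9  # total parameters, as stated in this article


def weight_memory_gb(bits_per_param, params=TOTAL_PARAMS):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: about {weight_memory_gb(bits):.0f} GB of weights")
```

Even at 4 bits per parameter the weights alone take more than 100 GB. The 10 billion active parameters reduce compute per token, not the amount of memory needed, because every expert has to be loaded and ready to be selected.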

FAQ

What exactly is MiniMax M2.5?

MiniMax M2.5 is an open-weight language model (LLM) developed by the Chinese startup MiniMax, based in Shanghai. It uses a Mixture of Experts (MoE) architecture of 230 billion parameters with 10 billion active parameters per token. It is released under the MIT license and available in two variants: Standard (50 tokens/s) and Lightning (100 tokens/s).

How does M2.5 compare to Claude Opus 4.6 and GPT-5?

On SWE-Bench Verified (software bug fixing), M2.5 scores 80.2%, very close to Claude Opus 4.6 (80.8%) and slightly ahead of GPT-5.2 (80.0%). On Multi-SWE-Bench (multi-file tasks), M2.5 leads with 51.3% versus 50.3% for Claude. The major difference is price: M2.5 costs roughly 20 times less than Claude Opus 4.6 for comparable coding performance.

Can M2.5 be used for free or locally?

Yes. M2.5 is released under the MIT license, which permits all uses, including commercial ones. The model weights are available on Hugging Face and the model is integrated into Ollama for local execution. Quantized versions in GGUF format exist to reduce memory requirements. The MiniMax API is also available at very low rates.

What is M2.5's context window?

M2.5 supports a context window of 205,000 tokens. That's enough for the vast majority of professional use cases. For larger context needs, the MiniMax-M1 model (the reasoning-oriented predecessor) natively supports 1 million tokens thanks to the Lightning Attention mechanism.

Is MiniMax M2.5 reliable for production use?

MiniMax is a company listed on the Hong Kong Stock Exchange with a capitalization of more than 13 billion dollars and top-tier investors (Alibaba, Tencent). The company claims a presence in more than 200 countries. That said, as with any AI model, thorough testing on your specific use case is essential before any critical deployment.

What is the Mixture of Experts architecture and why does it matter?

Mixture of Experts (MoE) is a neural network architecture in which the model contains several specialized sub-networks. For each input, only a subset of experts is activated. M2.5 activates only 10 billion parameters out of 230 billion, which considerably reduces inference cost while retaining the reasoning capacity of a massive model. It's the same approach used by DeepSeek and Qwen.

Conclusion: a strong signal for the industry

MiniMax M2.5 isn't just one more model in the list of Chinese LLMs. It's a signal that performance parity between American and Chinese models has now been reached on the most demanding tasks, such as software development and agentic workflows. And this parity comes with a cost advantage on the order of 10x to 20x.

For developers and businesses, the message is clear: the LLM market is commoditizing fast. High-performing open-weight models are multiplying, prices are falling, and value is shifting toward integration, orchestration and proprietary data. Whether you choose M2.5, Claude or GPT-5, the real competition now plays out on what you build on top of the model, not on the model itself.

MiniMax M2.5 is available right now via the MiniMax API, on Hugging Face and through Ollama. Pricing starts at $0.15 per million input tokens. It's hard to get more accessible for a model of this class.

Morgann Riu

Cybersecurity and Linux administration expert. I help companies secure and optimize their critical infrastructures.
