LongCat-2.0: A 1.6T Coding Model Trained on Chinese Chips
Meituan open-sourced LongCat-2.0 on June 30 — a 1.6-trillion-parameter mixture-of-experts coding model trained and served end to end on more than 50,000 Chinese-made chips, with no Nvidia hardware in the loop. After two months quietly topping OpenRouter under the alias “Owl Alpha,” the MIT-licensed model claims 59.5% on SWE-bench Pro at a fraction of GPT-5.5’s price.
Chinese food-delivery and local-services giant Meituan open-sourced LongCat-2.0 on June 30, a 1.6-trillion-parameter mixture-of-experts (MoE) model built for agentic software engineering — and it arrives with an unusual backstory. For roughly two months it had been quietly running near the top of OpenRouter under the codename “Owl Alpha,” reportedly serving on the order of 10 trillion tokens a month and growing more than 240% month over month, before Meituan revealed the model behind the alias. It is not an outlier: Chinese models already account for a majority of OpenRouter traffic.
The headline number is the size, but the real story is the silicon. Meituan says LongCat-2.0 was trained and served end to end on a cluster of more than 50,000 domestic Chinese accelerators — no Nvidia H100s, no AMD MI300X — making it, by the company’s account, the first trillion-parameter model both trained and deployed entirely on Chinese-made chips. That claim matters because Washington’s export controls have choked off China’s access to Nvidia’s most powerful GPUs; a homegrown, trillion-parameter run is a concrete proof point that Chinese labs can reach frontier scale without them, exactly the outcome Beijing’s national AI-grid plan is designed to force.
Under the hood, LongCat-2.0 is a sparse MoE that fires only about 48 billion of its 1.6 trillion parameters per token — dynamically ranging from roughly 33 to 56 billion — keeping inference cost closer to a mid-size model than the headline count implies. It ships with a 1-million-token context window and a pair of efficiency tricks Meituan details in its technical report: “LongCat Sparse Attention,” which tames the quadratic cost of long context, and “zero-compute experts,” which route trivial tokens through cheap subnetworks while reserving heavier expert capacity for the hard ones. The whole design is tuned for tool use, self-correction and multi-step execution rather than open-ended chat.
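To put the sparsity in perspective, the quoted figures can be turned into a share of the full model. The snippet below is a back-of-the-envelope sketch using only the parameter counts cited above; the constant and function names are illustrative and do not come from Meituan's report.

```python
# Parameter counts quoted in the article, in billions.
TOTAL_PARAMS_B = 1600.0    # 1.6 trillion total parameters
ACTIVE_TYPICAL_B = 48.0    # "about 48 billion" active per token
ACTIVE_MIN_B = 33.0        # low end of the quoted dynamic range
ACTIVE_MAX_B = 56.0        # high end of the quoted dynamic range


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Return the share of total parameters activated for a single token."""
    return active_b / total_b


if __name__ == "__main__":
    for label, active in [
        ("typical", ACTIVE_TYPICAL_B),
        ("low end", ACTIVE_MIN_B),
        ("high end", ACTIVE_MAX_B),
    ]:
        print(f"{label}: {active_fraction(active):.1%} of parameters active")
```

On these numbers, a typical token touches about 3% of the model, with the quoted range running from roughly 2% to 3.5%. This measures per-token compute only; all 1.6 trillion parameters still have to be stored and served.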
On Meituan’s own numbers, LongCat-2.0 scores 59.5% on SWE-bench Pro, a hair above OpenAI’s GPT-5.5 at 58.6%, a 0.9-point gap that is well inside the margin of error. It also posts 70.8% on Terminal-Bench 2.1, 77.3% on SWE-bench Multilingual and 79.9% on the BrowseComp web-research test. Those figures are self-reported and not yet independently verified; the stealth OpenRouter run is arguably the more convincing evidence that developers already find it useful. Either way, it lands in an increasingly crowded field of open Chinese coding models, following Zhipu’s GLM-5.2 and Moonshot’s Kimi K2.7-Code in June, all of them chasing the coding crown that Anthropic’s Claude Fable 5 and OpenAI’s GPT-5.6 currently hold.
Pricing is the sharpest weapon. Meituan lists promotional rates starting at $0.30 per million input tokens and $1.20 per million output tokens — roughly $0.75 and $2.95 at standard rates — against GPT-5.5’s $5 and $30. The “open” label still carries an asterisk: the model is MIT-licensed and Meituan has stood up Hugging Face and GitHub pages, but at launch the downloadable weights were marked “coming soon,” with access flowing through OpenRouter and Meituan’s own LongCat platform. If and when the full weights land, LongCat-2.0 becomes something Western teams can run and fine-tune themselves — the piece that would make it more than a cheap API endpoint.
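The price gap is easier to judge against a concrete workload. The sketch below applies the per-million-token rates quoted above to a hypothetical job of 10 million input tokens and 2 million output tokens; the workload size and all names in the code are assumptions chosen for illustration, and the "standard" LongCat rates are the approximate figures given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in US dollars per million tokens."""
    input_per_m: float
    output_per_m: float


# Rates as quoted in the article.
RATES = {
    "LongCat-2.0 (promotional)": Rate(0.30, 1.20),
    "LongCat-2.0 (standard, approx.)": Rate(0.75, 2.95),
    "GPT-5.5": Rate(5.00, 30.00),
}


def job_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a job with the given token counts."""
    return (input_tokens / 1_000_000) * rate.input_per_m + (
        output_tokens / 1_000_000
    ) * rate.output_per_m


if __name__ == "__main__":
    # Hypothetical workload, not taken from the article.
    input_tokens, output_tokens = 10_000_000, 2_000_000
    baseline = job_cost(RATES["GPT-5.5"], input_tokens, output_tokens)
    for name, rate in RATES.items():
        cost = job_cost(rate, input_tokens, output_tokens)
        print(f"{name}: ${cost:,.2f} ({baseline / cost:.1f}x cheaper than GPT-5.5)")
```

For this assumed mix, the job costs $5.40 at promotional rates and $13.40 at standard rates, against $110.00 on GPT-5.5. The ratio shifts with the input-to-output balance, since the output-price gap (25x at promotional rates) is wider than the input-price gap (about 17x).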