Deeptech Press Release

DeepSeek Unveils Self-Verifying AI Math Reasoning Model


DeepSeekMath-V2 Achieves Gold-Medal IMO Performance With Built-In Proof Checker

Chinese AI startup DeepSeek has released DeepSeekMath-V2, an open-weights mathematical reasoning model built around self-verification: it generates proofs and checks their logical soundness as part of the same system. The 685B-parameter mixture-of-experts model, built on DeepSeek-V3.2-Exp-Base, is available under Apache 2.0 on Hugging Face and GitHub, and is aimed at making AI theorem proving more reliable.

Verifier-First Architecture Overcomes Final-Answer Limitations

Unlike models trained solely on correct numeric outcomes, DeepSeekMath-V2 prioritises proof quality through a dual system: a “prover” generates step-by-step arguments, while a “verifier” scores them for rigor (0, 0.5, or 1) and explains its judgment in a natural-language analysis. Reinforcement learning rewards agreement between the prover’s self-assessment and the verifier’s judgment, and sequential refinement lets the model iterate on fixes within a 128K-token context for complex problems.
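The prover-verifier loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepSeek's implementation: the names `agreement_reward`, `refine`, `toy_prover`, and `toy_verifier` are hypothetical, the reward formula is a simple stand-in for whatever the real training objective is, and the toy prover and verifier are stubs that replace actual model calls.

```python
from typing import Callable, Tuple

# The three rigor scores named in the article.
ALLOWED_SCORES = (0.0, 0.5, 1.0)


def agreement_reward(self_score: float, verifier_score: float) -> float:
    """Hypothetical reward: highest when the prover's self-assessment
    matches the verifier's score, shrinking as the two diverge."""
    if self_score not in ALLOWED_SCORES or verifier_score not in ALLOWED_SCORES:
        raise ValueError("scores must be 0, 0.5, or 1")
    return 1.0 - abs(self_score - verifier_score)


def refine(
    problem: str,
    prover: Callable[[str, str], str],
    verifier: Callable[[str, str], Tuple[float, str]],
    max_rounds: int = 4,
) -> Tuple[str, float, int]:
    """Sequential refinement: feed the verifier's analysis back to the
    prover until the proof earns a full score or rounds run out."""
    proof, score, feedback, rounds = "", 0.0, "", 0
    for rounds in range(1, max_rounds + 1):
        proof = prover(problem, feedback)
        score, feedback = verifier(problem, proof)
        if score == 1.0:
            break
    return proof, score, rounds


# Stub models (assumptions for illustration only).
def toy_prover(problem: str, feedback: str) -> str:
    # Adds the missing justification only once the verifier asks for it.
    return "claim; justified" if feedback else "claim"


def toy_verifier(problem: str, proof: str) -> Tuple[float, str]:
    # Full marks only if the proof contains an explicit justification.
    if "justified" in proof:
        return 1.0, ""
    return 0.5, "step 1 is asserted without justification"
```

Calling `refine("toy problem", toy_prover, toy_verifier)` runs the stub prover twice: the first draft is scored 0.5, and the verifier's analysis prompts a revised proof that scores 1. In a real system, the refinement budget would be bounded by the context window rather than a fixed round count.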

This addresses two key flaws of final-answer training: algebraic errors can cancel out to yield the right answer, and a proof can reach a correct conclusion without a rigorous derivation. Trained initially on more than 17,500 olympiad-style proofs from Art of Problem Solving, the system then scales verification compute to label harder cases autonomously.
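A toy example shows why grading only the final answer is not enough. The sketch below is an assumption-laden illustration, not the model's verifier: it represents each step of a derivation as a linear equation `a*x + b = c`, and the helper names `solve`, `final_answer_correct`, and `first_invalid_step` are hypothetical. The sample derivation contains two mistakes that cancel out, so the final answer is right even though the reasoning is not.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

# A step (a, b, c) stands for the linear equation a*x + b = c, with a != 0.
Equation = Tuple[int, int, int]


def solve(eq: Equation) -> Fraction:
    """Return the unique solution x of a*x + b = c."""
    a, b, c = eq
    return Fraction(c - b, a)


def final_answer_correct(steps: List[Equation], expected: Fraction) -> bool:
    """Final-answer grading: look only at the last line."""
    return solve(steps[-1]) == expected


def first_invalid_step(steps: List[Equation]) -> Optional[int]:
    """Step-level grading: return the index of the first step that is not
    equivalent to the step before it, or None if every step is valid."""
    for i in range(1, len(steps)):
        if solve(steps[i]) != solve(steps[i - 1]):
            return i
    return None


derivation: List[Equation] = [
    (2, 6, 10),  # 2x + 6 = 10  (the problem; true solution x = 2)
    (2, 0, 16),  # 2x = 16      (error: added 6 instead of subtracting it)
    (1, 0, 2),   # x = 2        (second error: divided by 8 instead of 2)
]
```

Here `final_answer_correct(derivation, Fraction(2))` accepts the derivation, while `first_invalid_step(derivation)` flags step 1. A model rewarded only on the final line would learn nothing from the two errors.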

Competition-Dominating Benchmarks

With scaled test-time compute, DeepSeekMath-V2 achieves gold-medal-level scores on the 2025 International Mathematical Olympiad (IMO) problems and the 2024 Chinese Mathematical Olympiad (CMO), outperforming prior open models. It scores 118/120 on the 2024 Putnam exam, surpassing the top human score (90/120), and outscores Google DeepMind’s Gemini Deep Think on IMO-ProofBench.

These results place it alongside closed models from OpenAI and Google DeepMind. Although it was not formally entered in the competition, it is the first open-weight system to reach IMO gold-medal level.

Implications For Research And Open AI

Deeper mathematical reasoning of this kind could support work on open problems in fields such as cryptography and space exploration. DeepSeek highlights self-verification as a path to trustworthy mathematical AI. The release also comes as China’s open models account for a 17% share of global downloads, according to a recent MIT and Hugging Face analysis. Future iterations could automate proof discovery at scale.


© Startup Story Private Limited. All Rights Reserved.