Researchers Debut Preview of DeepCoder Reasoning Model

A new open-source code reasoning model called DeepCoder-14B-Preview has been released. Built atop a DeepSeek-R1-distilled Qwen2.5 model and trained with reinforcement learning (RL), it combines high-performance code generation with reasoning capabilities for real-world applications. Its performance is said to be comparable to OpenAI’s o3-mini, “but with a smaller footprint,” say its developers, the research-driven AI companies Together AI and Agentica. “We democratize the recipe for training a small model into a strong competitive coder,” explains Together AI.

“Prior work in the math domain has shown that reinforcement learning with verifiable rewards can significantly enhance a model’s reasoning capabilities,” Together AI writes in a blog post. “Unlike math — where abundant high-quality, verifiable data is readily available on the Internet — the coding domain suffers from a relative scarcity of such data.”

The model was trained on 24K verifiable coding problems over 2.5 weeks on 32 Nvidia H100s, producing results “reaching — and even surpassing — OpenAI’s o3-mini on various coding benchmarks,” claims the development team.
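In this setting, “verifiable” means a reward that can be checked mechanically, typically by executing a candidate solution against unit tests and granting credit only when every test passes. The sketch below is a simplified illustration of such a sparse, test-based reward, not the project’s actual harness; the function name and the assert-style test format are assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile

def verifiable_code_reward(solution_code: str, tests: list[str], timeout_s: float = 6.0) -> float:
    """Sparse, verifiable reward: 1.0 only if every provided test passes, else 0.0.
    Illustrative only; not the exact reward used to train DeepCoder."""
    program = solution_code + "\n\n" + "\n".join(tests)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # Run the candidate plus its asserts in a separate process so crashes
        # and non-terminating solutions cannot stall the training loop.
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

# Hypothetical usage: tests are plain assert statements checking the generated function.
solution = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(verifiable_code_reward(solution, tests))  # prints 1.0 if all asserts pass
```

Executing candidates in a sandboxed subprocess with a timeout matters at this scale, since each training iteration can involve more than a thousand test runs.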

During training, Together AI developed “a technique called ‘one-off pipelining’ that reportedly cuts training time in half,” writes The Decoder, explaining that “the process runs training, reward calculation, and sampling in parallel, with each training iteration requiring over 1,000 separate tests.”
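The idea being described is to overlap the stages of the RL loop rather than run them back to back: while the trainer updates on one scored batch, the sampler is already generating the next one and the verifier is running its tests. The producer-consumer sketch below is a generic illustration of that overlap, not the project’s actual implementation; all of the stand-in functions are hypothetical.

```python
import queue
import threading

# Toy stand-ins for the real components; each is an assumption for illustration.
def sample_batch(step):      # generate rollouts for one RL step
    return {"step": step, "rollouts": [f"solution_{step}_{i}" for i in range(4)]}

def compute_rewards(batch):  # e.g. run the unit-test reward over each rollout
    return {**batch, "rewards": [1.0 for _ in batch["rollouts"]]}

def train_on(batch):         # one gradient update on the scored batch
    print(f"trained on step {batch['step']}")

NUM_STEPS = 5
to_score = queue.Queue(maxsize=2)   # bounded queues provide backpressure
to_train = queue.Queue(maxsize=2)

def sampler():
    for step in range(NUM_STEPS):
        to_score.put(sample_batch(step))   # keep generating while other stages work
    to_score.put(None)

def scorer():
    while (batch := to_score.get()) is not None:
        to_train.put(compute_rewards(batch))
    to_train.put(None)

threading.Thread(target=sampler, daemon=True).start()
threading.Thread(target=scorer, daemon=True).start()

# Trainer consumes batch N while batch N+1 is still being sampled and scored.
while (batch := to_train.get()) is not None:
    train_on(batch)
```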

VentureBeat notes that DeepCoder-14B performs “strongly across several challenging coding benchmarks,” including Codeforces, HumanEval+ and LiveCodeBench (LCB).

The most impactful aspect of DeepCoder is “achieving this level of performance with only 14 billion parameters,” making it “significantly smaller and potentially more efficient to run than many frontier models,” VentureBeat points out.

In practical terms, the achievement of DeepCoder means cutting-edge AI performance “is no longer solely the domain of hyperscalers or those willing to pay premium API fees,” VB writes, explaining that “models like DeepCoder can empower organizations of all sizes to leverage sophisticated code generation and reasoning, customize solutions to their specific needs, and securely deploy them within their environments.”

The artifacts for training and running DeepCoder-14B are available on GitHub and Hugging Face.
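For readers who want to try the released checkpoint, a minimal loading sketch with the Hugging Face transformers library is shown below. The repository id is assumed from the release and should be verified against the model card; note that a 14B-parameter model in 16-bit precision needs roughly 30 GB of GPU memory (or quantization) to run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"  # assumed repo id; check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```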
