Last week we released s1: our simple recipe for sample-efficient reasoning & test-time scaling.
We're releasing 𝐬𝟏.𝟏, trained on the 𝐬𝐚𝐦𝐞 𝟏𝐊 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 but performing much better by using r1 traces instead of Gemini traces. 60% on AIME25 I.
Details in 🧵 1/9
DeepSeek r1 is exciting, but it lacks OpenAI's test-time scaling plot and needs lots of data.
We introduce s1, which reproduces o1-preview's scaling & performance with just 1K samples & a simple test-time intervention.
arxiv.org/abs/2501.19393
