URL: https://x.com/_philschmid/status/1842846050320544016

Philipp Schmid on X: "Can @AnthropicAI Claude 3.5 sonnet outperform @OpenAI o1 in reasoning? Combining Dynamic Chain of Thoughts, reflection, and verbal reinforcement, existing LLMs like Claude 3.5 Sonnet can be prompted to increase test-time compute and match reasoning strong models like OpenAI o1. https://t.co/vzdgszizx1" / X


Can @AnthropicAI Claude 3.5 Sonnet outperform @OpenAI o1 in reasoning? Combining Dynamic Chain of Thoughts, reflection, and verbal reinforcement, existing LLMs like Claude 3.5 Sonnet can be prompted to increase test-time compute and match strong reasoning models like OpenAI o1.

Dynamic Chain of Thoughts + reflection + verbal reinforcement prompting
📊 Benchmarked against tough academic tests (JEE Advanced, UPSC, IMO, Putnam)
🏆 Claude 3.5 Sonnet outperformed GPT-4 and matched o1 models
🔍 LLMs can create internal simulations and take 50+ reasoning steps for complex problems
📚 Works for smaller, open models like Llama 3.1 8B: +10% (Llama 3.1 8B 33/48 vs GPT-4o 36/48)
❌ Didn't run benchmarks like MMLU, MMLU-Pro, or GPQA due to computing and budget constraints
📈 High token usage: Claude 3.5 Sonnet used around 1 million tokens for just 7 questions
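To put the quoted numbers on a common scale, here is a small sketch that converts the 33/48 and 36/48 scores into percentages and estimates the average token cost per question. It uses only the figures from the post; the variable names are made up for illustration, and the per-question cost assumes the roughly 1 million tokens were spread evenly over the 7 questions, which the post does not state.

```python
# Figures quoted in the post; everything below is derived from them.
TOTAL_QUESTIONS = 48
scores = {"Llama 3.1 8B": 33, "GPT-4o": 36}


def accuracy(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * correct / total


accuracies = {name: accuracy(c, TOTAL_QUESTIONS) for name, c in scores.items()}

# "around 1 million tokens for just 7 questions" (approximate in the post).
APPROX_TOKENS = 1_000_000
QUESTIONS = 7
# Assumption: tokens are averaged evenly across questions.
tokens_per_question = APPROX_TOKENS / QUESTIONS

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}%")
print(f"~{tokens_per_question:,.0f} tokens per question")
```

On these numbers the prompted Llama 3.1 8B sits about six percentage points behind GPT-4o, and each question costs on the order of 140,000 tokens, which is the practical trade-off behind the "high token usage" caveat.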