Voozh

Can @AnthropicAI Claude 3.5 sonnet outperform @OpenAI o1 in reasoning? Combining Dynamic Chain of Thoughts, reflection, and verbal reinforcement, existing LLMs like Claude 3.5 Sonnet can be prompted to increase test-time compute and match reasoning strong models like OpenAI o1. namic Chain of thoughts + reflection + verbal reinforcement prompting 📊 Benchmarked against tough academic tests (JEE Advanced, UPSC, IMO, Putnam) 🏆 Claude 3.5 Sonnet outperformes GPT-4 and matched O1 models 🔍 LLMs can create internal simulations and take 50+ reasoning steps for complex problems 📚 Works for smaller, open models like Llama 3.1 8B +10% (Llama 3.1 8B 33/48 vs GPT-4o 36/48) ❌ Didn’t benchmark like MMLU, MMLU pro, or GPQA due to computing and budget constraints 📈 High token usage - Claude Sonnet 3.5 used around 1 million tokens for just 7 questions

👁 Image

8:35 AM · Oct 6, 20241.5MViews

URL: https://x.com/_philschmid/status/1842846050320544016

Post

Post