Very proud of us @xai after seeing the GPT5 release. With a much smaller team, we are ahead in many areas. Grok4 is the world's first unified model, and it is crushing GPT5 in benchmarks like ARC-AGI.
@OpenAI is a very respectful competitor and still the leader in many, but we're fast and
Kudos to our crew: "we've got a truly marvelous group of people, which this margin is too narrow to tag them all"
fun side note: happy to witness two launches in a week. Photo taken at the @SpaceX launch site, Starbase.
timelapse #83 (22 hrs):
- it was very easy to dive super deep into anything i needed to (this is what i focused on today because not all days are like this)
- finding the grok code fast 1 + grok 4 for deep thinking and verification combo to be super useful in cursor. speed was
Three components of Reasoning for AI:
1. Foundation (Pre-training)
2. Self-improvement (RL)
3. Test-time compute (planning).
@xai will soon have the best foundation in the world - Grok3. Join us to advance reasoning to the next level! 🔥🔥
Retire BrowseComp and use FinSearchComp!
- Align with real financial experts' everyday work, where score means value!
- Grok 4 is unbelievable 🔥 even approaching human experts! GPT5 is also super cool, slightly behind Grok 4.
- Search engine/capability is the new infra to AGI,
Coming to #NeurIPS23 now. Will be there until Friday night.
DM me to chat about: reasoning, AI for math, and what we're doing @xai.
Also will be at #MATHAI workshop panel discussion on Friday morning. See you there!
Euclidean geometry problems have been my favorite math puzzles since middle school. The most intriguing part of it is the creation of auxiliary lines, which opens a space for imagination and the freedom to explore various diagrams. Once a proof is found, these auxiliary lines
Boris, check out our mini model numbers: it surpassed o3-mini (high) on all of AIME 2024, GPQA, and LCB at pass@1.
Generally I also don't think our current benchmarks capture enough of the model intelligence. Our big Grok3 is worse on pass@1, but in our testing we can feel a smarter
Grok 4 is now free for all users worldwide!
Simply use Auto mode, and Grok will route complex queries to Grok 4. Prefer control? Choose "Expert" anytime to always use Grok 4.
For a limited time, we are rolling out generous usage limits so you can explore Grok 4's full
Updates to ChatGPT:
You can now choose between "Auto", "Fast", and "Thinking" for GPT-5. Most users will want Auto, but the additional control will be useful for some people.
Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking
It seems we're doing very well on code gen - #1 on your benchmark. But much worse on code completion. Since the chat interface is not suitable for such tasks, we did not target this for the chat model. But we will release the API model that handles that well. Stay tuned!