Xingyao Wang

495 posts

@xingyaow_

Co-founder @OpenHandsDev | PhD candidate @IllinoisCDS | BS @UMichCSE ('22) | Ex Intern @GoogleAI @Microsoft | Opinions are my own

Joined April 2019

Xingyao Wang
@xingyaow_
Feb 5, 2024
Large Language Model (LLM) agents promise to free us from mundane tasks, but how should they best interact with our world? Introducing CodeAct, an agent {framework, instruction-tuning dataset, model} that employs executable Python code to unify the actions of LLM agents. 🧵1/
Xingyao Wang
@xingyaow_
Jan 5, 2025
I often get asked this question: Why is o1 not so good on OpenHands, but their official report shows a decent SWE-bench number? 🤔 🧵
Alejandro Cuadron
@Alex_Cuadron
Jan 5, 2025
Surprising find: OpenAI's O1 - reasoning-high only hit 30% on SWE-Bench Verified - far below their 48.9% claim. Even more interesting: Claude achieves 53% in the same framework. Something's off with O1's "enhanced reasoning"... 🧵1/8
Xingyao Wang
@xingyaow_
Dec 26, 2024
People have been asking how well DeepSeek v3 performs when using native function calling. Answer: performance dropped to 8.33% on SWE-Bench Lite, from 23%. Notably, the percentages of empty patches & stuck-in-loop runs increase a lot (this often happens with OSS models!). Examples in 🧵
Xingyao Wang
@xingyaow_
Dec 26, 2024
DeepSeek v3 seems exceptionally capable with its $0.14/$0.28 per 1M tokens pricing 🤑 as an OpenHands agent
Xingyao Wang
@xingyaow_
May 7, 2024
Introducing OpenDevin CodeAct 1.0 - a new state-of-the-art open coding agent! It achieves a 21% unassisted resolve rate on SWE-Bench Lite, a 17% relative improvement over the previous SOTA by SWE-Agent. Check out our blog or the thread 🧵 for more details: xwang.dev/blog/2024/open…
Xingyao Wang
@xingyaow_
Feb 2, 2025
o3-mini on SWE-Bench Verified using OpenHands: 43.7%, at a cost of $314 (we ran four runs and took the average, following the official system card). TLDR: It is slightly cheaper than Sonnet and performs slightly worse. Why can't we get the official 61% number? (speculations in 🧵)
Xingyao Wang
@xingyaow_
Jul 25, 2024
Software is a powerful tool, enabling human developers to interact with the world in complex & profound ways. What if we could use software as a tool to create similar versatile AI agents? Meet OpenDevin: an open platform for AI software developers as generalist agents. 🧵 1/
Xingyao Wang
@xingyaow_
May 18, 2023
Can pretrained language models (LMs) go beyond learning from labels and scalar rewards? Introducing LeTI, a new LM finetuning paradigm that explores LMs' potential to learn from textual interactions & feedback, allowing LMs to understand not just if they were wrong, but why. 🧵1/
Xingyao Wang
@xingyaow_
Mar 25, 2025
DeepSeek V3 0324 got 38.8% on SWE-Bench Verified w/ OpenHands. Best open-source model so far 👀
Xingyao Wang
@xingyaow_
Jan 5, 2025
Replying to @xingyaow_
I have a theory: the amount of information provided in the context differs significantly between these two types of agent scaffolds, and this causes reasoning models like o1 to perform differently. Reasoning models are trained to THINK hard, e.g., by solving extremely
Xingyao Wang
@xingyaow_
Sep 21, 2023
We often interact with Large Language Models (LLMs) like ChatGPT in multi-turn dialogues, yet we predominantly evaluate them with single-turn benchmarks. Bridging this gap, we introduce MINT, a new benchmark tailored for LLMs' multi-turn interactions. 🧵
Xingyao Wang
@xingyaow_
Sep 5, 2024
Excited to share that @allhands_ai has raised $5M -- and it's finally time to announce a new chapter in my life: I'm taking a leave from my PhD to focus full-time on All Hands AI. Let's push open-source agents forward together, in the open!
OpenHands
@OpenHandsDev
Sep 5, 2024
We are proud to announce that All Hands has raised $5M to build the world’s best software development agents, and do it in the open 🙌 all-hands.dev Thank you to @MenloVentures and our wonderful slate of investors for believing in the mission!
Xingyao Wang
@xingyaow_
May 1, 2024
I finally managed to integrate (most of) CodeAct into OpenDevin 🥳. Now, it can work end-to-end on model training (well - very simple linear regression 😉). It is somewhat buggy, but I'm excited that we may have a fully open-sourced AI software engineer/data scientist in the near
Xingyao Wang
@xingyaow_
May 21, 2025
The real "wow" moment for me with Devstral: I asked it to build a todo list app — and instead of jumping straight in, it asked me how I wanted to build it, listing actual options. After so many one-sided decisions from Sonnet 3.7, being asked felt... emotional 😭
Mistral AI
@MistralAI
May 21, 2025
Meet Devstral, our SOTA open model designed specifically for coding agents and developed with @allhands_ai mistral.ai/news/devstral

URL: https://x.com/xingyaow_