Anastasios Nikolas Angelopoulos

2,646 posts

@ml_angelopoulos

Measuring intelligence @arena. Statistics, model evaluation. Formerly @Berkeley_EECS, @StanfordEng, student researcher @GoogleDeepMind.

angelopoulos.ai

Joined September 2019

Pinned
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Jun 4
Agent Arena gives every model access to a Claude-Code-like harness and a computer. Our users went nuts, generating millions of real traces per week. We used this data to build the first large-scale benchmark of agent usefulness in the wild. We analyze agents by collecting many
Arena.ai
@arena
Jun 4
Introducing Agent Arena: real-world agentic evals at scale. How do you evaluate agents doing actual work? We measure millions of live sessions where real users accomplish real tasks. On Arena, models now get web search, filesystem, and terminal tools to complete complex
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Sep 7, 2022
📢Huge update to Gentle Introduction to Conformal Prediction📢 arxiv.org/abs/2107.07511 Notebooks for EVERY example, easy-2-run WITHOUT model/data download. Open+run in Colab!✅ New repo here: github.com/aangelopoulos/… New sections on time-series and risk control!✅ More in 🧵
[Image: Updated table of contents of the Gentle Introduction, including conformal risk control, conformal prediction under distribution drift, 5 worked examples of conformal prediction, and full conformal prediction.]
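The basic procedure behind the worked examples is split conformal prediction: score a held-out calibration set, take a finite-sample-corrected quantile of the scores, and widen each new prediction by that amount. Below is a minimal sketch, assuming a regression model whose predictions are already computed and absolute-residual scores |y − f(x)|; the function name `split_conformal_interval` and the synthetic data are hypothetical and not taken from the book's notebooks.

```python
import math
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    """Split conformal intervals for regression with |y - f(x)| scores (illustrative)."""
    cal_preds = np.asarray(cal_preds, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=float)
    test_preds = np.asarray(test_preds, dtype=float)
    n = len(cal_labels)
    # Conformity scores on the held-out calibration set, sorted ascending
    scores = np.sort(np.abs(cal_labels - cal_preds))
    # Rank of the finite-sample-corrected quantile: ceil((n + 1)(1 - alpha))
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points for this alpha, the interval is unbounded
    qhat = scores[k - 1] if k <= n else np.inf
    return test_preds - qhat, test_preds + qhat

# Usage on synthetic data with a hypothetical "model" f(x) = 2x
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2000)
y = 2 * x + rng.normal(0, 0.1, 2000)
preds = 2 * x
lo, hi = split_conformal_interval(preds[:1000], y[:1000], preds[1000:], alpha=0.1)
coverage = np.mean((y[1000:] >= lo) & (y[1000:] <= hi))
print(f"empirical coverage: {coverage:.3f}")
```

Under exchangeability of calibration and test points, the marginal coverage guarantee is at least 1 − alpha, so the printed empirical coverage should land near 0.9.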
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Nov 19, 2024
🚨 New Textbook on Conformal Prediction 🚨 arxiv.org/abs/2411.11824 “The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Jan 25, 2023
📯Prediction-Powered Inference📯 arxiv.org/abs/2301.09633 With the rise of AlphaFold etc., people are using ML predictions to replace costly experimental data. But predictions aren't perfect; can we still use them for rigorous downstream inferences? The answer: yes. A 🧵
[Image: Left: a picture of a phosphorylated protein. Middle-right: confidence intervals. The prediction-powered confidence interval is correct, while the imputed one is too small and the classical one is too big.]
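For the simplest estimand, a mean, the prediction-powered idea can be sketched in a few lines: average the model's predictions over a large unlabeled set, then subtract the average error ("rectifier") of the predictions measured on a small labeled set, and build a normal-approximation interval from both samples. This is a minimal sketch of the mean case only, assuming independent labeled and unlabeled samples from the same distribution; `ppi_mean_ci` and the biased synthetic "model" are hypothetical illustrations, not the paper's code, and the paper covers more general estimands.

```python
from statistics import NormalDist
import numpy as np

def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.05):
    """Prediction-powered point estimate and CLT-based CI for a mean (illustrative)."""
    y = np.asarray(y_labeled, dtype=float)
    f_lab = np.asarray(yhat_labeled, dtype=float)
    f_unl = np.asarray(yhat_unlabeled, dtype=float)
    n, N = len(y), len(f_unl)
    # Rectifier: how far the predictions are from the truth on labeled data
    rectifier = f_lab - y
    # Imputed mean on unlabeled data, corrected by the average rectifier
    theta_pp = f_unl.mean() - rectifier.mean()
    # Standard error combines both independent samples
    se = np.sqrt(f_unl.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_pp, (theta_pp - z * se, theta_pp + z * se)

# Synthetic example: predictions are systematically biased upward by 0.5
rng = np.random.default_rng(0)
y_all = rng.normal(1.0, 1.0, 50_500)
f_all = y_all + 0.5 + rng.normal(0.0, 0.3, 50_500)
n_lab = 500  # only the first 500 points are treated as labeled
theta_pp, (lo, hi) = ppi_mean_ci(y_all[:n_lab], f_all[:n_lab], f_all[n_lab:])
imputed = f_all[n_lab:].mean()  # naive estimate that trusts the predictions blindly
print(f"PPI: {theta_pp:.3f} in [{lo:.3f}, {hi:.3f}]; imputed: {imputed:.3f}")
```

In this setup the imputed estimate inherits the 0.5 bias, while the rectifier removes it. The interval stays narrow because most of the variance comes from the large unlabeled set and the rectifier has small spread.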
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Nov 9, 2023
Prediction-powered inference was published today as a research article in Science! @ScienceMagazine science.org/doi/full/10.11… Check it out - and if ur interested in collaborating, learning about PPI, or ML for science more broadly, plz reach out! Also see Berkeley News
science.org
Prediction-powered inference
A statistical protocol for valid scientific discovery using machine learning is presented.
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Sep 20, 2025
xAI’s Grok 4 fast is the #1 small model in LMArena. Grok’s ascent to the top of the Arena:
– Grok 2 (August 2024): debuted at #3 on the overall leaderboard
– Grok 3 (March 2025): debuted at #1 on the overall leaderboard
– Grok 4 fast (Sept 2025) → #1 in search and the #1
Arena.ai
@arena
Sep 19, 2025
🚨 Leaderboard Disrupted! Grok-4-fast by @xai has arrived in the Arena, and it’s shaking things up! ⚡️ 🏆 #1 on the Search Leaderboard Tested under the codename “menlo,” Grok-4-fast-search just rocketed to the top spot with the community. 💠 Tied for #8 on the Text Leaderboard
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Aug 1, 2023
🤖Conformal PID Control for Time-Series Prediction🕹 📝w/Candès & Tibshirani! arxiv.org/abs/2307.16895 Is conformal prediction out of control🤪? Yes! CP can be thought of as PID control, giving stronger guarantees+algorithms that predict distribution shifts as they happen! A🧵
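The proportional part of this control view corresponds to online quantile tracking: after each step, raise the score threshold if the interval missed and lower it slightly if it covered. Below is a minimal sketch of that single step, assuming nonnegative conformity scores that arrive one at a time. The function name, learning rate, and synthetic shift are illustrative assumptions, and the paper's full method adds further terms that are not shown here.

```python
import numpy as np

def quantile_tracking(scores, alpha=0.1, lr=0.1, q0=0.0):
    """Online quantile tracking: q_{t+1} = q_t + lr * (err_t - alpha) (illustrative)."""
    q = q0
    qs, errs = [], []
    for s in scores:
        qs.append(q)
        # Miscoverage indicator: the interval with threshold q fails to cover
        err = 1.0 if s > q else 0.0
        errs.append(err)
        # Raise q after a miss, lower it slightly after a hit
        q += lr * (err - alpha)
    return np.array(qs), np.array(errs)

# Scores whose scale jumps fivefold halfway through (a simple distribution shift)
rng = np.random.default_rng(0)
scores = np.concatenate([rng.uniform(0, 1, 5000), rng.uniform(0, 5, 5000)])
qs, errs = quantile_tracking(scores, alpha=0.1, lr=0.1)
print(f"long-run miscoverage: {errs.mean():.4f}")
```

Why this works: if the scores lie in [0, B] and q starts in that range, the update keeps q in a bounded interval. Summing the updates therefore shows that the running miscoverage differs from alpha by at most (B + lr) / (lr · T), with no distributional assumption. Here that bound is (5 + 0.1) / (0.1 · 10000) ≈ 0.005.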
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Nov 9, 2025
Have you ever considered that the model made by @MistralAI (a French company) might somehow be better for French users? This is why it's important to have independent scientific benchmarks from real-world usage... sometimes reality can surprise you and defy intuition.
Lisan al Gaib
@scaling01
Nov 8, 2025
The french government created an LLM leaderboard akin to lmarena, but rigged it so that Mistral Medium 3.1 would be at the top Mistral 3.1 Medium > Claude 4.5 Sonnet or Gemma3-4B and a bunch of Mistral models > GPT-5 ??????????????????? LMAO
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Oct 26, 2025
It's Zareen's, man! And yes, we love it
SemiAnalysis
@SemiAnalysis_
Oct 26, 2025
Zareen is one of the go to places for many SF Bay Area AI researchers to get a quick bite. Most of the food is very good and was even on the Michelin guide in 2020. AI researchers not experienced with the Indian cuisine will commonly order their chicken tikka masala with garlic
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Mar 1, 2022
This is a huge moment for conformal prediction. @sirbayes has included a section of his ML book on conformal prediction, based on our Gentle Introduction (w/@stats_stephen + including some of our figures). This is becoming mainstream. The whole community should be proud.
Kevin Patrick Murphy
@sirbayes
Feb 28, 2022
I am delighted to announce that a draft of my latest book, “Probabilistic Machine Learning: Advanced Topics”, is now available online at probml.ai. It covers #DeepGenerativeModels, #BayesianInference, #Causality, #ReinforcementLearning, #DistributionShift, etc.
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Jul 13, 2023
🥹🥹🥹 thank u ☺️
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Aug 26, 2025
Here's a summary of today's big news: unpeeling the story of a tiny banana. 🤏 🍌 Gemini-2.5-Flash-Image-Preview by @GoogleDeepMind is an image editing model that has taken the world by storm. It allows conditioning on both text and images, so it can follow textual instructions
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Aug 7, 2025
Millions of people have used GPT-5 under the codename summit on LMArena over the past couple weeks 🏔️ The people have spoken: GPT-5 is #1 on EVERYTHING in LMArena. 🧮 Math 💻 Coding 🖋️ Creative writing Check out an example of its multifaceted intelligence in the 🧵
Arena.ai
@arena
Aug 7, 2025
GPT-5 is here - and it’s #1 across the board. 🥇#1 in Text, WebDev, and Vision Arena 🥇#1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to @OpenAI on this
Anastasios Nikolas Angelopoulos
@ml_angelopoulos
Mar 31, 2021
We built an eye tracker using an event-based camera that operates at 10 kHz and above in a low-power, small form factor. It has an accuracy of 0.5° in the central FOV. See the oral tomorrow at #IEEEVR21, the journal paper in TVCG, and the website: computationalimaging.org/publications/e… #vr #ar #ai 🧵1/n

URL: https://x.com/ml_angelopoulos
