OpenAI

1,911 posts

@OpenAI

OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: openai.com/jobs

Joined December 2015

Pinned
OpenAI
@OpenAI
Jun 3
It's time to fly.
OpenAI
@OpenAI
1h
Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, LifeSciBench includes 750 expert-authored tasks across seven biological research
OpenAI
@OpenAI
1h
Benchmarks often test biological knowledge or narrow skills. The tasks in LifeSciBench test whether models can reason from evidence, work with scientific artifacts, handle uncertainty, and make useful decisions under real-world constraints. GPT‑Rosalind scores above GPT‑5.5
OpenAI
@OpenAI
1h
LifeSciBench is a foundation for more realistic evaluation, targeted improvements, and continued partnership with the life sciences community—helping the field measure progress, identify gaps, and improve AI together for the benefit of everyone.
OpenAI
@OpenAI
4h
GPT-5.4 helped drive a medicinal chemistry project from literature review to a validated experimental result. Paired with Molecule.one’s Maria AI and specialized lab, the model proposed an unexpected way to improve a widely used reaction in drug discovery.
OpenAI
@OpenAI
4h
Replying to @OpenAI
Maria tested the idea across 10,080 reactions, and human chemists later validated representative results by hand. Under the optimized conditions, yields improved for 88% of the boronic acids and 83% of the sulfonamides tested. Human chemists then repeated 14 representative
OpenAI
@OpenAI
4h
The full process took about 2.5 months, plus another half month for human chemists to write up the results. This is an early example of frontier models supporting more of the scientific research loop: reviewing studies, proposing hypotheses, designing experiments, interpreting
OpenAI
@OpenAI
Jun 16
We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate model responses.
Predicting model behavior before release by simulating deployment
From openai.com
OpenAI
@OpenAI
Jun 16
Replying to @OpenAI
Simulated deployments also reduced evaluation awareness to levels close to real production traffic. We extended the method to agentic deployments with stateful tools, showing that tool simulators can produce realistic trajectories when given sufficient context and capabilities.
OpenAI
@OpenAI
Jun 16
Deployment Simulation works best with representative production data, which external evaluators often can’t access. In a companion post for our Alignment blog, we also explore the public WildChat dataset and find that, while less precise, it still provides a useful signal about
Can public chat data predict real-world AI misalignments?
From alignment.openai.com
OpenAI
@OpenAI
Jun 16
Let’s talk about evals. We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed. @tejalpatwardhan, who leads our frontier evals team, spoke to @AndrewMayne about why evals matter and what models need to be
OpenAI
@OpenAI
Jun 16
Listen to the OpenAI Podcast on:
Spotify: open.spotify.com/episode/5pkjhU…
Apple: podcasts.apple.com/us/podcast/why…
YouTube: youtu.be/CFqjjKp9Y-Q
open.spotify.com
Why Tejal Patwardhan stopped underestimating the models - Episode 21
OpenAI Podcast · Episode
OpenAI reposted
OpenAI Developers
@OpenAIDevs
Jun 16
More of Codex is rolling out across Europe this week. We’re bringing Computer use, the Codex Chrome extension, personalized memory, and Chronicle to Codex users in the EEA, UK, and Switzerland.
Changelog – Codex | OpenAI Developers
From developers.openai.com
OpenAI
@OpenAI
Jun 12
We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:
OpenAI
@OpenAI
Jun 12
For the next two weeks, Plus and Pro users can invite up to three friends to try Codex. When a friend sends their first Codex message, you’ll both get another banked reset.
OpenAI reposted
Jakub Pachocki
@merettm
Jun 8
The north stars we're working towards at OpenAI all center around the mission: ensure AGI benefits all of humanity. AI should expand human agency, not make people less consequential to the future.
openai.com
Built to benefit everyone: our plan
A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone.

URL: https://x.com/OpenAI
