Discover real AI creators shaping the future. Track their latest blogs, X posts, YouTube videos, WeChat Official Account posts, and GitHub commits — all in one place.
Just started running a large-scale randomized controlled test of Claude Opus 4.6 against every other model. It's winning pretty consistently for me in Arena Mode. Any guesses on how much more Elo this thing will have over SOTA? It doesn't take a lot... a >60% win rate is a clear margin of victory.
Introducing Arena Mode in Windsurf: One prompt. Two models. Your vote. Benchmarks don't reflect real-world coding quality. The best model for you depends on your codebase and stack. So we made real-world coding the benchmark. Free for the next week. May the best model win.
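For a rough sense of scale on the "60% win rate" remark above, here is a minimal sketch that converts a head-to-head win rate into an Elo gap. It assumes the standard logistic Elo model with a 400-point scale and ignores ties. How Arena Mode actually computes its ratings is not stated in the posts, so the function names and the model choice here are illustrative assumptions only.

```python
import math


def elo_gap_from_win_rate(win_rate: float) -> float:
    """Elo difference implied by a head-to-head win rate.

    Assumes the standard logistic Elo model (400-point scale, no ties).
    This is an illustrative assumption, not Arena Mode's documented method.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def expected_win_rate(elo_gap: float) -> float:
    """Inverse mapping: expected win rate for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


if __name__ == "__main__":
    for rate in (0.55, 0.60, 0.65):
        print(f"win rate {rate:.0%} -> Elo gap {elo_gap_from_win_rate(rate):.1f}")
```

Under these assumptions a 60% win rate corresponds to a gap of roughly 70 Elo points, which fits the point that a large rating lead is not needed to produce a visible margin.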
Explains the ads
My first impression of Opus 4.6 is that it is so far the least friendly and most brusque Claude I've ever interacted with @AmandaAskell
where my forward deployed agents at
Hahaha omg Opus 4.6 is TOKEN HUNGRY! I’ve never seen anything like this.
Pelicans for Opus 4.6 and Codex 5.3 - I don't have much interesting to say about these models yet to be honest, they're both incremental improvements on their predecessors and very capable https://simonwillison.net/2026/Feb/5/two-new-models/

Two major new model releases today, within about 15 minutes of each other. Anthropic released Opus 4.6. Here's its pelican: OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here's its pelican: I've had a bit of preview access to both of these models and to be honest I'm finding it hard to find a good angle to write about them - they're both really good, but so were their predecessors Codex 5.2 and Opus 4.5. I've been having trouble finding tasks that those previous models couldn't handle but the new ones are able to ace. The most convincing story about capabilities of the new model so far is Nicholas Carlini from Anthropic talking about Opus 4.6 and Building a C compiler with a team of parallel Claudes - Anthropic's version of Cursor's FastRender project. Tags: llm-release, anthropic, generative-ai, openai, pelican-riding-a-bicycle, ai, llms, parallel-agents, c, nicholas-carlini
SF is in store for one of the greatest Super Bowls ever. Pretty weird that it's on a Thursday this year.