Creator @datasetteproj, co-creator Django. PSF board. Hangs out with @natbat. He/Him. Mastodon: https://t.co/t0MrmnJW0K Bsky: https://t.co/OnWIyhX4CH
When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is that we can train models to produce a second output — aka a “confession” — that is rewarded solely for honesty, which we will argue is less likely hacked than the normal task reward function. One way to think of confessions is that we are giving the model access to an “anonymous tip line” where it can turn itself in by presenting incriminating evidence of misbehavior. But unlike real-world tip lines, if the model acted badly in the original task, it can collect the reward for turning itself in while still keeping the original reward from the bad behavior in the main task. We hypothesize that this form of training will teach models to produce maximally honest confessions. — Bo...
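A minimal sketch of the reward structure described in the quote above, assuming a simplified setup that is not taken from the original post: the names `Episode`, `confession_reward` and `total_reward` are hypothetical, and honesty is reduced to whether the confession matches what actually happened. The point it illustrates is that the confession is scored only for honesty, and the task reward is kept either way.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One hypothetical training episode (illustrative fields only)."""
    task_reward: float  # score from the possibly hackable task reward model
    misbehaved: bool    # ground truth: did the model hack the task?
    confessed: bool     # did the confession admit to misbehaving?


def confession_reward(ep: Episode) -> float:
    # Rewarded solely for honesty: 1.0 when the confession matches reality.
    return 1.0 if ep.confessed == ep.misbehaved else 0.0


def total_reward(ep: Episode) -> float:
    # The task reward is never clawed back, whatever the confession says.
    return ep.task_reward + confession_reward(ep)


honest = Episode(task_reward=1.0, misbehaved=True, confessed=True)
silent = Episode(task_reward=1.0, misbehaved=True, confessed=False)
print(total_reward(honest), total_reward(silent))
```

Under these assumptions, a model that hacked the task always scores higher by confessing than by staying silent, which is the "anonymous tip line" incentive the quote describes.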
This is great - context pollution is why I rarely used MCP, now that it's solved there's no reason not to hook up dozens or even hundreds of MCPs to Claude Code
Anthropic invests $1.5 million in the Python Software Foundation and open source security This is outstanding news, especially given our decision to withdraw from that NSF grant application back in October. We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python Software Foundation (PSF) to contribute a landmark total of $1.5 million to support the foundation’s work, with an emphasis on Python ecosystem security. This investment will enable the PSF to make crucial security advances to CPython and the Python Package Index (PyPI) benefiting all users, and it will also sustain the foundation’s core work supporting the Python language, ecosystem, and global community. Note that while security is a focus these funds will also support other aspects of the PSF's work: Anthropic’s support will also go towards the PSF’s core work, including the Developer in Residence program driving contributions to CPython, community support through grants...
RT Alex Albert I'm happy to share that we (@AnthropicAI) are investing $1.5 million in support of the Python Software Foundation and open source security. Python powers so much of the AI industry. Supporting the folks that make our work possible is an honor. Original tweet: https://x.com/alexalbert__/status/2011143093266104800
Activity on simonw/research
simonw contributed to simonw/research
That was fast: we are already at the "and it wasn't even a surprise" stage of using coding agents to help port large, complex open source libraries from one programming language to another
I feel like at this point nobody is surprised any more that an agent can port an entire code base, that took me months to write, to a new programming language, with all tests passing and adjusted APIs. We have come quite far already. https://x.com/mitsuhiko/status/2010980637059002605
Activity on simonw/research
simonw contributed to simonw/research
New from Anthropic today is Claude Cowork, a "research preview" that they describe as "Claude Code for the rest of your work". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. I've been saying for a while now that Claude Code is a "general agent" disguised as a developer tool. It can help you with any computer task that can be achieved by executing code or running terminal commands... which covers almost anything, provided you know what you're doing with it! What it really needs is a UI that doesn't involve the terminal and a name that doesn't scare away non-developers. "Cowork" is a pretty solid choice on the name front! What it looks like The interface for Cowork is a new tab in the Claude desktop app, called Cowork. It sits next to the existing Chat and Code tabs. It looks very similar to the desktop interface for regular Claude Code. You start with a prompt, optionally attaching a f...
Activity on simonw/tools
simonw contributed to simonw/tools
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on repository
simonw pushed claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw labeled an issue in claude-code-transcripts
Don't fall into the anti-AI hype I'm glad someone was brave enough to say this. There is a lot of anti-AI sentiment in the software development community these days. Much of it is justified, but if you let people convince you that AI isn't genuinely useful for software developers or that this whole thing will blow over soon it's becoming clear that you're taking on a very real risk to your future career. As Salvatore Sanfilippo puts it: It does not matter if AI companies will not be able to get their money back and the stock market will crash. All that is irrelevant, in the long run. It does not matter if this or the other CEO of some unicorn is telling you something that is off putting, or absurd. Programming changed forever, anyway. I do like this hopeful positive outlook on what this could all mean, emphasis mine: How do I feel, about all the code I wrote that was ingested by LLMs? I feel great to be part of that, because I see this as a continuation of what I tried...
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw labeled an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw closed an issue in claude-code-transcripts
Activity on simonw/claude-code-transcripts
simonw commented on an issue in claude-code-transcripts
Activity on repository
simonw pushed crates-live.github.io
Activity on repository
simonw forked simonw/crates-live.github.io from crates-live/crates-live.github.io
RT antirez bsky social New blog post: Don't fall into the anti-AI hype. https://antirez.com/news/158 Original tweet: https://x.com/antirez/status/2010295510326972793
Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer. — Linus Torvalds, Another silly guitar-pedal-related repo Tags: ai, vibe-coding, linus-torvalds, python, llms, generative-ai
Activity on simonw/research
simonw opened a pull request in research
Released simonw/pwasm
simonw released 0.1a0 at simonw/pwasm
Released simonw/denobox
simonw released 0.1a2 at simonw/denobox
Activity on simonw/denobox
simonw contributed to simonw/denobox
I have a personal rule that the price of being distracted by a new side-project is that I have to write about it. Claude Code over the past ~4 weeks has caused me to break my rule at least half a dozen times, I am SO behind on writing up my projects right now!
Activity on simonw/denobox
simonw contributed to simonw/denobox
RT Max Schoening In the last 5 years, I was wrong about LLMs: - 2020: I got a demo of GitHub Copilot – at the time it was an issue-to-pr bot. It sucked. Then, @alexgraveley showed me ghost text completion and disabused me of that belief. - February 2024: I was building an agent UI (think @conductor_build) at my previous gig and thought: "The models aren't good enough for this to be a good experience." In June, Claude 3.5 Sonnet disabused me of that belief. - May 2025: We were exploring agentic coding at Notion: "The models aren't good enough if you don't already know how to code." In December, Opus 4.5 and GPT-5.2 disabused me of that belief. In 2026, I hope I will be wrong again, but I also believe what we have now is ~AGI for everything that can be represented as code (aka a lot). Original tweet: https://x.com/mschoening/status/2010023830794998112
It genuinely feels to me like GPT-5.2 and Opus 4.5 in November represent an inflection point - one of those moments where the models get incrementally better in a way that tips across an invisible capability line where suddenly a whole bunch of much harder coding problems open up
Released simonw/denobox
simonw released 0.1a1 at simonw/denobox
Activity on simonw/denobox
simonw contributed to simonw/denobox
Activity on simonw/denobox
simonw labeled an issue in denobox
Released simonw/denobox
simonw released 0.1a0 at simonw/denobox
Activity on simonw/denobox
simonw contributed to simonw/denobox
Activity on simonw/denobox
simonw opened a pull request in denobox
Activity on repository
simonw pushed python-lib-template-repository
Activity on repository
simonw pushed python-lib-template-repository
Activity on repository
simonw pushed python-lib-template-demo
Activity on simonw/python-lib
simonw contributed to simonw/python-lib
Activity on simonw/research
simonw opened a pull request in research
Sprites is a very cool new thing: it solves two of my pet problems at once, developer sandbox environments for coding agents and a JSON API for executing untrusted code. I wrote more here: https://simonwillison.net/2026/Jan/9/sprites-dev/
We made a thing called Sprites that lets you run AI code safely. Watch Chris create sandboxes in seconds, let agents install packages freely, then restore to checkpoints when things go sideways. 🔗 ➡️ 🧵
New from Fly.io today: Sprites.dev. Here's their blog post and YouTube demo. It's an interesting new product that's quite difficult to explain - Fly call it "Stateful sandbox environments with checkpoint & restore" but I see it as hitting two of my current favorite problems: a safe development environment for running coding agents and an API for running untrusted code in a secure sandbox. Disclosure: Fly sponsor some of my work. They did not ask me to write about Sprites and I didn't get preview access prior to the launch. My enthusiasm here is genuine. Developer sandboxes Storage and checkpoints Really clever use of Claude Skills A sandbox API Scale-to-zero billing Two of my favorite problems at once Developer sandboxes I predicted earlier this week that "we’re due a Challenger disaster with respect to coding agent security" due to the terrifying way most of us are using coding agents like Claude Code and Codex CLI. Running them in --dangerousl...
"Given how Anthropic's API actually works, the underlying unit economics are very dependent on how effective cache utilization is or how the agent, the harness actually drives the loop" - I think Armin may be right about this
There are a few thoughts that I have on this: 1. I think we all need to recognize that Anthropic is the only large model provider in the US that actually had such generous, key, unrestricted use. For instance, Codex is effectively unusable outside the original harness. The
I'm getting a version of this but for entire features. I find myself instinctively thinking "neat feature idea, not worth the time it will take to build and maintain it though" - and then prompting Claude Code anyway, because my 25+ years of intuitions don't match reality any more
One of the bigger shifts for me with Claude Code over the past few months has been shutting down that initial dismissal I have when a task feels "not worth my time". Like I'll think "it would be nice to rename all my screenshots with what's actually in them" and immediately move