Discover real AI creators shaping the future. Track their latest blogs, X posts, YouTube videos, WeChat Official Account posts, and GitHub commits — all in one place.
Cybersecurity Looks Like Proof of Work Now: The UK's AI Safety Institute recently published "Our evaluation of Claude Mythos Preview’s cyber capabilities", their own independent analysis of Claude Mythos, which backs up Anthropic's claims that it is exceptionally effective at identifying security vulnerabilities. Drew Breunig notes that AISI's report shows that the more tokens (and hence money) they spent, the better the results they got, which creates a strong economic incentive to spend as much as possible on security reviews: "If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them." An interesting result of this is that open source libraries become more valuable, since the tokens spent securing them can be shared across all of their users. This directly counters the idea that the low cost of vib...
Here, we measure success by the fraction of the “performance gap” we can close between the weak model and the potential of the strong model. After 7 days, human researchers closed it by 23%. Then, our Automated Alignment Researchers (Opus 4.6 with extra tools) closed it by 97%. To test the broader usefulness of the AARs’ methods, we assessed how well they worked on two datasets the AARs hadn’t seen before. The AARs’ best-performing method successfully generalized to both coding and math tasks, though their second-best method only generalized to math.
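The "fraction of the performance gap closed" metric can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the gap is the difference between a strong-model ceiling score and a weak-model baseline score; the function name `gap_closed` and the scores below are hypothetical and do not come from the post.

```python
def gap_closed(weak: float, strong: float, method: float) -> float:
    """Fraction of the weak-to-strong performance gap closed by `method`.

    0.0 means no better than the weak baseline, 1.0 means the method
    reaches the strong model's potential.
    """
    gap = strong - weak
    if gap <= 0:
        raise ValueError("strong score must exceed weak score")
    return (method - weak) / gap


# Hypothetical scores chosen only to reproduce the percentages quoted above.
weak_score, strong_score = 0.60, 0.80
human_fraction = gap_closed(weak_score, strong_score, 0.646)  # about 0.23
aar_fraction = gap_closed(weak_score, strong_score, 0.794)    # about 0.97
```

Normalizing by the gap rather than reporting raw scores makes results comparable across tasks where the weak and strong models sit at different absolute levels.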
Yep. All use cases. Zero reason for a Cowork-like interface separate from Claude Code. - List of agents / tasks - Agent - Thing being created
I am more and more convinced that this is the future of software development UI. @cursor_ai is the closest, in my opinion. A list of the work you're doing in parallel, the agent in the middle, and most importantly, the thing you're building on the right. Because you want to see what
RT Brace ☁️ Salesforce tools now in Fleet One of the most requested features we've gotten, and it's now a first-class supported tool in Fleet! Just sign in with your Salesforce account, and start using it immediately in your agents: http://smith.langchain.com/agents Original tweet: https://x.com/BraceSproul/status/2044112213296967838
Building agents locally does not mean they’re ready to deploy in production. LangSmith deployments help with that.
🔐 One deployment, isolated data per user. Add custom auth so every user gets their own scoped threads, runs, and conversation history — with per-user data isolation and role-based access using any auth provider. Full walkthrough: https://youtu.be/DkNqgCz8cjE Docs:
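The idea of per-user data isolation can be illustrated with a small, framework-free Python sketch. This is not the LangSmith custom auth API; the `ThreadStore` class and its methods are hypothetical and show only the owner-scoping principle (every read is filtered by the authenticated user's ID), not role-based access.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadStore:
    """Hypothetical in-memory store where every thread records its owner."""

    _threads: dict = field(default_factory=dict)

    def create(self, user_id: str, thread_id: str, title: str) -> None:
        # Stamp the owner at creation time so later reads can be scoped.
        self._threads[thread_id] = {"owner": user_id, "title": title}

    def list_for(self, user_id: str) -> list:
        # A user only ever sees thread IDs they own.
        return [tid for tid, t in self._threads.items() if t["owner"] == user_id]

    def read(self, user_id: str, thread_id: str) -> dict:
        thread = self._threads.get(thread_id)
        # Treat "missing" and "not yours" identically to avoid leaking existence.
        if thread is None or thread["owner"] != user_id:
            raise PermissionError("thread not found for this user")
        return thread


store = ThreadStore()
store.create("alice", "t1", "Quarterly report")
store.create("bob", "t2", "Trip planning")
alice_threads = store.list_for("alice")  # only Alice's thread IDs
```

In a real deployment the user ID would come from the auth provider's verified token rather than being passed in by the caller.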
Big release here! Async subagents will become more and more of a thing, as subagents get longer running and you don’t want to block the event loop
🚀 deepagents 0.5 release 👉 Async subagents - kick off background tasks on any Agent Protocol backed server while you continue to interact with the main agent. Start multiple background tasks in parallel, keep the conversation going, and collect results as they come in. Tasks
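The pattern described above (start several background tasks, keep working, collect results as they finish) can be sketched with Python's standard `asyncio` module. This does not use the deepagents or Agent Protocol APIs; `subagent` is a hypothetical stand-in that simply sleeps to simulate long-running work.

```python
import asyncio


async def subagent(name: str, seconds: float) -> str:
    """Hypothetical subagent: sleeps to simulate long-running work."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main() -> list:
    # Kick off background tasks without waiting for them to finish.
    tasks = [
        asyncio.create_task(subagent("research", 0.05)),
        asyncio.create_task(subagent("summarize", 0.01)),
    ]

    # The main agent stays responsive here because the event loop is not blocked.
    await asyncio.sleep(0)

    # Collect results in completion order rather than start order.
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results


results = asyncio.run(main())
```

Because `asyncio.as_completed` yields results as each task finishes, the shorter "summarize" task is collected before the longer "research" task even though it was started second.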
RT Viv Open Harness 🤝 Deployed Agents if you wanna use Claude, GLM5, and Codex in your deployed harness then you should be able to! deepagents deploy has easy configs to let users customize their harness and deploy (redeploy) asap more coming on subagents (my fave upcoming thing for task decomposition) Original tweet: https://x.com/Vtrivedy10/status/2044100301792120989
ICYMI -- last week we released `deepagents deploy`, the fastest way to take a highly capable, long running agent to production. agents are becoming more and more standardized, and we're betting on this open standard for agent config! user memory and subagents coming soon!
RT Sydney Runkle this is a fundamental building block for `deepagents deploy` we're designing a memory layer built for multi-tenant systems, so memory can be scoped to a user, agent, or organization please dm me if this resonates and you have a use case! Original tweet: https://x.com/sydneyrunkle/status/2044099832319500484
RT LangChain 🔐 One deployment, isolated data per user. Add custom auth so every user gets their own scoped threads, runs, and conversation history — with per-user data isolation and role-based access using any auth provider. Full walkthrough: https://youtu.be/DkNqgCz8cjE Docs: https://docs.langchain.com/langsmith/set-up-custom-auth Original tweet: https://x.com/LangChain/status/2044098386270310783