Discover real AI creators shaping the future. Track their latest blogs, X posts, YouTube videos, WeChat Official Account posts, and GitHub commits — all in one place.
how to use obsidian + claude code to build a 24/7 personal operating system and build your startup:

1. write everything in markdown (daily notes, projects, beliefs, people, meetings)
2. link your notes together so they mirror how your brain actually thinks
3. install the obsidian cli so claude code can read your entire vault + the relationships
4. stop re-explaining projects every session. use reference files instead
5. build custom slash commands: /context → load your full life + work state; /trace → see how an idea evolved over months; /connect → bridge two domains you've been circling; /ideas → generate startup ideas from your vault; /graduate → promote daily thoughts into real assets
6. keep a strict rule: the human writes the vault. agents read it, suggest, and execute
7. let claude aka clode surface patterns you've been unconsciously circling for years
8. delegate from inside your notes. one sentence in obsidian → the agent handles the rest
9. treat writing as leverage. the more you write, the more context your agents have
10. understand this: markdown files are the oxygen of llms

i really enjoyed seeing how to use obsidian thanks to @internetvin. vin uses ai like a thinking partner wired into his life's work. 99.99% of people won't do this because it requires reflection + setup. but once the vault exists, the agent stops being generic. it starts thinking in your voice. the episode is live on @startupideaspod (more there). this one is different. send this tweet to a friend. i'm still processing how game-changing obsidian + claude code is, maybe you will be too. watch
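Step 5 above mentions custom slash commands. Claude Code picks these up as plain markdown files under `.claude/commands/`, where the filename becomes the command name. The sketch below writes a minimal `/context` command; the prompt body is an illustrative assumption, not vin's actual setup from the episode.

```python
from pathlib import Path

# Claude Code discovers custom slash commands as markdown files
# in .claude/commands/ -- the file name becomes the command name.
cmd_dir = Path(".claude/commands")
cmd_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical /context command body (illustrative, not the
# actual command described in the episode).
(cmd_dir / "context.md").write_text(
    "Read the daily notes, active project files, and people notes "
    "in this vault, then summarize my current life + work state "
    "before we continue.\n"
)

print((cmd_dir / "context.md").read_text())
```

Because the command is just a markdown file in the vault's repo, it versions and links like any other note.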
New lab by JJ Allaire focused on Evals 👀 Excited that more data folks are getting into this! https://meridianlabs.ai/
We built an AI agent that lets you vibe-code document extraction - high accuracy and citations over the most complex documents. Our latest release lets you upload documents as context. All you then have to do is describe what you want extracted in natural language. 💡 Our agent will then read the document with file tools to infer the right schema, validation rules, and other pre/postprocessing logic. ✅ It will give you back a workflow that can extract over thousands/millions of documents at scale. You can still of course review and edit every output before approving. Stop handling paperwork manually; just upload files, describe your task, and let our agent handle the rest. Our vision for LlamaAgents is to provide the most advanced and easy-to-use way for you to orchestrate document work. Walkthrough: https://youtu.be/5Nk6KZhBDbQ Check it out: https://cloud.llamaindex.ai/ If you’re interested in reducing the operational burden of document extraction (invoices, claims, onboarding forms), come talk to us! https://www.llamaindex.ai/contact
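The workflow described above (infer a schema plus validation rules from example documents, then extract at scale) can be sketched in miniature. The field names and rules below are hypothetical illustrations, not LlamaAgents' actual output format.

```python
from dataclasses import dataclass

# Hypothetical inferred schema for invoice extraction -- the fields
# and validation rules are illustrative, not LlamaAgents' real schema.
@dataclass
class InvoiceRecord:
    invoice_number: str
    vendor: str
    total: float

def validate(record: InvoiceRecord) -> list[str]:
    """Return a list of validation errors (empty list = record passes)."""
    errors = []
    if not record.invoice_number:
        errors.append("missing invoice_number")
    if record.total < 0:
        errors.append("total must be non-negative")
    return errors

ok = InvoiceRecord("INV-001", "Acme Corp", 1250.0)
bad = InvoiceRecord("", "Acme Corp", -5.0)
print(validate(ok))   # passes: no errors
print(validate(bad))  # fails both rules
```

Running validation before approving outputs is what makes "review and edit every output" tractable at thousands of documents.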
🚀 LlamaAgents Builder just leveled up: File uploads are here! Our natural language interface for building agentic document workflows now supports file uploads. You can provide example documents as context, and the agent will use them as a starting point to design and tailor
If you want to benefit from Claude taking down all the other companies, here are public companies you can buy stock of that own a stake in Anthropic: - Amazon - Google - Microsoft - Nvidia - Salesforce - SAP - Zoom
BREAKING: IBM stock, $IBM, falls more than 10% after Anthropic announces that Claude can streamline COBOL code. It's becoming increasingly clear just how pivotal a moment we are in right now.
RT swyx Big news today if you're into coding evals: SWE-Bench Verified is dead!! https://x.com/latentspacepod/status/2026027529039990985 i'm not sure if @HamelHusain is tired of me tagging him, but it turns out @OpenAI really did go back over their own 2024 work. when you 1) look at the CoT and 2) look at the evals, they realized that at LEAST 16.4% of SWE-Bench Verified should technically be unsolvable... ... and also that ALL frontier models, including OpenAI's own, are capable of solving them by sheer contamination (including being able to recite verbatim the entire SWE-Bench problem setup and solution, just by giving the Task ID alone (!!!!)). Heroic work from the OAI Evals team, and imo an important highlight on the fragility and messiness of evals work in general. OpenAI spent the money to do 3 independent reviews of each problem in 2024, and AT LEAST SIXTEEN PERCENT OF THESE were still egregiously problematic (as shown in screenshots). in this 2026 audit they then did 6 independent reviews from software engineers, with ADDITIONAL positive-finding verification from a separate team, in order to arrive at today's conclusion. If this happens to SWE-Bench Verified... what else is hiding in other benchmarks out there? Original tweet: https://x.com/swyx/status/2026029120040137066
🆕 The End of SWE-Bench Verified (2024-2026) https://latent.space/p/swe-bench-dead Today @OpenAIDevs is announcing the voluntary deprecation of SWE-Bench Verified! We're releasing a podcast + analysis in today's post. Saturation of SWE-Bench has been a community hot topic for over a year -
RT Steven Heidel the Responses API now supports WebSockets! this can make your agents run 30-40% faster, especially when they make a lot of tool calls Original tweet: https://x.com/stevenheidel/status/2026026829388353578
Introducing WebSockets in the Responses API. Built for low-latency, long-running agents with heavy tool calls. http://developers.openai.com/api/docs/guides/websocket-mode
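A back-of-envelope model of why a persistent connection helps tool-heavy agents: with per-request HTTP, every one of N tool-call round trips pays connection setup overhead, while over a WebSocket that setup is paid once. The numbers below are illustrative assumptions, not measured OpenAI latencies.

```python
# Toy latency model: persistent WebSocket vs. per-request HTTP for
# an agent loop that makes many tool calls. All numbers are assumed
# for illustration, not measured values.
SETUP_MS = 150      # assumed TLS + connection setup cost per HTTP request
EXCHANGE_MS = 400   # assumed model/tool round trip once connected
N_CALLS = 20        # tool calls in one agent run

http_total = N_CALLS * (SETUP_MS + EXCHANGE_MS)  # setup paid every call
ws_total = SETUP_MS + N_CALLS * EXCHANGE_MS      # setup paid once

savings = 1 - ws_total / http_total
print(f"HTTP: {http_total} ms, WebSocket: {ws_total} ms, saved {savings:.0%}")
```

Under these assumed numbers the persistent connection saves roughly a quarter of total latency; the more tool calls per run, and the heavier the per-request setup, the closer the savings get to the 30-40% figure quoted above.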
simonw contributed to simonw/simonwillisonblog