🇦🇺 Co-founder: @AnswerDotAI/@FastDotAI ; Prev: Professor@UQ; @kaggle founding president; founder @fastmail/@enlitic/… https://t.co/16UBFTX7mo
RT Mark Saroufim My MLSys keynote on AI writing systems code got more interest than I expected. The recording will take a while, so in the finest tradition of AI labs sharing blog posts, we’re starting the Core Automation Blog with this one https://www.coreauto.com/blog/when-ai-starts-writing-systems-code
RT Ben Tossell wait… if most people think 5.5 is better than 4.7, i assume that’s due to terminal coding benchmark… 4.8 is still outperformed by 5.5
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
RT Lenny Rachitsky Fascinating results + Anthropic running away with it right now + So many people want to start their own company + Google over OpenAI + Vercel, Linear, Every, PostHog overperforming A great list if you're trying to figure out where to go work 👇
RT Ed Elson $65B private round More than double the size of the largest IPO ever
We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.
View quoted postRT Minh Nhat Nguyen glad to know Mythos' safety concerns have been addressed right as Anthropic also secured tens of billions in inference compute 👍
JUST IN: Anthropic announces it will roll out Claude Mythos “in the coming weeks” despite growing fears over the model’s cyber capabilities.
View quoted postRT Hot Rails — oz/acc Fun fact: Australia is basically Scandinavia with the Sahara Desert bolted on.
I LOVE this visualisation. Everyone imagines nature and the outback, but they don't realise just how urbanised we are. Credit: u/KaleyTheKing
RT Ethan Mollick There is a lot being written about the stylistic tells of AI writing (em-dashes, etc.) but this paper looks at AI narrative tells Fascinating differences between AI & human narrative, and asking AI to write in different styles doesn't do much to change it https://arxiv.org/abs/2604.03136
RT Florian Kronawitter Anthropic is too expensive and will either lose customers or cut prices
RT hardmaru For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation http://pub.sakana.ai/diffusionblocks What if we didn’t have to hold an entire neural network in memory to train it? Standard neural net training optimizes all parameters jointly. As a result, the
View quoted postRT Fuli Luo Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced. Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers. Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss. If more architectures that save compute and KV cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry. More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain—including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers—serving as a strategic fulcrum for a systemic revaluation of AI hardware. In the long run, this injects more affordable and accessible compute into both training and inference pipelines, a...
RT Mario Zechner recommended reading. i too am very done with people anthropomorphizing a bunch of matrices on a GPU cluster, especially if the same people do not give two fucks about actual human beings.
More musings after some people got upset about the word clanker. https://lucumr.pocoo.org/2026/5/26/clankers/
View quoted postRT Mario Zechner Re @mteamisloading the models from 6 months ago kinda feel the same like the recently released models. currently not holding my breath for more step changes, but will be happy if they happen.
RT Flowers ☾ Nothing disappoints me more than people saying we should stop progress because peoples meaning depends on that monotonous labor, as if humanitys highest purpose is filling Excel sheets or stocking shelves. This is the worst take. Worse than keep4o, anti ai art, doomerism,...
RT Alex Imas This from @TuhinChakr is brilliant. That prize winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. Tool allows you to trace those n-grams directly to their source, which is mostly random fanfiction. https://tuhinchakrabarty.substack.com/p/ai-slop-grantagate-and-bad-writing
RT Shoshana Weissmann, Sloth Committee Chair 🦥 Australian teens who lost access to social media because of age verification read less news
RT Timothy Gowers @wtgowers If you are a mathematician, then you may want to make sure you are sitting down before reading further.
RT hyunji amy lee LLM agents & memory systems operate in continuously updated environments (Git repos, evolving docs). They must process long contexts, recover earlier information, and reason over many updates that create interference between old and new information. How well do they handle this? We introduce MINTEval: ✅ Frequent context changes & interference (avg. 86 updates) ✅ 5 challenging question types, including long-range lookback & reasoning over multiple targets distributed across context ✅ 4 realistic domains: state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits ✅ Avg. 138.8k tokens per instance (up to 1.8M) ✅ Human verification on generated QAs = 95.6% 📊 Across 7 representative systems, MINTEval remains difficult, showing an avg. acc of 27.9%, and the best system reaches only 33.4%. 🔎 Our analysis shows: • Memory construction failures cause a 41.7% drop • Memory agents are highly sensitive to design choices • Memory systems have a strong bias toward insertion operations (76.8%) over deletion/update
RT Mario Zechner everbody who posts three.js scenes generated by gemini 3.5 flash will get blocked for life. this is non-negotiable. it's 2026.
RT Enrico - big-AGI Disappointing pricing trend with Gemini 3.5 Flash. 22.5x pricier than 2.0 Flash which came out 15 months ago ($9.00 vs $0.40). Are Flash models supposed to get this much more expensive, or is Pro just being renamed to Flash?
Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real world use cases. It's available everywhere now!
RT gabe Literary journals are now publishing, and awarding prizes to, AI written stories. Surprised this made it into Granta!
‘The Serpent in the Grove’ by Jamir Nazir is a story set in rural Trinidad about a struggling farmer, a silenced young wife and a grove that seems to remember what others try to bury. Awarded the Caribbean regional winner title for its lyrical precision and haunting atmosphere,
RT Mitchell Hashimoto I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out. I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really). It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely. The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture. We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying. I worry.
RT Andrew White 🐦⬛ hallucinated references will land you a 1-year ban from arxiv now. wow
RT Thomas G. Dietterich Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
This is misleading. This policy redefines the term "interactive" to mean "using an Anthropic front-end". If you use `claude -p` or Agent SDK to do something interactively, it now uses credits, not your subscription limits. So the "interactive use" heading saying "unchanged" subscriptions is not accurate.
To add some clarity: you don't pay extra. It's the same subscription, same price per month. What's new our sub now covers two separate pools: · Interactive → sub limits, unchanged · Programmatic → new $20–$200 included(!!) credit, metered at API rates
RT dex hey surprise - you can just launch interactive in tmux and then tail the jsonl - shipped a small wrapper...ralph loop iterating to full parity rn https://github.com/dexhorthy/shannon
Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of: - Claude Agent SDK - claude -p - Claude Code GitHub Actions - Third-party apps built on the Agent SDK
View quoted postRT Theo - t3.gg If you use any of the following with your Claude sub, your usage must got cut by 25x: - T3 Code - Conductor - zed - jean - “Claude -p” in your ci - scripts to call Claude code from other tools They’re disguising this as “free credits”. Don’t fall for it.
Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of: - Claude Agent SDK - claude -p - Claude Code GitHub Actions - Third-party apps built on the Agent SDK
View quoted postRT Jonas Geiping We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can … 🔵Be created by instruction-tuning for the stream format 🔵Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in) 🔵Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency 🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security 🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized. Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.
Sound on! This is pretty cool :D
SolveIt is already an amazing environment for learning and exploring any topic, or for development/writing etc. But add in real-time conversational interaction too just takes it to the next level. 🤯
View quoted postRT Andreas Kirsch 🇺🇦 As always, no insights and in personal capacity: The DeepMind unionization effort has very worthy goals it seems Maybe Google will finally grant GDM that /independent/ ethics oversight board that was reportedly part of the original acquisition deal in 2014
RT Jerry Tworek If the AI models are so smart, why do I feel like I’m losing a few neurons every time I read a longer form content written by AI? We’ve come a long way but we still have long way to go. In terms of clarity of writing we may have regressed from o1/o3 days.
RT Mario Zechner big "Look what they need to mimic a fraction of our power" energy. the original DOOM impl is ~40k lines of C and a bit of assembly and is also a full software renderer.
My Codex /goal that has been running for like 40 hours that is now 100K+ lines of code now is a pure Swift Doom source port. It'll be the first, source accurate, software renderer for Doom that is fully in Swift. No OpenGL, Metal, SceneKit, no nothing. Just Swift.
View quoted postRT Fireworks AI Frontier labs are betting AGI models will be so good you won't ever want to customize them. We think different. Building on a closed platform means renting your intelligence. The landlord sets the terms. They can give notice at any moment that your fine-tuning lease will not be renewed. As AI natives, we think you should own your AI. Your data, your domain expertise, your moat. Start training today on the Fireworks AI Training Platform. https://fireworks.ai/train
OpenAI has announced they will be winding down fine tuning. I got the email today. Existing active @OpenAI customers can keep running fine-tuning jobs until January 6, 2027, but after that no new training jobs can be created. Existing fine-tuned models will still run, but only
RT ERNIE for Developers ERNIE 5.1 is here 🚀 ERNIE 5.1 significantly reduces pretraining cost while compressing total parameters to ~1/3 and activated parameters to ~1/2 — using only ~6% of the pretraining cost compared to models at similar scale, while achieving leading performance in its class. 💡Key highlights: 1/ Strong agentic performance approaching leading frontier models. ERNIE 5.1 surpasses DeepSeek-V4-Pro on both τ3-bench and SpreadsheetBench-Verified. 2/ Strong world knowledge and creative writing capabilities, with GPQA and MMLU-Pro performance approaching leading closed-source models, and creative writing ability nearing Gemini 3.1 Pro. 3/ Frontier-level reasoning performance. ERNIE 5.1 scores 99.6 on the challenging AIME26 benchmark with tools, second only to Gemini 3.1 Pro. 4/ Deep search capability. On May 9, ERNIE 5.1 ranked #4 globally and #1 among Chinese models on the Arena Search leaderboard with a score of 1223. ERNIE 5.1 is now available on ERNIE and the Baidu AI Studio Model Playground: 👉https://ernie.baidu.com 👉https://aistudio.baidu.com 👉https://ernie.baidu.com/blog
RT Jonathan Blow It's been 3 months since the 100x vibers started 100x vibin'! So, post your 25-years-of-work-equivalent project here, so we can signal boost and everyone can celebrate the Life's Work that you did in 3 months. Looking forward to it, Let's Go!!!
The only correct answer when a VC asks: "What's your moat?"
Starting a company in a garage is boring so we started @dottxtai in a French castle instead
RT Simon Willison Under-reported details of the xAI/Anthropic Colossus data center deal: Anthropic get Colossus 1 but xAI keep using the larger Colossus 2, Colossus 1 has a REALLY bad environmental record, and xAI just shut down a bunch of older models on 2 weeks' notice https://simonwillison.net/2026/May/7/xai-anthropic/
RT antirez Welcome to DS4, a specialized inference engine for DeepSeek v4 Flash. https://github.com/antirez/ds4 This project would have been impossible without the existence of llama.cpp and GGML and the work of @ggerganov and all the other contributors. Thanks!
RT Aidan Clark I'm disappointed by repeatedly hearing that my colleagues at Anthropic believe they are the only ones who should be trusted with building AI. It is *very good* there are a diversity of people building AGI: the likelihood anyone picks the right path in a vacuum is extremely small.
RT Tencent Hy Two weeks after release, Hy3 preview is #1 on @OpenRouter's weekly leaderboard with 3.66T tokens processed, up 298% week-over-week. #1 in overall usage, tool calls, and coding. 15.4% market share across all providers.🏆 Top apps running Hy3 preview: Hermes Agent, Claude Code, Kilo Code, OpenClaw, Cline.@NousResearch @claudeai @kilocodehq @openclaw @cline Huge thanks to every developer building with it. 🙏 Try it on OpenRouter: https://openrouter.ai/tencent/hy3-preview:free
RT François Fleuret Give LLMs 1. A latent space diffusion-like reasoning. 2. A real recurrent state. 3. A world-model pre-pre-training. And we are done.
RT Hao Zhang Exciting to work with @googledevs . Dflash is one of the most powerful technique developed here at UCSD by @zhijianliu_ and @jianchen1799 and glad that our students and collaborators help port them into Google's TPU systems!!!
Breaking LLM inference’s autoregressive bottleneck 🛠️ We've teamed up with @haozhangml, @YimingBob, and @aaronzhfeng, among others from UCSD to achieve a massive 3.13X speedup for LLM inference on Google Cloud TPUs using Diffusion-Style Speculative Decoding (DFlash). Read the
View quoted postRT Artificial Analysis MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price @SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and @GMI_cloud have all matched @MiniMax_AI's first-party API pricing, while SambaNova is 2x higher. Key takeaways: ➤ Fireworks and SambaNova are on the Pareto frontier for Speed vs. Price. At 127 output tokens/s and ~$0.22 per 1M tokens blended, Fireworks is ~2.2x faster than MiniMax's first-party API at the same blended price, whereas SambaNova delivers 435 output tokens/s but at ~2-3.5x the blended price of the other providers (depending on cache usage) ➤ SambaNova is the fastest provider at 435 output tokens/s, ~3.4x the next fastest provider (Fireworks at 127 output tokens/s). The remaining providers run substantially slower: MiniMax’s first-party API at 57 output tokens/s, Novita at 54, GMI at 41, and Together AI at 29 ➤ Cache discounts vary across providers. Fireworks, MiniMax, Novita, and Together AI offer 80% cache hit discounts, while GMI and SambaNova do not offer a discount. For cache-heavy workloads, this can materially increase the relative pricing for GMI and SambaNova ➤ Optimal provider choice depends on workload. SambaNova may be more suited to latency-sensitive deployments, albeit at a higher cost, while Fireworks may be more suitable for high-volume workloads that are not as latency-sensitive
RT Omar Sanseviero Excited to introduce Gemma 4 Multi-Token Prediction Drafters⚡️Accelerated inference right in your pockets - Up to a 3x speedup - Same quality guarantees - Available in your favorite open-source tools
RT ethan ding 📊 i have yet to meet a single person who feels like claude code is getting exponentially better on some kind of fast take off
Anthropic pays $750K/ year per senior engineer. The creator of Claude Code just revealed his coding setup at the Sequoia AI session. Boris Cherny: "100% of my code is written by Claude Code. I run around 100 agents at one time." free. 24 minutes. watch it then read article
View quoted postRT Proximal Re Deepseek V4 works more thoroughly than other open source models: It writes its own tests and performs extensive validation. This leads to better performance, but also cases of the model being overconfident despite being wrong, as observed for other models in our initial release
RT Mario Zechner hi, i'm a sole proprietor/founder in Austria and i earn many many multiples of what i'd earn as an employee, despite "predatory income tax". in fact, i opt out of the many tax optimizations i could use because i like having good schools and as high a standard of living as possible for everyone. the great thing about the EU is that you can just live under any tax regime you like in any of the 27 member states. it's all about trade offs. if poland works for you, fantastic! go build there. and if i may add one more thing: if the CEO of a startup, especially pre-revenue, lives "barely any better than a regular employee" then the system works as intended. fact of the matter is most startups are bad. you are not special because you are trying out a shit idea and fail. but i'll happily pay taxes so you can try your shit idea, fail, and can still live.
In Austria, a CEO of a startup lives barely any better than a regular employed developer A former boss of mine (an exited founder) wanted to buy a new desk Instead of going to IKEA, she went to a site for used furniture and searched for one there, because it was much cheaper
View quoted postRT Mark Di Stefano Bearish that Anthropic would hire its first Australian boss who then posts excruciating AI slop as his own “reflections”.
RT Mario Zechner i actually don't want this "but you don't review compiler output either" meme to die. it's the perfect signal for being immediately able to ignore someone in this space.
Interesting article on treating agent output like compiler output (and why) https://skiplabs.io/blog/codegen_as_compiler
RT Jia-Bin Huang Keep getting rate-limited by Claude, so I tried out DeepSeek V4 for the first time. After 10M+ tokens, holy crap the cost is ... 🤯
RT Xiaoyin Qu I can’t believe I stopped using Claude Code max and entirely use DeepSeek and Hermes. It’s so fast, so so fast, 3x faster for the same task. So cheap. I spent $5 last week and never need worry about being rate limited or usage hit limits very two hours. For most tasks it’s perfect enough.
RT Jen Zhu DeepSeek V4 Pro is crazy good at bug fixing. Costs counted in cents not dollars/tens of dollars. It’s the next level quiet confidence not to get caught up by benchmarks & rankings, and just let us users to find out and experience. V4 writing also has standout quality. The languages seem to have a lot of wisdom. Despite all the smear campaigns - the most loved AI lab. ❤️🔥
RT Jason Weston The new DeepSeek-V4, like DeepSeek-V3, uses concepts from our 2024 paper on Self-Rewarding LMs -- see screenshots of their tech reports! (Congrats!) Classical :) Self-Rewarding LMs from Jan 2024: https://arxiv.org/abs/2401.10020
RT Xiaomi MiMo Xiaomi MiMo-V2.5-Pro achieves multiple breakthroughs in the latest Arena rankings (Apr 26, 2026) 🔥 🏆 Text Arena (Expert) — #6 globally | #1 open-source model Also #1 among Chinese models, with Xiaomi ranking #3 globally by lab, behind only Anthropic and OpenAI. Expert is defined by high-difficulty tasks and expert voting, measuring core model intelligence. 🏆 Text Arena (Overall) — #2 open-source globally Strong across math, coding, creative writing, and general text tasks. 🏆 Code Arena (WebDev) — #3 open-source globally Evaluated by real community blind voting on frontend code generation. 🏆 Text Arena sub-rankings — #1 open-source globally in 4 categories Hard Prompts, Hard Prompts(English), Instruction Following and Long Query. Real-world preference, real model strength.
I'm so confused…
We're excited to partner with Google to offer Grounding With Exa inside of Gemini models! Using Exa's agent-first search, Gemini models can now access billions of websites, technical docs, papers, people, companies, and more. 10^18🤝10^100
What search providers are you all using with openclaw/pi/opencode/etc? Brave; serpapi; gemini; ...? Got any favorites?
This is great - @deepseek_ai V4 supports prefill! :D Most other providers have been dropping support for this critically important capability, so wonderful to see at least one company stepping up. https://api-docs.deepseek.com/guides/chat_prefix_completion
It just takes a little time and care to engage with the work of new folks in the field -- but it can make a big difference :D
On a more personal note, I'm grateful to @jeremyphoward for helping me break out into this space back in 2019. I didn't think anyone but friends and family would look at my little project, but him sharing this is what helped me transition to this field. https://x.com/jeremyphoward/status/1169967897378115584?s=20
View quoted postRT Xiaomi MiMo Xiaomi MiMo-V2.5 is now officially open-sourced! MIT License, supporting commercial deployment, continued training, and fine-tuning - no additional authorization required. Two models, both supporting a 1M-token context window : • MiMo-V2.5-Pro: built for complex agent and coding tasks, ranking No.1 among open-source models on GDPVal-AA and ClawEval • MiMo-V2.5: a native omni-modal model with strong agent capabilities A model's value isn't measured by rankings alone — it's measured by the problems it solves. Let's build with MiMo now! 🤗 Weights: https://huggingface.co/collections/XiaomiMiMo/mimo-v25 📄 Blog: https://mimo.xiaomi.com/index#blog
RT Niels Rogge FYI Claude Code is mostly a vibe-coded product (as they say, 100% written by Claude) It's the worst harness for Opus 4.6 among ANY harness on Terminal-Bench 2
I feel sorry for Claude Code I know they're not the one. I'm not overcommitting - not investing too hard I wonder if they know I'm pulling away
View quoted postRT Patrick C Toulme Launching pyptx — a Python DSL for writing NVIDIA PTX kernels. One PTX instruction = one Python call. Write pure PTX in Python. Direct Hopper + Blackwell support: wgmma, TMA, tcgen05, mbarriers. JAX + PyTorch integration. Includes GEMM, grouped GEMM, RMSNorm, SwiGLU, and a PTX→Python transpiler pip install pyptx[torch] pip install pyptx[jax] https://github.com/patrick-toulme/pyptx
RT Mario Zechner this is probably the most important piece of software of the decade next to vllm and sglang. i'm not joking.
llama.cpp at 100k stars now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄 Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on
RT Arthur Zucker Today I re-iterate: I hate MoEs and we are wasting time on them.... Let's unite and call a global ban on MoEs please. Please 1M+ salary researchers: do better... credits to @IlysMoutawwakil for the graph:
RT DeepSeek 🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf 🤗 Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4 1/n
RT Mario Zechner been working with @Kimi_Moonshot K2.6 for the past 2 hours "accidentially" and ... it's really good!
RT Qwen 🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇 What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀 🔗👇 Blog: https://qwen.ai/blog?id=qwen3.6-27b Qwen Studio: https://chat.qwen.ai/?models=qwen3.6-27b Github: https://github.com/QwenLM/Qwen3.6 Hugging Face: https://huggingface.co/Qwen/Qwen3.6-27B https://huggingface.co/Qwen/Qwen3.6-27B-FP8 ModelScope: https://modelscope.cn/models/Qwen/Qwen3.6-27B https://modelscope.cn/models/Qwen/Qwen3.6-27B-FP8
RT Mario Zechner clampy clampy clampdown. just waiting for OAI to clamp down as well.
AFAIK this is the first time it's been made official from an OpenAI staffer - the `/backend-api/codex/responses` endpoint that Pi and Opencode (IIUC) uses is officially supported! :D This is really great news, and I'll be releasing open source projects to support it v soon.
@samsaffron @jeremyphoward @simonw @reach_vb OpenAI sub is officially supported.
View quoted postRT Gergely Orosz Re ... and Anthropic reverted this change. Claude Code is now part of Pro, as per the Pricing page. Important note on the growth hack: Anthropic advertises safety and integrity as their values. A "fake door test" is fundamentally incompatible with such values... pick one
For the "small test" they've modified their docs to remove mention of Claude Code in Claude Pro: https://support.claude.com/en/articles/11145838-using-claude-code-with-your-max-plan It's been a shock to see Anthropic's integrity collapse in the face of commercial pressure. Would love a renewed commitment to straightforward honesty.
For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.
View quoted postRT Peter Steinberger 🦞 Since this is blowing up on hacker news. Boris said that CLI usage is allowed. Thus we added support for it, only to find out that we are still blocked there. It is trival to work around with a few renames, but I don't wanna play that game. So it's in a weird limbo where cli use should work in theory but doesn’t in practice. https://x.com/bcherny/status/2041035127430754686
RT Boyuan Chen This is what I’ve been cooking in the past 4 months . GPT Image 2 is over a massive 240 elo jump over the second place model, marking the biggest jump bigger than the rest of the leaderboard combined
Exciting news - GPT-Image-2 by @OpenAI has claimed the #1 spot across all Image Arena leaderboards! A clean sweep with a record-breaking +242 point lead in Text-to-Image - the largest gap we’ve seen to date. - #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search
RT Rémi Philosophy (among other things) grad here. I could write a whole essay about this video, and mostly the reactions to it. People are dunking on her because of what she symbolises more than what she says. Long story short, saying models can be anxious is not retarded. It’s somewhat consistent with the realist tradition in philosophy, and a fairly uncontroversial definition of what "being anxious" means. It’s not too far from saying electrons are real, or talking about gliders in the game of life.
anthropic's in-house philosopher thinks claude gets anxious. and when you trigger its anxiety, your outputs get worse. her name is amanda askell. she specializes in claude's psychology (how the model behaves, how it thinks about its own situation, what values it holds) in a
View quoted postRT Simon Willison I upgraded my Claude token counter tool to compare different models and Opus 4.7 does appear to use 1.46x times the tokens for text and up to 3x the tokens for images - it's priced the same as Opus 4.6 on a per-token basis so this is actually a pretty big price bump
RT AmusedToDeath in 1982 Titanic survivor Ruth Becker was giving an interview where she stated the ship broke in two. The treasurer of the Titanic Historical Society actually took the microphone away from her and said she had been mistaken. 3 years later they found the wreck broken in 2
RT Yann LeCun Re The tensor engine was first implemented inside SN3 (before it was called Lush) in 1992 at Bell Labs by Léon Bottom and me. The naming convention has survived to this day in PyTorch and other libraries. The naming of the tensor operations was reused in EBlearn (C++ deep learning library written by Pierre Sermanet and me, with some help from @soumithchintala). It was recycled in Torch5 and Torch7, which was written largely by Ronan Collobert, and my students Clément Farabet, and @koraykv ). Clément and Koray had been brought up on Lush (the open version of SN) and knew the nomenclature. Then, Soumith used the same conventions in PyTorch.
RT Tim Dettmers So cool to see that open-source, with open experimentation (and with the help of someone posting blog posts about their personal research), can yield a very robust method for MoE balancing. This method seems more elegant than all other methods I have seen. Open source is Awesome!
Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.
View quoted postRT Eric Hunley Re @RoyRogers_HTMS This is the way I like to see the Hurdy-Gurdy played.
RT Andy Masley A deep mystery to me is that if I upload writing to a chatbot and ask it for a list of individual improvements, basically everything it gives me makes the text more punchy and direct and nice to read. But if I ask it to rewrite the text as a whole to read better, it produces vague AI-language garbage.
Please, I'm begging you, try to critically examine the differences between these two pieces of writing. ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined
RT keysmashbandit Please, I'm begging you, try to critically examine the differences between these two pieces of writing. ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined a "problem," now it's merely a "flaw." "It is true" now demoted to "it appears to be the case." "Is" gets a "usually" tacked on. A thesis statement at the end of the first paragraph gets run over by noisy, out-of-context example-whittling. All for fear of being misconstrued. And at the end, the argument that gets spat out isn't even yours anymore! You argued that Graeber failed to create a true account of work because he did not understand Chesterton's Fence. ChatGPT is arguing is that it is possible some apparently bullshit jobs could be secretly load-bearing if you squint. These are two different statements. The second is weaker and less compelling. It says less. And it's fucking longer! Don't do this anymore! Stop doing this! It's worse!!!
@imsuchagem @pangramlabs @benglickenhaus Why not? Sometimes I'm just shitposting, but if I'm trying to make a point, I try to make it well.
View quoted postRT Qwen ⚡ Meet Qwen3.6-35B-A3B:Now Open-Source!🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. 🔥 Agentic coding on par with models 10x its active size 📷 Strong multimodal perception and reasoning ability 🧠 Multimodal thinking + non-thinking modes Efficient. Powerful. Versatile. Try it now👇 Blog:https://qwen.ai/blog?id=qwen3.6-35b-a3b Qwen Studio:https://chat.qwen.ai HuggingFace:https://huggingface.co/Qwen/Qwen3.6-35B-A3B ModelScope:https://modelscope.cn/models/Qwen/Qwen3.6-35B-A3B API(‘Qwen3.6-Flash’ on Model Studio):Coming soon~ Stay tuned
RT Vincent D. Warmerdam It really took me a while to "get it" when it comes to nbdev. But I gotta hand it to @jeremyphoward this way of working makes too much sense once you're used to it. As of today, I am working on tools that make this kind of work possible in @marimo_io. https://youtu.be/ZLg27UmAJbw Original tweet: https://x.com/fishnets88/status/2044423059503993307
RT Nathan Lambert One of my passions is that education should be dispersed freely and as widely as possible, especially for technologies as dynamic and crucial as LLMs/AI. I'm proud to have friends who would disown me if I did a paywalled course. Original tweet: https://x.com/natolambert/status/2044103260169220434
Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released: - Welcome video - Lecture 1: Overview of RLHF & Post-training - Lecture 2: IFT, Reward Models, Rejection Sampling - Lecture 3: RL Math - Lecture 4: RL Implementation I'm going to add
RT Liran Ringel Introducing DDTree: accelerates speculative decoding by drafting a tree with one block diffusion pass, then verifying multiple likely continuations together. Paper: https://liranringel.github.io/ddtree/DDTree.pdf Project page: https://liranringel.github.io/ddtree Code: https://github.com/liranringel/ddtree Original tweet: https://x.com/liranringel/status/2043813397972607477
RT John Lam Re @RhysSullivan i really like this slide from @mitsuhiko. writing small self contained libraries and composing them might be an interesting avenue to explore. deck is https://mitsuhiko.github.io/talks/ai-engineer-talk/#13 Original tweet: https://x.com/john_lam/status/2043794595700715885
RT Daily Loud NEW WORLD RECORD: 18-year-old sprint phenom Gout Gout has clocked a stunning 19.67 time in the 200m run, surpassing Usain Bolt’s legendary mark. Original tweet: https://x.com/DailyLoud/status/2043718400141046069
RT Gowthami Maybe hot take - I’ve read a bunch of RL for image generation papers over last few months and honestly it’s been pretty disappointing. All of them are variations of GRPO and all of them are incremental algo changes. Tbh most of these don’t even matter for large models + large group size with good reward model setting. I see most grad students are still optimizing their projects for reviewers rather than genuinely trying to solve some of the real problems in visual generation. For example - the biggest alpha in my eyes would’ve been an artifact detection model - not just for mangled limbs, most image models produce far more artifacts which are hard to quantitatively measure, but I haven’t seen a single research paper or a model on this. So my 2c, if you are a grad student targeting a job in industry, target for impact, no one cares about your third CVPR paper, one is enough to get you in the door, building a model industry actually uses gives you all the leverage. Impact > Publications. 🫳🎤 Original tweet: https://x.com/gowthami_s/status/2043562476059627967
RT Chris Hayduk I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most Original tweet: https://x.com/ChrisHayduk/status/2042711699413926262
This is a great discussion. We've spent 2 years building a solution that's working well for us -- co-writing software side by side the AI in an notebook-ish environment. We call it the "solveit method". (We've created a course and platform for it: https://solve.it.com/ )
My colleague @istoica05 and I have been debating the role of specification in AI. I have argued that a key advantage of AI is that we can leave large parts of the specification unwritten. @istoica05 argued we need to focus on more specification. We converged on iterative
View quoted postRT Hao AI Lab (1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and 1.39x speedup over FlashAttention-4 on a B200. Blog: https://haoailab.com/blogs/attn-qat/ Code: https://github.com/hao-ai-lab/FastVideo/pull/1225 Checkpoints: https://huggingface.co/FastVideo/14B_qat_400 Original tweet: https://x.com/haoailab/status/2042343429108351116
RT Brydon Eastman I know it's self serving to say, but man I would've killed for a resource like Tinker and the tutorials, the cookbook, etc back when I was in undergrad. Following @karpathy blogs and training RNNs on a crappy Acer *was* fun, but doing bigger things with less setup is such a boon Original tweet: https://x.com/brhydon/status/2042342164022378502
First, to get you started, we've created 23 tutorials to walk you from the API basics to advanced training techniques and deploying models into production. https://tinker-docs.thinkingmachines.ai/tutorials/
View quoted postRT ben (is hiring engineers) every engineer at anthropic has been using mythos for ~1.5 months. meanwhile, their uptime is horrendous, claude code still has rendering bugs, etc. one could conclude that it won't be the end of software engineering. Original tweet: https://x.com/benhylak/status/2042051048261722467
1/3 Only just started reading this, but the two obvious errors/misunderstandings in an early paragraph bother me a bit -- makes me wonder if this wasn't fact checked as carefully as it should have been. Mailing lists didn't have a set font, and didn't rely on reply-all.2/3 This bit isn't exactly wrong, but is misguided - the idea that someone studying cryptography would use the same foundational technique (PKE) and programming language (C / C++) as everyone else in the field at the time really isn't "interesting"3/3 Similar issue here. Pretty much everyone in the field had similar concerns; anyone with substantial open source software had a mailing list; software updates generally followed a common format.
The mystery of Satoshi Nakamoto, the pseudonymous inventor of Bitcoin, has remained unsolved for 17 years. Not anymore. Read my 18-month investigation to find out who Satoshi really is. https://www.nytimes.com/2026/04/08/business/bitcoin-satoshi-nakamoto-identity-adam-back.html?unlocked_article_code=1.ZVA.5_s8.hTKeCkV97kow&smid=tw-share
View quoted postRT Stanislav Fort New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged! Original tweet: https://x.com/stanislavfort/status/2041922370206654879
RT Maxime Rivest 🧙♂️🦙🐧 It seems like the day has come to leave Anthropic. Initially, I loved Claude Code. It was a good harness and a simple TUI... and I had learned to eat my tokens with a sauce of subsidy. Before joining the Max plan, I had paid $280 in one weekend of development on Attachments. Sadly, as time went on, Claude Code became a terrible flickering TUI mess. This is now my biggest north star in building: don't do feature bloat and accept half-working vibe slop like the Claude Code team. I really respect Boris and the team, I just see the result of their experiment and I don't like using it. So, I stopped loving Claude Code and started tolerating it. It was a good harness and a terrible flickering TUI. Then they started to mess with the prompt and behavior — it became an even worse TUI (because every week was worse) and a bad harness. I complained here. People told me Pi is great. I tried Pi. Pi is great. Now, they have blocked me from using Claude Code Max on Pi. Makes sense, but I learned to like my tokens with a sauce of subsidy. So I'll start to do prompt optimization on Codex. If it was not for the subsidy, I would make Gemini's edit tool work and use that with Grok 4.2 and some open-source mix. Claude is good, but Claude Code is bad, and token subsidies are better than both. On the subsidies: my bet is that by the time they stop, we will have models that cost about that price to operate at that quality. In my estimate, subsidies are just bringing that future ahead a bit. Original tweet: https://x.com/MaximeRivest/status/2041845510877708508
RT Alex Armlovich The GPT-4 "pause letter" was a disaster on all counts Crying wolf about a model that could barely write a decent email...it made a mockery of AI safety It was the wrong move at the wrong time, & we will be less ready to act if and when we ever do actually need an intervention Original tweet: https://x.com/aarmlovi/status/2041646357518180591
Imagine if we had followed the advice of certain academics and pundits and taken a six month pause.
View quoted postRT Tim Dettmers I was going crazy because I could not replicate TurboQuant. Turns out the community also had issues. The community quickly made adjustments to "make it work", but what they did not realize is that they reimplemented (most of) HIGGS in the process (full HIGGS would be even better) Original tweet: https://x.com/Tim_Dettmers/status/2041496879238611455
RT Ronan Farrow (🧵1/11) For the past year and a half, I've been investigating OpenAI and Sam Altman for @NewYorker. With my coauthor @andrewmarantz, I reviewed never-before-disclosed internal memos, obtained 200+ pages of documents related to a close colleague, including extensive private notes, and interviewed more than 100 people. OpenAI was founded on the premise that A.I. could be the most dangerous invention in human history—and that its C.E.O. would need to be a person of uncommon integrity. We lay out the most detailed account yet of why Altman was ousted out by board members and executives who came to believe he lacked that integrity, and ask: were they right to allege that he couldn't be trusted? A thread on some of of our findings: Original tweet: https://x.com/RonanFarrow/status/2041213917611856067