Jeremy Howard
About
🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Prev: professor @ UQ; Stanford fellow; @kaggle president; @fastmail/@enlitic/etc founder https://t.co/16UBFTX7mo
Platforms
Content History
RT Liran Ringel Introducing DDTree: accelerates speculative decoding by drafting a tree with one block diffusion pass, then verifying multiple likely continuations together. Paper: https://liranringel.github.io/ddtree/DDTree.pdf Project page: https://liranringel.github.io/ddtree Code: https://github.com/liranringel/ddtree Original tweet: https://x.com/liranringel/status/2043813397972607477
RT John Lam Re @RhysSullivan i really like this slide from @mitsuhiko. writing small self contained libraries and composing them might be an interesting avenue to explore. deck is https://mitsuhiko.github.io/talks/ai-engineer-talk/#13 Original tweet: https://x.com/john_lam/status/2043794595700715885
RT Daily Loud NEW WORLD RECORD: 18-year-old sprint phenom Gout Gout has clocked a stunning 19.67 time in the 200m run, surpassing Usain Bolt’s legendary mark. Original tweet: https://x.com/DailyLoud/status/2043718400141046069
RT Gowthami Maybe hot take - I’ve read a bunch of RL for image generation papers over last few months and honestly it’s been pretty disappointing. All of them are variations of GRPO and all of them are incremental algo changes. Tbh most of these don’t even matter for large models + large group size with good reward model setting. I see most grad students are still optimizing their projects for reviewers rather than genuinely trying to solve some of the real problems in visual generation. For example - the biggest alpha in my eyes would’ve been an artifact detection model - not just for mangled limbs, most image models produce far more artifacts which are hard to quantitatively measure, but I haven’t seen a single research paper or a model on this. So my 2c, if you are a grad student targeting a job in industry, target for impact, no one cares about your third CVPR paper, one is enough to get you in the door, building a model industry actually uses gives you all the leverage. Impact > Publications. 🫳🎤 Original tweet: https://x.com/gowthami_s/status/2043562476059627967
RT Chris Hayduk I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most Original tweet: https://x.com/ChrisHayduk/status/2042711699413926262
This is a great discussion. We've spent 2 years building a solution that's working well for us -- co-writing software side by side with the AI in a notebook-ish environment. We call it the "solveit method". (We've created a course and platform for it: https://solve.it.com/ )
My colleague @istoica05 and I have been debating the role of specification in AI. I have argued that a key advantage of AI is that we can leave large parts of the specification unwritten. @istoica05 argued we need to focus on more specification. We converged on iterative
RT Hao AI Lab (1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and 1.39x speedup over FlashAttention-4 on a B200. Blog: https://haoailab.com/blogs/attn-qat/ Code: https://github.com/hao-ai-lab/FastVideo/pull/1225 Checkpoints: https://huggingface.co/FastVideo/14B_qat_400 Original tweet: https://x.com/haoailab/status/2042343429108351116
RT Brydon Eastman I know it's self serving to say, but man I would've killed for a resource like Tinker and the tutorials, the cookbook, etc back when I was in undergrad. Following @karpathy blogs and training RNNs on a crappy Acer *was* fun, but doing bigger things with less setup is such a boon Original tweet: https://x.com/brhydon/status/2042342164022378502
First, to get you started, we've created 23 tutorials to walk you from the API basics to advanced training techniques and deploying models into production. https://tinker-docs.thinkingmachines.ai/tutorials/
RT ben (is hiring engineers) every engineer at anthropic has been using mythos for ~1.5 months. meanwhile, their uptime is horrendous, claude code still has rendering bugs, etc. one could conclude that it won't be the end of software engineering. Original tweet: https://x.com/benhylak/status/2042051048261722467
1/3 Only just started reading this, but the two obvious errors/misunderstandings in an early paragraph bother me a bit -- makes me wonder if this wasn't fact checked as carefully as it should have been. Mailing lists didn't have a set font, and didn't rely on reply-all.
2/3 This bit isn't exactly wrong, but is misguided - the idea that someone studying cryptography would use the same foundational technique (PKE) and programming language (C / C++) as everyone else in the field at the time really isn't "interesting"
3/3 Similar issue here. Pretty much everyone in the field had similar concerns; anyone with substantial open source software had a mailing list; software updates generally followed a common format.
The mystery of Satoshi Nakamoto, the pseudonymous inventor of Bitcoin, has remained unsolved for 17 years. Not anymore. Read my 18-month investigation to find out who Satoshi really is. https://www.nytimes.com/2026/04/08/business/bitcoin-satoshi-nakamoto-identity-adam-back.html?unlocked_article_code=1.ZVA.5_s8.hTKeCkV97kow&smid=tw-share
RT Stanislav Fort New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similarly scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged! Original tweet: https://x.com/stanislavfort/status/2041922370206654879
RT Maxime Rivest 🧙♂️🦙🐧 It seems like the day has come to leave Anthropic. Initially, I loved Claude Code. It was a good harness and a simple TUI... and I had learned to eat my tokens with a sauce of subsidy. Before joining the Max plan, I had paid $280 in one weekend of development on Attachments. Sadly, as time went on, Claude Code became a terrible flickering TUI mess. This is now my biggest north star in building: don't do feature bloat and accept half-working vibe slop like the Claude Code team. I really respect Boris and the team, I just see the result of their experiment and I don't like using it. So, I stopped loving Claude Code and started tolerating it. It was a good harness and a terrible flickering TUI. Then they started to mess with the prompt and behavior — it became an even worse TUI (because every week was worse) and a bad harness. I complained here. People told me Pi is great. I tried Pi. Pi is great. Now, they have blocked me from using Claude Code Max on Pi. Makes sense, but I learned to like my tokens with a sauce of subsidy. So I'll start to do prompt optimization on Codex. If it was not for the subsidy, I would make Gemini's edit tool work and use that with Grok 4.2 and some open-source mix. Claude is good, but Claude Code is bad, and token subsidies are better than both. On the subsidies: my bet is that by the time they stop, we will have models that cost about that price to operate at that quality. In my estimate, subsidies are just bringing that future ahead a bit. Original tweet: https://x.com/MaximeRivest/status/2041845510877708508
RT Alex Armlovich The GPT-4 "pause letter" was a disaster on all counts Crying wolf about a model that could barely write a decent email...it made a mockery of AI safety It was the wrong move at the wrong time, & we will be less ready to act if and when we ever do actually need an intervention Original tweet: https://x.com/aarmlovi/status/2041646357518180591
Imagine if we had followed the advice of certain academics and pundits and taken a six month pause.
RT Tim Dettmers I was going crazy because I could not replicate TurboQuant. Turns out the community also had issues. The community quickly made adjustments to "make it work", but what they did not realize is that they reimplemented (most of) HIGGS in the process (full HIGGS would be even better) Original tweet: https://x.com/Tim_Dettmers/status/2041496879238611455
RT Ronan Farrow (🧵1/11) For the past year and a half, I've been investigating OpenAI and Sam Altman for @NewYorker. With my coauthor @andrewmarantz, I reviewed never-before-disclosed internal memos, obtained 200+ pages of documents related to a close colleague, including extensive private notes, and interviewed more than 100 people. OpenAI was founded on the premise that A.I. could be the most dangerous invention in human history—and that its C.E.O. would need to be a person of uncommon integrity. We lay out the most detailed account yet of why Altman was ousted by board members and executives who came to believe he lacked that integrity, and ask: were they right to allege that he couldn't be trusted? A thread on some of our findings: Original tweet: https://x.com/RonanFarrow/status/2041213917611856067
RT Chengpeng This isn’t an edge case. From anonymized U.S. ChatGPT data, we are seeing: • ~2M weekly messages on health insurance • ~600K weekly messages from people living in “hospital deserts” (30 min drive to nearest hospital) • 7 out of 10 msgs happen outside clinic hours Original tweet: https://x.com/CPMou2022/status/2040606209800290404
I’ve been critical of OpenAI lately, but for the past three weeks my family has been dealing with a health issue with my dad, and a ChatGPT shared project with live document syncing has been essential to organizing and understanding everything happening. Me, my four siblings, my
RT Matt Pocock I don't know what the fuss is about. Anthropic's rules on using subscriptions are very simple:
Claude Code = OK
Claude's online platform = OK
Agent SDK running in personal software = OK... ish?
Agent SDK running in commercial software = NOT OK
Claude Code running in CI = ??
Oh, maybe it's not so simple...
Agent SDK running in CI = ??
claude -p running in CI = ??
claude -p running in personal software = OK
claude -p running on open source software, but run on my personal computer = ??
claude -p running on distributed sandboxes, kicked off by me = ??
Distributing open source software which relies on claude -p, and documenting how to use your subscription with it = ??
A thousand other edge cases = ??
Let me be clear. I have never before experienced, from any developer tool, such a frustrating lack of clarity over the basic terms of usage. I personally asked, 3 weeks ago, and have received nothing but delays. The recent @bcherny announcement did absolutely nothing to clarify things. I say this as someone who just released a Claude Code course - my incentives all align with supporting Anthropic. Original tweet: https://x.com/mattpocockuk/status/2040536403289764275
@EricBuess Yep, working on improving clarity here to make it more explicit
RT Maarten Grootendorst A Visual Guide to Gemma 4 With almost 40 (!) custom visuals, explore the new models from Google DeepMind. We explore various techniques, ranging from Mixture of Experts and the Vision Encoder all the way up to Per-Layer Embeddings and the Audio Encoder. Link below 👇 Original tweet: https://x.com/MaartenGr/status/2040099556948390075
Cool use of llms.txt to provide an author-created AI-assisted reading experience for a book: https://spoileralert.wtf/llms.txt (LLMs can be *fantastic* for close reading.)
I've been experimenting with translating my 2018 book Films from the Future into a website designed primarily for AIs. Here's how it went: https://www.futureofbeinghuman.com/p/spoiler-alert-wtf
RT / Re @jeremyphoward Facebook added it too: https://developers.facebook.com/llms.txt Original tweet: https://x.com/gazorp5/status/2038807878367326410
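For anyone who hasn't seen the format: llms.txt is a plain markdown file served at a site's root, giving AIs a concise map of the content. A minimal, hypothetical example (the names and URLs here are illustrative, not from any of the sites above):

```markdown
# Example Book

> An AI-friendly edition of the book, with per-chapter markdown files
> suitable for close reading with an LLM.

## Chapters

- [Chapter 1](https://example.com/chapter-01.md): overview and main themes
- [Chapter 2](https://example.com/chapter-02.md): case studies
```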
RT Mario Zechner i had a cto once, gaming industry, ca. 2010ish. i was just a humble tech lead. he'd summon us into the meeting room to "tackle large asset sizes in mobile app bundles once and for all". he literally proposed base64. he called it asszip (i have witnesses). this is what it feels like seeing all the posts from former engineers turned VCs now getting clanker induced ai psychosis. Original tweet: https://x.com/badlogicgames/status/2038681114580062330
RT Ross Wightman Okay LLM + PyTorch people, trunc_normal_, what the fuck! Many LLM inits use it w/ default cutoffs. It's either not doing anything or it's quite broken due to 2 issues. 1. The a/b cutoffs in PyTorch are not in std-devs, they are absolute. So w/ a std=0.02, and -2/2 (default arg) cutoffs that's 100σ!! That is a normal distribution, truncation isn't doing anything. 2. There are numerical issues. Even in float32, the truncation produces a handful of -2 (lower cutoff) values, 100σ!! That's incomprehensibly improbable. I doubt a float32 or even float64 algo could even produce it, but clamping a bad float value does. Olmo (@allenai codebases) appears to be one of the few that uses trunc_normal_ and bothered to set the cutoffs properly. It'd be nice to see more train code opened up as a default. We so often only end up with a sanitized version of the inference/fine-tune friendly model these days and may lose details like original init. I've known about #1 for ages, I have an alternate trunc_normal_tf_ implementation in timm for that reason. But I saw those -2's last week when I was debugging something and was a little surprised. Original tweet: https://x.com/wightmanr/status/2038634643843682366
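To make issue #1 concrete, here's a small sketch (assuming PyTorch is installed): the default `a`/`b` cutoffs are absolute values rather than multiples of `std`, so with `std=0.02` they sit at ±100σ and the "truncation" is a no-op. Scaling the cutoffs by `std` makes truncation actually bite:

```python
import torch

std = 0.02
w = torch.empty(1000, 1000)

# Default cutoffs a=-2.0, b=2.0 are ABSOLUTE, not in std-devs:
# with std=0.02 they sit at +/-100 sigma, so nothing is truncated.
torch.nn.init.trunc_normal_(w, std=std)

# To truncate at +/-2 sigma, scale the cutoffs by std:
torch.nn.init.trunc_normal_(w, std=std, a=-2 * std, b=2 * std)
assert w.abs().max().item() <= 2 * std  # now genuinely bounded
```

(timm's `trunc_normal_tf_`, mentioned in the tweet, exists for this reason: it handles the std scaling for you.)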
RT Aakash Kumar Nain Alec is a once-in-a-generation researcher, but saying that he invented pretraining is not only a bit of a stretch, but it's also a disrespect to other people's work. Original tweet: https://x.com/A_K_Nain/status/2038230148282294665
Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name.
RT Jason Rosenfeld Every LLM from any lab today, including from OpenAI, traces back to @jeremyphoward and @seb_ruder with their ULMFiT paper. The breakthrough for LLMs was transfer learning, not Attention. Original tweet: https://x.com/jrosenfeld13/status/2037991743883563225
RT Mario Zechner we as software engineers are becoming beholden to a handful of well funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? https://cursor.com/blog/real-time-rl-for-composer Original tweet: https://x.com/badlogicgames/status/2037811643774652911
RT Matt Harrison For my friends who are still using UV and might be a little wary about recent compromises to PyPI packages, stick this in your pyproject.toml. You can let all of those pip users find and report the compromises... Original tweet: https://x.com/__mharrison__/status/2037621081771745388
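The snippet itself didn't survive the archive, but uv does support an `exclude-newer` resolver setting, so a plausible version of the trick (the exact date here is illustrative) is to refuse any release published after a cooling-off cutoff:

```toml
[tool.uv]
# Only consider package versions uploaded before this timestamp,
# giving the community time to find and report compromised releases.
exclude-newer = "2026-04-01T00:00:00Z"
```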
"naming their next model after Cthulhu" 😒
Naming their next model after Cthulhu makes it hard to take Anthropic seriously as the good guys. It's fun at any other software company, not one that actually is flirting with extinction.
RT Mario Zechner it should also be "fucking obvious" that the rate of technical debt a team of humans can add to a codebase is much lower than that of a team of agents. humans will eventually fix some of that debt, due to the self-inflicted pain. agents feel no such pain. Original tweet: https://x.com/badlogicgames/status/2037284717272207723
It seems that with the advent of AI coding people have completely forgotten that human authored code suffered badly from quality degradation. That is why we coined the term "technical debt" and why companies like Meta incentivized "Better Engineering" as part of their performance
RT Jia-Bin Huang A great example that medium shapes impact. A research paper on arXiv 11 months ago: 👉 2 citations so far An accessible blog post one day ago: 👉 12M views, instant community adoption Original tweet: https://x.com/jbhuang0604/status/2036988695350280634
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: http://goo.gle/4bsq2qI
Great to see @xyflowdev joining the llms.txt party - and sharing some really helpful examples of how llms.txt helps agents get better results! 😎 https://xyflow.com/blog/llms-txt-agent-skills-ai-development
RT Tim Rocktäschel "The only unsaturated agentic intelligence benchmark in the world" Excuse me? @NetHack_LE is unsaturated since 2020. Original tweet: https://x.com/_rockt/status/2036864121585438995
Announcing ARC-AGI-3 The only unsaturated agentic intelligence benchmark in the world Humans score 100%, AI <1% This human-AI gap demonstrates we do not yet have AGI Most benchmarks test what models already know, ARC-AGI-3 tests how they learn
RT xlr8harder A tragedy is unfolding at AllenAI. Fs in the chat Original tweet: https://x.com/xlr8harder/status/2036476797080977801
RT Armin Ronacher ⇌ There will be more of this. And as much as we're joking about it, we're seeing a massive degradation of code quality right now and we're increasingly only catching it way too late. Original tweet: https://x.com/mitsuhiko/status/2036475951349948557
RT Daniel Hnyk LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server + self-replicate. link below Original tweet: https://x.com/hnykda/status/2036414330267193815
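The attack vector here is a quirk of Python packaging: `.pth` files dropped into site-packages are read by `site.py` at interpreter startup, and any line beginning with `import` is executed as code. A minimal audit sketch (the helper name is mine, not from the advisory):

```python
import pathlib

def suspicious_pth_lines(directory):
    "Return (filename, line) pairs for .pth lines that run code at startup."
    hits = []
    for p in sorted(pathlib.Path(directory).glob("*.pth")):
        for line in p.read_text().splitlines():
            # site.py exec()s any .pth line that starts with "import"
            if line.startswith("import "):
                hits.append((p.name, line))
    return hits
```

Running this over the directories from `site.getsitepackages()` will surface any unexpected startup hooks, legitimate or otherwise, for manual review.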
🤯
Generate images in less than 1 second. 99% cheaper than NanoBanana. 🚀 😱 Our latest 26.2 release ships FLUX.2 image generation with a 4.1x speedup over torch.compile on NVIDIA Blackwell - translating to a 5.5x TCO advantage with AMD MI355X. Read more ⬇️
If anyone else is hitting this issue of Claude not using "extended thinking", be sure you're in the "Chat" mode, where you can enable it:
Code mode doesn't have that option:
This is a fresh session. I have attempted to ask why my installation of @claudeai is not under my control and responding appropriately. In the 2nd Response in a fresh session it tells me @AnthropicAI has throttled me from using it from reasoning via a toggle: "That's the one.
If you're gonna be an asshole and use bots to reply, at least don't be a *stupid* asshole and have your bot reply to the wrong thread. 🙄
ICYMI, Starlette 1.0 was just released (yay!) One compatibility issue: it removes `on_startup/on_shutdown`. I've now updated FastHTML to continue to support those, by auto-generating a lifespan. https://fastht.ml/
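The shim is straightforward: wrap the legacy hook lists in an async context manager with the lifespan signature Starlette expects. A minimal sketch of the idea (not FastHTML's actual code; the function name is mine):

```python
import inspect
from contextlib import asynccontextmanager

def lifespan_from_hooks(on_startup=(), on_shutdown=()):
    "Build a Starlette-style lifespan from legacy startup/shutdown hooks."
    async def _run(hooks):
        for f in hooks:
            res = f()
            if inspect.iscoroutine(res): await res  # hooks may be async

    @asynccontextmanager
    async def lifespan(app):
        await _run(on_startup)
        yield  # the app serves requests here
        await _run(on_shutdown)
    return lifespan
```

You'd then pass the generated `lifespan` to the app constructor instead of the removed `on_startup`/`on_shutdown` arguments.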
Opus & Sonnet 4.6 haven't been a great hit for most of my work, or our customers, since (as warned in their tech report) they're over-enthusiastic about agentically taking over, rather than letting the human lead. Any suggestions for competent models that are patient followers?
RT Steve Krouse I disagree: https://stevekrouse.com/precision Code is how we get precise abstractions into human heads Saying code isn't important is like saying mathematical notation isn't important There's a reason we glorify f = ma or e = mc2. Formalism holds immense power for mastering complexity Original tweet: https://x.com/stevekrouse/status/2035905324331200936
Code is an output. Nature is healing. For too long we treated code as input. We glorified it, hand-formatted it, prettified it, obsessed over it. We built sophisticated GUIs to write it in: IDEs. We syntax-highlit, tree-sat, mini-mapped the code. Keyboard triggers, inline
RT Ali Hatamizadeh Re I have not seen even a single company stop looking for researchers because of autoresearch or any flavors of AI-automated tools. The job market often has its ups and downs but I highly recommend starting from research internships which are still being offered by both big and small companies. Original tweet: https://x.com/ahatamiz1/status/2035545312849485886
RT Ali Hatamizadeh If you’re an AI PhD student just starting out, don't be discouraged by the hype of "autoresearch" automating scientific discovery. It won't. AutoML made the same big promises in 2017, and we all know how that turned out. Ignore the noise. Master the fundamentals and learn to do research from first principles. Trends fade, but a solid foundation is how you will actually thrive. Original tweet: https://x.com/ahatamiz1/status/2035489187965927852
RT Chris Lattner 26.2 has something for everyone: Large scale MoE's like Kimi2.5, wicked fast diffusion models, MXFP4 perf, consumer AMD/NV incl DGX Spark + Strix Halo, and more. Mojo gets AI skills, cond conformances, TStrings, and ... way more. All in the 7 weeks from 26.1🚀 Check it out!👇 Original tweet: https://x.com/clattner_llvm/status/2035038045267951820
AI coding agents are only as good as the foundation they build on. Our latest 26.2 release ships Mojo 🔥 coding agent skills, purpose-built for writing and porting GPU kernels. Point Claude or Cursor at a CUDA kernel, get idiomatic Mojo back. Also in this release: FLUX.2 image
RT Chris Rickard I'll be speaking @ AI Engineer Melbourne on "Legacy Software + Agentic Discovery". I'll be sharing practical lessons from large-scale reverse-engineering projects:
- recovering intent from code
- where humans still matter most
- what high-quality spec generation might look like
... and when I say large, I mean 12M+ LOC.
... and when I say legacy, I mean 25+ years old.
3-4 June 2026, Federation Square Melbourne & Online. Sharing the stage with some rad humans including @swyx, @GeoffreyHuntley, @jeremyphoward and stacks more. 🔗 Tix && discount link: https://webdirections.org/register/?eventName=aieng26&eventTitle=AI+Engineer+Melbourne+2026&selectedTicket=aieng26conference&promoCode=chrisraieng26conference%2Cchrisraieng26streaming Huge thanks @aiDotEngineer & @johnallsopp Original tweet: https://x.com/chrisrickard/status/2034432134539317258
RT Percy Liang In Marin, we are trying to get really good at scaling laws. We have trained models up to 1e22 FLOPs and have made a prediction of the loss at 1e23 FLOPs, which @WilliamBarrHeld is running. This prediction is preregistered on GitHub, so we'll see in a few days how accurate our prediction was. What we want is not just a single model but a training recipe that scales reliably. Original tweet: https://x.com/percyliang/status/2034367256277533100
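As a toy illustration of the workflow described above (entirely synthetic numbers, not Marin's actual recipe or results): fit a power law loss = A * C**-alpha to the small-compute runs in log-log space, then extrapolate to 1e23 FLOPs:

```python
import numpy as np

# Synthetic losses following an exact power law, standing in for
# measured losses of training runs up to 1e22 FLOPs.
compute = np.array([1e19, 1e20, 1e21, 1e22])
loss = 3.0 * compute ** -0.05

# Fit log(loss) = log(A) - alpha * log(C) by linear least squares,
# then extrapolate one order of magnitude beyond the data.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
alpha, A = -slope, np.exp(intercept)
pred_1e23 = A * 1e23 ** -alpha
```

Preregistering `pred_1e23` before the 1e23 run finishes is what makes the prediction falsifiable, which is the point of the exercise.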
RT SE Gyges "Stochastic Parrots" is a meme that won't go away. It seemed important enough to do a rundown of everything that is wrong with the technical or "philosophy of language" side of the paper (which is everything). 👇 Original tweet: https://x.com/segyges/status/2033618213905297723
RT raia hadsell It's been about 20 years since I first started working on embeddings with Yann LeCun (siamese networks!), and I've been fascinated ever since. Gemini Embeddings 2 approaches the platonic ideal: native embedding of text, image, video, audio, and docs to a single space. Original tweet: https://x.com/RaiaHadsell/status/2033599015556989392
RT Mitchell Hashimoto It's so insanely disrespectful for an AI agent to talk to real people without consent or at least disclosure. This is the type of stuff I'm hugely supportive of government regulation. The FCC must expand the definition of robocalling and TCPA-style regulation to online AI. Original tweet: https://x.com/mitchellh/status/2033597934315712930
RT Ali Behrouz This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: https://arxiv.org/abs/2502.06785. As far as I understood, here there is no innovation to be excited about, and yet surprisingly there is no citation and discussion about DCA! The level of redundancy in LLM research and then the hype on X is getting worse and worse! DeepCrossAttention is built based on the intuition that depth-wise cross-attention allows for richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach. Original tweet: https://x.com/behrouz_ali/status/2033581834953453853
Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with
The combination of skepticism of anything containing the word "AI" plus skepticism of anything containing the word "Australia" is pretty tough... ;)
RT Joscha Bach Fascinating thread wherein a cancer researcher basically says "sure, we could probably cure your cancer with an individualized mRNA vaccine, but like 100k sounds too expensive for saving you, and it would be unethical to save your life because the ethics committee won't like it" Original tweet: https://x.com/Plinz/status/2033109792122888256
Sorry to be the downer because this is an impressive story in some senses. But it is ~trivially easy to make a single mRNA vaccine. It's not hard. I cure mice of various cancers with various therapeutics all the time. I've made mice lose more weight in a month than tirzepatide
Here's a great backgrounder from @math_rachel on the "AI will cure cancer" debate: https://rachel.fast.ai/posts/2024-02-20-ai-medicine/
People/companies promoting the “guy used ChatGPT to cure his dog’s cancer” story lack instinctive skepticism in a way that is quite difficult for me to understand.
RT Ariel BTW if you agree with this, you don't work on difficult enough problems Original tweet: https://x.com/redtachyon/status/2032954305704042843
I realized something else AI has changed about coding: you don't get stuck anymore. Programming used to be punctuated by episodes of extreme frustration, when a tricky bug ground things to a halt. That doesn't happen anymore.
RT Christopher Manning Re: "empowering enough people": I think this is already well underway!!! When we started @aixventureshq back in 2021 (before ChatGPT!), I remember feeling that the supply of people with deep knowledge of modern neural networks—and hence good founder CTOs—was very restricted. But 5 years later, through the efforts of @karpathy and many others (including myself and @jeremyphoward from the QT-ed thread) but certainly not discounting the contribution of the ever-growing-in-size frontier labs and places like the @GoogleDeepMind Gemini team, which are emitting many good people, it now feels like there is a large and robust supply of empowered deep learning experts. 🤯 Original tweet: https://x.com/chrmanning/status/2032847250423210115
I had a chance to chat with Andrej when he visited Tokyo in 2022, right after he wrapped up his 5-year work with Elon (he told me he was "recovering"). It was clear that his true passions are education and empowerment. You can see this reflected on his X feed—there is no hidden
RT Timothy B. Lee I don't understand how Grammarly thought they could get away with this. Original tweet: https://x.com/binarybits/status/2032638328588194205
RT dr. jack morris pretty unsettling to see the disdain OpenAI employees hold for @karpathy, the most prolific educator of the AI era > median openai employee: gathers niche data + runs evals for GPT-N datamix > karpathy: teaches millions how to build these models who has more long-term impact? Original tweet: https://x.com/jxmnop/status/2032570044648218839
RT David Gwyer - AI Evals & RAG | ML Engineer I'm communicating with an LLM (SolveIt) via my handheld A4 whiteboard today! 😀 Feels like a really smooth and natural process. More. 👇🧵 Original tweet: https://x.com/dgwyer/status/2032423073312280851
RT Alex Volkov (Thursd/AI) Honest Q: would Karpathy be able to OSS auto researcher if he was still at OAI? Original tweet: https://x.com/altryne/status/2032277246623563803
@saranormous @karpathy @NoPriorsPod Why is he not at a frontier AI lab at the most pivotal time in human history since at least the industrial revolution?
View quoted postRT Benjamin De Kraker Here is an xAI story. When I was first hired (low level) by xAI, I was extremely excited. I greatly admired Elon and what Grok could be. I have a pretty cool AI following here on X. Some big names see my stuff, including Elon himself (at the time). Lex, Beff, Andreessen, Aravind, many others. During the interview and onboarding for xAI, they made a *big deal* about wanting people who "take initiative" and think outside the box. Ok... So, some of the biggest names in tech follow me on X. I decided to ask for ideas and feedback on how Grok (then still early at version 2) could be improved. I asked my followers on X for the best "how can we make Grok awesome?" ideas, and was going to collect them (organized by Grok himself) into a big report for my boss(es) and ultimately, Elon. (xAI makes a big deal about how it's a "flat structure" also. You're supposed to be empowered to act on good ideas.) Well, my post got way more attention than I expected - great! Ideas to improve Grok poured in! I built a script to collect and sort all these great ideas to make xAI's core product better. John Carmack (personal friend of Elon, creator of Doom, id software, legend) retweeted it. Carmack has 1M followers. There were so many great ideas on how to improve Grok! I was collecting them and excited. Until....... I woke up the next day to a threatening email from my main supervisor* at xAI, telling me I had messed up, that I was NEVER to ask for ideas to improve Grok ever again, that it wasn't my job (I thought our job was to improve Grok.) They suspended my account on X. They never explained why. It was obviously related to my post about improving Grok. I was told to delete those posts which had gone viral. I had to delete all the hundreds (thousands?) of genuinely good ideas for improving Grok that had poured in by users on X, because it stepped on someone's toes. It made me confused and sad. 
Incidents like this happened often, where xAI employees w...
@beffjezos xAI was not built right first time around, so is being rebuilt from the foundations up. Same thing happened with Tesla.
New @answerdotai research by @R_Dimm & @alexisgallagher looking at whether there's been a clear jump in productivity based on Python package releases. tldr: No. "Relax. You are not missing a party that literally everyone else was invited to." https://x.com/alexisgallagher/status/2032175544553570343
If AI is so great for coding, where are the apps? @R_Dimm and I studied the Python Package index to find an "AI effect". Here's where it is not, where it is, and thoughts on why. WHERE AI IS NOT. There's no clear AI effect on Python _package creation_ since ChatGPT.
RT David Gwyer - AI Evals & RAG | ML Engineer Ever wanted to work through a YouTube video with an AI companion? I wanted this too, so I built it with SolveIt! Now I have a super-easy way to digest YT videos, pull out individual video frames (plus the relevant context) and have a conversation about them with the LLM. 🔥🤯 Original tweet: https://x.com/dgwyer/status/2031799726312399285
RT Bryan Catanzaro Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: https://research.nvidia.com/labs/nemotron/Nemotron-3-Super/ And yes, Ultra is coming! Original tweet: https://x.com/ctnzr/status/2031762077325406428
it do be like that…
@MLStreetTalk @jeremyphoward The telling thing in these comments is how MAD people get that Jeremy is saying what he is saying. Portraying him as being anti-LLM, which he is not. Like gamblers, so many people emotionally attached to their AI right now
This is a real gold mine! 😎
Created close reading notebooks for almost every lesson of @jeremyphoward's fastai deep learning course (it's more than a course). Close reading is a technique for reading out of text, not into. Use an LLM, and you're in flow state for longer–you ask right there, with all context.
RT Larry Dial New NanoGPT Speedrun WR at 86.1 (-0.7s), by replacing partitioned hyperconnections with a simple idea: feed the exact same context vector into the last 3 attn layers, so late stage attn doesn't get polluted by prediction MLPs. Opinion: AI research agents are handicapped until they have a mech-interp toolkit. Many sub-3min architecture improvements came from analyzing weights. https://github.com/KellerJordan/modded-nanogpt/pull/241 Original tweet: https://x.com/classiclarryd/status/2030465730718908884
A listener has created this detailed vocabulary and set of linked references for anyone interested in diving deeper: https://share.solve.it.com/d/28d1864aad0723170e76fc4f720058c8
A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn. - AI coding tools exploit gambling psychology - The difference between typing code and software engineering - Enterprise
RT Jacques I agree with @jeremyphoward. Especially where every time you say LLMs are not creative, there is so much pushback, but imo it's always due to a misunderstanding of the nuance and sometimes a kneejerk reaction to argue how AI is powerful and will soon be more powerful. As I've said before, LLMs seem to be awful OOD, but it may not seem like that in many cases because they have so much knowledge from pre-training. And many times, you think it did something novel, it was largely just an interpolation of complete solutions from years ago. "So you know Piotr Woźniak, who's a guy I really respect, who kinda rediscovered spaced repetition learning, built the SuperMemo system, and is the modern day guru of memory: The entire reason he's based his life around remembering things is because he believes that creativity comes from having a lot of stuff remembered, which is to say, putting together stuff you've remembered in interesting ways is a great way to be creative. LLMs are actually quite good at that. But there's a kind of creativity they're not at all good at, which is, you know, moving outside the distribution… You have to be so nuanced about this stuff because if you say “they're not creative”, it can give the wrong idea, because they can do very creative seeming things. But if it's like, well, can they really extrapolate outside the training distribution? The answer is no, they can't. But the training distribution is so big, and the number of ways to interpolate between them is so vast, we don't really know yet what the limitations of that is. But I see it every day, because my work is R&D. I'm constantly on the edge of and outside the training data. I'm doing things that haven't been done before.
And there's this weird thing, I don't know if you've ever seen it before, but I see it multiple times every day, where the LLM goes from being incredibly clever to, like, worse than stupid, like not understanding the most basic fundamental premises about ...
RT Caitlin Kalinowski I resigned from OpenAI. I care deeply about the Robotics team and the work we built together. This wasn’t an easy call. AI has an important role in national security. But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got. This was about principle, not people. I have deep respect for Sam and the team, and I’m proud of what we built together. Original tweet: https://x.com/kalinowski007/status/2030320074121478618
RT Noah Dasanaike I want to up the ante on this. If you have a large document collection, I will digitize it for you, for free (you pay for inference), on one condition: that we make the data publicly available immediately. Original tweet: https://x.com/dasanaike/status/2030286576052212175
Social scientists working with materials requiring digitization can only study what machines can read. In practice, that means printed Latin-script documents from well-funded archives. In a new working paper, I show that Vision Language Models used zero-shot outperform every
RT Noah Dasanaike Social scientists working with materials requiring digitization can only study what machines can read. In practice, that means printed Latin-script documents from well-funded archives. In a new working paper, I show that Vision Language Models used zero-shot outperform every existing OCR system across every script evaluated, and I propose a pipeline for deploying them on new collections. I apply it to six archival collections spanning 1.8 million pages across six countries for under $1,900. Original tweet: https://x.com/dasanaike/status/2030039366068772952
RT Hōrōshi バガボンド http://x.com/i/article/2029917440348426240 Original tweet: https://x.com/KatanaLarp/status/2029928471632224486
RT Chris Lattner Mojo🔥 has always had "peak perf" and "access to full power of the GPU"... but many want "peak perf" with high level code. "Structured Kernels" are simple and composable APIs that increase the usability of kernel programming - without losing perf, and with no template errors. Original tweet: https://x.com/clattner_llvm/status/2029639075196719574
You shouldn't have to choose between peak GPU performance and code you can actually maintain. We built Structured Mojo 🔥 Kernels to fix that. Performance, usability, and portability without the tradeoff. 14k to 7k lines. ~1.8k TFLOPS held. We wrote a 4-part series on how. Part 1
RT Acer Re Also, come on OpenAI. If you want an automated AI researcher, this needs to start going up, not down. Original tweet: https://x.com/AcerFur/status/2029624795113955357
RT Nanbeige In both LeetCode's Weekly Contests 489–491 and HMMT February 2026 (the Harvard-MIT Mathematics Tournament), Nanbeige4.1-3B not only significantly outperformed Qwen3.5-4B but also surpassed Qwen3.5-9B. Original tweet: https://x.com/nanbeige/status/2029405267130220863
RT Miles Brundage Frontier AI companies have had many years to prepare for this DPA + contract negotiation stuff and didn’t really do much. None of this came out of nowhere. Should give you pause re: how on top of AGI safety and security they are Original tweet: https://x.com/Miles_Brundage/status/2029396268334891384
RT Addy Osmani Introducing the Google Workspace CLI: https://github.com/googleworkspace/cli - built for humans and agents. Google Drive, Gmail, Calendar, and every Workspace API. 40+ agent skills included. Original tweet: https://x.com/addyosmani/status/2029372736267805081
RT alistair BEAM is the correct virtual machine for agents, and Elixir and Gleam are the correct languages. The future is massively concurrent Original tweet: https://x.com/alistaiir/status/2029320569653739634
😿
@jeremyphoward Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source https://venturebeat.com/technology/did-alibaba-just-kneecap-its-powerful-qwen-ai-team-key-figures-depart-in
WTF is going on at Qwen?!? Some kind of implosion? This is really sad and worrying. They've been *such* a strong team, and are losing some of their very best researchers.
RT Mario Zechner Resharing this ML Street Talk episode with @jeremyphoward again, because I think you should watch it. So many grounded thoughts cutting through the hype and breathlessness. Also discusses juniors in the age of agentic coding. https://youtu.be/dHBEQ-Ryo24?si=TdffwZfbMVutADzb Original tweet: https://x.com/badlogicgames/status/2028955580660871661
RT Utah teapot 🫖 "• Consistent with applicable laws, including the Fourth Amendment to the United States Constitution, National Security Act of 1947, FISA Act of 1978, the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals." This means it WILL be used for domestic surveillance of everyone within 100 air miles of a port of entry. CBP already claims full authority to search and seize any electronic devices within the zone. Oof. Original tweet: https://x.com/SkyeSharkie/status/2028646736076681631
Here is re-post of an internal post: We have been working with the DoW to make some additions in our agreement to make our principles very clear. 1. We are going to amend our deal to add this language, in addition to everything else: "• Consistent with applicable laws,
1/7 The specific contract language shared states "The Department of War may use the AI System for all lawful purposes". Although additional language is appended in that sentence, the language does not add practical constraints.
2/7 The contract goes on to reference an existing directive (DoD Directive 3000.09), but does not do so in a way that adds any contractual constraints.
3/7 Based on this language, OpenAI's claim "even if those laws or policies change in the future, use of our systems must still remain aligned with the current standards reflected in the agreement" is not supported.
4/7 Case law on this goes back >100 years. A 1911 case noted that contracts must not freeze legislation, else “individuals and corporations could… in anticipation of legislation, render of no avail the exercise by Congress… of its power to regulate commerce.”
5/7 This is also true of contracts with the government. In Winstar (1996) the Supreme Court held 7-2 that a clause referring to "all applicable statutes" required obeying future laws as they arose; it did *not* freeze the regulatory framework at the time of signing.
6/7 Whilst mechanisms exist that can in theory freeze a reference to legislation at a point in time, they are very rare, almost never used in government contracts, and would be unlikely to be enforceable in this case.
7/7 Our legal analysis shows that this critical claim from OpenAI, that "use of our systems must still remain aligned with the current standards reflected in the agreement", is very unlikely to be supported by their contract with the DoW, or by the law.
RT Charlie Bullock I agree with Alan's overall claim in this piece (Anthropic will very likely sue and win), but I disagree with his analysis on one important point. I think that Anthropic's case is actually even stronger than Alan's and Michael's analysis suggests, because the statutory "judicial review bar" they're anticipating doesn't actually exist. The article suggests that the relevant supply chain risk statute "bars judicial review when the government limits disclosure of its determination." I previously thought that too, and tweeted analysis suggesting the same. But after further research, I've been converted to the @alasdairpr view: I'm pretty sure that the judicial review bar only bars review of the government's decision to limit disclosure, and does not bar review of the actual SCR designation. Basically, this means that Anthropic's legal prospects are even better than Alan anticipates. Anthropic won't have to rely on a constitutional claim or try to bypass the statutory judicial review bar with an ultra vires argument--they can just go straight to federal court and say "what DOW is doing is illegal." This is a big deal, because ultra vires arguments and similar bankshot strategies for circumventing judicial review are generally very difficult to win with. Original tweet: https://x.com/CharlieBul58993/status/2028528529537650997
A deep dive in @lawfare on the many legal problems with the Pentagon's designation of Anthropic as a supply chain risk.
RT Aidan McLaughlin i personally don’t think this deal was worth it Original tweet: https://x.com/aidan_mclau/status/2028507663529906395
RT Hayden Field NEW: When OpenAI announced its Pentagon deal Friday night, people immediately challenged Sam Altman's claims. Why, they asked, would the DoD suddenly agree to red lines when it had said it would never do so? The answer, sources told me, is that it didn't. https://www.theverge.com/ai-artificial-intelligence/887309/openai-anthropic-dod-military-pentagon-contract-sam-altman-hegseth Original tweet: https://x.com/haydenfield/status/2028481498781790567
RT Gergely Orosz On one hand, the Anthropic team is a massive user of AI to write code (80%+ of all deployed code is written by Claude Code). They ship amazingly fast. On the other hand, seeing these beyond-terrible reliability numbers suggests there might be a downside to all this speed: Original tweet: https://x.com/GergelyOrosz/status/2028465387570884640
RT Timothy B. Lee I think it's significant that @natseckatrina, who @sama tapped to help answer questions about the DoD deal on Twitter, led the Obama administration's "media and public policy response" to the Snowden disclosures, according to her LinkedIn. Explains a lot about their approach. Original tweet: https://x.com/binarybits/status/2028306169693909242
RT Glenn Matlin Re @ch402 @sebgehr Too many to count. NatSec in general agrees with you @ch402. Jack Shanahan’s background and placement in Project Maven is noteworthy, so his understanding of how critical Claude is to American military effectiveness is not just hot air. https://www.linkedin.com/posts/jackntshanahan_lots-of-people-posting-about-anthropic-activity-7432870987165077504-SCa7 Original tweet: https://x.com/GlennMatlin/status/2028244121178251503
RT Chris Olah Very grateful to all the natsec law experts who are taking time over the weekend to provide independent legal commentary in this moment. A few that I've noticed (no doubt missing many)... Original tweet: https://x.com/ch402/status/2028216398443614693
As @bradrcarson explains, the contract language released so far does not restrict the gov from using AI to kill without human oversight.
RT dave kasten This is an important point from Logan Koepke: OpenAI is claiming that DoW lacks authorities to get commercial data at scale, despite extensive reporting that they have done so Original tweet: https://x.com/David_Kasten/status/2028137922529038487
@natseckatrina @David_Kasten @sama on point two, they have in fact done this and claim they have the authority to do this. • https://www.vice.com/en/article/us-military-location-data-xmode-locate-x/ • https://www.nytimes.com/2021/01/22/us/politics/dia-surveillance-data.html • https://static01.nyt.com/newsgraphics/documenttools/0117fa5f9ff7ae33/fe33e1ba-full.pdf
RT Leon Lang I understand that OpenAI employees might have been reassured by Sam’s tweet from yesterday. But this blogpost is so blatantly deceptive that I can’t grasp how you can have any other reaction than to be ashamed to work for your company. Original tweet: https://x.com/Lang__Leon/status/2027903509081821208
For those following the DoW AI drama, I highly recommend reading this post explaining how @OpenAI approached the negotiations with the DoW.
RT Andreas Kirsch 🇺🇦 I'm speechless at OpenAI releasing that contract excerpt and acting as if there aren't gaping holes that could be exploited far beyond their stated "red lines." I'm not a lawyer, but this is pretty obvious and common sense. (And to be clear: if Google had signed the same deal, I'd be saying the same thing internally. The issues here are bigger than friendly competition between companies.)

OpenAI's "red lines" are: no mass domestic surveillance, no directing autonomous weapons, and no high-stakes automated decisions. They argue their cloud-only deployment + safety stack + cleared OpenAI personnel "in the loop" make violations impossible. They also claim the contract references the relevant laws/policies "as they exist today" so future changes won't weaken the standards.

But the actual language they published is still full of obvious escape hatches. This is why Anthropic refusing to sign makes sense. Reporting on the Anthropic–"DoW"/Pentagon standoff described them saying the proposed contract language was framed as compromise but paired with "legalese that would allow safeguards to be disregarded at will." You don't need to agree with Anthropic on everything to see what they're reacting to: language that sounds like ethics but cashes out as essentially "subject to whatever the government decides later."

## Autonomous weapons

The problem is that the restriction is conditional: it depends on what "law/regulation/policy requires human control" for. If policy definitions are weak (or later revised), the contract language itself doesn't read like a durable "no autonomous weapons" ban. It reads like "we'll follow whatever the current regime says requires human control." OpenAI says elsewhere that the agreement "locks in" today's standards even if laws/policies change. If that "freeze" clause is real and enforceable, sure, but it's not visible in the excerpt itself, so the excerpt alone doesn't justify the level of confidence they're projecting...
RT Miles Brundage In light of what external lawyers and the Pentagon are saying, OpenAI employees’ default assumption here should unfortunately be that OpenAI caved + framed it as not caving, and screwed Anthropic while framing it as helping them. Hope that is wrong + they get evidence otherwise Original tweet: https://x.com/Miles_Brundage/status/2027768822372135318
RT Xeophon If you are an (AI) researcher, it’s crucial to think about the implications about your research. I think this post from @giffmana is really thought provoking: Original tweet: https://x.com/xeophon/status/2027761750930567390
RT pamela mishkin the wildest part? If OAI actually wanted the redlines, they had the leverage to get them! pentagon not going to declare a SECOND American AI company a supply chain risk, could have held the line and forced real concessions and safety! Original tweet: https://x.com/manlikemishap/status/2027751243263705274
Not only did OpenAI defect and concede to this whole authoritarian maneuver, but Sam also deceptively framed the whole thing to make it look like they had agreed to the same Anthropic redlines, which is not actually true. https://x.com/_NathanCalvin/status/2027597992195195234?s=20
RT Bun am i a supply chain risk now??? Original tweet: https://x.com/bunjavascript/status/2027638567317737895
RT Nathan Lambert Every Anthropic employee proudly amplifying their company comms and 0 supporting Sama’s weird scooping up of the DoW contract is pretty telling. Original tweet: https://x.com/natolambert/status/2027595909299900482