Andrej Karpathy
Bio
Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Platform
Content history
Committed to karpathy/nanochat
karpathy commented on commit karpathy/nanochat@ccf4b7f9bf
New post: nanochat miniseries v1

The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do the careful science of scaling laws, and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent.

For the first public release of nanochat my focus was on an end-to-end pipeline that runs all of the LLM stages. Now, after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models.

After locally tuning some of the hyperparameters, I swept out a number of models while holding the FLOPs budget fixed. (For any FLOPs target you can train a small model for a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots [plot omitted], just a baby version of the corresponding Chinchilla plot [plot omitted]. Very importantly and encouragingly, the exponents on N (parameters) and D (tokens) are equal at ~0.5, so just like Chinchilla we get a single (compute-independent) constant that relates model size to the token training horizon. In Chinchilla, this was measured to be 20. In nanochat it seems to be 8!

Once we can train compute-optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M-token batch sizes on an 8xH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size. Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so instead I use the CORE...
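To make the arithmetic concrete, here is a minimal sketch of picking a compute-optimal (N, D) pair, assuming the standard C ≈ 6·N·D approximation for training FLOPs (the post does not state its cost model) together with the D/N ≈ 8 ratio measured above; the function name and the 1e18 budget are purely illustrative:

```python
# Minimal sketch, not nanochat's actual code. Assumes the common
# C ~= 6 * N * D FLOPs approximation and the post's measured
# compute-independent ratio D/N ~= 8 (Chinchilla reported ~20).

def compute_optimal(c_flops: float, tokens_per_param: float = 8.0):
    """Return (params N, tokens D) for a FLOPs budget, under C = 6*N*D with D = k*N."""
    # C = 6 * N * (k * N)  =>  N = sqrt(C / (6 * k))
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Hypothetical budget of 1e18 FLOPs:
n, d = compute_optimal(1e18)
print(f"N ~= {n:.2e} params, D ~= {d:.2e} tokens")  # ~1.4e8 params, ~1.2e9 tokens
```

As an aside on the batch-size figure: 2**19 = 524,288 tokens, which is where the ~0.5M number comes from.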
The majority of the ruff ruff is between people who look at the current point and people who look at the current slope.
Activity on karpathy/nanochat
karpathy closed an issue in nanochat
karpathy closed a pull request in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
Released karpathy/rustbpe
karpathy released v0.1.0 at karpathy/rustbpe
RT Simon Willison Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months https://simonwillison.net/2025/Dec/31/the-year-in-llms/ This year it's divided into 26 sections! This is the table of contents: Original tweet: https://x.com/simonw/status/2006514122977063350
The first 100% autonomous coast-to-coast drive on Tesla FSD V14.2! 2 days 20 hours, 2732 miles, zero interventions. This one is special because the coast-to-coast drive was a major goal for the autopilot team from the start. A lot of hours were spent in marathon clip review sessions late into the night looking over interventions as we attempted legs of the drive over time - triaging, categorizing, planning out all the projects to close the gap and bring the number of interventions to zero. Amazing to see the system actually get there and huge congrats to the team!
I am proud to announce that I have successfully completed the world’s first USA coast to coast fully autonomous drive! I left the Tesla Diner in Los Angeles 2 days & 20 hours ago, and now have ended in Myrtle Beach, SC (2,732.4 miles) This was accomplished with Tesla FSD V14.2
RT Peter Steinberger 📢 Confession: I ship code I never read. Here's my 2025 workflow. https://steipete.me/posts/2025/shipping-at-inference-speed Original tweet: https://x.com/steipete/status/2005451576971043097
Aggressively JIT your work. It's not about the task at hand X; it's a little bit about X, but mostly about how you should have had to contribute ~no latency and ~no actions. It's digital Factorio time.
Activity on karpathy/nanochat
karpathy closed an issue in nanochat
I was inspired by this so I wanted to see if Claude Code can get into my Lutron home automation system.
- it found my Lutron controllers on the local wifi network
- checked for open ports, connected, got some metadata and identified the devices and their firmware
- searched the internet, found the pdf for my system
- instructed me on what button to press to pair and get the certificates
- it connected to the system and found all the home devices (lights, shades, HVAC temperature control, motion sensors etc.)
- it turned on and off my kitchen lights to check that things are working (lol!)
I am now vibe coding the home automation master command center, the potential is 🔥. And I'm throwing away the crappy, janky, slow Lutron iOS app I've been using so far. Insanely fun :D :D
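For flavor, a hedged sketch of the first two steps described above (finding devices on the local network and checking for open ports); the subnet and port list are hypothetical, and this is generic TCP connect probing, not Lutron's actual protocol or what Claude Code actually ran:

```python
# Illustrative only: scan a home subnet for hosts with open TCP ports,
# roughly the "found my controllers, checked for open ports" step.
import socket

SUBNET = "192.168.1."        # hypothetical home subnet
PORTS = [22, 80, 443, 8080]  # hypothetical ports to probe

def probe(host: str, port: int, timeout: float = 0.3) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

for i in range(1, 255):
    host = SUBNET + str(i)
    open_ports = [p for p in PORTS if probe(host, p)]
    if open_ports:
        print(host, "open:", open_ports)
```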
Activity on karpathy/nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
karpathy closed an issue in nanochat
karpathy commented on an issue in nanochat
RT Boris Cherny When I created Claude Code as a side project back in September 2024, I had no idea it would grow to be what it is today. It is humbling to see how Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all sorts of things from coding, to devops, to research, to non-technical use cases. This technology is alien and magical, and it makes it so much easier for people to build and create. Increasingly, code is no longer the bottleneck. A year ago, Claude struggled to generate bash commands without escaping issues. It worked for seconds or minutes at a time. We saw early signs that it might become broadly useful for coding one day. Fast forward to today. In the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5. Claude consistently runs for minutes, hours, and days at a time (using Stop hooks). Software engineering is changing, and we are entering a new period in coding history. And we're still just getting started. Original tweet: https://x.com/bcherny/status/2004887829252317325