Follow AI builders
Follow Real AI Builders — Discover the Minds Behind the Next AI Revolution

© 2026 Follow AI builders. All rights reserved.
Andrej Karpathy

1 follower
421 content items
5 in the last 7 days

About

Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.

Platforms

𝕏 Andrej Karpathy

Content History

Andrej Karpathy
GitHub • about 22 hours ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 1 day ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 5 days ago
karpathy pushed karpathy.github.io
View on GitHub

Andrej Karpathy
GitHub • 5 days ago
karpathy pushed karpathy.github.io
View on GitHub
Andrej Karpathy
X • 6 days ago

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to th...

@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

View quoted post
View on X
Andrej Karpathy
GitHub • 7 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 7 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 7 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 7 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 8 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 8 days ago
karpathy closed an issue in KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 8 days ago
karpathy commented on an issue in KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 8 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy contributed to karpathy/KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub

Andrej Karpathy
GitHub • 9 days ago
karpathy pushed KarpathyTalk
View on GitHub
Andrej Karpathy
X • 11 days ago

Farzapedia, personal wikipedia of Farza, good example following my Wiki LLM tweet. I really like this approach to personalization in a number of ways, compared to "status quo" of an AI that allegedly gets better the more you use it or something:

1. Explicit. The memory artifact is explicit and navigable (the wiki), you can see exactly what the AI does and does not know and you can inspect and manage this artifact, even if you don't do the direct text writing (the LLM does). The knowledge of you is not implicit and unknown, it's explicit and viewable.

2. Yours. Your data is yours, on your local computer, it's not in some particular AI provider's system without the ability to extract it. You're in control of your information.

3. File over app. The memory here is a simple collection of files in universal formats (images, markdown). This means the data is interoperable: you can use a very large collection of tools/CLIs or whatever you want over this information because it's just files. The agents can apply the entire Unix toolkit over them. They can natively read and understand them. Any kind of data can be imported into files as input, and any kind of interface can be used to view them as the output. E.g. you can use Obsidian to view them or vibe code something of your own. Search "File over app" for an article on this philosophy.

4. BYOAI. You can use whatever AI you want to "plug into" this information - Claude, Codex, OpenCode, whatever. You can even think about taking an open source AI and finetuning it on your wiki - in principle, this AI could "know" you in its weights, not just attend over your data.

So this approach to personalization puts *you* in full control. The data is yours. In universal formats. Explicit and inspectable. Use whatever AI you want over it, keep the AI companies on their toes! :) Certainly this is not the simplest way to get an AI to know you - it does require you to manage file directories and so on, but agents also make it quite...

@Farza 🇵🇰🇺🇸

This is Farzapedia. I had an LLM take 2,500 entries from my diary, Apple Notes, and some iMessage convos to create a personal Wikipedia for me. It made 400 detailed articles for my friends, my startups, research areas, and even my favorite animes and their impact on me complete

View quoted post
View on X
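The "file over app" point above is easy to make concrete: because the memory is just a directory of markdown files, any tool (or agent) can walk and index it with a few lines of code. A minimal sketch, with Python purely for illustration; the `index_wiki` helper and the wiki layout are hypothetical, not part of Farzapedia:

```python
from pathlib import Path

def index_wiki(root: str) -> dict:
    """Map each .md file in a wiki directory to its first heading/line.

    The files are just plain markdown, so no special app is needed to
    read them; this is the interoperability the 'file over app' idea buys.
    """
    index = {}
    for md in sorted(Path(root).rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        # Use the first non-empty line (often a "# Heading") as the summary.
        first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
        index[str(md.relative_to(root))] = first.lstrip("# ").strip()
    return index
```

The same directory can then be opened in Obsidian, grepped with Unix tools, or handed to whatever agent you prefer — that interchangeability is the point.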
Andrej Karpathy
X • 11 days ago

Something I've been thinking about - I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. "Seeing like a state" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse. Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence - the ability to process a lot of raw data, combine it with domain expertise and derive insights.

As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve - not only are the professionals further empowered, but a lot more people can participate.

Some examples to be precise: Detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance... Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities...

Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies. (the quoted tweet i...

@Harry Rushworth

The British Government is a complicated beast. Dozens of departments, hundreds of public bodies, more corporations than one can count... Such is its complexity that there isn't an org chart for it. Well, there wasn't... Introducing ⚙️Machinery of Government⚙️

[quoted tweet image]
View quoted post
View on X
Andrej Karpathy
X • 11 days ago

Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.

@Andrej Karpathy

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating

View quoted post
View on X
Andrej Karpathy
X • 13 days ago

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine ma...

View on X
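The ingest-and-compile workflow described above can be sketched roughly as follows. This is not the author's actual tooling: `summarize` stands in for an LLM call, and the one-article-per-document layout (plus a simple Obsidian-style `[[wikilink]]` index) is an assumption for illustration:

```python
from pathlib import Path

def compile_wiki(raw_dir: str, wiki_dir: str, summarize) -> list:
    """Incrementally "compile" raw/ documents into wiki/ markdown articles.

    summarize(text) -> str is a stand-in for an LLM call; this sketch only
    shows the bookkeeping: one wiki article per raw document, plus an
    index.md linking them (backlinks and concept articles omitted).
    """
    raw, wiki = Path(raw_dir), Path(wiki_dir)
    wiki.mkdir(parents=True, exist_ok=True)
    articles = []
    for src in sorted(raw.glob("*.md")):
        out = wiki / src.name
        if not out.exists():  # incremental: skip already-compiled documents
            out.write_text(f"# {src.stem}\n\n{summarize(src.read_text())}\n")
        articles.append(src.name)
    # Maintain a simple index so an agent can orient itself without RAG.
    (wiki / "index.md").write_text(
        "\n".join(f"- [[{name}]]" for name in articles) + "\n"
    )
    return articles
```

Because the output is plain markdown, the resulting wiki can be viewed in Obsidian or queried by any agent exactly as the post describes.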
Andrej Karpathy
X • 15 days ago

New supply chain attack, this time for npm axios, the most popular HTTP client library with 300M weekly downloads. Scanning my system I found a use imported from googleworkspace/cli from a few days ago when I was experimenting with gmail/gcal cli. The installed version (luckily) resolved to an unaffected 1.13.5, but the project dependency is not pinned, meaning that if I did this earlier today the code would have resolved to latest and I'd be pwned.

It's possible to personally defend against these to some extent with local settings (e.g. release-age constraints, containers, etc.), but I think ultimately the defaults of package management projects (pip, npm, etc.) have to change so that a single infection (usually luckily fairly temporary in nature due to security scanning) does not spread through users at random and at scale via unpinned dependencies.

More comprehensive article: https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-trojan

@Feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages. The latest [email protected] now pulls in [email protected], a package that did not exist before today. This is a live compromise. This is textbook supply chain installer malware. axios

View quoted post
View on X
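One concrete way to surface the unpinned-dependency risk described above is to scan a package.json for version ranges that are not exact pins. A rough sketch (the `unpinned` helper is illustrative, not an npm tool; committed lockfiles with `npm ci`, or npm's `save-exact=true` config, are the standard mitigations):

```python
import json
import re

# An exact pin is a bare version like "1.13.5"; ranges such as "^1.13.5",
# "~1.13.5", ">=1" or "*" can silently resolve to a newly published
# (possibly compromised) release at install time.
EXACT = re.compile(r"^\d+\.\d+\.\d+(?:[-+][\w.]+)?$")

def unpinned(package_json: str) -> list:
    """Return 'name@range' for every dependency not pinned to an exact version."""
    pkg = json.loads(package_json)
    bad = []
    for section in ("dependencies", "devDependencies"):
        for name, rng in pkg.get(section, {}).items():
            if not EXACT.match(rng):
                bad.append(f"{name}@{rng}")
    return bad
```

Exact pins don't stop an attack on the pinned version itself, but they do stop the "resolved to latest and pwned" failure mode the post describes.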
Andrej Karpathy
X • 18 days ago

- Drafted a blog post.
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea: let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask in different directions and be careful with the sycophancy.

View on X
Andrej Karpathy
X • 20 days ago

When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc...

I am really looking forward to a day where I could simply tell my agent: "build menugen" (referencing the post) and it would just work. The whole thing up to the deployed web page. The agent would have to browse a number of services, read the docs, get all the api keys, make everything work, debug it in dev, and deploy to prod. This is the actually hard part, not the code itself.

Or rather, the better way to think about it is that the entire DevOps lifecycle has to become code, in addition to the necessary sensors/actuators of the CLIs/APIs with agent-native ergonomics. And there should be no need to visit web pages, click buttons, or anything like that for the human. It's easy to state, it's now just barely technically possible and expected to work maybe, but it definitely requires from-scratch re-design, work and thought. Very exciting direction!

@Patrick Collison

When @karpathy built MenuGen (https://karpathy.bearblog.dev/vibe-coding-menugen/), he said: "Vibe coding menugen was an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA furniture. There are all these services,

View quoted post
View on X
Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy closed an issue in autoresearch
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy closed an issue in nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy commented on an issue in nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy closed an issue in nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 21 days ago
karpathy pushed nanochat
View on GitHub
Andrej Karpathy
X • 21 days ago

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.

View on X
Andrej Karpathy
GitHub • 22 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 22 days ago
karpathy pushed nanochat
View on GitHub
Andrej Karpathy
X • 22 days ago

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwned. Same for any other large project that depended on litellm.

Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks.

Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been increasingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

@Daniel Hnyk

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromised, it contains litellm_init.pth with base64 encoded instructions to send all the credentials it can find to remote server + self-replicate. link below

View quoted post
View on X
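In the pip ecosystem, the analogous first-line defense is pinning. A rough sketch of auditing a requirements.txt for unpinned entries (the `unpinned_requirements` helper is illustrative only; pip's hash-checking mode, `pip install --require-hashes -r requirements.txt`, is the stronger real-world control, since it also rejects a re-uploaded poisoned artifact at a pinned version):

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned with '=='.

    A line like 'litellm>=1.64.0' lets pip resolve to the newest release,
    so a single poisoned upload propagates to every fresh install;
    'litellm==1.82.7' cannot silently move to a new version.
    """
    bad = []
    for line in text.splitlines():
        req = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not req or req.startswith("-"):    # skip options like -r / -e
            continue
        if "==" not in req:
            bad.append(req)
    return bad
```

Run against the dspy example in the post, a `litellm>=1.64.0` line would be flagged, which is exactly the transitive exposure described above.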
Andrej Karpathy
GitHub • 26 days ago
karpathy pushed autoresearch
View on GitHub
Andrej Karpathy
X • 26 days ago

Thank you Sarah, my pleasure to come on the pod! And happy to do some more Q&A in the replies.

@sarah guo

Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home like movement in AI, the model landscape, and second order effects 02:55 - What Capability Limits Remain? 06:15 - What

View quoted post
View on X
Andrej Karpathy
X • 26 days ago

Had to go see Project Hail Mary right away (it's based on the book of Andy Weir, of also The Martian fame). Both very pleased and relieved to say that 1) the movie sticks very close to the book in both content and tone and 2) is really well executed.

The book is one of my favorites when it comes to alien portrayals because a lot of thought was clearly given to the scientific details of an alternate biochemistry, evolutionary history, sensorium, psychology, language, tech tree, etc. It's different enough that it is highly creative and plausible, but also similar enough that you get a compelling story and one of the best bromances in fiction. Not to mention the other (single-cellular) aliens. I can count fictional portrayals of aliens of this depth on one hand. A lot of these aspects are briefly featured - if you read the book you'll spot them but if you haven't, the movie can't spend the time to do them justice.

I'll say that the movie inches a little too much into the superhero movie tropes with the pacing, the quips, the bathos and such for my taste, and we get a little bit less of the grandeur of Interstellar and a little bit less of the science of The Martian, but I think it's ok considering the tone of the original content. And it does really well where it counts - on Rocky and the bromance. Thank you to the film crew for the gem!

View on X
Andrej Karpathy
X • 28 days ago
Thread • 2 tweets

The signature is alluding to NVIDIA GTC 2015, where Jensen excitedly told an audience of, at the time, mostly gamers and scientific computing professionals that Deep Learning is The Next Big Thing, citing among other examples my PhD thesis (one of the first image captioning systems that coupled image recognition ConvNet to an autoregressive RNN language model, trained end to end). This was back when most people were still unaware and somewhat skeptical but of course - Jensen was 1000% correct, highly prescient and locked in very early. (link to blast from the past) https://youtu.be/xQhb3C2hQoE?si=x3qQMjG-dktoNNv_&t=1577
View on X
Andrej Karpathy
GitHub • 29 days ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • 30 days ago
karpathy pushed autoresearch
View on GitHub
Andrej Karpathy
GitHub • about 1 month ago
karpathy commented on an issue in jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy closed an issue in jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy closed an issue in jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed jobs
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy made this repository public
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed nanochat
View on GitHub
Andrej Karpathy
X • about 1 month ago

My autoresearch labs got wiped out in the oauth outage. Have to think through failovers. Intelligence brownouts will be interesting - the planet losing IQ points when frontier AI stutters.

View on X
Andrej Karpathy
X • about 1 month ago

Expectation: the age of the IDE is over Reality: we’re going to need a bigger IDE (imo). It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.

@Andrej Karpathy

@nummanali tmux grids are awesome, but i feel a need to have a proper "agent command center" IDE for teams of them, which I could maximize per monitor. E.g. I want to see/hide toggle them, see if any are idle, pop open related tools (e.g. terminal), stats (usage), etc.

View quoted post
View on X
Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed nanochat
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub
Andrej Karpathy
X • about 1 month ago

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've ...

View on X
Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed nanochat
View on GitHub
Andrej Karpathy
GitHub • about 1 month ago
karpathy opened a pull request in autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy created a branch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy created a branch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
transitive-bullshit starred karpathy/autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy opened an issue in autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub

Andrej Karpathy
GitHub • about 1 month ago
karpathy pushed autoresearch
View on GitHub
Andrej Karpathy
X • about 1 month ago

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: https://github.com/karpathy/autoresearch/discussions/43 Alternatively, a PR has the benefit of exact commits: https://github.com/karpathy/autoresearch/pull/44 but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits.

But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back. I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.

View on X
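The read-then-contribute workflow the post describes could be scaffolded with the GitHub CLI along these lines (a hypothetical sketch, not from the repo: the branch name and RESULTS.md file are illustrative, and `gh` must be authenticated):

```shell
# 1) Before a run: read prior Discussions/PRs for inspiration
gh pr list --repo karpathy/autoresearch --state all --limit 20
gh pr view 44 --repo karpathy/autoresearch --comments

# 2) After the overnight run: contribute findings back as a branch + PR
#    (never meant to be merged, only "adopted" by other agents)
git checkout -b findings/run-$(date +%Y%m%d)   # illustrative branch name
git add train.py RESULTS.md                    # RESULTS.md is hypothetical
git commit -m "autoresearch: summarize overnight run findings"
gh pr create --repo karpathy/autoresearch \
  --title "Findings: overnight run" \
  --body-file RESULTS.md
```

This keeps the "one master branch" assumption intact while still letting agents accumulate and consult each other's branches of commits.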
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy commented on an issue in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy commented on an issue in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy opened a pull request in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy created a branch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy closed a pull request in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
𝕏x•about 1 month ago
Thread • 2 tweets

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (i.e., lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

https://github.com/karpathy/autoresearch

Part code, part sci-fi, and a pinch of psychosis :)

(I still have the bigger cousin running on prod nanochat, working on a bigger model on 8XH100, which looks like this now. I'll just leave this running for a while...)

View on X
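The autonomous loop described in the post amounts to an accept-if-better hill-climb over training settings. A toy sketch of that control flow (hypothetical stand-ins: `propose` plays the agent editing the script, `run_training` plays a real fixed-length training run, and the actual `git commit` step is only indicated in a comment):

```python
def propose(settings):
    """Stand-in for the agent proposing an edit to the training script."""
    new = dict(settings)
    new["lr"] = settings["lr"] * 0.9  # toy edit: shrink the learning rate
    return new

def run_training(settings):
    """Stand-in for one fixed-length (e.g. 5-minute) training run.
    Returns final validation loss; toy objective with optimum at lr=0.01."""
    return (settings["lr"] - 0.01) ** 2 + 2.0

def autoresearch_loop(settings, steps=10):
    """Accept a candidate only if it lowers validation loss."""
    best_loss = run_training(settings)
    for _ in range(steps):
        candidate = propose(settings)
        loss = run_training(candidate)
        if loss < best_loss:  # improvement found
            settings, best_loss = candidate, loss
            # a real loop would `git commit` the improved script here,
            # accumulating commits on the agent's feature branch
    return settings, best_loss
```

Under this toy objective each proposal improves the loss, so the loop accumulates a monotone chain of "commits"; with a real agent, rejected proposals are simply discarded and the branch only records improvements.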
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy closed a pull request in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy closed an issue in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on karpathy/autoresearch

karpathy closed an issue in autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy pushed autoresearch

View on GitHub
AK
Andrej Karpathy
⚡github•about 1 month ago

Activity on repository

karpathy created a branch

View on GitHub