ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)
Activity on repository
rasbt pushed reasoning-from-scratch
View on GitHub
Claude distillation has been a big topic this week while I am (coincidentally) writing Chapter 8 on model distillation. In that context, I shared some utilities to generate distillation data from all sorts of open-weight models via OpenRouter and Ollama: https://github.com/rasbt/reasoning-from-scratch/blob/main/ch08/02_generate_distillation_data/README.md
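The linked README covers the actual utilities; as a rough illustration of the idea, here is a minimal sketch of what generating a distillation record via Ollama might look like. The function names, the model tag, and the record schema are my own illustrative assumptions, not the repo's API (only the `/api/generate` request fields `model`, `prompt`, `stream` follow Ollama's documented REST interface):

```python
import json

def build_ollama_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def to_distillation_record(prompt: str, teacher_response: str) -> dict:
    # Instruction-tuning style record: the teacher model's reply becomes
    # the training target for the (smaller) student model
    return {
        "instruction": prompt,
        "output": teacher_response.strip(),
    }

# "qwen3:4b" is just a placeholder model tag for this sketch
payload = build_ollama_payload("qwen3:4b", "What is 2 + 2?")
# In practice the response would come back from the Ollama server;
# here we fake one to show the record shape
record = to_distillation_record("What is 2 + 2?", " 2 + 2 = 4.\n")
print(json.dumps(record))
```

The same record shape works regardless of whether the teacher sits behind OpenRouter or a local Ollama instance; only the request layer changes.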
http://x.com/i/article/2026503653893222402
Memorization & distillation. Two sides of the same scaling coin.
We extract nearly all (95.8%) of Harry Potter and the Sorcerer's Stone from Claude Sonnet 🤷🏻‍♂️
Am currently putting together an article, and yeah, the SWE-Bench Verified numbers are definitely a bit sus across all models; the benchmark suggests the models are more similar than they really are. So, I went down a rabbit hole looking into SWE-Bench Verified issues... And it looks like OpenAI already did really nice work there in their "Why SWE-bench Verified no longer measures frontier coding capabilities" analysis: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
The gist is:
1. After auditing 27.6% of frequently failed tasks, at least 59.4% had flawed tests that reject correct solutions.
2. Since SWE-Bench draws from widely used open-source repos, frontier models sometimes reproduced the exact "gold patch" or problem details, which suggests data leakage. (Probably a "duh" given that the dataset has been out since 2023.)
Long story short, SWE-Bench Pro seems to be a bit of an improvement (for now).
RT paulopacitti 🌐 Some things are quite fundamental, but so wholesome to remember. The dot product just checks how aligned vectors are, and in the context of LLMs, how similar tokens are to each other. From @rasbt's amazing book. Original tweet: https://x.com/paulopacitti/status/2025214154236137645
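The intuition in the quoted tweet fits in a few lines of NumPy. With made-up 3-dimensional "embeddings": the dot product grows with alignment, and dividing by the vector magnitudes gives cosine similarity, which is what's typically used to compare token or sentence embeddings:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # toy token embedding
b = np.array([2.0, 4.0, 6.0])    # same direction as a (perfectly aligned)
c = np.array([-1.0, -2.0, -3.0]) # opposite direction of a

def cosine(u, v):
    # Dot product normalized by magnitudes:
    # +1 = aligned, 0 = orthogonal, -1 = opposed
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(a, b))  # → 1.0
print(cosine(a, c))  # → -1.0
```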
February is one of those months...
- Moonshot AI's Kimi K2.5 (Feb 2)
- Z.ai GLM 5 (Feb 12)
- MiniMax M2.5 (Feb 12)
- ByteDance Seed-2.0 (Feb 13)
- Nanbeige 4.1 3B (Feb 13)
- Qwen 3.5 (Feb 15)
- Cohere's Tiny Aya (Feb 17)
(+ Hopefully DeepSeek V4 soon)
Anything I forgot?
Tiny Aya reimplementation From Scratch!
Have been reading through the technical reports of the recent wave of open-weight LLM releases (more on that soon). Tiny Aya (2 days ago) flew a bit under the radar. It looks like a nice, small 3.35B model with the strongest multilingual support in its size class, great for on-device translation tasks. Just did a from-scratch implementation here: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/15_tiny-aya/standalone-tiny-aya-plus-kv-cache.ipynb
Architecture-wise, Tiny Aya is a classic decoder-style transformer with a few noteworthy modifications (besides the obvious ones like SwiGLU and Grouped Query Attention):
1. Parallel transformer blocks. A parallel transformer block computes attention and MLP from the same normalized input, then adds both to the residual in one step. I assume this is to reduce serial dependencies inside a layer and improve computational throughput.
2. Sliding window attention. Specifically, it uses a 3:1 local:global ratio similar to Arcee Trinity and Olmo 3. The window size is also 4096. Also, similar to Arcee, the sliding window layers use RoPE whereas the full attention layers use NoPE.
3. LayerNorm. Most architectures moved to RMSNorm as it's computationally a bit cheaper and performs well. Tiny Aya keeps it more classic with a modified version of LayerNorm (the implementation here is like standard LayerNorm but without the shift, i.e., bias, parameter).
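Points 1 and 3 can be sketched in a few lines of NumPy: LayerNorm with only a scale (no shift/bias) parameter, and a parallel block where stand-in "attention" and "MLP" maps both read the same normalized input and are added to the residual in one step. The linear maps here are placeholders for illustration, not Tiny Aya's actual layers:

```python
import numpy as np

def layernorm_no_shift(x, gamma, eps=1e-6):
    # Standard LayerNorm, but with only a scale (gamma) and no bias/shift term
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps)

def parallel_block(x, w_attn, w_mlp, gamma):
    # Parallel transformer block: "attention" and "MLP" read the SAME
    # normalized input, and both outputs join the residual in one step:
    #   y = x + attn(norm(x)) + mlp(norm(x))
    # (attn/mlp are stand-in linear maps here)
    h = layernorm_no_shift(x, gamma)
    return x + h @ w_attn + h @ w_mlp

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))  # 4 tokens, d-dim embeddings
y = parallel_block(x,
                   rng.normal(size=(d, d)) * 0.02,
                   rng.normal(size=(d, d)) * 0.02,
                   np.ones(d))
print(y.shape)  # (4, 8)
```

Contrast with a standard (sequential) block, which computes `x = x + attn(norm1(x))` first and only then `x = x + mlp(norm2(x))`; the parallel form removes that serial dependency.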
Finished Ch07 on Improving GRPO for Reinforcement Learning! Building on the GRPO-from-scratch intro, this adds (and analyzes) more bells and whistles: clipped policy ratios, a KL term, format rewards, and a couple of other improvements. https://github.com/rasbt/reasoning-from-scratch/blob/main/ch07/01_main-chapter-code/ch07_main.ipynb
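The chapter notebook has the real implementation; as a quick refresher, the clipped-policy-ratio idea can be sketched in NumPy for a single group of completions. In GRPO, `advantages` are rewards normalized within a sampled group; the clipping itself mirrors the PPO-style surrogate. Names and the epsilon value below are illustrative:

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    # Policy ratio r = pi_new / pi_old, computed in log space for stability
    ratio = np.exp(logp_new - logp_old)
    # Clip the ratio to [1 - eps, 1 + eps] and take the pessimistic minimum,
    # which removes the incentive to push the policy too far in one update
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

# GRPO-style advantages: rewards normalized within a group of samples
rewards = np.array([1.0, 0.0, 0.5, 0.0])
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy per-sequence log-probabilities under the old and updated policies
logp_old = np.log(np.array([0.2, 0.3, 0.25, 0.25]))
logp_new = np.log(np.array([0.4, 0.2, 0.30, 0.10]))
obj = clipped_surrogate(logp_new, logp_old, advantages)
```

When the new and old policies coincide, the ratio is 1 everywhere and the objective reduces to the mean advantage (zero for group-normalized rewards), which is a handy sanity check.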