Sebastian Raschka

0 followers
1133 content items
10 items in the last 7 days

Bio

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)

Platforms

X: Sebastian Raschka

Content history

Sebastian Raschka
GitHub, about 7 hours ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, about 7 hours ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, about 7 hours ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 5 days ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 5 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 5 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 5 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 5 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

Sebastian Raschka
GitHub, 5 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 5 days ago

Activity on repository

rasbt made this repository public

Sebastian Raschka
X, 7 days ago
Retweeted from @Sten

RT Sten Rüdiger
I’ve uploaded a new paper on arXiv (co-authored by @rasbt):
MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

In Parameter-Efficient Fine-Tuning, a key question may not just be how low-rank the update is, but *which* subspace we adapt.

Original tweet: https://x.com/StenRuediger/status/2041888496927826398

Sebastian Raschka
X, 7 days ago

Strong release! GLM-5.1 is a DeepSeek-V3.2-like architecture (including MLA and DeepSeek Sparse Attention) but with more layers. And the benchmarks look better throughout! Looks like THE flagship open-weight model now.

Quoted post from @Z.ai:

Introducing GLM-5.1: The Next Level of Open Source
- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.

Sebastian Raschka
GitHub, 8 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 8 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 9 days ago

Activity on rasbt/reasoning-from-scratch

rasbt contributed to rasbt/reasoning-from-scratch

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
X, 9 days ago

Added an RSS feed to the LLM Architecture Gallery so it is a bit easier to keep up with new additions over time: https://sebastianraschka.com/llm-architecture-gallery/

Sebastian Raschka
GitHub, 9 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 9 days ago

Activity on rasbt/mini-coding-agent

rasbt closed an issue in mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 10 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on rasbt/mini-coding-agent

rasbt closed an issue in mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on rasbt/mini-coding-agent

rasbt opened a pull request in mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on rasbt/mini-coding-agent

rasbt contributed to rasbt/mini-coding-agent

Sebastian Raschka
GitHub, 10 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/LLMs-from-scratch

rasbt contributed to rasbt/LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt pushed llm-architecture-gallery

Sebastian Raschka
GitHub, 11 days ago

Activity on rasbt/llm-architecture-gallery

rasbt contributed to rasbt/llm-architecture-gallery

Sebastian Raschka
GitHub, 11 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
X, 11 days ago

Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation. Link: https://magazine.sebastianraschka.com/p/components-of-a-coding-agent

Sebastian Raschka
GitHub, 12 days ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 13 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 13 days ago

Activity on rasbt/mini-coding-agent

rasbt contributed to rasbt/mini-coding-agent

Sebastian Raschka
GitHub, 13 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 13 days ago

Activity on rasbt/mini-coding-agent

rasbt commented on an issue in mini-coding-agent

Sebastian Raschka
X, 13 days ago

Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways:

Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4 maintains a relatively unique Pre- and Post-norm setup and remains relatively classic, with a 5:1 hybrid attention mechanism combining a sliding-window (local) layer and a full-attention (global) layer. The attention mechanism itself is also classic Grouped Query Attention (GQA).

But let’s not be fooled by the lack of architectural changes. Looking at the benchmarks, Gemma 4 is a huge leap from Gemma 3. This is likely due to the training set and recipe.

Interestingly, on the AI Arena Leaderboard, Gemma 4 (31B) ranks similarly to the much larger Qwen3.5-397B-A17B model. But as I discussed in my model evaluation article, arena scores are a bit problematic as they can be gamed and are biased towards human (style) preference.

If we look at some other common benchmarks, which I plotted below, we can see that it’s indeed a very clear leap over Gemma 3 and ranks on par with Qwen3.5 27B.

Note that there is also a Mixture-of-Experts (MoE) Gemma 4 variant that is slightly smaller (27B with 4 billion parameters active). The benchmarks are only slightly worse compared to Gemma 4 (31B). I omitted the MoE architecture in the figure below because the figure is already very crowded, but you can find it in my LLM Architecture Gallery.

Anyways, overall, it's a nice and strong model release and a strong contender for local usage. Also, one aspect that should not be underrated is that (it seems) the model is now released with a standard Apache 2.0 open-source license, which has much friendlier usage terms than the custom Gemma 3 license.

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed mini-coding-agent

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 14 days ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

Sebastian Raschka
GitHub, 14 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 15 days ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 15 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

Sebastian Raschka
GitHub, 15 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
X, 15 days ago

http://x.com/i/article/2038978163389112321

Sebastian Raschka
GitHub, 16 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 16 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
GitHub, 16 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

Sebastian Raschka
GitHub, 16 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
GitHub, 16 days ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 16 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 16 days ago

Activity on rasbt/reasoning-from-scratch

rasbt closed a pull request in reasoning-from-scratch

Sebastian Raschka
X, 17 days ago

It’s done.

All chapters of Build A Reasoning Model (From Scratch) are now available in early access.

The book is currently in production and should be out in the next months, including full-color print and syntax highlighting. There’s also a preorder up on Amazon.

Sebastian Raschka
GitHub, 18 days ago

Activity on rasbt/reasoning-from-scratch

rasbt closed an issue in reasoning-from-scratch

Sebastian Raschka
GitHub, 18 days ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

Sebastian Raschka
GitHub, 18 days ago

Activity on repository

rasbt deleted

Sebastian Raschka
GitHub, 18 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on repository

rasbt created a branch

Sebastian Raschka
X, 19 days ago
Retweeted from @levi

RT levi
Day 83/365 of GPU Programming

Looking at DeepSeek's Multi-Head Latent Attention today. The last part of the AMD challenge series is to optimize an MLA decode kernel for MI355X where the absorbed Q and compressed KV cache are given and your task is to do the attention computation.

A resource that really helped internalize what MLA does was @rasbt's incredible visual guide to attention variants in LLMs (luckily he posted that last week!), which covers everything from MHA to GQA to MLA to SWA, et cetera. If there's one place to get a visual intuition for recent attention mechanisms, it's this blog post.

@jbhuang0604's video on MQA, GQA, MLA and DSA was the best conceptual intro I found on the topic and progressively builds up the ideas from first principles.

The Welch Labs analysis of MLA is a great watch as well. Beautiful visualization of the changes DeepSeek made for MLA.

Tried out a few kernels once I had a basic understanding of MLA and I think I'm slowly getting more comfortable with at least analyzing kernels.

Original tweet: https://x.com/levidiamode/status/2037663231511322831

Quoted post from @levi:

Day 82/365 of GPU Programming

Taking a closer look at Mixture of Experts today, so I can write better MoE kernels. Specifically, to optimize an MXFP4 MoE fused kernel for the GPU Mode challenge. I haven't had much prior exposure to MoEs, so lots of new concepts I learned today.

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/LLMs-from-scratch

rasbt closed an issue in LLMs-from-scratch

Sebastian Raschka
GitHub, 19 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on an issue in LLMs-from-scratch

Sebastian Raschka
X, 20 days ago

Added lots of improvements to the LLM Architecture Gallery in the last 2 weeks. Imho the coolest one yet: A diff tool many of you were asking for! https://sebastianraschka.com/llm-architecture-gallery/

Sebastian Raschka
GitHub, 20 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

Sebastian Raschka
X, 20 days ago

Doing my tax return just made me think: TurboTax is probably something one could vibecode. But paying $190 for a reliable, worry-free experience still seems pretty reasonable. SaaS is not dead.

Sebastian Raschka
GitHub, 21 days ago

Activity on repository

rasbt deleted
