X (Twitter)

Upon request, here's an updated version with Grok 4.5 and Meta's Muse Spark 1.1. Grok 4.5 seems to sit at the Pareto frontier. Good bang for the buck. (Also added harness info).
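For readers unfamiliar with the term, here is a minimal sketch of how a cost-versus-score Pareto frontier can be computed. The `Run` type, the model names, and all numbers are hypothetical placeholders for illustration; they are not the measurements behind the post.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    cost: float   # e.g. dollars per task (lower is better)
    score: float  # e.g. benchmark accuracy (higher is better)


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates, sorted by cost."""
    frontier = []
    best_score = float("-inf")
    # Sort by cost ascending; on cost ties, visit the higher score first.
    for run in sorted(runs, key=lambda r: (r.cost, -r.score)):
        # A run is kept only if it beats every cheaper run on score.
        if run.score > best_score:
            frontier.append(run)
            best_score = run.score
    return frontier


# Hypothetical placeholder numbers, not data from the post.
runs = [
    Run("model-a", cost=1.0, score=0.60),
    Run("model-b", cost=2.0, score=0.55),  # dominated by model-a
    Run("model-c", cost=3.0, score=0.80),
    Run("model-d", cost=5.0, score=0.78),  # dominated by model-c
]
frontier_names = [r.name for r in pareto_frontier(runs)]
print(frontier_names)  # ['model-a', 'model-c']
```

A model "sits at the frontier" when no other model is both cheaper and better, which is what "good bang for the buck" means here.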

View on X

SR

Sebastian Raschka

⚡github•3 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•3 days ago

Activity on rasbt/LLMs-from-scratch

rasbt opened a pull request in LLMs-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•3 days ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•3 days ago

Activity on repository

rasbt pushed LLMs-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•4 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on pull request rasbt/LLMs-from-scratch#1041

View on GitHub

SR

Sebastian Raschka

⚡github•4 days ago

Activity on rasbt/LLMs-from-scratch

rasbt commented on pull request rasbt/LLMs-from-scratch#1041

View on GitHub

SR

Sebastian Raschka

𝕏x•4 days ago

For agentic coding, one can say:
- Unless you need Terra Ultra perf, it's always better to use a Luna model with a higher effort setting (same or better performance but cheaper).
- Forget everything below Sol High; use Luna with higher effort settings here.
- Forget Sol Extra High; use Terra Ultra here.
- The extra cost of Sol Ultra is probably not worth it over Max.

View on X

SR

Sebastian Raschka

𝕏x•5 days ago

I like choices... but now I have:
- 2x modes (Codex vs. Work mode)
- 3x GPT-5.6 models (Sol, Terra, Luna)
- 5x effort levels (Light, Medium, High, Extra High, Ultra)

That's 2 x 3 x 5 = 30 possible configurations for a query 🤯 What happened to "Auto" mode?
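The count can be checked by enumerating the combinations directly. This is a small sketch that uses only the mode, model, and effort names listed in the post; the variable names are illustrative.

```python
from itertools import product

modes = ["Codex", "Work"]
models = ["Sol", "Terra", "Luna"]
efforts = ["Light", "Medium", "High", "Extra High", "Ultra"]

# Every (mode, model, effort) triple is one possible configuration.
configs = list(product(modes, models, efforts))

print(len(configs))  # 30
print(configs[0])    # ('Codex', 'Sol', 'Light')
```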

View on X

SR

Sebastian Raschka

⚡github•5 days ago

Activity on rasbt/reasoning-from-scratch

rasbt closed an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•5 days ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on repository

rasbt pushed reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on rasbt/reasoning-from-scratch

rasbt closed an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•8 days ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•16 days ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•18 days ago

Activity on repository

rasbt made this repository public

View on GitHub

SR

Sebastian Raschka

𝕏x•18 days ago

Have been taking different local open-weight LLMs for a test drive in different harnesses (Qwen-Code, Codex, Claude Code). 30B Mixture-of-Experts models are kind of a nice sweet spot and can solve challenging problems. And they get roughly 40 tok/sec on a Mac or DGX Spark, which is similar to GPT 5.5 in a Pro subscription and totally usable for everyday work. The harness choice is also interesting: Claude Code seems to be using 2x as many tokens as Codex. Gemma 4 E2B is here just for reference to show that the tasks can't be trivially solved by smaller models. Just finishing a longer write-up about this and will share soon (likely tomorrow)!

View on X

SR

Sebastian Raschka

⚡github•23 days ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•23 days ago

Activity on repository

rasbt pushed llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

⚡github•23 days ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

𝕏x•26 days ago

Just caught up with the recent GLM-5.2 release. The best open-weight model today.

Architecture-wise, it's built on the GLM-5 and GLM-5.1 architecture that I covered previously, which means it's reusing the Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA) mechanisms from DeepSeek V3.2. (I wrote about it here: https://magazine.sebastianraschka.com/p/technical-deepseek)

What's new is that they added an IndexShare mechanism. That's a cross-layer reuse trick for DSA: instead of recomputing the sparse-attention top-k indexer in every layer, GLM-5.2 runs the full indexer only once every four layers and lets the following layers reuse those selected token indices. This keeps the same DSA idea but makes 1M-token inference much cheaper.
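The cross-layer reuse idea can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions, not GLM-5.2's implementation: the indexer is a plain dot-product scorer, all layers share one set of keys and values, and every function name here is hypothetical. Only the "run the indexer once every four layers, reuse the indices in between" schedule comes from the post.

```python
import math
import random

SHARE_EVERY = 4  # full indexer once every four layers (schedule from the post)


def indexer_top_k(query, keys, k):
    """Toy indexer: score every key against the query, keep the top-k positions."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Softmax attention restricted to the selected token positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(len(values[0]))
    ]


def forward(query, keys, values, num_layers, k):
    """Run a stack of toy layers, reusing indices between indexer layers."""
    indexer_calls = 0
    selected = None
    hidden = query
    for layer in range(num_layers):
        if layer % SHARE_EVERY == 0:
            selected = indexer_top_k(hidden, keys, k)  # full pass over all tokens
            indexer_calls += 1
        # Layers in between reuse `selected` instead of re-scoring all tokens.
        attended = sparse_attention(hidden, keys, values, selected)
        hidden = [h + a for h, a in zip(hidden, attended)]  # residual update
    return hidden, indexer_calls


random.seed(0)
dim, seq_len = 8, 64
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
query = [random.gauss(0, 1) for _ in range(dim)]

out, calls = forward(query, keys, values, num_layers=12, k=8)
print(calls)  # 3 indexer passes instead of 12
```

The indexer is the only step that touches every token, so cutting its calls by a factor of four matters most when the context is very long.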

View on X

SR

Sebastian Raschka

📝blog•27 days ago

GLM-5.2 and IndexShare for Long-Context Sparse Attention

Short note on GLM-5.2, an open-weight GLM update that keeps the GLM-5 sparse MoE backbone and adds IndexShare for cheaper 1M-token DSA inference.

1 min read • Sebastian Raschka

Read full article

SR

Sebastian Raschka

📝blog•28 days ago

VibeThinker-3B and the Strength of Post-Training

Short note on VibeThinker-3B, a 3B model based on Qwen2.5-Coder-3B whose reported coding and reasoning results point to strong post-training.

1 min read • Sebastian Raschka

Read full article

SR

Sebastian Raschka

𝕏x•28 days ago

Crazy model! It actually uses the old Qwen2.5-Coder-3B stack and got really great performance with their post-training stack. Need to use it in the next days to see if the vibes of VibeThinker actually check out in practice. But impressive first impression!

Based on the tech report, some of the important pieces of their post-training stack:
1. High-signal synthetic data (math problems with credible solutions, code with tests)
2. Multiple reasoning paths for each answer
3. Filtering, filtering, filtering
4. 2-stage SFT (start with broad training, then train on hard long-reasoning samples)
5. Use target (pass@k) accuracy over validation loss for checkpoint selection
6. MGPO (MaxEnt-Guided Policy Optimization) for RLVR: basically a GRPO-style RL method with an extra weighting that favors examples that are neither too easy nor too hard for the current policy
7. Single 64k long-context RL (they found that the usual progressive context expansion hurt this model because early truncation damaged long-thinking behavior)
8. Training data order: they do Math RL, then Code RL, then STEM RL in this particular order, which they found helped overall
9. After optimizing for accuracy, they add a stage that rewards shorter correct trajectories; basically making the model more efficient without accuracy degradation
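To make item 6 concrete, here is a minimal sketch of a GRPO-style group advantage with an extra prompt-level weight that peaks at a 50% pass rate. The specific weight used below, `exp(-lam * KL(Bernoulli(p) || Bernoulli(0.5)))`, and the names `difficulty_weight` and `lam` are assumptions chosen for illustration; the post only says that the weighting favors examples that are neither too easy nor too hard.

```python
import math


def group_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0 for _ in rewards]  # all-correct or all-wrong: no signal
    return [(r - mean) / std for r in rewards]


def difficulty_weight(pass_rate, lam=1.0, eps=1e-6):
    """Assumed weighting: largest at a 50% pass rate, smaller toward 0% or 100%."""
    p = min(max(pass_rate, eps), 1.0 - eps)
    # KL divergence between Bernoulli(p) and the maximum-entropy Bernoulli(0.5).
    kl = p * math.log(p / 0.5) + (1.0 - p) * math.log((1.0 - p) / 0.5)
    return math.exp(-lam * kl)


def weighted_advantages(rewards, lam=1.0):
    """Scale the group-relative advantages by the prompt-level weight."""
    pass_rate = sum(rewards) / len(rewards)  # rewards are 0/1 verifier outcomes
    weight = difficulty_weight(pass_rate, lam)
    return [weight * a for a in group_advantages(rewards)]


balanced = [1, 0, 1, 0, 1, 0, 1, 0]  # 50% pass rate: most informative
easy = [1, 1, 1, 1, 1, 1, 1, 0]      # 87.5% pass rate: mostly solved already
w_balanced = difficulty_weight(sum(balanced) / len(balanced))
w_easy = difficulty_weight(sum(easy) / len(easy))
print(w_balanced, w_easy)
```

In this sketch the balanced prompt keeps its full advantage while the mostly-solved prompt is down-weighted, which is the qualitative behavior the post describes.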

@orcus108

WHAT THE HELL is happening in AI? A 3B parameter model just put up coding benchmark scores in the same league as Claude Opus 4.5. 3 BILLION. The weights are on Hugging Face, anyone can test it. I genuinely don't know if this is a breakthrough or if the benchmarks are broken.

⚡github•29 days ago

Activity on repository

rasbt pushed machine-learning-book

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/llm-architecture-gallery

rasbt contributed to rasbt/llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

𝕏x•about 1 month ago

Cool new open-weight model by Cohere: a lightweight 30B model for agentic coding tasks. This one builds on Command A+ using the parallel transformer design. Interestingly, even though it's almost half as big, it almost doubles the number of layers.

Also, they say that it's been specifically developed for agentic coding, not just coding. I.e., the evaluation is inside a workflow, not just on a single prompt-to-code-answer task. For Terminal-Bench, the model has to use a terminal, inspect the environment, run commands, read outputs, etc. For SWE-Bench, the model works on real GitHub-style software issues where it has to understand the repository, find relevant files, make a patch, pass tests, etc.

SciCode and LiveCodeBench are more traditional because they mostly test whether the model can produce correct code for a specified problem. Sure, this still requires reasoning, but it's more like “Implement a numerical routine to compute a scientific quantity from given equations and inputs,” which doesn't require any interaction with the environment, existing files, tests, etc.

The focus on the agentic code benchmarks is probably why it's far ahead of Gemma 4 on those. Overall, it's pretty competitive, although not quite Qwen3.6-level performance.
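As a rough illustration of the layout difference, the sketch below contrasts a sequential block with a parallel block. It assumes that "parallel transformer design" refers to the common layout in which the attention and feed-forward branches read the same normalized input and are added to the residual together. `toy_attention` and `toy_mlp` are hypothetical stand-ins, not real sublayers.

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def toy_attention(x):
    """Hypothetical stand-in for an attention sublayer."""
    return [0.5 * v for v in x]


def toy_mlp(x):
    """Hypothetical stand-in for a feed-forward sublayer (ReLU only)."""
    return [max(0.0, v) for v in x]


def sequential_block(x):
    """Classic layout: the MLP sees the output of the attention step."""
    h = [a + b for a, b in zip(x, toy_attention(layer_norm(x)))]
    return [a + b for a, b in zip(h, toy_mlp(layer_norm(h)))]


def parallel_block(x):
    """Parallel layout: both branches read the same normalized input."""
    normed = layer_norm(x)
    attn_out = toy_attention(normed)
    mlp_out = toy_mlp(normed)
    return [a + b + c for a, b, c in zip(x, attn_out, mlp_out)]


x = [1.0, -2.0, 3.0, 0.5]
seq_out = sequential_block(x)
par_out = parallel_block(x)
print(seq_out)
print(par_out)
```

In the parallel layout the two branches have no data dependency on each other, so they can be computed concurrently or fused.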

View on X

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/reasoning-from-scratch

rasbt unlabeled an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

𝕏x•about 1 month ago

Turns out Fable 5 is shadowbanning AI researchers 🫤

@elie

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

𝕏x•about 1 month ago

Always back to the basics: LatentMoE was probably inspired by MLA, which was inspired by LoRA, which was inspired by SVD, which was inspired by eigendecomposition.
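The common thread in that chain is low-rank factorization. A compact summary in standard textbook notation follows; the symbols are the usual ones from the respective papers, not notation from the post, and LatentMoE is left out because its exact formulation is not given here.

```latex
\begin{aligned}
\text{Eigendecomposition (symmetric } S\text{):}\quad
  & S = Q \Lambda Q^\top \\
\text{SVD (any } W \in \mathbb{R}^{m \times n}\text{):}\quad
  & W = U \Sigma V^\top, \qquad W^\top W = V \,\Sigma^\top \Sigma\, V^\top \\
\text{Rank-}r\text{ truncation:}\quad
  & W \approx U_r \Sigma_r V_r^\top \\
\text{LoRA:}\quad
  & W' = W + BA, \qquad B \in \mathbb{R}^{m \times r},\;
    A \in \mathbb{R}^{r \times n},\; r \ll \min(m, n) \\
\text{MLA:}\quad
  & c_t = W^{DKV} h_t, \qquad k_t = W^{UK} c_t, \qquad v_t = W^{UV} c_t
\end{aligned}
```

The second line shows why SVD reduces to an eigendecomposition: the right singular vectors of W are the eigenvectors of the symmetric matrix WᵀW. LoRA and MLA both replace a full matrix with a product of two thin ones, and in MLA only the small latent vector c_t needs to be cached.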

View on X

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/reasoning-from-scratch

rasbt opened a pull request in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/reasoning-from-scratch

rasbt commented on an issue in reasoning-from-scratch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/llm-architecture-gallery

rasbt contributed to rasbt/llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed llm-architecture-gallery

View on GitHub

SR

Sebastian Raschka

𝕏x•about 1 month ago

http://x.com/i/article/2063647807437705216

View on X

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Released rasbt/mlxtend

rasbt released v0.25.0 at rasbt/mlxtend

v0.25.0

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt commented on an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt closed an issue in mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on rasbt/mlxtend

rasbt contributed to rasbt/mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt created a branch

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt deleted

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

SR

Sebastian Raschka

⚡github•about 1 month ago

Activity on repository

rasbt pushed mlxtend

View on GitHub

Sebastian Raschka

Content History

rasbt closed a pull request in python-machine-learning-book-2nd-edition

200,000 Subscribers
