achieve ambition with intentionality, intensity, & integrity - @dxtipshq - @sveltesociety - @aidotengineer - @latentspacepod - @cognition + @smol_ai
cursor independently invented the ralph wiggum loop to solve the problems they were seeing with parallel agent orchestration
Scaling long-running autonomous coding https://cursor.com/blog/scaling-agents (https://news.ycombinator.com/item?id=46624541)
RT Mada Seghete How SF founders do date night - interview your partner about what's new in AI and turn it into content :-p. TY @swyx for your thoughts on @claudeai Cowork! https://www.youtube.com/watch?v=_eUfRO3C1lM Original tweet: https://x.com/mada299/status/2011529243558363431
evals should be validated by vibes. i think not enough people give sufficient credit to @METR_Evals (@joel_bkr et al) for clearly identifying/quantifying the Opus 4.5 outperformance. on paper, GPT 5.2 Thinking outperforms Opus 4.5 on SWE Bench Pro, 55.6% vs 52%. in practice METR's long-horizon evals benchmark, while getting increasingly sparse in the long tail, clearly called out the huge jump that many devs are now experiencing a month later. in fact it is such an outlier that the curve fit was probably wrong/needs to be restarted as a new epoch. do see his @aiDotEngineer talk on the eval https://youtu.be/RhfqQKe22ZA?si=yyIWu17SLLEepG_x and we are releasing his longer 2hr workshop on how it works next week as our last release of AIE CODE before we prep for AIE Europe.
RT Goodfire At NeurIPS last month, @jack_merullo_ and @MarkMBissell dove deep with @swyx into the state of interpretability heading into 2026: the "Pasteur's Quadrant" of use-inspired basic research, deploying our methods in high-stakes industries, AI for science, and pragmatic interpretability. Original tweet: https://x.com/GoodfireAI/status/2011127174720581671
clearest minimal-jargon explanation of DeepSeek mHC I've yet read. more of this writing please. includes some anticipatory comments that echo the reader: "When I first saw this, it felt like cheating. You’re not learning stability. You’re forcing it. But some properties shouldn’t be learned; they should be guaranteed." which gives reassurance you're understanding it exactly as intended.
Earlier this year @deepseek_ai challenged the design of the Residual Connection - the backbone of every Transformer since 2016. It was replaced with "hyper-connections." The promise? More expression. The problem? Without constraints, they explode. I reproduced the architecture to
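(not DeepSeek's actual mHC math, which the quoted thread reproduces; just a toy numpy sketch of the "force stability, don't learn it" idea: project the learned mixing weights onto a bounded set, here row-stochastic matrices, so the mixed streams can't explode no matter what the optimizer picks)

```python
import numpy as np

def constrained_mix(H, W_raw):
    """Mix n parallel residual streams H (n x d) with learned weights W_raw (n x n).
    Projecting W_raw onto row-stochastic matrices via softmax makes every output
    stream a convex combination of inputs, so its norm is bounded by the largest
    input norm: guaranteed by construction, not learned."""
    W = np.exp(W_raw - W_raw.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)  # each row sums to 1
    return W @ H

# even with adversarially large raw weights, 100 stacked mixes stay bounded;
# using W_raw directly would blow up within a handful of layers
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
W_raw = 10.0 * rng.standard_normal((4, 4))
for _ in range(100):
    H = constrained_mix(H, W_raw)
print(np.linalg.norm(H, axis=1))  # norms do not explode
```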
just realized openai gained ~$1-2B in valuation every single day of 2025
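back-of-envelope, using the $157B -> $750B 2025 figures cited further down this feed:

```latex
\frac{\$750\text{B} - \$157\text{B}}{365\ \text{days}} \approx \$1.6\text{B/day}
```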
so many ambitious startups making "the LLM OS" tried all these fancy UXes and failed. so many ambitious startups making "the AI browser" tried to book your flights for you and failed. meanwhile Claude Code started unpretentiously as a CLI and now can run your browser and operate your system. classic disruption theory
👋 Hi, I'm Felix and I work on Claude Cowork, bringing Claude Code closer to all kinds of knowledge work. It's an early and rough preview, please tag me in any feedback - we want to iterate very quickly and make it a little better every day.
RT Yaser Martinez Re Based on the @LatentSpacePod from @noam_brown with @swyx and @FanaHOVA Full episode: https://www.youtube.com/watch?v=ddd4xjuJTyg Original tweet: https://x.com/elyase/status/2010394013832986737
i see your 2021 era stackoverflow exit of $1.8b and raise you the 2024 exit of squarespace 4 months before vibe coding began
RT yi What's the highest utility per dollar / ROI things you bought /subbed in the recent years? here's mine (i don't spend much money in general but here's my tier list anyway): S-tier [outsized value / insane] - fitness tracker / pixel watch. 300 dollars and tracks random metrics but the key thing is that it spurs you on to be more health conscious (sleep and exercise) - totally worth it. superb value for money. can't find anything more ROI efficient than this. - good quality snacks that are high in protein (sometimes pricey for snacks but good health is always worth it). For example, high protein ice cream is like 2.5x more expensive, tastes slightly worse but has decent macros. A-tier [pretty good / amazing] - car. A simple tesla in Singapore costs 200K but i found it to be so high ROI somehow because I can travel to play sports easily or go anywhere easily which improves my health & time significantly (two most important metrics!). MRT / grab is ok but somehow even though they are pareto-efficient it's still different driving. A luxury car might be C tier though but I think an affordable EV is super high ROI. - Michelin star fine dining with family. (~800 SGD for 2 people for an unforgettable meal) not bad for good experiences. occasionally though like once a month, not everyday. - X premium. Mostly any other subscription like YT premium, netflix etc. The cost to watch ads is just higher than any cost you pay for subscriptions. - home gym (we paid for a home barbell/dumbbell set years ago for $3K and did more than a thousand sessions combined on that thing). needs some floor space though. - meal prep service. $10 per meal and they ship good quality meals with reasonable stats to you. slightly troublesome but probably good value compared to eating on grab food every single meal/day. B+ / B / B- tier "okay-ish good to okay-ish maybe." - business class tickets for flights >6 hours [B+ tier] necessary for sanity even on your own dime. - top end sports equi...
## This is the Year of the Subagent
if there's anything i'm hearing a lot about in context management, from my podcast with @businessbarista, learnings with @cognition, and this keynote from @pirroh, it is that basically everyone is exploring subagents with scoped autonomy,
spotted at @pebble_bed, hard question for ai
first call is the gpt wrapper -> agent lab shift but it was never "here" so it can't come "back"
next call is devrel (devrel is so back) but its only back in sf
oh got it: midsize model labs (non biglabs that build their own models)
2023ish vintage model labs raised a bunch of money and went nowhere, was dead for a while. basically only 11labs "won"? (xai is biglab)
2025 vintage neolabs still v shiny!!
so for those who missed out on #DevWritersRetreat, @SarahChieng and i are hosting a small SF meetup for dev writers ft. Drew in 2 weeks: https://partiful.com/e/d8qlr5PNG60XEpHOlQWu space is limited, we want writers not lurkers dw i'll record and post what drew says!
Why I write (and you should too!) https://www.dbreunig.com/2025/12/27/why-i-write.html
Erdos problem 728 was postulated in 1975. 51 years later, GPT 5.2 Thinking produced a Lean proof (building upon a prior attempt from AlphaProof) and then 3 other participants used @chatgptapp to collaborate on a final proof, which Terence Tao says is finally acceptable "at a level of writing within ballpark of an acceptable standard for a research paper". But the fact that LLMs can find new proofs is no longer interesting, it's the fact that LLMs can "rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." It changes the concept of what a paper even needs to be, because you can "doubleclick"/"simplify" at will. Decoupling the idea from the effort means that way more ideas can be explored cheaply (and explained cheaply). If anyone has a connection to Terence, would love for him to recap the past year in AI x Math for @latentspacepod! appreciate the linkup
i really should do a thread observing how cog uses devin because whats normal in this company is very not normal everywhere else i have ever seen lmao this is how they do release engineering (!??!)
RT Michelle Bakels This is the best advice I've read for submitting CFPs. I agree with everything here. Original tweet: https://x.com/MichelleBakels/status/2009737398955938303
@ChaiWithJai @jxnlco @0xRaduan @jsconf @strangeloop_stl @KubeCon_ @Official_GDC @GOTOcon @gokoyeb @cramforce indie hackers are great! @thekitze crushed his talk at aie code. just propose good topics. https://www.swyx.io/cfp-advice (i am not running aie miami to be clear)
RT swyx Re if you're also interested in detecting model cheating, just heard about CapBencher from @tksii Paper: https://arxiv.org/abs/2505.18102 "The core trick is to intentionally cap the best achievable accuracy (Bayes accuracy) so that even a fully capable model shouldn't exceed the ceiling. If a model does beat it, it's a strong signal of contamination, leakage, or cheating. We also show in experiments that this idea works very well when we want to detect data contamination or leaderboard hacking. Our idea is an improvement over naively adding label noise to the answers, because the label noise approach will weaken the ability to evaluate the capabilities of LLMs and track their improvements." Original tweet: https://x.com/swyx/status/2009731768283517022
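a minimal sketch of the detection side of this (my illustration, not the paper's code: the counts, ceiling, and binomial test are made up; the paper's actual contribution is constructing items whose Bayes accuracy is capped, e.g. at 1/k when k answers are made equally consistent with the question):

```python
from math import comb

def exceeds_ceiling(correct, total, ceiling, alpha=1e-3):
    """One-sided binomial test: is observed accuracy significantly above
    the designed Bayes ceiling? If so, suspect contamination or leakage."""
    # P(X >= correct) under X ~ Binomial(total, ceiling)
    p = sum(comb(total, i) * ceiling**i * (1 - ceiling)**(total - i)
            for i in range(correct, total + 1))
    return p < alpha

# an honest, fully capable model should land near the ceiling, never far above it
print(exceeds_ceiling(correct=80, total=100, ceiling=0.5))  # True: suspicious
print(exceeds_ceiling(correct=53, total=100, ceiling=0.5))  # False: consistent
```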
Re @HamelHusain important update, gemini has quietly nerfed the youtube tool in gems. very disappointing.
2026 is the year of the word rotator
RT Latent.Space From a scrappy side project built to solve their own LLM optimization problems to becoming the industry’s de-facto independent scoreboard, Micah Hill-Smith and George Cameron went through the arc of launching Artificial Analysis for free, paying benchmarking costs out of pocket, and growing it into what many now call the “new Gartner of AI” for enterprises, labs, and developers. We sat down with Micah and George to unpack why truly independent benchmarking is so hard (prompt variance, eval saturation, mystery-shopper policies), how the Artificial Analysis Intelligence Index evolved as old benchmarks broke, and what new metrics actually matter now such as agentic evals (GDPVal-AA). We also dig into the economics behind the “smile curve” of AI: why intelligence is getting 100–1000× cheaper per unit while total spend explodes, how reasoning and agents change token efficiency, and their bet that evals must continuously evolve or risk training the industry to optimize for the wrong things. @swyx @_micah_h @grmcameron Original tweet: https://x.com/latentspacepod/status/2009439986802642991
RT Arno Khachatourian I generated some summaries of the recent AI Engineer videos (https://www.learngood.com/#/youtube-series/AI%20Engineer%20-%20EOY%202025). Far from perfect, but hopefully useful. You can download anything you’re interested in as markdown and use it in context with your fav LLM. Cheers @swyx! The slop is served. Original tweet: https://x.com/arnokha/status/2009405918526505058
RT Michele Catasta We launched "Ralph mode" with Replit Agent 3 in Sep 2025. Here is how it works. First, you need to invest in the 3 pillars of Autonomy: frontier models, advanced context management, and exhaustive verification. Original tweet: https://x.com/pirroh/status/2009381577244258370
RT Artificial Analysis Artificial Analysis is on the latest episode of @latentspacepod with @Swyx Founders @_micah_h and @grmcameron talk through: ➤ Origin story of Artificial Analysis ➤ The state of AI benchmarking ➤ Our latest benchmarks including AA-Omniscience, GDPval-AA and Openness Index Link below! Original tweet: https://x.com/ArtificialAnlys/status/2009367497913585905
RT Nina Lopatina Last month, I dove deep with @swyx on @latentspacepod about the state of context engineering, and scaling it as a full-stack discipline with benchmarks, tooling, and enterprise deployments. Hosted by @LaudeInstitute on the rooftop of the Hard Rock Cafe during @NeurIPSConf (my 5th!), and my first interview in sunglasses! Catch the full video from sunny San Diego here: https://www.youtube.com/watch?v=tSRqTerZrH8. We unpacked the rapid evolution of context engineering, how agentic RAG became the baseline, why context rot is cited in every blog but industry benchmarks at real scale (100k+ documents, billions of tokens) are still rare, sub-agents with turn limits and other explicit constraints, instruction-following re-rankers for precision at scale, KV cache strategies for multi-turn agents, and why 2026 will shift to end-to-end system designs over component tweaks. For more details on the blogs, papers, and events that shaped Context Engineering in 2025 (as we referenced in our chat), join @ContextualAI's webinar next week on 1/13! Sign up here: https://www.linkedin.com/events/contextengineering-ayearinrevie7415089049692102656/ Original tweet: https://x.com/NinaLopatina/status/2009351064429183075
not actually surprising if you understand how coding agents are deployed at very large (>10k users per org) scale.* dont be surprised that ai crossing the chasm means that not every coding agent user is a "cracked" gen z mit dropout slinging CURRENT_THING (ralph/gastown/codex/goose/amp/whatever) in 12 parallel tmux sessions at once making $250k base fluent in the entire YC startup stack. the world is much much bigger than just SV, and the tech is only 1/2 the story in making IT useful/productive for actually everyone, not just the people who already talk and think like you. the Devin is in the Details. *ok it was a culture shock for me too but you see multiple 8figure deals ramp up to >10k users per org again and again and you catch on real quick what a fucking giant revenue machine a scaled agent lab looks like. also theres plenty of parallels for students of tech history - see how the system integrators embraced the shit out of the RPA boom. stuff like this is (admittedly with hindsight) only nonobvious if you (I included) were basically born yesterday as far as enterprise IT is concerned
we thought AI would kick down indian service companies and that genai startups would go straight for the client and wipe the middlemen out. but, something far more hilariously ironic happened. startups like cognition aren’t hunting clients at all. they’re selling to infosys.
Don't forget to catch up on our holiday drops - here's @ashvinair from @cursor_ai / @openai reasoning team on the state of RL research!
Had a great chat with @swyx at NeurIPS about RL research past and present https://www.youtube.com/watch?v=4JHXU1Cpcsc
does anyone have good memes/ is there a relevant xkcd about how when theres a 5 LOC PR everyone criticises it deeply, but when its >1000 LOC then it just gets "LGTM"? trying to make one of my famous charts but it doesnt have the "funny" factor
RT Ashvin Nair Had a great chat with @swyx at NeurIPS about RL research past and present https://www.youtube.com/watch?v=4JHXU1Cpcsc Original tweet: https://x.com/ashvinair/status/2009061569548959902
RT Andrej Karpathy New post: nanochat miniseries v1 The correct way to think about LLMs is that you are not optimizing for a single specific model but for a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will work and your money will be well spent. For the first public release of nanochat my focus was on an end-to-end pipeline that runs the whole LLM pipeline with all of its stages. Now after YOLOing a few runs earlier, I'm coming back around to flesh out some of the parts that I sped through, starting of course with pretraining, which is both computationally heavy and critical as the foundation of intelligence and knowledge in these models. After locally tuning some of the hyperparameters, I swept out a number of models fixing the FLOPs budget. (For every FLOPs target you can train a small model a long time, or a big model for a short time.) It turns out that nanochat obeys very nice scaling laws, basically reproducing the Chinchilla paper plots, just a baby version of the corresponding plot from Chinchilla. Very importantly and encouragingly, the exponent on N (parameters) and D (tokens) is equal at ~=0.5, so just like Chinchilla we get a single (compute-independent) constant that relates the model size to token training horizons. In Chinchilla, this was measured to be 20. In nanochat it seems to be 8! Once we can train compute optimal models, I swept out a miniseries from d10 to d20, which are nanochat sizes that can do 2**19 ~= 0.5M batch sizes on 8XH100 node without gradient accumulation. We get pretty, non-intersecting training plots for each model size. Then the fun part is relating this miniseries v1 to the GPT-2 and GPT-3 miniseries so that we know we're on the right track. Validation loss has many issues and is not comparable, so ins...
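the relation Karpathy is describing, in standard Chinchilla-style notation (a sketch; these are not his exact formulas):

```latex
C \approx 6ND, \qquad N^{\ast} \propto C^{0.5}, \quad D^{\ast} \propto C^{0.5}
\;\Rightarrow\; \frac{D^{\ast}}{N^{\ast}} = k \quad (k \approx 20 \text{ for Chinchilla},\ k \approx 8 \text{ for nanochat})
```

with C = training FLOPs, N = parameters, D = training tokens; equal exponents are exactly what make the tokens-per-parameter ratio k a compute-independent constant.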
Observation: Hyperengineering is the only form of LLM Psychosis that pays.
RT Erik Torenberg Two updates on the New Media Fellowship: 1. We’re creating different “tracks” to encompass the range of talent we’ve seen since starting the program: - Marketer track (CMO or VP of marketing) - Comms track - Writer track - Poster track - Creator track 2. To accommodate this greater range of people, we’re extending the application deadline to Jan 15th. Fellows so far include @jxnlco, @swyx, @SarahChieng, @creatine_cycle, @pallipau, @covacut, and @thejamesreina Original tweet: https://x.com/eriktorenberg/status/2008964988010922423
that oai failed to turn ChatGPT's 900m weekly users into any form of lasting social app is probably the biggest consumer ai miss of 2025* you can argue that oai did just fine ($157B -> $750B) NOT doing this, but you don't know the althistory where OAI suddenly became a full social network. X is $230B on 600m X MAU / 40m Grok MAU; in a world where every AI user is worth $5750, the best-case OAI valuation would be around $5T rn. *yes i do like sora but no it's not a serious social network yet
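the implied arithmetic (valuing per Grok MAU, as the tweet does):

```latex
\frac{\$230\text{B}}{40\text{M Grok MAU}} = \$5750 \text{ per user}, \qquad 900\text{M} \times \$5750 \approx \$5.2\text{T}
```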
xAI has the most unique strategy and culture imo. It’s the only AI lab that owns a social media platform: X. This means two things: real-time data and ~250M daily users. Elon’s play is clear: Push Grok everywhere on X. Hence “maximum truth-seeking AI,” bikini-on-everything, Ani the
The core insight here:
- We know agents are great at filesystems because the models have been trained on coding tasks which operate on large filesystems
- So, we all migrated our agent inputs to have file system representation
- And this *also* extends to past context. Previous
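a minimal sketch of that pattern (the layout and filenames here are made up for illustration):

```python
import json
import pathlib

def persist_turns(turns, root="context"):
    """Write each past conversation turn to its own file so an agent can
    retrieve history with the tools it was trained on (ls/grep/cat) instead
    of carrying the whole transcript in its prompt."""
    base = pathlib.Path(root) / "turns"
    base.mkdir(parents=True, exist_ok=True)
    index = []
    for i, turn in enumerate(turns):
        name = f"{i:04d}-{turn['role']}.md"
        (base / name).write_text(turn["content"])
        index.append({"file": name, "preview": turn["content"][:80]})
    # a small index the agent can scan before deciding which files to open
    (pathlib.Path(root) / "index.json").write_text(json.dumps(index, indent=2))

persist_turns([
    {"role": "user", "content": "refactor the auth module"},
    {"role": "assistant", "content": "done; session handling moved to auth/session.py"},
])
```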
Re this talk has now 4xed and overtaken both their previous talks. should be our second millionaire talk in 3-5 months
ask and you shall receive first ride in the modern horseless carriage of our time
@isabelle_zhou @zoox ok how do i get one of these trials!! i was also early cruise and waymo
cant believe that the entire time @ml_angelopoulos was talking to me and i was congratulating him on his $100m raised, he had already raised $150m more haha
🆕 Congrats to @arena for raising their $1.7B Series A! We did the most recent interview of @ml_angelopoulos at NeurIPS, and are reupping this now: https://www.youtube.com/watch?v=NBnOk0Uy9ig - What's all the money for? - high level numbers: 250M+ conversations on the platform, tens of millions
RT Laude Institute Openness is infrastructure. Without shared benchmarks, shared code, and real coordination, AI progress slows - no matter how much money gets spent. @andykonwinski went on @latentspacepod with @swyx to talk about what the field is getting wrong - and how to fix it: https://youtu.be/ZagdY6UJYL4?si=F5vW19qzasGykyf1 Original tweet: https://x.com/LaudeInstitute/status/2008603261994242215
RT The Linux Foundation Re @LatentSpacePod hosted the Linux Foundation’s Jim Zemlin with leaders from Anthropic, OpenAI, and Block to unpack how the Agentic AI Foundation (AAIF) came together, why neutrality and open governance matter, and the momentum building within the foundation. #OpenSource #AgenticAI 🎧 https://hubs.la/Q03ZzmDV0 Original tweet: https://x.com/linuxfoundation/status/2008539370802893238
i find Twitter/myself quite miscalibrated on how enterprise ai koding adoption goes. here's some datapoints from an enlightening @cognition internal presentation i saw today (combined @windsurf x @devinai deal):
- time from first intro to POC: 2 months
- time to first 6 countries: 2 months
- time to next 11 countries: 2 months
- this month: full roll out to 40x more users across all geos
- 8 figure ARR, multi year deal
- account team size including fwd deployed engs: 4
- every onsite drives 150%-400% jump in usage
- huge power law in users: many casuals, some have basically made 10xing their output with AI their entire identity
multiply this by different stages across the fortune 500. its funny because as a bottoms-up guy i'm used to cohorting users by date of signup. but when one company has >10k users of the product then a company can be its own cohort. by this metric one of the things i love seeing the most (and the fact that cog tracks it) is that time-to-real-prod-rollout-wide-usage-milestone is accelerating by 2 months, which is the final boss eval.
excited to kick off the year by dropping @trq212's full Claude Agent SDK workshop from AIE CODE. POV video to give you an idea of how insanely packed this one was. also peek at the incredible venue at @datadoghq - was very grateful to have their support esp since they were right above the conf venue + accepted our badges!
🆕 Claude Agent SDK [Full Workshop] https://www.youtube.com/watch?v=TqC1qOfiVcQ For our first big drop of the year, excited to bring you @trq212's full 2 hour workshop covering all of @AnthropicAI's agentic SDK (formerly known as Claude Code SDK). By far the most popular workshop of AIE CODE!
Re @im_bcooney @interaction the view email experience doesnt seem to look back past the most recent email in a thread
the way that @turbopuffer started late but overtook Pinecone and ripped out 4-5m ARR contracts needs to be studied in devtools harvard business school case studies
Here is my latest article on the world of databases: https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html All the hot topics from the last year: • More Postgres action! • MCP for everyone! • MongoDB gets litigious with FerretDB! • File formats! • Market movements! • The richest person in the world!
View quoted postthis might be the fastest way to tune the x feed rn (settings -> timeline -> post interaction) default twitter makes it very hard/inconvenient to downrank posts (need 2 clicks w very small surface area). now can just swipe left on clickbait. combine w tip below (x’s new recsys generated a lot of slop it thinks you want) and i think timeline hygiene is the best its ever been
guys u should check out what twitter has put as your interests (settings -> privacy and safety -> content you see -> interests)
in Singapore malls they literally use computer vision models and OCR every license plate so that you can look up your car and get personalized directions to your car from anywhere in the mall
RT Hassan I crossed 2 years at Together AI today! 🎉 I've been reflecting on how it all started. A quick X DM that led to interviews and a few weeks later, I'm flying to SF to spend my first month working in-person with the team. Fun fact: this is my 3rd job in a row that I got from X. Building in public works! Thank you @swyx for teaching me the value of learning in public. It's been super rewarding getting to build with the latest open source AI models, witnessing the crazy growth of the business first-hand, and working with driven & talented people. It's funny, I used to list everything I shipped every work anniversary. This time, I'm most proud of the team I've built over anything else. I genuinely got lucky with the quality & talent of everyone who joined. Onward to year 3 and to more open source AI apps, tutorials, videos, and more 🚀 Original tweet: https://x.com/nutlope/status/2007155333140230305
Ted Chiang’s Understand is the most important short story I’ve read about LLM benchmarking/evals/safety and bro wrote it in 1991. in our NeurIPS pod @jyangballin alerted me to ImpossibleBench, which measures how quickly models realize theyve been given an impossible task. We are going to need CouldDoButChooseNotToBench because we absolutely know that models are aware when they are being tested... more broadly this whole short story is a peek into the “chain of thought” of the first real AGI. I’m not sure most of us take this seriously enough.
whos gonna tell him
@nopainkiller This is ridiculous! 40B beating 1T models on terminal bench?!?
RT Shashi 🇬🇧 Just listened to the interview of @andykonwinski by @swyx that came out of @NeurIPSConf #NeurIPS2025 @LaudeInstitute Laude Lounge. The discussion on GEPA and Agent Optimization resonated with the research paper + library I just published. I noticed that prompt optimization itself is not enough in the Agentic loop after talking to many agent builders. It needs to optimize all components and layers of Agentic Systems, e.g. RAG, tools, memory and context. "Post-Post" Training becomes very important. That's why I built "SuperOpt": Agentic Environment Optimization for Autonomous AI Agents. This work is highly inspired by Agentic Context Engineering and GEPA, thanks to authors @LakshyAAAgrawal @matei_zaharia @dilarafsoylu @ChrisGPotts @lateinteraction @NoahZiems @kristahopsalong (others). This is early stage research and still experimenting with more coding agents (Feedback Welcome) P.S: I am not a FT researcher but digging deeper into Agent Optimization because I truly believe that until optimization is solved agentic systems are not going to get closer to prod. Original tweet: https://x.com/Shashikant86/status/2006823679901012442
🎉 SuperOpt Research Paper 📑 + Library 📚 is LIVE! @SuperagenticAI launches SuperOpt: a new approach to Agent Optimization 🎙️Just published SuperOpt: Research on Agentic Environment Optimization for Autonomous AI Agents, introducing a full-stack, unified framework for
RT Simon Willison Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months https://simonwillison.net/2025/Dec/31/the-year-in-llms/ This year it's divided into 26 sections! This is the table of contents: Original tweet: https://x.com/simonw/status/2006514122977063350
Happy New Year from 🇮🇩! looking at how things accelerated from 2022 to 2023 to 2024 to 2025 year end, I genuinely cannot wait to see what’s in store this year.
If someone knows anyone who runs one of these “AI Slop” channels, esp Pouty Frenchie on youtube, I’d love to interview them with full anonymity for @latentspacepod! we just want to tell the technical/business/human story behind it!
You can literally make millions of dollars by making young kids addicted to brain rot. This should be forbidden. I dislike it.
Re @igorcosta @dexhorthy @badlogicgames @humanlayer_dev @mitsuhiko oxford style https://www.versytalks.com/blog/a-complete-oxford-debate-guide-by-today-s-experts https://www.intelligencesquared.com/about/ https://www.youtube.com/watch?v=JZRcYaAYWg4
RT Mario Zechner Recommended viewing. @dexhorthy on context engineering and avoiding the dumb zone. Only 20 minutes and contains all you need to know. I really like @humanlayer_dev. They cut through the bullshit and condense what (often, most likely) works. https://youtu.be/rmvDxxNubIg?si=Q7Zi8tItSHZmZccX Original tweet: https://x.com/badlogicgames/status/2005926955707773256
RT Lalit M Congratulations to @hidecloud, @peakji, and the @ManusAI team. Manus has been providing us with reliable agents for a while; this acquisition is just an outcome of that. A new video just dropped from @aiDotEngineer and @swyx. btw do read blogs by @ManusAI - https://manus.im/blog they're pretty good! Original tweet: https://x.com/lalitmadan/status/2005889079410581853
RT BradWMorris absolute savage @Steve_Yegge. the John Deere era of software engineering on @latentspacepod, great convo w/ @swyx Original tweet: https://x.com/bradwmorris/status/2005886260808691961
if you're wondering what @natfriedman saw in Manus, my team had immaculate timing to drop the Manus AIE workshop video today :)
🆕Apropos of completely nothing, here's the full 1.5 hour workshop from @ivanleomk introducing @ManusAI and the Manus API: https://youtu.be/xz0-brt56L8 Six ways to use Manus: - WEB APP: Full-stack applications with backends, databases, authentication, and AI capabilities. - API:
probably the single most helpful vibe coding tip i can offer that i havent seen anyone talk about yet: in development, log ~every execution step out with the idea of helping your LLM debug its own code. help your LLM help you. one upfront intentional investment in logging, then you just copy paste bugs and execution traces to modify (even dynamically linked runtime) behavior with high certainty that the LLM will "get" what you need.
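a minimal sketch of what that upfront logging investment can look like (illustrative Python; the decorator and names are mine, the habit is the point):

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("app")

def traced(fn):
    """Log every call's args, result, and any exception so a failing run
    produces a full execution trace you can paste straight to the LLM."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            log.exception("error in %s", fn.__name__)
            raise
        log.debug("%s -> %r", fn.__name__, result)
        return result
    return wrapper

@traced
def apply_discount(price, pct):
    return round(price * (1 - pct / 100), 2)

apply_discount(19.99, 15)  # every step lands in the log, ready to copy-paste
```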
if we were using last year's calendar as any guide we should have gotten DeepSeek V4 by now in preparation for DeepSeek R2 everything ok whalebros? the world is rooting for you.
[26 Dec 2024] DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens https://buttondown.com/ainews/archive/ainews-deepseek-v3-671b-finegrained-moe-trained/
I’ve been maining @GeminiApp* while on holiday and I think it might be sticking. It turns out that “have less false positive refusals” is a real feature I actually value over (leading alternatives). other callouts: - @NanoBanana Pro is the closest to AGI for content creators ive felt all year - Veo is good ofc I’m just not that patient - YouTube/Gsuite integration is ofc their unfair advantage - better search index because Google? (try to chatgpt a nytimes article) - good enough memory - worse voice than OAI or @grok - worse instruction following - slightly worse ab testing/app ux design - Gems is good but it could use a Projects concept (outside of NotebookLM) *was given free Ultra to try out everything, disclaimer
RT Austin King gonna start posting more. i attended @swyx ai engineer conference in nyc this year and was made aware twitter is required reading. i haven’t been disappointed. looking forward to contributing more to the conversation around ai, specifically ai coding. Original tweet: https://x.com/austron_24/status/2005460263169130841
btw getting an abnormal amount of lovely youtube comments for the @Steve_Yegge pod on Vibe Coding. he is of course an S tier ranter, and to some extent i knew i was just there to give a prompt and let him loose, but i think theres a certain gravity to the fact that it was HIM saying these hypey things. You can get excited and yap on about the potential of vibe coding as a 20something anon build in public hustler or midlife crisis nontechnical hasbeen marveling at a pretty purple brochure website, but when it’s Steve goddamn Yegge, who has done all the hard things at early Amazon, Google, and Grab, from assembly to databases to OSes to games, people do sit up and take notice. pod link since you read all the way down here https://youtu.be/zuJyJP517Uw?si=35Cpy_McGG8jvhA5 gratifying to be able to create 2 good platforms for ai engineering debates to shine through. also dont miss him and @RealGeneKim serving it up last month https://youtu.be/7Dtu2bilcFs?si=OvvTUfA1L4TmX0eC already one of the top talks of AIE CODE.
Another banger from the @latentspacepod. Really great convo @swyx and @Steve_Yegge . Go queue it up peeps! https://open.spotify.com/episode/20iTChEyuXaXryZOVAJoSi?si=RuT0UKCKQnWxmWhSa3SScQ&t=2100&pi=iVVFhwSKQCeH2
RT Latent.Space Good time to share that we're releasing some great end of year recap pods every day for your holiday listening! just posted: a great end of year convo recapping @OpenAI Codex and GPT5-Codex-Max with @bfioca and @realchillben! Original tweet: https://x.com/latentspacepod/status/2005057429219008526
lots of insights in this @swyx AIE interview on models, harnesses, and how the Codex team thinks about optimizing a model for code gen question I’ve seen floated on tpot: if the model memorizes all economically valuable tasks, does it matter that it doesn’t generalize? the