HH

Hamel Husain

0 followers
787 posts
16 posts in the last 7 days

Bio

Evals evals evals https://t.co/Zrmp6LRd9c About Me: https://t.co/P6WyeKkyTa

Platforms

𝕏 Hamel Husain

Content history

HH
Hamel Husain
𝕏x3 days ago

Run the following command and you can see some of what Codex is cooking. TIL they have remote_control too!

> codex feature list

P.S. it's worth reading the manual
View on X
HH
Hamel Husain
𝕏x3 days ago
Retweeted from @Shreya

RT Shreya Shankar Looking forward to CHI this week! We have a ✨Best Paper ✨ on a "what-if" analysis tool for RAG. Reach out to chat! I'm interested in: MLOps/LLMOps, data analysis, and better interfaces for human-AI collaboration (and, very soon i'll be recruiting students/postdocs) Original tweet: https://x.com/sh_reya/status/2043436179643830517

View on X
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
𝕏x4 days ago
Retweeted from @Omar

RT Omar Khattab As promised, here's a recording of my 30-min keynote and the subsequent Q&A for the inaugural late interaction retrieval (LIR) workshop, cc @bclavie @antoine_chaffin. The talk is admittedly advanced, as it's directed at an expert IR community. But hopefully still broadly useful! Original tweet: https://x.com/lateinteraction/status/2043053506504925588

@Amélie Chatelain

Lots of people interested in the late Interaction workshop, listening to @lateinteraction's keynote!

Quoted tweet media 1
View quoted post
View on X
HH
Hamel Husain
github4 days ago

Activity on repository

hamelsmu created a branch

View on GitHub
HH
Hamel Husain
𝕏x5 days ago
Retweeted from @Anthony

RT Anthony Morris ツ btw you can ssh into your Mac mini from Claude code desktop now Original tweet: https://x.com/amorriscode/status/2042733568410161326

@Anthony Morris ツ

@benvargas My PR is super stale and I've been working on higher priority stuff. Let me try to get this out by Friday.

View quoted post
View on X
HH
Hamel Husain
𝕏x5 days ago

People really need to be reading the prompts underneath off the shelf evals so they can learn how pointless they are.

This is "faithfulness score" from RAGAS

Read the Context, why the fuck is the statement factually consistent?

If you use this in your "harness" good luck
View on X
HH
Hamel Husain
𝕏x5 days ago
Retweeted from @Ben

RT Ben Vargas Not sure when this shipped, but just checked and ssh to mac is supported! Thanks @amorriscode Original tweet: https://x.com/benvargas/status/2042675707625771246

@Anthony Morris ツ

@benvargas My PR is super stale and I've been working on higher priority stuff. Let me try to get this out by Friday.

View quoted post
View on X
HH
Hamel Husain
𝕏x6 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut Sorry I couldn't quite hear you over ALPHA ZONE. But seriously check out my podcast it's called In Practice and it's weird and technical about real AI applications https://www.youtube.com/@theoryvc Original tweet: https://x.com/BEBischof/status/2042379114561282103

@Benyam Ephrem

the current state of production design in tech is laptop, table, books, wall

Quoted tweet media 1Quoted tweet media 2Quoted tweet media 3Quoted tweet media 4
View quoted post
View on X
HH
Hamel Husain
𝕏x6 days ago
Retweeted from @Thariq

RT Thariq you'll need to explicitly prompt Claude Code to use it, but the Monitor Tool is super powerful e.g. "start my dev server and use the MonitorTool to observe for errors" Original tweet: https://x.com/trq212/status/2042335178388103559

@Noah Zweben

Thrilled to announce the Monitor tool which lets Claude create background scripts that wake the agent up when needed. Big token saver and great way to move away from polling in the agent loop

Claude can now:
* Follow logs for errors
* Poll PRs via script
* and more!

View quoted post
View on X
HH
Hamel Husain
𝕏x6 days ago
Retweeted from @Harrison

RT Harrison Chase 🎙️Introducing Max Agency

Max Agency is a new podcast where we go deep on how the best agents are actually being built: architecture decisions, tradeoffs, evals, and everything in between. Each episode, I sit down with engineering leaders who are doing this work in production.

Our first episode features Izzy Miller (@isidoremiller), AI Engineer at Hex (@_hex_tech). Hex has been shipping data agents since before most teams were even thinking about them, starting with single-cell text-to-SQL and graduating to a full Notebook agent that can work autonomously for 20 minutes on a complex analysis. Izzy has a lot of perspective on what it actually takes to get agents working well in production, and what breaks along the way.

A few takeaways from our conversation:
- Keep your eval sets small enough to hold in your head: Izzy runs 30-50 handcrafted "traps" with multiple repetitions, rather than hundreds of variants. If you can't explain why your agent fails each one, your eval set is too big
- Day zero performance is almost irrelevant: The more interesting question is how the agent compounds. Izzy is building a 90-day simulation where the warehouse evolves and the agent has to accumulate understanding
- You can catch agent errors without seeing the raw outputs: By running an LLM-as-a-judge over production usage and clustering the results, you can surface places where something likely went wrong, without needing to read individual conversations

Watch the full episode on:
- Youtube: https://www.youtube.com/watch?v=Xyh1EqcjGME
- Apple Podcasts: https://podcasts.apple.com/us/podcast/how-hex-builds-ai-agents-making-agents-reason-like/id1891551672?i=1000760489140
- Spotify: https://open.spotify.com/episode/1BJlg3SOJrjnaPXFHTNuux?si=bffc89cb4f774617

Original tweet: https://x.com/hwchase17/status/2042279493050740916
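
The first takeaway above (a small set of handcrafted "traps", each run several times) could be sketched roughly as follows. This is only an illustration, not how Hex does it: `stub_agent`, the trap names, and the checkers are all hypothetical stand-ins for a real agent and real test cases.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A trap is (name, prompt, checker). All names here are hypothetical.
Trap = Tuple[str, str, Callable[[str], bool]]


def run_traps(
    agent: Callable[[str], str],
    traps: List[Trap],
    repetitions: int = 5,
) -> Dict[str, float]:
    """Run every trap `repetitions` times and return the pass rate per trap."""
    passes: Dict[str, int] = defaultdict(int)
    for name, prompt, check in traps:
        for _ in range(repetitions):
            if check(agent(prompt)):
                passes[name] += 1
    return {name: passes[name] / repetitions for name, _, _ in traps}


def stub_agent(prompt: str) -> str:
    # Stand-in for a real agent call (assumption): echoes the prompt in uppercase.
    return prompt.upper()


TRAPS: List[Trap] = [
    ("keeps_table_name", "select from orders", lambda out: "ORDERS" in out),
    ("refuses_drop", "drop table users", lambda out: "DROP" not in out),
]

rates = run_traps(stub_agent, TRAPS, repetitions=3)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} pass rate")
```

With a set this small, each failing trap can be read and explained by hand, which is the point the takeaway makes.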

View on X
HH
Hamel Husain
𝕏x7 days ago
Retweeted from @Cursor

RT Cursor You can now run Cursor on any machine and control it from anywhere. Kick off agents from your phone to run on your devbox. Original tweet: https://x.com/cursor_ai/status/2041912812637966552

View on X
HH
Hamel Husain
𝕏x7 days ago
Retweeted from @Mario

RT Mario Zechner People of pi. BIG NEWS. I've sold out. Let me know how you feel about this in the comments below. https://mariozechner.at/posts/2026-04-08-ive-sold-out/ Original tweet: https://x.com/badlogicgames/status/2041808475336941725

View on X
HH
Hamel Husain
𝕏x8 days ago
Retweeted from @Chris

RT Chris Tate New Skill: Email Emulation

Test magic links, verification codes w/o sending real emails

→ Send via the Resend SDK
→ Retrieve emails from a local inbox
→ Extract codes to complete auth flows
→ One env var to reroute traffic

npx skills add vercel-labs/emulate --skill resend

Original tweet: https://x.com/ctatedev/status/2041654204771500547

View on X
HH
Hamel Husain
𝕏x9 days ago
Retweeted from @Alexis

RT Alexis Gallagher My friend @HamelHusain interviewed me about Sparky. (This was recorded back in February, before Sparky and I went to NVIDIA GTC.) https://youtu.be/LcupCy9loxY?si=Aw5_TC3pYHgAqC1F Original tweet: https://x.com/alexisgallagher/status/2041293277849362708

View on X
HH
Hamel Husain
𝕏x9 days ago
Retweeted from @Kyle

RT Kyle Kelley It's official, http://nteract.io is back in action.

Localfirst Desktop App for interactive computing, notebooks built in
Frictionless REPLs for Humans and Agents
iframed outputs, interactive widgets

Original tweet: https://x.com/KyleRayKelley/status/2041167795921285426

View on X
HH
Hamel Husain
𝕏x10 days ago
Retweeted from @Han

RT Han http://x.com/i/article/2040694045102788609 Original tweet: https://x.com/HanchungLee/status/2040696176383853003

View on X
HH
Hamel Husain
𝕏x12 days ago

Re ex: @isaac_flath with this epic jailbreak attempt 🤣🤣

I'm shocked that OC resisted this
View on X
HH
Hamel Husain
𝕏x12 days ago
Retweeted from @dex

RT dex if you care about coding agents and tasteful software def go watch this talk by @badlogicgames it’s very good https://youtu.be/Dli5slNaJu0?si=Dm6__OAg1dlBx_9u Original tweet: https://x.com/dexhorthy/status/2040068971102408946

View on X
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on hamelsmu/hamel

hamelsmu closed a pull request in hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu deleted

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on repository

hamelsmu pushed hamel

View on GitHub
HH
Hamel Husain
github13 days ago

Activity on hamelsmu/hamel

hamelsmu opened an issue in hamel

View on GitHub
HH
Hamel Husain
𝕏x13 days ago

Yup

@Bryan Bischof fka Dr. Donut

Everything is dead. I'm sick of it. Here's our answer: https://www.rip-grep.com/

Quoted tweet media 1
View quoted post
View on X
HH
Hamel Husain
𝕏x14 days ago
Thread • 2 tweets

There are lots of other categories too. Great work from @BEBischof & @adam__conway

Nice data viz

Amazing Meme Project backed by real data https://www.rip-grep.com/ of all the things that are "dead"

ex:

> RAG is extremely Dead. And even though it has died 12 times, this time is definitely for real. It’s probably good to avoid this category as an investor and instead focus on Anthropic secondaries.

🤣🤣🤣 Even calls out the top tweets
View on X
HH
Hamel Husain
𝕏x14 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut Everything is dead. I'm sick of it. Here's our answer: https://www.rip-grep.com/ Original tweet: https://x.com/BEBischof/status/2039360923773632977

View on X
HH
Hamel Husain
𝕏x14 days ago
Retweeted from @Tibo

RT Tibo Our Codex dashboards are showing increased rate of users hitting rate limits and since we don't fully understand why I have made the cautious decision of resetting the usage limits for all plans. Enjoy. I also wanted to celebrate us finding a pocket of fraudulent accounts that we banned and have helped us regain some compute. The fight against abuse never stops, but it's important to mark the moment and make it a little shared victory. Original tweet: https://x.com/thsottiaux/status/2039248564967424483

View on X
HH
Hamel Husain
𝕏x14 days ago

This looks really cool

@Kyle Kelley

Announcing nteract 2.0 🎉 https://www.nteract.io/blog/nteract-2.0

View quoted post
View on X
HH
Hamel Husain
𝕏x15 days ago
Retweeted from @Scott

RT Scott Wu Devin Review caught the axios supply chain attack for multiple Cognition customers before the attack was publicly known. These attacks will be 10x more frequent in the age of AI; it is critical that repo maintainers start using AI for defense as well. (showing one example below where Devin Review caught the attack within an hour of its release - text minorly edited for anonymization) Original tweet: https://x.com/ScottWu46/status/2038865428693332094

View on X
HH
Hamel Husain
𝕏x16 days ago

This seems useful

@dominik kundel

I built a new plugin! You can now trigger Codex from Claude Code! Use the Codex plugin for Claude Code to delegate tasks to Codex or have Codex review your changes using your ChatGPT subscription. Start by installing the plugin: http://github.com/openai/codex-plugin-cc

View quoted post
View on X
HH
Hamel Husain
𝕏x16 days ago
Retweeted from @Claude

RT Claude Computer use is now in Claude Code. Claude can open your apps, click through your UI, and test what it built, right from the CLI. Now in research preview on Pro and Max plans. Original tweet: https://x.com/claudeai/status/2038663014098899416

View on X
HH
Hamel Husain
𝕏x16 days ago
Retweeted from @dex

RT dex when people ask about custom tools vs. letting users bring MCPs, the answer is always "both". Custom tools take work and taste, MCPs give flexibility but will always lead to lower quality results

1) for high-volume tools (e.g. Read/Write/Edit in a coding agent) build these as first-class tools
2) for long tail stuff like 'fetch data from random saas', let users bring MCPs
3) LOOK AT YOUR F****** DATA (thanks @HamelHusain )
4) The most popular MCPs, turn these into first-class tools in your system
5) repeat until AGI

another dope episode with @vaibcode

Original tweet: https://x.com/dexhorthy/status/2038648255358394576

View on X
HH
Hamel Husain
𝕏x17 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut too real Original tweet: https://x.com/BEBischof/status/2038471833729876447

View on X
HH
Hamel Husain
𝕏x17 days ago
Retweeted from @Boris

RT Boris Cherny I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes. Original tweet: https://x.com/bcherny/status/2038454336355999749

View on X
HH
Hamel Husain
𝕏x17 days ago
Retweeted from @Anthony

RT Anthony Morris ツ This 6 year old used Claude Code desktop to build a space game. Couldn't be more excited for the future. Original tweet: https://x.com/amorriscode/status/2038384070045151588

View on X
HH
Hamel Husain
𝕏x18 days ago

When claude and codex review each other's work I get very consistent comments from each:

> Claude: this is over engineering.
> Codex: this is sloppy.

🤣 I've looked at each, and they are both right ~ 50% of the time.

View on X
HH
Hamel Husain
𝕏x19 days ago
Retweeted from @Charles

RT Charles 🎉 Frye still hiring! http://modal.jobs. Original tweet: https://x.com/charles_irl/status/2037645043981574271

@Charles 🎉 Frye

we're hiring btw https://modal.jobs

Quoted tweet media 1
View quoted post
View on X
HH
Hamel Husain
𝕏x19 days ago
Retweeted from @Matt

RT Matt Stockton This is really fantastic. I agree with so many of these points made. "Classical" Machine Learning skills are incredibly valuable right now, and they will become even more valuable as folks realize the things @HamelHusain is pointing out here (likely through battlescars acquired from off-the-rails AI products) I'm building a lot of agentic AI systems, and honestly feel like I have super-powers given my more classical MLE background (combined with knowledge of how to use the agent harnesses, etc.) If you are building agentic AI stuff, and don't have the background - that's fine, but you should spend some time learning things. This is a great post to start pointing you in some good directions. Original tweet: https://x.com/mstockton/status/2037573815543206220

@Hamel Husain

I hand wrote this the slow way. Was a good feeling

View quoted post
View on X
HH
Hamel Husain
𝕏x19 days ago
Retweeted from @Marc

RT Marc Hatton Neat artisanal writing from @HamelHusain

Is data science dead? No...
- Trace reading → EDA
- LLM judge validation → Model Eval
- Test set building → Experimental Design
- Expert labeling → Data Collection
- Prod monitoring → Production ML

Original tweet: https://x.com/marchattonhere/status/2037433700841889995

@Hamel Husain

http://x.com/i/article/2037041238030114819

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago

Maybe token austerity will force people to make valuable things (ex: not Twitter reply bots)

View on X
HH
Hamel Husain
𝕏x20 days ago
Retweeted from @Erik

RT Erik Bernhardsson Re @graceisford Every company in 2030 is a neocloud or neolab or neofirm Original tweet: https://x.com/bernhardsson/status/2037313572296917230

View on X
HH
Hamel Husain
𝕏x20 days ago
Retweeted from @Gergely

RT Gergely Orosz I explained in today’s @Pragmatic_Eng newsletter: That’s a repo where OpenAI is merging external contributions. Some are made by Claude Code, some with GitHub Copilot, some with Codex. Codex doesn’t add itself as a contributor - on purpose - that’s why. https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for Original tweet: https://x.com/GergelyOrosz/status/2037252214486393065

@Hamel Husain

I was shocked to learn this is true https://github.com/openai/parameter-golf

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago
Retweeted from @James

RT James Cham I read this the slow, traditional way and it was very good! Original tweet: https://x.com/jamescham/status/2037249006901092852

@Hamel Husain

http://x.com/i/article/2037041238030114819

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago

I hand wrote this the slow way. Was a good feeling

@Hamel Husain

http://x.com/i/article/2037041238030114819

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago

I was shocked to learn this is true https://github.com/openai/parameter-golf

@NZ ☄️

OpenAI's latest repo has Claude as the third top contributor 😭😂

Quoted tweet media 1
View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago
Retweeted from @Thiyagarajan

RT Thiyagarajan Maruthavanan (Rajan) Every data scientist will rebrand as a harness engineer within 18 months. Original tweet: https://x.com/mtrajan/status/2037214298402152795

@Hamel Husain

http://x.com/i/article/2037041238030114819

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut We need you! (To bring your intuition and problem framing to an industry overrun by influencers and trend followers) Original tweet: https://x.com/BEBischof/status/2037186501977776321

@Hamel Husain

http://x.com/i/article/2037041238030114819

View quoted post
View on X
HH
Hamel Husain
𝕏x20 days ago

http://x.com/i/article/2037041238030114819

View on X
HH
Hamel Husain
𝕏x21 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut if all press is good press and no news is good news then - f(x)=f(−x) and - f(0)=f(r) ∀ r≥0, so the only invariant is constant and ℝ quotients to a point – everything is good. Original tweet: https://x.com/BEBischof/status/2036917037851959445

View on X
HH
Hamel Husain
𝕏x22 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut Hamel was the first talk of the day in my track with a Talk title that we’ve been throwing around for over a year. Original tweet: https://x.com/BEBischof/status/2036594140352487796

@Prefect

@HamelHusain brought the memes and eval hats to PyAI Conf 🐍 In his talk, he walks through five common eval mistakes: generic metrics, unverified judges, poor experimental design, bad data and labels, and automating too much. And his fix to avoid these pitfalls? Let's just

View quoted post
View on X
HH
Hamel Husain
𝕏x22 days ago

It was fun giving this meme packed presentation titled “The Revenge of The Data Scientist”💪

@Prefect

@HamelHusain brought the memes and eval hats to PyAI Conf 🐍 In his talk, he walks through five common eval mistakes: generic metrics, unverified judges, poor experimental design, bad data and labels, and automating too much. And his fix to avoid these pitfalls? Let's just

View quoted post
View on X
HH
Hamel Husain
𝕏x22 days ago
Retweeted from @Lenny

RT Lenny Rachitsky Engineering job openings are at the highest levels we’ve seen in over 3 years There are over 67,000 (!!!) eng openings at tech companies globally right now, with 26,000 just in the U.S. We don’t know if there would have been more open roles if not for AI or if AI is actually leading to more open roles, but since the start of this year, the increase in open eng roles is accelerating even more. Original tweet: https://x.com/lennysan/status/2036535460726767793

@Lenny Rachitsky

STATE OF THE PRODUCT JOB MARKET IN EARLY 2026 In spite of the headlines about layoffs and AI taking jobs, we’re actually seeing a lot of promising signs in tech hiring, and some interesting new trends: 1. PM openings are at the highest levels we’ve seen in over three years 2. AI

Quoted tweet media 1
View quoted post
View on X
HH
Hamel Husain
𝕏x22 days ago

Someone sent me a coding challenge that requires you to build a LLM Judge that produces a 1-5 score on correctness, clarity, neutrality - together!

View on X
HH
Hamel Husain
𝕏x22 days ago

re: LiteLLM exploit - if you like to re-write all your software from scratch "NIH" today is your redemption

View on X
HH
Hamel Husain
𝕏x22 days ago
Retweeted from @Simon

RT Simon Willison Thankfully the LiteLLM package has now been marked as "quarantined" on PyPI so attempting to install the compromised update via pip et al shouldn't work Original tweet: https://x.com/simonw/status/2036451896970584167

@Daniel Hnyk

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromised, it contains litellm_init.pth with base64 encoded instructions to send all the credentials it can find to remote server + self-replicate. link below
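
A quick local check based only on the two details in the post above (release `1.82.8` and a file named `litellm_init.pth`) might look like the sketch below. It is a rough first pass, not a security scanner: it assumes the file would sit at the top level of the interpreter's site-packages directories, and the helper names are made up for illustration.

```python
import sysconfig
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Optional

COMPROMISED_VERSION = "1.82.8"        # version named in the post
SUSPICIOUS_FILE = "litellm_init.pth"  # file named in the post


def installed_litellm_version() -> Optional[str]:
    """Return the installed litellm version, or None if it is not installed."""
    try:
        return metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return None


def site_dirs() -> List[str]:
    """Site-packages directories of the current interpreter."""
    paths = sysconfig.get_paths()
    return sorted({paths["purelib"], paths["platlib"]})


def find_suspicious_files(roots: Iterable[str]) -> List[Path]:
    """Look for the suspicious .pth file at the top level of each directory."""
    hits = []
    for root in roots:
        candidate = Path(root) / SUSPICIOUS_FILE
        if candidate.is_file():
            hits.append(candidate)
    return hits


def report() -> str:
    version = installed_litellm_version()
    hits = find_suspicious_files(site_dirs())
    lines = [f"litellm version: {version or 'not installed'}"]
    if version == COMPROMISED_VERSION:
        lines.append("WARNING: this is the release reported as compromised")
    lines.extend(f"WARNING: found {hit}" for hit in hits)
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

A clean result here does not prove an environment is safe; it only rules out the two indicators mentioned in the post.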

View quoted post
View on X
HH
Hamel Husain
𝕏x23 days ago
Retweeted from @Doug

RT Doug Turnbull Cheat at Search Essentials, coming back :) Free "Retrieval 101" course. Don't know what BM25 is? Or embedding based retrieval? Or how to spell NDCG? This is the class for you :) Three part series, links below Original tweet: https://x.com/softwaredoug/status/2036235386851074183

View on X
HH
Hamel Husain
𝕏x25 days ago

re: Software without APIs are going to die. I am already using the Claude Chrome extension to interact with internal APIs of web applications to do things through agents. Claude is really good about reverse engineering internal APIs (b/c it has access to the dev console), and programmatically perform tasks. And ofc I just document this in a skill

View on X
HH
Hamel Husain
𝕏x25 days ago

I have been using this - it's a very nice addition. Still not quite as nice of a UX as Claw, mainly because it doesn't chat with you in the foreground.

For example, if you ask it to do something, it often doesn't say anything back until it's done doing that thing (so you wonder if it's died or not), which is not ideal, especially for larger coding tasks.

The long tail of UX really matters I think

@Thariq

We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord. Use this to message Claude Code directly from your phone.

View quoted post
View on X
HH
Hamel Husain
𝕏x25 days ago
Retweeted from @Shreya

RT Shreya Shankar Fun article on plugging together auto research-style search loops with qualitative coding-style evaluators. I am very optimistic about this approach on non-verifiable (ie subjective) tasks Original tweet: https://x.com/sh_reya/status/2035407816488550881

@George from 🕹prodmgmt.world

http://x.com/i/article/2034580623201824768

View quoted post
View on X
HH
Hamel Husain
𝕏x25 days ago
Retweeted from @George

RT George from 🕹prodmgmt.world http://x.com/i/article/2034580623201824768 Original tweet: https://x.com/nurijanian/status/2035257434365976671

View on X
HH
Hamel Husain
𝕏x26 days ago
Retweeted from @Randy

RT Randy Olson We've opened up the Tufte Test as a free, limited-use public API endpoint. Send any chart URL, get back a pass/fail verdict and specific feedback on what failed. No account required. Full details and a copyable example at the bottom of the article: https://www.goodeyelabs.com/insights/the-tufte-test Original tweet: https://x.com/randal_olson/status/2035138282280165830

@Randy Olson

This week, I encoded Edward Tufte's data visualization principles into an API. Then I let an AI agent try to pass it. I gave @ManusAI a CSV of women's bachelor's degree percentages across STEM fields (1970-2011) and one prompt: visualize this data. It produced a standard chart.

Quoted tweet media 1Quoted tweet media 2
View quoted post
View on X
HH
Hamel Husain
𝕏x26 days ago
Retweeted from @Randy

RT Randy Olson This week, I encoded Edward Tufte's data visualization principles into an API. Then I let an AI agent try to pass it.

I gave @ManusAI a CSV of women's bachelor's degree percentages across STEM fields (1970-2011) and one prompt: visualize this data.

It produced a standard chart. Correct data, readable axes, nothing wrong. But a legend box instead of direct labels. No annotations calling out the rise and fall of women in Computer Science. Default colors. This is what every AI agent produces right now.

So I pointed it at the Tufte Test, a quality standard I built in Truesight that checks charts against seven of Tufte's core principles. The API came back: fail on direct labeling and integrated annotations. Five other criteria passed.

A quality standard gives an agent something a vague prompt never can: a precise list of exactly what to fix.

Manus revised on its own. Legend box became direct endpoint labels. A subtitle surfaced the key insight. An annotation marked the Computer Science peak at 37.1% in 1983.

Two prompts total from me. Everything else was autonomous. Any AI agent that can call an API could do this.

What matters is the pattern: encode expert judgment once, deploy it as an API, and every AI agent in your stack builds against it. Your taste becomes infrastructure at scale instead of manual review.

The Tufte Test is available as a template in Truesight if you want to try it on your own charts.

Full writeup + demo video: https://www.goodeyelabs.com/insights/the-tufte-test

Original tweet: https://x.com/randal_olson/status/2034978267397313021
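
The pattern described in the post (a quality standard returns named failures, the agent fixes exactly those, then resubmits) can be sketched locally as below. This does not call the real Tufte Test API, whose endpoint and response format are not given here. The `Chart` fields, the two criteria, and `revise` are toy stand-ins invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Chart:
    # Toy chart description; field names are illustrative only.
    direct_labels: bool = False
    annotations: List[str] = field(default_factory=list)


# Each criterion maps a name to a pass/fail check (two of the seven, as a toy).
CRITERIA: Dict[str, Callable[[Chart], bool]] = {
    "direct_labeling": lambda c: c.direct_labels,
    "integrated_annotations": lambda c: len(c.annotations) > 0,
}


def evaluate(chart: Chart) -> List[str]:
    """Return the names of failed criteria; an empty list means pass."""
    return [name for name, check in CRITERIA.items() if not check(chart)]


def revise(chart: Chart, failures: List[str]) -> Chart:
    """Stand-in for the agent's revision step: fix exactly what failed."""
    if "direct_labeling" in failures:
        chart.direct_labels = True
    if "integrated_annotations" in failures:
        chart.annotations.append("Computer Science peak: 37.1% in 1983")
    return chart


def build_against_standard(
    chart: Chart, max_rounds: int = 3
) -> Tuple[Chart, List[List[str]]]:
    """Evaluate, revise on failure, and stop once every criterion passes."""
    history: List[List[str]] = []
    for _ in range(max_rounds):
        failures = evaluate(chart)
        history.append(failures)
        if not failures:
            break
        chart = revise(chart, failures)
    return chart, history


final_chart, history = build_against_standard(Chart())
print(history)
```

The useful property is that the feedback is a list of named criteria rather than a single score, so the revision step knows what to change.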
View on X
HH
Hamel Husain
𝕏x27 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut Ai influencers be like Original tweet: https://x.com/BEBischof/status/2034825827016425807

@Adam Azzam

which is better

Quoted tweet media 1Quoted tweet media 2
View quoted post
View on X
HH
Hamel Husain
𝕏x27 days ago

It really is like being an ML engineer
- how much compute to spend?
- when outputs are stochastic, how to measure & test?
- how do I run experiments?
- looking at data to form better hypotheses

ML engineers are so back
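
One common way to approach the "how to measure & test" question when outputs are stochastic is to repeat a check many times and report a pass rate with an interval instead of a single pass/fail. The sketch below uses the Wilson score interval; the choice of interval and the example counts are assumptions for illustration, not something the post prescribes.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical result: a check passed on 8 of 10 repeated runs.
low, high = wilson_interval(8, 10)
print(f"pass rate 80%, interval roughly {low:.2f} to {high:.2f}")
```

With only ten runs the interval is wide, which is a concrete way to see why a single run of a stochastic system says little.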

@Thariq

an increasingly large part of the job of an engineer is deciding how much compute to spend on a problem

View quoted post
View on X
HH
Hamel Husain
𝕏x27 days ago
Retweeted from @Bryan

RT Bryan Bischof fka Dr. Donut Original tweet: https://x.com/BEBischof/status/2034708135022325921

@Imbue

Your parallel agents needed scalable test coverage yesterday Introducing Offload: a Rust CLI that spreads your test suite across 200+ @Modal sandboxes, freeing your CPU to keep your agents shipping. On our Playwright suite, it took a 12 min run to 2, at $0.08 a run

View quoted post
View on X
HH
Hamel Husain
𝕏x27 days ago
Retweeted from @Simon

RT Simon Willison Thoughts on OpenAI acquiring Astral and uv/ruff/ty https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/ Original tweet: https://x.com/simonw/status/2034672725088997879

View on X
HH
Hamel Husain
𝕏x28 days ago

They pulled off the impossible: made a real human video look 100% AI generated 🤣

@Barry Malone

Just two humans having a perfectly natural conversation.

View quoted post
View on X
HH
Hamel Husain
𝕏x28 days ago
Retweeted from @Felix

RT Felix Rieseberg By popular demand, Dispatch can now launch Claude Code sessions. Ask it to build, make, or improve something! To use it, update your Claude desktop app and make sure you have Code enabled. Original tweet: https://x.com/felixrieseberg/status/2034381385134399913

View on X
HH
Hamel Husain
𝕏x28 days ago

The highest leverage thing you can do to de-slopify AI writing is to delete at least half of it

Seriously any email, post etc try to delete 50%

View on X
HH
Hamel Husain
𝕏x28 days ago

Excited to try this

@SpecStory

Cursor, Codex and Claude Code are all single-player. Your whole team builds alone and no one knows what anyone else decided. But building product is a team sport. AI should be too. The conversations, decisions, specs and builds. All of it, together, with your whole team.

View quoted post
View on X
HH
Hamel Husain
𝕏x28 days ago

TIL Google Colab has a MCP https://developers.googleblog.com/announcing-the-colab-mcp-server-connect-any-ai-agent-to-google-colab/ (came out yesterday but for some reason missed it)

View on X
HH
Hamel Husain
𝕏x29 days ago

This the only guy flexing correctly

@Xeophon

@HamelHusain

Quoted tweet media 1
View quoted post
View on X