Posts Tagged: LLM

August 25, 2025

The Solveit approach turns frustrating AI conversations into learning experiences

Disclosure: I took the Solveit course as a student. Later, after seeing what I built and shared in the Discord community, AnswerAI invited me to join their team.

"Here's a complete web app that does what you asked for!"

ChatGPT cheerfully responded, presenting me with an intimidating wall of code. I had simply asked for help creating a small weather dashboard. At first glance, the code looked fine: imports, API calls, state management, even error handling. But when I tried to run it, nothing worked. So, of course, I pasted the error message and asked for a fix.

"Oh, I see the issue, here's the corrected version..."

...And two new bugs appeared. Three responses later, the code was worse than when I started. And more importantly, I had no idea what was going on.

To me, most AI tools feel like riding in a self-driving car that suddenly decides to drive off a cliff.

This pattern, the "doom loop of deteriorating AI responses", is something I've encountered repeatedly. It's particularly frustrating because these AI tools seem so promising at first!

Enter Solveit, a tool designed specifically to transform these frustrations into learning experiences. In AnswerAI's "Solve It With Code" course, led by Jeremy Howard and Johno Whitaker, I learned not just how to use this tool, but the fundamental principles behind effective AI interaction that apply to working with any AI assistant.

In this post, I'll share the three key properties of LLMs that cause the doom loop, and the three techniques that transform it into a learning loop. It's what they've come to call "the Solveit approach."

The TL;DR of this post fits in a single table:

| LLM Property | Consequence | Solveit Solution |
|---|---|---|
| RLHF | Over-eager to give long, complete responses | Work in small steps, ask clarifying questions, check intermediate outputs |
| Autoregression | Deterioration over time | Edit LLM responses, pre-fill responses, use examples |
| Flawed and outdated training data | Hallucinations and outdated information | Include relevant context |
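To make one of these techniques concrete: pre-filling a response means writing the first part of the assistant's answer yourself, so the model has to continue from your framing rather than launch into a wall of code. Here's a minimal sketch of the idea using Anthropic's Python SDK (the model name and prompt are placeholders; any API that accepts a partial assistant message works the same way):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=300,
    messages=[
        {"role": "user",
         "content": "Help me build a small weather dashboard."},
        # Pre-fill: the model must continue from this partial answer,
        # which steers it toward one small, checkable step.
        {"role": "assistant",
         "content": "Let's start with just the API call, nothing else:"},
    ],
)
print(response.content[0].text)
```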


And while the course also introduced the Solveit tool, which is designed specifically around this approach, the principles apply to any AI interaction, whether you're using ChatGPT, Claude Code, GitHub Copilot, Cursor, or any other AI tool.

So even if you don't have access to Solveit, I'm sure you'll find something valuable in this post.

May 08, 2024

DIY LLM Evaluation: A Case Study of Rhyming in ABBA Schema

Originally posted on the blog of Xebia, my employer at the time of writing.

It's becoming common knowledge: you should not choose your LLM based on static benchmarks.

As Andrej Karpathy, a founding member of OpenAI, once said on Twitter: "I pretty much only trust two LLM evals right now: Chatbot Arena and the r/LocalLlama comments section". Chatbot Arena is a website where you submit a prompt, see the responses of two anonymous models side by side, and vote for the better one. The votes are then aggregated into a leaderboard. On the r/LocalLlama subreddit, people discuss fine-tuning LLMs for their own use cases.

The lesson is: only trust people evaluating LLMs on the tasks they themselves care about.

But there's something better: evaluate LLMs yourself, on tasks you care about! Not only do you get the most relevant scoring metrics for your task; in the process, you will also learn a whole lot more about the problem you're actually trying to solve.

In this blog post, I will share my journey into evaluating LLMs on a ridiculous task I've been obsessed with for almost a year: rhyming in ABBA schema. For some reason, most LLMs can't create a four-line poem where the first line rhymes with the last and the second rhymes with the third.
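As a taste of what evaluating this yourself can look like, here's a minimal sketch of an ABBA rhyme check (this is not the exact code from the post; it leans on the `pronouncing` library, which wraps the CMU pronouncing dictionary, so out-of-dictionary words simply fail the check):

```python
# pip install pronouncing
import string
import pronouncing

def last_word(line: str) -> str:
    """Return the last word of a line, stripped of punctuation."""
    return line.split()[-1].strip(string.punctuation).lower()

def rhyme(a: str, b: str) -> bool:
    """True if two words rhyme according to the CMU pronouncing dictionary."""
    return b in pronouncing.rhymes(a)

def is_abba(poem: str) -> bool:
    """Check that a poem is exactly four lines and rhymes ABBA."""
    lines = [l for l in poem.strip().splitlines() if l.strip()]
    if len(lines) != 4:
        return False
    words = [last_word(l) for l in lines]
    return rhyme(words[0], words[3]) and rhyme(words[1], words[2])
```

Scoring a model is then just a matter of asking it for N such poems and counting how many pass `is_abba`.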

Curious to know why this is the case? In the rest of this blog post, I will share with you:

  1. Why rhyming in ABBA schema is an interesting task
  2. What the results were of my analysis
  3. What lessons I learned from going through this exercise