The five-second gut check

Everyone's automating the work. Nobody's designing the five seconds where a human actually decides.

Greenstep HQ in Espoo, halfway through a hackathon. We had a fraud triage tool working. Four AI agents scored each customer. A human swiped through the queue with arrow keys. Our CFO walked over and asked why we were only thinking about defending revenue. Why not grow it too?

Same customer signals, different question. A usage spike is either fraud risk or expansion intent depending on the lens. We added expansion signals in the last hour and placed second out of 70 teams.

But we only applied the triage UI to the fraud side. The expansion signals sat in a dashboard.

Two weeks ago at an offsite dinner with people from our GTM team, we ended up talking about outreach. Cold calls, emails, all the usual pain. How hard it is to hit the right note. How AI-drafted messages are often close, but slightly off.

It clicked later. Same triage queue, but applied to leads instead of fraud cases.

It matters even more there than in fraud. Your average VP gets buried in outreach. AEs send dozens of emails a day and most of them go nowhere. If it's not specific, it's noise. AI can draft something decent. But a human can scan the draft, catch the wrong tone, and swap "your team's usage is growing" for "you're two weeks from hitting your API limit." That's what makes someone actually reply.

Everyone picks the wrong extreme

A company decides to "use AI" and usually picks one of two dead ends. Fully automate and hope nothing breaks. Or give everyone a Claude subscription and some prompts and call it a day.

We keep asking AI to make the final call. It shouldn't be making the call. It should be preparing the call for us.

People kept asking why we didn't just fully automate the fraud tool. If the agents are right 85% of the time, what happens in the other 15%? In fraud, that's suspending a legitimate account. In outreach, that's sending a tone-deaf email to your biggest expansion opportunity.

Five seconds instead of fifteen minutes

So I built a demo. The version we should have built for the expansion opportunities at the hackathon.

Each card: company name, four signals, a draft email the AI wrote, and a send button. Press right to send, left to override, down if you want to see why the AI suggested this outreach. Most decisions take two seconds. Glance at the signals, read the subject line, press right. The ones where something feels off, you press down, read the evidence, maybe edit the draft inline.
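
For readers who want to see the shape of it, here is a minimal Python sketch of the card and the three-key flow. It is not the demo's actual code. The names `Card`, `Action`, `KEY_TO_ACTION`, and `handle_key` are made up for illustration, and so is the assumption that "ask why" keeps you on the same card while send and override move to the next one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send"          # right arrow: approve the AI's draft
    OVERRIDE = "override"  # left arrow: the human disagrees
    EXPLAIN = "explain"    # down arrow: show why the AI suggested this


# Hypothetical key names; a real UI would use whatever its framework reports.
KEY_TO_ACTION = {
    "ArrowRight": Action.SEND,
    "ArrowLeft": Action.OVERRIDE,
    "ArrowDown": Action.EXPLAIN,
}


@dataclass
class Card:
    company: str
    signals: list[str]        # the four signals shown on the card
    draft_subject: str
    draft_body: str
    evidence: list[str] = field(default_factory=list)  # shown on "explain"


@dataclass
class Outcome:
    company: str
    action: Action
    advance: bool  # True if the queue should move to the next card


def handle_key(card: Card, key: str) -> Optional[Outcome]:
    """Translate one key press into an outcome; ignore unmapped keys."""
    action = KEY_TO_ACTION.get(key)
    if action is None:
        return None
    # Assumption: asking "why" expands the card in place, so only
    # send and override advance the queue.
    return Outcome(card.company, action, advance=action is not Action.EXPLAIN)


if __name__ == "__main__":
    card = Card(
        company="Example Oy",
        signals=["usage spike", "new seats", "API limit near", "pricing page visits"],
        draft_subject="You're two weeks from hitting your API limit",
        draft_body="(draft body goes here)",
        evidence=["API calls are trending toward the plan limit"],
    )
    for key in ["ArrowDown", "ArrowRight"]:
        print(handle_key(card, key))
```

The design choice that matters is that every key press produces a small record. That record is what makes the overrides usable later.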

I didn't expect how little information you need when it's laid out right. The fifteen minutes people spend digging through dashboards and CRM notes, that's the AI's job now. The human's job is the five-second gut check.

The swipe flow: approve, override, or ask why.

Overrides are the point

This is the part most people miss. The override button matters more than the send button.

When someone presses left instead of right, that's potential training data. You know which account, which signals, which human disagreed. Track enough of those and patterns can show up. One agent gets overridden 40% of the time on borderline cases. Enterprise drafts get edited twice as often as mid-market ones. A specific AE overrides everything under a score of 70 because they know something the model doesn't.
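
A rough sketch of what tracking that could look like, in Python. The `Decision` record and its fields (`agent`, `segment`, `reviewer`, and so on) are assumptions about what you would log, not a schema from the demo, and the sample rows are invented purely to show the calculation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    account: str
    agent: str       # which AI agent produced the suggestion
    segment: str     # e.g. "enterprise" or "mid-market"
    score: int       # the agent's score for this account
    reviewer: str    # the human who pressed left or right
    overridden: bool


def override_rate_by(decisions: list[Decision], key: str) -> dict[str, float]:
    """Share of decisions overridden, grouped by one Decision field."""
    totals: dict[str, int] = defaultdict(int)
    overrides: dict[str, int] = defaultdict(int)
    for d in decisions:
        group = getattr(d, key)
        totals[group] += 1
        overrides[group] += int(d.overridden)
    return {group: overrides[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    log = [
        Decision("acme", "usage-agent", "enterprise", 64, "ae-1", True),
        Decision("globex", "usage-agent", "mid-market", 82, "ae-1", False),
        Decision("initech", "billing-agent", "enterprise", 91, "ae-2", False),
        Decision("umbrella", "billing-agent", "mid-market", 77, "ae-2", False),
    ]
    print(override_rate_by(log, "agent"))
    print(override_rate_by(log, "segment"))
    print(override_rate_by(log, "reviewer"))
```

Grouping the same log by agent, segment, or reviewer is how you would go looking for the kinds of patterns described above.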

The override rate should drop over time. Not because people stop paying attention, but because the system starts to learn what they would have done anyway.
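
One way to watch for that drop, sketched in Python under the assumption that each decision is logged as a `(week, overridden)` pair. The function names are hypothetical. A falling rate on its own can't tell you whether the system improved or the reviewers stopped paying attention, so treat it as a prompt to go look, not as proof.

```python
def weekly_override_rate(events: list[tuple[int, bool]]) -> list[tuple[int, float]]:
    """Return (week, override rate) pairs in week order.

    events holds one (week_index, overridden) pair per reviewed card.
    """
    counts: dict[int, tuple[int, int]] = {}
    for week, overridden in events:
        reviewed, overrides = counts.get(week, (0, 0))
        counts[week] = (reviewed + 1, overrides + int(overridden))
    return [(week, overrides / reviewed)
            for week, (reviewed, overrides) in sorted(counts.items())]


def is_declining(rates: list[tuple[int, float]]) -> bool:
    """True if the rate never rises from one week to the next."""
    values = [rate for _, rate in rates]
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


if __name__ == "__main__":
    # Invented events, for illustration only.
    events = [(1, True), (1, True), (1, False), (1, False),
              (2, True), (2, False), (2, False), (2, False)]
    rates = weekly_override_rate(events)
    print(rates, is_declining(rates))
```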

Right now it's just a UI. I'll update this post once we've put it to work and have real override data to look at.

What I'm still working out

Since the hackathon I notice the pattern everywhere. Support ticket routing, invoice approvals, screening resumes. Any queue where a human applies a gut check probably has some version of this hiding inside it.

Can overrides actually train the system over time? I think so, but I haven't tested it yet.

But the thing I keep coming back to is that most of the time goes into the investigation, not the actual decision. Once you have a suggestion and the evidence sitting right there, the decision takes about five seconds.


I wrote up the implementation notes for the pattern here.