The brief
ApplyAI had a working product and a bottleneck. For every job a user wanted, someone on the team searched LinkedIn, cross-referenced the company, and guessed which recruiter owned the requisition. It worked. It did not scale past a few hundred users a day.
The brief was one page. Build an agent that takes a job posting and returns the three most likely recruiters to contact, ranked, with a confidence score and a reason. Ship it behind the existing API. Do not touch the front end.
The constraint
The hard part was not the model call. It was trust. A wrong recruiter is worse than no recruiter, because the user burns a real introduction on a dead end. The agent had to know when it did not know.
The second constraint was cost. ApplyAI runs on thin margins. A matching call that cost forty cents would have eaten the unit economics. The target was under three cents per match, including retries.
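The "including retries" part of the target is easy to get wrong, so here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the `CostInputs` shape, the function name, and the token counts, prices, and retry rate in the usage note are hypothetical, not ApplyAI's real figures or any vendor's published pricing.

```typescript
// Hypothetical inputs for a per-match cost estimate.
interface CostInputs {
  inputTokens: number;        // prompt tokens per call
  outputTokens: number;       // completion tokens per call
  inputPricePerMTok: number;  // dollars per million input tokens
  outputPricePerMTok: number; // dollars per million output tokens
  retryRate: number;          // probability that an attempt fails and is retried
  maxRetries: number;         // retries allowed after the first attempt
}

// Expected cost of one match in cents, counting retried calls.
function expectedCentsPerMatch(c: CostInputs): number {
  const perCallDollars =
    (c.inputTokens * c.inputPricePerMTok +
      c.outputTokens * c.outputPricePerMTok) / 1e6;

  // Attempt k only happens if the previous k attempts all failed,
  // so expected attempts = 1 + p + p^2 + ... + p^maxRetries.
  let expectedAttempts = 0;
  for (let k = 0; k <= c.maxRetries; k++) {
    expectedAttempts += Math.pow(c.retryRate, k);
  }

  return perCallDollars * expectedAttempts * 100;
}

// True when the estimate stays under the ceiling (three cents in the brief).
function withinBudget(c: CostInputs, ceilingCents: number): boolean {
  return expectedCentsPerMatch(c) < ceilingCents;
}
```

With illustrative inputs of 4,000 input tokens, 400 output tokens, prices of 3 and 15 dollars per million tokens, a 10 percent retry rate, and two retries, the estimate comes out at roughly 2.0 cents, which clears a three-cent ceiling.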
The approach
We split the problem into retrieval and judgment. Retrieval pulls candidate recruiters from a Postgres table seeded from public sources and the company graph. Judgment is a single Claude call that ranks the shortlist and writes the reason. The model never searches. It only decides.
That split is what kept it cheap and honest. The expensive, error-prone part (finding people) became a database query. The model does the one thing it is genuinely good at: reading a job description and a set of profiles and explaining a ranking.
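The split above can be sketched as two injected functions and a thin pipeline between them. This is a minimal sketch, not the shipped code: the type names, `matchRecruiters`, and the guard that drops any recruiter missing from the shortlist are our assumptions about how "the model never searches" might be enforced.

```typescript
// Hypothetical shapes; the real schema is not shown in this write-up.
interface Job { id: string; company: string; description: string; }
interface Recruiter { id: string; name: string; company: string; }
interface RankedMatch { recruiter: Recruiter; confidence: number; reason: string; }

// Retrieval: a plain database lookup. No model involved.
type Retrieve = (job: Job) => Promise<Recruiter[]>;
// Judgment: one model call that ranks the shortlist it is handed.
type Judge = (job: Job, shortlist: Recruiter[]) => Promise<RankedMatch[]>;

async function matchRecruiters(
  job: Job,
  retrieve: Retrieve,
  judge: Judge,
): Promise<RankedMatch[]> {
  const shortlist = await retrieve(job);
  if (shortlist.length === 0) {
    return []; // nothing to judge, so skip the model call entirely
  }

  const ranked = await judge(job, shortlist);

  // Keep only recruiters that came from retrieval, then return
  // the top three by confidence, highest first.
  return ranked
    .filter((m) => shortlist.some((r) => r.id === m.recruiter.id))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, 3);
}
```

Passing `retrieve` and `judge` in as arguments keeps the two halves testable on their own: the query can be checked against real jobs with a stub judge, and the prompt can be evaluated against a fixed shortlist.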
The build
Days 1 to 3 were the spec and the schema. We modeled recruiters, companies, and requisition signals, then wrote the retrieval query and tested it against fifty real jobs before the model was involved at all.
Days 4 to 10 were the agent loop, the confidence gate, and an eval harness. We built a set of 120 graded examples and ran every prompt change against it. No change shipped that dropped the eval score.
// The judgment call is deliberately boring.
const ranked = await rankRecruiters({
  job,
  shortlist, // from Postgres, never from the model
  model: "claude-sonnet-4-5",
});
if (ranked.confidence < 0.62) {
  return { matches: [], fallback: "manual" };
}

Days 11 and 12 were the handoff: production deploy behind a feature flag, a README, and a ten-minute video walking the ApplyAI team through the eval harness so they could keep tuning it without us.
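The rule from days 4 to 10, that no change shipped if it dropped the eval score, can be sketched as a small gate. This is an illustration under assumptions: the write-up does not say how the 120 examples were graded, so the sketch scores top-1 accuracy, and `GradedExample`, `evalScore`, and `canShip` are hypothetical names. Predictions are assumed to be collected before scoring.

```typescript
// One graded example: a job and the recruiter a human marked as correct.
interface GradedExample { jobId: string; expectedRecruiterId: string; }

// Top-ranked recruiter id per job; null when the confidence gate fired.
type Predictions = Record<string, string | null>;

// Fraction of examples where the top-ranked recruiter matches the grade.
// A missing or gated prediction counts as a miss.
function evalScore(examples: GradedExample[], predictions: Predictions): number {
  if (examples.length === 0) {
    throw new Error("eval set is empty");
  }
  let correct = 0;
  for (const ex of examples) {
    if (predictions[ex.jobId] === ex.expectedRecruiterId) {
      correct++;
    }
  }
  return correct / examples.length;
}

// A prompt change ships only if it does not drop the score.
function canShip(baselineScore: number, candidateScore: number): boolean {
  return candidateScore >= baselineScore;
}
```

Counting a gated job as a miss is a design choice in this sketch. A harness could instead score gated jobs separately, so that a prompt is not rewarded for guessing or punished for abstaining.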
The outcome
The agent shipped on day 12, two days inside the window. In the first month it returned 4.2 times more qualified recruiter matches per hour than the manual flow, at 2.4 cents per match.
The confidence gate fired on about 9 percent of jobs. Those went to the manual flow. Nobody complained, because nobody got a wrong recruiter.
“Holdfast treated our cost ceiling as a hard requirement, not a nice-to-have. The agent does one thing, does it cheaply, and tells us when it is unsure. That last part is why we trust it.”