Interview calibration is mostly a vocabulary problem

Every team that has ever tried to improve its hiring quality runs into calibration, sooner or later. Two interviewers sit in the same session, hear the same answers from the same candidate, and come out with completely different reads. One says "solid hire." The other says "weak hire, not ready." The debrief turns into a negotiation. The outcome depends on which interviewer is more senior, more stubborn, or more rested.

The instinct at this point is to fix it with a rubric. Write down the categories. Score each candidate on each. Force the disagreement into the grid, so at least you can see where it's coming from.

Rubrics can help. But they don't fix the underlying problem, because the underlying problem isn't that interviewers are scoring different things. It's that they're scoring the same thing and meaning different things by it.

The vocabulary problem

When one interviewer writes "strong system design" and another writes "weak system design" about the same candidate, the disagreement is usually semantic, not evidential. They both saw the same session. They both know what they saw. What they don't share is an agreed definition of what "strong" means in this context.

To one interviewer, "strong system design" means the candidate listed several valid architectural options and picked one thoughtfully. To another, it means the candidate arrived at the same answer the interviewer would have arrived at. To a third, it means the candidate caught a subtle failure mode the interviewer wasn't expecting. Those are three different things, and if you haven't made them explicit, your rubric is aggregating apples, oranges, and a small piece of plywood.

The point isn't that one of those definitions is right. All three are defensible. The point is that the team has to pick one, or explicitly combine them, and then use that same definition consistently. A rubric filled out from three different underlying definitions produces scores that look structured but are actually more misleading than "I liked this candidate, I didn't like that one."

What a shared vocabulary looks like

A shared vocabulary is a short document that defines, for your team specifically, what each interview signal means and what evidence supports each level.

For example, "candidate reasons about trade-offs clearly" might be defined as follows. The candidate names at least two distinct options before picking one. They explain why they chose one in terms of constraints the problem gave them, not in terms of aesthetic preference. They are willing to change their answer if a new constraint is introduced.

That definition gives interviewers something they can actually agree or disagree about. It also forces the author of the definition to commit to what the team actually cares about.

The exercise of writing the shared vocabulary is usually the useful part. Most teams find out they don't agree on what they're looking for, and discovering that is itself a calibration win. The document is the artefact. The conversation that produced it is the signal.

How to find the words that are doing the most damage

Pull the last ten hiring debriefs your team ran. Highlight every adjective interviewers used to describe candidates. Anything counts: strong, solid, weak, junior, senior, smart, scrappy, sharp, thoughtful, concerning, brilliant, slow. For each word, ask: would another interviewer on the team use this word to mean the same thing?

Some of them will be obvious ("junior" is fairly well-defined). Others will be a mess. "Scrappy" is usually a euphemism for something the interviewer couldn't articulate, and it's worth pulling on. Does it mean "willing to use imperfect tools to get something done"? Does it mean "quick to act without checking"? The same word is often doing different jobs in different debriefs, and producing inconsistent decisions because of it.

The calibration win isn't to ban those words. It's to write down, per word, what your team means when they say it. A few sentences per word is often enough.
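If your debrief notes exist as text, the first pass over them can be partly mechanical. The following is a minimal Python sketch, not a prescribed tool: it assumes notes are available as (interviewer, text) pairs, uses the adjectives listed above as a hypothetical watch list, and reports which words more than one interviewer reached for. The function names and the sample notes are made up for illustration. It only finds the candidates for discussion; deciding whether two people mean the same thing by a word is still a conversation.

```python
import re
from collections import defaultdict

# Hypothetical watch list: the adjectives named in the text above.
WATCH_WORDS = {
    "strong", "solid", "weak", "junior", "senior", "smart",
    "scrappy", "sharp", "thoughtful", "concerning", "brilliant", "slow",
}


def tally_watch_words(debriefs):
    """Map each watched word to the set of interviewers who used it.

    `debriefs` is an iterable of (interviewer, note_text) pairs.
    """
    usage = defaultdict(set)
    for interviewer, text in debriefs:
        # Lowercase and split into alphabetic tokens; punctuation is dropped.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in WATCH_WORDS:
                usage[token].add(interviewer)
    return usage


def shared_words(usage, min_interviewers=2):
    """Words used by at least `min_interviewers` people, most widely used first."""
    hits = [
        (word, sorted(people))
        for word, people in usage.items()
        if len(people) >= min_interviewers
    ]
    # Sort by number of users (descending), then alphabetically for stability.
    return sorted(hits, key=lambda item: (-len(item[1]), item[0]))


# Made-up sample notes, for illustration only.
debriefs = [
    ("ana", "Strong system design. Scrappy, got something working fast."),
    ("ben", "Weak system design, but sharp on debugging."),
    ("chi", "Scrappy in a concerning way: acted without checking. Strong communicator."),
]

for word, people in shared_words(tally_watch_words(debriefs)):
    print(f"{word}: {', '.join(people)}")
```

In the sample, "scrappy" and "strong" each show up from two interviewers, and the two uses of "scrappy" plainly point in opposite directions. Those are the words to write definitions for first.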

Calibration is continuous, not a kick-off

Calibration workshops at the start of a hiring process are better than nothing, but they don't take. People drift back to their own definitions within a quarter. Calibrating well tends to mean doing it continuously: every debrief can be used to update the shared vocabulary, and the vocabulary is referenced out loud during each debrief.

If this sounds like too much process, consider the alternative. An uncalibrated process makes inconsistent decisions. Inconsistent decisions produce bad hires and missed good hires in roughly equal measure. The cost of the process is almost certainly smaller than the cost of either of those.