Published 5/13/2026 · Last Updated 5/14/2026 · By Cyril Gaillard
Explore the knowing-doing gap in training and why quizzes alone can't guarantee performance. Learn how pairing quizzes with AI-driven spoken rehearsal helps learners apply knowledge and perform confidently in real-life situations.
On Monday, a new sales rep finishes onboarding and scores 96% on the product-knowledge quiz. On Tuesday, on her first real customer call, she freezes when the prospect asks a simple pricing question, fumbles the objection, and ends the call apologizing. Nothing she said was technically wrong. She just couldn't say it out loud, in real time, while another human was listening.
Every L&D team has seen some version of this. The learner knows the material. The performance still falls apart. The reason is not laziness or nerves — it is a well-documented gap between knowing what to do and being able to do it. Closing that gap is one of the harder problems in training design, and it is the reason a serious assessment strategy in 2026 needs more than one mode of measurement.
Stanford professors Jeffrey Pfeffer and Robert Sutton coined the term knowing-doing gap in their 1999 book of the same name. Their argument, after four years of fieldwork inside large organizations, was simple and uncomfortable: companies pour enormous resources into training, consulting, and executive education, yet most of what is "known" never translates into behavior. Knowledge sits in slide decks and certification scores. Action stays roughly where it was.
The gap is not only an organizational problem. It also sits inside any individual learner. Benjamin Bloom's taxonomy, the framework most instructional designers still build on (usually in its 2001 revised form), ladders cognitive activity from remember and understand up through apply, analyze, evaluate, and create. The lower rungs are about recognizing and recalling. The higher rungs are about doing things under conditions that aren't tidy.
The same distinction shows up in how training is evaluated. Donald Kirkpatrick's four-level model, in use since the 1950s, separates Level 2 — learning, meaning what the participant has acquired — from Level 3 — behavior, meaning what the participant actually does on the job. A learner can pass Level 2 cleanly and never reach Level 3 at all. Most training programs measure Level 2 because Level 2 is what assessments can easily see.
"One of the main reasons for the knowing-doing gap is that companies overestimate the importance of the tangible, specific, programmatic aspects of what competitors do."
— Jeffrey Pfeffer & Robert Sutton, The Knowing-Doing Gap
It is fashionable in some L&D circles to dismiss quizzes as old technology. That is a mistake. A well-designed quiz is one of the cheapest, most scalable assessment formats in existence. It runs asynchronously, scores itself, and gives a learner immediate feedback at the exact moment the material is still fresh in working memory. For compliance training, product knowledge, policy refreshers, safety procedures, and most of onboarding, the quiz is still the right tool.
Quizzes also do more than test rote memorization. Retrieval practice — the act of pulling information out of memory rather than re-reading it — is one of the most replicated findings in cognitive science. Every quiz attempt is a retrieval rep. That is why teams use quizzes during employee onboarding and why educators rely on classroom quiz tools to harden recall before exams. Skipping the quiz layer means skipping the foundation.
If your training team is choosing an assessment platform for the comprehension layer, the criteria are familiar: branching logic, scoring rules, lead capture if you need it, integrations with your LMS or CRM, and the ability to turn an existing training manual into questions quickly. None of this is controversial. Quizzes are doing exactly what they are good at.
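To make "branching logic" concrete: under the hood, a branching quiz is just a graph in which each answer option scores points and routes to a different next question. A minimal sketch, purely illustrative and not Fyrebox's actual data format:

```python
# A generic model of a branching quiz. Each answer option carries a
# score and points at the next question. Illustrative only; this is
# not Fyrebox's actual data format.

quiz = {
    "q1": {
        "prompt": "A prospect says you cost more than a competitor. Best first move?",
        "options": {
            "a": {"text": "Offer a discount immediately", "score": 0, "next": "q2a"},
            "b": {"text": "Reframe around total cost of ownership", "score": 1, "next": "q2b"},
        },
    },
    "q2a": {"prompt": "Follow-up on discounting...", "options": {}},
    "q2b": {"prompt": "Follow-up on value framing...", "options": {}},
}

def answer(question_id: str, choice: str):
    """Score one response and return (points_earned, next_question_id)."""
    option = quiz[question_id]["options"][choice]
    return option["score"], option["next"]

points, next_q = answer("q1", "b")  # -> (1, "q2b")
```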
The ceiling appears the moment the desired behavior leaves the screen. A multiple-choice item can verify that a rep knows the correct rebuttal to a pricing objection. It cannot verify that the rep can deliver that rebuttal in a calm voice, at the right point in the conversation, without sounding rehearsed, while the prospect is sighing audibly on the other end.
The same applies in many other roles. A new support agent can pass a quiz on de-escalation frameworks and still escalate the first angry caller. A founder can ace a pitch-deck assessment and still talk over the most important slide on stage. A manager can score full marks on a difficult-conversations module and still avoid the actual conversation for another quarter.
What lives above the quiz ceiling is everything that depends on improvisation, tone, pacing, silence, eye contact, and the willingness to be heard. Reading the room. Knowing when to stop talking. Choosing the next sentence while still processing the last one. These are performance skills, and performance skills do not fit inside a radio button.
The traditional fix for the performance gap is roleplay. Two trainees pair up, one plays the customer, the other tries not to laugh. Or a coach watches a recorded pitch and gives notes a week later. Or a Toastmasters chapter meets on Wednesday evenings. These methods work — but they are expensive, they don't scale, and they depend on having the right human in the room. Most learners simply don't get enough reps.
The newer answer is an AI synthetic audience. Instead of a human partner, the learner faces one or more AI personas who listen, push back, ask follow-ups, and surface feedback after the session. The format is closer to a flight simulator than to a quiz: imperfect, but available at 11pm on a Tuesday, and reproducible across an entire cohort. Tools in this category — for example, the AI audience inside Softrun, a spoken-rehearsal platform we built for the performance layer of training — let a learner deliver a pitch, handle an objection, or run through a difficult conversation in front of characters who respond out loud. The point is not novelty. The point is the rep count. A coach can sit in on five rehearsals a week. A synthetic audience can sit in on five before lunch.
This is also where the most interesting recent evidence clusters. Industry data suggests that teams using AI roleplay see materially higher win rates than teams that don't, and that simulation-based training produces faster time-to-competency than traditional classroom delivery. The mechanism is unsurprising: more reps under realistic conditions, more feedback per rep, less waiting for a calendar to align.
Most teams do not need to rebuild their curriculum to start closing the knowing-doing gap. They need a deliberate pairing of two assessment modes, in the right order.
Step 1 — Comprehension. Use a quiz to verify the learner has absorbed the underlying material: product details, policy, frameworks, scripts, decision rules. This is where a tool like Fyrebox lives. Branching, scoring, instant feedback, attempts logged for the LMS or for managers.
Step 2 — Performance. Once the learner has passed the comprehension check, send them into a spoken rehearsal. The same material — but now delivered out loud, against an audience that talks back. The rehearsal is recorded, scored against a rubric, and replayed with timestamped feedback.
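Sketched as plain logic, the pairing is just a two-stage gate. The thresholds and function names below are hypothetical, not a real Fyrebox or Softrun API; the point is the ordering:

```python
# A sketch of the two-layer gate, with made-up thresholds and a
# made-up function. Neither is a real Fyrebox or Softrun API.

COMPREHENSION_PASS = 0.8   # quiz score needed to unlock the rehearsal
PERFORMANCE_PASS = 0.7     # rubric score needed to clear the module

def next_step(quiz_score: float, rehearsal_score: float | None) -> str:
    if quiz_score < COMPREHENSION_PASS:
        return "retake the quiz"             # Level 2 not yet demonstrated
    if rehearsal_score is None:
        return "start the spoken rehearsal"  # knowledge verified, performance untested
    if rehearsal_score < PERFORMANCE_PASS:
        return "repeat the rehearsal"        # knows it, can't yet do it: the gap itself
    return "cleared for real calls"

print(next_step(0.96, None))  # the Monday rep: aced the quiz, never rehearsed
```

Notice that a 96% quiz score alone never reaches "cleared for real calls". The second gate fails in exactly the place the opening anecdote fails.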
Three concrete examples of what this looks like in practice:
SaaS sales onboarding. Step 1 is a Fyrebox quiz covering pricing tiers, the discovery framework, ICP fit signals, and the top five objections. Step 2 is a spoken rehearsal: a 10-minute discovery call against an AI prospect who is mildly skeptical, has a competing tool in mind, and asks at least one off-script question. The rep cannot proceed to shadowing real calls until both layers are cleared.
Customer-support difficult-call training. Step 1 is a quiz on the de-escalation protocol, refund authority, and escalation paths. Step 2 is a rehearsal with an AI caller who is angry, talks over the agent, and threatens to cancel. The supervisor reviews not the policy answers (those are already scored) but the agent's tone, pacing, and recovery after interruption.
Founder pitch prep. Step 1 is an assessment-style quiz the founder builds for themselves using an assessment maker — covering their own metrics, market sizing, and competitive narrative. Step 2 is a rehearsal of the 7-minute pitch in front of an AI panel composed of a skeptical VC, a domain expert, and a generalist who keeps asking "why now". The founder iterates until the pitch holds up under pressure, not just on paper.
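What the performance layer hands back is also easy to picture. The shape below is an assumption for illustration, not a real Softrun schema; it mirrors the support example above, where the policy answers are already scored and only the delivery is being judged:

```python
# What a scored rehearsal might return, sketched as plain data.
# Field names and criteria are illustrative assumptions, not a
# real Softrun schema.

rehearsal_result = {
    "scenario": "angry caller threatens to cancel",
    "rubric": {
        "followed_deescalation_protocol": 0.9,  # knowledge, already checked by the quiz
        "tone_under_interruption": 0.6,         # only observable out loud
        "pacing": 0.7,
        "recovery_after_interruption": 0.5,
    },
    "timestamped_feedback": [
        {"t": "01:42", "note": "Talked over the caller instead of pausing."},
        {"t": "04:10", "note": "Good reset: acknowledged frustration before quoting policy."},
    ],
}

overall = sum(rehearsal_result["rubric"].values()) / len(rehearsal_result["rubric"])
print(f"rubric average: {overall:.2f}")  # the score the performance gate reads
```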
The most underrated benefit of pairing the two layers is what it does to your Kirkpatrick evaluation. Quizzes give you a clean Level 2 signal: did the learner acquire the knowledge? Useful, but limited. Behavior change — Level 3 — is traditionally measured three to six months after training, by waiting for real-world performance data to come in. By that point, several quarters of bad calls and underperforming deals have already happened.
A spoken rehearsal compresses that loop. It is not a substitute for true on-the-job observation, but it is a credible proxy. A scored rehearsal against a realistic audience tells you, within days of training, whether the learner can do the thing, not just describe it. For enablement teams under pressure to show training ROI, that is the difference between hoping the cohort gets better and being able to demonstrate it.
"Level 2 asks what was learned during training. Level 3 asks whether that learning is actually being applied in the workplace."
— Kirkpatrick four-level model
We built two products that sit on either side of this argument. Fyrebox is our answer to the comprehension layer — quizzes, assessments, and lead-capture experiences that verify a learner has the underlying knowledge. Softrun is our answer to the performance layer — spoken rehearsal with an AI audience that responds in real time. They are separate products. Either one is useful on its own. Used in sequence, they form a complete loop from remember to apply to perform under pressure, and they measure both Kirkpatrick Level 2 and a reasonable proxy for Level 3.
The broader point is independent of either tool. If your training program currently ends at the quiz, the quiz is not the problem. The missing step is the one that comes after it. Add a spoken layer — with a human coach, a peer roleplay, a Toastmasters meeting, or an AI audience — and you will see fewer Monday-to-Tuesday surprises.
Use Fyrebox to create quizzes and assessments that verify what your team knows — onboarding checks, product-knowledge tests, policy quizzes, certification flows. Free to start, no credit card required.
Start with Fyrebox