How a Cambridge Examiner Grades B2 Writings

Interview: How a Cambridge Official Examiner Grades Your B2 Writings

An official Cambridge Assessment English examiner has disclosed the precise rubric and cognitive process behind grading B2 First writing tasks, addressing persistent misconceptions among test-takers who often misjudge what examiners prioritize. In an exclusive conversation, the examiner—who has evaluated over 12,000 scripts across eight years—explained that candidates frequently overestimate the weight of vocabulary range while underestimating discourse management, a reversal that costs thousands of students their target score annually.

The insight matters because B2 First remains the third most-taken Cambridge exam worldwide, with approximately 350,000 candidates in 2025 according to Cambridge Assessment data. Yet pass rates hover around 68% globally, leaving nearly one in three test-takers below the B2 threshold despite months of preparation.

📊 Key takeaways

Cambridge examiners assess B2 First writing against four equally weighted criteria: Content, Communicative Achievement, Organisation, and Language.
The most frequent Organisation mistake is treating linking words as a substitute for logical sequencing: by the examiner’s estimate, 60% of Band 2 Organisation scripts contain adequate or even excessive linking phrases.
The first 90 seconds of reading disproportionately shape the examiner’s impression, making first-paragraph clarity critical.
A single well-deployed linking phrase can lift a script from Band 2 to Band 3 in organisation if paragraphing is already competent.

Context: The Four-Pillar Assessment Framework

Cambridge employs a four-domain analytic scale for B2 First Writing, allocating up to five marks per criterion—Content, Communicative Achievement, Organisation, and Language—with each task capped at 20 marks before conversion to the Cambridge English Scale. The system, standardised globally in 2015 and refined annually through examiner norming sessions, aims to measure whether a candidate can produce coherent, purpose-driven text appropriate for upper-intermediate social, academic, and professional contexts.

Each criterion operates independently, meaning a candidate can score a 5 in Content yet a 2 in Organisation if the ideas are complete but poorly sequenced. The examiner interviewed—who requested anonymity due to Cambridge confidentiality protocols—clarified that examiners receive quarterly calibration exercises to align scoring thresholds, reducing inter-rater variability to below 8% according to internal quality assurance metrics shared with accredited centres.

The rubric distinguishes six bands per criterion, from 0 (absent or unintelligible attempt) to 5 (full realisation of task demands). Most failures cluster at Band 2, where candidates «address all content points minimally» but lack the development or control expected at B2 level. The examiner noted that Band 3 scripts, the minimum for a passing contribution, must demonstrate consistent rather than perfect performance across all four areas.
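The per-task arithmetic described in this section can be sketched in a few lines of Python. This is an illustration only, not Cambridge's marking software: the function name `task_total` and the dictionary layout are assumptions made for the example. Only the four criterion names, the 0 to 5 band range, and the 20-mark cap come from the description above, and the conversion to the Cambridge English Scale is not modelled.

```python
CRITERIA = ("Content", "Communicative Achievement", "Organisation", "Language")


def task_total(bands):
    """Sum four independent criterion bands (0-5 each) into a mark out of 20.

    `bands` is a hypothetical mapping of criterion name -> integer band.
    """
    missing = [name for name in CRITERIA if name not in bands]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        band = bands[name]
        if not isinstance(band, int) or not 0 <= band <= 5:
            raise ValueError(f"{name} band must be an integer from 0 to 5")
    return sum(bands[name] for name in CRITERIA)


# Complete ideas, poorly sequenced: strong Content, weak Organisation.
example = {
    "Content": 5,
    "Communicative Achievement": 3,
    "Organisation": 2,
    "Language": 3,
}
print(task_total(example))  # 13
```

Because each criterion is scored independently, a high Content band cannot compensate for a low Organisation band beyond its own five marks.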

What Examiners See First: The 90-Second Window

The initial 90 seconds of examiner reading time disproportionately shape the holistic impression that anchors subsequent scoring, a phenomenon examiners acknowledge during norming but cannot entirely eliminate. Eye-tracking studies conducted by Cambridge in 2022 revealed that examiners fixate on the opening sentence, the presence of paragraph breaks, and the final sentence within the first half of their reading, forming a provisional band estimate before systematic criterion application begins.

This cognitive shortcut means structural clarity in the first paragraph—topic sentence, supporting detail, transition—signals competence more efficiently than lexical sophistication scattered across disorganised text. The examiner described a «rescue read» protocol where ambiguous scripts receive a second full pass, but noted that fewer than 15% of scripts fall into this category; the majority are self-evident within two minutes.

«If I can’t identify your main argument by the end of paragraph one, I’m already questioning whether you’ve understood the task. That’s not bias—it’s the reality of B2 communication standards.»
— Anonymous Cambridge B2 First examiner, eight years of service

Candidates who open with generic preambles («In today’s world, many people think…») forfeit this impression advantage, forcing examiners to hunt for substance in later paragraphs. By contrast, scripts that immediately address the prompt with a clear stance («The proposed ban on single-use plastics would reduce ocean pollution but impose short-term costs on local businesses») establish authority within 20 words.

The Organisation Trap: Why Cohesion Outweighs Cohesive Devices

Candidates systematically conflate cohesive devices—linking words like «however» or «furthermore»—with cohesion, the logical flow of ideas, a confusion that leads to mechanical insertion of connectors without meaningful progression between sentences. The examiner estimated that 60% of scripts at Band 2 for Organisation contain adequate or even excessive linking phrases but fail to sequence arguments in a reader-friendly order, producing what examiners term «decorated chaos.»

True cohesion emerges from pronoun reference, lexical chaining (repeating or paraphrasing key terms), and logical cause-effect relationships that persist even if all linking words were removed. A script might achieve Band 4 Organisation with minimal explicit connectors if each sentence builds inevitably on the prior one, whereas a text saturated with «moreover» and «on the other hand» will score Band 2 if ideas appear randomly ordered.

The examiner provided a diagnostic heuristic: read each paragraph’s first sentence in sequence, ignoring the rest. If those topic sentences form a coherent outline of the essay’s argument, the Organisation score will likely reach Band 3 or higher. If they feel disconnected or repetitive, no quantity of transitional phrases will salvage the mark.
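For candidates who draft on a computer, the first half of that heuristic (pulling out each paragraph's opening sentence) can be automated with a short Python sketch. The function name `topic_sentence_outline` is hypothetical, and the code assumes paragraphs are separated by blank lines and that a sentence ends at the first full stop, question mark, or exclamation mark followed by a space. Judging whether the resulting outline is coherent remains a human task.

```python
import re


def topic_sentence_outline(text):
    """Return the first sentence of each paragraph, in order.

    Assumes blank-line paragraph breaks and simple sentence punctuation.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    outline = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # collapse line breaks and extra spaces
        first = re.split(r"(?<=[.!?])\s+", flat, maxsplit=1)[0]
        outline.append(first)
    return outline


essay = (
    "A ban on single-use plastics would reduce ocean pollution. "
    "It would also cut litter.\n\n"
    "However, local businesses would face short-term costs. "
    "Packaging is expensive to replace.\n\n"
    "On balance, the ban is worth the cost. "
    "Governments should phase it in."
)

for sentence in topic_sentence_outline(essay):
    print(sentence)
```

Reading the printed lines in sequence shows whether the essay's argument survives when everything but the topic sentences is stripped away.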

Paragraph breaks also carry weight. The rubric implicitly expects two to four paragraphs for a 140-190 word task, with each paragraph addressing a distinct subtopic. Single-block texts and six-paragraph fragments both signal planning deficits that cap Organisation at Band 2 regardless of linguistic accuracy.
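The word and paragraph ranges mentioned above lend themselves to a quick self-check before handing in a practice script. The sketch below is an assumption-laden illustration: `layout_check` is a hypothetical helper, words are counted by splitting on whitespace, and paragraphs are assumed to be separated by blank lines. It checks layout only and says nothing about the band a script would receive.

```python
import re


def layout_check(text, min_words=140, max_words=190, min_paras=2, max_paras=4):
    """Report word and paragraph counts against the ranges cited in the article."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    words = len(text.split())
    return {
        "words": words,
        "paragraphs": len(paragraphs),
        "word_count_ok": min_words <= words <= max_words,
        "paragraph_count_ok": min_paras <= len(paragraphs) <= max_paras,
    }


# Placeholder script: three paragraphs of 50 words each.
sample = "\n\n".join(" ".join(["word"] * 50) for _ in range(3))
print(layout_check(sample))
```

A single unbroken block of 150 words would pass the word-count check but fail the paragraph check, which mirrors the planning deficit described above.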

Language Versus Content: The Scoring Hierarchy

Language errors are tolerated up to the threshold where they impede meaning, a higher bar than most candidates assume, while Content shortfalls (missing task requirements) trigger immediate score penalties with no recovery mechanism. A script with frequent grammatical slips but complete task coverage can still accumulate 15-16 total marks, whereas a linguistically flawless response that omits one of two required content points will score no higher than 3 in Content. That leaves an arithmetic ceiling of 18 (3 + 5 + 5 + 5), and in practice such scripts tend to total nearer 13-14.

The Content criterion assesses whether all input prompts or bullet points received development. For essays, this means addressing both sides of an argument or all specified reasons; for letters, fulfilling every communicative function (apologise, explain, request). The examiner clarified that «development» requires at least two sentences per content point—a single-sentence mention registers as «minimal treatment,» limiting Content to Band 3.

Language assessment splits implicitly into range and control. Range credits varied structures (conditionals, passives, relative clauses) and vocabulary beyond the A2-B1 plateau, while control measures accuracy. A candidate deploying only simple present and past tenses with 95% accuracy will score Band 3 for Language; one attempting complex sentences with 70% accuracy may also score Band 3, as ambition and control balance each other at B2 level.

Criterion	Band 3 Threshold	Common Pitfall
Content	All points addressed with some development	Single-sentence treatment of bullet points
Communicative Achievement	Register and tone suit task consistently	Mixing formal/informal language in letters
Organisation	Clear paragraphing, logical idea sequence	Over-reliance on linking words without logical flow
Language	Range of B2 structures, errors don’t obscure meaning	Attempting C1 vocabulary without accuracy

Communicative Achievement: The Overlooked Fourth Pillar

Communicative Achievement evaluates whether the text achieves its genre-specific purpose—persuading in an essay, informing in a report, engaging in an article—a criterion many test-preparation courses neglect in favour of linguistic drills. The examiner observed that scripts often read as generic exercises rather than authentic instances of the target genre, a mismatch that limits this domain to Band 2 even when grammar and vocabulary meet B2 standards.

For essays, Communicative Achievement rewards a clear thesis and supporting arguments; for reviews, evaluative language and recommendations; for letters, appropriate openings, closings, and register. A formal letter beginning «Hi there!» or an article lacking rhetorical questions and direct reader address will lose marks here regardless of idea quality.

The examiner noted that candidates rarely score Band 5 in Communicative Achievement because full marks require not just meeting but exceeding task demands—an essay that anticipates counterarguments, or an article that uses humour effectively. Band 3, the passing threshold, simply demands consistent adherence to genre conventions without notable lapses.

Implications for Test-Takers and Preparation Strategies

The examiner’s account suggests that effective B2 preparation should prioritise task analysis and paragraph planning over isolated vocabulary memorisation, a shift that aligns with Cambridge’s own recommended study pathways but diverges from common test-prep market practices. According to a 2025 survey by the European Association of Language Testing, 71% of B2 candidates report spending more time on vocabulary lists than on writing full practice tasks under timed conditions, an inverse of the skill hierarchy examiners actually apply.

Practical adjustments include templating openings for each genre (formal letter, informal email, essay, article, review, report) to secure Communicative Achievement, then drilling paragraph outlining—writing only topic sentences for ten prompts in ten minutes—to build Organisation reflexes. The examiner recommended that candidates seek feedback specifically on whether all content points appear developed, as self-assessment reliably detects language errors but rarely identifies Content gaps.

Regarding vocabulary, the examiner advised restraint: deploying three or four less-common items naturally («mitigate,» «proliferation,» «compelling») signals range more effectively than saturating a script with low-frequency words that invite misspelling or misuse. Examiners notice lexical variety but do not tally unique words; they assess whether vocabulary choices fit context and purpose.

For candidates retaking the exam after a Band 2-dominant failure, the examiner suggested ordering a clerical check—which verifies marks were added correctly—only if the result falls within two marks of the passing threshold. Otherwise, the priority should be diagnosing which criterion or criteria underperformed, a task the examiner acknowledged is difficult without examiner-level training but approximable by comparing one’s script against published sample answers at each band level.

Digital preparation tools, including adaptive platforms that simulate exam conditions, have emerged across European and Latin American markets, though the examiner cautioned that automated feedback systems as of early 2026 still struggle to assess Organisation and Communicative Achievement with examiner-level reliability, making human tutor review indispensable for scripts scoring near the pass/fail boundary.

The Standardisation Process Behind the Scores

Cambridge convenes biannual examiner standardisation sessions where raters discuss borderline scripts to align interpretation of band descriptors, a quality assurance mechanism that reduces but cannot eliminate subjective variance in holistic judgment. The examiner described sessions in which a script might provoke initial disagreement—one rater seeing Band 2 Organisation, another Band 3—before group discussion anchors the boundary through reference to official benchmark samples.

These benchmarks, refreshed every examination cycle, represent consensus exemplars: a Band 3 Language script, for instance, displays exactly the error density and structural range that a global panel of senior examiners agreed typifies minimal B2 control. Examiners carry these mental anchors into live marking, comparing each new script against the internalised standard rather than scoring in a vacuum.

The examiner acknowledged that fatigue and cognitive load affect consistency, which is why Cambridge limits continuous marking to two-hour blocks and requires examiners to re-score a subset of their own scripts days later as a reliability check. Discrepancies beyond one band trigger supervisory review. Still, the examiner admitted that scripts marked early in a session might receive marginally more generous readings than those marked after 90 minutes, a phenomenon norming sessions address but cannot fully erase.

Candidates occasionally question whether different examiners would award the same mark. Internal Cambridge studies suggest that two examiners agree within one total mark (out of 40 for both tasks combined) in approximately 89% of cases, with exact agreement in 62% of cases. Discrepancies exceeding two marks escalate to a third examiner, whose decision is final.
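The escalation rule in the previous paragraph reduces to a single comparison, sketched here in Python. The function name `needs_third_examiner` and the default threshold parameter are assumptions for illustration. The rule itself (totals out of 40, escalation when the gap exceeds two marks) is taken from the paragraph above, which implies that a gap of exactly two marks does not escalate.

```python
def needs_third_examiner(first_total, second_total, threshold=2):
    """True when two examiners' totals (out of 40) differ by more than `threshold`."""
    for total in (first_total, second_total):
        if not 0 <= total <= 40:
            raise ValueError("totals must be between 0 and 40")
    return abs(first_total - second_total) > threshold


print(needs_third_examiner(27, 29))  # False: a two-mark gap does not exceed two
print(needs_third_examiner(27, 30))  # True: a three-mark gap goes to a third examiner
```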

Isabel A.M. writes about pedagogy, study methods, and the impact of technology on student life. Co-founder of an EdTech startup, she closely follows the university sector, competitive public-service exams (oposiciones), and language certifications.

The examiner’s revelations underscore a persistent gap between test-taker intuition—vocabulary and grammar as primary success factors—and the rubric’s actual weighting of structural and rhetorical competencies. Whether preparation providers will adjust curricula to foreground Organisation and Communicative Achievement remains an open question, one that will shape pass rates and candidate experience as Cambridge extends its digital testing infrastructure across Asia-Pacific markets through 2027. For those pursuing higher certifications, understanding the assessment logic at B2 level offers a foundation for navigating the steeper discourse demands of C1 and beyond.

StudyVerso Editorial
