The History of IQ Tests: From “Mental Age” to Modern Psychometrics


Last updated: October 9, 2025 • Reading time: ~10 min

How did IQ testing evolve from a classroom tool into today’s statistically robust, multi-index assessments? This guide walks the timeline: Binet’s mental age, Wechsler’s deviation IQ, Raven’s nonverbal reasoning, CHC theory, the Flynn Effect, and computer-adaptive testing.

Key takeaways

  • Early tests used mental age; modern tests report a deviation IQ (mean 100, SD ~15).
  • Contemporary batteries measure multiple broad abilities (e.g., verbal, fluid reasoning, working memory, processing speed).
  • The Flynn Effect raised average scores across the 20th century; tests are regularly renormed.
  • Online, computer-adaptive designs use IRT to improve fairness, precision, and efficiency.
  • IQ predicts some outcomes—but it doesn’t capture creativity, personality, or emotional intelligence.

Before IQ: the search for a general ability (late 1800s–early 1900s)

Late-19th-century pioneers tried to quantify individual differences using sensory and reaction-time tasks. The decisive turn came with Charles Spearman (1904), who observed positive correlations across diverse tasks and proposed a general factor, g, underlying cognitive performance.

Binet & Simon (1905): a practical school tool

Alfred Binet and Théodore Simon built the first widely adopted intelligence scale to help schools identify children who might benefit from tailored instruction. Items were arranged by the typical age of mastery, yielding a child’s mental age based on performance.

Stern’s ratio IQ (1912)

William Stern introduced the tidy formula IQ = (mental age / chronological age) × 100. It worked reasonably well for children but poorly for adults, whose “mental age” does not scale linearly with years.
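As a worked example, here is a minimal sketch of Stern’s formula in Python; the function name and the ages are purely illustrative, not drawn from any test manual:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: (mental age / chronological age) x 100."""
    return (mental_age / chronological_age) * 100

# A 10-year-old who performs like a typical 12-year-old:
print(ratio_iq(mental_age=12, chronological_age=10))   # 120.0

# The formula breaks down for adults, because "mental age" plateaus
# while chronological age keeps growing:
print(ratio_iq(mental_age=16, chronological_age=40))   # 40.0 -- not meaningful
```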

Stanford–Binet & mass testing (1916–1920s)

Lewis Terman’s Stanford–Binet adaptation popularized IQ in the U.S. Group tests like Army Alpha and Army Beta scaled administration during WWI, demonstrating efficiency—and the risk of cultural and educational bias when interpretation lags behind usage.

Wechsler & deviation IQ (1939 →)

David Wechsler’s scales (e.g., WAIS for adults, WISC for children) reframed IQ as a deviation score: how far an individual’s performance sits from the age-based mean in standard deviation units (mean 100, typically SD ≈ 15). He also emphasized multi-index profiles—verbal comprehension, working memory, processing speed, and visual–spatial reasoning—moving beyond a single number.
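To make the deviation-score idea concrete, here is a minimal sketch assuming made-up age-group norm values; real batteries use published, age-stratified norm tables rather than a single mean and SD:

```python
def deviation_iq(raw_score: float, age_mean: float, age_sd: float,
                 mean: float = 100.0, sd: float = 15.0) -> float:
    """Express a raw score as its distance from the age-group mean,
    in standard-deviation units, rescaled to mean 100 / SD 15."""
    z = (raw_score - age_mean) / age_sd    # standard-deviation units
    return mean + sd * z

# A raw score one SD above the age-group mean maps to about 115:
print(deviation_iq(raw_score=62, age_mean=50, age_sd=12))   # 115.0
```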

Nonverbal & “culture-fair” tests

Raven’s Progressive Matrices (late 1930s) and Cattell’s Culture Fair Test sought to reduce language and cultural loading, focusing on abstract pattern reasoning (fluid intelligence). Fairness improved, though no test is entirely culture-free; these tools remain valuable for tapping reasoning with minimal verbal demands.

From g to CHC: modern ability models

Raymond Cattell distinguished fluid (Gf) and crystallized (Gc) intelligence. Work by Horn and Carroll converged into the CHC framework (Cattell–Horn–Carroll), a hierarchical model with g at the top, broad abilities in the middle (e.g., Gf, Gc, Gv, Gs, Gwm), and specific skills at the base. Modern batteries explicitly sample these domains.

The Flynn Effect

Throughout the 20th century, average performance on many subtests rose by a few points per decade—a pattern dubbed the Flynn Effect. Proposed drivers include improved nutrition, expanded schooling, cognitively richer environments, and test familiarity. To keep the population mean at 100, publishers renorm tests regularly.
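A toy calculation shows why renorming matters; every number below is invented for illustration only:

```python
# The same raw score, scored against the old norms and against norms
# collected from a population whose average performance has risen.
raw, old_mean, new_mean, norm_sd = 50.0, 50.0, 53.0, 10.0

old_iq = 100 + 15 * (raw - old_mean) / norm_sd   # 100.0 on the old norms
new_iq = 100 + 15 * (raw - new_mean) / norm_sd   #  95.5 after renorming
print(old_iq, new_iq)
```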

Measurement advances: IRT & computer-adaptive testing

Contemporary assessments leverage Item Response Theory (IRT) and computer-adaptive testing (CAT), estimating both item difficulty and a person’s latent ability. This improves precision and efficiency, and supports transparent reporting with confidence intervals and domain-level indices.
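As a rough sketch of the idea, the snippet below uses a two-parameter logistic (2PL) model with invented item parameters; production CAT systems add proper ability estimation, exposure control, and content balancing:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability of a correct response at ability theta,
    for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Adaptive step: given the current ability estimate, administer the
# remaining item that is most informative at that estimate.
item_bank = [{"a": 1.2, "b": -1.0}, {"a": 0.9, "b": 0.2}, {"a": 1.5, "b": 0.8}]
theta_hat = 0.5
next_item = max(item_bank, key=lambda it: item_information(theta_hat, it["a"], it["b"]))
print(next_item)   # the item whose parameters are most informative near theta_hat
```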

What IQ does—and doesn’t—tell us

IQ correlates with academic achievement and some job performance, particularly where complex learning and reasoning matter. Yet IQ is not a measure of creativity, wisdom, personality, motivation, values, or emotional intelligence. Responsible practice emphasizes multi-method assessment, attention to language and cultural context, and cautious, humane interpretation.

Timeline at a glance

  • 1904 — Spearman proposes g
  • 1905 — Binet–Simon scale (mental age)
  • 1912 — Stern’s ratio IQ
  • 1916 — Stanford–Binet (U.S. adaptation)
  • WWI — Army Alpha/Beta group tests
  • 1930s–40s — Raven’s Matrices; Wechsler scales
  • 1960s–1990s — Gf/Gc and CHC frameworks mature
  • 1980s–2000s — Flynn Effect widely documented; regular renorming
  • 2000s → — IRT/CAT mainstream; online delivery; richer index reporting

What to read next

Try the main IQ test now

About this page: Part of the TakeIQtest learning hub. Our goal is to explain IQ with clarity and care—no hype, no shortcuts.

Prefer a conversational walkthrough? Subscribers can chat with our AI to explore concepts from this article, interpret scores, and plan practice.