You answer a series of questions. Someone converts your performance into a number between roughly 70 and 145. That number gets treated as a meaningful measure of your intelligence. But how does the conversion actually happen, and what does it mean that the process works this way?

Understanding IQ scoring mechanics is not just academic. It changes how you interpret your own results, why scores from different tests are not directly comparable, and what the limitations of any given score actually are.

Step 1: The Raw Score

Every IQ test begins with a raw score: the number of questions you answered correctly, sometimes weighted by question difficulty or adjusted for omissions and time bonuses. A raw score of 23 correct out of 30 questions tells you how you performed on that specific test. It does not yet tell you what your IQ is.

Raw scores are not IQ scores and cannot be treated as such. A raw score of 23/30 on one test and 23/30 on a different test could correspond to completely different IQ scores depending on the difficulty of each test's questions and the population each test was normed on. The conversion of raw score to IQ score is what does all the meaningful work.

Step 2: The Norming Process

Before an IQ test can produce meaningful scores it must be normed, that is, administered to a large, representative sample of the population it is designed to measure. For a test designed for the general adult population, the norming sample might consist of several thousand adults carefully selected to represent the demographic composition of the target population in terms of age, sex, educational background, geographic distribution, and socioeconomic status.

The raw score distribution of this norming sample becomes the reference against which all future test-takers are compared. If the median test-taker in the norming sample answered 18 out of 30 questions correctly, then a raw score of 18 corresponds to average performance, which is defined as IQ 100. A raw score of 23, higher than roughly 91% of the norming sample, corresponds to approximately IQ 120.

This is the fundamental mechanism of IQ scoring: your raw performance is compared to the reference population, and your position in that distribution is converted into a standardised score. You are not being measured against an absolute standard; you are being measured relative to other people.
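The norming-based conversion can be sketched in a few lines of Python. This is an illustrative simplification, assuming a mid-rank percentile and a direct inverse-normal mapping; real tests use smoothed, age-stratified norm tables rather than a single raw sample.

```python
from statistics import NormalDist

def raw_to_iq(raw_score, norming_sample):
    """Convert a raw score to a deviation IQ (mean 100, SD 15)
    via its percentile rank in the norming sample."""
    below = sum(1 for s in norming_sample if s < raw_score)
    ties = sum(1 for s in norming_sample if s == raw_score)
    # Mid-rank percentile: count tied scores as half below, half above
    percentile = (below + 0.5 * ties) / len(norming_sample)
    z = NormalDist().inv_cdf(percentile)  # position on the bell curve
    return round(100 + 15 * z)
```

For a toy norming sample such as [10, 12, 14, 16, 18, 20, 22, 24, 26], whose median raw score is 18, `raw_to_iq(18, sample)` returns 100, the defined average, and higher raw scores map to scores above 100.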

Step 3: Standard Deviation and the IQ Scale

The specific number system used to express IQ scores is not arbitrary. Modern IQ tests use what is called a deviation IQ scale, anchored so that the population mean equals 100 and the standard deviation equals 15.

Standard deviation is a statistical measure of spread: how much scores vary around the average in the reference population. Setting the standard deviation at 15 means that approximately 68% of the population scores between 85 and 115 (within one standard deviation of the mean), approximately 95% score between 70 and 130 (within two standard deviations), and approximately 99.7% score between 55 and 145 (within three standard deviations).

This is why IQ scores follow a bell curve distribution. The scoring system is explicitly designed to produce this shape: the conversion from raw scores to IQ scores forces the resulting distribution into a normal bell curve regardless of what the underlying raw score distribution looks like.
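A quick simulation illustrates this forcing. The raw scores below are deliberately skewed (an easy test where most people cluster near the ceiling), yet rank-normalising them onto the 100-mean, 15-SD scale yields a roughly normal distribution. This is a toy sketch, not any real test's procedure.

```python
import bisect
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
# Skewed raw scores: an easy 30-item test with a heavy ceiling effect
raw = [30 - min(30, int(random.expovariate(1 / 5))) for _ in range(10_000)]

ordered = sorted(raw)
n = len(ordered)

def to_iq(r):
    # Mid-rank percentile, then through the inverse normal CDF onto 100/15
    pct = (bisect.bisect_left(ordered, r) + bisect.bisect_right(ordered, r)) / (2 * n)
    return 100 + 15 * NormalDist().inv_cdf(pct)

iqs = [to_iq(r) for r in raw]
print(round(mean(iqs), 1), round(stdev(iqs), 1))
```

Despite the lopsided raw distribution, the converted scores come out centred near 100 with a spread near 15 (ties at the ceiling compress the spread slightly, which is one reason real norm construction is more elaborate than this).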

IQ Score | SD from Mean | Percentile | Classification
---------|--------------|------------|------------------------
145      | +3.0 SD      | 99.9th     | Exceptionally Gifted
130      | +2.0 SD      | 97.7th     | Very Superior / Gifted
120      | +1.33 SD     | 90.9th     | Superior
115      | +1.0 SD      | 84.1st     | High Average
100      | 0 SD         | 50th       | Average
85       | -1.0 SD      | 15.9th     | Low Average
70       | -2.0 SD      | 2.3rd      | Borderline
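The percentile column above follows directly from the 100-mean, 15-SD definition, and can be checked with Python's standard library:

```python
from statistics import NormalDist

iq_scale = NormalDist(mu=100, sigma=15)  # the deviation IQ scale

for score in (145, 130, 120, 115, 100, 85, 70):
    z = (score - 100) / 15
    percentile = iq_scale.cdf(score) * 100
    print(f"IQ {score}: {z:+.2f} SD, {percentile:.1f}th percentile")
```

Running this reproduces the table: IQ 120 sits at the 90.9th percentile, IQ 130 at the 97.7th, and so on.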

Why Scores From Different Tests Are Not Directly Comparable

A crucial implication of the norming-based scoring system is that IQ scores are only meaningful relative to the specific norming sample of the specific test that produced them. A score of 125 on one test and a score of 125 on a different test are not necessarily equivalent; they could represent meaningfully different levels of performance depending on differences in the norming samples, the difficulty distributions of the questions, and the specific cognitive abilities each test emphasises.

This is why psychologists do not simply add IQ scores from different tests together or treat them as interchangeable. When comparing scores across assessments, the specific tests used, the norming populations, and the conditions under which each was administered all matter.

For online tests specifically, this is one of the most important caveats. A score from an online test is only as meaningful as the quality of the norming behind it. Tests that were never properly normed on a representative population produce numbers that look like IQ scores but have no reliable relationship to the established IQ scale, which is the case for the majority of free online tests.

Composite Scores and Index Scores

Most comprehensive IQ assessments produce not one score but several. The full-scale IQ is a composite of performance across multiple cognitive domains: verbal reasoning, visual-spatial ability, fluid reasoning, working memory, and processing speed in the WAIS-V structure. Each domain produces its own index score on the same 100-mean, 15-SD scale before being combined into the composite.

The composite is calculated through a standardised weighted combination of index scores, not a simple average. Different versions of major assessments have used different weighting schemes, which is another reason scores from different assessment editions are not perfectly interchangeable.
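A heavily simplified sketch of why a composite is not a simple average: because index scores are positively correlated, the average of their z-scores has a standard deviation below 1, so the average must be re-standardised. The index values and the 0.80 adjustment below are illustrative placeholders, not figures from any published norm table.

```python
# Hypothetical index scores on the 100-mean, 15-SD scale (invented values)
indices = {"verbal": 125, "visual_spatial": 110, "fluid": 118,
           "working_memory": 108, "processing_speed": 102}

# Illustrative assumption: the SD of the mean z across correlated indices.
# Real tests derive this from the norming data, not a fixed constant.
SD_OF_MEAN_Z = 0.80

zs = [(score - 100) / 15 for score in indices.values()]
mean_z = sum(zs) / len(zs)
# Re-standardise the averaged z so the composite itself has SD 15
composite = round(100 + 15 * mean_z / SD_OF_MEAN_Z)
print(composite)  # 116
```

Note that the composite (116) lands above the plain average of the five index scores (112.6): a consistently above-average profile is rarer than any single above-average index, so the composite stretches further from 100.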

The index scores are often more informative than the composite. The composite aggregates meaningfully different abilities into a single number that may obscure an uneven profile. Two people with identical full-scale IQ scores of 115 may have arrived at that number via completely different domain profiles: one highly verbal with modest spatial ability, the other highly spatial with modest verbal ability. Their composite scores are equivalent but their cognitive profiles are entirely different.

The Flynn Effect: Why Norms Must Be Updated

One of the most important and counterintuitive findings in intelligence research is the Flynn Effect: the well-documented phenomenon of rising raw IQ test performance across generations. Average raw scores on standardised IQ tests increased by approximately 3 points per decade throughout the twentieth century in most developed countries.

This means that a test normed in 1980 would produce higher IQ scores for today's test-takers than a test normed in 2020, if both were used today; not because today's people are smarter, but because the 1980 norms are outdated. This is why major IQ assessments are periodically re-normed: to ensure the scale remains calibrated to the current population rather than a historical one.
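As back-of-envelope arithmetic, the often-cited rate of about 3 points per decade implies the following rough adjustment. This is a simplification for illustration only; the true rate varies by country, era, and cognitive domain, and has reversed in some recent cohorts.

```python
# Assumed average norm inflation, per the oft-cited Flynn Effect estimate
FLYNN_POINTS_PER_DECADE = 3.0

def adjust_for_norm_age(observed_iq, years_since_norming):
    """Estimate what the same performance would score on current norms."""
    return observed_iq - FLYNN_POINTS_PER_DECADE * (years_since_norming / 10)

# A 120 earned against 40-year-old norms corresponds to roughly 108 today
print(adjust_for_norm_age(120, 40))  # 108.0
```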

It also means that IQ scores from different decades are not directly comparable, for the same reason that scores from different tests are not. A score of 120 from a test normed in 1990 and a score of 120 from the same test re-normed in 2020 represent different levels of performance relative to the contemporary population.
