Lynn–Vanhanen Dataset — Key Facts

Countries included in later editions: 185
Reported IQ–GDP correlation (r): 0.82
Countries estimated, not directly tested: ~30%

What Is the Lynn–Vanhanen National IQ Dataset?

In 2002, psychologist Richard Lynn and political scientist Tatu Vanhanen published IQ and the Wealth of Nations, a book that assigned estimated average IQ scores to 81 countries and correlated those estimates with GDP per capita. The project was expanded in IQ and Global Inequality (2006) and again in Intelligence: A Unifying Construct for the Social Sciences (2012), ultimately covering 185 nations. The dataset has since been updated and extended by other researchers, most notably David Becker, whose publicly available National IQ dataset draws on the same tradition while attempting to address some of the original methodological criticisms.

The central claim of the project was striking: national average IQ, more than any other factor including natural resources, geography, or colonial history, predicted a country's economic output. Lynn and Vanhanen reported a correlation of approximately r = 0.82 between national IQ and per-capita GDP, an extraordinarily strong association by social science standards. For context, most correlations in psychology and economics that are considered practically meaningful fall between r = 0.30 and r = 0.50. An r of 0.82 is the kind of figure that reflects either a genuinely powerful relationship or a serious confound baked into the measurement methodology.

The dataset quickly became one of the most cited, and most contested, compilations in modern psychometrics. It has been used to argue for everything from the primacy of human capital in development economics to deeply troubling racialist frameworks that most mainstream scientists explicitly reject. Understanding what the data actually shows, and where its construction fails, is essential for anyone who wants to engage seriously with comparative IQ data across nations.

How Was the Dataset Constructed?

Lynn and Vanhanen did not conduct new testing. Instead, they systematically searched the published literature for existing cognitive test studies conducted in each country, extracted the mean score from each study, converted it to an IQ scale on which the British mean is set at 100, and averaged across the available studies to produce a single national estimate.

The source studies varied enormously in scope and design. Some national estimates rested on large, nationally representative samples using well-validated instruments — such as the standardisation samples for the Raven's Progressive Matrices or the Wechsler scales. Others rested on convenience samples of a few hundred urban schoolchildren using a single test administered once. A small number of national estimates in the original dataset were based on a single study with fewer than 100 participants. The heterogeneity in data quality across countries is, as we will see, one of the most serious problems with any cross-national comparisons drawn from this compilation.
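The pooling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Lynn and Vanhanen's actual procedure: the study records, the UK reference mean and standard deviation, and every function name are hypothetical, and the sample-size-weighted variant is included only for contrast with a plain average.

```python
from dataclasses import dataclass


@dataclass
class Study:
    raw_mean: float  # mean raw score reported by the study
    n: int           # sample size


# Hypothetical UK reference norms for the same test (assumed values).
UK_MEAN = 40.0
UK_SD = 8.0


def to_iq(raw_mean: float) -> float:
    """Rescale a raw mean so the UK reference mean maps to IQ 100 (SD 15)."""
    return 100.0 + 15.0 * (raw_mean - UK_MEAN) / UK_SD


def unweighted_estimate(studies):
    """Plain average of study-level IQs: every study counts equally."""
    iqs = [to_iq(s.raw_mean) for s in studies]
    return sum(iqs) / len(iqs)


def weighted_estimate(studies):
    """Sample-size-weighted average (an alternative shown for contrast)."""
    total_n = sum(s.n for s in studies)
    return sum(to_iq(s.raw_mean) * s.n for s in studies) / total_n


# Made-up example: one large representative sample, one small convenience sample.
studies = [Study(raw_mean=36.0, n=3000), Study(raw_mean=28.0, n=80)]

unweighted = unweighted_estimate(studies)
weighted = weighted_estimate(studies)
print(round(unweighted, 1), round(weighted, 1))
```

With these invented numbers, the 80-person study pulls the plain average down by several points while barely moving the weighted one, which is the sensitivity to data quality that the paragraph above describes.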

Understanding why the scores vary so widely across studies requires an appreciation of what IQ scores actually measure and how sensitive they are to the conditions under which they are collected. A test administered to well-nourished, well-educated, urban children in a language that is their native tongue will produce systematically higher scores than the same test administered to rural children with inconsistent schooling in a language that is their second or third. Both results get entered into a national average as if they are equivalent data points.

For countries where no direct data existed at all — approximately 30% of the original 81-country dataset — Lynn and Vanhanen imputed estimates by averaging the known scores of neighbouring or "culturally similar" countries. This is a methodologically controversial procedure. The accuracy of an imputed estimate depends entirely on the validity of the assumption that neighbouring countries have similar population cognitive distributions — an assumption that is difficult to justify scientifically and that introduces unknown amounts of error into the dataset.
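A rough sketch of neighbour-based imputation makes the dependence on that assumption concrete. The country labels, scores, and neighbour map below are placeholders invented for illustration, and the function is a guess at the general shape of the procedure rather than a reconstruction of the authors' method.

```python
# Hypothetical measured estimates; labels and values are placeholders.
measured = {"A": 90.0, "B": 84.0, "C": 78.0}

# Hypothetical neighbour map for countries with no direct test data.
neighbours = {"D": ["A", "B"], "E": ["B", "C"]}


def impute(measured, neighbours):
    """Fill each missing country with the mean of its measured neighbours."""
    imputed = {}
    for country, adjacent in neighbours.items():
        known = [measured[c] for c in adjacent if c in measured]
        if known:  # skip countries with no measured neighbour at all
            imputed[country] = sum(known) / len(known)
    return imputed


imputed = impute(measured, neighbours)
print(imputed)
```

An imputed value of this kind contains no information about the country it is assigned to. It simply restates the neighbours' scores, so any error in those scores, and any real difference between the countries, passes straight into the estimate.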

How Are National IQ Scores Normed and Scaled?

All IQ scores in the Lynn–Vanhanen compilation were rescaled to a common metric with the United Kingdom set at 100 as the reference population. This norming decision was not arbitrary — the UK had extensive, high-quality standardisation data for the Raven's Progressive Matrices, the most widely used non-verbal cognitive test globally, making it a reasonable anchor for cross-national comparison.

However, the conversion procedure involved several layers of assumption. When a study used a different test (say, a local educational assessment rather than a standardised IQ instrument), Lynn converted it using an assumed equivalence that was not always empirically validated. When a study used an older normative standard, the conversion had to account for the Flynn Effect — the well-documented secular rise in IQ test scores of approximately three IQ points per decade throughout the 20th century — but this correction was not applied consistently across all studies in the dataset.

This matters enormously for fair cross-national comparison. The way IQ tests are scored and normed means that a score of 85 obtained on a test standardised in 1970 is not directly comparable to a score of 85 obtained on a test standardised in 2000. In a dataset that aggregates studies spanning six decades across countries with vastly different rates of cognitive score improvement over time, the failure to apply uniform Flynn Effect corrections introduces systematic bias — one that disproportionately affects developing nations, where score gains over the 20th century were fastest and most dramatic.
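The correction at issue can be illustrated with a short sketch. It assumes a single linear rate of about 0.3 points per year (the three points per decade cited above) applied uniformly; that uniformity is itself a simplification, since the gains were not equally fast everywhere. The function and parameter names are hypothetical.

```python
FLYNN_POINTS_PER_YEAR = 0.3  # about three points per decade (assumed uniform)


def flynn_adjust(observed_iq: float, norm_year: int, test_year: int,
                 rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Remove the score inflation caused by scoring against outdated norms."""
    years_out_of_date = test_year - norm_year
    return observed_iq - rate * years_out_of_date


# A score of 85 on norms from 1970, obtained in 1990, shrinks by about 6 points.
adjusted = flynn_adjust(85.0, norm_year=1970, test_year=1990)
print(round(adjusted, 1))
```

Applying this adjustment to some studies and not to others shifts the affected national estimates by several points, which is comparable in size to many of the between-country differences the dataset reports.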

📌 The Flynn Effect Problem

The Flynn Effect — rising IQ scores across generations — has been steepest in developing nations. Studies from Kenya, Sudan, and rural Brazil show gains of 10–20 IQ points over two to three decades. This means a national IQ estimate derived from a 1975 study and one derived from a 2005 study are not measuring the same thing. The Lynn–Vanhanen dataset mixes studies from across this entire span without systematic correction, making temporal comparisons particularly unreliable for lower-income countries.

The IQ–GDP Correlation: What Does It Actually Show?

The headline finding — an r = 0.82 correlation between national IQ and GDP per capita — is real in the sense that it is reproducible using the Lynn–Vanhanen data. Subsequent researchers who have used the same dataset, applied different statistical controls, or extended the analysis to later years have consistently found a strong positive association between estimated national cognitive ability and economic output. The correlation is not a statistical artefact or a rounding error. Something genuine is being captured.

The interpretive dispute is about what that genuine signal reflects. Lynn and Vanhanen argued that national cognitive ability directly drives economic productivity — that smarter populations generate more innovation, accumulate more human capital, and build more efficient institutions, producing higher GDP. This interpretation treats national IQ as a cause of national wealth.

The alternative — and in the view of most development economists and educational psychologists, better-supported — interpretation is that both national IQ scores and national GDP are jointly caused by a third set of variables: public health infrastructure, nutrition quality, educational access, political stability, and historical investment in human capital. On this account, the IQ–GDP correlation is high not because intelligence drives wealth, but because the same conditions that make people healthier and better educated also make them score higher on cognitive tests, and also drive economic growth. The direction of causation is not from IQ to GDP; it is from shared environmental determinants to both.

The evidence for this shared-determinants interpretation is substantial. Iodine deficiency alone, which is entirely preventable with a few cents' worth of iodised salt per person per year, is estimated to reduce IQ by 10–15 points in affected populations. Lead exposure from petrol and paint, now largely eliminated in high-income countries but historically prevalent in lower-income ones, also depresses cognitive test scores. Chronic malnutrition in early childhood has well-documented negative impacts on cognitive development that can reduce measured IQ by 10 points or more. A country's average cognitive test score in 1990 was, in part, a readout of its public health record over the preceding two decades, and that record is also strongly correlated with its economic output in 1990. This is the confound that the IQ–GDP correlation cannot escape.
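A small simulation shows how a shared cause can generate a correlation of this size on its own. Everything in it is invented for illustration: the variable z stands in for health, nutrition, and schooling conditions, the coefficients and noise levels are arbitrary, and the number of simulated "countries" is far larger than the real world offers. Neither outcome is allowed to influence the other.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # simulated units; illustrative only

# z: shared environmental conditions on an arbitrary standardised scale.
z = [rng.gauss(0, 1) for _ in range(n)]
# Both outcomes depend on z plus independent noise, and not on each other.
iq = [85 + 10 * zi + rng.gauss(0, 4) for zi in z]
log_gdp = [9 + 1.0 * zi + rng.gauss(0, 0.4) for zi in z]

r_iq_gdp = pearson(iq, log_gdp)
r_partial = partial_corr(r_iq_gdp, pearson(iq, z), pearson(log_gdp, z))
print(round(r_iq_gdp, 2), round(r_partial, 2))
```

The raw correlation comes out strong even though the simulated IQ has no effect on the simulated GDP, and it collapses towards zero once z is controlled for. This does not show that the real correlation is entirely confounded, only that a high r by itself cannot distinguish the two accounts.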

The close relationship between cognitive test performance and economic conditions is also one reason why the research on IQ and individual income is more methodologically tractable than national comparisons: at the individual level within a single country, researchers can more effectively control for nutritional and educational background, isolating the contribution of cognitive ability itself rather than the environmental conditions that produced it.

Scientific Criticisms of the Dataset

The Lynn–Vanhanen project attracted sustained methodological criticism from researchers across psychology, economics, anthropology, and education. The criticisms are not ideological objections to the research question — asking whether cognitive test scores predict economic outcomes is a legitimate scientific inquiry. They are substantive concerns about whether the data construction is rigorous enough to support the conclusions drawn from it.

Jelte Wicherts and colleagues published what remains the most thorough methodological audit of the dataset in a 2010 series of papers in the journal Intelligence. They found that Lynn and Vanhanen had systematically selected studies with lower scores for sub-Saharan African countries while excluding higher-scoring studies that were equally available in the published literature. For South Africa, for instance, studies using representative national samples, which produced substantially higher estimates, were excluded in favour of smaller, rural convenience samples that produced lower ones. Wicherts et al. recalculated the sub-Saharan African average using all available data and found the corrected estimate to be approximately 82, compared to Lynn's figure of 69. That is a difference of 13 points, or nearly one standard deviation on the conventional IQ scale (SD = 15).

A difference of that magnitude is not a rounding error. It is the difference between a score that falls just below the global average and one that would imply severe cognitive deficit at the population level. If the data selection was systematically biased — even unconsciously — in one region, there is no statistical guarantee that it was unbiased elsewhere. The entire edifice of the IQ–GDP correlation rests on the assumption that the national averages are measured with comparable accuracy and representativeness across countries, and that assumption is directly contradicted by the Wicherts audit.
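To give a sense of scale, the two regional means can be compared under a deliberately simple model. The sketch assumes scores are normally distributed with a standard deviation of 15 around each mean, which is a textbook convention rather than something established for these populations, and it uses 70 as an illustrative cutoff. The function names are hypothetical.

```python
from statistics import NormalDist

SD = 15.0  # conventional IQ standard deviation (assumed to apply here)


def gap_in_sd(a: float, b: float, sd: float = SD) -> float:
    """Difference between two means expressed in standard-deviation units."""
    return abs(a - b) / sd


def share_below(mean: float, cutoff: float = 70.0, sd: float = SD) -> float:
    """Fraction of a normal population with the given mean scoring under cutoff."""
    return NormalDist(mu=mean, sigma=sd).cdf(cutoff)


gap = gap_in_sd(82, 69)
low_share_69 = share_below(69)
low_share_82 = share_below(82)
print(round(gap, 2), round(low_share_69, 2), round(low_share_82, 2))
```

Under these assumptions the 13-point gap is a little under 0.9 standard deviations. A mean of 69 would place roughly half the population below 70, whereas a mean of 82 would place about one in five there, so the choice of source studies changes the implied picture dramatically.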

A second major critique concerns the distinction between fluid and crystallized cognitive ability. Most of the tests used in the Lynn–Vanhanen studies measured performance on abstract reasoning tasks, particularly Raven's Matrices, which load heavily on fluid intelligence. But performance on Raven's Matrices is particularly sensitive to test familiarity, schooling quality, and prior exposure to abstract visual reasoning tasks. A child who has spent years in a school system that practises matrix-type problems will outperform an equally intelligent child who has not, purely on the basis of test familiarity. Using a fluid intelligence measure as a proxy for overall national cognitive capacity therefore systematically penalises countries with lower formal schooling rates, independent of any actual difference in underlying cognitive ability.

A third line of criticism addresses the imputation of scores for countries without direct data. Barnett and Williams (2004) demonstrated that many of Lynn's imputed estimates — derived by averaging neighbouring countries — were inconsistent with available data that had been published but apparently overlooked. In several cases, direct test data existed for a country but had been ignored in favour of a lower imputed estimate. This pattern reinforces the concern that the dataset construction was not conducted with the systematic, pre-registered protocol that would be required for a compilation of this scientific significance to be trustworthy. These and related concerns are explored in depth in the full critical analysis of Lynn and Vanhanen's methodology.

🔬 The Wicherts Reanalysis

Wicherts, Dolan, and van der Maas (2010) re-examined 57 studies of cognitive ability in sub-Saharan Africa that were available in the published literature. After applying consistent inclusion criteria, they obtained a mean IQ estimate of approximately 82 for the region — substantially higher than Lynn's estimate of 69. Their analysis also found that the available African studies showed score gains over time consistent with the global Flynn Effect, which the original dataset had not adequately incorporated. The paper was published in Intelligence, the same peer-reviewed journal in which much of Lynn's own work appeared.

What the Dataset Gets Right

Acknowledging the serious methodological problems with the Lynn–Vanhanen dataset does not require dismissing the underlying research question or every finding that emerges from it. Several aspects of the project have been independently replicated and are considered broadly valid by researchers who reject the specific dataset construction.

First, the existence of a positive correlation between population-level cognitive test performance and economic development is well-established and not seriously disputed in the literature. Hanushek and Woessmann, using international educational assessment data (PISA, TIMSS) that is far more methodologically rigorous than the Lynn–Vanhanen compilation, have consistently found that cognitive skills as measured by standardised tests predict economic growth rates with striking reliability — more reliably, in their analyses, than years of schooling alone. This finding, independent of Lynn and Vanhanen, suggests that the relationship between measured cognitive performance and economic output is real, even if the Lynn–Vanhanen dataset is too flawed to be used as the primary evidence for it.

Second, the project drew attention to the value of collecting cognitive test data in low- and middle-income countries, where psychometric research had historically been sparse. Whatever the problems with Lynn and Vanhanen's specific compilation, their work stimulated a generation of subsequent researchers to conduct more rigorous, representative cognitive assessments in underrepresented regions — and some of that research has materially improved our understanding of cognitive development and environmental influences on it.

Third, the broad rank ordering of national cognitive test scores in the Lynn–Vanhanen dataset is broadly consistent with the rank ordering that emerges from the far more rigorous PISA and TIMSS international assessment data: East Asian countries typically score highest in formal testing, followed by European and North American countries, with lower scores in many sub-Saharan African countries and in parts of South Asia. This does not validate Lynn and Vanhanen's specific estimates, which carry large error margins, but it does suggest that the dataset captures a real signal about educational and environmental conditions even if it cannot be used to make precise country-by-country comparisons.

The Political Misuse of National IQ Data

Any scientifically honest treatment of the Lynn–Vanhanen dataset must acknowledge that the project has been extensively weaponised by groups with explicitly racialist political agendas — and that Lynn himself held and published views that most mainstream researchers regard as well outside the bounds of scientific consensus on the causes of group cognitive differences.

The distinction that matters scientifically is between the correlation (national average cognitive test scores are associated with economic output) and the causal interpretation (genetic differences in cognitive capacity explain national income differences). The first is a statistical observation, though one with serious measurement problems. The second is an interpretive leap that the existing data — even if it were perfectly collected — cannot support, because the causal pathway from environmental conditions through cognitive development to economic output is sufficiently well-documented to explain the entire observed correlation without invoking genetic differences between national populations.

The scientific consensus, reflected in position statements from the American Psychological Association and in major reviews of the heritability literature, is that observed average cognitive score differences between nations and large ethnic groups are more plausibly explained by environmental factors — nutrition, education, health infrastructure, test familiarity, historical legacy of inequality — than by genetic differences. This is not a political judgment; it is a conclusion from the best available evidence on what drives cognitive test performance at the population level. Researchers interested in how IQ test accuracy is affected by environmental and cultural factors will find a substantial literature documenting exactly these mechanisms at the individual level.

Updated Datasets and Where the Research Has Gone

The Lynn–Vanhanen tradition has been continued and substantially refined by subsequent researchers. David Becker's National IQ dataset, maintained publicly and updated regularly, applies more consistent inclusion criteria, incorporates Flynn Effect corrections, distinguishes between directly measured and imputed estimates, and includes quality ratings for each underlying study. The Becker dataset is generally regarded as methodologically superior to the original Lynn–Vanhanen compilation, though many of the underlying structural challenges — the heterogeneity of source studies, the sensitivity of scores to environmental conditions — remain.

Separately, the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS), run by the OECD and the IEA respectively, provide the most methodologically rigorous comparative cognitive data available. These assessments use carefully translated, culturally reviewed instruments administered under standardised conditions to nationally representative student samples (15-year-olds in PISA; fourth- and eighth-grade students in TIMSS). They are not IQ tests in the traditional sense, but they measure cognitive skills, particularly mathematical and scientific reasoning, that load heavily on the same general factor that IQ tests measure. Their country rankings correlate strongly with the Lynn–Vanhanen estimates while being substantially more reliable for the purposes of policy analysis and economic research.

What to Conclude About the Lynn–Vanhanen Dataset

The Lynn–Vanhanen national IQ dataset is simultaneously one of the most influential and most methodologically problematic compilations in modern psychometrics. The project identified a genuine and important research question — whether population-level cognitive performance relates to economic and social outcomes — and produced data that, despite its flaws, has stimulated two decades of subsequent research, replication attempts, and methodological improvements.

The specific national IQ estimates it generated, however, are not reliable enough for country-by-country comparisons. The combination of unrepresentative samples, inconsistent test instruments, selective data inclusion, and inadequate Flynn Effect correction produces error margins that are large enough to render the precise rank ordering of individual countries meaningless. The broad pattern — wealthier, better-educated, and healthier populations tend to score higher on cognitive assessments — is real, but it is also exactly what you would expect given what we know about the environmental determinants of cognitive test performance. It does not require, and is not evidence for, genetic explanations of cross-national differences.

The most responsible use of the Lynn–Vanhanen tradition is as a starting point for better research rather than as a definitive dataset. The studies it points to are real; the aggregation method is flawed; the causal interpretation its originators promoted is not supported by the data. That is a precise and defensible scientific summary — and it is more useful than either uncritical acceptance or wholesale dismissal.


References

  1. Lynn, R., & Vanhanen, T. (2002). IQ and the Wealth of Nations. Praeger.
  2. Wicherts, J.M., Dolan, C.V., & van der Maas, H.L.J. (2010). A systematic literature review of the average IQ of sub-Saharan Africans. Intelligence, 38(1), 1–20.
  3. Barnett, S.M., & Williams, W. (2004). National intelligence and the Emperor's new clothes. Contemporary Psychology: APA Review of Books, 49(4), 389–396.
  4. Hanushek, E.A., & Woessmann, L. (2008). The role of cognitive skills in economic development. Journal of Economic Literature, 46(3), 607–668.
  5. Flynn, J.R. (2007). What Is Intelligence? Beyond the Flynn Effect. Cambridge University Press.
  6. Becker, D. (2019). National IQs updated: A compilation of national cognitive ability estimates. Ulster Institute for Social Research.