The dark history of IQ tests - Stefan C. Dombrowski

In 1905, psychologists
Alfred Binet and Théodore Simon

designed a test for children
who were struggling in school in France.

Designed to determine which children
required individualized attention,

their method formed
the basis of the IQ test.

Beginning in the late 19th century,

researchers hypothesized that cognitive
abilities like verbal reasoning,

working memory, and visual-spatial skills

reflected an underlying
general intelligence, or g factor.

Simon and Binet designed a battery of
tests to measure each of these abilities

and combine the results
into a single score.

Questions were adjusted
for each age group,

and a child’s score reflected how they
performed relative to others their age.

Dividing someone’s score, expressed as a mental age,
by their actual age and multiplying the result by 100

yielded the intelligence quotient, or IQ.
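As a rough illustration of that arithmetic, the sketch below assumes the test score is expressed as a "mental age", meaning the age group whose typical performance the child matched. The function name `ratio_iq` and the sample ages are hypothetical and chosen only for the example.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply first so whole-number ages give exact results.
    return 100 * mental_age / chronological_age


# A hypothetical 10-year-old who performs like a typical 12-year-old.
print(ratio_iq(12, 10))

# A child performing exactly at their own age level scores 100.
print(ratio_iq(8, 8))
```

A child who performs at their own age level lands on 100 under this formula, which is why 100 came to stand for average performance.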

Today, a score of 100 represents
the average of a sample population,

with 68% of the population
scoring within 15 points of 100.
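The 68% figure is what one would expect if scores follow a bell curve centered on 100 with a standard deviation of 15. The sketch below checks that under this assumption, using Python's standard-library `statistics.NormalDist`. The helper name `share_between` is hypothetical.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed, mean 100, standard deviation 15.
iq_scores = NormalDist(mu=100, sigma=15)


def share_between(low: float, high: float) -> float:
    """Fraction of the modeled population scoring between low and high."""
    return iq_scores.cdf(high) - iq_scores.cdf(low)


# Share of people within 15 points (one standard deviation) of 100.
print(f"{share_between(85, 115):.1%}")
```

The same helper can be used for other bands, for example `share_between(70, 130)` for scores within 30 points of the average.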

Simon and Binet thought the skills
their test assessed

would reflect general intelligence.

But both then and now,

there’s no single agreed-upon
definition of general intelligence.

And that left the door open
for people to use the test

in service of their own preconceived
assumptions about intelligence.

What started as a way to identify
those who needed academic help

was soon being used to sort
people in other ways,

often in service of deeply flawed
ideologies.

One of the first large-scale
implementations

occurred in the United States during WWI,
when the military used an IQ test

to sort recruits and screen
them for officer training.

At that time, many people
believed in eugenics,

the idea that desirable
and undesirable genetic traits

could and should be controlled
in humans through selective breeding.

There were many problems
with this line of thinking,

among them the idea that intelligence
was not only fixed and inherited,

but also linked to a person’s race.

Under the influence of eugenics,

scientists used the results
of the military initiative

to make erroneous claims
that certain racial groups

were intellectually superior to others.

Without taking into account
that many of the recruits tested

were new immigrants to the United States

who lacked formal education
or English language exposure,

they created an erroneous
intelligence hierarchy of ethnic groups.

The intersection of eugenics and IQ
testing influenced not only science,

but policy as well.

In 1924, the state of Virginia
enacted a law

allowing for the forced sterilization
of people with low IQ scores—

a decision the United States
Supreme Court upheld.

In Nazi Germany, the government
authorized the murder of children

based on low IQ.

Following the Holocaust
and the Civil Rights Movement,

the discriminatory uses of IQ tests

were challenged on both
moral and scientific grounds.

Scientists began to gather evidence
of environmental impacts on IQ.

For example, as IQ tests were periodically
recalibrated over the 20th century,

new generations scored consistently
higher on old tests

than each previous generation.

This phenomenon,
known as the Flynn Effect,

happened much too fast to be caused
by inherited evolutionary traits.

Instead, the cause was likely
environmental—

improved education,
better healthcare, and better nutrition.

In the mid-twentieth century,

psychologists also attempted
to use IQ tests

to evaluate things other than
general intelligence,

particularly schizophrenia, depression,
and other psychiatric conditions.

These diagnoses relied in part on
the clinical judgment of the evaluators,

and used a subset of the tests
used to determine IQ—

a practice later research found does
not yield clinically useful information.

Today, IQ tests employ many of the same
design elements and types of questions

as the early tests,

though we have better techniques for
identifying potential bias in the test.

They’re no longer used to diagnose
psychiatric conditions.

But a similarly problematic practice
using subtest scores

is still sometimes applied to diagnose
learning disabilities,

against the advice of many experts.

Psychologists around the world
still use IQ tests

to identify intellectual disability,

and the results can be used
to determine

appropriate educational support,
job training, and assisted living.

IQ test results have been used
to justify horrific policies

and scientifically baseless ideologies.

That doesn’t mean the test itself
is worthless—

in fact, it does a good job of measuring
the reasoning and problem-solving skills

it sets out to.

But that isn’t the same thing
as measuring a person’s potential.

Though there are many complicated
political, historical, scientific,

and cultural issues wrapped up
in IQ testing,

more and more researchers
agree on this point,

and reject the notion that individuals
can be categorized

by a single numerical score.
