In my work researching and
writing about occupations, I encounter a lot of statistics. And this year, with
an election coming ever closer, we are likely to see the results of many
surveys of voters. I want to emphasize that numbers reported from surveys tell
less than half of the story. They are the results of mere tabulation. What
makes the numbers meaningful is the nature of the sample. Or, to put it another
way, you can’t understand what a study
tells you unless you understand the sample it’s based on.
To illustrate this point, I like
to bring up two anecdotes. I think you’ll find them interesting even if (maybe especially if) you’ve never taken a
course in statistics.
The first anecdote is based on
the research that social scientists did when they essentially invented the
science of jury selection. This happened in 1972, when seven radicals were
about to go on trial in Harrisburg,
Pennsylvania, for conspiracy to
raid draft boards and destroy records, among other planned antiwar actions.
This was a time of great political polarization, and Harrisburg was a
politically conservative place. The researchers, working on behalf of
the antiwar activists’ lawyers, wanted to find a way to predict the political
leanings of jurors so the lawyers could seat a jury that would be less
conservative than one chosen at random from the Harrisburg population. The
lawyers would not be able to ask the potential jurors flat-out about their
politics; instead, they needed an indirect way to assess this.
The social scientists surveyed
citizens of that community to identify their political attitudes and then
correlated these attitudes with other facts about the respondents. They discovered
that the surest way to predict a Harrisburger’s politics was to ask how much
education the person had: The more educated the person was, the more conservative that person’s
politics.
The researchers eventually
realized why this was so: Young people in Harrisburg
who became highly educated acquired the occupational mobility to leave the
region if they were not conservative;
therefore, the sample of highly educated people who remained had to be quite
conservative. If the results of their survey surprised you, it’s because you
didn’t stop to think about what the
sample really was: not everyone who ever lived in Harrisburg, but rather those who remained—by choice or because
they were less able to move out.
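To make the mechanism concrete, here is a minimal simulation sketch in Python. Every number in it is invented for illustration and none comes from the 1972 survey: I assume a town where education and politics start out unrelated, and where educated non-conservatives are simply the most likely to move away. The names simulate_town and conservative_share are hypothetical helpers, not anything the researchers used.

```python
import random

def simulate_town(n_people=100_000, seed=0):
    """Toy model of who stays in town. All rates are made-up assumptions."""
    rng = random.Random(seed)
    stayed = []
    for _ in range(n_people):
        educated = rng.random() < 0.3
        # At the start, politics is independent of education.
        conservative = rng.random() < 0.5
        # Assumption: educated non-conservatives are the most likely to leave.
        if educated and not conservative:
            leave_prob = 0.7
        elif educated:
            leave_prob = 0.2
        else:
            leave_prob = 0.1
        if rng.random() >= leave_prob:
            stayed.append((educated, conservative))
    return stayed

def conservative_share(people, educated):
    """Fraction of conservatives among stayers with the given education flag."""
    group = [c for e, c in people if e == educated]
    return sum(group) / len(group)

stayers = simulate_town()
share_educated = conservative_share(stayers, educated=True)
share_other = conservative_share(stayers, educated=False)
print(f"Conservative share among educated stayers: {share_educated:.2f}")
print(f"Conservative share among other stayers:    {share_other:.2f}")
```

Under these assumed rates, the educated people who remain should come out roughly 73 percent conservative (0.40 / 0.55), while everyone else stays near 50 percent. Nobody’s politics changed; only the sample did.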
The second anecdote is from the
Second World War. Allied bomber planes flying missions over Germany were often
shot down by anti-aircraft fire. The military wanted to shield
vulnerable parts of the aircraft with armor, but they wanted to use a minimal
amount of armor to avoid weighing down (and slowing down) the planes. The
problem went to the statistician Abraham Wald, who was working in the United
States with the Statistical Research Group at Columbia University. His task was
to use the damage found on planes after bombing missions to determine where it
was most critical to apply anti-flak armor.
Wald counted bullet holes in the
planes and recommended that armor be applied where there were the fewest bullet holes.
This may seem like a mistake to
you. Maybe you’re thinking that armor is supposed to protect against
anti-aircraft fire, so shouldn’t the military have armored the places that got hit
the most?
Again, consider the sample: Wald was not looking at every bomber that flew
a mission, but rather those that returned
from missions. Bombers that got shot down were removed from the sample. The
bombers that returned and made up the sample were the ones that were hit only in
places that were not critical for staying airborne. The places where the surviving planes were not hit, therefore, were the most likely to be critical and in need
of armor.
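The same logic can be sketched as a small Python simulation. This is a toy model under assumptions I made up, not a reconstruction of Wald’s actual analysis: each plane takes exactly one hit, the hit lands on any section with equal chance, and the section names and loss probabilities in LOSS_PROB_IF_HIT are hypothetical.

```python
import random

# Hypothetical chance that a plane is lost when hit in each section.
LOSS_PROB_IF_HIT = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wingtips": 0.05}

def fly_missions(n_planes=50_000, seed=1):
    """Count hits per section for all planes and for the planes that return."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB_IF_HIT)
    hits_all = dict.fromkeys(sections, 0)
    hits_returned = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Assumption: one hit per plane, equally likely in every section.
        section = sections[int(rng.random() * len(sections))]
        hits_all[section] += 1
        # The plane joins the observed sample only if it survives the hit.
        if rng.random() >= LOSS_PROB_IF_HIT[section]:
            hits_returned[section] += 1
    return hits_all, hits_returned

hits_all, hits_returned = fly_missions()
for s in LOSS_PROB_IF_HIT:
    print(f"{s:9s} hits on all planes: {hits_all[s]:6d}   "
          f"hits on returning planes: {hits_returned[s]:6d}")

# The section with the fewest holes among survivors is the deadliest one.
fewest_holes = min(hits_returned, key=hits_returned.get)
print("Fewest holes on returning planes:", fewest_holes)
```

Across all planes the hits are spread about evenly, but among the planes that come back, the section with the fewest holes is the one where a hit is most often fatal. Someone who sees only the returning planes sees the second column and never the first.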
If you’re wondering why I’m
writing about this subject in a blog about careers, consider this blog entry a
look at how complicated statisticians’ work can be, not so much in terms of the
mathematics, but rather in terms of the concepts that must be understood.
The nonstatistical lesson to
take away from these anecdotes is that you
have to be careful when you make a generalization about a population—for
example, the notion that educated people are more liberal politically (or, to
draw on today’s politics, the notion that people of one religion are a greater
threat to security). Such generalizations may be true in some global sense, but
the particular population you are dealing
with may really be a subset of the global population, either self-selecting or
selected by some external factor you have not considered. The global
generalization may be a poor fit for this subset, or the subset may be a
misleading basis for a global generalization.