Thursday, July 21, 2016

The Importance of the Sample

In my work researching and writing about occupations, I encounter a lot of statistics. And this year, with an election coming ever closer, we are likely to see the results of many surveys of voters. I want to emphasize that numbers reported from surveys tell less than half of the story. They are the results of mere tabulation. What makes the numbers meaningful is the nature of the sample. Or, to put it another way, you can’t understand what a study tells you unless you understand the sample it’s based on.
To illustrate this point, I like to bring up two anecdotes. I think you’ll find them interesting even if (maybe especially if) you’ve never taken a course in statistics.
The first anecdote is based on the research that social scientists did when they essentially invented the science of jury selection. This happened in 1972, when seven radicals were about to go on trial in Harrisburg, Pennsylvania, for conspiracy to raid draft boards and destroy records, among other planned antiwar actions. This was a time of great political polarization, and Harrisburg was a politically conservative place. The researchers, working on behalf of the antiwar activists’ lawyers, wanted to find a way to predict the political leanings of jurors so the lawyers could seat a jury that would be less conservative than one chosen at random from the Harrisburg population. The lawyers would not be able to ask the potential jurors flat-out about their politics; instead, they needed an indirect way to assess this.
The social scientists surveyed citizens of that community to identify their political attitudes and then correlated these attitudes with other facts about the respondents. They discovered that the surest way to predict a Harrisburger’s politics was to ask how much education the person had: The more educated the person was, the more conservative that person’s politics.
The researchers eventually realized why this was so: Young people in Harrisburg who became highly educated acquired the occupational mobility to leave the region if they were not conservative; therefore, the sample of highly educated people who remained had to be quite conservative. If the results of their survey surprised you, it’s because you didn’t stop to think about what the sample really was: not everyone who ever lived in Harrisburg, but rather those who remained—by choice or because they were less able to move out.
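If you like to see this kind of reasoning in numbers, here is a minimal simulation sketch in Python. Every probability in it is invented for illustration and none comes from the Harrisburg study: it assumes that, among everyone raised in a town, educated people are less often conservative (40 percent versus 60 percent), and that educated people who are not conservative usually move away (80 percent) while everyone else rarely does (10 percent).

```python
import random

def simulate_town(n=100_000, seed=0):
    """Return (everyone, stayers) as lists of (educated, conservative) pairs.

    All probabilities below are made-up illustration values.
    """
    rng = random.Random(seed)
    everyone, stayers = [], []
    for _ in range(n):
        educated = rng.random() < 0.3
        # Across everyone raised in town, education goes with being LESS conservative.
        conservative = rng.random() < (0.4 if educated else 0.6)
        # Educated non-conservatives have both the means and the motive to leave.
        p_leave = 0.8 if (educated and not conservative) else 0.1
        person = (educated, conservative)
        everyone.append(person)
        if rng.random() >= p_leave:
            stayers.append(person)
    return everyone, stayers

def conservative_share(people, educated):
    """Fraction of conservatives among people with the given education flag."""
    group = [conservative for e, conservative in people if e == educated]
    return sum(group) / len(group)

everyone, stayers = simulate_town()
for label, people in (("everyone raised in town", everyone),
                      ("those who stayed", stayers)):
    print(label,
          "| educated:", round(conservative_share(people, True), 2),
          "| less educated:", round(conservative_share(people, False), 2))
```

Under these assumptions, the educated people who stay should be about 75 percent conservative (0.4 × 0.9 divided by 0.4 × 0.9 + 0.6 × 0.2), while the less educated stay at about 60 percent. The relationship between education and politics flips even though no individual changed their views; only the sample changed.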
The second anecdote is from the Second World War. Allied bomber planes flying missions over Germany were often shot down by enemy fire. The military wanted to shield vulnerable parts of the aircraft with armor, but with as little armor as possible, to avoid weighing down (and slowing down) the planes. The problem went to the statistician Abraham Wald, who was working with the Statistical Research Group at Columbia University on behalf of the U.S. military: Given the damage recorded on planes after bombing missions, where was it most critical to apply armor?
As the story is usually told, Wald looked at where the planes had been hit and recommended that armor be applied where there were the fewest bullet holes.
This may seem like a mistake to you. Maybe you’re thinking that armor is supposed to protect against enemy fire, so shouldn’t the military have armored the places that got hit the most?
Again, consider the sample: Wald was not looking at every bomber that flew a mission, but rather those that returned from missions. Bombers that got shot down were removed from the sample. The bombers that returned and made up the sample were the ones that were hit only in places that were not critical for staying airborne. The places where the surviving planes were not hit, therefore, were the most likely to be critical and in need of armor.
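The same logic can be sketched as a small Python simulation. This is an illustration under made-up assumptions, not a reconstruction of Wald's actual analysis: the four sections, the three hits per plane, and the `LETHALITY` values (the chance that one hit in a section downs the plane) are all hypothetical, and hits are assumed to land on each section equally often.

```python
import random

# Hypothetical chance that a single hit in each section brings the plane down.
LETHALITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}

def fly_missions(n_planes=20_000, hits_per_plane=3, seed=0):
    """Return (all_hits, survivor_hits): hole counts per section for every
    plane that flew, and for only the planes that made it back."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Each hit lands on a section chosen uniformly at random.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if none of its hits turns out to be fatal.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits

all_hits, survivor_hits = fly_missions()
print("holes on every plane that flew:", all_hits)
print("holes on planes that returned: ", survivor_hits)
print("fewest holes among returners:  ", min(survivor_hits, key=survivor_hits.get))
```

Across every plane that flew, the holes are spread about evenly. Among the planes that returned, the section with the fewest holes is the one where a hit is most often fatal, which is exactly the section an inspector on the ground would be tempted to ignore.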
If you’re wondering why I’m writing about this subject in a blog about careers, consider this blog entry a look at how complicated statisticians’ work can be, not so much in terms of the mathematics, but rather in terms of the concepts that must be understood.
The nonstatistical lesson to take away from these anecdotes is that you have to be careful when you make a generalization about a population—for example, the notion that educated people are more liberal politically (or, to draw on today’s politics, the notion that people of one religion are a greater threat to security). Such generalizations may be true in some global sense, but the particular population you are dealing with may really be a subset of the global population, either self-selecting or selected by some exterior factor you have not considered. The global generalization may be a poor fit for this subset, or the subset may be a misleading basis for a global generalization.


