Understanding Survey Sampling
April 2017
The Strategic Business Insights white paper "Garbage In, Garbage Out: Survey Question Design" covers some fundamental considerations about measurement through survey questions. Writing effective survey questions reduces measurement error in survey efforts and increases researchers' ability to draw conclusions about concepts an organization cares about, such as satisfaction, engagement, or various measures of the self-concept (for example, how frugal a person is). By contrast, the common practice of fielding double-barreled survey questions, which Garbage In, Garbage Out discusses, limits researchers' ability to draw conclusions, because each such question contains more than one measurement concept.
This paper covers several fundamental concerns about representation in survey research that affect the inferences an organization can make about a target population of interest. The failure of survey organizations and pollsters to predict the victory of Donald Trump in the 2016 US presidential election resulted in renewed interest in questions of representation: Who, exactly, responds to surveys, including opinion polls, and why may conclusions based on surveys mislead? Together, measurement (writing questions) and representation (involving respondents in surveys) produce survey statistics, the calculated numbers that give information about a phenomenon of interest. The goal of most survey research firms is to minimize the amount of error survey statistics may contain.
Planning a survey research study from a representational perspective involves at least the following basic concerns:
- The target population. This population comprises the people who are the focus of a study. Some target populations are very narrowly defined. For example, an organization may have an interest in studying the views of history professors, living today, who were born in Germany but emigrated to the United States. In the case of narrowly defined target populations, the organization may be able to reach the entire population, eliminating the need to sample from it. Sending a survey request to a population as a whole is possible if contact information exists in some central source that is relatively complete. For example, professional societies frequently use their mailing lists to survey their entire membership base.
- Sampling. Most of the time, however, a survey project does not allow for the study of an entire population. For example, a municipal authority may be interested in studying the shopping habits of residents in a medium-size town in Northern California. It is unlikely that one could easily ask all residents of a town to participate. Similarly, it would be difficult to survey all adults in the United States about a topic of interest, such as whom they plan to vote for in an upcoming national election. When the target population is too large to include everyone in the actual study, a need exists to draw a sample: a subset of the population that is representative of the population. A sample is representative of the target population if it is generally unbiased. Bias may come in if the sample contains too many members of a certain demographic group, in comparison with the target population. For example, if a sample of the US adult population contained 60% female retirees, one would be unable to draw valid conclusions about the opinions of the US adult population, because the US population does not contain 60% female retirees, and the sample is missing other important demographic groups. Making this comparison requires reliable estimates of the percentages of demographic groups to expect in the target population. In the case of the US adult population, the Census Bureau collects sufficiently accurate information about population parameters to use in survey studies.
- Sampling frames. To achieve relatively unbiased sampling, the selection of people into a sample needs to be based on a sampling frame. Existence of a sampling frame allows for probability sampling. The word probability in this context means that every person in the target population has a known likelihood of selection into the sample. Generally, every member of the target population should have an equal, nonzero chance of being in the sample. An analogy here is to taking a random draw of names from a hat. The names in the hat are the sampling frame, allowing a person to draw a probability sample. Without names in a hat, it would be difficult to know the basic characteristics of the target population, including the number of people in it. Sampling frames are biased when they fail to contain members of the population that the sample should cover, resulting in biased sampling. An example of a biased sampling frame would be to ask only shoppers at a particular store to participate in the study of shopping habits in a medium-size town in Northern California. Shoppers at a particular store constitute a nonprobability sample of the town's shoppers, because one cannot know ahead of time what the probability of someone's shopping at that location is, relative to shopping habits of the entire town. The sample would be biased because many residents do not shop at that location and are therefore excluded a priori from the study. Selection of a particular store means that many residents have a zero chance of being in the study, violating probability sampling. This type of sampling is sometimes called convenience sampling, because a survey researcher finds it convenient to stand at a shopping location to ask questions. Many equivalent convenience samples exist, such as samples from biased sampling frames involving names on mailing lists and catalog subscriptions.
If one were to compute the results of the shopping survey based on shoppers at only one shopping location, one could not conclude what shoppers in the town as a whole do. One could only conclude, under certain conditions, what shoppers at that particular store do or believe (and only if one stopped at random times of the week and day to interview them). A sampling frame resulting in a less biased sample of shoppers would be a complete list of all the town's residents, maybe from city utility records. However, tourists in the town would not be covered, a fact that may or may not be a problem, depending on the goals of the research and the intended target population. For the US adult population, an appropriate sampling frame would be a list of all households, such as the one provided by the United States Postal Service Master Address file. Once one is in possession of this very long list, one is in a position to randomly draw a probability (or random) sample. A side note: Before the proliferation of cell phones, it was possible to draw a probability sample of US adults by random-digit-dialing of landline phones. Coverage of the population through landline phones used to be nearly complete for the United States, until many households disconnected their landlines in favor of cell-phone-only telecommunications. The undercoverage of cell-phone-only households biases random-digit-dialing samples today, resulting in a need to hand-dial cell-phone-only households and increasing costs. However, it is possible to create representative online panels of US households for survey research using the United States Postal Service Master Address file. When researchers randomly identify a household for inclusion in the online panel, a research company can ensure the household is connected to the internet by sending a laptop and internet connection, ensuring coverage of everyone who has been invited to participate.
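The names-in-a-hat analogy can be made concrete in a few lines of code. What follows is a minimal sketch, not a production sampling procedure. It assumes a hypothetical frame of 1,000 resident identifiers; the names `frame`, `draw_simple_random_sample`, and `inclusion_probability`, the sample size, and the seed are all illustrative and do not come from any real records. The sample is drawn without replacement, so every listed resident has the same known selection probability, n/N, while anyone missing from the frame has a probability of zero.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement.

    Every unit listed in the frame has the same chance of selection,
    n / len(frame). Units absent from the frame can never be selected.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # seeded only so the draw is reproducible
    return rng.sample(frame, n)


def inclusion_probability(frame, n):
    """Known selection probability of each frame member under this design."""
    return n / len(frame)


# Hypothetical sampling frame: one identifier per resident on the list.
frame = [f"resident_{i:04d}" for i in range(1000)]

sample = draw_simple_random_sample(frame, n=50, seed=2017)
print(len(sample))                       # number of residents selected
print(inclusion_probability(frame, 50))  # known chance for each listed resident
```

A convenience sample has no counterpart to `inclusion_probability`: for shoppers intercepted at one store, the chance that any given town resident ends up in the study cannot be computed.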
- Response bias. Once an organization has identified a sample of potential respondents, more opportunity for bias arises from the simple fact that not everyone in a random sample will want to participate in the study. An organization may invite everyone in the representative sample to participate, but a systematic tendency will exist for some people to respond to the invitation and for some other people not to respond. Many busy business professionals from the sample are less likely to respond, and retirees are often more likely to respond. Because the original sample was random, however, organizations know from the census statistics how many retirees should have responded to the study invitation.
One can make statistical adjustments after receiving the results that reweight the data so that an overrepresentation of retirees and an underrepresentation of busy business professionals become less problematic in computing the results. In the end, organizations wish the demographic makeup of the respondents to match the characteristics of the general population. One can generally do this kind of statistical adjustment and reweighting of respondents only in probability samples, where every person in the sample had a known probability of participating (from population demographics). It does not work for convenience samples (as in the shopping sample), where the characteristics of all store shoppers are unknown and not statistically relatable to town shoppers as a whole.
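The reweighting described above can be illustrated with a small worked example. This is a sketch of one common adjustment, post-stratification weighting, in which each respondent group receives a weight equal to its population share divided by its share among respondents. The text does not name a specific weighting method, so the choice of technique, the function names, and all numbers below are assumptions made for illustration, not real census or survey figures.

```python
def poststratification_weights(population_shares, respondent_counts):
    """Weight per group = population share / share among respondents."""
    total = sum(respondent_counts.values())
    weights = {}
    for group, pop_share in population_shares.items():
        count = respondent_counts.get(group, 0)
        if count == 0:
            # A group with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = pop_share / (count / total)
    return weights


def weighted_mean(group_means, respondent_counts, weights):
    """Mean of a survey answer after applying the group weights."""
    num = sum(weights[g] * respondent_counts[g] * group_means[g]
              for g in group_means)
    den = sum(weights[g] * respondent_counts[g] for g in group_means)
    return num / den


# Hypothetical figures: retirees are 20% of the population
# but 40% of the people who answered the survey.
population_shares = {"retired": 0.20, "working": 0.80}
respondent_counts = {"retired": 400, "working": 600}
# Hypothetical share of each group agreeing with some statement.
group_means = {"retired": 0.70, "working": 0.40}

weights = poststratification_weights(population_shares, respondent_counts)
unweighted = (sum(respondent_counts[g] * group_means[g] for g in group_means)
              / sum(respondent_counts.values()))
adjusted = weighted_mean(group_means, respondent_counts, weights)

print(round(unweighted, 2), round(adjusted, 2))
```

With these made-up numbers, retirees are weighted down (0.5) and working respondents are weighted up (about 1.33), and the estimated level of agreement moves from 0.52 to 0.46. The adjustment depends on trustworthy population shares, which is why it is tied to probability samples and external benchmarks such as census statistics.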
In summary, representation involves these steps:
Representation = Target Population → Sampling Frame → Sample → Respondents → Statistically Adjusted Responses.
On the basis of the above discussion, one can make educated guesses about the failures of pollsters to predict Donald Trump's victory. Perhaps the original samples drawn by the pollsters were biased in some way, in comparison with the characteristics of registered voters. Additionally, only certain subgroups may have responded to the survey invitation, distorting results. Finally, much discussion about concealment of true preferences on surveys has emerged since the Trump election. Citizens are wary of answering survey calls, because they receive too many, and when they do answer, they may conceal their true beliefs and preferences. Survey research is likely to continue, but it will probably see significant changes, along with a rediscovery of best practices, as researchers work toward valid results.