The Logic of Sampling
Having specified what is to be studied, the next task in the research process is selecting a sample. Sampling affords the social scientist the capability of describing a larger population based on only a selected portion of that population. Two broad types of sampling methods are probability sampling and nonprobability sampling. Some sampling methods are more accurate than others; the early failures and the more recent successes of political pollsters illustrate the refinement of sampling procedures and the increasing reliance on probability methods.
Many research situations preclude the use of probability sampling, especially when it is impossible to create lists of the elements to be sampled. In these situations, researchers employ nonprobability sampling strategies. Reliance on available subjects is common, but it is a very risky method because it limits the generalizability of the results. Purposive (or judgmental) sampling occurs when a researcher selects a sample based on her or his own knowledge of the population, its elements, or the nature of the research study. Snowball sampling is particularly useful when the members of a special population are difficult to locate. It involves collecting data on those members of the target population one is able to find and then asking these respondents to provide information needed to locate other members of that population.
Quota sampling strives to attain representativeness by constructing a matrix representing one or more characteristics of the target population, and then collecting data from persons having the required characteristics of a given cell. But it is often difficult to secure adequate information to create accurate quota frames (the proportions that different cells represent), and bias remains a major problem. As a final nonprobability design, field researchers may sometimes use informants, who are members of the group under study, to provide information about the group itself rather than about themselves. It is best to select informants who adequately represent the group under study, but realize that informants' willingness to respond to outsiders may reflect their marginal status within the group.
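Returning to the quota matrix described above, a minimal sketch of turning cell proportions into recruiting targets might look like the following. The function name, the two characteristics, and all proportions are hypothetical and chosen only for illustration; in practice the difficulty lies in obtaining accurate proportions, not in the arithmetic.

```python
def quota_targets(cell_proportions, sample_size):
    """Convert the share each cell represents into a number of respondents to recruit."""
    return {cell: round(share * sample_size)
            for cell, share in cell_proportions.items()}

# Hypothetical quota frame: (gender, age group) -> share of the target population
quota_frame = {
    ("female", "under 40"): 0.30,
    ("female", "40 and over"): 0.25,
    ("male", "under 40"): 0.25,
    ("male", "40 and over"): 0.20,
}

# Number of people to interview in each cell for a sample of 200
targets = quota_targets(quota_frame, 200)
```

Filling the cells this way does nothing to control who is chosen within each cell, which is where the bias noted above can enter.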
Because humans are very heterogeneous, it is important to select samples that adequately represent the population under study. Probability sampling allows the selection of samples not subject to the researcher's biases. A sample is considered representative of the population from which it is selected if the aggregate characteristics of the sample closely approximate those same aggregate characteristics in the population. A basic principle of probability sampling is that a sample will be representative if every element in the population has an equal chance of being selected; a sample drawn this way is known as an EPSEM (equal probability of selection method) sample. Probability samples rarely represent perfectly the populations from which they are drawn. However, they are typically more representative than other types of samples, and probability theory permits a statistical estimate of the accuracy or representativeness of a sample.
Probability sampling requires familiarity with several components. An element is that unit about which information is collected, and it is typically the unit of analysis of the study. A population is the theoretically specified aggregation of the elements in a study, and a study population is that aggregation of elements from which the sample is actually selected. A sampling unit is that element or set of elements considered for selection in some stage of sampling. Random selection, which means that each element has an equal chance of selection, is central to this process.
The elements in a population may be described in terms of their individual attributes on a particular variable. The summary description of a given variable for a given population is known as a parameter, and the analogous description for a sample is known as a statistic.
The calculation and interpretation of sampling error lie at the root of probability sampling theory and are grounded in the concept of the sampling distribution. When many independent random samples are drawn from the same population, the sample statistics are distributed approximately normally around the true population parameter. Probability theory also provides formulas for estimating how closely the sample statistics are clustered around the true population value. This is accomplished through the use of confidence levels (how confident we are that our sample estimate is within a set number of sampling errors of the population value) and confidence intervals (the range between the upper and lower values for a given level of confidence). Random selection is the key to all this. In random selection, each element has an equal chance of selection independent of any other event in the selection process. We use these methods to eliminate researcher bias in sample selection and also to use probability theory to provide estimates of population parameters and estimates of error.
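As a rough illustration of confidence levels and confidence intervals, the sketch below uses the standard error of a sample proportion, sqrt(p(1 - p) / n). That formula is not given in the summary above and is assumed here, as are the function names and the survey figures, which are hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z=1.96):
    """Range of +/- z standard errors around p; z = 1.96 gives roughly 95% confidence."""
    se = standard_error(p, n)
    return (p - z * se, p + z * se)

# Hypothetical survey: 50% of 400 respondents favor a candidate
se = standard_error(0.5, 400)              # 0.025, i.e. 2.5 percentage points
low, high = confidence_interval(0.5, 400)  # about 0.451 to 0.549
```

Under these assumptions we would be roughly 95 percent confident that the population value lies between about 45 and 55 percent. Quadrupling the sample size halves the standard error.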
However, probability theory operates on a number of assumptions that are seldom met in real-life survey situations. For example, it assumes an infinitely large population, an infinite number of samples, and sampling with replacement. Probability theory is useful only to the extent that the researcher can actually select a probability sample. A sampling frame is the list of elements from which a probability sample is selected. Difficulties in securing an adequate sampling frame constrain the researcher's ability to select a probability sample. Three guidelines are crucial in considering how accurately sampling frames represent populations. First, findings based on a sample can be taken as representative only of the aggregation of elements that compose the sampling frame. Second, sampling frames often do not truly include all the elements that their names might imply. Third, all elements must have equal representation in the sampling frame.
Several types of probability sampling designs exist. Simple random sampling is generally assumed in probability applications. This strategy involves assigning a number to each element and using a table of random numbers to select elements for the sample. But simple random sampling is seldom used because it is not generally feasible and it may not be the most accurate method. Systematic sampling is generally preferred over simple random sampling because of its simplicity. Once the sampling ratio (the proportion of the population to be selected) is determined, the researcher simply selects the elements corresponding to the sampling interval (the distance between elements selected), with the first element selected using a table of random numbers. But researchers must be on guard for periodicity, a cyclical pattern in the list that coincides with the sampling interval.
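The two designs just described can be sketched as follows. This is a minimal illustration and not a prescribed procedure: a seeded pseudo-random generator stands in for the table of random numbers, and the function names and the frame of 1,000 numbered elements are hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """Select n distinct elements so that each has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Select every k-th element after a random start within the first interval."""
    interval = len(frame) // n        # sampling interval: distance between selections
    start = rng.randrange(interval)   # random start, 0 <= start < interval
    return frame[start::interval][:n]

rng = random.Random(42)               # seeded only so the sketch is reproducible
frame = list(range(1, 1001))          # hypothetical frame of 1,000 numbered elements

srs = simple_random_sample(frame, 100, rng)
systematic = systematic_sample(frame, 100, rng)
sampling_ratio = len(systematic) / len(frame)   # 100 / 1000 = 0.1
```

If the frame were ordered so that, for example, every tenth element shared some characteristic, an interval of 10 would select only such elements or none of them. That is the periodicity problem.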
Stratification may be used with both these strategies, and it increases representativeness by first organizing the sampling frame into homogeneous groups reflecting variables that may be related to the variables under study. Such homogeneous groupings reduce sampling error.
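A minimal sketch of stratification, assuming the same sampling ratio is applied within every stratum (proportionate stratification), is shown below. The function name, the strata, and the frame are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, ratio, rng):
    """Group the frame into strata, then draw the same proportion from each stratum."""
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        k = round(len(members) * ratio)        # elements to draw from this stratum
        sample.extend(rng.sample(members, k))  # simple random selection within stratum
    return sample

rng = random.Random(0)  # seeded only so the sketch is reproducible
# Hypothetical frame: 600 first-year students and 400 seniors
frame = [("first-year", i) for i in range(600)] + [("senior", i) for i in range(400)]

sample = stratified_sample(frame, lambda e: e[0], 0.10, rng)
counts = {s: sum(1 for e in sample if e[0] == s) for s in ("first-year", "senior")}
```

Here the sample is guaranteed to contain 60 first-year students and 40 seniors, so class standing contributes no sampling error. An unstratified sample of 100 would match those proportions only on average.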
Multistage cluster sampling is most useful when no master list exists to provide a sampling frame. The researcher employs multiple sampling units such that groups of elements are sampled at different stages; the element is the final stage. But this design produces higher sampling errors because each stage yields additional sampling error. A general guideline is to maximize the number of clusters selected while decreasing the number of elements selected per cluster, because clusters tend to be homogeneous. Cluster designs may employ either simple random or systematic sampling, with or without stratification at any of the stages.
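A two-stage version of this design can be sketched as follows, assuming simple random selection at both stages and no stratification. The function name, block names, and household lists are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample elements within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]   # only chosen clusters need a list of their elements
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample

rng = random.Random(7)  # seeded only so the sketch is reproducible
# Hypothetical city: 50 blocks, each with 20 households
clusters = {
    f"block-{b}": [f"block-{b}/household-{h}" for h in range(20)]
    for b in range(50)
}

sample = two_stage_cluster_sample(clusters, n_clusters=10, n_per_cluster=4, rng=rng)
```

The same 40 households could instead come from 20 blocks with 2 households each, which follows the guideline of favoring more clusters over more elements per cluster.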
When clusters are of varying sizes, it is important to vary the procedure by employing probability proportionate to size (PPS) sampling. In this modification, larger clusters are given a greater chance of being selected, but the same number of elements is still selected from each cluster. This procedure results in an equal chance of selection for each element. Sometimes a researcher may deliberately or inadvertently overrepresent a segment of the population. When this occurs, weighting can be used to correct for the disproportionate sampling.
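The arithmetic behind the equal chance of selection can be checked with a short sketch. It assumes a simplified two-stage design in which a cluster's first-stage chance is exactly proportionate to its size and never exceeds 1. It also assumes the common convention that a case's weight is the inverse of its selection probability, which the summary above does not state. All names and figures are hypothetical.

```python
from fractions import Fraction

def element_selection_probability(cluster_size, population_size,
                                  clusters_selected, elements_per_cluster):
    """Overall chance that one element is selected under a two-stage PPS design."""
    # Stage 1: a cluster's chance of selection is proportionate to its size
    p_cluster = Fraction(clusters_selected * cluster_size, population_size)
    # Stage 2: a fixed number of elements is drawn from each selected cluster
    p_within = Fraction(elements_per_cluster, cluster_size)
    return p_cluster * p_within

def weight(selection_probability):
    """Assumed convention: weight each case by the inverse of its selection probability."""
    return 1 / selection_probability

# Hypothetical city of 10,000 households; 40 blocks selected, 5 households per block
p_large = element_selection_probability(100, 10_000, 40, 5)  # block of 100 households
p_small = element_selection_probability(25, 10_000, 40, 5)   # block of 25 households
```

Both probabilities reduce to 1/50 because the cluster size cancels between the two stages. A household sampled at twice that rate would receive half the weight.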