This page complements our survey design advice page and contains more detailed information about various research methods topics:
- Sampling Strategies (probability and nonprobability sampling)
- Determining Sample Size (factors to consider when choosing a sample size)
- Response Rate Strategies (tips for increasing response rates)
- Survey Sample Characteristics (sample representativeness, nonresponse bias)
- Types of Response Scales (Likert, semantic differential rating, gap analysis)
- Assessing Question Validity
- Assessing Question Reliability
- Additional Resources
Sampling Strategies
While some sampling strategies are more commonly used than others, the right strategy depends on the intent and conditions of your project. To aid comprehension, the following list is sorted into strategies used for quantitative research (probability sampling) and strategies used for qualitative research (nonprobability sampling).
- Probability Sampling: Beyond the investigation of a research hypothesis, quantitative research projects often have an implicit goal of identifying knowledge from a sample that can be applied to an entire population. As a result, the sampling strategies employed are concerned with, and designed to allow for, generalization of the results to a wider population. These strategies rely upon random processes to increase such generalizability. (A brief code sketch following this list illustrates the first three strategies.)
- Simple random sampling: After defining the population, and assigning a number to each individual, a researcher uses a random list of numbers to decide which individuals to include in the sample. Several software packages now have the ability to assist researchers in randomly selecting a sample.
- Systematic sampling: After defining the population, and assigning a number to each individual, a researcher begins at a random starting point and selects individuals based on a predetermined interval (e.g. every fourth individual); similar to simple random sampling, but with an interval of selection (the "system") dictating which individuals enter the sample. Not advisable when the population list is organized in a meaningful and/or purposeful way (e.g. spousal pairs), as the chosen interval may produce a skewed sample.
- Stratified sampling: After defining the population, a researcher divides the population into two or more subpopulations, using one or more characteristics of the population as the basis for stratification. Once the population strata (subpopulation groups) have been created, the researcher then uses simple random sampling or systematic sampling within each stratum to create the sample. Applicable when a population characteristic (e.g. sex, age) is thought to impact the phenomenon being studied. Assuming accurate knowledge of the population, strata allow a researcher to ensure the sample mirrors the population on the basis of the characteristics chosen.
- Cluster sampling: A method for drawing samples when one of two obstacles exists: (1) a good list of a dispersed population does not exist, and/or (2) the cost to reach individuals in a dispersed sample would be very high. Consider surveying college students as an example. After defining the population as best they can, a researcher randomly selects a set of colleges (the clusters) from a list. Within each selected college, the researcher then randomly selects students and surveys them. Now, rather than potentially contacting every college to survey one or two students, a researcher contacts a portion of colleges and surveys a larger, randomly selected sample of students at each.
- Nonprobability Sampling: Many qualitative research projects have an implicit goal of creating a deeper understanding of a critical issue. As a result, the sampling strategies employed are not concerned with, or designed to allow, generalization of the results to a wider population. Instead, these nonprobabilistic strategies focus on the extent to which the sample chosen provides illuminating information on the phenomenon being studied.
- Haphazard, accidental, or convenience sampling: Identified by many names, this strategy involves a researcher haphazardly selecting potential respondents based solely on the convenience of access to them. This strategy can produce ineffective, highly unrepresentative samples and as a result is not recommended unless no other options are feasible. Commonly encountered examples of haphazard sampling include person-on-the-street interviews and television interviews.
- Quota sampling: A slight modification of haphazard sampling, quota sampling requires a researcher to first identify relevant categories of people (e.g. male and female) to sample. The researcher then determines a quota to meet in gathering responses from those categories, and accomplishes this task using haphazard methods.
- Purposive or judgmental sampling: This strategy is used in situations where a researcher believes some respondents may be more knowledgeable than others, and requires an expert to use their judgment in selecting cases with that purpose in mind. The use of judgmental sampling is appropriate in three situations: (1) to select unique respondents who are especially informative; (2) to select members of a difficult-to-reach, specialized population; (3) to identify particular types of respondents for in-depth investigation.
- Snowball sampling: Snowball sampling is a method for effectively identifying and sampling the respondents in a network. Initial research participants recruit additional participants through their social networks. The crucial feature is that each person or unit is connected with another through a direct or indirect linkage.
- Deviant case sampling: Deviant case sampling is similar to purposive sampling in that it is used to seek out respondents that differ from the dominant pattern or that differ from the predominant characteristics of other respondents. This strategy differs from purposive sampling in that the goal is to locate a collection of unusual, different, or peculiar respondents that are not representative of the larger population.
- Sequential sampling: A researcher tries to find as many relevant respondents as possible, with the only limit being the exhaustion of relevant respondents or resources. This is similar to judgmental sampling, in that it is entirely dependent upon the judgment of the researcher. When the researcher deems there is no new information left to be collected, the process is concluded.
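For readers who prefer a concrete illustration, below is a minimal Python sketch of the first three probability strategies (simple random, systematic, and stratified sampling). The population list and its class-year stratum labels are hypothetical, invented purely for illustration.

```python
import random

# Hypothetical population of 875 individuals, each tagged with a class-year
# stratum purely for illustration.
population = [{"id": i, "year": random.choice(["FY", "SO", "JR", "SR"])}
              for i in range(875)]

# Simple random sampling: every individual has an equal chance of selection.
simple_sample = random.sample(population, k=100)

# Systematic sampling: random starting point, then every k-th individual.
interval = len(population) // 100          # every 8th person here
start = random.randrange(interval)
systematic_sample = population[start::interval][:100]

# Stratified sampling: simple random sampling within each stratum,
# proportional to that stratum's share of the population.
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)

stratified_sample = []
for members in strata.values():
    n = round(100 * len(members) / len(population))  # proportional allocation
    stratified_sample.extend(random.sample(members, k=n))
```

Note that rounding in the proportional allocation can make the stratified total differ slightly from the target of 100; in practice the allocation is adjusted to hit the exact target.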
Determining Sample Size
Many times those conducting surveys are told that larger samples are always preferable to smaller ones. This is not necessarily the case. An array of factors, including the degree of accuracy desired, degree of variability in the population, and the analysis the results will be subject to, should be considered when deciding upon a sample size.
- Degree of accuracy desired: Related to the subject of Power Analysis (which is beyond the scope of this site), this method requires the researcher to consider the acceptable margin of error and the confidence interval for their study. The online resource from Raosoft at the bottom of this page uses this principle (a brief sketch of the underlying formula follows this list).
- Degree of variability (homogeneity/heterogeneity) in the population: As the degree of variability in the population increases, so too should the size of the sample increase. The ability of the researcher to take this into account is dependent upon knowledge of the population parameters.
- Number of different variables (subgroups) to be examined: As the number of subgroups to be examined increases, so too should the size of the sample increase. For example, should a researcher wish to examine the differences between ethnicities for a given phenomenon, the sample must be large enough to allow for valid comparisons between each ethnic group.
- Sampling ratio (sample size to population size): Generally speaking, the smaller the population, the larger the sampling ratio needed. For populations under 1,000, a minimum ratio of 30 percent (300 individuals) is advisable to ensure representativeness of the sample. For larger populations, such as a population of 10,000, a comparatively small minimum ratio of 10 percent (1,000) of individuals is required to ensure representativeness of the sample.
- Response rate and oversampling: Are all the individuals in your sample likely to complete your survey? If not, oversampling (sampling more individuals than would otherwise be necessary) may be required. Here the goal is to ensure that a given minimum raw count of respondents is met. While this is straightforward for a project using simple random sampling, this can become increasingly complex as the number of variables to be examined grows, since the researcher must ensure that each critical subgroup attains the required number of responses.
- Statistical analysis desired: Specific minimum sample sizes are required for some statistical procedures, particularly those involving the investigation of multiple variables.
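To make the "degree of accuracy" factor concrete, here is a small sketch of the standard sample-size formula for a proportion with a finite-population correction; this is the same principle behind calculators such as the Raosoft tool listed under Additional Resources. The function name and default values are our own.

```python
import math

def required_sample_size(population_size, margin_of_error=0.05,
                         confidence_z=1.96, proportion=0.5):
    """Estimate the sample size needed for a given margin of error.

    Uses the standard formula for a proportion, n0 = z^2 * p(1-p) / e^2,
    followed by a finite-population correction.
    """
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # The finite-population correction shrinks n0 for small populations.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

# A population of 875 at a 5% margin of error and 95% confidence:
print(required_sample_size(875))   # roughly 268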
Response Rate Strategies
Even the most carefully crafted survey cannot yield reliable results if the response rate is too low. Fortunately, the likelihood that individuals complete a survey can be increased using a variety of methods, several of which are described below.
- Pre-mails: Whenever possible, provide the sample population with advance notice that they should expect to be contacted for the purposes of completing a survey. This is particularly effective if an exact date can be specified in the pre-mail, as it allows those who are interested in participating to actively watch their email. The pre-mail also gives you an opportunity to increase the motivation of the sample population to complete the survey by making a substantive case for its importance and communicating a sense of urgency for completion by detailing the window of time for doing so (see “Usefulness of results” and “Window of time” below).
- Reminders: While disinterest plays a role in low response rates, so too does forgetfulness. Busy schedules and full inboxes can cause a request to complete a survey to become physically and mentally lost. Reminders are an excellent way to combat this problem, particularly when they include a link to the survey, which spares the potential respondent from having to search for the original invitation. However, it is important to be conservative in the use of reminders. Sending more than one per week threatens to transform helpful and friendly reminders into annoying and nagging messages. The salience of reminders can be increased if they are sent from figures the sample population will perceive as being influential (see “Involvement of influential figures” below).
- Window of time: Though windows of time are more of an indirect means of impacting response rates, two opposing approaches have the potential to garner more responses: limiting the window and extending it.
- When first contacting the sample population about a survey, it is important to give them a multitude of reasons to make responding a top priority. To that end, a limited window, which provides the sample population no more than three or four weeks to respond, can be useful. This helps combat the tendency to put off until tomorrow what does not need to be done today.
- Alternatively, the ability to extend the window of a survey after the administration has begun can improve the response rate. Assuming that the original window was simply poorly timed for the sample population, extending the window (and providing a rationale and encouragement for completion via a reminder) can allow those who were interested, but unable to complete the survey in the original window of time, the opportunity to do so.
- Usefulness of results: It is important to provide the sample population with ample reasons to perceive your survey as valuable. By convincing potential respondents that the results will have a real impact on something they care about, the odds that they choose to complete the survey increase. Will your survey improve a service they utilize? Will the results be shown to people of perceived importance? Make sure that your sample population is aware that the results will do more than sit in a spreadsheet.
- Involvement of influential figures: There are times when the individual who asks the sample population to complete the survey provides a crucial element of motivation. Is there a particular individual, or set of individuals, whose opinion is of great importance to your sample population? If so, convincing them to write in support of your survey can be very beneficial. If you are lucky enough to have more than one influential figure to draw upon, assigning each to send a separate reminder increases the effectiveness of those reminders.
- Incentives: Literature on the topic of incentives suggests that, while they are an effective means of increasing response rates, not all incentives are created equal. Incentives that are given unconditionally to all potential respondents may be more effective than those that are conditional upon the completion of the survey or given through a random drawing. Additionally, though incentives may increase response rates, distributing them often requires asking respondents for identifying information, a practice which can decrease the willingness of potential respondents to complete a survey.
- Identity of sample population: The extent to which a sample population is motivated to complete a survey can have an important impact on the proportion that follow through. As a result, consulting any existing literature that addresses the issue of response rates for specific populations is advisable. For example, institutional surveys of current college students often achieve higher response rates than comparable surveys of alumni.
Survey Sample Characteristics
Obtaining an adequate response rate requires being attentive to more than just the percent of the sample who have responded to the survey. Factors such as the response rates to individual questions and representativeness of the survey sample are also important to take into consideration.
- Response rate (quantity): A survey response rate is defined as the number of responses received divided by the total number of individuals invited to complete the survey. Beyond this percentage, it’s also important to look more closely at the number of responses to individual survey questions. If a large number of respondents dropped off after answering just a few questions and very few people actually completed the full survey, it will be difficult to rely on the data you have gathered to make any meaningful decisions. Additionally, when presenting the survey results, it is often helpful to provide additional context by reporting the number of responses to individual questions.
- Representativeness of sample population (quality): Unless a researcher is distributing a survey to an absolutely homogeneous sample population, there is reason to look beyond the simple response rate to determine if the survey sample is adequately representative. Consider the data in the examples below.
Example 1
| Population Statistics | Respondent Statistics |
|---|---|
| Gender Ratio (Man/Woman/Nonbinary): 50/40/10 | Gender Ratio (Man/Woman/Nonbinary): 80/20/0 |
| Population Size: 875 | Sample Size: 656 (75% response rate) |
Example 2
| Population Statistics | Respondent Statistics |
|---|---|
| Gender Ratio (Man/Woman/Nonbinary): 50/40/10 | Gender Ratio (Man/Woman/Nonbinary): 60/35/5 |
| Population Size: 875 | Sample Size: 350 (40% response rate) |
- While the survey in Example 1 gathered more responses than the survey in Example 2, the results collected from Example 1 are unlikely to be as representative because the disparity between the gender demographics of the population and respondent groups is greater. In this case the quality of the information gathered in Example 2 will likely be of more use to the researcher, despite the fact that the quantity of the information (response rate) is lower. (One way to quantify such a disparity is sketched at the end of this section.)
- Nonresponse bias: Rather than focusing upon the identity of those who have completed the survey, nonresponse bias considers the identity of those who have chosen not to respond. This can become a particularly salient issue when the survey broaches sensitive or potentially embarrassing topics. If a specific segment of a sample chooses not to answer questions, it introduces a bias into the results collected. The significance of that bias increases when (a) the nonresponse is widespread among a segment of the sample and (b) the responses of the segment are theoretically likely to have a large impact on the results. In order to gauge the extent of potential nonresponse bias, a researcher must have a good working knowledge of the characteristics of the sample population.
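One simple way to quantify the kind of disparity shown in Example 1 is a chi-square goodness-of-fit test, comparing respondent counts against the counts the population proportions would predict. The sketch below assumes SciPy is available; the rounded counts are taken from Example 1.

```python
from scipy.stats import chisquare

# Example 1 from above: respondent gender counts vs. what the population
# proportions (50/40/10) would predict for 656 respondents.
observed = [525, 131, 0]                       # ~80/20/0 split of 656 respondents
expected = [656 * p for p in (0.50, 0.40, 0.10)]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A tiny p-value signals that respondents differ systematically from the
# population, i.e., the sample is unlikely to be representative.
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
```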
Types of Response Scales
Many closed-ended questions depend upon a scale to communicate a respondent’s preferences, making them well-suited for gradational measurements of attitude (agreement, satisfaction, etc.). Chief among the scales used for such measurement is the Likert scale, though the semantic differential rating scale and gap analysis scale are both common as well.
- Likert scales: Perhaps the most broadly recognizable scale, the Likert scale is often presented on a continuum of 1 to 5 (or occasionally 1 to 7), and is frequently associated with measures of agreement or satisfaction.
| If I were to make the choice all over again, I would choose to attend St. Olaf College. | | | | |
|---|---|---|---|---|
| 1 Strongly disagree | 2 Disagree | 3 Neither agree nor disagree | 4 Agree | 5 Strongly agree |
- Semantic differential rating scales: Similar to Likert scales, semantic differential rating scales are also on a scale of 1 to 5 (or again 1 to 7). They differ from Likert scales in that they provide descriptive words only at the top and bottom of the scale, leaving the middle vacant and open to participant interpretation. The descriptive words utilized can be antonyms (such as “adequate/inadequate”), or other words that could be used to classify the phenomenon being investigated.
| The library services are: | | | | |
|---|---|---|---|---|
| 1 Unhelpful | 2 | 3 | 4 | 5 Helpful |
| 1 Slow | 2 | 3 | 4 | 5 Fast |
| 1 For Faculty | 2 | 3 | 4 | 5 For Students |
- Gap analysis scales: Also known as ecosystem rating scales, gap analysis is intended to measure the attainment of goals (or to examine the gap between goals and outcomes, hence “gap analysis”). Using a numerical scale (often 1 to 5), respondents are asked to provide two ranks in response to a statement: one rank that indicates what they had hoped to accomplish (i.e. their goal) and one rank that represents what they ultimately did accomplish (see example below). An advantage of gap analysis scales is the way in which they allow the researcher to put results into context. By contrasting expectations with results, a researcher can more easily identify a problem. (A brief sketch of the gap computation follows the example.)
| A college education is intended to develop in students various types of knowledge. Look over the following types of knowledge. On a scale of 1 to 5 (1 being the least and 5 being the greatest), please rank the extent to which you expected your knowledge to be increased, and then the extent to which your knowledge was actually increased over the course of your college experience. | | |
|---|---|---|
| | Expected Increase in Knowledge | Actual Increase in Knowledge |
| Discipline-specific knowledge | | |
| Interdisciplinary knowledge | | |
| Vocational knowledge | | |
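As a minimal illustration of the gap computation, the sketch below averages hypothetical (expected, actual) pairs for each knowledge type from the example above; all response values are invented.

```python
# Hypothetical responses to the gap-analysis item above: each tuple is
# (expected increase, actual increase) on the 1-5 scale.
responses = {
    "Discipline-specific knowledge": [(5, 4), (4, 4), (5, 3)],
    "Interdisciplinary knowledge":   [(3, 4), (4, 3), (3, 3)],
    "Vocational knowledge":          [(4, 2), (5, 3), (3, 2)],
}

for item, pairs in responses.items():
    mean_expected = sum(e for e, _ in pairs) / len(pairs)
    mean_actual = sum(a for _, a in pairs) / len(pairs)
    # A negative gap means outcomes fell short of expectations.
    print(f"{item}: gap = {mean_actual - mean_expected:+.2f}")
```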
Assessing Question Validity
The concept of validity is concerned with the extent to which your survey measures what it purports to measure, and is often rephrased as “truthfulness” or “accuracy.” The concept is analogous to using the right instrument to measure a concept, such as using a scale instead of a ruler to measure weight. Determining the accuracy of a question involves examining both the validity of the question phrasing (the degree to which your question truly and accurately reflects the intended focus) and the validity of the responses the question collects (the degree to which the question accurately captures the true thoughts of the respondent). While perfect question validity is impossible to achieve, there are a number of steps that can be taken to assess and improve the validity of a question.
- Face validity: Collecting actionable information often involves asking questions that are commonplace, such as those querying the respondent about their age, gender, or marital status. In such instances, one means of lending validity to a question is to rely on the collective judgment of other researchers. If the consensus in the field is that a specific phrasing or indicator is achieving the desired results, a question can be said to have face validity.
- Content validity: Related to face validity, content validity also relies upon the consensus of others in the field. It differs from face validity in that content validity relies upon an exhaustive investigation of a concept in order to ensure validity. Nardi (2003, 50) uses the example of “the content of a driving test.” Determining the preparedness of a driver is dependent on the whole of the driving test, rather than on any one or two individual indicators. In this way, the driving test is only accurate (or valid) when viewed in its entirety.
- Criterion validity: Criterion validity relies upon the ability to compare the performance of a new indicator to an existing or widely accepted indicator. Whereas face validity encourages the adoption of existing indicators, criterion validity uses existing indicators to determine the validity of a newly developed indicator. Criterion validity can be broken down into two subtypes: concurrent and predictive validity.
- Concurrent validity: If the widely accepted indicator is currently (concurrently) available, and the results of the new indicator can be compared against the existing indicator, then concurrent validity can be established. By assuming that both indicators measure the same phenomenon, concurrent validity allows us a means by which to determine whether or not our new indicator measures what we believe it should (a brief sketch follows this list).
- Predictive validity: If an indicator can be shown to reliably predict a future outcome, it can be said to have predictive validity. This type of validity is restricted to situations where the indicator and the outcome are distinct from each other, while still measuring the same concept. As an example, Neuman (2007, 119) uses the Scholastic Achievement Test (SAT), which purports to measure “the ability of a student to perform in college.” For the SAT to be said to have a high degree of predictive validity, those who score well on the test must also perform well in college. Should the relationship between these two be inconsistent, the SAT would be said to have low predictive validity.
- Construct validity: Similar in some ways to content validity, construct validity relies on the idea that many concepts have numerous ways in which they could be measured. A researcher can choose to utilize several of these indicators, and then combine them into a construct (or index) after the survey is administered. As with content validity, construct validity encourages the use of multiple indicators to increase the accuracy with which a concept is measured.
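For concurrent validity in particular, “comparing performance” usually means correlating scores on the new indicator with scores on the established one. Below is a minimal sketch using hypothetical scores and Python’s statistics.correlation (available in Python 3.10+).

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical scores: a new indicator compared against an established
# instrument administered to the same ten respondents.
new_indicator = [12, 18, 9, 22, 15, 20, 7, 17, 14, 19]
established   = [30, 41, 25, 48, 36, 44, 21, 39, 33, 43]

# A strong positive correlation is evidence of concurrent validity;
# a weak one suggests the new indicator measures something else.
r = correlation(new_indicator, established)
print(f"r = {r:.2f}")
```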
Assessing Question Reliability
Reliability is concerned with the consistency or dependability of a question, and is analogous to the repeated use of a scale. If a scale is reliable, it will report the same weight for the same item measured successively (assuming the weight of the item has not changed). As is the case with validity, perfect reliability can be difficult, if not impossible, to achieve. Even so, increasing the reliability of a question or survey has important implications for our ability to use the results. To that end, there are several methods that can be used to increase the reliability of a question or survey.
- Test-retest reliability: Identical to the process described in the scale analogy above, test-retest reliability relies upon administering the question to a subject multiple times. Assuming no changes in the sample subject, the question/survey/indicator should return consistent results over multiple administrations.
- Parallel form and inter-item reliability: Oftentimes there is more than one way to measure the phenomenon in which we are interested. Both parallel form and inter-item reliability depend upon such duplication to lend support to the consistency of our indicators. With parallel form, the researcher purposefully creates at least two versions of a survey (similar to using two versions of a test in a course). If the same individuals score the same on both versions, the indicators can be said to have parallel form reliability. Alternatively, inter-item reliability retains the duplicate indicators within a single survey instrument. The researcher then examines the results to determine if the two indicators, each phrased differently, are producing similar results. If so, the indicators can be said to have inter-item reliability.
- Split-half reliability: This procedure is meant to be used on indicator constructs (or indexes), where a series of questions are thought to collectively measure a phenomenon. In these cases, a researcher can split the construct in half, and compare the results of the two halves to each other. If the construct halves are in agreement, the construct can be said to have split-half reliability. In addition, a Cronbach’s alpha test can quantify the internal consistency of the construct (see the sketch following this list).
- Interrater reliability: Applicable to open-ended questions, or the results of interviews or focus groups, interrater reliability is concerned with how those charged with reading the results are interpreting them. To increase the consistency of how responses are categorized, this technique relies upon multiple individuals reviewing the same results. By arriving at consensus decisions with regard to response categorization, interrater reliability increases the confidence that can be placed in the coding of results.
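The split-half check and Cronbach’s alpha mentioned above are straightforward to compute. Below is a minimal sketch on a hypothetical four-item construct; the response data are invented, and the alpha function follows the standard formula.

```python
from statistics import variance, correlation  # correlation requires Python 3.10+

def cronbachs_alpha(item_scores):
    """Cronbach's alpha for a list of per-item score lists.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # per-respondent totals
    item_var = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical 4-item construct answered by six respondents (rows = items).
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
    [3, 4, 3, 4, 2, 4],
]
print(f"alpha = {cronbachs_alpha(items):.2f}")

# Split-half reliability: correlate totals from the two halves of the construct.
first_half  = [sum(s) for s in zip(*items[:2])]
second_half = [sum(s) for s in zip(*items[2:])]
print(f"split-half r = {correlation(first_half, second_half):.2f}")
```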
Sources
Nardi, P.M. (2003). Doing survey research: A guide to quantitative methods. Boston, MA: Allyn and Bacon.
Neuman, W. L. (2007). Basics of social research: Qualitative and quantitative approaches (2nd ed.). Boston, MA: Allyn and Bacon.
Nulty, D.D. (2008). The adequacy of response rates to online and paper surveys: What can be done? Assessment & Evaluation in Higher Education, 33(3) 301-314.
Rea, L.M. & Parker, R.A. (2005). Designing and conducting survey research: A comprehensive guide (3rd ed.). San Francisco, CA: Jossey-Bass.
Suskie, L.A. (1996). Questionnaire survey research: What works (2nd ed.). Tallahassee, FL: Association for Institutional Research.
Weathington, B.L., Cunningham, C.J.L., & Pittenger, D.J. (2010). Research methods for the behavioral and social sciences. Hoboken, NJ: Wiley.
Additional Resources
Exploring Reliability and Validity in Academic Assessment (University of Northern Iowa College of Humanities and Fine Arts)
Nonprobability Sampling (Research Methods Knowledge Base)
Nonprobability Sampling (Statistics Canada)
Probability Sampling (Research Methods Knowledge Base)
Probability Sampling (Statistics Canada)
Sample Size Calculator (Raosoft)