Reliability is concerned with the consistency or dependability of a question, and is analogous to the repeated use of a scale. If a scale is reliable, it will report the same weight for the same item measured successively (assuming the weight of the item has not changed).
As is the case with validity, perfect reliability can be difficult, if not impossible, to achieve. Even so, increasing the reliability of a question or questionnaire has important implications for our ability to use the results. As with the example of the scale, a questionnaire that is unable to provide consistent results serves little useful purpose. To that end, there are several methods that can be used to increase the reliability of a question or questionnaire.
Test-retest reliability: Identical to the process described in the scale example above, test-retest reliability relies upon administering the question to a subject multiple times. Assuming no changes in the subjects themselves, the question/questionnaire/indicator should return consistent results over multiple administrations.
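In practice, test-retest reliability is commonly summarized as the correlation between scores from the two administrations. The sketch below is a minimal illustration using hypothetical scores and Python's standard library; values near 1 suggest a stable measure.

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same question (hypothetical scores).
from statistics import mean, stdev

time1 = [4, 3, 5, 2, 4, 5, 3, 4]  # scores at first administration
time2 = [4, 3, 4, 2, 5, 5, 3, 4]  # same respondents, second administration

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"test-retest r = {pearson_r(time1, time2):.2f}")  # near 1 suggests consistency
```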
Parallel form and inter-item reliability: Oftentimes there is more than one way to measure the phenomenon in which we are interested. Both parallel form and inter-item reliability depend upon such duplication to lend support to the consistency of our indicators. With parallel form, the researcher purposefully creates at least two versions of a questionnaire (similar to using two versions of a test in a course). If the same individuals score the same on both versions, the indicators can be said to have parallel form reliability. Alternatively, inter-item reliability retains the duplicate indicators within a single survey instrument. The researcher then examines the results to determine whether the two indicators, each phrased differently, are producing similar results. If so, the indicators can be said to have inter-item reliability.
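Agreement between two forms, or between two differently phrased items, is likewise often checked with a simple correlation. The sketch below uses hypothetical 1-5 ratings on two items intended to tap the same concept; the item wordings in the comments are invented for illustration.

```python
# Inter-item reliability: correlate responses to two differently phrased
# items intended to measure the same concept (hypothetical 1-5 ratings).
import numpy as np

item_a = np.array([5, 4, 2, 5, 3, 1, 4, 2])  # e.g., "I feel safe on campus at night"
item_b = np.array([4, 4, 2, 5, 3, 2, 4, 3])  # e.g., reverse-coded rewording of item_a

r = np.corrcoef(item_a, item_b)[0, 1]
print(f"inter-item r = {r:.2f}")  # a high correlation supports inter-item reliability
```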
Split-half reliability: This procedure is meant to be used on indicator constructs (or indexes), where a series of questions is thought to collectively measure a phenomenon. In these cases, a researcher can split the construct in half and compare the results of the two halves to each other. If the construct halves are in agreement, the construct can be said to have split-half reliability. In addition, Cronbach's alpha can quantify the internal consistency of the construct.
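The sketch below illustrates both calculations on a hypothetical six-item construct: a split-half correlation (with the standard Spearman-Brown correction, which estimates reliability at the full test length) and Cronbach's alpha. The data are invented for illustration.

```python
# Split-half reliability and Cronbach's alpha for a hypothetical 6-item
# construct (rows = respondents, columns = items, 1-5 Likert scores).
import numpy as np

scores = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 4, 4],
])

# Split-half: correlate total scores on the two halves, then apply the
# Spearman-Brown correction to estimate full-length reliability.
half1 = scores[:, ::2].sum(axis=1)   # odd-numbered items
half2 = scores[:, 1::2].sum(axis=1)  # even-numbered items
r_half = np.corrcoef(half1, half2)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: (k / (k-1)) * (1 - sum of item variances / variance of totals).
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)

print(f"split-half (Spearman-Brown) = {split_half:.2f}")
print(f"Cronbach's alpha            = {alpha:.2f}")
```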
Interrater reliability: Applicable to open-ended questions, or to the results of interviews and focus groups, interrater reliability is concerned with how consistently those charged with reading the results interpret them. To increase the consistency with which responses are categorized, this technique relies upon multiple individuals reviewing the same results. By arriving at consensus decisions with regard to response categorization, interrater reliability increases the confidence that can be placed in the coding of results.
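Beyond simple percent agreement, a common way to quantify interrater agreement is Cohen's kappa, which discounts the agreement two coders would reach by chance. The sketch below applies both measures to hypothetical category codes assigned by two raters.

```python
# Interrater reliability: two hypothetical coders assign each open-ended
# response to a category; Cohen's kappa adjusts raw agreement for chance.
from collections import Counter

rater1 = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos"]
rater2 = ["pos", "neg", "neu", "neu", "pos", "neg", "neu", "pos"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: probability both raters pick a category independently.
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```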
Other Online Resources
Test Validity and Reliability (AllPsych Online)
Reliability and Validity (University of South Florida – Florida Center for Instructional Technology)
Exploring Reliability in Academic Assessment (University of Northern Iowa College of Humanities and Fine Arts)
Further Reading
Nardi, P. M. (2003). Doing survey research: A guide to quantitative methods. Boston, MA: Allyn and Bacon.
Neuman, W. L. (2007). Basics of social research: Qualitative and quantitative approaches (2nd ed.). Boston, MA: Allyn and Bacon.
Suskie, L. A. (1996). Questionnaire survey research: What works (2nd ed.). Tallahassee, FL: Association for Institutional Research.