RELIABILITY
Reliability is the degree to which a test is consistent, dependable, and repeatable. In other words, reliability is the "degree to which test scores are free from errors of measurement." A reliability coefficient is expressed as a number from 0 to 1; the higher the coefficient, the more reliable the test is.
Methods of Assessing Reliability:
1. Test/retest: The same test is given twice with a time interval between administrations. The resulting coefficient is a coefficient of stability. Problems with this procedure include the effects of memory, the effects of practice, and change in the examinee over time.
2. Alternate form: Equivalent forms of the same test are given with time between administrations. This measures both equivalence between the two forms and stability over time. Developing truly equivalent forms may be difficult, and changes in the examinee's behavior over time may also lower reliability.
3. Internal consistency or split-half method: A single test is administered once, then split into two halves (for example, odd- versus even-numbered items), and the correlation between the halves is calculated. This measures internal consistency and equivalence. Because each half is only half the length of the full test, the half-test correlation underestimates the full test's reliability; the Spearman-Brown formula corrects for this by predicting the reliability of the test at its full length. More generally, the formula predicts the effect of lengthening or shortening a test on its reliability. The more homogeneous a test is, the greater its reliability.
4. Inter-rater reliability: Used when scores depend on a rater's judgment, as in essay tests, behavioral observation scales, or projective personality assessments. The correlation between the scores assigned by two or more raters is computed. Problems with this method include lack of motivation on the part of a rater, rater bias, and characteristics of the measuring device itself.
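As an illustration of the split-half method described above, the sketch below uses hypothetical item-response data (rows are examinees, columns are items) to correlate odd- and even-item half scores and then apply the Spearman-Brown correction. The data and helper function are illustrative, not from any real test.

```python
# Split-half reliability with Spearman-Brown correction.
# Hypothetical data: each row is one examinee; 1 = item correct, 0 = incorrect.
scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
]

def pearson(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Split each examinee's test into odd- and even-numbered items.
odd_totals = [sum(row[0::2]) for row in scores]
even_totals = [sum(row[1::2]) for row in scores]

# Correlation between the two halves (a half-length reliability estimate).
r_half = pearson(odd_totals, even_totals)

# Spearman-Brown: predicted reliability of the full-length test
# from the correlation between its two halves.
r_full = (2 * r_half) / (1 + r_half)

print(round(r_half, 3), round(r_full, 3))  # → 0.875 0.933
```

Note that the corrected coefficient is higher than the raw half-test correlation, which is exactly the adjustment the split-half method requires.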
Factors That Affect Reliability:
1. Length of test: Generally the longer the test, the larger the reliability coefficient.
2. Range or variability in scores: The larger the range of scores, the higher the reliability.
3. Guessing: When the probability of guessing correct answers increases, the reliability of the test decreases. Therefore true/false tests (where a guess has a 50% chance of being correct) generally have lower reliability than multiple-choice tests, which in turn have lower reliability than free-recall tests, all other factors being equal.
4. Interpretation of reliability coefficient: A reliability coefficient of .84 indicates that 84% of the variability in the test scores is due to true score differences among examinees, while the other 16% is due to measurement error. A reliability coefficient of .80 or above is usually considered acceptable.
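The points above can be made concrete with a short sketch: the general Spearman-Brown formula predicts how changing a test's length changes its reliability (factor 1), and a coefficient such as .84 partitions observed-score variance into true-score and error components. The starting reliability of .60 is an arbitrary illustrative value.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test's
# length is changed by a factor n (n > 1 lengthens it, n < 1 shortens it).
def spearman_brown(r, n):
    return (n * r) / (1 + (n - 1) * r)

r = 0.60                                 # reliability of the original test
print(round(spearman_brown(r, 2), 3))    # doubled in length → 0.75
print(round(spearman_brown(r, 0.5), 3))  # halved in length → 0.429

# Interpreting a coefficient of .84: the coefficient itself is the share
# of observed-score variance due to true-score differences among
# examinees; the remainder is measurement error.
r = 0.84
true_share = r
error_share = 1 - r
print(true_share, round(error_share, 2))  # → 0.84 0.16
```

The lengthened test gains reliability and the shortened one loses it, consistent with factor 1 above.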