Psychometric properties
What is the reliability/validity of the TOD?
The answer to this question is complicated and requires some context. Evidence of reliability and validity accumulates over time, across multiple studies and methodologies; it cannot be captured by a single score. Experts do, however, offer recommendations for interpreting the reliability and validity estimates that statistical equations produce.
Reliability estimates are assumed to reflect the proportion of systematic variance (vs. error variance) in test scores and can be calculated using statistical equations. Two of the most common reliability formulae are Cronbach’s Alpha and Spearman-Brown. Both yield coefficients that are read on the same scale as a correlation coefficient, a statistic designed to estimate the magnitude of the linear relationship between two variables, perhaps two versions of the same test. The most common correlational statistics assess relationships between either two interval-level variables (i.e., the Pearson Product Moment Correlation Coefficient) or two ordinal variables (i.e., Spearman’s Rho). Authors of tests and measurement texts (e.g., Sattler, 2018) provide criteria for interpreting these reliability coefficients (after Murphy & Davidshofer, 2005).
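To make these formulae concrete, here is a minimal Python sketch of a split-half correlation, the Spearman-Brown correction, and Cronbach’s Alpha. The function names and the item scores are hypothetical and chosen only for illustration; they are not TOD data, and this is not the procedure described in the TOD Manual.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, factor=2):
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    # Total score for each examinee across all items.
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))


# Hypothetical scores for five examinees on four items (not TOD data).
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
    [3, 4, 2, 5, 2],
]
# Split the test into two halves: items 1 and 3 vs. items 2 and 4.
odd_half = [a + c for a, c in zip(items[0], items[2])]
even_half = [b + d for b, d in zip(items[1], items[3])]

r_half = pearson_r(odd_half, even_half)
print(f"split-half r = {r_half:.2f}")
print(f"Spearman-Brown corrected = {spearman_brown(r_half):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

The Spearman-Brown step is needed because a split-half correlation describes a test only half as long as the full instrument; the correction projects that value to full length.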
Values of .90 and above are considered high/excellent; .80 to .89, moderately high or good; .70 to .79, moderate or fair; .60 to .69, low or poor; and .00 to .59, very low. Reliability coefficients set the limits on validity; for example, the square root of the reliability coefficient of a test defines the maximum predictive validity of that instrument. A test with a reliability of .81, for instance, can have a predictive validity of no more than .90.
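The interpretive bands and the square-root ceiling can be sketched as a small Python helper. The function names are hypothetical, and the sketch assumes that a coefficient of exactly .90 falls in the top band and that values between the printed bands (e.g., .895) fall in the lower one.

```python
import math


def reliability_label(r):
    """Map a reliability coefficient to the descriptive bands given in the text."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if r >= 0.90:  # assumption: .90 itself counts as high/excellent
        return "high/excellent"
    if r >= 0.80:
        return "moderately high/good"
    if r >= 0.70:
        return "moderate/fair"
    if r >= 0.60:
        return "low/poor"
    return "very low"


def max_predictive_validity(reliability):
    """Upper bound on predictive validity: the square root of the reliability."""
    return math.sqrt(reliability)


# Hypothetical coefficients, not values from the TOD Manual.
for r in (0.93, 0.85, 0.72, 0.64, 0.40):
    ceiling = max_predictive_validity(r)
    print(f"r = {r:.2f}: {reliability_label(r)}, validity ceiling = {ceiling:.2f}")
```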
Validity is traditionally defined as the extent to which a test does what it was designed to do and, like reliability, is typically operationalised using correlation coefficients. The concept of validity is more complicated than that of reliability, because tests are valid only for particular purposes. For example, some validity coefficients address the extent to which a test assesses the construct it was created to assess.
Thus, construct validity is determined in part by the strength of the correlation between the test in question, say a newly developed test, and an established instrument that measures similar skills. In addition, validity data can inform prediction, i.e., how well a test predicts some criterion of interest. As an example, intelligence tests are often used to predict academic achievement. Across many studies reported in test manuals and in the general literature, these validity coefficients typically range from about .40 to .70. Because the proportion of variance accounted for is the square of the coefficient, intelligence accounts for about 16% to 49% of the variance in achievement.
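The arithmetic behind those percentages is simply the squared coefficient. A minimal Python sketch, using a hypothetical helper name:

```python
def variance_explained_pct(r):
    """Percentage of criterion variance accounted for by a validity coefficient r."""
    return round(r ** 2 * 100, 1)


# The range of validity coefficients quoted in the text.
for r in (0.40, 0.70):
    print(f"r = {r:.2f} -> {variance_explained_pct(r)}% of variance")
```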
Reliabilities for the TOD tests and composites are typically good (i.e., greater than .80), as reported in the TOD Manual. Validity estimates are similarly good. For example, the TOD-C Dyslexia Diagnostic Index is a strong predictor of the probability of dyslexia.
In summary, data generally support the reliability of the TOD indexes, composites, and tests and the validity of these scores for their intended purposes. Consequently, examiners can have confidence in the TOD scores.