Alpha factor analysis is a factor-extraction method that maximizes the reliability (coefficient alpha) of the extracted factors, treating the observed variables as a sample from a larger universe of variables so that the factors generalize beyond the specific items measured.

Concurrent validity is a form of criterion validity. Test “A”, for example, demonstrates concurrent validity if it correlates with Test “B” when both are administered at the same point in time.

Construct validity means that a test designed to measure a particular construct (e.g., intelligence) is actually measuring that construct.

Convergent validity is a sub-type of construct validity. Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.

Cronbach’s alpha is a measure of internal consistency, or how closely related a set of items are as a group. It is considered to be a measure of scale reliability.
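Cronbach’s alpha is computed from the number of items, the individual item variances, and the variance of the total score: alpha = k/(k−1) × (1 − Σ item variances / total-score variance). A minimal pure-Python sketch (the function name and data are illustrative):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists
    (one list per item, scores aligned by respondent)."""
    k = len(items)                                      # number of items
    item_vars = sum(variance(item) for item in items)   # sum of item variances
    totals = [sum(scores) for scores in zip(*items)]    # total score per respondent
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Two perfectly consistent items yield alpha = 1.0
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```

Values closer to 1.0 indicate higher internal consistency; inconsistent items drive alpha down.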

Criterion validity is the extent to which operationalizing a construct, such as a test, relates to, or predicts, a theoretical representation of the construct (i.e., a criterion).

Discriminant validity is a sub-type of construct validity. Discriminant validity shows that two measures that are not supposed to be related are, in fact, unrelated.

An eigenvalue is a measure of how much of the variance of the observed variables is explained by a factor. Any factor with an eigenvalue of 1.0 or greater explains more variance than a single observed variable, which is why such factors are commonly retained.
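For a 2×2 correlation matrix the eigenvalues can be obtained directly from the characteristic equation, which makes the retain-factors-with-eigenvalue-≥-1 rule (often called the Kaiser criterion) easy to illustrate. The correlation value below is hypothetical:

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix via the characteristic equation
    lambda^2 - trace*lambda + det = 0."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# Correlation matrix for two variables correlated at r = 0.6
evals = eigenvalues_2x2([[1.0, 0.6], [0.6, 1.0]])
print(evals)  # approximately (1.6, 0.4)
retained = [e for e in evals if e >= 1.0]  # keep only factors explaining
                                           # more variance than one variable
```

Here the first factor (eigenvalue ≈ 1.6) would be retained and the second (≈ 0.4) dropped.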

Face validity is the least sophisticated measure of validity. It simply answers the question as to whether a test appears, at face value, to measure what it claims to. Tests in which the purpose is clear, even to naïve respondents, are said to have high face validity.

Factor Analysis is a statistical technique that reduces a large number of variables to a smaller number of factors. Factor Analysis extracts the maximum common variance from all variables and puts it into a common score.

Factor loading is the correlation coefficient between a variable and a factor. The squared loading gives the proportion of the variable’s variance explained by that factor. In general, a loading of 0.7 or higher (roughly half the variable’s variance) indicates that the factor extracts sufficient variance from that variable.

Inter-Rater Reliability measures how similar the data collected by different raters are. If, for instance, raters significantly differ in their observations, then either the measurement instrument or the methodology is flawed and needs to be refined.
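One common index of inter-rater reliability for categorical judgments is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A small sketch with made-up ratings:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical judgments on the same items."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

rater1 = ["y", "n", "y", "y", "n", "y"]   # hypothetical ratings
rater2 = ["y", "n", "n", "y", "n", "y"]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.667
```

Kappa of 1.0 means perfect agreement; 0 means agreement no better than chance.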

Intraclass correlation measures the reliability of ratings or measurements for clusters of data, i.e., data that has been collected as groups or sorted into groups.

Logistic regression is used to explain the relationship between a test’s binary score (e.g., yes/no, good/bad) and one or more independent variables.
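Once a logistic model has been fitted, the predicted probability of the binary outcome follows the logistic function p = 1 / (1 + e^−(b0 + b1·x)). The coefficients below are hypothetical, chosen only to illustrate the shape of the prediction:

```python
import math

def predict_prob(x, b0, b1):
    """Predicted probability from a fitted logistic model:
    p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical fitted coefficients (e.g., exam score -> probability of passing)
print(predict_prob(40, -4.0, 0.1))  # 0.5 at the decision boundary
```

At b0 + b1·x = 0 the model predicts exactly 0.5; larger linear scores push the probability toward 1, smaller ones toward 0.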

The multitrait-multimethod matrix is an approach to examining construct validity that organizes convergent and discriminant validity evidence into a single table, allowing systematic comparison of how a measure relates to other measures of the same and of different constructs.

Negative predictive value is the probability that subjects with a negative screening test truly don’t have the disease.

Norm-referenced data compares and ranks test takers in relation to one another. To that end, norm-referenced scores are generally reported as a percentage or percentile ranking.

Predictive validity is a type of criterion validity. Test “A”, for example, is a valid predictor if it can forecast the outcome of Test “B”, which is utilized at some point later in time.

Positive predictive value is the probability that subjects with a positive screening test truly have the disease.
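Both predictive values fall directly out of a 2×2 screening table of true/false positives and negatives. The counts below are made up for illustration:

```python
def predictive_values(tp, fp, tn, fn):
    """Positive and negative predictive value from a 2x2 screening table."""
    ppv = tp / (tp + fp)   # P(disease | positive test)
    npv = tn / (tn + fn)   # P(no disease | negative test)
    return ppv, npv

# Hypothetical screen: 90 true positives, 10 false positives,
# 80 true negatives, 20 false negatives
print(predictive_values(90, 10, 80, 20))  # (0.9, 0.8)
```

Unlike sensitivity and specificity, both predictive values depend on how common the disease is in the screened population.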

A p-value is a measure of the probability that an observed difference could have occurred just by random chance. The lower the p-value, the greater the statistical significance of the observed difference.
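A concrete way to see a p-value: the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. For instance, the chance of 8 or more heads in 10 flips of a fair coin (the coin-flip setup here is just an illustration):

```python
from math import comb

# One-sided p-value: probability of observing 8 or more heads in 10 flips
# under the null hypothesis that the coin is fair (P(heads) = 0.5)
p_value = sum(comb(10, k) for k in range(8, 11)) / 2 ** 10
print(p_value)  # 0.0546875
```

At the conventional 0.05 cutoff, this result would (just barely) not be considered statistically significant.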

A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its decision threshold is varied. It was developed during World War II to differentiate signal from noise in radar detection, hence its name.
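Each point on an ROC curve is the (false-positive rate, true-positive rate) pair produced by one decision threshold; sweeping the threshold across all scores traces the whole curve. A sketch of a single point, with hypothetical labels and scores:

```python
def roc_point(labels, scores, threshold):
    """True-positive and false-positive rates at one decision threshold."""
    tp = sum(l == 1 and s >= threshold for l, s in zip(labels, scores))
    fp = sum(l == 0 and s >= threshold for l, s in zip(labels, scores))
    pos = sum(l == 1 for l in labels)   # actual positives
    neg = sum(l == 0 for l in labels)   # actual negatives
    return tp / pos, fp / neg           # (TPR, FPR)

labels = [1, 1, 0, 0]                  # hypothetical true classes
scores = [0.9, 0.4, 0.6, 0.2]          # hypothetical classifier scores
print(roc_point(labels, scores, 0.5))  # (0.5, 0.5)
```

Lowering the threshold catches more true positives but admits more false positives; a good classifier keeps TPR high while FPR stays low.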

Reliability refers to the consistency of measurement across items, time, raters, observers, or some other dimension that could add variability to scores.

Retest reliability (also known as test-retest reliability) measures test consistency, or the reliability of a test measured over time. In other words, the same test is given twice to the same people at different times to see whether the scores are the same.

The split-half method assesses the internal consistency, or the extent to which all parts of the test contribute equally to what is being measured, by comparing the results of one half of a test with the results from the other half.
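Because each half is only half as long as the full test, the half-to-half correlation is usually stepped up with the Spearman-Brown correction, r_full = 2r / (1 + r), to estimate full-test reliability. A pure-Python sketch with hypothetical half-test scores:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def split_half_reliability(half1, half2):
    """Correlate the two half-test scores, then apply the
    Spearman-Brown correction to estimate full-test reliability."""
    r = pearson_r(half1, half2)
    return 2 * r / (1 + r)

odd_half = [10, 14, 18, 22]    # hypothetical half-test totals per respondent
even_half = [11, 13, 19, 21]
print(round(split_half_reliability(odd_half, even_half), 3))  # 0.988
```

The closer the two halves track each other, the closer the corrected estimate comes to 1.0.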

The sensitivity of a test (also called the true positive rate) is defined as the proportion of people with the disease who will have a positive result. Therefore, a highly sensitive test can be useful for ruling out a disease if a person has a negative result.

Specificity is the percentage of true negatives (e.g. 90% specificity = 90% of people who do not have the target disease will test negative).
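Sensitivity and specificity both come straight from the counts of correct and incorrect test results; the numbers below are hypothetical:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical test: 90 of 100 diseased people test positive,
# 180 of 200 healthy people test negative
print(sensitivity_specificity(90, 10, 180, 20))  # (0.9, 0.9)
```

A highly sensitive test misses few diseased people (useful for ruling out disease); a highly specific test rarely flags healthy people (useful for ruling it in).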

T-test analysis measures the size of the difference between two samples of data relative to the variation within them. Higher t values indicate larger differences between the two sample sets. In other words, a large t-score indicates that the groups are different.
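For two independent samples with possibly unequal variances, one common form is Welch’s t statistic: the difference in means divided by the pooled standard error. The groups below are made up for illustration:

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t statistic for two independent samples:
    (mean difference) / sqrt(var_a/n_a + var_b/n_b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

group_a = [1, 2, 3, 4, 5]   # hypothetical sample 1
group_b = [3, 4, 5, 6, 7]   # hypothetical sample 2
print(t_statistic(group_a, group_b))  # -2.0
```

The sign only indicates which group’s mean is larger; it is the magnitude of t (judged against the t distribution) that determines significance.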

Validity refers to the extent to which measurements truly reflect the underlying construct of interest; that is, does the test or measurement procedure actually measure what it was designed to measure?