In validating a selection test (x) as a predictor of y,
researchers must often work with an incomplete xy data set. A
well-known correction formula is available for estimating
the xy correlation in some total group using the
xy data of the selected cases and x data of the unselected
cases. The formula yields the r_yx correlation (1)
when the regression of y on x is linear and homoscedastic
and (2) when selection can be assumed to be
based on x alone. Although previous research has considered
the accuracy of the correction formula when
either Condition 1 or 2 is violated, no studies have
considered the most realistic case where both Conditions
1 and 2 are simultaneously violated. In the present
study six real data sets and five simulated selection
models were used to investigate the accuracy of the
correction formula when neither assumption is satisfied.
Each of the data sets violated the linearity and/or
homogeneity assumptions. Further, the selection
models represent cases where selection is not a function
of x alone. The results support two basic conclusions.
First, the correction formula is not robust to violations
of Conditions 1 and 2. Reasonably small
errors occur only for very modest degrees of selection.
Second, although biased, the correction formula can
be less biased than the uncorrected correlation for certain
distribution forms. However, for other distribution
forms, the corrected correlation can be less accurate
than the uncorrected correlation. A description of this
latter type of distribution form is given.
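
The abstract does not write out the correction formula. As an illustration only, the sketch below assumes it is the standard univariate correction for direct selection on x (often called Pearson's or Thorndike's Case 2 formula), which needs only the selected-group correlation and the standard deviations of x in the total and selected groups. The function and parameter names are hypothetical and do not come from the paper.

```python
import math


def correct_for_range_restriction(r_selected, sd_x_total, sd_x_selected):
    """Estimate the total-group xy correlation from selected-group data.

    Assumed formula (standard univariate correction, not quoted from the paper):
        r_c = r * k / sqrt(1 - r**2 + (r * k)**2),  with k = sd_x_total / sd_x_selected
    It is exact only when the regression of y on x is linear and
    homoscedastic and selection depends on x alone.
    """
    if sd_x_total <= 0 or sd_x_selected <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 <= r_selected <= 1.0:
        raise ValueError("r_selected must lie in [-1, 1]")
    # Ratio of unrestricted to restricted spread of the predictor x.
    k = sd_x_total / sd_x_selected
    return r_selected * k / math.sqrt(1.0 - r_selected**2 + (r_selected * k) ** 2)


# Example: x is twice as variable in the total group as in the selected group.
corrected = correct_for_range_restriction(r_selected=0.30, sd_x_total=2.0, sd_x_selected=1.0)
print(round(corrected, 3))
```

When the two standard deviations are equal (no restriction), the function returns the selected-group correlation unchanged. The study's point is that the corrected value can be badly biased once the two conditions above fail.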
Gross, Alan L.; Fleischman, Lynn E.
Restriction of range corrections when both distribution and selection assumptions are violated.
Retrieved from the University of Minnesota Digital Conservancy,