Applied Psychological Measurement, Volume 19, 1995
https://hdl.handle.net/11299/114835

dc.identifier.uri: https://hdl.handle.net/11299/120011
dc.title: The distribution of person fit using true and estimated person parameters
dc.contributor.author: Nering, Michael L.
dc.description.abstract: A variety of methods have been developed to determine
the extent to which a person’s response vector fits
an item response theory model. These person-fit methods
are statistical methods that allow researchers to
identify nonfitting response vectors. The most promising method has been the lz statistic, which is a standardized
person-fit index. Reise & Due (1991) concluded that under
the null condition (i.e., when data were simulated to
fit the model) lz performed reasonably well. The present
study extended the findings of past researchers (e.g.,
Drasgow, Levine, & McLaughlin, 1987; Molenaar &
Hoijtink, 1990; Reise & Due, 1991). Results show that lz
may not perform as expected when estimated person parameters
(θ̂) are used rather than true θ. This study also
examined the influence of the pseudo-guessing parameter,
the method used to identify nonfitting response
vectors, and the method used to estimate θ. When θ was
better estimated, lz was more normally distributed, and the false positive rate for a single cut score did not adequately characterize the distribution of lz. Changing the c parameter
from .20 to .00 did not improve the normality of the lz
distribution. Index terms: appropriateness measurement,
Bayesian estimation, item response theory, maximum
likelihood estimation, person fit.
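The lz statistic discussed in this abstract has a standard closed form (Drasgow, Levine, & Williams, 1985): the observed response log-likelihood, standardized by its expectation and variance under the model. A minimal sketch under the 3PL model — the item parameters below are illustrative, not taken from the study:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def lz(u, theta, a, b, c):
    """Standardized log-likelihood person-fit statistic lz:
    (observed loglik - expected loglik) / sd(loglik)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    u = np.asarray(u, dtype=float)
    l0 = np.sum(u * np.log(p) + (1.0 - u) * np.log(q))    # observed log-likelihood
    mean = np.sum(p * np.log(p) + q * np.log(q))          # its expectation
    var = np.sum(p * q * np.log(p / q) ** 2)              # its variance
    return (l0 - mean) / np.sqrt(var)

# Illustrative 10-item test with pseudo-guessing c = .20
a = np.full(10, 1.2)
b = np.linspace(-2.0, 2.0, 10)
c = np.full(10, 0.2)
fitting = (p_3pl(0.0, a, b, c) > 0.5).astype(int)  # answers match expectation
misfit = 1 - fitting                               # every answer unexpected
```

Strongly negative lz flags a nonfitting response vector; the study's point is that the null distribution of lz departs from N(0,1) once θ̂ replaces true θ.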
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118398
dc.title: Pairwise parameter estimation in Rasch models
dc.contributor.author: Zwinderman, Aeilko H.
dc.description.abstract: Rasch model item parameters can be estimated
consistently with a pseudo-likelihood method based
on comparing responses to pairs of items irrespective
of other items. The pseudo-likelihood method is
comparable to Fischer’s (1974) Minchi method. A
simulation study found that the pseudo-likelihood
estimates and their (estimated) standard errors were
comparable to conditional and marginal maximum
likelihood estimates. The method is extended to
estimate parameters of the linear logistic test model
allowing the design matrix to vary between persons.
Index terms: item parameter estimation, linear logistic
test model, Minchi estimation, pseudo-likelihood,
Rasch model.
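The pairwise trick behind the pseudo-likelihood: for two Rasch items, conditioning on exactly one of the pair being answered correctly cancels the person parameter, leaving odds that depend only on the item parameter difference. A sketch on synthetic data using plain gradient ascent — an assumption-laden illustration, not Zwinderman's actual implementation:

```python
import numpy as np

def pairwise_rasch(X, n_iter=2000, lr=0.5):
    """Pseudo-likelihood estimation of Rasch item easiness: among
    persons answering exactly one item of a pair correctly, the odds
    that it is item i depend only on beta_i - beta_j."""
    X = np.asarray(X)
    n_win = X.T @ (1 - X)            # n_win[i, j]: i correct, j incorrect
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(beta[None, :] - beta[:, None]))  # P(i over j)
        grad = np.sum(n_win * (1.0 - p) - n_win.T * p, axis=1)
        beta += lr * grad / X.shape[0]
        beta -= beta.mean()          # sum-zero identification
    return beta

# Synthetic Rasch data: 500 persons, 3 items of increasing difficulty
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))
difficulty = np.array([-1.0, 0.0, 1.0])
X = (rng.random((500, 3)) < 1.0 / (1.0 + np.exp(difficulty - theta))).astype(int)
beta_hat = pairwise_rasch(X)         # easiness should decrease across items
```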
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118399
dc.title: Analyzing homogeneity and heterogeneity of change using Rasch and latent class models: A comparative and integrative approach
dc.contributor.author: Meiser, Thorsten; Hein-Eggers, Monika; Rompe, Pamela; Rudinger, Georg
dc.description.abstract: The application of unidimensional Rasch models
to longitudinal data assumes homogeneity of change
over persons. Using latent class models, several
classes with qualitatively distinct patterns of development
can be taken into account; thus, heterogeneity of
change is assumed. The mixed Rasch model integrates
both the Rasch and the latent class approach by
dividing the population of persons into classes that
conform to Rasch models with class-specific parameters.
Thus, qualitatively different patterns of change
can be modeled with the homogeneity assumption retained
within each class, but not between classes. In
contrast to the usual latent class approach, the mixed
Rasch model includes a quantitative differentiation
among persons in the same class. Thus, quantitative
differences in the level of the latent attribute are disentangled
from the qualitative shape of development.
A theoretical comparison of the formal approaches is
presented here, as well as an application to empirical
longitudinal data. In the context of personality development
in childhood and early adolescence, the
existence of different developmental trajectories is
demonstrated for two aspects of personality. Relations
between the latent trajectories and discrete
exogenous variables are investigated. Index terms:
latent class analysis, latent structure analysis, measurement
of change, mixture distribution models,
Rasch model, rating scale model.
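The mixed Rasch model requires specialized software, but its latent class backbone — conditionally independent dichotomous items within each class — fits with a short EM loop. A sketch on synthetic two-class data (a plain latent class model, not the mixed Rasch model itself, which adds within-class Rasch structure):

```python
import numpy as np

def lca_em(X, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model: class weights pi[g] and per-class
    item success probabilities rho[g, i], items independent given class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    pi = np.full(n_classes, 1.0 / n_classes)
    rho = rng.uniform(0.3, 0.7, size=(n_classes, X.shape[1]))
    for _ in range(n_iter):
        # E step: posterior class probabilities per person
        log_post = np.log(pi) + X @ np.log(rho).T + (1 - X) @ np.log(1 - rho).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M step: reestimate weights and item probabilities
        pi = post.mean(axis=0)
        rho = np.clip((post.T @ X) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, rho

# Two synthetic classes: one with high, one with low success rates
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=400)
X = (rng.random((400, 8)) < np.where(z[:, None] == 0, 0.8, 0.2)).astype(int)
pi, rho = lca_em(X)
```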
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118397
dc.title: IRT-based internal measures of differential functioning of items and tests
dc.contributor.author: Raju, Nambury S.; Van der Linden, Wim J.; Fleer, Paul F.
dc.description.abstract: Internal measures of differential functioning of items
and tests (DFIT) based on item response theory (IRT) are
proposed. Within the DFIT context, the new differential
test functioning (DTF) index leads to two new measures
of differential item functioning (DIF) with the following
properties: (1) The compensatory DIF (CDIF) indexes for
all items in a test sum to the DTF index for that test and,
unlike current DIF procedures, the CDIF index for an item
does not assume that the other items in the test are unbiased;
(2) the noncompensatory DIF (NCDIF) index, which
assumes that the other items in the test are unbiased, is
comparable to some of the IRT-based DIF indexes; and
(3) CDIF and NCDIF, as well as DTF, are equally valid for
polytomous and multidimensional IRT models. Monte
Carlo study results, comparing these indexes with Lord’s
χ² test, the signed area measure, and the unsigned area
measure, demonstrate that the DFIT framework is accurate
in assessing DTF, CDIF, and NCDIF. Index terms:
area measures of DIF, compensatory DIF, differential
functioning of items and tests (DFIT), differential item
functioning, differential test functioning, Lord’s χ²,
noncompensatory DIF, nonuniform DIF, uniform DIF.
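The NCDIF index has a simple population definition: the expected squared gap between the focal- and reference-group item response functions of one item, averaged over the focal group's θ distribution. A sketch under a 2PL with illustrative parameters:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ncdif(af, bf, ar, br, focal_theta):
    """Noncompensatory DIF: mean squared difference between the
    focal-group and reference-group IRFs of a single item, averaged
    over the focal group's theta values."""
    gap = p_2pl(focal_theta, af, bf) - p_2pl(focal_theta, ar, br)
    return float(np.mean(gap ** 2))

focal = np.random.default_rng(0).normal(size=10_000)
no_dif = ncdif(1.0, 0.0, 1.0, 0.0, focal)     # identical item parameters
shifted = ncdif(1.0, 0.5, 1.0, 0.0, focal)    # difficulty shifted for focal group
```

NCDIF is zero only when the two IRFs coincide; CDIF, by contrast, is signed and sums over items to the DTF index.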
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118396
dc.title: Selection of unidimensional scales from a multidimensional item bank in the polytomous Mokken IRT model
dc.contributor.author: Hemker, Bas T.; Sijtsma, Klaas; Molenaar, Ivo W.
dc.description.abstract: An automated item selection procedure for selecting
unidimensional scales of polytomous items from multidimensional
datasets is developed for use in the context
of the Mokken item response theory model of monotone
homogeneity (Mokken & Lewis, 1982). The selection
procedure is directly based on the selection procedure
proposed by Mokken (1971, p. 187) and relies heavily
on the scalability coefficient H (Loevinger, 1948;
Molenaar, 1991). New theoretical results relating the
latent model structure to H are provided. The item selection
procedure requires selection of a lower bound
for H. A simulation study determined ranges of H for
which the unidimensional item sets were retrieved from
multidimensional datasets. If multidimensionality is
suspected in an empirical dataset, well-chosen lower
bound values can be used effectively to detect the unidimensional
scales. Index terms: item response theory,
Mokken model, multidimensional item banks, nonparametric
item response models, scalability coefficient H,
test construction, unidimensional scales.
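Loevinger's H, on which the selection procedure relies, compares the summed inter-item covariances to their maxima given the item marginals. A sketch for dichotomous items (the abstract's procedure handles polytomous items; this is the simpler dichotomous case):

```python
import numpy as np

def scalability_H(X):
    """Loevinger's scalability coefficient H for dichotomous items:
    sum of observed inter-item covariances over the sum of their
    maxima given the marginals (cov_max = p_lo * (1 - p_hi))."""
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)   # population covariances
    num = den = 0.0
    k = X.shape[1]
    for i in range(k):
        for j in range(i + 1, k):
            num += cov[i, j]
            den += min(p[i], p[j]) * (1.0 - max(p[i], p[j]))
    return num / den

# A perfect Guttman pattern: every response vector is cumulative
guttman = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                    [1, 1, 1], [1, 1, 0], [1, 0, 0]])
```

H equals 1 for a perfect Guttman scale; Mokken's commonly used default lower bound for scale selection is H ≥ .30.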
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118385
dc.title: Reliability estimation for single dichotomous items based on Mokken's IRT model
dc.contributor.author: Meijer, Rob R.; Sijtsma, Klaas; Molenaar, Ivo W.
dc.description.abstract: Item reliability is of special interest for Mokken’s
nonparametric item response theory, and is useful for
the evaluation of item quality in nonparametric test
construction research. It is also of interest for nonparametric
person-fit analysis. Three methods for the
estimation of the reliability of single dichotomous
items are discussed. All methods are based on the
assumptions of nondecreasing and nonintersecting item
response functions. Based on analytical and Monte
Carlo studies, it is concluded that one method is superior
to the other two, because it has a smaller bias and
a smaller sampling variance. This method also demonstrated
some robustness under violation of the condition
of nonintersecting item response functions.
Index terms: item reliability, item response theory,
Mokken model, nonparametric item response models,
test construction.
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118384
dc.title: Analysis of differential item functioning in translated assessment instruments
dc.contributor.author: Budgell, Glen R.; Raju, Nambury S.; Quartetti, Douglas A.
dc.description.abstract: The usefulness of three IRT-based methods and the
Mantel-Haenszel technique in evaluating the measurement
equivalence of translated assessment instruments
was investigated. A 15-item numerical test and an 18-item reasoning test that were originally developed in
English and then translated to French were used. The
analyses were based on four groups, each containing
1,000 examinees. Two groups of English-speaking examinees
were administered the English version of the
tests; the other two were French-speaking examinees
who were administered the French version of the tests.
The percent of items identified with significant differential
item functioning (DIF) in this study was similar
to findings in previous large-sample studies. The four
DIF methods showed substantial consistency in identifying
items with significant DIF when replicated. Suggestions
for future research are provided. Index
terms: area measures, differential item functioning,
item response theory, language translations, Lord’s χ²,
Mantel-Haenszel procedure.
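The Mantel-Haenszel technique compared in the study pools 2×2 tables (group × item score) across matched total-score strata into a common odds ratio. A minimal sketch with made-up counts:

```python
import numpy as np

def mh_alpha(right_ref, wrong_ref, right_foc, wrong_foc):
    """Mantel-Haenszel common odds ratio across score strata; values
    near 1 indicate no DIF on the studied item."""
    rr, wr = np.asarray(right_ref, float), np.asarray(wrong_ref, float)
    rf, wf = np.asarray(right_foc, float), np.asarray(wrong_foc, float)
    n = rr + wr + rf + wf                        # stratum totals
    return float(np.sum(rr * wf / n) / np.sum(wr * rf / n))

# Two strata in which both groups have identical odds of success (no DIF)
alpha = mh_alpha([30, 40], [10, 20], [15, 20], [5, 10])
# Same reference counts, but the focal group does worse (DIF against focal)
alpha_dif = mh_alpha([30, 40], [10, 20], [10, 15], [10, 15])
```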
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118383
dc.title: The Rasch Poisson counts model for incomplete data: An application of the EM algorithm
dc.contributor.author: Jansen, Margo G. H.
dc.description.abstract: Rasch’s Poisson counts model is a latent trait model
for the situation in which K tests are administered to
N examinees and the test score is a count [e.g., the
repeated occurrence of some event, such as the number
of items completed or the number of items answered
(in)correctly]. The Rasch Poisson counts model assumes
that the test scores are Poisson distributed random
variables. In the approach presented here, the Poisson
parameter is assumed to be a product of a fixed test
difficulty and a gamma-distributed random examinee
latent trait parameter. From these assumptions, marginal
maximum likelihood estimators can be derived for the
test difficulties and the parameters of the prior gamma
distribution. For the examinee parameters, there are a
number of options. The model can be applied in a situation
in which observations result from an incomplete
design. When examinees are assigned to different subsets
of tests using background information, this information
must be taken into account when using marginal
maximum likelihood estimation. If the focus is on test
calibration and there is no interest in the characteristics
of the latent traits in relation to the background information,
conditional maximum likelihood methods may be
preferred because they are easier to implement and are
justified for incomplete data for test parameter estimation.
Index terms: EM algorithm, incomplete designs,
latent trait models, marginal maximum likelihood estimation,
Rasch Poisson counts model.
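A property worth seeing concretely: in the Rasch Poisson counts model, conditional on a person's total count, the counts across tests are multinomial with cell probabilities proportional to the test parameters, so normalized column sums recover them. A simulation sketch with illustrative parameters (not the paper's EM treatment of incomplete designs):

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.gamma(shape=2.0, scale=1.0, size=300)   # gamma-distributed examinee parameters
sigma = np.array([0.5, 1.0, 2.0])                # test parameters ("easiness")
X = rng.poisson(np.outer(xi, sigma))             # counts: Poisson(xi_n * sigma_k)

# Conditional on each row total, the counts are multinomial with
# cell probabilities sigma_k / sigma.sum(), so normalized column
# sums estimate the normalized test parameters.
sigma_hat = X.sum(axis=0) / X.sum()
```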
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118382
dc.title: Hyperbolic cosine latent trait models for unfolding direct responses and pairwise preferences
dc.contributor.author: Andrich, David
dc.description.abstract: The hyperbolic cosine unfolding model for direct
responses of persons to individual stimuli is elaborated
in three ways. First, the parameter of the stimulus,
which reflects a region within which people
located there are more likely to respond positively
than negatively, is shown to be a property of the data
and not arbitrary as first supposed. Second, the model
is used to construct a related model for pairwise
preferences. This model, for which joint maximum
likelihood estimates are derived, satisfies strong stochastic
transitivity. Third, the role of substantive
theory in evaluating the fit between the data and the
models, in which unique solutions for the estimates
are not guaranteed, is explored by analyzing responses
of one group of persons to a single set of stimuli
obtained both as direct responses and pairwise
preferences. Index terms: direct responses, hyperbolic
cosine model, item response theory, latent trait
models, pair comparisons, pairwise preferences, unfolding
models.
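One common statement of the hyperbolic cosine unfolding model (after Andrich & Luo, 1993) — used here as an assumption, since the abstract does not give the formula — is P(x = 1) = exp(ρ) / (exp(ρ) + 2 cosh(θ − δ)): a single-peaked response curve centered at the stimulus location δ, with ρ governing the width of the region where a positive response is more likely than a negative one:

```python
import numpy as np

def hcm(theta, delta, rho):
    """Hyperbolic cosine model: probability of a positive response,
    maximal where the person coincides with the stimulus location."""
    return np.exp(rho) / (np.exp(rho) + 2.0 * np.cosh(theta - delta))

thetas = np.linspace(-4.0, 4.0, 81)      # grid including theta = 0
probs = hcm(thetas, delta=0.0, rho=1.0)  # single-peaked, symmetric curve
```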
dc.date.issued: 1995-01-01

dc.identifier.uri: https://hdl.handle.net/11299/118282
dc.title: Testing the equality of scale values and discriminal dispersions in paired comparisons
dc.contributor.author: Davison, Mark L.; McGuire, Dennis P.; Chen, Tsuey-Hwa; Anderson, Ronald O.
dc.description.abstract: General normal ogive and logistic multiple-group
models for paired comparisons data are described. In
these models, scale value and discriminal dispersion
parameters are allowed to vary across stimuli and respondent
populations. Submodels can be fit to choice
proportions by nonlinearly regressing sample estimates
of choice proportions onto a complex design matrix. By
fitting various submodels and by appropriate coding of
parameter effects, selected hypotheses about the equality
of scale value and dispersion parameters across groups
can be tested. Model fitting and hypothesis testing are
illustrated using health care coverage data collected in
two age groups. Index terms: Bradley-Terry-Luce
model, choice models, logistic regression, paired comparisons,
probit regression, Thurstone’s Law of Comparative Judgment.
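The Bradley-Terry-Luce model named in the index terms can be fit from a win-count matrix with the classic minorization updates. A sketch with made-up counts — not the paper's nonlinear regression approach, which additionally models discriminal dispersions:

```python
import numpy as np

def btl_scale(wins, n_iter=500):
    """Bradley-Terry-Luce scale values from wins[i, j] = number of
    times stimulus i was chosen over stimulus j, via the standard MM
    update v_i = W_i / sum_j [ n_ij / (v_i + v_j) ]."""
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                       # comparisons per pair
    W = wins.sum(axis=1)                    # total wins per stimulus
    v = np.full(len(W), 1.0 / len(W))
    for _ in range(n_iter):
        denom = (n / (v[:, None] + v[None, :])).sum(axis=1)
        v = W / denom
        v /= v.sum()                        # fix the arbitrary scale
    return v

# 10 comparisons per pair; stimulus 0 preferred most, stimulus 2 least
v = btl_scale([[0, 8, 9],
               [2, 0, 7],
               [1, 3, 0]])
```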
dc.date.issued: 1995-01-01