A Global Information Approach to
Computerized Adaptive Testing
Hua-Hua Chang, Educational Testing Service
Zhiliang Ying, Rutgers University
Most item selection in computerized adaptive testing is based on Fisher information (or item information). At each stage, an item is selected to maximize the Fisher information at the currently estimated trait level ($\theta$). However, this application of Fisher information could be much less efficient than assumed if the estimators are not close to the true $\theta$, especially at early stages of an adaptive test when the test length (number of items) is too short to provide an accurate estimate for true $\theta$. It is argued here that selection procedures based on global information should be used, at least at early stages of a test when $\theta$ estimates are not likely to be close to the true $\theta$. For this purpose, an item selection procedure based on average global information is proposed. Results from pilot simulation studies comparing the usual maximum item information item selection with the proposed global information approach are reported, indicating that the new method leads to improvement in terms of bias and mean squared error reduction under many circumstances.
Index terms: computerized adaptive testing, Fisher information, global information, information surface, item information, item response theory, Kullback-Leibler information, local information, test information.
Computerized adaptive testing (CAT) was proposed by Lord (1971), Owen (1975), and Weiss (1976),
among others, to measure the trait levels ($\theta$s) of examinees with greater precision than conventional tests by
building an individualized test for each examinee. Test items are selected sequentially, according to the current
performance of an examinee. The test is tailored to each examinee’s $\theta$ level, thus matching the difficulties
of the items to the examinee being measured. Able examinees can avoid responding to too many easy items,
and less able examinees can avoid being exposed to too many difficult items. The major advantage of CAT is
that it provides more efficient trait estimates with fewer items than that required in conventional tests (e.g.,
Weiss, 1982). Significant progress has been made in the development and implementation of CAT due, in part,
to the rapid advancement of computer technology (Wainer, 1990). However, methodological as well as theo-
retical developments in CAT appear to be rather limited.
A basic ingredient in CAT is the item selection procedure that is used to select items during the course of the
test. For the past two decades, the most commonly used item selection procedure has been based on maximiz-
ing item information. More specifically, an item is selected that has maximum information at the currently
estimated $\theta$ level ($\hat{\theta}$), which is estimated from the available responses at that time. An alternative to the
maximum information approach is the Bayesian method (e.g., Owen, 1975). Instead of using item information
at $\hat{\theta}$, the Bayesian approach uses the posterior variance as the criterion for item selection. At the initial
stages, posterior distributions depend heavily on the choice of prior distribution for $\theta$, but the dependency
diminishes at the later stages. Furthermore, according to Chang & Stout (1993), the posterior variance ap-
proaches the reciprocal of the test information when the number of items becomes large.
Item information typically has been defined as Fisher information, which varies from examinee to examinee
and therefore is a function of $\theta$. The value of Fisher information at the true $\theta$ level of a particular
examinee, denoted by $\theta_0$, indicates the efficiency of the item for estimation of $\theta$. However, its value at a $\theta$ level
Downloaded from the Digital Conservancy at the University of Minnesota, http://purl.umn.edu/93227.  
May be reproduced with no cost by students and faculty for academic use.  Non-academic reproduction  
requires payment of royalties through the Copyright Clearance Center, http://www.copyright.com/ 
distant from $\theta_0$ may not be a good indicator of efficiency. Because it uses Fisher information at the current $\hat{\theta}$
level, the information criterion can be inefficient if $\hat{\theta}$ is not close to $\theta_0$. This may well be the case at early
stages of a CAT when there are only a small number of items (providing little information) to construct a
reliable estimator for $\theta$. Consequently, items selected at an early stage may not be an efficient choice. For this
reason, the issue of selecting "best" items at early stages has attracted much attention recently (e.g., Davey &
Parshall, 1995; Fan & Hsu, 1995; Stocking, 1993; van der Linden, 1995; Veerkamp & Berger, 1994).
New methods and suggestions, along with theoretical and empirical studies, have been proposed to overcome
inefficiency due to inaccurate estimation of $\theta_0$. In particular, Veerkamp & Berger (1994) proposed an
"interval information criterion": Instead of the item information at a point, their selection procedure is based
on the highest mean value of the information function in a confidence interval [see Chang & Ying (1996b) for
a discussion of Veerkamp and Berger’s proposal]. However, Stocking (1993) argued that, in addition to item
information, item selection should incorporate some further criteria, such as conditional and absolute exposure
rates, item pool refreshment or test and ordering.
It appears that further progress, if there is to be any, in the foundational research of CAT could occur in the area of item selection procedures. The usual large-sample properties, such as consistency and asymptotic posterior normality, have been established for item response theory (IRT) models. Under general regularity conditions, these results ensure that commonly used estimators converge to $\theta_0$. It then follows that the item information criterion described above should be close to optimal at later stages in a CAT when the number of administered items is already sufficiently large. Note that a major goal of CAT is to estimate $\theta$ more efficiently with fewer items. Reducing the number of items used in the test thus makes the quality of item selection at early stages extremely crucial. Therefore, developing necessary concepts and methods for small-sample selection becomes very important.
This paper presents (1) a new concept of information (global information, with related information functions) that provides information when the estimator is not close to the true parameter; (2) an item selection procedure based on average global information; and (3) some results from a pilot simulation study comparing the standard maximum information approach with the new global information approach.
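Fisher Information in IRT Models
The item response function for the ith item will be denoted by $P_i$ and its complement by $Q_i = 1 - P_i$. Thus, an examinee with trait parameter $\theta$ will answer the item correctly with probability $P_i(\theta)$ and incorrectly with probability $Q_i(\theta)$. Following Lord (1980), the Fisher item information function is defined as
$$I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,Q_i(\theta)}. \tag{1}$$
For a test consisting of items $i = 1, \ldots, n$, the test information, as a function of $\theta$, is simply the sum of the individual item information functions:
$$I^{(n)}(\theta) = \sum_{i=1}^{n} I_i(\theta). \tag{2}$$
Fisher information is closely related to maximum likelihood (ML) estimation. If an examinee’s item responses are denoted by $x_1, \ldots, x_n$, the likelihood function can be written as
$$L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} P_i(\theta)^{x_i}\, Q_i(\theta)^{1 - x_i}. \tag{3}$$
Denote the log-likelihood function as
$$l_n(\theta) = \log L(\theta; x_1, \ldots, x_n) = \sum_{i=1}^{n} \left[ x_i \log P_i(\theta) + (1 - x_i) \log Q_i(\theta) \right]. \tag{4}$$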
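$\hat{\theta}_n$ is used to denote the ML estimator
$$\hat{\theta}_n = \arg\max_{\theta} L(\theta; x_1, \ldots, x_n), \tag{5}$$
or equivalently,
$$\left. \frac{\partial l_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n} = 0. \tag{6}$$
Recall that $\theta_0$ is the true $\theta$. The response to the ith item, $X_i$, is a random variable with probability mass function
$$P(X_i = x_i \mid \theta_0) = P_i(\theta_0)^{x_i}\, Q_i(\theta_0)^{1 - x_i}, \tag{7}$$
where $x_i$ denotes the value taken by $X_i$.
The asymptotic variance of $\hat{\theta}_n$ is the reciprocal of $I^{(n)}(\theta_0)$, the test information function (Lehmann, 1983, p. 465). In other words, the Fisher information is inversely proportional to the error of the ML estimator.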
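As an illustration of the item and test information functions defined above, the following Python sketch evaluates them under the 3PLM. It is not taken from the paper: the logistic scaling constant D = 1.7 and the function names `p3pl`, `fisher_item_info`, and `fisher_test_info` are assumptions introduced here. The two example items use the parameters reported in the caption of Figure 3.

```python
import math

D = 1.7  # assumed logistic scaling constant; the text does not state it


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response (requires c < 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fisher_item_info(theta, a, b, c):
    """Fisher item information: [P'(theta)]^2 / [P(theta) * Q(theta)]."""
    p = p3pl(theta, a, b, c)
    logistic = (p - c) / (1.0 - c)  # logistic part of the 3PLM
    dp = (1.0 - c) * D * a * logistic * (1.0 - logistic)  # P'(theta)
    return dp * dp / (p * (1.0 - p))


def fisher_test_info(theta, items):
    """Test information: sum of item informations over (a, b, c) triples."""
    return sum(fisher_item_info(theta, a, b, c) for a, b, c in items)


items = [(2.0, -0.1, 0.1), (1.5, 0.0, 0.0)]  # Item 1 and Item 2 of Figure 3
for a, b, c in items:
    print((a, b, c), fisher_item_info(0.0, a, b, c))
print("test information at theta = 0:", fisher_test_info(0.0, items))
```

Under these assumptions, the reciprocal of `fisher_test_info` evaluated at the true trait level approximates the large-sample variance of the ML estimator.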
When a CAT is administered to an examinee, a series of item selection decisions are made, each of which depends on the examinee’s responses to the preceding items using the current $\theta$ estimate, usually $\hat{\theta}_m$ if m items have been answered. An information-based sequential decision rule is to select the next item so that the information at $\hat{\theta}_m$ is maximized. Apparently, the appropriateness of this rule may depend on how close $\hat{\theta}_m$ is to $\theta_0$. The item selection procedure typically used is based on $I$, which is reasonable when $\hat{\theta}_m$ is close to $\theta_0$. However, if $\hat{\theta}_m$ is not close to $\theta_0$, then $I$ at $\hat{\theta}_m$ may not reflect the true information of the item. The deviation of $\hat{\theta}_m$ from $\theta_0$ is likely to be non-negligible when m is small (i.e., at early stages of a CAT). In addition, $I$ is relatively unstable for commonly used IRT models, including the three-parameter logistic model (3PLM; Chang & Ying, 1996a). It is also questionable whether a univariate function of $\theta$ alone is sufficient to capture the entire information content of an item. A more flexible approach may be needed for this problem.
Local Information
If the information around a small region of $\theta_0$ is viewed as local information, then the information outside that region can be viewed as global information. In statistical testing theory, there are two kinds of alternatives to the null hypothesis: local and fixed. For example, if the null hypothesis is $H_0: \theta = \theta_0$, then a fixed alternative could be $H_1: \theta = \theta_1$, and local alternatives, relative to a sample of size m, could be $H_m: \theta = \theta_0 + \delta/\sqrt{m}$. The local alternatives approach the null hypothesis as m increases, whereas the fixed alternative does not. It is reasonable to expect that local information would be related to the power of detecting local alternatives, and global information to that for a fixed alternative. With respect to CAT, local information may serve as a benchmark for item selection when there is sufficient knowledge about the location of $\theta_0$, and global information might be preferred when there is lack of such knowledge.
In practice, $\theta_0$ is unknown. For an information-based criterion to be useful, the value of information at every possible $\theta$ has to be specified. When information is defined for every $\theta$, it effectively becomes a function on the entire parameter space. The local information (function) should then mean that at each $\theta$, its value measures the amount of information the item contains when the examinee’s true but unknown trait level is $\theta$.
Test Information is Local Information
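$I^{(n)}$ in IRT represents a local information function. Recall that $I^{(n)}$ for a given item response sequence $X_1, \ldots, X_n$ can be written as
$$I^{(n)}(\theta) = E_{\theta}\left[ \left( \frac{\partial}{\partial \theta} \log L(\theta; X_1, \ldots, X_n) \right)^{2} \right]. \tag{8}$$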
Because $I^{(n)}$ in IRT is defined as the Fisher information, its meaning and justification may be described by paraphrasing Lehmann (1983, p. 117):
The function
$$\frac{\partial}{\partial \theta} \log L(\theta; x_1, \ldots, x_n) = \frac{\partial L(\theta; x_1, \ldots, x_n)/\partial \theta}{L(\theta; x_1, \ldots, x_n)} \tag{9}$$
is the relative rate at which the density changes at $x_1, \ldots, x_n$. The average of the square of this rate is expressed by Equation 8. It is plausible that the greater this expectation is at a given value $\theta_0$, the easier it is to distinguish $\theta_0$ from neighboring values $\theta$, and, therefore, the more accurately $\theta$ can be estimated at $\theta = \theta_0$. The quantity $I^{(n)}(\theta)$ is called the information or the Fisher information that $X_1, \ldots, X_n$ contains about parameter $\theta$.
Lehmann (1983) emphasized that "...the surmise turns out to be correct when sample size is large" (p. 11~). The asymptotic theory of Le Cam (Le Cam & Yang, 1990) provides quantification in terms of statistical hypothesis testing for local alternatives.
Suppose a test with $n$ items is to be designed to estimate $\theta_0$. According to Hambleton & Swaminathan (1985), $I$ "...can be interpreted as providing per unit discrimination between ability levels" (p. 102) that are close together. This implies that for any fixed individual with $\theta_0$, $I$ is the discrimination power between $\theta_0$ and any $\theta_1$ that is close to $\theta_0$. Thus, for any fixed $\theta_0$, $I$ is the local information that the item contains about $\theta_0$.
Let $\hat{\theta}_n$ denote the ML estimator or its asymptotically equivalent variant. It is important that items be selected to make $\hat{\theta}_n$ as close as possible to $\theta_0$. As n increases, $\hat{\theta}_n$ approaches $\theta_0$; in fact, it is asymptotically normal with mean $\theta_0$ and variance $1/I^{(n)}(\theta_0)$. The closeness of $\hat{\theta}_n$ to $\theta_0$ is thus governed by $I^{(n)}(\theta_0)$: the larger $I^{(n)}(\theta_0)$ is, the closer $\hat{\theta}_n$ is to $\theta_0$. Thus, provided n is large, an efficient test may be obtained by making $I^{(n)}(\theta_0)$ as large as possible.
However, for small $n$, the estimator may not be close to $\theta_0$, in which case the information inside a small region around $\hat{\theta}_n$ would not be useful. The term "information function" may be misleading if it is used without considering its asymptotic properties (Lord, 1971, p. 10). Thus, global information for the situation in which $\hat{\theta}_n$ is not close to $\theta_0$ is needed.
Global Information
Given an examinee’s responses $X_1, \ldots, X_n$ to the n items in a test, the quantity that summarizes all the information for the examinee’s $\theta$ is the likelihood function $L(\theta) = L(\theta; X_1, \ldots, X_n)$ defined by Equation 3. To distinguish any fixed $\theta_1$ from $\theta_0$, examine the difference between values of $L$ at $\theta_1$ and $\theta_0$. Such a difference can be captured by the ratio of the two values, resulting in the well-known likelihood ratio test (Neyman & Pearson, 1936). By Neyman-Pearson theory (Lehmann, 1986), the likelihood ratio method is optimal for testing $\theta = \theta_0$ versus $\theta = \theta_1$. In other words, it is the best way to tell $\theta_1$ from $\theta_0$ when the IRT model is assumed for the observed $X_1, \ldots, X_n$.
Because the errors associated with the likelihood ratio test decrease to 0 exponentially fast (Serfling, 1980,
§ 10.3.2), it is convenient to take the logarithm of the likelihood ratio. Moreover, according to Lehmann (per-
sonal communication, September 1, 1995), one of the main reasons for taking the logarithm is that the likeli-
hood is a product, but its logarithm is a sum, which is much easier to work with. One of the consequences is the
additivity of information that would not be possible without taking logs. The expected value of the log-likeli-
hood ratio quantifies how powerful (efficient) the statistical test is and is commonly known as the Kullback-
Leibler (KL) information (Cover & Thomas, 1991; Kullback, 1959). It also measures the discrepancy between
the two probability distributions specified by $\theta_0$ and $\theta_1$.
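Kullback-Leibler Information
Definition 2.1: KL item information. Let $\theta_0$ be the true parameter. For any $\theta$, the KL information of the ith item (with response $X_i$) is defined by
$$K_i(\theta \,\|\, \theta_0) = E_{\theta_0} \log\left[ \frac{L_i(\theta_0; X_i)}{L_i(\theta; X_i)} \right], \tag{10}$$
where $E_{\theta_0}$ denotes expectation over $X_i$ and
$$L_i(\theta; X_i) = P_i(\theta)^{X_i}\, [1 - P_i(\theta)]^{1 - X_i} \tag{11}$$
is the likelihood function for the ith item.
A straightforward probability calculation using Equation 7 shows that the item KL information can be expressed explicitly as
$$K_i(\theta \,\|\, \theta_0) = P_i(\theta_0) \log\left[ \frac{P_i(\theta_0)}{P_i(\theta)} \right] + [1 - P_i(\theta_0)] \log\left[ \frac{1 - P_i(\theta_0)}{1 - P_i(\theta)} \right]. \tag{12}$$
[The notation of double vertical bars is standard for KL information (Cover & Thomas, 1991, p. 18). The double bars, which signify that $\theta$ needs to be separated from $\theta_0$, are used to avoid confusion with the single bar, which typically indicates conditioning.]
Note that as a function of $\theta$ and $\theta_0$, $K_i$ is not symmetric [$K_i(\theta_1 \,\|\, \theta_0) \neq K_i(\theta_0 \,\|\, \theta_1)$]. Furthermore, $K_i(\theta \,\|\, \theta_0) \geq 0$ and $K_i(\theta_0 \,\|\, \theta_0) = 0$. Mimicking $I^{(n)}$, the corresponding KL test information can be defined.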
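Definition 2.2: KL test information. Let $\theta_0$ be the true parameter. For any $\theta$, the KL information for a test is defined by
$$K^{(n)}(\theta \,\|\, \theta_0) = E_{\theta_0} \log\left[ \frac{L(\theta_0; X_1, \ldots, X_n)}{L(\theta; X_1, \ldots, X_n)} \right], \tag{13}$$
where $X_1, \ldots, X_n$ are the scored responses.
Note again that the expectation is with respect to $(X_1, \ldots, X_n)$. From this definition it follows that
$$K^{(n)}(\theta \,\|\, \theta_0) = \sum_{i=1}^{n} K_i(\theta \,\|\, \theta_0). \tag{14}$$
Again $K^{(n)}(\theta \,\|\, \theta_0) \geq 0$, with equality if $\theta = \theta_0$. $K$ is sometimes referred to as the relative entropy or the KL distance (Cover & Thomas, 1991).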
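The explicit expression for the item KL information, its asymmetry, and the additive form of the KL test information can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the scaling constant D = 1.7 and the function names are assumptions, and the example item uses the parameters given in the caption of Figure 1.

```python
import math

D = 1.7  # assumed logistic scaling constant; the text does not state it


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def kl_item_info(theta, theta0, a, b, c):
    """K_i(theta || theta0): expected log-likelihood ratio when theta0 is true."""
    p0 = p3pl(theta0, a, b, c)
    p = p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))


def kl_test_info(theta, theta0, items):
    """KL test information: sum of the item KL informations."""
    return sum(kl_item_info(theta, theta0, a, b, c) for a, b, c in items)


item = (3.0, 0.0, 0.1)  # parameters of the item shown in Figure 1
print(kl_item_info(0.0, 0.0, *item))  # theta equal to theta0
print(kl_item_info(1.0, 0.0, *item))  # theta0 = 0 is the true level
print(kl_item_info(0.0, 1.0, *item))  # roles swapped, value differs
```

For moderate trait values this direct formula is adequate; for extreme values a numerically safer evaluation of the logarithms would be advisable.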
Analogous to $I^{(n)}$ defined by Equation 8, an important feature of $K^{(n)}$ is that the contribution of each item to the total information is additive. Thus, the total amount of information for a test can be readily determined. This feature is highly desirable in CATs because it enables test developers to separately calculate the information for each item and combine them to form updated test information at each stage.
Another important feature is that $K$ is a function of two levels, $\theta$ and $\theta_0$. $K$ represents the discrimination power of the item on the two levels. It does not require that $\theta$ be close to $\theta_0$. In this sense, $K$ summarizes information content of the item with respect to a broad spectrum of $\theta$ levels. In contrast, $I$ is a function of $\theta_0$ only and represents discrimination power around $\theta_0$ (Hambleton & Swaminathan, 1985, p. 102).
KL Information Is Global Information
The purpose of a CAT is to accurately estimate an examinee’s $\theta_0$ by efficiently selecting items. To this end,
it is desirable to find a quantity that distinguishes all those $\theta \neq \theta_0$ from $\theta_0$. As argued above, the log-likelihood ratio is, in a sense, the best quantity constructed from data that can be used to distinguish $\theta$ from $\theta_0$. $K$ is the average (expectation) of the log-likelihood ratio. For item $i$, as $\theta$ varies over the parameter space, $K$ generates a global profile about the discrimination power of the item. There is no requirement that $\theta$ be close to $\theta_0$. In this sense, $K$ may be viewed as a way to quantify global information.
For each $\theta_0$, $K$ is a function of $\theta$, and $I$ is a fixed number. This is one of the key distinctions between $K$ and $I$. If $\theta_0$ is allowed to vary across the entire scale, $K$ becomes a global information surface in a three-dimensional space $(\lambda, \nu, \kappa)$, with $\lambda$ corresponding to $\theta_0$, $\nu$ to $\theta$, and $\kappa$ to $K_i$ (see Figure 1). Figure 1 shows the KL information surface intersected with a vertical plane at $\lambda = 0$, for an item with 3PLM parameters [a (discrimination), b (difficulty), and c (pseudoguessing)]. The resulting curve on the plane is the KL information function at $\theta_0 = 0$. The geometrical meaning of a KL information function for a fixed $\theta_0$ is a curve, which represents the intersection of the vertical plane $\lambda = \theta_0$ and the information surface. From Figure 1, observe that the KL information function changes its shape as $\theta_0$ changes its values. No matter how it changes, $K$ is always 0 along the entire 45° line ($\theta = \theta_0$). Note that the curvature at $\theta = \theta_0$ equals $I$ at $\theta_0$.
Figure 1
KL Information Surface for an Item With a = 3.0, b = 0.0, c = .1, Intersected With a Vertical Plane $\lambda = 0$
Use of $K^{(n)}$ (or $K$) as a global characteristic is not new. In addition to the above discussion of its use in testing statistical hypotheses (the likelihood ratio test), statistical estimation is another use. In theoretical development of ML estimation, two basic properties, consistency and asymptotic normality, are commonly investigated. To establish consistency, the behavior of the likelihood function in the entire parameter space is examined to show that the values not close to the true parameter are not likely to maximize the likelihood function. This is often accomplished using $K^{(n)}$. To be more specific, it is expected that for $\theta$ distant from $\theta_0$,
$L(\theta; X) < L(\theta_0; X)$ or equivalently $l_n(\theta; X) < l_n(\theta_0; X)$. The inequality $E\,l_n(\theta; X) < E\,l_n(\theta_0; X)$ is used to rule out those $\theta$ that are not close to $\theta_0$. Having shown the consistency, the likelihood function in a region close to $\theta_0$ is examined and asymptotic normality is established, which is connected to $I^{(n)}$. In this context, $K^{(n)}$ is used to study the likelihood function on the entire parameter space, whereas $I^{(n)}$ is only used locally around the true parameter.
The Relationship Between Local and Global Information
Global information should be used when n is small, and local information should be used when n is large.
Thus, to design a good CAT, both global and local information are needed at different stages of the test. A
practical problem from this consideration is how to determine the cut-off point and whether a smoother
transition is needed. Moreover, it would certainly be desirable if a single measure could be constructed that
mimics global information with small n, and local information with larger n. Thus, a connection between the
local and the global information functions must be established.
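Recall that for a person with $\theta = \theta_0$, $K^{(n)}(\theta \,\|\, \theta_0)$ is minimized at $\theta = \theta_0$ with minimum value $K^{(n)}(\theta_0 \,\|\, \theta_0) = 0$. Thus, the derivative of $K^{(n)}(\theta \,\|\, \theta_0)$ at $\theta = \theta_0$ must be 0:
$$\left. \frac{\partial}{\partial \theta} K^{(n)}(\theta \,\|\, \theta_0) \right|_{\theta = \theta_0} = 0. \tag{15}$$
Through its Taylor series expansion at $\theta_0$, the local variation of $K^{(n)}(\theta \,\|\, \theta_0)$ is then characterized primarily by its second derivative, which, not surprisingly, is $I^{(n)}$. More precisely,
$$\left. \frac{\partial^2}{\partial \theta^2} K^{(n)}(\theta \,\|\, \theta_0) \right|_{\theta = \theta_0} = I^{(n)}(\theta_0). \tag{16}$$
For any $\theta$, $K^{(n)}$ represents the ease or difficulty of distinguishing $\theta$ from $\theta_0$. In particular, for $\theta$ varying around $\theta_0$, it also gives local information, which is connected to $I^{(n)}$. Equation 16 is simply a mathematical statement about this. Geometrically speaking, if $K$ is viewed as a curve on the plane, $I$ becomes the curvature of the curve at $\theta = \theta_0$. Note that both Equations 15 and 16 hold with $K^{(n)}$ replaced by $K$ and $I^{(n)}$ by $I$.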
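The statement that the second derivative of the KL information at the true trait level equals the Fisher information can be checked numerically for a single 3PLM item. The sketch below uses a central finite difference. The scaling constant D = 1.7, the step size `h`, the example item parameters, and all function names are assumptions made for this illustration.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fisher_item_info(theta, a, b, c):
    """Fisher item information [P']^2 / (P * Q)."""
    p = p3pl(theta, a, b, c)
    logistic = (p - c) / (1.0 - c)
    dp = (1.0 - c) * D * a * logistic * (1.0 - logistic)
    return dp * dp / (p * (1.0 - p))


def kl_item_info(theta, theta0, a, b, c):
    """Item KL information K_i(theta || theta0)."""
    p0 = p3pl(theta0, a, b, c)
    p = p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))


def kl_curvature(theta0, a, b, c, h=1e-3):
    """Central-difference second derivative of K_i(. || theta0) at theta0.

    K_i(theta0 || theta0) = 0, so the middle term of the difference vanishes.
    """
    k_plus = kl_item_info(theta0 + h, theta0, a, b, c)
    k_minus = kl_item_info(theta0 - h, theta0, a, b, c)
    return (k_plus + k_minus) / (h * h)


item = (2.0, 0.5, 0.1)  # hypothetical item parameters (a, b, c)
print(kl_curvature(1.0, *item), fisher_item_info(1.0, *item))
```

The two printed values should agree up to finite-difference error, which is the single-item version of the curvature relation described above.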
Figure 2 plots KL information functions for five items with $\theta_0 = 1$. For each function, the curvature at 1 is equal to the value of $I$ at 1. All the well-known influences, such as guessing and discrimination (Lord & Novick, 1968, pp. 460-464), on $I$ have corresponding effects on the curvature of the KL functions. Note that in terms of $I$, Item 5 provides more information than Item 4; however, this is not the case for $K$, which shows that their relationship is more complex.
Figure 3 plots both $K$ and $I$ for two items at $\theta_0 = 0$. Although $I$ for Item 1 is greater than that for Item 2 around $\theta_0 = 0$, it appears that Item 2 might be a better choice based on $K$, which shows that Item 2 is more "robust" and has more overall power when considering the entire parameter range.
Finally, from Equation 16, $I$ can be fully recovered from $K$ by taking derivatives. In other words, if the profile of $K$ is known, then $I$ is known exactly. However, $K$ cannot be recovered from $I$. In this sense, it can be said that test or item KL information is more informative than conventional test or item Fisher information. However, $K$ is also more complicated and, therefore, not directly applicable for obtaining a selection procedure for CAT. The main complication arises from the fact that, even with a given $\theta_0$, $K$ is a function on the parameter space whereas $I$ produces a single number. Replacing $\theta_0$ with the current estimator, the item information readily becomes an index. Hence, the next logical step is to use the KL information function to construct a summary quantity as an index.
New Item Selection Procedures for CAT
Information Index
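A simple way to construct a single index from $K$ is by taking the average over an appropriate interval of $\theta$. An average KL information index can be defined as
$$K_i(\hat{\theta}_n) = \int_{\hat{\theta}_n - \delta_n}^{\hat{\theta}_n + \delta_n} K_i(\theta \,\|\, \hat{\theta}_n)\, d\theta. \tag{17}$$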
Figure 2
KL Information Functions of Five Items at $\theta_0 = 1$
Here $\delta_n$ determines the size of the interval over which the average is computed.
The index given by Equation 17 is the area under the KL function from $\hat{\theta}_n - \delta_n$ to $\hat{\theta}_n + \delta_n$. The effect of the curvature at $\hat{\theta}_n$ is clear. For small $\delta_n$, it is essentially determined by the curvature of $K_i(\theta \,\|\, \hat{\theta}_n)$ at $\hat{\theta}_n$. It follows that the maximum area is equivalent to the maximum curvature and hence the maximum value of $I$. The effect of the tails is also clear. For large $\delta_n$, the area is also very much influenced by the tails of $K_i(\theta \,\|\, \hat{\theta}_n)$. In this respect, selection of an item based on the maximum area defined in Equation 17 reflects the idea of the global information approach.
An example showing the difference between the two item selection procedures at early stages is provided in Figure 3. Suppose both methods start with the same estimator, say $\hat{\theta} = 0$. Then, according to Figure 3b, the Fisher information method will clearly select Item 1 as the next item, because its information is larger at 0. However, the KL method (Figure 3a) will likely select Item 2 because the area under the KL information function of Item 2 becomes larger (if $\delta$ is not too small). For $I$, note that in this example both items reach their maximum ("informax") at 0 (-.05 for Item 2, actually). Without assuming informax, more complicated scenarios may arise (see Figure 2). Further research to gain insight into the general cases is certainly of interest.
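The contrast between the two criteria for the items of Figure 3 can be reproduced approximately with a short Python sketch. It approximates the area under the item KL function by a midpoint rule. The scaling constant D = 1.7, the number of integration steps, the two interval half-widths, and the function names are assumptions of this illustration, not values taken from the paper.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fisher_item_info(theta, a, b, c):
    """Fisher item information [P']^2 / (P * Q)."""
    p = p3pl(theta, a, b, c)
    logistic = (p - c) / (1.0 - c)
    dp = (1.0 - c) * D * a * logistic * (1.0 - logistic)
    return dp * dp / (p * (1.0 - p))


def kl_item_info(theta, theta0, a, b, c):
    """Item KL information K_i(theta || theta0)."""
    p0 = p3pl(theta0, a, b, c)
    p = p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))


def kl_index(theta_hat, delta, a, b, c, steps=2000):
    """Midpoint-rule area under K_i(theta || theta_hat) on theta_hat +/- delta."""
    width = 2.0 * delta / steps
    total = 0.0
    for k in range(steps):
        theta = theta_hat - delta + (k + 0.5) * width
        total += kl_item_info(theta, theta_hat, a, b, c)
    return total * width


item1 = (2.0, -0.1, 0.1)  # Item 1 of Figure 3
item2 = (1.5, 0.0, 0.0)   # Item 2 of Figure 3
print("Fisher at 0:", fisher_item_info(0.0, *item1), fisher_item_info(0.0, *item2))
for delta in (0.1, 3.0):  # a narrow and a wide interval (illustrative choices)
    print("delta", delta, kl_index(0.0, delta, *item1), kl_index(0.0, delta, *item2))
```

With the narrow interval the index ranks the items as Fisher information does, whereas with the wide interval the tails of the KL function dominate and the ranking reverses.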
Figure 3
Information Functions for Two Items (Item 1: a = 2.0, b = -.1, c = .1; Item 2: a = 1.5, b = 0.0, c = 0.0)
Implementation of the average KL information index requires specifying $\delta_n$. The preceding discussion indicates that in order to make efficient use of KL information in the context of CAT, it is reasonable to require that $\delta_n$ decrease to 0 as n approaches $\infty$. To determine how fast the $\delta_n$ should go to 0, recall that one of the concerns with Fisher item information is that $\hat{\theta}_n$ may deviate substantially from $\theta_0$. In selecting $\delta_n$, it is expected that the resulting interval $(\hat{\theta}_n - \delta_n, \hat{\theta}_n + \delta_n)$ will contain $\theta_0$. It follows from general asymptotic theory for ML estimators that $\hat{\theta}_n$ is asymptotically normal with mean $\theta_0$ and variance $1/I^{(n)}(\theta_0)$. This entails that confidence intervals for $\theta_0$ should be of the type
$$\left( \hat{\theta}_n - \frac{c}{\sqrt{I^{(n)}(\hat{\theta}_n)}},\; \hat{\theta}_n + \frac{c}{\sqrt{I^{(n)}(\hat{\theta}_n)}} \right),$$
with constant c selected according to a specified coverage probability. Because $I^{(n)}$ is of order n, it is concluded that a reasonable class for $\delta_n$ is
$$\delta_n = \frac{c}{\sqrt{n}}. \tag{18}$$
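Note that the integration in Equation 17 is with respect to the Lebesgue measure (Billingsley, 1986) on $(\hat{\theta}_n - \delta_n, \hat{\theta}_n + \delta_n)$. The density function (up to a normalizing constant) is uniform; that is,
$$f_n(\theta) = \begin{cases} 1, & \theta \in (\hat{\theta}_n - \delta_n,\, \hat{\theta}_n + \delta_n), \\ 0, & \text{otherwise.} \end{cases}$$
The Lebesgue measure was selected for convenience; other measures may also be considered.
In general, let $\mu_n$ be any probability measure on the parameter space. The associated KL index is defined as
$$K_i(\mu_n) = \int K_i(\theta \,\|\, \hat{\theta}_n)\, d\mu_n(\theta).$$
This index includes Equation 17 as a special case, with $\mu_n$ taken to be the Lebesgue measure inside the interval $(\hat{\theta}_n - \delta_n, \hat{\theta}_n + \delta_n)$ and the 0 measure outside the interval.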
Bayesian Information Index
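If a Bayesian approach is followed, then a Bayesian information index analogous to Equation 17 may be formed. Let $\mathbf{X}_n = (X_1, \ldots, X_n)$. Denote $p(\theta \mid \mathbf{X}_n)$ as the posterior density of the parameter, which will be denoted by $\Theta$ (capitalized to indicate that it is considered as a random variable here). Define the Bayesian index for the ith item by
$$K_i^{B}(\hat{\theta}_n) = \int K_i(\theta \,\|\, \hat{\theta}_n)\, p(\theta \mid \mathbf{X}_n)\, d\theta,$$
where the integration is over the $\theta$ range. In practice, it is not easy to evaluate $K_i^{B}(\hat{\theta}_n)$, due to the fact that it is usually prohibitively difficult to compute the posterior density, especially when n is not small. This is even more problematic in this situation because the CAT must update $p(\theta \mid \mathbf{X}_n)$ in real time. One way to overcome the difficulty is by approximating the posterior density. According to Chang & Stout (1993),
$$P\left\{ \frac{\Theta - \hat{\theta}_n}{\sigma_n} \leq t \,\Big|\, \mathbf{X}_n \right\} \rightarrow \Phi(t)$$
as n approaches $\infty$, where $\sigma_n^2 = 1/I^{(n)}(\hat{\theta}_n)$, and $\Phi$ ($\phi$) is the standard normal distribution (density) function. Consequently, $p(\theta \mid \mathbf{X}_n)$ is approximated by $\phi[(\theta - \hat{\theta}_n)/\sigma_n]$, and an approximation to the Bayesian index $K_i^{B}(\hat{\theta}_n)$ can be written as
$$K_i^{B}(\hat{\theta}_n) \approx \int K_i(\theta \,\|\, \hat{\theta}_n)\, \phi\!\left( \frac{\theta - \hat{\theta}_n}{\sigma_n} \right) d\theta.$$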
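A numerical version of the normal-approximation Bayesian index might look like the following Python sketch. It weights the item KL function by a normal density centered at the current estimate, with standard deviation equal to the reciprocal square root of the test information at that estimate. Several choices here are assumptions rather than details from the paper: the scaling constant D = 1.7, the use of the normalized density (dividing by the standard deviation, which rescales every item's index by the same factor and so leaves the ranking unchanged), the truncation of the integral at six standard deviations, and all function names.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def kl_item_info(theta, theta0, a, b, c):
    """Item KL information K_i(theta || theta0)."""
    p0 = p3pl(theta0, a, b, c)
    p = p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bayes_kl_index(theta_hat, sigma_n, a, b, c, steps=2000, span=6.0):
    """Normal-weighted KL index, integrated over theta_hat +/- span * sigma_n."""
    low = theta_hat - span * sigma_n
    width = 2.0 * span * sigma_n / steps
    total = 0.0
    for k in range(steps):
        theta = low + (k + 0.5) * width
        weight = normal_pdf((theta - theta_hat) / sigma_n) / sigma_n
        total += kl_item_info(theta, theta_hat, a, b, c) * weight
    return total * width


item = (1.5, 0.0, 0.0)  # Item 2 of Figure 3
for sigma in (0.5, 0.1):  # illustrative posterior standard deviations
    print(sigma, bayes_kl_index(0.0, sigma, *item))
```

As the posterior standard deviation shrinks, the weighted index behaves like one half of the Fisher information times the posterior variance, so the criterion moves toward the local one late in the test.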
Simulation Studies
Two simulation studies were conducted to compare the global information method with the Fisher item
information method. All data were generated from the 3PLM. In Study 1, the values of the item parameters
were simulated from prespecified uniform distributions; in Study 2, these values were taken from a calibra-
tion of 254 items from the 1992 National Assessment of Educational Progress (NAEP) reading assessment
(Johnson & Carlson, 1994).
Study 1
Item pool structure. There were 800 items in the pool. The values of the item parameters were generated from uniform distributions U(.5, 2.5), U(-3.6, 3.6), and U(0.0, .25) for $a_i$, $b_i$, and $c_i$, respectively. These distributions cover wide ranges of reasonable item parameters.
Test length and termination rule. Maximum test length was set at 14 items for all cases; thus, each test
was terminated after the 14th item was administered. The relatively short test length was selected because
interest was mainly in the performance of the item selection procedure during the early stage of CATs.
Simulation procedure. Eight different values of $\theta_0$ were used in the simulation: $\theta_0$ = -3.0, -2.0, -1.5, -1.0, 0.0, 1.0, 2.0, and 3.0. For each value, 1,000 replications were used. The resulting ML estimators of $\theta_0$ were denoted by $\hat{\theta}_{i,F}$ and $\hat{\theta}_{i,K}$, where subscript i indicates that the ML estimator was calculated from $(x_1, \ldots, x_i)$, and the subscripts F and K indicate use of the Fisher information criterion and the KL information criterion, respectively.
Initialization. For both methods, the initial item was selected with parameters $(a_1, b_1, c_1) = (a_0, b_0, c_0)$. If the outcome of the first item, $X_1$, was 1, then the next $k_0$ items were selected with increasing difficulty parameters $b_1 \leq b_2 \leq \ldots \leq b_{k_0}$ ($b_{i+1} = b_i + 2$, $i < k_0$), where $k_0 = \min\{i: X_i = 0\}$ was the first time a 0 occurred. If the first response was a 0, then $b_1 \geq b_2 \geq \ldots \geq b_{k_0}$ ($b_{i+1} = b_i - 2$, $i \leq k_0$) was selected, where $k_0$ was the first time a 1 occurred. In Study 1, $a_0 = 1$, $b_0 = -6$, and $c_0 = .2$. The as and cs remained unchanged during the initialization. Note that instead of $b_0 = 0$, the starting value for the b parameter was $b_0 = -6$, because $\theta_0 = 0$ was included in the simulation study. As a result, the CAT started with a very easy item for all eight conditions.
$\theta$ estimation. As soon as the score sequence contained both a 0 and a 1, the ML estimator of $\theta$ was
calculated. For both Fisher information-based and KL information-based selection methods, examinees' $\theta$s
were estimated recursively using ML estimation (see Equations 5 or 6). More specifically, for each $i$, if the
components in $(x_1, \ldots, x_i)$ were not all the same, the ML estimator $\hat{\theta}_i$ was calculated according to a
numerical algorithm that mimicked a subroutine in the LOGIST program (Wingersky, Barton, & Lord, 1982).
The ML estimation algorithm used standard Newton-Raphson iterations (Cheney & Kincaid, 1985). When
the 3PLM is used, the likelihood equation may have multiple roots (Samejima, 1973). Thus, instability may be
encountered at early stages of the estimation. However, no technique for searching for multiple roots was used
here. Further discussion of improving the stability of the ML estimation calculation can be found in Chang &
Ying (1996a).
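A Newton-Raphson update for the 3PLM can be sketched as follows. This is not the LOGIST subroutine: it assumes the logistic form $P(\theta) = c + (1 - c)/(1 + e^{-a(\theta - b)})$ with no scaling constant, and the names `p3pl` and `ml_theta`, the starting value, the tolerance, and the divergence bound are all illustrative choices. Like the procedure described above, it does no search for multiple roots; it simply gives up (returns `None`) when the log-likelihood is not locally concave or the iterate runs away.

```python
import math

def p3pl(theta, a, b, c):
    """3PLM response probability (assumed form, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def ml_theta(items, responses, start=0.0, tol=1e-8, max_iter=50, bound=10.0):
    """Newton-Raphson ML estimate of theta, or None if none is found.

    items: list of (a, b, c) tuples; responses: list of 0/1 scores.
    """
    if len(set(responses)) < 2:
        return None  # all 0s or all 1s: no finite ML estimate
    theta = start
    for _ in range(max_iter):
        score, hess = 0.0, 0.0
        for (a, b, c), x in zip(items, responses):
            p = p3pl(theta, a, b, c)
            dp = a * (p - c) * (1.0 - p) / (1.0 - c)  # dP/dtheta
            k = a / (1.0 - c)
            score += k * (x - p) * (p - c) / p        # first derivative of log-likelihood
            hess += k * dp * ((x - p) * c / p ** 2 - (p - c) / p)  # second derivative
        if hess >= 0.0:
            return None  # log-likelihood not locally concave here
        step = score / hess
        theta -= step
        if abs(theta) > bound:
            return None  # iterate ran away; treat as no solution
        if abs(step) < tol:
            return theta
    return None
```

For five hypothetical items with $a = 1$, $c = .2$, and $b$ = -2, -1, 0, 1, 2, the response pattern 1, 1, 1, 0, 0 yields a finite estimate, whereas an all-correct pattern returns `None`.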
Item selection. Given an ML estimate of $\theta$, for the Fisher information-based method the $(i + 1)$th item was
selected such that $I_{i+1}(\hat{\theta}_i)$ had the maximum value among all items in the pool; for the KL information-based
method, the $(i + 1)$th item was selected such that $K_{i+1}(\hat{\theta}_i)$ had the maximum value. Each time an item was used,
it was then deleted from the item pool. For Study 1, $\delta_n = 3/n^{1/2}$, in accordance with Equation 18 with $c = 3$.
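The two selection rules can be compared side by side in a short sketch. It rests on several assumptions not confirmed by this section: the 3PLM is taken in logistic form without a scaling constant, the KL index is taken to be the integral of $K(\theta \| \hat{\theta})$ over $\hat{\theta} \pm \delta$ (the form suggested by the references to Equations 17 and 18), $n$ is taken to be the number of items already administered, and the integral is approximated with a midpoint rule. All function names are hypothetical.

```python
import math

def p3pl(theta, a, b, c):
    """3PLM response probability (assumed form, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Item Fisher information for the 3PLM."""
    p = p3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def kl(theta, theta_hat, a, b, c):
    """KL information between response distributions at theta_hat and theta."""
    p0, p = p3pl(theta_hat, a, b, c), p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

def kl_index(theta_hat, delta, a, b, c, grid=200):
    """Midpoint-rule integral of KL over [theta_hat - delta, theta_hat + delta]."""
    h = 2.0 * delta / grid
    lo = theta_hat - delta
    return h * sum(kl(lo + (j + 0.5) * h, theta_hat, a, b, c) for j in range(grid))

def select_item(pool, theta_hat, n_administered, criterion="kl", c_const=3.0):
    """Remove and return the best (a, b, c) item from the pool."""
    delta = c_const / math.sqrt(n_administered)  # bandwidth c / sqrt(n)
    if criterion == "fisher":
        value = lambda item: fisher_info(theta_hat, *item)
    else:
        value = lambda item: kl_index(theta_hat, delta, *item)
    best = max(range(len(pool)), key=lambda j: value(pool[j]))
    return pool.pop(best)  # a used item is deleted from the pool
```

Because `select_item` pops the chosen item, repeated calls never administer the same item twice, mirroring the deletion rule described above.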
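Evaluation criteria. $\hat{\theta}_{i,F}$ and $\hat{\theta}_{i,K}$ were calculated and their bias functions were calculated by

$$\mathrm{Bias}_{i,F} = \frac{1}{1000} \sum_{r=1}^{1000} \left( \hat{\theta}_{i,F}^{(r)} - \theta_0 \right)$$

and

$$\mathrm{Bias}_{i,K} = \frac{1}{1000} \sum_{r=1}^{1000} \left( \hat{\theta}_{i,K}^{(r)} - \theta_0 \right),$$

where $r$ indexes the replications. The mean squared errors (MSEs) also were calculated for every $i$, $i = 5, \ldots, 14$.
The MSEs were defined by

$$\mathrm{MSE}_{i,F} = \frac{1}{1000} \sum_{r=1}^{1000} \left( \hat{\theta}_{i,F}^{(r)} - \theta_0 \right)^2$$

and

$$\mathrm{MSE}_{i,K} = \frac{1}{1000} \sum_{r=1}^{1000} \left( \hat{\theta}_{i,K}^{(r)} - \theta_0 \right)^2.$$

Note that if the test length is short, ML estimation might not provide a solution. For both MSE and bias, only
those $\hat{\theta}_i$s with $i > 5$ were used.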
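As a small sketch of how these two criteria could be computed from simulated estimates at a fixed stage $i$, the helper below averages the error and squared error across replications. The name `bias_and_mse` is hypothetical, and skipping replications in which ML estimation produced no solution (represented here as `None`) is an assumption of this sketch rather than a documented rule.

```python
def bias_and_mse(estimates, theta0):
    """Average error and average squared error of theta estimates.

    estimates: one estimate per replication; None marks a replication
    in which ML estimation did not provide a solution (assumption).
    """
    usable = [e for e in estimates if e is not None]
    if not usable:
        raise ValueError("no usable estimates")
    n = len(usable)
    bias = sum(e - theta0 for e in usable) / n
    mse = sum((e - theta0) ** 2 for e in usable) / n
    return bias, mse
```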
Results. Figures 4 and 5 summarize the simulation results. Under several of the eight simulation conditions,
both average bias (Figure 4) and MSE (Figure 5) were uniformly smaller for item selection using KL
information than using Fisher information. For example, in three of the eight cases in Figure 4 the KL method
resulted in substantial bias reduction ($\theta_0$ = -3, -2, -1.5), while in the remaining cases the performance of KL
was either slightly better than or similar to that of Fisher. Improvements in terms of MSEs were either more
significant or similar, as shown in Figure 5.
Study 2
The methods used in Study 2 were essentially the same as those in Study 1. The differences were (1) the $\theta_0$
values were -2.0, -1.0, 1.0, and 2.0; (2) the starting value for the $b$ parameter was $b_0 = 0$; (3) the test length was
set to $n = 40$; and (4) the item parameters were taken from the Reading Assessment of the 1992 NAEP main
assessment sample (Johnson & Carlson, 1994). Of the 254 items, 122 had parameter estimates from the
two-parameter logistic model, and 132 had parameter estimates from the 3PLM. These parameters were not
uniformly distributed, as can be seen from the histograms of the parameter distributions (Figure 6).
Figures 7 and 8 summarize the results of Study 2. In two of the four cases summarized in Figure 7, KL gave
better bias reduction ($\theta_0$ = -2, -1). There was essentially no difference between the two methods for the
Figure 6
Histograms of the Item Parameter Distributions in Study 2
remaining two cases. Figure 8 indicates that the reduction of MSEs was significant only for one of the four
cases ($\theta_0$ = -2). For the remaining three cases, the reduction was pronounced only when $n$ was small (i.e., in
the early stages of a CAT). For $n > 30$, there was essentially no difference. This may be expected because, for
large $n$, the performance of KL should be equivalent to that of Fisher.
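The large-$n$ equivalence can be made explicit with a second-order Taylor expansion. The sketch below assumes that the KL index for item $i$ is the integral of $K_i(\theta \| \hat{\theta}_n)$ over $\hat{\theta}_n \pm \delta_n$ and that $\delta_n \to 0$ as $n$ grows; the remainder terms are stated loosely and presume a smooth item response function.

```latex
\begin{align*}
K_i(\theta \,\|\, \hat{\theta}_n)
  &= \tfrac{1}{2}\, I_i(\hat{\theta}_n)\,(\theta - \hat{\theta}_n)^2
     + O\!\left(|\theta - \hat{\theta}_n|^3\right), \\
\int_{\hat{\theta}_n - \delta_n}^{\hat{\theta}_n + \delta_n}
  K_i(\theta \,\|\, \hat{\theta}_n)\, d\theta
  &= \frac{\delta_n^3}{3}\, I_i(\hat{\theta}_n) + O\!\left(\delta_n^4\right).
\end{align*}
```

The leading factor $\delta_n^3/3$ does not depend on the item, so for small $\delta_n$ ranking items by the KL index is approximately the same as ranking them by Fisher information at $\hat{\theta}_n$.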
Discussion
The results of the pilot simulation studies indicated that the proposed global information index is a prom-
ising alternative to the Fisher information-based item selection methods. The performance of the KL approach
was slightly better than that of Fisher. In many cases, the KL approach outperformed the Fisher method in
terms of bias reduction and smaller mean squared errors. The improvements were rather noticeable at early
stages of the simulated tests.
Many issues remain to be investigated. Both the global information and Fisher information selection
procedures lack theoretical justification. The main difficulty results from the dependent structure arising from
sequential item selection. Recently, however, Chang & Ying (1996a, 1996b) demonstrated that recursively
calculated ML estimators are consistent and asymptotically normal under suitable regularity conditions.
To explore the full capacity of KL information, more extensive simulation studies are needed. The choice for
the bandwidth $\delta_n$ deserves special attention. Moreover, because $K$ is not symmetric about $\theta_0$, it is reasonable to
consider nonsymmetric averaging in Equation 17 (i.e., integrate from $\hat{\theta}_n - \delta_{n,1}$ to $\hat{\theta}_n + \delta_{n,2}$ with
$\delta_{n,1} \ne \delta_{n,2}$).
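The asymmetry, and one way to average over unequal bandwidths, can be illustrated numerically. This sketch assumes the logistic 3PLM without a scaling constant and the usual KL information for a dichotomous item; the function names, the midpoint-rule integration, and the example item are illustrative only.

```python
import math

def p3pl(theta, a, b, c):
    """3PLM response probability (assumed form, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def kl(theta, theta_hat, a, b, c):
    """KL information between response distributions at theta_hat and theta."""
    p0, p = p3pl(theta_hat, a, b, c), p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

def kl_index_asym(theta_hat, delta_lo, delta_hi, a, b, c, grid=400):
    """Integral of KL over [theta_hat - delta_lo, theta_hat + delta_hi]."""
    lo = theta_hat - delta_lo
    h = (delta_lo + delta_hi) / grid  # midpoint-rule cell width
    return h * sum(kl(lo + (j + 0.5) * h, theta_hat, a, b, c) for j in range(grid))

# Hypothetical item: the same distance above and below theta_hat = 0
# carries different amounts of KL information.
item = (1.0, 0.0, 0.2)
above = kl(1.0, 0.0, *item)
below = kl(-1.0, 0.0, *item)
```

For this item the information one unit above the estimate exceeds the information one unit below it, which is the kind of imbalance that unequal bandwidths would be chosen to reflect.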
Both local and global information can be projected together into a three-dimensional space. Because the
true parameter $\theta_0$ is unknown, it may be more informative to consider KL information as a function of two
variables. In other words, consider $K(\theta \| \theta_0)$ as a function of both $\theta$ and $\theta_0$. This effectively creates a surface
in three-dimensional Euclidean space, where the third axis is the value of $K(\theta \| \theta_0)$. The geometry of this is
as follows: KL information functions for different $\theta$ levels are the slices of the information surface. For
Figure 7
Average Biases From Study 2
example, functions $K(\theta \| \theta_0)$ and $K(\theta \| \theta_0')$ with $\theta_0$ and $\theta_0'$ fixed are the KL information functions for
$\theta_0$ and $\theta_0'$, respectively. Note that the curvature at $\theta = \theta_0$ of the function formed by intersecting the surface
with the vertical plane at fixed $\theta_0$ is the Fisher information at $\theta_0$ (see Figure 1). In this connection, another
new index can be defined that represents the volume under the information surface:
where $\eta$ is a quantity similar to $\delta_n$ but may be independent of $\delta_n$. Note that the uniform density can be
replaced by a general measure.
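The link between the KL slice and Fisher information can be checked numerically: the second derivative of $K(\theta \| \theta_0)$ with respect to $\theta$, evaluated at $\theta = \theta_0$, should match the item's Fisher information at $\theta_0$. The sketch below assumes the logistic 3PLM without a scaling constant and uses a central finite difference; the function names and the example item parameters are hypothetical.

```python
import math

def p3pl(theta, a, b, c):
    """3PLM response probability (assumed form, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Item Fisher information for the 3PLM."""
    p = p3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def kl(theta, theta0, a, b, c):
    """KL information K(theta || theta0) for a dichotomous item."""
    p0, p = p3pl(theta0, a, b, c), p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

def kl_curvature(theta0, a, b, c, h=1e-3):
    """Central-difference second derivative of the KL slice at theta = theta0."""
    centre = kl(theta0, theta0, a, b, c)  # zero by construction
    return (kl(theta0 + h, theta0, a, b, c)
            + kl(theta0 - h, theta0, a, b, c)
            - 2.0 * centre) / h ** 2

item = (1.2, 0.3, 0.2)  # hypothetical (a, b, c)
gap = abs(kl_curvature(0.5, *item) - fisher_info(0.5, *item))
```

The quantity `gap` is the absolute difference between the two values and is small, limited by the finite-difference step.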
Finally, this conceptualization of global information may change the traditional view of low discriminating
items. Figure 9 indicates that if there is little knowledge about the location of $\theta_0$, then an item with a low $a$
parameter (Figure 9b) may be a better choice for the examinee than an item with a high $a$ parameter (Figure 9a).
Note that for any $\theta_0 \ne 0$, the item in Figure 9b tends to contain a certain amount of global information and, thus,
is more robust. In contrast, the item in Figure 9a has adequate information content only in part of the region. It
delivers almost no information for approximately 50% of the entire region. However, if the specific range
around $\theta_0$ is known, say around 0, then it will be more efficient to select the item in Figure 9a for the examinee.
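The trade-off between low and high discrimination can be illustrated with two hypothetical items that share $b = 0$ and $c = 0$ and differ only in $a$ (0.5 versus 2.5). These are not the items plotted in Figure 9; the parameter values, the function names, and the logistic form without a scaling constant are assumptions made for the sketch.

```python
import math

def p3pl(theta, a, b, c):
    """3PLM response probability (assumed form, no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def kl(theta, theta0, a, b, c):
    """KL information K(theta || theta0) for a dichotomous item."""
    p0, p = p3pl(theta0, a, b, c), p3pl(theta, a, b, c)
    return p0 * math.log(p0 / p) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p))

low_a = (0.5, 0.0, 0.0)   # hypothetical low-discrimination item
high_a = (2.5, 0.0, 0.0)  # hypothetical high-discrimination item

# True theta near the item difficulty: the high-a item separates
# theta0 = 0 from theta = 0.5 much better than the low-a item.
near_low = kl(0.5, 0.0, *low_a)
near_high = kl(0.5, 0.0, *high_a)

# True theta far from the item difficulty: the high-a item is nearly
# uninformative about theta0 = 3 versus theta = 2, the low-a item less so.
far_low = kl(2.0, 3.0, *low_a)
far_high = kl(2.0, 3.0, *high_a)
```

Near the difficulty the high-$a$ item carries more KL information, while far from it the low-$a$ item carries more, which matches the robustness argument made above.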
Figure 8
Mean Squared Errors From Study 2
References
Billingsley, P. (1986). Probability and measure. New
York: Wiley.
Chang, H.-H., & Stout, W. F. (1993). The asymptotic
posterior normality of the latent trait in an IRT model.
Psychometrika, 58, 37-52.
Chang, H.-H., & Ying, Z. (1996a). Nonlinear sequential
designs for logistic item response theory models with
applications to computerized adaptive tests. Manuscript
submitted for publication.
Chang, H.-H., & Ying, Z. (1996b, June). Building a sta-
tistical foundation for computerized adaptive testing.
Paper presented at the annual meeting of the Psycho-
metric Society, Banff, Alberta, Canada.
Cheney, W., & Kincaid, D. (1985). Numerical mathemat-
ics and computing. Monterey CA: Brooks/Cole.
Cover, T. M., & Thomas, J. A. (1991). Elements of in-
formation theory. New York: Wiley.
Davey, T., & Parshall, C. G. (1995, April). New algo-
rithms for item selection and exposure control with
computerized adaptive testing. Paper presented at the
annual meeting of the American Educational Research
Association, San Francisco.
Fan, M., & Hsu, Y. (1995, June). The effect of ability esti-
mation for polytomous CAT in different item selection
procedures. Paper presented at the annual meeting of
the Psychometric Society, Minneapolis MN.
Hambleton, R. K., & Swaminathan, H. (1985). Item re-
sponse theory: Principles and applications. Boston:
Kluwer Nijhoff.
Johnson, E. G., & Carlson, J. E. (1994). The NAEP 1992
technical report. Washington DC: National Center for
Education Statistics.
Kullback, S. (1959). Information theory and statistics.
New York: Wiley.
Le Cam, L., & Yang, G. L. (1990). Asymptotics in statistics:
Some basic concepts. New York: Springer-Verlag.
Lehmann, E. L. (1983). Theory of point estimation. New
York: Wiley.
Lehmann, E. L. (1986). Testing statistical hypotheses.
New York: Wiley.
Lord, F. M. (1971). Robbins-Monro procedures for tailored
testing. Educational and Psychological Measurement,
31, 3-31.
Lord, F. M. (1980). Applications of item response theory
to practical testing problems. Hillsdale NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories
of mental test scores. Reading MA: Addison Wesley.
Neyman, J., & Pearson, E. S. (1936). Contributions to
the theory of testing statistical hypotheses. I. Unbi-
ased critical regions of type A and type A1. Statisti-
cal Research Memorandum, 1, 1-37.
Owen, R. J. (1975). A Bayesian sequential procedure
for quantal response in the context of adaptive men-
tal testing. Journal of the American Statistical Asso-
ciation, 70, 351-356.
Samejima, F. (1973). A comment on Birnbaum’s three-
parameter logistic model in the latent trait theory.
Psychometrika, 38, 221-233.
Serfling, R. J. (1980). Approximation theorems of math-
ematical statistics. New York: Wiley.
Stocking, M. L. (1993, February). Modern computerized
adaptive testing. Paper presented at the Joint Statistics
and Psychometrics Seminar, Princeton NJ.
van der Linden, W. J. (1995, June). Bayesian item selec-
tion in adaptive testing. Paper presented at the annual
meeting of the Psychometric Society, Minneapolis MN.
Veerkamp, W. J., & Berger, M. P. F. (1994). Some new
item selection criteria for adaptive testing (Research
Rep. 94-6). Enschede, The Netherlands: University
of Twente, Department of Educational Measurement
and Data Analysis.
Wainer, H. (1990). Computerized adaptive testing: A
primer. Hillsdale NJ: Erlbaum.
Weiss, D. J. (1976). Adaptive testing research in Minnesota:
Overview, recent results, and future directions. In
C. L. Clark (Ed.), Proceedings of the first conference
on computerized adaptive testing (pp. 24-35). Washington
DC: United States Civil Service Commission.
Weiss, D. J. (1982). Improving measurement quality and
efficiency with adaptive testing. Applied Psychologi-
cal Measurement, 6, 473-492.
Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982).
LOGIST user’s guide [Computer program manual].
Princeton NJ: Educational Testing Service.
Acknowledgements
This research was partially supported by Educational Testing
Service Allocation Project No. 7942 7, and the National
Assessment of Educational Progress (Grant No.
R999J40001 and CFDA No. 84.999J) as administered by
the Office of Educational Research and Improvement, U. S.
Department of Education, by the National Science Foundation,
and by the National Security Agency. The authors
thank Erich Lehmann, Barbara Dodd, Bert Green, Xuming
He, Frank Jenkins, Charles Lewis, Spence Swinton,
Howard Wainer, Bo Wang, and Ming-Mei Wang for many
helpful comments and discussions. They particularly thank
the Editor and two anonymous reviewers for their suggestions,
which led to numerous improvements.
Author’s Address
Send requests for reprints or further information to Hua-
Hua Chang, Mail Stop 02-T, Educational Testing Ser-
vice, Princeton NJ 08541, U.S.A., or to Zhiliang Ying,
Department of Statistics, Rutgers University, Hill Cen-
ter, Busch Campus, New Brunswick NJ 08903, U. S. A.
Email: hchang@ets.org or zying@stat.rutgers.edu.