Item Analysis
Item Analysis allows us to observe the characteristics of a particular question (item). It can be used to ensure that questions are of an appropriate standard and to select items for inclusion in a test.

Item Analysis describes the statistical analyses which allow measurement of the effectiveness of individual test items. An understanding of the factors which govern effectiveness (and a means of measuring them) can enable us to create more effective test questions and also regulate and standardise existing tests.

There are three main types of Item Analysis: Item Response Theory, Rasch Measurement and Classical Test Theory. Although Classical Test Theory and Rasch Measurement will be discussed, this document will concentrate primarily on Item Response Theory.

The Models
Classical Test Theory
Classical Test Theory (traditionally the main method used in the United Kingdom) utilises two main statistics - Facility and Discrimination.

  • Facility is essentially a measure of the difficulty of an item, arrived at by dividing the mean mark obtained by a sample of candidates by the maximum mark available. A higher facility therefore indicates an easier item. As a whole, a test should aim to have an overall facility of around 0.5, although it is acceptable for individual items to have higher or lower facility (ranging from 0.2 to 0.8).
  • Discrimination measures how performance on one item correlates with performance in the test as a whole. There should always be some correlation between item and test performance, and discrimination is expected to fall in a range between 0.2 and 1.0.
The main problem with Classical Test Theory is that the conclusions drawn depend very much on the sample of candidates used to collect the information: item statistics and candidate ability are inter-dependent.
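
The two statistics described above can be sketched in a few lines of code. This is a minimal illustration rather than a definitive implementation: it assumes discrimination is taken as the Pearson correlation between item marks and whole-test marks (one common choice), and the marks for the six candidates are invented for the example.

```python
from statistics import mean, pstdev

def facility(item_marks, max_mark):
    """Mean mark obtained on the item divided by the maximum mark available."""
    return mean(item_marks) / max_mark

def discrimination(item_marks, total_marks):
    """Pearson correlation between item marks and whole-test marks.

    Assumption: correlation with the total score is used as the
    discrimination index; other definitions exist.
    """
    item_mean, total_mean = mean(item_marks), mean(total_marks)
    covariance = mean(
        (i - item_mean) * (t - total_mean)
        for i, t in zip(item_marks, total_marks)
    )
    return covariance / (pstdev(item_marks) * pstdev(total_marks))

# Hypothetical data: six candidates, a one-mark item, and their test totals.
item_marks = [1, 0, 1, 1, 0, 1]
total_marks = [18, 7, 15, 12, 9, 17]

print(f"facility       = {facility(item_marks, 1):.2f}")
print(f"discrimination = {discrimination(item_marks, total_marks):.2f}")
```

Running the same functions on a different sample of candidates will generally give different values for the same item, which is the sample dependence noted above.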

Item Response Theory
Item Response Theory (IRT) assumes that there is a correlation between the score gained by a candidate for one item/test (measurable) and their overall ability on the latent trait which underlies test performance (which we want to discover). Critically, the 'characteristics' of an item are said to be independent of the ability of the candidates who were sampled.

Item Response Theory comes in three forms: IRT1, IRT2, and IRT3 reflecting the number of parameters considered in each case.

  • For IRT1, only the difficulty of an item is considered,
    (difficulty is the level of ability required to be more likely to correctly answer the question than answer it wrongly).
  • For IRT2, difficulty and discrimination are considered,
    (discrimination is how good the question is at separating out candidates of similar abilities).
  • For IRT3, difficulty, discrimination and chance are considered,
    (chance is the random factor which enhances a candidate's probability of success through guessing).
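
The three models are usually written as special cases of a single logistic function. The symbols below are an assumption (they follow common usage and do not appear on the original page): θ is the candidate's ability, b the difficulty, a the discrimination and c the chance parameter.

```latex
P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}
% IRT3: a, b and c are all free
% IRT2: c = 0
% IRT1: c = 0 and a = 1 (or a fixed to one common value for every item)
```

With c = 0 the probability of a correct answer is exactly 0.5 when θ = b, which matches the description of difficulty given above. When c is greater than 0, the probability at θ = b is (1 + c)/2 instead.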

IRT can be used to create a unique plot for each item (the Item Characteristic Curve - ICC). The ICC is a plot of Probability that the Item will be answered correctly against Ability. The shape of the ICC reflects the influence of the three factors:

  • Increasing the difficulty of an item causes the curve to shift right - as candidates need to be more able to have the same chance of passing.
  • Increasing the discrimination of an item causes the gradient of the curve to increase. Candidates below a given ability are less likely to answer correctly, whilst candidates above a given ability are more likely to answer correctly.
  • Increasing the chance raises the baseline of the curve.
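
The three effects listed above can be checked numerically. The sketch below assumes the commonly used three-parameter logistic form for the ICC; the function name `icc` and the parameter values are illustrative choices, not taken from the original page.

```python
import math

def icc(ability, difficulty, discrimination=1.0, chance=0.0):
    """Probability of a correct answer (assumed three-parameter logistic form).

    With the defaults this reduces to the one-parameter (IRT1) curve.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return chance + (1.0 - chance) * logistic

abilities = [-3, -2, -1, 0, 1, 2, 3]

# Hypothetical items, each changing one parameter relative to the reference.
items = {
    "reference": dict(difficulty=0.0),
    "harder": dict(difficulty=1.0),
    "more discriminating": dict(difficulty=0.0, discrimination=2.0),
    "with guessing": dict(difficulty=0.0, chance=0.25),
}

print(f"{'ability':>20}: " + " ".join(f"{a:5d}" for a in abilities))
for name, params in items.items():
    row = " ".join(f"{icc(a, **params):5.2f}" for a in abilities)
    print(f"{name:>20}: {row}")
```

Each printed row is one curve sampled at seven ability values: the "harder" row is the reference row shifted right, the "more discriminating" row rises more steeply around the difficulty, and the "with guessing" row never falls below 0.25.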

This simple simulation allows the user to investigate the factors governing the shape of the Item Characteristic Curve. All three well known IRT models are represented (referred to as IRT1, IRT2 and IRT3) and Item Characteristic Curves can be super-imposed on one another to see how they relate.

Click to View the Simulation

Of course, when you run a test for the first time you don't know the ICC of an item, because you don't know its difficulty (or its discrimination). Instead, you estimate the parameters (using parameter estimation techniques) to find values which fit the data you observed.
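
As a rough illustration of what parameter estimation involves, the sketch below fits the difficulty of a single item under the one-parameter model by maximum likelihood, using Newton-Raphson iteration. It makes a strong simplifying assumption: the candidates' abilities are treated as already known, whereas in practice abilities and item parameters are estimated together. The function names and the data are hypothetical.

```python
import math

def p_correct(ability, difficulty):
    """One-parameter logistic probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_difficulty(abilities, responses, start=0.0, tol=1e-8, max_iter=50):
    """Maximum-likelihood difficulty for one item, given known abilities.

    Assumption: abilities are known in advance. The responses must contain
    both right and wrong answers, otherwise no finite estimate exists.
    """
    b = start
    for _ in range(max_iter):
        probs = [p_correct(theta, b) for theta in abilities]
        # Derivative of the log-likelihood with respect to b.
        gradient = sum(p - x for p, x in zip(probs, responses))
        # Negative second derivative (the information about b).
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        b += step
        if abs(step) < tol:
            break
    return b

# Hypothetical data: nine candidates of known ability, 1 = correct answer.
abilities = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b_hat = estimate_difficulty(abilities, responses)
print(f"estimated difficulty = {b_hat:.3f}")
```

At the estimate, the number of correct answers predicted by the model equals the number actually observed, which is one way of reading "values which fit the data".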

Using IRT models allows Items to be characterised and ranked by their difficulty and this can be exploited when generating Item Banks of equivalent questions. It is important to remember though, that in IRT2 and IRT3, question difficulty rankings may vary over the ability range.
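
The point about rankings varying over the ability range can be seen with two items whose curves cross. The sketch below assumes the two-parameter logistic form, and the two items and their parameter values are invented for illustration.

```python
import math

def icc2(ability, difficulty, discrimination):
    """Probability of a correct answer (assumed two-parameter logistic form)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical items: A has a flat curve, B a steep one.
item_a = dict(difficulty=0.0, discrimination=0.5)
item_b = dict(difficulty=0.5, discrimination=2.0)

for ability in (-2.0, 0.0, 3.0):
    p_a = icc2(ability, **item_a)
    p_b = icc2(ability, **item_b)
    # The item with the lower probability of success is the harder one
    # for a candidate of this ability.
    harder = "A" if p_a < p_b else "B"
    print(f"ability {ability:+.1f}: P(A) = {p_a:.3f}, P(B) = {p_b:.3f}, harder item: {harder}")
```

For weaker candidates item B is the harder of the two, but for strong candidates item A is, so a single ranking by difficulty does not hold across the whole ability range. In IRT1 all curves have the same shape and never cross, so this cannot happen.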

Rasch Measurement
Rasch measurement is very similar to IRT1 - in that it considers only one parameter (difficulty) and the ICC is calculated in the same way. When it comes to utilising these theories to categorise items however, there is a significant difference. If you have a set of data, and analyse it with IRT1, then you arrive at an ICC that fits the data observed. If you use Rasch measurement, extreme data (e.g. questions which are consistently well or poorly answered) is discarded and the model is fitted to the remaining data.
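
The screening step described above might look something like the following sketch. It is only an illustration under a stated assumption: "extreme" is taken to mean items that every candidate answered correctly or every candidate answered wrongly. The function name and the response matrix are hypothetical, and real Rasch software may apply different rules.

```python
def non_extreme_items(response_matrix):
    """Return the indices of items with at least one right and one wrong answer.

    Assumption: an item is "extreme" when all candidates got it right
    or all candidates got it wrong.
    """
    n_candidates = len(response_matrix)
    n_items = len(response_matrix[0])
    kept = []
    for j in range(n_items):
        n_correct = sum(row[j] for row in response_matrix)
        if 0 < n_correct < n_candidates:
            kept.append(j)
    return kept

# Hypothetical responses: rows are candidates, columns are items, 1 = correct.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

kept = non_extreme_items(responses)
print("items kept for model fitting:", kept)
```

Here the first item (answered correctly by everyone) and the last item (answered correctly by no one) are set aside, and only the remaining items would go forward to model fitting.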

Further Resources
Item Analysis is an enormous field, and is particularly popular in the United States where much of the research has been conducted.

  • CAA Centre Bluepapers
    The CAACentre TLTP3 project published two documents on issues relating to Item Analysis, both by Mhairi McAlpine of the University of Glasgow. These papers cover the areas of "Methods of Item Analysis" (Bluepaper 2) and "Item Banking" (Bluepaper 3) and provide an ideal introduction to this extensive subject.
  • Institute of Objective Measurement
    The web site of the Institute of Objective Measurement (an American organisation) is full of useful resources relating to Item Analysis.
  • ERIC Clearinghouse on Assessment and Evaluation
    Ericae is the ERIC Clearinghouse on Assessment and Evaluation. (ERIC is the Educational Resources Information Center, an American resource set up to provide easy access to education research and literature.) Like the Institute of Objective Measurement site, this site provides a vast resource of useful papers, articles and links.
  • Item Response Theory, Frank Baker
    One of the most useful resources on the Ericae web site is the online book "The Basics of Item Response Theory" by Frank Baker (2001).
  • ETS: The Educational Testing Service
    ETS, the Educational Testing Service, is a private testing and measurement organisation based in the United States. It has a well-respected research group.
  • The IRT Modelling Lab, University of Illinois
    This site provides another good general Introduction to IRT, and includes a tutorial on IRT explaining the underlying theory as well as how to utilise it.

Where Next?
SCROLLA ran a Symposium on Measurement and Item Analysis on 12th February 2003 at Heriot-Watt University. More details here.

If you have any further questions on this or any other assessment topic, please contact the SCROLLA team (Cliff, Colin or Ruth) at Heriot-Watt University.

If you have any comments on this page (errors, omissions, relevant resources) please tell Colin.

page last updated 2nd December 2003, CM
(thanks to Mhairi McAlpine of SQA for help with terminology)