Questions and Answers from the Institute
Participants asked questions about testing and about some of the presentations
The following questions were posed by participants at the ends of Days 1 and 2. Given their technical nature, we asked Marylou Lennon and Kentaro Yamamoto from Educational Testing Service (ETS) to offer answers. We thank them both.
The cut-off points for the IALS levels represented critical, shifting points – what are those skills and what makes them critical? What are the thresholds that these levels mark?
The five levels used for reporting IALS results, with cut-offs at particular points along each scale, were determined through a combination of expert judgment and research. The research focused on understanding which factors seemed to distinguish the difficulty of tasks along each scale and which characteristics seemed to be shared by tasks that fell at the same points on the scale. As Kentaro Yamamoto noted in his presentation at the IALS Institute, this research examined the features of the tasks as well as the materials and the level of processing required to complete the tasks successfully. This approach differs from readability formulas, which attempt to characterize the overall complexity of an entire text or of large segments of text. The approach used in the literacy surveys recognizes that not all tasks require an individual to read and fully understand an entire text. The purpose for reading therefore plays an important role in determining the overall difficulty of a task. In other words, a task in the survey represents the interaction between the question that is asked (the “purpose”) and the text that must be read and processed.
For more information, see Irwin Kirsch’s paper “The International Adult Literacy Survey: Understanding What Was Measured” at www.ets.org/Media/Research/pdf/RR-01-25-Kirsch.pdf, which explains these variables in detail.
Additional explanations can be found in Kirsch, I. S., Jungeblut, A., & Mosenthal, P. (1998), “The measurement of adult literacy,” a chapter in S. Murray, I. Kirsch, & L. Jenkins (Eds.), Adult Literacy in OECD Countries: Technical Report on the First International Adult Literacy Survey. Washington, DC: National Center for Education Statistics.
How do you test for the progression of the skills?
Testing for the progression of skills in the large-scale international assessments (IALS, ALL, and PIAAC) relies on the models mentioned in the previous answer and explained in Irwin Kirsch’s paper. Items in these assessments are placed along each scale on the basis of the responses of the various populations in each participating country. Their positions therefore represent the relative difficulty that individuals with varying abilities have in responding correctly. This distribution is reflected in the item maps included in the reports for these assessments. Research carried out over many years has sought to explain the progression of tasks along each scale, from easy to more difficult, in terms of the task presented in the question, the characteristics of the stimulus material, and the relationship between the two. For example, this research shows that a task can be easier or harder depending on characteristics such as the structure and characteristics of the text; the type of information requested; the number of conditions that must be satisfied; and the process the respondent must use (e.g., whether the respondent must locate one or more pieces of information, integrate information to make a comparison, or respond on the basis of the information provided plus some outside knowledge). As we discussed in the Institute session about the PDQ instructional system, a document stimulus can be more or less difficult depending on its structure (simple list, combined list, etc.). Prose materials also vary in difficulty according to structural characteristics such as length, organization, and the use of headings or other visual cues. Finally, the relationship between the task and the stimulus affects difficulty through the match between the wording of the question and the stimulus (e.g., whether or not the question uses exactly the same words or phrases as the stimulus).
Again, this is discussed in detail in the papers referenced above, both of which provide a more complete context for this information.
Can you explain the difference between RP67 and RP80? Knowing this will be important for interpreting and understanding the results of PIAAC.
RP stands for response probability. The scale point assigned to each task is the point at which individuals with that proficiency score have a given probability of responding correctly. IALS used a response probability of 80% (RP80). This means that individuals estimated to have a particular scale score are expected to perform tasks at that point on the scale correctly with an 80% probability, and to have a greater than 80% chance of performing tasks lower on the scale correctly. It does not mean, however, that individuals with a given proficiency can never succeed at tasks with higher difficulty values; they may do so some of the time. It does suggest that their probability of success is relatively low: the more difficult the task relative to their proficiency, the lower the likelihood of a correct response.
An analogy may help. The relationship between task difficulty and individual proficiency is much like the high jump in track and field, in which an athlete tries to clear a bar placed at increasing heights. Each high jumper has a height at which he or she is proficient: the jumper clears the bar at that height with a high probability of success and clears it at lower heights almost every time. When the bar is set above the athlete’s level of proficiency, however, the athlete is not expected to clear it consistently. In this sense, proficiency implies consistency.
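The RP idea can be made concrete with an item response theory (IRT) model. The sketch below assumes a two-parameter logistic (2PL) model with the conventional scaling constant D = 1.7; the item parameters are hypothetical, and the code works on the underlying ability (theta) metric rather than on the reported 0-500 scale. It finds the ability at which a person has an 80% chance of answering an item correctly (the item’s RP80 location) and shows that the same person succeeds more often on an easier item and less often, but not never, on a harder one.

```python
import math

D = 1.7  # conventional scaling constant for logistic IRT models (assumption)


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a person at ability theta answers an item correctly.

    a: item discrimination, b: item difficulty (both hypothetical here).
    """
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def rp_location(a: float, b: float, rp: float) -> float:
    """Ability at which the probability of a correct response equals rp.

    Solves rp = 1 / (1 + exp(-D * a * (theta - b))) for theta.
    """
    return b + math.log(rp / (1.0 - rp)) / (D * a)


# Hypothetical item with discrimination 1.0 and difficulty 0.0.
a, b = 1.0, 0.0
theta80 = rp_location(a, b, 0.80)
print(round(theta80, 3))                      # 0.815
print(round(p_correct(theta80, a, b), 2))     # 0.8

# The same person facing an easier item (b = -1.0) and a harder item (b = +1.0).
print(round(p_correct(theta80, a, -1.0), 2))  # 0.96
print(round(p_correct(theta80, a, 1.0), 2))   # 0.42
```

In the terms of the high-jump analogy, the easier item is a lower bar that this person clears almost every time, and the harder item is a higher bar that is cleared only some of the time.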
So what is the impact of changing the response probability (RP) from 80% to 67%?
Lowering the RP value from 80% to 67% lowers the probability of success used to describe what people at a given point on the scale can do: a person at that point is now described as having a 67% chance, rather than an 80% chance, of succeeding on tasks located there. In other words, we accept more uncertainty about whether individuals at particular points along the scale will respond correctly to items with the characteristics that define each level. For example, with an RP value of .67, the correct inference is that someone at the midpoint of a particular level would be expected to answer 67 percent of the items defined by that level correctly. Someone at the bottom of a given level would have less than a 67 percent chance of responding correctly to this set of items, while someone at the top of the level would have a greater than 67 percent chance. An RP value of .80 makes these statements more certain. With an RP value of .80, someone in the middle of a level would be expected to respond correctly to 80 percent of the items drawn randomly from that level. Someone at the bottom of a level would be expected to score less than 80 percent on that pool of tasks, while someone at the top of the level would be expected to score more than 80 percent on tasks from that level.
It is important to keep in mind that the response probability is selected after the item parameters and proficiencies have been estimated. Thus, the choice of an RP value affects neither the statistical characteristics of the items nor the estimation of proficiency along the scale. Nor does the RP value affect the precision of measurement along a scale. The same items are used to define the underlying scale regardless of which RP value is selected. The RP value does, however, affect where a particular item falls along the scale. When the cut points for the levels are held fixed, as will likely be the case in PIAAC, which is keeping the cut points used in IALS (e.g., Level 1 is 0-225, Level 2 is 226-275, etc.), the RP value therefore determines the level in which some items are located. The selection of the RP value has no impact, however, on the ability distribution or on the percentage of people who fall within a particular level.
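The points above can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) item response model with scaling constant D = 1.7, hypothetical item parameters that have already been estimated, and a hypothetical linear transformation (250 + 50 × theta) from ability to the 0-500 reporting scale; the constants actually used in IALS or PIAAC are not given here. Item locations are computed at RP80 and RP67 and classified with the IALS level cut points. The person classification never uses the RP value, which is why the percentage of people in each level does not change.

```python
import math
import random

D = 1.7  # conventional logistic scaling constant (assumption)

# Upper bound of each IALS level on the 0-500 scale:
# Level 1 = 0-225, Level 2 = 226-275, Level 3 = 276-325,
# Level 4 = 326-375, Level 5 = 376-500.
LEVEL_UPPER_BOUNDS = [225, 275, 325, 375]


def rp_location(a: float, b: float, rp: float) -> float:
    """Ability at which a 2PL item is answered correctly with probability rp."""
    return b + math.log(rp / (1.0 - rp)) / (D * a)


def to_scale(theta: float) -> float:
    """Hypothetical linear map from ability (theta) to the 0-500 reporting scale."""
    return 250.0 + 50.0 * theta


def level(score: float) -> int:
    """Return the level (1-5) that a scale score falls in."""
    for lvl, upper in enumerate(LEVEL_UPPER_BOUNDS, start=1):
        if score <= upper:
            return lvl
    return 5


# Hypothetical items: (discrimination a, difficulty b), already estimated.
items = {"item A": (1.0, 0.0), "item B": (0.8, -1.0), "item C": (1.2, 1.5)}

for name, (a, b) in items.items():
    s80 = to_scale(rp_location(a, b, 0.80))
    s67 = to_scale(rp_location(a, b, 0.67))
    # Lowering RP moves every item down the scale; some items cross a cut point.
    print(f"{name}: RP80 {s80:.0f} (Level {level(s80)}), "
          f"RP67 {s67:.0f} (Level {level(s67)})")

# People are classified from their own scale scores; no RP value is involved,
# so the share of people in each level is the same under RP80 and RP67.
rng = random.Random(42)
people = [to_scale(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
shares = {lvl: 0 for lvl in range(1, 6)}
for score in people:
    shares[level(score)] += 1
print({lvl: f"{100 * n / len(people):.1f}%" for lvl, n in shares.items()})
```

With these hypothetical numbers, item A sits in Level 3 at RP80 but in Level 2 at RP67, while items B and C stay in the same level; the item parameters themselves are untouched.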
How were the words selected for the (job/sky) passage comprehension section of the reading test that Marylou Lennon described? Are thinking types considered?
For the passage comprehension section of the Reading Components test, passages were written that varied in difficulty and in text type (narrative, persuasive, expository). Words within those passages were then selected as targets, or correct choices. The incorrect choice shown next to each target was a high-frequency word from the same part of speech; if the target word was a noun (as in the example), the associated incorrect choice was also a noun. Research has shown that performance on this type of item is correlated with how well individuals are able to comprehend certain types of texts. The methodology is more commonly used with narrative and expository prose passages.
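The item-construction rule described above can be sketched in a few lines. The word lists, the example sentence, and the function names below are hypothetical and are not the materials used in the Reading Components test; the sketch only illustrates the stated rule that each target word is paired with a high-frequency distractor from the same part of speech.

```python
import random

# Hypothetical high-frequency words grouped by part of speech (illustration only).
HIGH_FREQUENCY = {
    "noun": ["sky", "day", "hand", "water"],
    "verb": ["run", "take", "give", "sit"],
    "adjective": ["green", "old", "long", "small"],
}


def make_choice(target: str, pos: str, rng: random.Random) -> list[str]:
    """Pair a target word with a high-frequency distractor of the same part of speech."""
    candidates = [w for w in HIGH_FREQUENCY[pos] if w != target]
    pair = [target, rng.choice(candidates)]
    rng.shuffle(pair)  # the correct word should not always appear first
    return pair


def build_passage(tokens: list[str], targets: dict[int, str], seed: int = 0):
    """Replace each target token (index -> part of speech) with a two-word choice.

    Returns the item text and the answer key (the original target words).
    """
    rng = random.Random(seed)
    text, key = [], []
    for i, token in enumerate(tokens):
        if i in targets:
            text.append("/".join(make_choice(token, targets[i], rng)))
            key.append(token)
        else:
            text.append(token)
    return " ".join(text), key


tokens = "Maria started her new job on Monday".split()
item, answer_key = build_passage(tokens, {4: "noun"})
print(item)        # the sentence with "job" paired with one noun from HIGH_FREQUENCY
print(answer_key)  # ['job']
```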
Evaluation of digital information – Can you give an example of what is considered entry level?
If this refers to the digital materials in PIAAC (the digital texts in the Literacy domain and the digital materials in Problem Solving in Technology-Rich Environments), these materials are, by design, intended to measure skills at the higher end of the ability distribution. Currently, the assessments do not include very simple, or entry-level, digital materials. Additionally, because “evaluation tasks” are, on average, more difficult than basic “locate tasks” or simple “integrate tasks” (such as comparing two easy-to-locate pieces of information), applying this more complex cognitive process to digital materials will most likely produce tasks that are quite difficult. This will be determined empirically using data from the PIAAC Main Survey. If you are interested, you might want to investigate the work that Don Leu has done on what are referred to as “new literacies”, the skills that readers need when using digital information. In terms of evaluation, one of Leu’s colleagues has identified “at least five different types of evaluation that take place during online reading comprehension:
1) Evaluating understanding: Does it make sense to me?
2) Evaluating relevancy: Does it meet my needs?
3) Evaluating accuracy: Can I verify it with another reliable source?
4) Evaluating reliability: Can I trust it? and
5) Evaluating bias: How does the author shape it?”
For more information, see http://www.newliteracies.uconn.edu/pub_files/What_is_new_about_new_literacies_of_online_reading.pdf
Has the fact that we process words on paper differently than on a computer been considered?
We have developed two types of tasks for PIAAC. The first type can be delivered in either paper-and-pencil or computer-based mode. For these tasks, we have developed adaptation procedures so that they can be delivered in either mode with no significant impact on task difficulty. Data collected so far have demonstrated that these items do, in fact, perform consistently across the two modes. Other tasks, however, have been developed specifically to reflect the changing nature of how digital texts are constructed and used in various environments. These items can only be delivered through computer-based assessment. Thus, the electronic Literacy items, as well as the tasks in the Problem Solving in Technology-Rich Environments domain, are designed to measure the ways in which digital texts place different demands on traditional reading skills.
Is PDQ still available?
The PDQ instructional system is currently not available, although we are investigating how we might disseminate some of the instructional and support materials. You might want to look at some of the work done by Mike Hardt (who attended the Institute) as well as materials developed by SkillPlan (Lynda Fownes attended the Institute); both are based on the framework used to develop PDQ. From time to time, we also discuss internally at ETS whether there would be interest in, and support for, bringing the approach up to date in terms of delivery platform and content. Egil Gabrielsen noted in his country story that Norway is working with ETS to use PDQ.