Description
Computerized adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level. For this reason, it has also been called tailored testing.
Development: The CAT 2013 test development process was conducted in alignment with the Standards for Educational and Psychological Testing. The exam was designed with two sections: (1) Quantitative Ability & Data Interpretation, (2) Verbal Ability and Logical Reasoning. These two sections are consistent with the knowledge domains historically assessed by the IIMs and are also aligned with the content areas covered in equivalent global admission examinations that measure performance along similar scales. Content of the examination was developed and confirmed by individuals with high levels of expertise in each of these content domains. Post-administration analysis will be conducted by credentialed psychometricians to confirm the validity of the examination scores and to ensure that every candidate was provided a fair and equal opportunity to demonstrate their knowledge.
Scoring: Prometric employs an industry-standard, psychometrically sound approach to the scoring process for all IIM candidates. The three-step process is outlined here and is supported by the Standards for Educational and Psychological Testing and the ETS Standards for Quality and Fairness.
Step 1: Raw Score is Calculated
Your raw score for each section is calculated from the number of questions you answered correctly, answered incorrectly, or omitted.
Correct Answer: +3 points for each question you answered correctly
Incorrect Answer: -1 point for each question you answered incorrectly
Omitted: 0 points for each question you did not answer
This scoring methodology ensures that candidates are awarded points only for what they know, while deducting points for random guessing. This is a standard process in the testing industry and is a methodology employed in scoring similar admissions tests such as the Graduate Record Examination (GRE).
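To make the marking scheme concrete, the raw score for a section reduces to a single weighted count. The short Python sketch below is illustrative only; the function name and the example counts are ours, not part of the official scoring software:

    def raw_section_score(correct: int, incorrect: int, omitted: int) -> int:
        """Raw score under the published +3 / -1 / 0 marking scheme.
        Omitted questions contribute nothing; the argument is kept only
        so the three counts can be sanity-checked against the form length."""
        assert min(correct, incorrect, omitted) >= 0
        return 3 * correct - incorrect

    # Example: 20 correct, 6 incorrect, 4 omitted -> 3*20 - 6 = 54
    print(raw_section_score(20, 6, 4))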
Step 2: Raw Score is “Equated”
Equating is a statistical process used to adjust scores on two or more alternate forms of an assessment so that the scores may be used interchangeably. Industry-standard processes were used for equating, such as those outlined within the ETS Standards for Quality and Fairness.
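The operational equating method is not published. As a hedged illustration of what equating accomplishes, the sketch below applies classical linear (mean-sigma) equating, one of the standard methods described by Kolen & Brennan (2004); the score samples are invented:

    import statistics

    def linear_equate(score_on_y: float, scores_y: list, scores_x: list) -> float:
        """Map a score earned on form Y onto the scale of form X by matching
        the means and standard deviations of the two score distributions:
        x = (sd_x / sd_y) * (y - mean_y) + mean_x."""
        mean_x, sd_x = statistics.mean(scores_x), statistics.stdev(scores_x)
        mean_y, sd_y = statistics.mean(scores_y), statistics.stdev(scores_y)
        return sd_x / sd_y * (score_on_y - mean_y) + mean_x

    # Invented samples: form Y ran harder than form X, so a raw 54 on Y
    # equates to a somewhat higher score on X's scale.
    form_x = [60, 72, 48, 90, 66, 54, 78]
    form_y = [50, 65, 41, 83, 59, 47, 70]
    print(round(linear_equate(54, form_y, form_x), 1))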
Step 3: Equated Raw Score is “Scaled”
To ensure appropriate interpretation of an equated raw score, the scores must be placed on a common scale or metric. A linear transformation is used for this scaling process, which is an industry-standard practice (Kolen & Brennan, 2004). The IIM scaling model is as follows:
Section Scores = 0 to 225
Total Exam Score = 0 to 450
Three scaled scores are presented for each candidate: an overall scaled score and a separate scaled score for each of the two sections. Because the two sections evaluate two distinct sets of knowledge and skills, scores may not correlate across sections; a high score in one section does not guarantee a high score in the other. Percentile rankings are provided for each individual section as well as for the overall exam score.
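The slope and intercept of the operational scaling transformation are not published. The sketch below shows only how a generic linear transformation would place an equated raw score onto a 0-to-225 section scale, assuming a purely hypothetical equated raw range:

    def scale_section(equated_raw: float, raw_min: float, raw_max: float,
                      scale_min: float = 0.0, scale_max: float = 225.0) -> float:
        """Linear scaling: scaled = a * raw + b, with a and b chosen so that
        raw_min maps to scale_min and raw_max maps to scale_max."""
        a = (scale_max - scale_min) / (raw_max - raw_min)
        b = scale_min - a * raw_min
        return a * equated_raw + b

    # Hypothetical equated raw range of -30 to 90 for one section
    print(round(scale_section(54, raw_min=-30, raw_max=90)))  # 157.5 -> 158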
About Test Difficulty
The CAT exam has been developed to accurately identify top-performing candidates, and that design makes use of a scaled score range of 0 to 450. To identify the top performers appropriately, the CAT exam is, by design, a very difficult exam. As would be expected with such a difficult exam, no candidate is likely to answer 100% of the items correctly or achieve the top theoretical score. The exam design will still accomplish the goal of identifying the top-performing candidates, who are, indeed, ranked at the top of the list. If the exam were designed to be substantially easier, it would be theoretically possible for a candidate to achieve a score of 450; however, an exam constructed to be that easy would not serve the distinct purposes of the IIMs.

References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for Educational and Psychological Testing. Washington, DC: Author.
Educational Testing Service (ETS). (2002). ETS Standards for Quality and Fairness. Princeton, NJ: Author.
Kolen, M. J., & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices (2nd ed.). New York: Springer.
Fairness and Equivalency in IIM Exams
A significant number of examination forms are used on behalf of the IIMs to evaluate the large candidate population. With the use of multiple forms comes the need to ensure the fairness and equivalency of the examinations used for assessment, and a post-equating process is necessary to ensure validity and fairness. Equating is a psychometric process that adjusts for differences in difficulty so that scores from different test forms are comparable on a common metric and therefore fair to candidates testing across multiple days. The equating process was designed with three phases: exam creation, post-equating, and scaling.

In the first phase, exam creation, each form contains a pre-defined number of statistically profiled questions selected from a large item bank. These questions form an equating block within the form, which can be used as an anchor to rescale candidates’ scores to the metric of the item bank. This rescaling adjusts for differences in form difficulty, taking into account candidates’ differential performance on the equating block. As a result, candidates’ rescaled scores can be placed and compared on the common metric regardless of which form they take. This approach supports equating without significantly compromising the security of the items.

The second phase of the process is post-equating. In this phase, items are concurrently analyzed and the estimated item parameters (item difficulty and item discrimination) are placed onto a common metric. Item Response Theory (IRT), a psychometrically supported statistical model, is used in this process. The result is a statistically equated raw score that takes into account the performance of the candidate along with the difficulty of the form administered.
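As a rough sketch of the two ingredients named above, the code below implements the two-parameter logistic (2PL) IRT model (item difficulty and discrimination) and a mean-sigma rescaling of anchor-item difficulties onto the bank metric. It is a toy illustration with invented numbers, not the operational calibration:

    import math
    import statistics

    def p_correct_2pl(theta: float, a: float, b: float) -> float:
        """2PL IRT model: probability that a candidate with ability theta
        answers an item of discrimination a and difficulty b correctly."""
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    def anchor_rescale(b_form: list, b_form_anchor: list, b_bank_anchor: list) -> list:
        """Mean-sigma transformation: the anchor items have difficulty
        estimates on both the new form's metric and the bank's metric,
        which pins down the linear map that carries every new-form
        difficulty onto the bank metric."""
        A = statistics.stdev(b_bank_anchor) / statistics.stdev(b_form_anchor)
        B = statistics.mean(b_bank_anchor) - A * statistics.mean(b_form_anchor)
        return [A * b + B for b in b_form]

    # Invented estimates for the same four anchor items on two metrics
    anchors_on_form = [-1.2, -0.4, 0.3, 1.1]
    anchors_on_bank = [-0.9, -0.1, 0.6, 1.5]
    print(anchor_rescale([-0.8, 0.0, 0.9], anchors_on_form, anchors_on_bank))
    print(round(p_correct_2pl(theta=0.5, a=1.2, b=0.0), 3))  # ~0.646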
Once post-equating has produced an equated raw score, the scores are scaled to make them easier for candidates to interpret. Scaling can be done using a linear or non-linear transformation of the original equated number-correct score. Though the number presented to candidates is placed on a common scale for ease of interpretation, the position of candidates in the score distribution does not change.
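To make the last point concrete, any strictly increasing transformation, linear or not, leaves the rank order of candidates untouched. A tiny self-contained check with invented scores:

    equated = [54.0, 61.5, 48.0, 70.25]

    def rank_order(xs: list) -> list:
        # Indices of the scores when sorted ascending
        return sorted(range(len(xs)), key=lambda i: xs[i])

    nonlinear = [x ** 1.5 for x in equated]   # strictly increasing transform
    assert rank_order(equated) == rank_order(nonlinear)  # order is preserved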
Lastly, once scaled scores are established, the final step in the scoring process is to rank candidates by their performance. A percentile rank is the percentage of scores that fall below a given score. With the total scaled scores arranged in rank order from lowest to highest and divided into 100 equally sized groups, a table mapping total scaled scores to percentile ranks is created. This ranked list allows candidates to be identified from the highest performers at the very top of the list down to the lower performers in the middle and at the low end of the scale.
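Using the definition above (the percentage of scores strictly below a given score), a percentile rank can be computed directly. The cohort below is invented for illustration:

    def percentile_rank(score: float, cohort: list) -> float:
        """Percentage of scores in the group that fall below the given score."""
        below = sum(1 for s in cohort if s < score)
        return 100.0 * below / len(cohort)

    # Invented total scaled scores for ten candidates
    cohort = [112, 250, 301, 198, 355, 270, 144, 405, 222, 333]
    print(percentile_rank(301, cohort))  # 60.0: six of ten scores are lower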
The test development and equating models outlined here offer substantial advantages to candidates. First, they confirm with a high level of psychometric rigor that all examination scores are valid, equitable, and fair. Post-equating takes into account any statistical differences in examination difficulty and ensures that all candidates are evaluated on a common scale. Reporting scores on this statistically equivalent scale ensures that the highest-performing candidates are ranked appropriately at the top end of the scale.