The number of exam items was adjusted according to the course blueprint and the tested domains.
The KR-20 of the examination was 0.906, which is ideal and indicates high reliability of the examination (12, 13). Values of 0.8 and higher are the target in medical education. This finding is in agreement with the work of Kehoe (12-14), who reported that for short tests (10-15 items) values as low as 0.5 are reasonable, but tests with more than 50 items should yield values of 0.8 or higher. Low KR-20 values have been linked to too many easy or difficult questions, poorly written non-discriminating items, non-homogeneity of the educational content, and a discrepancy between the assessment level and the educational task (14, 15).
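For readers who want to reproduce a reliability estimate of this kind, the following is a minimal sketch of the standard KR-20 formula, KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores). It is not the software used in this study. The function name `kr20`, the 0/1 response matrix, and the choice of population variance are assumptions made for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    responses: list of rows, one row per student, one 0/1 score per item.
    Assumption: population variance of total scores is used; some
    packages use the sample variance instead, which shifts the result slightly.
    """
    n_students = len(responses)
    if n_students < 2:
        raise ValueError("need at least two students")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")

    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical toy data: 5 students, 4 items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3))
```

With real exam data the matrix would have one row per examinee and one column per scored item.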
Both the majority of exam items (72.9%) and the average exam difficulty (69.4 ± 21.86) were within the acceptable difficulty index range. Moreover, 69.5% of the items were categorized as excellent discriminators, and the average exam discrimination index was 0.3 (± 0.16).
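The two indices reported above can be sketched as follows. The difficulty index is taken as the percentage of examinees answering the item correctly, and the discrimination index as the difference in correct answers between upper and lower scoring groups divided by the group size. The function names and the 27% group fraction are assumptions (a common convention), not details confirmed by this study.

```python
def difficulty_index(item_scores):
    """Percentage of examinees who answered the item correctly (0/1 scores)."""
    if not item_scores:
        raise ValueError("no scores given")
    return 100.0 * sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """(H - L) / g, where H and L are correct counts in the upper and lower groups.

    Assumption: groups are the top and bottom `fraction` of examinees
    ranked by total exam score; 27% is a common convention.
    """
    n = len(total_scores)
    if n < 2 or n != len(item_scores):
        raise ValueError("need matching score lists with at least two examinees")
    g = max(1, round(n * fraction))
    # Rank examinee positions from highest to lowest total score.
    order = sorted(range(n), key=lambda s: total_scores[s], reverse=True)
    upper, lower = order[:g], order[-g:]
    high = sum(item_scores[s] for s in upper)
    low = sum(item_scores[s] for s in lower)
    return (high - low) / g


# Hypothetical data for one item answered by 10 examinees.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(difficulty_index(item), round(discrimination_index(item, totals), 2))
```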
The type of correlation between DE and DIF indicates that items with fewer non-functional distractors have a high difficulty index (easy items) and vice versa. This finding is in agreement with the recent works of Hingorjo et al., Burud et al., and Kheyami et al. (8, 16, 17). A decreased number of non-functional distractors increases the difficulty index of items (they become easier). They also reported that DE reduces the DIS of items. In the current study, DE had a weak positive correlation with the DIS (p = 0.047437, significant at p < 0.05). NFDs can affect the discrimination power of an item (8) and should be replaced by more plausible distractors (11), or the item should be removed from the test (18). Such items either have a high DIF (8), as all students answer them correctly (11), or become distracting and cause a false assessment (10).
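As an illustration of how non-functional distractors and distractor efficiency can be derived from raw option counts, a minimal sketch follows. It assumes the commonly used rule that a distractor chosen by fewer than 5% of examinees is non-functional, and defines DE as the percentage of distractors that are functional. The function name, the option labels, and the 5% threshold are assumptions, not definitions taken from this study.

```python
from collections import Counter


def distractor_efficiency(choices, key, options="ABCD", threshold=0.05):
    """Return (number of NFDs, DE in percent) for a single item.

    choices: the option selected by each examinee, e.g. ["A", "C", ...].
    key: the correct option.
    Assumption: a distractor selected by fewer than `threshold` of
    examinees counts as non-functional.
    """
    if not choices:
        raise ValueError("no responses given")
    counts = Counter(choices)
    n = len(choices)
    distractors = [o for o in options if o != key]
    nfd = sum(1 for o in distractors if counts[o] / n < threshold)
    de = 100.0 * (len(distractors) - nfd) / len(distractors)
    return nfd, de


# Hypothetical item: 100 examinees, key is "A", option "D" is rarely chosen.
answers = ["A"] * 70 + ["B"] * 20 + ["C"] * 8 + ["D"] * 2
nfd, de = distractor_efficiency(answers, key="A")
print(nfd, round(de, 1))
```

For a four-option item this yields DE values of 100%, 66.7%, 33.3%, or 0% for zero, one, two, or three NFDs respectively.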
NFDs have been linked to minimal training in item writing and distractor selection (7, 19, 20). It is clear that DE has an impact on both DIF and DIS individually, but whether it can affect both of them at the same time needs further research.
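The relationship between DE and each of DIF and DIS can be examined per item with a Pearson correlation coefficient. The sketch below computes r only and does not produce a p-value. The function name and the per-item values are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-item values (one entry per item).
de_values = [100.0, 66.7, 33.3, 0.0, 66.7, 100.0]
dif_values = [55.0, 62.0, 80.0, 95.0, 70.0, 48.0]
dis_values = [0.42, 0.35, 0.20, 0.05, 0.30, 0.45]

print(round(pearson_r(de_values, dif_values), 3))
print(round(pearson_r(de_values, dis_values), 3))
```

A significance test would additionally require the t distribution with n - 2 degrees of freedom, which a statistics package can supply.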
Items with non-functional distractors can be present in any exam or test; what to do after identifying them in a running exam remains an open question. In such items, the non-functional distractors can be replaced with more plausible ones, or the question can be deleted from the bank. The area of debate is the status of these items in the current exam or test. In the current study, deletion of items with two or three non-functional distractors increased the average difficulty index of the exam from 36.83 to 42.82, and DE showed a non-significant correlation with the DIF of items (r = 0.2296, p = 0.133806). Deleting such items from an exam or test can affect students' results and raises ethical debate. Kehoe (1995) reported that deletion of such items is ethical and justifiable (14). He argued that the test aims to determine the rank of each student; using items or questions with unacceptable psychometrics works against this objective and degrades the accuracy of the resulting ranking.
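The effect of deleting items with two or more non-functional distractors on the average difficulty index can be sketched as below. The record layout (`dif`, `nfd`), the function name, and the item values are hypothetical and chosen only to show the calculation; they are not the study data.

```python
from statistics import mean


def mean_difficulty_after_deletion(items, max_nfd=1):
    """Compare mean difficulty before and after dropping items with many NFDs.

    items: list of dicts with keys "dif" (difficulty index, percent)
           and "nfd" (number of non-functional distractors).
    Items with more than `max_nfd` NFDs are removed.
    Returns (mean before, mean after, number of items removed).
    """
    kept = [it for it in items if it["nfd"] <= max_nfd]
    if not items or not kept:
        raise ValueError("no items left to average")
    before = mean(it["dif"] for it in items)
    after = mean(it["dif"] for it in kept)
    return before, after, len(items) - len(kept)


# Hypothetical four-item exam.
exam_items = [
    {"dif": 30.0, "nfd": 3},
    {"dif": 45.0, "nfd": 0},
    {"dif": 50.0, "nfd": 1},
    {"dif": 25.0, "nfd": 2},
]
before, after, removed = mean_difficulty_after_deletion(exam_items)
print(before, after, removed)
```

Whether the mean rises or falls depends entirely on the difficulty of the deleted items, so the direction seen in the toy data carries no meaning.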
Limitations of this study include the small number of students and items and its application to a single course. A strength of the study is that the test is considered valid and reliable.