Abstract
Background: Properly constructed multiple-choice questions (MCQs) in high-stakes examinations are expected to have high validity and reliability scores. However, several reports show that teacher-generated high-stakes examinations do not always achieve the required level of quality if item constructors are not trained in item writing or are not proficient in the principles of assessment.
Aim: This evaluation aimed to assess the validity, reliability and quality of a 150-item multiple-choice question test in the Membership of the Royal College of General Practitioners International Examination in Oman.
Design of the study: Computer-based test-item analysis according to a set of pre-validated quality criteria.
Participants and setting: Twenty doctors who had undergone the Family Medicine Residency Programme of the Oman Medical Speciality Board, or its equivalent, and were eligible to sit the test.
Method: The test-item analysis included item difficulty, item discrimination level and quality of distractors.
Results: Across 150 A-type items, 69% were of applied format. Kuder-Richardson 20 was 0.81. The mean test score was 86.3% and the standard error of measurement was ± 5.0. The mean difficulty index of the 150 items was 43%. Of all items, 50.7% were at the level of moderate or better discrimination. Only 20% of items had more than two distractors functioning according to a quality criterion.
Conclusion: Distractor performance was found to be less than optimal; making the time spent on test-item construction more effective would be of great practical significance to teaching faculty. Although the study is limited by the low number of examinees, which affects its validity, the authors believe that the analysis and suggestions made are useful as a guide to item writers, providing some answers as to how to improve the overall quality of MCQs in the future.
To improve this study further, the authors intend to collect data from a larger number of subsequent examinations to increase the validity of the item analysis.
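The indices named in the Method and Results (item difficulty, item discrimination, Kuder-Richardson 20 and standard error of measurement) can be illustrated with a minimal Python sketch. It is not the software used in the study. It assumes a hypothetical 0/1 score matrix (one row per examinee, one column per item), uses the population variance of total scores for KR-20, and computes discrimination as the upper-group minus lower-group difference with a 27% group fraction. The abstract does not state which conventions the authors applied, so all function names, parameters and data below are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, pvariance


def item_difficulty(responses):
    """Difficulty index per item: proportion of examinees who answered correctly."""
    n_items = len(responses[0])
    return [mean([row[i] for row in responses]) for i in range(n_items)]


def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomously scored (0/1) items.

    Assumption: population variance of total scores is used; some texts
    use the sample variance instead, which shifts the value slightly.
    """
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    sum_pq = sum(p * (1 - p) for p in item_difficulty(responses))
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def standard_error_of_measurement(responses):
    """SEM = SD of total scores * sqrt(1 - reliability), in raw-score units."""
    totals = [sum(row) for row in responses]
    return sqrt(pvariance(totals)) * sqrt(1 - kr20(responses))


def discrimination_index(responses, fraction=0.27):
    """Upper-minus-lower discrimination index per item.

    Assumption: examinees are ranked by total score and the top and bottom
    `fraction` of them form the comparison groups.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group], ranked[-group:]
    n_items = len(responses[0])
    return [
        mean([row[i] for row in upper]) - mean([row[i] for row in lower])
        for i in range(n_items)
    ]


# Hypothetical data: 4 examinees x 3 items (1 = correct, 0 = incorrect).
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print("difficulty:", item_difficulty(sample))
print("KR-20:", kr20(sample))
print("SEM:", standard_error_of_measurement(sample))
print("discrimination:", discrimination_index(sample, fraction=0.5))
```

With only twenty examinees, as in this study, a 27% group holds about five people, so per-item discrimination values from such a calculation would be unstable. This is consistent with the authors' caution about the low number of examinees.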