DESIGNING MULTIPLE CHOICE TEST OF VOCABULARY FOR THE FIRST SEMESTER STUDENTS AT ENGLISH EDUCATION DEPARTMENT OF ALAUDDIN STATE ISLAMIC UNIVERSITY OF MAKASSAR

This study aims to design and develop a vocabulary test for first-semester students at the English Education Department of Alauddin State Islamic University of Makassar. The research design was Research and Development (R&D), following the ADDIE model, whose steps are Analysis, Design, Development, Implementation, and Evaluation. The data of this research were quantitative. The research instrument was a rubric dealing with the quality of the test produced. The findings showed that the content, language, and layout of the product were clear and understandable. The product was valid for testing the students' vocabulary mastery, as shown by the difficulty level, discrimination power, validity, and reliability of the product, obtained from the scores of the students' answers.


INTRODUCTION
Testing is very important in learning because it measures and collects information about students' ability. An English test can also help students gauge their language mastery. Besides that, testing by lecturers or teachers aims to determine whether the objectives of the course were achieved and how effective the lecturers' teaching was in the preceding sessions. Based on a preliminary study conducted in April 2015 at the English Education Department of Alauddin State Islamic University of Makassar, the lecturers faced practical constraints in measuring the vocabulary ability of the students. In addition, the lecturers lacked understanding of how to design tests.
These problems occurred because of several factors. First, the lecturers did not pay much attention to vocabulary when they designed tests, and they did not construct the tests according to the characteristics of a good test, such as difficulty level, discrimination power, validity, and reliability. Second, the tests created by the lecturers did not match the materials, because the lecturers designed them based only on their own judgment and did not prepare a blueprint beforehand; the tests were not based on the syllabus and materials. Third, the lecturers focused only on methods for mastering vocabulary.

The lecturers lacked understanding of language testing and assessment for several reasons: there was little information about how to design tests based on the characteristics of a good test, and the lecturers thought that mastering the material was more important than evaluation. Analyzing a test also involves many steps, which discouraged the lecturers from carrying out every step in designing a test, even though doing so would have helped their teaching. Consequently, the lecturers could not measure the students' vocabulary level. Moreover, the students lacked motivation to learn, because they did not know how high or low their vocabulary level was. Assessing vocabulary would greatly help both the lecturers and the students to know the students' vocabulary ability. Without it, the lecturers could not know whether the questions in a test they designed were suitable for the students, nor whether the goals and objectives of the course had been achieved.
After identifying and analyzing these factors, the researcher concluded that, to solve the problems, she had to design and develop multiple choice tests that were acceptable and appropriate to the materials. The researcher also constructed the test in accordance with the characteristics of a good test: difficulty level, discrimination power, validity, and reliability.
Considering the factors above, the researcher viewed multiple choice as the appropriate format for measuring vocabulary ability. The reason is that this test is very easy and quick for the examiner to correct, because he or she only has to put ticks or crosses. Moreover, there is no need to worry about subjectivity, because only one answer should be correct (Pavlů 2009:19).
In other words, multiple choice is a simple test format that can nevertheless serve as a vocabulary check (Brown 2004:194). The researcher therefore aimed to design multiple choice tests based on the characteristics of a good test, and in doing so to make testing more interesting. This research is also intended as a source of information for lecturers who will design tests based on the characteristics of a good test. A further goal was for the students to use the vocabulary more often and more intensively, so that they would remember it better.
Based on the problems stated above, the researcher conducted a study entitled "Designing Multiple Choice Test of Vocabulary at English Education Department of Alauddin State Islamic University of Makassar".

LITERATURE REVIEW
Several researchers have conducted studies related to designing tests. Zhongshannvgao (2007) conducted a study on Designing and Revising a Multiple Choice Vocabulary Test. He found that multiple choice testing appeals to many people because of its high reliability and efficiency in scoring, but constructing a good item requires a tremendous amount of time and effort. In vocabulary assessment, the decision on whether to use this format and how to design a test depends on the context, the needs of the tester, the test purpose and, above all, the construct selected for measurement. What matters is that the test is proved to be valid and brings benefits to both students and teachers.
Another study, by Öztürk (2007), examined the design of multiple-choice test items for foreign language vocabulary. The results revealed that English as a Foreign Language (EFL) teachers made many more mistakes in the vocabulary section than in the grammar section. The findings imply that even though the EFL teachers had been given the principles for constructing multiple-choice items in advance, they still constructed improper items. Language testing plays an important role in both teaching and learning, and well-constructed tests can enhance learning and motivate students.
In addition, Pavlů (2009), in the thesis "Testing Vocabulary", dealt with the ways vocabulary may be tested. The thesis was divided into a theoretical and a practical part. The theoretical part comprised two major subdivisions: testing itself and vocabulary. The first dealt with whether testing is important and with the different reasons for testing; the next explained two basic principles of testing, reliability and validity; and the last focused on testing techniques, with examples.
These findings relate to the present research in that they concern how to design vocabulary tests, especially multiple choice tests. They found many mistakes in test design, resulting in tests that were neither reliable nor valid. Such mistakes produce poor tests that cannot measure students' ability properly. Therefore, this research tries to develop a strategy for designing multiple choice vocabulary tests, and the researcher explains how such tests can be designed.

RESEARCH METHOD
The research method used in this study was Research and Development (R&D). R&D is a research design that involves identifying classroom problems, studying recent theories of educational product development, developing educational products, validating the products with experts, and field testing them (Latif, 2012). The researcher adopted the ADDIE model. Molenda (2003:34) describes the ADDIE model as "a colloquial term used to describe a systematic approach to instructional development, virtually synonymous with instructional systems development". ADDIE is a generic instructional design model that provides an organized process for developing instructional materials (Shelton & Saltsman, 2011:566). ADDIE is an acronym that stands for Analysis, Design, Development, Implementation, and Evaluation.
The ADDIE model is designed to help learners achieve the goals and objectives of a course or syllabus. It allows for the evaluation of the materials and provides simple procedures for designing and developing tests. Accordingly, the procedure for designing the multiple choice vocabulary test followed the five phases of the ADDIE model: analysis, design, development, implementation, and evaluation.

Analysis
In this phase, the researcher identified and developed a clear understanding of the materials. She also identified the goals and objectives of the course based on the materials provided by the course lecturer. Then, the researcher considered the timeline and budget needed to design the test, which are also important. In addition, this phase involves needs analysis.
Needs analysis is a set of procedures used to collect information about learners' needs (Richards, 2003:51, as cited in Sukirman, 2012).

Design
In this phase, the researcher designed the multiple choice vocabulary test by considering the goals and objectives of the learning process, designing a blueprint (see Appendix 2), describing the target population, and selecting the materials appropriate to be covered by the test.

Development
This phase built on the two previous phases, analysis and design. The blueprint introduced in the design phase was developed further in this phase. The blueprint lists the materials, so it guided the researcher in designing the multiple choice test based on the materials and syllabus. This phase involved several steps. First, the researcher listed activities that could help the learners learn the materials. Second, she selected the approach most appropriate to the learners' styles. Third, she designed, developed, and produced the multiple choice vocabulary test based on the materials and syllabus of the course. Then, she organized the test. After that, she had the test validated by experts to make sure it was appropriate to the materials and syllabus of the course. Finally, the product was ready to be implemented.

Implementation
This phase dealt with trying out the product. The product was implemented in real teaching and learning. The purpose of this phase was to determine whether the test was appropriate for the target learners. If it was not, the product was revised and tried out again.

Evaluation
This phase was designed to measure the quality of the materials as implemented, including the appropriateness of the test design. In this evaluation, one expert was involved to check the quality of the product.
In general, there were two kinds of evaluation in this phase: formative and summative. Formative evaluation was ongoing, conducted during and between phases; its purpose was to improve the quality of the test content before the final version of the test was implemented. Summative evaluation, meanwhile, was the final evaluation of the test design process.

A. Findings
The results of this research were obtained by following the R&D steps in designing the test. Five steps were carried out to obtain a good product: analysis, design, development, implementation, and evaluation.

Analysis
In this phase, the researcher observed the vocabulary tests that the lecturers gave to students in the Vocabulary in Context course and found some problems in the test items.
Some lecturers did not pay much attention to designing vocabulary tests; therefore, they did not base the tests on the syllabus and materials, and they did not measure the difficulty level, discrimination power, validity, or reliability of the tests.

Design
The researcher planned the research and designed a blueprint based on the syllabus and materials of the Vocabulary in Context course, which cover synonyms, antonyms, rewording, details, collocation, reference, inference, and word form.

Development
The product of this research consists of 40 test items. Each item was developed based on the syllabus and materials mapped in the blueprint (see Appendix 2).
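A blueprint (table of specifications) maps each content area to the items that test it. The sketch below shows one way such a blueprint for the 40-item test might be recorded. The eight content areas come from the design phase; the even split of five items per area, the consecutive item numbering, and the function name build_blueprint are hypothetical, since the actual allocation is in Appendix 2 and is not reproduced here.

```python
# Content areas listed in the design phase; the allocation below is hypothetical.
CONTENT_AREAS = [
    "synonym", "antonym", "rewording", "details",
    "collocation", "reference", "inference", "word form",
]


def build_blueprint(areas, total_items):
    """Assign item numbers 1..total_items to content areas as evenly as possible."""
    base, extra = divmod(total_items, len(areas))
    blueprint = {}
    next_item = 1
    for index, area in enumerate(areas):
        # The first `extra` areas receive one additional item when the split is uneven.
        count = base + (1 if index < extra else 0)
        blueprint[area] = list(range(next_item, next_item + count))
        next_item += count
    return blueprint


if __name__ == "__main__":
    for area, items in build_blueprint(CONTENT_AREAS, 40).items():
        print(f"{area:<12} items {items}")
```

In a real test, items from different areas are usually interleaved; keeping the mapping in a dictionary like this simply makes it easy to check that every area is covered and that the item count adds up.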

Implementation
This phase dealt with trying out the product. Before the try-out, the product was analyzed by the expert, who assessed the validity of the test instrument using a rubric containing indicators of the product's validity (see Appendix 5).

a. Try-out I
After being analyzed by the expert, the product was tried out. It was first revised based on the expert's comments and the students' answers. Based on the researcher's statistical calculation, the students' answers showed 13 valid items, namely 1, 3, 5, 7, 11, 15, 17, 19, 25, 26, 27, 37, and 39. The other items were invalid, because their validity indexes did not meet the values in the table of critical values of the product moment. The researcher also analyzed the reliability of the test items: an item was considered not reliable when its computed r did not meet the critical value in the product-moment table. For clarity, the researcher provided a table briefly describing the validity of each item. Each item of the product was also analyzed for its difficulty level and discrimination power.
The researcher also provided a table briefly describing the status of each item.
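The item validity check described above correlates each item with the total score and compares the coefficient with the r-table value. The sketch below assumes the answers have already been scored as 1 (correct) or 0 (incorrect), applies the Pearson product-moment formula (which, for a 0/1 item, equals the point-biserial correlation), and uses an uncorrected total that includes the item itself. The six-student data set and the function names are invented for illustration; 0.811 is the tabled critical r for N = 6 at the 5% level (two-tailed), whereas the study's value for its 36 test takers was 0.329.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant variable; treated here as no correlation
    return sxy / math.sqrt(sxx * syy)


def item_validity(scores, r_critical):
    """Return (item number, r, valid?) for each column of a 0/1 score matrix."""
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        r = pearson_r(item, totals)
        # The decision uses the unrounded r; rounding is only for display.
        results.append((j + 1, round(r, 3), r > r_critical))
    return results


# Invented answers of six students on four items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(item_validity(scores, r_critical=0.811))
# [(1, 0.868, True), (2, 0.62, False), (3, 0.868, True), (4, 0.351, False)]
```

Because the total includes the item being correlated, r is slightly inflated; a corrected item-total correlation (total minus the item) is a stricter alternative if that is a concern.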

c. Try-out III
In this phase, the researcher revised the items classified as repaired and failed in the second try-out. The second try-out left 5 items that had to be revised. In the last try-out, the researcher added 5 items to these repaired and failed items as a pool from which to select good items, so that 10 items were tried out and analyzed in the third and final try-out. Based on the researcher's statistical calculation, the students' answers showed 5 invalid items. The reliability of the items was analyzed for the third time: if the computed r for the 10 test items did not meet the critical value in the product-moment table, the items were considered not reliable. Each item was also analyzed for difficulty level and discrimination power for the third time. The researcher provided a table briefly describing the status of each item.

Easy: items 1, 3, 6, 7
Average: items 2, 4, 5, 8, 9, 10
Difficult: -

Based on the difficulty level analysis, 4 items were easy, 6 items were average, and no item was difficult. The items were also analyzed for discrimination power. The researcher provided an analysis briefly describing the status of each item.
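The difficulty index of an item is the proportion of test takers who answered it correctly, so higher values mean easier items. The sketch below computes it from a 0/1 score matrix and sorts items into the three levels used above. The cut-offs (P up to 0.30 difficult, up to 0.70 average, above 0.70 easy) are a commonly used convention and an assumption here, since the text does not state which cut-offs were applied; the ten-student data set and the function names are invented.

```python
def difficulty_index(item_scores):
    """Proportion of test takers who answered the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)


def difficulty_level(p, difficult_max=0.30, average_max=0.70):
    """Classify a difficulty index using assumed cut-offs."""
    if p <= difficult_max:
        return "difficult"
    if p <= average_max:
        return "average"
    return "easy"


def group_by_difficulty(scores):
    """Group item numbers (1-based) by difficulty level."""
    groups = {"easy": [], "average": [], "difficult": []}
    for j in range(len(scores[0])):
        p = difficulty_index([row[j] for row in scores])
        groups[difficulty_level(p)].append(j + 1)
    return groups


# Invented data: item 1 answered correctly by 9 of 10 students, item 2 by 5, item 3 by 2.
columns = [[1] * 9 + [0], [1] * 5 + [0] * 5, [1] * 2 + [0] * 8]
scores = [list(row) for row in zip(*columns)]  # one row per student
print(group_by_difficulty(scores))
# {'easy': [1], 'average': [2], 'difficult': [3]}
```

Keeping the cut-offs as parameters makes it easy to substitute whichever classification a particular testing handbook prescribes.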

Good: items 1, 2, 8, 9, 10
Received and repaired: items 5, 7
Repair: -
Fail: items 3, 4, 6

Based on the discrimination power analysis, 5 items were at the good level, 2 items were at the received-and-repaired level, no item was at the repair level, and 3 items were at the fail level. As a result, 5 items met the criteria of a good test. These items filled the number of items needed from the second try-out. Consequently, the good items from the second and third try-outs can measure the students' knowledge.
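Discrimination power compares how often the upper-scoring and lower-scoring groups answer an item correctly: D = (correct in upper group - correct in lower group) / group size. The sketch below ranks students by total score, splits them into halves (an assumption; 27% groups are another common choice), and maps D onto the four categories reported above using Ebel-style cut-offs (0.40 and above good, 0.30 to 0.39 received and repaired, 0.20 to 0.29 repair, below 0.20 fail). These cut-offs are an assumption, because the text names the categories but not their boundaries. The sketch also returns the difficulty computed from the high and low groups combined, following the definition attributed to Madsen in the Discussion. The data and function names are invented.

```python
def discrimination_category(d):
    """Map a discrimination index to the categories used in the findings (assumed cut-offs)."""
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "received and repaired"
    if d >= 0.20:
        return "repair"
    return "fail"


def discrimination_analysis(scores, fraction=0.5):
    """Return (item, D, combined difficulty, category) for each column of a 0/1 matrix."""
    n = len(scores)
    totals = [sum(row) for row in scores]
    # Rank students by total score, highest first; ties keep their original order.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, int(n * fraction))  # size of each extreme group
    upper, lower = order[:k], order[-k:]
    results = []
    for j in range(len(scores[0])):
        right_upper = sum(scores[i][j] for i in upper)
        right_lower = sum(scores[i][j] for i in lower)
        # Single division on counts avoids floating-point error at the cut-offs.
        d = (right_upper - right_lower) / k
        p = (right_upper + right_lower) / (2 * k)  # high and low groups combined
        results.append((j + 1, d, p, discrimination_category(d)))
    return results


# Invented answers of ten students on three items (1 = correct).
scores = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 1],
    [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 1],
]
for item, d, p, category in discrimination_analysis(scores):
    print(item, d, p, category)
# 1 0.6 0.5 good
# 2 0.6 0.5 good
# 3 0.0 0.4 fail
```

Item 3 is answered correctly equally often by both groups, so it does not separate stronger from weaker students and falls into the fail category.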
The aim of the final product revision was to determine whether the product was ready to use, that is, whether it met the criteria of a good test. In this step, the product was analyzed for validity, reliability, difficulty level, and discrimination power, and the researcher revised it based on the expert's comments. However, after the analysis and the expert's comments, one characteristic of a good test was still not fulfilled: reliability. This is explained in the Discussion section.

Evaluation
As planned, both kinds of evaluation, formative and summative, were carried out in this phase. Formative evaluation was ongoing, during and between phases, to improve the quality of the test content before the final version of the test was implemented; summative evaluation was the final evaluation of the test design process.

Discussion
This part presents the results of the data analysis. The data were obtained through the five R&D steps adopted from the ADDIE model (Shelton & Saltsman, 2011:566). The researcher analyzed validity, reliability, difficulty level, and discrimination power.
The final items were all valid, as analyzed across three try-outs. This is consistent with Popham:83, who states that validity is, hands down, the most significant concept in assessment. He argues that validity analysis is an important step because it helps the lecturer match the test to the materials. Therefore, lecturers should consider validity in the tests they make in order to create a good test.
All items were also analyzed for reliability, difficulty level, and discrimination power. The difficulty level and discrimination power analyses showed that the test items were balanced and acceptable. This agrees with Madsen:181, who defines difficulty level simply as the percentage of students (high and low groups combined) who answered each question correctly; this allowed the researcher to distinguish difficult items from easy ones. Discrimination power, in turn, reflects how well an item differentiates between students with more advanced language skills and those with less skill. Together, these analyses allowed the researcher to measure the students' ability through the test.
After analyzing all the characteristics of the test items, the researcher found a different result for reliability. The critical value in the r-table (5% significance level, 36 test takers) was 0.329, whereas the reliability coefficient computed from the students' answers was 0.286. Because 0.286 is below 0.329, the reliability of the items was not acceptable.
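The reliability decision above compares a computed coefficient with the r-table value. The text does not say which reliability formula produced 0.286, so the sketch below assumes an odd-even split-half coefficient stepped up with the Spearman-Brown formula, which is one common choice. It also shows where an r-table value comes from: the critical r for N test takers is t / sqrt(df + t^2) with df = N - 2. The value t = 2.032 for df = 34 at the 5% level (two-tailed) is taken from a standard t-table and supplied as an input, not derived by the code; the five-student data set and the function names are invented.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(scores):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown formula."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)


def critical_r(df, t_critical):
    """Convert a tabled critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(df + t_critical ** 2)


# The study's figures: 36 test takers (df = 34) and a computed coefficient of 0.286.
r_table = critical_r(34, 2.032)
print(round(r_table, 3))   # 0.329
print(0.286 > r_table)     # False, so the test is judged not reliable

# Invented answers of five students on four items, to exercise the split-half step.
scores = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(scores), 3))  # 0.828
```

If a different formula (for example KR-20) was actually used in the study, only split_half_reliability would change; the comparison against the r-table value stays the same.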
The reliability was not acceptable because the results did not meet the criteria of reliability, namely consistency and dependability. The test can still be used in a classroom to measure students' ability, but it cannot yet become part of a test bank. Because the test was not consistent, it should be used only in particular situations, for example to measure students' ability in a midterm or final examination. Another reason was that the data analyzed were unbalanced.
The diagram of the results showed that the students' ability and the difficulty level of the test were unevenly distributed (see Appendix 7); the statistical chart is given in Appendix 8. Because of time limitations, the researcher decided to end the research at this point, leaving further development for future research.

CONCLUSION
Based on the research findings and discussion above, the researcher draws the following conclusions. First, the difficulty level analysis showed that the items are good to use with the students, because the items were balanced across the easy, average, and difficult levels. Second, the discrimination power analysis showed that all the test items were good and acceptable. Third, the validity analysis showed that the product was good. Fourth, the reliability analysis showed that the product was not reliable.

Suggestion
Concerning the results of this research, the researcher offers the following suggestions:
1. This research found that the validity, discrimination power, and difficulty level were appropriate to the purpose of the research, but the reliability was not. Future researchers should therefore focus more on reliability analysis.
2. Lecturers should know how to develop vocabulary tests based on difficulty level, discrimination power, validity, and reliability, because a good test is essential for measuring the knowledge and ability of the students.
3. Lecturers should analyze their test items before using them, considering that not all students do well on the same test and that students have different levels of ability.
4. When lecturers take test items from a test bank, they have to select items that are appropriate to the materials, considering that not every test matches the material the students have learned.

Figure 2. ADDIE Model (diagram by Steven J. McGriff)
In the first try-out, the 13 valid items had validity indexes that met the table of critical values of the product moment given in Arikunto (2003:76). Two items, 9 and 23, were classified as received and repaired, as their validity indexes rose to the values in the table.

Table IV. Validity index

The reliability of the items was also analyzed: if the computed r for a test item did not meet the critical value in the product-moment table, the items were considered not reliable.