Pragmatic rating of L2 refusal: criteria of native and nonnative English teachers.
Alemi, Minoo ; Tajeddin, Zia
Introduction
One component of pragmatic competence is to know how to perform a
particular speech act. Among speech acts, refusal is highly complicated,
primarily because it often involves lengthy negotiations and face-saving
manoeuvres to accommodate the noncompliant nature of the speech act.
Since refusal normally functions as a second pair part, it precludes
extensive planning on the part of the refuser.
Against this backdrop, the study of rating speech acts such as
refusal is salient on two grounds. First, refusal is a commonly used
speech act in the process of communication and hence is a constituent of
many pragmatic assessment tasks. Furthermore, as the speech act of
refusal is realized differently across cultures and communicative
situations, nonnative teachers should become familiar with native criteria for rating refusal production, particularly in the expanding circle (Kachru, 1997), where there are no established local English norms for pragmatic appropriateness. Despite such saliency in the expanding circle context, little research has been conducted to date on the criteria used
by nonnative English speakers (NNESs) in rating refusal production as
measured against the native English speaker (NES) baseline
sociopragmatic and pragmalinguistic norms for rating the appropriateness
of speech act production. This is particularly important in a foreign
language context or in the expanding circle where there is no local
variety of English and hence nonnative speakers are "norm
dependent," that is, dependent on native speaker norms in their
rating (Kachru, 1992). Accordingly, this study aimed to investigate the
pragmatic rating of second language (L2) refusal production by nonnative
teachers as measured against native English-speaking teachers'
ratings.
Rating of Learner Productions
In performance assessment, our judgments are affected by our perceptual vantage points. Rater perceptions introduce subjective factors that can compromise the accuracy of ratings. Rater bias becomes a major problem when raters judge learners' performance using criteria that are vague or highly subjective; such criteria invite inconsistency and inaccuracy. In fact, assessment of learners'
performance is a complex process with many ramifications. Knoch, Read,
and von Randow (2007) point out that raters' judgments are prone to
various sources of bias and error that can ultimately undermine the
quality of the ratings.
A number of studies using different psychometric methods have
identified various rater effects (e.g., Myford & Wolfe, 2003, 2004)
that need to be addressed if an acceptable level of reliability is to be
maintained. Rater effects can be summarized as (a) the severity effect,
(b) the halo effect, (c) the central tendency effect, (d) inconsistency,
and (e) the bias effect (Myford & Wolfe, 2003). Studies focusing on
language performance assessments, as reviewed by Eckes (2005), showed a
significant range of rater effects. These studies, in particular,
identified differences in raters' severity or leniency (e.g.,
Engelhard, 1994; Engelhard & Myford, 2003; Lumley & McNamara,
1995). These differences were found to be resistant to rater training
(Barrett, 2001; Lumley & McNamara, 1995; Weigle, 1998) and to
persist in raters for a long time (Fitzpatrick, Ercikan, Yen, &
Ferrara, 1998). Furthermore, researchers identified significant effects
for rater-ratee interaction (Kondo-Brown, 2002; Lynch & McNamara,
1998), rater-task type interaction (Lynch & McNamara, 1998;
Wigglesworth, 1993), and rater-criteria interaction (Wigglesworth,
1993).
Rater effects need more attention, as they are sources of
systematic variance in observed ratings associated with raters rather
than ratees (Cronbach, 1995; Hoyt, 2000; Myford & Wolfe, 2003). As a
result, rater effects that are irrelevant to the construct being rated
threaten the validity of the assessment procedure (Bachman, 2004;
Messick, 1989, 1995; Weir, 2005). Two rater effects related to the main
theme of this study are severity and inconsistency. The former occurs
when raters are found to rate either too harshly or too leniently, as
compared with other raters or established baseline ratings. The latter
is exhibited when raters tend to rate in terms of different criteria or
the inconsistent application of criteria. For example, they might favour
a certain group of test takers or mainly apply one criterion at the
expense of others. The variability of ratings as a result of these
effects has been addressed in studies on speaking and writing (e.g.,
Schaefer, 2008; Shi, 2001). One source of rater variability is the
status of the rater as a native or nonnative speaker. It is therefore important to determine whether native English-speaking and nonnative English-speaking raters use the same criteria for rating tasks. However, the results of studies comparing NES and NNES raters of oral and written language performance vary. Barnwell (1989) found that NESs were harsher
in their evaluations than NNESs, whereas others, such as Fayer and
Krasinski (1987), found that NNES raters were more severe. For instance,
Fayer and Krasinski collected samples of speech act production from Puerto Rican learners of English and gave them to two groups of raters: NESs and Puerto Rican speakers. Their results revealed that NNES raters were
harsher, especially with respect to pronunciation errors, than NES
raters.
Although the literature is replete with references to native
speaker assessment of speaking and writing performance, it seems that
only two studies on rater variability are related to pragmatic rating
(Taguchi, 2011; Youn, 2007). Taguchi studied native speakers'
ratings of two types of speech acts produced by EFL learners. The data
revealed similarities and differences in the raters' use of
pragmatic norms and social rules in evaluating the appropriateness of
speech acts. Focusing on Korean as a foreign language, Youn's study
showed different degrees of severity in native Korean raters'
ratings of speech act performance. However, there is no mention of
native raters' criteria compared with nonnative raters' on
pragmatic assessment. As a result, this issue is still underexplored.
Refusal: Nature and Strategies
Refusal functions as a response to an initiating act and is
considered to be a speech act in which "a speaker fails to engage
in an action proposed by the interlocutor" (Chen, Ye, & Zhang,
1995, p. 121). Refusal is a face-threatening act because it contradicts
the listener's wants. The negotiation of refusal thus involves varying degrees of directness, indirectness, and politeness appropriate to the situation (Eslami, 2010). In addition,
refusal behaviours vary across cultures, and pragmatic transfer occurs
as learners rely on their "deeply held native values to carry out
complicated and face-threatening speech acts like refusals" (Beebe,
Takahashi, & Uliss-Weltz, 1990, p. 68). Hence, a proper
understanding and production of refusal and, in turn, its rating require
a certain amount of culture-specific knowledge.
As refusal is face-threatening, it usually involves a long
negotiated sequence, and its form and content vary, depending on
situational variables such as power, distance, and imposition. Saying
"no" to requests, invitations, offers, and suggestions is a
kind of dispreferred action that is typically complex, mitigated,
indirect, and accompanied by prefaces, hesitations, repairs, apologies,
and accounts (e.g., Levinson, 1983; Pomerantz, 1984).
Various strategies should be employed to avoid offending one's
interlocutors. Takahashi and Beebe (1987) noted that an inability to say
"no" politely will lead to an offense. Due to the different
nature of this speech act, as well as some degree of risk-taking
involved in refusing, pragmatic knowledge helps EFL learners realize
appropriate strategies. However, a layer of complexity related to
cultural issues exists and, in some cases, such as found in Ishihara and
Tarone's (2009) study, L2 speakers intentionally resist what they
perceive as native-speaker norms.
Beebe et al. (1990) categorized refusal into semantic formulas and
adjuncts appropriate for refusal strategies. This taxonomy includes both
direct and indirect strategies. In the direct category, two semantic
formulas are included. They are performative (e.g., I refuse it) and
nonperformative statements (e.g., I can't). In indirect strategies,
there are 11 semantic formulas: statement of regret, wish,
excuse/reason/explanation, statement of alternative, set condition for
future or past acceptance, promise of future acceptance, statement of
principle, statement of philosophy, attempt to dissuade interlocutor,
acceptance that functions as a refusal, and avoidance.
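For illustration only (this sketch is not part of the original study), the taxonomy above can be encoded as a simple coding scheme of the kind a content analysis might use; the data structure and helper function below are assumptions of this sketch, not an artifact of Beebe et al.'s work.

```python
# A minimal sketch encoding Beebe et al.'s (1990) refusal taxonomy
# as a coding scheme. Names and structure are illustrative assumptions.

REFUSAL_TAXONOMY = {
    "direct": [
        "performative",               # e.g., "I refuse"
        "nonperformative statement",  # e.g., "I can't"
    ],
    "indirect": [
        "statement of regret",
        "wish",
        "excuse/reason/explanation",
        "statement of alternative",
        "set condition for future or past acceptance",
        "promise of future acceptance",
        "statement of principle",
        "statement of philosophy",
        "attempt to dissuade interlocutor",
        "acceptance that functions as a refusal",
        "avoidance",
    ],
}

def category_of(formula: str) -> str:
    """Return 'direct' or 'indirect' for a known semantic formula."""
    for category, formulas in REFUSAL_TAXONOMY.items():
        if formula in formulas:
            return category
    raise KeyError(f"unknown semantic formula: {formula!r}")
```

A coder tagging a learner response with, say, "wish" could then recover its category with `category_of("wish")`, yielding "indirect".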
The Current Study
This study was aimed at investigating native English-speaking and nonnative English-speaking raters' criteria for rating EFL learners' pragmatic production of refusals. To do so, the following research questions were addressed:
1. What criteria are used by native and nonnative English-speaking raters in rating the speech act of refusal produced by EFL learners?
2. Is there any significant difference between native and nonnative English-speaking raters in rating the speech act of refusal produced by
EFL learners?
Method
Participants
One group of participants included 50 educated native teachers of
English from the United States, the United Kingdom, Canada, and
Australia. The homepage data and the background information they
provided clearly showed that they were NESs from these four countries.
They were faculty members teaching ESL at different language centres in
international universities. The other group consisted of 50 NNES
teachers. Each had at least three years of teaching experience and held
an MA degree in applied linguistics. The nonnative teachers were from
different language centres in Iran, where English is taught as a foreign
language. Both groups were asked to participate in this study via
e-mail. Both groups included male and female teachers.
Instrument
A written discourse completion test (WDCT) was used to collect the
data in this study, as it is a common measure to elicit learners'
production of pragmatics. It was made up of six refusal situations
reflecting different degrees of formality, power relation, and distance
(see Appendix). The situations included educational contexts, workplace
contexts, and daily-life contexts. In terms of power status and
familiarity, the situations were marked by equal and unequal power
relations, as well as familiar and unfamiliar interlocutors. Each
situation was followed by a response given by an EFL learner. A number
of EFL learners were asked to provide a response to each situation. Of
the responses, one was selected by the researchers for each situation to
ensure that the responses to the six situations varied in their degrees
of pragmatic appropriateness. Thus the focus in the selection procedure was placed on pragmatic failure or appropriateness rather than grammatical inaccuracy, as reflected in the choice of the words "unsatisfactory" and "appropriate" in the rating scale. Every response was followed by a rating
scale ranging from 1 (very unsatisfactory) to 5 (most appropriate).
Below the rating scale for each response, there was a space entitled
"criteria" so that the raters could write comments on the
pragmatics criteria they applied to the rating of the response to each
situation.
Data Collection Procedure
The refusal WDCT was administered in paper format to about 20 EFL
students. They were studying for a BA program in English literature or
translation in an Iranian university, and their L1 was Persian. The
responses to each situation were reviewed by the researchers, and one response was selected for each situation. After this selection, the WDCT was
sent electronically to NES teachers to rate the appropriateness of
responses on a 5-point Likert scale and to write the criteria for their
rating in comment format. The questionnaire was first uploaded to the
SurveyMonkey[R] site, and native ESL teachers in different universities
in the United States, the United Kingdom, Canada, and Australia were
asked via e-mail to complete the questionnaire on that site
electronically. Of the 800 teachers contacted via e-mail, 50 filled out
the questionnaire completely. Of the 106 nonnative teachers contacted,
50 completed the rating sheets and returned the WDCT with their rating
comments.
Data Analysis
The current study investigated the rating of L2 refusal production
by NES and NNES teachers. In part, it used the content analysis
technique to analyze the data. To derive the criteria that both native
and nonnative raters considered in rating EFL learners' refusal
production, the content of their comments about the pragmalinguistic and
sociopragmatic appropriateness or infelicity of each response was
analyzed. The analysis of criteria based on the comments consisted of two
steps. The first was a careful analysis of refusal strategy frameworks
based on a modified version of Beebe et al.'s (1990) taxonomy.
Although the strategies in that framework represented refusal production
rather than functioning as a rating rubric, they helped to identify in
the raters' comments criteria related to the (in)appropriateness of
refusal in terms of the underrepresentation, overrepresentation, or
nonrealization of certain strategies in response to a situation in the
WDCT. The second source of insight was Brown and Levinson's (1987)
politeness model, in which strategies of positive and negative
politeness are depicted. The model contributed to the analysis of the
criteria relevant to the violation of politeness in refusal production
reflected in the raters' comments. In the quantitative part of the
data analysis, frequency counts and t-tests were conducted to measure
the difference between the refusal ratings of native raters and
nonnative raters.
Results
Refusal Rating Criteria
Research Question 1 was concerned with the criteria used by NES and
NNES teachers in rating the speech act of refusal produced by EFL
learners. To derive the criteria that both NES and NNES raters used, the
content of their comments stating the reasons for the pragmatic
appropriateness of each response was analyzed. This analysis resulted in
11 criteria for rating refusal. The criteria, as described below, show
that both NES and NNES teachers specified pragmatic, rather than
grammatical, features as a source of their rating of refusal production.
(1) Brief apology. This refusal criterion is important as it
prepares the interlocutor for an upcoming refusal. Two examples of this
criterion derived from NES and NNES rating comments are given below.
NES comment: I would add an apology before refusing the invitation.
NNES comment: An apology is needed before any refusal.
(2) Statement of refusal. The second refusal criterion, a statement
of refusal, is a head act expressing the refusal and giving a clear idea
of rejection to an interlocutor. An example of the application of this
criterion by NES raters is given below. NNES raters did not use this
criterion.
NES comment: A proper refusal should include a statement of refusal
in terms that are both specific and in a tone appropriate to the social
relationship between the one refusing and the requester or inviter on
certain occasions.
(3) Offer suitable consolation. This criterion, an offer of
suitable consolation, follows the head act to mitigate the refusal. Like
the previous criterion, NNES raters did not employ this criterion to
rate the WDCT.
NES comment: If I were her, I would offer a suitable consolation
and say "Could we possibly have lunch some other day?"
(4) Irrelevancy of refusal. This criterion captured responses that failed to function as refusals at all. In some cultures, refusal is so indirect that
the addressee cannot understand whether it is a refusal or an acceptance
of an offer or invitation.
NES comment: This sounds like an acceptance of the apology, not a
refusal.
NNES comment: It is an apology acceptance, not refusal!
(5) Explanation/Reasoning. The fifth refusal criterion was an
explanation that follows the head act to justify the refusal. After
refusing an offer, an invitation, a suggestion, or a piece of advice,
some explanation is needed to soften the face-threatening effect.
NES comment: A bit more effort to explain the reason would be
required here.
NNES comment: In my opinion, frankly speaking and elaborating on
the main issue and reason is better than evading the issue.
(6) Cultural problem. Because pragmatic competence is highly
dependent on culture, cultural misinterpretation occurs in EFL contexts.
NNES raters did not apply this criterion in their ratings.
NES comment: This might be something cultural but I could never say
so.
(7) Dishonesty. This criterion is sometimes misinterpreted as
indirectness in refusal and may result in offering false excuses rather
than giving reasons. Only NES raters referred to this criterion in their
comments.
NES comment: Being more honest with your reasons for not wanting to
ride together would have been easier for the old friend to take.
(8) Thanking. The eighth criterion was thanking, which is a
mitigation device to soothe the face-threatening effect and to console
the hearer. NNES raters gave no comment representing this criterion.
NES comment: First, you should thank her for the invitation and
then explain the reason for refusing it.
(9) Postponing to another time. This criterion reflected the need
for mitigation so that the face-threatening effect could be softened by
postponing the offer or request to another time. Both NES and NNES
raters used this criterion to rate the appropriateness of WDCT responses.
NES comment: The speaker should say, "Can I take you up on
your offer some other time?"
NNES comment: The speaker could postpone the invitation to
[an]other time politely.
(10) Statement of alternative. This criterion was used to evaluate
learners' success/failure in giving other choices after a refusal,
in order to ease the situation for the hearer. The raters' comments
below document the significance of this criterion.
NES comment: You should say "We can arrange something else for
some other time."
NNES comment: In order to not hurt your friend, it's better to
ask her to have copies of your notes instead of lending them.
(11) Politeness. The last criterion was politeness, the
interpretation of which varies in different cultures. In fact, its
interpretation depends on the values of social distance, dominance, and
degree of imposition in a given context. The severity of its violation
varies cross-culturally. Both NES and NNES raters took this criterion
into account in rating WDCT responses.
NES comment: I would not criticize [the] other person without
knowing more about the circumstances they are in, so I find this
response a bit rude.
NNES comment: He should politely reject the suggestion to show the
respect.
Table 1 shows the frequency of NES and NNES raters' criteria
for the total WDCT and across all six situations. As some NNES teachers
failed to provide criteria for their pragmatic rating in certain WDCT
contexts, the total number of their criteria was far less than that of
NES teachers. This shows that the former group had comparatively lower
pragmatic awareness of the rationale behind the (in)appropriateness of
the refusal produced in a WDCT situation.
In general, the raters' comments on the refusals across the
situations manifested many sources of inappropriateness, such as lack of
explanation, politeness, cultural problem, postponing to another time,
using brief apology expression appropriately, offering repair, and
thanking. NES and NNES raters did not agree in their ratings of most of the refusal cases. Moreover, their criteria differed, reflecting differences in awareness of appropriate refusal strategies on the one hand and of English sociocultural norms on the other. Results of the study indicate that, to
make an accurate assessment of students' performance, NES and NNES
teachers frequently applied a variety of relatively stable criteria that
remained applicable from situation to situation. The criteria common
across situations were explanations, politeness, cultural problems,
speech act appropriateness, and offer compensation.
NES-NNES Refusal Ratings
Research Question 2 was raised to investigate the difference
between NES and NNES teachers in rating the speech act of refusal
produced by EFL learners. To address the research question, descriptive
statistics were calculated and t-test procedures conducted. Table 2
presents descriptive statistics for refusal rating by NES and NNES
raters. As shown in the table, the overall mean refusal rating was 2.59
for NES raters and 3.29 for NNES raters. The highest mean for native
ratings across situations was 3.32 and the lowest was 2.06, while the
highest mean for non-native raters was 4.02 and the lowest was 2.56.
Table 2 shows that NNES raters' ratings for all situations were
higher than those of NES raters. Furthermore, standard deviations of
NNES ratings for the total WDCT and all six situations therein were
found to be greater, showing less convergence in their ratings compared
with the NES ratings.
Next, an independent-samples t-test was conducted to compare the
difference in refusal rating between NES raters and NNES raters (Table
3). As displayed in the table, there was a significant between-group
difference in total refusal ratings (t = 7.21, df = 98, p < .001). NES
and NNES manifested variation in their ratings across all situations
except for Situation 4 (t = 0.30, df = 98, p = .76). As multiple t-tests were applied for the analysis of Research Question 2, the Bonferroni method was used to arrive at an adjusted alpha level and thereby control the Type I error rate.
The results showed that the differences in five situations (all except
Situation 6) remained significant after the Bonferroni correction.
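As a minimal sketch of the correction just described (not the study's actual analysis script), the Bonferroni adjustment divides the overall alpha by the number of comparisons. The size of the comparison family here is an assumption: the total WDCT rating plus the six situations, i.e., seven t-tests.

```python
# Bonferroni adjustment: each individual test is judged against
# alpha / n_tests rather than alpha itself, to control the
# family-wise Type I error rate.

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Assumed family of comparisons: 1 total rating + 6 situations = 7 tests.
adjusted = bonferroni_alpha(0.05, 7)
print(round(adjusted, 4))  # → 0.0071
```

Under this assumption, a situation-level difference would be declared significant only if its p value fell below roughly .007, a considerably stricter threshold than the unadjusted .05.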
These results, in conjunction with those related to the frequency
of criteria, indicate that NES and NNES teachers differed from each
other not only in their application of rating criteria to evaluation of
refusals made in different WDCT situations, but also in assigning scores
to rate the appropriateness of refusals.
Discussion
Similar to language assessment in general, pragmatic assessment can
be affected by three main variables: test task, rater characteristics,
and rating criteria. Many studies have shed light on the third variable
(rating criteria) for assessing the performance of language skills
(e.g., Eckes, 2005; Gamaroff, 2000). However, the interface between
rating criteria and pragmatics has remained largely unexamined. Hence,
this study was conducted in a multiple-raters setting with NES and NNES
raters to explore the impact of raters on pragmatic assessment of the
speech act of refusal in terms of rating criteria and rating scores.
The first objective of this study was to discover how NES and NNES
teachers rated L2 refusal production and what criteria they applied to
the evaluation of its appropriateness. With regard to raters'
rating criteria, the results of this study showed that NES and NNES
teachers applied certain criteria to evaluate the appropriateness of L2
refusal production. Many of these criteria are pragmatically general or
universal, in that they can be applied to the assessment of other speech
acts. Salient instances of such criteria were explanation and
politeness. The largely homogeneous rating criteria, particularly among
NES teachers from different nationalities, lend further support to the
universality of many pragmatic criteria.
Besides general pragmatics rating criteria, such as politeness and
explanation, this study shed light on criteria specific to refusal,
including brief apology, statement of refusal, and offer suitable
consolation. The findings indicate that speech act rating required an
awareness of specific criteria involved in the appropriateness of a
particular speech act. Unlike rating language skills, which largely
depends on a set of general criteria, pragmatics rating is, to some
extent, shaped by the nature of a particular speech act and the criteria
specifically related to it. It follows that both groups of raters drew
on two types of criteria in their rating: pragmatically general criteria
and speech-act-specific criteria. A very revealing aspect of this study
comes from the finding that most of the refusal-rating criteria
corresponded to the strategies needed to produce refusal. This is strong
evidence in favor of rating validity. Raters need to use the components
of a construct and the strategies underlying performance to maximize
their rating validity. In the case of refusal, such correspondence
strengthens the validity of pragmatic rating.
The findings from this study also revealed variability in different
situations among teachers as evidenced by the frequency of criteria
reported. The frequency-based variability may be a determining factor
affecting the rating of pragmatic performance. The most frequent
criterion mentioned by NES raters was explanation, which can be
attributed to the nature of the refusal speech act. However, NNES raters
applied politeness as the main criterion. It seems that NNES raters
mostly regarded politeness as a general criterion and hence overused it
to justify any inappropriate production of refusal; as a result, they lost sight of the fact that an appropriate refusal requires the provision of specific reasons for refusing or rejecting a suggestion, an invitation,
an offer, or a piece of advice. Variation in the frequency of rating
criteria reported across situations is also a manifestation of
divergence existing in evaluating the appropriateness of L2 refusal
production in each situation. For instance, in Situation 1, all
NES raters applied irrelevancy of refusal because the L2 learner had not
produced a refusal; however, NNES raters mostly applied politeness and
felt sympathy with the interlocutor, a domestic servant, in that WDCT
situation. The finding of this study is in line with that of Taguchi
(2011), which revealed divergent focus among raters of different
nationalities in their use of pragmatic norms when evaluating
appropriateness of speech acts.
The second objective of the study was to explore the ratings that
NES and NNES English-speaking teachers assigned to refusal production.
Results showed that NNES raters manifested different rating behaviour by
consistently overrating refusals across situations and thus being
inclined towards leniency in rating. This NES-NNES difference in refusal
rating can be explained in terms of variation in their perceptions of
such variables as power, social status, and preferred refusal strategies
by native and nonnative speakers (e.g., Felix-Brasdefer, 2003;
Takahashi, 1996; Takahashi & Beebe, 1987). Nonnative speakers'
perception of social status, for example, is among the factors that
influence their estimation of appropriateness of L2 learners'
refusal production. This factor was considered in rating refusals in the
current study. For instance, in Situation 3, native raters commented on
"bother" as a cultural problem; however, nonnatives commented
on the lack of politeness. This is in line with Sadler and Eroz's (2001) finding that Turkish speakers refused less frequently than speakers of other languages, but when they did, their refusals were
definitely followed by an excuse or explanation. Moreover, the findings
of Al-Issa (2003) indicate that indirect strategies were favoured more
by the Jordanians than the Americans. In Honglin's (2007) study of
American and Chinese participants, the results revealed that the
Americans were more direct than the Chinese in their refusals, but that
the Chinese considered refusals as face-threatening acts and used
politeness strategies in their refusals. In essence, the Americans tried
to solve the problem, while the Chinese tried to restore the
relationship between interlocutors. This point is true of Situation 4,
in which native raters evaluated a refusal in terms of its directness,
whereas non-native raters were more concerned with politeness and
preserving the relationship with the interlocutor. However, despite
variation in the types of criteria NES and NNES raters employed to
measure the appropriateness of refusal in Situation 4, the ratings were
largely similar. This indicates that similarity in ratings does not
entail the application of the same criteria. Whereas rating scores are a
product-oriented measure of difference between NES and NNES raters, the
analysis of the criteria leading to rating scores, that is, a
process-oriented approach, is necessary for an in-depth understanding of
raters' rating behaviour. Similarly, in Youn's (2007) study,
the results revealed that each rater showed unique bias patterns,
depending on the test type and speech act.
Conclusions and Implications
The study revealed the criteria employed for refusal production
ratings by NES and NNES raters. The findings showed that NES raters
applied 11 criteria while assessing L2 refusal production. The criteria
common across situations in refusal for both NES and NNES raters were
brief apology, irrelevancy of speech act, explanations, postponing to
another time, statement of alternative, and politeness. Although NES and
NNES raters gave different weights to politeness, it was among the most
frequently employed criteria in both groups. The frequent reference of
raters to this criterion is compatible with the general perception of
politeness as the main measure of pragmatic appropriateness. Emphasized
in pragmatic literature, politeness seems to be the principle overriding
the other criteria for pragmatic appropriateness.
The premise that politeness is considered a pragmatic universal,
and hence has cross-linguistic and cross-cultural realizations, can
contribute to convergence on pragmatic rating. However, in view of the
fact that there are variations in the perception of both sociocultural
norms and pragmalinguistic realizations of politeness, the application
of the politeness criterion to the rating of speech acts showed
variability among NES and NNES raters. Mostly, NNES raters mentioned
politeness as a leading criterion while NES raters highlighted
explanation for the speech act of refusal. Moreover, mere mention of the
criterion of politeness may be misleading because variation arises when
it comes to the evaluation of the degree of politeness observed in the
production of a speech act.
Generally, NNES raters in this study were more lenient than NES
raters, which highlights the need for more pragmatically informed
ratings by NNES teachers. This can be achieved through rater training
programs in which the focus would be on helping NNES teachers recognize
effective criteria for rating pragmatic production and paving the way
for increasing accuracy in their ratings. Because rating criteria play a
significant role in pragmatic assessment, NNES teachers in EFL contexts
such as Iran, where there is insufficient pragmatic awareness of such
criteria for speech act production in English, should be encouraged to
participate in training programs that aim to raise their pragmatics
rating consciousness so that their rating criteria more closely
approximate those of NES raters. Such a program may include video clips
demonstrating native speakers' production of refusals, as well as
less appropriate refusals performed by nonnative speakers, along with
the rating scores and rating criteria assigned by native teacher raters.
NNES teachers, particularly in an EFL context where there are neither
any established local English norms nor any variety of world English to
function as a frame of reference, usually apply different rating
criteria to assess the same pragmatic production. In fact, raters may
have a different understanding of the construct being measured, and such
differences may have a direct influence on the ratings they assign to
test takers' performance in the testing context. As evident from
the results of this study, the scores that NES and NNES raters assigned
to students' performance were different, with NNES teachers being
more lenient than NES raters. This suggests that the two groups applied
different criteria to rate the same construct, which in turn signals
the need for rater training.
NNES teachers should become conscious of rating criteria through
training programs to increase their accuracy in interlanguage pragmatic
(ILP) rating as measured against the benchmark. Therefore, rater
training should be implemented in teacher education programs to improve
teachers' assessment practice, and decision makers should consider
such programs for EFL raters. Furthermore, the
significance of the politeness criterion has implications for the rating
of pragmatics. Given that this criterion features prominently in
pragmatics rating and consequently affects raters' judgment of
pragmatic appropriateness, NNES teachers should have sufficient
pragmalinguistic and sociopragmatic competence underpinning their
perception of politeness. Although NES raters are comparatively more
homogeneous in this regard, pragmatics rating by NNES raters, which is
most common in an EFL context, requires a good understanding not only of
the pragmalinguistic realization of politeness but also of L2 social
norms and conventions, particularly those diverging from L1 politeness
norms. As for NES raters, it should be noted that, had they been from
the same national background, they would likely have manifested more
homogeneous rating behaviour. This should be taken into account in the
interpretation of the NES data in this study.
Appendix
Refusal Rating
In the following situations, an English language learner was
supposed to make refusals. Please read the EFL learner's answer in
each situation and rate its appropriateness according to the following
rating scale. Then provide your criteria and reasons for the selection
of a particular point (1, 2, 3, 4, or 5) on the scale.
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
1. You arrive at your office and see that your cleaner is upset.
You notice that he has bumped into an antique vase while cleaning the
table and has broken the vase. He apologizes to you and wants to pay for
it, but you don't accept his apology. What would you say?
Answer: Hey, accidents happen. Don't worry about that. OK?
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
2. You have just started working in a new company. The first day,
you are walking in the hall and see one of your old friends from
university. You come to know that he lives next to you, in the same
neighborhood. He suggests that every day you go to work together in your
car, but you prefer to get to work alone, so you refuse his suggestion.
What would you say?
Answer: I'd love to but I can't. You are dear to me. I
wish I could.
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
3. You meet one of your professors in the hall at university. You
like him very much and you think he is the best professor at that
university. You go and greet him. He is very happy to see you and
invites you to lunch at the university cafeteria. Unfortunately, you
have promised your friends you would visit them for lunch, so you
can't accept his invitation. What would you say?
Answer: I'd love to sir, but I've promised some of my
friends to meet them for lunch. I hope you don't mind. Can I bother
you some other time? I really don't want to pass such a great
offer.
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
4. You are a junior in college. You attend classes regularly and
take good notes. Your classmate often misses class, and asks you for the
lecture notes. However, you need the notes yourself and can't lend
them to her, so you refuse her request. What would you say?
Answer: Actually I need the notes myself. Why don't you try to
attend the classes regularly?
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
5. You are going to refuse an invitation offered to you by your
colleague to an art gallery. What would you say?
Answer: Sorry, but I can't come. I've some things I
should take care of, you know.
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
6. You are trying to reject an invitation offered to you by your
older sister to her house for a dinner party. You are too busy to go.
How would you decline her invitation?
Answer: I'm so busy. Excuse me.
1. very unsatisfactory 2. unsatisfactory 3. somewhat appropriate 4. appropriate 5. most appropriate
Criteria:
References
Al-Issa, A. (2003). Sociocultural transfer in L2 speech behaviors:
Evidence and motivating factors. International Journal of Intercultural
Relations, 27(5), 581-601.
Bachman, L. F. (2004). Statistical analyses for language
assessment. Cambridge, UK: Cambridge University Press.
Barnwell, D. (1989). "Naive" native speakers and
judgments of oral proficiency in Spanish. Language Testing, 6(2),
152-163.
Barrett, S. (2001). The impact of training on rater variability.
International Education Journal, 2(1), 49-58.
Beebe, L. M., Takahashi, T., & Uliss-Weltz, R. (1990).
Pragmatic transfer in ESL refusals. In R. C. Scarcella, E. S. Andersen,
& S. D. Krashen (Eds.), Developing communicative competence in a
second language (pp. 55-73). Cambridge, MA: Newbury House.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in
language usage. Cambridge, UK: Cambridge University Press.
Chen, X., Ye, L., & Zhang, Y. (1995). Refusing in Chinese. In
G. Kasper (Ed.), Pragmatics of Chinese as native and target language
(pp. 119-163). Honolulu, HI: University of Hawai'i Press.
Cronbach, L. J. (1995). Giving method variance its due. In P. E.
Shrout & S. T. Fiske (Eds.), Personality research, methods, and
theory: A Festschrift honoring Donald W. Fiske (pp. 145-157). Hillsdale,
NJ: Lawrence Erlbaum.
Eckes, T. (2005). Examining rater effects in TestDaF writing and
speaking performance assessments: A many-facet Rasch analysis. Language
Assessment Quarterly, 2(3), 197-221.
Engelhard, G., Jr. (1994). Examining rater errors in the assessment
of written composition with a many-faceted Rasch model. Journal of
Educational Measurement, 31(2), 93-112.
Engelhard, G., Jr., & Myford, C. M. (2003). Monitoring faculty
consultant performance in the Advanced Placement English Literature and
Composition Program with a many-faceted Rasch model (College Board
Research Report No. 2003-1). New York, NY: College Entrance Examination
Board.
Eslami, Z. R. (2010). Refusals: How to develop appropriate refusal
strategies. In A. Martinez-Flor & E. Uso-Juan (Eds.), Speech act
performance: Theoretical, empirical and methodological issues (pp.
217-236). Amsterdam: John Benjamins.
Fayer, J. M., & Krasinski, E. (1987). Native and nonnative
judgments of intelligibility and irritation. Language Learning, 37(3),
313-326.
Felix-Brasdefer, J. C. (2003). Declining an invitation: A
cross-cultural study of pragmatic strategies in American English and
Latin American Spanish. Multilingua: Journal of Cross Cultural and
Interlanguage Communication, 22(3), 225-255.
Fitzpatrick, A. R., Ercikan, K., Yen, W. M., & Ferrara, S.
(1998). The consistency between raters scoring in different test years.
Applied Measurement in Education, 11(2), 195-208.
Gamaroff, R. (2000). Comment: ESL and linguistic apartheid. ELT
Journal, 54(3), 297-298.
Honglin, L. (2007). A comparative study of refusal speech acts in
Chinese and American English. Canadian Social Science, 3(4), 64-67.
Hoyt, W. T. (2000). Rater bias in psychological research: When is
it a problem and what can we do about it? Psychological Methods, 5(1),
64-86.
Ishihara, N., & Tarone, E. (2009). Subjectivity and pragmatic
choice in L2 Japanese: Emulating and resisting pragmatic norms. In N.
Taguchi (Ed.), Pragmatic competence (pp. 101-128). Berlin, Germany:
Mouton de Gruyter.
Kachru, B. B. (1992). World Englishes: Approaches, issues and
resources. Language Teaching, 25(1), 1-14.
Kachru, B. B. (1997). World Englishes and English-using
communities. Annual Review of Applied Linguistics, 17, 66-87.
Kim, Y. H. (2009). An investigation into native and non-native
teachers' judgments of oral English performance: A mixed methods
approach. Language Testing, 26(2), 187-217.
Knoch, U., Read, J., & von Randow, J. (2007). Re-training
writing raters online: How does it compare with face-to-face training?
Assessing Writing, 12(1), 26-43.
Kondo-Brown, K. (2002). A FACETS analysis of rater bias in
measuring Japanese second language writing performance. Language
Testing, 19(1), 3-31.
Levinson, S. C. (1983). Pragmatics. Cambridge, UK: Cambridge
University Press.
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and
rater bias: Implications for training. Language Testing, 12(1), 54-71.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and
many-facet Rasch measurement in the development of performance
assessments of the ESL speaking skills of immigrants. Language Testing,
15(2), 158-180.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational
measurement (3rd ed., pp. 13-103). New York, NY: Macmillan.
Messick, S. (1995). Validity of psychological assessment:
Validation of inferences from persons' responses and performances
as scientific inquiry into score meaning. American Psychologist, 50(9),
741-749.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring
rater effects using many-facet Rasch measurement: Part I. Journal of
Applied Measurement, 4(4), 386-422.
Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring
rater effects using many-facet Rasch measurement: Part II. Journal of
Applied Measurement, 5(2), 189-227.
Plough, I. C., Briggs, S. L., & Van Bonn, S. (2010). A
multi-method analysis of evaluation criteria used to assess the speaking
proficiency of graduate student instructors. Language Testing, 27(2),
235-260.
Pomerantz, A. (1984). Agreeing and disagreeing with assessments:
Some features of preferred/dispreferred turn shapes. In J. M. Atkinson
& J. Heritage (Eds.), Structures of social action: Studies in
conversation analysis (pp. 57-101). Cambridge, UK: Cambridge University
Press.
Sadler, R. W., & Eroz, B. (2001). "I refuse you!" An
examination of English refusals by native speakers of English, Lao, and
Turkish. Arizona Working Papers in SLAT, 9, 53-80.
Schaefer, E. (2008). Rater bias patterns in an EFL writing
assessment. Language Testing, 25(4), 465-493.
Shi, L. (2001). Native- and nonnative-speaking EFL teachers'
evaluation of Chinese students' English writing. Language Testing,
18(3), 303-325.
Taguchi, N. (2011). Rater variation in the assessment of speech
acts. Pragmatics, 21(3), 453-471.
Takahashi, S. (1996). Pragmatic transferability. Studies in Second
Language Acquisition, 18(2), 189-223.
Takahashi, T., & Beebe, L. M. (1987). The development of
pragmatic competence by Japanese learners of English. JALT Journal, 8,
131-155.
Weigle, S. C. (1998). Using FACETS to model rater training effects.
Language Testing, 15(2), 263-287.
Weir, C. J. (2005). Language testing and validation: An
evidence-based approach. New York, NY: Palgrave Macmillan.
Wigglesworth, G. (1993). Exploring bias analysis as a tool for
improving rater consistency in assessing oral interaction. Language
Testing, 10(3), 305-319.
Youn, S. J. (2007). Rater bias in assessing the pragmatics of KFL
learners using facets analysis. Second Language Studies, 26(1), 85-163.
Zhang, Y., & Elder, C. (2011). Judgments of oral proficiency by
non-native and native English speaking teacher raters: Competing or
complementary constructs? Language Testing, 28(1), 31-50.
The Authors
Minoo Alemi holds a PhD in Applied Linguistics and is a faculty
member of Sharif University of Technology, Iran. She is currently doing
her postdoctoral research on Robot-Assisted Language Learning (RALL).
Her areas of interest include discourse analysis, interlanguage
pragmatics, and materials development.
Zia Tajeddin is associate professor of applied linguistics at
Allameh Tabataba'i University, Iran, and the Director of Iranian
Interlanguage Pragmatics SIG. His areas of interest include (critical)
discourse analysis, interlanguage pragmatics, L2 learner/teacher
identity, and sociocultural theory.
Table 1
Frequency of Refusal Criteria among NES and NNES Raters.
Situation   Group   BA     SOR    OSC    IOR     E/R     CP      D      T      PAT    SOA    P       Total
1           NNES    0      0      0      10      0       0       0      0      0      0      3       13
            NES     0      0      0      32      0       0       0      0      0      0      0       32
2           NNES    0      0      0      0       10      0       0      0      0      0      10      20
            NES     3      1      1      0       28      5       5      1      0      0      5       49
3           NNES    2      0      0      0       6       0       0      0      5      1      14      28
            NES     1      2      3      0       8       24      1      0      7      5      1       52
4           NNES    3      0      0      0       2       0       0      0      0      6      15      26
            NES     2      1      2      0       9       0       1      0      0      5      30      50
5           NNES    1      0      0      0       10      0       0      0      2      0      7       20
            NES     3      2      2      0       18      4       1      2      2      0      10      44
6           NNES    3      0      0      0       8       0       0      0      2      0      8       21
            NES     9      1      1      0       20      0       0      6      2      0      12      51
Total       NNES    9      0      0      10      36      0       0      0      9      7      57      128
            NES     18     7      9      32      83      33      8      9      11     10     58      278
Percentage  NNES    7.03%  0%     0%     7.81%   28.13%  0%      0%     0%     7.03%  5.54%  44.53%
            NES     6.47%  2.52%  3.24%  11.51%  29.86%  11.87%  2.88%  3.24%  3.96%  3.60%  20.86%
Note. BA = brief apology; SOR = statement of refusal; OSC = offer
suitable consolation; IOR = irrelevancy of refusal;
E/R = explanation/reasoning; CP = cultural problem; D = dishonesty;
T = thanking; PAT = postponing to another time; SOA = statement of
alternative; P = politeness.
Table 2
Descriptive Statistics of Ratings by NES Raters and
NNES Raters for Refusal
Situation     Group   N    Mean   Std. Deviation
Situation 1   NNES    50   3.18   1.47
              NES     50   2.08   1.10
Situation 2   NNES    50   3.50   1.15
              NES     50   2.58   .83
Situation 3   NNES    50   4.02   1.13
              NES     50   3.32   1.02
Situation 4   NNES    50   3.00   1.05
              NES     50   2.94   .93
Situation 5   NNES    50   3.50   1.01
              NES     50   2.56   .79
Situation 6   NNES    50   2.56   1.16
              NES     50   2.06   .77
Total         NNES    50   3.29   1.29
              NES     50   2.59   .83
Table 3
T-tests of Mean Differences in Refusal Ratings by NES
Raters and NNES Raters
             Levene's Test for
             Equality of Variances   t-test for Equality of Means
                                                    Sig.         Mean         Std. Error   95% CI of the Difference
             F        Sig.     t       df   (2-tailed)   Difference   Difference   Lower    Upper
Situation 1  12.49    .001     4.23    98   .000         1.10         .26          .58      1.61
Situation 2  8.11     .005     4.58    98   .000         .92          .20          .52      1.31
Situation 3  .28      .597     3.24    98   .002         .70          .21          .27      1.12
Situation 4  1.31     .254     .302    98   .763         .06          .19          -.33     .45
Situation 5  3.40     .068     5.17    98   .000         .94          .18          .58      1.30
Situation 6  15.55    .000     2.53    98   .013         .50          .19          .10      .89
Total        .01      .910     7.21    98   .000         .70          .09          .50      .89
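As an illustrative check, not part of the original analysis, the equal-variance t statistics in Table 3 can be recomputed from the group sizes, means, and standard deviations in Table 2 using the standard pooled-variance formula. The minimal Python sketch below does this for Situation 1; small discrepancies from Table 3 (e.g., 4.24 vs. 4.23) reflect the rounding of the summary statistics.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t-test (equal variances assumed) from summary statistics."""
    # Pooled variance: weighted average of the two group variances
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the mean difference
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    df = n1 + n2 - 2
    return t, se, df

# Situation 1 from Table 2: NNES mean 3.18 (SD 1.47), NES mean 2.08 (SD 1.10), n = 50 each
t, se, df = pooled_t(3.18, 1.47, 50, 2.08, 1.10, 50)
print(round(t, 2), round(se, 2), df)  # 4.24 0.26 98 (Table 3 reports t = 4.23, SE = .26, df = 98)
```

The same function reproduces the remaining rows of Table 3 from the corresponding rows of Table 2.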