
Article information

  • Title: The accuracy, agreement and coherence of decision-making in rugby union officials.
  • Authors: Mascarenhas, Duncan R.D.; Collins, Dave; Mortimer, Patrick
  • Journal: Journal of Sport Behavior
  • Print ISSN: 0162-7341
  • Year of publication: 2005
  • Issue: September
  • Language: English
  • Publisher: University of South Alabama
  • Keywords: Referees; Rugby football; Sports officiating

The accuracy, agreement and coherence of decision-making in rugby union officials.


Mascarenhas, Duncan R.D.; Collins, Dave; Mortimer, Patrick, et al.


Inaccurate decision-making by game officials can change the course of a game, may carry significant financial implications for the clubs, and can thereby alter the course of a player's career (Craven, 1999). In many professional team sports, referees have to consider numerous sources of information, make rapid decisions, and contend with commentators who scrutinize their accuracy using slow-motion replays, often drawn from several different camera angles. Yet comparatively little is being done to improve the performance of officials (Ford, Gallagher, Lacy, Bridwell, & Goodwin, 1999; see Garcia, 2003; cf. Ste-Marie, 2003), despite initiatives such as the British World Class Performance Plan, the United States Olympic Centre and the Australian Institute of Sport seeking to enhance the performance of competitors (Eady, 1999). Furthermore, England's rugby union premier league coaches want far greater consistency in the application of the laws (Melrose, 1998). Consequently, more research is needed to develop accurate, objective and reliable measurement systems for assessing referee performance (Sloan, 2004) and subsequently for deploying psychologically based training methods to enhance such performance (see Oudejans, Verheijen, Bakker, Gerrits, Steinbrückner, & Beek, 2000).

However, an essential precursor to such enhancement is the identification of those factors that determine performance in this particular sphere.

Research that has focused on referee performance found that rugby union and basketball officials believed "demonstrating a mastery of the rules" to be the most important aspect of referee performance (Anshel, 1995; Anshel & Webb, 1991).

Such mastery of the rules, or laws in the case of rugby union, demands rapid decision-making, requiring referees to evaluate the important characteristics of an event and present an appropriate solution in about 1 second (Jones, Paull, & Erskine, 2002), without the opportunity for reassessment or contemplation of the implications of their decision. Referees have to respond quickly to dynamically unfolding events, which may hold many uncertainties and ambiguities, and often do so in response to input from touch-judges (the officials responsible for controlling the sidelines in rugby union, who have a microphone link to the referee).

Thus, investigating referee performance through a Naturalistic Decision Making (NDM) perspective, defined as the study of experts making decisions in complex environments under time pressure where incomplete and ambiguous information is inherent, seems appropriate as a basis for psychological intervention (Cannon-Bowers, Salas, & Pruitt, 1996; Orasanu & Connolly, 1993). Prior to the NDM paradigm, classic DM strategies often prescribed a rational choice method in which decision-makers would be asked to deliberate amongst a range of alternatives. More recently, however, researchers have recognized that time pressures preclude this strategy in complex situations and have become more intent on assessing and developing the expert's strategies to improve situation awareness skills and declarative knowledge (in this case of both the laws and their application) in a realistic environment that more closely resembles the real world (Klein, 1997; for a review see Yates, 2001).

NDM investigations have studied the efficacy of DM assessment and training methods (e.g., Stout, Cannon-Bowers, & Salas, 1996) concluding that they need to be of sufficient functional quality to test the experienced decision-maker's ability (Alessi, 1988; Klein, 1997a). In this context, video and audio presentations provide a suitable format for assessing perceptual skills (Abernethy, 1996; Williams & Grant, 1999) and DM (Cannon-Bowers & Bell, 1997). Similarly, Omodei, McLennan and Whitford (1998) suggest that 'own-point-of-view' video recordings can provide the best representation of the complexity and dynamics of naturalistic environments and in addition allow the selection of pertinent events from a wide variety of data. Unfortunately, despite the evidence supporting video as a medium through which to assess and train DM, there is no empirical research that has examined suitable criteria to measure relative success in referee DM performance.

Measuring DM Performance: Accuracy, Coherence and Shared Mental Models

The accuracy of law-application, which is substantially based on the expert's use of knowledge (see Williams & Davids, 1995), would seem to be the most crucial criterion for success. Thus, although law clarifications and interpretations may be guided by advice from the governing body, it is the application of these laws by the senior referees 'in the field' that sets the standards (Bunting, 1999). Examinations of umpires calling force-out plays in baseball would seem to support this, as they were found to collectively adopt a normative rule in their adjudication of 'phantom tags' (Rainey & Larsen, 1988; Rainey, Larsen, Stephenson, & Olson, 1993).

In addition to the accuracy of law-application, equability of decisions may also play a part in the referee's evaluation. Therefore, performance appraisals should also consider the individual's DM against his or her peers. Examining the level of agreement by measuring the range between responses may be an indicator of the extent to which individuals share interpretations and, as such, represents another important and face valid criterion for measuring referee performance. Furthermore, measuring the 'coherence' of decisions by examining the shared understanding of the event by different officials is also important (Mascarenhas, Collins, & Mortimer, 2002; Millgram & Thagard, 1996). Such collective understanding has been measured by investigating the rationale that individuals use to arrive at their decisions (Abraham & Collins, 1998; Langan-Fox, Code, & Langfield-Smith, 2000). When decisions are built on a coherent appreciation of an event, teams then have the ability to perform together more effectively and successfully (Rouse, Cannon-Bowers & Salas, 1992).

Cannon-Bowers, Salas and Converse (1990) attributed such coherent performances to shared mental models (SMMs), a concept that serves to explain faultless performance through implicit interactions between members of successful teams (see Brehmer, 1972). Therefore, as a corollary, for rugby union officials these SMMs consist not only of knowledge of the other team members and their roles, allowing effective coordination strategies between referee and touch-judges (who control the sidelines), but also of a declarative knowledge base of the task, its concepts, and the relationship between them (Stout et al., 1996). Furthermore, as these SMMs underpin coherent performance by providing similarly organised expectations surrounding the task (Rouse & Morris, 1986), the development of SMMs can be used as a basis for understanding and enhancing both dependent and independent team DM in real-life settings (see MacMahon & Ste-Marie, 1999; Stout et al., 1996). As such, the SMMs of all those involved in the officiating process (the referees, touch-judges, their coaches and assessors) are of interest when exploring the efficacy of rugby union refereeing.

Assessing such DM performance and SMMs in rugby union may be best served by examining the tackle (law 15), as it regularly creates the most controversy and is thought by many to be one of the most complex events to referee in all team sports (Ackford, 2003; Bunting, 1999). Previous referee researchers have examined the 'matter of fact' offside decision in soccer, asking whether a player was offside or not (e.g., Oudejans et al., 2000), and also 'matter of opinion' decisions that often involve just two players, asking if anyone has committed a foul and if so, whom (e.g., Plessner & Betsch, 2002; Jones et al., 2002). However, refereeing the tackle in rugby union presents a unique situation where multiple, complex and dynamic decisions are required, as there are timing elements, overlapping elements, interactive elements and often multiple players involved in the action (see Ackford, 2003). Thus it is likely that a more extensive declarative knowledge base, and hence a more complex SMM, is necessary for rugby union officials (see MacMahon & Ste-Marie, 1999). Consequently, giving NDM such a rigorous challenge should test the robustness of our methods and hence provide implications that will assist DM in other open team sports.

Therefore, the primary aim of this study was to measure the DM accuracy, agreement and coherence of England's best RFU referees, their assessors, coaches and touch-judges. Specifically, we were interested in the relationship between the officials' accuracy (as measured by their ability to reach an agreed standard), their conformity to each other and the coherence of the reasons underlying their decisions. Secondly, recognizing the roles played by different officials in their coherent application of law, we were interested in differences between groups. Finally, we anticipated that the results would highlight specific applied areas of concern in refereeing the tackle and provide a preliminary application of NDM theories with a video-based system to assess the time-pressured DM of expert officials viewing actual scenarios.

Method

Participants

The participants consisted of 132 male RFU officials who were the delegates at the RFU referees national conference. They included 45 of the top 65 RFU referees, 27 referee assessors, 13 referee coaches, and 47 of the top 120 touch-judges. This sample represents 132 of the 239 individuals responsible for either officiating, or developing officials in England's top five rugby union divisions. The referees, ranging in age from 27 to 51 years (M = 38.6 yr.; SD = 5.6 yr.), had refereed on the English National Panel for between 1 and 16 years. Based on their national rankings (1-65), made by a group of referee development officers in May 1998 from the periodical evaluations of 37 advisors, the referees were already sub-divided into 1 of 3 groups: a top-20 group, who were responsible for refereeing in the premier league (level 1; n = 14); a mid-panel group ranked from 21-40, responsible for national league level 2 and 3 games (n = 8); and a lower-panel group ranked from 41-65 who officiated at levels 4 and 5 (n = 23).

Instruments

In order to prepare a test instrument, incidents were selected from actual premier league rugby union games, recorded with professional video equipment (Betacam-SP). Each scenario was filmed in close-up from a raised gantry, positioned at the halfway line. Only incidents occurring in the middle of the pitch (<20° of arc) were examined for inclusion in the study. This provided a view looking down over the incident, similar to the angle that the match day referee might experience (cf. McLennan & Omodei, 1996).

Further steps were taken to ensure the ecological validity of the test items. From an original tape of 130 tackle incidents compiled from 60 hours of premier league play, an independent expert panel consisting of elite referees (n = 4), coaches (n = 2) and players (n = 2) examined the clips. Each panel member independently graded each tackle on the difficulty of the decision on a three-point scale where 1 = easy, 2 = medium, and 3 = hard. In addition, they discarded all the tackles that did not display sufficient information to make an accurate decision, or those where they felt the match-day referee's decision would be discernible. The experts then convened as a group and selected 10 difficult (i.e., grade 3) tackles from those remaining that they regarded as presenting realistic game scenarios for the accurate application of law 15. It was anticipated that the use of difficult yet realistic scenarios would provide information to inform referee DM training in the future. Finally, these 10 incidents were edited together to provide a test instrument.

Each edited clip began with a voice-over that introduced the two teams competing and indicated the team in possession. The tackle incident was then played with approximately 5 seconds of 'lead-in', in order to orientate the participants to the scene. After the tackle incident the recording cut to black and the title "make your decision now" appeared on the screen.

Table 1 Notes:

The mean level of accuracy for all participants across all clips was 49.6% (K = .25). The mean level of confidence in their decisions for all participants across all clips was M = 4.0 (SD = 1.0). A significant kappa statistic indicates better-than-chance agreement (significance adjusted by the Bonferroni method). Correct decision. Strength of agreement (1) as per Landis and Koch (1977). (AS) Approaching significance; * p < .05.

A response sheet was developed to enable participants to quickly and easily indicate their decision. This was essential since time pressure, as opposed to slower, more reflective DM, is a crucial factor for naturalistic environments (Klein, 1997b). Participants were given a copy of the response sheet, consisting of a series of boxes in which to indicate their decision, a space to explain the reasoning behind their decision, a Likert scale to rate their confidence in the accuracy of each decision, ranging from 1 (low) to 5 (high), and a section to comment on the quality of each clip. Content of the response sheet is included in Table 1.

Pilot Testing

Prior to the participants' assessments, pilot testing was conducted using a group of individuals familiar with the rugby laws to verify the qualities of the videotape, suitable viewing positions, the efficacy of the response sheet, and the typical length of time it would take to complete it. Based on this pilot work, the following procedure was developed.

Procedure

To view the 10 assessment clips, the participants were randomly divided into four viewing groups of no more than 35, for data-collection purposes only, each having approximately the same number of referees, touch-judges, assessors and coaches. The pilot study and subsequent analysis of the results confirmed this group size to be large enough to minimise variability due to procedural differences but small enough to allow each individual an acceptable view of the screen. They were then informed that their own personal responses would remain confidential and that their results would only be presented as grouped data depending upon their officiating classification. After the participants familiarised themselves with the response sheet, they sat in a darkened room where they could comfortably see the tackle incidents projected onto a screen via a standard VHS video recorder and a data-projector. This presented an image about 8 feet wide and 5 feet high. The first clip from the videotape was then played and paused immediately after its completion. Participants were asked to make an immediate decision by ticking the appropriate box. They were then given 3 minutes to complete the remainder of the response sheet, and were explicitly told not to change the decision once made. An inspection of the response sheets and observation of participants suggested that all conformed to these instructions.

After responding to all 10 clips in the same manner, participants were asked to compare the quality of information upon which they made decisions in this study to the quality of information they 'tended' to get as referees on the pitch and write their explanation on the back of the response sheet. This procedure was followed consistently for all four data-collection groups.

Data Analysis

Two of the full-time RFU referees, at the time nationally ranked 1 and 2, determined the correct response. Replicating the conditions under which the participants were asked to respond, they both independently made their immediate decision on the 10 clips. In cases where these two referees had initially disagreed upon responses (clips 4 and 9), they reviewed the videotape and discussed the clip before agreeing on the most appropriate decision. In fact, their initial disagreement was only minor, as in both clips they agreed on which team to advantage but provided inconsistent decisions on the sanction for such infringements. For example, in clip 4 one expert chose to play on, advantaging the attacking team who retained possession, and the other chose to award a penalty to the attacking team. Similarly, in clip 9 one expert awarded a scrum to the defending team while the other awarded a penalty. Finally, these experts indicated "how many times per game" they typically had to adjudicate a tackle situation like the one presented in each clip. The experts' mean frequency rating (number of occurrences per game) for all 10 tackles was M = 10.9 (SD = 7.8).

Participants' DM performance was assessed by three measures: (1) accuracy, the percentage of participants achieving the correct decision; (2) agreement, the degree of spread of their responses; and (3) coherence, the similarity of the reasons underpinning their decisions. The kappa statistic of agreement (K) was used to measure the spread of responses. This offers a ratio of the proportion of times that the raters agree, against the maximum number of times that agreement was possible, and corrects for chance (see Altman, 1991). Thus, a score of K = .90 would represent a 'very good' (high) level of agreement, and K = .10 would represent a 'very poor' (low) level of agreement, as classified by the system proposed by Landis and Koch (1977). In addition to these measures, for each clip the participants' reasons for their decisions were examined to determine the extent of coherence in their mental models of each event. Similarly, all three analyses were conducted on a group basis, consisting of the three subgroups of referees, and the three other groups: assessors, touch-judges, and referee coaches. Bonferroni adjustments were applied to control for the experiment-wise chances of a Type I error.
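To make the chance-corrected agreement measure concrete, the following is a minimal sketch of one common multi-rater form, Fleiss' kappa. The paper does not state which kappa variant was computed, so the choice of Fleiss' kappa, the function name `fleiss_kappa` and the example decisions are all assumptions made for illustration; they are not the study's data or procedure.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items each judged by the same number of raters.

    ratings: list of items, each a list of category labels (one per rater).
    Returns NaN when chance agreement is 1 (all ratings in one category).
    """
    if not ratings:
        raise ValueError("at least one rated item is required")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_obs = sum(
        (sum(n * n for n in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement: derived from how often each category is used overall.
    totals = Counter()
    for c in counts:
        totals.update(c)
    p_exp = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())

    if p_exp == 1.0:
        return float("nan")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical decisions by four officials on three clips (illustration only).
clips = [
    ["penalty_attack", "penalty_attack", "penalty_attack", "penalty_attack"],
    ["penalty_attack", "penalty_defence", "scrum", "play_on"],
    ["penalty_defence", "penalty_defence", "penalty_attack", "penalty_defence"],
]
print(round(fleiss_kappa(clips), 2))
```

A value near 1 indicates that officials almost always chose the same option, a value near 0 indicates agreement no better than chance, and a negative value indicates agreement below the chance level.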

Results

Accuracy, Agreement, Coherence and Confidence Levels for All Participants

Table 1 provides details of the percentage incidence of responses made, highlighting the accuracy scores and the kappa statistic of agreement for each clip. The mean level of accuracy across the 10 clips for all participants was 49.6% (SD = 28.6%). High levels of accuracy were achieved for clip 1 (82%), clip 7 (89%) and clip 10 (70%), and naturally these clips also showed high levels of agreement (clip 1, K = .60; clip 7, K = .74; and clip 10, K = .41). In addition, these clips showed very high coherence in the participants' reasoning for each decision. In clip 1, 95% of the participants who responded accurately showed agreement by awarding the penalty for offside, with only 5% penalizing for support players arriving off their feet. In clip 7, 94% of the accurate participants awarded a penalty to the attacking team for the defender failing to roll away, and similarly in clip 10, 95% of the respondents making an accurate decision penalized the ball carrier for not releasing the ball.
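The accuracy and coherence percentages reported for a clip can be sketched as follows. This assumes that coherence is operationalised as the share of accurate respondents who gave the single most common reason, which is consistent with the figures above but is not spelled out as a formula in the text. The function name `accuracy_and_coherence`, the labels and the responses are hypothetical.

```python
from collections import Counter


def accuracy_and_coherence(responses, correct_decision):
    """Summarise one clip.

    responses: list of (decision, reason) pairs, one per participant.
    Returns (accuracy, coherence, modal_reason); coherence and modal_reason
    are None when nobody chose the correct decision.
    """
    if not responses:
        raise ValueError("at least one response is required")

    # Reasons given by participants whose decision matched the expert standard.
    accurate_reasons = [
        reason for decision, reason in responses if decision == correct_decision
    ]
    accuracy = len(accurate_reasons) / len(responses)
    if not accurate_reasons:
        return accuracy, None, None

    # Coherence: share of accurate participants giving the most common reason.
    modal_reason, modal_count = Counter(accurate_reasons).most_common(1)[0]
    return accuracy, modal_count / len(accurate_reasons), modal_reason


# Hypothetical responses from five officials to one clip (illustration only).
clip_responses = [
    ("penalty_attack", "offside"),
    ("penalty_attack", "offside"),
    ("penalty_attack", "offside"),
    ("penalty_attack", "support_off_feet"),
    ("scrum", "unplayable_ball"),
]
accuracy, coherence, reason = accuracy_and_coherence(clip_responses, "penalty_attack")
print(f"accuracy={accuracy:.0%}, coherence={coherence:.0%}, modal reason={reason}")
```

Separating the two numbers matters because a clip can show high accuracy with divided reasoning, or low accuracy with a single shared (but wrong) rationale.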

Accuracy, Agreement, Coherence and Confidence Levels by Group

The mean accuracy scores shown in Table 2 revealed that the top-20 referees were the most accurate (M = 54.3%, SD = 32.9%), although interestingly the lower-panel group showed greater accuracy (M = 52.4%, SD = 26.3%) than the mid-panel group (M = 47.1%, SD = 28.4%). Furthermore, despite poorer performance, this middle group of referees showed greater confidence levels in their decisions than all other groups (M = 4.4; SD = 0.7). The referee coaches were the least accurate (M = 43.0%, SD = 37.3%). In fact, their decisions were less accurate than those of the referees in 8 of the 10 clips.

Investigating the prevalence of an SMM by measuring the extent of shared reasons underpinning decisions revealed perfect coherence when groups displayed perfect accuracy. For example, in clip 1 the top-20 referees achieved 100% accuracy (see Table 2) and all chose to penalize the defending players for encroaching offside. Similarly, in clip 7 both the top-20 and lower-panel referees revealed maximum accuracy with 100% agreement, since all the participants awarded a penalty to the attacking team for the defender's failure to roll away. In addition, across all 10 clips, when officials were accurate the top-20 showed a considerably higher level of coherence in the reasons underlying their decisions (M = 93%) when compared to all other groups (mid-panel, M = 86%; lower-panel, M = 82%; touch-judges, M = 80%; assessors, M = 87%; referee coaches, M = 80%).

The mean accuracy of all the support groups (touch-judges, assessors and referee coaches) across all 10 clips was M = 47.9% (SD = 28.6%).

Applied Areas of Concern in Refereeing the Tackle

Surprisingly, given the level of officials examined, 2 of the 10 clips revealed extremely low accuracy scores (clip 2, M = 15%; clip 5, M = 21%). Furthermore, in an additional three clips, participants failed to achieve 50% accuracy (clip 4, M = 31%; clip 6, M = 49%; and clip 9, M = 30%). Moreover, when the levels of agreement are considered, clip 5 and clip 9 reveal a negative kappa statistic (clip 5, K = -.04; and clip 9, K = -.01), a result in fact lower than the level that would be predicted by chance alone. Interestingly, for clip 5 there was no drop in confidence levels (M = 4.0) across all the participants. In fact, they were nearly as confident in this decision as they were for the first clip (M = 4.1), where 82% of them agreed and made an accurate response.

Further exploration into the coherence of the reasoning underpinning decisions revealed the greatest discrepancy in clips 2 and 9. In clip 2, where only 15% achieved the correct decision, 68% awarded this penalty for support players arriving off their feet, 19% for the tackler not rolling away and 14% for offside (all legitimate rulings within the RFU laws). Similarly, in clip 9 the participants were divided, with 51% awarding the penalty for not releasing the ball and 49% for the ball carrier's support arriving off their feet.

From an applied perspective, clip 4, as well as showing relatively low levels of accuracy (31% awarding a penalty to the attacking team), also resulted in 48% of the participants awarding a penalty to the defensive team and 45% awarding possession to the attacking team, either through awarding a scrum, playing advantage or choosing to play on. Thus the participants were almost equally split on which team should benefit from the decision, which would clearly have a profound effect on the game. Although the two experts had initially disagreed on this clip, they were in agreement that the attacking team should benefit from the play.

Similarly, although clip 7 produced high levels of accuracy (M = 89%), 13% of participants believed that the clip contained an offence worthy of a yellow-card, a procedure used to warn, sanction or send off a player. Once again the levels of confidence in the accuracy of the participants' decisions (M = 4.6) did not reflect this DM discrepancy.

The Fidelity of the Video Recordings and the Naturalistic Paradigm

The participants' feedback suggests that the NDM procedure used in this investigation was acceptable for all the groups examined. Only 26 of the 1,320 participant responses (i.e., 132 participants assessing 10 situations) were reported as holding insufficient information to make a decision, while the mean confidence level for all participants across all clips was M = 4.0 (SD = 1.0) out of a maximum of 5.

In terms of the ecological validity of the procedure, only 14% of the participants believed that the quality of the video and camera angle needed improving in at least one of the clips, although no consistent pattern emerged as to which clips needed enhancement. Also, 10% suggested that more information on the game, such as scoreline and knowledge of previous plays, would have made the decision easier. Only 5 of the 132 participants made comments on the influence of the referee on the screen. However, all the participants felt the test to be a fair evaluation of referee DM prowess and, most pertinently for the present investigation, there was no relationship between negative feedback on the information presented on the screen and the levels of accuracy or agreement shown.

Discussion

Analyses of All Participants

The primary aim of this investigation was to assess the accuracy, agreement and coherence of England's best RFU referees, touch-judges, assessors and referee coaches. The mean levels of accuracy and agreement revealed poor DM performance. Despite selecting difficult DM scenarios, all 10 clips were judged by experts to be representative of actual decisions required on the field of play, which occur on average 11 times per game. Since these RFU officials averaged only 50% accuracy, this represents approximately 5 or 6 wrong decisions per game. Clearly, the ramifications on the game may be significant. Moreover, it is of even greater concern that the participants' level of confidence in their decisions rarely decreased, even when their decisions became more discordant. In other words, although these top officials made both inaccurate and widespread decisions, they were all as individuals equally confident in the accuracy of their DM.
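The estimate of wrong decisions per game follows directly from the figures reported above, under the assumption (made in the text) that the accuracy observed on these difficult clips carries over to every comparable tackle in a match. Using the experts' mean frequency rating of 10.9 such tackles per game and the mean accuracy of 49.6%:

```latex
\[
\text{expected wrong decisions per game}
  \approx 10.9 \times (1 - 0.496)
  = 10.9 \times 0.504
  \approx 5.5
\]
```

This is consistent with the "approximately 5 or 6" stated in the text.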

Efficacy of SMMs to Test Officials' DM

As suggested earlier in this paper, shared mental models do appear to help accurate DM, since when a high percentage of officials are accurate their shared understanding, as indicated by the same reasoning, is also high. Equally, when the number of accurate responses is low, the coherence of the reasoning underpinning those decisions is even lower. In addition, since the top-20 referees indicated greater coherence in the reasons underpinning their accurate decisions, it seems fair to conclude that their mental models have more similarities. This supports the ecological and congruent validity of the methods used.

The critical emphasis in the development of an SMM relies on understanding the reason for differences in decisions. The simplest explanation may be that the different decisions are a reflection of the participants' ability to identify the cues pertinent to making an informed decision. In fact, as outlined by Mortimer and Collins (1997), it may be that the individual participant has a particular scaling value for the pertinent cues; they describe this using the terms criteria (the recognition of relevant cues) and weighting (the relative value of each of the criteria in reaching the decision). Thus, in the rugby union tackle situation, one referee may rate the tackler's failure to roll away as the most important criterion, above the ball carrier's decision to hold on to the ball until support arrives. This would result in awarding a penalty to the attacking team. However, if another referee weighted the ball carrier's obligation to immediately pass, place, or release the ball as more important, then this referee would be more likely to award a penalty to the defending team. This may explain the poor coherence levels in clips 2 and 9. Accordingly, applying a hierarchical weighting scale, where elements of the decision are prioritised, may be one method of improving DM in such highly time pressured environments (Annett, 1997; Rasmussen, 1985).
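The criteria-and-weighting account can be illustrated with a toy model in which two referees recognise the same cues but weight them differently. This is only a sketch of the idea: the cue names, the numeric weights, the `decide` function and the rule of acting on the single highest-weighted cue are all hypothetical and are not taken from Mortimer and Collins (1997) or from the study.

```python
# Which team benefits when each (hypothetical) offence is penalised.
BENEFICIARY = {
    "tackler_not_rolling_away": "attacking",
    "ball_carrier_not_releasing": "defending",
}


def decide(observed_cues, weights, beneficiary=BENEFICIARY):
    """Return a decision driven by the highest-weighted cue the referee recognises.

    observed_cues: cues present in the tackle.
    weights: this referee's criteria (the keys) and weighting (the values).
    """
    # Criteria: only cues the referee treats as relevant enter the decision.
    recognised = [cue for cue in observed_cues if cue in weights]
    if not recognised:
        return "play on"
    # Weighting: the cue with the greatest relative value determines the call.
    top_cue = max(recognised, key=lambda cue: weights[cue])
    return f"penalty to {beneficiary[top_cue]} team"


# Two hypothetical referees who see the same tackle but prioritise differently.
referee_a = {"tackler_not_rolling_away": 0.7, "ball_carrier_not_releasing": 0.3}
referee_b = {"tackler_not_rolling_away": 0.3, "ball_carrier_not_releasing": 0.7}
tackle = ["tackler_not_rolling_away", "ball_carrier_not_releasing"]

print(decide(tackle, referee_a))
print(decide(tackle, referee_b))
```

Under this toy rule, identical perception of the tackle still yields penalties in opposite directions, which is the pattern the hierarchical weighting proposal is intended to remove by giving all officials the same priority ordering.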

Analyses by Group

Some inter-group differences were apparent; for example, the referees collectively were marginally better than the support groups. However, the mid-panel referees' performance was worse than both the touch-judges and the assessors, yet they were the most confident in their decisions. This may suggest that the mid-panel referees achieved this level of ranking because of their greater confidence levels, rather than through more accurate DM. A study by Franks, Elliott and Johnson (1985) would seem to support this idea. This investigation asked expert and novice gymnasts to view paired handspring performances, to identify if there were differences between the two and to state where these differences occurred. Results showed that the experts were no more accurate, but were simply more confident in their decisions. However, in the present case the super-elite top-20 group, which included several international referees, showed more 'realistic' confidence scores since these levels more accurately represented their levels of accuracy and coherence.

Most alarmingly, the referee coaches revealed the lowest levels of accuracy. In fact, they were worse than the referees to whom they are required to offer guidance. Since most of these individuals are ex-referees who had not performed in many years, this is perhaps not surprising, as the speed of the game is now much quicker (Campsall, 2002) and inevitably interpretations have similarly evolved to meet the new demands of the professional game. Nevertheless, this has enormous implications for the development of elite referees. If the referee coaches, the individuals responsible for teaching referees, are offering erroneous or disparate advice on this critical area of law-application, the current levels of inaccurate and incoherent DM may remain.

Before concluding, it is important to consider any methodological limitations that may have contributed to our findings. First, it is possible that some of the officials may have seen some of the test incidents before as they may have been broadcast on television, or indeed the participants may have been involved in officiating the games. However, a referee will typically officiate in at least 25 games per season, each containing in the region of 120 tackles, which would total about 3,000 of these types of situations. In this intervention it is questionable whether or not referees would have been able to remember each incident. Nevertheless, more control should be taken to prevent this in future.

Finally, another possible limitation of this study is the small number of test clips that were used to assess referee performance. Furthermore these difficult clips are not necessarily representative of the most common tackles that are likely to be encountered. Therefore, future studies should investigate a wider variety of scenarios in order to explore the levels of accuracy and coherence that are required to referee at the top level. More importantly, to help ensure that game outcomes are not adversely influenced by poor referee decisions, interventions should provide an expert's detailed interpretations, particularly focusing on the types of tackles that create problems, in order to produce more coherent referee DM.

Applied Implications

From an applied perspective, this video-based NDM approach offers a means of identifying areas of concern (cf., Abernethy, 1996). For example, this investigation revealed inconsistent use of yellow-cards (and the subsequent loss of a player for 10 minutes) and revealed decisions with penalties awarded in opposite directions.

It is surprising that two clips (5 and 9) revealed levels of agreement lower than would be expected by chance (as reflected by the negative Kappa scores in Table 1). Thus, for these two specific cases taken from premier league games, the decisions offered by England's best RFU referees (who included two international referees ranked in the world top-20), touch-judges, assessors and coaches could fairly be described as random. With respect to the application of law 15, England's top officials appear to be providing very unpredictable decisions. Clearly, the influence of such poor coherence on the game can be substantial, with players having to adjust their play week by week to fit in with the individual foibles of each particular referee. Perhaps this is acceptable, although there are currently no data indicating the levels of consistency that are acceptable in any sports setting. Nevertheless, the views expressed by premier league coaches, the main consumers in this case, are clear. They want a lot more consistency, and see the development of greater coherence in the management of law 15 as the most critical factor for the improvement of RFU refereeing (Bunting, 1998; Melrose, 1998). Finally, although only very few clips generated this disappointingly low level of agreement, the impact of such extremes on player trust and the respect held for officials may have wider implications. In short, one inaccurate decision, especially at the wrong time, could change the tenor of the whole match.
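To make concrete what a negative Kappa score means, the following is a minimal sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is offered only as an illustration of the general statistic: the calculation used for Table 1 is not shown in this section and may differ (for example, a multi-rater form), and the function name, labels and decisions below are hypothetical rather than data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label proportions,
    # summed over labels (a label one rater never used contributes zero).
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions by two officials on four tackles (not study data).
official_1 = ["penalty_attack", "scrum", "penalty_defence", "scrum"]
official_2 = ["scrum", "penalty_attack", "scrum", "penalty_defence"]
kappa = cohen_kappa(official_1, official_2)
```

In this invented case the two officials never agree, yet their label proportions imply that some agreement would arise by chance alone, so kappa falls below zero. A value near zero indicates agreement no better than chance, and a value of 1 indicates perfect agreement.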

In addition to highlighting particular types of tackles that create problems, this test also identified the groups of officials who were less accurate. The referee coaches' poor performance in particular may necessitate some form of SMM training. Developing declarative knowledge of the task, its key concepts and their inter-relationships, by exposing the reasons underpinning an expert's decisions, might be an appropriate way to improve their understanding of the tackle (Stout et al., 1996). Future research should examine the efficacy of such techniques for sports officials in light of the growing literature in NDM (Cannon-Bowers et al., 1996).

A Naturalistic Approach to Referee Decision-making

The findings of this preliminary investigation support McLennan and Omodei's (1996) conclusions that 'own-point-of-view' video scenarios, in this case closely representing the match day referee's perspective, can effectively be used for investigating referee DM through a NDM perspective. All participants indicated that this approach represented a fair test of their refereeing prowess, while suggestions for refinement were relatively minor. Furthermore, all the participants showed high confidence levels and only a very low percentage were unable to offer decisions due to insufficient information.

Despite this support for the NDM framework, feedback suggested that several other factors might need to be refined in order to make the test and subsequent training systems as real as possible. For example, some participants felt that knowledge of the flow of the game may be beneficial, with comments such as "it didn't allow me to get a feel for the atmosphere", and "it would have been helpful to have seen previous plays in the game". However, these are factors that may be more representative of the art rather than the science of refereeing (i.e., the judgment of context rather than pure law-application). While it may be argued that context forms a critical part of 'mastery of the laws' (see Anshel, 1995; Anshel & Webb, 1991), it may present so many degrees of freedom that it is too complex to assess and train reliably. Moreover, it seems sensible that before developing advanced skills such as contextual judgment (see Mascarenhas et al., 2002), officials develop coherence in pure law application, which provides the critical foundation upon which to build more advanced skills. Without such a foundation, officials may become even more discordant as contextual factors are added. So, in the absence of contextual factors such as the emotion of the players and the tenor and flow of the game, the present assessment provides a clear, unambiguous test that requires a comparatively unequivocal application of the law.

In keeping with the literature, this study supports the contention that researchers need to look at the reasons underlying decisions, as well as the actual decisions made. Thus, training packages that use these types of 'contentious' tackles, independently adjudged to be realistic refereeing scenarios, may be appropriate to expose an expert's mental model. This may speed up the process of amassing experience (Stokes, Kemper, & Kite, 1997) and advance the development of a SMM so that referees' decisions are not esoteric, but rather based on an accurate and coherent understanding of law.

Authors' Note

We gratefully acknowledge the financial support from the Rugby Football Union and the contributions made by Nick Bunting and the full-time referees at the Rugby Football Union Referees Centre of Excellence.

References

Abernethy, B. (1996). Training the visual-perceptual skills of athletes. Insights from the study of motor expertise. American Journal of Sports Medicine, 24(suppl. 6), S89-S92.

Abraham, A., & Collins, D. (1998). Examining and extending research in coach development. Quest, 50, 59-79.

Ackford, P. (2003, March 16). Ring of confidence from the whistle blowers. The Sunday Telegraph, p. 11.

Alessi, S. M. (1988). Fidelity in the design of instructional simulations. Journal of Computer-Based Instruction, 15(2), 40-47.

Altman, D. G. (1991). Practical statistics for medical research. Boca Raton, FL: Chapman and Hall.

Annett, J. (1997). Analysing team skills. In R. Flin, E. Salas, M. Strub, & L. Martin (Eds.), Decision-making under stress: Emerging themes and applications (pp. 315-325). Aldershot: Ashgate.

Anshel, M. H. (1995). Development of a rating scale for determining competence in basketball referees: Implications for sport psychology. The Sport Psychologist, 9, 4-28.

Anshel, M. H., & Webb, P. (1991). Defining competence for effective refereeing. Sports Coach, 14(3), 32-37.

Brehmer, B. (1972). Policy conflict as a function of policy similarity and policy complexity. Scandinavian Journal of Psychology, 13, 208-221.

Bunting, N. J. (1998). Rugby Football Union Referee. Welcome to the National Conference, Bromsgrove, July 1999.

Bunting, N. J. (1999). Allied Dunbar premiership guidance for referees, players and coaches on the application of law (RFU Tech. Rep. from the conference on the game). Castlecroft, Wolverhampton.

Campsall, B. (2002). Refereeing the Tackle. In High, C. J. (Chair), Rugby Football Union Performance Department: Inaugural Elite Referee Unit Conference, Huddersfield, August 2002.

Cannon-Bowers, J. A., & Bell, H. H. (1997). Training decision-makers for complex environments: Implications of the naturalistic decision-making perspective. In C. E. Zsambok, & G. Klein (Eds.), Naturalistic decision making (pp. 99-110). Mahwah, NJ: Lawrence Erlbaum.

Cannon-Bowers, J. A., Salas, E., & Converse, S. A. (1990). Cognitive psychology and team training: Shared mental models in complex systems. Human Factors Society Bulletin, 33(12), 1-4.

Cannon-Bowers, J. A., Salas, E., & Pruitt (1996). Establishing the boundaries of a paradigm for decision making research. Human Factors, 38(2), 193-205.

Craven, B. J. (1999). A psychophysical study of leg-before-wicket judgments in cricket. British Journal of Psychology, 89, 555-578.

Eady, J. (1999). World class performance plan - Guidelines. London. Knight, Kavanagh, & Page.

Ford, G. G., Gallagher, S. H., Lacy, B. A., Bridwell, A. M., & Goodwin, F. (1999). Repositioning the home plate umpire to provide enhanced perceptual cues and more accurate ball-strike judgments. Journal of Sport Behavior, 22(1), 28-44.

Franks, I. M., Elliott, M., & Johnson, R. (1985). The effects of experience on the detection and location of performance differences in a gymnastic technique. Paper presented at the meeting of the Canadian society for psychomotor learning and sport psychology, Montreal.

Jones, M. V., Paull, G. C., & Erskine, J. (2002). The impact of a team's aggressive reputation on the decisions of association football referees. Journal of Sports Sciences, 20, 991-1000.

Klein, G. (1997a). An overview of naturalistic decision making applications. In C. E. Zsambok, & G. Klein (Eds.), Naturalistic decision making (pp. 49-59). Mahwah, NJ: Lawrence Erlbaum.

Klein, G. (1997b). The current status of the naturalistic decision making framework. In R. Flin, E. Salas, M. Strub, & L. Martin (Eds.), Decision-making under stress: Emerging themes and applications (pp. 137-146). Aldershot: Ashgate.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Langan-Fox, J., Code, S., & Langfield-Smith, K. (2000). Team mental models: Techniques, methods and analytic approaches. Human Factors, 42, 242-271.

MacMahon, C., & Ste-Marie, D. M. (1999). Decision-making in rugby officials. Paper presented at the Canadian Society for Psychomotor Learning and Sport Psychology, Edmonton, Alberta.

Mascarenhas, D. R. D., Collins, D., & Mortimer, P. (2002). The art of reason versus the exactness of science in elite refereeing: Comments on Plessner and Betsch (2001). Journal of Sport and Exercise Psychology, 24, 328-333.

Melrose, A. (1998). More Jaw/Jaw, Less War/War: Coach and Referee Communication. RFU Journal, 12-13. Autumn 1998.

McLennan, J., & Omodei, M., (1996). The role of prepriming in recognition-primed decision-making. Perceptual and Motor Skills, 82, 1059-1069.

Millgram, E., & Thagard, P. (1996). Deliberative coherence. Synthese, 108, 63-88.

Mortimer, P. W., & Collins, D. J. (1997, September). Coherence of decision-making in team sports. Paper presented at the BASES Annual Conference, York.

Omodei, M., McLennan, J., & Whitford, P. (1998). Using a head-mounted video camera and two-stage replay to enhance orienteering performance. International Journal of Sport Psychology, 29, 115-131.

Orasanu, J., & Connolly, T. (1993). The reinvention of decision-making. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision-making in action: Models and methods (pp. 3-20). Norwood, NJ: Ablex.

Oudejans, R. R. D., Verheijen, R., Bakker, F. C., Gerrits, J. C., Steinbrückner, M., & Beek, P. J. (2000). Errors in judging 'offside' in football. Nature, 404, 33.

Rainey, D., & Larsen, J. D. (1988). Balls, strikes, and norms: rule violations and normative rules among baseball umpires. Journal of Sport and Exercise Psychology, 10, 75-80.

Rainey, D., Larsen, J. D., Stephenson, A., & Olson, T. (1993). Normative rules among umpires: the "phantom tag" at second base. Journal of Sport Behavior, 16(3), 147-155.

Rasmussen, J. (1985). The role of hierarchical knowledge representations in decision making and system management. IEEE Transactions on Systems, Man and Cybernetics, SMC-15(2), 234-243.

Rouse, W. B., Cannon-Bowers, J. A., & Salas, E. (1992). The role of mental models in team performance in complex systems. IEEE Transactions on Systems, Man and Cybernetics, 22, 1296-1308.

Rouse, W. B., & Morris, N. M. (1986). On looking into the black box: prospects and limits in the search for mental-models. Psychological Bulletin, 100, 349-363.

Sloan, T. (2004). Refs can't rate refs. Referee, 329, 58-61.

Ste-Marie, D. (2003). Expertise in sport judges and referees. In J. Starkes & K. A. Ericsson (Eds.), Expert performance in sports: Advances in research on sport expertise (pp. 169-189). Illinois: Human Kinetics.

Stokes, A. F., Kemper, K., & Kite, K. (1997). Aeronautical decision making, cue recognition, and expertise under time pressure. In C. E. Zsambok & G. Klein (Eds.), Naturalistic decision making (pp. 183-196). Mahwah, NJ: Lawrence Erlbaum.

Stout, R. J., Cannon-Bowers, J. A., & Salas, E. (1996). The role of shared mental models in developing team situational awareness: Implications for training. Training Research Journal, 2, 85-116.

Williams, A. M., & Davids, K. (1995). Declarative knowledge in sport: A by-product of experience or a characteristic of expertise. Journal of Sport and Exercise Psychology, 17, 259-273.

Williams, A. M., & Grant, A. (1999). Perceptual skills in soccer: Implications for talent identification to enhance coach-performer interactions. Journal of Sports Sciences, 18, 737-750.

Yates, J. F. (2001). "Outsider:" Impressions of naturalistic decision making. In E. Salas & G. Klein (Eds.), Linking expertise and naturalistic decision making (pp. 9-33). Mahwah, NJ: Lawrence Erlbaum.

Address Correspondence To: Duncan RD Mascarenhas, University of Edinburgh Department of PE Sport & Leisure Studies St Leonard's Land, Holyrood Road Edinburgh. EH8 8AQ Scotland Phone:+44(0) 131-651-6043 Fax:+44(0) 131-651-6521 Email: duncan.mascarenhas@education.ed.ac.uk

Duncan R. D. Mascarenhas, Dave Collins and Patrick Mortimer

University of Edinburgh, UK
Table 1.

Responses of all Participants Expressed as a Percentage

 Clip Number

Decision 1 2 3 4 5
 No action--play on 2 13 9 2 21 (c)
 Not enough info' 7 6 2 2
 Manage situation 2 8 3 2 10
 Advantage 5 2 2
 Penalty to attack 82 (c) 15 (c) 7 31 (c) 15
 Penalty to defence 5 7 58 (c) 48 15
 Free Kick 11
 Scrum 2 51 2 10 22
 Scrum with turnover 2 6 14

Level of Confidence in Decision from 1 (low) to 5 (high)
 M 4.1 3.6 3.8 4.3 4.0
 SD 0.9 1.0 1.1 0.8 0.9
Kappa Statistic .60 * .14 .20 .14 -.04
Strength of agreement (1) Mod Poor Poor Poor V Poor

 Clip Number

Decision 6 7 8 9 10
 No action--play on 5 18 1
 Not enough info' 2 1 2
 Manage situation 6 2 16 1
 Advantage 1 2 1 1
 Penalty to attack 5 89 (c) 17 11 17
 Penalty to defence 30 6 12 30 (c) 7 (c)
 Free Kick
 Scrum 49 2 55 18 5
 Scrum with turnover 2 1 14 4 5

Level of Confidence in Decision from 1 (low) to 5 (high)
 M 3.8 4.6 3.9 3.9 4.2
 SD 1.1 0.7 1.0 1.0 .09
Kappa Statistic 0.17 .74 * .21 -.01 .4 (AS)
Strength of agreement (1) Poor Good Fair V Poor Mod

Table 2

Percentage of Correct Responses, Agreement and Confidence Scores by Group

                Referees
Clip    Top-20   21-4    41-65   Touch-Judges   Assessors   Referee Coaches

1       100.0    87.5    82.6    83.0           70.4        84.6
2         7.1     0.0    18.2     6.4           38.5         7.7
3        64.3    75.0    45.5    65.9           64.0        18.2
4        42.9    25.0    45.5    25.5           37.0         0.0
5        14.3    12.5    26.1    23.9           22.2         8.3
6        78.6    50.0    36.4    44.7           40.7        66.7
7       100.0    71.4   100.0    79.1           92.3        92.3
8        42.9    62.5    52.2    51.1           55.6        76.9
9        28.6    37.5    39.1    28.3           33.3         8.3
10       64.3    50.0    78.3    73.9           70.4        66.7
M        54.3    47.1    52.4    48.2           52.4        43.0
SD       32.9    28.4    26.3    26.7           21.7        37.3

Kappa statistic of Agreement
M         .39     .30     .35     .29            .29         .34
SD        .32     .20     .27     .19            .21         .26

Level of Confidence in Decision
M         4.3     4.4     4.1     3.9            3.9         3.9
SD        0.8     0.7     0.8     0.9            1.0         1.1

Notes: The mean accuracy of all the referees across all 10 clips was M = 51.3% (SD = 28.5%).
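As a check on how the group means in Table 2 relate to the per-clip figures, the short sketch below recomputes each group's mean accuracy from the ten clip percentages and picks out the least accurate group. The per-clip values are transcribed from Table 2, with the group labels copied as printed; the variable names are illustrative assumptions and not part of the original analysis.

```python
# Per-clip accuracy (%) by group for clips 1 to 10, transcribed from Table 2.
accuracy = {
    "Top-20": [100.0, 7.1, 64.3, 42.9, 14.3, 78.6, 100.0, 42.9, 28.6, 64.3],
    "21-4": [87.5, 0.0, 75.0, 25.0, 12.5, 50.0, 71.4, 62.5, 37.5, 50.0],
    "41-65": [82.6, 18.2, 45.5, 45.5, 26.1, 36.4, 100.0, 52.2, 39.1, 78.3],
    "Touch-Judges": [83.0, 6.4, 65.9, 25.5, 23.9, 44.7, 79.1, 51.1, 28.3, 73.9],
    "Assessors": [70.4, 38.5, 64.0, 37.0, 22.2, 40.7, 92.3, 55.6, 33.3, 70.4],
    "Referee Coaches": [84.6, 7.7, 18.2, 0.0, 8.3, 66.7, 92.3, 76.9, 8.3, 66.7],
}

# Mean accuracy per group, rounded to one decimal place as in the table.
group_means = {
    group: round(sum(values) / len(values), 1)
    for group, values in accuracy.items()
}

# The group with the lowest mean accuracy across the ten clips.
least_accurate = min(group_means, key=group_means.get)
```

Under this transcription the recomputed means match the M row of Table 2, and the referee coaches have the lowest mean, consistent with the point made in the discussion.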

