A comparison of test-retest reliabilities using the self-talk use questionnaire.
Hardy, James ; Hall, Craig R.
Self-talk can be thought of as a construct concerned with
athletes' multidimensional sport related self-verbalizations, which
seem to serve instructional and motivational functions (Hardy, Hall,
& Hardy, in press). Although elite athletes and coaches both support
the use of appropriate (positive) self-talk (Gould, Hodge, Peterson,
& Giannini, 1989), our knowledge about this mental skill is quite
limited. This limitation is somewhat surprising given that the use of
cognitive restructuring interventions has been shown to be a more
powerful treatment (d = .79) than the use of relaxation (d = .73),
mental rehearsal (d = .57), and goal setting (d = .54) interventions for
the enhancement of sporting performance (Meyers, Whelan, & Murphy,
1996).
The current state of affairs in the self-talk
literature may be due to a lack of descriptive data upon which to
further examine self-talk's relationships. Hardy and colleagues
(Hardy, Gammage, & Hall, 2001; Hardy, Hall, & Hardy, in press,
2004) attempted to remedy this problem. Consequently, an inductive qualitative approach was utilized in their initial study (Hardy et al.,
2001). It was found that both the content (i.e., what is said) and the
functions of self-talk (i.e., why athletes employ self-talk) were
multidimensional. Quantitative findings obtained, via the Self-Talk Use
Questionnaire (STUQ), from subsequent studies offered support and
extended these qualitative results. That is, athletes reported
frequent use of the qualitatively derived categories of self-talk, and
sex, sport, and competitive level differences were examined. The STUQ was
developed as a preliminary attempt to quantify athletes' use of
self-talk and to supplement Hardy et al.'s previous
qualitative findings. The STUQ was based on a similar descriptive
instrument used to examine mental imagery, the Imagery Use Questionnaire
(IUQ; Hall, Rodgers, & Barr, 1990) as well as Hardy et al.'s
(2001) qualitative findings. The IUQ is a valid and reliable general
measure of athletes' frequency of the use of mental imagery. It
places emphasis on the imagery-related habits of athletes (i.e., when
athletes use imagery) as well as the content of their imagery. Hardy et
al.'s qualitative findings helped guide the generation of items
that were relevant to the mental skill of self-talk. Suggestions from an
experienced sport psychology consultant and a national level soccer
coach facilitated the wording of "athlete friendly" items. The
STUQ assesses the frequency of the use of self-talk. It places an
emphasis on when athletes employ self-talk, the content of
athletes' self-talk as well as athletes' use of the specific
functions of self-talk (i.e., the purpose of self-talk) and how athletes
employ self-talk (e.g., use of self-talk in combination with imagery).
With regard to the content of athletes' self-talk, although
differences across sex and skill level were absent, Hardy et al. (in
press) found that team and individual sport athletes employ self-talk of
differing content. Hardy et al.'s (2004) findings related to how
athletes use self-talk were somewhat different to the findings for the
content of self-talk. Athletes reported an increasing use of
self-talk as their competitive season progressed. Furthermore, although
male and female athletes were not found to differ on how they employed
self-talk, significant effects for skill level and sport type were
present.
With regard to athletes' use of the functions of self-talk,
Hardy et al. (in press) did not uncover significant differences between
male and female, and skilled and less skilled athletes. They did
demonstrate, however, that individual sport athletes make greater use of
self-talk in general and, more precisely, greater use of nearly all
specific functions of self-talk, as compared to their team sport
counterparts. In addition, a significant main effect for setting was
revealed, in that athletes reported significantly greater use of
self-talk in conjunction with competition-related than practice-related
situations. A significant main effect for temporal phase was also found.
That is, differences were found between the use of self-talk before,
during and after practice and competition. Regardless of setting,
self-talk was employed most frequently during as opposed to prior to or
post practice and competition.
A noted limitation to Hardy et al.'s (in press) findings was
the need to interpret them with some caution as psychometric information
on the STUQ is lacking. Given that "without solid measurement, it
is difficult to challenge, disconfirm, and/or extend psychological
theory in sport and exercise psychology" (Duda, 1998, p. xxiii),
there is an obvious need to examine the psychometric properties of the
STUQ. Some frequently employed methods to assess properties of
questionnaires are not, however, applicable to the STUQ. For example,
because the STUQ's items are not grouped into sub-scales
representing a range of self-talk related factors, examination of the
instrument's factor structure, through exploratory or confirmatory
techniques, as well as an examination of the STUQ's sub-scale
internal consistency are not conceptually appropriate. Examination of
the STUQ's internal consistency via an item-total test approach
was, however, possible. As a result, this was one purpose of the present
study.
The study's primary purpose, however, focused on a second
relevant psychometric property, test-retest reliability. According to Thomas and Nelson (2001), test-retest reliability or stability "is
one of the most severe tests of consistency" (p. 188). The
test-retest method involves administering a test on two separate
occasions in an identical manner. The stability of the response variable
is of critical importance to the test-retest assessment. Not only does
the relative stability of the response variable influence the length of
the test-retest interval (Portney & Watkins, 1993), it can also
determine whether the examination of an instrument's test-retest
stability should even be attempted (Schutz, 1998). For example, if an
underlying response variable is not stable, it would make little sense
to assess the test-retest stability of an instrument designed to measure
such a dynamic construct. With regard to the present study's
underlying variable, self-talk, there is no evidence to date that
indicates that athletes' general use of self-talk naturally changes
exclusively over time. Although preliminary research has found the use
of self-talk to alter across (a) practice and competitive settings, (b)
preparatory phases (before, during, and after practice/competition)
(Hardy et al., in press) and (c) training cycles of the season (off-,
early regular, and late regular) (Hardy et al., 2004), none of these
independent variables is based exclusively on time itself; each deals
with very distinct situations for the athlete.
Traditionally, within the sport psychology literature the
test-retest reliability or stability of psychometric questionnaires has
been assessed through the use of the Pearson (interclass) correlation
with a small sample of athletes (e.g., Anderson & Cychosz, 1994;
Hall & Barr, 1992; Pelletier, Fortier, Vallerand, Tuscon, Briere,
& Blais, 1995). Unfortunately there are limitations to the
utilization of this approach. As Thomas and Nelson pointed out, use of
the Pearson correlation is only suitable when assessing the relationship
between two different variables. This situation is clearly not the case
with a test-retest design--the same variable is measured on two separate
occasions. Consequently, Thomas and Nelson suggest that the intraclass
correlation should be employed when concerned with the scoring of the
same variable across time (e.g., Brewer et al., 2000). However, it
should be noted that correlations are an indication of relationship and
do not offer information regarding agreement (Bland & Altman, 1986;
Nevill, 1996) and so are unable to detect systematic bias in responses
from one time to another. Furthermore, both the Pearson correlation and
the intraclass correlation are often used as summary statistics obtained
by pooling relevant items. Wilson and Batterham (1999) and Nevill, Lane,
Kilgour, Bowes, and Whyte (2001) indicated that the use of such
statistics may not provide a clear picture of the stability of an
instrument's items. This is because individual items with poor
stability that might be present cannot be clearly identified due to the
averaging out process inherent in the use of summary statistics.
As a result, Nevill et al. (2001) supported Wilson and
Batterham's (1999) proposal to use a within individual item-by-item
approach to test-retest designs. It should be noted that Bland and
Altman (1986) first forwarded Wilson and Batterham's general
approach. They recommended the use of the proportion of agreement,
computed for each item. The proportion of agreement is "based on
the proportion of participants that record the same response on two
separate occasions" (Nevill et al., 2001, p. 273). However, it was
Nevill et al.'s contention that Wilson and Batterham's overly
complex "bootstrapped" item-by-item approach also lacked the
ability to detect systematic bias and distinguish between "near
misses" and "wide disagreements". To this end, Nevill and
colleagues recommended the use of a modified proportion of agreement
procedure. This method entailed the calculation of test-retest
differences and then the reporting of the percentage of individuals
whose differences were found to be within a reference value of no
practical importance. With regard to the variable under investigation in
their study, social physique anxiety, a relatively stable trait measured
on a 5-point scale, a reference value of ±1 was adopted and it
was proposed that most participants (i.e., 90%) should record
differences within this value (Nevill et al., 2001).
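The modified proportion of agreement procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than Nevill et al.'s own implementation: the function name `proportion_within` and the five pairs of ratings are hypothetical, invented only to show the arithmetic.

```python
def proportion_within(test, retest, reference=0):
    """Proportion of respondents whose retest score lies within
    +/- `reference` scale points of their test score.

    reference=0 gives the exact proportion of agreement;
    reference=1 or 2 gives the modified (reference value) version.
    """
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and paired")
    # Within-individual test-retest differences, one per respondent.
    differences = [t2 - t1 for t1, t2 in zip(test, retest)]
    agreeing = sum(1 for d in differences if abs(d) <= reference)
    return agreeing / len(differences)


# Hypothetical ratings from five respondents on a single 9-point item.
time1 = [3, 5, 7, 2, 9]
time2 = [3, 6, 4, 2, 7]

exact = proportion_within(time1, time2)        # identical responses only
within_1 = proportion_within(time1, time2, 1)  # differences of 0 or +/-1
within_2 = proportion_within(time1, time2, 2)  # differences up to +/-2
print(exact, within_1, within_2)
```

Computed item by item, each proportion would then be compared with the criterion that most participants (i.e., 90%) fall within the reference value.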
Although the constructs under investigation in the present study
(self-talk; 9-point scale) and in Nevill et al.'s (2001) study
(social physique anxiety; 5-point scale) are different in nature and are
scored using different response scales, a similar within individual
item-by-item approach was employed in the present study. To reflect
these differences, Nevill et al.'s reference value of ±1
was altered to meet the demands of the present study (when applicable).
This alteration was carried out because identical proportion of
agreement values (e.g., 95%) using the same limits of agreement (e.g.,
±1) obtained from questionnaires utilizing 9 and 5-point response scales
respectively, are not equivalent--the 9-point scale related proportion
of agreement has greater relative stability. Thus, it was expected that
most (90%) participants would report responses within ±2 of each
other. As such, it was hypothesized that the STUQ items would be
relatively stable over time.
In sum, the general aim of the present study was to illustrate the
use of the proportion of agreement method for examining test-retest
reliability using the STUQ as an example. An examination of the
instrument's general internal consistency and test-retest
reliability/stability was undertaken. It was expected that the STUQ
would display good reliability.
Method
Participants
Participants were recruited from volleyball (n = 74) and basketball
(n = 27) activity classes; the sample was comprised of 101 Kinesiology undergraduate volunteers (44 males, 57 females) with a mean age of 20.92
years (SD = 1.19). The number of participants in the present study
corresponds with Nevill et al.'s (2001) recommendations concerning
minimum sample size for the examination of questionnaires'
reliability via the use of non-parametric approaches.
Measures
A modified version of the STUQ (Hardy et al., in press; 2004) was
administered. The instrument was modified to be relevant to the sample
employed. This resulted in items assessing the use of self-talk in
competition to be dropped, as the athletes did not play volleyball or
basketball competitively. The modified version of the STUQ was comprised
of 36 items contained within 4 sections, with an emphasis on the
frequency of athletes' use of self-talk. Following a self-statement
oriented definition of self-talk, athletes completed the 4 questions in
Section 1 dealing with when athletes generally use self-talk. Section 2
contained 9 questions related to the content of self-talk (i.e., what
athletes say to themselves). Section 3 was comprised of 12 items that
assessed the specific functions of self-talk (i.e., the reasons why
athletes talk to themselves). Finally, Section 4 contained 11 questions
about how athletes use self-talk (e.g., consistency of self-talk and
belief in self-talk).
Participants responded to the majority of the items using a 9-point
scale (1 = never, 9 = all the time). Two items required the use of a
5-point scale (1 = not at all consistent/strongly disbelieve, 5 =
completely consistent/strongly believe). For the purposes of the
employed analyses, only those items that were responded to in the above
manner were included in data analyses. In other words, only items
responded to via Likert type scales were utilized (n = 24). This
illustrates a limitation noted by Hardy et al. (2004) with regard to the
unusual ratio-based response format of the STUQ's content related
questions.
Procedure
Permission to approach participants was first gained from their
respective activity class instructors. All participants were over
18 years of age; thus parental consent was not required. The nature
of the study was explained to each participant. Each volunteer was
informed of the nature of his or her involvement in the study via a
letter of information. Informed consent was implied by completion of the
STUQ. The STUQ was administered in weeks 4 and 5 of the six-week
volleyball/basketball activity courses. On each occasion, the STUQ took
approximately 10 minutes to fill out and was completed at the beginning
of the activity class. The time frame of one week employed in the
present study is substantially shorter than Kline's (1993)
recommended three-month gap between survey administrations. Kline (1993)
proposed the extended gap in order to minimize the influence of the
recall of individuals' responses. A much shorter gap was employed
in the present study in order to reduce the influences that time of
season (i.e., use of self-talk early versus late in the course) and
improved ability (i.e., learning effect) might have on individuals'
responses.
Data Analysis
First, the STUQ's internal consistency was assessed using an
item-total test approach. To this end, Cronbach's alpha was
calculated. As the STUQ does not have subscales, the consistency of
responses across the entire 24 items was assessed. The items were
normally distributed (i.e., standard deviations greater than one and
skewness less than two) for both the test and retest data. Second, in
order to assess test-retest stability the recommendations of Nevill et
al. (2001) were followed--a non-parametric approach proposed by Bland
and Altman (1999) was conducted. Thus, the proportion of agreement and
proportion of test-retest differences found within a ±2
reference value (or ±1 when appropriate) were calculated. The
nonparametric median sign test was employed to test for the presence of
systematic bias. A Bonferroni corrected significance level of p <
.002 (p = .05 / 24) was employed.
Results
Mean and standard deviation frequency values are shown in Table 1.
Overall, the descriptive statistics presented in Table 1 are slightly
lower but comparable to those reported in the literature (Hardy et al.,
in press, 2004).
Internal consistency
Participants' responses given on the first data collection
point were analyzed for the purpose of examining the survey's
internal consistency. The result from the test of internal consistency
indicated that the STUQ items have good internal consistency,
Cronbach's alpha = .94.
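For readers unfamiliar with the statistic, the following is a minimal sketch of how Cronbach's alpha can be computed across a set of items treated as a single scale. The function name `cronbach_alpha` and the small data set are hypothetical and are not the STUQ responses; the sketch assumes the standard variance-based formula and sample (n - 1) variances.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, three items on a 9-point scale.
scores = [
    [2, 3, 3],
    [5, 5, 6],
    [7, 8, 7],
    [4, 4, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha approaches 1 as respondents' scores on the individual items rise and fall together, which is the sense in which the 24 items are described as internally consistent.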
Test-retest reliability
In order to illustrate the value of Bland and Altman's (1986)
proportion of agreement approach, test-retest reliability was assessed
three ways. First, the traditional and inappropriate approach to
test-retest reliability via interclass correlation was used. Second, the
less traditional but more appropriate approach via intra-class
correlation was conducted. Finally, the most appropriate approach to
test-retest reliability was undertaken, a modified version of Bland and
Altman's proportion of agreement approach. Table 2 contains
correlation and agreement values for each of the 24 STUQ items examined.
It can be seen that the average interclass correlation for the
STUQ items was .66 (ranging from .55 to .80; see Table 2). Reliance on
interclass correlations alone would suggest that there were moderate to
fairly strong positive relationships between the responses initially
collected and the retest responses for the STUQ items examined. This
finding might suggest that the test-retest reliability for the STUQ
items examined was marginal. Calculation of intraclass correlations
(ICC) using a 2-way random variable absolute agreement approach
generated an average ICC of .66 (ranging from .54 to .80; see Table 2).
According to Vincent (1999), such ICC coefficients would suggest that
the STUQ items examined possess marginal test-retest stability. However,
more specifically only 6 items from the 24 examined demonstrated ICC
values greater than .70. Again, it must be re-iterated--correlations do
not give a measure of stability or agreement, only association
(Ludbrook, 1997).
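The distinction between association and agreement can be illustrated with a small sketch. The data are hypothetical, and the ICC shown assumes the single-measure, two-way random effects, absolute agreement form (often labelled ICC(2,1)); the formula actually used in the present analyses is not stated, so this is an illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson (interclass) correlation between two paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_absolute_agreement(x, y):
    """Single-measure, two-way random effects ICC (absolute agreement)
    for two occasions, computed from the ANOVA mean squares."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical item: every respondent scores exactly 2 points higher at retest.
time1 = [1, 2, 3, 4, 5]
time2 = [3, 4, 5, 6, 7]
print(pearson_r(time1, time2))               # unaffected by the uniform shift
print(icc_absolute_agreement(time1, time2))  # lowered by the shift
```

In this constructed case the Pearson coefficient is perfect even though no respondent gave the same answer twice. The absolute agreement ICC is lower, but a single coefficient still does not show how many respondents fell within a given distance of their original response.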
A better understanding of the STUQ's stability over time can
be gleaned from the proportion of agreement percentages. As indicated in
Table 2, the proportion of agreement ranged from 21% to 70% across the
24 items. The proportion of agreement within the respective specified
reference values ranged from 81% to 96%, however. Moreover, while only
11 items had proportions of agreement greater than 90%, all but 2 of the
24 items (i.e., all except the effort control and goal function items) had
agreement values greater than 85%. With regard to the presence of
systematic bias, results from median sign tests indicated a significant
negative bias for 2 items. Participants reported significantly higher
responses on the retest of the planned self-talk item (45 participants
reported differences below the median and just 16 above the median) and
the item assessing the use of self-talk before practice (48 participants
reported differences below the median and just 20 above the median).
Together, results from the proportion of agreement and median sign test
procedures suggest that the majority of the 24 STUQ items
examined are reasonably stable.
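The sign test results for the two biased items can be checked against the Bonferroni corrected level with a short sketch. It assumes an exact two-sided binomial sign test with tied (zero) differences excluded; whether an exact test or an approximation was used in the present analyses is not stated, and the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_below, n_above):
    """Exact two-sided sign test p-value under a null of no systematic
    bias (each non-tied difference is equally likely to fall either side)."""
    n = n_below + n_above
    k = min(n_below, n_above)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


alpha = 0.05 / 24  # Bonferroni correction across the 24 items

# Counts reported in the text for the two items showing bias.
p_planned = sign_test_p(45, 16)          # planned self-talk item
p_before_practice = sign_test_p(48, 20)  # self-talk before practice item

print(p_planned < alpha, p_before_practice < alpha)
```

Under these assumptions both reported splits fall below the corrected significance level, in line with the bias reported for those two items.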
Discussion
The aim of the present study was to generate reliability
information concerning the STUQ. A fairly new method for assessing the
stability of survey items was employed. Overall, supportive evidence for
the a priori hypotheses was found. Specifically, the STUQ items examined
appear to be internally consistent and relatively stable over time.
With regard to the different test-retest techniques presented, it
can be seen that varying pictures would emerge depending on which
technique was relied upon, ranging from marginal through to adequate
stability for 22 of the 24 items examined. The low (Pearson and ICC)
correlation coefficients may be due in part to the restrictive sample of
the athletic population employed in the study. All volleyball and
basketball players took part in their respective sport at the
recreational level. It is possible that this led to narrow variance that
subsequently impacted on the coefficients (Wilson & Batterham,
1999). It is proposed that Nevill et al.'s (2001) proportion of
agreement protocol employed in the present study is the best approach to
assessing a survey's test-retest reliability or stability. As such,
researchers interested in examining a survey's test-retest
reliability would do well to avoid the use of correlations that cannot
offer information about agreement or stability (Bland & Altman,
1986; Nevill, 1996), and instead utilize Nevill and colleagues'
method. It should be noted that a second approach to the proportion of
agreement technique is available to researchers. It is possible to
create the limits of agreement based on 95% confidence intervals (Bland
& Altman, 1999). The use of confidence intervals was not appropriate
for the present study. Confidence intervals would likely create
boundaries of agreement that include decimal places. The STUQ's
response scale involves self-ratings of whole numbers only.
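The point about decimal boundaries can be seen in a brief sketch. It assumes the limits take the familiar Bland and Altman form of the mean difference plus or minus 1.96 standard deviations of the differences; the ratings are hypothetical and the function name `limits_of_agreement` is introduced only for illustration.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest, z=1.96):
    """Mean test-retest difference plus/minus z standard deviations."""
    differences = [t2 - t1 for t1, t2 in zip(test, retest)]
    centre = mean(differences)
    spread = z * stdev(differences)
    return centre - spread, centre + spread


# Hypothetical ratings from eight respondents on a 9-point item.
time1 = [4, 5, 6, 3, 7, 8, 2, 5]
time2 = [4, 6, 5, 5, 7, 6, 3, 5]
lower, upper = limits_of_agreement(time1, time2)
print(round(lower, 2), round(upper, 2))
```

Because the resulting limits are not whole numbers, they do not correspond to any difference a respondent could actually produce on a whole-number rating scale.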
When the present study's proportion of agreement values are
compared to previous research that has employed this technique, the STUQ
items examined fare favorably against items from the Social Physique
Anxiety Scale (SPAS; Hart, Leary, & Rejeski, 1989). One possible
explanation for this comes from the work of Nevill et al. (2001) and
Wilson and Batterham (1999) that suggests that some of the SPAS's
items could be reworded to improve their test-retest stability.
Alternatively, the present study's use of a different reference
value (±2) may have contributed to the appearance of the STUQ
items' superior stability. It should be noted, however, that the nature
of self-talk is different to the trait of social physique anxiety and
that participants responded to the STUQ items (in most cases) via a
9-point scale, not a 5-point scale as the SPAS utilizes.
The above point reflects a limitation to the proportion of
agreement approach employed in the present study; there is an element of
subjectivity regarding the use of limits of agreement. Specifically, if
limits of agreement are utilized, what should these boundaries of
agreement be? Wilson and Batterham (1999) have presented an argument
that limits to agreement should not be employed if the variable under
investigation is discrete in nature. That is, if the Likert-type
responses represent distinct categories. If item responses are
conceptualized as continuous in nature, Bland and Altman (1999) forward
the somewhat subjective approach of employing reference values of no
practical value whereby differences within the limits of agreement are
not clinically important. The range of possible responses is not
currently considered (e.g., 1 to 5 vs. 1 to 9) in the proportion of
agreement protocol. It should be noted that the use of a ±1
limit of agreement on a 5-point Likert-type scale is not equivalent to
the use of the same reference values on a 9-point Likert-type scale.
Such differences in the meaning of reference values guided our use of a
±2 limit of agreement for items responded to via a 9-point
Likert-type scale, although it is acknowledged that this reference value
may be liberal. (Interestingly, the 5-point spread obtained from
utilizing ±2 agreement limits from a 9-point response scale is
actually relatively more conservative than a 3-point range of agreement
from a 5-point response scale.) If a ±1 reference value were
employed in the present investigation, a much different story would emerge
(see Table 2 for proportions of agreement within ±1 for each of the items). As
shown in Table 2, reliance on a ±1 reference value would
indicate that, with the exception of the two STUQ items scored on a
5-point scale, none of the 24 STUQ items examined met the proportion
criterion of 90%. Thus, although Bland and Altman (1999) comment that
"the decision about what is acceptable agreement is a clinical
one" (p. 139), it would seem that there is a need for future
research to extend the proportion of agreement method to incorporate the
potential variance of responses.
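The relative width of the different reference values can be made explicit with a little arithmetic. The sketch below is a simplification that treats a ± band as a fixed number of adjacent response options and ignores truncation at the ends of the scale; the function name `agreement_band_share` is hypothetical.

```python
from fractions import Fraction


def agreement_band_share(reference, scale_points):
    """Share of the response scale covered by a +/- reference band."""
    band_width = 2 * reference + 1  # e.g. +/-2 spans five response options
    return Fraction(band_width, scale_points)


nine_point_pm2 = agreement_band_share(2, 9)  # 5 of 9 options
five_point_pm1 = agreement_band_share(1, 5)  # 3 of 5 options
nine_point_pm1 = agreement_band_share(1, 9)  # 3 of 9 options

print(float(nine_point_pm2), float(five_point_pm1), float(nine_point_pm1))
```

On this view a ±2 band on a 9-point scale covers a slightly smaller share of the scale than a ±1 band on a 5-point scale, while a ±1 band on a 9-point scale is considerably stricter than either.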
Although the present results may help alleviate some concerns
regarding Hardy and coworkers' (in press, 2004) STUQ related
findings, they are not without their problems. Due to poor item
stability, caution in interpreting effort control and goal function related
findings is needed. Furthermore, due to response format differences for
some of the STUQ items and the sample employed, not all items on the
STUQ were examined. As a result, reliability information on the content
and competition related STUQ items is still absent. It should be noted,
however, that Hardy et al. (2004; Study 2) reproduced their content
related findings in a replication study. Overall, recent use of the STUQ
has generated preliminary findings that should serve as a basis for initial
discussion and facilitate more in-depth examinations of self-talk in the
sports and exercise domains.
References
Anderson, D. F., & Cychosz, C. M. (1994). Development of an
exercise identity scale. Perceptual and Motor Skills, 78, 747-751.
Bland, J. M. & Altman, D. G. (1986). Statistical methods for
assessing agreement between two methods of clinical measurement. Lancet,
i, 307-310.
Bland, J. M. & Altman, D. G. (1999). Measuring agreement in
methods comparison studies. Statistical Methods in Medical Research, 8,
135-160.
Brewer, B. W., Van Raalte, J. L., Petitpas, A. J., Sklar, J. H.,
Pohlman, M. H., Krushell, R. J., Ditmer, T. D., Daly, J. M., &
Weinstock, J. (2000). Preliminary psychometric evaluation of a measure
of adherence to clinic-based sport injury rehabilitation. Physical
Therapy in Sport Journal, 1, 68-74.
Duda, J. L. (1998). Advances in sport and exercise psychology
measurement. Morgantown, WV: Fitness Information Technology.
Gould, D., Hodge, K., Peterson, K., & Giannini, J. (1989). An
exploratory examination of strategies used by elite coaches to enhance
self-efficacy in athletes. Journal of Sport & Exercise Psychology,
11, 128-140.
Hall, C.R., & Barr, K. A. (1992). The use of imagery by rowers.
International Journal of Sport Psychology, 23, 243-261.
Hall, C. R., Rodgers, W. M., & Barr, K. A. (1990). The use of
imagery by athletes in selected sports. International Journal of Sport
Psychology, 4, 1-10.
Hardy, J., Gammage, K. L., & Hall, C. R. (2001). A descriptive
study of athlete self-talk. The Sport Psychologist, 15, 306-318.
Hardy, J., Hall, C. R., & Hardy, L. (in press). Quantifying
athletes' use of self-talk. Journal of Sport Science.
Hardy, J., Hall, C. R., & Hardy, L. (2004). A note on how
athletes use self-talk. Journal of Applied Sport Psychology, 16,
251-257.
Hart, E. H., Leary, M. R., & Rejeski, W. J. (1989). The
measurement of social physique anxiety. Journal of Sport & Exercise
Psychology, 11, 94-104.
Kline, P. (1993). Handbook of psychological testing. London:
Routledge.
Ludbrook, L. (1997). Comparing methods of measurement. Clinical and
Experimental Pharmacology and Physiology, 24, 193-203.
Meyers, A. W., Whelan, J. P., & Murphy, S. M. (1996). Cognitive
behavioral strategies in athletic performance enhancement. In M. Hersen,
R. M. Eisler, & M. Miller (Eds.), Progress in behavior
modification: Vol. 30 (pp. 137-164). Pacific Grove, CA: Brooks/Cole.
Nevill, A. M. (1996). Validity and measurement agreement in sports
performance. Journal of Sport Sciences, 14, 199.
Nevill, A. M., Lane, A. M., Kilgour, L. J., Bowes, N., & Whyte,
G. P. (2001). Stability of psychometric questionnaires. Journal of
Sport Sciences, 19, 273-278.
Pelletier, L. G., Fortier, M. S., Vallerand, R. J., Tuscon, K. M.,
Briere, N. M., & Blais, M. R. (1995). Toward a new measure of intrinsic
motivation, extrinsic motivation, and amotivation in sports: The Sport
Motivation Scale (SMS). Journal of Sport & Exercise Psychology, 17,
35-53.
Portney, L. G., & Watkins, M. P. (1993). Foundations of
clinical research: Applications to practice. Stamford, CT:
Appleton & Lange.
Schutz, R. W. (1998). Assessing the stability of psychological
traits and measures. In J. L. Duda (Ed.), Advances in sport and exercise
psychology measurement (pp. 393-408). Morgantown, WV: Fitness
Information Technology.
Thomas, J. R., & Nelson, J. K. (2001). Research methods in
physical activity (4th ed.). Champaign, IL: Human Kinetics.
Vincent, W. J. (1999). Statistics in kinesiology (2nd ed.).
Champaign, IL: Human Kinetics.
Wilson, K., & Batterham, A. (1999). Stability of questionnaire
items in sport and exercise psychology: Bootstrap limits of agreement.
Journal of Sport Sciences, 17, 725-734.
Address Correspondence To: James Hardy, School of Sport, Health,
and Exercise Sciences, University of Wales, Bangor, George Building,
Bangor, Gwynedd LL57 2PX, UK. Email: j.t.hardy@bangor.ac.uk. Fax: 01248
371053
James Hardy and Craig R. Hall
University of Western Ontario
Table 1
Descriptive statistics for the STUQ items examined

                                       Time 1            Time 2
STUQ item                           Mean    SD        Mean    SD
When athletes use self-talk
  Before a practice                 2.91    1.81      3.34    1.81
  During a practice                 5.64    1.83      5.36    1.81
  After a practice                  3.24    1.92      2.99    1.58
  Away from a practice              3.03    2.01      2.60    1.61
Use of the functions of self-talk
  Skill function                    5.53    2.10      5.47    1.86
  Strategy function                 4.91    1.96      4.95    1.93
  Psyching function                 5.67    2.28      5.55    2.22
  Relaxation function               4.54    2.39      4.69    2.23
  Nerve control function            4.71    2.29      4.77    2.14
  Focusing function                 5.68    1.93      5.47    1.85
  Self-confidence function          5.30    2.28      4.98    2.06
  Mental preparation function       5.14    2.14      5.17    2.00
  Coping function                   5.26    2.34      4.99    2.15
  Motivation function               5.12    2.19      5.00    2.11
  Effort control function           4.83    2.20      4.52    2.13
  Goal function                     4.59    2.34      4.39    2.03
How athletes use self-talk
  Before attempting skills          5.54    2.23      5.47    1.90
  During execution of skills        4.10    2.22      4.45    1.99
  Self-talk with imagery            5.58    1.99      5.11    2.08
  Self-talk with physical practice  5.42    1.90      5.32    1.88
  Self-talk alone                   4.47    2.03      4.44    1.88
  Planned self-talk                 2.89    1.93      3.34    1.92
  Consistent self-talk *            3.00    0.91      3.13    0.80
  Belief in self-talk *             3.66    0.79      3.68    0.77

Note. SD = standard deviation. Items were scored via a 9-point scale
except where indicated. * denotes that the item was scored on a
5-point scale.
Table 2
Test-retest statistics for the STUQ items

STUQ item                           Inter-class Intra-class PA (%)      PA ±1 (%)   PA ±2 (%)
                                    correlation correlation
                                    coefficient coefficient
When athletes use self-talk
  Before a practice                 0.65        0.66        33 (33%)    71 (70%)    90 (89%)
  During a practice                 0.67        0.66        47 (47%)    76 (75%)    91 (90%)
  After a practice                  0.56        0.54        34 (34%)    66 (65%)    88 (87%)
  Away from a practice              0.66        0.62        46 (46%)    78 (77%)    90 (89%)
Use of the functions of self-talk
  Skill function                    0.65        0.66        31 (31%)    64 (64%)    89 (88%)
  Strategy function                 0.69        0.69        29 (29%)    75 (74%)    91 (90%)
  Psyching function                 0.80        0.80        39 (39%)    75 (74%)    91 (90%)
  Relaxation function               0.74        0.75        34 (34%)    67 (66%)    87 (86%)
  Nerve control function            0.72        0.72        31 (31%)    66 (65%)    86 (85%)
  Focusing function                 0.64        0.64        30 (30%)    62 (61%)    93 (92%)
  Self-confidence function          0.69        0.69        30 (30%)    67 (66%)    87 (86%)
  Mental preparation function       0.76        0.76        37 (37%)    73 (72%)    92 (91%)
  Coping function                   0.74        0.74        31 (31%)    69 (68%)    88 (87%)
  Motivation function               0.69        0.69        34 (34%)    69 (68%)    89 (88%)
  Effort control function           0.63        0.62        21 (21%)    60 (59%)    83 (82%)
  Goal function                     0.62        0.60        25 (25%)    60 (59%)    82 (81%)
How athletes use self-talk
  Before attempting skills          0.73        0.73        27 (27%)    68 (67%)    91 (90%)
  During execution of skills        0.64        0.65        32 (32%)    62 (61%)    88 (87%)
  Self-talk with imagery            0.58        0.60        27 (27%)    62 (61%)    88 (87%)
  Self-talk with physical practice  0.58        0.59        20 (20%)    68 (67%)    87 (86%)
  Self-talk alone                   0.62        0.62        24 (24%)    65 (64%)    90 (89%)
  Planned self-talk                 0.70        0.67        40 (40%)    71 (70%)    87 (86%)
  Consistent self-talk *            0.55        0.55        55 (54%)    93 (92%)    93 (92%)
  Belief in self-talk *             0.64        0.64        71 (69%)    97 (96%)    97 (96%)

Note. PA = proportion of agreement. * denotes that this item was
scored on a 5-point Likert-type scale; accordingly, a reference
value of ±1 was consistently employed.