Correspondence
To the Editor:
Reading the guest editorial by Dr. James Carpenter (JP, 66, 339-342), I was quite puzzled. Dr. Carpenter describes, amongst other examples, the events of September 11, their consequences, and the experience of the situation using "we" as a pronoun. He also used verbalizations such as "Thousands of our countrymen and women ..." and "We found a new class of heroes ..."
I do not hold American citizenship, I do not live in the United States, and I think the same applies to quite a lot of the readers of the Journal, which I conceive to be an international journal. This guest editorial either does not address the subscribers from abroad or simply ignores the fact that not all of its readers are American.
The perception of the events of 9/11 varies considerably with one's nationality and origin, and also with the media bringing one the news. I would like to ask Dr. Carpenter to reread his editorial while imagining that he is living somewhere in Europe, South America, or some other place in the world. I wonder whether he, as a reader of the Journal, would feel addressed.
STEFAN SCHMIDT
Hornberg Str. 8
79285 Ebringen
Germany
sschmidt@ukl.uni-freiburg.de
To the Editor:
Dr. Schmidt reminds us that the concerns of parapsychology--indeed, all of our deeper concerns--are bigger than any nation. Thanks to him for that. After 9/11 I felt a new love for my country. I loved its shocked innocence; its shaken arrogance. That feeling spilled over into the language of my editorial.
JAMES CARPENTER
Rhine Research Center
2741 Campus Walk Ave., Bldg. 500
Durham, NC, 27705, USA
jcarp@med.unc.edu
To the Editor:
Rex Stanford (2003) recently noted "the wisdom of performing a power analysis prior to attempting replications of a study" (p. 18). Statisticians strongly concur with this point (e.g., Utts, 1991). However, power analysis also brings into focus some of the pivotal and problematic issues in psi research.
Power analysis is used to determine the sample size needed to have a reasonable likelihood of obtaining significant results. It is particularly important for interpreting nonsignificant or marginal results. My experience working in medical research for over a decade has been that power analysis is usually expected as part of grant applications and is required in the protocols for studies submitted to FDA to support approval of new products.
Power analysis, as commonly applied in planning studies, is based on the assumption that the effects being investigated are reasonably stable across studies and reasonably independent of the investigator. The likelihood of obtaining significant results is assumed to increase as the sample size increases.
There is currently strong evidence that psi research does not have these properties. The effects in experiments vary markedly among experimenters. Therefore, any power analysis must be experimenter specific. Even more problematic, the evidence for frequent declines and changes in results for a line of research for an experimenter indicates that psi effects are not stable across studies and often seem to change capriciously (Houtkooper, 2002; Kennedy, 2003).
From a more technical perspective, power analyses, like other statistical methods, are based on the fact that the z score is expected to increase with the square root of the sample size. Similarly, the z score divided by the square root of sample size is used to measure effect size and is expected to be unrelated to sample size.
The expected association between sample size and z score or significance level was not found in meta-analyses of random number generator (RNG) studies (Radin & Nelson, 2000) and early ganzfeld studies (Honorton, 1983). Equivalently, effect size was found to be inversely related to sample size in RNG studies (Steinkamp, Boller, & Bosch, 2002), later ganzfeld studies (Bem & Honorton, 1994), and early card experiments (Nash, 1989; discussed in Kennedy, 1994). These findings are another way of expressing the experimenter differences and declines noted above, and are also consistent with goal-oriented psi experimenter effects (Kennedy, 1994).
Much greater thought needs to be given to the application and interpretation of statistical methods under these circumstances. Trying to use power analysis to plan the sample size for confirmatory studies brings these issues to the forefront.
It appears to me that understanding these problematic properties of psi is the top priority for research. Historically in parapsychology the inconsistent effects have been thought to be due to variations in psychological factors such as attitude and motivation. That is certainly one of the more testable approaches and is a good starting point. However, there has been little effort to explore these realms in depth--to understand what people's actual motivations relating to psi are and why they feel and believe that way.
For example, it appears that the sex differences in attitude toward psi and in the occurrence of psi experiences are an area of interest whose time has come. In the last issue of the Journal, I discussed evidence that the extreme skeptics tend to be males who have rational, controlling personalities, and the likelihood that these personality factors are genetically based and have had adaptive value in evolution (Kennedy, 2003). In the same issue, Stanford (2003) commented that gender is likely to be a significant factor in psi research and should always be examined. Palmer and Neppe (2003) reported a study that found the overall association between psi experiences and temporal lobe dysfunction was confounded by greater reports of experiences and symptoms by females. It also can be noted that Watt and Ramakers (2003) reported a study that recruited favorable and skeptical experimenters, which resulted in 6 of the 9 believers being female and 3 of the 5 skeptics being male.
These sex and personality differences raise the likelihood that attitude toward psi is associated with genetically based personality factors. If psi effects are related to attitude and motivation, understanding these deep-seated motivations would seem to be crucial for obtaining and interpreting psi, as well as for understanding the opposition to psi.
Research on attitudes, motivations, and meanings related to psi may provide a foundation for understanding psi that is more replicable and more widely accepted than research attempting to elicit psi. This foundation may provide a more favorable environment for research eliciting psi as well as more productive interactions with other disciplines and with skeptics.
REFERENCES
BEM, D.J., & HONORTON, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115, 4-18.
HONORTON, C. (1983). Response to Hyman's critique of psi ganzfeld studies. In W. G. Roll, J. Beloff & R. A. White (Eds.), Research in parapsychology 1982 (pp. 23-26). Metuchen, NJ: Scarecrow Press.
HOUTKOOPER, J. M. (2002). [Letter to the editor]. Journal of Parapsychology, 66, 329-333.
KENNEDY, J. E. (1994). Exploring the limits of science and beyond: Research strategy and status. Journal of Parapsychology, 58, 59-77.
KENNEDY, J. E. (2003). The capricious, actively evasive, unsustainable nature of psi: A summary and hypotheses. Journal of Parapsychology, 67, 53-74.
NASH, C. (1989). Intra-experiment and intra-subject scoring declines in extrasensory perception after sixty years. Journal of the Society for Psychical Research, 55, 412-416.
PALMER, J., & NEPPE, V. M. (2003). A controlled analysis of subjective paranormal experiences in temporal lobe dysfunction in a neuropsychiatric population. Journal of Parapsychology, 67, 75-97.
RADIN, D., & NELSON, R. (2000). Meta-analysis of mind-matter interaction experiments: 1959 to 2000. [Unpublished Manuscript.] Boundary Institute, Los Altos, California, and Princeton Engineering Anomalies Research, Princeton University.
STANFORD, R. G. (2003). Research strategies for enhancing conceptual development and replicability. Journal of Parapsychology, 67, 15-51.
STEINKAMP, F., BOLLER, E., & BOSCH, H. (2002). Experiments examining the possibility of human intention interactions with random number generators: A preliminary meta-analysis [Abstract]. Journal of Parapsychology, 66, 238-239.
UTTS, J. (1991). Replication and meta-analysis in parapsychology. Statistical Science, 6, 363-403.
WATT, C., & RAMAKERS, P. (2003). Experimenter effects with a remote facilitation of attention focusing task: A study with multiple believer and disbeliever experimenters. Journal of Parapsychology, 67, 99-116.
J. E. KENNEDY
Boulder, Colorado
72130.1210@compuserve.com
To the Editor:
We would like to comment on Rupert Sheldrake and Pam Smart's paper, "Videotaped experiments on telephone telepathy" (JP, 67, 147-166).
The authors present data from experiments on telephone telepathy. In one of their specific analyses they report a significantly different hit rate dependent on whether the subjects were called by familiar or unfamiliar callers. These differences between familiar and unfamiliar callers are depicted in Figure 1 (p. 156). The differences are reported to be "very significant statistically (p = .000001)" (p. 156), and according to the authors the significant difference "supports an interpretation in terms of telepathy" (p. 163).
We disagree with this particular analysis and the conclusion. We believe the authors' conclusion to be based on a statistically unsatisfactory procedure.
However, our critique applies only to the comparison of familiar and unfamiliar callers. The overall hit rate of the entire experiment remains untouched by the following reanalysis.
The authors calculated separate hit rates for cases in which familiar and unfamiliar people were calling. Then the differences were compared with a Fisher exact test. This procedure results in a misleading overestimation of the difference. It does not take into account a possible response bias, as the following example will demonstrate. A subject is called 10 times, five times by an unfamiliar caller and five times by a familiar caller. In each trial the subject guesses to be called by the familiar person. Thus the hit rate for the familiar caller is 100% and for the unfamiliar caller is 0%. This difference is not due to telepathy but due to a response bias. Interestingly, the authors mention this problem on page 164 but do not take any precautions.
A simple way to take this response bias into account is to ask whether the subject is right with his or her guesses. In the example above the subject would be right in 50% of the cases where he or she guessed a familiar caller and there are no data for unfamiliar callers because he or she never guessed so.
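This toy example can be sketched in a few lines (the labels and function names are illustrative, not from the paper; conditioning on the guess, as proposed above, is what removes the bias):

```python
# Toy data from the example above: 10 calls, five familiar and five
# unfamiliar, and a subject who always guesses "familiar".
calls = ["familiar", "unfamiliar"] * 5
guesses = ["familiar"] * 10

def rate_by_call(calls, guesses, category):
    """Naive hit rate among calls of a category (inflated by response bias)."""
    trials = [(c, g) for c, g in zip(calls, guesses) if c == category]
    return sum(c == g for c, g in trials) / len(trials)

def rate_by_guess(calls, guesses, category):
    """Hit rate among *guesses* of a category (the bias-corrected view)."""
    trials = [(c, g) for c, g in zip(calls, guesses) if g == category]
    return sum(c == g for c, g in trials) / len(trials) if trials else None
```

Per call, the familiar rate is 100% and the unfamiliar rate 0%, exactly as in the example; per guess, the subject is right 50% of the time and there are no data for unfamiliar guesses--pure bias, no telepathy.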
We have reanalyzed the data accordingly (see also Burdick & Kelly, 1977, p. 86). However, the reanalysis was not possible on the basis of the paper published. There are calculation errors in Table 3, where the numbers in the columns do not add up correctly. Additionally, the numbers in Table 3 do not correspond with the numbers in Table 10. On request we received corrected tables from Rupert Sheldrake and arrived at the following conclusion.
Within 175 total calls, the subjects guessed 41 times that the call would come from an unfamiliar person. They were right in 15 cases (hit rate 36.6%, MCE = 25%, binomial p = .07). In the remaining 134 cases they guessed to be called by familiar persons and were right in 61 cases (hit rate 45.5%, MCE = 25%, binomial p = 2 x 10^-7). The difference between these two subsamples, based on a 2 x 2 table chi-square test, is statistically not significant, chi-square (1, N = 175) = 1.02, p = .31. Thus there is no significant difference in hit rates depending on the familiarity of the caller. Consequently, the authors' interpretation, namely that the result "supports an interpretation in terms of telepathy" (p. 163), cannot be maintained.
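The figures in the reanalysis can be checked with standard formulas (a sketch, not the authors' code: a Pearson chi-square without continuity correction and the exact one-sided binomial tail; the function names are mine):

```python
import math

def binom_tail(n, k, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table [[a, b], [c, d]]
    (1 df, no continuity correction) and its p value."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    cells = [(a, r1 * c1 / n), (b, r1 * c2 / n),
             (c, r2 * c1 / n), (d, r2 * c2 / n)]
    stat = sum((obs - exp) ** 2 / exp for obs, exp in cells)
    # For 1 df, the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# 134 familiar guesses with 61 hits vs. 41 unfamiliar guesses with 15 hits
stat, p = chi2_2x2(61, 134 - 61, 15, 41 - 15)
```

This reproduces the reported chi-square of about 1.02 with p of about .31, and a binomial tail near .07 for the 15 hits in 41 unfamiliar guesses.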
On the basis of our approach we have updated Figure 1 from the publication (p. 156), which shows the relationship between familiar and unfamiliar callers in total as well as for each subject. Both graphs, the original (top) and the updated (bottom), are depicted above to demonstrate the striking differences resulting from the application of appropriate statistics.
[FIGURE 1 OMITTED]
REFERENCE
BURDICK, D. S., & KELLY, E. F. (1977). Statistical methods in parapsychological research. In B. B. Wolman (Ed.), Handbook of parapsychology (pp. 81-130). New York: Van Nostrand Reinhold.
STEFAN SCHMIDT
SUSANNE MULLER
HARALD WALACH
Evaluation Group for Complementary Medicine
Institute of Environmental Medicine and Hospital Epidemiology
University Hospital Freiburg
Hugstetter Str. 55
D-79106 Freiburg, Germany
sschmidt@ukl.uni-freiburg.de
To the Editor:
Schmidt et al. rightly point out that there were some errors in Tables 3 and 10 of our paper "Videotaped experiments on telephone telepathy" (JP, 67, 147-166). We are sorry about this. The corrected bottom row of Table 3 is as follows:
Caller         Trials   BT   Carole   Gayle   Jayne   Pam   % right      p
Totals           70      0      0       28      32     10      43     .0008

The corrected top and bottom rows of Table 10 are as follows:

Participant     F:actual   F:exp   UF:actual   UF:exp
SH Series 3        60        35        10         35
Totals            136      100.2       41        76.8
We agree with Schmidt et al. that there was a response bias. Participants tended to guess that familiar people were calling more often than they actually were. The purpose of our Table 10 was to draw attention to this bias and to quantify it.
In their analysis, Schmidt et al. expressed the success rates as a proportion of the number of guesses. We considered doing this analysis ourselves, but dismissed it because we thought it was of doubtful validity. It inverts the normal order by treating the guesses as if they were the independent variable, or stimulus, and the randomized calls as if they were the dependent variable, or response.
Nevertheless, setting aside these doubts, we have applied Schmidt et al.'s method to data from non-videotaped trials involving 37 participants reported in another paper (Sheldrake & Smart, 2003). These are the data missing from Schmidt et al.'s analysis, whose absence they indicated by a question mark in their Figure over the heading "37P." The results are shown in Figure 1, alongside our original graph. When the success rate is expressed as a percentage of guesses, as opposed to calls, the difference between familiar and unfamiliar callers is diminished, but is still very significant (p = .006).
Clearly it is important to correct for a response bias in favour of familiar callers, but the Schmidt-Muller-Walach method may not be the most appropriate way to do it. We asked Jan van Bolhuis, a statistician at the Free University of Amsterdam, for his advice, and he carried out a randomization test, which seems a more reliable method of tackling this question. This involves taking tables of guesses, as in Tables 3, 5, 6, 7 and 8 in our JP paper, and carrying out random permutations of the guesses (Noreen, 1989). These permutations were carried out in such a way that the number of calls from the different callers remained the same, and so did the number of guesses of each caller's name, but the guesses were assigned to the calls at random in 30,000 different combinations. Thus, for example, in our 28 trials with Thomas Marcovici (Table 8), he said "Gabriel" 7 times, "Luke" 6 times, "Pam" 3 times, and "Sam" 12 times. There were in fact 6 calls from Gabriel, 5 from Luke, 10 from Pam and 7 from Sam. The 28 guesses were randomly assigned to the 28 calls in 30,000 different permutations. The totals in the right-hand and bottom margins of the tables remained the same; thus, for example, there were 6 calls from Gabriel, and 7 guesses of Gabriel's name in each permutation.
This statistical method does not alter the bias in guessing the names of familiar callers. Given this response bias, it estimates how likely the observed pattern was to have arisen by chance. Say x is the number of permutations giving a difference between familiar and unfamiliar callers as great as or greater than that actually observed. Then x/30,000 gives the estimated p value that the observed difference between success rates with familiar and unfamiliar callers arose by chance.
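The procedure just described can be sketched as follows (a simplified single-table version with illustrative data, not the analysis actually run: shuffling the guesses automatically preserves both margins, and the p value counts permutations whose familiar-minus-unfamiliar difference in success rates is at least the observed one):

```python
import random

def permutation_p(calls, guesses, familiar, n_perm=30_000, seed=1):
    """Estimated p value for the familiar-minus-unfamiliar difference
    in per-call success rates, under random reassignment of the fixed
    multiset of guesses to the fixed sequence of calls."""
    def diff(gs):
        fam = [c == g for c, g in zip(calls, gs) if c in familiar]
        unf = [c == g for c, g in zip(calls, gs) if c not in familiar]
        return ((sum(fam) / len(fam) if fam else 0.0)
                - (sum(unf) / len(unf) if unf else 0.0))

    observed = diff(guesses)
    rng = random.Random(seed)
    shuffled = list(guesses)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # margins stay fixed by construction
        if diff(shuffled) >= observed:
            extreme += 1
    return extreme / n_perm
```

Notably, in a pure response-bias case (every guess "familiar"), every permutation reproduces the observed difference and the estimated p is 1.0, which illustrates why this test is unaffected by guessing bias.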
In only two experiments does this difference reach statistical significance, but all show differences in the same direction (Table 1). When all the data are combined, using the Stouffer-Hemelrijk method, the overall significance is p = .0003. Applying weightings to take into account the different numbers of trials in each table, the overall significance is p = .0002.
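The combination step can be sketched with a plain unweighted Stouffer sum (the Stouffer-Hemelrijk variant and the trial-count weights used in the actual analysis differ in detail; this shows only the basic idea, and the function name is mine):

```python
import math
from statistics import NormalDist

def stouffer(pvals, weights=None):
    """Combine one-sided p values: z_i = Phi^-1(1 - p_i),
    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)), combined p = 1 - Phi(Z)."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvals)
    zs = [nd.inv_cdf(1 - p) for p in pvals]
    z = (sum(w * zi for w, zi in zip(weights, zs))
         / math.sqrt(sum(w * w for w in weights)))
    return 1 - nd.cdf(z)

# The five permutation p values reported in Table 1
combined = stouffer([.009, .21, .27, .22, .0009])
```

With equal weights this lands in the same region as the reported overall p = .0003; the exact figure depends on the variant and weighting used.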
Schmidt et al. are right in pointing out that the response bias reduces the difference in success with familiar and unfamiliar callers. But the overall difference is still very significant by a more precise statistical analysis than theirs. This difference is also very significant in the data from the unfilmed experiments, even when analyzed by the Schmidt-Muller-Walach method (Figure 1).
[FIGURE 1 OMITTED]
TABLE 1
THE SIGNIFICANCE OF THE DIFFERENCES BETWEEN SUCCESS RATES WITH FAMILIAR AND UNFAMILIAR CALLERS FOR THE DATA IN THE TABLES, ESTIMATED BY A RANDOM PERMUTATION ANALYSIS WITH 30,000 RANDOM PERMUTATIONS

Table      p
  3      .009
  5      .21
  6      .27
  7      .22
  8      .0009
REFERENCES
NOREEN, E.W. (1989). Computer intensive methods for testing hypotheses. New York: Wiley.
SHELDRAKE, R., & SMART, P. (2003). Experimental tests for telephone telepathy. Journal of the Society for Psychical Research, 67, 184-199.
RUPERT SHELDRAKE
20 Willow Road
London NW3 1TJ, UK
ars@dircon.co.uk
To the Editor:
When designing an experiment to determine the effects of an independent variable, it is important not to confound that variable with another variable. Precisely this error is committed in Watt and Ramakers' recent investigation of the experimenter effect (JP, 67, 99-116). Briefly, they hypothesized that experimenters who believed in psi (believers) would be psi-conducive and that disbelieving experimenters (disbelievers) would be psi-inhibitory. However, in running the experiment, the believers were specifically encouraged to produce positive results and were given financial inducements for doing so, whereas the disbelievers were explicitly encouraged to produce negative results (i.e., a more positive effect in the control trials than in the psi trials) and were given financial incentives for doing so. Thus, in this study, the experimenter belief variable was completely confounded with the experimental procedure used. (This procedure also has the effect of not so subtly communicating the experimenters' hypothesis to the participants in the study, which is never a good idea, as the subjects may then act in such a way as to confirm the hypothesis, regardless of its merits as applied to appropriately blinded subjects.)
The "suggestive" evidence of an experimenter effect found in this study could simply be the result of the different procedures used for the believer and disbeliever experimenters rather than being the result of the experimenters' differing degrees of belief in psi. In fact, one interpretation is that both groups actually succeeded at the psi task given them. Had the believers been encouraged to produce negative results and offered financial inducements for doing so, their results might very well have been identical to those produced by the disbelievers. Presumably, in the "real world" the financial reward contingencies for believing and disbelieving experimenters are precisely the same at the outset of their careers. To allow any inferences as to the effect of experimenter belief in psi, the procedure used should be the same for both groups (i.e., either both groups encouraged to produce positive results or both groups participating in both "help" and "hinder" conditions in a counterbalanced manner).
DOUGLAS M. STOKES
424 Little Lake Drive, #3
Ann Arbor, MI 48103
Dstokes380@aol.com
COPYRIGHT 2003 Parapsychology Press
COPYRIGHT 2004 Gale Group