Article information

Title: How reliable are radiocarbon laboratories? A report on the Fourth International Radiocarbon Inter-comparison (FIRI) (1998-2001). (Method).
Authors: Boaretto, Elisabetta; Bryant, Charlotte; Carmi, Israel; et al.
Journal: Antiquity
Print ISSN: 0003-598X
Year of publication: 2003
Issue: March
Language: English
Publisher: Cambridge University Press
Keywords: Laboratories; Radiocarbon dating

Rationale

The most recent radiocarbon inter-comparison exercise (FIRI), completed in 2001, was also the most extensive so far, with 85 laboratories participating. The study was designed firstly to assess the comparability of the results from different laboratories and then to quantify the extent and possible causes of any inter-laboratory variation. Radiocarbon dating is universally employed as a dating tool in archaeology, but there is an inevitable diversity of experimental approaches among radiocarbon dating facilities, and in this situation the comparability of results among laboratories becomes paramount. In keeping with the principles of analytical science, radiocarbon laboratories have always been conscious of the importance of accuracy and precision in their reported results, i.e. the ethos of analytical quality control (QC), which in turn is the foundation for the wider concept of quality assurance (QA). The care and effort given to establishing and maintaining primary standards and reference materials exemplify this concern for good quality management within the radiocarbon community.

As early as 1989, Long and Kalin (1990) stressed that it was incumbent on individual radiocarbon laboratories to engage in a formal programme of quality assurance (QA), while Polach (1989) noted that the opportunity for internal checking by individual laboratories in routine ¹⁴C measurement was hampered by a lack of suitable quality control (QC) and reference materials. The work reported here describes ongoing international efforts, by means of a laboratory inter-comparison, to assure users of laboratory quality and of the comparability of measurements, and to provide suitable quality control and reference materials. This work builds on the previous laboratory inter-comparisons that have taken place over the last 20 years (ISG, 1982; Scott et al, 1990; Rozanski et al, 1992; Gulliksen & Scott, 1995).

A substantial effort has been made by the ¹⁴C community to develop and apply both internal and external QA procedures. FIRI provides part of these procedures in the form of an independent check of laboratory performance. However, it provides only a spot check of operational performance at the time it was carried out. It does not measure the consistency of performance over time and so should not form the basis of a 'league table' of laboratory performance. This is why the FIRI results are published without laboratory attribution.

Objectives

The specific objectives of the Fourth International Radiocarbon Inter-comparison (FIRI) (Bryant et al, 2000) were to provide:

* An unambiguous demonstration of the degree of consistency or otherwise among the results obtained, on a routine basis, from different laboratories. This information is crucial for both laboratories and procurers (researchers and funding agencies).

* Quantification of the extent of any inter-laboratory variation, and identification of its possible causes.

* Direct assessment of the comparability of liquid scintillation counting (LSC), gas proportional counting (GPC) and accelerator mass spectrometry (AMS) techniques.

* Creation of suitable, well-characterized quality control and reference materials.

* Assurance of the traceability of the measurements and provision of an independent check on laboratory performance.

These objectives are directly related to analytical quality control by focusing on experimental accuracy, precision and reproducibility as indices for the assessment and inter-comparison of laboratory performance. Evidence of an acceptable level of analytical quality control is the essential precursor for overall quality assurance. The basic methodology employed in the inter-comparison was to invite laboratories to date a series of reference samples so as to compare their performances and the performance of different radiocarbon methods.

Selection, preparation and testing of control samples

The selection and preparation of large samples of homogeneous ¹⁴C activity (uniform age), which can then be subdivided, are vital components of a successful radiocarbon inter-comparison exercise. Natural materials were sought that were representative of routinely dated materials and whose ages spanned the full range of the applied ¹⁴C timescale. Potential materials that were identified included wood (if possible with a dendrochronological date), peat, bone, marine carbonate and grain, together with specific components of samples such as the cellulose fraction of wood and the humic acid fraction of peat. The degree of preparation varied from thorough physical mixing (e.g. the marine carbonate, a turbidite sediment), through grinding and mixing (whole peat), to complete chemical homogenisation (humic acid extraction from peat).

All bulk materials were prepared in a single batch, homogenised and checked by replicate analyses on eight randomly selected aliquots.

Homogeneity testing

Other than for the dendrochronologically dated wood samples, the bulk samples were tested at different sub-sample sizes (reflecting one of the key differences between AMS and radiometric measurement). In all cases, two laboratories checked the sample homogeneity. All the samples were in good agreement with the exception of the turbidite and the modern cellulose samples. For the turbidite, the difference between the two laboratories was later demonstrated to be due to pretreatment. For the modern cellulose sample, the difference between the two laboratories was due to a small error in assessing the modern standard activity on the part of one laboratory. Notwithstanding these difficulties, the results of the homogeneity testing indicated that when laboratories complied with specific instructions concerning sample handling and pre-treatment, all of the samples could be considered to be homogeneous and thus suitable for inter-comparison.
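The report does not say which statistic was used to decide that the replicate results were "in good agreement". A common choice for radiocarbon replicates, assumed here purely for illustration, is a chi-square test of the error-weighted deviations from the weighted mean. The sketch below uses invented aliquot ages, not FIRI data, and the function names are hypothetical.

```python
import math

# Upper 5% points of the chi-square distribution (standard table values),
# keyed by degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592, 7: 14.067}


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def homogeneity_test(values, errors):
    """Return (T, df, homogeneous) for replicate results on one bulk material.

    T sums the squared, error-scaled deviations from the weighted mean. If all
    aliquots share one true age and the quoted errors are realistic, T follows
    a chi-square distribution with n - 1 degrees of freedom.
    """
    mean, _ = weighted_mean(values, errors)
    t = sum(((x - mean) / e) ** 2 for x, e in zip(values, errors))
    df = len(values) - 1
    return t, df, t <= CHI2_95[df]


# Hypothetical ages (BP) for eight aliquots, each quoted as +/- 40 years.
aliquots = [11760, 11800, 11790, 11740, 11820, 11770, 11810, 11780]
t, df, ok = homogeneity_test(aliquots, [40] * 8)
print(f"T = {t:.2f} on {df} df; homogeneous at the 5% level: {ok}")
```

With eight aliquots the test has 7 degrees of freedom. A single aliquot that sits a few hundred years away from the others is enough to push T well past the critical value.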

Tests conducted by the participating laboratories

Each laboratory participating in the inter-comparison was invited to measure a total of ten samples, drawn from a set of seven core materials, within a one-year period. These samples are described in Table 1. They included four dendro-dated samples from the Belfast and German master chronologies, to provide an assessment of laboratory accuracy. Three sets of duplicate samples (Kauri wood, Belfast dendro-dated wood and barley mash) were provided blind to allow an assessment of laboratory precision.

The reference samples were distributed to over 120 laboratories during 1999 and by the deadline of December 2000, 92 sets of results had been received, with some laboratories submitting more than one set of results. The broad geographical distribution of participating laboratories is shown in Table 2, the radiocarbon techniques employed in Table 3 and a list of the participating laboratories can be found in Table 6.

Relative performance of laboratories

A total of 122 observations out of 1056 (i.e. slightly over 10%) was identified as anomalous (i.e. outliers). From the statistical definition of an outlier, their proportion should have been around 5%. Thus the number of outliers was approximately twice the number that would be expected if they were occurring purely by chance. 39 laboratories (42%) had at least one result classed as an outlier. Of the 39, almost 60% (23) of these had more than one of their results thus classed and over 20% (9) had five or more such results. The distribution of outliers was uniform over the 10 samples, thus, no single sample-type contributed the majority of the outliers. Of the 122 outliers, 87% came from LSC laboratories.
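The comparison with the nominal 5% rate is simple arithmetic. The sketch below reproduces it from the figures in the text. As an illustrative extra, it also computes how unlikely 122 or more outliers would be if every observation independently had a 5% chance of being flagged. That independence assumption is ours and is certainly too strong, because outliers cluster within laboratories. The tail probability therefore only indicates that chance alone is an implausible explanation.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total


n_obs, n_outliers, p_nominal = 1056, 122, 0.05   # figures quoted in the text
expected = p_nominal * n_obs                      # outliers expected by chance alone
ratio = n_outliers / expected
tail = binomial_upper_tail(n_outliers, n_obs, p_nominal)  # assumes independence

print(f"observed fraction: {n_outliers / n_obs:.1%}")
print(f"expected by chance: {expected:.1f}, observed/expected: {ratio:.2f}")
print(f"P(>= {n_outliers} outliers | independent 5% rate): {tail:.1e}")
```

The observed fraction is about 11.6%, against 52.8 outliers expected by chance. This gives the ratio of roughly 2.3 behind the "approximately twice" statement.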

Other sources of variation (pre-treatment, modern standard and background material)

For the turbidite sample, a significant age difference was observed between the acid-leached and the non-pre-treated samples. For the whole wood samples, a very small (practically unimportant, but statistically significant) effect due to pre-treatment was also observed. There was an indication of an association between the presence of outliers and the modern standard used by the laboratories concerned. Further analysis indicated that outliers were linked to the modern standard used, with some laboratories having no access to the primary standards NIST OxI and OxII. After omission of outliers, there was no evidence of a difference, on average, for any sample due to the modern standard or background materials used, with the exception of the near-background sample (Kauri wood).

Overall, a relatively small number of laboratories (14%) generated more than 60% of the outlying observations, and the majority of these laboratories use liquid scintillation techniques (including direct absorption). However, a substantial number of liquid scintillation laboratories had no outliers or only one.

Measures of precision and accuracy: comparing duplicates

Laboratories were asked to measure three pairs of duplicate samples: A and B (Kauri wood, near background activity), D and F (Belfast wood, around 50 pMC (percent modern carbon)) and G and J (barley mash, at approximately 111 pMC) to allow the assessment of laboratory precision relative to the quoted errors. The summary statistics for the differences of the duplicates are shown in Table 4 (note that D/F results are given in years BP).

This analysis showed that, on average, the difference between duplicates is zero (both over all laboratories and for individual laboratories). However, in some individual cases the difference was large relative to the quoted errors, larger than expected given the usual interpretation of a quoted error. The implication is that, in such cases, some source of variation is not fully accounted for in the quoted error, i.e. precision is overestimated. Conversely, some duplicates agreed better than would be expected on the basis of the quoted errors. This corresponds to an underestimation of precision, meaning the quoted errors are too large. The observed differences were adequately described by the quoted errors for approximately 50% of the laboratories.
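One way to make "large relative to the quoted errors" concrete is to divide each duplicate difference by its combined quoted error. This approach is assumed here and is not taken from the FIRI analysis. If the quoted 1σ errors are realistic, each such z is roughly standard normal. For a laboratory's three pairs, the sum of z² should therefore behave like a chi-square variable with 3 degrees of freedom. A very large sum points to an unaccounted source of variation (precision overestimated), and a very small one to over-generous error bars (precision underestimated). The laboratories, measurements and 2.5%/97.5% cut-offs below are illustrative, and the function names are hypothetical.

```python
import math

# 2.5% and 97.5% points of the chi-square distribution with 3 degrees of
# freedom (standard table values): an illustrative two-sided 5% band.
CHI2_3DF_LOW, CHI2_3DF_HIGH = 0.216, 9.348


def duplicate_z(x1, s1, x2, s2):
    """Duplicate difference divided by its combined quoted 1-sigma error."""
    return (x1 - x2) / math.sqrt(s1 ** 2 + s2 ** 2)


def assess_quoted_errors(pairs):
    """pairs: three (x1, s1, x2, s2) tuples. Returns (sum of z^2, verdict)."""
    if len(pairs) != 3:
        raise ValueError("cut-offs above are for exactly three duplicate pairs")
    stat = sum(duplicate_z(*pair) ** 2 for pair in pairs)
    if stat > CHI2_3DF_HIGH:
        verdict = "differences too large: quoted errors look too small"
    elif stat < CHI2_3DF_LOW:
        verdict = "agreement too good: quoted errors look too large"
    else:
        verdict = "consistent with quoted errors"
    return stat, verdict


# Hypothetical laboratories: pairs are A/B (pMC), D/F (BP) and G/J (pMC).
labs = {
    "P": [(0.20, 0.02, 0.25, 0.02), (4480, 35, 4510, 35), (110.6, 0.5, 110.9, 0.5)],
    "Q": [(0.22, 0.02, 0.25, 0.02), (4400, 20, 4600, 20), (110.6, 0.5, 110.9, 0.5)],
    "R": [(0.24, 0.05, 0.24, 0.05), (4500, 80, 4505, 80), (110.7, 1.0, 110.7, 1.0)],
}
for name, pairs in labs.items():
    stat, verdict = assess_quoted_errors(pairs)
    print(f"lab {name}: sum z^2 = {stat:.2f} -> {verdict}")
```

Because z is scaled by each pair's own errors, pairs quoted in pMC and in years can be pooled in one statistic.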

Samples of known age

Accuracy can only be assessed against materials of known age. For ¹⁴C these are typically dendro-dated wood samples, so four such samples were included in FIRI. Consensus values for the samples, based on an iterative procedure involving the calculation of a weighted average (Rozanski et al, 1992), are shown in Table 5. A different method, based on reliability analysis, was used to calculate the consensus value for samples A and B.
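The report refers to Rozanski et al (1992) for the iterative weighted-average procedure but does not give its details here. The sketch below shows the general idea under an assumed exclusion rule: drop any result more than three of its own quoted standard errors from the current weighted mean, then recompute until nothing else is dropped. The rule, the function names and the data are illustrative and are not the FIRI implementation.

```python
import math


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total, math.sqrt(1.0 / total)


def iterative_consensus(values, errors, k=3.0):
    """Recompute the weighted mean, dropping results more than k quoted sigmas
    from it, until no further results are removed. Returns (mean, se, kept)."""
    kept = list(range(len(values)))
    while True:
        mean, se = weighted_mean([values[i] for i in kept], [errors[i] for i in kept])
        survivors = [i for i in kept if abs(values[i] - mean) <= k * errors[i]]
        if not survivors or survivors == kept:  # stable (or nothing left to drop)
            return mean, se, kept
        kept = survivors  # strictly smaller, so the loop always terminates


# Hypothetical laboratory results (BP) for one sample; 2600 BP is an outlier.
ages = [2230, 2240, 2225, 2235, 2600, 2228]
sigmas = [20, 25, 20, 30, 40, 15]
mean, se, kept = iterative_consensus(ages, sigmas)
print(f"consensus {mean:.0f} +/- {se:.0f} BP from results {kept}")
```

The first pass pulls the weighted mean up by roughly 18 years because of the outlying result. Once that result is excluded, the remaining five settle on about 2230 BP.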

The four dendro-dated wood samples in the core set were as follows. Samples D and F (duplicates) came from the Belfast master chronology and were dendro-dated to 3200-3239 BC (¹⁴C age 4495 BP). Sample I, also from the Belfast master chronology, was dendro-dated to 3299-3257 BC (¹⁴C age 4471 BP). Sample H, from the German oak chronology, was dendro-dated to 313-294 BC (¹⁴C age 2215 BP). For these samples, the differences between the consensus values and the average 'master' values all lie within the limits of the quoted errors. The consensus results therefore agree with the master chronology results, and we can conclude that laboratories are, in general, accurate. For an individual laboratory, the difference (known age minus laboratory-measured age) for the dendro-dated samples can also be used to assess accuracy. These differences were distributed around zero, with the majority of results lying within a range of 100 years. Formal calculations showed that approximately 30% of the laboratories had a statistically significant offset.
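The significance criterion behind the figure of approximately 30% is not spelled out here. The sketch below is a minimal illustration under two assumptions. First, a laboratory's offset is taken as the error-weighted mean of (known age minus measured age) over the four dendro-dated samples. Second, it is judged against a two-sided 5% threshold (|z| > 1.96). The known ¹⁴C ages are those quoted above. The two laboratories' results, the function name and the optional `known_errors` argument are hypothetical.

```python
import math

# Dendro-dated samples and their 14C ages (BP) as quoted in the text.
KNOWN_AGE = {"D": 4495, "F": 4495, "H": 2215, "I": 4471}


def laboratory_offset(results, known_errors=None):
    """results: {sample: (measured_age, quoted_sigma)}.

    Returns (offset, standard error, z), where offset is the weighted mean of
    (known age - measured age). Uncertainty on the known ages is included only
    if supplied in known_errors (hypothetical; zero by default).
    """
    known_errors = known_errors or {}
    num = den = 0.0
    for sample, (measured, sigma) in results.items():
        var = sigma ** 2 + known_errors.get(sample, 0.0) ** 2
        num += (KNOWN_AGE[sample] - measured) / var
        den += 1.0 / var
    offset, se = num / den, math.sqrt(1.0 / den)
    return offset, se, offset / se


# Two hypothetical laboratories.
lab_x = {"D": (4540, 30), "F": (4555, 30), "H": (2260, 25), "I": (4520, 30)}
lab_y = {"D": (4490, 30), "F": (4510, 30), "H": (2200, 25), "I": (4480, 30)}
for name, res in (("X", lab_x), ("Y", lab_y)):
    offset, se, z = laboratory_offset(res)
    flag = "significant" if abs(z) > 1.96 else "not significant"
    print(f"lab {name}: offset {offset:+.0f} +/- {se:.0f} yr (z = {z:.2f}, {flag})")
```

Laboratory X reads consistently about 50 years too old, which is significant even though each individual result is within two sigma of ±100 years. Laboratory Y scatters around zero.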

Improving quality

The reported results from FIRI for each laboratory are in some senses a summary and therefore do not allow further examination of the causes of laboratory offsets (beyond that already reported here). The responsibility for investigating sources of the offset (and if required, amending procedures) rests with the individual laboratories. We have studied the effect of the modern standards and the background materials that are used by laboratories in their analyses and find no evidence that these factors make a significant contribution to the overall variation observed.

In-house policies for the definition and use of standards are important. The FIRI results demonstrate that there remains a need for standards and reference materials to which laboratories have ready access, to allow checking and correction. In the ideal situation, five categories of standard can be defined:

i) Primary (or modern) standard. The internationally calibrated and certified materials NBS-Ox I and NBS-Ox II.

ii) Secondary standards. Internationally recognised materials such as ANU sucrose, Chinese sucrose and the IAEA quality control reference series (C1-C8). Reference materials from TIRI and FIRI are now also available to expand this list.

iii) In-house/inter-laboratory QC standards. Materials selected to represent the type and age of submitted samples.

iv) In-house working standard(s). A bulk supply of homogeneous material that is available in sufficient quantity to allow repeated and frequent analysis. These measurements are intended to monitor and control the reproducibility of the analytical process over time.

v) Background standards. To achieve accurate and reproducible work, especially with samples older than, say, four half-lives, it is essential to define the appropriate background signal using "¹⁴C-free" (geologically old) material with a chemical composition close to that of the sample. The background material should also undergo the same pre-treatment as the raw sample.
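The basic counting relation shows why the background matters most for old samples. The sketch uses a deliberately simplified form, assumed here: no δ¹³C normalisation, no conversion of the oxalic acid activity to the modern reference value, and no decay correction. In this form the fraction of modern carbon is F = (S - B)/(M - B), where S, M and B are the count rates of the sample, the modern standard and the background material. The conventional age is -8033 ln F. As S approaches B, the uncertainty in B comes to dominate the uncertainty in F. The count rates and function names below are hypothetical.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years (5568 / ln 2), used for conventional ages


def fraction_modern(s, sig_s, m, sig_m, b, sig_b):
    """Simplified F = (S - B) / (M - B) with first-order error propagation.

    s, m, b: count rates of sample, modern standard and background material.
    Ignores isotopic fractionation (delta-13C) and standard normalisation.
    """
    net_std = m - b
    f = (s - b) / net_std
    dfds = 1.0 / net_std                 # partial derivative dF/dS
    dfdm = -(s - b) / net_std ** 2       # dF/dM
    dfdb = (s - m) / net_std ** 2        # dF/dB
    sig_f = math.sqrt((dfds * sig_s) ** 2 + (dfdm * sig_m) ** 2 + (dfdb * sig_b) ** 2)
    return f, sig_f


def conventional_age(f, sig_f):
    """Age = -8033 ln F, with sigma_age = 8033 * sigma_F / F."""
    return -LIBBY_MEAN_LIFE * math.log(f), LIBBY_MEAN_LIFE * sig_f / f


# Hypothetical old sample (5-6 half-lives) with two background uncertainties.
for sig_b in (0.01, 0.03):
    f, sig_f = fraction_modern(0.30, 0.01, 10.0, 0.05, 0.10, sig_b)
    age, sig_age = conventional_age(f, sig_f)
    print(f"sigma_B = {sig_b:.2f} cpm -> {age:.0f} +/- {sig_age:.0f} BP")
```

The age is the same in both runs. Tripling the background uncertainty, however, roughly doubles the age error, because for such an old sample dF/dB is almost as large as dF/dS.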

It is clear that programmes such as FIRI are, and will continue to be, necessary. One plan under consideration is to organise a major inter-comparison such as FIRI every four years. In each of the three intervening years, a small number of samples would be sent to laboratories to be analysed in a short time, with feedback then given. In this way, the 'spot-check' nature of FIRI and the lack of continuous monitoring of performance would be remedied. Such a system would benefit the participating laboratories and would also provide a better 'quality guarantee' to the user communities. All results and a full report on the inter-comparison will appear as a special issue of Radiocarbon in 2003 (Scott et al., forthcoming).

Rewards for the archaeological user

A user's choice of laboratory depends on a number of factors, including cost and the time taken to obtain results. The choice may also depend on the sample material, on the sample size and on the precision required. Laboratories differ in their capabilities to measure very small samples (AMS rather than radiometric). A few laboratories, all radiometric, are able to measure to extremely high precision (<15 years). Specialised pre-treatment procedures for unusual materials may be available in only a small number of laboratories.

But laboratory selection should also be based on an evaluation of the QA performance of the individual laboratory. By the simple fact of participation in a programme such as FIRI, a laboratory is emphasising its commitment to the quality assurance of its results. As in previous inter-comparisons, although the laboratory attribution in any results table is anonymous, a list of the participating laboratories is published (Table 6). Further, individual laboratories are encouraged, if they wish, to publish and publicise their own performance in the inter-comparison and many do so. Users are also encouraged to ask laboratories about their QA policies and should pay attention to laboratory participation in inter-comparisons and more specifically to laboratory use of standards and reference materials (such as TIRI, FIRI and IAEA C1-C8).

That said, the FIRI results have emphasised that on average, ¹⁴C laboratories (whether gas proportional, liquid scintillation or accelerator mass spectrometry) are providing accurate and precise results.

Conclusion

The results of the latest FIRI have demonstrated that there are no significant differences between the main measurement techniques (gas proportional counting, liquid scintillation counting and accelerator mass spectrometry) but there is evidence from some laboratories of small laboratory offsets relative to known age samples. There is also evidence in some cases for over- or under-estimation of measurement precision. Approximately 10% of all results were classified as extreme (outliers) and these results were generated by 14% of the laboratories. Notwithstanding good internal QA procedures, some problems still occur which can best be detected by participation in independent inter-comparisons such as FIRI where the results allow individual laboratories to assess their performance and to take remedial measures.

Table 1. Core sample descriptions

Core sample description          FIRI code*   Age/activity

Kauri wood                       A, B         Near background
Marine turbidite                 C            ~3 half-lives
Belfast dendro-dated wood        D, F         ~1 half-life
Humic acid                       E            ~2 half-lives
Barley mash                      G, J         Modern
Hohenheim dendro-dated wood      H            <1 half-life
Belfast dendro-dated cellulose   I            ~1 half-life

* Code indicating the material being dated

Table 2. Geographical distribution of participating laboratories

Geographical area                       Number of laboratories

Europe (EU)                             35
Europe (non-EU)                         15
North and South America and Canada      16
Asia and the Far East                   15
Australia and New Zealand               4

Table 3. Methods applied

Method                             Number of laboratories using it

LSC (1)                            44
GPC (2)                            19
AMS (3)                            17
Target feeder for AMS (4)          8
Direct absorption and LSC (5)      2

(1) Liquid scintillation counting.

(2) Gas proportional counting.

(3) Accelerator mass spectrometry.

(4) Laboratories that prepare samples and send them to AMS laboratories for measurement.

(5) Laboratories that absorb sample carbon in the form of CO₂ into a tertiary amine or similar compound and measure the activity by LSC (generally a low-precision method).

Table 4. Descriptive statistics for the differences between duplicates (AB and GJ in pMC; DF in radiocarbon years)

Sample pair   Number of results   Mean difference   Standard deviation   Minimum   Maximum

AB            54                  0.029 pMC         0.214                -0.66     0.53
GJ            71                  -0.094 pMC        1.085                -4.37     2.76
DF            79                  17.4 years        97.3                 -239      310

Table 5. Consensus values

Sample      Known age                          Consensus value (estimated 1σ precision)

AB (pMC)    -                                  0.24 pMC (95% CI 0.23-0.30)
C (yBP)     -                                  18176 (10.5) yBP
DF (yBP)    3200-3239 BC (¹⁴C age 4495 BP)     4508 (3) yBP
E (yBP)     -                                  11780 (7) yBP
GJ (pMC)    -                                  110.7 (0.04) pMC
H (yBP)     313-294 BC (¹⁴C age 2215 BP)       2232 (5) yBP
I (yBP)     3299-3257 BC (¹⁴C age 4471 BP)     4485 (5) yBP

pMC: percent modern carbon; yBP: radiocarbon years before present, where present is AD 1950.
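Table 5 mixes pMC and radiocarbon-year units. Conventional radiocarbon ages relate to percent modern carbon through t = -8033 ln(pMC/100), where 8033 years is the mean life corresponding to the Libby half-life of 5568 years. This is the standard reporting convention and is not stated in this report. The sketch below converts a few of the consensus values and ignores their uncertainties. The function names are illustrative.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; 5568 / ln 2


def pmc_to_age(pmc):
    """Conventional radiocarbon age (BP) from percent modern carbon."""
    if pmc <= 0:
        raise ValueError("pMC must be positive")
    return -LIBBY_MEAN_LIFE * math.log(pmc / 100.0)


def age_to_pmc(age_bp):
    """Percent modern carbon corresponding to a conventional age (BP)."""
    return 100.0 * math.exp(-age_bp / LIBBY_MEAN_LIFE)


print(f"A/B: {pmc_to_age(0.24):.0f} BP")    # Kauri wood consensus, near background
print(f"C:   {age_to_pmc(18176):.1f} pMC")  # turbidite consensus
print(f"G/J: {pmc_to_age(110.7):.0f} BP")   # barley mash: negative, i.e. post-bomb
```

The Kauri wood consensus corresponds to roughly 48,500 BP and the turbidite to about 10.4 pMC. The barley mash gives a negative "age", which is why modern (post-bomb) samples are reported in pMC rather than in years BP.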

Table 6. Laboratories participating in FIRI

Laboratory name                                           Country

LATYR, La Plata                                           Argentina
Pabellon INGEIS                                           Argentina
CSIRO, Glen Osmond                                        Australia
ANTARES AMS Centre, ANSTO                                 Australia
Arsenal Research                                          Austria
VERA, Universität Wien                                    Austria
VRI, Institut für Radiumforschung und Kernphysik          Austria
IRPA, KIK                                                 Belgium
IGSB, Minsk                                               Byelorussia
EIL, University of Waterloo                               Canada
AECL, Chalk River                                         Canada
Geological Survey of Canada                               Canada
EHPL-Env, Ontario Hydro                                   Canada
IOEE Chinese Academy of Sciences                          China
Rudjer Boskovic Institute                                 Croatia
Institut fur Fysik, University of Aarhus                  Denmark
Institute of Geology, Tallinn                             Estonia
GSF, Espoo                                                Finland
University of Helsinki                                    Finland
IPSN/LMRE, Orsay                                          France
HIGL, Paris-Sud University                                France
Tandetron-Gif                                             France
Université Claude Bernard, Lyon                           France
Umweltforschungszentrum Leipzig-Halle                     Germany
Leibniz, Universität Kiel                                 Germany
IUF, Universität Köln                                     Germany
UFZ-CER, PRG, Halle                                       Germany
Institut für Bodenkunde, Universität Hamburg              Germany
Heidelberg University                                     Germany
DAI, Berlin                                               Germany
IGR, NLB, Hannover                                        Germany
Universität Erlangen-Nürnberg                             Germany
LOIH, Institute of Physical Chemistry, Demokritos         Greece
LOA, Institute of Materials Science, Demokritos           Greece
Institute of Nuclear Research, HAS                        Hungary
Physical Research Lab, Earth Sciences Div., Ahmedabad     India
Physical Research Lab, Radiocarbon Dating Lab, Ahmedabad  India
Birbal Sahni Institute, Lucknow                           India
CRDIRT, JCPJ, Jakarta                                     Indonesia
University College Dublin                                 Ireland
Kimmel Center, Weizmann Institute                         Israel
RDL, University of Rome La Sapienza                       Italy
Kyushu Environmental Evaluation Association               Japan
Institute for Advanced Science, Osaka                     Japan
Palynosurvey Co                                           Japan
CCR Nagoya University                                     Japan
Gakushuin University, Tokyo                               Japan
Kyoto Sangyo University                                   Japan
Seoul National University                                 Korea
Institute of Geology, Vilnius                             Lithuania
RJ van de Graaff Lab, Utrecht                             Netherlands
CIO Groningen                                             Netherlands
Rafter Lab, Institute of Geological Sciences              New Zealand
University of Waikato                                     New Zealand
Radiological Dating Laboratory, Trondheim                 Norway
Silesian Technical University, Gliwice                    Poland
A & E Museum, Lodz                                        Poland
ITN-Sacavem                                               Portugal
Geological Institute, RAS                                 Russia
Geographical Research, St. Petersburg State U.            Russia
Institute of Geography, RAS                               Russia
Institute of Ecology and Evolution, RAS                   Russia
Institute of History of Material Culture, RAS             Russia
IQFR, Madrid                                              Spain
University of Granada                                     Spain
Facultad de Quimica, Universitat de Barcelona             Spain
Tandem Lab, University of Uppsala                         Sweden
Universität Bern                                          Switzerland
ETH, Zurich                                               Switzerland
Department of Geology, NTU                                Taiwan
Office of Atomic Energy for Peace                         Thailand
School of Geosciences, Queen's University Belfast         UK
Research Lab for Archaeology, Oxford                      UK
SUERC, East Kilbride                                      UK
NERC Radiocarbon Lab                                      UK
Lab of Radioecology, Kiev                                 Ukraine
USGS, Reston                                              USA
Beta Analytic Inc., Florida                               USA
NSF Arizona                                               USA
Geochron Labs, Cambridge, MA                              USA
CAMS/LLNL                                                 USA
NOSAMS WHOI                                               USA
INSTAAR, University of Colorado at Boulder                USA
UC Riverside                                              USA
ISGS, Illinois                                            USA

Acknowledgements

This work was supported by NERC (Grant Ref: GR9/03389) and the European Commission (Grant Ref: SMT4-CT98-2265). We also wish to express our gratitude to Mike Baillie, Marco Spurk, Roy Switsur, Glengoyne Distilleries, Ganna Zaitseva, Kh Arslanov, John Thomson and Alan Hogg who provided many of the samples.

References

BRYANT C, I. CARMI, G. COOK, S. GULLIKSEN, D. HARKNESS, J. HEINEMEIER, E. MCGEE, P. NAYSMITH, G. POSSNERT, M. SCOTT, J. VAN DER PLICHT & M. VAN STRYDONCK. 2000. Sample requirements and design of an inter-laboratory trial for radiocarbon laboratories. Nuclear Instruments and Methods in Physics Research B 172: 355-359.

GULLIKSEN S & E.M. SCOTT. 1995. TIRI report. Radiocarbon 37(2): 820-821.

ISG. 1982. An inter-laboratory comparison of radiocarbon measurements in tree rings. Nature 298: 619-623.

LONG A & R. KALIN. 1990. A suggested quality assurance protocol for radiocarbon dating laboratories. Radiocarbon 32(3): 329-335.

POLACH H. 1989. ¹⁴C CARE. Radiocarbon 31(3): 422-431.

ROZANSKI K, W. STICHLER, R. GONFIANTINI, E.M. SCOTT, R.P. BEUKENS, B. KROMER & J. VAN DER PLICHT. 1992. The IAEA ¹⁴C intercomparison exercise 1990. Radiocarbon 34(3): 506-519.

SCOTT E.M., T.C. AITCHISON, D.D. HARKNESS, G.T. COOK & M.S. BAXTER. 1990. An overview of all three stages of the international radiocarbon intercomparison. Radiocarbon 32(3): 309-319.

Elisabetta Boaretto (2), Charlotte Bryant (1), Israel Carmi (2), Gordon Cook (3), Steinar Gulliksen (4), Doug Harkness (8), Jan Heinemeier (5), John McClure (8), Edward McGee (6), Philip Naysmith (3), Goran Possnert (7), Marian Scott (8) * , Hans van der Plicht (9), Mark van Strydonck (10).

1) NERC Radiocarbon Laboratory, Scotland, 2) Weizmann Institute, Israel, 3) SUERC, Scotland, 4) NUST, Norway, 5) University of Aarhus, Denmark, 6) University College Dublin, Eire, 7) University of Uppsala, Sweden, 8) University of Glasgow, Scotland, 9) University of Groningen, The Netherlands, 10) KIK, Belgium.

* corresponding author: marian@stats.gla.ac.uk

Received 6 June 2002; Revised 5 January 2003.