Reports of observational epidemiological studies often categorise (group) continuous risk factor (exposure) variables. However, there has been little systematic assessment of how categorisation is practised or reported in the literature, and no extended guidelines for the practice have been identified. We therefore assessed the nature of such practice in the epidemiological literature. We reviewed two months of issues (December 2007 and January 2008) from five epidemiological and five general medical journals. All articles that examined the relationship between continuous risk factors and health outcomes were surveyed using a standard proforma, with the focus on the primary risk factor. Using the survey results, we provide illustrative examples and, drawing also on ideas from the broader literature and on experience, offer guidelines for good practice.
Of the 254 articles reviewed, 58 were included in our survey. Categorisation occurred in 50 (86%) of them. Of those, 42% also analysed the variable continuously and 24% considered alternative groupings. Most (78%) used 3 to 5 groups. No articles relied solely on dichotomisation, although it featured prominently in 3 articles. The choice of group boundaries varied: 34% used quantiles, 18% equally spaced categories, 12% external criteria and 34% other approaches, while 2% did not describe the approach used. Categorical risk estimates were most commonly (66%) presented as pairwise comparisons with a reference group, usually the highest or lowest group (79%). Categorical analyses were reported mostly in tables; only 20% were presented in figures.
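To make the two most common boundary choices concrete, the following is a minimal sketch, not taken from any surveyed article, of how quantile-based and equally spaced cut points can be derived and applied. The function names and the `exposure` values are hypothetical and chosen only for illustration, and the convention that a value equal to a cut point falls in the upper group is an assumption.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quantile_cutpoints(values, n_groups):
    """Interior cut points giving groups of roughly equal size."""
    # method="inclusive" treats the data as spanning the full observed range.
    return quantiles(values, n=n_groups, method="inclusive")


def equal_width_cutpoints(values, n_groups):
    """Interior cut points giving groups of equal width on the exposure scale."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    return [lo + i * width for i in range(1, n_groups)]


def assign_groups(values, cutpoints):
    """0-based group index per value (assumed rule: ties go to the upper group)."""
    return [bisect_right(cutpoints, v) for v in values]


# Hypothetical, right-skewed exposure values (illustration only).
exposure = [1, 2, 3, 4, 5, 6, 7, 100]

by_quantile = Counter(assign_groups(exposure, quantile_cutpoints(exposure, 4)))
by_width = Counter(assign_groups(exposure, equal_width_cutpoints(exposure, 4)))

print("group sizes, quartiles:  ", dict(sorted(by_quantile.items())))
print("group sizes, equal width:", dict(sorted(by_width.items())))
```

With skewed data such as these, quartiles give groups of equal size, whereas equally spaced boundaries place nearly all observations in the lowest group and leave the middle groups empty. This is one reason the choice of boundaries should be stated explicitly.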
Categorical analyses of continuous risk factors are common. Accordingly, we provide recommendations for good practice. Key issues include pre-defining an appropriate choice of groupings and analysis strategies, presenting grouped findings clearly in tables and figures, and drawing valid conclusions from categorical analyses while avoiding injudicious use of multiple alternative analyses.