Estimation of the number of species extant in a geographic region has
been discussed in the statistical literature for more than sixty years. The focus
of this work is on the use of pilot data to design future studies in this context.
A Dirichlet-multinomial probability model for species frequency data is used to
obtain a posterior distribution on the number of species and to learn about the
distribution of species frequencies. A geometric distribution is proposed as the prior
distribution for the number of species. Simulations demonstrate that this prior
distribution can handle a wide range of species frequency distributions including the
problematic case with many rare species and a few exceptionally abundant species.
Monte Carlo methods are used along with the Dirichlet-multinomial model to
perform sample size calculations from pilot data, e.g., to determine the number of
additional samples required to collect a certain proportion of all the species with
a pre-specified coverage probability. Simulations and real data applications are
discussed.
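
The Monte Carlo sample size idea can be illustrated with a minimal sketch. This is a simplified stand-in, not the method of this work: it assumes the number of species and their relative frequencies are known (the hypothetical argument `probs`), whereas the approach described above would draw these quantities from the posterior distribution under the Dirichlet-multinomial model. The function names, arguments, and default values are illustrative assumptions.

```python
import itertools
import math
import random


def draws_until_target(probs, seen, target_fraction, rng, max_draws=1_000_000):
    """Simulate one future study: sample individuals until the target
    fraction of all species has been observed; return the number of draws."""
    needed = math.ceil(target_fraction * len(probs))
    observed = set(seen)  # species already found in the pilot data
    species = list(range(len(probs)))
    cum_weights = list(itertools.accumulate(probs))
    draws = 0
    # max_draws is a safety cap in case the target cannot be reached.
    while len(observed) < needed and draws < max_draws:
        observed.add(rng.choices(species, cum_weights=cum_weights)[0])
        draws += 1
    return draws


def additional_samples_needed(probs, seen, target_fraction, coverage,
                              n_sims=2000, seed=0):
    """Return the smallest simulated sample size that reaches the target
    fraction of species in at least `coverage` of the simulated studies.

    Assumption: `probs` is treated as known here; a fuller analysis would
    repeat this over posterior draws of the species frequencies.
    """
    rng = random.Random(seed)
    results = sorted(
        draws_until_target(probs, seen, target_fraction, rng)
        for _ in range(n_sims)
    )
    # Empirical quantile of the simulated sample sizes at level `coverage`.
    index = max(0, min(n_sims - 1, math.ceil(coverage * n_sims) - 1))
    return results[index]


if __name__ == "__main__":
    # Hypothetical community: two abundant species and eight rare ones.
    probs = [0.4, 0.4] + [0.025] * 8
    pilot_seen = {0, 1, 2}  # species indices observed in the pilot sample
    n = additional_samples_needed(probs, pilot_seen, target_fraction=0.8,
                                  coverage=0.95)
    print("additional samples needed:", n)
```

Because each simulated study stops as soon as the target is met, the returned value is an empirical quantile of the required sample size; raising `coverage` or `target_fraction` can only increase it for a fixed seed.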