Abstract: Despite rapid advances in sequencing technology, many commercially relevant species
remain unsequenced, and many that are sequenced have very poorly annotated
genomes. There is therefore still considerable interest in using comparative approaches
to exploit information from well-characterised model organisms in order to better
understand related species. This paper develops a statistical method for automating part
of a comparative genomics bioinformatic pipeline for the identification of genes and
genomic regions in a model organism associated with a QTL region in an unsequenced
species. A non-parametric Bayesian statistical model is used for characterising the
density of a large number of BLAST hits across a model species genome. The method
is illustrated using a test problem demonstrating that markers associated with bovine
hemoglobin can be automatically mapped to a region of the human genome containing
human hemoglobin genes. Consequently, by exploiting the (relatively) high quality of
genome annotation for model organisms and humans, it is possible to quickly identify
candidate genes in those well-characterised genomes relevant to the quantitative trait
of interest.
Keywords: Bayesian; non-parametric; density estimation; QTL; BLAST; mapping; comparative
genomics.