Abstract: Dinoflagellates are a diverse group of unicellular primary producers and grazers that exhibit some of the most remarkable features known among eukaryotes. These include gigabase-sized nuclear genomes, permanently condensed chromosomes and highly reduced organelle DNA. However, the genetic inventory that allows dinoflagellates to thrive in diverse ecological niches is poorly characterised. Here we systematically assess the functional capacity of 3,368,684 predicted proteins from 47 transcriptome datasets spanning eight dinoflagellate orders. We find that 1,232,023 proteins do not share significant sequence similarity to known sequences, i.e. are "dark". Of these, we consider the 441,006 (13.1% of all proteins) that are found in multiple taxa, or occur as alternative splice variants, to comprise the high-confidence dark proteins. Although their function is unknown, 43.3% of these dark proteins can be annotated with conserved structural features using an exhaustive search against available data, supporting their existence and importance. Furthermore, these dark proteins and their putative homologs are largely lineage-specific and recovered in multiple taxa. We also identify functions conserved across all dinoflagellates, and those specific to toxin-producing, symbiotic, and cold-adapted lineages. Our results demonstrate the remarkable divergence of gene functions in dinoflagellates, and provide a platform for investigations into the diversification of these ecologically important organisms.