Novel Bioinformatic Methods for Inferring Reticulation History
Inferring the evolutionary history of species derived from hybridization poses special challenges for phylogenetic inference. Both homoploid and allopolyploid hybrids contain multiple parental genomes with distinct evolutionary histories and inheritance patterns, so their relationships resemble webbed networks rather than bifurcating trees. The recent advent of fern-specific suites of nuclear markers for cloning (Rothfels, Li, Sigel, et al. 2015, American Journal of Botany) and for target capture (www.flagellateplants.org) has opened the door to hybrid phylogenomics, but inferring the different parental, or homeologous, gene copies presents substantial bioinformatic challenges. One recent project in the Sigel Lab has been to develop and test a bioinformatic pipeline for inferring paralogous and homeologous sequences in hybrid fern species. The Sorter of Orthologous Regions for Target Enrichment Reads (SORTER) pipeline generates allele- and homeolog-level alignments for hundreds of target-capture loci that can be used for concatenated and multispecies coalescent phylogenomic inference (Mendez-Reneau, Burleigh, and Sigel, in press, Systematic Botany; https://github.com/JonasMendez/SORTER). By testing SORTER on the well-studied Polypodium vulgare complex, we were able to resolve relationships among all diploid species and correctly infer the progenitors of five allopolyploid species on multilabeled trees. We are now applying this tool to our study of Salvinia systematics, and we expect it to be useful for studying other reticulate plant complexes.
The three stages of the SORTER pipeline, with the contig and clustering settings used in this study. Black arrows indicate data-processing steps, with each black arrow pointing to the output of that step. Boxes highlight the final output of each of the three stages. Gray arrows show where outputs from one stage are used in another. For example, the first step of stage 2 (step 7) uses the trimmed reads and locus cluster sequences generated in stage 1.
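On a multilabeled tree, each homeolog copy of an allopolyploid appears as its own tip. Reading off candidate progenitors then comes down to asking which diploid lineage each copy groups with. The Python sketch below illustrates only that idea on a toy tree. It is not part of SORTER, and the taxon names, the `_h1`/`_h2` homeolog suffix convention, and the helper functions (`leaves`, `sister_of`, `candidate_progenitors`) are all hypothetical. The tree is written as nested tuples, and the code reports the sister clade of each homeolog tip while ignoring branch support and topological uncertainty.

```python
# Toy illustration only: a multilabeled tree as nested tuples, leaves are strings.
# Assumed (hypothetical) convention: homeolog copies of a polyploid are named
# "<polyploid>_h1", "<polyploid>_h2", ...

def leaves(node):
    """Return all leaf labels under a node, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def sister_of(tree, label):
    """Return the leaf labels of the sister clade of leaf `label`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == label:
            # Collect leaves from every other child of this node (the sister clade).
            result = []
            for j, other in enumerate(tree):
                if j != i:
                    result.extend(leaves(other))
            return result
        found = sister_of(child, label)
        if found is not None:
            return found
    return None


def candidate_progenitors(tree, polyploid):
    """Map each homeolog tip of `polyploid` to the taxa in its sister clade."""
    return {
        tip: sister_of(tree, tip)
        for tip in leaves(tree)
        if tip.startswith(polyploid + "_h")
    }


# Hypothetical tree: the two homeologs of "Allo" nest with different diploids.
tree = ((("DiploidA", "Allo_h1"), "DiploidB"), (("DiploidC", "Allo_h2"), "DiploidD"))
print(candidate_progenitors(tree, "Allo"))
```

On this toy tree, the sketch pairs `Allo_h1` with DiploidA and `Allo_h2` with DiploidC. With real data, one would read trees with a proper Newick parser and weigh branch support before naming progenitors.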