The Data Science Transdisciplinary Area of Excellence (TAE) organizes and sponsors several kinds of events, described below. Explore events in the menu on the right-hand side of the page. You can find webcasts of some talks on the "Webcasting" page. Subscribe to our Google Calendar to keep in touch.
Data Salon
The Data Salon is our signature event. It is an informal gathering where researchers on campus share ongoing data-related work with the data science community, with the objective of communicating with those outside of their immediate field. We then open discussion to all attendees with the goal of developing and identifying strategies, methods and related questions of interest to the researcher and data scientists. The concept of a "salon" is for a researcher to present problems and research opportunities in a domain-independent fashion that invites contributions and collaborations from different disciplines, rather than an evaluation of the merits of a specific result, as would be the case for a conventional research seminar.
Invited Speaker Series
For our Invited Speaker Series, we invite leaders in data science from off campus, including researchers, administrators, executives and policy makers from universities, institutes and organizations, to share their recent research developments or their insights on the data science movement.
Other Events
We organize events such as faculty-student mixers, workshops, lectures and data competitions.
We also sponsor/endorse seminars and colloquiums organized by various departments and units on campus. These events are often not initiated by the Data Science TAE, but by the individual departments or units.
Contact us if you would like to lead a discussion in a Data Salon event, nominate an invited speaker or request our sponsorship of a seminar that your department/unit is organizing.
All Events
Imputation Performance for Degraded Ancient Mitochondrial Genomes
Ancient DNA (aDNA) research frequently targets mitochondrial genomes due to their high cellular copy number and well-characterized phylogenetic architecture. However, post-mortem degradation driven by hydrolysis, oxidation, microbial activity, and environmental conditions fragments endogenous DNA and complicates downstream analyses. Despite widespread use of minimum coverage thresholds (typically 3–10x) in aDNA studies, no systematic benchmarking has defined the sequencing depth required for accurate mtDNA haplogroup classification. Additionally, while imputation methods effectively recover missing genotypes in nuclear DNA studies, their application to haploid mitochondrial sequences remains underexplored.
We address these gaps using a reference panel of 46,791 complete mitochondrial genomes and 3,500 simulated aDNA datasets spanning 0.25–15x coverage, generated with gargammel. Reads were processed through EAGER (v2.5.2), and consensus sequences were classified via Haplogrep3. We compared imputation accuracy between MAVEN, our novel HMM-based tool, and the existing kNN-based method MitoImp. Results indicate that 10x coverage represents a critical threshold for reliable haplogroup assignment, with diminishing returns beyond this depth. At ultra-low coverage (<2x), MAVEN outperformed MitoImp, especially when filtering for Haplogrep3 quality scores ≥0.90. However, accurate sub-haplogroup resolution remained challenging, reflecting fundamental constraints on imputing sparse haploid data. These results provide the first systematic framework for coverage requirements in ancient mtDNA analysis and clarify the boundaries of imputation-based approaches for degraded specimens.
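As a rough illustration of why ultra-low coverage leaves so little to work with, the sketch below estimates how many mitochondrial positions would fall under a minimum depth at a given mean coverage. It is not part of the study's pipeline: it assumes read depth per site follows a Poisson distribution (uniform read placement, which degraded aDNA only approximates), and the function names and the default depth cutoff of 3 are hypothetical choices made for illustration. The genome length is that of the revised Cambridge Reference Sequence.

```python
import math

# Length of the revised Cambridge Reference Sequence (rCRS) in base pairs.
MT_GENOME_LENGTH = 16569


def fraction_sites_below(mean_coverage: float, min_depth: int) -> float:
    """Probability that a site has depth < min_depth, assuming
    per-site depth ~ Poisson(mean_coverage). Illustrative model only."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )


def expected_missing_sites(
    mean_coverage: float,
    min_depth: int = 3,  # hypothetical cutoff, chosen for illustration
    genome_length: int = MT_GENOME_LENGTH,
) -> float:
    """Expected number of positions that would be masked as missing."""
    return genome_length * fraction_sites_below(mean_coverage, min_depth)


# Coverage values picked from the 0.25-15x range mentioned in the abstract.
for coverage in (0.25, 1, 2, 5, 10, 15):
    missing = expected_missing_sites(coverage)
    print(f"{coverage:>5}x: ~{missing:,.0f} of {MT_GENOME_LENGTH:,} sites below depth 3")
```

Under this simplified model the number of masked positions drops steeply as mean coverage rises, which is one intuition for why imputation matters most in the lowest coverage range.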
About the speaker: Matthew V. Emery is an Assistant Professor of Anthropology at Binghamton University. He is interested in a wide range of biomolecular applications in the forensic, anthropological and (bio)archaeological sciences. To date, Emery's research has focused on applying ancient DNA and next-generation sequencing methods to highly degraded archaeological and forensic human bones and teeth.