Tandy Warnow

Electrical and Computer Engineering
Professor
  • Bioengineering
  • Computer Science
3235 Siebel Center for Comp Sci
201 N. Goodwin Ave.
Urbana Illinois 61801

Primary Research Area

  • Control

Profile

Education

  • BA Mathematics, The University of California at Berkeley, 1984
  • PhD Mathematics, The University of California at Berkeley, 1991

Biography

Warnow received her PhD in Mathematics at UC Berkeley (1991) under the direction of Gene Lawler, and did postdoctoral training with Simon Tavare and Michael Waterman at USC (1991-1992). After spending a year in the Discrete Algorithms Group at Sandia National Laboratories in Albuquerque, NM, she joined the Computer and Information Sciences Department faculty at the University of Pennsylvania. Warnow joined the faculty at the University of Texas in 1998, where she was the David Bruton Jr. Centennial Professor of Computer Science. She is now a member of two departments at the University of Illinois - Computer Science and Bioengineering, where she is the Founder Professor of Engineering. She received the National Science Foundation Young Investigator Award in 1994, the David and Lucile Packard Foundation Award in Science and Engineering in 1996, a Radcliffe Institute Fellowship in 2006, and a Guggenheim Foundation Fellowship for 2011. She was elected a Fellow of the Association for Computing Machinery (ACM) in 2015 and of the International Society for Computational Biology (ISCB) in 2017.

Warnow's main research is in algorithms for statistical estimation problems in computational biology and historical linguistics. Among her major contributions are SATe, PASTA, and UPP, three different methods for multiple sequence alignment that provide high accuracy on large datasets (up to 1,000,000 sequences). She also contributed ASTRAL, a method for species tree estimation from multi-gene datasets that provides high accuracy in the presence of gene tree heterogeneity due to incomplete lineage sorting; ASTRAL is now the dominant method for species tree estimation on large datasets that provides statistical guarantees (i.e., statistical consistency under the multi-species coalescent model). Another major contribution was the development of the "short quartet methods" (with Peter Erdos, Laszlo Szekely, and Mike Steel) for phylogeny estimation, which provided the first methods with polynomial sample complexity for phylogeny estimation. Warnow also developed a phylogenetically-based ensemble method using profile Hidden Markov Models that improves accuracy (both precision and recall) for a number of different bioinformatics problems, including protein sequence classification, metagenomic taxon identification, and ultra-large multiple sequence alignment. Finally, Warnow's collaboration with linguist Don Ringe (Univ of Pennsylvania) led to a rigorous approach to inferring evolutionary histories (both trees and networks) for natural languages, and settled several outstanding conjectures for Indo-European.

Warnow has had several leadership roles in international consortia, including Genome 10K, the Avian Phylogenomics Project, and the Thousand Plant Transcriptome Initiative. She was also the Director of the CIPRES (Cyber-Infrastructure for Phylogenetics Research) project (funded by a large ITR grant from NSF), which involved more than 10 universities around the country, trained more than 50 PhD students (including many of the computer scientists now working in computational biology), and led to the establishment of the CIPRES Gateway.

Warnow's current work is developing novel machine learning and statistical learning approaches for large-scale phylogenomics (i.e., species tree estimation using genome-scale datasets), metagenomics, protein classification, and bibliometrics. An exciting new development in her lab is the divide-and-conquer approach, using Disjoint Tree Merger methods, that enables computationally intensive methods to scale to large datasets. The best of these methods (e.g., TreeMerge and GTM) are the work of her current PhD students, and have been shown to dramatically reduce running time without reducing accuracy.

Academic Positions

  • The University of Illinois at Urbana-Champaign, Founder Professor of Computer Science, 2014-present. Special Advisor to Department Head 2019-present; Associate Head for Computer Science 2017-2018 and 2019-present. Affiliate in the departments of Mathematics, Electrical and Computer Engineering, Bioengineering, Statistics, Plant Biology, Entomology, and Ecology, Evolution, and Behavior. Affiliate in the National Center for Supercomputing Applications (NCSA) and the Coordinated Science Laboratory (CSL), and member of the Carl R. Woese Institute for Genomic Biology (IGB). Affiliate in the PEEC program.
  • The University of Texas at Austin, David Bruton Jr. Professor of Computer Science, 2003-2014

Professional Highlights

  • Radcliffe Institute Fellow (2003)
  • David and Lucile Packard Foundation Fellow (1996)
  • John Simon Guggenheim Fellow (2011)
  • Published book with Cambridge University Press, Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation, in 2017
  • ISCB (International Society for Computational Biology) Fellow election 2017
  • ACM (Association for Computing Machinery) Fellow election 2015

Teaching Statement

At the undergraduate level, I teach courses in discrete mathematics and algorithm design and analysis, and use problems from computational biology to demonstrate how these skills and techniques apply to real-world problems. At the graduate level, I teach CS 581: Algorithmic Computational Genomics. The main focus of CS 581 is on phylogeny (evolutionary tree) estimation, but the course also covers the related problems of computing multiple sequence alignments, genome assembly, and analyzing microbiomes. Students learn the mathematical and computational foundations in these areas, read the current literature, and do a team research project. The techniques involved include discrete algorithms, graph theory, simulations, and probabilistic analysis of algorithms. Course website: http://tandy.cs.illinois.edu/581-2018.html.

Course Development

  • BIOE 540 / CS 581 Algorithmic Genomic Biology. Course website: http://tandy.cs.illinois.edu/581-2017.html. Course description: The purpose of the course is to give you enough background and training in the area of algorithmic genomic biology so that you will be able to do research in this area and publish papers. Every year, two or more students from this course have done final projects that were subsequently published in major scientific journals; you can be one of them! The main focus of the course is on phylogeny (evolutionary tree) estimation, multiple sequence alignment, and genome-scale phylogenetics, which are problems that present very interesting challenges from a computational and statistical standpoint. Time permitting, we will also discuss computational problems in microbiome analysis, protein function and structure prediction, genome assembly, and even historical linguistics. Students will learn the mathematical and computational foundations in these areas, read the current literature, and do a team research project. The course is designed for doctoral students in computer science, computer engineering, bioengineering, mathematics, and statistics, and does not depend on any prior background in biology. The technical material will depend on discrete algorithms, graph theory, simulations, and probabilistic analysis of algorithms.

Research Statement

My research combines mathematics, computer science, probability, and statistics, in order to develop algorithms with improved accuracy for large-scale and complex estimation problems in phylogenomics and metagenomics. My major interests include multiple sequence alignment and phylogeny estimation (both gene trees and species trees) and metagenomic analysis, but I also work in Historical Linguistics and Bibliometrics. My current work aims to develop methods for ultra-large datasets (anywhere from 10,000 to 1,000,000 sequences), including datasets that are highly fragmentary and present other real world challenges. We use real data and perform massive simulations to evaluate the performance of methods that we develop, and also collaborate closely with biologists and linguists in data analysis.

Graduate Research Opportunities

My research is currently focused on four topics, and all have multiple open questions where graduate students (MS or PhD) would be helpful. Although the topics are described in terms of biological or linguistic data, the research is to develop novel computational methods that provide excellent accuracy and scalability. All these problems have deep mathematical foundations, and almost all involve NP-hard statistical estimation problems. There are opportunities to prove theorems (if you are mathematically inclined), develop and implement heuristics for NP-hard problems, develop parallel implementations of methods, and analyze datasets. Graduate students should be strong programmers, and mathematical intuition is very helpful as well.

No background is needed in biology or linguistics!!

1. Phylogenomics - estimation of species trees and/or phylogenetic networks from multiple loci. The main focus is on combining gene trees that exhibit conflict due to incomplete lineage sorting or horizontal gene transfer (or even duplication and loss).

2. Metagenomics - taxonomic characterization of very short sequences sampled from environmental samples.

3. Multiple sequence alignment - especially for very large datasets (up to one million sequences) that are highly fragmentary.

4. Historical linguistics - analysis of Indo-European and other language families.

Undergraduate Research Opportunities

Undergraduates with programming skills and interest in developing and studying computational methods in biology or historical linguistics are encouraged to contact me about research projects. No background in either biology or linguistics is needed, but very good programming skills (in Python, for example) are necessary.

Research Interests

  • Bibliometrics
  • Metagenomics
  • Multiple sequence alignment
  • Phylogenomics
  • Big Data
  • Machine Learning
  • Discrete and graph-theoretic algorithms

Books Authored or Co-Authored (Original Editions)

  • Computational Phylogenetics: An introduction to designing methods for phylogeny estimation 2018, Cambridge University Press

Selected Articles in Journals

  • J. Bradley, S. Devarakonda, A. Davey, D. Korobskiy, S. Liu, D. Lakhdar-Hamina, T. Warnow, and G. Chacko. Co-citations in context: disciplinary heterogeneity is relevant. In press, Quantitative Science Studies (MIT Press).
  • J. Leebens-Mack et al. One thousand plant transcriptomes and the phylogenomics of green plants, Nature, https://doi.org/10.1038/s41586-019-1693-2
  • S. Pattabiraman and T. Warnow. Profile Hidden Markov Models are Not Identifiable. To appear, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019 (selected papers from ACM-BCB 2018).
  • J.E. Tarver, M. d. Reis, S. Mirarab, R. J. Moran, S. Parker, J.E. O'Reilly, B.L. King, M.J. O'Connell, R.J. Asher, T. Warnow, K. J. Peterson, P.C.J. Donoghue, and D. Pisani. The interrelationships of placental mammals and the limits of phylogenetic inference. Genome Biology and Evolution, doi:10.1093/gbe/evv261.
  • S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow. Response to Comment on "Statistical binning enables an accurate coalescent-based estimation of the avian tree". Science, 2015, volume 350, number 6257, p. 171, DOI: 10.1126/science.aaa7719.
  • S. Mirarab and T. Warnow. "ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes", Proceedings Intelligent Systems for Molecular Biology (ISMB) 2015, and Bioinformatics 2015 31 (12): i44-i52 doi: 10.1093/bioinformatics/btv234
  • M. S. Bayzid, S. Mirarab, B. Boussau, and T. Warnow. "Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses", PLOS One, 2015, DOI: 10.1371/journal.pone.0129183.
  • N. Nguyen, S. Mirarab, K. Kumar, and T. Warnow, "Ultra-large alignments using phylogeny aware profiles". Genome Biology (2015) 16:124 doi: 10.1186/s13059-015-0688-z
  • E. D. Jarvis, S. Mirarab, A. J. Aberer, B. Li, P. Houde, C. Li, S. Y. W. Ho, B. C. Faircloth, B. Nabholz, J. T. Howard, A. Suh, C. C. Weber, R. R. da Fonseca, J. Li, F. Zhang, H. Li, L. Zhou, N. Narula, L. Liu, G. Ganapathy, B. Boussau, Md. S. Bayzid, V. Zavidovych, S. Subramanian, T. Gabaldon, S. Capella-Gutierrez, J. Huerta-Cepas, B. Rekepalli, K. Munch, M. Schierup, B. Lindow, W. C. Warren, D. Ray, R. E. Green, M. W. Bruford, X. Zhan, A. Dixon, S. Li, N. Li, Y. Huang, E. P. Derryberry, M. F. Bertelsen, F. H. Sheldon, R. T. Brumfield, C. V. Mello, P. V. Lovell, M. Wirthlin, M. P. C. Schneider, F. Prosdocimi, J. A. Samaniego, A. M. V. Velazquez, A. Alfaro-Nunez, P. F. Campos, B. Petersen, T. Sicheritz-Ponten, A. Pas, T. Bailey, P. Scofield, M. Bunce, D. M. Lambert, Q. Zhou, P. Perelman, A. C. Driskell, B. Shapiro, Z. Xiong, Y. Zeng, S. Liu, Z. Li, B. Liu, K. Wu, J. Xiao, X. Yinqi, Q. Zheng, Y. Zhang, H. Yang, J. Wang, L. Smeds, F. E. Rheindt, M. Braun, J. Fjeldsa, L. Orlando, F. K. Barker, K. A. Jonsson, W. Johnson, K.-P. Koepfli, S. O'Brien, D. Haussler, O. A. Ryder, C. Rahbek, E. Willerslev, G. R. Graves, T. C. Glenn, J. McCormack, D. Burt, H. Ellegren, P. Alstrom, S. V. Edwards, A. Stamatakis, D. P. Mindell, J. Cracraft, E. L. Braun, T. Warnow, W. Jun, M. T. P. Gilbert, and G. Zhang. "Whole-genome analyses resolve early branches in the tree of life of modern birds." Science 12 December 2014: 1320-1331
  • S. Mirarab, Md. S. Bayzid, B. Boussau, and T. Warnow. "Statistical binning enables an accurate coalescent-based estimation of the avian tree". Science, 12 December 2014: 1250463.
  • S. Mirarab, N. Nguyen, S. Guo, L.-S. Wang, J. Kim, and T. Warnow. "PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences." Journal of Computational Biology. December 2014 (ahead of print)
  • N. Nguyen, S. Mirarab, B. Liu, M. Pop, and T. Warnow. "TIPP: Taxonomic Identification and Phylogenetic Profiling." Bioinformatics, 2014; doi: 10.1093/bioinformatics/btu721
  • N. Wickett, S. Mirarab, N. Nguyen, T. Warnow, et al. (37 authors). "Phylotranscriptomic analysis of the origin and diversification of land plants." Proceedings of the National Academy of Sciences (PNAS), doi: 10.1073/pnas.1323926111
  • S. Mirarab, R. Reaz, Md. S. Bayzid, T. Zimmermann, M.S. Swenson, and T. Warnow. "ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation." Bioinformatics 2014 30 (17): i541-i548. doi: 10.1093/bioinformatics/btu462.

Articles in Conference Proceedings

  • E. Molloy and T. Warnow. TreeMerge: A new method for improving the scalability of species tree estimation methods. Conditionally accepted, Proc. Intelligent Systems for Molecular Biology (ISMB) 2019.

Conferences Organized or Chaired

  • Science at Extreme Scales: Where Big Data Meets Large-Scale Computing, Co-organizer of IPAM long program for September-December 2018
  • Advancing Genomic Biology through Novel Method Development, organizer, June 2017, Radcliffe Institute for Advanced Study, Cambridge Massachusetts
  • Co-organizer, Next Generation Sequencing – Algorithms, and Software For Biomedical Applications, Dagstuhl Seminar, August 28 to September 2, 2016
  • IPAM (NSF Institute for Pure and Applied Mathematics) Workshop on Multiple Sequence Alignment, January 2015; co-organized with Sebastien Roch (Wisconsin) and Jim Leebens-Mack (Georgia)

Honors

  • Fellow of the International Society for Computational Biology (2017)
  • Fellow of the ACM (Association for Computing Machinery) (2015)
  • John Simon Guggenheim Memorial Fellowship (2010)
  • Radcliffe Institute for Advanced Study (2003)
  • David and Lucile Packard Foundation Fellowship in Science and Engineering (1996)
  • National Science Foundation Young Investigator Award (1994)

Courses Taught

  • BIOE 298 - Intro Bioinformatics for BIOE
  • BIOE 498 - Intro Bioinformatics for BIOE
  • BIOE 540 - Algorithmic Genomic Biology
  • BIOE 598 - Algorithmic Genomic Biology
  • CS 173 - Discrete Structures
  • CS 196 - Freshman Honors
  • CS 466 - Introduction to Bioinformatics
  • CS 581 - Algorithmic Genomic Biology
  • CS 591 - Advanced Seminar
  • CS 598 - Algorithmic Genomic Biology