My interests are algorithm and software development for clinical microbiome and metabolome data analysis. Since 2016, I have worked with consumer products, pharmaceutical, and biotech companies (through Clarity Genomics) on projects involving machine learning, statistical power analysis, cloud workflow development for genome assembly and annotation, and microbiome-metabolome biomarker discovery. I also participate in open-source software development for microbiome research through collaborations with research institutions.

2009-2010; M.Sc. Computer Science (4.0 GPA), Faculty of Engineering, McMaster University, Canada. Thesis supervision by William F. Smyth

2010-2013; Ph.D. University of Lille / Inria Lille - Nord Europe, France. Thesis co-supervision by Hélène Touzet and Laurent Noé

2014-2016; Postdoc in Rob Knight's Lab at the BioFrontiers Institute at the University of Colorado, Boulder, USA (2014-15) and the Department of Pediatrics at University of California, San Diego, USA (2015-16)

2016-present; Consultant & Managing Director at Clarity Genomics (Antwerp, Belgium; San Diego, CA, USA; Rhône-Alpes, France)

Languages: English, Russian, Italian, French, Dutch (beginner)

Selected publications and slides

Microbiome profiling of human cancer tissue

"Following recent demonstrations that some types of cancer show substantial microbial contributions, we re-examined whole-genome and whole-transcriptome sequencing studies in The Cancer Genome Atlas (TCGA) of 33 cancer types from treatment-naive patients (a total of 18,116 samples) for microbial reads, and found unique microbial signatures in tissue and blood within and between most major types of cancer."


Extended Figure 2: Performance metrics details discriminating between and within TCGA types of cancer using microbial abundances.

Zhu Q., Mai U., Pfeiffer W. et al., Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea, Nature Communications 10, 5477 (2019)

Phylogenomics of 10,575 genomes

"Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer 'core' genes, such as the ribosomal proteins."


Figure 1: A new view of the bacterial and archaeal tree of life.


Thompson L., Sanders J., McDonald D. et al., A communal catalogue reveals Earth’s multiscale microbial diversity, Nature 551, 457–463 (2017)

The Earth Microbiome Project

"We present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale."


Figure 1: Environment type and provenance of samples.

Amir A., McDonald D., Navas-Molina J.A. et al., Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns, mSystems, 2017

deblur: single nucleotide resolution

"Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity."


Figure 2: Benchmarks of OTU picking tools on artificial communities.


Kopylova E. and Smyth W.F., The three squares lemma revisited, Journal of Discrete Algorithms 11, 3-14 (2012)

The Three Squares Lemma

"In Fan K. et al., 2006, it was shown that if two maximally periodic substrings (runs) begin at the same position i, consequently no runs begin at some neighboring position i+k. This is the fundamental idea behind our combinatorial work, in which we provide well substantiated conjectures implying that three neighboring squares in a string force a trivial breakdown of the substring beginning at position i into repetitions of a small period."


Kopylova E., Noé L. and Touzet H., SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data, Bioinformatics 28, (2012)


"SortMeRNA is a software for rapid filtering of rRNA fragments from metatranscriptomic data. The core algorithm is based on approximate seeds and allows for fast and sensitive analysis of nucleotide sequencess. Additional applications include OTU-picking and taxonomy assignation available through QIIME v1.9+ ( - v1.9.0-rc1)."


int main(void) {long long ago; struct by {class ical; int struments;} for (;;) char med; return true;}

Music is a great complement to math, isn't it?

©2020 by Evguenia Kopylova