At the 30th conference on Intelligent Systems for Molecular Biology (ISMB), Mohammadamin Edrisi, a PhD student in the Nakhleh Lab in Rice University’s Department of Computer Science, received an honorable mention for the Ian Lawson Van Toch Memorial Award for Outstanding Student Paper for work toward a new scalable method for detecting single-nucleotide variants (SNVs).
“ISMB is one of the two premier computational biology conferences. Getting a paper published there and receiving an honorable mention for an outstanding student paper award are extremely competitive,” explains Luay Nakhleh, Professor of Computer Science and of BioSciences and the William and Stephanie Sick Dean of the George R. Brown School of Engineering. “This is a testimony to Mohammad’s great technical and communication skills.”
"Receiving this acknowledgement was really surprising for me," Edrisi says, "but I am so grateful for the chance for our group's research to be recognized. This has helped to increase my confidence in giving presentations and sharing this important research with the world."
Edrisi's co-supervisor, Huw Ogilvie, Assistant Research Professor of Computer Science at Rice, gave feedback on the paper before the presentation. Edrisi says, "That input was critical for ensuring I had a more effective talk and ultimately for this honorable mention. He suggested I replace formulas with figures that show examples—and as a result, the presentation became much easier for the audience to understand and I think had the biggest impact on its success."
ISMB is both the longest-running and the largest conference in computational biology and bioinformatics in the world. It is the flagship meeting of the International Society for Computational Biology (ISCB), a global society "advocating for and advancing scholarship, research, training, outreach, and inclusive community building in computational biology and its professions."
Edrisi's paper, "Phylovar: Toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data," has also been published in Bioinformatics, the top journal for bioinformatics and computational biology. The paper’s collaborators include PI Luay Nakhleh, Co-PIs David Posada and Hamim Zafar, and co-authors Monica V. Valecha, Sunkara B. V. Chowdary, Sergio Robledo, and Huw A. Ogilvie.
An SNV is the substitution of a single nucleotide in the genome. These variations can pinpoint not only traits in human subjects but also their susceptibility to a wide range of diseases, including cancer and Alzheimer's disease.
The more common method for extracting genetic information, bulk sequencing, has varied uses, including identifying common mutations in cancer and sequencing the genomes of different animals. A more recently developed method, single-cell sequencing, gives a finer-grained picture by extracting the genomic content of individual cells one by one. However, this technology is error-prone, which makes the resulting data noisy.
"For quite some time, our group has been taking into account the evolution of the cells that helps us to recognize and identify errors. At the same time as we identify SNVs, we infer and reconstruct the evolutionary history of the cells (the phylogenetic tree). This simultaneous inference is called phylogeny-aware SNV detection," Edrisi explains. "The novelty of this research is that we are extending this approach to whole-genome sequencing data. This scale is unprecedented in this league of methods."
Through software engineering, Edrisi and his group reduced the run-time of the model enough that the program can be run on a personal computer or a group of servers. As a result, the methodology now scales to single-cell whole-genome (scWGS) and whole-exome (scWES) sequencing, which offers a more precise way to identify SNVs across a person's entire genome. It also opens up the opportunity for the methodology to incorporate other types of mutations beyond SNVs.
The key innovation of the researchers' work was performing phylogenetic tree inference through matrix multiplication, an operation that is also central to machine learning. They used NumPy, the Python package for scientific computing, to remove computational bottlenecks that had been slowing processing. Other researchers who want to use the same approach for phylogenetic research can now do so faster.
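To give a feel for why matrix multiplication helps, here is a minimal NumPy sketch. It is an illustration only, not the Phylovar code: it assumes each site carries at most one mutation placed on a single branch of a fixed tree, and the names best_placements, log_l0, log_l1, and clades are hypothetical. With the tree encoded as a binary matrix of which cells sit below each branch, the score of every site on every branch comes out of a single matrix product instead of nested loops.

```python
import numpy as np


def best_placements(log_l0, log_l1, clades):
    """Score every (site, branch) mutation placement with one matrix product.

    log_l0, log_l1: (n_sites, n_cells) log-likelihoods of each cell's reads
        given that the cell lacks / carries the mutation (assumed inputs).
    clades: (n_branches, n_cells) binary matrix; clades[b, c] == 1 when
        cell c lies below branch b in the tree.
    Returns (scores, best): scores has shape (n_sites, n_branches + 1),
    where column 0 is the "no mutation" option; best holds the index of
    the highest-scoring column for each site.
    """
    log_l0 = np.asarray(log_l0, dtype=float)
    log_l1 = np.asarray(log_l1, dtype=float)
    clades = np.asarray(clades, dtype=float)

    # Log-likelihood of each site if no cell is mutated.
    baseline = log_l0.sum(axis=1, keepdims=True)

    # Placing the mutation on branch b swaps log_l0 for log_l1 in exactly
    # the cells below b, so the change in score is a dot product with that
    # branch's row. One matmul covers all sites and all branches at once.
    gain = (log_l1 - log_l0) @ clades.T

    scores = np.hstack([baseline, baseline + gain])
    return scores, scores.argmax(axis=1)


# Toy tree ((cell0, cell1), cell2): three leaf branches, the branch above
# cells 0 and 1, and the root branch above all three cells.
clades = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
])
# Site 0 looks mutated in cells 0 and 1; site 1 looks unmutated everywhere.
log_l0 = np.array([[-3.0, -3.0, -0.1], [-0.1, -0.1, -0.1]])
log_l1 = np.array([[-0.1, -0.1, -3.0], [-3.0, -3.0, -3.0]])

scores, best = best_placements(log_l0, log_l1, clades)
# best is [4, 0]: site 0 is placed on the branch above cells 0 and 1
# (column 4), and site 1 is called as having no mutation (column 0).
```

In this toy setup the work is handed to NumPy's optimized linear algebra routines rather than to a Python loop over sites and branches, which is the general kind of speedup the paragraph above describes.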
“While the paper has multiple authors, not only did Mohammad do almost all the work, but he also led the project,” says Nakhleh. “There is no question in my mind that, should he choose to pursue a faculty or research position, Mohammad will go on to become a top researcher in computational biology.”