Fourth-year Ph.D. student Leo Elworth leveraged a Rice University grant from the National Science Foundation to develop a tool that makes the visualization of genome data more easily available to scientists.
“Luay Nakhleh and Marina Vannucci had gotten an NSF grant to do interdisciplinary research,” he said, “and Luay suggested everyone in his group pitch projects to the undergraduates who had been hired as summer researchers. The grant funding allowed us to focus on a solution developed by computer scientists for use by biologists.”
The 2016 “big data” grant that Nakhleh and Vannucci, chairs of computer science and statistics, respectively, received from the National Science Foundation is being used to train the next generation of scientists to address problems that inherently span mathematics, statistics, computer science and other disciplines.
“Luay suggested that everyone in his research group put some suggestions together,” Elworth said, “and I was definitely interested in working with undergrads on a research project, so I pitched several ideas. There were more proposals than students, so the presenters had no idea if our pitches would be successful or not.”
Fifteen undergraduates listened to a wide range of projects proposed by faculty members and grad students in CS, Statistics, and BioSciences, and had a chance to speak individually with the presenters. After the students ranked their top three preferences, Nakhleh made the final assignments, and Elworth had three undergrads interested in his ideas.
“It was a huge success,” Elworth said, “and the project has already led to a manuscript focused on visualizing evolutionary diversity across genomes.”
Elworth noticed a trend in the three years he has been researching at Rice: “As I was reading papers on computational genomics, I noticed many high-profile papers using the same types of processes, starting from scratch and building all the tools necessary to run these analyses so they can get to the really interesting part of their genome research.
“You’d think there would be a tool that automates most of the standard comparative analyses, but there wasn’t. It takes days or a week to create the software. It’s the type of project that never gets done because you have to take time to think it through, make it easy to use and available to scientists working with a range of different genomes and file formats. It seemed like a manageable project, and the freshmen could definitely do it.”
The tool Elworth and his three undergraduate students built will help researchers who are studying gene-tree topologies as they explore evolutionary diversity. To more easily study the similarities between genomes, scientists look for patterns in the gene trees and where they are found in genomic alignments.
“Researchers break several genomes down into pieces and run different software on each piece to find the corresponding evolutionary histories,” Elworth said. “If you take a set of genomes and build a collection of trees, you’ll see different stories or histories emerge in different regions of the genomes. We wrote a program to separate the genomes into pieces and build the corresponding trees on each piece. That’s when it starts getting complicated. Which histories are the ones that need the most attention?”
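The windowing idea Elworth describes can be illustrated with a short Python sketch. This is not the team's tool. The function names and the toy alignment are hypothetical, and picking the two most similar sequences in each window is only a crude stand-in for the tree-building software a real analysis would run on each piece.

```python
from collections import Counter
from itertools import combinations


def split_into_windows(alignment, window_size):
    """Yield (start, sub_alignment) for consecutive non-overlapping windows."""
    length = len(next(iter(alignment.values())))
    for start in range(0, length, window_size):
        yield start, {name: seq[start:start + window_size]
                      for name, seq in alignment.items()}


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(window):
    """Stand-in for tree building: return the two most similar taxa.

    Ties are broken in favor of the alphabetically first pair.
    """
    return min(combinations(sorted(window), 2),
               key=lambda pair: hamming(window[pair[0]], window[pair[1]]))


def topology_counts(alignment, window_size):
    """Tally how often each 'history' (closest pair) appears across windows."""
    return Counter(closest_pair(window)
                   for _, window in split_into_windows(alignment, window_size))


# Toy alignment (made up for illustration): three taxa, 16 sites.
example = {
    "A": "AAAACCCCGGGGTTTT",
    "B": "AAAACCCTGGTTTTTT",
    "C": "AATTCCCCGGTTTTAA",
}

counts = topology_counts(example, window_size=4)
for pair, n in counts.most_common():
    print(pair, n)
```

In this toy case the pairing of A with B turns up in two of the four windows, while the other two pairings appear once each. Ranking the histories by how often they occur is one simple way to decide which ones deserve a closer look.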
The rapid deployment of these features surprised Elworth most. “We had to keep thinking of things for them to work on. Their work went on GitHub and they wrote nice documentation there as well.
“The intricacy of the features they developed by the end of summer made me realize we could tell these kids to do whatever, no matter how complicated we wanted them to get, and they’d come through.”
In their final week, Nakhleh proposed an idea that opened up new possibilities for the project. Elworth said all three undergraduates were so excited by the idea that they wanted to keep working on it, even during the academic year. “I expect they’ll get busy, but I’ll keep the torch going and if they have time to work, they will definitely be making contributions to the future of the project,” Elworth said.
To download the tool or read more about it, go to GitHub.
For perspectives from the undergraduate researchers, see:
Chab Allen
Travis Benedict
Peter Dulworth