Rice University Computer Science Ph.D. student Tiancheng Xu builds hardware accelerators to speed up data-intensive computational software. He recently presented a hardware accelerator he developed with his advisors, Alan Cox and Scott Rixner, at the IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
“I enjoy working on hardware accelerators for application domains, and analyzing genome data is an intriguing problem to me,” Xu explains.
“When I arrived at Rice in 2020, the first year of the pandemic, research into the virus and the SARS-CoV-2 genome was rapidly expanding in many directions. The computational aspects of genomics research felt like the perfect topic for me, so my advisors set up a meeting with Todd Treangen.”
Treangen, an assistant professor of Computer Science at Rice, is a colleague of Xu’s advisors, Alan Cox and Scott Rixner. Both Cox and Rixner are acclaimed researchers and inventors in the department, with a wide range of collaborators in diverse fields.
Xu said, “Meeting with Todd, who is an expert in genomics, helped us narrow our focus. Based on his experience and expertise in that domain, Todd pointed us to LoFreq. This is a tool that is important for processing genome analysis—especially viral genomes—but it suffers from a long turn-around time.”
Genome analysis involves a multi-stage computational pipeline. It begins with genome sequencing, which produces many short segments of sequence that are then stitched together by aligning them against a reference genome. The process yields high-quality data, but errors can still occur at both stages: when the segments are sequenced and when they are stitched together.
Meanwhile, the genome data itself may contain small variations, perhaps occurring in only one out of 100 people. These variations may appear as an anomaly at a certain position or a substitution in the sequence. But how can scientists tell whether a discrepancy is a computational error or an actual variation in the genome? Distinguishing between the two is called variant calling. LoFreq is an effective variant caller, but its long execution time has prompted some scientists to choose faster, less effective tools.
Xu said, “LoFreq performs time-consuming (and thus expensive) operations for millions of iterations, and the structure of the computation prevents simple parallelization. If we think of this type of computation as traffic on a highway, then LoFreq is a heavily congested freeway.”
“In addition to congestion, LoFreq’s traffic has two latent properties: the fundamental operations are slow and the process involves data dependency. Imagine if vehicles on the highway in our analogy can only proceed at a crawl (slow traffic) and cars at the front of a bottleneck cannot move forward until the vehicles behind them have been sorted (data dependency).”
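To make the idea of data dependency more concrete, here is a hypothetical sketch, not LoFreq's code or the design of Xu's accelerator, of the kind of calculation a variant caller might perform. Assuming each read covering a genome position comes with an error probability (derived here from a Phred quality score), it estimates how likely it is that at least k reads show a mismatch from sequencing error alone; a very small probability suggests a real variant rather than an error. The function names and the example pileup are illustrative assumptions. Note that each pass of the outer loop needs the result of the previous pass, so those passes cannot simply be handed out to parallel workers.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability (Q30 -> 0.001)."""
    return 10 ** (-q / 10)


def prob_at_least_k_errors(error_probs, k):
    """Chance that at least k reads show a mismatch from sequencing error alone.

    error_probs: one assumed error probability per read covering the position.
    Uses a dynamic-programming recurrence over the reads (a Poisson-binomial tail).
    """
    # dist[j] holds P(exactly j errors among the reads processed so far).
    dist = [1.0]
    for p in error_probs:
        # Data dependency: this pass reads the distribution produced by the
        # previous pass, so the outer loop must run in order.
        new = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            new[j] += prob * (1.0 - p)   # the current read is correct
            new[j + 1] += prob * p       # the current read is an error
        dist = new
    return sum(dist[k:])


if __name__ == "__main__":
    # Hypothetical pileup: 20 reads at Q30, 3 of which disagree with the reference.
    errors = [phred_to_error_prob(30)] * 20
    print(f"{prob_at_least_k_errors(errors, 3):.2e}")
```

In this sketch the work inside a single pass could be spread across many hardware units, but the passes themselves must happen one after another, which mirrors the data dependency Xu describes.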
Xu’s solution was to build a custom hardware accelerator for LoFreq. LoFreq still handles enormous traffic, but that traffic is now distributed in a smarter way. To achieve this, Xu turned to a field-programmable gate array (FPGA) rather than a central processing unit (CPU) or graphics processing unit (GPU). He said FPGAs allow programmers to physically change the hardware circuits and customize them for the software running on top.
“On our hypothetical freeway, parallel programming on CPUs and GPUs is similar to using routing and scheduling to reduce congestion, whereas programming the FPGAs can physically change the structure of the highway to optimize for the traffic,” said Xu.
Although FPGAs give programmers full control of the circuits, each chip offers only a limited amount of programmable circuitry. Xu said programmers must choose among many possible optimizations and combinations of them, each with its own speed-up and circuit resource cost, and he enjoys the challenge of designing an accelerator that achieves the highest speed-up within these constraints.
Cox was impressed with Xu’s innovations. He said, “When we first met to discuss LoFreq, its variant calling for SARS-CoV-2 data typically ran for hours. Now, with the FPGA hardware accelerator, LoFreq processes SARS-CoV-2 data in minutes.”
Xu is proud of the initial accelerator but believes this is no time to coast. He said, “We are pushing the optimization even further, and we want to make the accelerator easier for members of the computational biology community to access and use. We’ve been exploring AWS or similar cloud services as a channel for deploying our accelerator. We want to make it as easy for other scientists to use as pushing a button.”
Tiancheng Xu is a Computer Science Ph.D. student, co-advised by Alan L. Cox and Scott Rixner. He matriculated at Rice University in 2020, after completing his M.S. work in Computer Science at the University of Rochester.