Computer scientists from Rice University and bioinformatics researchers at Baylor College of Medicine have created a new tool, Squeegee, to further medical advances that depend on deep analysis of the human microbiome.
“The human microbiome contains trillions of microorganisms that can support or degrade our health,” said Rice Assistant Professor of Computer Science Todd Treangen. “Medical advances in areas like intestinal health are already guided by computational analysis of microbiome samples in public databases such as the NIH Human Microbiome Project (HMP).”
Unfortunately, research is hampered by extra data – contaminant sequences – in the uploaded data samples. Contaminants may be introduced by DNA extraction kits, in the sampling lab environment itself, or by other means. Although ultrapure water and similar negative controls can be used to identify the presence of contaminants, many early donations of microbiome samples in the HMP were not accompanied by negative controls.
Treangen said, “Microbes are everywhere. The original motivation for Squeegee started with a straightforward yet challenging research question: can we identify microbial contaminants within a microbial community without negative controls?
“Our motivation was not to replace negative controls, but rather to provide a computational tool that could remove contamination from publicly available metagenomic datasets like the HMP. These datasets have been used and cited thousands of times, yet they lacked negative controls.”
While Treangen is encouraged by Squeegee’s performance, he recognizes its role in the overall microbiome analysis process. He said, “We absolutely view Squeegee as a complementary approach, one that can be utilized alongside methods that leverage negative controls (such as Decontam) to catch all potential microbial contaminants.”
Why is it called Squeegee?
Anyone who has sluiced water off a window or shower wall has probably used a squeegee. As the interdisciplinary research team contemplated what to call their new tool, they realized that it removes residual microbial contaminants from various microbiomes like a squeegee removes residual water from flat surfaces.
“My Ph.D. student, Yunxi Liu – who was first author of the study – also really liked the name Squeegee,” said Treangen with a smile. “But ‘crumb sweeper’ might have been a more appropriate name.
“If we look at a single metagenomic sample, without a negative control it is extremely difficult to say whether a given microbe is a contaminant or not. This is both due to the trace amount of microbial contamination (let’s call this a single bread crumb) and the lack of data to corroborate a contamination call.
“However, if we look at multiple datasets, all from different microbiomes (skin, mouth, gut, soil, etc.), we may then observe multiple breadcrumbs across multiple datasets – providing a path to follow, to confidently call the putative microbial contaminant a true contaminant. This is especially true if we observe these breadcrumbs in locations where we don’t typically expect to find any bread (dentist’s office, basketball court, pool, etc.).”
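The breadcrumb idea can be illustrated with a short Python sketch. This is only a toy illustration of the reasoning in the quote, not Squeegee's actual implementation: the function name `candidate_contaminants`, the `min_environments` threshold, and the placeholder taxon labels are all assumptions made up for the example. It flags any taxon that turns up in several unrelated sample types, which is where a genuine community member would be unlikely to recur.

```python
from collections import defaultdict


def candidate_contaminants(samples, min_environments=3):
    """Return taxa seen in at least `min_environments` distinct sample types.

    `samples` is an iterable of (environment, taxa) pairs, where `taxa` is a
    set of taxon labels detected in one sample. Both the input shape and the
    threshold are hypothetical choices for this illustration.
    """
    environments_by_taxon = defaultdict(set)
    for environment, taxa in samples:
        for taxon in taxa:
            environments_by_taxon[taxon].add(environment)

    # A taxon that recurs across unrelated environments leaves a trail of
    # "breadcrumbs" and becomes a candidate contaminant.
    return sorted(
        taxon
        for taxon, environments in environments_by_taxon.items()
        if len(environments) >= min_environments
    )


# Made-up placeholder data: taxon_X appears in three unrelated sample types.
samples = [
    ("skin", {"taxon_A", "taxon_X"}),
    ("mouth", {"taxon_B", "taxon_X"}),
    ("gut", {"taxon_C", "taxon_X"}),
    ("soil", {"taxon_D", "taxon_C"}),
]

print(candidate_contaminants(samples))
```

In this toy data only `taxon_X` crosses the threshold; `taxon_C` appears in just two sample types and is left alone. A real tool would need far more evidence than simple presence counts before calling a contaminant.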
The study was published in Nature Communications, a peer-reviewed weekly journal that is highly regarded by scientists around the world. Liu is a third-year Ph.D. student, and this is his second appearance in Nature Communications as a first author.
Liu chose the graduate program at Rice because he saw opportunities to make significant contributions in the research areas that mattered most to him. He said, “The most important thing to me is that my research has direct implications to practical problems; it is always encouraging to see our work can immediately help others in the field. The effective collaborative relationship between Rice and TMC lays the foundation for future publications and has both inspired and fueled my day-to-day Ph.D. research.”
Treangen agreed and said, “Over the past four years, Yunxi has demonstrated an uncanny ability to come up with novel and clever approaches to challenging computational problems within bioinformatics. Squeegee is one such example that will aid scientists in their ability to identify and remove microbial contamination from microbiome datasets.”
Squeegee is open-source and available for download at: https://gitlab.com/treangenlab/squeegee
More information about Squeegee can be found in the December 6, 2022 edition of From the Labs, a BCM publication.