BIG DATA

Alumni Feature

DEEP LEARNING

FINDING THE KEY TO UNLOCKING THE WORLD’S COMPLEX PROBLEMS

BY DALE LONG

The worlds of mathematics, science and computing are becoming aligned, like mixing liquids in a chemistry beaker, to explore the impossible, discover medical breakthroughs and make machines more intelligent—all in a concentrated effort to improve people’s lives.

For instance, deep neural “stacked information” networks, combined with imaging processes for examining the neurosensory tissue within the core of the human eye, are helping ophthalmologists improve early detection rates for vision loss in diabetics. Elsewhere, these deep learning models are helping pathologists better identify cancerous cells, in hopes of reducing the one in 12 breast cancer biopsies that are currently misdiagnosed or undisclosed.

As you might suspect, Rose-Hulman alumni are in the thick of deep learning. Ryan Poplin, a 2004 computer science and mathematics alumnus, leads Google’s Brain Genomics Team, which is working to produce the short DNA fragments and any significant small differences within them. That work gives scientists and other researchers a roadmap to the building blocks for treating such genetic diseases as cystic fibrosis and sickle cell anemia, producing plants that have a higher nutritional value or can tolerate herbicides, and exchanging DNA from people to correct genetic dysfunctions.

Ryan Poplin is leading Google’s Brain Genomics Team.

PHOTO: BRYAN CANTWELL

ENDLESS POSSIBILITIES

“We’re talking about a whole new world for everybody in science, engineering, computer science and math. The possibilities are quite endless,” says Poplin. His team of five engineers is developing a variety of algorithms and problem-solving tools in computational genomics.

This relatively new process, known as “deep variant calling,” is a vast improvement over conventional “variant calling” techniques, which produce thousands of errors and missed variants within each genome sequence segment. The approach helped Poplin and Verily Life Sciences, a Google-affiliated startup, win the highest performance award for single nucleotide polymorphisms, or SNPs, in the Food and Drug Administration’s 2016 variant calling challenge.
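For readers curious about what a variant caller actually does, here is a minimal sketch in Python. It is a toy illustration of the conventional, counting-based idea only, not Poplin's deep-learning method and not any real tool's interface. The function name `call_snps`, its thresholds and the sample sequences are all hypothetical, and the reads are assumed to be already aligned to the reference with no gaps.

```python
from collections import Counter


def call_snps(reference, reads, min_depth=3, min_fraction=0.8):
    """Toy SNP caller (hypothetical helper, for illustration only).

    reference: the known DNA sequence, as a string.
    reads: list of (start_position, sequence) pairs, assumed to be
           already aligned to the reference with no insertions or deletions.
    Returns a list of (position, reference_base, observed_base) tuples.
    """
    # Build a "pileup": for every reference position, count the bases
    # that the overlapping reads report at that position.
    pileup = [Counter() for _ in reference]
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    snps = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything with confidence
        base, count = counts.most_common(1)[0]
        # Report a SNP when a clear majority of reads disagree
        # with the reference base at this position.
        if base != reference[position] and count / depth >= min_fraction:
            snps.append((position, reference[position], base))
    return snps


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT"),  # every read reports T where the reference has A (position 4)
    (2, "GTTCGTAC"),
    (1, "CGTTCG"),
]
print(call_snps(reference, reads))  # [(4, 'A', 'T')]
```

Fixed thresholds like these are easily fooled by sequencing errors and uneven coverage, which is one way a simple counting rule can produce false calls and missed variants.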

“DeepVariant represents a significant step toward more automatic deep-learning approaches for developing software to interpret biological instrumentation data,” says Poplin, who added to his academic credentials from Rose-Hulman with a master’s degree in neural computation at Carnegie Mellon University.

He was the lead developer of the Genome Analysis Toolkit, a popular software package for processing next-generation DNA sequencing data. Poplin has been sought as a keynote speaker at conferences and workshops throughout the country for his expertise in data science and computational genomics. Publications by Google Brain team members have been featured in leading computer science journals.

Poplin’s team is taking advantage of technical resources available through Google and its parent company, Alphabet, to meet its research goals.

The aim is to produce intelligent machines and processes that will someday improve people’s lives. The team pursues it through deep learning research, a subfield of machine learning that focuses on building highly flexible models that learn their own features, end to end, and make efficient use of data and computation.

UNLOCKING POTENTIAL DISCOVERIES

“Our expertise in these systems is giving us the knowledge to build tools that are accelerating machine-learning research and unlocking its practical value,” Poplin remarks in a telephone interview from Google Brain’s offices at Google’s headquarters in Mountain View, California.

Google Brain researchers have the liberty of setting their own research agendas and determining their level of engagement, from basic methodological studies to more applied research. The team’s work in genomic information is then shared with researchers, physicians and other professionals.

“Big genomic data is moving at such rapid speeds. We’re talking in terms of petabytes (a million gigabytes) rapidly growing toward exabytes (one quintillion bytes),” he says. “Through extensions to the Google Cloud Platform, the same technologies that power Google Search and Google Maps are securely storing, processing, exploring and sharing large, complex datasets that are valuable resources to everyone.”
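To put the units in that quote in perspective, the short Python sketch below works through the arithmetic. It assumes decimal (SI) units, where a gigabyte is one billion bytes. The constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes. This is an assumption:
# some systems use binary units (powers of 1,024) instead.
GIGABYTE = 10**9
PETABYTE = 10**15
EXABYTE = 10**18  # one quintillion bytes


def gigabytes_in(num_bytes):
    """Return how many whole gigabytes fit in the given number of bytes."""
    return num_bytes // GIGABYTE


print(gigabytes_in(PETABYTE))  # 1000000 (a million gigabytes)
print(EXABYTE // PETABYTE)     # 1000 (an exabyte is a thousand petabytes)
```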