By developing artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks 15 times faster than platforms based on graphics processors, Anshumali Shrivastava has brought the future of computing closer to reality.
“The cost of training is the bottleneck in AI. Companies are spending millions of dollars a week just to train and fine-tune their AI workloads,” said Shrivastava, assistant professor of computer science (CS).
Deep neural networks (DNNs) are a powerful form of AI that can outperform humans at certain tasks. DNN training is typically a series of matrix multiplication operations, an ideal workload for graphics processing units (GPUs), which cost about three times more than general-purpose central processing units (CPUs).
“The industry is fixated on one kind of improvement — faster matrix multiplications,” Shrivastava said. “Everyone is looking at specialized hardware and architectures to push matrix multiplication. People are now even talking about having specialized hardware-software stacks for specific kinds of deep learning. Instead of taking an expensive algorithm and throwing the whole world of system optimization at it, I’m saying, ‘Let’s revisit the algorithm.’”
Shrivastava’s startup company, Houston-based ThirdAI, pronounced “Third Eye,” is dedicated to building the next generation of scalable and sustainable AI tools and rewriting deep learning systems from scratch. ThirdAI’s accelerator uses hash-based processing algorithms for training and inference with neural networks. The technology is the result of 10 years of innovation in finding efficient mathematics for deep learning.
Shrivastava’s lab has recast DNN training as a search problem that can be solved with hash tables. The lab’s “sub-linear deep learning engine” (SLIDE) is designed to run on commodity CPUs. Shrivastava and his collaborators at Intel have shown it can outperform GPU-based training.
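The core idea of treating a dense layer as a search problem can be sketched with locality-sensitive hashing. The toy example below is an illustrative assumption, not ThirdAI's actual code: it uses SimHash (signed random projections), one of several LSH families, to pre-hash each neuron's weight vector into hash tables, then evaluates only the neurons that collide with the input, rather than multiplying the full weight matrix.

```python
import numpy as np

# Illustrative sketch of SLIDE's hash-table idea (all names and parameters
# here are assumptions for demonstration, not ThirdAI's implementation).
rng = np.random.default_rng(42)

D, N, K, L = 64, 1024, 8, 4      # input dim, #neurons, hash bits, #tables
W = rng.standard_normal((N, D))           # layer weights: one row per neuron
planes = rng.standard_normal((L, K, D))   # random hyperplanes per hash table

def signature(table, v):
    """K-bit SimHash code: the sign pattern of v against K random hyperplanes."""
    bits = (planes[table] @ v) > 0
    return int(bits @ (1 << np.arange(K)))  # pack the bit pattern into an int

# Pre-hash every neuron's weight vector into L hash tables.
tables = [{} for _ in range(L)]
for t in range(L):
    for n in range(N):
        tables[t].setdefault(signature(t, W[n]), []).append(n)

def sparse_forward(x):
    """Evaluate only neurons that collide with x in at least one table."""
    candidates = set()
    for t in range(L):
        candidates.update(tables[t].get(signature(t, x), []))
    idx = np.array(sorted(candidates), dtype=int)
    return idx, W[idx] @ x   # activations for the retrieved neurons only

x = rng.standard_normal(D)
idx, acts = sparse_forward(x)
print(f"evaluated {len(idx)} of {N} neurons")
```

Because SimHash groups vectors by cosine similarity, the retrieved bucket tends to contain the neurons with the largest activations, so the layer does a small fraction of the work of a dense matrix multiply; this search can be done efficiently on a CPU, where large hash tables fit in main memory.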
The SLIDE algorithm has demonstrated that ThirdAI can make commodity x86 CPUs up to 15 times faster than the most potent NVIDIA GPUs for training large neural networks.
“Hash table-based acceleration already outperforms GPU, but CPUs are also evolving,” said Shabnam Daghaghi, a collaborator with Shrivastava who earned his Ph.D. this year from Rice in electrical engineering and computer science. “We leveraged those innovations to take SLIDE even further, showing that if you aren’t fixated on matrix multiplications, you can leverage the power in modern CPUs and train AI models four to 15 times faster than the best specialized hardware alternative.”
“Democratization of AI,” Shrivastava added, “will happen when we can train AI on commodity hardware.”
This article is an excerpt from a piece that originally appeared in the 2022 issue of Rice Engineering Magazine.