Anastasios Kyrillidis, Rice University’s Noah Harding Assistant Professor of Computer Science, is one of 79 recipients of the recently announced Amazon Research Awards (ARA). His proposed research, “Efficient and affordable transformers for distributed platforms,” will build on his previous breakthroughs in optimization for large-scale systems.
Anyone who has used voice commands to communicate with applications like Alexa or Siri has interacted with artificial intelligence (AI) through a deep learning solution. As omniscient as they may seem, these applications do have limitations, and computer scientists continue pushing at those boundaries. In fields like natural language processing and computer vision, deep learning models called transformers have gained attention for their success in making sense of very large datasets. But transformers are expensive in both time and memory, and these costs limit the models’ utility in commercial and consumer applications and appliances.
Kyrillidis was intrigued by transformers’ growing resource requirements, which are directly linked to the way the models attach significance to each part of an input. He said, “Transformers are everywhere, and I’m not talking about Optimus Prime or the Autobots. In machine learning, a transformer is the neural network architecture behind most of the recent advances that ‘disrupted’ — positively or negatively — popular science: think of AlphaFold and protein structure prediction; think of DALL·E 2 and text-to-image synthesis; think of ChatGPT and language applications. So far so good; however, the required computational and monetary budget to train such models has become prohibitive for most of us; and by us, I mean everyone except a handful of tech companies.”
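For readers who want a concrete picture of where those resource requirements come from: the self-attention operation at the core of a transformer compares every token of an input with every other token, so its time and memory grow quadratically with the input length. The short sketch below is a generic single-head attention written in NumPy for illustration, not code from Kyrillidis’ project; it makes that quadratic score matrix explicit.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # Scaled dot-product self-attention for a single head (illustration only).
        # X holds n token embeddings of width d; the (n, n) score matrix below is
        # what makes attention expensive: it grows quadratically with input length n.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project tokens to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[1])         # (n, n): every token attends to every token
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
        return weights @ V                             # weighted mixture of value vectors

    n, d = 512, 64                                     # doubling n quadruples the score matrix
    X = np.random.randn(n, d)
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)                # builds a 512-by-512 attention matrix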
It is not yet clear whether such large computational and monetary budgets are always necessary. “Yes, the more the data and the bigger the model, the better. Yet it remains wide open whether there are intermediate solutions that are less data- and computation-hungry while still performing surprisingly well. Already, there exist scaling laws indicating that finding the sweet spot among dataset size, model size and computational power is an interesting and vital open research question,” said Kyrillidis.
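To make the “sweet spot” idea concrete, published scaling laws express model quality as a function of model size and dataset size. The toy sketch below uses a Chinchilla-style formula with made-up constants — not numbers from any particular study or from this proposal — to show how a fixed compute budget forces a trade-off between the two.

    # Illustrative Chinchilla-style scaling law: loss improves as model size N and
    # dataset size D grow, but a fixed compute budget (roughly 6*N*D floating-point
    # operations) forces a trade-off. All constants here are hypothetical.
    def predicted_loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    budget = 1e21                                      # available compute, in FLOPs (hypothetical)
    for N in [1e7, 1e8, 1e9, 1e10]:                    # candidate model sizes, in parameters
        D = budget / (6 * N)                           # tokens affordable at this model size
        print(f"N={N:.0e}  D={D:.0e}  predicted loss={predicted_loss(N, D):.3f}")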
In his transformer proposal, Kyrillidis expressed concern about the limited accessibility of existing big-budget models, whose cost and other restrictions hinder contributions from traditional research ‘players,’ such as academic labs. He said, “We see distributed computing as the modus operandi for training such large models. Of course, we are not alone; various protocols already exist to partition the dataset and computation across devices and distribute computation costs. We move slightly away from these approaches by exploring trade-offs between computation and performance, and by allowing approximate training dynamics. With this grant, we will focus on the unexplored open question of whether there is redundancy in transformer models that enables sparse model training and leads to smaller, faster, but still accurate models.”
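One generic way to picture the redundancy question is magnitude pruning: zero out the smallest weights in a layer and see how little of the model is actually needed. The sketch below is a standard baseline of that kind, not the sparse training scheme proposed in the grant.

    import numpy as np

    def magnitude_prune(W, sparsity=0.9):
        # Zero out the smallest-magnitude entries of a weight matrix. This is a
        # generic sparsification baseline, not the training scheme from the proposal:
        # if most entries can be dropped with little loss in accuracy, the layer was
        # redundant and can be stored and applied more cheaply.
        threshold = np.quantile(np.abs(W), sparsity)
        mask = np.abs(W) >= threshold
        return W * mask, mask

    W = np.random.randn(1024, 1024)                    # a dense feed-forward weight matrix
    W_sparse, mask = magnitude_prune(W, sparsity=0.9)
    print(f"kept {mask.mean():.1%} of the weights")    # roughly 10% survive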
Several graduate students in Kyrillidis’ Optimization Lab (OptimaLab) have already expressed interest in working on the ARA research. Chen Dun and Jasper Liao are two members of OptimaLab who work closely on these issues from both practical and theoretical perspectives. Chen Dun has led the group’s effort on efficient large-scale neural network training (the IST project), and Jasper Liao has contributed substantially to the theoretical understanding of these efficient techniques.
Amazon funded the recently announced awards in four areas: Prime Video, automated reasoning, Amazon sustainability, and Amazon Web Services (AWS) AI. The largest category, AWS AI, includes Kyrillidis’ proposal. In addition to unrestricted funding, recipients will have access to more than 300 Amazon public datasets, will receive promotional credits for AWS AI/ML services and tools, and will have opportunities to engage in hands-on sessions with Amazon scientists and engineers.
Kyrillidis said, “Beyond the funding, the access to Amazon’s public datasets and AI/ML tools is important to our work. Having access to a limited dataset suite often does not reveal all the aspects (negative or positive) that characterize the performance of the proposed ideas. For example, if there is only a single dataset you can work with, your solution (over time) might be overfit to perform well on that particular dataset and will not generalize to other scenarios. The more datasets available to ‘play with,’ the more likely it is that we find interesting behaviors of the developed algorithms, which further lead to more open questions and move research forward. Finally, having access to datasets from diverse applications — like computer vision and language tasks — might reveal the need for algorithms that adapt between different task modalities.”
Although the grant is a one-year, unrestricted award, the purpose of the funding is to continue the development of open-source tools and research that benefit the machine learning community at large, or impactful research that uses machine learning tools on AWS. Kyrillidis is happy to contribute to open-source resources and the community, and believes such efforts should be encouraged so that traditional players (like academic labs) can once again count in modern research.