Algorithms that bear the increasingly heavy load of manipulating big data can be honed for improved efficiency, and Maryam Aliakbarpour focuses her research on these types of tools. The new Michael B. Yuen and Sandra A. Tsai Assistant Professor in the Department of Computer Science will begin sharing her work and her passion for developing impactful algorithms for realistic models with students and colleagues this fall.
“I’m very excited to join the renowned Rice Computer Science faculty working on applied algorithms,” she said.
“My work tackles fundamental problems in learning theory while providing solutions for more realistic settings. In particular, I design algorithms which solve statistical problems, and I prove they will perform as expected when deployed in environments with some constraints.
“There are many realistic settings that impose constraints on algorithms. These could be a lack of computational resources such as time and memory, or societal constraints such as privacy and fairness, which we must impose on algorithms to align them with our values as a society.
She said one example of her research involves the density estimation problem. “Let’s say we have a predefined allocation of resources (time and memory), and we want to achieve an approximately correct solution. Perhaps the initial question can be solved with a specified number of data points, but if I give you fewer bits of memory, how much more data will you need to solve the problem?
“At its simplest, the resources for solving a problem form a kind of equation where decreasing one variable will necessarily increase another. I work on quantifying this relationship. I also ponder questions like, ‘How would privacy constraints affect the amount of data one would need to solve a problem?’
“Or sometimes you have all the data and memory you need, but not enough time to solve the problem. What is the tradeoff in the accuracy of the solution if I give you only a very limited time to run your algorithm?”
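For a rough sense of the numbers involved, a standard baseline from the statistics literature for the unconstrained version of this problem, not a result specific to Aliakbarpour’s own papers, is that learning a discrete distribution over k outcomes to accuracy ε in total variation distance takes on the order of k/ε² samples; the constrained settings she studies ask how that count must grow when memory, time, or a privacy budget is limited.

```latex
% Standard baseline for unconstrained discrete density estimation
% (a textbook fact, not specific to Aliakbarpour's work): to output
% an estimate \hat{p} with d_{TV}(\hat{p}, p) \le \varepsilon with high
% probability, roughly this many i.i.d. samples suffice and are
% necessary up to constant factors.
\[
  n \;=\; \Theta\!\left(\frac{k}{\varepsilon^{2}}\right)
\]
% Constrained settings ask how n must grow when only a limited number of
% memory bits, limited running time, or a privacy budget is available.
```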
Her pursuit of a better understanding of the tradeoffs between resources and results in learning theory problems inspired Aliakbarpour to adopt the PAC (probably approximately correct) learning model, with its focus on approximately correct answers. In many cases, an approximately correct answer is good enough. To choose appropriate clothing, for example, knowing the temperature is approximately 90°F (32.2°C) works just as well as knowing it will be exactly 87°F or 92°F.
“The PAC learning model says we have a high probability that we will find an answer if the answer doesn’t have to be perfect,” she said. “We might find nothing, we might find something that is totally garbage, but there is a high probability that we will find an answer and it will be approximately correct. This is a powerful model for solving a lot of problems where a slight amount of ambiguity is acceptable.
“Usually, it is difficult to solve the problem ‘exactly’. But, with the power of randomness, we can solve many problems approximately. If you are going for the best answer but your resources are limited, you may have to compromise and settle for something approximately correct.”
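In its textbook form (a standard statement of the model, not a formulation specific to her work), the guarantee she describes can be written as follows: given an accuracy ε and a confidence δ, the learner’s hypothesis should come within ε of the best achievable error with probability at least 1 − δ over the random sample.

```latex
% Standard (agnostic) PAC guarantee for a hypothesis class H:
% over the draw of the i.i.d. samples, with probability at least 1 - \delta,
% the learned hypothesis \hat{h} is within \varepsilon of the best in H.
\[
  \Pr\Big[\, \mathrm{err}(\hat{h}) \;\le\; \min_{h \in \mathcal{H}} \mathrm{err}(h) \;+\; \varepsilon \,\Big] \;\ge\; 1 - \delta
\]
```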
To introduce students to the beauty of approximation and the satisfaction of appropriately rebalancing resources, Aliakbarpour will teach a seminar course using papers like “Fast Learning Requires Good Memory” by Ran Raz. In addition to their paper discussions, the students will focus on learning problems in the context of different constraints on privacy, time, memory, and other resources.
“This paper’s results focus on how a memory constraint affects learning. Raz’s paper focuses on a parity learning problem that can be solved with Gaussian elimination. This is a very simple algorithm; in fact, we teach high school students to solve systems of linear equations with it,” said Aliakbarpour.
“It turns out that to use this simple algorithm, we need enough memory to store all the data points we see and then solve the problem. What Raz has shown in this paper is that, if I am allowed to store only a small fraction of those data points in memory, I would need exponentially more data points to solve the problem. What is surprising about this result is that it shows very tiny changes in the memory budget can change the entire landscape of the problem!”
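To make the classroom example concrete, here is a minimal sketch of parity learning by Gaussian elimination over GF(2), assuming a hidden secret s and labeled examples of the form (x, ⟨x, s⟩ mod 2), and assuming, as the quote notes, enough memory to store every example; Raz’s lower bound concerns exactly what happens when that storage assumption is dropped. The variable names and the demo setup are illustrative, not taken from the paper.

```python
# Minimal sketch: parity learning via Gaussian elimination over GF(2).
# Assumes we can store all examples (x_i, y_i) with y_i = <x_i, s> mod 2 --
# exactly the memory-hungry regime Raz's lower bound contrasts with.
import random

def solve_parity(examples, d):
    """Recover the hidden secret s in {0,1}^d from (x, y) pairs, if determined."""
    # Build augmented rows [x | y] and row-reduce modulo 2.
    rows = [x[:] + [y] for x, y in examples]
    pivot_row = 0
    pivot_cols = []
    for col in range(d):
        # Find a row with a 1 in this column to use as a pivot.
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate this column from every other row (XOR = addition mod 2).
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
    if pivot_row < d:
        return None  # system underdetermined; need more examples
    s = [0] * d
    for r, col in enumerate(pivot_cols):
        s[col] = rows[r][d]
    return s

# Tiny demo with a random hidden secret over d = 8 bits.
d = 8
secret = [random.randint(0, 1) for _ in range(d)]
examples = []
for _ in range(3 * d):  # a few multiples of d examples almost surely determine s
    x = [random.randint(0, 1) for _ in range(d)]
    y = sum(a * b for a, b in zip(x, secret)) % 2
    examples.append((x, y))
print(solve_parity(examples, d) == secret)  # expected: True
```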
The general theme of her research is how randomness and approximation help solve challenging problems, with a primary focus on theory. Aliakbarpour said statistical inference is the theoretical foundation of data analysis, encompassing many of the algorithmic tools used by data scientists.
“My goal is to adapt these tools to work in more realistic settings that have not been considered before. Some of these new settings follow novel computational models used in current data science practice, where computational resources are scarce. Other settings of interest are those that address our societal concerns,” she said.
“I specifically focus on algorithms that preserve the privacy of individuals whose data is being used. Other researchers in the field of learning theory are working on algorithms that are fair. We cannot forget or forgo our societal concerns about how data is treated.”