Rice CS Ph.D. student Senthil Rajasekaran, alumna Suguman Bansal (Ph.D. ‘20), and their advisor, University Professor Moshe Vardi, have published new research at the International Joint Conference on Artificial Intelligence (IJCAI ’23). Their paper, “Multi-Agent Systems with Quantitative Satisficing Goals,” addresses a persistent problem in the world of multi-agent computer systems: finding equilibria.
Game Theory And Multi-Agent Systems
Game theory studies the decision-making process in any situation that could be considered a game. Contrary to what may come to mind when you hear the word, a “game” is defined here as any instance where a group of agents (people, programs, etc.) strategically interact with each other in order to achieve their own goals.
Algorithmic game theory applies this study to the field of computer science, where the “players” are elements of a computer-based system and the “goal” is a numerical reward given when a certain criterion is met.
The team’s work focuses on the intersection of algorithmic game theory and the field of formal verification. Broadly, formal verification is the process of mathematically verifying the correctness of software systems — making sure the schematics and the code match up to prevent mistakes.
Algorithmic game theory is commonly used to analyze multi-agent systems (MAS). Multi-agent game scenarios are run to see how well the agents in a system can reach equilibrium, a situation in which no agent has an incentive to change its own behavior. But actually making that happen has proven very difficult.
There are, according to Rajasekaran, two big problems with achieving this kind of equilibrium, also known as Nash equilibrium. The first is that analyzing the sheer number of outcomes is incredibly difficult, and becomes harder with each new agent added to the system.
The second is that agents within a typical MAS always seek to improve their payoff, even if that improvement is minuscule. If, for example, an agent were given a choice between 1 and 1.0000000001, it would choose the latter because it technically represents an increased payoff.
“This tendency to always seek improvements to their own payoff, no matter how minuscule, destroys the ability for mutually beneficial cooperation,” Rajasekaran said. “Would you be able to trust someone that would choose a single extra penny for themselves over a million dollars for everyone else?”
Compounding the second problem is the fact that agents in an MAS are also often set up to prefer a reward now rather than later, because future payoffs are discounted, shrinking by a fixed factor with each step of time that passes.
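To make that concrete, here is a minimal sketch (an illustration, not code from the paper) assuming a standard discounted-sum payoff, where a reward earned t steps in the future is scaled by the discount factor raised to the power t:

```python
# Illustrative sketch of discounted payoffs (assumption: a standard
# discounted-sum model; the rewards and discount value are made up).

def discounted_payoff(rewards, discount=0.5):
    """Total payoff when the reward at step t is scaled by discount**t."""
    return sum(r * discount**t for t, r in enumerate(rewards))

# A reward of 10 received now is worth more than the same reward later,
# so agents are pushed to grab payoffs as early as they can.
print(discounted_payoff([10, 0, 0, 0]))  # 10.0
print(discounted_payoff([0, 0, 0, 10]))  # 1.25
```

(The 0.5 used here happens to be of the 1/n form, with n = 2, that the paper’s current model requires, a restriction discussed in the Next Steps section below.)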
Satisficing Outcomes: Aiming For “Good Enough”
This new research outlines a possible way around these problems: the use of quantitative satisficing goals.
As the name implies, a “satisficing” goal or outcome is not fully optimized, but it is still good enough to count as a success. The goals, in this case, are thresholds set for the agents in a system.
According to Rajasekaran, these goals serve as buckets, separating payoffs into thresholds an agent can try to meet. So if an agent had a predefined threshold of 3, it would aim for a payoff higher than 3 and change its strategy if the payoff were lower. In other words, the agent is aiming for a payoff “good enough” to satisfy the threshold.
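As a rough sketch of that idea (the function name and the use of a non-strict comparison are illustrative assumptions, not details from the paper), the satisficing check itself is a simple yes-or-no test:

```python
# Illustrative sketch: a satisficing agent only cares whether its payoff
# clears a fixed threshold, not how far above it the payoff lands.

def is_satisficed(payoff: float, threshold: float) -> bool:
    """Return True if the payoff is good enough to meet the threshold."""
    return payoff >= threshold

print(is_satisficed(3.2, 3))  # True  -> no incentive to change strategy
print(is_satisficed(2.9, 3))  # False -> the agent looks for a better strategy
```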
Having that predefined threshold to meet, says Rajasekaran, turns a “quantitative setting into a qualitative one.” This allows efficient algorithmic techniques to find equilibria, but it removes the richness of the original payoff structure, which allowed for many more possible outcomes than simply meeting a threshold or not. To recapture that richness, the work extends the analysis to allow a single agent to have multiple thresholds.
These thresholds can be set as close together as you like, and you can build in as many of them as you want. For example, if you had the thresholds 0, 1, 2, 3, 4, and 5, each agent would try to satisfy as many of those thresholds as possible by earning the highest payoff it can.
A payoff of 3.5 would satisfy thresholds 0, 1, 2, and 3, but not 4 or 5, so the agent would change its strategy only if it could also meet the threshold of 4. This approach lets the authors analyze when an agent might be motivated to change its strategy even when there are more than two possible outcomes.
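As a hedged sketch of how this works (the helper name and counting rule here are illustrative assumptions, not the paper’s construction), an agent can compare strategies by how many thresholds each payoff clears:

```python
# Illustrative sketch: with several thresholds, strategies are compared
# by how many thresholds their payoffs clear, not by raw payoff.

THRESHOLDS = [0, 1, 2, 3, 4, 5]

def thresholds_met(payoff: float) -> int:
    """Count how many of the thresholds this payoff satisfies."""
    return sum(1 for t in THRESHOLDS if payoff >= t)

print(thresholds_met(3.5))  # 4 -> meets 0, 1, 2, and 3
print(thresholds_met(3.9))  # 4 -> same bucket, so no incentive to switch
print(thresholds_met(4.1))  # 5 -> also meets 4, so switching is worthwhile
```

Packing the thresholds more tightly shrinks the buckets, which is the sense in which the approach can come close to true optimization.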
The paper proves that all of this can be done in PSPACE, meaning polynomial memory suffices, an important guarantee for a problem in which multiple agents can each be in multiple states.
“This is kind of like having our cake and eating it too,” Rajasekaran said, “because by making many thresholds that are very close together we can approximate optimization very well without running into its pitfalls.”
Next Steps
So far the work is only theoretical, and Rajasekaran warns that there are some caveats. For one, agents have only been allowed to behave deterministically, though the goal is eventually to test the approach with probabilistic behavior.
Another limitation is that the discount factor, the rate at which future rewards lose value over time, has to take the form 1/n for the current model to work, which, Rajasekaran says, “doesn’t model real economic scenarios very well.”
The team is already pursuing the next phase of this work, striving for a version of the approach that overcomes these limitations and emphasizing that the “modularity” of a satisficing goal approach yields “efficient and useful behavior in a setting where many fundamental questions lack answers.”