Training a computer to see a chair may be relatively simple, but training it to recognize “chair-ness” is a challenge.
Vicente Ordóñez, a newly minted associate professor of computer science at Rice’s Brown School of Engineering, is taking it on with the help of a prestigious National Science Foundation CAREER Award, given to promising faculty members in the early stages of their academic careers.
The five-year grant for nearly $500,000 will support his efforts at Rice to teach machines to recognize complex visual concepts through their “compositionality.” That would help a machine-learning algorithm not only recognize the characteristics of a table but also classify it as a coffee table, a dinner table, a workbench … and so on.
“This is an area where computer vision systems can be improved,” said Ordóñez, who joined Rice this month after several years on the University of Virginia faculty. “Today, one needs a lot of training data to teach a computer new concepts. If we want to train it to recognize a new type of animal, we need to take a lot of pictures of that animal and feed them into the algorithm so it can learn to recognize the new species.
“But at some point, this is limiting because it involves a lot of data collection and annotation,” he said. “We think there’s a better way, so we’re teaching computer algorithms concepts with natural language, rather than just annotating images.”
Training the machine to recognize “cat-ness,” for instance, would involve images, language and follow-up, not unlike teaching a child. “We’ll go back and forth to correct the computer to explain when it makes mistakes,” Ordóñez said. “That way, we’ll refine concepts it has never seen before. Currently, some of our proposals are not possible or not very effective, and we’re trying to fix that.
“Combining vision and language is one of the main themes of our lab,” he said. “We want to communicate the contents of an image in a way that is easy to understand through language. We also want people to be able to easily retrieve images that fit certain criteria.
“Things work pretty well now to find pictures of cats or dogs,” Ordóñez said. “But if you specify more complex things — I want a picture of a brown dog on top of a table that also has a white cat underneath — it becomes more complicated, even if you have a very large database.”
Ordóñez said the Rice lab will look at the social implications of machine-learning systems and inherent biases in the training data that can skew results.
“For instance, if there are a lot of pictures of women holding a baby in the images we use for training, but not many of men holding a baby,” he said. “This could lead to issues: The computer might recognize babies when they’re held by a woman, but when they’re held by a man, they make mistakes. By building compositional models, we believe some of these errors will be mitigated.”
Ultimately, he hopes interactive portals will entice the public to advance the algorithm by teaching it new primitives, data building blocks that help associate and identify forms in many categories.