Can caches be smart and still be fast? Or, more specifically, can caches still perform quickly and effectively in the face of higher volume, typos, misspellings, and redundancy, all the things we see in real-life searches?
Anshumali Shrivastava, associate professor in Rice University's Department of Computer Science and CEO of artificial intelligence (AI) startup ThirdAI, and Chen Luo (PhD '19), an applied scientist at Amazon Search, addressed this problem in Luo's work at Amazon Search. They have developed a new caching technique that goes a long way toward answering the question affirmatively, with profound implications that reach well beyond that single application.
What is "caching"?
An enormous search engine such as Amazon's uses AI to give customers better search results. Even with all this AI, searches can suffer long latency (delay) because of the overwhelming number of possible queries. To "cache" a query means to store (or simply memorize) the results of that query, so the server doesn't have to run expensive AI-based retrieval or re-fetch the data every time the same request is made, shortening the time between typing in a query and seeing the results.
Caching is a common practice: when an Amazon user types in a popular query for a particular brand of clothing, like "Under Armour shorts," the server caches (stores) the query and the corresponding results, so that the next time someone issues the same query the search results appear almost instantly, without any data retrieval.
However, if a user makes a typo or a small variation and types in "Under Armor shorts" or "Under Armour short," reverses the order of the words and types in "shorts Under Armour," or even writes "Under Armour brand shorts," the cached result for "Under Armour shorts" will not match (a cache miss). Clearly, caching all five of these queries separately is an inefficient use of limited space, and cache size can't be increased arbitrarily because of resource constraints.
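To make the exact-match behavior concrete, here is a minimal sketch of a query cache in Python. The class, the query strings, and the stored results are illustrative placeholders rather than Amazon's implementation; the point is simply that any variation of the query produces a different key and therefore a miss.

```python
# Minimal sketch of an exact-match query cache. The class, queries, and stored
# results are illustrative placeholders, not Amazon's implementation.

class ExactMatchCache:
    def __init__(self):
        self._store = {}                    # query string -> cached results

    def get(self, query):
        return self._store.get(query)       # None means a cache miss

    def put(self, query, results):
        self._store[query] = results

cache = ExactMatchCache()
cache.put("Under Armour shorts", ["result_1", "result_2"])

print(cache.get("Under Armour shorts"))     # hit: results returned instantly
print(cache.get("Under Armor shorts"))      # miss: the typo is a different key
print(cache.get("shorts Under Armour"))     # miss: reordered words are a different key
```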
Machine learning can clearly help here (for example, by recognizing that "Under Armour short" and "Under Armour shorts" are essentially the same query). However, running a machine learning model on every query is expensive and significantly slower than cache latency (under 2-5 ms). "The key goal here is to be able to resolve caches' mistakes without paying extra memory and to avoid prohibitive machine learning latency," says Shrivastava.
What is locality-sensitive hashing (LSH)?
The Amazon Science team recognized that their cache needs were increasing as users searched more and more across their site. "This motivated us to create a kind of cache that can still [work with] a large amount of data without increasing the size of the cache or the cache needs," says Luo.
Locality-sensitive hashing (LSH) uses a hash function to map a string of symbols to a hash "bucket," with similar strings being mapped to the same bucket. This may mean that "Under Armour shorts," "Under Armor shorts," and "shorts Under Armour" land in the same bucket, but the hashing isn't so discriminating that the bucket won't also collect other strings like "Under Armour sneakers," "Under Armour shirts," or even "underwear" and "under desk treadmill."
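The sketch below shows the flavor of the idea using a MinHash-style bucket over word tokens. This construction is a simplified stand-in for illustration, not the hash functions actually used at Amazon: queries that share most of their tokens tend to share their lowest-hashing token and so tend to land in the same bucket, but the mapping is probabilistic rather than guaranteed.

```python
import hashlib

# Illustrative MinHash-style bucketing over word tokens. This is a simplified
# stand-in for locality-sensitive hashing, not the hash functions used in ROSE.

def token_hash(token, seed):
    # Deterministic hash of a token under a given seed.
    return int(hashlib.md5(f"{seed}:{token}".encode()).hexdigest(), 16)

def minhash_bucket(query, seed):
    # The bucket id is the token with the smallest hash value. Queries that
    # share most of their tokens will often (not always) share that token.
    tokens = query.lower().split()
    return min(tokens, key=lambda t: token_hash(t, seed))

for q in ["Under Armour shorts", "Under Armor shorts", "shorts Under Armour",
          "Under Armour sneakers", "under desk treadmill"]:
    print(q, "->", minhash_bucket(q, seed=7))
```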
"There's a lot of innovation in this work to make it practical," explains Luo. "How do we compute the hashing faster? How do we avoid doing a lot of repetitions to get different hashes? And also, how do we retrieve the hash barcode without computing the similarity measures? And how do we speed up the indexing and the query time of the hashing?" LSH has been around for over two decades, but Shrivastava and his colleagues at Rice have been working to make the technology faster, resource-frugal, and more accurate, bringing it to a point where the queries could be answered smartly with cache-like latency.
How robust caching works
As a mathematical workaround, Shrivastava, Luo, and their colleagues developed a technique that hashes the same string multiple times, using slightly different locality-sensitive hash functions. There's one canonical query for each set of related queries (so, for example, "Under Armour shorts") that serves as an index to the query results.
Using 36 different hash functions in total, the technique tallies which canonical index shows up most frequently across the buckets and retrieves its results. Candidates that show up more often are given a heavier weight, and candidates that fall below a certain weight are discarded, so the queries that appear most often across buckets are treated as the most relevant matches.
In practice, a search term like "shorts under armor" maps to buckets that repeatedly point back to the string "Under Armour shorts," which is therefore identified as the correct cache entry to retrieve for that term.
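A simplified sketch of this voting scheme, assuming a MinHash-style bucket function like the one above and fewer hash tables than the full 36, might look like this; the names, thresholds, and hash construction are illustrative, not the production ROSE code.

```python
import hashlib
from collections import Counter

# Simplified sketch of the multi-hash voting idea. Names, thresholds, and the
# hash construction are illustrative; the real system uses 36 hash functions.

NUM_HASHES = 16
MIN_VOTES = 4            # candidates with fewer votes than this are discarded

def bucket(query, seed):
    # MinHash-style bucket over word tokens (same idea as the earlier sketch).
    tokens = query.lower().split()
    return min(tokens, key=lambda t: int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16))

# tables[i] maps a bucket id under hash function i to the canonical query stored there.
tables = [dict() for _ in range(NUM_HASHES)]

def index_canonical(canonical_query):
    # Insert the canonical query into one bucket per hash function.
    for i in range(NUM_HASHES):
        tables[i][bucket(canonical_query, i)] = canonical_query

def lookup(raw_query):
    # Hash the incoming query with every function, collect whichever canonical
    # queries occupy those buckets, and keep the one with the most votes.
    votes = Counter()
    for i in range(NUM_HASHES):
        hit = tables[i].get(bucket(raw_query, i))
        if hit is not None:
            votes[hit] += 1
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count >= MIN_VOTES else None

index_canonical("Under Armour shorts")
print(lookup("shorts under armor"))   # usually "Under Armour shorts"; LSH is probabilistic
```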
Using weighted Jaccard similarity, the technique also gives greater weight to matches in product category than to matches in brand name; in other words, shorts from other exercise brands can show up as related results for the same query. A person searching for Under Armour shorts is probably more open to seeing other brands of shorts than, say, to receiving product information about Under Armour underwear.
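Weighted Jaccard similarity compares two weighted token sets by dividing the sum of the per-token minimum weights by the sum of the per-token maximum weights. The small example below uses made-up token weights to show how giving category terms more weight than brand terms makes a query like "Nike shorts" score closer to "Under Armour shorts" than "Under Armour underwear" does.

```python
# Weighted Jaccard similarity between tokenized queries, with made-up per-token
# weights that favor product-category terms over brand terms (illustrative only).

def weighted_jaccard(weights_a, weights_b):
    tokens = set(weights_a) | set(weights_b)
    num = sum(min(weights_a.get(t, 0.0), weights_b.get(t, 0.0)) for t in tokens)
    den = sum(max(weights_a.get(t, 0.0), weights_b.get(t, 0.0)) for t in tokens)
    return num / den if den else 0.0

q_target = {"under": 0.5, "armour": 0.5, "shorts": 2.0}         # "Under Armour shorts"
q_other_brand = {"nike": 0.5, "shorts": 2.0}                    # "Nike shorts"
q_other_item = {"under": 0.5, "armour": 0.5, "underwear": 2.0}  # "Under Armour underwear"

print(weighted_jaccard(q_target, q_other_brand))   # ~0.57: category match dominates
print(weighted_jaccard(q_target, q_other_item))    # 0.20: brand matches, category differs
```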
Using this new methodology, the improvement in handling search queries was striking, up to 250 percent. And it happens at cache speed, meaning the more accurate results don't take noticeably longer to produce. "AI may be an appealing solution [in working with large amounts of data]. But if you look at the cost, there should be a better way to use AI. So these algorithms can complement AI and make them even better," says Shrivastava.
What are the implications of this work?
In Amazon's case, ROSE, short for RObuSt cachE, has already been applied in Amazon Search and may be applied in other areas of the business. But the technique is also relevant to search engines in general, and it has wide application for any technology that uses large amounts of data and significant caching, which includes personal laptops. Luo and his co-authors presented their ROSE findings at the Amazon Web Conference 2022, along with a second paper in which they showed how clusters of related queries are used to generate an index query. Their findings were also tweeted by Amazon Science and by Amazon CTO Werner Vogels. The Amazon Search management team, especially senior manager Bing Yin, was very supportive and helped push the innovation forward.
Luo's long-term goal is to help make this a general platform, even outside Amazon. There are still some scientific innovations needed to improve the system, fix related bugs, and give it broader application. "Improving something as fundamental in computer science as caches, which directly improves some of the systems used by billions of people around the globe, is a poster child case of where math and algorithms can make a real world difference," says Shrivastava. "What we're seeing is just the tip of the iceberg, because caches are everywhere."