Rice University alumnus Matthew Schurr (B.S. in CS ‘16) develops and implements tools for information technology (IT) professionals to extract and analyze data within their network infrastructures.
Although he didn’t know anything about ExtraHop when he met its co-founder – Rice Computer Science alumnus Jesse Rothstein – Schurr said he was searching for a job where the work he did would make an impact.
“In my internships, I didn’t really get a sense that I was producing value for the company. Sometimes, I even wondered if I was earning my salary. But at ExtraHop, I saw a lot of opportunity to learn and solve interesting problems that would have an impact on their core business,” said Schurr.
ExtraHop builds appliances that enable their clients to passively collect, analyze, and draw insights from the packets traversing their internal network infrastructure, and provides built-in visualization and machine learning tools for analyzing the data they collect.
Schurr has implemented a high-performance parser for the popular WebSocket protocol, which powers real-time features such as chat on many modern websites. He’s also launched the GeoIP Trigger API, which enables ExtraHop customers to collect and make decisions based on location data. For example, a customer might wish to record repeated SSH login attempts from IPs in specific countries and flag them for investigation.
Then, Schurr began working on a trace appliance project in which he takes a lot of pride.
“I had a pretty big impact on that,” he said. “Our Trace Appliance (ETA) supports capturing and indexing incoming packets at 25+ Gbps without falling behind. If you have an incident – like a data breach – forensics people want to be able to look back at the packets to find out where and how the breach began and what was exfiltrated. We provide that.”
Schurr and his team focused on two search index problem areas: memory usage and durability. He reduced the search index memory usage by up to 90% and improved durability by adding a recovery feature capable of restoring an ETA with over 300 terabytes of stored packets (to its pre-crash state) in less than 20 minutes with very low data loss.
Recently, Schurr expanded his skillset in a different direction by adding distinct count metrics to the ExtraHop Discover Appliance (EDA). A distinct count metric displays a numeric value which corresponds to the number of unique values placed into an underlying set between two points in time.
“This feature was heavily requested by customers. Previously, if a customer wanted to count the number of unique users who accessed a resource – for example, the number of unique IP addresses visiting an LDAP server – there was no way to do it efficiently."
Image of a per-country breakdown of unique IPs accessing a service in the ExtraHop UI.
Schurr said the value added to their platform by this feature was significant. But, his team had to step back and make sure it was feasible to provide that value.
"Our metrics are time-indexed by breaking time into discrete 30-second cycles. In each 30-second cycle, we would need to store a set from which the distinct count could be derived. When customers queried for intervals larger than 30 seconds, we would need to calculate the union of the sets for each 30-second cycle overlapping their query interval," he said.
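The bucketing-and-union scheme he describes can be sketched as follows. This is a hypothetical illustration, not ExtraHop code; the class and method names are invented for the example, and it uses plain hash sets (the naive approach whose cost the next paragraphs discuss):

```python
# Hypothetical sketch (not ExtraHop's implementation): distinct count
# metrics stored per 30-second cycle, with larger query intervals
# answered by unioning the overlapping cycles' sets.
from collections import defaultdict

CYCLE_SECONDS = 30

class DistinctCountMetric:
    def __init__(self):
        # one set of observed values per 30-second cycle
        self.cycles = defaultdict(set)

    def record(self, timestamp, value):
        self.cycles[int(timestamp) // CYCLE_SECONDS].add(value)

    def distinct_count(self, start, end):
        # union the sets of every cycle overlapping [start, end)
        first = int(start) // CYCLE_SECONDS
        last = (int(end) - 1) // CYCLE_SECONDS
        union = set()
        for cycle in range(first, last + 1):
            union |= self.cycles.get(cycle, set())
        return len(union)

metric = DistinctCountMetric()
metric.record(0, "10.0.0.1")
metric.record(10, "10.0.0.2")
metric.record(35, "10.0.0.1")   # same IP seen again in a later cycle
metric.record(65, "10.0.0.3")
print(metric.distinct_count(0, 90))  # 3 unique IPs across three cycles
```

With exact sets like these, both the per-cycle storage and the union grow with the number of unique values, which is exactly the cost problem described next.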
Image of ideas for creating a union to answer queries for usage longer than the 30-second time-indexed intervals.
“Our customers’ sets could easily contain tens of millions of values. For our platform, the cost of storing and merging sets of this magnitude using a traditional set implementation such as a hash set was prohibitively expensive. To make distinct count metrics feasible, we had to lower their cost by decreasing memory and computation requirements.”
“We ended up looking at an alternate probability-based set implementation called HyperLogLog (HLL). HLL sets are very desirable because they use configurable constant memory regardless of the size of the underlying set, and the union operation occurs in constant time. Unfortunately, there are some trade-offs. For example, it’s not possible to test membership or enumerate the values in an HLL set. HLL also does not provide an exact cardinality, but instead yields an estimate that is +/-1% of the true cardinality. I implemented a high-performance variant of HLL and integrated it into our product, where it serves as the backing set for distinct count metrics.”
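To make the trade-offs concrete, here is a minimal educational HyperLogLog in Python. This is not ExtraHop's variant (which the article does not detail); it is the textbook algorithm, showing the three properties quoted above: fixed memory (a register array whose size depends only on the precision parameter, not on the set), a union that is a register-wise max, and an approximate rather than exact count.

```python
# Minimal educational HyperLogLog sketch (not ExtraHop's implementation).
import hashlib
import math

class HLL:
    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                       # number of registers
        self.registers = bytearray(self.m)    # fixed size, independent of |set|

    def add(self, value):
        # 64-bit deterministic hash of the value
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)              # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        # register-wise max: cost depends only on m, not on set sizes
        for i in range(self.m):
            self.registers[i] = max(self.registers[i], other.registers[i])

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:     # small-range correction
            est = self.m * math.log(self.m / zeros)
        return int(est)

hll = HLL()
for i in range(100_000):
    hll.add(f"user-{i}")
print(hll.count())  # an estimate within a few percent of 100000
```

With p=14 the sketch uses 16,384 one-byte registers (16 KB) whether the set holds a thousand values or tens of millions, which is why merging many 30-second cycles stays cheap.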
Although Schurr is working deep inside a world that was unfamiliar to him when he interviewed for the job, he attributes much of his success to the CS program at Rice.
“Rice got me used to picking up new technologies quickly,” he said. “For my job, the most useful courses were the C classes (computer systems, operating systems, and networking). Eugene Ng’s networking class was one of my favorite classes at Rice and is very applicable to my job.
“What makes us (ExtraHop) unique is that we work in an arena where software performance really matters. We can’t always throw more hardware at a problem to improve performance – we have to be able to process packets at 100+ Gbps on a single appliance, so we have to be very careful. Every line of code we write is important.”
What could feel like intense pressure to keep system performance levels high doesn’t faze Schurr. “I think it is fun to see how much performance you can get out of a machine,” he said. “Plus our work environment is very collaborative and laid back, with flexible hours and time off.”
“I like to solve interesting problems, and there are a lot of high-impact problems to work on at ExtraHop. My team and I just recently wrapped up work on Network Activity Maps, a real-time in-browser visualization of network topology powered by WebGL. I’m looking forward to the next challenge.”
To learn more about Schurr’s recent projects, read his ExtraHop blog posts: Hacker, Interrupted and Espionage through Chrome Extensions.