Simplebet is a sports technology and betting company focused on unlocking a new type of sports betting called Micro-Markets. Rather than betting on the outcome of a game (win or loss), Micro-Markets turn every moment of every sporting event into a betting opportunity. You’re able to bet on every plate appearance in a baseball game, every drive in a football game, every shot in a basketball game.
Simplebet has a B2B data product that allows others, whether they are sports betting companies or media companies, to connect to a live stream of odds and surface Micro-Markets themselves.
In order to have complete system visibility and reduce Mean Time to Resolution (MTTR) for all aspects of their system, a wide range of Simplebet teams from machine learning to pricing to front-end use Lightstep on a daily basis. Each team proactively monitors their system with real-time system diagrams to see precisely where an error is, along with using Lightstep’s Change Intelligence, which automatically surfaces metrics, traces, and logs correlated with latency and errors.
Simplebet recently launched its first product — Micro-Market sports betting — in September 2020. This created immediate and exponential growth in traffic. “We went from having no public product to building a product that was capable of scaling to handle millions of bets,” said Dave Lucia, VP of Engineering. “Even before we launched the products, we put a lot of time into load testing the system and using Lightstep to find out where those bottlenecks were and tackle them one by one.”
Although the growth in traffic was expected, Simplebet’s traffic patterns are highly variable because of the nature of sports betting. “When the markets open, we receive a major spike in bets that totals hundreds, if not thousands, of bets per second,” said Bryan Naegele, Head of Games.
“The way that people interact with the system can be different day-to-day, hour-by-hour. If there are 10 games going, that produces a known quantity of traffic; whereas, the consumer side can take on highly fluctuating traffic,” said Bryan.
Lightstep’s Service Health view highlights the greatest changes occurring in a system and compares operation performance. Lightstep automatically gathers latency, throughput, and error rate for each application, as well as important infrastructure metrics. From there, Lightstep will correlate changes in applications to changes in infrastructure to create a full picture of system health. With this, Simplebet is able to handle the unpredictable traffic on a daily basis.
“That traffic pattern is what we have to be able to handle. If you can't measure it and understand what's happening within the system, it's very difficult. You have to take the guesswork out as much as possible. Knowing whether or not, ‘Oh, we have a load balancer issue, or it's a database bottleneck, or there's someplace within the app code that can use some tuning.’ allows us to prioritize where to focus effort,” said Bryan.
Simplebet’s new product created an opportunity to invest in an observability platform. Observability helps developers understand what causes changes in their application and infrastructure, so that even developers who are new to the job can understand and revert regressions.
“We noticed that we could really improve our observability in the whole system. We didn't have a way to end-to-end track what was going on between teams and between services,” said Joshua Massover, Data Engineer.
Simplebet chose Lightstep not only for the full-context observability it provides but also because Lightstep is one of the major OpenTelemetry contributors. OpenTelemetry is typically the first step in an observability strategy: it provides a standardized, vendor-agnostic data format that simplifies instrumentation. “I'm part of the observability working group for Erlang and Elixir and in the SIG for OpenTelemetry,” said Bryan. “When it came time to make a decision as to how the company should approach observability, I obviously had a little bit of a bias as to what was going to positively impact the company.”
“We very quickly got all of our critical services integrated with OpenTelemetry and Lightstep,” said Dave. “OpenTelemetry is becoming more and more aligned with our culture.”
“If you have observability, then you have the key part of the scientific method, which is the ability to measure. If you can't measure, then no matter what you do testing wise, it doesn't matter. You’re still just guessing,” said Bryan. “If you can't observe and know what's going on in the system, it's very easy to waste a lot of engineering resources, making educated guesses, or relying on people's assumptions as to like, how to make something faster, or how to make something more reliable.”
“Having Bryan on the team helped us get the ball rolling and build up our observability story,” said Dave. “We've been working upstream with the Elixir open source libraries and frameworks that we use. And, we've been pushing commits to those repos, getting tighter integration, and giving back to the community that benefits us in the long run. We're pretty committed to open source at Simplebet, and we're not afraid to jump in and contribute.”
When a performance issue surfaced in Lightstep, Simplebet was quickly able to see which team was affected. Lightstep streamlines incident resolution by automatically highlighting the critical path, so teams can focus on the current problem instead of searching through multiple dashboards to locate it.
“This specific bottleneck began as something that was hard to track down in Kubernetes infrastructure,” said Josh. “Dave [Lucia] was able to set up an alert that tracked our long-running requests. We discovered excess time spent from when the request was sent over the wire to when another service was picking it up. We knew that there was an infrastructure layer problem or something to be improved.”
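The kind of check Josh describes, measuring the time between a request being sent and the receiving service picking it up, can be sketched roughly as follows. This is a minimal illustration, not Simplebet's alert or a Lightstep API: the `RequestTiming` record, its field names, and the 20 ms threshold are all hypothetical, and it assumes both timestamps come from reasonably synchronized clocks.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical record of one cross-service call (not a Lightstep type)."""
    request_id: str
    client_sent_ms: float      # when the caller put the request on the wire
    server_received_ms: float  # when the receiving service started handling it


def transit_gap_ms(timing: RequestTiming) -> float:
    """Time spent between services, i.e. outside either service's own code."""
    return timing.server_received_ms - timing.client_sent_ms


def slow_handoffs(timings, threshold_ms):
    """Return the ids of requests whose handoff gap exceeds the threshold."""
    return [t.request_id for t in timings if transit_gap_ms(t) > threshold_ms]


# Illustrative data only: three calls, one with an unusually long handoff.
timings = [
    RequestTiming("req-1", 1000.0, 1002.0),
    RequestTiming("req-2", 1010.0, 1045.0),
    RequestTiming("req-3", 1020.0, 1024.0),
]
flagged = slow_handoffs(timings, threshold_ms=20.0)
print(flagged)
```

A large gap here points at the layer between services (networking, load balancing, queuing) rather than at application code, which matches the conclusion the team reached.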
Through their work with OpenTelemetry and Lightstep, the team was able to identify the exact end-to-end request that came in (first leaving the monolith, then trying to make a call to the auth system). Within minutes, they were able to pinpoint the issue and quantify the impact. The sporadic behavior of the error compounded the challenges in finding and diagnosing the problem. “In the 99.8th percentile, we were only seeing 20 milliseconds of latency, but in the 99.9th percentile it doubled to 40 milliseconds,” said Ariel. “If it was a consistent problem, it would be easier for us to track it down but it’s incredibly hard to find. 20 to 40 milliseconds of latency across millions of requests is a problem.”
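To see why a problem confined to the top fraction of a percent is hard to spot yet still significant, here is a small sketch of a tail-latency calculation. The data is synthetic and chosen only to mirror the 20 ms and 40 ms figures quoted above. The nearest-rank method is one common percentile definition, and not necessarily the one Lightstep uses.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


# Synthetic latencies for 10,000 requests (illustrative, not Simplebet data):
# most requests are fast, and a small tail is slow.
latencies_ms = sorted([5.0] * 9970 + [20.0] * 15 + [40.0] * 15)

p50 = percentile(latencies_ms, 50)
p998 = percentile(latencies_ms, 99.8)
p999 = percentile(latencies_ms, 99.9)

# Scale the slowest group up to a per-million figure using integer math.
slowest = sum(1 for v in latencies_ms if v >= 40.0)
slowest_per_million = slowest * 1_000_000 // len(latencies_ms)

print(p50, p998, p999, slowest_per_million)
```

In this synthetic set the median looks healthy, and the regression only shows up between the 99.8th and 99.9th percentiles. Even so, a 0.15% tail still amounts to 1,500 slow requests per million, which is why an average or median dashboard alone would miss it.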
"Lightstep provides me insights that were previously rooted in my limited understanding of a complex system. What I love about Lightstep is that it tells you what is actually happening in your system as opposed to what you think is happening. I cannot imagine doing my job without it."
Learn how you can use Lightstep's Change Intelligence to find the root cause when you notice a deviation in your metrics.
- Headquarters: San Francisco, CA
- Industry Segment: Computer Software
- Employees: 2,000+