Yext Moves to LightStep [x]PM, Improving Application Performance

Yext consolidates the digital knowledge about a business’s brand assets, including the people, places, and products that are often scattered across multiple systems and spreadsheets, into a single centralized record of truth known as Yext Knowledge Manager. Businesses can update customers, partners, and business systems when locations and delivery points move and holiday hours and in-store promotions change. This drives new efficiencies, wins high-intent customers, and cultivates rich interactions. Yext is revolutionizing digital knowledge management for thousands of businesses worldwide.

The Challenge

A Complex, Multi-Platform System

Yext is radically changing the digital brand presence for thousands of businesses globally. Yet, as often happens with rapid company and product growth, Yext has found that increasingly sophisticated software tends to slow down. Growing data volumes and new features requiring input from an increasing number of collaborating services added significantly to the effort required to analyze these slowdowns. Without a quick way to diagnose performance issues in production, performance improvement work often took a back seat.

“Our enterprise customers want SaaS applications with low latency and fast response times, not to mention high availability and reliability,” explained Rob Figueiredo, Vice President of Engineering, Yext. “Coupled with the fact that we’re rapidly growing in terms of customers and transaction volumes, the complexity of our distributed services has expanded dramatically. As we integrated more and more specialized services into our systems, performance optimization became a problem. We needed full visibility across our distributed services to understand the root cause and to diagnose a resolution. Specifically, we wanted a solution that would give us real-time alerts when SLOs (Service Level Objectives) were violated and provide us with the ability to perform root cause analysis.”

In an attempt to address the problem, Figueiredo tried a couple of leading application performance management (APM) solutions that promised the ability to diagnose performance issues quickly. However, they were unable to address Yext’s multi-language microservices environment. “Automatic instrumentation provides you with a transaction timeline of events,” he said. “But it cannot follow a transaction around the system and give you root cause analysis.”

Microservices allow a distributed system to evolve as a set of independent parts, but a complete, global view of the system is still a requirement to effectively manage a production system. Figueiredo indicated the lack of that full view was one area where the leading APMs fell short. He wanted a solution that was end-to-end and always-on in production. “As the other solutions just record a sample of transactions and traces, we were only getting a partial picture,” he said. “In addition, finding these partial records in the system so we could act on them was extremely difficult.”

The Solution

Using Root Cause Analysis and Performance Omniscience

Earlier this year, Figueiredo turned to LightStep. “In comparison to the other solutions we tried, LightStep [x]PM was easy to deploy and manage,” he said. “We were up and running in several days.”

With [x]PM, Yext has visibility across its entire distributed services infrastructure over time. This makes root cause analysis and remediation much easier and more effective for Yext’s engineering team. “LightStep takes the pain out of performance firefighting and allows us to sleep better at night,” Figueiredo said. “We set up SLO alerts that are connected to our workflow solutions such as Slack and PagerDuty. [x]PM tells us about performance problems within seconds and then sends us straight to the root cause in our production system.”

Unlike the other solutions Yext tried, [x]PM is designed to run always-on in production with negligible overhead. It employs distributed tracing that considers all application performance data and intelligently samples only the most useful information. As a result, customers never miss out on critical information. “We are able to save searches of tags with high cardinality that are pertinent to our business requirements, and we never have to worry about missing a single transaction,” Figueiredo said. “This is of significant value to us. We can also go back in time and get data that is needed to answer or illuminate a particular question.”

Following its initial implementation of [x]PM, Yext elected to expand its use of the monitoring capabilities to include tracking of application performance for specific, top-tier clients and provide best-in-class service. “With [x]PM, we have account-specific performance omniscience. It delivers stats and detailed traces, all broken down per customer, in real time,” said Figueiredo.

The Results

Efficiency and Accountability

Yext is realizing substantial returns from its deployment of [x]PM, including improved efficiencies for the engineering team. Prior to the deployment of [x]PM, weekly production engineering meetings were time-consuming and inefficient.

“We spent a lot of time on detective work, trying to perform root cause analysis on application performance from the prior week,” Figueiredo recalled. “Not only did these problems fester, but we expended a lot of valuable time trying to ascertain the root cause of problems. Our production engineering meetings are now embarrassingly easy. [x]PM is so simple; it tells us about performance problems within seconds, and we’re able to go straight to the root cause in the production system.” Overall, Figueiredo believes LightStep is saving Yext nearly one week a month of staff time in problem diagnostics.

“One of the top priorities at Yext is providing an excellent customer experience, and we measure it via SLOs or Service Level Objectives,” he added. “The percent of our page views across the application that do not meet our SLO thresholds has been reduced by 70 percent. We take great pride in meeting our customer SLOs, and [x]PM is helping us to get closer and closer to 100 percent.”

Figueiredo described a specific customer issue that [x]PM helped resolve: “One of our top customers uses an advanced workflow feature of Knowledge Manager. But that service had scaled poorly with the volume of data and made the experience unacceptably slow. [x]PM made it easy to see the exact SQL statement being run that was causing elongated load times and helped us to pinpoint the root cause in seconds. We reduced the number of page loads outside our target by 90%. This was a huge success.”

Yext also uses [x]PM to maintain a top 10 list of the slowest pages, so it can track and drive improvements in their performance over time. “Using Graphite, we generate a list of the worst offenders and then use [x]PM to access all information about those pages without any sampling. This top 10 list alerts us to issues that we may not have known about,” Figueiredo said. “Getting the analytical clarity over time makes application remediation much more proactive. Without [x]PM, this work is painstaking and error prone, and it likely would have languished in the engineering team backlog for much longer too.”
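As a rough illustration of the idea behind such a list, the sketch below ranks pages by the share of their loads that miss an SLO threshold. It is not Yext's Graphite query or any LightStep API. The page names, latencies, and the 500 ms threshold are hypothetical values chosen only for the example.

```python
from collections import defaultdict


def percent_outside_slo(latencies_ms, threshold_ms):
    """Return the percentage of page loads slower than the SLO threshold."""
    if not latencies_ms:
        return 0.0
    slow = sum(1 for ms in latencies_ms if ms > threshold_ms)
    return 100.0 * slow / len(latencies_ms)


def slowest_pages(samples, threshold_ms, top_n=10):
    """Rank pages by the share of their loads that miss the SLO threshold.

    samples: iterable of (page, latency_ms) pairs.
    Returns up to top_n (page, percent_outside_slo) tuples, worst first.
    """
    by_page = defaultdict(list)
    for page, ms in samples:
        by_page[page].append(ms)
    ranked = sorted(
        ((page, percent_outside_slo(values, threshold_ms))
         for page, values in by_page.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical page-load samples: (page, latency in milliseconds).
SAMPLES = [
    ("/locations", 120), ("/locations", 950),
    ("/workflow", 1800), ("/workflow", 2100),
    ("/reports", 300), ("/reports", 310),
]

if __name__ == "__main__":
    for page, pct in slowest_pages(SAMPLES, threshold_ms=500):
        print(f"{page}: {pct:.0f}% of loads outside SLO")
```

Ranking by the percentage of loads outside the threshold, rather than by average latency, keeps the list aligned with the SLO metric quoted above. A production version would read these samples from a metrics store instead of an in-memory list.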

Figueiredo sees LightStep as a critical business enabler for Yext. “LightStep helps ensure that we deliver on the promises we make to our customers,” he said. “Our customers rely on us to provide them with time-sensitive business information. [x]PM helps ensure that we deliver on this promise by identifying latency and error issues and then enacting remediation quickly and efficiently. It’s been incredible for us.”

Challenges

  • Capturing always-on transaction and trace records
  • Addressing deficiencies of legacy APM solutions
  • Reducing application latency for key accounts
  • Improving productivity of the engineering team

Business Results

  • Monitors top 10 customers with custom SLOs
  • Lowered the percentage of page loads outside of SLOs by 70%
  • Brought almost 100% of page loads within the threshold target
  • Saved 12 weeks of staff work annually in manual root cause remediation

Organization Details

  • Headquarters: New York City
  • Industry Segment: Digital Brand Marketing
  • Employees: 700+
  • Publicly Traded: NYSE: YEXT
  • Market Capitalization: $1B+