
URL: https://thenewstack.io/lyfts-tips-for-avoiding-software-crashes/

Lyft's Tips for Avoiding (Software) Crashes - The New Stack



Lyft reduced crashes in its mobile apps by up to 50% in some categories by optimizing the data persistence layer. Here's how.
Sep 29th, 2022 9:12am by Jessica Wachtel

Understanding what caused an app to crash is quite an undertaking. Bugs can happen anywhere in the codebase and vary in complexity and actionability (the engineering effort required to make a first improvement). In some cases, finding the root cause of a crash requires deep knowledge of the underlying systems and frameworks.

If you don’t know where to start looking for a solution, the number of potential entry points can be paralyzing.

I’m guilty of opening an app and double-tapping and swiping up to close it if it doesn’t work fast enough or if it even looks like it’s crashing. Guilty AF.

Up until now, I had shied away from reading mobile performance articles. But a recent blog post by Wen Zhao, staff (Android) engineer at ride-sharing service Lyft, caught my eye because it shows how data persistence can lead to application crashes.

TL;DR: Monitor those reads/writes and keep them as low as possible, or at the very least within a reasonable range. Don’t use synchronous interfaces for disk operations. And know those frameworks, people.

Note: The strategies outlined below are platform agnostic, though this article uses examples from Android to show how they are applied.

Lowest-Hanging Fruit

Zhao wrote that it’s “important to start with the most obvious low-hanging fruit.”

The lowest-hanging fruit in app stability was collecting crash stats from Lyft’s in-house observability tools, Bugsnag and Google Play Console. Here’s what was found:

  • Native crashes were not included in Lyft’s internal crash rate tracking. Native crashes occur in the native (C++) layer of the Android operating system and are captured and reported differently. Lyft doesn’t pursue any additional crash reporting on them, as they are not actionable.
  • The top 10 crashes accounted for 53% of overall crashes. This was unexpected, given how many types of crashes were possible. The chart below details the type and percentage of each.
[Chart: type and share of the top crashes. Image: Lyft]

  • Top crashes were long-lasting and “not actionable.” They had held their positions for at least six months because fixing them would take an outsized amount of time. Some of them grew slowly over time, slipping under the radar of standard triage and on-call responsibilities.

The top crashes were then categorized into three buckets.

[Chart: top crashes grouped into three buckets. Image: Lyft]

The third-party SDK bucket was not actionable, since Lyft has no control over third-party SDKs. Lyft reported these crashes to Google Maps (their source), and both teams are working together to resolve them. Native crashes, as noted above, are also not actionable. That left Out of Memory (OOM) crashes as the lowest-hanging fruit. Instabug gives a good explanation of these.

Targeting OOM Crashes

The Investigation: Lyft engineers reviewed many OOM crash stack traces and found something they had in common: calls to an RxJava2 blocking API (e.g., blockingGet()) when reading values synchronously from disk.

Let’s look at Lyft’s internal storage solution. When reading data from disk, it always creates a new IO thread by subscribing on the IO scheduler, reads and caches the data in a PublishRelay, and then blocks on the result with RxJava2’s blockingGet().

This approach is problematic for OOM crashes for a few reasons. Per the RxJava docs, the IO scheduler can create an unbounded number of worker threads. It also doesn’t remove idle threads immediately, since it uses a cached thread pool; instead, the scheduler keeps idle threads alive for about 60 seconds before clearing them.
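To make the thread growth concrete, here is a minimal sketch in plain Java that stands in for the read path described above; it does not use RxJava2. Each read is submitted to an unbounded cached thread pool (playing the role of the IO scheduler), and the caller then blocks on the result (playing the role of blockingGet()). The class and method names are hypothetical, the "slow disk" is simulated with a latch, and all reads are submitted before any is awaited so that many are in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the read path, in plain Java rather than RxJava2:
// each read goes to an unbounded cached pool and the caller blocks on the result.
class BlockingReadDemo {

    // Starts `reads` overlapping disk reads and returns how many worker threads the pool created.
    static int overlappingReads(int reads) {
        AtomicInteger threadsCreated = new AtomicInteger();
        ThreadFactory countingFactory = r -> {
            Thread t = new Thread(r, "io-" + threadsCreated.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // No upper bound on threads; idle threads are kept alive for 60 seconds.
        ExecutorService io = Executors.newCachedThreadPool(countingFactory);
        CountDownLatch slowDisk = new CountDownLatch(1); // keeps every read "on disk" for now
        List<Future<String>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < reads; i++) {
                final int key = i;
                pending.add(io.submit(() -> {
                    slowDisk.await();            // simulated disk latency
                    return "value-" + key;
                }));
            }
            slowDisk.countDown();                // the "disk" finishes all reads
            for (Future<String> result : pending) {
                result.get();                    // blocking wait, analogous to blockingGet()
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            io.shutdown();
        }
        return threadsCreated.get();
    }

    public static void main(String[] args) {
        System.out.println(overlappingReads(200) + " threads created for 200 overlapping reads");
    }
}
```

Because no worker is ever idle while the reads overlap, the pool has to add one thread per read; with hundreds or thousands of overlapping reads, that means hundreds or thousands of threads.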

And threads aren’t reused either. If there are 1,000 reads a minute, there are 1,000 new threads, each occupying at least roughly 1 to 2 MB of memory. That’s a lot of threads, and it leads to OOM exceptions.
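A back-of-the-envelope check of those numbers, assuming all 1,000 threads are alive at the same time (plausible given the roughly 60-second keep-alive) and taking the 1 to 2 MB per thread at face value:

```latex
1{,}000\ \text{threads} \times (1 \text{ to } 2)\ \frac{\text{MB}}{\text{thread}}
  = 1{,}000 \text{ to } 2{,}000\ \text{MB} \approx 1 \text{ to } 2\ \text{GB}
```

That is on the order of gigabytes spent on thread overhead alone, far more than a mobile app can typically spare.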

The engineering team profiled the top disk read operations in Lyft’s apps and found that the majority of disk reads came from two places in the codebase, where reads happened more than 2,000 times per minute. The root cause was located.
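Finding the hot spots takes per-call-site counts. A minimal sketch of such a profiler, assuming the storage layer can tag each disk read with where it came from (ReadProfiler, recordRead, topSites and reset are hypothetical names, not Lyft's tooling):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Hypothetical profiler: counts disk reads per call site so that hot spots stand out.
class ReadProfiler {
    private final Map<String, LongAdder> readsBySite = new ConcurrentHashMap<>();

    // Called by the storage layer on every disk read, tagged with the calling feature or site.
    void recordRead(String callSite) {
        readsBySite.computeIfAbsent(callSite, site -> new LongAdder()).increment();
    }

    // The n call sites with the most reads, highest count first.
    List<Map.Entry<String, Long>> topSites(int n) {
        return readsBySite.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), e.getValue().sum()))
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }

    // Clearing the counts on a fixed interval turns them into per-interval rates.
    void reset() {
        readsBySite.clear();
    }
}
```

Calling reset() on a one-minute timer would turn the counts into reads-per-minute figures comparable to the numbers above.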

The Solution: The fix was straightforward, since new threads were only created when data was read from disk. The first time data was read after a cold start, it was cached in memory. All subsequent reads were then served from the cache, so no additional threads were created.
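The fix amounts to a read-through cache. A minimal sketch, assuming a hypothetical ReadThroughStore that wraps whatever function actually reads from disk:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: the first read of a key after a cold start goes to disk,
// and every later read of that key is served from memory without touching the IO path.
class ReadThroughStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final Function<String, String> readFromDisk; // stand-in for the real disk read
    final AtomicInteger diskReads = new AtomicInteger();

    ReadThroughStore(Function<String, String> readFromDisk) {
        this.readFromDisk = readFromDisk;
    }

    String get(String key) {
        // ConcurrentHashMap applies the loader atomically, at most once per absent key.
        // A null result is not cached, so a missing key would be read from disk again.
        return memory.computeIfAbsent(key, k -> {
            diskReads.incrementAndGet();
            return readFromDisk.apply(k);
        });
    }
}
```

A real version would also need to update the in-memory copy on every write so that readers never see stale data.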

The Results: OOM crashes were reduced as expected. Additionally, native crashes were reduced by 53%. Lyft engineers weren’t expecting such a large impact on native crashes, but apparently low application memory was the cause of many of them.

Targeting ANR Crashes

Application Not Responding (ANR) errors are crashes that occur when the UI thread is blocked for longer than five seconds and (to put it gracefully) the operating system prompts the user to close the app. These weren’t as low-hanging as the OOMs but were still actionable.

The Investigation: Bugsnag’s stack trace reports, which group ANRs with similar stack traces together, were essential for finding the root cause of the ANRs. Lyft sorted the reports in descending order and found that its use of SharedPreferences (also part of the persistence layer) was the source of most of the ANRs.

Google recommends calling SharedPreferences.Editor.apply() to write and edit data asynchronously. But under the hood, apply() adds disk write operations to a queue rather than executing them immediately, and that queue is flushed synchronously on the main thread during several lifecycle events (such as when an activity or service stops). Many operations in the queue = a blocked main thread = ANR.
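A simplified, hypothetical model of that mechanism shows why a cheap-looking apply() can still stall the main thread: the cost is deferred, then paid all at once by whoever calls the flush. This is not Android's implementation, which also processes writes on a background thread and is more nuanced; the model shows the worst case where nothing has been written yet.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified, hypothetical model of the mechanism described above (not Android's source).
// apply() only enqueues a disk write; a later lifecycle flush runs everything still queued
// on the thread that calls waitToFinish(), which on Android is the main thread.
class PendingWriteQueue {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Returns immediately, which is why apply() looks cheap to the caller.
    synchronized void apply(Runnable diskWrite) {
        pending.add(diskWrite);
    }

    // Blocks the caller until every queued write has run; returns how many it had to wait for.
    synchronized int waitToFinish() {
        int drained = 0;
        Runnable write;
        while ((write = pending.poll()) != null) {
            write.run();
            drained++;
        }
        return drained;
    }
}
```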

To see how this applied to Lyft’s codebase, engineers profiled disk write operations and found that write frequency in Lyft’s applications was as high as 1.5k times per minute. They also found instances where the same value was written to disk multiple times per second.
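Spotting repeated writes of the same value needs a little bookkeeping at the disk-write boundary. A minimal sketch, assuming every disk write passes through a hypothetical WriteAuditor hook:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical write auditor: counts disk writes per key and flags "redundant" writes,
// where the value is identical to the one last written for that key.
class WriteAuditor {
    private final Map<String, String> lastWritten = new HashMap<>();
    private final Map<String, Integer> writesPerKey = new HashMap<>();
    private int redundantWrites;

    // Call this on every disk write.
    synchronized void onDiskWrite(String key, String value) {
        writesPerKey.merge(key, 1, Integer::sum);
        if (lastWritten.containsKey(key) && Objects.equals(lastWritten.get(key), value)) {
            redundantWrites++; // same value written again: a candidate for removal
        }
        lastWritten.put(key, value);
    }

    synchronized int writes(String key) {
        return writesPerKey.getOrDefault(key, 0);
    }

    synchronized int redundantWrites() {
        return redundantWrites;
    }
}
```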

Eventually, the root cause came down to the fact that Lyft’s internal storage framework abstracted away the underlying storage mechanism: disk storage and memory storage used the same interface. As a result, developers were inadvertently treating disk and memory storage as one and the same.
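One way to keep the two from being confused is to give them different interfaces. A minimal sketch, assuming hypothetical MemoryStore and DiskStore interfaces (not Lyft's framework), where memory access is synchronous and disk access returns a CompletableFuture; FakeDiskStore uses a map in place of a real file:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Hypothetical split of the storage API: memory access is synchronous and cheap,
// while disk access is asynchronous, so callers can't mistake one for the other.
interface MemoryStore {
    Optional<String> get(String key);
    void put(String key, String value);
}

interface DiskStore {
    CompletableFuture<Optional<String>> read(String key);
    CompletableFuture<Void> write(String key, String value);
}

class InMemoryStore implements MemoryStore {
    private final Map<String, String> map = new ConcurrentHashMap<>();
    public Optional<String> get(String key) { return Optional.ofNullable(map.get(key)); }
    public void put(String key, String value) { map.put(key, value); }
}

// Fake disk store for illustration: a map that is only touched on one background IO thread.
class FakeDiskStore implements DiskStore {
    private final Map<String, String> file = new ConcurrentHashMap<>();
    private final Executor ioThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "disk-io");
        t.setDaemon(true);
        return t;
    });

    public CompletableFuture<Optional<String>> read(String key) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(file.get(key)), ioThread);
    }

    public CompletableFuture<Void> write(String key, String value) {
        return CompletableFuture.runAsync(() -> file.put(key, value), ioThread);
    }
}
```

Returning a future keeps disk work off the caller's thread and puts its cost in the type signature; routing it through a single IO thread also bounds the number of threads, unlike an unbounded pool.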

The Solution: Product teams worked to remove all unnecessary disk writes from their features, and logging was added to audit any additional disk writes. In features where the extra writes had been found, a feature-level memory cache was created, and that cache was synced with disk storage at a frequency suited to each use case. The disk storage interface was also separated from the memory storage interface.
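The feature-level cache with periodic syncing can be sketched as a small write-back cache. This is a minimal sketch under assumptions: WriteBackCache and its methods are hypothetical, and the disk write is passed in as a plain callback.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical feature-level write-back cache: writes land in memory first, and only the
// latest value of each changed key is written to disk when flush() runs.
class WriteBackCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>();
    private final BiConsumer<String, String> diskWriter; // stand-in for the real disk write

    WriteBackCache(BiConsumer<String, String> diskWriter) {
        this.diskWriter = diskWriter;
    }

    synchronized void put(String key, String value) {
        String previous = memory.put(key, value);
        if (!Objects.equals(previous, value)) {
            dirty.put(key, value); // re-writing an unchanged value never reaches the disk
        }
    }

    synchronized String get(String key) {
        return memory.get(key);
    }

    // Writes each dirty key once, outside the lock; returns the number of disk writes issued.
    // Meant to be called from one thread at a time (e.g. the scheduler below).
    int flush() {
        Map<String, String> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(dirty);
            dirty.clear();
        }
        batch.forEach(diskWriter);
        return batch.size();
    }

    // Syncs to disk on a fixed period chosen per use case; the caller must shut the scheduler down.
    ScheduledExecutorService scheduleFlush(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
        return scheduler;
    }
}
```

For example, a feature could call scheduleFlush(30, TimeUnit.SECONDS) and also call flush() when the app goes to the background; the 30-second period is purely illustrative. However many times a value changes between flushes, it costs at most one disk write per key.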

The Results: ANRs dropped by 21% after a few months of experimentation.

Next Steps 

It came as news to everyone that disk storage plays a much more critical role in application stability than previously thought. With OOMs and ANRs reduced, Lyft put in place a new long-term strategy centered on what it learned in both investigations.

Lyft is going to continue working on its mobile performance. The next blog post promises to focus on making more issues in the performance space actionable by increasing investment in observability and debugging.

Jessica Wachtel is a developer marketing writer at InfluxData where she creates content that helps make the world of time series data more understandable and accessible. Jessica has a background in software development and technical journalism.