React Native

Debugging a Non Reproducible Crash

October 10th 2018, our team releases a new version of our React Native app. We're happy and proud to deliver new features to our users.

But oh the horror! A few hours after the release, we're suddenly receiving an increased number of Android crashes.

 

10 000 crashes on Android

 

Our crash reporting tool, Sentry is on fire!!

All the new errors are of the type ++code>JSApplicationIllegalArgumentException Error while updating property 'left' in shadow node of type: RCTView"++/code>.

In React Native, this can usually happen if you set a property with a wrong type. But how come we didn't catch this error when testing the app? Our new releases are well tested by every member of the team on a panel of multiple devices.

Also the errors seem quite random, they seem to happen for any combination of ++code>property++/code> and type of shadow node. These are the first 3 that were occurring for instance:

  • ++code>Error while updating property 'paddingTop' in shadow node of type: RCTView++/code>
  • ++code>Error while updating property 'height' in shadow node of type: RCTImageView++/code>
  • ++code>Error while updating property 'fill' of a view managed by: RNSVGPath++/code>

 

It also seems to happen on any device and any Android version, based on the Sentry report

Any android version

A majority of Android 8.0.0 crashes but this is consistent with our user base

 

Let's reproduce!

So the first step when you want to fix a bug is reproducing, right? Fortunately, thanks to our Sentry logs, we know what the users are doing before triggering the crash.

Soooo let's see here...

The Sentry breadcrumbs are not so helpful

Welp, for the vast majority of crashes, users are just opening the app and kaboum, the crash happens.

Okay, let's try to replicate then. With the store app installed on 6 different Android devices, let's try to reproduce by opening and quitting the app a few times. No luck crashing the app! All the more impossible to reproduce locally in dev mode.

All right, this seems pointless. The crashes seem quite random anyway. The crash rate is raising at about 10%. Seems like you basically have 1 chance out of 10 to have the app crash on you when starting it.

Analyzing the stacktrace

To be able to reproduce this crash, let's try to understand it, where it comes from, how it thinks...

detective work

 

Well the thing is, as mentioned before, we have a few different errors. And they all have similar yet not entirely similar stacktraces.

Ok then, let's take the first one and analyse:

++pre>++code>java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1
   at android.support.v4.util.Pools$SimplePool.release(Pools.java:116)
   at com.facebook.react.bridge.DynamicFromMap.recycle(DynamicFromMap.java:40)
   at com.facebook.react.uimanager.LayoutShadowNode.setHeight(LayoutShadowNode.java:168)
   at java.lang.reflect.Method.invoke(Method.java)
   ...

java.lang.reflect.InvocationTargetException: null
   at java.lang.reflect.Method.invoke(Method.java)
   ...

com.facebook.react.bridge.JSApplicationIllegalArgumentException: Error while updating property 'height' in shadow node of type: RNSVGSvgView
   at com.facebook.react.uimanager.ViewManagersPropertyCache$PropSetter.updateShadowNodeProp(ViewManagersPropertyCache.java:113)
   ...
++/code>++/pre>

Ok this is where it's happening then, in ++code>android/support/v4/util/Pools.java++/code>.

Hmm, we're pretty deep into the Android support library. At this point, I'm afraid it would take too much time investigating. I want to find the quickest way to fix the issue for my users, then I'll take more time to investigate the root cause behind the bug.

Let's find another way

Another way to find the root cause of the bug is to check the new changes we made in this release. Especially changes that would affect native Android code. 2 hypotheses emerge:

  • we upgraded Native Navigation, a navigation solution that used native fragments on Android for each screen
  • we upgraded react-native-svg. There were a few exceptions related to SVG components but some had no relation with it, so maybe not a good bet

 

At this point since we cannot reproduce the bug, our best bet is:

 

  1. downgrading one of the 2 libs
  2. roll out to 10% users, since it's trivial to do so with the Play Store
  3. Check back in a few users if the crash happens in the new release. That way we validate or invalidate our hypothesis.
Play store rolling out


But how to choose which library to downgrade?

One solution to decide is playing "head or tail", but do we really want to resolve to that?

 

Getting to the bottom of this

Ok, let's dig deeper in the stacktrace we had previously picked, to see if we can determine which library to select.

 

++pre>++code>++code class="language-java">
/**
* Simple (non-synchronized) pool of objects.
*
* @param  The pooled type.
*/
public static class SimplePool implements Pool {
   private final Object[] mPool;

   private int mPoolSize;

   ...

   @Override
   public boolean release(T instance) {
       if (isInPool(instance)) {
           throw new IllegalStateException("Already in the pool!");
       }
       if (mPoolSize < mPool.length) {
           mPool[mPoolSize] = instance;
           mPoolSize++;
           return true;
       }
       return false;
   }
   ++/code>++/pre>

This above was where the crash was occurring. The error was ++code>java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1++/code> meaning ++code>mPool++/code> is an array of size 10, but ++code>mPoolSize=-1++/code>.

All right, how can get there with ++code>mPoolSize=-1++/code>? In addition to the ++code>recycle++/code> method above, the only place where ++code>mPoolSize++/code> is modified is in the ++code>acquire++/code> method of the ++code>SimplePool++/code> class:

++pre>++code>++code class="language-java">
public T acquire() {
   if (mPoolSize > 0) {
       final int lastPooledIndex = mPoolSize - 1;
       T instance = (T) mPool[lastPooledIndex];
       mPool[lastPooledIndex] = null;
       mPoolSize--;
       return instance;
   }
   return null;
}
++/code>++/pre>

So the only way ++code>mPoolSize++/code> can be ++code>-1++/code> is if hitting the line ++code>mPoolSize--++/code> while ++code>mPoolSize=0++/code>. But how is it possible with the condition above ++code>mPoolSize > 0++/code>?

Let's put a breakpoint in Android Studio and check what's going on when we start the app. I mean, there was an ++code>if++/code> condition, this piece of code could not malfunction!

And finally, revelation!

 

revelation

 

See, ++code>DynamicFromMap++/code> has a static reference to a ++code>SimplePool++/code>

++pre>++code class="language-java">
private static final Pools.SimplePool<DynamicFromMap> sPool = new Pools.SimplePool<>(10);
++/code>++/pre>

After a few dozen hits of the play button, with carefully placed breakpoints, we can see that ++code>SimplePool.acquire++/code> and ++code>SimplePool.release++/code> are called on the mqt_native_modules threads by React Native to manage style properties of React component (below the ++code>width++/code> of a component)

dynamic from map from RN thread

But they are also called on the main thread!

from svg thread

Above we can see that it is used to update a ++code>fill++/code> prop on the main thread, typically for a ++code>react-native-svg++/code> component! Indeed, it turns out ++code>react-native-svg++/code> was using ++code>DynamicFromMap++/code> only since version 7 of the lib to improve the performance of native svg animations.

Aaaand the function can actually be called from 2 threads but ++code>DynamicFromMap++/code> is not using ++code>SimplePool++/code> in a thread safe manner. "Thread safe", you say? For those of you who don't know the term, I'll explain in the next part. If already know what this means, feel free to skip until "In our case, here's what could happen"

 

Thread safe, a little theory

So JavaScript is single threaded, so JavaScript developers usually have no need to deal with thread safety.

Java, on the other hand, supports the concept of concurrent, or multi-threaded, programs. Several threads can run within a single program and can potentially access a common data structure, which can lead to unexpected results.

Let's take a simple example, in the image below, both Thread A and Thread B both:

  • read the integer into their memory
  • increment its value
  • return it
explaining threads

Thread B could potentially access the data value before Thread A was done updating it. We expected two separate increments, to result in a final value of ++code>19++/code>. What we potentially get instead, was ++code>18++/code>. Such a scenario, where the final state of data depends on the relative order of thread operations, is called a race condition. The problem with race conditions is that they don't necessarily happen all the time. It's possible in the case above that Thread B had more work to do before incrementing the value, giving enough time to Thread A to update the value. That explains the randomness and the impossibility to reproduce our crash.

A data structure is said to be thread safe, if operations can be done concurrently by many threads, without the risk of race conditions.

When one thread is performing read on a particular data element, no other thread should be allowed to modify or delete this element (this is called atomicity). In our earlier example, if the update cycles were atomic, the race condition would have been avoided. Thread B would have waited for Thread A to complete its increment, and then it would have gone ahead.

In our case, here's what could happen:

 

explaining threads

Since ++code>DynamicFromMap++/code> is holding a static reference to the ++code>SimplePool++/code>, multiple ++code>DynamicFromMap++/code> called from different threads can concurrently call the ++code>acquire++/code> method in ++code>SimplePool++/code>.

In the image above, Thread A is calling the method, evaluating the condition to be true, but hasn't yet decreased the value of ++code>mPoolSize++/code> (which is shared with Thread B) while Thread B is concurrently calling the method and already evaluating the condition which is then true also. Then each separate call would subsequently decrease the value of ++code>mPoolSize++/code> and that's how you get an impossible value.

Fixing it

Looking into how to fix it, we found a PR on react-native that was not merged making it thread safe.

Fixing the crash

We then deployed a patched version of react native rolling it out to our users. The crash was finally fixed, hooray!

Sooo many crashes

 

Finally, thanks to the help of Janic Duplessis (React Native core contributor), and Mikael Sand (++code>react-native-svg++/code> maintainer), the fix will be included in the next ++code>0.57++/code> minor for React Native

 

This bug took us some effort to fix but was a great opportunity to dig deeper into ++code>react-native++/code> and ++code>react-native-svg++/code>. A good debugger and a few well placed breakpoints go a long way. Hope you learned something useful as well!

 

[EDIT 02/04/2019] For French readers: Check out my talk on the subject @ParisJS!  ?

 

 

Développeur mobile ?

Rejoins nos équipes