What led to recovery mode?

Hi everyone,

CommCare ODK quite often crashes while our users are in the field. When we
check the phones days later, we see that CommCare has entered recovery mode.
Most of the time we cannot upload the forms, for reasons unknown (even with
a working internet connection), and we end up clearing CommCare's local data
from the Android app management interface and reinstalling our own deployed
app.

I'd like to track down the cause of these crashes. Are there any logs I
could get from the Android file system? What protocol should I follow to
sort out this issue?

Many thanks,

--
Charles Flèche
mHealth Advisor
Télécoms Sans Frontières http://www.tsfi.org
Première Urgence - Aide Médicale Internationale http://www.pu-ami.org

Charles,

Sorry to hear this; it is a major problem! We definitely don't expect
phones to enter recovery mode.

Basically this happens when the app install that's currently on the phone
enters an inconsistent state that we can't correct. There are three possible
causes:

  1. The app's storage files have been manually changed/modified

Some of CommCare's non-userdata files are stored on the phone in an area
where external apps can delete them. This used to be a rampant problem on
Android phones which used the SD card to store certain files (this was
common before Android 2.2).

  2. The phone's storage layer has corrupted the files stored to the device

This is a problem that we've seen on certain devices, and we've been
investigating how prevalent it is to establish whether it's real. It's hard
for us to differentiate between certain types of hardware failure because of
the way we encrypt data: since we write user files in encrypted form, minor
block-level corruption that would be easy to spot in unencrypted data simply
turns one pseudo-random blob into another equally pseudo-random blob.
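
To illustrate the point about encrypted files: a corrupted ciphertext looks just as random as an intact one, so corruption can only be detected if some integrity information is stored alongside it. The Python sketch below is a minimal illustration under that assumption. It is not how CommCare stores its data; `seal` and `is_intact` are hypothetical helper names, and the random bytes merely stand in for an encrypted record.

```python
import hashlib
import os


def seal(blob: bytes) -> bytes:
    """Prefix the blob with its 32-byte SHA-256 digest."""
    return hashlib.sha256(blob).digest() + blob


def is_intact(sealed: bytes) -> bool:
    """Return True if the stored digest still matches the payload."""
    digest, blob = sealed[:32], sealed[32:]
    return hashlib.sha256(blob).digest() == digest


# Stand-in for an encrypted record (assumption): to the storage layer it is
# just random-looking bytes, so a bad block cannot be spotted by eye.
ciphertext = os.urandom(4096)
sealed = seal(ciphertext)

# Simulate a single flipped bit somewhere inside the payload.
corrupted = bytearray(sealed)
corrupted[32 + 1000] ^= 0x01
damaged = bytes(corrupted)

print(is_intact(sealed), is_intact(damaged))
```

A plain digest like this only catches accidental damage such as a failing storage block; it says nothing about which layer caused the damage.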

The kinds of block storage used by mobile devices have physical limitations
on the number of reads and writes they can perform. On good Flash storage
these limits are unlikely to be hit, but on certain counterfeit storage
chips it is a common problem for data reads and writes to fail silently.

We don't believe this to be a common problem outside of counterfeit SD
cards (and most files on most phones are stored on the phone's storage, not
on an external SD card), but it has come up on rare occasions in the past.

  3. There is a bug in CommCare's code that is resulting in corrupted files

It's possible that CommCare is writing an invalid app structure to the
phone's storage due to an unforeseen bug in the code.

The only time such a bug could trigger Recovery Mode would be immediately
after some sort of remote update that left the phone without a fully
installed and consistent version of your app. Even then, we would not expect
it to cause any problems in submitting unsent data to the server.

Relating to your Problem Specifically

Thanks for letting us know you've been experiencing this problem, and
please submit a bug report every time you see something occur! Some helpful
things for us to know to differentiate between these cases:

  1. What version of CommCare you are using: this tells us whether you
    might be triggering an older bug.
  2. What devices are in use: this helps us determine whether the issue
    might be related to "external storage" on older Android phones, and how
    likely hardware problems are.
  3. Whether there was a specific circumstance that seemed to trigger the
    issues. Are you frequently using remote updates for the app? Do you know
    whether the issues appeared after some critical event?
  4. How widespread is the issue? Do certain users seem to have the problem
    over and over, or is it randomly distributed between users?
  5. Can you replicate the problem? If so we can help work through it very
    quickly.

I'll follow up on the support channel about how we can help work together
on this moving forward.

Thanks,
-Clayton

On Thu, Dec 18, 2014 at 7:37 AM, Charles Flèche wrote:
> [...]

Hi guys,

On Friday 19 December 2014 16:46:23 Clayton Sims wrote:
> 1) The app's storage files have been manually changed/modified
> Some of CommCare's files non-userdata files are stored on the phone in an
> area where external apps can delete them. This used to be a rampant problem
> on Android phones which used the SD card to store certain files (this was
> common before android 2.2)

This definitely seems to be the issue. Unfortunately it is widespread among
our users: 12 out of 22 mobile phones had a broken CommCare last time we
checked.

REPRO

CommCare ODK 2.17.0
Huawei G730-C00 (Android 4.1.2)
CommCare is installed on the internal memory, not on the SD Card

Repro steps :

  1. Set the system default storage location to "Internal storage"

Note that going the other way, from "SD Card" to "Internal storage", triggers
the same issue.

  2. Log in to a clean install of CommCare ODK
  3. Disconnect from any network (so the forms stay on the phone)
  4. Fill in a form
  5. Log out of CommCare
  6. Set the system default storage location to "SD Card"
  7. Log in to CommCare

Notice that the previously entered form appears to be uploaded (though not
really, as the device is not connected to any network) and is discarded (there
is no form count next to the Sync with server button and the form doesn't
appear in any of the Saved forms lists)

  8. Start filling in a form

CommCare complains about a missing file.
See attached screenshot error1.png

  9. Log out of CommCare
  10. Force CommCare to stop (simulates a phone reboot)
  11. Log in to CommCare

The "Storage is Corrupt :/" message is displayed. We can either shut down or
enter recovery mode.
See attached screenshot error2.png

While in recovery mode, running "Send Data" displays the message "There were
errors submitting the forms". One form is discarded each time this button is
pushed. In the attached screenshot error3.png, I had 3 unsent forms before
entering recovery mode. I think CommCare tried to send my forms before entering
recovery mode, because only 2 forms were shown in this mode. After I pushed the
"Send Data" button, the error message showed up and the page stated that only 1
unsent form remained on the device.

ANDROID SEEMS TO BE THE CULPRIT

I just tried with another mobile phone :
CommCare ODK 2.18.0
WIKO Rainbow Version 8 (Android 4.4.2)

and couldn't break CommCare.

With an up-to-date version of CommCare on our production phones :
CommCare ODK 2.18.0
Huawei G730-C00 (Android 4.1.2)

the bug is again triggered.

So the issue seems to come not from CommCare ODK itself, but from the system.

WHAT'S NEXT ?

Unfortunately for us it seems that Huawei won't upgrade the ROM to an up-to-date
one anytime soon... I found no custom ROM (like Cyanogen) either.

We were thinking about buying new phones anyway: these ones have issues (no
Wi-Fi Direct, no Burmese Unicode-compliant fonts, a lot of built-in bloatware). I
guess this new fatal issue will trigger the switch... Be sure I'll flag this
Huawei phone as dodgy in the Wiki.

However, I was wondering if you guys have faced a similar situation in the past?
What did you do? I guess that in the short term, training the users never to
change the default storage location is a must, but obviously it won't prevent
every error.

On the CommCare side, it seems that losing forms when the application is broken
should be flagged as a bug anyway.
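
As a rough illustration of the behaviour one would expect instead, here is a minimal Python sketch in which a form is removed from the local queue only after the upload is confirmed. This is not CommCare's code: `submit_pending`, the `queue` dictionary and the `upload` callback are hypothetical names used only for the example.

```python
from typing import Callable, Dict, List


def submit_pending(queue: Dict[str, bytes],
                   upload: Callable[[bytes], bool]) -> List[str]:
    """Try to upload every queued form and return the ids that were sent.

    A form is deleted from the queue only once upload() returns True.
    Forms whose upload fails or raises stay queued for the next attempt.
    """
    sent = []
    for form_id in list(queue):  # copy the keys so we can delete while iterating
        try:
            ok = upload(queue[form_id])
        except OSError:  # covers ConnectionError, e.g. no network available
            ok = False
        if ok:
            del queue[form_id]
            sent.append(form_id)
    return sent


def offline_upload(payload: bytes) -> bool:
    """Hypothetical uploader that behaves like a phone with no network."""
    raise ConnectionError("no network")


forms = {"form-1": b"<data/>", "form-2": b"<data/>"}
sent_offline = submit_pending(forms, offline_upload)   # nothing sent, nothing lost
sent_online = submit_pending(forms, lambda payload: True)
```

With this ordering a failed "Send Data" attempt can be repeated any number of times without reducing the count of unsent forms.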

But what about the default storage switch issue? Do you think something in the
4.1.2 Android SDK could prevent the bug (for example, bypassing the default
storage location and always writing CommCare data to the internal memory)?
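
To make the idea concrete, here is a platform-neutral Python sketch of "pinning" the data location: the storage root is recorded on first run and reused afterwards, whatever the system default has become. It is only a model of the idea, not Android or CommCare code, and whether it would avoid the behaviour seen on this Huawei ROM is an open question. `resolve_data_root` and the pin file are hypothetical. On Android itself, the app-private internal directory is the one returned by `Context.getFilesDir()`.

```python
import json
import tempfile
from pathlib import Path


def resolve_data_root(pin_file: Path, current_default: Path) -> Path:
    """Return the data root recorded at first run.

    Later changes to the system default are ignored, so the app never starts
    looking for its files in a location where they were never written.
    """
    if pin_file.exists():
        return Path(json.loads(pin_file.read_text())["data_root"])
    pin_file.write_text(json.dumps({"data_root": str(current_default)}))
    return current_default


# Simulate the two storage locations with temporary directories.
base = Path(tempfile.mkdtemp())
internal, sd_card = base / "internal", base / "sdcard"
internal.mkdir()
sd_card.mkdir()

pin = internal / "data_root.json"            # kept in app-private storage
first = resolve_data_root(pin, internal)     # first launch: default is internal
second = resolve_data_root(pin, sd_card)     # default switched to the SD card
print(first == second)
```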

Many thanks for the help,

--
Charles Flèche
mHealth Advisor
Télécoms Sans Frontières http://www.tsfi.org
Première Urgence - Aide Médicale Internationale http://www.pu-ami.org