Hi Eric,
Thanks for your patience and understanding this week, and for writing to us
with your concerns. I'm sure they are shared by many others on this list,
so I wanted to take the time to write a considered response.
You asked what level of baseline performance you can depend on from
CommCareHQ, and pointed out that even if these issues happen infrequently,
timing can be everything. If CommCareHQ is up 99% of the time, but the 1%
of downtime happens during a training or project launch, it can be
incredibly disruptive.
There are two types of issues that I want to talk through in response to
this.
Normal server downtime for planned maintenance
In order to maintain stability and good performance, we will need to
schedule small windows of downtime for server maintenance. We announce
these time windows over the commcare-users list in advance so that you can
plan your project activities around them.
Unexpected issues affecting uptime or performance
While we work hard to minimize these events, we cannot guarantee they won't
happen in the future. However, we invest heavily in (a) minimizing the
chance these events will happen, and (b) minimizing their effects when they
do occur.
We have engineers in Boston, India, and Cape Town, which means CommCareHQ
has close to 24-hour engineering support. At all times, a team of engineers
is on dedicated emergency standby to handle issues immediately as they arise.
If an event does occur, we quickly dedicate as many engineers as necessary
to resolve it as soon as possible. Afterwards, we write a detailed
retrospective of the issue and implement measures to make sure it cannot
recur.
Additionally, we have designed CommCare to be a highly resilient system.
Even if CommCareHQ is completely down, the normal day-to-day work of your
mobile workers can continue. As Cory mentioned in his latest email to this
group, while this week's server performance has caused serious issues
for our users, we are fortunate that mobile users have for the most part
not been affected.
We will be writing up a more detailed retrospective after we are fully
through this week's issues, making sure we learn from them and implement
safeguards against their recurrence.
Sincerely,
Amelia
I was really happy to see this message posted here by Dimagi folks,
because I've been meaning to post a message about site slowness for about
three days now (but too busy) and it's much better to see it proactively
explained by Dimagi staff!
For me, the site has been slow since February 2 (the night of February 1
for USA, I am in Thailand) -- that includes making simple saves to the form
builder, building new versions of apps... and now today, any activity on
commcarehq.org at all, including just loading up the initial page.
I hope the crunch gets flushed through and that you're able to get enough
capacity! I will avoid doing exports...
It does raise an interesting question about the level of baseline
performance we can depend on. (And I'm definitely not intending to whine!)
Just that, for example, if one of these performance issues comes up
when I have a big training/launch for a project in PNG or elsewhere, it
could be quite a real problem with getting the project to fire up and
succeed.
Thanks for the info!
Eric
On Wed, Feb 4, 2015 at 1:50 AM, Eric Stephan wrote: