Get form data modified since timestamp

Hi there,

I have a question from a batch data sync/integration perspective, where the goal is to retrieve form submissions which were created or modified since the timestamp of the last sync and the current time.

In that regard:

  1. Are CommCare form submissions immutable, i.e. can a form submission ever change once it is received by the server?

  2. A form submission in the API has a received_on field and a server_modified_on field. These field values seem to be almost equal but not quite. Can there be a situation where a form submission is first received by the server and then modified later?

  3. The List forms API has two query parameters received_on_start and received_on_end. Would querying the forms endpoint with a start and end timestamp give me the form submissions which were created or modified between those two timestamps, or is there a better way to achieve this?

Hey,

Thanks for the thoughtful questions and apologies that this is a confusing topic! I've just updated the documentation of the List Forms API to add two new parameters (indexed_on_start/end) and these are the recommended fields you should use for data synchronization. Also, if you are trying to export forms into a DB you should take a look at the Data Export Tool which handles much of the complexity for you.

To answer your specific questions:

  1. While it is unusual, form submissions can change e.g. from the "data cleaning" feature.
  2. Yes, see above.
  3. Yes, use indexed_on as per above.

Hope that helps,
Cory

Perfect. This explains it and is exactly what I need. Appreciate the quick reply.

Lars,
Hi, we met a few years ago at the annual DHIS2 Symposium.

Are you aware of CommCare Sync? Cory has led development on this tool which is a web application that has all the functionality of the Data Export tool that he mentioned plus more automated logging, scheduling, and multi-project exports that make this job easier when you have many CommCare domains that you want to store in one environment for analysis.

https://confluence.dimagi.com/display/commcarepublic/CommCare+Sync%3A+web+application+for+managing+your+data+warehouse

We also have some abilities to automate the generation of the query file that the Data Export Tool requires. And improved documentation on this process is coming out soon, very soon.

Hi @czue,

I'm re-upping this as I'm running into a similar problem. The "indexed_on_start/end" dates are really useful in the API for ensuring recently changed forms are brought forward in a new API request. Thanks for implementing. However, as far as I can tell in the List Forms API the "server_modified_on" date data don't come through in the response - even though the results can be sorted by it.

For our purposes, we've got some data managers who are interested in using the "data cleaning" feature to correct issues but it's then important for us to be able to detect when those changes have occurred or whether the form is just within the requested date range.

For instance, if I pull data for the month on the 20th of a month and again on the 30th using the indexed_on_start=2021-05-01 and indexed_on_end=2021-05-20/30 (respectively) at present, the only way I can figure out to detect if a form that was submitted on May 15th was updated after the 20th through the data cleaning feature in the second data pull is by checking the order in which it appears in the returned data. That's clumsy and error prone.

It would be more robust and helpful if the "server_modified_on" date were simply added to the returned data as it would make detection trivial and already seems to be part of the backend of the API since the results can be returned sorted by that date.

Hi @bhonermann

However, as far as I can tell in the List Forms API the "server_modified_on" date data don't come through in the response - even though the results can be sorted by it.

Are you sure? I just checked on one of my project spaces and I do see server_modified_on. It's in the top level of each of the form "objects", not inside the form field. Here's a truncated example of what it looks like for me:

{
  "meta": {...},
  "objects": [
    {
      "form": {
        "meta": {...}
      },
      "metadata": {...},
      "received_on": "2021-03-17T19:12:21.818242Z",
      "server_modified_on": "2021-03-17T19:12:22.695773Z",
      ...
    }
  ]
}

Do you not see the field at all? Or is it there, but blank?

Thanks @Ethan_Soergel!

Bah! You're absolutely right. It's not documented in the API sample output data here and I didn't realize my code was dropping some features before where I was checking.

Thanks much.

Ah I see that, thanks for flagging. I just added it to that example.