CSV Export to Stata for use with ODKmeta script

Hello All,

Our team is transitioning from ODK Collect to CommCare for data collection
and storage. We have been encountering a data export issue specific to this
transition.

I have one cross-sectional survey with about 500 variables deployed on two
tablets in the field. This survey was coded in Excel using ODK with a
‘survey’ tab and a ‘choices’ tab that matches numeric answer codes to labels.
I then use https://opendatakit.org/use/xlsform/ to convert my xlsx to xml
format.

Once the conversion succeeds, I upload the form with its logic already built
in. To do this, I select my form, open the “Advanced” tab, and upload the
XForm. I do not use CommCare’s form builder. I believe this process of
uploading an XForm loses some important information stored in the ‘choices’
tab of my survey.

When I export the collected data as a CSV file, I run into a lot of
issues with its compatibility with the ODKmeta Stata command developed for
ODK-encoded survey data. (More information on ODKmeta:
https://github.com/PovertyAction/odkmeta)

The benefit of using this command in Stata is the multi-thousand-line do
file it generates, which links the numeric answer choices to their value
labels. This saves the user from hand-coding long surveys with many
variables. It is also why I spent so much time up front on survey
development (before launch) using ODK to build the logic for my survey.

Any help would be appreciated. Thanks!
– Mackenzie Flynn
Fulbright-Fogarty Fellow, University of Washington
MS4, University of Louisville School of Medicine

Hi Mackenzie. Have you received any response on that? I’m doing the same process!

Thanks,
Soha

Hi,

I did not get much of a reply, but I did figure out how to modify the code. I am more than happy to assist you. What issues are you running into? How many variables do you have?

Thanks,

Mackenzie

Thanks for getting back to me, Mackenzie. I’m currently only testing the data and how it exports. My forms are really long, around 350 variables in all (including skip logic questions and repeat groups). I’ve also built my forms using XLSForm, converted them to XForms, and uploaded them to CommCare. I was hoping to use the ODKmeta Stata do file to clean the data, but if there are compatibility issues, I worry about doing that. Any guidance on the issues you’ve faced along the way and how you resolved them would be useful.

Thanks!
Soha

My forms are about the same length. I created mine in Excel with ODK (XLSForm) code and converted them to XForms as well. You can definitely use ODKmeta. I spent a long time modifying the code because having the meta command generate such a long do file was worth it. Here is what I did:

  1. I exported the data out of CommCare as a CSV file.

  2. I imported it into Stata with the insheet command and created a pre-import cleaning do file.

  3. In this pre-import cleaning, I wrote a loop to replace “---” with “.” for all missing data. This was a huge problem when running the ODKmeta-generated do file, so it’s best to change it before running ODKmeta. Then I used export delimited to save the revised CSV file.

  4. Run your ODKmeta command to get the do file it generates.

  5. Next, you’ll have to go through this do file line by line to fix any issues. Timestamps were another huge problem for me with any time or date variables. Some of the first lines of code define datemask, timemask, and datetimemask. I changed the datemask from MDY to YMD. In the end, I dropped some of the problematic date variables I couldn’t work through. Not ideal, but I still have the data if I need to look up any specific dates.

  6. CommCare also adds its own prefix to all your variable names. I used the command “renpfix form” to drop the prefix “form” from all variables.
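If you would rather do the pre-import cleaning from steps 3 and 6 outside Stata, here is a minimal Python sketch of the same idea. It assumes CommCare writes “---” for blank answers and prefixes question columns with “form.”, so check both against your own export. The function names are hypothetical, and the output keeps “.” so Stata reads those cells as missing.

```python
import csv
import io

MISSING_TOKEN = "---"  # assumption: CommCare's placeholder for blank/skipped answers
STATA_MISSING = "."    # what the cleaned file contains instead
PREFIX = "form."       # assumption: prefix CommCare adds to question columns


def clean_header(name):
    """Drop the CommCare prefix so column names match the XLSForm 'survey' tab."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def clean_commcare_csv(src, dst):
    """Copy a CommCare CSV export from src to dst (both file-like objects),
    stripping the column prefix and replacing the missing-value token."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow([clean_header(h) for h in header])
    for row in reader:
        writer.writerow(
            [STATA_MISSING if cell.strip() == MISSING_TOKEN else cell for cell in row]
        )


if __name__ == "__main__":
    sample = "form.age,form.name\n34,---\n---,Ana\n"
    out = io.StringIO()
    clean_commcare_csv(io.StringIO(sample), out)
    print(out.getvalue())
```

For a real export you would open both files with newline="" (for example a hypothetical export.csv read and export_clean.csv written) and then insheet the cleaned copy before running ODKmeta.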

This seems like a lot of work, but it should be quicker once you know where the issues are and can address them right away. Additionally, once you run ODKmeta and modify the generated do file, you can keep using that same do file every time you have new form submissions without having to rerun the ODKmeta command. My do file was more than 6,400 lines of code, so I wanted to try everything before giving up and writing all that code by hand.

Good luck,

Mackenzie

Hi Mackenzie,

Thank you so much for this thorough description of the steps you took. I’m sure this will be super helpful! I got to step 4, ran my ODKmeta command, and generated a do file, but I had actually been thinking about the “---” for missing values, so I will go ahead and write a loop to change it to “.” before I run the ODKmeta command again.

I’m also struggling with merging the repeat group data in my large data file. For some reason, CommCare exports data from each repeat group as an individual CSV file. Did you have to deal with that? If so, would you mind sharing how you integrated that in the ODKmeta do file?

Sorry for the trouble and thanks again for this; very useful info!

Best,
Soha