Data Structuring

Hello! Little question about data exporting - For the surveys we are deploying we are assessing around 70 criteria, and for each criterion we register the compliance (0,1) and the corrective action (text). Right now the criteria are managed as a series of columns, which makes for a very wide data table can be difficult to process. It also makes it difficult to present the results in a dashboard (we are using ArcGIS operational dashboards). The question has come up if it would be more efficient to structure our criteria as row (or transactions), such that for each non-compliant criteria would be a separate row. Is this approach compatible with CommCare and is it easy to implement?

An alternative on this theme is to have a tool that converts the multiple column CommCare output table into a set of transactions. Do you think this might be a reasonable approach? Can you recommend a tool that can do this?

Hi Jesse

We have a lot of surveys/forms that spit out data in a very wide format (400+ columns) and i can totally see what your issues may be.

there are advantages and disadvantages to having wide tables like this, but i found that having a vertical row based data set allows me to do a lot more if i have a strong infrastructure for it.

One of my favorite advantages to having row based data is that i can perform analysis really fast. i normally unpivot most of the columns and turn them into Attribute-value pairs. for example, the attribute would be “For criterion X, what is the status of this row/beneficiary/location/etc,” and the value would be “0” or “1” .

a quick and easy tool to achieve this is using PowerQuery for Excel if you’re running an older version (built in in newer versions). Power queries allow you to perform a set of actions on a given data set, and then re-run them whenever you want. a quick google search will get you started.
https://www.excelcampus.com/powerquery/power-query-overview/

alternatively, if you are leveraging a relational database engine like MySQL or SQL Server or anything sophisticated like that, you can run a standard SQL Unpivot command (here’s an example online https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-20170 )

my favorite use of un-pivoted data sets is that i can re-pivot them in a pivot table in excel, except, i have a slicer on attributes and a slicer on answers. you will find some difficulty telling apart the 0’s for criterion X from the 0’s for criterion Y. so for that, you can do a little trick where you concat the compliance value with the compliance factor, and getting a unique value for each type of answer that you have. what this will allow you to do is to basically narrow down your data selection for your reporting to very specific criteria without resulting in duplicates.

In short: you don’t need to have one or the other. you can automate a workflow using the different options that CommCare offers (daily excel exports for example) and something like Power Query to prepare the data, but you can report off of both. there’s no reason to give one up. One could be used for straight forward achievement/project deliverables type of reporting, and the unpivoted dataset can then be used for comprehensive analysis.

Hope this helps!

Mazz

Hi Mazz,

Just wanted to follow up and tell you how helpful Power Query has been, it’s changed how we can prepare the data and made dashboarding in ArcGIS seamless. Thanks so much again for your recommendation on this - GAME CHANGER!

Best,
Jessie

1 Like

Fantastic!

Glad I could help :slight_smile: