Randomizing questions (survey experiment)

Hello all!

I am new to CommCare (I usually use SurveyCTO). I'm running a survey with an NGO and want to program a survey experiment. This involves randomly selecting 1 of 5 different scenarios of a question and having that display to the enumerator. Then there will be a few standard follow-up questions. I think this requires programming some random code to pull from a fixed set? Or using lookup tables? I'm completely green, so if someone can walk me through this I will be very grateful!

I'm also supposed to start next week!


Hi Catlan,

There are a number of different ways to randomize questions as you are describing.

The easiest one is to define all five questions directly in the form and use the output of a random() expression, computed as the default value of a hidden value, to control the display conditions for those questions.

Depending on your needs, there are ways to augment that process. You could use a lookup table to create a more dynamic set of potential values, or you could precompute the randomized distributions and preload them in complex ways to get very precise cohort groups, but unless you have very specific needs, the basic approach should be straightforward.

Hi Clayton,

Thanks for the reply! OK, this is helpful. I think we were jumping to the complicated version where it may not be necessary. I want simple randomization across the 5 possible questions -- I don't need complicated distributions or sub group stuff.

I see how creating 5 questions on the form and then using the random function for each could work, but how would this tie together? That is, if I program random logic to display the first scenario, how would I block the other questions? I guess, what would the actual display condition formula be here? Effectively, I want an equal probability that a respondent will receive any one of the scenarios, so ideally the realized responses would be equal across conditions, i.e. 200 with scenario 1, 200 with scenario 2, and so on.

Sorry for all the questions!

thank you

Hi Catlan,

Sounds like there are two key questions here.

One - Inside of the form, how do you mechanically show the right question based on the output of randomization?

This is the straightforward part. In your form you'd have a hidden value which represents which question to display. For ease, I'll assume that the questions are keyed numerically (1-5). This hidden value would generate a number from 1-5 based on a Default Value XPath expression.

Then, each of the 5 questions would have a display condition like "#form/question_to_show = '2'" or "#form/question_to_show = '3'".
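To make the mechanics concrete, here is a rough sketch of the logic in Python (not CommCare syntax). The function names are made up for illustration. In the form itself, the hidden value's default would presumably be something along the lines of int(random() * 5) + 1, but treat that exact expression as an assumption and check it against the function documentation.

```python
import random

NUM_SCENARIOS = 5  # scenarios are keyed 1-5, as assumed above


def pick_question_to_show(rng=random):
    # rng.random() is in [0.0, 1.0), so int(... * 5) is 0-4; adding 1 gives 1-5.
    return int(rng.random() * NUM_SCENARIOS) + 1


def visible_questions(question_to_show):
    # One display condition per scenario question: show it only when its key matches.
    return {key: question_to_show == key for key in range(1, NUM_SCENARIOS + 1)}


question_to_show = pick_question_to_show()
print(question_to_show, visible_questions(question_to_show))
```

Because the number is drawn once and every display condition compares against that same stored value, exactly one of the five questions is shown and the other four stay hidden.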

Finally, presuming you need a single value that collects the answer, you'd create another hidden value to normalize the responses by concat()'ing each of the 5 potential questions. A question whose display condition evaluates to false simply produces an empty string for its output, so concatenating all 5 questions will result in the hidden value containing the answer to whichever question was displayed and answered.
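The same idea as a small Python sketch, with a hypothetical dictionary standing in for the five question values:

```python
def normalized_answer(answers_by_scenario):
    # Hidden (not displayed) questions contribute an empty string, so joining
    # all five leaves only the answer from the scenario that was shown.
    return "".join(answers_by_scenario.get(key, "") for key in range(1, 6))


# Hypothetical example: scenario 3 was displayed and answered "agree".
answers = {1: "", 2: "", 3: "agree", 4: "", 5: ""}
print(normalized_answer(answers))  # agree
```

This only works cleanly because at most one of the five questions can ever hold an answer in a given form.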

Two - How do you produce the random distribution?

There are a few options here. The easiest is the random() function I mentioned above. You can use the patterns in the example documentation to make an expression which will produce one of [1,2,3,4,5] with equal probability when the form is started.

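If you collect enough data, that approach is generally sufficient, and in most cases that's what I'd recommend. Keep in mind that each form draws independently, so the realized counts per scenario will be close to equal but not exactly equal (e.g. not precisely 200 each).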
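If you want a feel for how uneven simple randomization can be at your sample size, a quick simulation helps. This is a rough Python sketch and has nothing to do with CommCare itself; the respondent count of 1000 and the seed are arbitrary choices for illustration.

```python
import random
from collections import Counter


def simulate_counts(num_respondents, num_scenarios=5, seed=None):
    # Draw one scenario per respondent, independently and with equal probability.
    rng = random.Random(seed)
    draws = (int(rng.random() * num_scenarios) + 1 for _ in range(num_respondents))
    return Counter(draws)


counts = simulate_counts(1000, seed=1)
for scenario in range(1, 6):
    print(scenario, counts[scenario])
```

Running it with a few different seeds shows the spread you should expect around 200 per scenario. If that spread is acceptable for your analysis, the simple approach is enough.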

If you need to be more precise, you have a few different options.

One is to keep track, for each user, of how many times they've asked each question by incrementing a counter for each question on the User Case for that user, and excluding a question from the set of random options once the number of times it has appeared has exceeded a threshold. There are some examples in the wiki documentation about creating Counters with the User Case construct. If you choose to go this route, you'd also need to change your approach to select the value randomly from a list rather than computing it directly, since the number of options needs to shrink while the potential selections are still drawn from 1-5. You'd probably do that by concatenating 5 expressions which each conditionally add their question ID, basing the random logic boundaries on the length of that list, and using substr() to extract the value.
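Here is a rough Python sketch of that list-based selection. The counter values, the cap of 40, and the function names are all hypothetical, and this version drops a scenario as soon as it reaches the cap. In the form, the list would be a concat() of conditional pieces and the indexing would be done with substr().

```python
import random


def eligible_ids(counts, threshold):
    # Concatenate the single-character ID of every scenario still under its cap.
    return "".join(str(key) for key in range(1, 6) if counts.get(key, 0) < threshold)


def pick_from_eligible(counts, threshold, rng=random):
    ids = eligible_ids(counts, threshold)
    if not ids:
        return None  # every scenario has reached the threshold
    index = int(rng.random() * len(ids))  # 0 .. len(ids) - 1
    return int(ids[index])  # the substr() step: one character at that position


# Hypothetical per-user counters: scenarios 2 and 5 have reached a cap of 40.
counts = {1: 12, 2: 40, 3: 39, 4: 7, 5: 40}
choice = pick_from_eligible(counts, threshold=40)
counts[choice] += 1  # increment the counter for the scenario that was shown
print(choice, counts)
```

Note the random range is based on the length of the eligible list, not on 5, which is the change in the random logic boundaries described above.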

Another option would be to pre-compute a random distribution per user ahead of time and to add that random distribution to each user's user case. You could then use that precomputed list as a queue by using the substr() function to pull off the first item and storing a new value which consists of the remaining list.
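A rough Python sketch of the queue idea follows. Building the queue from shuffled blocks of 1-5 is my assumption about how you might pre-compute it (it keeps the scenarios exactly balanced after every 5 forms); any other precomputed ordering would work with the same pop step. The function names are hypothetical.

```python
import random


def build_queue(num_blocks, rng=random):
    # Each block is a random ordering of the five scenario IDs.
    blocks = []
    for _ in range(num_blocks):
        block = list("12345")
        rng.shuffle(block)
        blocks.append("".join(block))
    return "".join(blocks)


def pop_scenario(queue):
    # Return (next scenario, remaining queue); each scenario is one character.
    if not queue:
        raise ValueError("the precomputed queue is empty")
    return int(queue[0]), queue[1:]


queue = build_queue(num_blocks=4)  # enough for 20 forms for this user
scenario, queue = pop_scenario(queue)
print(scenario, queue)
```

In the form, the first character would be read with substr() and the remainder saved back to the user case, so the next form starts from the shortened queue.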

Both of those approaches add considerable complexity to manage extra state, and generally aren't super justifiable without other constraints. If you needed, for instance, to have even distributions for each intervention group across different cohorts, using the queue approach can be helpful because you can maintain a different queue for each cohort. For a single-cohort intervention with 5 treatments, though, I'd guess that there would be little value to the added complexity in approach.