Extracting data and matching participant IDs across forms

I am having some trouble figuring out how to extract data and have included
two excel files and a csv file to help illustrate:

  • cases.csv
  • participants.xlsx (which is just a custom report with mostly the
    same info as cases.csv)
  • demographics.xlsx (custom report for the demographics form)

The main issue is that there are multiple entries in the
cases/participants data files for participant IDs and I thought there
should only be one participant per row with no repeats.

The demographics data doesn’t appear to have repeats, but does have a
number of entries where the info.case_name (which I think is just the
participant_id field) is blank.

I would like to export a file that has one line for every participant in
the study, identified by participant ID. I would then like to use this
file to serve as an index for all of the other forms.

So for example, a table with one row per participant with participant_id as
the index, and additional tables for the various forms in the study, with
participant_id and assessment point (e.g., baseline, 4 month follow-up,
etc…)

I’m not sure if this above is as clear as I would like it to be, so to put
it in SQL terms, I would like to be able to create a view with, for
example, the participants table and the demographics table connected by a
full outer join.

Any help or suggestions would be greatly appreciated!

Cases.csv (22.5 KB)

participants 2015-07-23.xlsx (24.5 KB)

Demographics 2015-07-23.xlsx (28.9 KB)

Hello Jonathan,

I took some time today to look into your questions and have a few insights:

1) Regarding the multiple case IDs:
I looked at your case list and it seems that some cases had the same
participant ID entered. There are four separate cases all with the same ID
of 10009. I know your data entry officers are the ones typing those in, so
there may be some entry errors or risk that some people are entering in the
same IDs (see the case list link here:
https://www.commcarehq.org/a/test-mgh/reports/case_list/?emw=project_data&case_type=&is_open=&search_query=
)

2) Regarding the empty case_name in the demographics form:
There is test data being exported along with your real data, and that may
explain why some properties are not filled out. I suggest that you delete
ALL the test users (and that includes “will” and “meryn”). Here is a link
on how to delete the users:
https://confluence.dimagi.com/display/commcarepublic/Create+and+Manage+CommCare+Mobile+Workers#CreateandManageCommCareMobileWorkers-D.Deactivate(Formerly"Archive")andDeleteMobileWorkers
)

NOTE^^: You want to DELETE the users, NOT deactivate them.

If you had test cases that were submitted by users now submitting real
data, another thing you should try is to reassign the “test” cases or fake
cases to a user you can then delete. For example, you can create a new
mobile worker called “delete_user”, and then reassign any remaining test
cases from active users that are submitting real data but may have also
submitted test data to the “delete_user” and then delete the user using the
link above. Here is a link on how to reassign cases:
https://confluence.dimagi.com/display/commcarepublic/Reassign+Cases

Once you do that, I think the empty case_name cases should be gone. You can
then use the case export and hopefully use the unique participant IDs to
then match case properties being saved. If there are repeats, please check
the case list using the link in 1) to ensure that cases do not have the
same participant IDs. You can use the form exports using the participant
IDs for the form information.

Regards,
Meryn

··· On Fri, Jul 24, 2015 at 8:07 PM, jalerner wrote:

I am having some trouble figuring out how to extract data and have
included two excel files and a csv file to help illustrate:

  • cases.csv
  • participants.xlsx (which is just a custom report with mostly the
    same info as cases.csv)
  • demographics.xlsx (custom report for the demographics form)

The main issue is that there are multiple entries in the
cases/participants data files for participant IDs and I thought there
should only be one participant per row with no repeats.

The demographics data doesn’t appear to have repeats, but does have a
number of entries where the info.case_name (which I think is just the
participant_id field) is blank.

I would like to export a file that has one line for every participant in
the study, identified by participant ID. I would then like to use this
file to serve as an index for all of the other forms.

So for example, a table with one row per participant with participant_id
as the index, and additional tables for the various forms in the study,
with participant_id and assessment point (e.g., baseline, 4 month
follow-up, etc…)

I’m not sure if this above is as clear as I would like it to be, so to put
it in SQL terms, I would like to be able to create a view with, for
example, the participants table and the demographics table connected by a
full outer join.

Any help or suggestions would be greatly appreciated!


Meryn Robinson, MPH
Research Coordinator

Dimagi, Inc
(617) 649 - 2214 ext. 83
Skype: merynrobinson

Test. 1, 2, 3.

Hi Meryn,

Thanks so much for taking the time to look into this and provide me with a possible fix. I will look at the links you provided, make the changes you suggest, and hopefully that will take care of everything.

Just curious, is there a simple way to prevent users from entering an ID that has already been used?

Thanks again!

Jonathan

··· From: Meryn Robinson Date: Monday, July 27, 2015 at 11:10 AM To: "Jonathan A. Lerner, Ph.D." Cc: commcare-users, Christina Jeffrey, "Bedoya, C. Andres,Ph.D.", Katie Biello, Katie Biello Subject: Re: Extracting data and matching participant IDs across forms

Hello Jonathan,

I took some time today to look into your questions and have a few insights:

  1. Regarding the multiple case IDs:
    I looked at your case list and it seems that some cases had the same participant ID entered. There are four separate cases all with the same ID of 10009. I know your data entry officers are the ones typing those in, so there may be some entry errors or risk that some people are entering in the same IDs (see the case list link here: https://www.commcarehq.org/a/test-mgh/reports/case_list/?emw=project_data&case_type=&is_open=&search_query=)

  2. Regarding the empty case_name in the demographics form:
    There is test data being exported along with your real data, and that may explain why some properties are not filled out. I suggest that you delete ALL the test users (and that includes “will” and “meryn”). Here is a link on how to delete the users: https://confluence.dimagi.com/display/commcarepublic/Create+and+Manage+CommCare+Mobile+Workers#CreateandManageCommCareMobileWorkers-D.Deactivate(Formerly"Archive")andDeleteMobileWorkers)

NOTE^^: You want to DELETE the users, NOT deactivate them.

If you had test cases that were submitted by users now submitting real data, another thing you should try is to reassign the “test” cases or fake cases to a user you can then delete. For example, you can create a new mobile worker called “delete_user”, and then reassign any remaining test cases from active users that are submitting real data but may have also submitted test data to the “delete_user” and then delete the user using the link above. Here is a link on how to reassign cases: https://confluence.dimagi.com/display/commcarepublic/Reassign+Cases

Once you do that, I think the empty case_name cases should be gone. You can then use the case export and hopefully use the unique participant IDs to then match case properties being saved. If there are repeats, please check the case list using the link in 1) to ensure that cases do not have the same participant IDs. You can use the form exports using the participant IDs for the form information.

Regards,
Meryn

On Fri, Jul 24, 2015 at 8:07 PM, jalerner <jalerner@mgh.harvard.edumailto:jalerner@mgh.harvard.edu> wrote:
I am having some trouble figuring out how to extract data and have included two excel files and a csv file to help illustrate:

  • cases.csv
  • participants.xlsx (which is just a custom report with mostly the same info as cases.csv)
  • demographics.xlsx (custom report for the demographics form)

The main issue is that there are multiple entries in the cases/participants data files for participant IDs and I thought there should only be one participant per row with no repeats.

The demographics data doesn’t appear to have repeats, but does have a number of entries where the info.case_name (which I think is just the participant_id field) is blank.

I would like to export a file that has one line for every participant in the study, identified by participant ID. I would then like to use this file to serve as an index for all of the other forms.

So for example, a table with one row per participant with participant_id as the index, and additional tables for the various forms in the study, with participant_id and assessment point (e.g., baseline, 4 month follow-up, etc…)

I’m not sure if this above is as clear as I would like it to be, so to put it in SQL terms, I would like to be able to create a view with, for example, the participants table and the demographics table connected by a full outer join.

Any help or suggestions would be greatly appreciated!


Meryn Robinson, MPH
Research Coordinator

Dimagi, Inc
(617) 649 - 2214 ext. 83
Skype: merynrobinson

The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.

Hi Jonathan,

Here is another in cc-users with the same question and a few responses:
https://groups.google.com/forum/#!searchin/commcare-users/approaches$20for$20generating$20patient$20identifiers/commcare-users/VGwLGYEWzqw/nmxGEHbXOpUJ

Essentially there is no good way in CommCare right now to check unique IDs
against other cases. That post (link above) suggests some options related
to your own workflow outside of the app that could potentially help the
problem of duplicate IDs.

Regards,
Meryn

··· On Mon, Jul 27, 2015 at 8:43 PM, Lerner, Jonathan A.,Ph.D. < JALERNER@mgh.harvard.edu> wrote:

Hi Meryn,

Thanks so much for taking the time to look into this and provide me with
a possible fix. I will look at the links you provided, make the changes
you suggest, and hopefully that will take care of everything.

Just curious, is there a simple way to prevent users from entering an ID
that has already been used?

Thanks again!

Jonathan

From: Meryn Robinson
Date: Monday, July 27, 2015 at 11:10 AM
To: "Jonathan A. Lerner, Ph.D."
Cc: commcare-users, Christina Jeffrey, “Bedoya, C. Andres,Ph.D.”, Katie
Biello, Katie Biello
Subject: Re: Extracting data and matching participant IDs across forms

Hello Jonathan,

I took some time today to look into your questions and have a few
insights:

1) Regarding the multiple case IDs:
I looked at your case list and it seems that some cases had the same
participant ID entered. There are four separate cases all with the same ID
of 10009. I know your data entry officers are the ones typing those in, so
there may be some entry errors or risk that some people are entering in the
same IDs (see the case list link here:
https://www.commcarehq.org/a/test-mgh/reports/case_list/?emw=project_data&case_type=&is_open=&search_query=
)

2) Regarding the empty case_name in the demographics form:
There is test data being exported along with your real data, and that may
explain why some properties are not filled out. I suggest that you delete
ALL the test users (and that includes “will” and “meryn”). Here is a link
on how to delete the users:
https://confluence.dimagi.com/display/commcarepublic/Create+and+Manage+CommCare+Mobile+Workers#CreateandManageCommCareMobileWorkers-D.Deactivate(Formerly"Archive")andDeleteMobileWorkers
https://confluence.dimagi.com/display/commcarepublic/Create+and+Manage+CommCare+Mobile+Workers#CreateandManageCommCareMobileWorkers-D.Deactivate(Formerly"Archive")andDeleteMobileWorkers
)

NOTE^^: You want to DELETE the users, NOT deactivate them.

If you had test cases that were submitted by users now submitting real
data, another thing you should try is to reassign the “test” cases or fake
cases to a user you can then delete. For example, you can create a new
mobile worker called “delete_user”, and then reassign any remaining test
cases from active users that are submitting real data but may have also
submitted test data to the “delete_user” and then delete the user using the
link above. Here is a link on how to reassign cases:
https://confluence.dimagi.com/display/commcarepublic/Reassign+Cases

Once you do that, I think the empty case_name cases should be gone. You
can then use the case export and hopefully use the unique participant IDs
to then match case properties being saved. If there are repeats, please
check the case list using the link in 1) to ensure that cases do not have
the same participant IDs. You can use the form exports using the
participant IDs for the form information.

Regards,
Meryn

On Fri, Jul 24, 2015 at 8:07 PM, jalerner jalerner@mgh.harvard.edu wrote:

I am having some trouble figuring out how to extract data and have
included two excel files and a csv file to help illustrate:

  • cases.csv
  • participants.xlsx (which is just a custom report with mostly the
    same info as cases.csv)
  • demographics.xlsx (custom report for the demographics form)

The main issue is that there are multiple entries in the
cases/participants data files for participant IDs and I thought there
should only be one participant per row with no repeats.

The demographics data doesn’t appear to have repeats, but does have a
number of entries where the info.case_name (which I think is just the
participant_id field) is blank.

I would like to export a file that has one line for every participant
in the study, identified by participant ID. I would then like to use this
file to serve as an index for all of the other forms.

So for example, a table with one row per participant with
participant_id as the index, and additional tables for the various forms in
the study, with participant_id and assessment point (e.g., baseline, 4
month follow-up, etc…)

I’m not sure if this above is as clear as I would like it to be, so to
put it in SQL terms, I would like to be able to create a view with, for
example, the participants table and the demographics table connected by a
full outer join.

Any help or suggestions would be greatly appreciated!


Meryn Robinson, MPH
Research Coordinator

Dimagi, Inc
(617) 649 - 2214 ext. 83
Skype: merynrobinson

 The information in this e-mail is intended only for the person to

whom it is
addressed. If you believe this e-mail was sent to you in error and the
e-mail
contains patient information, please contact the Partners Compliance
HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in
error
but does not contain patient information, please contact the sender and
properly
dispose of the e-mail.


Meryn Robinson, MPH
Research Coordinator

Dimagi, Inc
(617) 649 - 2214 ext. 83
Skype: merynrobinson

Hi Meryn,

Thanks again for your help with this. I have looked at the links you provided, and I have done the following:

  1. Deleted mobile user accounts for test users, you, and Will
  2. Created a delete_user account and assigned test cases to it, then deleted the delete_user account

This seems to have taken care of the repeat ID problem, but the no particpant_id problem persists.

I am attaching the latest data files for users, cases, and demographics to illustrate.

Also, here is the list of users from the “Manage Mobile Workers” page. There are no users listed under “show deactivated mobile users.

    Username        Full Name       Date Registered Phone   More... Deactivate

1 agneslawraja xxxxxx 20-Jul-15 xxxxxx … Deactivate
2 balaguru xxxxxx 20-Jul-15 xxxxxx … Deactivate
3 dhanalakshmi xxxxxx 20-Jul-15 xxxxxx … Deactivate
4 jonathan xxxxxx 8-Sep-14 … Deactivate
5 jvarghese xxxxxx 17-Mar-15 xxxxxx … Deactivate
6 karikalan xxxxxx 20-Jul-15 xxxxxx … Deactivate
7 kbiello 21-Apr-15 xxxxxx … Deactivate
8 mkhan xxxxxx 17-Mar-15 xxxxxx … Deactivate
9 peterehlinger xxxxxx 26-Nov-14 xxxxxx … Deactivate
10 ronymoral xxxxxx 20-Jul-15 xxxxxx … Deactivate
11 rvashisht xxxxxx 17-Mar-15 xxxxxx … Deactivate
12 senthil xxxxxx 20-Jul-15 xxxxxx … Deactivate
13 srawat xxxxxx 17-Mar-15 xxxxxx … Deactivate

Any help you can provide would be greatly appreciated!

Thanks,

Jonathan

Cases (2015-07-29).xlsx (46.3 KB)

Demographics (2015-07-29).xlsx (13 KB)

Users.csv (1.73 KB)

··· From: Meryn Robinson Date: Monday, July 27, 2015 at 11:10 AM To: "Jonathan A. Lerner, Ph.D." Cc: commcare-users, Christina Jeffrey, "Bedoya, C. Andres,Ph.D.", Katie Biello, Katie Biello Subject: Re: Extracting data and matching participant IDs across forms

Hello Jonathan,

I took some time today to look into your questions and have a few insights:

  1. Regarding the multiple case IDs:
    I looked at your case list and it seems that some cases had the same participant ID entered. There are four separate cases all with the same ID of 10009. I know your data entry officers are the ones typing those in, so there may be some entry errors or risk that some people are entering in the same IDs (see the case list link here: https://www.commcarehq.org/a/test-mgh/reports/case_list/?emw=project_data&case_type=&is_open=&search_query=)

  2. Regarding the empty case_name in the demographics form:
    There is test data being exported along with your real data, and that may explain why some properties are not filled out. I suggest that you delete ALL the test users (and that includes “will” and “meryn”). Here is a link on how to delete the users: https://confluence.dimagi.com/display/commcarepublic/Create+and+Manage+CommCare+Mobile+Workers#CreateandManageCommCareMobileWorkers-D.Deactivate(Formerly"Archive")andDeleteMobileWorkers)

NOTE^^: You want to DELETE the users, NOT deactivate them.

If you had test cases that were submitted by users now submitting real data, another thing you should try is to reassign the “test” cases or fake cases to a user you can then delete. For example, you can create a new mobile worker called “delete_user”, and then reassign any remaining test cases from active users that are submitting real data but may have also submitted test data to the “delete_user” and then delete the user using the link above. Here is a link on how to reassign cases: https://confluence.dimagi.com/display/commcarepublic/Reassign+Cases

Once you do that, I think the empty case_name cases should be gone. You can then use the case export and hopefully use the unique participant IDs to then match case properties being saved. If there are repeats, please check the case list using the link in 1) to ensure that cases do not have the same participant IDs. You can use the form exports using the participant IDs for the form information.

Regards,
Meryn

On Fri, Jul 24, 2015 at 8:07 PM, jalerner <jalerner@mgh.harvard.edumailto:jalerner@mgh.harvard.edu> wrote:
I am having some trouble figuring out how to extract data and have included two excel files and a csv file to help illustrate:

  • cases.csv
  • participants.xlsx (which is just a custom report with mostly the same info as cases.csv)
  • demographics.xlsx (custom report for the demographics form)

The main issue is that there are multiple entries in the cases/participants data files for participant IDs and I thought there should only be one participant per row with no repeats.

The demographics data doesn’t appear to have repeats, but does have a number of entries where the info.case_name (which I think is just the participant_id field) is blank.

I would like to export a file that has one line for every participant in the study, identified by participant ID. I would then like to use this file to serve as an index for all of the other forms.

So for example, a table with one row per participant with participant_id as the index, and additional tables for the various forms in the study, with participant_id and assessment point (e.g., baseline, 4 month follow-up, etc…)

I’m not sure if this above is as clear as I would like it to be, so to put it in SQL terms, I would like to be able to create a view with, for example, the participants table and the demographics table connected by a full outer join.

Any help or suggestions would be greatly appreciated!


Meryn Robinson, MPH
Research Coordinator

Dimagi, Inc
(617) 649 - 2214 ext. 83
Skype: merynrobinson

The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.