Upgrading old instance results in postgresql error

I have been trying to upgrade an instance circa-2021, but have encountered a problem I don't know how to address:

Having run the deploy_postgresql.yml upgrades to version 14, I now get an error when running the deploy_postgresql.yml as shown below:

Ubuntu 18.04.5 LTS
SystemCheckError: System check identified some issues:

ERRORS:
?: checks.Error querying model on database "default": "auditcare.AuditcareMigrationMeta": django.db.utils.ProgrammingError(relation "auditcare_auditcaremigrationmeta" does not exist
LINE 1: SELECT 'auditcare_auditcaremigrationmeta'::regclass
^
)
?: checks.Error querying model on database "default": "export.IncrementalExport": django.db.utils.ProgrammingError(relation "export_incrementalexport" does not exist
LINE 1: SELECT 'export_incrementalexport'::regclass
^
)
?: checks.Error querying model on database "default": "export.IncrementalExportCheckpoint": django.db.utils.ProgrammingError(relation "export_incrementalexportcheckpoint" does not exist
LINE 1: SELECT 'export_incrementalexportcheckpoint'::regclass
^
)

System check identified 3 issues (0 silenced).
Connection to 172.31.22.241 closed.

I read the 43. Optionally configure auditcare db — CommCareHQ Deployment documentation changelog, but I believe it says I don't need to do anything if I just want it to stay as normal? Any ideas how I can address this? I can't seem to deploy a newer version as the check release fails on this error too.

Hi Dirk

All of those models have been removed with migration to drop the tables which makes me thing you've somehow migrated the DB to a newer version but there is still old code being used somewhere which is expecting the tables to still exist.

I now get an error when running the deploy_postgresql.yml as shown below

What command are you running here? Running the deploy_postgres Ansible playbook shouldn't be executing the Django checks.

Ok, that makes sense. I was trying to do a later deploy previously and that asked me to do a lower deploy first...

When I do a deploy_postgresql.yml, it does give me that error AFTER it asks whether I would like to check and validate the services (which I replied yes to) - I have tried to provide an extract further below.

What is the best way of proceeding? Should I try and do a later deploy? My 1/2 completed deployments were deleted due to the previous deployment cleaning the releases and then failing on the task "Check release"

Is there anything else I can do to address this?

PLAY RECAP ***************************************************************************************************************************************
172.31.22.241 : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

✓ Apply completed with status code 0
Would you like to run Django checks to validate the settings? [y/N]y
commcare-cloud monolith django-manage check --deploy
ssh ubuntu@172.31.22.241 -t -o UserKnownHostsFile=/home/ubuntu/environments/monolith/known_hosts 'sudo -iu cchq bash -c '"'"'cd /home/cchq/www/monolith/current; python_env/bin/python manage.py check --deploy'"'"''
Ubuntu 18.04.5 LTS
System check identified some issues:

WARNINGS:
?: (security.W001) You do not have 'django.middleware.security.SecurityMiddleware' in your MIDDLEWARE so the SECURE_HSTS_SECONDS, SECURE_CONTENT_TYPE_NOSNIFF, SECURE_BROWSER_XSS_FILTER, and SECURE_SSL_REDIRECT settings will have no effect.
?: (security.W009) Your SECRET_KEY has less than 50 characters or less than 5 unique characters. Please generate a long and random SECRET_KEY, otherwise many of Django's security-critical features will be vulnerable to attack.

System check identified 2 issues (0 silenced).
Connection to 172.31.22.241 closed.
commcare-cloud monolith django-manage check --deploy -t database
ssh ubuntu@172.31.22.241 -t -o UserKnownHostsFile=/home/ubuntu/environments/monolith/known_hosts 'sudo -iu cchq bash -c '"'"'cd /home/cchq/www/monolith/current; python_env/bin/python manage.py check --deploy -t database'"'"''
Ubuntu 18.04.5 LTS
SystemCheckError: System check identified some issues:

ERRORS:
?: checks.Error querying model on database "default": "auditcare.AuditcareMigrationMeta": django.db.utils.ProgrammingError(relation "auditcare_auditcaremigrationmeta" does not exist
LINE 1: SELECT 'auditcare_auditcaremigrationmeta'::regclass
^
)
?: checks.Error querying model on database "default": "export.IncrementalExport": django.db.utils.ProgrammingError(relation "export_incrementalexport" does not exist
LINE 1: SELECT 'export_incrementalexport'::regclass
^
)
?: checks.Error querying model on database "default": "export.IncrementalExportCheckpoint": django.db.utils.ProgrammingError(relation "export_incrementalexportcheckpoint" does not exist
LINE 1: SELECT 'export_incrementalexportcheckpoint'::regclass
^
)

System check identified 3 issues (0 silenced).
Connection to 172.31.22.241 closed.
commcare-cloud monolith django-manage check_services
ssh ubuntu@172.31.22.241 -t -o UserKnownHostsFile=/home/ubuntu/environments/monolith/known_hosts 'sudo -iu cchq bash -c '"'"'cd /home/cchq/www/monolith/current; python_env/bin/python manage.py check_services'"'"''
Ubuntu 18.04.5 LTS
SUCCESS (Took 0.11s) kafka : Kafka seems to be in order
SUCCESS (Took 0.00s) redis : Redis is up and using 26.78M memory
SUCCESS (Took 0.01s) postgres : default:commcarehq:OK p1:commcarehq_p1:OK p2:commcarehq_p2:OK proxy:commcarehq_proxy:OK synclogs:commcarehq_synclogs:OK ucr:commcarehq_ucr:OK Successfully got a user from postgres
SUCCESS (Took 0.02s) couch : Successfully queried an arbitrary couch view
FAILURE (Took 0.00s) celery : analytics_queue has been blocked for 4:01:02.330989 (max allowed is 0:30:00)
async_restore_queue has been blocked for 4:01:02.218801 (max allowed is 0:01:00)
background_queue has been blocked for 4:01:02.222660 (max allowed is 0:10:00)
case_import_queue has been blocked for 4:01:02.248685 (max allowed is 0:01:00)
case_rule_queue has been blocked for 4:01:02.271660 (max allowed is 1:00:00)
celery has been blocked for 4:01:02.567189 (max allowed is 0:01:00)
celery_periodic has been blocked for 4:01:02.254253 (max allowed is 0:10:00)
email_queue has been blocked for 4:01:02.236469 (max allowed is 0:00:30)
export_download_queue has been blocked for 4:01:02.293038 (max allowed is 0:00:30)
reminder_case_update_queue has been blocked for 4:01:02.280683 (max allowed is 0:15:00)
reminder_queue has been blocked for 4:01:02.264635 (max allowed is 0:30:00)
reminder_rule_queue has been blocked for 4:01:02.329053 (max allowed is 0:15:00)
repeat_record_queue has been blocked for 4:01:02.229004 (max allowed is 1:00:00)
sms_queue has been blocked for 4:01:02.271860 (max allowed is 0:05:00)
ucr_queue has been blocked for 4:01:03.332614 (max allowed is 1:00:00)
SUCCESS (Took 0.01s) elasticsearch : Successfully sent a doc to ES and read it back
SUCCESS (Took 0.05s) blobdb : Successfully saved a file to the blobdb
FAILURE (Took 0.18s) formplayer : Formplayer returned a 502 status code: https://commcare.logicims.one/formplayer/serverup
SUCCESS (Took 0.00s) rabbitmq : RabbitMQ OK
Connection to 172.31.22.241 closed.
(monolith) ubuntu@monolith:~$

OK that makes sense. The errors are happening when running the manage.py check --deploy commands.

I was trying to do a later deploy previously and that asked me to do a lower deploy first

Can you say more about this. Were there specific instructions for doing the earlier deploy? I think this is mostly likely due to some migrations that have since been removed from the code. We might have to do some manual steps to get those to run before you can deploy the latest code.

Thanks Simon, so I am now trying to deploy to latest with
commcare-cloud monolith deploy and I get an error shown below.

What I would like to do is run the command as described in the "63. Backfill SMS event data for API performance — CommCareHQ Deployment documentation" doc, but how do I do that without having to deploy the [896676ba] version? Is that possible? Or do I do a deploy like commcare-cloud monolith deploy --commcare-rev=896676ba3c3b7f5e2898f055ace3ee1b196a7fac first?

TASK [Run run_migrations] ************************************************************************************************************************

TASK [deploy_hq : Migrate databases] *************************************************************************************************************
failed: [172.31.22.241] (item=migrate_multi --noinput) => {"ansible_loop_var": "item", "changed": true, "cmd": ["./manage.py", "migrate_multi", "--noinput"], "delta": "0:00:16.445961", "end": "2023-05-10 14:46:29.863978", "item": "migrate_multi --noinput", "msg": "non-zero return code", "rc": 1, "start": "2023-05-10 14:46:13.418017", "stderr": "", "stderr_lines": [], "stdout": "\nThe following databases will be migrated:\n * default\n * p1\n * p2\n * proxy\n * synclogs\n\n\nThe following databases will be skipped:\n * ucr\n\nFound 1185936 items that need to be migrated.\nToo many to migrate automatically.\n\n\nThis migration cannot be performed automatically and must instead be run manually\nbefore this environment can be upgraded to the latest version of CommCare HQ.\nInstructions for running the migration can be found at this link:\n\n63. Backfill SMS event data for API performance — CommCareHQ Deployment documentation", "stdout_lines": ["", "The following databases will be migrated:", " * default", " * p1", " * p2", " * proxy", " * synclogs", "", "", "The following databases will be skipped:", " * ucr", "", "Found 1185936 items that need to be migrated.", "Too many to migrate automatically.", "", "", "This migration cannot be performed automatically and must instead be run manually", "before this environment can be upgraded to the latest version of CommCare HQ.", "Instructions for running the migration can be found at this link:", "", "63. Backfill SMS event data for API performance — CommCareHQ Deployment documentation"]}
skipping: [172.31.22.241] => (item=create_kafka_topics)

Looking at that change log you should just be able to run the managemnt command from the latest release folder.

You cold try cchq monolith django-manage --tmux backfill_sms_subevent_date

If that doesn't work you can either create a 'private' release or log into the machine and run the command manually from the most recent release (the one that was created when you tried to deploy the latest code).

Creating a private release will just create a new release from the latest code but won't run any of the deploy commands like DB migrations etc:

cchq monolith deploy --private

(I'm assuming here that you're using an up-to-date version of commcare-cloud).

Thanks Simon, that helped a huge amount, and I think the deployment has successfully completed now! Yay! Thanks so much

1 Like