Hi, I'm having difficulty upgrading a monolith server from b2b0383 (deployed for changelog 0080) to the latest verified production build (a4b2a47 as of the time of this post).
Deployment of b2b0383 goes fine, as does the command from changelog 0080:
commcare-cloud monolith django-manage --tmux copy_invitation_supply_point
The output of that command is:
Started at 2024-07-09 07:41:59
Processing [.........] 9/9 100% 0:00:00.028682 elapsed
Finished at 2024-07-09 07:41:59
Elapsed time: 0:00:00
Connection to 10.2.0.4 closed.
On the first attempt at deploying a4b2a47, I receive the following:
(cchq) ccc@monolith:~/commcare-cloud$ cchq monolith deploy --commcare-rev a4b2a47
Whoa there bud! You're deploying non-default.
'commcare' repo: master != a4b2a47
ARE YOU DOING SOMETHING EXCEPTIONAL THAT WARRANTS THIS? [y/N]y
ansible 'django_manage[0]' -m shell -i /home/ccc/environments/monolith/inventory.ini -a 'sudo -iu cchq bash -c '"'"'git --git-dir=/home/cchq/www/monolith/current/.git rev-parse HEAD'"'"'' -u ansible '--ssh-common-args=-o UserKnownHostsFile=/home/ccc/environments/monolith/known_hosts' --diff
Diff generation skipped. Supply a Github token to see deploy diffs.
New version details:
Branch deployed : commcare: a4b2a47
Changelogs:
There have been some changelogs since last deploy, you must make sure that these are applied on your environment before deploying to avoid getting your environment into a broken state.
https://commcare-cloud.readthedocs.io/en/latest/changelog/0081-delete-receiverwrapper-couch-db.html
https://commcare-cloud.readthedocs.io/en/latest/changelog/0080-copy-invitation-supply-point-fields-to-location.html
Here's the complete diff on github: https://github.com/dimagi/commcare-hq/compare/b2b0383598d89238466cfebc384b0c6e142961bc...a4b2a470622237221f2a4bbf7e776d96e0cfb228
Are you sure you want to deploy to monolith? [y/N]y
...
TASK [Run run_migrations] *************************************************************************************************************************************************************
TASK [deploy_hq : Migrate databases] **************************************************************************************************************************************************
failed: [10.2.0.4] (item=migrate_multi --noinput) => {"ansible_loop_var": "item", "changed": true, "cmd": ["./manage.py", "migrate_multi", "--noinput"], "delta": "0:00:58.053610", "end": "2024-07-09 05:14:28.507159", "item": "migrate_multi --noinput", "msg": "non-zero return code", "rc": 1, "start": "2024-07-09 05:13:30.453549", "stderr":
...
state = self.apply_migration(", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/django/db/migrations/executor.py\", line 252, in apply_migration", " state = migration.apply(state, schema_editor)", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/django/db/migrations/migration.py\", line 132, in apply", " operation.database_forwards(", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/django/db/migrations/operations/special.py\", line 193, in database_forwards", " self.code(from_state.apps, schema_editor)", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/corehq/apps/es/migration_operations.py\", line 368, in run", " return super().run(*args, **kwargs)", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/corehq/apps/es/migration_operations.py\", line 91, in run", " manager.index_create(self.name, self.render_index_metadata(", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/corehq/apps/es/client.py\", line 219, in index_create", " self._es.indices.create(index, metadata)", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/elasticsearch5/client/utils.py\", line 73, in _wrapped", " return func(*args, params=params, **kwargs)", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/elasticsearch5/client/indices.py\", line 106, in create", " return self.transport.perform_request('PUT', _make_path(index),", " File \"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/elasticsearch5/transport.py\", line 312, in perform_request", " status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)", " File 
\"/home/cchq/www/monolith/releases/2024-07-09_05.02/python_env/lib/python3.9/site-packages/elasticsearch5/connection/http_urllib3.py\", line 143, in perform_request", " raise ConnectionTimeout('TIMEOUT', str(e), e)", "elasticsearch5.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='10.2.0.4', port=9200): Read timed out. (read timeout=30))"]}
Full error log here: TASK [deploy_hq : Migrate databases] - Pastebin.com
At this point, the ES cluster health status is red. Restarting ES resolves that. I then attempt to resume the deploy:
cchq monolith deploy --resume=2024-07-09_05.02
This results in the following error: TASK [deploy_hq : Migrate databases] - Pastebin.com
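In case it helps anyone reproduce this, here is a minimal sketch of how the cluster status could be checked before resuming the deploy. It assumes Elasticsearch answers plain HTTP on 10.2.0.4:9200 (the host and port from the timeout above) and uses the standard `_cluster/health` endpoint. The function names `fetch_cluster_health` and `is_safe_to_resume` are my own illustrations and not part of commcare-cloud, and treating "not red" as good enough is only a rule of thumb.

```python
import json
import urllib.request

# Host and port taken from the ReadTimeoutError above; adjust for your environment.
ES_URL = "http://10.2.0.4:9200"


def fetch_cluster_health(base_url=ES_URL, timeout=30):
    """Return the parsed JSON body of GET /_cluster/health."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_safe_to_resume(health):
    """Hypothetical rule of thumb: resume only when the status is not red."""
    return health.get("status") in ("green", "yellow")


# Example (needs network access to the ES node, so it is left commented out):
# health = fetch_cluster_health()
# print(health["status"], is_safe_to_resume(health))
```

On a single-node setup the status may well stay yellow rather than green, which is why the check accepts both.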
From that, I gather that one of the migrations requires an older version of the codebase to run, so I checked out 06f5905 and ran ./manage.py migrate_multi repeaters in the cchq Python virtualenv:
sudo su - cchq
cd www/monolith/current
git checkout 06f59059cef7321849e0ea9d8e15b7e824d3e26f
source python_env/bin/activate
./manage.py migrate_multi repeaters
I got this output:
The following databases will be migrated:
* default
* p1
* p2
* proxy
* synclogs
The following databases will be skipped:
* ucr
Found 0 RepeatRecord documents to migrate.
I then tried resuming the previous deploy with:
cchq monolith deploy --resume=2024-07-09_05.02
and got this error: failed: [10.2.0.4] (item=migrate_multi --noinput) => {"ansible_loop_var": "item" - Pastebin.com
If I'm reading it correctly, that seems to indicate the deploy can't continue because changelog 0080 has not been applied. I definitely ran 0080, and I ran it again to be sure and got the same output as before. I then tried resuming the failed deploy again, but got the same error about needing to apply changelog 0080 first.
This part of the error seemed relevant:
raise FieldError(", "django.core.exceptions.FieldError: Cannot resolve keyword 'location' into field.
Choices are: assigned_locations, asyncsignuprequest, custom_user_data, domain, email, email_status, invited_by, invited_on, is_accepted, primary_location, primary_location_id, profile, profile_id, program, role, supply_point, tableau_group_ids, tableau_role, uuid"],
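To double-check my reading of that message, I put together a small sketch that pulls the missing keyword and the available field names out of the error text. The helper name `parse_field_error` is just something I made up for this post. It only parses the message and does not touch Django or the database.

```python
import re

# Excerpt copied from the traceback above.
ERROR_TEXT = (
    "django.core.exceptions.FieldError: Cannot resolve keyword 'location' into field. "
    "Choices are: assigned_locations, asyncsignuprequest, custom_user_data, domain, "
    "email, email_status, invited_by, invited_on, is_accepted, primary_location, "
    "primary_location_id, profile, profile_id, program, role, supply_point, "
    "tableau_group_ids, tableau_role, uuid"
)


def parse_field_error(text):
    """Return (missing_keyword, available_fields) from a FieldError message."""
    pattern = r"Cannot resolve keyword '([^']+)' into field\.\s*Choices are: (.+)"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("not a recognisable FieldError message")
    keyword = match.group(1)
    choices = [name.strip() for name in match.group(2).split(",")]
    return keyword, choices


keyword, choices = parse_field_error(ERROR_TEXT)
print(keyword in choices)         # False: 'location' is not an available field
print("supply_point" in choices)  # True: the old field is still present
```

So the model being queried still has supply_point but no location field, which is at least consistent with the error pointing at changelog 0080, although I can't tell from this alone why that is the case.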
Any assistance would be greatly appreciated!