Redeployment error: long wait on _decommission_host

Hi there,

I have run into an issue on redeployment. The deploy paused on the following task:

[172.19.4.42] out: commcare-hq-echis-celery_ucr_indicator_queue_0: started
[172.19.4.42] out: commcare-hq-echis-celery_ucr_queue_0: started
[172.19.4.42] out: commcare-hq-echis-celerybeat: started
[172.19.4.42] out:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Hi Demis, do you still have the output of the deploy script? If you scroll back some, you should see more information, probably in red. The deploy script operates on each machine in parallel, so it's likely that something failed earlier, and these two can't progress beyond the current point until all hosts have reported success.

Hi Ethan,
This is the error that occurred earlier during the deployment:

Error getting release commits: 414 {"message": "We received a Request-URL that is too long from your client."}
Traceback (most recent call last):
  File "/home/ansible/.virtualenvs/cchq/bin/cchq", line 33, in <module>
    sys.exit(load_entry_point('commcare-cloud', 'console_scripts', 'cchq')())
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 206, in main
    exit_code = call_commcare_cloud()
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 197, in call_commcare_cloud
    exit_code = commands[args.command].run(args, unknown_args)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/command.py", line 97, in run
    rc = deploy_commcare(environment, args, unknown_args)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/commcare.py", line 41, in deploy_commcare
    record_successful_deploy(environment, diff, start)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/commcare.py", line 152, in record_successful_deploy
    update_sentry_post_deploy(environment, "commcarehq", diff.repo, diff, start_time, end_time)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/sentry.py", line 39, in update_sentry_post_deploy
    client.create_release(release_name, commits)
UnboundLocalError: local variable 'commits' referenced before assignment

Hi Demisew,

There was a bug related to recording the deploy in Sentry which was recently fixed (dimagi/commcare-cloud PR #4734, "Set commits to None on github error"). Based on the line numbers in the traceback, it looks like you're still on an older version.
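For anyone curious why the traceback ends in UnboundLocalError: it's the common Python pattern where a variable is assigned only inside a try block, so the except path leaves it undefined. A minimal sketch of the bug and the shape of the fix (illustrative only; these function names are made up and this is not the actual commcare-cloud source):

```python
def get_release_commits():
    # Stand-in for the GitHub API call that failed (e.g. with the 414 error).
    raise RuntimeError('414 {"message": "We received a Request-URL that is too long."}')

def record_deploy_buggy():
    try:
        commits = get_release_commits()
    except Exception as exc:
        print(f"Error getting release commits: {exc}")
        # Bug: when the call fails, `commits` is never assigned...
    return commits  # ...so this line raises UnboundLocalError

def record_deploy_fixed():
    commits = None  # the fix: give `commits` a default before the try block
    try:
        commits = get_release_commits()
    except Exception as exc:
        print(f"Error getting release commits: {exc}")
    return commits  # None on failure, which the caller must then tolerate
```

The error message is printed either way; the fix only changes what happens afterwards, which is why the deploy can now finish even when the GitHub call fails.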

Please try updating your commcare-cloud.

I also wonder if there isn't another error further up in the log, since this error comes from right at the end of the deploy.

Hi Ethan,

I tried a clean deploy recently. It paused right after deploying and linking the fresh code, that is, after the following operation:

[172.19.4.35] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current
[172.19.3.41] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current
[172.19.4.42] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current

Then it paused on this task:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Hi Simon,

The error posted above is from the previous deployment's error log. That error no longer appears, because I updated commcare-cloud.

The new issue is that it doesn't show any error, but it can't proceed; it paused on the following task:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Can you try running the deploy with --show=debug?


Here is the output:

Parallel tasks now using pool size of 3

[172.19.3.41] Executing task 'restart_webworkers'

Parallel tasks now using pool size of 1

[172.19.3.41] Executing task '_decommission_host'

[172.19.3.41] run: /bin/bash -l -c "uname"

The recently added script in ~/.profile was causing the problem. The deployment now succeeds, but it is not recorded on: https://www.echisethiopia.org/hq/admin/deploy_history_report/

[172.19.3.41] Executing task 'record_successful_release'
[172.19.4.35] Executing task 'record_successful_release'
[172.19.4.42] Executing task 'record_successful_release'
[172.19.3.41] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
[172.19.4.35] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
[172.19.4.42] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
The following error is displayed on screen:

Error getting release commits: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/reference/repos#compare-two-commits"}
commcare-cloud echis django-manage record_deploy_success --user ansible --environment echis --url 'https://github.com/dimagi/commcare-hq/compare/
/home/ansible/.profile: line 11: /home/ansible/.commcare-cloud/repo/src/commcare_cloud/.bash_completion: No such file or directory
Downloading dependencies from galaxy and pip
WARNING: The directory '/home/ansible/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ansible-galaxy install -f -r /home/ansible/commcare-cloud/src/commcare_cloud/ansible/requirements.yml
WARNING: The directory '/home/ansible/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
[WAR
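As a side note on the ".profile: line 11 ... No such file or directory" message above: it suggests ~/.profile sources the commcare-cloud bash-completion file unconditionally, which breaks the login shells the deploy starts with bash -l -c. A defensive version of that line would guard the source with a file-existence check (the path is taken from the log above; what line 11 actually looks like is an assumption):

```shell
# Hypothetical replacement for the failing line in ~/.profile:
# only source the completion file if it actually exists, so login
# shells started by the deploy don't error out when commcare-cloud
# relocates or removes the file.
COMPLETION="/home/ansible/.commcare-cloud/repo/src/commcare_cloud/.bash_completion"
if [ -f "$COMPLETION" ]; then
    . "$COMPLETION"
fi
```

With this guard the profile loads cleanly whether or not the completion file is present, so non-interactive deploy tasks like record_deploy_success are not disturbed.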