Redeployment error: long wait on _decommission_host

Hi there,

I have run into an issue on redeployment. The deploy paused on the following task:

[172.19.4.42] out: commcare-hq-echis-celery_ucr_indicator_queue_0: started
[172.19.4.42] out: commcare-hq-echis-celery_ucr_queue_0: started
[172.19.4.42] out: commcare-hq-echis-celerybeat: started
[172.19.4.42] out:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Hi Demis, do you still have the output of the deploy script? If you scroll back some, you should see more information, probably in red. The deploy script operates on each machine in parallel, so it's likely that something failed earlier, and these two can't progress beyond the current point until all hosts have reported success.

Hi Ethan,
This is the error that occurred earlier during the deployment:

Error getting release commits: 414 {"message": "We received a Request-URL that is too long from your client."}
Traceback (most recent call last):
  File "/home/ansible/.virtualenvs/cchq/bin/cchq", line 33, in <module>
    sys.exit(load_entry_point('commcare-cloud', 'console_scripts', 'cchq')())
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 206, in main
    exit_code = call_commcare_cloud()
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 197, in call_commcare_cloud
    exit_code = commands[args.command].run(args, unknown_args)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/command.py", line 97, in run
    rc = deploy_commcare(environment, args, unknown_args)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/commcare.py", line 41, in deploy_commcare
    record_successful_deploy(environment, diff, start)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/commcare.py", line 152, in record_successful_deploy
    update_sentry_post_deploy(environment, "commcarehq", diff.repo, diff, start_time, end_time)
  File "/home/ansible/commcare-cloud/src/commcare_cloud/commands/deploy/sentry.py", line 39, in update_sentry_post_deploy
    client.create_release(release_name, commits)
UnboundLocalError: local variable 'commits' referenced before assignment

Hi Demisew,

There was a bug related to recording the deploy in Sentry which was recently fixed (dimagi/commcare-cloud PR #4734, "Set commits to None on github error"). Based on the line numbers in the traceback, it looks like you're still on an older version.
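For anyone curious why the traceback ends in UnboundLocalError: it's the common Python pattern where a variable is assigned only inside a try block, so the except path leaves it undefined. A minimal sketch of the bug and the shape of the fix (illustrative only; these function names are made up and this is not the actual commcare-cloud source):

```python
def get_release_commits():
    # Stand-in for the GitHub API call that failed (e.g. with the 414 error).
    raise RuntimeError('414 {"message": "We received a Request-URL that is too long."}')

def record_deploy_buggy():
    try:
        commits = get_release_commits()
    except Exception as exc:
        print(f"Error getting release commits: {exc}")
        # Bug: when the call fails, `commits` is never assigned...
    return commits  # ...so this line raises UnboundLocalError

def record_deploy_fixed():
    commits = None  # the fix: give `commits` a default before the try block
    try:
        commits = get_release_commits()
    except Exception as exc:
        print(f"Error getting release commits: {exc}")
    return commits  # None on failure, which the caller must then tolerate
```

The error message is printed either way; the fix only changes what happens afterwards, which is why the deploy can now finish even when the GitHub call fails.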

Please try updating your commcare-cloud.

I also wonder if there isn't another error further up in the log, since this error comes from right at the end of the deploy.

Hi Ethan,

I tried a clean deploy recently. It paused right after deploying and linking the fresh code, that is, after the following operation:

[172.19.4.35] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current
[172.19.3.41] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current
[172.19.4.42] sudo: ln -nfs /home/cchq/www/echis/releases/2021-05-31_07.58 /home/cchq/www/echis/current

Then it paused on this task:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Hi Simon,

The error posted above is from the previous deployment's error log. That error no longer appears, because I updated commcare-cloud.

The new issue is that it doesn't show any error, but it can't proceed; it paused on the following task:

[172.19.3.41] Executing task 'restart_webworkers'
[172.19.3.41] Executing task '_decommission_host'

Can you try running the deploy with --show=debug?


Here is the output:

Parallel tasks now using pool size of 3

[172.19.3.41] Executing task 'restart_webworkers'

Parallel tasks now using pool size of 1

[172.19.3.41] Executing task '_decommission_host'

[172.19.3.41] run: /bin/bash -l -c "uname"

The recently added script in ~/.profile was causing the problem. The deployment now succeeds, but it is not recorded on: https://www.echisethiopia.org/hq/admin/deploy_history_report/

[172.19.3.41] Executing task 'record_successful_release'
[172.19.4.35] Executing task 'record_successful_release'
[172.19.4.42] Executing task 'record_successful_release'
[172.19.3.41] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
[172.19.4.35] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
[172.19.4.42] sudo: echo '/home/cchq/www/echis/releases/2021-06-01_04.52' >> "$(echo RELEASES.txt)"
The following error is displayed on screen:

Error getting release commits: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/reference/repos#compare-two-commits"}
commcare-cloud echis django-manage record_deploy_success --user ansible --environment echis --url 'https://github.com/dimagi/commcare-hq/compare/
/home/ansible/.profile: line 11: /home/ansible/.commcare-cloud/repo/src/commcare_cloud/.bash_completion: No such file or directory
Downloading dependencies from galaxy and pip
WARNING: The directory '/home/ansible/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ansible-galaxy install -f -r /home/ansible/commcare-cloud/src/commcare_cloud/ansible/requirements.yml
WARNING: The directory '/home/ansible/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
[WAR
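As a side note on the ".profile: line 11 ... No such file or directory" message above: it suggests ~/.profile sources the commcare-cloud bash-completion file unconditionally, which breaks the login shells the deploy starts with bash -l -c. A defensive version of that line would guard the source with a file-existence check (the path is taken from the log above; what line 11 actually looks like is an assumption):

```shell
# Hypothetical replacement for the failing line in ~/.profile:
# only source the completion file if it actually exists, so login
# shells started by the deploy don't error out when commcare-cloud
# relocates or removes the file.
COMPLETION="/home/ansible/.commcare-cloud/repo/src/commcare_cloud/.bash_completion"
if [ -f "$COMPLETION" ]; then
    . "$COMPLETION"
fi
```

With this guard the profile loads cleanly whether or not the completion file is present, so non-interactive deploy tasks like record_deploy_success are not disturbed.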