Downtime start unable to shut down all services

I'm unable to stop all services using commcare-cloud monolith downtime start.

Even if I opt to kill services, I'm still getting the following when checking services:

FAILURE (Took   0.05s) kafka          : Could not connect to Kafka: NoBrokersAvailable
SUCCESS (Took   0.00s) redis          : Redis is up and using 193.09M memory
SUCCESS (Took   0.01s) postgres       : default:commcarehq:OK p1:commcarehq_p1:OK p2:commcarehq_p2:OK proxy:commcarehq_proxy:OK synclogs:commcarehq_synclogs:OK ucr:commcarehq_ucr:OK Successfully got a user from postgres
EXCEPTION (Took   0.00s) couch          : Service check errored with exception 'ConnectionError(MaxRetryError("HTTPConnectionPool(host='127.0.0.1', port=35984): Max retries exceeded with url: /commcarehq__apps (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fda8f851370>: Failed to establish a new connection: [Errno 111] Connection refused'))"))'
FAILURE (Took   0.01s) celery         : async_restore_queue has been blocked for 0:17:13.526189 (max allowed is 0:01:00)
background_queue has been blocked for 0:17:13.501510 (max allowed is 0:10:00)
case_import_queue has been blocked for 0:17:13.487967 (max allowed is 0:01:00)
celery has been blocked for 0:17:13.499305 (max allowed is 0:01:00)
celery_periodic has been blocked for 0:17:13.517133 (max allowed is 0:10:00)
email_queue has been blocked for 0:17:13.493053 (max allowed is 0:00:30)
export_download_queue has been blocked for 0:17:13.496966 (max allowed is 0:00:30)
EXCEPTION (Took   0.00s) elasticsearch  : Service check errored with exception 'ConnectionError('N/A', '<urllib3.connection.HTTPConnection object at 0x7fda8f5e2370>: Failed to establish a new connection: [Errno 111] Connection refused', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fda8f5e2370>: Failed to establish a new connection: [Errno 111] Connection refused'))'
SUCCESS (Took   0.05s) blobdb         : Successfully saved a file to the blobdb
FAILURE (Took   0.01s) formplayer     : Could not connect to formplayer: https://inddex24.org/formplayer/serverup
SUCCESS (Took   0.00s) rabbitmq       : RabbitMQ OK
Connection to 10.1.0.4 closed.

Trying to manually stop redis and postgresql returns the following:
Redis:

10.1.0.4 | FAILED! => {
    "changed": false,
    "msg": "redis-server process not presently configured with monit",
    "name": "redis-server",
    "state": "stopped"
}

PostgreSQL:

ansible 'postgresql,pg_standby,!remote_postgresql' -m monit -i /home/ccc/environments/monolith/inventory.ini -a 'name=postgresql_9.6 state=stopped' --diff -u ansible --become -e @/home/ccc/environments/monolith/public.yml -e @/home/ccc/environments/monolith/.generated.yml -e @/home/ccc/environments/monolith/vault.yml --vault-password-file=/home/ccc/commcare-cloud/src/commcare_cloud/ansible/echo_vault_password.sh '--ssh-common-args=-o UserKnownHostsFile=/home/ccc/environments/monolith/known_hosts'
[WARNING]: Could not match supplied host pattern, ignoring: remote_postgresql
10.1.0.4 | SUCCESS => {
    "changed": false,
    "name": "postgresql_9.6",
    "state": "stopped"
}
ansible 'postgresql,pg_standby,!remote_postgresql' -m monit -i /home/ccc/environments/monolith/inventory.ini -a 'name=pgbouncer state=stopped' --diff -u ansible --become -e @/home/ccc/environments/monolith/public.yml -e @/home/ccc/environments/monolith/.generated.yml -e @/home/ccc/environments/monolith/vault.yml --vault-password-file=/home/ccc/commcare-cloud/src/commcare_cloud/ansible/echo_vault_password.sh '--ssh-common-args=-o UserKnownHostsFile=/home/ccc/environments/monolith/known_hosts'
[WARNING]: Could not match supplied host pattern, ignoring: remote_postgresql
10.1.0.4 | SUCCESS => {
    "changed": false,
    "name": "pgbouncer",
    "state": "stopped"
}

Running check_services after the above reveals services are still running:

SUCCESS (Took   0.00s) redis          : Redis is up and using 193.07M memory
SUCCESS (Took   0.01s) postgres       : default:commcarehq:OK p1:commcarehq_p1:OK p2:commcarehq_p2:OK proxy:commcarehq_proxy:OK synclogs:commcarehq_synclogs:OK ucr:commcarehq_ucr:OK Successfully got a user from postgres

What could be causing this? I'd like to run an upgrade, but I'm wary of proceeding while these services can't be stopped.

It's possible that the way these processes are named or managed has changed since they were set up. You could try running the Ansible setup playbooks to see whether they report any changes:

cchq <env> ap deploy_postgres.yml
cchq <env> ap deploy_redis.yml
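
After re-running the playbooks and starting the downtime again, it may help to pull out just the services that still report SUCCESS from the check output. The snippet below is only a rough sketch: still_up is a hypothetical helper name, and it assumes the check output keeps the layout pasted above, with the status in the first column and the service name in the fourth.

```shell
# Hypothetical helper (not part of commcare-cloud): reads check_services-style
# output on stdin and prints the name of every service that still reports SUCCESS.
# Assumes the line layout shown in the logs above:
#   STATUS (Took   N.NNs) service_name : message
still_up() {
  awk '$1 == "SUCCESS" { print $4 }'
}
```

To use it, save the check output to a file (check_output.txt is a placeholder name) and run still_up < check_output.txt. With the second check output pasted above, that should print redis and postgres, which are the services to look into before upgrading.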