Quick monolith install problem

Good day to all. I've run into a problem running the quick monolith install on Ubuntu 18.04 LTS.
It gives me an error about install-config.yml when running bash cchq-install.sh install-config.yml from commcare-cloud/quick_monolith_install/:

TASK [debug] ************************************************************************************************************
ok: [127.0.0.1] => {
"msg": "Invalid installation configuration: ['monolith', 'mysitecom', 'srv001', 'exampelserver1.mysitecom', 'SSH public key string', 'ssh username', '']. Please define all the variables in your install-config.yml file."
}

TASK [assert] ***********************************************************************************************************
fatal: [127.0.0.1]: FAILED! => {
"assertion": "not config_is_not_valid",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}

PLAY RECAP **************************************************************************************************************
127.0.0.1 : ok=4 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

It looks like it needs an extra parameter. My install-config.yml is based on the sample (I just replaced private data):

env_name: "monolith"
site_host: "mysitecom"
server_inventory_name: "srv001"
server_host_name: "exampelserver1.mysitecom"
ssh_public_key: "SSH public key string"
ssh_username: "ssh username"

Can anyone tell me what else I should add?

The links in the README on GitHub are broken.

Any suggestions as to what that extra parameter should be?

It looks like it also needs a parameter called cchq_venv, set to the path of the commcare-cloud virtual environment directory. This will be something like the following (though I'd verify the location on your system):
~/.virtualenvs/cchq
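A minimal shell sketch of that verification, assuming the usual ~/.virtualenvs/cchq location. The function find_cchq_venv is hypothetical, not part of commcare-cloud: it treats any directory containing bin/activate as a virtualenv and prints its absolute path, which is the kind of value cchq_venv expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of commcare-cloud): print the first candidate
# directory that looks like a Python virtualenv, i.e. contains bin/activate.
# Assumption: ~/.virtualenvs/cchq is only the usual location, so it is tried
# last, after any directories passed as arguments.
find_cchq_venv() {
    local candidate
    for candidate in "$@" "$HOME/.virtualenvs/cchq"; do
        if [ -f "$candidate/bin/activate" ]; then
            # Print an absolute path so it can be pasted into a config file.
            (cd "$candidate" && pwd)
            return 0
        fi
    done
    echo "no virtualenv found; check where commcare-cloud was installed" >&2
    return 1
}
```

For example, find_cchq_venv /opt/venvs/cchq (a hypothetical path) tries that directory first and then falls back to ~/.virtualenvs/cchq.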

The links in the README should point to
https://commcare-cloud.readthedocs.io/en/latest/installation/2-manual-install.html
and
https://commcare-cloud.readthedocs.io/en/latest/installation/1-quick-monolith-install.html

Thanks for the quick reply! But I get the same error with cchq_venv declared.
Here is what we see in bootstrap-env-playbook.yml:

vars:
  install_config:
    - '{{env_name}}'
    - '{{site_host}}'
    - '{{server_inventory_name}}'
    - '{{server_host_name}}'
    - '{{ssh_public_key}}'
    - '{{ssh_username}}'
    - '{{cchq_venv}}'
  config_is_not_valid: '{{"" in install_config}}'

The condition config_is_not_valid: '{{"" in install_config}}' is true when any of these variables is an empty string.
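The playbook's check can be mirrored in shell to see which value is the empty one. Compared with the error message at the top of the thread, the list there has seven entries and only the last, which corresponds to cchq_venv in the order declared above, is ''. A minimal sketch; report_empty_vars is a hypothetical helper that takes name=value pairs on the command line rather than reading install-config.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring config_is_not_valid: '{{"" in install_config}}'.
# Each argument is a name=value pair; the names of empty values are printed.
# Returns 1 (invalid) if any value is empty, 0 otherwise.
report_empty_vars() {
    local pair name value status=0
    for pair in "$@"; do
        name="${pair%%=*}"    # text before the first "="
        value="${pair#*=}"    # text after the first "="
        if [ -z "$value" ]; then
            echo "empty: $name"
            status=1
        fi
    done
    return "$status"
}
```

For example, report_empty_vars "env_name=monolith" "cchq_venv=$VENV" prints empty: cchq_venv when VENV is unset.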
But as we see in cchq-install.sh:

#VENV should have been set by init.sh
ansible-playbook --connection=local --extra-vars "@$config_file_path" --extra-vars "cchq_venv=$VENV" "$DIR/bootstrap-env-playbook.yml"

Somehow $VENV isn't being set by init.sh, so Ansible reports the configuration error.
When we put the path in directly instead of $VENV:

ansible-playbook --connection=local --extra-vars "@$config_file_path" --extra-vars "cchq_venv=/home/username/.virtualenvs/cchq" "$DIR/bootstrap-env-playbook.yml"

Ansible no longer reports the configuration error.
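A fail-fast check before the playbook runs would surface this directly instead of as an Ansible assertion. It may also explain why declaring cchq_venv in install-config.yml did not help: Ansible generally lets a later --extra-vars override an earlier one, so the empty cchq_venv=$VENV on the command line most likely replaces the value read from the file. A minimal sketch; require_venv is a hypothetical helper, not something cchq-install.sh provides.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not in cchq-install.sh): stop early if VENV is unset or
# empty, or does not point at an existing directory, instead of letting the
# playbook fail later on its "" check.
require_venv() {
    if [ -z "${VENV:-}" ]; then
        echo "VENV is not set; source init.sh or export VENV first" >&2
        return 1
    fi
    if [ ! -d "$VENV" ]; then
        echo "VENV=$VENV is not a directory" >&2
        return 1
    fi
    return 0
}
```

In a local copy of cchq-install.sh, calling require_venv || exit 1 just before the ansible-playbook line would stop with a readable message whenever init.sh has not set VENV.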

It's also possible that init.sh isn't being executed for new terminal sessions; there's an option during setup to choose yes or no for that.
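If that option was answered with "no", one way to load the environment in every new shell is to source init.sh from ~/.bashrc. A sketch, assuming init.sh lives at ~/commcare-cloud/control/init.sh (an assumption; check the path on your machine). ensure_init_sourced is a hypothetical helper that appends the source line only once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure interactive shells source commcare-cloud's init.sh.
# Assumption: init.sh is at ~/commcare-cloud/control/init.sh; pass another path
# as the first argument, and another rc file as the second, if needed.
ensure_init_sourced() {
    local init_sh="${1:-$HOME/commcare-cloud/control/init.sh}"
    local rc_file="${2:-$HOME/.bashrc}"
    local line="source \"$init_sh\""
    if [ ! -f "$init_sh" ]; then
        echo "init.sh not found at $init_sh" >&2
        return 1
    fi
    # Append only if the exact line is not already there, so re-running is harmless.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        echo "$line" >> "$rc_file"
    fi
}
```

After running ensure_init_sourced, open a new terminal and check that echo "$VENV" prints the virtualenv path before rerunning cchq-install.sh.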

Your explanation makes sense. Have you tried setting $VENV manually?


Thanks for the clue! Yes, when I set $VENV manually, the Ansible script worked. It seems that init.sh isn't normally executed before Ansible runs.

Unfortunately, at the end I got this error:

TASK [Configure Pl/Proxy cluster] ***************************************************************************************
fatal: [192.169.233.128]: FAILED! => {"changed": false, "cmd": "./manage.py configure_pl_proxy_cluster --create_only", "msg": "stdout: Creating cluster config in DB proxy

:stderr: Traceback (most recent call last):
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
    return self.cursor.execute(sql)
psycopg2.errors.UndefinedObject: role "axk5z8gymsfryjnqeun0gicdqdj4tu3j" does not exist


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cchq/www/cchq/releases/2022-06-29_19.08/./manage.py", line 190, in <module>
    main()
  File "/home/cchq/www/cchq/releases/2022-06-29_19.08/./manage.py", line 44, in main
    execute_from_command_line(sys.argv)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/home/cchq/www/cchq/releases/2022-06-29_19.08/corehq/sql_db/management/commands/configure_pl_proxy_cluster.py", line 54, in handle
    create_or_update_cluster(plproxy_config, verbose, options['create_only'])
  File "/home/cchq/www/cchq/releases/2022-06-29_19.08/corehq/sql_db/management/commands/configure_pl_proxy_cluster.py", line 70, in create_or_update_cluster
    create_pl_proxy_cluster(cluster_config, verbose)
  File "/home/cchq/www/cchq/releases/2022-06-29_19.08/corehq/sql_db/management/commands/configure_pl_proxy_cluster.py", line 149, in create_pl_proxy_cluster
    cursor.execute(command)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/cchq/www/cchq/current/python_env/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
    return self.cursor.execute(sql)
django.db.utils.ProgrammingError: role "axk5z8gymsfryjnqeun0gicdqdj4tu3j" does not exist

", "path": "/home/cchq/www/cchq/current/python_env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin", "syspath": ["/tmp/ansible_django_manage_payload_jvlkdvc8/ansible_django_manage_payload.zip", "", "/usr/lib/python36.zip", "/usr/lib/python3.6", "/usr/lib/python3.6/lib-dynload", "/usr/local/lib/python3.6/dist-packages", "/usr/lib/python3/dist-packages"]}

PLAY RECAP **************************************************************************************************************
192.169.233.128 : ok=651 changed=316 unreachable=0 failed=1 skipped=392 rescued=0 ignored=2

✗ Apply failed with status code 2

It says something about an undefined role. Any clues where to start digging?

I believe that error means that the postgres role does not exist. This could be because there are multiple instances of postgres running, and the wrong one is occupying the port commcare-cloud expects, or perhaps initial setup failed to create that role.
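To tell these cases apart, one can ask Postgres directly whether the role exists and check what is listening for connections. A sketch assuming local psql access as the postgres OS user; role_exists_query is a hypothetical helper that only builds the SQL against the standard pg_roles catalog view.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query that returns 1 if the given Postgres role exists.
# Single quotes in the name are doubled so the SQL string literal stays valid.
role_exists_query() {
    local q="'"
    local role="${1//$q/$q$q}"
    printf "SELECT 1 FROM pg_roles WHERE rolname = '%s';\n" "$role"
}
```

For example, sudo -u postgres psql -tAc "$(role_exists_query axk5z8gymsfryjnqeun0gicdqdj4tu3j)" prints 1 if the role exists and nothing otherwise (add -p PORT if your instance is not on the default port), and sudo ss -ltnp | grep postgres shows which ports each running postgres process is listening on.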

Hello,

This indicates that the postgres user axk5z8gymsfryjnqeun0gicdqdj4tu3j doesn't exist. The CommCareHQ codebase mainly uses a user called commcarehq, but the quick-install script is generating a different username, so this is possibly a bug. I will look into addressing it.

In the meantime, to work around this, you can manually edit your environment config and run deploy-stack again. Please do the following on the VM above.


Thank you for the quick response, sir!
I changed all the usernames that were just random strings of characters, using ansible-vault edit ~/environments/<env_name>/vault.yml, and ran commcare-cloud cchq deploy-stack --skip-check --skip-tags=users -e 'CCHQ_IS_FRESH_INSTALL=1', but got the following error:
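Before editing, it can help to list which username values in the vault look like the random string from the Pl/Proxy error. A heuristic sketch: it flags lines whose key contains "user" and whose value is exactly 32 lowercase letters or digits, the shape of axk5z8gymsfryjnqeun0gicdqdj4tu3j; that pattern is an assumption drawn from this single example. find_generated_usernames is hypothetical and reads decrypted YAML on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: from decrypted vault YAML on stdin, print lines whose
# key contains "user" (any case) and whose value is a 32-character lowercase
# alphanumeric string, optionally wrapped in single or double quotes.
find_generated_usernames() {
    local q="['\"]?"    # optional YAML quote character
    grep -E "^[[:space:]]*[[:alnum:]_]*[Uu][Ss][Ee][Rr][[:alnum:]_]*:[[:space:]]*${q}[a-z0-9]{32}${q}[[:space:]]*\$"
}
```

For example: ansible-vault view ~/environments/<env_name>/vault.yml | find_generated_usernames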

TASK [couchdb2 : Add nodes] *********************************************************************************************
skipping: [192.169.233.128] => (item=192.169.233.128)

TASK [couchdb2 : Create system databases] *******************************************************************************
failed: [192.169.233.128] (item=_users) => {"ansible_loop_var": "item", "attempts": 1, "cache_control": "must-revalidate", "changed": false, "connection": "close", "content": "{"error":"unauthorized","reason":"Name or password is incorrect."}\n", "content_length": "67", "content_type": "application/json", "date": "Fri, 01 Jul 2022 22:18:42 GMT", "elapsed": 0, "item": "_users", "json": {"error": "unauthorized", "reason": "Name or password is incorrect."}, "msg": "Status code was 401 and not [201, 412]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/2.3.1 (Erlang OTP/19)", "status": 401, "url": "http://192.169.233.128:15984/_users", "x_couch_request_id": "5fced95eeb", "x_couchdb_body_time": "0"}
failed: [192.169.233.128] (item=_replicator) => {"ansible_loop_var": "item", "attempts": 1, "cache_control": "must-revalidate", "changed": false, "connection": "close", "content": "{"error":"unauthorized","reason":"Name or password is incorrect."}\n", "content_length": "67", "content_type": "application/json", "date": "Fri, 01 Jul 2022 22:18:42 GMT", "elapsed": 0, "item": "_replicator", "json": {"error": "unauthorized", "reason": "Name or password is incorrect."}, "msg": "Status code was 401 and not [201, 412]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/2.3.1 (Erlang OTP/19)", "status": 401, "url": "http://192.169.233.128:15984/_replicator", "x_couch_request_id": "0ebd25679f", "x_couchdb_body_time": "0"}
failed: [192.169.233.128] (item=_global_changes) => {"ansible_loop_var": "item", "attempts": 1, "cache_control": "must-revalidate", "changed": false, "connection": "close", "content": "{"error":"unauthorized","reason":"Name or password is incorrect."}\n", "content_length": "67", "content_type": "application/json", "date": "Fri, 01 Jul 2022 22:18:42 GMT", "elapsed": 0, "item": "_global_changes", "json": {"error": "unauthorized", "reason": "Name or password is incorrect."}, "msg": "Status code was 401 and not [201, 412]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/2.3.1 (Erlang OTP/19)", "status": 401, "url": "http://192.169.233.128:15984/_global_changes", "x_couch_request_id": "122c86bf79", "x_couchdb_body_time": "0"}

PLAY RECAP **************************************************************************************************************
192.169.233.128 : ok=215 changed=16 unreachable=0 failed=1 skipped=190 rescued=0 ignored=0

✗ Apply failed with status code 2

Any clues on what to do next?

Hello,

It is possible that CouchDB's credentials were set up with the previous username, so the corrected username you put in the vault is no longer accepted.
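To confirm whether the credentials in the vault match what CouchDB currently accepts, one can make an authenticated request and look at the HTTP status. The "Create system databases" task above accepts 201 (created) or 412 (already exists), and 401 means the name/password pair was rejected. A sketch; the host and port come from the log above, while the helper names and the choice of CouchDB's /_session endpoint are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the HTTP status of an authenticated CouchDB request
# into a short hint. 201/412 are the codes the playbook task accepts; 401 is
# what the failing log shows; curl reports 000 when nothing answered.
explain_couch_status() {
    case "$1" in
        200|201) echo "ok: credentials accepted" ;;
        412)     echo "ok: database already exists" ;;
        401)     echo "credentials rejected: vault username/password do not match CouchDB" ;;
        000)     echo "no response: is CouchDB listening on that host and port?" ;;
        *)       echo "unexpected status $1" ;;
    esac
}

# Query CouchDB with basic auth and print only the HTTP status code.
# Note: the password is visible in the process list while curl runs.
couch_status() {
    local url="$1" user="$2" pass="$3"
    curl -s -o /dev/null -w '%{http_code}' -u "$user:$pass" "$url/_session"
}
```

For example, explain_couch_status "$(couch_status http://192.169.233.128:15984 "$COUCH_USER" "$COUCH_PASS")", where COUCH_USER and COUCH_PASS are hypothetical variables holding the values from vault.yml.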

The underlying bug that caused the incorrect usernames is now fixed. You can re-pull the commcare-cloud code and then either uninstall CouchDB and run the above command again, or start over from a fresh system reset, if that's an option.

Thanks

Thank you very much for your help! After updating the repository, the script ran well until the subcommand commcare-cloud $env_name django-manage create_kafka_topics. There I got an error about the ZooKeeper service, exactly the same as in the manual installation, described here: After-reboot on a monolith consistently fails on start Zookeeper service - #7 by Sravan_Reddy