Fresh monolith installation

I'm setting up a fresh monolith installation using the instructions here, and I want to use this thread for feedback and for queries I have, since some things have changed since my last deployment.

The first query I have relates to the environment configuration. Is there any documentation on the various variables used in the environment config? For example, the default environment configuration differs from the configuration of our previous deployment as follows:

In app-processes.yml, the old config is missing these entries that appear in the new default:

case-pillow:
  num_processes: 1
xform-pillow:
  num_processes: 1

These settings are present in the old config but not in the new default:

CaseToElasticsearchPillow:
  num_processes: 1
FormSubmissionMetadataTrackerPillow:
  num_processes: 1
GroupToUserPillow:
  num_processes: 1
XFormToElasticsearchPillow:
  num_processes: 1
kafka-ucr-static:
  num_processes: 1

These settings are different in the old file:

Old                New
user-pillow:   ->  UserPillow:
group-pillow:  ->  GroupPillow:

In inventory.yml this setting is different in the new default:

Old                                New
[shared_dir_host:children]         [shared_dir_host:children]
                                   monolith

This entry is in the new default:

[citusdb_worker]

In meta.yml this setting is not in the old system but is in the new default:

always_deploy_formplayer: true

In public.yml this has changed:

Old                                 New
RESTRICT_DOMAIN_CREATION: False     RESTRICT_DOMAIN_CREATION: True

These are found in the old file but not in the new defaults:

PY3_RUN_CELERY_BEAT: False
PY3_RUN_CELERY_FLOWER: False
PY3_RUN_CELERY_WORKER: False
PY3_RUN_ERRAND_BOY: False
PY3_RUN_GUNICORN: False
PY3_RUN_MANAGEMENT: False
PY3_RUN_PILLOWTOP: False
PY3_RUN_WEBSOCKETS: False

Thanks!

Yup, looks like there have been a number of changes since that config was set up. There's some documentation for many of those configuration options here.

This changelog entry explains the changes to the pillow processes that you describe. Basically, we consolidated a lot of related pillows as a performance optimization.
https://dimagi.github.io/commcare-cloud/changelog/0007-reorganize-pillows.html

Some of the others I'm not sure offhand, but you might be able to find more info in that docs page I linked earlier. I think most of this is just related to things that have changed in the interim. For example, always_deploy_formplayer was added because we split out the formplayer deploy from the commcare-hq deploy. Since a formplayer deploy involves some service disruption, it's nice to avoid that when unnecessary. That option lets you deploy both together without needing to worry about managing two deploy schedules.

RESTRICT_DOMAIN_CREATION I believe was set to True because we assume most self-hosters would prefer that as the default behavior.

We did a series of changes a year or two ago to set more sensible defaults for the typical use-case of a small, single-project environment. Additionally, we've aimed to keep that sample config up to date with changes and improvements to CommCareHQ and its deploy process as it is developed.


Odd thing - in one of our monolith installations, the commcare-cloud configuration appears to have inexplicably stopped loading at login. Running

source ~/.commcare-cloud/load_config.sh

doesn't appear to do anything. I'll see if I can troubleshoot in the meantime, but any ideas why that would happen after it worked previously?
I tried running it manually with the bash -x switch and got this output:

+ export COMMCARE_CLOUD_ENVIRONMENTS=/home/ccc/environments
+ COMMCARE_CLOUD_ENVIRONMENTS=/home/ccc/environments
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ccc/.virtualenvs/cchq/bin:/home/ccc/.virtualenvs/cchq/bin
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ccc/.virtualenvs/cchq/bin:/home/ccc/.virtualenvs/cchq/bin
+ source /home/ccc/commcare-cloud/src/commcare_cloud/.bash_completion
++ complete -F _commcare_cloud commcare-cloud
++ complete -F _commcare_cloud cchq

There are two scripts that 'should' be set up to run when you start a bash terminal session. Perhaps you are missing the second one:

~/.bash_profile

source ~/.commcare-cloud/load_config.sh

With only this file I can run commcare-cloud and I get bash completion.

~/.profile:

[ -t 1 ] && source ~/init-ansible

(~/init-ansible is a symbolic link to /home/{username}/commcare-cloud/control/init.sh.)

This script does more setup and adds some bash aliases like update-code.
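
If you want to see which of these is actually in play on your machine, something like this (just a quick sketch) will show what exists and which file sources what:

ls -la ~/.bash_profile ~/.profile ~/init-ansible 2>/dev/null
grep -nE 'load_config|init-ansible' ~/.bash_profile ~/.profile 2>/dev/null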

I'm not entirely sure why we have both. Perhaps @dannyroberts @Daniel_Miller can comment?


On my system I have ~/.bash_profile the same as yours. What's different is ~/.profile; on mine its contents are:

# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
	. "$HOME/.bashrc"
    fi
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
[ -t 1 ] && source ~/init-ansible

I assume the above is the default for Ubuntu 18.04 - I don't have any reference to ~/init-ansible

~/init-ansible is indeed a symlink to ~/environments/monolith/commcare-cloud/control/init.sh
I'm following instructions for the setup here: http://dimagi.github.io/commcare-cloud/setup/new_environment.html

Here's the prompt right after logging in:
[screenshot of the prompt]

And running ~/.commcare-cloud/load_config.sh produces no output at all.
If I run [ -t 1 ] && source ~/init-ansible from bash after logging in, I get:

Downloading dependencies from galaxy and pip
ansible-galaxy install -f -r /home/ccc/commcare-cloud/src/commcare_cloud/ansible/requirements.yml
ERROR: Cannot install -r /tmp/tmpj1sgcdup (line 1) and urllib3==1.26.5 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
Traceback (most recent call last):
  File "/home/ccc/.virtualenvs/cchq/bin/pip-sync", line 8, in <module>
    sys.exit(cli())
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/piptools/scripts/sync.py", line 151, in cli
    ask=ask,
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/piptools/sync.py", line 256, in sync
    check=True,
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/ccc/.virtualenvs/cchq/bin/python', '-m', 'pip', 'install', '-r', '/tmp/tmpj1sgcdup', '-q']' returned non-zero exit status 1.
[WARNING]: - dependency andrewrothstein.couchdb (v2.1.4) (v2.1.4) from role andrewrothstein.couchdb-cluster differs from already installed version
(fcb957ed038ab1c4fddcfef6b9c7617dcdeec9b7), skipping
[WARNING]: - dependency ANXS.cron (None) from role tmpreaper differs from already installed version (v1.0.2), skipping
/home/ccc
[1]   Done                    { COMMCARE=; cd ${COMMCARE_CLOUD_REPO}; pip install --quiet --upgrade pip-tools; pip-sync --quiet requirements.txt; pip install --quiet --editable .; cd -; }
[2]-  Done                    COMMCARE= pip install --quiet --upgrade pip
-bash: wait: %2: no such job
[WARNING]: - dependency sansible.java (None) from role sansible.logstash differs from already installed version (v2.1.4), skipping
[WARNING]: - dependency sansible.users_and_groups (None) from role sansible.logstash differs from already installed version (v2.0.5), skipping
ansible-galaxy collection install -f -r /home/ccc/commcare-cloud/src/commcare_cloud/ansible/requirements.yml
To finish first-time installation, run `manage-commcare-cloud configure`
[3]+  Done                    COMMCARE= manage-commcare-cloud install
✓ origin already set to https://github.com/dimagi/commcare-cloud.git
✗ /home/ccc/commcare-cloud/src/commcare_cloud/fab/config.py does not exist and suitable location to copy it from was not found.
  This file is just a convenience, so this is a non-critical error.
  If you have fab/config.py in a previous location, then copy it to /home/ccc/commcare-cloud/src/commcare_cloud/fab/config.py.
/home/ccc

Welcome to commcare-cloud

Available commands:
update-code - update the commcare-cloud repositories (safely)
source /home/ccc/.virtualenvs/cchq/bin/activate - activate the ansible virtual environment
ansible-deploy-control [environment] - deploy changes to users on this control machine
commcare-cloud - CLI wrapper for ansible.
                 See commcare-cloud -h for more details.
                 See commcare-cloud <env> <command> -h for command details.

Thanks
Ed

There was a PR merged recently to update the urllib3 dependency version, which was subsequently reverted due to this issue. If you update your code to the latest, that pip error should go away.

I also realized that the presence of .bash_profile prevents .profile from being executed. I think the best way forward is to remove .bash_profile and leave the [ -t 1 ] && source ~/init-ansible line in .profile. This should run that script on login.
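
Something like this is what I have in mind (just a sketch - back up the file first and confirm ~/.profile still has the init-ansible line before removing anything):

cp ~/.bash_profile ~/.bash_profile.bak
grep -n 'init-ansible' ~/.profile    # should show: [ -t 1 ] && source ~/init-ansible
rm ~/.bash_profile
# then log out and back in to confirm the environment loads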

I'll follow up with our other devs about reconciling the usages of these two files.

Thanks @Simon_Kelly. In addition, I'll wipe and re-run the installation process to see if there's a point where this issue arises.
Cheers!

EDIT In the meantime, running a code update and deleting ~/.bash_profile does the trick for loading the environment on login. I'll see about figuring out the other issue and get back to you.

OK, the above was sorted by adding the ~/.commcare-cloud/load_config.sh command to ~/.profile rather than ~/.bash_profile, but now the instructions here appear to be failing on deploy-stack --first-time -e:

(cchq) ccc@CCHQ-prod:~/commcare-cloud$ commcare-cloud monolith deploy-stack --first-time -e 'CCHQ_IS_FRESH_INSTALL=1'
Traceback (most recent call last):
  File "/home/ccc/.virtualenvs/cchq/bin/commcare-cloud", line 33, in <module>
    sys.exit(load_entry_point('commcare-cloud', 'console_scripts', 'commcare-cloud')())
  File "/home/ccc/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 206, in main
    exit_code = call_commcare_cloud()
  File "/home/ccc/commcare-cloud/src/commcare_cloud/commcare_cloud.py", line 197, in call_commcare_cloud
    exit_code = commands[args.command].run(args, unknown_args)
  File "/home/ccc/commcare-cloud/src/commcare_cloud/commands/ansible/ansible_playbook.py", line 204, in run
    rc = BootstrapUsers(self.parser).run(deepcopy(args), deepcopy(unknown_args))
  File "/home/ccc/commcare-cloud/src/commcare_cloud/commands/ansible/ansible_playbook.py", line 322, in run
    return AnsiblePlaybook(self.parser).run(args, unknown_args, always_skip_check=True)
  File "/home/ccc/commcare-cloud/src/commcare_cloud/commands/ansible/ansible_playbook.py", line 78, in run
    environment.create_generated_yml()
  File "/home/ccc/commcare-cloud/src/commcare_cloud/environment/main.py", line 351, in create_generated_yml
    'dev_users': self.users_config.dev_users.to_json(),
  File "/home/ccc/.virtualenvs/cchq/lib/python3.6/site-packages/memoized.py", line 20, in _memoized
    cache[key] = value = fn(*args, **kwargs)
  File "/home/ccc/commcare-cloud/src/commcare_cloud/environment/main.py", line 178, in users_config
    present_users += user_group_json['dev_users']['present']
KeyError: 'dev_users'

I only have one user (ccc) in my ~/environments/_users/admins.yml file, and the public key is present in ~/environments/_authorized_keys/
In case it's useful, this is the directory tree from ~/ at that point in the installation:

├── commcare-cloud
│   ├── changelog
│   ├── commcare-cloud-bootstrap
│   │   ├── environment
│   │   └── specs
│   ├── control
│   ├── decisions
│   ├── docs
│   │   ├── changelog
│   │   ├── commcare-cloud
│   │   │   ├── commands
│   │   │   └── env
│   │   ├── firefighting
│   │   ├── howto
│   │   ├── monitoring
│   │   │   └── datadog_dashboards
│   │   ├── services
│   │   │   ├── airflow
│   │   │   ├── blobdb
│   │   │   ├── elasticsearch
│   │   │   ├── kafka
│   │   │   ├── nginx
│   │   │   ├── pillowtop
│   │   │   ├── postgresql
│   │   │   ├── rabbitmq
│   │   │   └── redis
│   │   ├── setup
│   │   │   └── new_environment_qa_img
│   │   └── system
│   ├── environments
│   │   ├── 64-test
│   │   ├── _authorized_keys
│   │   ├── _users
│   │   ├── confluence
│   │   ├── development
│   │   ├── echis
│   │   ├── enikshay-reference
│   │   ├── india
│   │   │   └── migrations
│   │   │       ├── 0001-couch_3_big_nodes
│   │   │       ├── 0002-add-couch-node-multiaz
│   │   │       ├── 0002-copy-certs-to-proxy1
│   │   │       └── 0003-migrate-to-aws
│   │   ├── jenkins
│   │   ├── motech
│   │   ├── pna
│   │   ├── production
│   │   │   └── migrations
│   │   │       ├── 0001-separate-main-couch-db
│   │   │       ├── 0002-redis
│   │   │       ├── 0003-give-commcarehq-couch-db-3-copies
│   │   │       │   └── migration_build_production_couchdb
│   │   │       ├── 0004-migrate-non-main-dbs
│   │   │       ├── 0005-reduce-couchdb-to-6-nodes
│   │   │       ├── 0006-move-to-3-bionic-nodes
│   │   │       └── 0007-get-production-couch-back-to-safety
│   │   ├── staging
│   │   │   └── migrations
│   │   │       ├── 0001-add-couch-node
│   │   │       ├── 0002-add-couch-node
│   │   │       ├── 0003-add-couch-node
│   │   │       └── 0004-add-couch-node
│   │   └── swiss
│   ├── git-hooks
│   ├── provisioning
│   ├── scripts
│   │   ├── aws
│   │   ├── inventory
│   │   └── tcl
│   ├── src
│   │   ├── commcare_cloud
│   │   │   ├── __pycache__
│   │   │   ├── ansible
│   │   │   │   ├── group_vars
│   │   │   │   ├── library
│   │   │   │   ├── openvpn_playbooks
│   │   │   │   ├── partials
│   │   │   │   ├── plugins
│   │   │   │   │   ├── inventory
│   │   │   │   │   │   └── __pycache__
│   │   │   │   │   └── lookup
│   │   │   │   ├── roles
│   │   │   │   │   ├── airflow
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── ansible-control
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── aws-efs
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── backups
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── bootstrap-machine
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── bootstrap-users
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── chaos
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── citusdb
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── cloudwatch_logs
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── commcarehq
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── common
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── common_installs
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── couchdb2
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── couchdb2-preinstall
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── datadog
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── devops_scripts
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── ebsnvme
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   │   └── _vendor
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── ecryptfs
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── edit
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── elasticsearch
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   │       ├── config
│   │   │   │   │   │       ├── systemd
│   │   │   │   │   │       └── upstart
│   │   │   │   │   ├── formplayer
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── git
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── haproxy
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── http_proxy
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── java
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── kafka
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── keepalived
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── kernel_tune
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── keystore
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── kinesis_agent
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── ksplice
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── logrotate
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── tests
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── lpar2rrd
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── lvm
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── monit
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── nginx
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── nodejs
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── pg_backup
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   │       └── plain
│   │   │   │   │   ├── pg_repack
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── files
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── pgbouncer
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── postgresql
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── postgresql_base
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── python
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── rabbitmq
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── redis
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   ├── templates
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── redis_monitoring
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── ruby_install
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── sentry
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── shared_dir
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── vars
│   │   │   │   │   ├── ssh
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── supervisor
│   │   │   │   │   │   ├── defaults
│   │   │   │   │   │   ├── handlers
│   │   │   │   │   │   ├── tasks
│   │   │   │   │   │   └── templates
│   │   │   │   │   ├── swap
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── ufw
│   │   │   │   │   │   └── tasks
│   │   │   │   │   ├── webworker
│   │   │   │   │   │   ├── meta
│   │   │   │   │   │   └── tasks
│   │   │   │   │   └── zookeeper
│   │   │   │   │       ├── meta
│   │   │   │   │       └── tasks
│   │   │   │   └── service_playbooks
│   │   │   ├── commands
│   │   │   │   ├── __pycache__
│   │   │   │   ├── ansible
│   │   │   │   │   └── __pycache__
│   │   │   │   ├── deploy
│   │   │   │   │   └── __pycache__
│   │   │   │   ├── inventory_lookup
│   │   │   │   │   └── __pycache__
│   │   │   │   ├── migrations
│   │   │   │   │   ├── __pycache__
│   │   │   │   │   ├── plays
│   │   │   │   │   └── templates
│   │   │   │   └── terraform
│   │   │   │       ├── __pycache__
│   │   │   │       ├── migrations
│   │   │   │       ├── templates
│   │   │   │       └── tests
│   │   │   ├── environment
│   │   │   │   ├── __pycache__
│   │   │   │   ├── schemas
│   │   │   │   │   ├── __pycache__
│   │   │   │   │   └── tests
│   │   │   │   └── secrets
│   │   │   │       ├── __pycache__
│   │   │   │       └── backends
│   │   │   │           ├── __pycache__
│   │   │   │           ├── ansible_vault
│   │   │   │           │   ├── __pycache__
│   │   │   │           │   └── tests
│   │   │   │           └── aws_secrets
│   │   │   │               ├── __pycache__
│   │   │   │               └── tests
│   │   │   ├── environmental-defaults
│   │   │   ├── fab
│   │   │   │   ├── __pycache__
│   │   │   │   ├── checks
│   │   │   │   ├── diff_templates
│   │   │   │   └── operations
│   │   │   ├── help_cache
│   │   │   ├── manage_commcare_cloud
│   │   │   │   ├── __pycache__
│   │   │   │   ├── monitors
│   │   │   │   └── tests
│   │   │   └── terraform
│   │   │       └── modules
│   │   │           ├── efs_file_system
│   │   │           │   └── mount-point
│   │   │           ├── elasticache
│   │   │           ├── elasticache-cluster
│   │   │           ├── ga_alb_waf
│   │   │           ├── iam
│   │   │           │   └── user
│   │   │           ├── internal_alb
│   │   │           ├── logshipping
│   │   │           │   └── firehose_stream
│   │   │           ├── network
│   │   │           ├── openvpn
│   │   │           ├── pgbouncer_nlb
│   │   │           ├── postgresql
│   │   │           ├── r53-private-zone-create-update
│   │   │           ├── r53-record-create-update
│   │   │           ├── s3-bucket-rdb-backup
│   │   │           └── server
│   │   │               └── iam
│   │   └── commcare_cloud.egg-info
│   └── tests
│       ├── couch_migration_config
│       │   ├── env1
│       │   └── plans
│       │       ├── new_node
│       │       │   └── migration_build_env1_plan
│       │       ├── new_node_empty
│       │       │   └── migration_build_env1_plan
│       │       └── reshard
│       │           └── migration_build_env1_plan
│       ├── csv_env
│       │   ├── multi_file
│       │   │   └── inventory
│       │   └── single_file
│       ├── file_migration_data
│       │   ├── source_env
│       │   └── target_env
│       ├── test_deploy
│       ├── test_envs
│       │   ├── 2018-04-04-development-snapshot
│       │   ├── 2018-04-04-enikshay-snapshot
│       │   ├── 2018-04-04-icds-new-snapshot
│       │   ├── 2018-04-04-pna-snapshot
│       │   ├── 2018-04-04-production-snapshot
│       │   ├── 2018-04-04-softlayer-snapshot
│       │   ├── 2018-04-04-staging-snapshot
│       │   ├── 2018-04-04-swiss-snapshot
│       │   └── small_cluster
│       └── test_ssh_envs
│           ├── no_strict_known_hosts
│           ├── simple_ssh
│           └── ssh_no_known_hosts
└── environments
     ├── _authorized_keys
     ├── _users
     └── monolith

Any idea what I should be looking for?

Hi Ed

In your monolith/meta.yml file you need to have a section defining which user lists to use. For example, I think this is what you should have in that file:

users:
  - admins

The reason for having to specify this is that the user list can be shared by more than one environment.
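
A quick way to double-check what your environment currently references (paths assume the monolith layout from this thread):

grep -n -A 2 'users:' ~/environments/monolith/meta.yml
ls ~/environments/_users/ ~/environments/_authorized_keys/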

Hey Simon, thanks for the quick response, here's my monolith/meta.yml:

deploy_env: monolith
env_monitoring_id: monolith
always_deploy_formplayer: true
users:
  - admins

My ~/environments/_users/admins.yml:

admins:
  present:
    - ccc
  absent: []

and the ccc.pub key file is found under ~/environments/_authorized_keys

Any other ideas?
Thanks

EDIT
It seems I had to edit the admins.yml file and replace admins: with dev_users:
This is the updated file:

dev_users:
  present:
    - ccc
  absent: []

I believe this has changed since our last server deployment.

Deployment got as far as the pgbouncer monit monitor task:

TASK [pgbouncer monit monitor] **********************************************************************************************************************
fatal: [10.1.0.4]: FAILED! => {"attempts": 1, "changed": false, "cmd": "/usr/bin/monit summary -B", "msg": "Cannot create socket to [localhost]:2812 -- Connection refused", "rc": 1, "stderr": "Cannot create socket to [localhost]:2812 -- Connection refused\n", "stderr_lines": ["Cannot create socket to [localhost]:2812 -- Connection refused"], "stdout": "", "stdout_lines": []}

netstat reveals monit is listening on that port:

tcp        0      0 127.0.0.1:2812          0.0.0.0:*               LISTEN      9366/monit

I assume this is a firewall issue. Is it possible this port was left out of the firewall rules by mistake?

EDIT
I restarted the process after allowing TCP 2812 to localhost, and it has now halted with this:

TASK [kernel_tune : set disk scheduler to noop for every raw device] ********************************************************************************
failed: [10.1.0.4] (item=sda) => {"ansible_loop_var": "item", "changed": true, "item": "sda", "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 10.1.0.4 closed.\r\n", "stderr_lines": ["Shared connection to 10.1.0.4 closed."], "stdout": "\r\nsh: echo: I/O error\r\n", "stdout_lines": ["", "sh: echo: I/O error"]}
failed: [10.1.0.4] (item=sdb) => {"ansible_loop_var": "item", "changed": true, "item": "sdb", "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 10.1.0.4 closed.\r\n", "stderr_lines": ["Shared connection to 10.1.0.4 closed."], "stdout": "\r\nsh: echo: I/O error\r\n", "stdout_lines": ["", "sh: echo: I/O error"]}
failed: [10.1.0.4] (item=sdc) => {"ansible_loop_var": "item", "changed": true, "item": "sdc", "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 10.1.0.4 closed.\r\n", "stderr_lines": ["Shared connection to 10.1.0.4 closed."], "stdout": "\r\nsh: echo: I/O error\r\n", "stdout_lines": ["", "sh: echo: I/O error"]}

Any help appreciated!

Port 2812 is used by monit's embedded web server. That has to be running for monit to work correctly. That port should not be made available outside of the machine.

You could try restarting monit and checking /var/log/monit.log or syslog to see if there is any information indicating why it's not running.
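
For example, something along these lines (just a sketch):

sudo systemctl restart monit
sudo systemctl status monit
sudo monit summary -B
sudo tail -n 50 /var/log/monit.log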

Thanks Simon - it's definitely running and listening but I had to enable access to port 2812 (to localhost only):
sudo ufw allow from 127.0.0.1 to 127.0.0.1 port 2812 proto tcp
It seemed to get past the [pgbouncer monit monitor] step OK after that. I can test again without the firewall entry and see how it behaves. Will report back.

EDIT This time it ran through the [pgbouncer monit monitor] step OK (I performed the deploy from the top, enabling root and password login), so that was likely just a temporary glitch earlier. It is still sticking at [kernel_tune : set disk scheduler to noop for every raw device] as per my previous message.
If I run the command from the shell as root, I get this:

# echo noop > /sys/block/sda/queue/scheduler
bash: echo: write error: Invalid argument

FWIW this is an Azure Ubuntu 18.04 LTS VM. Current scheduler is set to:
[mq-deadline] none
This may be something specific to Azure, but I will look into how to update it on my VM and how to disable that Ansible task. Is this a permanent change, i.e. not just for the deploy? I assume scheduling is handled at the hypervisor level on these Azure VMs.
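
For reference, this is roughly what I'd try by hand (just a sketch - on this kernel 'noop' doesn't exist and 'none' seems to be the closest equivalent, and the write may still be refused on Azure):

cat /sys/block/sda/queue/scheduler          # currently shows: [mq-deadline] none
echo none | sudo tee /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler          # check whether the change took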

For now I'm disabling that task and the subsequent grub tasks in disk_scheduler.yml and continuing with the setup.

EDIT The services installation succeeded with that omitted. For some reason it failed on one of the other monit steps, but on restarting it (no changes made) it succeeded, so those were clearly temporary glitches.

Quick q - if I were to make updates to the TLS certificates, is there a quick way to deploy nginx alone? I recall a deploy_proxy playbook but can't seem to find it anymore.
Thanks!

EDIT I did a full deploy and it seems to be ignoring my certificates and private key and using a self-signed certificate instead.
I have added the certificate and PK to the vault in this format:

ssl_secrets:
  certs:
    my_site: |
      -----BEGIN CERTIFICATE-----
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      -----END CERTIFICATE-----
  private_keys: 
    my_site: |
      -----BEGIN PRIVATE KEY-----
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      -----END PRIVATE KEY-----

I have also updated proxy.yml to include the following:

fake_ssl_cert: no
nginx_combined_cert_value: "{{ ssl_secrets.certs.my_site }}"
nginx_key_value: "{{ ssl_secrets.private_keys.my_site }}"

Is this configuration still supported (it was at the last server I set up some time back)?
Thanks!

Hi Ed

This configuration is still supported, and from what I can tell your config looks correct. You can run the cert tasks with the update-cert tag:

cchq <env> ap deploy_proxy.yml --tags update-cert

The result should be 2 files:

  • /etc/pki/tls/certs/{{ deploy_env }}_nginx_combined.crt
  • /etc/pki/tls/private/{{ deploy_env }}_nginx_commcarehq.org.key

You should also check the nginx config:

  • /etc/nginx/sites-available/{{ deploy_env }}_commcare

This file should have config keys pointing to the above files:

  • ssl_certificate
  • ssl_certificate_key
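
If it helps, a rough way to verify the result after the play runs (paths assume deploy_env is monolith):

sudo openssl x509 -in /etc/pki/tls/certs/monolith_nginx_combined.crt -noout -subject -issuer -dates
grep -E 'ssl_certificate(_key)?' /etc/nginx/sites-available/monolith_commcare
sudo nginx -t && sudo systemctl reload nginx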

I'm having trouble with mail routing from the app. I have a localhost-based postfix configuration on port 25 that sends mail OK using mailutils. My email configuration seems fine in public.yml:

  EMAIL_SMTP_HOST: 'localhost'
  EMAIL_SMTP_PORT: 25
  EMAIL_USE_TLS: no

Which logs should I be checking on the CommCare side to see what it's up to? The postfix log is devoid of any entries coming from CommCare.

Thanks!

EDIT Just to add, I tried with an external SMTP server over TLS as well, but no mail appears to be reaching the server.

It depends a little on what email you're talking about, but most emails are sent from celery, in the email_queue, so you can check the celery logs. The log file should have email_queue in the filename, though I think for the default monolith install there's only one celery log file.
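
If it helps, something like this should locate the right file (the log directory is an assumption based on the default layout):

ls /home/cchq/www/monolith/log/ | grep -i celery
grep -i email /home/cchq/www/monolith/log/*celery*.log | tail -n 20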


Thanks Ethan, at this stage I'm inviting web users and they're not receiving the invitation. I'll check the celery logs and report back.

Yes, I believe those should be sent from that email queue

Thanks, I see entries that point to an issue. On the localhost postfix SMTP, authentication is not required (it's available to localhost only). I see it inserts a password of 'dummy' if none is provided. I also noticed this change made during an update-config when I switch from an external SMTP host with user/pass and TLS to localhost with no username/password and no TLS:

TASK [commcarehq : copy localsettings] ************************************************************************************************************
--- before: /home/cchq/www/monolith/current/localsettings.py
+++ after: /home/ccc/.ansible/tmp/ansible-local-10983ftlgbp2u/tmpkq7xi44w/localsettings.py.j2
@@ -122,10 +122,8 @@
 
 # Email setup
 # email settings: these ones are the custom hq ones
-EMAIL_LOGIN = "fakeusername"
-EMAIL_PASSWORD = "xxxxxxxxxxx"
-EMAIL_SMTP_HOST = "smtp.externalserver.com"
-EMAIL_SMTP_PORT = 587
+EMAIL_BACKEND = 'django.core.mail.backends.filebased.EmailBackend'
+EMAIL_FILE_PATH = '/tmp/django_email'
 
 RETURN_PATH_EMAIL_PASSWORD = ""
 
@@ -493,7 +491,7 @@
 
-EMAIL_USE_TLS = True
+EMAIL_USE_TLS = False

changed: [10.1.0.4]
TASK [Update formplayer config files] *************************************************************************************************************
--- before: /home/cchq/www/monolith/formplayer_build/current/application.properties
+++ after: /home/ccc/.ansible/tmp/ansible-local-10983ftlgbp2u/tmpda40k53t/application.properties.j2
@@ -24,10 +24,10 @@
 
 spring.jpa.hibernate.ddl-auto
 
-smtp.host=smtp.exernalserver.com
-smtp.port=587
-smtp.username=username
-smtp.password=xxxxxxxxxxx
+smtp.host=localhost
+smtp.port=25
+smtp.username=
+smtp.password=dummy
 
 smtp.from.address=commcarehq-noreply+10@dimagi.com
 smtp.to.address=commcarehq-ops+formplayer@dimagi.com

changed: [10.1.0.4] => (item={'template': 'application.properties.j2', 'filename': 'application.properties'})
ok: [10.1.0.4] => (item={'template': 'logback-spring.xml.j2', 'filename': 'logback-spring.xml'})

I assume it's not going to send any mail with those settings. In the celery log when firing off an email, this appears:

[2021-06-29 13:13:23,617: INFO/MainProcess] Received task: corehq.apps.hqwebapp.tasks.send_html_email_async[99b98d40-0116-4469-8d66-a6a6130b4950]  
[2021-06-29 13:13:23,640: INFO/ForkPoolWorker-14] Task corehq.apps.hqwebapp.tasks.send_html_email_async[99b98d40-0116-4469-8d66-a6a6130b4950] succeeded in 0.02117404999989958s: None
[2021-06-29 13:13:26,810: INFO/MainProcess] Scaling down 1 processes.

Is there a special config I can use to have it send mail via SMTP on localhost using port 25 and no authentication?
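
In the meantime, one way I'm thinking of testing Django's mail path directly (the virtualenv path here is an assumption about the standard layout):

sudo -iu cchq bash
cd /home/cchq/www/monolith/current
source python_env/bin/activate
python manage.py sendtestemail someone@example.com   # Django's built-in test-email command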

EDIT it may just be easier to enable auth on my SMTP server and test. I'll do that and report back.
EDIT it looks like it will be easiest to go with an external SMTP relay server.