[Resolved] Issue deploying Let's Encrypt on a monolith previously configured with custom certificates

A client has a monolith running with (now expired) custom certificates, and I wanted to switch it to Let's Encrypt certificates while they renew the main certificate.

I'm following commcare-cloud/ssl.md (master branch of dimagi/commcare-cloud on GitHub).
It's failing on the letsencrypt_cert.yml playbook:

PLAY [proxy] ***********************************************************************************************************************************************************************

TASK [Remove legacy certbot apt repo] **********************************************************************************************************************************************
--- before: /etc/apt/sources.list.d/ppa_certbot_certbot_bionic.list
+++ after: /dev/null
@@ -1 +0,0 @@
-deb http://ppa.launchpad.net/certbot/certbot/ubuntu bionic main

changed: [x.x.x.x]

TASK [Uninstall Certbot via APT] ***************************************************************************************************************************************************
The following packages were automatically installed and are no longer required:
  libdumbnet1 libfwup1 libllvm9 python3-acme python3-certbot
  python3-configargparse python3-configobj python3-future python3-icu
  python3-josepy python3-mock python3-ndg-httpsclient python3-parsedatetime
  python3-pbr python3-requests-toolbelt python3-zope.component
  python3-zope.event python3-zope.hookable
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  certbot
[master 8fd64c0] saving uncommitted changes in /etc prior to apt run
 Author: ansible <ansible@monolith.xxxxx.org>
 4 files changed, 2 insertions(+), 4 deletions(-)
 delete mode 100644 apt/sources.list.d/apache_bintray_com_couchdb_deb.list
 create mode 100644 apt/sources.list.d/couchdb.list
 delete mode 100644 apt/sources.list.d/ppa_certbot_certbot_bionic.list
0 upgraded, 0 newly installed, 1 to remove and 57 not upgraded.
changed: [x.x.x.x]

TASK [Check certbot version] *******************************************************************************************************************************************************
changed: [x.x.x.x]

TASK [update snap core] ************************************************************************************************************************************************************
fatal: [x.x.x.x]: FAILED! => {"changed": true, "cmd": "snap install core; snap refresh core", "delta": "0:01:22.060717", "end": "2021-08-30 10:54:50.651656", "msg": "non-zero return code", "rc": 1, "start": "2021-08-30 10:53:28.590939", "stderr": "snap \"core\" is already installed, see 'snap help refresh'\nerror: cannot perform the following tasks:\n- Setup snap \"core\" (11606) security profiles (cannot update mount namespace of snap \"gnome-characters\": cannot update preserved namespace of snap \"gnome-characters\": cannot update snap namespace: remove /usr/bin/gjs: read-only file system)", "stderr_lines": ["snap \"core\" is already installed, see 'snap help refresh'", "error: cannot perform the following tasks:", "- Setup snap \"core\" (11606) security profiles (cannot update mount namespace of snap \"gnome-characters\": cannot update preserved namespace of snap \"gnome-characters\": cannot update snap namespace: remove /usr/bin/gjs: read-only file system)"], "stdout": "", "stdout_lines": []}

PLAY RECAP *************************************************************************************************************************************************************************
x.x.x.x            : ok=3    changed=3    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

✗ Apply failed with status code 2

Any ideas how I should approach this? Thanks!

EDIT: A reboot appears to have resolved it, likely an open file issue.
EDIT2: It's now failing on the nginx deploy:

TASK [Install nginx] ***************************************************************************************************************************************************************
fatal: [x.x.x.x]: FAILED! => {"cache_update_time": 1630331397, "cache_updated": true, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"     --simulate install 'nginx=1.17.3-1~bionic'' failed: E: Packages were downgraded and -y was used without --allow-downgrades.\n", "rc": 100, "stderr": "E: Packages were downgraded and -y was used without --allow-downgrades.\n", "stderr_lines": ["E: Packages were downgraded and -y was used without --allow-downgrades."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following packages were automatically installed and are no longer required:\n  libdumbnet1 libfwup1 libllvm9 python3-acme python3-certbot\n  python3-configargparse python3-configobj python3-future python3-icu\n  python3-josepy python3-mock python3-ndg-httpsclient python3-parsedatetime\n  python3-pbr python3-requests-toolbelt python3-zope.component\n  python3-zope.event python3-zope.hookable\nUse 'sudo apt autoremove' to remove them.\nThe following packages will be DOWNGRADED:\n  nginx\n0 upgraded, 0 newly installed, 1 downgraded, 0 to remove and 57 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following packages were automatically installed and are no longer required:", "  libdumbnet1 libfwup1 libllvm9 python3-acme python3-certbot", "  python3-configargparse python3-configobj python3-future python3-icu", "  python3-josepy python3-mock python3-ndg-httpsclient python3-parsedatetime", "  python3-pbr python3-requests-toolbelt python3-zope.component", "  python3-zope.event python3-zope.hookable", "Use 'sudo apt autoremove' to remove them.", "The following packages will be DOWNGRADED:", "  nginx", "0 upgraded, 0 newly installed, 1 downgraded, 0 to remove and 57 not upgraded."]}

PLAY RECAP *************************************************************************************************************************************************************************
x.x.x.x            : ok=3    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

✗ Check failed with status code 2
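Each fatal: line in these logs is Ansible printing the failed task's result as a single JSON object after "FAILED! =>". The actionable part is usually stderr_lines: here apt refuses to install the pinned nginx=1.17.3-1~bionic because a newer nginx is already installed and -y was used without --allow-downgrades. A minimal Python sketch for pulling that out of a saved log line, assuming the line is copied intact from the terminal (the helper names are made up for illustration):

```python
import json

MARKER = "FAILED! => "


def parse_fatal_line(line: str) -> dict:
    """Return the task result that Ansible printed as JSON after 'FAILED! =>'."""
    start = line.find(MARKER)
    if start == -1:
        raise ValueError("line does not contain an Ansible 'FAILED! =>' result")
    return json.loads(line[start + len(MARKER):])


def root_cause(result: dict) -> str:
    """Prefer stderr_lines (the command's own error output); fall back to msg."""
    stderr_lines = result.get("stderr_lines") or []
    if stderr_lines:
        return "\n".join(stderr_lines)
    return result.get("msg", "")
```

Reading stderr_lines first matters because msg is often just "non-zero return code", while the real reason (a read-only file system, a refused downgrade, a missing binary) only appears in the command's own output.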

EDIT3: I resolved that issue by manually removing nginx with sudo apt remove nginx; however, a new issue has arisen:

Full log here:
https://pastebin.com/raw/VSB6uKfa

Relevant error:

RUNNING HANDLER [check nginx configuration] ****************************************************************************************************************************************
fatal: [x.x.x.x]: FAILED! => {"changed": true, "cmd": "nginx -t", "delta": "0:00:00.005114", "end": "2021-08-30 13:58:47.509462", "failed_when_result": true, "msg": "non-zero return code", "rc": 127, "start": "2021-08-30 13:58:47.504348", "stderr": "/bin/sh: 1: nginx: not found", "stderr_lines": ["/bin/sh: 1: nginx: not found"], "stdout": "", "stdout_lines": []}

RUNNING HANDLER [reload the nginx service] *****************************************************************************************************************************************

RUNNING HANDLER [Assert nginx is running] ******************************************************************************************************************************************

NO MORE HOSTS LEFT *****************************************************************************************************************************************************************

PLAY RECAP *************************************************************************************************************************************************************************
x.x.x.x            : ok=63   changed=9    unreachable=0    failed=1    skipped=28   rescued=0    ignored=0

✗ Check failed with status code 2

EDIT4: Letting the script run despite the apparent failure in the initial check step worked (I assume because the real run actually installs nginx in one of the earlier steps?). However, the site still appears to be using the old certificate rather than the Let's Encrypt one. I'll check the logs and report back shortly.

So the original (expired) certificate and key are still at /etc/pki/tls/certs/monolith_nginx_combined.crt and /etc/pki/tls/private/monolith_nginx_commcarehq.org.key, and they are referenced in /etc/nginx/sites-enabled/monolith_commcare. I tried re-running the deploy_proxy playbook, but the config file is not updated to use the Let's Encrypt certificates (which I imagine are stored elsewhere). I'll try commenting out the two references to the old certificates and redeploying the proxy... more to come.
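When switching certificates, it helps to confirm which certificate files the site config actually points at, ignoring lines that have been commented out. A minimal Python sketch, assuming one directive per line (the usual layout of nginx site files) and the /etc/nginx/sites-enabled/monolith_commcare path from this post; the function names are hypothetical and not part of commcare-cloud:

```python
import re
from pathlib import Path

# An uncommented ssl_certificate / ssl_certificate_key directive on its own line.
# Lines starting with '#' (after optional indentation) do not match.
_SSL_DIRECTIVE = re.compile(r"^\s*(ssl_certificate(?:_key)?)\s+([^;\s]+)\s*;")


def active_ssl_paths(config_text: str) -> dict:
    """Map each active SSL directive name to the file paths it references."""
    found = {}
    for line in config_text.splitlines():
        match = _SSL_DIRECTIVE.match(line)
        if match:
            directive, path = match.groups()
            found.setdefault(directive, []).append(path)
    return found


def active_ssl_paths_in_file(path: str) -> dict:
    """Convenience wrapper that reads a site file from disk."""
    return active_ssl_paths(Path(path).read_text())
```

For example, active_ssl_paths_in_file("/etc/nginx/sites-enabled/monolith_commcare") would still list the /etc/pki/tls paths while the custom certificate is configured. This sketch does not follow include directives; nginx -T dumps the complete configuration nginx would load, including included files.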

EDIT5: Even though proxy.yml has fake_ssl_cert = no and the certificate entries have been removed from vault.yml, the deploy still adds the references to the old certificate files back into /etc/nginx/sites-available/monolith_commcare:

TASK [nginx : Create the site configurations] **************************************************************************************************************************************
--- before: /etc/nginx/sites-available/monolith_commcare
+++ after: /home/ccc/.ansible/tmp/ansible-local-32516w96ayrd1/tmpbwebl4q6/site.j2
@@ -43,8 +43,8 @@
   access_log /home/cchq/www/monolith/log/monolith_commcare-nginx_access.log rt_cache;
   error_log /home/cchq/www/monolith/log/monolith_commcare-nginx_error.log warn;

-#    ssl_certificate /etc/pki/tls/certs/monolith_nginx_combined.crt;
-#  ssl_certificate_key /etc/pki/tls/private/monolith_nginx_commcarehq.org.key;
+    ssl_certificate /etc/pki/tls/certs/monolith_nginx_combined.crt;
+  ssl_certificate_key /etc/pki/tls/private/monolith_nginx_commcarehq.org.key;

I'll have another look at sample environment proxy.yml files to see if there's something else I'm missing...

SOLVED: We were missing letsencrypt_cchq_ssl: True in proxy.yml. I'm not sure why it was missing, but adding it was what made the system use the Let's Encrypt certificate. Mystery solved!
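Because the missing flag failed silently (the deploy simply kept rendering the old certificate paths), a quick pre-deploy check of the environment's proxy.yml could catch it. A minimal Python sketch, assuming the usual commcare-cloud environments/<env>/proxy.yml location and a flat top-level key; it is a rough line-based check rather than a YAML parser, and the function names are hypothetical:

```python
import re
from pathlib import Path

# Unquoted spellings that YAML 1.1 loaders such as PyYAML (used by Ansible) read as true.
_YAML_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}

# A top-level "key: value" line (no indentation), with an optional trailing comment.
_TOP_LEVEL = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:\s*([^#]*?)\s*(?:#.*)?$")


def top_level_settings(yaml_text: str) -> dict:
    """Collect the raw scalar values of unindented keys; not a full YAML parser."""
    settings = {}
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def letsencrypt_enabled(yaml_text: str) -> bool:
    """True only if letsencrypt_cchq_ssl is set to an unquoted YAML true value."""
    return top_level_settings(yaml_text).get("letsencrypt_cchq_ssl") in _YAML_TRUE


def check_proxy_yml(path: str) -> bool:
    """Read a proxy.yml from disk and report whether the flag is on."""
    return letsencrypt_enabled(Path(path).read_text())
```

A proxy.yml like the one in this thread before the fix, with fake_ssl_cert set to no but no letsencrypt_cchq_ssl key, makes letsencrypt_enabled return False. Commented-out or indented occurrences of the key are deliberately ignored.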

Hi Ed

I'm glad you were able to resolve this, thanks for the updates!