Cleaning up the Ansible monolith deployment path

Hey guys,

Hope all is well. Let me preface this with a thank you—I know you’ve got a
lot going on and don’t rely on ansible monolith deployments for your core
work, so I realize that any help you provide here is going above and
beyond. Thank you for that!

My objective is to get

ansible-playbook -i inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml

running on a freshly provisioned Ubuntu 14.04.5 LTS (GNU/Linux
3.13.0-125-generic x86_64) droplet with 2 gigs of memory.

While I think that’s a solid goal for the whole CommCare open-source
community, I’d like to disclose that we’ve also got a client at Open
Function that wants to connect CommCare to another system using OpenFn, but
CommCare needs to be hosted on their servers due to regulatory issues.

Note that we made a couple of changes to vagrant and edited some ansible
scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files. One
significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] *****************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq', u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq', u'name': u'commcarehq__users', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) => {"cache_control": "must-revalidate", "content": "{\"error\":\"unauthorized\",\"reason\":\"You are not a server admin.\"}\n", "content_length": "64", "content_type": "text/plain; charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true, "item": {"host": "165.227.172.214", "is_https": false, "name": "commcarehq__users", "password": "commcarehq", "port": 5984, "username": "commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang OTP/R16B03)", "status": 401, "url": "http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user in “items” has
the same username and password. The failure can be stepped over, as we
don’t need to (and can’t) set up two different couchdb users with
commcarehq:commcarehq on the same box.
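
To illustrate the idea of stepping over the duplicate, here is a minimal Python sketch that collapses the two "items" from the log into one admin account before anything is created. The list name couch_dbs and the helper unique_admins are hypothetical; I have not checked how the couchdb role actually loops over its items, so treat this as a picture of the logic rather than a patch.

```python
# Hypothetical data: mirrors the two "item" dicts shown in the task output above.
couch_dbs = [
    {"name": "commcarehq", "username": "commcarehq", "password": "commcarehq",
     "host": "165.227.172.214", "port": 5984, "is_https": False},
    {"name": "commcarehq__users", "username": "commcarehq", "password": "commcarehq",
     "host": "165.227.172.214", "port": 5984, "is_https": False},
]


def unique_admins(dbs):
    """Return one (host, port, username, password) tuple per distinct admin account."""
    seen = set()
    admins = []
    for db in dbs:
        key = (db["host"], db["port"], db["username"])
        if key in seen:
            # Same account on the same server: creating it again is what failed with 401.
            continue
        seen.add(key)
        admins.append((db["host"], db["port"], db["username"], db["password"]))
    return admins


for host, port, username, _password in unique_admins(couch_dbs):
    print("would create admin %s on %s:%d" % (username, host, port))
```

With the two entries from the log, only one admin account is left to create, so the second (failing) request would never be sent.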

Issue #2&3: For both couchdb2 and redis, monit fails. After I reboot the
system and start monit manually they pass and redis is running, but
couchdb2 still shows “Execution failed”. After another system reboot, and
manually starting monit, both now show as running and being monitored.

monit status:

Process 'couchdb2'
  status                            Execution failed
  monitoring status                 Monitored
  data collected                    Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ********************************************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true, "msg": "couchdb2 process not presently configured with monit", "name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] *****************************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=36 changed=20 unreachable=0 failed=1

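TASK [redis : monit] ***********************************************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true, "msg": "redis process not presently configured with monit", "name": "redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] *****************************************

RUNNING HANDLER [redis : restart redis] ****************************************

RUNNING HANDLER [redis : restart rsyslog] **************************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=17 changed=10 unreachable=0 failed=1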
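
For anyone who wants to check the monit state after each reboot without reading the output by eye, here is a rough Python sketch that parses text in the shape of the monit status output above. The couchdb2 block in SAMPLE is copied from this post; the redis block and the exact "Running" wording are my assumptions about what a healthy entry looks like on this monit version.

```python
import re

# couchdb2 block is from the post; the redis block is a made-up healthy example.
SAMPLE = """\
Process 'couchdb2'
  status                            Execution failed
  monitoring status                 Monitored
  data collected                    Thu, 05 Oct 2017 11:59:49

Process 'redis'
  status                            Running
  monitoring status                 Monitored
"""


def parse_monit_status(text):
    """Map each process name to the value of its 'status' line."""
    statuses = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Process '([^']+)'", line)
        if header:
            current = header.group(1)
            continue
        # Matches "status ..." but not "monitoring status ...".
        field = re.match(r"\s*status\s+(.*\S)", line)
        if field and current is not None:
            statuses.setdefault(current, field.group(1))
    return statuses


def not_running(statuses):
    """Return the sorted names of processes whose status is not 'Running'."""
    return sorted(name for name, status in statuses.items()
                  if status.lower() != "running")


print(not_running(parse_monit_status(SAMPLE)))
```

In practice the text would come from running monit status on the box rather than from SAMPLE.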

Issue #4:
TASK [touchforms : Touchforms user] ********************************************
An exception occurred during task execution. To see the full traceback, use
-vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! => {"changed": false,
"failed": true, "module_stderr": "Traceback (most recent call last):\n
File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 144, in
<module>\n main()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n
user.create_user()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in
create_user\n superuser=repr(self.superuser),\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n
return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n
self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\",
line 792, in wait\n self.handle_command_exit_code(exit_code)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in
handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN:
/home/cchq/www/dev/current/python_env/bin/python manage.py shell
--plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n
File \"manage.py\", line 9, in <module>\n import django\nImportError: No
module named django\n\n", "module_stdout": "Traceback (most recent call
last):\n File \"manage.py\", line 9, in <module>\n import
django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:

su - cchq

cd www/dev/current

source python_env/bin/activate

pip install -r requirements/requirements.txt
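
A quick way to confirm whether the virtualenv can see Django before re-running the playbook is to ask that interpreter directly. This is a minimal sketch; the helper name module_available is mine, and it only uses calls that exist on both Python 2.7 and Python 3 so it can run on the box itself.

```python
import os
import subprocess
import sys


def module_available(python_exe, module):
    """Return True if `python_exe -c "import <module>"` exits with status 0."""
    with open(os.devnull, "w") as devnull:
        status = subprocess.call(
            [python_exe, "-c", "import " + module],
            stdout=devnull,
            stderr=devnull,
        )
    return status == 0


# Demo against the interpreter running this script. On the server, pass
# /home/cchq/www/dev/current/python_env/bin/python (the path from the
# traceback above) and "django" instead.
print(module_available(sys.executable, "json"))
```

If this returns False for the virtualenv's python and "django", the requirements were never installed into that environment, which matches the ImportError in the task output.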

At this point the whole ansible playbook succeeds, but when we visit our
IP, we get the maintenance page and see this in the nginx logs:

2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111: Connection refused) while connecting to upstream, client: 186.106.251.211, server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream: "http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214", referrer: "https://165.227.172.214/solutions/"

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server

At this point, we’re wondering:

  1. Why isn’t the server running itself?
  2. And how do we get it to run?
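
Both errors above come down to something not listening where a client expects it: nginx gets "Connection refused" on port 9010, and pgbouncer cannot reach its backend. A small Python sketch for checking which ports accept connections is below. Port 9010 is from the nginx log; 6432 and 5432 are the stock pgbouncer and PostgreSQL ports and are an assumption about this install, so adjust them to whatever the vars files set.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


# 9010 is taken from the nginx log; the other two ports are assumed defaults.
SERVICES = [("django", 9010), ("pgbouncer", 6432), ("postgresql", 5432)]


def report(host="127.0.0.1"):
    """Return {service name: True/False} for each entry in SERVICES."""
    return dict((name, port_open(host, port)) for name, port in SERVICES)


for name, is_up in sorted(report().items()):
    print("%-12s %s" % (name, "listening" if is_up else "not reachable"))
```

Run on the server itself, this shows at a glance whether the missing piece is the Django process, pgbouncer, or PostgreSQL behind it.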

Best,
Taylor

Update: Rory found that one issue lay in the encrypted fs stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to reboot
during the deployment process.

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and
get:
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I’m fairly certain I’ve configured my
bucket properly and granted access via the access key and secret. (These
are not part of version control in the shared repo, of course.) Will update
as we go.
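
One thing worth ruling out, though I have not confirmed it is the cause here: ListObjects is authorized by the s3:ListBucket action on the bucket ARN itself, not on the objects inside it, so a policy that only covers "bucket/*" produces exactly this AccessDenied. The Python sketch below builds a policy document with both resources. The bucket name is a placeholder, and the set of object actions is my guess at what the blob storage needs.

```python
import json


def bucket_policy(bucket):
    """Build an IAM policy granting list access on the bucket and object access inside it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListObjects is checked against the bucket ARN (no trailing /*).
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::%s" % bucket],
            },
            {
                # Object-level actions are checked against the objects (bucket/*).
                # This action list is an assumption, not taken from CommCare docs.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": ["arn:aws:s3:::%s/*" % bucket],
            },
        ],
    }


# "example-commcare-blobs" is a hypothetical bucket name.
print(json.dumps(bucket_policy("example-commcare-blobs"), indent=2))
```

Comparing the printed document against the policy attached to the access key should show quickly whether the bucket-level statement is missing.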

FWIW, python manage.py compress fails because it can’t find the Font
Awesome less file:

CommandError: An error occurred during rendering /home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html: 'font-awesome/less/font-awesome.less' could not be found in the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles' or with staticfiles.
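
To narrow down whether the file is missing entirely or just sitting somewhere the compressor does not look, a small Python sketch like the one below can search a list of candidate directories. The helper find_static is hypothetical, and I do not know where the Font Awesome sources are supposed to come from in a release, so the roots you pass in are for you to fill in; the only path taken from the error is the COMPRESS_ROOT.

```python
import os
import tempfile


def find_static(relative_path, roots):
    """Return the first existing file path for relative_path under roots, else None."""
    for root in roots:
        candidate = os.path.join(root, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Self-contained demo using a throwaway directory instead of a real release."""
    root = tempfile.mkdtemp()
    target = os.path.join(root, "font-awesome", "less")
    os.makedirs(target)
    with open(os.path.join(target, "font-awesome.less"), "w") as handle:
        handle.write("/* placeholder */\n")
    return find_static("font-awesome/less/font-awesome.less", ["/nonexistent", root])


print(demo())
```

On the server, the roots would be the COMPRESS_ROOT from the error message plus any other static directories configured in the Django settings.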


Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here since we
    don’t do fresh installs that often
  4. Once everything is set up we deploy our code with fabric scripts
    https://github.com/dimagi/commcare-hq-deploy as follows

fab <environment> deploy

environment is the name of an inventory file here:
https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

This also makes use of this 'environments.yml' file which tells the
deploy scripts which services to run where and a few other things:
https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static file
    compression etc and also create the supervisor files needed to run the
    servers.

We’ve recently made some improvements to our couchdb setup (you should use
couchdb2). I’ve linked them in comments on your PR.

We are about to do a whole new cluster setup so it’s likely that there will
be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by the deploy_commcarehq
    playbook which should execute prior to the touchforms playbook. Also
    touchforms is only necessary if you’re going to be doing sms surveys.

Re the encrypted drives. We run the deploy_stack playbook with
'after-reboot' tag limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.
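
As a rough illustration of that invocation, the Python sketch below assembles the command line using ansible-playbook's standard --tags and --limit options. The inventory path, vars files and host are simply reused from the original command earlier in this thread and are assumptions about your setup; the function name is made up for the example.

```python
import shlex


def after_reboot_command(host,
                         inventory="inventories/monolith",
                         var_files=("vars/dev/dev_private.yml",
                                    "vars/dev/dev_public.yml"),
                         playbook="deploy_stack.yml"):
    """Build the argv for re-running only the after-reboot tasks on one host."""
    argv = ["ansible-playbook", "-i", inventory, "-u", "root"]
    for var_file in var_files:
        # "-e @file" tells ansible-playbook to load extra vars from a file.
        argv += ["-e", "@" + var_file]
    argv += [playbook, "--tags", "after-reboot", "--limit", host]
    return argv


# The IP is the droplet from the logs above.
print(" ".join(shlex.quote(arg) for arg in after_reboot_command("165.227.172.214")))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess if you want to script the reboot-and-remount step.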

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi


Hey Simon, thanks so much. We’ve got the fab deploy scripts running now
(albeit with lots of warnings that sudo received non-zero exit codes*) and
finishing successfully. However, when we ssh into our box, go to the newly
created release, activate the python env and run runserver, the server
starts but throws this 500** whenever it’s accessed via the web:

OfflineGenerationError: You have offline compression enabled but key
"89af02fe109c09d9c74742e99d8f3fea" is missing from offline manifest. You
may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59

When running compress, we get this font-awesome package error:

CommandError: An error occurred during rendering /home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html: 'font-awesome/less/font-awesome.less' could not be found in the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles' or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:

[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not configured (logging is disabled). Please see the documentation for more information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing '/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check'!

**Here’s the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

STDERR:\nTraceback (most recent call last):\n >>> File \"manage.py\", line 9, in \n import django\nImportError: No >>> module named django\n\n", "module_stdout": "Traceback (most recent call >>> last):\n File \"manage.py\", line 9, in \n import >>> django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"} >>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>> >>> Possible solution: Here, we need to SSH in and then: >>> # su - cchq >>> # cd www/dev/current >>> # source python_env/bin/activate >>> # pip install -r requirements/requirements.txt >>> >>> At this point the whole ansible playbook succeeds, but when we visit our >>> IP, we get the maintenance page and see this in the nginx logs: >>> 2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111: >>> Connection refused) while connecting to upstream, client: 186.106.251.211, >>> server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream: " >>> http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214", >>> referrer: "https://165.227.172.214/solutions/" >>> >>> After activating the python_env we run runserver as `cchq`: >>> ./manage.py runserver 0.0.0.0:9010 >>> >>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection >>> connection = Database.connect(**conn_params) >>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect >>> conn = _connect(dsn, connection_factory=connection_factory, **kwasync) >>> django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server >>> >>> >>> At this point, we're wondering: >>> >>> 1. Why isn't the server running itself? >>> 2. And how do we get it to run? >>> >>> Best, >>> Taylor >>> >> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "CommCare Developers" group. 
>> To unsubscribe from this group and stop receiving emails from it, send an >> email to commcare-developers+unsubscribe@googlegroups.com . >> For more options, visit https://groups.google.com/d/optout. >> > >

Simon, my last update for the day:

I’ve got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3, and with
option 1 (as you can probably see from the linked photo) I’m not actually
getting the static assets I need from a CDN.

The error from my compress command is no longer in motech; it’s now in
"hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles' or
with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we’ve got a fork of the
ansible repo going that includes a git submodule with your commcare-deploy
repo. Our goal is to get this down to a single git clone and a few shell
commands! Would love any feedback on the directory structure you use
locally.


Hi Taylor,

About that compress error: Have you run bower update recently? I’d run
that, verify that the
file ./bower_components/font-awesome/less/font-awesome.less does indeed
exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
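
Both checks can be scripted. What follows is only a rough sketch, not anything from the CommCare HQ codebase: `find_static_asset` and `has_bower_components` are hypothetical helper names, and the directory list is assumed to be plain path strings copied from your own STATICFILES_DIRS (entries given as (prefix, path) tuples would need unpacking first).

```python
import os

# Relative path that compress complains about in the error above.
ASSET = os.path.join("font-awesome", "less", "font-awesome.less")


def find_static_asset(static_dirs, relative_path=ASSET):
    """Return the first existing full path for relative_path, or None."""
    for base in static_dirs:
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


def has_bower_components(static_dirs):
    """True if any listed directory is itself named bower_components."""
    return any(
        os.path.basename(os.path.normpath(d)) == "bower_components"
        for d in static_dirs
    )


if __name__ == "__main__":
    # Assumption: run from the release directory, next to manage.py.
    dirs = [os.path.join(os.getcwd(), "bower_components")]
    print("bower_components listed:", has_bower_components(dirs))
    print("font-awesome.less found at:", find_static_asset(dirs))
```

If the second line prints None after bower update, the file really is missing on disk and collectstatic/compress will keep failing regardless of settings.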

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I’ve got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3, and
with option 1 (as you can probably see from the linked photo) I’m not
actually getting the static assets I need from a CDN.

The error from my compress command is no longer in motech; it’s now in
"hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles' or
with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we’ve got a fork of the
ansible repo going that includes a git submodule with your commcare-deploy
repo. Our goal is to get this down to a single git clone and a few shell
commands! Would love any feedback on the directory structure you use
locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We’ve got the fab deploy scripts running now
(albeit with lots of warnings that sudo received non-zero exit codes*) and
finishing successfully. However, when we ssh into our box, go to the newly
created release, activate the Python env and run runserver, the server
starts but throws this 500** whenever it’s accessed via the web:

OfflineGenerationError: You have offline compression enabled but key
"89af02fe109c09d9c74742e99d8f3fea" is missing from offline manifest. You
may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles' or
with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not
configured (logging is disabled). Please see the documentation for more
information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python
/home/cchq/www/dev/releases/2017-10-09_16.04/manage.py
preindex_everything --check'!

**Here’s the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here since
    we don’t do fresh installs that often
  4. Once everything is set up we deploy our code with fabric scripts
    (https://github.com/dimagi/commcare-hq-deploy) as follows:

    fab <environment> deploy

    <environment> is the name of an inventory file here:
    https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

    This also makes use of this ‘environments.yml’ file, which tells the
    deploy scripts which services to run where and a few other things:
    https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static file
    compression etc. and also create the supervisor files needed to run the
    servers.

We’ve recently made some improvements to our couchdb setup (you should
use couchdb2). I’ve linked them in comments on your PR.

We are about to do a whole new cluster setup so it’s likely that there
will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by the deploy_commcarehq
    playbook, which should execute prior to the touchforms playbook. Also,
    touchforms is only necessary if you’re going to be doing sms surveys.

Re the encrypted drives: we run the deploy_stack playbook with the
'after-reboot' tag, limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.
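
As a rough illustration of what such an invocation might look like, the sketch below assembles the command line from the original deploy_stack command in this thread plus ansible-playbook's standard --tags and --limit options. The helper name `after_reboot_command` and its defaults are assumptions made for this example; they are not taken from the Dimagi tooling.

```python
import shlex

# Vars files from the original command in this thread (assumed defaults).
DEFAULT_VARS = ("vars/dev/dev_private.yml", "vars/dev/dev_public.yml")


def after_reboot_command(host, inventory="inventories/monolith",
                         playbook="deploy_stack.yml",
                         extra_vars=DEFAULT_VARS):
    """Build an ansible-playbook argv that runs only 'after-reboot' tasks on host."""
    argv = ["ansible-playbook", "-i", inventory, "-u", "root"]
    for path in extra_vars:
        argv += ["-e", "@" + path]  # '@file' loads variables from a file
    argv += [playbook, "--tags", "after-reboot", "--limit", host]
    return argv


# Print a copy-pasteable command for the host used in this thread.
print(" ".join(shlex.quote(a) for a in after_reboot_command("165.227.172.214")))
```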

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to reboot
during the deployment process.
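
For anyone who wants to repeat that recovery after a reboot, here is a minimal sketch of the same three commands run in the same order (postgresql first, then a pgbouncer stop/start). The names `RESTART_STEPS` and `run_steps` are hypothetical, and the script is assumed to run as root on the box itself.

```python
import subprocess

# The three init scripts from the update above, in the order they were run.
RESTART_STEPS = [
    ["/etc/init.d/postgresql", "start"],
    ["/etc/init.d/pgbouncer", "stop"],
    ["/etc/init.d/pgbouncer", "start"],
]


def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order; check_call raises on the first failure."""
    for argv in steps:
        runner(argv)


# On the server (as root) one would call: run_steps(RESTART_STEPS)
```

Passing a different `runner` (for example `print`) gives a dry run without touching the services.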

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and
get:
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when
calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I’m fairly certain I’ve configured
my bucket properly and granted access via the access key and secret. (These
are not part of version control in the shared repo, of course.) Will update
as we go.

FWIW, python manage.py compress fails because it can’t find the Font
Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I know you’ve
got a lot going on and don’t rely on ansible monolith deployments for your
core work, so I realize that any help you provide here is going above and
beyond. Thank you for that!

My objective is to get

ansible-playbook -i inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml

running on a freshly provisioned Ubuntu 14.04.5 LTS
(GNU/Linux 3.13.0-125-generic x86_64) droplet with 2 gigs of memory.

While I think that’s a solid goal for the whole CommCare open-source
community, I’d like to disclose that we’ve also got a client at Open
Function that wants to connect CommCare to another system using OpenFn, but
CommCare needs to be hosted on their servers due to regulatory issues.

Note that we made a couple of changes to the vagrant setup and edited some
ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files. One
significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] *****************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq', u'name':
u'commcarehq', u'is_https': False, u'host': u'165.227.172.214',
u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq', u'name':
u'commcarehq__users', u'is_https': False, u'host': u'165.227.172.214',
u'password': u'commcarehq', u'port': 5984}) => {"cache_control":
"must-revalidate", "content": "{\"error\":\"unauthorized\",\"reason\":\"You
are not a server admin.\"}\n", "content_length": "64", "content_type":
"text/plain; charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT",
"failed": true, "item": {"host": "165.227.172.214", "is_https": false,
"name": "commcarehq__users", "password": "commcarehq", "port": 5984,
"username": "commcarehq"}, "msg": "Status code was not [200]: HTTP Error
401: Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang
OTP/R16B03)", "status": 401, "url":
"http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user in "items"
has the same username and password. The failure can be stepped over, as we
don’t need to (and can’t) set up two different couchdb users with
commcarehq:commcarehq on the same box.

Issue #2&3: For both couchdb2 and redis, monit fails. After I
reboot the system and start monit manually they pass and redis is running,
but couchdb2 still shows “Execution failed”. After another system reboot,
and manually starting monit, both now show as running and being monitored.

monit status: Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ********************************************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true,
"msg": "couchdb2 process not presently configured with monit", "name":
"couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] *****************************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=36 changed=20 unreachable=0 failed=1

TASK [redis : monit] ***********************************************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true,
"msg": "redis process not presently configured with monit", "name":
"redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] *****************************************

RUNNING HANDLER [redis : restart redis] ****************************************

RUNNING HANDLER [redis : restart rsyslog] **************************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP *********************************************************************
165.227.172.214 : ok=17 changed=10 unreachable=0 failed=1

Issue 4:
TASK [touchforms : Touchforms user] ********************************************
An exception occurred during task execution. To see the full traceback,
use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! => {"changed":
false, "failed": true, "module_stderr": "Traceback (most recent call
last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line
144, in <module>\n main()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n
user.create_user()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in
create_user\n superuser=repr(self.superuser),\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n
return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n
self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\",
line 792, in wait\n self.handle_command_exit_code(exit_code)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in
handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN:
/home/cchq/www/dev/current/python_env/bin/python manage.py shell
--plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n
File \"manage.py\", line 9, in <module>\n import django\nImportError: No
module named django\n\n", "module_stdout": "Traceback (most recent call
last):\n File \"manage.py\", line 9, in <module>\n import
django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then run:

su - cchq
cd www/dev/current
source python_env/bin/activate
pip install -r requirements/requirements.txt
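
To confirm whether the virtualenv actually has Django before (or after) running the install, a small check like the following may help. It is a sketch only: the helper name is hypothetical, and it simply asks a given interpreter to import a module and looks at the exit status.

```python
import subprocess
import sys


def can_import(python_path, module):
    """Return True if `python_path -c "import <module>"` exits with status 0."""
    result = subprocess.run(
        [python_path, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Demonstrated with the current interpreter; on the server the path would be
# /home/cchq/www/dev/current/python_env/bin/python and the module "django".
print(can_import(sys.executable, "json"))
```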

At this point the whole ansible playbook succeeds, but when we visit
our IP, we get the maintenance page and see this in the nginx logs:

2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111:
Connection refused) while connecting to upstream, client: 186.106.251.211,
server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream:
"http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214",
referrer: "https://165.227.172.214/solutions/"
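
"Connection refused" on the upstream means nothing is accepting connections on port 9010. A quick way to check that from the box itself, sketched with the standard library only (the helper name is hypothetical):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# The nginx upstream is 165.227.172.214:9010; when run on the server itself,
# this checks the same port on the loopback address.
print(port_is_open("127.0.0.1", 9010))
```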

After activating the python_env, we run runserver as cchq:

./manage.py runserver 0.0.0.0:9010

  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server

At this point, we’re wondering:

  1. Why isn't the server starting on its own?
  2. How do we get it to run?

Best,
Taylor


You received this message because you are subscribed to the Google
Groups “CommCare Developers” group.
To unsubscribe from this group and stop receiving emails from it, send
an email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



I've been offline travelling, so sorry for the slow response. It's strange
that you get that error if you're using the fabric deploy script, since it
should do a bower update, but I'd check what Jenny suggested to make sure.

Re the "sudo received non-zero exit codes" messages: as long as it's only
for the 'preindex' command, that should be fine. If there are any other
errors during deploy then it won't complete. (There is also a PR to remove
those warnings: https://github.com/dimagi/commcare-hq-deploy/pull/393)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers wrote:

Hi Taylor,

About that compress error: Have you run `bower update` recently? I'd run
that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
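
Both checks can be scripted. The sketch below assumes the STATICFILES_DIRS entries are plain directory paths (Django also allows (prefix, path) tuples, which this does not handle); the helper name and the example directory list are hypothetical.

```python
import os


def find_static_file(relative_path, static_dirs):
    """Return the directories in `static_dirs` that contain `relative_path`."""
    return [
        directory
        for directory in static_dirs
        if os.path.isfile(os.path.join(directory, relative_path))
    ]


# Hypothetical value; the real list comes from STATICFILES_DIRS in settings.py.
static_dirs = ["bower_components"]
matches = find_static_file("font-awesome/less/font-awesome.less", static_dirs)
print(matches if matches else "font-awesome.less not found in any listed directory")
```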

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, taylor@openfn.org wrote:

Simon, my last update for the day:

I’ve got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3, and
with option 1 (as you can probably see from the linked photo) I’m not
actually getting the static assets I need from a CDN.

The error on my compress command is no longer in motech; it's now in
"hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles'
or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we’ve got a fork of the
ansible repo going that includes a git submodule with your commcare-deploy
repo. Our goal is to get this down to a single git clone and a few shell
commands! Would love any feedback on the directory structure you use
locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We've got the fab deploy scripts running now
(albeit with lots of warnings that sudo received non-zero exit codes*) and
finishing successfully. However, when we ssh into our box, go to the newly
created release, activate the python env and run runserver, the server
starts but throws this 500** whenever it's accessed via the web:

OfflineGenerationError: You have offline compression enabled but key
"89af02fe109c09d9c74742e99d8f3fea" is missing from offline manifest.
You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59
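
To see whether a given key made it into the offline manifest after a compress run, something like the following could be used. This is a sketch under assumptions: it treats the manifest as a JSON object keyed by hash, and the usual django-compressor location (CACHE/manifest.json under COMPRESS_ROOT) may not match this deployment, so the path is passed in explicitly. The helper name is hypothetical.

```python
import json
import os
import tempfile


def manifest_has_key(manifest_path, key):
    """Return True if the JSON manifest at `manifest_path` contains `key`."""
    with open(manifest_path) as handle:
        manifest = json.load(handle)
    return key in manifest


# Self-contained demonstration with a throwaway manifest file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "manifest.json")
    with open(path, "w") as handle:
        json.dump({"89af02fe109c09d9c74742e99d8f3fea": "placeholder"}, handle)
    print(manifest_has_key(path, "89af02fe109c09d9c74742e99d8f3fea"))
```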

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles'
or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python
/home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not
configured (logging is disabled). Please see the documentation for more
information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python
/home/cchq/www/dev/releases/2017-10-09_16.04/manage.py
preindex_everything --check'!

**Here's the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here since
    we don't do fresh installs that often
  4. Once everything is set up we deploy our code with fabric scripts
    (https://github.com/dimagi/commcare-hq-deploy) as follows:

    fab <environment> deploy

    <environment> is the name of an inventory file here:
    https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

    This also makes use of this 'environments.yml' file, which tells the
    deploy scripts which services to run where and a few other things:
    https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static file
    compression etc. and also create the supervisor files needed to run the
    servers.

We’ve recently made some improvements to our couchdb setup (you should
use couchdb2). I’ve linked them in comments on your PR.

We are about to do a whole new cluster setup so it’s likely that there
will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should already have been set up by the
    deploy_commcarehq playbook, which should execute prior to the touchforms
    playbook. Also, touchforms is only necessary if you're going to be doing
    SMS surveys.

Re the encrypted drives: we run the deploy_stack playbook with the
'after-reboot' tag, limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.
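
As a rough illustration of what that invocation might look like, the sketch below builds the argument list using ansible-playbook's --tags and --limit options. The inventory, user and vars files are copied from Taylor's original command and are assumptions here, not the setup we use; the function name is hypothetical.

```python
import shlex


def after_reboot_command(host):
    """Build an ansible-playbook call that runs only 'after-reboot' tasks on one host."""
    return [
        "ansible-playbook",
        "-i", "inventories/monolith",          # assumed: inventory from the original command
        "-u", "root",
        "-e", "@vars/dev/dev_private.yml",     # assumed: vars files from the original command
        "-e", "@vars/dev/dev_public.yml",
        "deploy_stack.yml",
        "--tags", "after-reboot",
        "--limit", host,
    ]


print(" ".join(shlex.quote(part) for part in after_reboot_command("165.227.172.214")))
```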

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to
reboot during the deployment process.

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate)
and get:
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied)
when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I’m fairly certain I’ve configured
my bucket properly and granted access via the access key and secret. (These
are not part of version control in the shared repo, of course.) Will update
as we go.

FWIW, python manage.py compress fails because it can't find the
Font Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.


Hi Simon

Yes, Jenny's advice helped us out immensely - we now have commcare up and
serving the static assets.

We are seeing what we think are errors connecting to the riak-cs instance.
I tried running ./manage.py ptop_preindex, which produces some initial
success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUGreindex_command.build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found
Not Found
The requested document was not found on this server.
mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first (note: I
have added some debugging print statements, so line numbers may be slightly
out). Does the above point to us doing something that is obviously wrong?
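
One thing that stands out is that the 404 body ends with "mochiweb+webmachine web server", which looks like a webmachine error page rather than an Elasticsearch response, so it may be worth checking which service is actually listening on the host and port configured for Elasticsearch. A rough sketch for classifying a response body (the function name and the three categories are hypothetical):

```python
import json


def describe_responder(body):
    """Roughly classify an HTTP response body by the kind of server that produced it."""
    if "mochiweb+webmachine" in body:
        return "webmachine"
    try:
        json.loads(body)
    except ValueError:
        return "unknown"
    return "json"


error_body = (
    "404 Not Found Not Found The requested document was not found "
    "on this server. mochiweb+webmachine web server"
)
print(describe_responder(error_body))
```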

Thanks in advance.

Rory

=> {"changed": >>>>>>> false, "failed": true, "module_stderr": "Traceback (most recent call >>>>>>> last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line >>>>>>> 144, in \n main()\n File >>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n >>>>>>> user.create_user()\n File >>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in >>>>>>> create_user\n superuser=repr(self.superuser),\n File >>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n >>>>>>> return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File >>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n >>>>>>> self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", >>>>>>> line 792, in wait\n self.handle_command_exit_code(exit_code)\n File >>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in >>>>>>> handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN: >>>>>>> /home/cchq/www/dev/current/python_env/bin/python manage.py shell >>>>>>> --plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n >>>>>>> File \"manage.py\", line 9, in \n import django\nImportError: No >>>>>>> module named django\n\n", "module_stdout": "Traceback (most recent call >>>>>>> last):\n File \"manage.py\", line 9, in \n import >>>>>>> django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"} >>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>> >>>>>>> Possible solution: Here, we need to SSH in and then: >>>>>>> # su - cchq >>>>>>> # cd www/dev/current >>>>>>> # source python_env/bin/activate >>>>>>> # pip install -r requirements/requirements.txt >>>>>>> >>>>>>> At this point the whole ansible playbook succeeds, but when we visit >>>>>>> our IP, we get the maintenance page and see this in the nginx logs: >>>>>>> 2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111: >>>>>>> Connection refused) 
while connecting to upstream, client: 186.106.251.211, >>>>>>> server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream: " >>>>>>> http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214", >>>>>>> referrer: "https://165.227.172.214/solutions/" >>>>>>> >>>>>>> After activating the python_env we run runserver as `cchq`: >>>>>>> ./manage.py runserver 0.0.0.0:9010 >>>>>>> >>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection >>>>>>> connection = Database.connect(**conn_params) >>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect >>>>>>> conn = _connect(dsn, connection_factory=connection_factory, **kwasync) >>>>>>> django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server >>>>>>> >>>>>>> >>>>>>> At this point, we're wondering: >>>>>>> >>>>>>> 1. Why isn't the server running itself? >>>>>>> 2. And how do we get it to run? >>>>>>> >>>>>>> Best, >>>>>>> Taylor >>>>>>> >>>>>> -- >>>>>> >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "CommCare Developers" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> -- >>> >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "CommCare Developers" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to commcare-developers+unsubscribe@googlegroups.com >>> . >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "CommCare Developers" group. 
>> To unsubscribe from this group and stop receiving emails from it, send an >> email to commcare-developers+unsubscribe@googlegroups.com . >> For more options, visit https://groups.google.com/d/optout. >> > >

That seems like the Elasticsearch address may be incorrect. This error is
happening when the command is trying to create a new index in elasticsearch.

I’d check that you’ve got your ES connection details correct in
localsettings:

  • ELASTICSEARCH_HOST
  • ELASTICSEARCH_PORT

You can test the connection using curl:

$ curl <ELASTICSEARCH_HOST>:<ELASTICSEARCH_PORT>

{
  "status" : 200,
  "name" : "Albino",
  "cluster_name" : "agrajag",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T16:45:04Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
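If curl is not handy, the same check can be sketched in Python. This is only a rough illustration, not part of the CommCare codebase: the function names are made up, and it assumes the server answers plain HTTP at "/" with a JSON banner like the one above. Anything else (for example an HTML 404 page from a different server listening on that port) is reported as "not Elasticsearch".

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def looks_like_elasticsearch(body):
    """Return True if `body` parses as a JSON banner containing version.number."""
    try:
        info = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from some other server.
        return False
    if not isinstance(info, dict):
        return False
    version = info.get("version")
    return isinstance(version, dict) and "number" in version


def probe(host, port, timeout=5):
    """Hypothetical helper: fetch http://host:port/ and inspect the reply."""
    url = "http://%s:%s/" % (host, port)
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
    except URLError:
        # Connection refused, timeout, or an HTTP error status such as 404.
        return False
    return looks_like_elasticsearch(body)
```

Calling probe() with the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT values from localsettings should return True when those settings really point at Elasticsearch.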

Simon Kelly
Director of Server Engineer | Dimagi

On 11 October 2017 at 11:31, wrote:

Hi Simon

Yes, Jenny’s advice helped us out immensely - we now have commcare up and
serving the static assets.

We are seeing what we think are errors connecting to the riak-cs instance,
and I tried running ./manage.py ptop_preindex, which produces some
initial success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUG[reindex_command](**kwargs).build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found
Not Found
The requested document was not found on this server.
mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first (note: I
have added some debugging print statements, so line numbers may be slightly
out). Does the above point to us doing something that is obviously wrong?

Thanks in advance.

Rory

On Tuesday, 10 October 2017 23:46:31 UTC+2, Simon Kelly wrote:

Been offline travelling so sorry for the slow response. Strange that you
get that error if you’re using the fabric deploy script since it should do
a bower update but I’d check what Jenny suggested to make sure.

Re the “sudo received non-zero exit codes” messages, as long as it’s
only for the ‘preindex’ command that should be fine. If there are any other
errors during deploy then it won’t complete. (also PR to remove those
warnings: https://github.com/dimagi/commcare-hq-deploy/pull/393)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers jsch...@dimagi.com wrote:

Hi Taylor,

About that compress error: Have you run bower update recently? I’d run
that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
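The "verify that the file exists" step can be scripted. The sketch below is an illustration only: find_static_file is a made-up helper, and it assumes every entry in the directory list is a plain path string (Django also allows (prefix, path) tuples in STATICFILES_DIRS, which this does not handle).

```python
import os


def find_static_file(relative_path, search_dirs):
    """Return the first existing full path for relative_path, or None.

    search_dirs is assumed to be a list of plain directory paths,
    for example the bower_components directory of the release.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Run from the release directory, find_static_file("font-awesome/less/font-awesome.less", ["bower_components"]) returning None would suggest that bower update has not pulled the package in yet.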

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I’ve got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3, and
with option 1 (as you can probably see from the linked photo) I’m not
actually getting the static assets I need from a CDN.

The error on my compress command is no longer on motech, it’s now on
"hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles'
or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we’ve got a fork of the
ansible repo going that includes a git submodule with your commcare-deploy
repo. Our goal is to get this down to a single git clone and a few shell
commands! Would love any feedback on the directory structure you use
locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We’ve got the fab deploy scripts running
now (albeit with lots of warnings: sudo received non-zero exit codes*) and
finishing successfully. However, when we ssh into our box, go to the newly
created release, activate the Python environment and run runserver, the
server starts but throws this 500** whenever it’s accessed via the web:

OfflineGenerationError: You have offline compression enabled but key
"89af02fe109c09d9c74742e99d8f3fea" is missing from offline manifest.
You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles'
or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not configured (logging is disabled). Please see the documentation for more information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check'!

**Here’s the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here
    since we don’t do fresh installs that often
  4. Once everything is set up we deploy our code with fabric scripts
    (https://github.com/dimagi/commcare-hq-deploy) as follows:

fab <environment> deploy

<environment> is the name of an inventory file here:
https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

This also makes use of this 'environments.yml' file which tells
the deploy scripts which services to run where and a few other things:
https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static file
    compression etc. and also create the supervisor files needed to run the
    servers.

We’ve recently made some improvements to our couchdb setup (you
should use couchdb2). I’ve linked them in comments on your PR.

We are about to do a whole new cluster setup so it’s likely that
there will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should already have been set up by
    the deploy_commcarehq playbook, which should execute prior to the touchforms
    playbook. Also, touchforms is only necessary if you’re going to be doing SMS
    surveys.

Re the encrypted drives: we run the deploy_stack playbook with the
'after-reboot' tag, limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.
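As a rough sketch of what that invocation might look like for the monolith setup discussed in this thread, the helper below assembles the command line. The function name is made up, and the inventory path, vars files and remote user are simply copied from the original post as assumptions. Only the --tags and --limit options come from the advice above.

```python
def after_reboot_command(host,
                         inventory="inventories/monolith",
                         user="root",
                         vars_files=("vars/dev/dev_private.yml",
                                     "vars/dev/dev_public.yml")):
    """Build an ansible-playbook argv that runs only the after-reboot tasks.

    All defaults are assumptions taken from the command in the original post.
    """
    cmd = ["ansible-playbook", "-i", inventory, "-u", user]
    for path in vars_files:
        # "-e @file" loads extra variables from a YAML file.
        cmd += ["-e", "@" + path]
    cmd += ["--tags", "after-reboot", "--limit", host, "deploy_stack.yml"]
    return cmd
```

The resulting list can be handed to subprocess.call() from the ansible directory, or joined with spaces and pasted into a shell.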

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to
reboot during the deployment process.
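After a reboot, a quick way to see which of the two services is actually listening is to try a TCP connection to each. The snippet is a minimal sketch with a made-up function name. The ports to check are an assumption: 5432 and 6432 are the stock defaults for PostgreSQL and pgbouncer, and this deployment may use different ones.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

If port_is_open("127.0.0.1", 6432) is True while port_is_open("127.0.0.1", 5432) is False, pgbouncer is up but has no database behind it, which would match the "pgbouncer cannot connect to server" error.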

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and get:
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied)
when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I’m fairly certain I’ve
configured my bucket properly and granted access via the access key and
secret. (These are not part of version control in the shared repo, of
course.) Will update as we go.

FWIW, python manage.py compress fails because it can’t find the
Font Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I know
you’ve got a lot going on and don’t rely on ansible monolith deployments
for your core work, so I realize that any help you provide here is going
above and beyond. Thank you for that!

My objective is to get
ansible-playbook -i inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml
running on a freshly provisioned Ubuntu 14.04.5
LTS (GNU/Linux 3.13.0-125-generic x86_64) droplet with 2 gigs of memory.

While I think that’s a solid goal for the whole CommCare
open-source community, I’d like to disclose that we’ve also got a client at
Open Function that wants to connect CommCare to another system using
OpenFn, but CommCare needs to be hosted on their servers due to regulatory
issues.

Note that we made a couple of changes to vagrant and edited some
ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files. One
significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] ******************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq',
u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214',
u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq',
u'name': u'commcarehq__users', u'is_https': False, u'host':
u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) =>
{"cache_control": "must-revalidate", "content":
"{\"error\":\"unauthorized\",\"reason\":\"You are not a server
admin.\"}\n", "content_length": "64", "content_type": "text/plain;
charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true,
"item": {"host": "165.227.172.214", "is_https": false, "name":
"commcarehq__users", "password": "commcarehq", "port": 5984, "username":
"commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401:
Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang
OTP/R16B03)", "status": 401, "url":
"http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user in
"items" has the same username and password. The failure can be stepped
over, as we don’t need to (and can’t) set up two different couchdb users
with commcarehq:commcarehq on the same box.

Issue #2&3: For both couchdb2 and redis, monit fails. After I
reboot the system and start monit manually they pass and redis is running,
but couchdb2 still shows "Execution failed". After another system reboot,
and manually starting monit, both now show as running and being monitored.

monit status:
Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed":
true, "msg": "couchdb2 process not presently configured with monit",
"name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] ******************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=36 changed=20 unreachable=0 failed=1

TASK [redis : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed":
true, "msg": "redis process not presently configured with monit", "name":
"redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] ******************************

RUNNING HANDLER [redis : restart redis] ******************************

RUNNING HANDLER [redis : restart rsyslog] ******************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=17 changed=10 unreachable=0 failed=1

Issue 4:
TASK [touchforms : Touchforms user] ******************************
An exception occurred during task execution. To see the full
traceback, use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! => {"changed":
false, "failed": true, "module_stderr": "Traceback (most recent call
last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\",
line 144, in <module>\n main()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n
user.create_user()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in
create_user\n superuser=repr(self.superuser),\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n
return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n
self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\",
line 792, in wait\n self.handle_command_exit_code(exit_code)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in
handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN:
/home/cchq/www/dev/current/python_env/bin/python manage.py shell
--plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n
File \"manage.py\", line 9, in <module>\n import django\nImportError: No
module named django\n\n", "module_stdout": "Traceback (most recent call
last):\n File \"manage.py\", line 9, in <module>\n import
django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:
# su - cchq
# cd www/dev/current
# source python_env/bin/activate
# pip install -r requirements/requirements.txt

At this point the whole ansible playbook succeeds, but when we
visit our IP, we get the maintenance page and see this in the nginx logs:
2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111:
Connection refused) while connecting to upstream, client: 186.106.251.211,
server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream:
"http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214",
referrer: "https://165.227.172.214/solutions/"

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server

At this point, we’re wondering:

  1. Why isn’t the server running itself?
  2. And how do we get it to run?

Best,
Taylor


You received this message because you are subscribed to the Google
Groups “CommCare Developers” group.
To unsubscribe from this group and stop receiving emails from it,
send an email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Thanks Simon.

Just to make sure I am not missing something really obvious (“missing
something really obvious” is in fact, quite an accurate summation of my
adventure so far) - the ansible scripts set up riak-cs, and so I can point
those ES connection strings at the local riak-cs instance?

Regards

Rory

>>>>>> 2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59 >>>>>> >>>>>> When running compress, we get this font-awesome package error: >>>>>> CommandError: An error occurred during rendering >>>>>> /home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html: >>>>>> 'font-awesome/less/font-awesome.less' could not be found in the >>>>>> COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles' or >>>>>> with staticfiles. >>>>>> >>>>>> Have you bumped into this before? Thanks! >>>>>> >>>>>> **The non-zero exit codes all look pretty much like this:* >>>>>> [165.227.172.214] sudo: >>>>>> /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python >>>>>> /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything >>>>>> --check >>>>>> [165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not >>>>>> configured (logging is disabled). Please see the documentation for more >>>>>> information. >>>>>> [165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG >>>>>> [165.227.172.214] out: >>>>>> >>>>>> >>>>>> Warning: sudo() received nonzero return code 1 while executing >>>>>> '/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python >>>>>> /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything >>>>>> --check'! >>>>>> >>>>>> ***Here's the full 500 error:* >>>>>> https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d >>>>>> >>>>>> >>>>>> >>>>>> On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote: >>>>>>> >>>>>>> Hi Taylor >>>>>>> >>>>>>> Our general process is as follows: >>>>>>> >>>>>>> 1. Configure blank VMs (just OS) >>>>>>> 2. Create inventory file and vars files >>>>>>> 3. Run ansible deploy - there are often a few hiccoughs here >>>>>>> since we don't do fresh installs that often >>>>>>> 4. 
Once everything is setup we deploy our code with fabric >>>>>>> scripts as follows >>>>>>> >>>>>>> fab deploy >>>>>>> >>>>>>> environment is the name of an inventory file here: >>>>>>> https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory >>>>>>> >>>>>>> This also makes use of this 'environments.yml' file which tells >>>>>>> the deploy scripts which services to run where and a few other things: >>>>>>> https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml >>>>>>> >>>>>>> 5. That deploy will checkout the latest code, do the static file >>>>>>> compression etc and also create the supervisor files needed to run the >>>>>>> servers. >>>>>>> >>>>>>> >>>>>>> We've recently made some improvements to our couchdb setup (you >>>>>>> should use couchdb2). I've linked them in comments on your PR. >>>>>>> >>>>>>> We are about to do a whole new cluster setup so it's likely that >>>>>>> there will be some more changes coming soon. >>>>>>> >>>>>>> Re the issues: >>>>>>> 1. Switch to using couchdb2 >>>>>>> 2&3. Resolved in latest master + this PR ( >>>>>>> https://github.com/dimagi/commcarehq-ansible/pull/971) >>>>>>> 4. The virtual env should have already be setup by >>>>>>> the deploy_commcarehq playbook which should execute prior to the touchforms >>>>>>> playbook. Also touchforms is only necessary if you're going to be doing sms >>>>>>> surveys. >>>>>>> >>>>>>> Re the encrypted drives. We run the deploy_stack playbook with >>>>>>> 'after-reboot' tag limited to the rebooted host. This should remount the >>>>>>> encrypted drive and perform a few other actions. >>>>>>> >>>>>>> I hope that helps and thanks for the feedback! >>>>>>> >>>>>>> Simon Kelly >>>>>>> Director of Server Engineer | Dimagi >>>>>>> >>>>>>> On 5 October 2017 at 17:36, wrote: >>>>>>> >>>>>>>> Update: Rory found that one issue lay in the encrypted fs stuff. 
>>>>>>>> ran: >>>>>>>> >>>>>>>> /etc/init.d/postgresql start >>>>>>>> /etc/init.d/pgbouncer stop >>>>>>>> /etc/init.d/pgbouncer start >>>>>>>> >>>>>>>> >>>>>>>> and we can run the server. This was probably due to us having to >>>>>>>> reboot during the deployment process. >>>>>>>> >>>>>>>> We run migrations (*CCHQ_IS_FRESH_INSTALL=1 python manage.py >>>>>>>> migrate) *and get: >>>>>>>> File >>>>>>>> "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", >>>>>>>> line 599, in _make_api_call >>>>>>>> raise error_class(parsed_response, operation_name) >>>>>>>> botocore.exceptions.ClientError: An error occurred (AccessDenied) >>>>>>>> when calling the ListObjects operation: Access Denied >>>>>>>> >>>>>>>> This appears to be an S3 issue, but I'm fairly certain I've >>>>>>>> configured my bucket properly and granted access via the access key and >>>>>>>> secret. (These are not part of version control in the shared repo, of >>>>>>>> course.) Will update as we go. >>>>>>>> >>>>>>>> FWIW, *python manage.py compress* fails because it can't find the >>>>>>>> Font Awesome less file: >>>>>>>> CommandError: An error occurred during rendering >>>>>>>> /home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html: >>>>>>>> 'font-awesome/less/font-awesome.less' could not be found in the >>>>>>>> COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles' or >>>>>>>> with staticfiles. >>>>>>>> >>>>>>>> >>>>>>>> On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote: >>>>>>>>> >>>>>>>>> Hey guys, >>>>>>>>> >>>>>>>>> Hope all is well. Let me preface this with a thank you—I know >>>>>>>>> you've got a lot going on and don't rely on ansible monolith deployments >>>>>>>>> for your core work, so I realize that any help you provide here is going >>>>>>>>> above and beyond. Thank you for that! 
>>>>>>>>> >>>>>>>>> My objective is to get *ansible-playbook -i inventories/monolith >>>>>>>>> -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' >>>>>>>>> deploy_stack.yml* running on a freshly provisioned Ubuntu 14.04.5 >>>>>>>>> LTS (GNU/Linux 3.13.0-125-generic x86_64) droplet with 2 gigs of memory. >>>>>>>>> >>>>>>>>> While I think that's a solid goal for the whole CommCare >>>>>>>>> open-source community, I'd like to disclose that we've also got a client at >>>>>>>>> Open Function that wants to connect CommCare to another system using >>>>>>>>> OpenFn, but CommCare needs to be hosted on their servers due to regulatory >>>>>>>>> issues. >>>>>>>>> >>>>>>>>> Note that we made a couple of changes vagrant and edited some >>>>>>>>> ansible scripts. You can see this work here: >>>>>>>>> https://github.com/rorymckinley/commcare-sandbox/pull/1/files. >>>>>>>>> One significant change is that we are running the vagrant stuff as root. >>>>>>>>> >>>>>>>>> To the issues: >>>>>>>>> >>>>>>>>> *Issue #1:* >>>>>>>>> TASK [couchdb : Set CouchDB username and password] >>>>>>>>> ***************************** >>>>>>>>> ok: [165.227.172.214] => (item={u'username': u'commcarehq', >>>>>>>>> u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214', >>>>>>>>> u'password': u'commcarehq', u'port': 5984}) >>>>>>>>> failed: [165.227.172.214] (item={u'username': u'commcarehq', >>>>>>>>> u'name': u'commcarehq__users', u'is_https': False, u'host': >>>>>>>>> u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) => >>>>>>>>> {"cache_control": "must-revalidate", "content": >>>>>>>>> "{\"error\":\"unauthorized\",\"reason\":\"You are not a server >>>>>>>>> admin.\"}\n", "content_length": "64", "content_type": "text/plain; >>>>>>>>> charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true, >>>>>>>>> "item": {"host": "165.227.172.214", "is_https": false, "name": >>>>>>>>> "commcarehq__users", "password": "commcarehq", "port": 5984, 
"username": >>>>>>>>> "commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401: >>>>>>>>> Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang >>>>>>>>> OTP/R16B03)", "status": 401, "url": " >>>>>>>>> http://165.227.172.214:5984/_config/admins/commcarehq"} >>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>> >>>>>>>>> PLAY RECAP >>>>>>>>> ********************************************************************* >>>>>>>>> 165.227.172.214 : ok=135 changed=90 unreachable=0 >>>>>>>>> failed=1 >>>>>>>>> >>>>>>>>> *Possible solution 1:* This task runs twice, but each user in >>>>>>>>> "items" has the same username and password. The failure can be stepped >>>>>>>>> over, as we don't need to (and can't) set up two different couchdb users >>>>>>>>> with commcarehq:commcarehq on the same box. >>>>>>>>> >>>>>>>>> *Issue #2&3: *For both couchdb2 and redis, monit fails. After I >>>>>>>>> reboot the system and start monit manually they pass and redis is running, >>>>>>>>> but couchdb2 still shows "Execution failed". After another system reboot, >>>>>>>>> and manually starting monit, both now show as running and being monitored. >>>>>>>>> >>>>>>>>> monit status: Process 'couchdb2' >>>>>>>>> status Execution failed >>>>>>>>> monitoring status Monitored >>>>>>>>> data collected Thu, 05 Oct 2017 11:59:49 >>>>>>>>> >>>>>>>>> TASK [*couchdb2 : monit*] >>>>>>>>> ******************************************************** >>>>>>>>> fatal: [165.227.172.214]: FAILED! 
=> {"changed": false, "failed": >>>>>>>>> true, "msg": "couchdb2 process not presently configured with monit", >>>>>>>>> "name": "couchdb2", "state": "monitored"} >>>>>>>>> >>>>>>>>> RUNNING HANDLER [monit : reload monit] >>>>>>>>> ***************************************** >>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>> >>>>>>>>> PLAY RECAP >>>>>>>>> ********************************************************************* >>>>>>>>> 165.227.172.214 : ok=36 changed=20 unreachable=0 >>>>>>>>> failed=1 >>>>>>>>> >>>>>>>>> TASK [*redis : monit*] >>>>>>>>> *********************************************************** >>>>>>>>> fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": >>>>>>>>> true, "msg": "redis process not presently configured with monit", "name": >>>>>>>>> "redis", "state": "monitored"} >>>>>>>>> >>>>>>>>> RUNNING HANDLER [monit : reload monit] >>>>>>>>> ***************************************** >>>>>>>>> >>>>>>>>> RUNNING HANDLER [redis : restart redis] >>>>>>>>> **************************************** >>>>>>>>> >>>>>>>>> RUNNING HANDLER [redis : restart rsyslog] >>>>>>>>> ************************************** >>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>> >>>>>>>>> PLAY RECAP >>>>>>>>> ********************************************************************* >>>>>>>>> 165.227.172.214 : ok=17 changed=10 unreachable=0 >>>>>>>>> failed=1 >>>>>>>>> >>>>>>>>> *Issue 4:* >>>>>>>>> TASK [touchforms : Touchforms user] >>>>>>>>> ******************************************** >>>>>>>>> An exception occurred during task execution. To see the full >>>>>>>>> traceback, use -vvv. The error was: ImportError: No module named django >>>>>>>>> fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! 
=> >>>>>>>>> {"changed": false, "failed": true, "module_stderr": "Traceback (most recent >>>>>>>>> call last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", >>>>>>>>> line 144, in \n main()\n File >>>>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n >>>>>>>>> user.create_user()\n File >>>>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in >>>>>>>>> create_user\n superuser=repr(self.superuser),\n File >>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n >>>>>>>>> return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File >>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n >>>>>>>>> self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", >>>>>>>>> line 792, in wait\n self.handle_command_exit_code(exit_code)\n File >>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in >>>>>>>>> handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN: >>>>>>>>> /home/cchq/www/dev/current/python_env/bin/python manage.py shell >>>>>>>>> --plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n >>>>>>>>> File \"manage.py\", line 9, in \n import django\nImportError: No >>>>>>>>> module named django\n\n", "module_stdout": "Traceback (most recent call >>>>>>>>> last):\n File \"manage.py\", line 9, in \n import >>>>>>>>> django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"} >>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>> >>>>>>>>> Possible solution: Here, we need to SSH in and then: >>>>>>>>> # su - cchq >>>>>>>>> # cd www/dev/current >>>>>>>>> # source python_env/bin/activate >>>>>>>>> # pip install -r requirements/requirements.txt >>>>>>>>> >>>>>>>>> At this point the whole ansible playbook succeeds, but when we >>>>>>>>> visit our IP, we get the maintenance page and see this in the nginx logs: >>>>>>>>> 2017/10/05 13:56:16 [error] 
1064#1064: *18 connect() failed (111: >>>>>>>>> Connection refused) while connecting to upstream, client: 186.106.251.211, >>>>>>>>> server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream: " >>>>>>>>> http://165.227.172.214:9010/favicon.ico", host: >>>>>>>>> "165.227.172.214", referrer: "https://165.227.172.214/solutions/" >>>>>>>>> >>>>>>>>> After activating the python_env we run runserver as `cchq`: >>>>>>>>> ./manage.py runserver 0.0.0.0:9010 >>>>>>>>> >>>>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection >>>>>>>>> connection = Database.connect(**conn_params) >>>>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect >>>>>>>>> conn = _connect(dsn, connection_factory=connection_factory, **kwasync) >>>>>>>>> django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server >>>>>>>>> >>>>>>>>> >>>>>>>>> At this point, we're wondering: >>>>>>>>> >>>>>>>>> 1. Why isn't the server running itself? >>>>>>>>> 2. And how do we get it to run? >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Taylor >>>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> --- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "CommCare Developers" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>> >>>>>>> >>>>>>> -- >>>>> >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "CommCare Developers" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to commcare-developers+unsubscribe@googlegroups.com. >>>>> For more options, visit https://groups.google.com/d/optout. 
>>>>> >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "CommCare Developers" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to commcare-developers+unsubscribe@googlegroups.com. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "CommCare Developers" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to commcare-developers+unsubscribe@googlegroups.com . >> For more options, visit https://groups.google.com/d/optout. >> > >

Hey

So riak-cs and elasticsearch are completely different systems. You can
think of Riak-CS as an S3 service. Elasticsearch is a distributed search
index.

In localsettings.py the settings for Elasticsearch are the ones I mentioned
before. For Riak the settings are:

S3_BLOB_DB_SETTINGS = {
    "url": "http://localhost:9980/",
    "access_key": "admin-key",
    "secret_key": "admin-secret",
    "config": {"connect_timeout": 3, "read_timeout": 5},
}

Note that if you are just running a monolith then it’s not necessary to
have riak at all since you can just use the local filesystem. If you want
to go that route then you should just remove the ‘riak-cs’ group from your
inventory file completely. That should result in the above settings being
removed from your localsettings file, which will cause CommCare HQ to switch
to using the filesystem to store binary objects (e.g. form xml).

You should also then set shared_drive_enabled to ‘false’ in your ansible
vars file since you don’t need an NFS drive for just one machine.
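
To make the fallback concrete, here is a purely illustrative Python sketch of the rule described above: if S3_BLOB_DB_SETTINGS is present the blob store is the S3-style one (Riak CS), otherwise it is the filesystem. Note that blob_backend is a hypothetical helper, not CommCare HQ's actual code, and it takes a plain dict standing in for the values in localsettings.py.

```python
def blob_backend(localsettings):
    """Name the blob storage implied by a dict of localsettings values.

    Hypothetical illustration only; CommCare HQ's real selection logic
    lives in its own code and may differ in detail.
    """
    # A missing or empty S3_BLOB_DB_SETTINGS means "no S3-style store configured".
    if localsettings.get("S3_BLOB_DB_SETTINGS"):
        return "s3"
    return "filesystem"
```

So with the ‘riak-cs’ group removed from the inventory (and the settings therefore gone), the sketch returns "filesystem".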

Sorry for the complexities here and the lack of docs.

Simon Kelly
Director of Server Engineer | Dimagi

On 12 October 2017 at 14:01, wrote:

Thanks Simon.

Just to make sure I am not missing something really obvious (“missing
something really obvious” is in fact, quite an accurate summation of my
adventure so far) - the ansible scripts set up riak-cs, and so I can point
those ES connection strings at the local riak-cs instance?

Regards

Rory

On Wednesday, 11 October 2017 20:09:47 UTC+2, Simon Kelly wrote:

That seems like the Elasticsearch address may be incorrect. This error is
happening when the command is trying to create a new index in elasticsearch.

I’d check that you’ve got your ES connection details correct in
localsettings:

  • ELASTICSEARCH_HOST
  • ELASTICSEARCH_PORT

You can test the connection using curl:

$ curl :

{
  "status" : 200,
  "name" : "Albino",
  "cluster_name" : "agrajag",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T16:45:04Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
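
If it helps, here is a rough Python equivalent of that curl check. It is only a sketch: looks_like_elasticsearch and check_es are hypothetical helper names (not part of CommCare HQ), it uses just the standard library, and it is written for Python 3 rather than the Python 2.7 environment in the tracebacks. The idea is to tell the JSON banner above apart from anything else that might be answering on that host and port, such as the HTML 404 page from a "mochiweb+webmachine web server" quoted in this thread.

```python
import json
from urllib.request import urlopen


def looks_like_elasticsearch(body):
    """Return True if `body` resembles the JSON banner Elasticsearch serves at its root URL."""
    try:
        info = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from some other server.
        return False
    if not isinstance(info, dict):
        return False
    return "version" in info and info.get("tagline") == "You Know, for Search"


def check_es(host, port, timeout=5):
    """Fetch http://host:port/ and report whether the reply looks like Elasticsearch."""
    try:
        with urlopen("http://%s:%s/" % (host, port), timeout=timeout) as resp:
            return looks_like_elasticsearch(resp.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts and HTTP error statuses (URLError/HTTPError).
        return False
```

Calling check_es with the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT values from localsettings should return True when those settings really point at Elasticsearch.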

Simon Kelly
Director of Server Engineer | Dimagi

On 11 October 2017 at 11:31, rorymc...@capefox.co wrote:

Hi Simon

Yes, Jenny’s advice helped us out immensely - we now have commcare up
and serving the static assets.

We are seeing what we think are errors connecting to the riak-cs
instance - and I tried running ./manage.py ptop_preindex which produces
some initial success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUG[reindex_command].build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found

Not Found

The requested document was not found on this server.

mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first (note:
I have added some debugging print statements, so line numbers may be
slightly out). Does the above point to us doing something that is obviously
wrong?

Thanks in advance.

Rory

On Tuesday, 10 October 2017 23:46:31 UTC+2, Simon Kelly wrote:

Been offline travelling so sorry for the slow response. Strange that
you get that error if you’re using the fabric deploy script since it should
do a bower update but I’d check what Jenny suggested to make sure.

Re the “sudo received non-zero exit codes” messages, as long as it’s
only for the ‘preindex’ command that should be fine. If there are any other
errors during deploy then it won’t complete. (also PR to remove those
warnings: https://github.com/dimagi/commcare-hq-deploy/pull/393)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers jsch...@dimagi.com wrote:

Hi Taylor,

About that compress error: Have you run bower update recently? I’d
run that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
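
For what it's worth, both checks can be scripted. The following is a minimal sketch, assuming Python 3 and a hypothetical helper named find_less_file; it simply looks for font-awesome/less/font-awesome.less under each directory you pass in (for example the entries of STATICFILES_DIRS, or the bower_components directory itself). It does not reproduce how Django's staticfiles finders resolve prefixes, so treat a miss as a hint rather than proof.

```python
import os

# Relative path that the compress error says cannot be found.
LESS_FILE = os.path.join("font-awesome", "less", "font-awesome.less")


def find_less_file(staticfiles_dirs, relative_path=LESS_FILE):
    """Return the first existing full path to `relative_path` under the given dirs, or None."""
    for entry in staticfiles_dirs:
        # Django also accepts (prefix, path) pairs in STATICFILES_DIRS;
        # this sketch ignores the prefix and only looks at the path part.
        base = entry[1] if isinstance(entry, (tuple, list)) else entry
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

If this returns None for every directory after a bower update, the file is missing on disk and compress will keep failing with the same message.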

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I’ve got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3,
and with option 1 (as you can probably see from the linked photo) I’m not
actually getting the static assets I need from a CDN.

The error on my compress command is no longer on motech, it’s now
on “hqadmin”:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles'
or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we’ve got a fork of the
ansible repo going that includes a git submodule with your commcare-deploy
repo. Our goal is to get this down to a single git clone and a few shell
commands! Would love any feedback on the directory structure you use
locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We’ve got the fab deploy scripts running
now (albeit with lots of warnings: sudo received non-zero exit codes*) and
finishing successfully. However, when we ssh into our box, go to the newly
created release, activate the python_env and run runserver, the server
starts but throws this 500** whenever it’s accessed via the web:

OfflineGenerationError: You have offline compression enabled but key
"89af02fe109c09d9c74742e99d8f3fea" is missing from offline
manifest. You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles'
or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not configured (logging is disabled). Please see the documentation for more information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python
/home/cchq/www/dev/releases/2017-10-09_16.04/manage.py
preindex_everything --check'!

**Here’s the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here
    since we don’t do fresh installs that often
  4. Once everything is setup we deploy our code with fabric
    scripts https://github.com/dimagi/commcare-hq-deploy as
    follows

fab deploy

environment is the name of an inventory file here:
https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

This also makes use of this ‘environments.yml’ file which tells
the deploy scripts which services to run where and a few other things:
https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static
    file compression etc. and also create the supervisor files needed to run the
    servers.

We’ve recently made some improvements to our couchdb setup (you
should use couchdb2). I’ve linked them in comments on your PR.

We are about to do a whole new cluster setup so it’s likely that
there will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by
    the deploy_commcarehq playbook, which should execute prior to the touchforms
    playbook. Also touchforms is only necessary if you’re going to be doing sms
    surveys.

Re the encrypted drives: we run the deploy_stack playbook with the
‘after-reboot’ tag, limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs stuff.
ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to
reboot during the deployment process.

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and get:
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied)
when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I’m fairly certain I’ve
configured my bucket properly and granted access via the access key and
secret. (These are not part of version control in the shared repo, of
course.) Will update as we go.

FWIW, python manage.py compress fails because it can’t find the
Font Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I know
you’ve got a lot going on and don’t rely on ansible monolith deployments
for your core work, so I realize that any help you provide here is going
above and beyond. Thank you for that!

My objective is to get ansible-playbook -i inventories/monolith
-u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml'
deploy_stack.yml running on a freshly provisioned Ubuntu
14.04.5 LTS (GNU/Linux 3.13.0-125-generic x86_64) droplet with 2 gigs of
memory.

While I think that’s a solid goal for the whole CommCare
open-source community, I’d like to disclose that we’ve also got a client at
Open Function that wants to connect CommCare to another system using
OpenFn, but CommCare needs to be hosted on their servers due to regulatory
issues.

Note that we made a couple of changes to vagrant and edited some
ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files.
One significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] *****************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq',
u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214',
u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq',
u'name': u'commcarehq__users', u'is_https': False, u'host':
u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) =>
{"cache_control": "must-revalidate", "content":
"{\"error\":\"unauthorized\",\"reason\":\"You are not a server
admin.\"}\n", "content_length": "64", "content_type": "text/plain;
charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true,
"item": {"host": "165.227.172.214", "is_https": false, "name":
"commcarehq__users", "password": "commcarehq", "port": 5984, "username":
"commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401:
Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang
OTP/R16B03)", "status": 401, "url":
"http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user in
"items" has the same username and password. The failure can be stepped
over, as we don’t need to (and can’t) set up two different couchdb users
with commcarehq:commcarehq on the same box.
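
To illustrate the point about the duplicate credentials, here is a minimal Python sketch (not the playbook's actual logic) that collapses the loop items to one entry per distinct admin account. The `unique_admins` helper is hypothetical; the item dictionaries are copied from the task output above.

```python
def unique_admins(items):
    """Return one (host, port, username, password) tuple per distinct admin."""
    seen = set()
    admins = []
    for item in items:
        key = (item["host"], item["port"], item["username"], item["password"])
        if key not in seen:
            seen.add(key)
            admins.append(key)
    return admins


# Items as they appear in the failing task's output.
couch_items = [
    {"username": "commcarehq", "name": "commcarehq", "is_https": False,
     "host": "165.227.172.214", "password": "commcarehq", "port": 5984},
    {"username": "commcarehq", "name": "commcarehq__users", "is_https": False,
     "host": "165.227.172.214", "password": "commcarehq", "port": 5984},
]

admins = unique_admins(couch_items)
print(len(admins))
```

Both items point at the same account, so only one admin needs creating. The second request is presumably rejected because the server already has an admin by then and the request is sent without credentials.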

Issue #2&3: For both couchdb2 and redis, the monit task fails. After I
reboot the system and start monit manually, both tasks pass and redis is
running, but couchdb2 still shows "Execution failed". After another system
reboot and another manual start of monit, both show as running and monitored.

monit status: Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ******************************


fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed":
true, "msg": "couchdb2 process not presently configured with monit",
"name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]


to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************


165.227.172.214 : ok=36 changed=20 unreachable=0
failed=1

TASK [redis : monit] ******************************


fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed":
true, "msg": "redis process not presently configured with monit", "name":
"redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]


RUNNING HANDLER [redis : restart redis]


RUNNING HANDLER [redis : restart rsyslog]


to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************


165.227.172.214 : ok=17 changed=10 unreachable=0
failed=1

Issue 4:
TASK [touchforms : Touchforms user] ******************************


An exception occurred during task execution. To see the full
traceback, use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! =>
{"changed": false, "failed": true, "module_stderr": "Traceback (most recent
call last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\",
line 144, in \n main()\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\",
line 125, in main\n user.create_user()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84,
in create_user\n superuser=repr(self.superuser),\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in
__call__\n return RunningCommand(cmd, call_args, stdin, stdout,
stderr)\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\",
line 774, in __init__\n self.wait()\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 792, in
wait\n self.handle_command_exit_code(exit_code)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in
handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN:
/home/cchq/www/dev/current/python_env/bin/python manage.py shell
--plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n
File \"manage.py\", line 9, in \n import django\nImportError: No
module named django\n\n", "module_stdout": "Traceback (most recent call
last):\n File \"manage.py\", line 9, in \n import
django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:

su - cchq
cd www/dev/current
source python_env/bin/activate
pip install -r requirements/requirements.txt
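
A quick way to confirm the fix took effect is to ask the virtualenv's interpreter whether it can import django, which is the exact import that failed in the traceback. The sketch below is only an illustration: `module_importable` is a hypothetical helper, and it assumes Python 3 is available to run the check itself (the interpreter being tested can be any version).

```python
import subprocess
import sys


def module_importable(python_path, module_name):
    """Return True if `python_path -c "import module_name"` exits with 0."""
    result = subprocess.run(
        [python_path, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # On the server this would be the virtualenv interpreter from the
    # traceback, /home/cchq/www/dev/current/python_env/bin/python.
    # sys.executable is used here only so the sketch runs anywhere.
    print(module_importable(sys.executable, "json"))
```

On the droplet, the call would be `module_importable("/home/cchq/www/dev/current/python_env/bin/python", "django")`, which should return True once the requirements are installed.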

At this point the whole ansible playbook succeeds, but when we
visit our IP, we get the maintenance page and see this in the nginx logs:
2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111:
Connection refused) while connecting to upstream, client: 186.106.251.211,
server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream:
"http://165.227.172.214:9010/favicon.ico", host:
"165.227.172.214", referrer: "https://165.227.172.214/solutions/"
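
"Connection refused" from nginx means nothing was accepting connections on the upstream port (9010 in the log above). A small Python sketch for checking that directly, with `port_is_open` as a hypothetical helper name:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9010 is the upstream port from the nginx log; run this on the server.
    print(port_is_open("127.0.0.1", 9010))
```

If this prints False on the server, the Django process is not listening on that port, which matches the error nginx reports.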

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server

At this point, we’re wondering:

  1. Why isn’t the server running itself?
  2. And how do we get it to run?

Best,
Taylor


You received this message because you are subscribed to the Google
Groups “CommCare Developers” group.
To unsubscribe from this group and stop receiving emails from it,
send an email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



D'oh! Thanks Simon, no this is totally my fault - at some point in the
process my brain conflated elasticsearch and S3, and then never let go :frowning: -
I am not sure why - old age I guess ;).

Thanks for the tips - we will definitely factor them in.

R


👍

On 13 Oct 2017 00:56, rorymc...@capefox.co wrote:

D'oh! Thanks Simon, no this is totally my fault - at some point in the
process my brain conflated elasticsearch and S3, and then never let go :( -
I am not sure why - old age I guess ;).

Thanks for the tips - we will definitely factor them in.

R

On Thursday, 12 October 2017 21:30:09 UTC+2, Simon Kelly wrote:

Hey

So riak-cs and elasticsearch are completely different systems. You can
think of Riak-CS as an S3 service. Elasticsearch is a distributed search
index.

In localsettings.py the settings for Elasticsearch are the ones I
mentioned before. For Riak the settings are:

S3_BLOB_DB_SETTINGS = {
"url": "http://localhost:9980/",
"access_key": "admin-key",
"secret_key": "admin-secret",
"config": {"connect_timeout": 3, "read_timeout": 5},
}

Note that if you are just running a monolith then it's not necessary to
have riak at all since you can just use the local filesystem. If you want
to go that route then you should just remove the 'riak-cs' group from your
inventory file completely. That should result in the above settings being
removed from your localsettings file, which will cause CommCare HQ to switch
to using the filesystem to store binary objects (e.g. form xml).

You should also then set shared_drive_enabled to 'false' in your
ansible vars file since you don't need an NFS drive for just one machine.
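To make the fallback above concrete, here is a minimal sketch in Python. It only mirrors the behaviour described in this thread and is not CommCare HQ's actual code: the helper name pick_blob_backend and the plain-dict settings are assumptions for illustration, and only S3_BLOB_DB_SETTINGS comes from localsettings.

```python
def pick_blob_backend(localsettings):
    """Return (backend, url) implied by a localsettings-style dict.

    Hypothetical helper for illustration only; it mirrors the behaviour
    described in the thread, not CommCare HQ's real implementation.
    """
    s3_settings = localsettings.get("S3_BLOB_DB_SETTINGS")
    if s3_settings:
        # Riak CS (or another S3-compatible service) is configured.
        return "s3", s3_settings["url"]
    # No S3 settings present: fall back to the local filesystem.
    return "filesystem", None


with_riak = {
    "S3_BLOB_DB_SETTINGS": {
        "url": "http://localhost:9980/",
        "access_key": "admin-key",
        "secret_key": "admin-secret",
        "config": {"connect_timeout": 3, "read_timeout": 5},
    }
}

print(pick_blob_backend(with_riak))
print(pick_blob_backend({}))
```

On a single-machine monolith the second case is the one you would expect to hit once the 'riak-cs' group is gone from the inventory.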

Sorry for the complexities here and the lack of docs.

Simon Kelly
Director of Server Engineer | Dimagi

On 12 October 2017 at 14:01, rorymc...@capefox.co wrote:

Thanks Simon.

Just to make sure I am not missing something really obvious ("missing
something really obvious" is in fact, quite an accurate summation of my
adventure so far) - the ansible scripts set up riak-cs, and so I can point
those ES connection strings at the local riak-cs instance?

Regards

Rory

On Wednesday, 11 October 2017 20:09:47 UTC+2, Simon Kelly wrote:

That seems like the Elasticsearch address may be incorrect. This error
is happening when the command is trying to create a new index in
elasticsearch.

I'd check that you've got your ES connection details correct in
localsettings:

  • ELASTICSEARCH_HOST
  • ELASTICSEARCH_PORT

You can test the connection using curl:

$ curl <host>:<port>

{
"status" : 200,
"name" : "Albino",
"cluster_name" : "agrajag",
"version" : {
"number" : "1.7.4",
"build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
"build_timestamp" : "2015-12-15T16:45:04Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
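If you would rather script that check, a rough sketch follows. It assumes you have already fetched the response body from the host and port in localsettings; the function name looks_like_elasticsearch is made up for this example. It only tests whether the body has the shape of the JSON above, so an HTML error page from some other service fails the check.

```python
import json


def looks_like_elasticsearch(body):
    """Return True if `body` resembles the root response of an ES node.

    Hypothetical helper: it checks for the keys shown in the sample
    response above and nothing more.
    """
    try:
        info = json.loads(body)
    except ValueError:
        # HTML error pages from other services are not valid JSON.
        return False
    if not isinstance(info, dict):
        return False
    version = info.get("version")
    return (
        isinstance(version, dict)
        and "lucene_version" in version
        and "tagline" in info
    )


ES_SAMPLE = """
{
  "status" : 200,
  "name" : "Albino",
  "cluster_name" : "agrajag",
  "version" : {
    "number" : "1.7.4",
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
"""

print(looks_like_elasticsearch(ES_SAMPLE))
print(looks_like_elasticsearch("<html>404 Not Found</html>"))
```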

Simon Kelly
Director of Server Engineer | Dimagi

On 11 October 2017 at 11:31, rorymc...@capefox.co wrote:

Hi Simon

Yes, Jenny's advice helped us out immensely - we now have commcare up
and serving the static assets.

We are seeing what we think are errors connecting to the riak-cs
instance - and I tried running ./manage.py ptop_preindex which produces
some initial success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUG[reindex_command](**kwargs).build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found

Not Found

The requested document was not found on this server.

mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first
(note: I have added some debugging print statements, so line numbers may be
slightly out). Does the above point to us doing something that is obviously
wrong?

Thanks in advance.

Rory

On Tuesday, 10 October 2017 23:46:31 UTC+2, Simon Kelly wrote:

Been offline travelling so sorry for the slow response. Strange that
you get that error if you're using the fabric deploy script since it should
do a bower update but I'd check what Jenny suggested to make sure.

Re the "sudo received non-zero exit codes" messages, as long as it's
only for the 'preindex' command that should be fine. If there are any other
errors during deploy then it won't complete. (Also, a PR to remove those
warnings: https://github.com/dimagi/commcare-hq-deploy/pull/393)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers jsch...@dimagi.com wrote:

Hi Taylor,

About that compress error: Have you run bower update recently? I'd
run that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
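The two checks above can be combined into one small script. This is a sketch under assumptions: find_static_file is a hypothetical helper, not part of Django or django-compressor, and the demo builds a throwaway directory rather than reading your real settings. It walks a STATICFILES_DIRS-style list (plain paths or (prefix, path) pairs, both of which Django accepts) and reports where a file such as font-awesome/less/font-awesome.less would be found.

```python
import os
import tempfile


def find_static_file(relative_path, static_dirs):
    """Return the first on-disk match for relative_path, or None.

    Hypothetical helper: entries may be plain directory paths or
    (prefix, path) pairs, as allowed in Django's STATICFILES_DIRS.
    """
    for entry in static_dirs:
        if isinstance(entry, (list, tuple)):
            prefix, root = entry
            if not relative_path.startswith(prefix + "/"):
                continue
            candidate = os.path.join(root, relative_path[len(prefix) + 1:])
        else:
            candidate = os.path.join(entry, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo with a temporary stand-in for bower_components.
demo_root = tempfile.mkdtemp()
less_dir = os.path.join(demo_root, "font-awesome", "less")
os.makedirs(less_dir)
open(os.path.join(less_dir, "font-awesome.less"), "w").close()

print(find_static_file("font-awesome/less/font-awesome.less", [demo_root]))
print(find_static_file("font-awesome/less/font-awesome.less", []))
```

If this returns None when pointed at your real directories, compress will most likely fail with the same "could not be found" error.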

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I've got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option 3,
and with option 1 (as you can probably see from the linked photo) I'm not
actually getting the static assets I need from a CDN.

The error on my compress command is no longer on motech, it's
now on "hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles'
or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we've got a fork of
the ansible repo going that includes a git submodule with your
commcare-deploy repo. Our goal is to get this down to a single git clone
and a few shell commands! Would love any feedback on the directory
structure you use locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We've got the fab deploy scripts
running now (albeit with lots of warnings: sudo received non-zero exit
codes*) and finishing successfully. However, when we ssh into our box, go to
the newly created release, activate the python env and run runserver, we get
a server to start but it throws this 500** whenever it's accessed via the
web:

OfflineGenerationError: You have offline compression enabled but
key "89af02fe109c09d9c74742e99d8f3fea" is missing from offline
manifest. You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500
59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles'
or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not
configured (logging is disabled). Please see the documentation for more
information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python
/home/cchq/www/dev/releases/2017-10-09_16.04/manage.py
preindex_everything --check'!

**Here's the full 500 error: https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs here
    since we don't do fresh installs that often
  4. Once everything is set up we deploy our code with fabric
    scripts (https://github.com/dimagi/commcare-hq-deploy) as
    follows:

    fab <environment> deploy

    <environment> is the name of an inventory file here:
    https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

    This also makes use of this 'environments.yml' file which
    tells the deploy scripts which services to run where and a few other
    things: https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml
  5. That deploy will check out the latest code, do the static
    file compression etc. and also create the supervisor files needed to run the
    servers.

We've recently made some improvements to our couchdb setup (you
should use couchdb2). I've linked them in comments on your PR.

We are about to do a whole new cluster setup so it's likely that
there will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR
    (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by
    the deploy_commcarehq playbook which should execute prior to the touchforms
    playbook. Also touchforms is only necessary if you're going to be doing sms
    surveys.

Re the encrypted drives. We run the deploy_stack playbook with
'after-reboot' tag limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs stuff.
We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having to
reboot during the deployment process.
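For anyone hitting the same "pgbouncer cannot connect to server" error after a reboot, a quick way to see which of the two services is actually listening is a plain TCP check. This is only a sketch: the ports below are the usual defaults (5432 for PostgreSQL, 6432 for pgbouncer) and are an assumption here, so check them against your own configuration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# Assumed default ports; your ansible vars may configure different ones.
SERVICES = {"postgresql": 5432, "pgbouncer": 6432}

for name, port in sorted(SERVICES.items()):
    state = "up" if port_is_open("127.0.0.1", port) else "DOWN"
    print("%s (port %d): %s" % (name, port, state))
```

If pgbouncer is up but postgresql is down, that matches the situation described above, where starting postgresql and restarting pgbouncer fixed things.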

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and get:
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I'm fairly certain I've
configured my bucket properly and granted access via the access key and
secret. (These are not part of version control in the shared repo, of
course.) Will update as we go.

FWIW, python manage.py compress fails because it can't find
the Font Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I know
you've got a lot going on and don't rely on ansible monolith deployments
for your core work, so I realize that any help you provide here is going
above and beyond. Thank you for that!

My objective is to get

ansible-playbook -i inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml

running on a freshly provisioned Ubuntu 14.04.5 LTS (GNU/Linux
3.13.0-125-generic x86_64) droplet with 2 gigs of memory.

While I think that's a solid goal for the whole CommCare
open-source community, I'd like to disclose that we've also got a client at
Open Function that wants to connect CommCare to another system using
OpenFn, but CommCare needs to be hosted on their servers due to regulatory
issues.

Note that we made a couple of changes to vagrant and edited some
ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files.
One significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password]


ok: [165.227.172.214] => (item={u'username': u'commcarehq', u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq', u'name': u'commcarehq__users', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) => {"cache_control": "must-revalidate", "content": "{\"error\":\"unauthorized\",\"reason\":\"You are not a server admin.\"}\n", "content_length": "64", "content_type": "text/plain; charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true, "item": {"host": "165.227.172.214", "is_https": false, "name": "commcarehq__users", "password": "commcarehq", "port": 5984, "username": "commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang OTP/R16B03)", "status": 401, "url": "http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user in
"items" has the same username and password. The failure can be stepped
over, as we don't need to (and can't) set up two different couchdb users
with commcarehq:commcarehq on the same box.
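To illustrate the idea, here is a small sketch that collapses the loop items to one admin per (host, port, username), so the admin would only be created once. The function name unique_admins is made up, and this is not how the ansible role is written. My reading of the 401 (an assumption, not something confirmed in the logs) is that the first request creates the admin, after which the second, unauthenticated request to the same URL is rejected.

```python
def unique_admins(couch_dbs):
    """Collapse database entries to one admin per (host, port, username).

    Hypothetical helper for illustration; the item layout is copied
    from the task output above.
    """
    seen = set()
    admins = []
    for db in couch_dbs:
        key = (db["host"], db["port"], db["username"])
        if key in seen:
            # Same admin on the same server: nothing more to create.
            continue
        seen.add(key)
        admins.append({
            "host": db["host"],
            "port": db["port"],
            "username": db["username"],
            "password": db["password"],
        })
    return admins


items = [
    {"username": "commcarehq", "name": "commcarehq", "is_https": False,
     "host": "165.227.172.214", "password": "commcarehq", "port": 5984},
    {"username": "commcarehq", "name": "commcarehq__users", "is_https": False,
     "host": "165.227.172.214", "password": "commcarehq", "port": 5984},
]

print(len(unique_admins(items)))
```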

Issue #2&3: For both couchdb2 and redis, monit fails. After
I reboot the system and start monit manually they pass and redis is
running, but couchdb2 still shows "Execution failed". After another system
reboot, and manually starting monit, both now show as running and being
monitored.

monit status: Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ******************************


fatal: [165.227.172.214]: FAILED! => {"changed": false,
"failed": true, "msg": "couchdb2 process not presently configured with
monit", "name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]


to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************


165.227.172.214 : ok=36 changed=20
unreachable=0 failed=1

TASK [redis : monit] ******************************


fatal: [165.227.172.214]: FAILED! => {"changed": false,
"failed": true, "msg": "redis process not presently configured with monit",
"name": "redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]


RUNNING HANDLER [redis : restart redis]


RUNNING HANDLER [redis : restart rsyslog]


to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************


165.227.172.214 : ok=17 changed=10
unreachable=0 failed=1

Issue 4:
TASK [touchforms : Touchforms user]


An exception occurred during task execution. To see the full
traceback, use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n File "/tmp/ansible_iUft9p/ansible_module_django_user.py", line 144, in <module>\n main()\n File "/tmp/ansible_iUft9p/ansible_module_django_user.py", line 125, in main\n user.create_user()\n File "/tmp/ansible_iUft9p/ansible_module_django_user.py", line 84, in create_user\n superuser=repr(self.superuser),\n File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1427, in __call__\n return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File "/usr/local/lib/python2.7/dist-packages/sh.py", line 774, in __init__\n self.wait()\n File "/usr/local/lib/python2.7/dist-packages/sh.py", line 792, in wait\n self.handle_command_exit_code(exit_code)\n File "/usr/local/lib/python2.7/dist-packages/sh.py", line 815, in handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN: /home/cchq/www/dev/current/python_env/bin/python manage.py shell --plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n File "manage.py", line 9, in <module>\n import django\nImportError: No module named django\n\n", "module_stdout": "Traceback (most recent call last):\n File "manage.py", line 9, in <module>\n import django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:

su - cchq

cd www/dev/current

source python_env/bin/activate

pip install -r requirements/requirements.txt

At this point the whole ansible playbook succeeds, but when we
visit our IP, we get the maintenance page and see this in the nginx logs:
2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed
(111: Connection refused) while connecting to upstream, client:
186.106.251.211, server: 165.227.172.214, request: "GET /favicon.ico
HTTP/1.1", upstream: "http://165.227.172.214:9010/favicon.ico",
host: "165.227.172.214", referrer: "
https://165.227.172.214/solutions/"

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server

At this point, we're wondering:

  1. Why isn't the server running itself?
  2. And how do we get it to run?

Best,
Taylor

--


You received this message because you are subscribed to the Google Groups
"CommCare Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Simon

Quick question:

We set up a trial account with an ES provider (just so we would not get
distracted by the ElasticSearch rabbithole right now) - but the only way I
could get ./manage.py ptop_preindex to connect was to hack in the
necessary params for an SSL connection in _es_hosts() in corehq/elastic.py.

Is there a way to get commcare to work with ES using SSL?

Regards

Rory

>>>>>>>>>>>> >>>>>>>>>>>> FWIW, *python manage.py compress* fails because it can't find >>>>>>>>>>>> the Font Awesome less file: >>>>>>>>>>>> CommandError: An error occurred during rendering >>>>>>>>>>>> /home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html: >>>>>>>>>>>> 'font-awesome/less/font-awesome.less' could not be found in the >>>>>>>>>>>> COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles' or >>>>>>>>>>>> with staticfiles. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hey guys, >>>>>>>>>>>>> >>>>>>>>>>>>> Hope all is well. Let me preface this with a thank you—I know >>>>>>>>>>>>> you've got a lot going on and don't rely on ansible monolith deployments >>>>>>>>>>>>> for your core work, so I realize that any help you provide here is going >>>>>>>>>>>>> above and beyond. Thank you for that! >>>>>>>>>>>>> >>>>>>>>>>>>> My objective is to get *ansible-playbook -i >>>>>>>>>>>>> inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e >>>>>>>>>>>>> '@vars/dev/dev_public.yml' deploy_stack.yml* running on a >>>>>>>>>>>>> freshly provisioned Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-125-generic >>>>>>>>>>>>> x86_64) droplet with 2 gigs of memory. >>>>>>>>>>>>> >>>>>>>>>>>>> While I think that's a solid goal for the whole CommCare >>>>>>>>>>>>> open-source community, I'd like to disclose that we've also got a client at >>>>>>>>>>>>> Open Function that wants to connect CommCare to another system using >>>>>>>>>>>>> OpenFn, but CommCare needs to be hosted on their servers due to regulatory >>>>>>>>>>>>> issues. >>>>>>>>>>>>> >>>>>>>>>>>>> Note that we made a couple of changes vagrant and edited some >>>>>>>>>>>>> ansible scripts. You can see this work here: >>>>>>>>>>>>> https://github.com/rorymckinley/commcare-sandbox/pull/1/files. 
>>>>>>>>>>>>> One significant change is that we are running the vagrant stuff as root. >>>>>>>>>>>>> >>>>>>>>>>>>> To the issues: >>>>>>>>>>>>> >>>>>>>>>>>>> *Issue #1:* >>>>>>>>>>>>> TASK [couchdb : Set CouchDB username and password] >>>>>>>>>>>>> ***************************** >>>>>>>>>>>>> ok: [165.227.172.214] => (item={u'username': u'commcarehq', >>>>>>>>>>>>> u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214', >>>>>>>>>>>>> u'password': u'commcarehq', u'port': 5984}) >>>>>>>>>>>>> failed: [165.227.172.214] (item={u'username': u'commcarehq', >>>>>>>>>>>>> u'name': u'commcarehq__users', u'is_https': False, u'host': >>>>>>>>>>>>> u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) => >>>>>>>>>>>>> {"cache_control": "must-revalidate", "content": >>>>>>>>>>>>> "{\"error\":\"unauthorized\",\"reason\":\"You are not a server >>>>>>>>>>>>> admin.\"}\n", "content_length": "64", "content_type": "text/plain; >>>>>>>>>>>>> charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true, >>>>>>>>>>>>> "item": {"host": "165.227.172.214", "is_https": false, "name": >>>>>>>>>>>>> "commcarehq__users", "password": "commcarehq", "port": 5984, "username": >>>>>>>>>>>>> "commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401: >>>>>>>>>>>>> Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang >>>>>>>>>>>>> OTP/R16B03)", "status": 401, "url": " >>>>>>>>>>>>> http://165.227.172.214:5984/_config/admins/commcarehq"} >>>>>>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>>>>>> >>>>>>>>>>>>> PLAY RECAP >>>>>>>>>>>>> ********************************************************************* >>>>>>>>>>>>> 165.227.172.214 : ok=135 changed=90 >>>>>>>>>>>>> unreachable=0 failed=1 >>>>>>>>>>>>> >>>>>>>>>>>>> *Possible solution 1:* This task runs twice, but each user in >>>>>>>>>>>>> "items" has the same username and password. 
The failure can be stepped >>>>>>>>>>>>> over, as we don't need to (and can't) set up two different couchdb users >>>>>>>>>>>>> with commcarehq:commcarehq on the same box. >>>>>>>>>>>>> >>>>>>>>>>>>> *Issue #2&3: *For both couchdb2 and redis, monit fails. After >>>>>>>>>>>>> I reboot the system and start monit manually they pass and redis is >>>>>>>>>>>>> running, but couchdb2 still shows "Execution failed". After another system >>>>>>>>>>>>> reboot, and manually starting monit, both now show as running and being >>>>>>>>>>>>> monitored. >>>>>>>>>>>>> >>>>>>>>>>>>> monit status: Process 'couchdb2' >>>>>>>>>>>>> status Execution failed >>>>>>>>>>>>> monitoring status Monitored >>>>>>>>>>>>> data collected Thu, 05 Oct 2017 11:59:49 >>>>>>>>>>>>> >>>>>>>>>>>>> TASK [*couchdb2 : monit*] >>>>>>>>>>>>> ******************************************************** >>>>>>>>>>>>> fatal: [165.227.172.214]: FAILED! => {"changed": false, >>>>>>>>>>>>> "failed": true, "msg": "couchdb2 process not presently configured with >>>>>>>>>>>>> monit", "name": "couchdb2", "state": "monitored"} >>>>>>>>>>>>> >>>>>>>>>>>>> RUNNING HANDLER [monit : reload monit] >>>>>>>>>>>>> ***************************************** >>>>>>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>>>>>> >>>>>>>>>>>>> PLAY RECAP >>>>>>>>>>>>> ********************************************************************* >>>>>>>>>>>>> 165.227.172.214 : ok=36 changed=20 >>>>>>>>>>>>> unreachable=0 failed=1 >>>>>>>>>>>>> >>>>>>>>>>>>> TASK [*redis : monit*] >>>>>>>>>>>>> *********************************************************** >>>>>>>>>>>>> fatal: [165.227.172.214]: FAILED! 
=> {"changed": false, >>>>>>>>>>>>> "failed": true, "msg": "redis process not presently configured with monit", >>>>>>>>>>>>> "name": "redis", "state": "monitored"} >>>>>>>>>>>>> >>>>>>>>>>>>> RUNNING HANDLER [monit : reload monit] >>>>>>>>>>>>> ***************************************** >>>>>>>>>>>>> >>>>>>>>>>>>> RUNNING HANDLER [redis : restart redis] >>>>>>>>>>>>> **************************************** >>>>>>>>>>>>> >>>>>>>>>>>>> RUNNING HANDLER [redis : restart rsyslog] >>>>>>>>>>>>> ************************************** >>>>>>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>>>>>> >>>>>>>>>>>>> PLAY RECAP >>>>>>>>>>>>> ********************************************************************* >>>>>>>>>>>>> 165.227.172.214 : ok=17 changed=10 >>>>>>>>>>>>> unreachable=0 failed=1 >>>>>>>>>>>>> >>>>>>>>>>>>> *Issue 4:* >>>>>>>>>>>>> TASK [touchforms : Touchforms user] >>>>>>>>>>>>> ******************************************** >>>>>>>>>>>>> An exception occurred during task execution. To see the full >>>>>>>>>>>>> traceback, use -vvv. The error was: ImportError: No module named django >>>>>>>>>>>>> fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! 
=> >>>>>>>>>>>>> {"changed": false, "failed": true, "module_stderr": "Traceback (most recent >>>>>>>>>>>>> call last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", >>>>>>>>>>>>> line 144, in \n main()\n File >>>>>>>>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n >>>>>>>>>>>>> user.create_user()\n File >>>>>>>>>>>>> \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in >>>>>>>>>>>>> create_user\n superuser=repr(self.superuser),\n File >>>>>>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n >>>>>>>>>>>>> return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File >>>>>>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n >>>>>>>>>>>>> self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", >>>>>>>>>>>>> line 792, in wait\n self.handle_command_exit_code(exit_code)\n File >>>>>>>>>>>>> \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in >>>>>>>>>>>>> handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN: >>>>>>>>>>>>> /home/cchq/www/dev/current/python_env/bin/python manage.py shell >>>>>>>>>>>>> --plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n >>>>>>>>>>>>> File \"manage.py\", line 9, in \n import django\nImportError: No >>>>>>>>>>>>> module named django\n\n", "module_stdout": "Traceback (most recent call >>>>>>>>>>>>> last):\n File \"manage.py\", line 9, in \n import >>>>>>>>>>>>> django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"} >>>>>>>>>>>>> to retry, use: --limit @/vagrant/ansible/deploy_stack.retry >>>>>>>>>>>>> >>>>>>>>>>>>> Possible solution: Here, we need to SSH in and then: >>>>>>>>>>>>> # su - cchq >>>>>>>>>>>>> # cd www/dev/current >>>>>>>>>>>>> # source python_env/bin/activate >>>>>>>>>>>>> # pip install -r requirements/requirements.txt >>>>>>>>>>>>> >>>>>>>>>>>>> At this point the whole ansible playbook succeeds, but when we >>>>>>>>>>>>> 
visit our IP, we get the maintenance page and see this in the nginx logs: >>>>>>>>>>>>> 2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed >>>>>>>>>>>>> (111: Connection refused) while connecting to upstream, client: >>>>>>>>>>>>> 186.106.251.211, server: 165.227.172.214, request: "GET /favicon.ico >>>>>>>>>>>>> HTTP/1.1", upstream: "http://165.227.172.214:9010/favicon.ico", >>>>>>>>>>>>> host: "165.227.172.214", referrer: " >>>>>>>>>>>>> https://165.227.172.214/solutions/" >>>>>>>>>>>>> >>>>>>>>>>>>> After activating the python_env we run runserver as `cchq`: >>>>>>>>>>>>> ./manage.py runserver 0.0.0.0:9010 >>>>>>>>>>>>> >>>>>>>>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection >>>>>>>>>>>>> connection = Database.connect(**conn_params) >>>>>>>>>>>>> File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect >>>>>>>>>>>>> conn = _connect(dsn, connection_factory=connection_factory, **kwasync) >>>>>>>>>>>>> django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> At this point, we're wondering: >>>>>>>>>>>>> >>>>>>>>>>>>> 1. Why isn't the server running itself? >>>>>>>>>>>>> 2. And how do we get it to run? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Taylor >>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> --- >>>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>>> Google Groups "CommCare Developers" group. >>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>>> it, send an email to >>>>>>>>>>>> commcare-developers+unsubscribe@googlegroups.com. >>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout. 
>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>> >>>>>>>>> --- >>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>> Groups "CommCare Developers" group. >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> --- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "CommCare Developers" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>> >>>>>>> >>>>>>> -- >>>>>> >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "CommCare Developers" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "CommCare Developers" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to commcare-developers+unsubscribe@googlegroups.com. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "CommCare Developers" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to commcare-developers+unsubscribe@googlegroups.com . >> For more options, visit https://groups.google.com/d/optout. >> >

We don't use SSL for ES since we don't use external ES service. But you
could submit a PR that adds the ability to provide the necessary parameters.
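
For anyone attempting that PR, here is a minimal sketch of what the host parameters might look like. It assumes the elasticsearch-py client accepts per-host keyword arguments such as use_ssl, verify_certs and http_auth (to my understanding its urllib3 connection class does). The function name build_es_hosts and its arguments are hypothetical, and this is not the actual _es_hosts() code in corehq/elastic.py.

```python
def build_es_hosts(host, port, use_ssl=False, verify_certs=True, http_auth=None):
    """Build a host list in the dict form that elasticsearch-py accepts.

    Hypothetical helper: the SSL-related keys are only added when requested,
    so a plain local Elasticsearch keeps working unchanged.
    """
    entry = {"host": host, "port": port}
    if use_ssl:
        # Assumed connection keyword arguments of elasticsearch-py.
        entry["use_ssl"] = True
        entry["verify_certs"] = verify_certs
    if http_auth is not None:
        # e.g. a (username, password) tuple for a hosted ES provider
        entry["http_auth"] = http_auth
    return [entry]
```

The resulting list would be passed to the Elasticsearch client constructor; the values themselves would presumably come from new, optional localsettings entries.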

Simon Kelly
Director of Server Engineer | Dimagi

On 16 October 2017 at 11:25, wrote:

Hi Simon

Quick question:

We set up a trial account with an ES provider (just so we would not get
distracted by the Elasticsearch rabbit hole right now) - but the only way I
could get ./manage.py ptop_preindex to connect was to hack in the
necessary params for an SSL connection in _es_hosts() in corehq/elastic.py.

Is there a way to get commcare to work with ES using SSL?

Regards

Rory

On Friday, 13 October 2017 13:52:55 UTC+2, Simon Kelly wrote:

:+1:

On 13 Oct 2017 00:56, rorymc...@capefox.co wrote:

D'oh! Thanks Simon, no this is totally my fault - at some point in the
process my brain conflated elasticsearch and S3, and then never let go :frowning: -
I am not sure why - old age I guess ;).

Thanks for the tips - we will definitely factor them in.

R

On Thursday, 12 October 2017 21:30:09 UTC+2, Simon Kelly wrote:

Hey

So riak-cs and elasticsearch are completely different systems. You can
think of Riak-CS as an S3 service. Elasticsearch is a distributed search
index.

In localsettings.py the settings for Elasticsearch are the ones I
mentioned before. For Riak the settings are:

S3_BLOB_DB_SETTINGS = {
    "url": "http://localhost:9980/",
    "access_key": "admin-key",
    "secret_key": "admin-secret",
    "config": {"connect_timeout": 3, "read_timeout": 5},
}

Note that if you are just running a monolith then it's not necessary to
have riak at all since you can just use the local filesystem. If you want
to go that route then you should just remove the 'riak-cs' group from your
inventory file completely. That should result in the above settings being
removed from your localsettings file which will cause CommCare HQ to switch
to using the filesystem to store binary objects (e.g. form xml).

You should also then set shared_drive_enabled to 'false' in your
ansible vars file since you don't need an NFS drive for just one machine.
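
The fallback described above can be pictured roughly like this. It is only an illustration of the behaviour, not CommCare HQ's actual code; the function name choose_blob_backend is hypothetical and the settings are passed in as a plain dict for simplicity.

```python
def choose_blob_backend(settings):
    """Pick where binary objects go, based on what localsettings defines.

    Hypothetical illustration: if S3_BLOB_DB_SETTINGS is present the
    S3-compatible service (Riak-CS) is used, otherwise the local filesystem.
    """
    s3_settings = settings.get("S3_BLOB_DB_SETTINGS")
    if s3_settings:
        return ("s3", s3_settings["url"])
    return ("filesystem", None)
```

So removing the 'riak-cs' group from the inventory, and with it the setting, is what flips a monolith over to filesystem storage.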

Sorry for the complexities here and the lack of docs.

Simon Kelly
Director of Server Engineer | Dimagi

On 12 October 2017 at 14:01, rorymc...@capefox.co wrote:

Thanks Simon.

Just to make sure I am not missing something really obvious ("missing
something really obvious" is in fact, quite an accurate summation of my
adventure so far) - the ansible scripts set up riak-cs, and so I can point
those ES connection strings at the local riak-cs instance?

Regards

Rory

On Wednesday, 11 October 2017 20:09:47 UTC+2, Simon Kelly wrote:

That seems like the Elasticsearch address may be incorrect. This
error is happening when the command is trying to create a new index in
elasticsearch.

I'd check that you've got your ES connection details correct in
localsettings:

  • ELASTICSEARCH_HOST
  • ELASTICSEARCH_PORT

You can test the connection using curl:

$ curl :

{
  "status" : 200,
  "name" : "Albino",
  "cluster_name" : "agrajag",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T16:45:04Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
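
The same check can be scripted. Below is a small sketch (written for Python 3, standard library only) that fetches the root URL and decides whether the answer looks like the Elasticsearch banner above. The function names are hypothetical, and the "version" and "tagline" keys are simply taken from the sample response.

```python
import json
import urllib.error
import urllib.request


def looks_like_elasticsearch(body):
    """Return True if body parses as JSON carrying the ES banner fields."""
    try:
        info = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML 404 page from some other service.
        return False
    return isinstance(info, dict) and "version" in info and "tagline" in info


def fetch_root(host, port, timeout=5):
    """Fetch http://host:port/ and return the raw body, even on HTTP errors."""
    url = "http://{}:{}/".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        return err.read()
```

Calling looks_like_elasticsearch(fetch_root(host, port)) with the values from ELASTICSEARCH_HOST and ELASTICSEARCH_PORT should return True; an HTML error page from a different server on that port returns False.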

Simon Kelly
Director of Server Engineer | Dimagi

On 11 October 2017 at 11:31, rorymc...@capefox.co wrote:

Hi Simon

Yes, Jenny's advice helped us out immensely - we now have commcare
up and serving the static assets.

We are seeing what we think are errors connecting to the riak-cs
instance - and I tried running ./manage.py ptop_preindex which produces
some initial success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUG[reindex_command].build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found Not Found The requested document was not found on this server. mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first
(note: I have added some debugging print statements, so line numbers may be
slightly out). Does the above point to us doing something that is obviously
wrong?

Thanks in advance.

Rory

On Tuesday, 10 October 2017 23:46:31 UTC+2, Simon Kelly wrote:

Been offline travelling so sorry for the slow response. Strange
that you get that error if you're using the fabric deploy script since it
should do a bower update but I'd check what Jenny suggested to make sure.

Re the "sudo received non-zero exit codes" messages, as long as
it's only for the 'preindex' command that should be fine. If there are any
other errors during deploy then it won't complete. (There is also a PR to
remove those warnings: "hide 'sudo received non-zero exit codes' warning",
Pull Request #393 in dimagi/commcare-hq-deploy.)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers jsch...@dimagi.com wrote:

Hi Taylor,

About that compress error: Have you run bower update recently?
I'd run that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
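
If it helps, both checks can be done in one go with a few lines of Python. This is a rough sketch, assuming STATICFILES_DIRS holds plain directory paths (Django also allows (prefix, path) tuples, which are skipped here); the function name find_in_static_dirs is made up for the example.

```python
import os


def find_in_static_dirs(relative_path, static_dirs):
    """Return the first existing file for relative_path, or None.

    Only plain string entries are considered; (prefix, path) tuples,
    which Django also allows, are skipped in this sketch.
    """
    for entry in static_dirs:
        if not isinstance(entry, str):
            continue
        candidate = os.path.join(entry, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

From a Django shell you could call find_in_static_dirs('font-awesome/less/font-awesome.less', settings.STATICFILES_DIRS); a result of None would suggest that bower update has not populated bower_components, or that the directory is missing from the setting.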

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I've got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1:
https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option
3, and with option 1 (as you can probably see from the linked photo) I'm
not actually getting the static assets I need from a CDN.

The error on my compress command is no longer on motech, it's
now on "hqadmin":
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles'
or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we've got a fork of
the ansible repo going that includes a git submodule with your
commcare-deploy repo. Our goal is to get this down to a single git clone
and a few shell commands! Would love any feedback on the directory
structure you use locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We've got the fab deploy scripts
running now (albeit with lots of warnings: sudo received non-zero exit
codes*) and finishing successfully. However, when we ssh into our box, go to
the newly created release, activate the python env and run runserver, the
server starts but throws this 500** whenever it's accessed via the
web:

OfflineGenerationError: You have offline compression enabled but
key "89af02fe109c09d9c74742e99d8f3fea" is missing from offline
manifest. You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0"
500 59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html:
'font-awesome/less/font-awesome.less' could not be found in the
COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles'
or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not configured (logging is disabled). Please see the documentation for more information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing
'/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check'!

**Here's the full 500 error:
https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs
    here since we don't do fresh installs that often
  4. Once everything is set up we deploy our code with fabric
    scripts (https://github.com/dimagi/commcare-hq-deploy) as
    follows:

    fab <environment> deploy

    <environment> is the name of an inventory file here:
    https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

    This also makes use of this 'environments.yml' file which
    tells the deploy scripts which services to run where and a few other
    things:
    https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the static
    file compression etc and also create the supervisor files needed to run the
    servers.

We've recently made some improvements to our couchdb setup (you
should use couchdb2). I've linked them in comments on your PR.

We are about to do a whole new cluster setup so it's likely
that there will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR (
    https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by
    the deploy_commcarehq playbook which should execute prior to the touchforms
    playbook. Also touchforms is only necessary if you're going to be doing sms
    surveys.

Re the encrypted drives. We run the deploy_stack playbook with
'after-reboot' tag limited to the rebooted host. This should remount the
encrypted drive and perform a few other actions.
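
As a rough sketch, the after-reboot run could be assembled like this. The 'after-reboot' tag and the deploy_stack playbook come from the description above; --tags and --limit are standard ansible-playbook options. The function name, the default inventory path and the handling of vars files are assumptions for illustration.

```python
def after_reboot_command(host, inventory="inventories/monolith",
                         playbook="deploy_stack.yml", extra_vars_files=()):
    """Build the argv for re-running the playbook's after-reboot tasks.

    Hypothetical helper: limits the run to the single rebooted host.
    """
    cmd = ["ansible-playbook", "-i", inventory, playbook,
           "--tags", "after-reboot", "--limit", host]
    for path in extra_vars_files:
        # "-e @file" loads extra variables from a YAML file.
        cmd += ["-e", "@" + path]
    return cmd
```

The returned list can be handed to subprocess.run() or joined with spaces and pasted into a shell.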

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs
stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us having
to reboot during the deployment process.
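
For anyone hitting the same "pgbouncer cannot connect to server" error after a reboot, a quick way to see whether both services are listening again is a plain TCP check. This is a minimal sketch; the ports 5432 (PostgreSQL) and 6432 (pgbouncer) mentioned below are the usual defaults and are assumptions, so check your own config.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False
```

For example, port_is_open("127.0.0.1", 5432) and port_is_open("127.0.0.1", 6432) should both be True before runserver has a chance of connecting.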

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and get:
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py",
line 599, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred
(AccessDenied) when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I'm fairly certain I've
configured my bucket properly and granted access via the access key and
secret. (These are not part of version control in the shared repo, of
course.) Will update as we go.

FWIW, python manage.py compress fails because it can't find
the Font Awesome less file:
CommandError: An error occurred during rendering
/home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html:
'font-awesome/less/font-awesome.less' could not be found in
the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles'
or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I know
you've got a lot going on and don't rely on ansible monolith deployments
for your core work, so I realize that any help you provide here is going
above and beyond. Thank you for that!

My objective is to get ansible-playbook -i
inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e
'@vars/dev/dev_public.yml' deploy_stack.yml
running on a
freshly provisioned Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-125-generic
x86_64) droplet with 2 gigs of memory.

While I think that's a solid goal for the whole CommCare
open-source community, I'd like to disclose that we've also got a client at
Open Function that wants to connect CommCare to another system using
OpenFn, but CommCare needs to be hosted on their servers due to regulatory
issues.

Note that we made a couple of changes to vagrant and edited some
ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files.
One significant change is that we are running the vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] *****************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq',
u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214',
u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq',
u'name': u'commcarehq__users', u'is_https': False, u'host':
u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) =>
{"cache_control": "must-revalidate", "content":
"{\"error\":\"unauthorized\",\"reason\":\"You are not a server admin.\"}\n",
"content_length": "64", "content_type": "text/plain;
charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true,
"item": {"host": "165.227.172.214", "is_https": false, "name":
"commcarehq__users", "password": "commcarehq", "port": 5984, "username":
"commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401:
Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang
OTP/R16B03)", "status": 401,
"url": "http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************


165.227.172.214 : ok=135 changed=90
unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user
in "items" has the same username and password. The failure can be stepped
over, as we don't need to (and can't) set up two different couchdb users
with commcarehq:commcarehq on the same box.
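
The idea behind "Possible solution 1" can be sketched in Python: collapse the item list so that each distinct admin account is created only once per CouchDB server. This only illustrates the de-duplication and is not the playbook's actual logic. `unique_admins` is a hypothetical helper, and the dictionaries mirror the `item` values in the output above. The second request presumably fails because CouchDB 1.x stops accepting anonymous `_config` changes once the first admin exists.

```python
def unique_admins(couch_dbs):
    """Keep the first entry for each (host, port, username) combination."""
    seen = set()
    result = []
    for db in couch_dbs:
        key = (db["host"], db["port"], db["username"])
        if key not in seen:
            seen.add(key)
            result.append(db)
    return result


# Values copied from the two "item" entries in the task output.
couch_dbs = [
    {"name": "commcarehq", "host": "165.227.172.214", "port": 5984,
     "username": "commcarehq", "password": "commcarehq", "is_https": False},
    {"name": "commcarehq__users", "host": "165.227.172.214", "port": 5984,
     "username": "commcarehq", "password": "commcarehq", "is_https": False},
]

# Only one admin account needs to be created for this server.
for admin in unique_admins(couch_dbs):
    print(admin["host"], admin["port"], admin["username"])
```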

Issue #2&3: For both couchdb2 and redis, monit fails.
After I reboot the system and start monit manually they pass and redis is
running, but couchdb2 still shows "Execution failed". After another system
reboot, and manually starting monit, both now show as running and being
monitored.

monit status: Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false,
"failed": true, "msg": "couchdb2 process not presently configured with
monit", "name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=36 changed=20 unreachable=0 failed=1

TASK [redis : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false,
"failed": true, "msg": "redis process not presently configured with monit",
"name": "redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit]

RUNNING HANDLER [redis : restart redis]

RUNNING HANDLER [redis : restart rsyslog]
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=17 changed=10 unreachable=0 failed=1
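
Rather than eyeballing `monit status` after each reboot, the output can be parsed to list the processes that are not healthy. This is a minimal sketch that assumes the layout shown in the couchdb2 status excerpt above. The redis entry in the sample is made up for illustration, and treating both "Running" and "OK" as healthy is an assumption, since the wording may differ between monit versions.

```python
def parse_monit_status(text):
    """Map each process name to its 'status' line from `monit status` output."""
    statuses = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Process '") and line.endswith("'"):
            current = line[len("Process '"):-1]
        elif current and line.startswith("status "):
            # "monitoring status ..." does not match: it starts with "monitoring".
            statuses[current] = line[len("status "):].strip()
    return statuses


def unhealthy(statuses, healthy=("Running", "OK")):
    """Return the processes whose status is not in the assumed healthy set."""
    return {name: s for name, s in statuses.items() if s not in healthy}


# Sample text: couchdb2 mirrors the excerpt above, redis is illustrative.
sample = """\
Process 'couchdb2'
  status                            Execution failed
  monitoring status                 Monitored
Process 'redis'
  status                            Running
  monitoring status                 Monitored
"""
print(unhealthy(parse_monit_status(sample)))
```

On the server, the same functions could be fed the real output, for example from `subprocess.check_output(["monit", "status"])` run as a user allowed to query monit.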

Issue #4:
TASK [touchforms : Touchforms user]
An exception occurred during task execution. To see the full
traceback, use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! =>
{"changed": false, "failed": true, "module_stderr": "Traceback (most recent
call last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\",
line 144, in <module>\n main()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n
user.create_user()\n File
\"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in
create_user\n superuser=repr(self.superuser),\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n
return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n
self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\",
line 792, in wait\n self.handle_command_exit_code(exit_code)\n File
\"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in
handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN:
/home/cchq/www/dev/current/python_env/bin/python manage.py shell
--plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n
File \"manage.py\", line 9, in <module>\n import django\nImportError: No
module named django\n\n", "module_stdout": "Traceback (most recent call
last):\n File \"manage.py\", line 9, in <module>\n import
django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:

su - cchq

cd www/dev/current

source python_env/bin/activate

pip install -r requirements/requirements.txt
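
Before and after running the commands above, it can help to confirm that the virtualenv's interpreter (not the system Python) is the one that can import Django. The sketch below is a hypothetical check. The interpreter path comes from the traceback, and `can_import` is an illustrative helper name.

```python
import subprocess
import sys


def can_import(python_path, module):
    """Return True if the given interpreter can import the module."""
    result = subprocess.run(
        [python_path, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Interpreter path taken from the traceback above; adjust for your release.
    venv_python = "/home/cchq/www/dev/current/python_env/bin/python"
    try:
        ok = can_import(venv_python, "django")
    except OSError:
        ok = False  # the interpreter itself is missing
    print("django importable:", ok)
```

If this reports False right after the deploy, the virtualenv was created but its requirements were never installed, which matches the ImportError in the task output.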

At this point the whole ansible playbook succeeds, but when
we visit our IP, we get the maintenance page and see this in the nginx logs:
2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed
(111: Connection refused) while connecting to upstream, client:
186.106.251.211, server: 165.227.172.214, request: "GET /favicon.ico
HTTP/1.1", upstream: "http://165.227.172.214:9010/favicon.ico",
host: "165.227.172.214", referrer: "https://165.227.172.214/solutions/"

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server
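
Both errors here are connection failures: nginx cannot reach the Django process on port 9010, and pgbouncer cannot reach PostgreSQL. A small probe like the following sketch can show which listeners are actually up after a reboot. Port 9010 comes from the nginx log; 6432 and 5432 are only the usual pgbouncer and PostgreSQL defaults and are assumptions about this install.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    checks = {
        "django (nginx upstream)": 9010,  # from the nginx log above
        "pgbouncer": 6432,                # assumed: pgbouncer's default port
        "postgresql": 5432,               # assumed: PostgreSQL's default port
    }
    for name, port in checks.items():
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(name, port, state)
```

If pgbouncer is open but postgresql is closed, the database itself is down and pgbouncer has nothing to forward to.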

At this point, we're wondering:

  1. Why isn't the server running itself?
  2. And how do we get it to run?

Best,
Taylor

--


You received this message because you are subscribed to the
Google Groups "CommCare Developers" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Thanks Simon - it turns out that the customer wants to use local ES, so we
won't be using the offboard service after all (not even for testing).

It feels as if we have commcare most of the way there now: most pages seem
to load and the obvious errors :) are very few.

Taylor has tried sending out invites to users, but he says he never
receives the mails. There are no obvious signs that anything is going awry;
the only clue I have found in the logs is as follows:

a.b.c.d - - - - - [18/Oct/2017:13:00:57 +0000] "POST /hq/notifications/service/ HTTP/1.1" 200 94 "https://y.y.y.y/a/xxxxxx/settings/users/web/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

I am not sure if this is at all related to what Taylor is trying to do? The
other logs have quite a regular complaint about something called toggle.js,
but I am not sure if that is related either (as you may be picking up here,
there is a lot I am not sure about :) ).
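
Since the web request itself returns 200, one thing worth ruling out is whether the host can reach a mail server at all. The sketch below assumes invites go out over SMTP. The host and port are placeholders, and the real values would be whatever the email settings in localsettings point at (Django's `EMAIL_HOST` and `EMAIL_PORT`). `smtp_reachable` is a hypothetical helper, not something CommCare HQ provides.

```python
import smtplib


def smtp_reachable(host, port, timeout=5.0):
    """Try to open an SMTP session and send NOOP; return (ok, detail)."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _ = server.noop()
            return code == 250, "NOOP returned %d" % code
    except (OSError, smtplib.SMTPException) as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Placeholder host and port: replace with the configured mail server.
    ok, detail = smtp_reachable("localhost", 25)
    print("SMTP reachable:", ok, "-", detail)
```

A False result here would point at mail delivery rather than at the invite page itself.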

Thanks in advance

Rory

PS I sanitised some of the log entry.

>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> --- >>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>> Google Groups "CommCare Developers" group. >>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>> it, send an email to >>>>>>>>>>> commcare-developers+unsubscribe@googlegroups.com. >>>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>> Google Groups "CommCare Developers" group. >>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com >>>>>>>>>> . >>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>> >>>>>>>> --- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "CommCare Developers" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>> >>>>>>> >>>>>>> -- >>>>>> >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "CommCare Developers" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to commcare-developers+unsubscribe@googlegroups.com. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "CommCare Developers" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to commcare-developers+unsubscribe@googlegroups.com. >>>> For more options, visit https://groups.google.com/d/optout. 
>>>> >>> -- >> >> --- >> You received this message because you are subscribed to the Google Groups >> "CommCare Developers" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to commcare-developers+unsubscribe@googlegroups.com . >> For more options, visit https://groups.google.com/d/optout. >> > >

Hi Rory

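You can customize email with these settings (you should set them in your
'localsettings.py' file):
https://github.com/dimagi/commcare-hq/blob/6238482bace149b57b13ebaf66b669edd6e372f4/settings.py#L470-L499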

You will also need to set the EMAIL_BACKEND according to your specific
needs:
https://docs.djangoproject.com/en/1.11/topics/email/#topic-email-backends
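
For illustration, a minimal localsettings.py fragment for the backend choice might look like the sketch below. It only uses Django's own EMAIL_BACKEND setting and two backends that ship with Django; the CommCare-specific SMTP settings are the ones in the settings.py link above and are deliberately not reproduced here.

```python
# localsettings.py (fragment, illustrative only)

# Django's SMTP backend sends real mail through the configured SMTP server.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"

# While debugging, the console backend writes each message to stdout instead,
# which makes it easy to confirm that an invite is being generated at all.
# EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"
```

Switching to the console backend first is one way to separate "no mail is generated" from "mail is generated but not delivered".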

Simon Kelly
Director of Server Engineer | Dimagi

On 18 October 2017 at 12:04, rorymc...@capefox.co wrote:

Thanks Simon - it turns out that the customer wants to use local ES, so we
won't be using the offboard service after all (not even for testing).

It feels as if we have commcare most of the way there now: most pages seem
to load and the obvious errors :slight_smile: are very few.

Taylor has tried sending out invites to users, but he says he never
receives the mails. There are no obvious signs that anything is going awry;
the only clue I have found in the logs is as follows:

a.b.c.d - - - - - [18/Oct/2017:13:00:57 +0000] "POST /hq/notifications/service/ HTTP/1.1" 200 94 "https://y.y.y.y/a/xxxxxx/settings/users/web/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

I am not sure if this is at all related to what Taylor is trying to do?
The other logs have quite a regular complaint about something called
toggle.js, but I am not sure if that is related either (as you may be
picking up here, there is a lot I am not sure about :slight_smile: ).

Thanks in advance

Rory

PS I sanitised some of the log entry.
On Monday, 16 October 2017 18:51:16 UTC+2, Simon Kelly wrote:

We don't use SSL for ES since we don't use an external ES service. But you
could submit a PR that adds the ability to provide the necessary parameters.
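
For anyone picking up that PR, a rough sketch of what "providing the necessary parameters" could look like is below. It is not the code in corehq/elastic.py: the function name build_es_hosts and its use_ssl and http_auth arguments are assumptions for illustration. It only builds the list of host dictionaries that the elasticsearch Python client accepts.

```python
def build_es_hosts(host, port, use_ssl=False, http_auth=None):
    """Build a hosts list for the elasticsearch Python client.

    Hypothetical helper, not CommCare HQ code. The dictionary keys are
    passed through to the client's connection class.
    """
    entry = {"host": host, "port": port}
    if use_ssl:
        entry["use_ssl"] = True
    if http_auth is not None:
        # (username, password) tuple for basic auth, if the provider needs it
        entry["http_auth"] = http_auth
    return [entry]


# Local ES on the same box: plain HTTP
local_hosts = build_es_hosts("localhost", 9200)
# Hosted ES behind TLS (placeholder host name and credentials)
hosted_hosts = build_es_hosts("es.example.com", 443, use_ssl=True,
                              http_auth=("user", "secret"))
```

The resulting list would be handed to the client constructor; with the defaults it describes the same plain-HTTP connection as before, so a local install should be unaffected.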

Simon Kelly
Director of Server Engineer | Dimagi

On 16 October 2017 at 11:25, rorymc...@capefox.co wrote:

Hi Simon

Quick question:

We set up a trial account with an ES provider (just so we would not get
distracted by the ElasticSearch rabbithole right now) - but the only way I
could get ./manage.py ptop_preindex to connect was to hack in the
necessary params for an SSL connection in _es_hosts() in corehq/elastic.py.

Is there a way to get commcare to work with ES using SSL?

Regards

Rory

On Friday, 13 October 2017 13:52:55 UTC+2, Simon Kelly wrote:

:+1:

On 13 Oct 2017 00:56, rorymc...@capefox.co wrote:

D'oh! Thanks Simon, no this is totally my fault - at some point in the
process my brain conflated elasticsearch and S3, and then never let go :frowning: -
I am not sure why - old age I guess ;).

Thanks for the tips - we will definitely factor them in.

R

On Thursday, 12 October 2017 21:30:09 UTC+2, Simon Kelly wrote:

Hey

So riak-cs and elasticsearch are completely different systems. You
can think of Riak-CS as an S3 service. Elasticsearch is a distributed
search index.

In localsettings.py the settings for Elasticsearch are the ones I
mentioned before. For Riak the settings are:

S3_BLOB_DB_SETTINGS = {
    "url": "http://localhost:9980/",
    "access_key": "admin-key",
    "secret_key": "admin-secret",
    "config": {"connect_timeout": 3, "read_timeout": 5},
}

Note that if you are just running a monolith then it's not necessary
to have riak at all since you can just use the local filesystem. If you
want to go that route then you should just remove the 'riak-cs' group from
your inventory file completely. That should result in the above settings
being removed from your localsettings file which will cause CommCare HQ to
switch to using the filesystem to store binary objects (e.g. form xml).

You should also then set shared_drive_enabled to 'false' in your
ansible vars file since you don't need an NFS drive for just one machine.
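
The rule described above (S3 settings present means an S3-style store, absent means the filesystem) can be pictured roughly as follows. This is an illustrative sketch only, not CommCare HQ's actual code; choose_blob_backend and the labels it returns are made-up names.

```python
def choose_blob_backend(settings):
    """Return which blob store a settings mapping would select.

    Illustrative only: mirrors the rule described in the thread
    (S3/Riak-CS settings present -> S3-style store, absent -> filesystem).
    """
    if settings.get("S3_BLOB_DB_SETTINGS"):
        return "s3"
    return "filesystem"


with_riak = {
    "S3_BLOB_DB_SETTINGS": {
        "url": "http://localhost:9980/",
        "access_key": "admin-key",
        "secret_key": "admin-secret",
        "config": {"connect_timeout": 3, "read_timeout": 5},
    }
}
monolith = {}  # 'riak-cs' group removed from the inventory

print(choose_blob_backend(with_riak))
print(choose_blob_backend(monolith))
```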

Sorry for the complexities here and the lack of docs.

Simon Kelly
Director of Server Engineer | Dimagi

On 12 October 2017 at 14:01, rorymc...@capefox.co wrote:

Thanks Simon.

Just to make sure I am not missing something really obvious
("missing something really obvious" is in fact, quite an accurate summation
of my adventure so far) - the ansible scripts set up riak-cs, and so I can
point those ES connection strings at the local riak-cs instance?

Regards

Rory

On Wednesday, 11 October 2017 20:09:47 UTC+2, Simon Kelly wrote:

That seems like the Elasticsearch address may be incorrect. This
error is happening when the command is trying to create a new index in
elasticsearch.

I'd check that you've got your ES connection details correct in
localsettings:

  • ELASTICSEARCH_HOST
  • ELASTICSEARCH_PORT

You can test the connection using curl:

$ curl <ELASTICSEARCH_HOST>:<ELASTICSEARCH_PORT>

{
  "status" : 200,
  "name" : "Albino",
  "cluster_name" : "agrajag",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T16:45:04Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
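
The same check can be scripted. The sketch below is written for Python 3 and uses only the standard library; parse_es_banner and check_es are names made up for this example. It fetches the root endpoint and pulls out the cluster name and version, so an address that points at some other service fails loudly instead of looking like a working ES.

```python
import json
from urllib.request import urlopen


def parse_es_banner(body):
    """Extract cluster name and version from the ES root endpoint response."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_es(host, port, timeout=5):
    """Fetch http://host:port/ and return (cluster_name, version).

    Raises an exception (URLError, ValueError or KeyError) if the address
    does not answer like Elasticsearch, e.g. if it points at another service.
    """
    with urlopen("http://%s:%s/" % (host, port), timeout=timeout) as resp:
        return parse_es_banner(resp.read().decode("utf-8"))


# Parsing demonstrated on a trimmed copy of the response shown above
sample = '{"status": 200, "cluster_name": "agrajag", "version": {"number": "1.7.4"}}'
print(parse_es_banner(sample))
```

On the server, check_es would be called with the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT values from localsettings.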

Simon Kelly
Director of Server Engineer | Dimagi

On 11 October 2017 at 11:31, rorymc...@capefox.co wrote:

Hi Simon

Yes, Jenny's advice helped us out immensely - we now have commcare
up and serving the static assets.

We are seeing what we think are errors connecting to the riak-cs
instance - and I tried running ./manage.py ptop_preindex which produces
some initial success, but then:

Starting pillow preindex ledgers
Traceback (most recent call last):
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqcase/management/commands/ptop_preindex.py", line 53, in do_reindex
    FACTORIES_BY_SLUG[reindex_command].build().reindex()
  File "/home/cchq/www/dev/releases/2017-10-09_18.02/corehq/pillows/case_search.py", line 137, in build
    initialize_index_and_mapping(get_es_new(), CASE_SEARCH_INDEX_INFO)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 87, in initialize_index_and_mapping
    initialize_index(es, index_info)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 92, in initialize_index
    return create_index_and_set_settings_normal(es, index_info.index, index_info.meta)
  File "./corehq/ex-submodules/pillowtop/es_utils.py", line 73, in create_index_and_set_settings_normal
    es.indices.create(index=index, body=metadata)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
NotFoundError: TransportError(404, u'404 Not Found Not Found The requested document was not found on this server. mochiweb+webmachine web server')
<Greenlet at 0x7f9713dac2d0: do_reindex(u'case_search', False)> failed with NotFoundError

There are more errors of this ilk; the above is merely the first
(note: I have added some debugging print statements, so line numbers may be
slightly out). Does the above point to us doing something that is obviously
wrong?

Thanks in advance.

Rory

On Tuesday, 10 October 2017 23:46:31 UTC+2, Simon Kelly wrote:

I've been offline travelling, so sorry for the slow response. It's strange
that you get that error if you're using the fabric deploy script, since it
should do a bower update, but I'd check what Jenny suggested to make sure.

Re the "sudo received non-zero exit codes" messages: as long as
it's only for the 'preindex' command, that should be fine. If there are any
other errors during deploy then it won't complete. (There is also a PR to
remove those warnings: "hide 'sudo received non-zero exit codes' warning",
pull request #393 on dimagi/commcare-hq-deploy.)

Simon Kelly
Director of Server Engineer | Dimagi

On 10 October 2017 at 11:27, Jenny Schweers jsch...@dimagi.com wrote:

Hi Taylor,

About that compress error: Have you run bower update recently?
I'd run that, verify that the file ./bower_components/font-awesome/less/font-awesome.less
does indeed exist afterwards, and then run collectstatic and compress again.

You can also double-check that your STATICFILES_DIRS contains
bower_components (it should be set up by
https://github.com/dimagi/commcare-hq/blob/master/settings.py#L87-L97)
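
Both checks can be done in one go with a small script. This is a sketch, not part of CommCare HQ: find_static_file is a hypothetical helper, and it simplifies Django's lookup by ignoring the prefix when a STATICFILES_DIRS entry is a (prefix, path) tuple.

```python
import os


def find_static_file(relative_path, staticfiles_dirs):
    """Return the first full path to relative_path under the given dirs, or None.

    staticfiles_dirs entries may be plain paths or (prefix, path) tuples,
    the two forms Django allows in STATICFILES_DIRS. Prefixes are ignored here.
    """
    for entry in staticfiles_dirs:
        root = entry[1] if isinstance(entry, (tuple, list)) else entry
        candidate = os.path.join(root, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


# Run from the release directory; prints None if bower update has not
# produced the file that compress is complaining about.
print(find_static_file("font-awesome/less/font-awesome.less", ["bower_components"]))
```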

-Jenny

On Mon, Oct 9, 2017 at 5:36 PM, tay...@openfn.org wrote:

Simon, my last update for the day:

I've got the server running (and serving html!
https://fd-files-production.s3.amazonaws.com/214131/TeaNBXNn9A1b2cZcaMnhyw?X-Amz-Expires=300&X-Amz-Date=20171009T212816Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIA2QBI5WP5HA3ZEA/20171009/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=56ec6111d2a96ced90fded9f16fc1c6f473796894c6da08c157a7ff3c0e870ae)
when I follow LESS option 1: https://github.com/dimagi/commcare-hq#option-1-let-client-side-javascript-lessjs-handle-it-for-you.

I cannot get compress to run using either option 2 or option
3, and with option 1 (as you can probably see from the linked photo) I'm
not actually getting the static assets I need from a CDN.

The error on my compress command is no longer on motech;
it's now on "hqadmin":
CommandError: An error occurred during rendering /home/cchq/www/dev/releases/2017-10-09_18.02/corehq/apps/hqadmin/templates/hqadmin/loadtest.html: 'font-awesome/less/font-awesome.less' could not be found in the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_18.02/staticfiles' or with staticfiles.

Thanks again for all your help. Speak soon!

Taylor

P.S. — In an effort to make this repeatable, we've got a fork
of the ansible repo going that includes a git submodule with your
commcare-deploy repo. Our goal is to get this down to a single git clone
and a few shell commands! Would love any feedback on the directory
structure you use locally.

On Monday, October 9, 2017 at 12:22:29 PM UTC-4, tay...@openfn.org wrote:

Hey Simon, thanks so much. We've got the fab deploy scripts
running now (albeit with lots of warnings: "sudo received non-zero exit
codes"*) and finishing successfully. However, when we ssh into our box, go to
the newly created release, activate the python env and run runserver, we get
a server that starts but throws this 500** whenever it's accessed via the
web:

OfflineGenerationError: You have offline compression enabled but key "89af02fe109c09d9c74742e99d8f3fea" is missing from offline manifest. You may need to run "python manage.py compress".
2017-10-09 16:15:37,638 ERROR "GET /accounts/login/ HTTP/1.0" 500 59

When running compress, we get this font-awesome package error:
CommandError: An error occurred during rendering /home/cchq/www/dev/releases/2017-10-09_16.04/corehq/motech/openmrs/templates/openmrs/importers.html: 'font-awesome/less/font-awesome.less' could not be found in the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-09_16.04/staticfiles' or with staticfiles.

Have you bumped into this before? Thanks!

*The non-zero exit codes all look pretty much like this:
[165.227.172.214] sudo: /home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check
[165.227.172.214] out: 2017-10-09 16:08:10,599 INFO Raven is not configured (logging is disabled). Please see the documentation for more information.
[165.227.172.214] out: 2017-10-09 16:08:12,031 INFO AXES: BEGIN LOG
[165.227.172.214] out:

Warning: sudo() received nonzero return code 1 while executing '/home/cchq/www/dev/releases/2017-10-09_16.04/python_env/bin/python /home/cchq/www/dev/releases/2017-10-09_16.04/manage.py preindex_everything --check'!

**Here's the full 500 error: https://gist.github.com/taylordowns2000/cebc671a34431826a326b66cadccee9d

On Friday, October 6, 2017 at 9:19:09 AM UTC-3, Simon Kelly wrote:

Hi Taylor

Our general process is as follows:

  1. Configure blank VMs (just OS)
  2. Create inventory file and vars files
  3. Run ansible deploy - there are often a few hiccoughs
    here since we don't do fresh installs that often
  4. Once everything is set up we deploy our code with fabric
    scripts https://github.com/dimagi/commcare-hq-deploy as
    follows:

fab <environment> deploy

<environment> is the name of an inventory file here:
https://github.com/dimagi/commcare-hq-deploy/tree/master/fab/inventory

This also makes use of this 'environments.yml' file which
tells the deploy scripts which services to run where and a few other
things: https://github.com/dimagi/commcare-hq-deploy/blob/master/fab/environments.yml

  5. That deploy will check out the latest code, do the
    static file compression etc and also create the supervisor files needed to
    run the servers.
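
As a rough sketch, steps 3 and 4 boil down to two command lines. The helpers below only build the argument lists; they assume the fabric invocation takes the environment name as its first argument (fab <environment> deploy), and "dev" is a placeholder environment name, not one taken from the repo.

```python
def ansible_command(inventory, vars_files, playbook="deploy_stack.yml", user="root"):
    """Build the ansible-playbook argv used in this thread."""
    cmd = ["ansible-playbook", "-i", inventory, "-u", user]
    for path in vars_files:
        cmd += ["-e", "@" + path]  # "@file" loads extra vars from a file
    cmd.append(playbook)
    return cmd


def fab_command(environment):
    """Build the fabric argv: fab <environment> deploy."""
    return ["fab", environment, "deploy"]


steps = [
    ansible_command("inventories/monolith",
                    ["vars/dev/dev_private.yml", "vars/dev/dev_public.yml"]),
    fab_command("dev"),  # "dev" is a placeholder environment name
]
for argv in steps:
    print(" ".join(argv))
```

Keeping the commands as lists makes it straightforward to hand them to subprocess later without shell quoting issues.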

We've recently made some improvements to our couchdb setup
(you should use couchdb2). I've linked them in comments on your PR.

We are about to do a whole new cluster setup so it's likely
that there will be some more changes coming soon.

Re the issues:

  1. Switch to using couchdb2
  2&3. Resolved in latest master + this PR (https://github.com/dimagi/commcarehq-ansible/pull/971)
  4. The virtual env should have already been set up by
    the deploy_commcarehq playbook which should execute prior to the touchforms
    playbook. Also touchforms is only necessary if you're going to be doing sms
    surveys.

Re the encrypted drives. We run the deploy_stack playbook
with 'after-reboot' tag limited to the rebooted host. This should remount
the encrypted drive and perform a few other actions.
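
A sketch of what that invocation could look like, using ansible-playbook's standard --tags and --limit options. The helper name is made up, and the vars files from the original command are left out for brevity; they would normally be passed as well.

```python
def after_reboot_command(inventory, host, playbook="deploy_stack.yml"):
    """Build an ansible-playbook argv limited to one host and one tag."""
    return [
        "ansible-playbook", "-i", inventory, playbook,
        "--tags", "after-reboot",  # only run tasks carrying this tag
        "--limit", host,           # only touch the rebooted host
    ]


print(" ".join(after_reboot_command("inventories/monolith", "165.227.172.214")))
```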

I hope that helps and thanks for the feedback!

Simon Kelly
Director of Server Engineer | Dimagi

On 5 October 2017 at 17:36, tay...@openfn.org wrote:

Update: Rory found that one issue lay in the encrypted fs
stuff. We ran:

/etc/init.d/postgresql start
/etc/init.d/pgbouncer stop
/etc/init.d/pgbouncer start

and we can run the server. This was probably due to us
having to reboot during the deployment process.
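
If this has to be repeated after every reboot, the sequence can be wrapped in a small script. This is only a sketch (run_sequence is a made-up helper) and defaults to a dry run so the order can be reviewed before anything is executed as root.

```python
import subprocess

RESTART_SEQUENCE = [
    ["/etc/init.d/postgresql", "start"],
    ["/etc/init.d/pgbouncer", "stop"],
    ["/etc/init.d/pgbouncer", "start"],
]


def run_sequence(commands, dry_run=True):
    """Run commands in order, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only returned
    as strings so the order can be reviewed first.
    """
    planned = [" ".join(cmd) for cmd in commands]
    if not dry_run:
        for cmd in commands:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
    return planned


print("\n".join(run_sequence(RESTART_SEQUENCE)))
```

PostgreSQL is started first because pgbouncer has nothing to connect to otherwise, which matches the "pgbouncer cannot connect to server" error in the original report.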

We run migrations (CCHQ_IS_FRESH_INSTALL=1 python manage.py migrate) and get:
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/botocore/client.py", line 599, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

This appears to be an S3 issue, but I'm fairly certain I've
configured my bucket properly and granted access via the access key and
secret. (These are not part of version control in the shared repo, of
course.) Will update as we go.

FWIW, python manage.py compress fails because it can't
find the Font Awesome less file:
CommandError: An error occurred during rendering /home/cchq/www/dev/releases/2017-10-05_12.28/corehq/apps/registration/templates/registration/domain_request.html: 'font-awesome/less/font-awesome.less' could not be found in the COMPRESS_ROOT '/home/cchq/www/dev/releases/2017-10-05_12.28/staticfiles' or with staticfiles.

On Thursday, October 5, 2017 at 11:37:21 AM UTC-3, tay...@openfn.org wrote:

Hey guys,

Hope all is well. Let me preface this with a thank you—I
know you've got a lot going on and don't rely on ansible monolith
deployments for your core work, so I realize that any help you provide here
is going above and beyond. Thank you for that!

My objective is to get
ansible-playbook -i inventories/monolith -u root -e '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml
running on a freshly provisioned Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-125-generic
x86_64) droplet with 2 gigs of memory.

While I think that's a solid goal for the whole CommCare
open-source community, I'd like to disclose that we've also got a client at
Open Function that wants to connect CommCare to another system using
OpenFn, but CommCare needs to be hosted on their servers due to regulatory
issues.

Note that we made a couple of changes to vagrant and edited
some ansible scripts. You can see this work here:
https://github.com/rorymckinley/commcare-sandbox/pull/1/files. One significant change is that we are running the
vagrant stuff as root.

To the issues:

Issue #1:
TASK [couchdb : Set CouchDB username and password] ******************************
ok: [165.227.172.214] => (item={u'username': u'commcarehq', u'name': u'commcarehq', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984})
failed: [165.227.172.214] (item={u'username': u'commcarehq', u'name': u'commcarehq__users', u'is_https': False, u'host': u'165.227.172.214', u'password': u'commcarehq', u'port': 5984}) => {"cache_control": "must-revalidate", "content": "{\"error\":\"unauthorized\",\"reason\":\"You are not a server admin.\"}\n", "content_length": "64", "content_type": "text/plain; charset=utf-8", "date": "Thu, 05 Oct 2017 11:10:34 GMT", "failed": true, "item": {"host": "165.227.172.214", "is_https": false, "name": "commcarehq__users", "password": "commcarehq", "port": 5984, "username": "commcarehq"}, "msg": "Status code was not [200]: HTTP Error 401: Unauthorized", "redirected": false, "server": "CouchDB/1.6.1 (Erlang OTP/R16B03)", "status": 401, "url": "http://165.227.172.214:5984/_config/admins/commcarehq"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=135 changed=90 unreachable=0 failed=1

Possible solution 1: This task runs twice, but each user
in "items" has the same username and password. The failure can be stepped
over, as we don't need to (and can't) set up two different couchdb users
with commcarehq:commcarehq on the same box.
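
To make the point concrete, here is a small sketch (not the ansible role's logic; unique_admins is a made-up helper) showing that the two items from the task output collapse to a single admin account once they are keyed on host, port and username.

```python
def unique_admins(couch_dbs):
    """Collapse database entries to the distinct admin accounts they imply.

    Two entries that share host, port and username refer to the same CouchDB
    admin, so the account only needs to be created once.
    """
    seen = {}
    for db in couch_dbs:
        key = (db["host"], db["port"], db["username"])
        seen.setdefault(key, db["password"])
    return [
        {"host": h, "port": p, "username": u, "password": pw}
        for (h, p, u), pw in seen.items()
    ]


# The two items from the failing task above
items = [
    {"name": "commcarehq", "host": "165.227.172.214", "port": 5984,
     "username": "commcarehq", "password": "commcarehq", "is_https": False},
    {"name": "commcarehq__users", "host": "165.227.172.214", "port": 5984,
     "username": "commcarehq", "password": "commcarehq", "is_https": False},
]
print(len(unique_admins(items)))
```

Once the first request has created the admin, the second unauthenticated request to the same _config/admins URL is rejected, which is consistent with the 401 shown above.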

Issue #2&3: For both couchdb2 and redis, monit fails.
After I reboot the system and start monit manually they pass and redis is
running, but couchdb2 still shows "Execution failed". After another system
reboot, and manually starting monit, both now show as running and being
monitored.

monit status: Process 'couchdb2'
status Execution failed
monitoring status Monitored
data collected Thu, 05 Oct 2017 11:59:49

TASK [couchdb2 : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true, "msg": "couchdb2 process not presently configured with monit", "name": "couchdb2", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] ******************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=36 changed=20 unreachable=0 failed=1

TASK [redis : monit] ******************************
fatal: [165.227.172.214]: FAILED! => {"changed": false, "failed": true, "msg": "redis process not presently configured with monit", "name": "redis", "state": "monitored"}

RUNNING HANDLER [monit : reload monit] ******************************

RUNNING HANDLER [redis : restart redis] ******************************

RUNNING HANDLER [redis : restart rsyslog] ******************************
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

PLAY RECAP ******************************
165.227.172.214 : ok=17 changed=10 unreachable=0 failed=1

Issue 4:
TASK [touchforms : Touchforms user] ******************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named django
fatal: [165.227.172.214 -> 165.227.172.214]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 144, in \n main()\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 125, in main\n user.create_user()\n File \"/tmp/ansible_iUft9p/ansible_module_django_user.py\", line 84, in create_user\n superuser=repr(self.superuser),\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 1427, in __call__\n return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 774, in __init__\n self.wait()\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 792, in wait\n self.handle_command_exit_code(exit_code)\n File \"/usr/local/lib/python2.7/dist-packages/sh.py\", line 815, in handle_command_exit_code\n raise exc\nsh.ErrorReturnCode_1: \n\n RAN: /home/cchq/www/dev/current/python_env/bin/python manage.py shell --plain\n\n STDOUT:\n\n\n STDERR:\nTraceback (most recent call last):\n File \"manage.py\", line 9, in \n import django\nImportError: No module named django\n\n", "module_stdout": "Traceback (most recent call last):\n File \"manage.py\", line 9, in \n import django\nImportError: No module named django\n\n", "msg": "MODULE FAILURE"}
to retry, use: --limit @/vagrant/ansible/deploy_stack.retry

Possible solution: Here, we need to SSH in and then:

su - cchq
cd www/dev/current
source python_env/bin/activate
pip install -r requirements/requirements.txt
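
Before and after the pip install, it can help to confirm whether the virtualenv's interpreter can actually import django. A minimal sketch for Python 3 follows; can_import is a made-up helper, and it is demonstrated here against the current interpreter rather than the server's python_env.

```python
import subprocess
import sys


def can_import(python_bin, module):
    """Return True if `python_bin -c "import module"` exits with status 0."""
    result = subprocess.run(
        [python_bin, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Demonstrated against the current interpreter; on the server the first
# argument would be /home/cchq/www/dev/current/python_env/bin/python and
# the module would be "django".
print(can_import(sys.executable, "json"))
print(can_import(sys.executable, "no_such_module_here"))
```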

At this point the whole ansible playbook succeeds, but when
we visit our IP, we get the maintenance page and see this in the nginx logs:
2017/10/05 13:56:16 [error] 1064#1064: *18 connect() failed (111: Connection refused) while connecting to upstream, client: 186.106.251.211, server: 165.227.172.214, request: "GET /favicon.ico HTTP/1.1", upstream: "http://165.227.172.214:9010/favicon.ico", host: "165.227.172.214", referrer: "https://165.227.172.214/solutions/"

After activating the python_env we run runserver as cchq:
./manage.py runserver 0.0.0.0:9010

File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/cchq/www/dev/current/python_env/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: ERROR: pgbouncer cannot connect to server
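
Both symptoms (nginx getting "Connection refused" on port 9010, and pgbouncer failing to reach PostgreSQL) come down to something not listening where it is expected. A quick way to see which piece is down is a TCP port check. This is a sketch for Python 3: port 9010 comes from the nginx log above, while 6432 and 5432 are the usual pgbouncer and PostgreSQL defaults and are assumptions about this install.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports below are assumptions: 9010 comes from the nginx log above, 6432 and
# 5432 are the usual pgbouncer and PostgreSQL defaults.
CHECKS = {"django": 9010, "pgbouncer": 6432, "postgresql": 5432}


def report(host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in CHECKS.items()}
```

Calling report() on the box gives a quick picture of which of the three services accepts connections.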

At this point, we're wondering:

  1. Why isn't the server running itself?
  2. And how do we get it to run?

Best,
Taylor

--


You received this message because you are subscribed to the
Google Groups "CommCare Developers" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to commcare-developers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Simon, Jenny, and team,

The new master on commcarehq-ansible works much better. Thank you! I’ve got a
branch
https://github.com/dimagi/commcarehq-ansible/compare/master...OpenFn:master
that almost runs through on the very first go.

Three quick questions:

  1. Right now, we’re only able to get ansible to run if we set a
    FORMPLAYER_INTERNAL_AUTH_KEY to an empty string, but since we later have
    issues with Formplayer I’m worried that this isn’t the right move. What
    should we do here?
  2. With which user do you run ansible on a Digital Ocean box? (I noticed
    an “ansible” user gets configured, but presumably you’ve got to first run
    as root. Are you meant to run once as root and then after it fails
    subsequently run as ansible?)
  3. Where do you clone the commcare-hq-deploy repo and with which user do
    you run fab deploy?

And then one higher-level question to make sure I’m understanding things
correctly: It seems as though deployment on a new box requires the cloning
and configuration of three separate repos: (1) commcarehq-ansible
(https://github.com/dimagi/commcarehq-ansible), (2) commcare-hq-deploy
(https://github.com/dimagi/commcare-hq-deploy), and (3) formplayer
(https://github.com/dimagi/formplayer). If we’re trying to get this down to
a single repo (or at least a single README), can you describe the
relationship between these three repos, both conceptually and in
user/directory terms? It would be amazing to know where you clone each of
them and how you run them in relation to each other. We’ve been doing all
the ansible stuff as root and the django stuff as cchq, but that may not be
right.

Again, thank you so much. I owe you so many coffees/beers/loaves of
bread/pretty-much-you-name-it next time I’m in South Africa.

Taylor

Hey, answers inline:

  1. Right now, we’re only able to get ansible to run if we set a
    FORMPLAYER_INTERNAL_AUTH_KEY to an empty string, but since we later have
    issues with Formplayer I’m worried that this isn’t the right move. What
    should we do here?

That is a shared secret between HQ and Formplayer which allows formplayer
to authenticate API calls to HQ. You should set it to a secret key with a
reasonable amount of entropy.
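
As a rough illustration of "a reasonable amount of entropy", here is a minimal sketch that generates a random key with Python's standard `secrets` module (Python 3.6+). The helper name `generate_auth_key` and the 32-byte length are assumptions for the example, not anything the playbooks require; the resulting string would be pasted in as the value of FORMPLAYER_INTERNAL_AUTH_KEY in your private vars file.

```python
import secrets


def generate_auth_key(num_bytes=32):
    """Return a URL-safe random string built from num_bytes of OS randomness.

    The function name and the default length are illustrative choices only.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a fresh key; copy it into the private vars file by hand.
    print(generate_auth_key())
```

The same key has to be available to both HQ and Formplayer, since it is a shared secret.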

  1. With which user do you run ansible on a Digital Ocean box? (I
    noticed an “ansible” user gets configured, but presumably you’ve got to
    first run as root. Are you meant to run once as root and then after it
    fails subsequently run as ansible?)

It’s usually necessary to run it as a privileged user once to set up the
user accounts. We would normally run just the ‘users’ tag as root, and from
then on we can run it as the ‘ansible’ user.
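
The two-phase run described above could be sketched roughly as follows. This is only an illustration: the inventory, vars files and playbook name are the ones from this thread, the helper `playbook_command` is hypothetical, and the script just prints the two command lines rather than executing them.

```python
import shlex

# Paths taken from the commands quoted in this thread; adjust for your setup.
INVENTORY = "inventories/monolith"
VARS_FILES = ["vars/dev/dev_private.yml", "vars/dev/dev_public.yml"]
PLAYBOOK = "deploy_stack.yml"


def playbook_command(user, tags=None):
    """Build an ansible-playbook argument list for the given remote user."""
    cmd = ["ansible-playbook", "-i", INVENTORY]
    for path in VARS_FILES:
        # "-e @file" tells ansible-playbook to load extra vars from a file.
        cmd += ["-e", "@" + path]
    cmd += [PLAYBOOK, "-u", user]
    if tags:
        cmd.append("--tags=" + ",".join(tags))
    return cmd


# Phase 1: as root, run only the 'users' tag to create the accounts.
bootstrap = playbook_command("root", tags=["users"])
# Phase 2: every later run goes through the 'ansible' user.
full_run = playbook_command("ansible")

for cmd in (bootstrap, full_run):
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually execute a command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the directory that holds the playbooks.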

  1. Where do you clone the commcare-hq-deploy repo and with which user
    do you run fab deploy?

For running deploy you can have the repo anywhere you like, as long as you
have access to the machines you’re deploying to from there. For deploy you
don’t require any external dependencies (other than those defined in
requirements.txt).

And then one higher-level question to make sure I’m understanding things
correctly: It seems as though deployment on a new box requires the cloning
and configuration of three separate repos: (1) commcarehq-ansible
https://github.com/dimagi/commcarehq-ansible, (2) commcare-hq-deploy
https://github.com/dimagi/commcare-hq-deploy, and (3) formplayer
https://github.com/dimagi/formplayer. If we’re trying to get this down
to a single repo (or at least a single README), can you describe the
relationship between these three repos theoretically and in
user/directory terms? It would be amazing to know where you clone each of
them and how you run them in relation to each other. We’ve been doing all
the ansible stuff as root and the django stuff as cchq, but that may not
be right.

None of these repos needs to be anywhere specific. The formplayer repo in
particular should not be needed for anything. Currently, when we deploy
formplayer, it pulls the latest version from our Jenkins build server.

The other two are related as follows (at least for our setup):

  • ansible repo: stores all the ansible playbooks and vault files (with
    all the secret keys, etc.).
  • deploy repo: has the deploy scripts and the ansible inventory files
    (required for both deploy and ansible).

We always set up one of the VMs in our clusters as a ‘control’ machine from
where we can run the ansible playbooks (and also normal deploys if we
want). Once you have an account on this machine you can follow the
instructions in the readme:
https://github.com/dimagi/commcarehq-ansible#setting-up-a-dev-account-on-ansible-control-machine

This should set up the ‘commcare-hq-deploy’ repo and also the python
virtualenv for ansible. It will also create some bash aliases that make it
easier to run ansible playbooks.

I hope that answers your questions. Let me know if you have follow-ups.

Cheers
Simon

Simon, this is fantastic—thank you. Very quick one:

  1. I ran ansible-playbook -i inventories/monolith -e
    '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml
    -u root --tags=users
  2. Then ansible-playbook -i inventories/monolith -e
    '@vars/dev/dev_private.yml' -e '@vars/dev/dev_public.yml' deploy_stack.yml
    -u ansible

on a brand new box, but get:

TASK [apt] *********************************************************************
fatal: [159.203.132.215]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to 159.203.132.215 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE", "rc": 1}

Was I misinterpreting the suggestion? Or is there a change that must be
made either to the ansible user setup or with visudo to get this working?
Conceivably, we could set up password-less sudo privs for ansible like
this: ansible ALL=(ALL) NOPASSWD:ALL
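
For reference, a minimal sketch of how that sudoers line could be rendered and written as a drop-in file. The helper names `sudoers_entry` and `write_dropin` are made up for this example, and the demo writes into a scratch directory rather than the real /etc/sudoers.d. Whether the playbooks are meant to configure this themselves is exactly the open question above, so treat this as one possible workaround, not the project's method. Another option that leaves sudoers untouched is to pass `--ask-become-pass` (`-K`) to ansible-playbook and type the sudo password, assuming the ansible user has one.

```python
import os
import tempfile


def sudoers_entry(user):
    """Render a sudoers rule granting `user` password-less sudo."""
    return "{} ALL=(ALL) NOPASSWD:ALL\n".format(user)


def write_dropin(directory, user):
    """Write the rule to <directory>/<user> with mode 0440 and return the path.

    On a real host the directory would be /etc/sudoers.d (an assumption about
    the target box), and the file should be checked with `visudo -cf <path>`
    before relying on it.
    """
    path = os.path.join(directory, user)
    with open(path, "w") as handle:
        handle.write(sudoers_entry(user))
    os.chmod(path, 0o440)  # sudo expects drop-in files not to be writable
    return path


if __name__ == "__main__":
    # Demonstrate against a temporary directory instead of the live system.
    with tempfile.TemporaryDirectory() as scratch:
        dropin = write_dropin(scratch, "ansible")
        with open(dropin) as handle:
            print(handle.read(), end="")
```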

Best,
Taylor

On Monday, October 30, 2017 at 5:18:30 AM UTC-3, Simon Kelly wrote:
