CouchDB failed to replicate

Hi,

CouchDB failed to replicate after the instances were upgraded to Ubuntu 22.04.

These are the steps followed to install and replicate CouchDB:

  1. CouchDB (version 3.3.1) was installed on the nodes using cchq echis deploy-stack --limit=<couchdb-servers>:
curl -XGET 172.19.3.35:15984
{"couchdb":"Welcome","version":"3.3.1","git_sha":"1fd50b82a","uuid":"61f66e67e65a525997c23960fb11ef50","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
  2. The nodes were added to the cluster (cchq echis aps --tags=add_couch_nodes --limit=<couchdb-server>).
  3. The backup was copied to the nodes and restored using the restore_couchdb_backup.sh bash script, following this guide.
  4. Their functionality was checked one by one by adding each node to the proxy; each of them is working properly (only one node is added to the proxy right now).
  5. The port and IP address were added to local.ini in the [chttpd] section of each node (a quick per-node check follows the config below):
[chttpd]
port = 15984
bind_address = 172.19.3.37
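
With those settings in place, each node's clustered interface can be sanity-checked directly; a minimal check, assuming the port and bind address above (repeat for each node's IP):

curl -s http://172.19.3.37:15984/_up
# a healthy node should answer with something like {"status":"ok","seeds":{}}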

  6. The migration plan was created:

target_allocation:
- echis_server55,echis_server34,echis_server32,echis_server14:4
  7. Ran migrate-couchdb myplan.yml migrate --no-stop, but it got stuck with the following failure message:
Give ansible user access to couchdb files:
ansible couchdb2 -m user -i /home/administrator/commcare-cloud/environments/echis/inventory.ini -a 'user=ansible groups=couchdb append=yes' -u ansible --become -e @/home/administrator/commcare-cloud/environments/echis/public.yml -e @/home/administrator/commcare-cloud/environments/echis/.generated.yml -e @/home/administrator/commcare-cloud/environments/echis/vault.yml --vault-password-file=/home/administrator/commcare-cloud/src/commcare_cloud/ansible/echo_vault_password.sh '--ssh-common-args=-o UserKnownHostsFile=/home/administrator/commcare-cloud/environments/echis/known_hosts' --diff
172.19.3.37 | SUCCESS => {
    "append": true,
    "changed": false,
    "comment": ",,,",
    "group": 3001,
    "groups": "couchdb",
    "home": "/home/ansible",
    "move_home": false,
    "name": "ansible",
    "shell": "/bin/bash",
    "state": "present",
    "uid": 1001
}
172.19.3.35 | SUCCESS => {
    "append": true,
    "changed": false,
    "comment": ",,,",
    "group": 3001,
    "groups": "couchdb",
    "home": "/home/ansible",
    "move_home": false,
    "name": "ansible",
    "shell": "/bin/bash",
    "state": "present",
    "uid": 1001
}
172.19.3.55 | SUCCESS => {
    "append": true,
    "changed": false,
    "comment": ",,,",
    "group": 3001,
    "groups": "couchdb",
    "home": "/home/ansible",
    "move_home": false,
    "name": "ansible",
    "shell": "/bin/bash",
    "state": "present",
    "uid": 1001
}
172.19.4.50 | SUCCESS => {
    "append": true,
    "changed": false,
    "comment": "",
    "group": 3001,
    "groups": "couchdb",
    "home": "/home/ansible",
    "move_home": false,
    "name": "ansible",
    "shell": "/bin/bash",
    "state": "present",
    "uid": 1001
}
ansible couchdb2 -m file -i /home/administrator/commcare-cloud/environments/echis/inventory.ini -a 'path=/opt/data/couchdb2/ mode=0755' -u ansible --become -e @/home/administrator/commcare-cloud/environments/echis/public.yml -e @/home/administrator/commcare-cloud/environments/echis/.generated.yml -e @/home/administrator/commcare-cloud/environments/echis/vault.yml --vault-password-file=/home/administrator/commcare-cloud/src/commcare_cloud/ansible/echo_vault_password.sh '--ssh-common-args=-o UserKnownHostsFile=/home/administrator/commcare-cloud/environments/echis/known_hosts' --diff

172.19.4.50 | SUCCESS => {
    "changed": false,
    "gid": 125,
    "group": "couchdb",
    "mode": "0755",
    "owner": "couchdb",
    "path": "/opt/data/couchdb2/",
    "size": 4096,
    "state": "directory",
    "uid": 118
}

172.19.3.37 | SUCCESS => {
    "changed": false,
    "gid": 125,
    "group": "couchdb",
    "mode": "0755",
    "owner": "couchdb",
    "path": "/opt/data/couchdb2/",
    "size": 4096,
    "state": "directory",
    "uid": 118
}

172.19.3.55 | SUCCESS => {
    "changed": false,
    "gid": 125,
    "group": "couchdb",
    "mode": "0755",
    "owner": "couchdb",
    "path": "/opt/data/couchdb2/",
    "size": 4096,
    "state": "directory",
    "uid": 118
}

172.19.3.35 | SUCCESS => {
    "changed": false,
    "gid": 125,
    "group": "couchdb",
    "mode": "0755",
    "owner": "couchdb",
    "path": "/opt/data/couchdb2/",
    "size": 4096,
    "state": "directory",
    "uid": 118
}
Copy file lists to nodes:
ansible all -m shell -i /home/administrator/commcare-cloud/environments/echis/inventory.ini -a '/tmp/file_migration/file_migration_rsync.sh --dry-run' -u ansible --become -e @/home/administrator/commcare-cloud/environments/echis/public.yml -e @/home/administrator/commcare-cloud/environments/echis/.generated.yml -e @/home/administrator/commcare-cloud/environments/echis/vault.yml --vault-password-file=/home/administrator/commcare-cloud/src/commcare_cloud/ansible/echo_vault_password.sh '--ssh-common-args=-o UserKnownHostsFile=/home/administrator/commcare-cloud/environments/echis/known_hosts' --diff --limit=
172.19.3.34 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.39 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.50 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.54 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.47 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.41 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.41 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.97 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.38 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.40 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.31 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.42 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.75 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.54 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.53 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.55 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.33 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.50 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.51 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.57 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.52 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.76 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.43 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.44 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.77 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.79 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.48 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.3.36 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.43 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.37 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.61 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.62 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.60 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.48 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.59 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.71 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.72 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.36 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.46 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code
172.19.4.63 | FAILED | rc=127 >>
/bin/sh: 1: /tmp/file_migration/file_migration_rsync.sh: not foundnon-zero return code

CouchDB hosts in the inventory.ini file:

[couchdb2:children]
echis_server32
echis_server34
echis_server14
echis_server55

Thank you,

Hi @sirajhassan

According to this section, you need to deploy the new nodes and add them to the cluster. Can you confirm that you did this?

Hi Chris,

I ran the command (cchq echis aps --tags=add_couch_nodes --limit=couchdb2) again, and the 'Add nodes' task is skipping some nodes. Is this an issue?

TASK [couchdb2 : Add nodes] *******************************************************************************************************
skipping: [172.19.3.35] => (item=172.19.3.35)
skipping: [172.19.3.37] => (item=172.19.3.35)
skipping: [172.19.3.37] => (item=172.19.3.37)
skipping: [172.19.3.37] => (item=172.19.4.50)
skipping: [172.19.3.37] => (item=172.19.3.55)
skipping: [172.19.4.50] => (item=172.19.3.35)
skipping: [172.19.4.50] => (item=172.19.3.37)
skipping: [172.19.4.50] => (item=172.19.4.50)
skipping: [172.19.4.50] => (item=172.19.3.55)
skipping: [172.19.3.55] => (item=172.19.3.35)
skipping: [172.19.3.55] => (item=172.19.3.37)
skipping: [172.19.3.55] => (item=172.19.4.50)
skipping: [172.19.3.55] => (item=172.19.3.55)
ok: [172.19.3.35] => (item=172.19.3.37)
ok: [172.19.3.35] => (item=172.19.4.50)
ok: [172.19.3.35] => (item=172.19.3.55)

Recap:

172.19.3.35                : ok=15   changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
172.19.3.37                : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
172.19.3.55                : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
172.19.4.50                : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

✓ Apply completed with status code 0

Can you share the output of running curl -s http://172.19.3.35:15984/_membership, and also the contents of /etc/default/couchdb on the couch machines, please?

If 127.0.0.1 is in /etc/default/couchdb on the machines running version 3.3.1 (which should be all of them, based on your original question), please remove the 127.0.0.1 address and then run the cchq echis aps --tags=add_couch_nodes --limit=couchdb2 command again.
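
A quick way to verify that across all the couch nodes is an ad-hoc Ansible command; a minimal sketch, assuming the same inventory and SSH setup used earlier in this thread:

ansible couchdb2 -i /home/administrator/commcare-cloud/environments/echis/inventory.ini \
  -m command -a 'grep -n 127.0.0.1 /etc/default/couchdb' -u ansible --become

Hosts where grep finds no match will be reported as FAILED with rc=1, which simply means the address is not present there.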

127.0.0.1 has been removed from /etc/default/couchdb on all CouchDB machines.

The output of curl -s http://172.19.3.35:15984/_membership is:
{"all_nodes":["couchdb@172.19.3.35","couchdb@172.19.3.37","couchdb@172.19.3.54","couchdb@172.19.3.55","couchdb@172.19.3.56","couchdb@172.19.4.49","couchdb@172.19.4.50"],"cluster_nodes":["couchdb@172.19.3.35","couchdb@172.19.3.37","couchdb@172.19.3.54","couchdb@172.19.3.55","couchdb@172.19.3.56","couchdb@172.19.4.49","couchdb@172.19.4.50"]}

"couchdb@172.19.3.54", "couchdb@172.19.3.56" and "couchdb@172.19.4.49" persist even though they are not in the plan.

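For reference, a stale node can normally be dropped through the cluster's _nodes database from any member; a minimal sketch, in which the admin credentials and the _rev value are placeholders (the real revision comes from the GET), repeated for each of the three stale nodes:

curl -s http://admin:PASSWORD@172.19.3.35:15984/_node/_local/_nodes/couchdb@172.19.3.54
# returns something like {"_id":"couchdb@172.19.3.54","_rev":"1-..."}
curl -s -X DELETE "http://admin:PASSWORD@172.19.3.35:15984/_node/_local/_nodes/couchdb@172.19.3.54?rev=1-..."
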
They have been removed from the cluster, and this is the output:

{"all_nodes":["couchdb@172.19.3.35","couchdb@172.19.3.37","couchdb@172.19.3.54","couchdb@172.19.3.55","couchdb@172.19.3.56","couchdb@172.19.4.49","couchdb@172.19.4.50"],"cluster_nodes":["couchdb@172.19.3.35","couchdb@172.19.3.37","couchdb@172.19.3.55","couchdb@172.19.4.50"]}

Thank you so much @smittieC for your kind support.

I have checked manually, and the nodes are now replicating to each other.
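
One quick cross-check is to query _membership on every node and confirm they all report the same cluster view; a small sketch using the node IPs from this thread:

for ip in 172.19.3.35 172.19.3.37 172.19.3.55 172.19.4.50; do
  echo "== $ip =="
  curl -s "http://$ip:15984/_membership"
  echo
done

All four should return identical all_nodes and cluster_nodes lists.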
