Live migration from s3_to_s3

Hi there,

I was trying to migrate from one MinIO object store to another, following the steps in the commcare-cloud documentation: Migrate from one S3 backend to another | CommCare Cloud

After a long time, the migration seemed to finish.

Afterwards, I checked for blobs that the migration had missed with the following command:

cchq echis django-manage --tmux check_blob_logs /opt/data/blobdb-migration-logs/migrate_backend-blob-migration-<timestamp>.txt

It displayed a list of unmatched blobs: found in the old blobdb but not in the new one. So I ran the command again with --migrate to migrate the items reported as “Found in old db”. Here is the output for some of them:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/management/commands/check_blob_logs.py", line 110, in process
    category = check_blob(rec, old_db, new_db, migrate)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/management/commands/check_blob_logs.py", line 123, in check_blob
    with old_db.get(key=key) as content:
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/ex-submodules/dimagi/utils/retry.py", line 31, in retry
    return func(*args, **kw)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/s3db.py", line 96, in get
    key = self._validate_get_args(key, type_code, meta)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/interface.py", line 80, in _validate_get_args
    raise ValueError("'key' must be specified with 'type_code'")
ValueError: 'key' must be specified with 'type_code'
2021-06-08T04:57:16Z <Greenlet at 0x7f26d056c948: process({'blobmeta_id': 679030, 'domain': 'fmoh-echis', 't, <corehq.blobs.s3db.S3BlobDB object at 0x7f26dabb38, <corehq.blobs.s3db.S3BlobDB object at 0x7f26db2a72, True)> failed with ValueError
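The check described above (list blobs recorded in the migration log that are in the old blobdb but not the new one, and copy them when --migrate is given) can be pictured with a toy model. This is a minimal sketch, assuming plain dicts in place of the two BlobDBs; the function name `check_logged_keys` and its category labels are hypothetical and are not the actual check_blob_logs code.

```python
# Toy model of a log-based blob check. Dicts stand in for the old and new
# BlobDBs; the function name, category labels and the `migrate` flag are
# illustrative assumptions, not the actual check_blob_logs implementation.
from collections import Counter


def check_logged_keys(keys, old_db, new_db, migrate=False):
    """Classify each logged blob key and optionally copy old-only blobs."""
    summary = Counter()
    for key in keys:
        if key in new_db:
            summary["found in new db"] += 1
        elif key in old_db:
            summary["found in old db"] += 1
            if migrate:
                # Copy the content so the blob now exists in both stores.
                new_db[key] = old_db[key]
                summary["migrated"] += 1
        else:
            summary["missing from both"] += 1
    return summary


if __name__ == "__main__":
    old = {"a": b"1", "b": b"2", "c": b"3"}
    new = {"a": b"1"}
    print(check_logged_keys(["a", "b", "c", "d"], old, new, migrate=True))
```

Running it a second time over the same keys would report the copied blobs as "found in new db", which is why re-running the check after --migrate is a useful confirmation.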

Hi Demisew

Thanks for reporting this issue. I've created a PR with a fix: pass in type_code in check_blob_logs by snopoke · Pull Request #29878 · dimagi/commcare-hq · GitHub

In the meantime, until that gets merged, you could apply the change manually to the file in your release folder:


File: corehq/blobs/management/commands/check_blob_logs.py

123 -  with old_db.get(key=key) as content:
123 +  with old_db.get(key=key, type_code=CODES.maybe_compressed) as content:
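Why the one-line change helps can be shown with a toy reproduction of the rule stated in the error message: a lookup by `key` is rejected unless a `type_code` is passed as well. This is a sketch inferred only from the message text; `ToyBlobDB` is hypothetical and is not the code in corehq/blobs/interface.py, and the value `1` is just a placeholder for `CODES.maybe_compressed`.

```python
# Toy reproduction of the rule behind the error, inferred only from the
# message "'key' must be specified with 'type_code'". ToyBlobDB is
# hypothetical; it is not the code in corehq/blobs/interface.py.
import io


class ToyBlobDB:
    def __init__(self, blobs):
        self._blobs = blobs  # maps key -> bytes

    def get(self, key=None, type_code=None):
        # Reject a bare key with no type code, mirroring the error message.
        if key is not None and type_code is None:
            raise ValueError("'key' must be specified with 'type_code'")
        # BytesIO supports `with`, like the `with old_db.get(...)` call.
        return io.BytesIO(self._blobs[key])


if __name__ == "__main__":
    db = ToyBlobDB({"abc123": b"payload"})
    try:
        db.get(key="abc123")  # the pre-fix call shape
    except ValueError as err:
        print("before fix:", err)
    # 1 is a placeholder for whatever value CODES.maybe_compressed holds.
    with db.get(key="abc123", type_code=1) as content:
        print("after fix:", content.read())
```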

Thanks Simon,
that works. Now I am able to migrate most of the missing objects using the command:

cchq echis django-manage check_blob_logs --migrate /opt/data/blobdb-migration-logs/migrate_backend-blob-migration-20210428T132533Z.txt

However, some files are still in the old blobdb and were not migrated to the new blobdb, so the counts do not match, as shown below.

Is it safe to flip to the new blobdb server?

Hi Demisew

To be on the safe side I would run the 'check' migration once more:

python manage.py run_blob_migration migrate_backend_check --log-dir=/opt/data/blobdb-migration-logs --chunk-size=1000 --num-workers=15

This is almost the same as the first migration, but it will first check whether the object is already in the new BlobDB and only attempt to copy it over if it isn't. This should make it much faster, and it will also give you log output for anything that's missing.

Once that is complete, inspect that log to see whether anything critical is missing. If not, you can flip. You could also run check_blob_logs again on the new log file to get the nice summary output.
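The "check" pass described above boils down to: for every blob, skip it if the new store already has it, copy it if only the old store has it, and log it if neither has it. A minimal sketch of that idea, assuming dicts in place of the BlobDBs; the function `check_migration` and its JSON log lines are hypothetical, not the format run_blob_migration actually writes.

```python
# Minimal sketch of a check-then-copy pass over every blob key. Dicts stand
# in for the old and new BlobDBs, and the JSON log line format is a
# hypothetical choice, not the format run_blob_migration writes.
import json


def check_migration(keys, old_db, new_db, log_lines):
    """Copy only blobs absent from new_db; log keys missing from both."""
    copied = skipped = 0
    for key in keys:
        if key in new_db:
            skipped += 1  # already migrated: only a cheap existence check
            continue
        if key in old_db:
            new_db[key] = old_db[key]
            copied += 1
        else:
            # Nothing to copy: record it so it can be inspected afterwards.
            log_lines.append(json.dumps({"key": key, "status": "missing"}))
    return copied, skipped


if __name__ == "__main__":
    old = {"a": b"1", "b": b"2"}
    new = {"a": b"1"}
    log = []
    print(check_migration(["a", "b", "c"], old, new, log))
    print(log)
```

The speed-up comes from the `skipped` branch: after a mostly complete first migration, nearly every blob costs one existence check instead of a full download and upload.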

Hi Simon,
I am getting this error while running the command:

cchq echis django-manage run_blob_migration migrate_backend_check --log-dir=/opt/data/blobdb-migration-logs --chunk-size=1000 --num-workers=15

Error processing blob:
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/migrate.py", line 484, in work_on
    ok = migrator.migrate(doc)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/migrate.py", line 305, in migrate
    content = self.db.old_db.get(key=meta.key)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/ex-submodules/dimagi/utils/retry.py", line 31, in retry
    return func(*args, **kw)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/s3db.py", line 96, in get
    key = self._validate_get_args(key, type_code, meta)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/interface.py", line 80, in _validate_get_args
    raise ValueError("'key' must be specified with 'type_code'")
ValueError: 'key' must be specified with 'type_code'
will retry c857c170bb2914f540e9f342ae004cb1 71
Error processing blob:
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/migrate.py", line 484, in work_on
    ok = migrator.migrate(doc)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/migrate.py", line 305, in migrate
    content = self.db.old_db.get(key=meta.key)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/ex-submodules/dimagi/utils/retry.py", line 31, in retry
    return func(*args, **kw)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/s3db.py", line 96, in get
    key = self._validate_get_args(key, type_code, meta)
  File "/home/cchq/www/echis/releases/2021-06-01_04.52/corehq/blobs/interface.py", line 80, in _validate_get_args
    raise ValueError("'key' must be specified with 'type_code'")
ValueError: 'key' must be specified with 'type_code'
will retry d563de2ee8584917ba614422d3a1e33c 68

There was a second place with the same issue as the first one; you can see the diff here: pass in type_code in check_blob_logs by snopoke · Pull Request #29878 · dimagi/commcare-hq · GitHub

That has been merged, so you can either create a new release or apply the fix manually.

Thank you for the update.

Now it's working as expected.

Ubuntu 18.04.5 LTS
Migration log: /opt/data/blobdb-migration-logs/migrate_backend_check-blob-migration-20210609T102634Z.txt
Processing 8850317 documents (~2 already processed): BlobMeta...
Processed 1/3 of 8850317 documents in 0:00:00.162905 (16 days, 16:29:20.402170 remaining)
Processed 1000/1002 of 8850317 documents in 0:00:47.246532 (4 days, 20:08:14.736490 remaining)
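The two "remaining" estimates differ so much (about 16 days vs. about 4 days) because the first one is extrapolated from a single document. The numbers are consistent with a simple linear estimate: seconds per document in this run, multiplied by the documents still left (total minus everything already processed, including the ~2 from before). That formula is an assumption inferred from the printed values, not taken from run_blob_migration's source; this sketch reproduces the first estimate and lands within a few seconds of the second.

```python
# Linear time-remaining estimate inferred from the progress lines above;
# the formula is an assumption that matches the printed numbers, not code
# taken from run_blob_migration.
from datetime import timedelta


def remaining_time(elapsed_s, processed_this_run, total, processed_total):
    """Seconds per document in this run times documents still to process."""
    seconds_per_doc = elapsed_s / processed_this_run
    return timedelta(seconds=seconds_per_doc * (total - processed_total))


if __name__ == "__main__":
    # "Processed 1/3 of 8850317 documents in 0:00:00.162905"
    print(remaining_time(0.162905, 1, 8850317, 3))
    # "Processed 1000/1002 of 8850317 documents in 0:00:47.246532"
    print(remaining_time(47.246532, 1000, 8850317, 1002))
```

As more chunks complete, the per-document rate averages over more work and the estimate should settle.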