URGENT - new issues after installing Changelog 0087

On one server we manage, I’ve been able to upgrade to ES6 successfully - or so it seemed.

I had numerous challenges previously, as described in this thread: Issues with Changelog 0087 - upgrade to ES 6. This time, however, it seemed I was able to upgrade Elasticsearch successfully.

To perform the 0087 changelog upgrade I first deployed this changelog: Merge pull request #36134 from dimagi/pkv/fcm-analytics-label · dimagi/commcare-hq@ded7c3f · GitHub

I then performed all steps in 0087 successfully and took a backup before restarting the server. Since that update, however, I’ve been unable to perform any further deployments. I first tried the latest verified production deployment, b82c59c, but the migrations fail with this error:

I restored my backup prior to attempting that deployment and tried again, this time with a much older commit, the next-newest verified production deploy: Merge pull request #36721 from dimagi/create-pull-request/update-tran… · dimagi/commcare-hq@4bc899f · GitHub, but that also fails during migrations with this error:

If I’m understanding correctly, both times 0083_reset_ccu_dm_is_active.py fails when trying to query Elasticsearch for CommCareUser documents. ES returns a 400 error with search_phase_execution_exception.

ES cluster health looks good - green.

I have since restored the system to its pre-ES6 state just in case, but I have a QA server ready to use for testing. Please let me know what I can do to help troubleshoot this issue.

Thanks!

I’m not certain why that migration would fail, but here are a couple of thoughts.

Are you able to query Elasticsearch at all? What about the users index specifically? You can try this in a ./manage.py shell:

from corehq.apps.es import UserES
UserES().count()

That should output the number of user documents stored in elasticsearch. If that works, can you try this?

UserES().mobile_users().nested(
    'user_domain_memberships',
    filters.term('user_domain_memberships.is_active', False),
).count()

That will give the number of documents affected by this migration. I’d expect that for most environments it will be 0. If that’s the case, you can skip the migration entirely by faking it, though if those commands work in a shell session I don’t know why they wouldn’t succeed during the deploy.
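For reference, faking a migration uses Django's `--fake` flag, which records the migration as applied without executing it. The app label below is an assumption, not confirmed from this thread; check where 0083_reset_ccu_dm_is_active actually lives first.

```shell
# "users" is an ASSUMED app label - confirm which app contains
# 0083_reset_ccu_dm_is_active via `./manage.py showmigrations` first.
./manage.py migrate users 0083_reset_ccu_dm_is_active --fake
```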

Thanks for the reply Ethan, I’m going to give it a go on the QA server today and will follow up.

This is failing on the filters.term call:

In [1]: from corehq.apps.es import UserES

In [2]: UserES().count()
Out[2]: 1268

In [3]: UserES().mobile_users().nested(
   ...:     'user_domain_memberships',
   ...:     filters.term('user_domain_memberships.is_active', False),
   ...: ).count()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 3
      1 UserES().mobile_users().nested(
      2     'user_domain_memberships',
----> 3     filters.term('user_domain_memberships.is_active', False),
      4 ).count()

NameError: name 'filters' is not defined

Thanks

In case it’s useful, I popped the original error in the opening post into claude.ai and it returned the following:


The migration failed due to an Elasticsearch query error. Here's what happened:

Core Issue

The migration script 0083_reset_ccu_dm_is_active.py tried to query Elasticsearch to find affected user IDs, but the query failed with a 400 error: search_phase_execution_exception.

Root Cause

The Elasticsearch query is trying to search for a nested field user_domain_memberships.is_active that either:

  1. Doesn't exist in the current index mapping, or

  2. Has a different mapping structure than expected

The Problematic Query

The query looks for:

  • Document type: CommCareUser

  • Nested field: user_domain_memberships.is_active = false

  • Base document: couchuser

But Elasticsearch can't execute this query structure against the current index.

What Likely Went Wrong

  1. Index mapping mismatch - The user_domain_memberships field may not be mapped as a nested field in Elasticsearch, or the is_active subfield doesn't exist

  2. Timing issue - The Elasticsearch index mapping update happened earlier in the migration (you can see it added is_account_confirmed field), but the nested structure for user_domain_memberships.is_active wasn't properly established

Next Steps

  1. Check Elasticsearch mapping for the users index to see if user_domain_memberships.is_active exists

  2. Reindex users if the mapping is incorrect

  3. Run the migration again after fixing the Elasticsearch mapping

  4. Or skip this specific migration if the data migration isn't critical (though this needs careful consideration)

The migration appears to be trying to reset some user domain membership active flags, but can't find the users to update due to the ES query failure.


Also to note, this has happened on our QA and production servers for this client.

Ah right, you’ll also have to import filters:

from corehq.apps.es import filters

I wonder if this is an order of operations thing - the user_domain_memberships field was added in May in es migration 0015_add_user_domain_memberships. If you run

./manage.py showmigrations es

does it show that that’s been applied?

Thanks Ethan, I’ll check. I imagine those are applied, given we applied this update before running the ES6 upgrade: Merge pull request #36134 from dimagi/pkv/fcm-analytics-label · dimagi/commcare-hq@ded7c3f · GitHub

I’ll provide feedback shortly.

Here’s the output:

In [1]: from corehq.apps.es import UserES

In [2]: from corehq.apps.es import filters

In [3]: UserES().mobile_users().nested(
   ...:     'user_domain_memberships',
   ...:     filters.term('user_domain_memberships.is_active', False),
   ...: ).count()
---------------------------------------------------------------------------
RequestError                              Traceback (most recent call last)
Cell In[3], line 4
      1 UserES().mobile_users().nested(
      2     'user_domain_memberships',
      3     filters.term('user_domain_memberships.is_active', False),
----> 4 ).count()

File ~/www/monolith/releases/2025-09-01_20.31/corehq/apps/es/es_query.py:472, in ESQuery.count(self)
    471 def count(self):
--> 472     return self.adapter.count(self.raw_query)

File ~/www/monolith/releases/2025-09-01_20.31/corehq/apps/es/client.py:610, in ElasticDocumentAdapter.count(self, query)
    604 """Return the number of documents matched by the query
    605
    606 :param query: dict query body
    607 :returns: int
    608 """
    609 query = self._prepare_count_query(query)
--> 610 return self._es.count(self.index_name, self.type, query).get("count")

File ~/www/monolith/releases/2025-09-01_20.31/.venv/lib/python3.13/site-packages/elasticsearch6/client/utils.py:101, in query_params.<locals>._wrapper.<locals>._wrapped(*args, **kwargs)
     99 if p in kwargs:
    100     params[p] = kwargs.pop(p)
--> 101 return func(*args, params=params, **kwargs)

File ~/www/monolith/releases/2025-09-01_20.31/.venv/lib/python3.13/site-packages/elasticsearch6/client/__init__.py:1518, in Elasticsearch.count(self, index, doc_type, body, params)
   1515 if doc_type and not index:
   1516     index = "_all"
-> 1518 return self.transport.perform_request(
   1519     "GET", _make_path(index, doc_type, "_count"), params=params, body=body
   1520 )

File ~/www/monolith/releases/2025-09-01_20.31/.venv/lib/python3.13/site-packages/elasticsearch6/transport.py:402, in Transport.perform_request(self, method, url, headers, params, body)
    400     delay = 2 ** attempt - 1
    401     time.sleep(delay)
--> 402 status, headers_response, data = connection.perform_request(
    403     method,
    404     url,
    405     params,
    406     body,
    407     headers=headers,
    408     ignore=ignore,
    409     timeout=timeout,
    410 )
    412 except TransportError as e:
    413     if method == "HEAD" and e.status_code == 404:

File ~/www/monolith/releases/2025-09-01_20.31/.venv/lib/python3.13/site-packages/elasticsearch6/connection/http_urllib3.py:252, in Urllib3HttpConnection.perform_request(self, method, url, params, body, timeout, ignore, headers)
    248 if not (200 <= response.status < 300) and response.status not in ignore:
    249     self.log_request_fail(
    250         method, full_url, url, orig_body, duration, response.status, raw_data
    251     )
--> 252     self._raise_error(response.status, raw_data)
    254 self.log_request_success(
    255     method, full_url, url, orig_body, response.status, raw_data, duration
    256 )
    258 return response.status, response.getheaders(), raw_data

File ~/www/monolith/releases/2025-09-01_20.31/.venv/lib/python3.13/site-packages/elasticsearch6/connection/base.py:253, in Connection._raise_error(self, status_code, raw_data)
    250 except (ValueError, TypeError) as err:
    251     logger.warning("Undecodable raw error response from server: %s", err)
--> 253 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
    254     status_code, error_message, additional_info
    255 )

RequestError(400, 'search_phase_execution_exception', 'failed to create query: {
  ...
  "nested" : {
    "query" : {
      "bool" : {
        "filter" : [
          {
            "term" : {
              "user_domain_memberships.is_active" : {
                "value" : false,
                "boost" : 1.0
              }
            }
          }
        ],
        ...
    "path" : "user_domain_memberships",
    "ignore_unmapped" : false,
    "score_mode" : "avg",
    "boost" : 1.0
  }

For reference, this produces output:

In [4]: UserES().mobile_users().filter(
   ...:     filters.term('user_domain_memberships.is_active', False)
   ...: ).count()
Out[4]: 0

I assume that means user_domain_memberships is not nested?
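To illustrate the difference (these are hand-written query bodies mirroring the error output above, not the exact output of the HQ builders): a `nested` query is only valid when the path is mapped with `"type": "nested"`, whereas a plain `term` filter on the dotted field name is legal against an ordinary object mapping and here simply matches nothing.

```python
# Sketch of the two query shapes, hand-built to mirror the error output above.

# What .nested(...) builds: valid only if 'user_domain_memberships' is mapped
# as {"type": "nested"} in the live index - otherwise ES returns a 400
# search_phase_execution_exception.
nested_query = {
    "query": {
        "nested": {
            "path": "user_domain_memberships",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"user_domain_memberships.is_active": False}}
                    ]
                }
            },
        }
    }
}

# What .filter(filters.term(...)) builds: a flat term filter on the dotted
# field name, which is legal against a plain object mapping - it just won't
# match documents that were indexed expecting a nested structure.
flat_query = {
    "query": {
        "bool": {
            "filter": [{"term": {"user_domain_memberships.is_active": False}}]
        }
    }
}
```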

These are the migrations already done:

(commcare-hq) (monolith) cchq@monolith:~/www/monolith/current$ ./manage.py showmigrations es
es
 [X] 0001_bootstrap_es_indexes
 [X] 0002_add_tombstones
 [X] 0003_add_assigned_location_ids
 [X] 0004_make_new_indexes
 [X] 0005_add_epoch_as_valid_date_to_forms
 [X] 0006_verify_es2_indices_reindexed
 [X] 0007_init_indices_for_fresh_es_5
 [X] 0008_add_doc_id_to_all_mappings
 [X] 0009_add_indices_for_reindex_in_es5
 [X] 0010_delete_reverted_indices
 [X] 0011_add_indices_for_es5_reindex
 [X] 0012_add_new_index_for_bha
 [X] 0013_add_last_modifed
 [X] 0014_enable_slowlogs
 [X] 0015_add_user_domain_memberships
 [X] 0016_add_new_index_for_cc_perf

That is odd - I wonder if something went wrong during the upgrade. You could try checking the mapping stored in ES to see if it’s out of date - it should align with the one in the code:
https://github.com/dimagi/commcare-hq/blob/master/corehq/apps/es/mappings/user_mapping.py

This bit of documentation describes how to do that:

If that is out of date, then my guess would be that something went amiss during the upgrade to ES6 such that these migrations were applied to an older copy of the index, not the live version.
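Once you have the live mapping as a dict (however you pull it down), a small helper like this can report how a given field is mapped. The mapping fragment below is illustrative only, showing what the users index would be expected to contain if es migration 0015 took effect on the live index:

```python
def field_type(properties, dotted_field):
    """Walk a mapping's 'properties' tree and return the declared type of a
    dotted field path ('object' if mapped with no explicit type, None if
    the field is absent from the mapping)."""
    node = {"properties": properties}
    for part in dotted_field.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("type", "object")

# Illustrative fragment of what the users mapping should contain after
# es migration 0015_add_user_domain_memberships (an assumption, not pulled
# from a live cluster):
props = {
    "user_domain_memberships": {
        "type": "nested",
        "properties": {"is_active": {"type": "boolean"}},
    }
}

print(field_type(props, "user_domain_memberships"))            # nested
print(field_type(props, "user_domain_memberships.is_active"))  # boolean
```

If the helper returns None or "object" for user_domain_memberships against the live mapping, that would line up with the nested query failing while the flat filter succeeds.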

This was my suspicion some time back when troubleshooting the issue. I’ll go back and see if I can find where I penned those thoughts.

Hey @Ethan_Soergel, I took a look at the production server prior to the ES 6 upgrade, and there are indeed duplicate indexes there, so I suspect the wrong indexes are being worked on. This server went through the ES2 → ES5 upgrade a year ago, and it’s possible the old indexes were left over from that, or perhaps from a prior failed attempt at the ES6 upgrade. Looking at the list, it’s clear the -20230524 indexes are the active ones. I imagine deleting the -2024-05-09 ones and retrying the upgrade may fix things? Not sure. Let me know your thoughts.

health status index                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   domains-2024-05-09     a27Iub_TRIWRMOEzOs8Ubg   5   0          0            0       960b           960b
green  open   sms-2024-05-09         q44ocUApRh2-XEZuJ1X2Bw   5   0          0            0       960b           960b
green  open   apps-2024-05-09        CRkqqTmwSdCfvPvQMO0_eQ   5   0          0            0       960b           960b
green  open   forms-20230524         MLyRSMXISPe8HWGMH8HwVw   5   0     574561         7862      1.8gb          1.8gb
green  open   apps-20230524          5ANxtK4QQ2uIPZAVk6rE-w   5   0      33380         2738    156.2mb        156.2mb
green  open   case-search-20230524   Xr8lJ60yQPqtYXigHdXXJg   5   0   21827296       315583      2.1gb          2.1gb
green  open   users-20230524         RFHIyZD-T0yRgpWd3s4IVQ   2   0      17028          145      6.2mb          6.2mb
green  open   case-search-2024-05-09 vti9dSqpT7CdPEx7ebne6Q   5   0          0            0       960b           960b
green  open   cases-20230524         oGBxz_d0ToKeoDJT0DUhhA   5   0    1864795        36802    862.6mb        862.6mb
green  open   groups-20230524        o14_BR56QGGJV0bof_-XIw   5   0         83           45    146.3kb        146.3kb
green  open   domains-20230524       mYeytKTjRpm2KdPHB5Yj-A   5   0         26            2    423.3kb        423.3kb
green  open   forms-2024-05-09       vJVHSne2QduPCeT-FjeiRw   5   0          0            0       960b           960b
green  open   users-2024-05-09       4QOgU9w9SZy3Spv6nLr6CA   2   0          0            0       384b           384b
green  open   groups-2024-05-09      l73bxkXBT6GqF3sn22fnDA   5   0          0            0       960b           960b
green  open   sms-20230524           HWcbmOlHSNGC_7huzEfrzg   5   0          0            0        1kb            1kb
green  open   cases-2024-05-09       IrjFZejNQUi2oZBRnVrKFw   5   0          0            0       960b           960b

Hey Ethan, I’m following these instructions but am having trouble translating the smslogs_2020-01-28:sms bit in the second command. What should I be using to compare the users index? The first command was:

./manage.py print_elastic_mappings users --no-names > ./users-in-code.py

What would the second be to get the users-20230524 index mapping?

cchq <env> django-manage print_elastic_mappings smslogs_2020-01-28:sms --no-names > ./sms-live.py

Then once I’ve confirmed whether they’re different or not, can I apply the ES migration manually?
0015_add_user_domain_memberships

Hey Ed!

I went through the details you have shared, but I was unable to figure out how you ended up in this state.

I had a discussion with Ethan and we agreed that the best path forward would be to fix the mappings for all of your indices, to ensure that your ES state is in sync with what HQ expects.

Would you be able to run the following snippet in the django-manage shell of the system that has the improper mapping configuration?

from corehq.apps.es.migration_operations import UpdateIndexMapping
from corehq.apps.es.transient_util import iter_doc_adapters

for adapter in iter_doc_adapters():
    print(adapter.index_name, adapter.type)
    UpdateIndexMapping(
        name=adapter.index_name,
        type_=adapter.type,
        properties=adapter.mapping['properties'],
        es_versions=[6],
    ).run()

After running this, you can try running migrate again:

./manage.py migrate

If this runs without issues, your deploy should go fine after that.

Feel free to reach out if you face any issues.

Thanks for your patience.

Thanks Amit, I’ll do that.

Ed

I’m running these on the machine prior to ES6 upgrade and am getting the following:

apps-20230524 app
2025-09-19 14:22:55,775 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,775 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
cases-20230524 case
2025-09-19 14:22:55,775 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,775 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
case-search-20230524 case
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
domains-20230524 hqdomain
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
forms-20230524 xform
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
groups-20230524 group
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
sms-20230524 sms
2025-09-19 14:22:55,776 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,777 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.
users-20230524 user
2025-09-19 14:22:55,777 INFO [corehq.apps.es.migration_operations] The mappings were created for Elasticsearch version/s [6]
2025-09-19 14:22:55,777 INFO [corehq.apps.es.migration_operations] Current Elasticsearch version in 5. Skipping the operation.

I assume I should run the ES6 upgrade first, then run it?

Yeah, you need to do the ES upgrade first.
And you should not deploy HQ beyond Merge pull request #36468 from dimagi/ap/stop-es-5-support · dimagi/commcare-hq@0bdf2c7 · GitHub if you have not upgraded Elasticsearch.

One quick question: I have these duplicate indexes showing with no data in them before I begin the ES upgrade process (including the update-settings step after adding the various _INDEX_MULTIPLEXED rows to public.yml). I was wondering if these indices have something to do with the issue, and I assume it would be safe to delete them prior to running the process. Below is the list of indexes (after the first elastic_sync_multiplexed run on apps); all the 2024-05-09 indexes were empty prior to starting.

Alternatively, perhaps it’s safe to leave them, though it does seem to be using them during the multiplexing process. Thanks!

The indices are expected to be there. These will be your new indices after you upgrade to ES 6. You should not delete them.

I was wondering if these indices have something to do with the issue.

I don’t think the issue would be because of these indices. The issue is occurring because some mappings that were applied by migrations do not exist in the new indices. Had you deployed Merge pull request #36468 from dimagi/ap/stop-es-5-support · dimagi/commcare-hq@0bdf2c7 · GitHub before moving to ES 6?

If yes, then your ES indices might have ended up in a bad state and might require a full reindex of your ES cluster.

No, I’m on Merge pull request #36134 from dimagi/pkv/fcm-analytics-label · dimagi/commcare-hq@ded7c3f · GitHub before upgrading. I’ll keep you posted. Thanks.