Elasticsearch: TransportError

Hi

Am getting the following error after i tried to delete and regenerate elasticsearch indices. following the link below:

Problem:

  1. I delete and regenerate elasticsearch indices using the following cmd.
    curl -XDELETE 172.19.3.41:9200/index name

  2. regenreate elasticsearch:
    sudo -iu cchq
    cd www/echis/current
    source python_env-3.6/bin/activate
    ./manage.py ptop_preindex
    All indices recreated

  3. But the size of Indices doesn’t much the original size before deletion. So that I applied to each index.
    ./manage.py ptop_reindexer_v2 case --cleanup

Output: Are you sure you want to delete the current index (if it exists)? y/n
y

2019-11-12 11:45:21,679 INFO es_utils Initializing elasticsearch mapping for [case]
2019-11-12 11:45:23,601 INFO reindexer setting checkpoint to {TopicPartition(topic=‘case’, partition=0): 3806, TopicPartition(topic=‘case’, partition=3): 3103, TopicPartition(topic=‘case’, partition=9): 1660, TopicPartition(topic=‘case’, partition=4): 1504, TopicPartition(topic=‘case’, partition=5): 3401, TopicPartition(topic=‘case’, partition=6): 1727, TopicPartition(topic=‘case’, partition=1): 1404, TopicPartition(topic=‘case’, partition=7): 942, TopicPartition(topic=‘case’, partition=2): 438, TopicPartition(topic=‘case’, partition=8): 2330, TopicPartition(topic=‘case-sql’, partition=2): 831116, TopicPartition(topic=‘case-sql’, partition=6): 407248, TopicPartition(topic=‘case-sql’, partition=8): 423195, TopicPartition(topic=‘case-sql’, partition=3): 455878, TopicPartition(topic=‘case-sql’, partition=1): 579424, TopicPartition(topic=‘case-sql’, partition=0): 1598718, TopicPartition(topic=‘case-sql’, partition=7): 429793, TopicPartition(topic=‘case-sql’, partition=5): 717302, TopicPartition(topic=‘case-sql’, partition=4): 581104, TopicPartition(topic=‘case-sql’, partition=9): 547762}
2019-11-12 11:45:23,601 INFO manager (CaseToElasticsearchPillow-hqcases_2016-03-04) setting checkpoint: {“case,0”: 3806, “case,3”: 3103, “case,9”: 1660, “case,4”: 1504, “case,5”: 3401, “case,6”: 1727, “case,1”: 1404, “case,7”: 942, “case,2”: 438, “case,8”: 2330, “case-sql,2”: 831116, “case-sql,6”: 407248, “case-sql,8”: 423195, “case-sql,3”: 455878, “case-sql,1”: 579424, “case-sql,0”: 1598718, “case-sql,7”: 429793, “case-sql,5”: 717302, “case-sql,4”: 581104, “case-sql,9”: 547762}
Processing 0 documents: …
Processed 0/0 of 0 documents in 0:00:00.233131 (? remaining)
Processed 0/0 of 0 documents (0 previously processed).

Note: Not able to repopulate the data.

  1. when i look at the django.log: tail -f /home/cchq/www/echis/log/django.log I got the following error:

elasticsearch.exceptions.RequestError: TransportError(400, ‘SearchPhaseExecutionException[Failed to ex
ecute phase [query], all shards failed; shardFailures {[OSPeX2RUROWtiiGpguQsNg][hqusers_2017-09-07][0]
: RemoteTransportException[[es1][inet[/172.19.3.40:9300]][indices:data/read/search[phase/query]]]; nes
ted: SearchParseException[[hqusers_2017-09-07][0]: query[ConstantScore(+cache(domain.exact:fmoh-echis)
cache(domain_memberships.domain.exact:fmoh-echis) +cache(base_doc:couchuser))],from[-1],size[-1]: Par
se Failure [Failed to parse source [{“query”: {“filtered”: {“filter”: {“and”: [{“or”: [{“term”: {“doma
in.exact”: “fmoh-echis”}}, {“term”: {“domain_memberships.domain.exact”: “fmoh-echis”}}]}, {“term”: {“b
ase_doc”: “couchuser”}}]}, “query”: {“match_all”: {}}}}, “sort”: [{“username.exact”: {“order”: “asc”}}
], “from”: 0, “size”: 0, “_source”: ["_id", “username”, “first_name”, “last_name”, “doc_type”, “is_act
ive”, “location_id”, “__group_ids”]}]]]; nested: SearchParseException[[hqusers_2017-09-07][0]: query[C
onstantScore(+cache(domain.exact:fmoh-echis) cache(domain_memberships.domain.exact:fmoh-echis) +cache(
base_doc:couchuser))],from[-1],size[-1]: Parse Failure [No mapping found for [username.exact] in order
to sort on]]; }{[OSPeX2RUROWtiiGpguQsNg][hqusers_2017-09-07][1]: RemoteTransportException[[es1][inet[
/172.19.3.40:9300]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[hqusers_201
7-09-07][1]: query[ConstantScore(+cache(domain.exact:fmoh-echis) cache(domain_memberships.domain.exact
:fmoh-echis) +cache(base_doc:couchuser))],from[-1],size[-1]: Parse Failure [Failed to parse source [{"
query": {“filtered”: {“filter”: {“and”: [{“or”: [{“term”: {“domain.exact”: “fmoh-echis”}}, {“term”: {"
domain_memberships.domain.exact": “fmoh-echis”}}]}, {“term”: {“base_doc”: “couchuser”}}]}, “query”: {"
match_all": {}}}}, “sort”: [{“username.exact”: {“order”: “asc”}}], “from”: 0, “size”: 0, “_source”: ["
_id", “username”, “first_name”, “last_name”, “doc_type”, “is_active”, “location_id”, “__group_ids”]}]]
]; nested: SearchParseException[[hqusers_2017-09-07][1]: query[ConstantScore(+cache(domain.exact:fmoh-
echis) cache(domain_memberships.domain.exact:fmoh-echis) +cache(base_doc:couchuser))],from[-1],size[-1
]: Parse Failure [No mapping found for [username.exact] in order to sort on]]; }]’)

Hi Demisew,

You wrote:

This reindexes only cases on projects using the couch backend. I expect you’re using the SQL “scale” backend (all new projects are), so you’ll want to run

./manage.py ptop_reindexer_v2 sql-case --cleanup

(You can double check the backend at
/a//settings/project/flags/ - search for “Using new SQL/scale backend”)


The error you’re running in to is

No mapping found for [username.exact] in order to sort on

This means that the index doesn’t have a mapping registered (the users index in this case). I’m not sure how that could’ve happened. I have seen this in a development environment, but not in production. I’d probably just try deleting and recreating the index if that’s an option. The users index is much smaller than forms or cases, so it shouldn’t take long to rebuild.

Hi, am not able to search for the feature flag: “Using new SQL/scale backend”, Is there anything i missed.

It looks like that’s the main feature flags page, available at /hq/flags/. The page I’m describing is available at /a/settings/project/flags/

The indicator I’m describing is on the bottom of that page (though I believe that’s the same condition described in the yellow alert box in my first image)

I’d be quite surprised if this project isn’t using the scale backend, but it couldn’t hurt to check.

Hi Ethan

the page is not available as shown on the screenshot below. I tried by replacing project with project name and as it is.

Oh, sorry, the forum software removed <domain> from my earlier link. The proper URL is

/a/<domain>/settings/project/flags/

So for your project, it should be available at

https://echisethiopia.org/a/fmoh-echis/settings/project/flags/

Yes it’s enabled.

But the error is still there. Here is the steps i followed:

I was following the following Post:
Elasticsearch node health in the red

  1. Delete Indices

  2. recreate indices using /manage.py ptop_preindex
    Issue: Indices created but size of indices are less than the previous size
    and Indices created with a replica value of 1. Previously, we didn’t have a replica for each index.

  3. run run ./manage.py ptop_reindexer_v2 [datatype] --cleanup” where datatype is one of

‘domain’, ‘user’, ‘group’, ‘groups-to-user’, ‘case’, ‘form’, ‘sql-case’, ‘sql-form’, ‘case-search’,
‘case-search-resumable’, ‘report-case’, ‘report-xform’, ‘sql-app-form-submission’,‘user-app-form-submission’,‘sms’,‘app’, ‘couch-app-form-submission’, ‘user-sync-history’.

it’s ok for SQL-case but brings an error for SQL-form.

  1. Then I look at the other post: Submit History and Case List shows no data

Now i run ./manage.py ptop_reindexer_v2 sql-case --reset

After running the above cmd case, hqcases_2016-03-04 index size changed from 30 MB to 5.8 GB. which is OK

  1. Running ./manage.py ptop_reindexer_v2 sql-form --reset
    running ok and brings an error in the middle. The following is the output

Processed 61000/61000 of 2295034 documents in 0:27:00.875969 (16:29:20.517414 remaining)
2019-11-17 13:11:38,923 INFO reindexer Processing batch of 1000 docs
Processed 62000/62000 of 2295034 documents in 0:27:27.464704 (16:28:56.179448 remaining)


Traceback (most recent call last):
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memoized.py”, line 138, in call
return cache[key]
KeyError: ()
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memo
ized.py”, line 138, in call
return cache[key]
KeyError: ()
----------

File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memoized.py”, line 18, in _inner
return m(*args, **kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memoized.py”, line 140, in call
value = self.func( *args, **kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 550, in
get_xml
raise MissingFormXml(self.form_id)
corehq.form_processor.exceptions.MissingFormXml: 78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit

6.The following is the output of indices: curl -XGET 172.19.3.41:9200/_cat/indices

green open smslogs_2017-02-09 5 1 0 0 1.4kb 720b
green open xforms_2016-07-07 5 1 64840 0 332.4mb 166.3mb
green open hqusers_2017-09-07 2 1 79869 190 28.2mb 14.1mb
green open hqdomains_2016-08-08 5 1 13 13 360kb 172.6kb
green open case_search_2018-05-29 5 1 2033433 3100 5.8gb 2.9gb
green open report_cases_czei39du507m9mmpqk3y01x72a3ux4p0 5 1 0 0 970b 575b
green open hqgroups_2017-05-29 5 1 4 4 26.2kb 13.1kb
green open report_xforms_20160824_1708 5 1 0 0 1.4kb 720b
green open hqcases_2016-03-04 5 1 6204612 10108 6.2gb 3.1gb
green open hqapps_2019-08-14 5 1 15798 1171 142.1mb 71mb

It sounds like you’re describing two different errors - are they both still occurring? In an earlier post, you described this error

No mapping found for [username.exact] in order to sort on

That’s a problem running queries against the “Users” index. Are you still running in to that error in the Django logs?


The second error you described is occurring while attempting to rebuild the “sql-forms” index.

You mentioned that you ran ptop_preindex and ptop_reindexer_v2 --cleanup. You don’t need to use both of these together.

  • ptop_preindex will build any missing indices, such as one you deleted manually.
  • ptop_reindexer_v2 --cleanup will delete the data from an existing index, then rebuild it.

Both of these run the same code to actually build the index.

This is the error you ran in to:

File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 550, in
get_xml
raise MissingFormXml(self.form_id)
corehq.form_processor.exceptions.MissingFormXml: 78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f

Do you have access to more of the stack trace there? I’d like to see what was happening when that error was raised.

The error is that the reindexer is unable to find the XML content of the form with ID 78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f. Inspecting that form may give a better idea of the problem. You should be able to view that form here:
https://echisethiopia.org/a/fmoh-echis/reports/form_data/78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f
Or in its raw form here:
https://echisethiopia.org/hq/admin/raw_doc/?id=78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f

The XML may be missing under normal operation in a few circumstances. Do you see anything out-of-the-ordinary about the form? When you view that second link above, what are the values of the state and problem fields?

  1. The previous error has gone from django.log file that is:
    No mapping found for [username.exact] in order to sort on is not more an issue after i run:
    ./manage.py ptop_reindexer_v2 user --cleanup

  2. current log output: tail -f /home/cchq/www/echis/log/django.log

    cursor.execute(statement, parameters)
    sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[SQL: SELECT report_week_date AS report_week_date, sum(still_live) AS report_live_births, sum(still_births) AS report_still_births, sum(total_deaths) AS total_deaths, sum(deaths_first_seven_days) AS deaths_first_seven_days, sum(postnatal_0_48) AS postnatal_0_48, sum(total_malaria_confirmed_and_clinical) AS total_malaria_confirmed_and_clinical, sum(total_malaria_suspected_fever_cases_examined_by_rdt) AS total_malaria_suspected_fever_cases_examined_by_rdt, sum(cases_positive_malaria_by_rdt_or_microscopy_pfalciparum) AS cases_positive_malaria_by_rdt_or_microscopy_pfalciparum, sum(cases_positive_malaria_by_rdt_or_microscopy_pvivax) AS cases_positive_malaria_by_rdt_or_microscopy_pvivax, sum(diarrhoea_with_blood) AS diarrhoea_with_blood, sum(acute_febrile_illness) AS acute_febrile_illness, sum(cases_treated_for_malaria_diarrhea_pneumonia_iccm) AS cases_treated_for_malaria_diarrhea_pneumonia_iccm
FROM “ucr_fmoh-echis_static-hmis_job_aid_d0f39572”
WHERE health_post_id IN (%s) AND report_week_date_for_filter_gregorian > %s GROUP BY report_week_date
ORDER BY report_week_date ASC]
[parameters: (‘86c84198259f42898e294f43d02e45c6’, datetime.date(2019, 10, 20))]
(Background on this error at: http://sqlalche.me/e/e3q8)

That represents an error raised by PostgreSQL while running a custom report. It’s unrelated to the elasticsearch indexing issue.

3. running ./manage.py ptop_reindexer_v2 sql-form --reset brings the same error before. That is

2019-11-19 19:37:11,384 INFO reindexer Processing batch of 1000 docs
Processed 1000/1000 of 2303921 documents in 0:00:37.874276 (1 day, 0:13:40.829954 remaining)
2019-11-19 19:37:49,777 INFO reindexer Processing batch of 1000 docs



File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-pac[108/1804$ized.py”, line 138, in call
return cache[key]
KeyError: ()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memo
ized.py”, line 138, in call
return cache[key]
KeyError: ()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/blobs/s3db.py”, line 212, in maybe_not_f
ound
yield
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/blobs/s3db.py”, line 80, in get
resp = self._s3_bucket().Object(key).get()
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/boto
3/resources/factory.py”, line 520, in do_action
response = action(self, *args, **kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/boto
3/resources/action.py”, line 83, in call
response = getattr(parent.meta.client, operation_name)(**params)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/boto
core/client.py”, line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/boto
core/client.py”, line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: T
he specified key does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 548, in
get_xml
return self.get_attachment(‘form.xml’)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 285, in
get_attachment
with attachment.open() as content:
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/blobs/models.py”, line 101, in open
return get_blob_db().get(key=self.key)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/blobs/s3db.py”, line 80, in get
resp = self._s3_bucket().Object(key).get()
File “/usr/lib/python3.6/contextlib.py”, line 99, in exit
self.gen.throw(type, value, traceback)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/blobs/s3db.py”, line 218, in maybe_not_f
ound
raise throw
corehq.blobs.exceptions.NotFound: 449659176c414f8caad99a77cf9a9324

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “./manage.py”, line 195, in
main()
File “./manage.py”, line 41, in main
execute_from_command_line(sys.argv)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/djan
go/core/management/init.py”, line 364, in execute_from_command_line
utility.execute()
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/djan
go/core/management/init.py”, line 356, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/apps/hqcase/management/commands/ptop_rei
ndexer_v2.py”, line 82, in run_from_argv
super(SubCommand, self).run_from_argv(argv)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/djan
go/core/management/base.py”, line 283, in run_from_argv
self.execute(*args, **cmd_options)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/djan
go/core/management/base.py”, line 330, in execute
output = self.handle(*args, **options)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/apps/hqcase/management/commands/ptop_rei
ndexer_v2.py”, line 146, in handle
reindexer.reindex()
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/ex-submodules/pillowtop/reindexer/reinde
xer.py”, line 269, in reindex
processor.run()
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/util/doc_processor/interface.py”, line 1
20, in run
for doc in self.document_iterator:
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/util/doc_processor/sql.py”, line 59, in
iter
yield reindex_accessor.doc_to_json(doc)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/backends/sql/dbaccessors.
py”, line 364, in doc_to_json
return doc.to_json(include_attachments=self.include_attachments)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 516, in
to_json
data = dict(serializer.data)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/rest
_framework/serializers.py”, line 531, in data
ret = super(Serializer, self).data
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/rest
_framework/serializers.py”, line 262, in data
self._data = self.to_representation(self.instance)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/rest
_framework/serializers.py”, line 487, in to_representation
attribute = field.get_attribute(instance)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/rest
_framework/fields.py”, line 444, in get_attribute
return get_attribute(instance, self.source_attrs)
File "/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/res


File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 480, in
form_data
xml = self.get_xml()
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memo
ized.py”, line 18, in _inner
return m(*args, **kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/python_env-3.6/lib/python3.6/site-packages/memo
ized.py”, line 140, in call
value = self.func( *args, **kwargs)
File “/home/cchq/www/echis/releases/2019-11-10_16.43/corehq/form_processor/models.py”, line 550, in
get_xml
raise MissingFormXml(self.form_id)
corehq.form_processor.exceptions.MissingFormXml: 78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit

4. after the above error th django is still displaying the same error. That is:

tail -f /home/cchq/www/echis/log/django.log

[SQL: SELECT report_week_date AS report_week_date, sum(still_live) AS report_live_births, sum(still_births) AS report_still_births, sum(total_deaths) AS total_deaths, sum(deaths_first_seven_days) AS deaths_first_seven_days, sum(postnatal_0_48) AS postnatal_0_48, sum(total_malaria_confirmed_and_clinical) AS total_malaria_confirmed_and_clinical, sum(total_malaria_suspected_fever_cases_examined_by_rdt) AS total_malaria_suspected_fever_cases_examined_by_rdt, sum(cases_positive_malaria_by_rdt_or_microscopy_pfalciparum) AS cases_positive_malaria_by_rdt_or_microscopy_pfalciparum, sum(cases_positive_malaria_by_rdt_or_microscopy_pvivax) AS cases_positive_malaria_by_rdt_or_microscopy_pvivax, sum(diarrhoea_with_blood) AS diarrhoea_with_blood, sum(acute_febrile_illness) AS acute_febrile_illness, sum(cases_treated_for_malaria_diarrhea_pneumonia_iccm) AS cases_treated_for_malaria_diarrhea_pneumonia_iccm
FROM “ucr_fmoh-echis_static-hmis_job_aid_d0f39572”
WHERE health_post_id IN (%s) AND report_week_date_for_filter_gregorian > %s GROUP BY report_week_date ORDER BY report_week_date ASC]
[parameters: (‘86c84198259f42898e294f43d02e45c6’, datetime.date(2019, 10, 20))]
(Background on this error at: http://sqlalche.me/e/e3q8)

5. Am able to get the form on:

https://www.echisethiopia.org/hq/admin/raw_doc/?id=78ce09b4-3af6-4e1a-b5e0-b2c6c60bae7f

when u type the url, please use www before domain name.It works that way

FYI Demisew and I are resolving offline. It looks like the issue is that one of these forms is missing its xml.