Elasticsearch connection timeout

I get the following error when running an operation against ES, for example:

cchq echis django-manage resave_failed_forms_and_cases fmoh-echis 2015-07-01 2021-07-12 --cases

Traceback (most recent call last):
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/elasticsearch2/connection/http_urllib3.py", line 95, in perform_req$
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 507, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/cchq/www/echis/releases/2021-07-07_18.33/python_env-3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 337, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='', port=9200): Read timed out. (read timeout=30)

It seems the timeout is 30 seconds and it’s taking longer than that for the server to respond. Is there a setting to globally change this timeout period?

Hello Demis,

The timeout could happen if the cluster is slow to respond for some reason.

The timeout comes from the ES_SEARCH_TIMEOUT localsetting, which defaults to 30 seconds. Changing it is not recommended, because the new value applies not only to the command you are running but to your whole HQ instance. You could change it locally in the directory you are running the command from and revert it later. In general, though, 30 seconds is a reasonable timeout, so I recommend doing the following instead.

  1. Make sure the cluster health is green and other operations are running normally by checking the Elasticsearch metrics in Datadog.
  2. Depending on how often you get this error, split the date range into smaller ranges and run the command separately for each one. That makes it easier to make incremental progress without rerunning the command for the entire range again and again.
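
To illustrate the second suggestion, here is a minimal Python sketch that splits the original range into smaller chunks and prints one command per chunk. The helper names (split_date_range, build_commands) and the 180-day chunk size are only illustrative assumptions, not part of commcare-cloud. I am also assuming that re-processing a boundary date is harmless, so consecutive chunks share their boundary date to avoid skipping anything regardless of whether the command treats the end date as inclusive.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk):
    """Yield (chunk_start, chunk_end) pairs that together cover start..end.

    Consecutive chunks share a boundary date, so no day is skipped whether
    the command treats the end date as inclusive or exclusive.
    """
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    step = timedelta(days=days_per_chunk)
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def build_commands(start, end, days_per_chunk):
    """Return one command string per chunk, mirroring the command in the question."""
    template = (
        "cchq echis django-manage resave_failed_forms_and_cases "
        "fmoh-echis {start} {end} --cases"
    )
    return [
        template.format(start=s.isoformat(), end=e.isoformat())
        for s, e in split_date_range(start, end, days_per_chunk)
    ]


if __name__ == "__main__":
    # 180 days per chunk is an arbitrary starting point; shrink it if
    # individual runs still hit the read timeout.
    for command in build_commands(date(2015, 7, 1), date(2021, 7, 12), 180):
        print(command)
```

Running the printed commands one at a time means that if one chunk times out, only that chunk needs to be retried (possibly split further), and the chunks that already finished do not have to be rerun.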