UCR Building getting stuck after reaching a certain number

I found the error, warning, and critical messages below in the celery_ucr_queue logs. Are they helpful for investigating the issue?
Unfortunately, I couldn't get the celery_background_log file.

2023-05-14 11:33:22,268 ERROR [celery.utils.dispatch.signal] Signal handler <function celery_add_time_sent at 0x7f05793bea60> raised: ConnectionError('Error 111 connecting to 172.19.4.33:6379. Connection refused.')
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 27, in _decorator
    return method(self, *args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 76, in set
    return self.client.set(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/client/default.py", line 166, in set
    raise ConnectionInterrupted(connection=client) from e
django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 111 connecting to 172.19.4.33:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/utils/dispatch/signal.py", line 276, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/signals.py", line 22, in celery_add_time_sent
    TimeToStartTimer(task_id).start_timing(eta)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/signals.py", line 110, in start_timing
    cache.set(self._cache_key, eta or datetime.datetime.utcnow(), timeout=3 * 24 * 60 * 60)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 34, in _decorator
    raise e.__cause__
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/client/default.py", line 156, in set
    return bool(client.set(nkey, nvalue, nx=nx, px=timeout, xx=xx))
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/commands/core.py", line 2302, in set
    return self.execute_command("SET", *pieces, **options)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/redis.py", line 170, in sentry_patched_execute_command
    return old_execute_command(self, name, *args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/client.py", line 1255, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/connection.py", line 1442, in get_connection
    connection.connect()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/connection.py", line 704, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.19.4.33:6379. Connection refused.
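Error 111 is ECONNREFUSED on Linux. It usually means nothing was accepting connections on 172.19.4.33:6379 (the default Redis port) at that moment, or something in between rejected the connection. A quick way to check whether that port is reachable from the Celery machine is a plain TCP connection attempt. The sketch below uses only the Python standard library; `check_tcp` is a hypothetical helper, not part of CommCare HQ. It only shows that a port accepts connections, not that the service behind it is healthy. The pgbouncer failure in the next entry is a FATAL reply sent by the server on port 6432, so that port was reachable even though the login still failed.

```python
import socket
from typing import Tuple


def check_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try to open a TCP connection to host:port.

    Returns (True, "ok") if the connection succeeds, otherwise
    (False, <error text>), e.g. "[Errno 111] Connection refused".
    """
    try:
        # create_connection raises an OSError subclass on failure
        # (ConnectionRefusedError, socket.timeout, unreachable host, ...).
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:
        return False, str(exc)
```

For example, running `check_tcp("172.19.4.33", 6379)` from the Celery machine while a UCR build is stuck would show whether Redis is refusing connections again.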
2023-05-14 11:36:23,159 ERROR [celery.utils.dispatch.signal] Signal handler <function update_celery_state at 0x7f056fa189d0> raised: OperationalError('connection to server at "172.19.3.36", port 6432 failed: FATAL:  client_login_timeout (server down)\n')
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 27, in _decorator
    return method(self, *args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 94, in _get
    return self.client.get(key, default=default, version=version, client=client)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/client/default.py", line 222, in get
    raise ConnectionInterrupted(connection=client) from e
django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 111 connecting to 172.19.4.33:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 229, in _inner
    reraise(*exc_info)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 60, in reraise
    raise value
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 224, in _inner
    return f(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/heartbeat.py", line 118, in heartbeat
    self.get_and_report_blockage_duration()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/heartbeat.py", line 74, in get_and_report_blockage_duration
    blockage_duration = self.get_blockage_duration()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/heartbeat.py", line 70, in get_blockage_duration
    return max(datetime.datetime.utcnow() - self.get_last_seen() - HEARTBEAT_FREQUENCY,
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/heartbeat.py", line 49, in get_last_seen
    value = self._heartbeat_cache.get()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/celery_monitoring/heartbeat.py", line 27, in get
    return cache.get(self._cache_key())
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 87, in get
    value = self._get(key, default, version, client)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/cache.py", line 34, in _decorator
    raise e.__cause__
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_redis/client/default.py", line 220, in get
    value = client.get(key)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/commands/core.py", line 1790, in get
    return self.execute_command("GET", name)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/redis.py", line 170, in sentry_patched_execute_command
    return old_execute_command(self, name, *args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/client.py", line 1255, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/connection.py", line 1442, in get_connection
    connection.connect()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/redis/connection.py", line 704, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 172.19.4.33:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 219, in ensure_connection
    self.connect()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/django/__init__.py", line 605, in connect
    return real_connect(self)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 200, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 187, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "172.19.3.36", port 6432 failed: FATAL:  client_login_timeout (server down)


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/utils/dispatch/signal.py", line 276, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/corehq/ex-submodules/casexml/apps/phone/tasks.py", line 68, in update_celery_state
    backend.store_result(headers['id'], None, ASYNC_RESTORE_SENT)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/backends/base.py", line 528, in store_result
    self._store_result(task_id, result, state, traceback,
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_celery_results/backends/database.py", line 132, in _store_result
    self.TaskModel._default_manager.store_result(**task_props)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_celery_results/managers.py", line 43, in _inner
    return fun(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django_celery_results/managers.py", line 168, in store_result
    obj, created = self.using(using).get_or_create(task_id=task_id,
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/query.py", line 581, in get_or_create
    return self.get(**kwargs), False
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/query.py", line 431, in get
    num = len(clone)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/query.py", line 262, in __len__
    self._fetch_all()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    self.ensure_connection()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 219, in ensure_connection
    self.connect()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 219, in ensure_connection
    self.connect()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/sentry_sdk/integrations/django/__init__.py", line 605, in connect
    return real_connect(self)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/base/base.py", line 200, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 187, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: connection to server at "172.19.3.36", port 6432 failed: FATAL:  client_login_timeout (server down)

2023-05-14 11:36:23,727 WARNING [celery.worker.consumer.consumer] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/worker/loops.py", line 130, in synloop
    connection.drain_events(timeout=2.0)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/kombu/connection.py", line 316, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/kombu/transport/pyamqp.py", line 169, in drain_events
    return connection.drain_events(**kwargs)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/connection.py", line 530, in blocking_read
    frame = self.transport.read_frame()
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/transport.py", line 294, in read_frame
    frame_header = read(7, True)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/transport.py", line 635, in _read
    raise OSError('Server unexpectedly closed connection')
OSError: Server unexpectedly closed connection
2023-05-14 11:36:23,729 WARNING [py.warnings] /home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/celery/worker/consumer/consumer.py:367: CPendingDeprecationWarning: 
In Celery 5.1 we introduced an optional breaking change which
on connection loss cancels all currently executed tasks with late acknowledgement enabled.
These tasks cannot be acknowledged as the connection is gone, and the tasks are automatically redelivered back to the queue.
You can enable this behavior using the worker_cancel_long_running_tasks_on_connection_loss setting.
In Celery 5.1 it is set to False by default. The setting will be set to True by default in Celery 6.0.

  warnings.warn(CANCEL_TASKS_BY_DEFAULT, CPendingDeprecationWarning)
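The warning refers to the Celery setting `worker_cancel_long_running_tasks_on_connection_loss` (added in Celery 5.1). The "Couldn't ack 47" entry below looks related: once the broker connection is gone, a worker can no longer acknowledge messages. A minimal sketch of enabling the setting, assuming the Celery app object is configured directly (`app` and the `"example"` name are placeholders, not this deployment's names):

```python
from celery import Celery

# Placeholder app; use the project's existing Celery instance instead.
app = Celery("example")

# Celery >= 5.1: when the broker connection is lost, cancel tasks that are
# running with late acknowledgement. Per the warning text, those tasks cannot
# be acknowledged over the lost connection and are redelivered to the queue.
app.conf.worker_cancel_long_running_tasks_on_connection_loss = True
```

If the project instead loads Celery settings from Django settings with `namespace="CELERY"` (an assumption about this deployment), the equivalent would be `CELERY_WORKER_CANCEL_LONG_RUNNING_TASKS_ON_CONNECTION_LOSS = True`. These logs alone do not show that this would fix the stuck build; it only changes what happens to late-acked tasks when the broker drops.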

2023-05-14 11:36:24,973 CRITICAL [celery.worker.request] Couldn't ack 47, reason:RecoverableConnectionError(None, 'connection already closed', None, '')
Traceback (most recent call last):
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/kombu/message.py", line 128, in ack_log_error
    self.ack(multiple=multiple)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/kombu/message.py", line 123, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/channel.py", line 1407, in basic_ack
    return self.send_method(
  File "/home/cchq/www/echis/releases/2023-05-13_11.49/python_env-3.9/lib/python3.9/site-packages/amqp/abstract_channel.py", line 67, in send_method
    raise RecoverableConnectionError('connection already closed')
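Reading only the header lines in order gives a short timeline: Redis at 172.19.4.33:6379 refused connections at 11:33:22, pgbouncer at 172.19.3.36:6432 failed a login at 11:36:23, the worker lost its AMQP broker connection (kombu's pyamqp transport) at 11:36:23, and an ack failed at 11:36:24. For longer logs, a small parser can extract that timeline. The sketch below is hypothetical (`timeline`, `Entry`, and the regexes are not part of CommCare HQ or Celery) and assumes every entry starts with the `YYYY-MM-DD HH:MM:SS,mmm LEVEL [logger]` header seen in this log.

```python
import re
from typing import Iterable, List, NamedTuple

# Header line of each log entry, e.g.
# "2023-05-14 11:36:23,727 WARNING [celery.worker.consumer.consumer] consumer: ..."
ENTRY_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)

# Address formats seen in this log: redis-py ("connecting to 1.2.3.4:6379")
# and psycopg2 ('server at "1.2.3.4", port 6432').
ADDR_RES = [
    re.compile(r"connecting to (?P<host>[\d.]+):(?P<port>\d+)"),
    re.compile(r'server at "(?P<host>[\d.]+)", port (?P<port>\d+)'),
]


class Entry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    address: str  # "host:port" if one was found in the message, else ""


def timeline(lines: Iterable[str]) -> List[Entry]:
    """Return one Entry per log header line, skipping traceback lines."""
    entries = []
    for line in lines:
        m = ENTRY_RE.match(line)
        if not m:
            continue  # traceback frame or continuation line
        address = ""
        for addr_re in ADDR_RES:
            a = addr_re.search(m["message"])
            if a:
                address = f"{a['host']}:{a['port']}"
                break
        entries.append(Entry(m["ts"], m["level"], m["logger"], address))
    return entries
```

Pass it an open log file (for example `timeline(open(path))`, where `path` points at the celery_ucr_queue log) and print each entry to see which backing service failed first.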

    

Thank you,
