I've had odd issues with Celery crashing on two servers when running newer builds. I got one server up by reverting to an older production build, however on the other server, using the latest build, celery was crashing. It usually crashes within 10-30 minutes. I tailed the log and noticed that consecutive celery heartbeat tasks are being received 0.06 seconds apart. e.g. below - note this is an edited log with 4 different tasks showing the first of each received task followed by the next of the same type to indicate the time between repeats of the same task):
[2022-03-15 17:01:41,922: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__saved_exports_queue[a6d31388-66e8-413e-ba5f-24de52e9ef55]
[2022-03-15 17:01:41,982: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__saved_exports_queue[46ffe0ee-a7e8-4e90-82d3-c4b3a41b1e92]
[2022-03-15 17:01:41,926: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__analytics_queue[1112cc2f-4773-41a8-8303-05789edfb02d]
[2022-03-15 17:01:41,987: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__analytics_queue[72d70fbe-94ec-4acf-b527-5ca223d5daaa]
[2022-03-15 17:01:41,931: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__case_rule_queue[b5bb20a8-5978-4821-bf9f-7d423587d878]
[2022-03-15 17:01:41,992: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__case_rule_queue[d9a2318a-28c3-44a4-b727-6b2da5fd9012]
[2022-03-15 17:01:41,936: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__background_queue[b4430e0f-ca8e-4619-88af-25171592131a]
[2022-03-15 17:01:41,997: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__background_queue[bf5f538e-cb97-4b31-9f80-a51bb927570a]
Is this expected behavior?
Here's the older build - celery appears stable on this one - each consecutive heartbeat task is around 10 seconds apart:
[2022-03-15 17:20:10,662: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__ucr_queue[cb5c1f23-1248-4a59-8bf6-58e44b9668aa]
[2022-03-15 17:20:20,723: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__ucr_queue[99929f85-bf7a-41c1-9930-3284368b27d6]
[2022-03-15 17:20:10,699: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__celery_periodic[8be0e76b-bd31-406f-b474-334e6c324675]
[2022-03-15 17:20:20,708: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__celery_periodic[cb72e7ca-e606-4fb5-bc4a-755d01aca399]
[2022-03-15 17:20:10,677: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__saved_exports_queue[de5a520d-2c28-49d6-82b3-80c821b1d9f5]
[2022-03-15 17:20:20,719: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__saved_exports_queue[bc8fab72-834c-47bd-a407-2edd7871ef93]
[2022-03-15 17:20:10,681: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__repeat_record_queue[1c03eb23-0d4b-41ce-baee-3306e7cff2ee]
[2022-03-15 17:20:20,718: INFO/MainProcess] Received task: corehq.celery_monitoring.heartbeat.heartbeat__repeat_record_queue[2d4a6a0f-4de8-45d7-a39b-ebbace5d9fe6]
Is this configurable somewhere? I suspect may be partially responsible for Celery crashing. Not to mention log files growing pretty quickly.
EDIT perhaps this is normal behavior if celery has been down for an extended period and it's playing catch-up?
Thanks!