Full disclosure: the app we're having this problem with has grown to the point that we want to split it into multiple apps. We're aware of that, and I imagine it is contributing to the issue, but our pillowtop service seems to stall periodically for the xform-pillow and case-pillow pillows.
What I'm having to do is restart those pillows, after which around 10 items get processed, and then restart them again; otherwise they just sit there and don't process anything.
e.g. I'm looking at my case pillows on the system info page:
My understanding is that there are 507 outstanding items to process. The number doesn't budge until I restart the pillow with:
cchq monolith service pillowtop restart --only=case-pillow
...after which it seems to get through 10:
...and it sits there until I restart again.
I had similar issues with both the xform and case pillows (xform is now cleared). The log shows this from the time of restarting the pillow:
2024-08-19 08:20:45,361 INFO interface Starting pillow <class 'pillowtop.pillow.interface.ConstructedPillow'>
2024-08-19 08:20:46,041 WARNING pillow UCR pillow has no configs to process
2024-08-19 08:20:46,045 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:46,297 INFO manager (case-pillow-cases-20230524-case-search-20230524-messaging-sync) setting checkpoint: {"case-sql,0": 6591638}
2024-08-19 08:20:46,317 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:46,530 INFO manager Heartbeat: {TopicPartition(topic='case-sql', partition=0): 6591648}
2024-08-19 08:20:46,540 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:46,761 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:47,001 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:47,227 INFO elastic Processing chunk of changes in BulkElasticProcessor
failed to send, dropping 100 traces to intake at http://localhost:8126/v0.5/traces after 3 retries
2024-08-19 08:20:47,263 ERROR [ddtrace.internal.writer.writer] failed to send, dropping 100 traces to intake at http://localhost:8126/v0.5/traces after 3 retries
2024-08-19 08:20:47,461 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:47,676 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:47,907 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:48,135 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:48,345 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:48,617 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:48,838 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:49,056 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:49,286 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:49,518 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:49,730 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:49,951 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:50,177 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:50,402 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:50,620 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:50,867 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:51,091 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:51,356 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:51,569 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:51,794 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:52,022 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:52,252 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:52,474 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:52,697 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:52,936 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:53,167 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:53,397 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:53,633 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:53,879 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:54,118 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:54,359 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:54,609 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:54,848 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:55,077 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:55,293 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:55,540 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:55,769 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:56,061 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:56,284 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:56,528 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:56,738 INFO manager Heartbeat: {TopicPartition(topic='case-sql', partition=0): 6592088}
2024-08-19 08:20:56,748 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:56,987 INFO elastic Processing chunk of changes in BulkElasticProcessor
2024-08-19 08:20:57,206 INFO elastic Processing chunk of changes in BulkElasticProcessor
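For what it's worth, the two Heartbeat lines in the log above can be compared to see how far the Kafka offset moved between them. Below is a minimal sketch in Python, assuming the log format is exactly as pasted above; the function name `offset_progress` and the `SAMPLE` list are made up for illustration and are not part of pillowtop. It only reports how much the heartbeat offset changed, and I'm not assuming that this offset maps one-to-one onto the number shown on the system info page.

```python
import re
from datetime import datetime

# Matches the timestamp of a "manager Heartbeat" line (format assumed from the log above).
TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO manager Heartbeat:"
)
# Matches each "TopicPartition(...): offset" entry inside the heartbeat dict.
ENTRY_RE = re.compile(r"TopicPartition\(topic='([^']+)', partition=(\d+)\): (\d+)")


def offset_progress(lines):
    """Return {(topic, partition): (offset_delta, seconds)} between the
    first and last heartbeat seen for each topic partition."""
    first, last = {}, {}
    for line in lines:
        stamp = TIMESTAMP_RE.match(line.strip())
        if not stamp:
            continue  # not a heartbeat line
        when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S,%f")
        for topic, partition, offset in ENTRY_RE.findall(line):
            key = (topic, int(partition))
            first.setdefault(key, (when, int(offset)))
            last[key] = (when, int(offset))
    return {
        key: (
            last[key][1] - first[key][1],
            (last[key][0] - first[key][0]).total_seconds(),
        )
        for key in first
    }


# Lines copied from the log above (hypothetical variable name).
SAMPLE = [
    "2024-08-19 08:20:46,530 INFO manager Heartbeat: {TopicPartition(topic='case-sql', partition=0): 6591648}",
    "2024-08-19 08:20:46,540 INFO elastic Processing chunk of changes in BulkElasticProcessor",
    "2024-08-19 08:20:56,738 INFO manager Heartbeat: {TopicPartition(topic='case-sql', partition=0): 6592088}",
]

progress = offset_progress(SAMPLE)
print(progress)
```

For the two heartbeats in my log this gives a difference of 440 offsets over roughly 10.2 seconds on `case-sql` partition 0, so the heartbeat offset does appear to move right after a restart.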
Could this be related to the size of the app? Is there something else I should look at to keep the system processing forms and cases, or am I misunderstanding something about the process?
Thanks!
EDIT: I do wonder if I'm misunderstanding the process and/or these numbers. For example, I can restart the service and reduce the number in brackets by 10 each time, but once it gets down to single digits, further restarts don't bring it down to 0: