greenwave in staging is piling messages #7818

Closed
opened 2019-05-21 15:33:12 +00:00 by pingou · 6 comments

@kevin reported on #7817 that greenwave in staging has 155k+ messages in its queue, so the consumer may have an issue.

Source: https://pagure.io/fedora-infrastructure/issue/7817#comment-571406

@kevin reported on #7817 that greenwave in staging has 155k+ messages in its queue, so the consumer may have an issue. Source: https://pagure.io/fedora-infrastructure/issue/7817#comment-571406

The fedmsg consumer pod is crashing. Is it expected to still run this with the move to fedora-messaging?

Anyhow, it's crashing with:

[greenwave.consumers.resultsdb INFO] Greenwave resultsdb handler listening on: ['org.fedoraproject.staging.taskotron.result.new']
[greenwave.consumers.fedora_messaging_consumer INFO] Sending message received to: ResultsDBHandler
[greenwave.consumers.resultsdb DEBUG] Processing message "{'msg': {'task': {'type': 'bodhi_update', 'item': 'FEDORA-2019-9244c8b209', 'name': 'update.advisory_boot'}, 'result': {'id': 23523568, 'submit_time': '2019-04-24 13:06:12 UTC', 'prev_outcome': None, 'outcome': 'PASSED', 'log_url': 'https://openqa.stg.fedoraproject.org/tests/528801' }}}"
[fedora_messaging.twisted.protocol ERROR] Received unexpected exception from consumer Consumer(queue=greenwave.stg, callback=<function fedora_messaging_callback at 0x7fbeb1c099d8>)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/fedora_messaging/twisted/protocol.py", line 239, in _read
    yield threads.deferToThread(consumer.callback, message)
  File "/usr/lib64/python3.7/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/lib64/python3.7/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/lib64/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib64/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/local/lib/python3.7/site-packages/greenwave/consumers/fedora_messaging_consumer.py", line 61, in fedora_messaging_callback
    handler.consume(msg)
  File "/usr/local/lib/python3.7/site-packages/greenwave/consumers/resultsdb.py", line 247, in consume
    submit_time = message['msg']['submit_time']
KeyError: 'submit_time'
[fedora_messaging.cli ERROR] Unexpected error occurred in consumer Consumer(queue=greenwave.stg, callback=<function fedora_messaging_callback at 0x7fbeb1c099d8>): <twisted.python.failure.Failure builtins.KeyError: 'submit_time'>
[fedora_messaging.twisted.protocol INFO] Waiting for 0 consumer(s) to finish processing before halting
[fedora_messaging.twisted.protocol INFO] Finished canceling 0 consumers
[fedora_messaging.twisted.protocol INFO] Disconnect requested, but AMQP connection already gone
The fedmsg consumer pod is crashing. Is it expected to still run this with the move to fedora-messaging? Anyhow, it's crashing with: ``` [greenwave.consumers.resultsdb INFO] Greenwave resultsdb handler listening on: ['org.fedoraproject.staging.taskotron.result.new'] [greenwave.consumers.fedora_messaging_consumer INFO] Sending message received to: ResultsDBHandler [greenwave.consumers.resultsdb DEBUG] Processing message "{'msg': {'task': {'type': 'bodhi_update', 'item': 'FEDORA-2019-9244c8b209', 'name': 'update.advisory_boot'}, 'result': {'id': 23523568, 'submit_time': '2019-04-24 13:06:12 UTC', 'prev_outcome': None, 'outcome': 'PASSED', 'log_url': 'https://openqa.stg.fedoraproject.org/tests/528801' }}}" [fedora_messaging.twisted.protocol ERROR] Received unexpected exception from consumer Consumer(queue=greenwave.stg, callback=<function fedora_messaging_callback at 0x7fbeb1c099d8>) Traceback (most recent call last): File "/usr/lib/python3.7/site-packages/fedora_messaging/twisted/protocol.py", line 239, in _read yield threads.deferToThread(consumer.callback, message) File "/usr/lib64/python3.7/site-packages/twisted/python/threadpool.py", line 250, in inContext result = inContext.theWork() File "/usr/lib64/python3.7/site-packages/twisted/python/threadpool.py", line 266, in <lambda> inContext.theWork = lambda: context.call(ctx, func, *args, **kw) File "/usr/lib64/python3.7/site-packages/twisted/python/context.py", line 122, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/usr/lib64/python3.7/site-packages/twisted/python/context.py", line 85, in callWithContext return func(*args,**kw) File "/usr/local/lib/python3.7/site-packages/greenwave/consumers/fedora_messaging_consumer.py", line 61, in fedora_messaging_callback handler.consume(msg) File "/usr/local/lib/python3.7/site-packages/greenwave/consumers/resultsdb.py", line 247, in consume submit_time = message['msg']['submit_time'] KeyError: 'submit_time' [fedora_messaging.cli ERROR] Unexpected error occurred in consumer Consumer(queue=greenwave.stg, callback=<function fedora_messaging_callback at 0x7fbeb1c099d8>): <twisted.python.failure.Failure builtins.KeyError: 'submit_time'> [fedora_messaging.twisted.protocol INFO] Waiting for 0 consumer(s) to finish processing before halting [fedora_messaging.twisted.protocol INFO] Finished canceling 0 consumers [fedora_messaging.twisted.protocol INFO] Disconnect requested, but AMQP connection already gone ```

Working on this. Any idea why is the submit_time format different from what is in resultsdb [1]?

Currently, format %Y-%m-%dT%H:%M:%S.%f is hard-coded.

[1] https://taskotron.stg.fedoraproject.org/resultsdb_api/api/v2.0/results/23523568

Working on this. Any idea why is the `submit_time` format different from what is in resultsdb [1]? Currently, format `%Y-%m-%dT%H:%M:%S.%f` is hard-coded. [1] https://taskotron.stg.fedoraproject.org/resultsdb_api/api/v2.0/results/23523568

Working on this. Any idea why is the submit_time format different from what is in resultsdb [1]?

Looks like, it's just resultsdb differently serializes data/time for fedora-messaging. We need to handle this case in waiverdb.

> Working on this. Any idea why is the submit_time format different from what is in resultsdb [1]? Looks like, it's just resultsdb differently serializes data/time for fedora-messaging. We need to handle this case in waiverdb.
Pull request: https://pagure.io/greenwave/pull-request/438

Commit 19416d32 fixes this issue

Commit [19416d32](https://pagure.io/fedora-infrastructure/c/19416d32) fixes this issue

The patch helped but it seems the consumer is now blocked by dogpile (maybe memcached pod problem).

[greenwave.consumers.resultsdb INFO] Greenwave resultsdb handler listening on: ['org.fedoraproject.staging.taskotron.result.new']
[greenwave.consumers.fedora_messaging_consumer INFO] Sending message received to: ResultsDBHandler
[greenwave.consumers.resultsdb DEBUG] Processing message "{'msg': {'task': {'type': 'koji_build', 'item': 'graphviz-2.40.1-39.fc29', 'name': 'dist.rpmgrill.elf-checks'}, 'result': {'id': 23526664, 'submit_time': '2019-04-24 14:40:06 UTC', 'prev_outcome': None, 'outcome': 'FAILED', 'log_url': 'https://taskotron.stg.fedoraproject.org/artifacts/all/748d0752-669e-11e9-8059-5254003dbbef/tests.yml/rpmgrill.json' }}}"
[greenwave.consumers.resultsdb DEBUG] Considering subject koji_build: 'graphviz-2.40.1-39.fc29'
[greenwave.consumers.resultsdb DEBUG] Invalidating cache for 'greenwave.resources:CachedResults|koji_build graphviz-2.40.1-39.fc29 dist.rpmgrill.elf-checks'
[greenwave.consumers.resultsdb DEBUG] Invalidating cache for 'greenwave.resources:CachedResults|koji_build graphviz-2.40.1-39.fc29 None'
[dogpile.lock DEBUG] NeedRegenerationException
[dogpile.lock DEBUG] no value, waiting for create lock

Would be nice to also have health check endpoint for the consumers (for livenessProbe in OpenShift).

The patch helped but it seems the consumer is now blocked by dogpile (maybe memcached pod problem). ``` [greenwave.consumers.resultsdb INFO] Greenwave resultsdb handler listening on: ['org.fedoraproject.staging.taskotron.result.new'] [greenwave.consumers.fedora_messaging_consumer INFO] Sending message received to: ResultsDBHandler [greenwave.consumers.resultsdb DEBUG] Processing message "{'msg': {'task': {'type': 'koji_build', 'item': 'graphviz-2.40.1-39.fc29', 'name': 'dist.rpmgrill.elf-checks'}, 'result': {'id': 23526664, 'submit_time': '2019-04-24 14:40:06 UTC', 'prev_outcome': None, 'outcome': 'FAILED', 'log_url': 'https://taskotron.stg.fedoraproject.org/artifacts/all/748d0752-669e-11e9-8059-5254003dbbef/tests.yml/rpmgrill.json' }}}" [greenwave.consumers.resultsdb DEBUG] Considering subject koji_build: 'graphviz-2.40.1-39.fc29' [greenwave.consumers.resultsdb DEBUG] Invalidating cache for 'greenwave.resources:CachedResults|koji_build graphviz-2.40.1-39.fc29 dist.rpmgrill.elf-checks' [greenwave.consumers.resultsdb DEBUG] Invalidating cache for 'greenwave.resources:CachedResults|koji_build graphviz-2.40.1-39.fc29 None' [dogpile.lock DEBUG] NeedRegenerationException [dogpile.lock DEBUG] no value, waiting for create lock ``` Would be nice to also have health check endpoint for the consumers (for `livenessProbe` in OpenShift).
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: infrastructure/fedora-infrastructure#7818
No description provided.