Expected resultsdb notification not found

pingou commented

2019-07-17 15:56:38 +00:00

Good Morning,

I've added a test.yml and a gating.yaml to fedora-gather-easyfix: https://src.fedoraproject.org/rpms/fedora-gather-easyfix/tree/master

I've built a new version with them and created the corresponding bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-4ffedfe629
Bodhi does see two failed results, one of which is required.

However, looking at datagrepper: https://apps.fedoraproject.org/datagrepper/raw?category=greenwave the last message is 4 days old.

Do we have any info that the message was sent?

Good Morning, I've added a test.yml and a gating.yaml to fedora-gather-easyfix: https://src.fedoraproject.org/rpms/fedora-gather-easyfix/tree/master I've built a new version with them and created the corresponding bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-4ffedfe629 Bodhi does see two failed results, one of which is required. However, looking at datagrepper: https://apps.fedoraproject.org/datagrepper/raw?category=greenwave the last message is 4 days old. Do we have any info that the message was sent?

pingou commented

2019-07-17 15:57:45 +00:00

Author

I'm reporting this here in case more people can help debugging this.
Sounds like a good test-case to figure out how to track "lost" messages in fedora-messaging :)

I'm reporting this here in case more people can help debugging this. Sounds like a good test-case to figure out how to track "lost" messages in fedora-messaging :)

gnaponie commented

2019-07-18 07:55:31 +00:00

It seems like there was something wrong with resultsdb, there were no messages yesterday from 09:22:56 am until 18:02:38:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb&delta=127800
The results for that element were created all around 15.20:
https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0/results?item=fedora-gather-easyfix-0.1.1-17.fc30

and checking Greenwave logs I also see that greenwave didn't get any messages from resultsdb during that period of time.
I would suggest to check if there was some resultsdb outage or UMB outage.

It seems like there was something wrong with resultsdb, there were no messages yesterday from 09:22:56 am until 18:02:38: https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb&delta=127800 The results for that element were created all around 15.20: https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0/results?item=fedora-gather-easyfix-0.1.1-17.fc30 and checking Greenwave logs I also see that greenwave didn't get any messages from resultsdb during that period of time. I would suggest to check if there was some resultsdb outage or UMB outage.

pingou commented

2019-07-18 13:03:32 +00:00

Author

So doing some more debugging of this.

I've created another :

bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-13b9ffe38a
CI ran: https://jenkins-continuous-infra.apps.ci.centos.org/blue/organizations/jenkins/fedora-f30-build-pipeline/detail/fedora-f30-build-pipeline/649/pipeline/
our resultsdb instance in rdu got the info: http://resultsdb.ci.centos.org/resultsdb/results/416228
our main resultsdb got the info as well: https://taskotron.fedoraproject.org/resultsdb/results/30940202 (via rdbsync)
However, checking datagrepper, we do not have the corresponding message from resultsdb: https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb&rows_per_page=1&delta=127800

So it seems something is off with resultsdb.

So doing some more debugging of this. I've created another : - bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-13b9ffe38a - CI ran: https://jenkins-continuous-infra.apps.ci.centos.org/blue/organizations/jenkins/fedora-f30-build-pipeline/detail/fedora-f30-build-pipeline/649/pipeline/ - our resultsdb instance in rdu got the info: http://resultsdb.ci.centos.org/resultsdb/results/416228 - our main resultsdb got the info as well: https://taskotron.fedoraproject.org/resultsdb/results/30940202 (via rdbsync) - However, checking datagrepper, we do not have the corresponding message from resultsdb: https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb&rows_per_page=1&delta=127800 So it seems something is off with resultsdb.

pingou commented

2019-07-18 13:04:16 +00:00

Author

Rephrasing the title now that we've narrowed the issue

pingou commented

2019-07-18 13:04:32 +00:00

Author

Metadata Update from @pingou:

Assignee reset

**Metadata Update from @pingou**: - Assignee reset

pingou commented

2019-07-18 13:18:39 +00:00

Author

I have restarted apache on resultsdb01 as well as rdbsync after that and I already see a message from CI in datagrepper: https://apps.fedoraproject.org/datagrepper/id?id=2019-54bb2f78-f2fa-41e9-8816-2f9a3d289b1a&is_raw=true&size=extra-large

pingou commented

2019-07-18 13:36:42 +00:00

Author

Restarted the process:

New bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-aa0905dc7b
CI ran: https://jenkins-continuous-infra.apps.ci.centos.org/blue/organizations/jenkins/fedora-f30-build-pipeline/detail/fedora-f30-build-pipeline/651/pipeline/ and https://apps.fedoraproject.org/datagrepper/id?id=2019-e8d1ac1b-745c-4edf-8d98-53505ad158bc&is_raw=true&size=extra-large
resultsdb in rdu has had the result: http://resultsdb.ci.centos.org/resultsdb/results/416463
resultsdb in prod has had the result: https://taskotron.fedoraproject.org/resultsdb/results/30943449
resultsdb announced the new results: https://apps.fedoraproject.org/datagrepper/id?id=2019-2bd07c31-5f08-468f-b14d-977d6b29990b&is_raw=true&size=extra-large
greenwave announced its decision: https://apps.fedoraproject.org/datagrepper/id?id=2019-6c0c74ff-4f8f-4986-8597-b0c637cc3dda&is_raw=true&size=extra-large

So I think this is fixed :)

Restarted the process: - New bodhi update: https://bodhi.fedoraproject.org/updates/FEDORA-2019-aa0905dc7b - CI ran: https://jenkins-continuous-infra.apps.ci.centos.org/blue/organizations/jenkins/fedora-f30-build-pipeline/detail/fedora-f30-build-pipeline/651/pipeline/ and https://apps.fedoraproject.org/datagrepper/id?id=2019-e8d1ac1b-745c-4edf-8d98-53505ad158bc&is_raw=true&size=extra-large - resultsdb in rdu has had the result: http://resultsdb.ci.centos.org/resultsdb/results/416463 - resultsdb in prod has had the result: https://taskotron.fedoraproject.org/resultsdb/results/30943449 - resultsdb announced the new results: https://apps.fedoraproject.org/datagrepper/id?id=2019-2bd07c31-5f08-468f-b14d-977d6b29990b&is_raw=true&size=extra-large - greenwave announced its decision: https://apps.fedoraproject.org/datagrepper/id?id=2019-6c0c74ff-4f8f-4986-8597-b0c637cc3dda&is_raw=true&size=extra-large So I think this is fixed :)

pingou commented

2019-07-18 13:36:42 +00:00

Author

Metadata Update from @pingou:

Issue close_status updated to: Fixed
Issue status updated to: Closed (was: Open)

**Metadata Update from @pingou**: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)

gnaponie commented

2019-07-18 14:24:49 +00:00

Awesome!

cverna commented

2019-07-18 17:41:44 +00:00

Metadata Update from @cverna:

Issue status updated to: Open (was: Closed)

**Metadata Update from @cverna**: - Issue status updated to: Open (was: Closed)

cverna commented

2019-07-18 17:43:12 +00:00

I think that we should not close this until we have added the proper monitoring for resultsdb. If we don't do it now we will never do it 😄

I think that we should not close this until we have added the proper monitoring for resultsdb. If we don't do it now we will never do it :smile:

pingou commented

2019-07-19 09:08:35 +00:00

Author

Ran my debug script today and the issue happened again (no messages from resultsdb), I've tried to add some more debugging to figure out what is going on but there is something up there.

pingou commented

2019-07-19 09:44:14 +00:00

Author

Ok, finally found the reason for the error: ERROR - reasons: Channel is closed.

We'll need some help to fix this

Ok, finally found the reason for the error: ``ERROR - reasons: Channel is closed. `` We'll need some help to fix this

pingou commented

2019-07-19 12:12:16 +00:00

Author

Ok, so there is something weird going on here:

This is what I see in the logs:

[Fri Jul 19 11:59:55.198875 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,198] DEBUG in api_v2: Preparing to publish message for result id 30961751
[Fri Jul 19 11:59:55.223947 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,223] DEBUG in api_v2: Preparing to publish Taskotron message for result id 30961751
[Fri Jul 19 12:00:25.511575 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] [2019-07-19 12:00:25,511] DEBUG in api_v2: Created new result for testcase org.centos.prod.ci.pipeline.allpackages-build.complete with outcome
 FAILED
[Fri Jul 19 12:00:25.515414 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] [2019-07-19 12:00:25,515] DEBUG in api_v2: Preparing to publish message for result id 30961752
[Fri Jul 19 12:00:25.531425 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging    2019-07-19 12:00:25 ERROR   Error sending message daece1ce-8900-4828-b41e-be5af445fd78: 
[Fri Jul 19 12:00:25.531869 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging    2019-07-19 12:00:25 ERROR     ERROR: 
[Fri Jul 19 12:00:25.532264 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging    2019-07-19 12:00:25 ERROR     ERROR - reasons: Channel is closed.

However, there is in datagrepper a message about that result: https://apps.fedoraproject.org/datagrepper/id?id=2019-5109396d-0196-4b3a-9230-7363df36b4ac&is_raw=true&size=extra-large

@jcline if you have some time to help us figuring out what's going on here, it would be nice :)

And to be complete, that result id is only listed twice in the logs:

# grep 30961751 /var/log/httpd/error_log
[Fri Jul 19 11:59:55.198875 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,198] DEBUG in api_v2: Preparing to publish message for result id 30961751
[Fri Jul 19 11:59:55.223947 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,223] DEBUG in api_v2: Preparing to publish Taskotron message for result id 30961751

Ok, so there is something weird going on here: This is what I see in the logs: ```` [Fri Jul 19 11:59:55.198875 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,198] DEBUG in api_v2: Preparing to publish message for result id 30961751 [Fri Jul 19 11:59:55.223947 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,223] DEBUG in api_v2: Preparing to publish Taskotron message for result id 30961751 [Fri Jul 19 12:00:25.511575 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] [2019-07-19 12:00:25,511] DEBUG in api_v2: Created new result for testcase org.centos.prod.ci.pipeline.allpackages-build.complete with outcome FAILED [Fri Jul 19 12:00:25.515414 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] [2019-07-19 12:00:25,515] DEBUG in api_v2: Preparing to publish message for result id 30961752 [Fri Jul 19 12:00:25.531425 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging 2019-07-19 12:00:25 ERROR Error sending message daece1ce-8900-4828-b41e-be5af445fd78: [Fri Jul 19 12:00:25.531869 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging 2019-07-19 12:00:25 ERROR ERROR: [Fri Jul 19 12:00:25.532264 2019] [wsgi:error] [pid 764:tid 139635107587840] [remote ::1:58438] messaging 2019-07-19 12:00:25 ERROR ERROR - reasons: Channel is closed. ```` However, there is in datagrepper a message about that result: https://apps.fedoraproject.org/datagrepper/id?id=2019-5109396d-0196-4b3a-9230-7363df36b4ac&is_raw=true&size=extra-large @jcline if you have some time to help us figuring out what's going on here, it would be nice :) And to be complete, that result id is only listed twice in the logs: ```` # grep 30961751 /var/log/httpd/error_log [Fri Jul 19 11:59:55.198875 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,198] DEBUG in api_v2: Preparing to publish message for result id 30961751 [Fri Jul 19 11:59:55.223947 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:58428] [2019-07-19 11:59:55,223] DEBUG in api_v2: Preparing to publish Taskotron message for result id 30961751 ````

lholecek commented

2019-07-19 12:19:49 +00:00

It's the ID 30961752 that failed - I cannot find any datagrepper entry for it.

pingou commented

2019-07-19 12:29:49 +00:00

Author

Oh, shoot you're right!

So the error doesn't trigger all the time, could it be that it's sending too many messages? Some are going through and others are not :(

Oh, shoot you're right! So the error doesn't trigger all the time, could it be that it's sending too many messages? Some are going through and others are not :(

pingou commented

2019-07-19 12:29:49 +00:00

Author

Metadata Update from @pingou:

Issue assigned to gnaponie

**Metadata Update from @pingou**: - Issue assigned to gnaponie

pingou commented

2019-07-19 12:57:09 +00:00

Author

Just saw a new stacktrace in the logs:

[Fri Jul 19 12:50:05.123171 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] [2019-07-19 12:50:05,122] DEBUG in api_v2: Preparing to publish message for result id 30962369
[Fri Jul 19 12:50:05.144310 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] io_services_utils 2019-07-19 12:50:05 ERROR   _AsyncBaseTransport._consume() failed, aborting connection: error=SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)'); sock=<ssl.SSLSocket fd=26, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.5.124.207', 45620)>; Caller's stack:
[Fri Jul 19 12:50:05.144380 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last):
[Fri Jul 19 12:50:05.144398 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
[Fri Jul 19 12:50:05.144454 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     super(_AsyncSSLTransport, self)._consume()
[Fri Jul 19 12:50:05.144466 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
[Fri Jul 19 12:50:05.144477 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
[Fri Jul 19 12:50:05.144488 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
[Fri Jul 19 12:50:05.144552 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.144567 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
[Fri Jul 19 12:50:05.144578 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return sock.recv(max_bytes)
[Fri Jul 19 12:50:05.144588 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
[Fri Jul 19 12:50:05.144599 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self.read(buflen)
[Fri Jul 19 12:50:05.144609 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 913, in read
[Fri Jul 19 12:50:05.144619 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self._sslobj.read(len)
[Fri Jul 19 12:50:05.144629 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
[Fri Jul 19 12:50:05.144640 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last):
[Fri Jul 19 12:50:05.144650 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
[Fri Jul 19 12:50:05.144660 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     super(_AsyncSSLTransport, self)._consume()
[Fri Jul 19 12:50:05.144669 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
[Fri Jul 19 12:50:05.144681 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
[Fri Jul 19 12:50:05.144691 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
[Fri Jul 19 12:50:05.144721 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.144747 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
[Fri Jul 19 12:50:05.144759 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return sock.recv(max_bytes)
[Fri Jul 19 12:50:05.144769 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
[Fri Jul 19 12:50:05.144779 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self.read(buflen)
[Fri Jul 19 12:50:05.144789 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 913, in read
[Fri Jul 19 12:50:05.144799 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self._sslobj.read(len)
[Fri Jul 19 12:50:05.144822 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
[Fri Jul 19 12:50:05.145722 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] diagnostic_utils 2019-07-19 12:50:05 ERROR   Wrapped func exited with exception. Caller's stack:
[Fri Jul 19 12:50:05.145804 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last):
[Fri Jul 19 12:50:05.145822 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap
[Fri Jul 19 12:50:05.145833 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.145843 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
[Fri Jul 19 12:50:05.145854 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     super(_AsyncSSLTransport, self)._consume()
[Fri Jul 19 12:50:05.145863 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
[Fri Jul 19 12:50:05.145874 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
[Fri Jul 19 12:50:05.145885 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
[Fri Jul 19 12:50:05.145895 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.145905 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
[Fri Jul 19 12:50:05.145915 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return sock.recv(max_bytes)
[Fri Jul 19 12:50:05.145924 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
[Fri Jul 19 12:50:05.145935 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self.read(buflen)
[Fri Jul 19 12:50:05.145944 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 913, in read
[Fri Jul 19 12:50:05.145954 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self._sslobj.read(len)
[Fri Jul 19 12:50:05.145964 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)
[Fri Jul 19 12:50:05.145975 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last):
[Fri Jul 19 12:50:05.145985 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap
[Fri Jul 19 12:50:05.145995 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.146005 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume
[Fri Jul 19 12:50:05.146016 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     super(_AsyncSSLTransport, self)._consume()
[Fri Jul 19 12:50:05.146026 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume
[Fri Jul 19 12:50:05.146069 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES)
[Fri Jul 19 12:50:05.146185 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap
[Fri Jul 19 12:50:05.146202 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return func(*args, **kwargs)
[Fri Jul 19 12:50:05.146212 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv
[Fri Jul 19 12:50:05.146222 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return sock.recv(max_bytes)
[Fri Jul 19 12:50:05.146232 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 1037, in recv
[Fri Jul 19 12:50:05.146242 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self.read(buflen)
[Fri Jul 19 12:50:05.146251 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]   File "/usr/lib64/python3.7/ssl.py", line 913, in read
[Fri Jul 19 12:50:05.146261 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544]     return self._sslobj.read(len)
[Fri Jul 19 12:50:05.146270 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488)

@abompard @jcline does this ring a bell to you?

Just saw a new stacktrace in the logs: ```` [Fri Jul 19 12:50:05.123171 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] [2019-07-19 12:50:05,122] DEBUG in api_v2: Preparing to publish message for result id 30962369 [Fri Jul 19 12:50:05.144310 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] io_services_utils 2019-07-19 12:50:05 ERROR _AsyncBaseTransport._consume() failed, aborting connection: error=SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2488)'); sock=<ssl.SSLSocket fd=26, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.5.124.207', 45620)>; Caller's stack: [Fri Jul 19 12:50:05.144380 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last): [Fri Jul 19 12:50:05.144398 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume [Fri Jul 19 12:50:05.144454 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] super(_AsyncSSLTransport, self)._consume() [Fri Jul 19 12:50:05.144466 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume [Fri Jul 19 12:50:05.144477 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES) [Fri Jul 19 12:50:05.144488 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap [Fri Jul 19 12:50:05.144552 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.144567 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv [Fri Jul 19 12:50:05.144578 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return sock.recv(max_bytes) [Fri Jul 19 12:50:05.144588 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 1037, in recv [Fri Jul 19 12:50:05.144599 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self.read(buflen) [Fri Jul 19 12:50:05.144609 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 913, in read [Fri Jul 19 12:50:05.144619 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self._sslobj.read(len) [Fri Jul 19 12:50:05.144629 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488) [Fri Jul 19 12:50:05.144640 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last): [Fri Jul 19 12:50:05.144650 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume [Fri Jul 19 12:50:05.144660 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] super(_AsyncSSLTransport, self)._consume() [Fri Jul 19 12:50:05.144669 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume [Fri Jul 19 12:50:05.144681 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES) [Fri Jul 19 12:50:05.144691 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap [Fri Jul 19 12:50:05.144721 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.144747 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv [Fri Jul 19 12:50:05.144759 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return sock.recv(max_bytes) [Fri Jul 19 12:50:05.144769 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 1037, in recv [Fri Jul 19 12:50:05.144779 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self.read(buflen) [Fri Jul 19 12:50:05.144789 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 913, in read [Fri Jul 19 12:50:05.144799 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self._sslobj.read(len) [Fri Jul 19 12:50:05.144822 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488) [Fri Jul 19 12:50:05.145722 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] diagnostic_utils 2019-07-19 12:50:05 ERROR Wrapped func exited with exception. Caller's stack: [Fri Jul 19 12:50:05.145804 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last): [Fri Jul 19 12:50:05.145822 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap [Fri Jul 19 12:50:05.145833 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.145843 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume [Fri Jul 19 12:50:05.145854 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] super(_AsyncSSLTransport, self)._consume() [Fri Jul 19 12:50:05.145863 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume [Fri Jul 19 12:50:05.145874 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES) [Fri Jul 19 12:50:05.145885 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap [Fri Jul 19 12:50:05.145895 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.145905 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv [Fri Jul 19 12:50:05.145915 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return sock.recv(max_bytes) [Fri Jul 19 12:50:05.145924 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 1037, in recv [Fri Jul 19 12:50:05.145935 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self.read(buflen) [Fri Jul 19 12:50:05.145944 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 913, in read [Fri Jul 19 12:50:05.145954 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self._sslobj.read(len) [Fri Jul 19 12:50:05.145964 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488) [Fri Jul 19 12:50:05.145975 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] Traceback (most recent call last): [Fri Jul 19 12:50:05.145985 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/diagnostic_utils.py", line 53, in log_exception_func_wrap [Fri Jul 19 12:50:05.145995 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.146005 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 1222, in _consume [Fri Jul 19 12:50:05.146016 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] super(_AsyncSSLTransport, self)._consume() [Fri Jul 19 12:50:05.146026 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 791, in _consume [Fri Jul 19 12:50:05.146069 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] data = self._sigint_safe_recv(self._sock, self._MAX_RECV_BYTES) [Fri Jul 19 12:50:05.146185 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 79, in retry_sigint_wrap [Fri Jul 19 12:50:05.146202 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return func(*args, **kwargs) [Fri Jul 19 12:50:05.146212 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib/python3.7/site-packages/pika/adapters/utils/io_services_utils.py", line 846, in _sigint_safe_recv [Fri Jul 19 12:50:05.146222 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return sock.recv(max_bytes) [Fri Jul 19 12:50:05.146232 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 1037, in recv [Fri Jul 19 12:50:05.146242 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self.read(buflen) [Fri Jul 19 12:50:05.146251 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] File "/usr/lib64/python3.7/ssl.py", line 913, in read [Fri Jul 19 12:50:05.146261 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] return self._sslobj.read(len) [Fri Jul 19 12:50:05.146270 2019] [wsgi:error] [pid 765:tid 139635065624320] [remote ::1:59544] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2488) ```` @abompard @jcline does this ring a bell to you?

pingou commented

2019-07-19 12:57:10 +00:00

Author

Metadata Update from @pingou:

Assignee reset

**Metadata Update from @pingou**: - Assignee reset

pingou commented

2019-07-22 08:40:28 +00:00

Author

Note that this stacktrace doesn't happen all the time, most often all that is in the logs are Channel is closed.

Note that this stacktrace doesn't happen all the time, most often all that is in the logs are ``Channel is closed.``

pingou commented

2019-07-22 08:40:28 +00:00

Author

Metadata Update from @pingou:

Issue assigned to gnaponie

**Metadata Update from @pingou**: - Issue assigned to gnaponie

pingou commented

2019-07-22 08:40:50 +00:00

Author

Metadata Update from @pingou:

Assignee reset

**Metadata Update from @pingou**: - Assignee reset

abompard commented

2019-07-22 08:42:40 +00:00

Yes, this does ring a bell. Which version of fedora-messaging are you running?

pingou commented

2019-07-22 08:46:35 +00:00

Author

# rpm -q python3-fedora-messaging
python3-fedora-messaging-1.6.0-1.fc29.noarch

```` # rpm -q python3-fedora-messaging python3-fedora-messaging-1.6.0-1.fc29.noarch ````

pingou commented

2019-07-22 08:48:20 +00:00

Author

I've upgraded to 1.7.1, let's see if this happens again :)

abompard commented

2019-07-22 09:24:37 +00:00

Yes this should fix it, see https://github.com/fedora-infra/fedora-messaging/issues/175

pingou commented

2019-07-22 09:56:05 +00:00

Author

Thanks for the explanation, server side I no longer see the "Channel closed" error.

Let's consider this fixed then, thanks! :)

Thanks for the explanation, server side I no longer see the "Channel closed" error. Let's consider this fixed then, thanks! :)

pingou commented

2019-07-22 09:56:06 +00:00

Author

Metadata Update from @pingou:

Issue close_status updated to: Fixed
Issue status updated to: Closed (was: Open)

**Metadata Update from @pingou**: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)

Rows
Columns

Expected resultsdb notification not found #8011