Bug #7093
Deadlock when chaining http-requests (open)
Description
I have found a deadlock that occurs when chaining HTTP requests.
It happens only occasionally (roughly once every 100 requests).
When it occurs, WtHttp does not serve any additional requests until the initial one has completed.
It happens regardless of whether Wt::Http::Client is used (I tried WinINet and WinHTTP as alternatives).
Windows 10
Wt-Version: 3.3.12
Boost: 1.69
MSVC 2017
Updated by Roel Standaert over 5 years ago
- Status changed from New to Feedback
Isn't this just a result of you using up all of the threads in the pool in that busy-wait loop?
WServer has a fixed-size thread pool. If all threads in the pool are busy-waiting, there are no threads left for I/O.
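The starvation described here can be reproduced outside Wt. The following is a minimal sketch, assuming a hand-written pool: `FixedPool`, `Shared` and `followups_completed` are hypothetical names and not part of Wt. Each "request" posts a follow-up task to the same pool and then blocks until that follow-up has run, or until the caller releases it after 300 ms.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool standing in for the server's thread pool.
class FixedPool {
public:
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push_back(std::move(task)); }
        wake_.notify_one();
    }
    void start(int workers) {
        assert(workers > 0);
        for (int i = 0; i < workers; ++i) threads_.emplace_back([this] { run(); });
    }
    ~FixedPool() {  // let the workers drain the queue, then join them
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_all();
        for (auto& t : threads_) t.join();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

struct Shared {
    std::mutex mutex;
    std::condition_variable changed;
    std::vector<char> done;   // done[i] is 1 once follow-up i has run
    bool released = false;    // set by the caller to unblock all requests
};

// Returns how many follow-up tasks ran while their request was still blocked.
int followups_completed(int workers, int requests) {
    Shared s;
    s.done.resize(requests);
    FixedPool pool;  // declared after s, so it is joined before s is destroyed
    for (int i = 0; i < requests; ++i) {
        pool.post([&pool, &s, i] {
            // The follow-up needs a free worker from the same pool.
            pool.post([&s, i] {
                { std::lock_guard<std::mutex> lock(s.mutex); s.done[i] = 1; }
                s.changed.notify_all();
            });
            // Block this worker until the follow-up ran or the caller gives up.
            std::unique_lock<std::mutex> lock(s.mutex);
            s.changed.wait(lock, [&s, i] { return s.released || s.done[i] != 0; });
        });
    }
    pool.start(workers);  // all requests are queued before any worker runs
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
    int completed = 0;
    {
        std::lock_guard<std::mutex> lock(s.mutex);
        for (char d : s.done) completed += d;
        s.released = true;
    }
    s.changed.notify_all();
    return completed;
}
```

With as many workers as requests, `followups_completed(4, 4)` returns 0: every worker is blocked, so no follow-up can be scheduled. With one spare worker, `followups_completed(5, 4)` should return 4, provided the follow-ups get scheduled within the 300 ms window.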
Updated by Marco Kinski over 5 years ago
- File main.cpp main.cpp added
- File ParallelThreads_success.PNG ParallelThreads_success.PNG added
- File ParallelThreads_failure.PNG ParallelThreads_failure.PNG added
It should only block 3-4 threads:
- Connection from curl inside the wait loop
- HTTP client, outgoing
- HTTP client, incoming
- Session cleanup thread
I have updated the code.
I modified the example to use wApp, which was not valid.
ParallelThreads_success.PNG shows the debugger while single-stepping a successful (not hanging) request.
ParallelThreads_failure.PNG shows the debugger after pausing a hanging request.
I am not familiar with Boost.Asio; maybe it's an easy problem.
Updated by Roel Standaert over 5 years ago
How many of those curl processes are you running at the same time when you observe this issue?
Updated by Marco Kinski over 5 years ago
one at a time, repeatedly:
$ while [ "$(curl -s 'http://localhost/FetchData')" == "done" ]; do echo -n '.'; sleep 1; done
After a request hangs (and is aborted by a timeout in curl), the subsequent requests are processed normally, until it hangs again after roughly 100 requests.
Updated by Roel Standaert over 5 years ago
- File issue_7093.cpp issue_7093.cpp added
Wt submits tasks to a pool of 10 threads by default. If all 10 of those threads are busy, it can't do anything else.
I think maybe you're expecting it to only do one FetchData::handleRequest() at a time? It's perfectly possible that 10 threads are in FetchData::handleRequest(), handling 10 requests at the same time. Of course, if all of those wait for another task to be completed using the same thread pool (either handling another incoming request, or the actual request being performed by the client), then they will hang.
So it's not surprising to me that this deadlocks, given how FetchData::handleRequest() is implemented. It is a bit surprising to see exactly where it hangs in that screenshot, though; I'd rather expect it to hang in the while (!done) sleep loop.
The solution here is to not block. See the attachment to see how a continuation can be used instead.
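The idea behind not blocking can be sketched without the Wt API. This is a simplified model under stated assumptions, not the attached issue_7093.cpp: `FixedPool` and `completed_without_blocking` are hypothetical names, and posting a `resume` callback stands in for "the outgoing HTTP request finished". The handler hands over a completion callback and returns at once, so its worker is free again.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool standing in for the server's thread pool.
class FixedPool {
public:
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push_back(std::move(task)); }
        wake_.notify_one();
    }
    void start(int workers) {
        assert(workers > 0);
        for (int i = 0; i < workers; ++i) threads_.emplace_back([this] { run(); });
    }
    ~FixedPool() {  // let the workers drain the queue, then join them
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_all();
        for (auto& t : threads_) t.join();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Returns how many requests were completed (waits at most 5 seconds).
int completed_without_blocking(int workers, int requests) {
    std::mutex mutex;
    std::condition_variable changed;
    int completed = 0;
    FixedPool pool;  // declared last among shared state, so it is joined first
    for (int i = 0; i < requests; ++i) {
        pool.post([&] {
            // Instead of waiting, register what should happen on completion...
            std::function<void()> resume = [&] {
                { std::lock_guard<std::mutex> lock(mutex); ++completed; }
                changed.notify_all();
            };
            // ...and return immediately. Posting `resume` models the moment
            // the asynchronous operation finishes and the response is resumed.
            pool.post(resume);
        });
    }
    pool.start(workers);
    std::unique_lock<std::mutex> lock(mutex);
    changed.wait_for(lock, std::chrono::seconds(5),
                     [&] { return completed == requests; });
    return completed;
}
```

Because no task ever waits on another task, `completed_without_blocking(1, 4)` finishes all 4 requests even with a single worker.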
Updated by Roel Standaert over 5 years ago
Or... do you mean that you are actually just doing that one while loop and nothing else? That would be strange.
Updated by Marco Kinski over 5 years ago
It's just this one loop running at the time it hangs.
The 9 other threads of the pool seem to be waiting for work.
Updated by Roel Standaert over 5 years ago
Ok, sorry for the confusion. I do observe this myself now. Not sure why it would hang like that, though.
Updated by Marco Kinski over 5 years ago
No problem, it's really a strange problem, with lots of possible causes due to wrong usage :-)
Updated by Marco Kinski over 5 years ago
This issue is very urgent for me. Any idea what's wrong is welcome.
The suggested fix is not feasible for me; the real-world situation is much more complex.
It seems that any connection waiting to be processed can stall any other incoming request.
Updated by Roel Standaert over 5 years ago
I frankly don't have a clue. I think it's absolutely bizarre, and I haven't found a solution for it yet. Actually making it so it doesn't do a busy wait (using a continuation) oddly did actually seem to fix it, though.
Updated by Marco Kinski over 4 years ago
- File HttpClientServerTest.C HttpClientServerTest.C added
I modified test/HttpClientServerTest.C to include a test. Hopefully this helps in hunting it down.
Updated by Marco Kinski over 4 years ago
With the latest test code (HttpClientServerTest.C) I get an access violation. I am not sure whether this is inside Wt or due to wrong usage of the Client class in the test. The access violation happens during destruction of Wt::Core::Impl::observer_info, which gets called in parallel with Wt::Core::Impl::observer_info::removeObserver.
I noticed that while the lockup happens, the call stack of the running thread is always inside Wt::WReply::consumeRequestBody. After removing the optimization (line 240) that handles requests directly instead of posting them to the strand, the lockup is gone.
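The difference between handling a request directly and posting it can be illustrated with a toy event loop. This is only a sketch of the general scheduling difference, not a reproduction of the Wt or Boost.Asio code involved: `EventLoop`, `run_order` and the logged step names are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop; not Wt or Boost.Asio code.
class EventLoop {
public:
    void post(std::function<void()> task) { queue_.push_back(std::move(task)); }
    void run() {
        while (!queue_.empty()) {
            std::function<void()> task = std::move(queue_.front());
            queue_.pop_front();
            task();
        }
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Returns the order in which the steps ran.
std::vector<std::string> run_order(bool handle_inline) {
    std::vector<std::string> log;
    EventLoop loop;
    std::function<void()> handler = [&log] { log.push_back("handle request A"); };
    loop.post([&log, &loop, &handler, handle_inline] {
        log.push_back("read body of A");
        if (handle_inline)
            handler();           // direct call: runs before anything already queued
        else
            loop.post(handler);  // posted: runs after the work already queued
    });
    loop.post([&log] { log.push_back("accept connection B"); });
    loop.run();
    return log;
}
```

With `run_order(true)`, the handler for A runs before connection B is accepted, so a handler that blocks would hold up B for as long as it blocks. With `run_order(false)`, B is accepted first and the handler runs afterwards. That is consistent with the observation above, but it does not by itself explain the lockup.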