Improvements #13676
Self-deleting Http::Client
Description
I am in a situation where I need a `Http::Client` to perform a web request (a payment process, to be precise), which is triggered from within an application by the user. The request sends relevant data to an external web server and receives a response. As you can imagine, it is important that the response is properly handled. Due to the asynchronous nature of the web request, this raises some questions regarding the lifetime of the `Http::Client`. Storing it within the application is not viable, since the application (and thus the client) might be destroyed before the response is received. Storing it in the server itself is possible, but raises the question of how and when to delete the client, since keeping it for as long as the server is running seems unreasonable. Since there seems to be no direct way to check whether a client is done, a server-side polling solution seems unviable. The only possibility would be to connect a handler to the `done` signal that sets a flag once the client is done, and to check this flag regularly, but this introduces race conditions, since the server (due to threading issues) might detect the set flag and delete the client before the handler has finished.
Long story short: when trying to find a solution, I came up with the idea of a self-deleting `Http::Client` that can be used like this:

```cpp
auto client = std::make_shared<Http::Client>();
client->done().connect([client](AsioWrapper::error_code ec, Http::Response& r) mutable
{
    /* do something */
    client.reset();
});
client->post(...);
```
The idea is to pass a `shared_ptr` owning the client to the connected event handler, which ultimately deletes the client at the end of the handler. This way, the client is guaranteed to live as long as the request is pending, until it either succeeds or times out - both trigger the `done` signal. This used to work until Wt 4.10.1 for the following reasons: an object is allowed to delete itself within a member function if the function does not access/modify the object after deleting it, since that would cause an access violation. Emitting the `done` signal in `Client::emitDone` is the last statement in that function. As per the documentation, "the signal may be deleted at any time, in particular also while it is being emitted". Thus it is protected against self-deletion, and the `Client` object is no longer accessed once the signal is being emitted, making it safe for self-deletion, too.
Unfortunately, WT-11408 changed that by introducing a (perfectly fine and reasonable) `lock_guard`/`mutex` in the `Client::emitDone` function: the `lock_guard` is destroyed at the end of the function, which has to release the `mutex` and thus needs to access the client again, resulting in an access violation if the client has already deleted itself from a connected handler. The same would happen if the deletion occurs anywhere else before the `emitDone` function has completely finished.
The proposed solution/improvement for this issue is to release the lock right after its job of protecting access to the `impl_` pointer is done, right before emitting the `done` signal. That way, the client is no longer accessed once emitting the signal starts, restoring the old behavior. I will create a pull request containing the change shortly. I have also attached a small demo app that demonstrates the access violation with the current implementation.
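For illustration, a minimal sketch of the proposed shape of `Client::emitDone` - the member names follow this discussion, while the exact signature and body are simplifications, not Wt's actual source:

```cpp
void Client::emitDone(AsioWrapper::error_code err, const Http::Response& response)
{
    {
        std::lock_guard<std::mutex> guard(implementationMutex_);
        impl_.reset();             // the lock only needs to protect impl_
    }                              // lock released here, before the emit
    done_.emit(err, response);     // last access to *this, so self-deletion
                                   // from a connected handler is safe again
}
```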
I would be happy if you could merge that change into Wt. I am aware that self-deleting objects may pose a risk if not handled with proper care, but they are also a powerful tool for handling the lifetime of asynchronous objects.
Best,
Steven
Updated by Steven Köhler 3 months ago
I have created the corresponding pull request #230.
Updated by Matthias Van Ceulebroeck about 2 months ago · Edited
- Target version set to 4.12.1
Hello Steven,
At a glance, this looks fine. It maintains the intended behavior of the locks and allows for some flexibility.
I have scheduled this for 4.12.1, as I will of course need to verify that the constraints of #11408 are still met.
Updated by Romain Mardulyn 30 days ago
- Status changed from New to InProgress
- Assignee set to Romain Mardulyn
Updated by Romain Mardulyn 27 days ago
- Status changed from InProgress to Review
- Assignee deleted (Romain Mardulyn)
Updated by Romain Mardulyn 27 days ago
- Status changed from Review to InProgress
- Assignee set to Romain Mardulyn
Updated by Romain Mardulyn 22 days ago
Hello Steven,
The problem with the fix you proposed is that it can cause a race condition. While a signal can be deleted while being emitted, this is only safe when done from inside the same thread. The issue with `Http::Client` is that a different thread could delete the `Client` object while `Client::emitDone` is executing in another thread. This leads to the signal being emitted and deleted concurrently, which is not thread-safe.
We don't plan to make signals thread-safe due to the complexity that this would introduce. Therefore, the emission of that signal must be guarded by a lock, as it currently is.
I'm sorry this breaks your self-deleting `Http::Client`. I hope you'll be able to find an alternative solution to your issue.
Best,
Romain
Updated by Steven Köhler 21 days ago
Hello Romain,
thanks for your reply. That's not exactly what I had hoped for...
Your statement that `Client::emitDone` and deleting a `Client` might happen concurrently made me skeptical. So I did a quick check, and I'm afraid there is a much bigger problem at hand if those calls can happen concurrently. I'll try to elaborate. Assume we have two threads A and B, where A is calling `emitDone` and B is destroying the `Client`, and let us further assume that they run concurrently. Now there are two options:
- A grabs the `implementationMutex_`, B has to wait: In this case, emitting the `done` signal is finished before the mutex is unlocked and the destructor can grab it, which works fine - albeit by luck.
- B grabs the `implementationMutex_`, A has to wait: This case is highly problematic, since the mutex is only grabbed in the `abort` function, which is called first in the destructor. As soon as `abort` is done, the mutex is unlocked - before the rest of the destructor runs. At that point there is nothing to prevent A from grabbing the mutex. This induces several issues. The first one is the concurrent call to `impl_.lock()` in the destructor and the call to `impl_.reset()` in `emitDone`. If that works by chance, the destructor begins to destroy the `Client` member objects such as the `done_` signal, while `emitDone` will try to call it. Depending on what happens first or how the thread scheduler interleaves them, this might show various ill behaviors. If by chance the destructor manages to finish before A continues, A will try to lock the (by now) deleted mutex, which is undefined behavior. If, by any chance, that works in our favor, accessing the `impl_`, `redirectCount_` or `done_` members will wreak havoc, since they have been deleted at that point.
Summarizing: there already is a race condition in `Client`. Even if signals were thread-safe, that would provide no protection against it. Explicitly locking the mutex in the destructor will not prevent it either, since the mutex has to be unlocked at the end of the destructor body - before the members are deleted. Not unlocking it would mean undefined behavior in the mutex destructor. What my change does is merely allow the race condition to surface while emitting the done signal, since the mutex is unlocked earlier - but it does not cause the underlying problem.
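To make the second case concrete, here is a small generic illustration (deliberately not Wt code) of why a mutex that is released inside the destructor cannot protect the members that are destroyed after it:

```cpp
#include <mutex>

struct Widget {
    std::mutex m;
    int value = 0;

    ~Widget()
    {
        { std::lock_guard<std::mutex> lock(m); }  // like abort(): lock released here
        // m and value are destroyed after this point, while another thread
        // may have just acquired m and may still be about to touch value
    }

    void emitLike()  // racing against ~Widget from another thread
    {
        std::lock_guard<std::mutex> lock(m);  // may lock a destroyed mutex: UB
        ++value;                              // may be a use-after-free
    }
};
```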
Since there is no way for an object to check whether it has been deleted, the described use-after-free issue when `emitDone` executes after the destructor has finished cannot be solved inside the `Client` class, but needs proper external synchronisation.
Best,
Steven
PS: I have added a conversation with ChatGPT's o4-mini-high model to double-check my suspicion: https://chatgpt.com/share/68494307-dc60-8005-bfbc-908c1fc413b8
Updated by Romain Mardulyn 20 days ago
Hello Steven,
Thanks for your detailed response. After a closer look, I agree with your finding: there was indeed a race condition, because `impl_.lock()` wasn't guarded by `implementationMutex_`. Fortunately, simply protecting that specific line with `implementationMutex_` is enough to fix that issue.
Regarding your broader concern about the `Http::Client` object's destruction while `Client::emitDone()` is running, there's a crucial distinction. `Client::emitDone()` is guarded by `Impl::updateMutex_`, which `Impl::removeClient()` (involved in client destruction) also acquires. This design effectively prevents the `Http::Client` instance from being destroyed while its `impl_` object is actively calling one of its methods.
However, the problem with `Client::emitDone()` lies in its specific action: it resets the `impl_` pointer. When this happens, `Impl::updateMutex_` no longer protects the `Http::Client`'s lifetime from external destruction. This is precisely why `implementationMutex_` must remain locked until the very end of `Client::emitDone()`. It effectively takes over the role of blocking the `Http::Client`'s destruction once `impl_` has been reset, ensuring the client object remains valid for the duration of the function's execution.
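In other words (again a hedged sketch, not the literal Wt source), the guarded version keeps the lock for the whole function:

```cpp
void Client::emitDone(AsioWrapper::error_code err, const Http::Response& response)
{
    // Held until return: once impl_ is reset, Impl::updateMutex_ no longer
    // guards the Client's lifetime, so this lock takes over that role.
    std::lock_guard<std::mutex> guard(implementationMutex_);
    impl_.reset();
    done_.emit(err, response);   // still under implementationMutex_
}
```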
Best,
Romain
Updated by Steven Köhler 20 days ago
Hello Romain,
thanks again for your reply. I took another, more detailed look at the whole implementation. I found the `clientMutex_` you mentioned, and it is indeed locked in `Client::Impl::emitDone` and `Client::Impl::removeClient`, thus preventing the `Client` destructor from continuing beyond the `impl->removeClient();` line in `Client::~Client()` while `Client::emitDone` runs - assuming `impl_` was not yet reset, which indeed could be ensured by properly locking the mutex.
However, since `Client::impl_` is a `std::weak_ptr`, it does not own the `Client::Impl` instance. Resetting it in `Client::emitDone` does not delete the `Client::Impl` instance, but only loses access to it. Thus, `clientMutex_` is not affected by resetting the `impl_` pointer, which means `Client::emitDone` is completely protected by `clientMutex_` - especially since `Client::Impl::emitDone` properly locks the mutex and is the only place where `Client::emitDone` is called. Thus, your statement that `clientMutex_` cannot fully protect `Client::emitDone` seems false.
Interestingly, the `Client::Impl` instance cannot be deleted from inside `Client`, since `Client` has no `shared_ptr` to it, but only a `weak_ptr`. There are, however, several `strand_.wrap(std::bind(...))` calls in `Client::Impl` that pass a copy of the actual `Client::Impl` `shared_ptr`, so its lifetime extends until the last of those function objects is deleted after it has been executed. No lifetime issues there. But this also means that `Client::Impl` can potentially outlive `Client`. That is what causes the race issues: the longer-living `Client::Impl` might try to access the shorter-living `Client`.
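For clarity, this is the lifetime-extension mechanism meant here, sketched with a plain Asio strand (`Impl`, `queueWork` and `doWork` are simplified stand-ins, not Wt's actual code):

```cpp
#include <boost/asio.hpp>
#include <functional>
#include <memory>

struct Impl : std::enable_shared_from_this<Impl> {
    boost::asio::io_context::strand strand_;
    explicit Impl(boost::asio::io_context& io) : strand_(io) {}

    void queueWork()
    {
        // The bound shared_ptr copy keeps *this alive until the handler has
        // run, even if every external owner (e.g. the Client) is gone by then.
        boost::asio::post(strand_,
                          std::bind(&Impl::doWork, shared_from_this()));
    }

    void doWork() { /* ... */ }
};
```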
A rather straightforward solution would be to wrap all of `Client`'s relevant data in a helper struct, let's say `Client::Data`, and have `Client` as well as `Client::Impl` hold a `shared_ptr` to it (see the sketch below). That way, the lifetime of all signals and other data would be guaranteed to match the `Client::Impl` lifetime, resolving all potential race conditions from prematurely deleting `Client` at once. There are still a few minor details to work out, like maybe adding helper functions to the `Data` struct that are called from `Client` as well as from `Client::Impl`, to ensure they are still available after `Client` was destructed, but I think you get the idea.
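A rough, self-contained sketch of that idea with toy stand-ins (the `Signal`, `Data`, `Impl` and `Client` types here are hypothetical simplifications of the real ones):

```cpp
#include <functional>
#include <memory>

struct Signal {                        // toy stand-in for Wt::Signal
    std::function<void(int)> slot;
    void emit(int v) { if (slot) slot(v); }
};

struct Data {                          // everything Impl may still touch
    Signal done;
    int redirectCount = 0;
};

struct Impl {
    std::shared_ptr<Data> data;        // co-owns the signal and friends
    void emitDone() { data->done.emit(0); }
};

struct Client {
    std::shared_ptr<Data> data = std::make_shared<Data>();
    std::shared_ptr<Impl> impl = std::make_shared<Impl>(Impl{data});
    // Even if ~Client() runs first, `data` (and thus the signal) lives on
    // until the last Impl owner, e.g. a handler queued on the strand, is gone.
};
```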
What do you think?
Best,
Steven
Updated by Romain Mardulyn 16 days ago
Hello Steven,
The scenario you described isn't possible. The `Impl::emitDone` call, which causes the `Client` to reset its `impl_` pointer, only happens once the request has either completed or failed. This means there won't be any `impl_` function calls remaining in the `strand_` (with the possible exception of `Impl::stop`).
Regarding your proposed solution, having the `done_` signal potentially emitted after the `Client`'s destruction is problematic and goes against the fundamental design of a signal. A signal is meant to be emitted by its associated object; therefore, it doesn't make sense for `done_` to be emitted after the `Client` object no longer exists.
While this does break your self-deleting `Client`, the underlying problem it aimed to solve can now be effectively addressed. As you initially suggested, you can regularly check for a flag set by the `done_` signal. This approach is now viable, since the `Client` can no longer be deleted before the signal handler has finished executing.
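A hedged sketch of that flag pattern (the `finished` flag and the surrounding polling mechanism are assumptions, following the example from the issue description):

```cpp
#include <atomic>
#include <memory>

auto finished = std::make_shared<std::atomic<bool>>(false);
client->done().connect(
    [finished](AsioWrapper::error_code ec, Http::Response& r) {
        /* handle the response */
        finished->store(true);   // set last, after all handling is done
    });
client->post(...);

// Elsewhere, e.g. in a periodic server-side task: once *finished is true,
// the handler has returned, so with the fixed locking the client can be
// deleted safely.
```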
Best,
Romain
Updated by Steven Köhler 16 days ago
Hi Romain,
thanks for your reply, and sorry for taking up so much of your time. I just wanted to point out that the `Impl`'s lifetime is actually bound to the strand, and not to the `Client` instance itself, and hence the lifetime issues occur. You are, of course, absolutely right that emitting the signal after the `Client` is deleted is not reasonable - my intention was merely to prolong its lifetime until `emitDone` is finished (if it has already started); after that, `impl_` is destroyed within the strand and everything would be cleaned up nicely. And since `Impl` checks the `client_` pointer before actually calling any of its functions, deleting the client somewhat earlier would result in `client_ == nullptr` and thus skip the calls to `Client` - just as intended.
I'm still not convinced that guarding the `Client` destructor with the mutex is enough to resolve the race issue, but I don't want to waste your time either. Would you mind sharing a preview of your intended fix with me (a small patch or something like that)? Then I could test it more thoroughly and either find that it works as intended - and subsequently try to integrate it into a solution for my initially described issue, which I'm sure you'll understand is important to me to work properly - or get proof that the issue is still there, in which case another solution might need to be considered.
Best,
Steven
Updated by Romain Mardulyn 16 days ago
- File HttpClient.patch added
Hi Steven,
Here's the patch I've put together that I believe resolves the issue. Please note that this is still subject to change, as Matthias needs to validate it.
Let us know if you find the issue is not completely solved or if this patch introduces any new problems.
Thanks again for your valuable input on this; it's been very helpful.
Best,
Romain
Updated by Romain Mardulyn 12 days ago
- Status changed from InProgress to Review