Improvements #13676
Self-deleting Http::Client
Description
I am in a situation where I need an Http::Client to perform a web request (a payment process, to be precise), triggered from within an application by the user. The request sends relevant data to an external web server and receives a response. As you can imagine, it is important that the response is properly handled. Due to the asynchronous nature of the web request, this raises some questions regarding the lifetime of the Http::Client. Storing it within the application is not viable, since the application (and thus the client) might be destroyed before the response is received. Storing it in the server itself is possible, but raises the question of how and when to delete the client, since keeping it around as long as the server is running seems unreasonable. Since there seems to be no direct way to check whether a client is done, a server-side polling solution seems unviable. The only remaining possibility would be to connect a handler to the done signal that sets a flag when the client is done, and to check this flag regularly, but this introduces race conditions, since the server (due to threading issues) might detect the set flag and delete the client before the handler has finished.
Long story short: When trying to find a solution, I came up with the idea of a self-deleting Http::Client, that can be used like this:
auto client = std::make_shared<Http::Client>();
client->done().connect([client](AsioWrapper::error_code ec, Http::Response& r) mutable
{
    /* do something */
    client.reset();
});
client->post(...);
The idea is to pass a shared_ptr owning the client to the connected event handler, which ultimately deletes the client at the end of the handler. This way, the client is guaranteed to live as long as the request is pending and either succeeds or times out - both trigger the done signal. This used to work until Wt 4.10.1 for the following reason: an object is allowed to delete itself within a member function as long as the function does not access or modify the object after the deletion, since that would cause an access violation. Emitting the done signal in Client::emitDone is the last statement in that function. As per the documentation, "the signal may be deleted at any time, in particular also while it is being emitted". The signal is thus protected against self-deletion, and the Client object is no longer accessed once the signal is being emitted, making it safe for self-deletion, too.
Unfortunately, WT-11408 changed that by introducing a (perfectly fine and reasonable) lock_guard/mutex in the Client::emitDone function. The lock_guard is destroyed at the end of the function, which releases the mutex and thus has to access the client again, resulting in an access violation if the client has already deleted itself from a connected handler. The same would happen if the deletion occurs anywhere else before emitDone has completely finished.
The proposed solution/improvement for this issue is to release the lock right after it has done its job of protecting access to the impl_ pointer, i.e. right before emitting the done signal. The client is then no longer accessed once emission of the signal starts, restoring the old behavior. I will create a pull request containing the change shortly. I have also attached a small demo app that demonstrates the access violation with the current implementation.
I would be happy if you could merge that change into Wt. I am aware that self deleting objects may pose a risk if not handled with proper care, but it is also a powerful tool to handle the lifetime of asynchronous objects.
Best,
Steven
Files
Updated by Steven Köhler 3 months ago
I have created the corresponding pull request #230.
Updated by Matthias Van Ceulebroeck about 1 month ago · Edited
- Target version set to 4.12.1
Hello Steven,
At a glance, this looks fine. It maintains the intended behavior of the locks, and allows for some flexibility.
I have scheduled this for 4.12.1, as I will of course need to verify that the constraints of #11408 are still met.
Updated by Romain Mardulyn 13 days ago
- Status changed from New to InProgress
- Assignee set to Romain Mardulyn
Updated by Romain Mardulyn 9 days ago
- Status changed from InProgress to Review
- Assignee deleted (Romain Mardulyn)
Updated by Romain Mardulyn 9 days ago
- Status changed from Review to InProgress
- Assignee set to Romain Mardulyn
Updated by Romain Mardulyn 5 days ago
Hello Steven,
The problem with the fix you proposed is that it can cause a race condition. While a signal can be deleted while being emitted, this is only safe when done from within the same thread. The issue with Http::Client is that a different thread could delete the Client object while Client::emitDone is executing. This leads to the signal being emitted and deleted concurrently, which is not thread-safe.
We don't plan to make signals thread-safe due to the complexity that this would introduce. Therefore, the emission of that signal must be guarded by a lock, as it currently is.
I'm sorry this breaks your self-deleting Http::Client. I hope you'll be able to find an alternative solution to your issue.
Best,
Romain
Updated by Steven Köhler 4 days ago
Hello Romain,
thanks for your reply. That's not exactly what I had hoped for...
Your statement that Client::emitDone and deleting a Client might happen concurrently made me skeptical. So I did a quick check, and I'm afraid there is a much bigger problem at hand if those calls can happen concurrently. I'll try to elaborate. Assume we have two threads A and B, where A is calling emitDone and B is destroying the Client, and let us further assume that they run concurrently. Now there are two options:
- A grabs the implementationMutex_, B has to wait: In this case, emitting the done signal finishes before the mutex is unlocked and the destructor can grab it, which works fine - albeit by luck.
- B grabs the implementationMutex_, A has to wait: This case is highly problematic, since the mutex is only grabbed in the abort function, which is called first in the destructor. As soon as abort is done, the mutex is unlocked - before the rest of the destructor runs. At this point there is nothing to prevent A from grabbing the mutex. This induces several issues. The first one is the concurrent call to impl_.lock() in the destructor and the call to impl_.reset() in emitDone. If that works by chance, the destructor begins to destroy the Client member objects like the done_ signal, while emitDone will try to call it. Depending on what happens first or how the thread scheduler interleaves them, this can cause various ill behaviors. If by chance the destructor manages to finish before A continues, A will try to lock the (by now) deleted mutex, which is undefined behavior. If, by any chance, that works in our favor, accessing the impl_, redirectCount_ or done_ members will wreak havoc, since they have been destroyed at that point.
Summarizing: there already is a race condition in Client. Even if signals were thread-safe, that would provide no protection against it. Explicitly locking the mutex in the destructor will not prevent it either, since the mutex has to be unlocked at the end of the destructor body - before the members are destroyed. Not unlocking it would mean undefined behavior in the mutex destructor. My change merely allows the race condition to surface while emitting the done signal, since the mutex is unlocked earlier - but it does not cause the underlying problem.
Since there is no way for an object to check whether it has been deleted, the described use-after-free when emitDone executes after the destructor has finished cannot be solved inside the Client class; it needs proper external synchronisation.
Best,
Steven
PS: I added a conversation with ChatGPT's o4-mini-high model to double-check my suspicion. https://chatgpt.com/share/68494307-dc60-8005-bfbc-908c1fc413b8
Updated by Romain Mardulyn 3 days ago
Hello Steven,
Thanks for your detailed response. After a closer look, I agree with your finding: there was indeed a race condition because impl_.lock() wasn't guarded by implementationMutex_. Fortunately, simply protecting that specific line with implementationMutex_ is enough to fix that issue.
Regarding your broader concern about the Http::Client object's destruction while Client::emitDone() is running, there's a crucial distinction. Client::emitDone() is guarded by Impl::updateMutex_, which Impl::removeClient() (involved in client destruction) also acquires. This design effectively prevents the Http::Client instance from being destroyed while its impl_ object is actively calling one of its methods.
However, the problem with Client::emitDone() lies in its specific action: it resets the impl_ pointer. When this happens, the Impl::updateMutex_ is no longer protecting the Http::Client's lifetime from external destruction. This is precisely why implementationMutex_ must remain locked until the very end of Client::emitDone(). It effectively takes over the role of blocking the Http::Client's destruction once impl_ has been reset, ensuring the client object remains valid for the duration of the function's execution.
Best,
Romain
Updated by Steven Köhler 3 days ago
Hello Romain,
thanks again for your reply. I took another, more detailed look at the whole implementation. I found the clientMutex_ you mentioned, and it is indeed locked in Client::Impl::emitDone and Client::Impl::removeClient, thus preventing the Client destructor from continuing beyond the impl->removeClient(); line in Client::~Client() while Client::emitDone runs - assuming impl_ has not yet been reset, which indeed could be ensured by properly locking the mutex.
However, since Client::impl_ is a std::weak_ptr, it does not own the Client::Impl instance. Resetting it in Client::emitDone does not delete the Client::Impl instance, but only loses access to it. Thus, clientMutex_ is not affected by resetting the impl_ pointer, which means Client::emitDone is completely protected by clientMutex_ - especially since Client::Impl::emitDone properly locks the mutex and is the only place where Client::emitDone is called. Your statement that clientMutex_ cannot fully protect Client::emitDone therefore seems false.
Interestingly, the Client::Impl instance cannot be deleted from inside Client, since Client holds no shared_ptr to it, only a weak_ptr. There are, however, several strand_.wrap(std::bind(...)) calls in Client::Impl that pass a copy of the actual Client::Impl shared_ptr, so its lifetime extends until the last of those function objects is deleted after it has been executed. No lifetime issues there. But this also means that Client::Impl can potentially outlive Client. That causes the race issues, because the longer-living Client::Impl might try to access the shorter-living Client.
A rather straightforward solution would be to wrap all of Client's relevant data in a helper struct, let's say Client::Data, and have Client hold a shared_ptr to it, as well as Client::Impl. That way, the lifetime of all signals and other data would be guaranteed to match the Client::Impl lifetime, resolving all potential race conditions from prematurely deleting Client at once. There are still a few minor details to work out, like maybe adding helper functions to the Data struct that are called from Client as well as from Client::Impl to ensure they are still available after Client has been destructed, but I think you get the idea.
What do you think?
Best,
Steven