Bug #11301
closedtest.http server race can result in use-after-free
100%
Description
There appears to be a race condition in the test.http Server that can cause a timing-dependent use-after-free. On my system with wt-4.9.1, valgrind-3.19.0, this can be observed with:
valgrind --fair-sched=yes --leak-check=full --num-callers=30 --log-file=valgrind_%p.log ./test.http -r detailed --run_test=http_client_server_test4
The issue does not appear with an ASan build. I think the difference is that valgrind greatly affects the timing which makes the race visible. As a result, it may be difficult to reproduce in different environments.
I've attached a patch with a test case (http_client_server_clean_shutdown) that attempts to more reliably drive one of the issues from this http_client_server_test4 test case. In my testing, both ASan and valgrind report the memory leak. It may be necessary to adjust some of the constants to get it to reproduce in another environment.
I believe the root issue is with the test Server framework which destructs the test resource before the base class WServer.
I've attached a patch that addresses this issue and should pass the new http_client_server_clean_shutdown test. However, there is a second issue in http/Server.C that will still result in a use-after-free on http_client_server_test4 under valgrind. That issue is not limited to the test framework, and I will open a separate issue report for it.
There are additional notes on the patch files. Testing was done with wt 4.9.1.
Possibly related to issue #10507.
Files
Updated by Roel Standaert over 1 year ago
I think we simply need to call stop()
in the destructor of Server
.
Updated by Bruce Toll over 1 year ago
Thanks for following-up. I didn't re-test, but I agree that calling stop() seems like a better alternative for test.http -- it is less convoluted and reflects the way that WServer is normally used.
There's a note in the documentation for WServer::~WServer() about the importance of calling stop() explicitly in order to catch server exceptions. Perhaps, a note could be added to the WServer documentation about the need to call stop() before destructing any static resources with entry points that have been added to the server?
As a possible enhancement, perhaps a vector of observing_ptrs could be used in the server to track static resources that have been added via addResource. Then, server stop() could throw an exception or log an error if any of the resources have already been deleted. This is just a rough idea. I'm not sure if it is actually practical or worth the effort.
Updated by Roel Standaert over 1 year ago
I suppose there would still be a dangling pointer:
stop()
is called- resource is deleted (entrypoint's resource pointer becomes dangling)
- server is deleted
Now, at the moment we're not actually dereferencing that pointer, but it would probably be best to either remove the entrypoint or let the resource outlive the server. So I think the destructor should be:
~Server() override
{
if (isRunning()) {
stop();
}
removeEntryPoint("/test");
}
I agree that the expectation of the resource outliving the server should be documented. It's probably best to do that in addResource
.
a vector of observing_ptrs
We'd have to be very careful with thread safety then, though. I've noticed that it's far too easy to accidentally copy an observing_ptr
in a non thread safe manner (see e.g. issue #7617).
Updated by Roel Standaert over 1 year ago
I think perhaps we should add an addResource
that takes a std::shared_ptr<WResource>
and deprecate the old one that takes a raw pointer.
Updated by Bruce Toll over 1 year ago
I really like your idea of providing an addResource that accepts a shared_ptr. It should be easy to update test.http to use the new addResource call, addressing this issue cleanly with no other changes. Outside of the test framework, I suspect that most existing code is not affected by the issue if it is structured along the lines of a Wt example or uses WRun. But, using a shared_ptr seems like a more elegant approach that precludes misuse.
Updated by Roel Standaert over 1 year ago
- Related to Bug #11408: Concurrency issues that thread sanitizer found added
Updated by Roel Standaert over 1 year ago
- Target version changed from 4.9.2 to 4.10.0
I'll be targeting Wt 4.10.0, and introducing the addResource
that takes a std::shared_ptr
and deprecates the old one.
Updated by Roel Standaert over 1 year ago
- Precedes Bug #11302: Race when closing HTTP listening sockets can result in use-after-free added
Updated by Roel Standaert over 1 year ago
- Status changed from New to InProgress
- Assignee set to Roel Standaert
Updated by Roel Standaert over 1 year ago
- Status changed from InProgress to Review
- Assignee deleted (
Roel Standaert)
Updated by Roel Standaert over 1 year ago
- File 0002-WT-11301-add-WServer-addResource-taking-a-shared_ptr.patch 0002-WT-11301-add-WServer-addResource-taking-a-shared_ptr.patch added
The fix is now in internal review. If you want to, you can also review the attached patch.
Updated by Bruce Toll over 1 year ago
Thanks for providing the patch! It looks good to me. It fixes the reported issue and I think the updated API is a great improvement. The documentation was clear and helpful.
One trivial comment: The existing note on removeEntryPoint could be updated to reflect the new API, e.g. "if the entry point was added through the deprecated addResource() as a raw pointer, it will not be deleted".
Updated by Roel Standaert over 1 year ago
This may stretch the lifetime a little too far, though. It's probably best to pass the listener to handleTcpAccept
or handleSslAccept
using a weak_ptr
.
Updated by Roel Standaert over 1 year ago
- File 0002-WT-11302-avoid-use-after-free-with-closed-listeners.patch 0002-WT-11302-avoid-use-after-free-with-closed-listeners.patch added
Attached is a solution that uses a weak_ptr
.
Updated by Bruce Toll over 1 year ago
NOTE: I think prior two comments may be intended for the closely related #11302.
Updated by Bruce Toll over 1 year ago
As a follow-up to #11301-15, I tested the weak_ptr patch 0002-WT-11302-avoid-use-after-free-with-closed-listeners.patch with valgrind. It seems that there is still a use-after-free, even though handleTcpAccept uses a weak_ptr to avoid accessing the passed TcpListener if it has already been deleted in handleStop.
I believe the problem is that asio, itself, accesses data through the TcpListener after handleStop does the socket close. This asio access can happen asynchronously after the TcpListener is deleted by the tcp_listeners_.clear() and before the subsequent call to handleTcpAccept.
One example from valgrind was an access 952 bytes into a 1024 byte block deleted from http::server::TcpConnection::~TcpConnection (TcpConnection.h:33) which I believe is an access to boost::asio::ip::tcp::socket socket_. The TcpConnection was deleted as part of the TcpListener from handleStop().
If you have trouble reproducing the issue, I can provide valgrind output, but its based on 4.9.1 with a set of patches including a modified version of my http_server_clean_close test.
Updated by Roel Standaert over 1 year ago
Updated by Roel Standaert over 1 year ago
- Status changed from Review to Implemented @Emweb
- Assignee set to Roel Standaert
- % Done changed from 0 to 100
Updated by Roel Standaert over 1 year ago
- Status changed from Implemented @Emweb to Resolved
Updated by Matthias Van Ceulebroeck over 1 year ago
- Status changed from Resolved to Closed