Bug #11302
closedRace when closing HTTP listening sockets can result in use-after-free
100%
Description
There appears to be a race condition in http/Server.C that can cause a timing-dependent use-after-free.
I've attached a test case (http_server_clean_close) that attempts to drive the issue. In my testing, both ASan and valgrind reliably report the memory issue. The test may also crash with a memory access vioation when run normally. I believe the root issue is a race where Asio can attempt to access the listener after it has been deleted. The listeners should not be deleted until Asio confirms the listener socket close.
I've attached a patch that attempts to address this issue and should pass the new http_server_clean_close test, when applied in conjunction with the patches on issue #11301.
A second patch attempts to handle SSL listeners analogously. I don't have experience testing Wt with SSL, so this patch is completely untested. I also did not test multiple listeners.
It also seems that Server::resume does not use accept_strand_.wrap() and I wonder if it should? In any case, the resume functionality is another area that I did not test.
There are additional notes on the patch files. Testing was done with wt 4.9.1.
Files
Updated by Bruce Toll almost 2 years ago
- File 0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch 0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch added
I made a mistake in the originally attached file "0003-Add-test-for-use-after-free-race-on-listener-close.patch". I've attached a revised version: "0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch". The Server::stop method was only used by the newly added http_server_clean_close test and I re-tested to confirm that it still fails prior to incorporating 0004-Track-closing-tcp-connections-to-avoid-race.patch and succeeds after. Sorry about that!
Prior to finding/fixing this mistake, I did some additional sanity checking of the patch in 0004-Track-closing-tcp-connections-to-avoid-race.patch with the rr debugger on a run of hello.wt with two listeners and it seemed to be working as expected -- although it doesn't exercise all of the code paths.
If I can provide any additional information, just let me know.
Updated by Roel Standaert over 1 year ago
- Related to Bug #11408: Concurrency issues that thread sanitizer found added
Updated by Roel Standaert over 1 year ago
- Due date set to 02/02/2023
- Start date changed from 02/01/2023 to 02/02/2023
- Follows Bug #11301: test.http server race can result in use-after-free added
Updated by Roel Standaert over 1 year ago
- File 0001-WT-11302-extend-lifetime-of-closed-listeners.patch 0001-WT-11302-extend-lifetime-of-closed-listeners.patch added
- Target version changed from 4.9.2 to 4.10.0
I think another solution for this would be to use a shared_ptr
to extend the lifetime (see attached patch).
Updated by Bruce Toll over 1 year ago
That's a clever approach. It seems to work well.
Updated by Roel Standaert over 1 year ago
Updated by Roel Standaert over 1 year ago
- Status changed from New to InProgress
- Assignee set to Roel Standaert
Updated by Bruce Toll over 1 year ago
Also following-up on #11301-17, I'll clean-up/attach the set of patches I was using on 4.9.1, and repeat the test w/valgrind and provide output here. It's certainly possible I made a mistake. I may not get it done before the weekend, though.
Updated by Bruce Toll over 1 year ago
Updated by Bruce Toll over 1 year ago
As a follow-up, I added w20230310b_TEST_11302-avoid-use-after-free-with-closed-listeners.patch which has a series of 5 commits that can be applied to Wt 4.9.1 with git am. I annotated the commits with links to the Wt Redmine attachments, where applicable. Here is a summary of the commits:
- WT-11301: add WServer::addResource taking a shared_ptr
- WT-11302: avoid use-after-free with closed listeners (the weak_ptr version)
commits 3-5. add test code, including a new test.http case: http_server_clean_close
Using git am to apply the commits on top of wt 4.9.1, I then ran test.http under valgrind with the command line in: valgrind_command_20230310a.txt. The output of the command is also in that attachment.
The corresponding valgrind output is in: valgrind_80332.log
Here is a copy (from #11301-17) of what I think may be happening:
It seems that there is still a use-after-free, even though handleTcpAccept uses a weak_ptr to avoid accessing the passed TcpListener if it has already been deleted in handleStop.
I believe the problem is that asio, itself, accesses data through the TcpListener after handleStop does the socket close. This asio access can happen asynchronously after the TcpListener is deleted by the tcp_listeners_.clear() and before the subsequent call to handleTcpAccept.
One example from valgrind was an access 952 bytes into a 1024 byte block deleted from http::server::TcpConnection::~TcpConnection (TcpConnection.h:33) which I believe is an access to boost::asio::ip::tcp::socket socket_. The TcpConnection was deleted as part of the TcpListener from handleStop().
As a sanity check, I ran the same valgrind commmand on three other builds:
- Reverting the second commit in the series (WT-11302: avoid use-after-free with closed listeners -- the weak_ptr version). This logged many more valgrind errors and the test.http failed with a fatai error: signal: SIGABRT.
- Adding the earlier shared pointer patch from this issue 0001-WT-11302-extend-lifetime-of-closed-listeners.patch. This did not log any valgrind errors.
- Replacing the shared pointer patch with just the tcp connections patch 0004-Track-closing-tcp-connections-to-avoid-race.patch. This also did not log any valgrind errors.
I really appreciate the time and effort you've put into resolving this issue.
Updated by Matthias Van Ceulebroeck over 1 year ago
- Target version changed from 4.10.0 to 4.10.1
Updated by Matthias Van Ceulebroeck over 1 year ago
- Assignee changed from Roel Standaert to Matthias Van Ceulebroeck
Updated by Matthias Van Ceulebroeck over 1 year ago
- Status changed from InProgress to Review
- Assignee deleted (
Matthias Van Ceulebroeck)
Updated by Matthias Van Ceulebroeck over 1 year ago
- Assignee set to Marnik Roosen
Updated by Matthias Van Ceulebroeck over 1 year ago
- Status changed from Review to Implemented @Emweb
- Assignee changed from Marnik Roosen to Matthias Van Ceulebroeck
Updated by Matthias Van Ceulebroeck over 1 year ago
- % Done changed from 0 to 100
Updated by Matthias Van Ceulebroeck about 1 year ago
- Status changed from Implemented @Emweb to Closed