Race when closing HTTP listening sockets can result in use-after-free
There appears to be a race condition in http/Server.C that can cause a timing-dependent use-after-free.
I've attached a test case (http_server_clean_close) that attempts to drive the issue. In my testing, both ASan and valgrind reliably report the memory issue. The test may also crash with a memory access vioation when run normally. I believe the root issue is a race where Asio can attempt to access the listener after it has been deleted. The listeners should not be deleted until Asio confirms the listener socket close.
I've attached a patch that attempts to address this issue and should pass the new http_server_clean_close test, when applied in conjunction with the patches on issue #11301.
A second patch attempts to handle SSL listeners analogously. I don't have experience testing Wt with SSL, so this patch is completely untested. I also did not test multiple listeners.
It also seems that Server::resume does not use accept_strand_.wrap() and I wonder if it should? In any case, the resume functionality is another area that I did not test.
There are additional notes on the patch files. Testing was done with wt 4.9.1.
Updated by Bruce Toll 10 months ago
- File 0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch 0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch added
I made a mistake in the originally attached file "0003-Add-test-for-use-after-free-race-on-listener-close.patch". I've attached a revised version: "0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch". The Server::stop method was only used by the newly added http_server_clean_close test and I re-tested to confirm that it still fails prior to incorporating 0004-Track-closing-tcp-connections-to-avoid-race.patch and succeeds after. Sorry about that!
Prior to finding/fixing this mistake, I did some additional sanity checking of the patch in 0004-Track-closing-tcp-connections-to-avoid-race.patch with the rr debugger on a run of hello.wt with two listeners and it seemed to be working as expected -- although it doesn't exercise all of the code paths.
If I can provide any additional information, just let me know.
Updated by Roel Standaert 9 months ago
- File 0001-WT-11302-extend-lifetime-of-closed-listeners.patch 0001-WT-11302-extend-lifetime-of-closed-listeners.patch added
- Target version changed from 4.9.2 to 4.10.0
I think another solution for this would be to use a
shared_ptr to extend the lifetime (see attached patch).
Updated by Bruce Toll 9 months ago
Updated by Bruce Toll 9 months ago
As a follow-up, I added w20230310b_TEST_11302-avoid-use-after-free-with-closed-listeners.patch which has a series of 5 commits that can be applied to Wt 4.9.1 with git am. I annotated the commits with links to the Wt Redmine attachments, where applicable. Here is a summary of the commits:
- WT-11301: add WServer::addResource taking a shared_ptr
- WT-11302: avoid use-after-free with closed listeners (the weak_ptr version)
commits 3-5. add test code, including a new test.http case: http_server_clean_close
Using git am to apply the commits on top of wt 4.9.1, I then ran test.http under valgrind with the command line in: valgrind_command_20230310a.txt. The output of the command is also in that attachment.
The corresponding valgrind output is in: valgrind_80332.log
Here is a copy (from #11301-17) of what I think may be happening:
It seems that there is still a use-after-free, even though handleTcpAccept uses a weak_ptr to avoid accessing the passed TcpListener if it has already been deleted in handleStop.
I believe the problem is that asio, itself, accesses data through the TcpListener after handleStop does the socket close. This asio access can happen asynchronously after the TcpListener is deleted by the tcp_listeners_.clear() and before the subsequent call to handleTcpAccept.
One example from valgrind was an access 952 bytes into a 1024 byte block deleted from http::server::TcpConnection::~TcpConnection (TcpConnection.h:33) which I believe is an access to boost::asio::ip::tcp::socket socket_. The TcpConnection was deleted as part of the TcpListener from handleStop().
As a sanity check, I ran the same valgrind commmand on three other builds:
- Reverting the second commit in the series (WT-11302: avoid use-after-free with closed listeners -- the weak_ptr version). This logged many more valgrind errors and the test.http failed with a fatai error: signal: SIGABRT.
- Adding the earlier shared pointer patch from this issue 0001-WT-11302-extend-lifetime-of-closed-listeners.patch. This did not log any valgrind errors.
- Replacing the shared pointer patch with just the tcp connections patch 0004-Track-closing-tcp-connections-to-avoid-race.patch. This also did not log any valgrind errors.
I really appreciate the time and effort you've put into resolving this issue.