Project

General

Profile

Actions

Bug #11302

closed

Race when closing HTTP listening sockets can result in use-after-free

Added by Bruce Toll almost 2 years ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Target version:
Start date:
02/02/2023
Due date:
% Done:

100%

Estimated time:

Description

There appears to be a race condition in http/Server.C that can cause a timing-dependent use-after-free.

I've attached a test case (http_server_clean_close) that attempts to drive the issue. In my testing, both ASan and valgrind reliably report the memory issue. The test may also crash with a memory access vioation when run normally. I believe the root issue is a race where Asio can attempt to access the listener after it has been deleted. The listeners should not be deleted until Asio confirms the listener socket close.

I've attached a patch that attempts to address this issue and should pass the new http_server_clean_close test, when applied in conjunction with the patches on issue #11301.

A second patch attempts to handle SSL listeners analogously. I don't have experience testing Wt with SSL, so this patch is completely untested. I also did not test multiple listeners.
It also seems that Server::resume does not use accept_strand_.wrap() and I wonder if it should? In any case, the resume functionality is another area that I did not test.

There are additional notes on the patch files. Testing was done with wt 4.9.1.

See also #11301
Possibly related to issue #10507.


Files


Related issues 2 (0 open2 closed)

Related to Bug #11408: Concurrency issues that thread sanitizer foundClosedMatthias Van Ceulebroeck03/07/2023

Actions
Follows Bug #11301: test.http server race can result in use-after-freeClosedRoel Standaert02/01/2023

Actions
Actions #1

Updated by Bruce Toll almost 2 years ago

I made a mistake in the originally attached file "0003-Add-test-for-use-after-free-race-on-listener-close.patch". I've attached a revised version: "0003-Add-test-for-use-after-free-race-on-listener-close_REV01.patch". The Server::stop method was only used by the newly added http_server_clean_close test and I re-tested to confirm that it still fails prior to incorporating 0004-Track-closing-tcp-connections-to-avoid-race.patch and succeeds after. Sorry about that!

Prior to finding/fixing this mistake, I did some additional sanity checking of the patch in 0004-Track-closing-tcp-connections-to-avoid-race.patch with the rr debugger on a run of hello.wt with two listeners and it seemed to be working as expected -- although it doesn't exercise all of the code paths.

If I can provide any additional information, just let me know.

Actions #2

Updated by Roel Standaert almost 2 years ago

  • Target version set to 4.9.2
Actions #3

Updated by Roel Standaert over 1 year ago

  • Related to Bug #11408: Concurrency issues that thread sanitizer found added
Actions #4

Updated by Roel Standaert over 1 year ago

  • Due date set to 02/02/2023
  • Start date changed from 02/01/2023 to 02/02/2023
  • Follows Bug #11301: test.http server race can result in use-after-free added
Actions #5

Updated by Roel Standaert over 1 year ago

  • Due date deleted (02/02/2023)
Actions #6

Updated by Roel Standaert over 1 year ago

I think another solution for this would be to use a shared_ptr to extend the lifetime (see attached patch).

Actions #7

Updated by Bruce Toll over 1 year ago

That's a clever approach. It seems to work well.

Actions #8

Updated by Roel Standaert over 1 year ago

I'm going to try not to accidentally comment in the #11301 thread, so continuing from #11301-17: I haven't been able to reproduce that issue so far.

Actions #9

Updated by Roel Standaert over 1 year ago

  • Status changed from New to InProgress
  • Assignee set to Roel Standaert
Actions #10

Updated by Bruce Toll over 1 year ago

Also following-up on #11301-17, I'll clean-up/attach the set of patches I was using on 4.9.1, and repeat the test w/valgrind and provide output here. It's certainly possible I made a mistake. I may not get it done before the weekend, though.

Actions #12

Updated by Bruce Toll over 1 year ago

As a follow-up, I added w20230310b_TEST_11302-avoid-use-after-free-with-closed-listeners.patch which has a series of 5 commits that can be applied to Wt 4.9.1 with git am. I annotated the commits with links to the Wt Redmine attachments, where applicable. Here is a summary of the commits:

  1. WT-11301: add WServer::addResource taking a shared_ptr
  2. WT-11302: avoid use-after-free with closed listeners (the weak_ptr version)

commits 3-5. add test code, including a new test.http case: http_server_clean_close

Using git am to apply the commits on top of wt 4.9.1, I then ran test.http under valgrind with the command line in: valgrind_command_20230310a.txt. The output of the command is also in that attachment.
The corresponding valgrind output is in: valgrind_80332.log

Here is a copy (from #11301-17) of what I think may be happening:

It seems that there is still a use-after-free, even though handleTcpAccept uses a weak_ptr to avoid accessing the passed TcpListener if it has already been deleted in handleStop.

I believe the problem is that asio, itself, accesses data through the TcpListener after handleStop does the socket close. This asio access can happen asynchronously after the TcpListener is deleted by the tcp_listeners_.clear() and before the subsequent call to handleTcpAccept.

One example from valgrind was an access 952 bytes into a 1024 byte block deleted from http::server::TcpConnection::~TcpConnection (TcpConnection.h:33) which I believe is an access to boost::asio::ip::tcp::socket socket_. The TcpConnection was deleted as part of the TcpListener from handleStop().

As a sanity check, I ran the same valgrind commmand on three other builds:

  1. Reverting the second commit in the series (WT-11302: avoid use-after-free with closed listeners -- the weak_ptr version). This logged many more valgrind errors and the test.http failed with a fatai error: signal: SIGABRT.
  2. Adding the earlier shared pointer patch from this issue 0001-WT-11302-extend-lifetime-of-closed-listeners.patch. This did not log any valgrind errors.
  3. Replacing the shared pointer patch with just the tcp connections patch 0004-Track-closing-tcp-connections-to-avoid-race.patch. This also did not log any valgrind errors.

I really appreciate the time and effort you've put into resolving this issue.

Actions #13

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Target version changed from 4.10.0 to 4.10.1
Actions #14

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Assignee changed from Roel Standaert to Matthias Van Ceulebroeck
Actions #15

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Status changed from InProgress to Review
  • Assignee deleted (Matthias Van Ceulebroeck)
Actions #16

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Assignee set to Marnik Roosen
Actions #17

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Status changed from Review to Implemented @Emweb
  • Assignee changed from Marnik Roosen to Matthias Van Ceulebroeck
Actions #18

Updated by Matthias Van Ceulebroeck over 1 year ago

  • % Done changed from 0 to 100
Actions #19

Updated by Matthias Van Ceulebroeck about 1 year ago

  • Status changed from Implemented @Emweb to Closed
Actions

Also available in: Atom PDF