Project

General

Profile

Actions

Bug #11301

closed

test.http server race can result in use-after-free

Added by Bruce Toll almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Roel Standaert
Target version:
Start date:
02/01/2023
Due date:
% Done:

100%

Estimated time:

Description

There appears to be a race condition in the test.http Server that can cause a timing-dependent use-after-free. On my system with wt-4.9.1, valgrind-3.19.0, this can be observed with:

valgrind --fair-sched=yes --leak-check=full --num-callers=30 --log-file=valgrind_%p.log ./test.http -r detailed --run_test=http_client_server_test4

The issue does not appear with an ASan build. I think the difference is that valgrind greatly affects the timing which makes the race visible. As a result, it may be difficult to reproduce in different environments.

I've attached a patch with a test case (http_client_server_clean_shutdown) that attempts to more reliably drive one of the issues from this http_client_server_test4 test case. In my testing, both ASan and valgrind report the memory leak. It may be necessary to adjust some of the constants to get it to reproduce in another environment.

I believe the root issue is with the test Server framework which destructs the test resource before the base class WServer.

I've attached a patch that addresses this issue and should pass the new http_client_server_clean_shutdown test. However, there is a second issue in http/Server.C that will still result in a use-after-free on http_client_server_test4 under valgrind. That issue is not limited to the test framework, and I will open a separate issue report for it.

There are additional notes on the patch files. Testing was done with wt 4.9.1.

Possibly related to issue #10507.


Files


Related issues 2 (0 open2 closed)

Related to Bug #11408: Concurrency issues that thread sanitizer foundClosedMatthias Van Ceulebroeck03/07/2023

Actions
Precedes Bug #11302: Race when closing HTTP listening sockets can result in use-after-freeClosedMatthias Van Ceulebroeck02/02/2023

Actions
Actions #1

Updated by Roel Standaert almost 2 years ago

  • Target version set to 4.9.2
Actions #2

Updated by Roel Standaert over 1 year ago

I think we simply need to call stop() in the destructor of Server.

Actions #3

Updated by Bruce Toll over 1 year ago

Thanks for following-up. I didn't re-test, but I agree that calling stop() seems like a better alternative for test.http -- it is less convoluted and reflects the way that WServer is normally used.

There's a note in the documentation for WServer::~WServer() about the importance of calling stop() explicitly in order to catch server exceptions. Perhaps, a note could be added to the WServer documentation about the need to call stop() before destructing any static resources with entry points that have been added to the server?

As a possible enhancement, perhaps a vector of observing_ptrs could be used in the server to track static resources that have been added via addResource. Then, server stop() could throw an exception or log an error if any of the resources have already been deleted. This is just a rough idea. I'm not sure if it is actually practical or worth the effort.

Actions #4

Updated by Roel Standaert over 1 year ago

I suppose there would still be a dangling pointer:

  1. stop() is called
  2. resource is deleted (entrypoint's resource pointer becomes dangling)
  3. server is deleted

Now, at the moment we're not actually dereferencing that pointer, but it would probably be best to either remove the entrypoint or let the resource outlive the server. So I think the destructor should be:

~Server() override
{
  if (isRunning()) {
    stop();
  }
  removeEntryPoint("/test");
}

I agree that the expectation of the resource outliving the server should be documented. It's probably best to do that in addResource.

a vector of observing_ptrs

We'd have to be very careful with thread safety then, though. I've noticed that it's far too easy to accidentally copy an observing_ptr in a non thread safe manner (see e.g. issue #7617).

Actions #5

Updated by Roel Standaert over 1 year ago

I think perhaps we should add an addResource that takes a std::shared_ptr<WResource> and deprecate the old one that takes a raw pointer.

Actions #6

Updated by Bruce Toll over 1 year ago

I really like your idea of providing an addResource that accepts a shared_ptr. It should be easy to update test.http to use the new addResource call, addressing this issue cleanly with no other changes. Outside of the test framework, I suspect that most existing code is not affected by the issue if it is structured along the lines of a Wt example or uses WRun. But, using a shared_ptr seems like a more elegant approach that precludes misuse.

Actions #7

Updated by Roel Standaert over 1 year ago

  • Related to Bug #11408: Concurrency issues that thread sanitizer found added
Actions #8

Updated by Roel Standaert over 1 year ago

  • Target version changed from 4.9.2 to 4.10.0

I'll be targeting Wt 4.10.0, and introducing the addResource that takes a std::shared_ptr and deprecates the old one.

Actions #9

Updated by Roel Standaert over 1 year ago

  • Precedes Bug #11302: Race when closing HTTP listening sockets can result in use-after-free added
Actions #10

Updated by Roel Standaert over 1 year ago

  • Status changed from New to InProgress
  • Assignee set to Roel Standaert
Actions #11

Updated by Roel Standaert over 1 year ago

  • Status changed from InProgress to Review
  • Assignee deleted (Roel Standaert)
Actions #12

Updated by Roel Standaert over 1 year ago

The fix is now in internal review. If you want to, you can also review the attached patch.

Actions #13

Updated by Bruce Toll over 1 year ago

Thanks for providing the patch! It looks good to me. It fixes the reported issue and I think the updated API is a great improvement. The documentation was clear and helpful.
One trivial comment: The existing note on removeEntryPoint could be updated to reflect the new API, e.g. "if the entry point was added through the deprecated addResource() as a raw pointer, it will not be deleted".

Actions #14

Updated by Roel Standaert over 1 year ago

This may stretch the lifetime a little too far, though. It's probably best to pass the listener to handleTcpAccept or handleSslAccept using a weak_ptr.

Actions #15

Updated by Roel Standaert over 1 year ago

Attached is a solution that uses a weak_ptr.

Actions #16

Updated by Bruce Toll over 1 year ago

NOTE: I think prior two comments may be intended for the closely related #11302.

Actions #17

Updated by Bruce Toll over 1 year ago

As a follow-up to #11301-15, I tested the weak_ptr patch 0002-WT-11302-avoid-use-after-free-with-closed-listeners.patch with valgrind. It seems that there is still a use-after-free, even though handleTcpAccept uses a weak_ptr to avoid accessing the passed TcpListener if it has already been deleted in handleStop.

I believe the problem is that asio, itself, accesses data through the TcpListener after handleStop does the socket close. This asio access can happen asynchronously after the TcpListener is deleted by the tcp_listeners_.clear() and before the subsequent call to handleTcpAccept.

One example from valgrind was an access 952 bytes into a 1024 byte block deleted from http::server::TcpConnection::~TcpConnection (TcpConnection.h:33) which I believe is an access to boost::asio::ip::tcp::socket socket_. The TcpConnection was deleted as part of the TcpListener from handleStop().

If you have trouble reproducing the issue, I can provide valgrind output, but its based on 4.9.1 with a set of patches including a modified version of my http_server_clean_close test.

Actions #18

Updated by Roel Standaert over 1 year ago

Bruce Toll wrote in #note-16:

NOTE: I think prior two comments may be intended for the closely related #11302.

Whoops, you're right.

Actions #19

Updated by Roel Standaert over 1 year ago

  • Status changed from Review to Implemented @Emweb
  • Assignee set to Roel Standaert
  • % Done changed from 0 to 100
Actions #20

Updated by Roel Standaert over 1 year ago

  • Status changed from Implemented @Emweb to Resolved
Actions #21

Updated by Matthias Van Ceulebroeck over 1 year ago

  • Status changed from Resolved to Closed
Actions

Also available in: Atom PDF