Improvements #13878

closed

Improvements #13877: Be less permissive to bots

Don't allow session-related requests to come from bots

Added by Matthias Van Ceulebroeck 5 days ago. Updated 4 days ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
07/29/2025
Due date:
% Done:

0%

Estimated time:

Description

When a bot requests a page, it will (likely) be given an HTML-only page. Wt is quite event-driven, and will try to remain so.
For a normal session (even without JS), Wt attaches the wtd and signal parameters to ensure that the server side can remain consistent with the client side. The session remains alive on the server, and the wtd lets it match incoming requests to sessions. The signal parameter is used to track certain other events (like navigation).

For sessions detected as bots, this is all irrelevant. They have no persistent session, as their session gets killed immediately after the response is served to them.
This means that, for a bot, any request that carries a wtd or signal comes from a page that was previously served to a bot and that the bot is now trying to crawl. The session context is, however, useless.

We should ensure two things here (when handling requests):

  • A signal should NOT be allowed through. A signal means that a client remained on the same page and interacted with it in some way. This functionality requires the session to be alive, so a bot should never touch it. When such a request comes in, we can aggressively block it and serve an error code in the 400 range.
  • A wtd is LIKELY a navigation request (unless it's a resource request). This can be served, but the wtd parameter can be omitted (and no matching on the session should be attempted).
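
The two rules above can be sketched as a small classification step. This is only an illustration, not Wt's actual request handling: `BotRequestAction`, `RequestParams`, and `classifyBotRequest` are hypothetical names, and the order of the checks (signal first, then request=resource, then wtd) is an assumption. The concrete status code in the 400 range is deliberately left open, as in the description.

```cpp
#include <string_view>

// Hypothetical sketch; none of these names exist in Wt.
enum class BotRequestAction {
  Reject,               // signal present: block, answer with a 4xx status
  DeferResource,        // request=resource: out of scope (separate ticket)
  ServeWithoutSession,  // wtd present: serve, but ignore the wtd entirely
  Serve                 // plain request: serve the HTML-only page
};

// The parts of an incoming request that matter for a bot session.
struct RequestParams {
  bool hasSignal;                // a "signal" parameter was sent
  bool hasWtd;                   // a "wtd" parameter was sent
  std::string_view requestType;  // value of "request", empty if absent
};

// Decide what to do with a request that was detected as coming from a bot.
constexpr BotRequestAction classifyBotRequest(const RequestParams& p) {
  if (p.hasSignal) {
    // A signal implies interaction within a live session, which a bot never has.
    return BotRequestAction::Reject;
  }
  if (p.requestType == "resource") {
    // Resource requests are handled in a different ticket.
    return BotRequestAction::DeferResource;
  }
  if (p.hasWtd) {
    // Likely a navigation request: serve it without session matching.
    return BotRequestAction::ServeWithoutSession;
  }
  return BotRequestAction::Serve;
}

// Compile-time usage examples.
static_assert(classifyBotRequest({true, true, ""}) == BotRequestAction::Reject);
static_assert(classifyBotRequest({false, true, ""}) ==
              BotRequestAction::ServeWithoutSession);
```

Keeping the decision in a pure function, separate from the code that writes the response, would make the policy easy to test without a running server.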

Note: request=resource will be handled in a different ticket.
Note: the output leading to the above behavior will also be handled in a different ticket.

Actions #1

Updated by Romain Mardulyn 4 days ago

  • Status changed from New to InProgress
  • Assignee set to Romain Mardulyn
Actions #2

Updated by Matthias Van Ceulebroeck 4 days ago

Like #13879, this has already been resolved by improving the widgetgallery bot detection.
However, this does NOT correctly render out href for WMenu items.

Actions #3

Updated by Matthias Van Ceulebroeck 4 days ago

  • Status changed from InProgress to Rejected
  • Assignee deleted (Romain Mardulyn)
  • Target version deleted (4.12.1)

However, this does NOT correctly render out href for WMenu items.

This was a wrong assumption: internal path handling was NOT enabled for the menu, so the menu functions as it ought to.
