Improvements #13878
Improvements #13877 (closed): Be less permissive to bots
Don't allow session-related requests to come from bots
0%
Description
When a bot requests a page, it will (likely) be given an HTML-only page. Wt is quite event-driven, and will try to remain so.
For a normal session (even without JS), Wt will attach `wtd` and `signal` parameters to ensure that the server side can remain consistent with the client side. The session remains alive on the server, and by means of the `wtd` it can match incoming requests to sessions. `signal`s will be used to track certain other events (like navigation).
For sessions detected as bots this is all irrelevant. They have no persistent session, as their session gets killed immediately after the response is served to them.
This means that any such request that carries a `wtd` or `signal` comes from a page that was served to a bot, and which the bot is now trying to crawl. The session context is, however, useless.
We should ensure two things here (when handling requests):
- a `signal` should NOT be allowed through. A `signal` means that a client remained on the same page and interacted with it in some way. This is purely functionality that requires the session to be alive. A bot should never touch this. When such a request comes in, we can aggressively block it, and serve an error code in the 400 range.
- a `wtd` is LIKELY a navigation request (unless it's a resource request). This can be served, but the `wtd` parameter can be omitted (and no matching on the session should be attempted).
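The two rules above could be sketched roughly as follows. This is only an illustration in plain C++, not Wt's actual request handling: `BotRequestAction`, `BotRequestDecision`, and `classifyBotRequest` are hypothetical names, the request is assumed to have already been detected as coming from a bot, and 403 is just one possible choice for "an error code in the 400 range". `request=resource` is deliberately not treated here, since it is handled in a different ticket.

```cpp
#include <map>
#include <string>

// Query parameters of an incoming request (name -> value).
using Parameters = std::map<std::string, std::string>;

// Hypothetical outcomes for a request that was detected as coming from a bot.
enum class BotRequestAction {
  Reject,           // block the request with a 4xx status
  ServeWithoutWtd,  // serve the page, but ignore the wtd parameter
  Serve             // nothing session-related present: serve as usual
};

// Hypothetical result type: what to do, which status to answer with,
// and which parameters are still passed on to page rendering.
struct BotRequestDecision {
  BotRequestAction action;
  int status;
  Parameters parameters;
};

// Assumes the caller has already established that the request is from a bot.
BotRequestDecision classifyBotRequest(const Parameters& params)
{
  // A signal implies interaction with a live session, which a bot never has.
  // 403 is an assumption; the ticket only asks for a code in the 400 range.
  if (params.count("signal") != 0) {
    return {BotRequestAction::Reject, 403, {}};
  }

  // A wtd is likely a navigation request: serve it, but drop the wtd so
  // that no session matching is attempted.
  if (params.count("wtd") != 0) {
    Parameters stripped = params;
    stripped.erase("wtd");
    return {BotRequestAction::ServeWithoutWtd, 200, stripped};
  }

  return {BotRequestAction::Serve, 200, params};
}
```

Checking `signal` before `wtd` matters in this sketch: a request that carries both is rejected rather than served with the `wtd` stripped.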
Note: `request=resource` will be handled in a different ticket.
Note: the output leading to the above behavior will also be handled in a different ticket.
Updated by Romain Mardulyn 4 days ago
- Status changed from New to InProgress
- Assignee set to Romain Mardulyn
Updated by Matthias Van Ceulebroeck 4 days ago
Like #13879, this has already been resolved by improving the widgetgallery bot detection.
However, this does NOT correctly render out `href` for `WMenu` items.
Updated by Matthias Van Ceulebroeck 4 days ago
- Status changed from InProgress to Rejected
- Assignee deleted (Romain Mardulyn)
- Target version deleted (4.12.1)
> However, this does NOT correctly render out `href` for `WMenu` items.

This was a wrong assumption: internal path handling was NOT enabled for the menu, which therefore functions as it ought to.