Improvements #14205

closed

Allow all Unicode to be parsed

Added by Matthias Van Ceulebroeck 3 months ago. Updated 18 days ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 12/12/2025
Due date:
% Done: 0%
Estimated time:

Description

Currently JWT uses NanoXML (n3). This is an old library that is no longer maintained.
We should switch to a more modern library that is still supported.

The check on Unicode input is very straightforward, though: StdXMLReader.read() tests whether each character falls inside an allowed range.
This range is meant to exclude non-characters, but it excludes too much. An example is "🔥". Unicode assigns it the code point U+1F525, but the reader sees it as two UTF-16 code units: 55357 and 56613 (hex 0xD83D and 0xDD25).
These two values fall into the surrogate ranges (0xD83D is a high surrogate, 0xDD25 a low surrogate). A surrogate is not a character on its own, but a correctly paired high and low surrogate is valid.
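
To make the numbers above concrete, here is a minimal sketch of how a code point above U+FFFF maps to its two UTF-16 code units. The class and method names (`SurrogateDemo`, `toSurrogates`) are made up for illustration and are not part of JWT or NanoXML; only `Character.toCodePoint` is a real JDK call.

```java
public class SurrogateDemo {
    // Hypothetical helper: splits a supplementary code point (above U+FFFF)
    // into its UTF-16 high and low surrogate.
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F525);
        // prints: high=0xD83D (55357), low=0xDD25 (56613)
        System.out.printf("high=0x%04X (%d), low=0x%04X (%d)%n",
                (int) pair[0], (int) pair[0], (int) pair[1], (int) pair[1]);
        // The JDK performs the reverse mapping; prints: U+1F525
        System.out.printf("U+%X%n", Character.toCodePoint(pair[0], pair[1]));
    }
}
```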

Surrogate pairs should be included (and allowed) when parsing.
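
One possible shape for such a check is sketched below. It is not the NanoXML code and not necessarily the fix that was implemented. It assumes the check can work on whole code points rather than on single `char` values, and it uses the `Char` production of the XML 1.0 specification as the allowed range. `XmlCharCheck`, `isXmlChar` and `isValid` are hypothetical names.

```java
public class XmlCharCheck {
    // XML 1.0 "Char" production, applied to a full code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns true if every code point in text is allowed.
    // A lone (unpaired) surrogate is rejected.
    static boolean isValid(String text) {
        int i = 0;
        while (i < text.length()) {
            // codePointAt merges a valid high+low pair into one code point;
            // an unpaired surrogate is returned unchanged and fails the check.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("fire: \uD83D\uDD25")); // true
        System.out.println(isValid("broken: \uD83D"));     // false
    }
}
```

Working on code points keeps the range test itself simple: the surrogate values never need to appear in the allowed range, because a valid pair has already been combined into a value between 0x10000 and 0x10FFFF.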

Actions #1

Updated by Matthias Van Ceulebroeck 3 months ago

  • Status changed from InProgress to Review
  • Assignee deleted (Matthias Van Ceulebroeck)
Actions #2

Updated by Romain Mardulyn about 1 month ago

  • Assignee set to Romain Mardulyn
  • Target version changed from 4.12.3 to 4.12.4
Actions #3

Updated by Romain Mardulyn 25 days ago

  • Status changed from Review to Implemented @Emweb
  • Assignee changed from Romain Mardulyn to Matthias Van Ceulebroeck
Actions #4

Updated by Romain Mardulyn 18 days ago

  • Status changed from Implemented @Emweb to Closed