Improvements #14205

open

Allow all unicode to be parsed

Added by Matthias Van Ceulebroeck about 23 hours ago. Updated about 22 hours ago.

Status:
Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Start date:
12/12/2025
Due date:
% Done:

0%

Estimated time:

Description

Currently JWT uses NanoXML (n3). This is an old library that is no longer maintained.
We should switch to a more modern library that is still supported.

The check on Unicode characters itself is straightforward, though: StdXMLReader.read() checks each character against a permitted range.
This range excludes some non-characters, but it appears to exclude too much. An example is "🔥", which is read as the two integers 55357 and 56613 (hex 0xD83D and 0xDD25). Unicode assigns it the code point U+1F525, but it is being read as UTF-16, i.e. as a surrogate pair.
As a result, the two values fall into the surrogate ranges (high surrogates 0xD800-0xDBFF, low surrogates 0xDC00-0xDFFF). On their own these values are not characters, but as a pair they encode a valid code point.

They should be included (and allowed) when parsing.
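
A minimal sketch of what such a check could look like, assuming validation is done per code point rather than per UTF-16 char. The class XmlCharCheck and its methods are hypothetical and are not part of NanoXML or JWT; the ranges follow the Char production of XML 1.0. A well-formed surrogate pair is accepted, while a lone surrogate is still rejected.

```java
public final class XmlCharCheck {

    private XmlCharCheck() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /**
     * Walks the text by code point instead of by char, so a high/low
     * surrogate pair is combined before it is checked. An unpaired
     * surrogate is returned as-is by codePointAt and fails isXmlChar.
     */
    public static boolean isValidXmlText(String text) {
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (!isXmlChar(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        String fire = "\uD83D\uDD25"; // the two UTF-16 code units from the description
        System.out.println(fire.length());                            // 2
        System.out.println(Integer.toHexString(fire.codePointAt(0))); // 1f525
        System.out.println(isValidXmlText(fire));                     // true
        System.out.println(isValidXmlText("\uD83D"));                 // false
    }
}
```

Whether this fits into StdXMLReader.read() depends on how the reader buffers characters; if it only sees one char at a time, it would have to remember a pending high surrogate before deciding.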

Actions #1

Updated by Matthias Van Ceulebroeck about 22 hours ago

  • Status changed from InProgress to Review
  • Assignee deleted (Matthias Van Ceulebroeck)