Improvements #14205: Allow all unicode to be parsed - JWt - Redmine

Improvements #14205

open

Allow all unicode to be parsed

Added by Matthias Van Ceulebroeck about 2 months ago. Updated about 2 months ago.

Status:

Review

Priority:

Normal

Assignee:

Category:

Target version:

4.12.3

Start date:

12/12/2025

Due date:

% Done:

Estimated time:

Description

Currently JWt uses NanoXML (n3). It is an old library that is no longer maintained.
We should switch to a more modern library that is still supported.

The code that performs the Unicode check is straightforward, though: StdXMLReader.read() checks whether each character falls within an allowed range.
That range is meant to exclude non-character values, but it excludes too many. An example is "🔥". Unicode assigns it the code point U+1F525, but it is being read as UTF-16, i.e. as the two code units 55357 and 56613 (hex 0xD83D and 0xDD25).
These values fall into the surrogate ranges: 0xD83D is a high surrogate and 0xDD25 is a low surrogate. Neither is a character on its own, but together they form a valid UTF-16 surrogate pair that encodes U+1F525.
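The UTF-16 behaviour described above can be reproduced with a small standalone Java program. The class name SurrogateDemo is illustrative only; the program uses nothing beyond java.lang.String and java.lang.Character.

```java
// Illustrative only: shows how Java's UTF-16 strings expose U+1F525 (the fire emoji).
class SurrogateDemo {
    public static void main(String[] args) {
        String fire = "\uD83D\uDD25"; // the emoji, spelled as its UTF-16 surrogate pair

        char high = fire.charAt(0);
        char low = fire.charAt(1);

        System.out.println(fire.length());                   // 2 (two UTF-16 code units)
        System.out.println((int) high + " " + (int) low);    // 55357 56613
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLowSurrogate(low));   // true

        // Combining the pair recovers the actual code point.
        int codePoint = Character.toCodePoint(high, low);
        System.out.printf("U+%X%n", codePoint);              // U+1F525
    }
}
```

A reader that inspects one char at a time therefore never sees 0x1F525 directly, only the two surrogate code units.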

Such characters should be accepted when parsing.
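A minimal sketch of a range check that accepts surrogate pairs, assuming the allowed set should match the XML 1.0 Char production. The names XmlCharCheck, isXmlChar and firstInvalidIndex are hypothetical; this is not the actual StdXMLReader.read() code. For simplicity it scans a whole CharSequence; a reader that returns one char per read() call would instead need to remember a pending high surrogate until the next call.

```java
// Hypothetical sketch, not NanoXML's StdXMLReader implementation.
final class XmlCharCheck {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first invalid UTF-16 code unit, or -1 if the text is valid.
    // A high surrogate immediately followed by a low surrogate is decoded into one
    // supplementary code point; a lone surrogate is rejected.
    static int firstInvalidIndex(CharSequence text) {
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, text.charAt(i + 1));
                if (!isXmlChar(cp)) {
                    return i;
                }
                i += 2; // consumed the whole pair
            } else {
                // Lone surrogates lie in 0xD800-0xDFFF, which isXmlChar rejects.
                if (!isXmlChar(c)) {
                    return i;
                }
                i += 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("hot \uD83D\uDD25")); // -1: valid pair
        System.out.println(firstInvalidIndex("bad \uD83D"));       // 4: lone high surrogate
        System.out.println(firstInvalidIndex("bad \uDD25x"));      // 4: lone low surrogate
    }
}
```

The design choice here is to validate decoded code points rather than raw code units, so the existing exclusions (control characters, U+FFFE, U+FFFF, lone surrogates) stay in place while valid pairs such as 0xD83D 0xDD25 are let through.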

Updated by Matthias Van Ceulebroeck about 2 months ago

Status changed from InProgress to Review
Assignee deleted (Matthias Van Ceulebroeck)
