Bug #13362

open

WRegExpValidator causes massive delays when used with std::locale::global in UTF-8

Added by Stefan Bn 4 months ago. Updated 28 days ago.

Status: Feedback
Priority: Normal
Assignee:
Target version: -
Start date: 12/29/2024
Due date:
% Done: 0%
Estimated time:
Description

If the global C++ locale is set to a UTF-8 mode

std::locale::global(std::locale("en_EN.UTF-8"));

then WRegExpValidator causes massive delays when processing special characters (e.g. German Umlauts).

See the small example main.cpp attached that demonstrates the issue and logs the timing behavior. There is a 9.5 sec delay until the dialog is shown:

[2024-Dec-29 10:32:18.933] 10928 [/ B2KDnbdy2mMcIthk] [Debug] Time-In
[2024-Dec-29 10:32:25.252] 10928 [/ B2KDnbdy2mMcIthk] [Debug] Time-Out

[2024-Dec-29 10:32:28.393] 10928 - [info] "WebRequest: took 9461.01 ms"

If the global locale is not set, then everything works smoothly.

This refers to current Wt Version 4.11.1 using Windows 10/64 bit and shows the same behavior in different web browsers.


Files

main.cpp (1.74 KB) Stefan Bn, 12/29/2024 10:36 AM
main2.cpp (2.4 KB) Stefan Bn, 01/31/2025 02:06 PM
debug.log (4.05 KB) Stefan Bn, 01/31/2025 02:21 PM
clipboard-202504071609-xhgxd.png (118 KB) Stefan Bn, 04/07/2025 04:09 PM

Actions #1

Updated by Matthias Van Ceulebroeck 4 months ago

  • Status changed from New to Feedback
  • Assignee set to Stefan Bn

Hey Stefan,

thank you for the report, and the nice test case as well! I wish you a very happy new year as well (which is why I'm only replying now!)

Maybe this is because the locale doesn't exist? Although that should then quickly revert to the default locale on your environment. On Linux this will result in a fatal error, because the locale doesn't exist, and result in the session terminating early.
This will likely be platform dependent, and Windows may handle this differently, reverting to your locale. I will test this out tomorrow on Windows.

Can you tell me what your default locale is, and how Windows responds to the above call, std::locale::global(std::locale("en_EN.UTF-8"));? i.e. what the current value of std::locale().name() is?
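
A minimal sketch of how these values could be collected, assuming a plain console program rather than a Wt application. `try_set_global_locale` is a hypothetical helper name, not part of Wt or the standard library. The `std::locale` constructor throws `std::runtime_error` when the platform does not know the name, so the sketch catches that instead of letting the process terminate.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: tries to install `name` as the global C++ locale.
// Returns true on success; on failure the global locale is left untouched.
bool try_set_global_locale(const std::string& name) {
    try {
        std::locale::global(std::locale(name));
        return true;
    } catch (const std::runtime_error& e) {
        // Thrown by the std::locale constructor for unknown locale names.
        std::cerr << "locale \"" << name << "\" not available: " << e.what() << '\n';
        return false;
    }
}

// Prints the name of the current global locale with a short label.
void print_global_locale(const std::string& label) {
    std::cout << label << ": " << std::locale().name() << '\n';
}
```

Calling `print_global_locale("before")`, then `try_set_global_locale("en_EN.UTF-8")`, then `print_global_locale("after")` should show whether the platform accepted the name or fell back to the previous locale.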

Thank you!
Matthias

Actions #2

Updated by Stefan Bn 4 months ago

Hi Matthias,

thanks for looking into this and happy new year to you too!

Before setting the locale the value of std::locale().name() is "C".

After setting the locale using the call above, the value of std::locale().name() is "en_EN.UTF-8". (There is a typo in my code sample, the correct name should be ' en_us.utf8 ').

On Windows this is an acceptable value as stated in the documentation here:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support

Also I can see the std-libraries are working as expected, e.g. std::ofstream is writing UTF-8 compatible filenames (which is the main reason I need this setting).

Today I found another big issue in Wt coming from this UTF-8 locale setting. A call of WPopupMenu::addItem with UTF-8 characters in it leads to an immediate hard crash of the application at a code position like this:

WPopupMenu::addItem("Item with Chinese letters 中文 (Zhōngwén); 汉语, 漢語")

If I don't use the UTF-8 locale (or don't use international special characters), the popup menu is shown as expected.

Could you please verify this as well?

Best,
Stefan

Actions #3

Updated by Stefan Bn 3 months ago

This is a follow-up. This issue remains with current version Wt 4.11.2 using Windows.

Also the WPopupMenu::addItem crash with UTF-8 international characters is a real error that can be reproduced.

I've modified the code sample that now shows both issues: Long delays with WRegExpValidator and the crash in WPopupMenu::addItem when the UTF-8 locale is set.

For the crash in WPopupMenu I've attached my post-mortem backtrace debug log. The error is here:

#8 0x00007fff138531bd in std::__1::__tree<boost::iterator_range<std::__1::__wrap_iter<char const*> >, std::__1::less<boost::iterator_range<std::__1::__wrap_iter<char const*> > >, std::__1::allocator<boost::iterator_range<std::__1::__wrap_iter<char const*> > > >::~__tree (this=0x5de15fcb88) at D:/Dev/Tools/msys64/clang64/include/c++/v1/__tree:1532
#9 std::__1::set<boost::iterator_range<std::__1::__wrap_iter<char const*> >, std::__1::less<boost::iterator_range<std::__1::__wrap_iter<char const*> > >, std::__1::allocator<boost::iterator_range<std::__1::__wrap_iter<char const*> > > >::~set[abi:ne190107]() (this=0x5de15fcb88) at D:/Dev/Tools/msys64/clang64/include/c++/v1/set:699
#10 Wt::WWebWidget::hasStyleClass (this=<optimized out>, styleClass=...) at D:/Dev/BuildLibs/wt/wt-4.11.2/src/Wt/WWebWidget.C:832
#11 0x00007fff13852bed in Wt::WWebWidget::removeStyleClass (this=0x1f047e30d20, styleClass=..., force=true) at D:/Dev/BuildLibs/wt/wt-4.11.2/src/Wt/WWebWidget.C:789

Actions #4

Updated by Stefan Bn 3 months ago

Update: These errors exist also in current version Wt 4.11.3.

I did a little more research: Besides Wt, the base implementation of std::regex seems to have poor performance in general when it comes to processing UTF-8 characters.
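
One way to check this outside of Wt could be a small timing helper like the sketch below. `timed_match` is a hypothetical name, and the pattern and input are up to the caller. It uses only standard calls: `std::basic_regex::imbue` sets the locale the regex traits will use, and it has to be called before `assign` because `imbue` resets the expression. Whether a classic-locale regex actually avoids the delay seen here is an assumption that would need measuring.

```cpp
#include <chrono>
#include <locale>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: compiles `pattern` with `loc` imbued, matches it
// against `input`, and returns the match result plus the elapsed time.
std::pair<bool, std::chrono::microseconds>
timed_match(const std::locale& loc, const std::string& pattern, const std::string& input) {
    const auto start = std::chrono::steady_clock::now();

    std::regex re;
    re.imbue(loc);       // must come before assign(); imbue() resets the regex
    re.assign(pattern);  // compile the pattern using the imbued locale
    const bool matched = std::regex_match(input, re);

    const auto stop = std::chrono::steady_clock::now();
    return {matched, std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}
```

Calling it once with `std::locale::classic()` and once with `std::locale()` after the global locale has been set to the UTF-8 locale gives two timings to compare for the same pattern and input.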

Actions #5

Updated by Stefan Bn about 1 month ago

These issues remain in version Wt 4.11.4.

The hard crash when using UTF-8 (as shown in the main2.cpp attachment) is very problematic in my view.
May I kindly ask, whether these issues can be confirmed?

Actions #6

Updated by Matthias Van Ceulebroeck about 1 month ago

Hello Stefan,

I apologize for leaving this issue for a while. On Linux this does not result in any performance degradation, nor in a crash for WPopupMenu::addItem(). The time difference is roughly a couple of milliseconds.
I am currently building on a Windows machine to investigate both issues. I suspect this may have to do with STL regex on MSVC, and specifically with a missing UTF-8 conversion. Both remain to be verified.

the base implementation of std::regex seems to have poor performance in general when it comes to processing UTF-8 characters.

Yes, it's a fairly annoying, known fact about std::regex. The committee essentially built std::regex from scratch instead of relying on a lot of pre-existing material. And now they don't want to break ABI, and cannot improve it much without doing so.


Regardless, I have a Windows 11 build running successfully. I do not see a delay, nor a crash.
I did test on Windows 11, not 10. And I am using the 64 bit MSVC 14.3 version.

Can I ask which version you are using, and what SDK/Build tools versions you are using to run it?

Actions #7

Updated by Stefan Bn about 1 month ago

Thank you very much, Matthias, for taking the time to look into this! Very interesting that you see neither of these two issues.

I can reproduce it on Windows 10 as well as on Windows 11 and use the following tools to compile Wt and my projects:

CMake > Ninja > Clang 20.1

The build tool binaries, C/C++ libraries, Boost etc. are coming from the MSYS2 environment:
https://www.msys2.org/

I will investigate and debug a bit deeper, to see if I can find out more specific details.

Actions #8

Updated by Stefan Bn about 1 month ago

Matthias: Could you please remove the "File clipboard" attachment from my previous post? It doesn't belong there.
Unfortunately the platform here doesn't allow editing previous posts :-(

Actions #9

Updated by Stefan Bn 28 days ago

I did a lot of testing, debugging and getting help from AI assistance ;-)
There seem to be problems in the way UTF-8 characters are handled and I narrowed it down to problems or life-time issues with:

std::set<boost::iterator_range<char const*>>
boost::split(..., styleClass.toUTF8(), boost::is_any_of(" "))

that is used in

WWebWidget.C:832
Wt::WWebWidget::removeStyleClass
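
To illustrate the kind of lifetime issue suspected here, the sketch below splits a space-separated class list into non-owning views. It is not Wt's code: `split_classes` and `has_style_class` are hypothetical names, and `std::string_view` stands in for `boost::iterator_range<char const*>` to keep the example dependency-free. The point it shows is that such views only stay valid while the string they point into is alive, so splitting a temporary (for example, the direct result of a call that returns a string by value) would leave the set holding dangling ranges.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <string_view>

// Splits `text` on single spaces and skips empty tokens. The returned views
// point INTO `text`, so the underlying buffer must outlive the set.
std::set<std::string_view> split_classes(std::string_view text) {
    std::set<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos <= text.size()) {
        const std::size_t end = text.find(' ', pos);
        const std::size_t stop = (end == std::string_view::npos) ? text.size() : end;
        if (stop > pos) tokens.insert(text.substr(pos, stop - pos));
        if (end == std::string_view::npos) break;
        pos = end + 1;
    }
    return tokens;
}

// Safe usage: `all_classes` is a named object owned by the caller, so every
// view in `tokens` stays valid for the duration of this function.
bool has_style_class(const std::string& all_classes, const std::string& wanted) {
    const std::set<std::string_view> tokens = split_classes(all_classes);
    return tokens.count(wanted) > 0;
}
```

The comparison works on raw bytes, so multi-byte UTF-8 class names are handled the same way as ASCII ones and no locale is involved.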

Since there are also the massive std::regex delays, I will no longer use std::locale::global in my code. Without this all the problems go away :-)
As a workaround I will just switch to the UTF-8 locale where needed within the particular methods, e.g. where filesystem operations are used.
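
Such a locally scoped switch could look like the following sketch. `ScopedGlobalLocale` is a hypothetical class, not something from Wt or the standard library. It relies on `std::locale::global` returning the previously installed locale, which the destructor puts back.

```cpp
#include <locale>

// Hypothetical RAII guard: installs `loc` as the global C++ locale and
// restores the previous global locale when the guard goes out of scope.
class ScopedGlobalLocale {
public:
    explicit ScopedGlobalLocale(const std::locale& loc)
        : previous_(std::locale::global(loc)) {}  // global() returns the old locale
    ~ScopedGlobalLocale() { std::locale::global(previous_); }

    // Non-copyable: two guards restoring the same locale would be confusing.
    ScopedGlobalLocale(const ScopedGlobalLocale&) = delete;
    ScopedGlobalLocale& operator=(const ScopedGlobalLocale&) = delete;

private:
    std::locale previous_;
};
```

A method would create the guard on its first line, for example with `std::locale("en_US.UTF-8")`, and do its file operations afterwards. Note that the global locale is process-wide, so other threads see the change while the guard is alive; in a multi-threaded server, calling `imbue` on the individual stream may be the narrower option.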

You may close/reject this ticket or keep it open if you feel challenged towards a solution ;-)

Thanks!
Stefan
