Bug #13362

open

WRegExpValidator causes massive delays when used with std::locale::global in UTF-8

Added by Stefan Bn about 2 months ago. Updated 1 day ago.

Status: Feedback
Priority: Normal
Assignee:
Target version: -
Start date: 12/29/2024
Due date:
% Done: 0%
Estimated time:

Description

If the global C++ locale is set to a UTF-8 mode

std::locale::global(std::locale("en_EN.UTF-8"));

then WRegExpValidator causes massive delays when processing special characters (e.g. German Umlauts).

See the small example main.cpp attached that demonstrates the issue and logs the timing behavior. There is a 9.5 sec delay until the dialog is shown:

[2024-Dec-29 10:32:18.933] 10928 [/ B2KDnbdy2mMcIthk] [Debug] Time-In
[2024-Dec-29 10:32:25.252] 10928 [/ B2KDnbdy2mMcIthk] [Debug] Time-Out

[2024-Dec-29 10:32:28.393] 10928 - [info] "WebRequest: took 9461.01 ms"

If the global locale is not set, then everything works smoothly.

This occurs with the current Wt version 4.11.1 on Windows 10 (64-bit), and the behavior is the same in different web browsers.


Files

main.cpp (1.74 KB), Stefan Bn, 12/29/2024 10:36 AM
main2.cpp (2.4 KB), Stefan Bn, 01/31/2025 02:06 PM
debug.log (4.05 KB), Stefan Bn, 01/31/2025 02:21 PM

Actions #1

Updated by Matthias Van Ceulebroeck about 2 months ago

  • Status changed from New to Feedback
  • Assignee set to Stefan Bn

Hey Stefan,

thank you for the report, and the nice test case as well! I wish you a very happy new year as well (which is why I'm only replying now!)

Maybe this is because the locale doesn't exist? Although that should then quickly revert to the default locale on your environment. On Linux this will result in a fatal error, because the locale doesn't exist, and result in the session terminating early.
This will likely be platform dependent, and Windows may handle this differently, reverting to your locale. I will test this out tomorrow on Windows.

Can you tell me what your default locale is, and how Windows responds to the above call, std::locale::global(std::locale("en_EN.UTF-8"));? i.e. what the current value of std::locale().name() is?

Thank you!
Matthias

Actions #2

Updated by Stefan Bn about 2 months ago

Hi Matthias,

thanks for looking into this and happy new year to you too!

Before setting the locale the value of std::locale().name() is "C".

After setting the locale using the call above, the value of std::locale().name() is "en_EN.UTF-8". (There is a typo in my code sample; the correct name should be 'en_us.utf8'.)
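
A minimal sketch of how these before/after values can be checked. The helper name `trySetGlobalLocale` is hypothetical. The try/catch is there because the `std::locale` constructor throws `std::runtime_error` when the platform does not recognize the name, which would be consistent with the Linux behavior described in the previous comment.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to install `name` as the global C++ locale and
// return the name of the global locale afterwards, so the caller can see
// whether the request took effect or the previous locale stayed in place.
std::string trySetGlobalLocale(const std::string& name) {
    try {
        std::locale::global(std::locale(name));
    } catch (const std::runtime_error& e) {
        // Thrown by the std::locale constructor for names the platform rejects.
        std::cerr << "locale '" << name << "' rejected: " << e.what() << '\n';
    }
    return std::locale().name();
}
```

Printing `std::locale().name()` once at startup and then the return value of `trySetGlobalLocale("en_EN.UTF-8")` gives the two values asked for above.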

On Windows this is an acceptable value as stated in the documentation here:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-160#utf-8-support

Also I can see the std-libraries are working as expected, e.g. std::ofstream is writing UTF-8 compatible filenames (which is the main reason I need this setting).
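
If UTF-8 filenames are the only reason for the global locale, one possible alternative is to build the path explicitly from UTF-8 bytes. This is a sketch under assumptions, not something tested against this setup: it requires C++17, the helper name `writeFileUtf8Name` is hypothetical, and `std::filesystem::u8path` is deprecated as of C++20 (where `std::filesystem::path` can be constructed from a `char8_t` string instead).

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: write `content` to a file whose name is given as
// UTF-8 encoded bytes. Returns true if the file was opened and written.
bool writeFileUtf8Name(const std::string& utf8Name, const std::string& content) {
    // u8path interprets the bytes as UTF-8, independent of the global locale.
    const std::filesystem::path p = std::filesystem::u8path(utf8Name);
    std::ofstream out(p, std::ios::binary);
    if (!out) {
        return false;
    }
    out << content;
    return static_cast<bool>(out);
}
```

Whether this removes the need for `std::locale::global` in a given application depends on where else narrow strings are passed to the standard library.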

Today I found another big issue in Wt coming from this UTF-8 locale setting. A call of WPopupMenu::addItem with UTF-8 characters in it leads to an immediate hard crash of the application at a code position like this:

WPopupMenu::addItem("Item with Chinese letters 中文 (Zhōngwén); 汉语, 漢語")

If I don't use the UTF-8 locale (or don't use international special characters), the popup menu is shown as expected.

Could you please verify this as well?

Best,
Stefan

Actions #3

Updated by Stefan Bn 23 days ago

This is a follow-up: the issue remains with the current version, Wt 4.11.2, on Windows.

The WPopupMenu::addItem crash with international UTF-8 characters is also a real, reproducible error.

I've modified the code sample so that it now shows both issues: the long delays with WRegExpValidator and the crash in WPopupMenu::addItem when the UTF-8 locale is set.

For the crash in WPopupMenu I've attached my post-mortem backtrace debug log. The error is here:

#8 0x00007fff138531bd in std::__1::__tree<boost::iterator_range<std::__1::__wrap_iter<char const*> >, std::__1::less<boost::iterator_range<std::__1::__wrap_iter<char const*> > >, std::__1::allocator<boost::iterator_range<std::__1::__wrap_iter<char const*> > > >::~__tree (this=0x5de15fcb88) at D:/Dev/Tools/msys64/clang64/include/c++/v1/__tree:1532
#9 std::__1::set<boost::iterator_range<std::__1::__wrap_iter<char const*> >, std::__1::less<boost::iterator_range<std::__1::__wrap_iter<char const*> > >, std::__1::allocator<boost::iterator_range<std::__1::__wrap_iter<char const*> > > >::~set[abi:ne190107]() (this=0x5de15fcb88) at D:/Dev/Tools/msys64/clang64/include/c++/v1/set:699
#10 Wt::WWebWidget::hasStyleClass (this=<optimized out>, styleClass=...) at D:/Dev/BuildLibs/wt/wt-4.11.2/src/Wt/WWebWidget.C:832
#11 0x00007fff13852bed in Wt::WWebWidget::removeStyleClass (this=0x1f047e30d20, styleClass=..., force=true) at D:/Dev/BuildLibs/wt/wt-4.11.2/src/Wt/WWebWidget.C:789

Actions #4

Updated by Stefan Bn 1 day ago

Update: These errors also exist in the current version, Wt 4.11.3.

I did a little more research: independently of Wt, the underlying std::regex implementation seems to perform poorly in general when processing UTF-8 characters.
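
To compare timings outside of Wt, a rough sketch along these lines can be used. `makeRegex` and `timeMatchMs` are hypothetical helper names. Building the regex with `std::locale::classic()` via `imbue()` is only an assumption about a possible workaround and has not been verified against WRegExpValidator.

```cpp
#include <chrono>
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: build a std::regex whose traits use `loc` instead of
// whatever the global locale is. imbue() has to come before assign(),
// because imbue() leaves the regex without a pattern.
std::regex makeRegex(const std::string& pattern, const std::locale& loc) {
    std::regex re;
    re.imbue(loc);
    re.assign(pattern);
    return re;
}

// Hypothetical helper: run regex_match `repetitions` times on `input` and
// return the elapsed wall-clock time in milliseconds.
double timeMatchMs(const std::regex& re, const std::string& input, int repetitions) {
    const auto start = std::chrono::steady_clock::now();
    int matches = 0;
    for (int i = 0; i < repetitions; ++i) {
        if (std::regex_match(input, re)) {
            ++matches;
        }
    }
    const auto end = std::chrono::steady_clock::now();
    (void)matches;  // only counted so the loop has an observable effect
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling `timeMatchMs` once with a regex built from `std::locale()` (the global locale) and once with one built from `std::locale::classic()`, on the same input containing umlauts, should show whether the locale is what makes the difference.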
