#UCS2

2026-01-05

Ну всё, пора закапывать UTF-8

Здравствуйте, меня зовут Дмитрий Карловский и я... серийный убийца устоявшихся стандартов. Сегодня я выследил и нанёс критический урон UTF-8. И сейчас я расскажу, как я его переиграл и уничтожил новым стандартом кодирования текста — Unicode Compact Format . No, God! Please, No, NO!

habr.com/ru/articles/983042/

#utf8 #utf16 #utf32 #ucs2 #ucs4 #scsu #bocu1 #utfc #ucf #$mol

Felix Palmen 📯zirias@techhub.social
2024-01-24

@wader I had a quick look at your code now and see you're already "handling" this using wmain() and doing the conversion. So, this seems to be a bit mysterious.

Is all your output #utf8? Then adding SetConsoleOutputCP(CP_UTF8); should fix output for anyone without requiring them to use chcp first, but I don't see why specifically argv values continue to have a problem. It probably needs some experimentation 😞

Text encoding on #Windows is so borked because they jumped on #Unicode early using #UCS2 and now they need to handle everything in #UTF16 stored in 16bit wide wchar_t ... and also want to remain compatible with any older crap ... 🙈

Client Info

Server: https://mastodon.social
Version: 2025.07
Repository: https://github.com/cyevgeniy/lmst