steve3d Posted October 10, 2011

I have a scene that uses a lot of Chinese characters in the GUI, and this makes the engine very unstable: popping up a dialog crashes the engine. While debugging my SQLPlugin, I noticed this D3D10 info message in the Visual Studio output window:

D3D10: INFO: ID3D10Device::CreateTexture2D: Note that the resource allocation (268435456 bytes plus overhead) would use more than 128 MB of application usable memory. This is fine; D3D10 allows attempts to make allocations above 128 MB in the event that they may happen to succeed, however this usage is subject to hardware specific failure. D3D10 only guarantees that allocations within 128 MB are supported by all D3D10 hardware. Here, failure may only happen if the system runs out of resources. Allocations above 128 MB may fail for a couple of reasons, not only because the system is overextended, but also if the particular hardware being used does not support it. There is intentionally no supported way to report individual hardware limits on allocation sizes above 128 MB.

The message shows up at different times while the engine is running that scene, and once it has appeared the engine becomes very unstable. I traced it to int D3D10FontTTFGlyphs::create_texture() in D3D10FontTTF.cpp.

This is my first scene that uses so many Chinese characters in the UI. I tried the latest three versions of the SDK, and all of them have this problem.
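For scale, the allocation size quoted in the D3D10 message is exactly 256 MB, twice the 128 MB that D3D10 guarantees. The sketch below only checks that arithmetic. The 8192 x 8192 size at 4 bytes per texel is an assumption chosen because it produces the reported byte count; the thread does not state the real dimensions or format of the glyph texture, and `texture_bytes` and `exceeds_guaranteed_limit` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Allocation size that the D3D10 message says all D3D10 hardware supports.
constexpr std::uint64_t kGuaranteedBytes = 128ull * 1024 * 1024;

// Bytes used by an uncompressed 2D texture without mip levels.
// Hypothetical helper for illustration, not an engine or D3D function.
std::uint64_t texture_bytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t bytes_per_texel) {
    assert(bytes_per_texel > 0);
    return width * height * bytes_per_texel;
}

// True when an allocation is larger than the guaranteed 128 MB.
bool exceeds_guaranteed_limit(std::uint64_t bytes) {
    return bytes > kGuaranteedBytes;
}
```

Under that assumption, `texture_bytes(8192, 8192, 4)` equals the 268435456 bytes in the message, while a 4096 x 4096 texture of the same format would stay within the guaranteed limit.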
steve3d (Author) Posted October 10, 2011

It seems I've found the problem. The real problem is that int FontTTFGlyphs::utf8_to_unicode(const char *str,wchar_t &code) in FontTTF.cpp sometimes produces the wrong UCS-2 character for Chinese text. I modified the source to:

int FontTTFGlyphs::utf8_to_unicode(const char *str,wchar_t &code) const {
    // Win32 only: convert the leading UTF-8 sequence into one wide character.
    MultiByteToWideChar(CP_UTF8, 0, str, strlen(str), &code, 1);
    // Return the number of UTF-8 bytes that character occupies.
    return WideCharToMultiByte(CP_UTF8, 0, &code, 1, NULL, 0, NULL, NULL);
}

With this change there are no missing characters, and so far the engine runs more stably than ever. I'm not sure the crash was caused by this, but one thing I am sure of is that there are definitely no missing characters in the UI any more (in Chinese, of course). Windows and Linux both offer native functions for code page conversion, so why does Unigine use its own method to convert UTF-8 to Unicode? If this is the real problem, the bug probably also affects people using Japanese and Korean characters.
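For comparison, the UTF-8 to code point conversion discussed above can also be written without any platform API. The following is a minimal sketch, not the engine's FontTTFGlyphs::utf8_to_unicode: the name `utf8_decode` is hypothetical, and it writes into a 32-bit value rather than wchar_t so that every Unicode code point fits. It assumes a NUL-terminated input string.

```cpp
#include <cassert>
#include <cstdint>

// Decodes the UTF-8 sequence at the start of str into one code point.
// Returns the number of bytes consumed, or 0 if the sequence is malformed
// (code is left untouched in that case). Hypothetical helper.
int utf8_decode(const char *str, std::uint32_t &code) {
    assert(str != nullptr);
    const unsigned char *s = reinterpret_cast<const unsigned char *>(str);

    int length = 0;
    std::uint32_t value = 0;
    std::uint32_t min_value = 0;  // smallest code point valid for this length

    if (s[0] < 0x80) {                       // 0xxxxxxx: ASCII
        code = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      // 110xxxxx: 2 bytes
        length = 2; value = s[0] & 0x1F; min_value = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {      // 1110xxxx: 3 bytes (most CJK)
        length = 3; value = s[0] & 0x0F; min_value = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {      // 11110xxx: 4 bytes
        length = 4; value = s[0] & 0x07; min_value = 0x10000;
    } else {
        return 0;                            // stray continuation or bad lead byte
    }

    for (int i = 1; i < length; i++) {
        // A terminating NUL fails this check, so we never read past the end.
        if ((s[i] & 0xC0) != 0x80) return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }

    if (value < min_value) return 0;                    // overlong encoding
    if (value > 0x10FFFF) return 0;                     // beyond Unicode range
    if (value >= 0xD800 && value <= 0xDFFF) return 0;   // surrogate range

    code = value;
    return length;
}
```

A caller would advance through a string by the returned byte count and stop on 0 or on the terminating NUL. Chinese punctuation such as the fullwidth comma (bytes EF BC 8C) goes through the same 3-byte path as ordinary Chinese characters.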
frustum Posted October 10, 2011

Could you send UTF-8 text samples that cause the wrong conversion?
steve3d (Author) Posted October 10, 2011

I don't know which string was converted wrongly. The characters go missing in the UI at random: at first all characters render fine, then suddenly some characters are missing, and then when the UI needs to create a new dialog or window, the engine crashes. I tested the UTF-8 functions with this code:

File f = new File("utf8.txt", "r");
string utf8_content = f.gets();
f.close();
f.open("unicode.txt", "w");
int uni[0];
utf8ToUnicode(utf8_content, uni);
for(int i=0; i<utf8strlen(utf8_content); i++) f.writeShort(uni[i]);
f.close();

The result is what I expected: the UCS-2 output is wrong. Some characters are right but others are wrong. You can test with the attachment; the text file is UTF-8 without BOM, with Chinese and English characters mixed.

Attachment: utf8.txt
steve3d (Author) Posted October 10, 2011

Strange: I wrote a small Win32 program to convert this text, and it still produces wrong characters, though different ones from Unigine. But if I run

iconv -f utf8 -t ucs2 utf8.txt > unicode.txt

under Linux, all characters are converted correctly.
steve3d (Author) Posted October 11, 2011

Well, I've studied UTF-8 and Unicode for Chinese, and there are many cases to handle. In my tests the Win32 API function MultiByteToWideChar does not handle Chinese punctuation well, but Unigine's current UTF-8 to Unicode conversion also still has bugs, and some characters are not converted correctly. If the string contains no punctuation, MultiByteToWideChar converts it perfectly. Unicode libraries such as ICU and iconv use a mapping table for that special Chinese punctuation. A quote from MSDN:

Windows applications normally use UTF-16 to represent Unicode character data. The use of 16 bits allows direct representation of 65,536 unique characters, but this Basic Multilingual Plane (BMP) is not nearly enough to cover all the symbols used in human languages. Unicode version 4.1 includes over 97,000 characters, with over 70,000 characters for Chinese alone.

So I think the only way to convert all UTF-8 to Unicode perfectly for CJK is to use ICU/iconv, or to use UTF-8 with FreeType (FreeType already supports UTF-8 character rendering in version 2.4). BTW, Unity3D uses iconv for UTF-8 and Unicode conversion.
steve3d (Author) Posted October 11, 2011

I think I've found the real problem. Unigine's TTF rendering code uses wchar_t, which is 16 bits wide on Windows (on Linux it is normally 32 bits; I don't know about Mac). This means that, on Windows at least, Unigine is effectively using UCS-2 to represent the character behind each glyph. But UCS-2 does not include all Unicode characters: a 16-bit wchar_t can only represent 65,536 characters, while Chinese characters alone number over 70,000, so representing every Unicode character in UCS-2 is not possible.

FreeType uses unsigned long, i.e. UTF-32, to represent a character in TTF files. Check http://www.freetype....ml#FT_Load_Char : this is the function Unigine uses to get a glyph, and it takes the character code as an unsigned long, so it definitely covers all Unicode characters.

Now things become much easier. Unigine needs to change wchar_t to uint32, or use FreeType's FT_ULong, to represent characters and load them from the TTF file. That should solve all the CJK character problems. Converting from UTF-8 to UTF-32 is easy, with no need for iconv/ICU. I also checked CEGUI, and it uses uint32 to represent a character as well. Here is a code sample for converting between UTF-8/UTF-16/UTF-32: http://www.koders.com/c/fid26678D7D6076ED6880A07A83497F6BD8D33FDA7A.aspx
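To illustrate the 16-bit limitation described above, here is a small sketch of what happens to a code point outside the Basic Multilingual Plane. It is only an illustration under assumed names (`utf16_encode` and `truncate_to_16` are hypothetical, not engine or FreeType functions): such a code point either needs two UTF-16 units (a surrogate pair) or is silently changed into a different character if it is cut down to a single 16-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Encodes one code point as UTF-16. out must have room for two units.
// Returns the number of 16-bit units written (1 or 2). Hypothetical helper.
int utf16_encode(std::uint32_t code, std::uint16_t out[2]) {
    assert(code <= 0x10FFFF);                   // valid Unicode range
    assert(code < 0xD800 || code > 0xDFFF);     // surrogates are not characters
    if (code < 0x10000) {
        // Inside the BMP: a single 16-bit unit is enough.
        out[0] = static_cast<std::uint16_t>(code);
        return 1;
    }
    // Outside the BMP: split the remaining 20 bits into a surrogate pair.
    code -= 0x10000;
    out[0] = static_cast<std::uint16_t>(0xD800 + (code >> 10));
    out[1] = static_cast<std::uint16_t>(0xDC00 + (code & 0x3FF));
    return 2;
}

// What survives when a code point is stored in a single 16-bit variable.
std::uint16_t truncate_to_16(std::uint32_t code) {
    return static_cast<std::uint16_t>(code);
}
```

For example, the CJK Extension B character U+20BB7 needs the pair D842 DFB7 in UTF-16, and truncating it to 16 bits gives 0x0BB7, which is an unrelated character. Storing the code as a 32-bit value, as proposed in the post, avoids both cases.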
frustum Posted October 11, 2011

Thanks, I will remove all wchar_t types.
steve3d (Author) Posted October 12, 2011

Well, that was quick. It seems this might touch many files, so I will wait for the next update and tell you the result. :D