
Using a lot of Chinese characters in the GUI makes the engine very unstable




I have a scene that uses a lot of Chinese characters in the GUI, which makes the engine very unstable; popping up a dialog crashes the engine.

 

While debugging my SQLPlugin, I found this D3D10 info message in the Visual Studio output window:

 

D3D10: INFO: ID3D10Device::CreateTexture2D: Note that the resource allocation (268435456 bytes plus overhead) would use more than 128 MB of application usable memory. This is fine; D3D10 allows attempts to make allocations above 128 MB in the event that they may happen to succeed, however this usage is subject to hardware specific failure. D3D10 only guarantees that allocations within 128 MB are supported by all D3D10 hardware. Here, failure may only happen if the system runs out of resources. Allocations above 128 MB may fail for a couple of reasons, not only because the system is overextended, but also if the particular hardware being used does not support it. There is intentionally no supported way to report individual hardware limits on allocation sizes above 128 MB.

 

 

This message appears at varying times while the engine runs that scene, and after it shows up the engine becomes very unstable. I also traced the allocation to int D3D10FontTTFGlyphs::create_texture() in D3D10FontTTF.cpp.

 

This is my first scene that uses so many Chinese characters in the UI. I tried the latest 3 versions of the SDK; all have this problem.


It seems I've found the problem. The real issue is that int FontTTFGlyphs::utf8_to_unicode(const char *str, wchar_t &code) in FontTTF.cpp sometimes computes the wrong UCS-2 character for Chinese text. I tried modifying the source to:

int FontTTFGlyphs::utf8_to_unicode(const char *str, wchar_t &code) const {
	// let the Win32 API decode the first UTF-8 sequence into one UTF-16 code unit
	MultiByteToWideChar(CP_UTF8, 0, str, strlen(str), &code, 1);
	// return the number of UTF-8 bytes that sequence occupies
	return WideCharToMultiByte(CP_UTF8, 0, &code, 1, NULL, 0, NULL, NULL);
}

 

After that there are no more missing characters, and so far the engine runs more stably than ever. I'm not sure the crash was caused by this, but one thing I am sure of: there are definitely no missing characters in the UI (Chinese ones, of course).

 

Under Windows and Linux there are plenty of native functions for code-page conversion; why does Unigine use its own method to convert UTF-8 to Unicode?

 

If this is the real problem, then this bug might also affect those using Japanese and Korean characters.


I don't know which string was converted wrongly; the characters go missing in the UI at random. At first all characters render fine, then suddenly some characters are missing, and when the UI needs to create a new dialog or window, the engine crashes.

 

Well, I've tested the UTF-8 functions with this code:

File f = new File("utf8.txt", "r");
string utf8_content = f.gets();
f.close();

f.open("unicode.txt", "w");
int uni[0];
utf8ToUnicode(utf8_content, uni);
for(int i = 0; i < utf8strlen(utf8_content); i++)
	f.writeShort(uni[i]);
f.close();

 

The result is what I expected: the resulting UCS-2 text is wrong. Some characters are right but others are wrong. You can test with the attached text file; it is UTF-8 without a BOM, with Chinese and English characters mixed.

utf8.txt


Strange: I wrote a little Win32 program to convert this text, and it still produces wrong characters, but different ones from Unigine's.

 

But if I use iconv -f utf8 -t ucs2 utf8.txt > unicode.txt under Linux, all characters are converted correctly.


Well, I've studied UTF-8 and Unicode for Chinese, and there are many cases to handle. In the Win32 API, MultiByteToWideChar cannot handle Chinese punctuation well; but Unigine's current method of converting UTF-8 to Unicode still has some bugs of its own, as some characters are not converted correctly. If the string contains no punctuation, MultiByteToWideChar converts it perfectly.

 

All the Unicode libraries like ICU and iconv use a mapping table to map those special punctuation characters in Chinese. Quoting MSDN:

Windows applications normally use UTF-16 to represent Unicode character data. The use of 16 bits allows direct representation of 65,536 unique characters, but this Basic Multilingual Plane (BMP) is not nearly enough to cover all the symbols used in human languages. Unicode version 4.1 includes over 97,000 characters, with over 70,000 characters for Chinese alone.

 

So I think the only way to perfectly convert all UTF-8 to Unicode for CJK is to use ICU/iconv, or to use UTF-8 with FreeType (FreeType already supports UTF-8 character rendering as of version 2.4).

 

By the way, Unity3D uses iconv for UTF-8/Unicode conversion.


I think I've found the real problem. In Unigine's TTF rendering code, wchar_t is used, and under Windows wchar_t is 16 bits (on Linux it is usually 32 bits, and I don't know about Mac). This means Unigine effectively uses UCS-2 to represent a glyph's character code. But UCS-2 does not cover all of Unicode: a 16-bit wchar_t can only represent 65,536 characters, while Chinese characters alone number over 70,000, so representing every Unicode character in UCS-2 is impossible.

 

In FreeType, it is an unsigned long, i.e. UTF-32, that represents a character code in TTF files; check this: http://www.freetype....ml#FT_Load_Char . This function is what Unigine uses to get glyphs; it takes an unsigned long character code, so it can absolutely cover all characters in Unicode. Now things become easier: Unigine needs to change wchar_t to uint32, or use FreeType's FT_ULong, to represent characters and load them from the TTF. That should solve all the CJK character problems.

 

Converting UTF-8 to UTF-32 is easy; there is no need for iconv/ICU.

 

I've also checked CEGUI; it also uses uint32 to represent a character.

 

And here is a code sample for converting between UTF-8/UTF-16/UTF-32: http://www.koders.com/c/fid26678D7D6076ED6880A07A83497F6BD8D33FDA7A.aspx
