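Research on Software Misjudgment of Character Encoding
Cite as: Wang Yu, Zhang Yongsheng. Research on Software Misjudgment of Character Encoding [J]. Software Engineer (软件工程师), 2014, (9): 22-24
Authors: Wang Yu, Zhang Yongsheng
Affiliations: [1] School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250358; [2] Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong 250014
Abstract: With so many character encodings in use today, how application software can better determine which character set a file is encoded in, and then decode it correctly, has become a problem that cannot be ignored. This paper analyzes the bug in which Windows Notepad fails to display the two characters "联通" correctly. The file is parsed with WinHex to obtain its hexadecimal encoding, and the cause of the misjudgment is inferred from those bytes. The inferred cause is then confirmed by annotating Notepad's IsTextUTF8 function, which also leads to further Chinese characters that Windows Notepad cannot display correctly.

Keywords: encoding scheme; character set; UTF-8; Notepad; misjudgment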

Research on Character Encoding Software Misjudgment
Authors and affiliations: WANG Yu, ZHANG Yongsheng (1. School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China; 2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014, China)
Abstract: Given the many character encodings currently in use, how software determines which character set a target file belongs to, and then decodes it correctly, has become a problem that cannot be ignored. This paper examines the Windows bug in which Notepad cannot display "联通" ("Unicom") correctly. The file is analyzed with WinHex to obtain its hexadecimal codes, the cause of the misjudgment is inferred from those codes, and the inference is verified by annotating Notepad's IsTextUTF8 function. This establishes the root cause of the incorrect display of "联通" and identifies more Chinese characters that Windows Notepad cannot display correctly.
Keywords: encoding scheme; coded character set; UTF-8; Notepad; misjudgment
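
The kind of misjudgment the abstract describes can be illustrated with a short sketch. It is not Notepad's IsTextUTF8 implementation, which the abstract does not reproduce; `looks_like_utf8` and `strict_utf8_ok` are hypothetical helpers that assume a detector checking only the UTF-8 bit patterns (a lead byte followed by the right number of `10xxxxxx` continuation bytes). Under that assumption, the GBK bytes of "联通" (C1 AA CD A8) pass as two 2-byte UTF-8 sequences, while a strict decoder rejects them.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical structural check: only lead/continuation bit patterns
    are inspected; overlong forms and invalid code points are NOT rejected."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                # 0xxxxxxx: single-byte (ASCII)
            need = 0
        elif 0xC0 <= b <= 0xDF:     # 110xxxxx: lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:     # 1110xxxx: lead byte of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF7:     # 11110xxx: lead byte of a 4-byte sequence
            need = 3
        else:                       # stray continuation byte or invalid lead
            return False
        if i + need >= n and need > 0:
            return False            # sequence is cut off at the end of data
        for k in range(1, need + 1):
            if data[i + k] & 0xC0 != 0x80:   # must match 10xxxxxx
                return False
        i += need + 1
    return True


def strict_utf8_ok(data: bytes) -> bool:
    """Strict check using Python's built-in UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


liantong = "联通".encode("gbk")   # the ANSI (GBK) bytes of the two characters
zhongwen = "中文".encode("gbk")   # another GBK string, for contrast

print(liantong.hex(" "), looks_like_utf8(liantong), strict_utf8_ok(liantong))
print(zhongwen.hex(" "), looks_like_utf8(zhongwen), strict_utf8_ok(zhongwen))
```

Under these assumptions, a pattern-only detector accepts the GBK bytes of "联通" as UTF-8 although they are not valid strict UTF-8, whereas the bytes of "中文" fail even the pattern check because D0 is not a continuation byte.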
This article is indexed in VIP (维普) and other databases.