首页 | 本学科首页   官方微博 | 高级检索  
     


Chicon—a Chinese text manipulation language
Authors:Kam-Fai Wong  Vincent Y Lum  Wai-Ip Lam
Abstract:Text processing is an important computer application. Due to its importance, a number of text manipulation programming languages have been devised (e.g. Icon). These programming languages are very useful for applications such as natural language processing, text analysis, text editing, document formatting, text generation, etc. However, they were mainly designed to handle English texts, and are ineffective for Chinese. This is because English and Chinese texts are represented very differently in a computer. An English character is mainly represented in 7-bit ASCII, and its Chinese counterpart commonly in 16-bit GB or BIG-5. This difference makes direct application of English-based text manipulation programming languages to Chinese erroneous, e.g. application of Icon to reverse a string of Chinese characters. In this paper, a new dialect of Icon, referred to as Chicon (i.e. Chinese Icon), is proposed. In the design of Chicon, new data types were introduced to differentiate pure English and English/Chinese mixed texts. In addition, existing Icon text manipulation functions were modified to account for Chinese texts. Experiments have shown that Chicon not only could overcome the problems of Chinese processing in Icon, but its execution speed was actually superior to Icon in handling Chinese. Furthermore, application of Chicon to a real sized problem, namely word segmentation, has proved that the language is practical. © 1998 John Wiley & Sons, Ltd.
Keywords:icon computer language  compiler  Chinese information processing
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号