首页 | 本学科首页   官方微博 | 高级检索  
     


Compression techniques for Chinese text
Authors:Phil Vines  Justin Zobel
Abstract:With the growth of digital libraries and the internet, large volumes of text are available in electronic form. The majority of this text is English but other languages are increasingly well represented, including large-alphabet languages such as Chinese. It is thus attractive to compress text written in the large alphabet languages, but the general-purpose compression utilities are not particularly effective for this application. In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the partial predictive matching compression technique (PPM). We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly-available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly improve compression performance while using only a limited volume of memory. © 1998 John Wiley & Sons, Ltd.
Keywords:compression  Chinese text  PPM (partial-predictive matching)  text compression
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号