Chinese text compression algorithm based on PDC coding
Cite this article: ZENG Dangquan. Chinese text compression algorithm based on PDC coding[J]. Computer Engineering and Applications, 2015, 51(17): 205-209.
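Author: ZENG Dangquan
Affiliation: School of Information Science and Technology, Xiamen University Tan Kah Kee College, Zhangzhou, Fujian 363105, China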
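Abstract: Chinese text has particular structural characteristics, and traditional compression algorithms handle it poorly. This paper proposes and implements a Chinese text compression algorithm based on PDC coding. The algorithm is dictionary-based and works in three steps. First, single Chinese characters receive variable-length prefix codes via Huffman coding, according to their probability of occurrence in Chinese text. Second, a depth is defined for the words and phrases that take a given character as their prefix. Third, words and phrases that share the same prefix and the same depth receive local fixed-length codes. Together these codes form a compression dictionary. The same texts were compressed with this algorithm and with the traditional LZW and LZSS algorithms. Compared with those algorithms, the proposed algorithm improved the compression ratio by 2.53% to 40.48%, which indicates that it achieves good compression.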

Keywords: Chinese text; compression algorithm; prefix; depth; coding; compression ratio
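The abstract names the building blocks: Huffman prefix codes for single characters, a depth for the words and phrases that begin with each character, and local fixed-length codes within each prefix/depth group. It does not describe the bitstream layout, what depth measures, or how text is segmented into phrases. The Python sketch below is a minimal illustration that fills those gaps with stated assumptions. It is not the paper's implementation. It assumes three things. First, depth is the phrase length in characters. Second, text is segmented by greedy longest match against a given phrase list. Third, each token is written as the Huffman code of its first character, followed by a fixed-length selector among the depths available for that character, followed by a fixed-length index within the (prefix, depth) group. The function names `huffman_codes`, `build_dictionary`, `encode`, and `decode` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Prefix-free Huffman code {symbol: bitstring} built from symbol frequencies."""
    if not freqs:
        return {}
    tiebreak = count()  # unique tiebreaker so the dicts are never compared
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def width(n):
    """Bits needed for a fixed-length index over n >= 1 items (0 when n == 1)."""
    return (n - 1).bit_length()


def fixed(value, bits):
    """Fixed-length binary field; an empty string when the field has 0 bits."""
    return format(value, f"0{bits}b") if bits else ""


def build_dictionary(text, phrases):
    """Group entries by (prefix character, depth). ASSUMPTION: depth = length.

    Every character of the text is a depth-1 entry, so each prefix has one.
    """
    entries = set(text) | {p for p in phrases if p}
    groups = {}
    for entry in sorted(entries):  # sorted => deterministic local indices
        groups.setdefault((entry[0], len(entry)), []).append(entry)
    depths = {}
    for head, depth in sorted(groups):  # ascending depths per prefix
        depths.setdefault(head, []).append(depth)
    return groups, depths


def encode(text, phrases):
    """Greedy longest-match segmentation (ASSUMPTION), then per-token codes."""
    codes = huffman_codes(Counter(text))  # variable-length codes for characters
    groups, depths = build_dictionary(text, phrases)
    phrase_set = {p for p in phrases if p}
    max_len = max((len(p) for p in phrase_set), default=1)
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            token = text[i:i + length]
            if length == 1 or token in phrase_set:
                break
        head, depth = token[0], len(token)
        depth_list, group = depths[head], groups[(head, depth)]
        out.append(codes[head])  # Huffman code of the prefix character
        out.append(fixed(depth_list.index(depth), width(len(depth_list))))
        out.append(fixed(group.index(token), width(len(group))))  # local code
        i += depth
    return "".join(out), codes, groups, depths


def decode(bits, codes, groups, depths):
    """Invert encode(): Huffman prefix, then depth selector, then local index."""
    inverse = {c: s for s, c in codes.items()}
    out, i = [], 0
    while i < len(bits):
        j = i + 1
        while bits[i:j] not in inverse:  # prefix-free, so first match is right
            if j > len(bits):
                raise ValueError("truncated bitstream")
            j += 1
        head, i = inverse[bits[i:j]], j
        depth_list = depths[head]
        w = width(len(depth_list))
        depth = depth_list[int(bits[i:i + w], 2) if w else 0]
        i += w
        group = groups[(head, depth)]
        w = width(len(group))
        out.append(group[int(bits[i:i + w], 2) if w else 0])
        i += w
    return "".join(out)
```

Take `phrases = ["中文", "文本", "压缩", "中文文本"]` as an example. `encode` segments 中文文本压缩中文文本 into the three tokens 中文文本, 压缩, 中文文本, and `decode` reproduces the original string. In this sketch the fixed-length fields index only a small (prefix, depth) group rather than the whole dictionary. As a result, a phrase costs only a few bits beyond the Huffman code of its first character.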
