17 similar documents found; search took 46 ms
1.
This paper investigates how to construct an order-0 statistical model for Chinese text and proposes a highly effective Chinese text compression algorithm. With only this most elementary model, the coding efficiency achieved on Chinese text already exceeds that of a hybrid LZ/Huffman algorithm. Since the order-0 statistical model is the foundation of all higher-order statistical models, this work is an important reference for text compression research on Chinese and other large-character-set scripts such as Japanese and Korean.
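For readers unfamiliar with the term, an order-0 model is just a table of per-symbol frequencies with no context at all. A minimal Python sketch of the adaptive bookkeeping an arithmetic coder would consume (illustrative only, not the paper's algorithm; all names are invented):

```python
from collections import Counter

class Order0Model:
    """Adaptive order-0 model: one frequency count per symbol, no context."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.freq = Counter({s: 1 for s in self.alphabet})  # start uniform

    def interval(self, symbol):
        """Cumulative range (low, high, total) that an arithmetic coder consumes."""
        low = 0
        for s in self.alphabet:
            if s == symbol:
                return low, low + self.freq[s], sum(self.freq.values())
            low += self.freq[s]
        raise KeyError(symbol)

    def update(self, symbol):
        self.freq[symbol] += 1   # adapt: the model sharpens as text is seen
```

The linear scan in interval() is the pain point for an alphabet as large as Chinese; the Fenwick-tree structure discussed in entry 11 below replaces it with logarithmic-time cumulative counts.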
2.
3.
High-Performance Arithmetic Coding for Small Alphabets
Building on an improved arithmetic coder, this paper proposes a high-performance arithmetic coding algorithm for small alphabets. Both the coding component and the modeling component are designed specifically for the small-alphabet setting. On the coding side, the improved arithmetic coder is further turned into a multiplication-free arithmetic coder; analysis shows that the code-length redundancy is no greater than the recent results of Printz et al., with coding efficiency approaching 100%. On the modeling side, a fast algorithm for adaptive high-order statistical modeling is proposed. Experimental results show that the algorithm achieves efficient, fast compression of small-alphabet sources.
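The generic trick behind multiplication-free coders (this is the QM/MQ-coder-style approximation, not necessarily this paper's exact design) is to keep the current range A in fixed point close to 1, so the product A x Qe can be approximated by Qe itself; the split then costs one subtraction, and shifts restore A. A hedged sketch of one split step:

```python
# One interval-split step of a multiplication-free binary arithmetic coder.
# A is the range in 16-bit fixed point, kept in [0x8000, 0x10000) so A ~ 1.0
# and A*Qe ~ Qe. Constants and names are illustrative, not from the paper.

def split_interval(A, Qe, bit_is_mps):
    """Return (new_A, renorm_shifts). Carry propagation and code-byte
    output, which a real coder needs, are omitted for brevity."""
    if bit_is_mps:
        A = A - Qe            # MPS keeps ~A*(1-Qe), approximated as A - Qe
    else:
        A = Qe                # LPS gets ~A*Qe, approximated as Qe (no multiply)
    shifts = 0
    while A < 0x8000:         # renormalise by doubling until A ~ 1.0 again
        A <<= 1
        shifts += 1           # each doubling emits one code bit in a real coder
    return A, shifts
```

Real coders also add conditional exchange (so the MPS never receives the smaller subinterval after the approximation), which the sketch omits.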
4.
5.
6.
7.
By analysing the transmission of securities trading data, this paper points out the importance of compressing it. Based on message data from the Chinese financial industry standard securities trading data exchange protocol (STEP), adaptive binary arithmetic coding is improved and a hybrid arithmetic coding scheme is proposed. Hybrid arithmetic coding exploits the characteristics of STEP messages to perform template/field-conversion-based binarization, and uses independent source models for probability estimation during arithmetic coding. Experimental results show that, compared with adaptive binary arithmetic coding, hybrid arithmetic coding improves the compression ratio by 34% and speeds up encoding and decoding by 10%.
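The "adaptive binary" part works like the context models in CABAC-style coders: each binarized field position gets its own small model tracking how often its bit is 0 versus 1. A hypothetical sketch of such per-field bit models (all names invented for illustration):

```python
class AdaptiveBitModel:
    """Tracks P(bit = 1) for one binarised field position via simple counts."""

    def __init__(self):
        self.count = [1, 1]   # [zeros, ones]; start uniform

    def p_one(self):
        return self.count[1] / (self.count[0] + self.count[1])

    def update(self, bit):
        self.count[bit] += 1

# One model per (template, field) pair, so statistics of unrelated STEP
# fields never pollute each other -- the "independent source models".
models = {}
def model_for(template_id, field_id):
    return models.setdefault((template_id, field_id), AdaptiveBitModel())
```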
8.
Existing research on distributed arithmetic coding is based on lossy compression with known prior probabilities. To achieve probability-adaptive lossless compression, this paper studies an encoding scheme that uses a termination symbol together with adaptive probabilities, and proposes a lossless adaptive distributed arithmetic coding. Experimental results show that the algorithm achieves better compression and lower decoding complexity, and that in practice encoding and decoding can proceed simultaneously. Since lossless adaptive distributed arithmetic coding is simple to encode and compresses well, it is combined with bit-plane coding for hyperspectral image compression; compared with the 3D-SPECK algorithm in simulation, the method improves the signal-to-noise ratio by 0.13-0.37 dB.
9.
《计算机应用与软件》(Computer Applications and Software), 2016(12)
With the development of next-generation biological sequencing technology, very large sequence files are increasingly common. Compressing sequence data reduces storage space, but traditional compression methods struggle to compress large-scale sequences quickly, so shortening compression time is an important direction in current compression research. This work implements arithmetic coding with CUDA, analyses the characteristics of nucleotide sequence data, reports the distribution probabilities of nucleotides in sequence datasets from different species and databases, and proposes and compares three parallel compression methods, showing that the method based on prior probabilities compresses best. Experimental results show that the prior-probability parallel method is not only time-efficient but also maintains a high compression ratio, providing a good solution for fast, efficient compression of large biological sequence files.
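The reason a prior (static) probability table parallelises well is that it removes the sequential dependency an adaptive model creates: every chunk is coded against the same fixed table, so no coder state flows between chunks. A hedged Python sketch of that decomposition, measuring each chunk's ideal code length under the prior rather than running a real coder (all names are illustrative):

```python
import math
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def prior_table(training_seq):
    """Fixed nucleotide probabilities, estimated once up front."""
    counts = Counter(training_seq)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}

def chunk_cost_bits(chunk, priors):
    """Ideal arithmetic-code length of one chunk under the static model.
    Assumes every symbol in the chunk appears in the prior table."""
    return sum(-math.log2(priors[sym]) for sym in chunk)

def total_cost(data, priors, chunk_size=1 << 20, workers=8):
    # Chunks share no coder state, so they can run in any order, on any
    # worker. (Call from under `if __name__ == "__main__":` when the
    # spawn start method is in effect.)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_cost_bits, chunks, [priors] * len(chunks)))
```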
10.
11.
Alistair Moffat 《Software》1999,29(7):647-659
In 1994 Peter Fenwick at the University of Auckland devised an elegant mechanism for tracking the cumulative symbol frequency counts that are required for adaptive arithmetic coding. His structure spends O(log n) time per update when processing the sth symbol in an alphabet of n symbols. In this note we propose a small but significant alteration to this mechanism, and reduce the running time to O(log (1+s)) time per update. If a probability‐sorted alphabet is maintained, so that symbol s in the alphabet is the sth most frequent, the cost of processing each symbol is then linear in the number of bits produced by the arithmetic coder. Copyright © 1999 John Wiley & Sons, Ltd.
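For reference, Fenwick's structure is what is now called a Fenwick tree (binary indexed tree): cumulative-frequency queries and point updates in logarithmic time. A minimal sketch of the classic structure (Moffat's reordering refinement is not reproduced here):

```python
class FenwickTree:
    """Fenwick (binary indexed) tree over symbol frequencies, 1-indexed.
    cumfreq(s) and update(s, delta) both run in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, s, delta=1):
        """Add delta to the frequency of symbol s."""
        while s <= self.n:
            self.tree[s] += delta
            s += s & (-s)        # step to the next node covering s

    def cumfreq(self, s):
        """Sum of frequencies of symbols 1..s (what an arithmetic coder needs)."""
        total = 0
        while s > 0:
            total += self.tree[s]
            s -= s & (-s)        # drop the lowest set bit
        return total
```

The O(log(1+s)) bound in the note comes from keeping the alphabet probability-sorted, so frequent symbols sit at small indices where the traversal loops are short.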
12.
Hung‐Yan Gu 《Software》2005,35(11):1027-1039
In this paper, a large‐alphabet‐oriented scheme is proposed for both Chinese and English text compression. Our scheme parses Chinese text with the alphabet defined by Big‐5 code, and parses English text with some rules designed here. Thus, the alphabet used for English is not a word alphabet. After a token is parsed out from the input text, zero‐, first‐, and second‐order Markov models are used to estimate the occurrence probabilities of this token. Then, the probabilities estimated are blended and accumulated in order to perform arithmetic coding. To implement arithmetic coding under a large alphabet and probability‐blending condition, a way to partition count‐value range is studied. Our scheme has been programmed and can be executed as a software package. Then, typical Chinese and English text files are compressed to study the influences of alphabet size and prediction order. On average, our compression scheme can reduce a text file's size to 33.9% for Chinese and to 23.3% for English text. These rates are comparable with or better than those obtained by popular data compression packages. Copyright © 2005 John Wiley & Sons, Ltd.
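The blending step is the interesting part: higher-order predictions carry more weight when available, with order-0 as the fallback. A simplified sketch of linear blending across orders 0-2 with fixed weights, which is cruder than the paper's scheme (all names invented for illustration):

```python
from collections import Counter, defaultdict

class BlendedModel:
    """Linearly blends order-0/1/2 Markov estimates for the next token."""

    def __init__(self, weights=(0.1, 0.3, 0.6)):
        self.w = weights
        self.o0 = Counter()
        self.o1 = defaultdict(Counter)   # last token      -> next-token counts
        self.o2 = defaultdict(Counter)   # last two tokens -> next-token counts

    def prob(self, history, token, vocab_size):
        def est(counter):
            total = sum(counter.values())
            # Laplace smoothing keeps unseen events at nonzero probability
            return (counter[token] + 1) / (total + vocab_size)
        p0 = est(self.o0)
        p1 = est(self.o1[history[-1:]])   # history is a tuple of tokens
        p2 = est(self.o2[history[-2:]])
        return self.w[0] * p0 + self.w[1] * p1 + self.w[2] * p2

    def update(self, history, token):
        self.o0[token] += 1
        self.o1[history[-1:]][token] += 1
        self.o2[history[-2:]][token] += 1
```

Because each smoothed estimate sums to one over the vocabulary, any convex combination of them does too, so the blend can be fed directly to an arithmetic coder.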
13.
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full-text retrieval environments compression not only saves space, it can also yield faster query processing - a win-win situation.
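Canonical Huffman coding is what enables the fast decompression: only the code lengths need storing, and codes are reassigned in a fixed order. A small sketch of canonical code assignment from known lengths (assuming the lengths came out of an ordinary Huffman construction; names are illustrative):

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes given {symbol: code_length}.
    Symbols are sorted by (length, symbol); each code is the previous
    code plus one, left-shifted whenever the length grows."""
    code = 0
    prev_len = 0
    out = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)   # append zeros when lengths increase
        out[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return out

# Example: lengths from some Huffman run over four words
print(canonical_codes({"the": 1, "of": 2, "said": 3, "zebra": 3}))
# -> {'the': '0', 'of': '10', 'said': '110', 'zebra': '111'}
```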
14.
15.
Juan Antonio Pérez-Ortiz, Jorge Calera-Rubio, Mikel L. Forcada 《Neural Processing Letters》2001,14(2):127-140
Arithmetic coding is one of the most outstanding techniques for lossless data compression. It attains its good performance with the help of a probability model which indicates at each step the probability of occurrence of each possible input symbol given the current context. The better this model, the greater the compression ratio achieved. This work analyses the use of discrete-time recurrent neural networks and their capability for predicting the next symbol in a sequence in order to implement that model. The focus of this study is on online prediction, a task much harder than the classical offline grammatical inference with neural networks. The results obtained show that recurrent neural networks have no problem when the sequences come from the output of a finite-state machine, easily giving high compression ratios. When compressing real texts, however, the dynamics of the sequences seem to be too complex to be learned online correctly by the net.
16.
To improve lossless compression, this work analyses the basic principle of the BWT, reviews and compares the performance of Huffman coding, arithmetic coding, LZ77 and LZW, and then studies BWT in combination with multi-order arithmetic coding and LZW coding. The results show that when a file larger than the BWT block size is first preprocessed with BWT and then compressed, the compression ratio improves markedly.
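For context, the BWT (Burrows-Wheeler transform) itself: sort all rotations of the terminated input and keep the last column, which clusters symbols with similar contexts so a later coder sees long runs. A naive quadratic sketch (real implementations build a suffix array instead; the '\0' sentinel is an assumption):

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform: O(n^2 log n), fine for illustration.
    The unique sentinel marks where the original string starts."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(repr(bwt("banana")))   # 'annb\x00aa' -- the a's have been clustered
```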
17.
Chinese text compression has so far received little attention, yet as a supporting technology for many computer application systems its importance is beyond doubt. This paper surveys current text compression techniques in light of the characteristics of Chinese text, pointing out the average compression ratio theoretically attainable for Chinese text (>3.9) and the level current compression algorithms actually reach (around 1.6). In addition, research directions for Chinese text compression and several typical applications are discussed.
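A back-of-the-envelope reading of those two numbers (my arithmetic, not a derivation given in the paper): in a two-byte character encoding each Chinese character costs 16 bits, so the quoted ratios correspond to effective code lengths of

\[
\frac{16\ \text{bits/char}}{3.9} \approx 4.1\ \text{bits/char},
\qquad
\frac{16\ \text{bits/char}}{1.6} = 10\ \text{bits/char}.
\]

That is, the theoretical bound presumes a model approaching the contextual entropy of Chinese, while the roughly 1.6 achieved in practice corresponds to little better than zeroth-order character statistics (commonly estimated at around 9.6-9.8 bits per character).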