17 similar documents found; search took 46 ms
1.
This paper investigates how to construct an order-0 statistical model for Chinese text and proposes a highly effective Chinese text compression algorithm. With only this most elementary model, the coding efficiency achieved on Chinese text already exceeds that of a hybrid LZ/Huffman algorithm. Since the order-0 statistical model is the foundation of all higher-order statistical models, this work is an important reference for text compression research on Chinese and other large-character-set scripts such as Japanese and Korean.
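For readers unfamiliar with the term, an order-0 model is just a table of per-symbol frequencies with no context at all. A minimal Python sketch of the adaptive bookkeeping an arithmetic coder would consume (illustrative only, not the paper's algorithm; all names are invented):

```python
from collections import Counter

class Order0Model:
    """Adaptive order-0 model: one frequency count per symbol, no context."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.freq = Counter({s: 1 for s in self.alphabet})  # start uniform

    def interval(self, symbol):
        """Cumulative range (low, high, total) that an arithmetic coder consumes."""
        low = 0
        for s in self.alphabet:
            if s == symbol:
                return low, low + self.freq[s], sum(self.freq.values())
            low += self.freq[s]
        raise KeyError(symbol)

    def update(self, symbol):
        self.freq[symbol] += 1   # adapt: the model sharpens as text is seen
```

The linear scan in interval() is the pain point for an alphabet as large as Chinese; the Fenwick-tree structure discussed in entry 11 below replaces it with logarithmic-time cumulative counts.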
2.
3.
High-Performance Arithmetic Coding for Small Alphabets
Building on an improved arithmetic coder, this paper proposes a high-performance arithmetic coding algorithm for small alphabets. Both the coding component and the modeling component are designed specifically for the small-alphabet setting. On the coding side, the improved arithmetic coder is further turned into a multiplication-free arithmetic coder; analysis shows that the code-length redundancy is no greater than the recent results of Printz et al., with coding efficiency approaching 100%. On the modeling side, a fast algorithm for adaptive high-order statistical modeling is proposed. Experimental results show that the algorithm achieves efficient, fast compression of small-alphabet sources.
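The generic trick behind multiplication-free coders (this is the QM/MQ-coder-style approximation, not necessarily this paper's exact design) is to keep the current range A in fixed point close to 1, so the product A x Qe can be approximated by Qe itself; the split then costs one subtraction, and shifts restore A. A hedged sketch of one split step:

```python
# One interval-split step of a multiplication-free binary arithmetic coder.
# A is the range in 16-bit fixed point, kept in [0x8000, 0x10000) so A ~ 1.0
# and A*Qe ~ Qe. Constants and names are illustrative, not from the paper.

def split_interval(A, Qe, bit_is_mps):
    """Return (new_A, renorm_shifts). Carry propagation and code-byte
    output, which a real coder needs, are omitted for brevity."""
    if bit_is_mps:
        A = A - Qe            # MPS keeps ~A*(1-Qe), approximated as A - Qe
    else:
        A = Qe                # LPS gets ~A*Qe, approximated as Qe (no multiply)
    shifts = 0
    while A < 0x8000:         # renormalise by doubling until A ~ 1.0 again
        A <<= 1
        shifts += 1           # each doubling emits one code bit in a real coder
    return A, shifts
```

Real coders also add conditional exchange (so the MPS never receives the smaller subinterval after the approximation), which the sketch omits.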
4.
5.
6.
7.
By analysing the transmission of securities trading data, this paper points out the importance of compressing it. Based on message data from the Chinese financial industry standard securities trading data exchange protocol (STEP), adaptive binary arithmetic coding is improved and a hybrid arithmetic coding scheme is proposed. Hybrid arithmetic coding exploits the characteristics of STEP messages to perform template/field-conversion-based binarization, and uses independent source models for probability estimation during arithmetic coding. Experimental results show that, compared with adaptive binary arithmetic coding, hybrid arithmetic coding improves the compression ratio by 34% and speeds up encoding and decoding by 10%.
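The "adaptive binary" part works like the context models in CABAC-style coders: each binarized field position gets its own small model tracking how often its bit is 0 versus 1. A hypothetical sketch of such per-field bit models (all names invented for illustration):

```python
class AdaptiveBitModel:
    """Tracks P(bit = 1) for one binarised field position via simple counts."""

    def __init__(self):
        self.count = [1, 1]   # [zeros, ones]; start uniform

    def p_one(self):
        return self.count[1] / (self.count[0] + self.count[1])

    def update(self, bit):
        self.count[bit] += 1

# One model per (template, field) pair, so statistics of unrelated STEP
# fields never pollute each other -- the "independent source models".
models = {}
def model_for(template_id, field_id):
    return models.setdefault((template_id, field_id), AdaptiveBitModel())
```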
8.
Existing research on distributed arithmetic coding is based on lossy compression with known prior probabilities. To achieve probability-adaptive lossless compression, this paper studies an encoding scheme that uses a termination symbol together with adaptive probabilities, and proposes a lossless adaptive distributed arithmetic coding. Experimental results show that the algorithm achieves better compression and lower decoding complexity, and that in practice encoding and decoding can proceed simultaneously. Since lossless adaptive distributed arithmetic coding is simple to encode and compresses well, it is combined with bit-plane coding for hyperspectral image compression; compared with the 3D-SPECK algorithm in simulation, the method improves the signal-to-noise ratio by 0.13-0.37 dB.
9.
《计算机应用与软件》(Computer Applications and Software), 2016(12)
With the development of next-generation biological sequencing technology, very large sequence files are increasingly common. Compressing sequence data reduces storage space, but traditional compression methods struggle to compress large-scale sequences quickly, so shortening compression time is an important direction in current compression research. This work implements arithmetic coding with CUDA, analyses the characteristics of nucleotide sequence data, reports the distribution probabilities of nucleotides in sequence datasets from different species and databases, and proposes and compares three parallel compression methods, showing that the method based on prior probabilities compresses best. Experimental results show that the prior-probability parallel method is not only time-efficient but also maintains a high compression ratio, providing a good solution for fast, efficient compression of large biological sequence files.
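The reason a prior (static) probability table parallelises well is that it removes the sequential dependency an adaptive model creates: every chunk is coded against the same fixed table, so no coder state flows between chunks. A hedged Python sketch of that decomposition, measuring each chunk's ideal code length under the prior rather than running a real coder (all names are illustrative):

```python
import math
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def prior_table(training_seq):
    """Fixed nucleotide probabilities, estimated once up front."""
    counts = Counter(training_seq)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}

def chunk_cost_bits(chunk, priors):
    """Ideal arithmetic-code length of one chunk under the static model.
    Assumes every symbol in the chunk appears in the prior table."""
    return sum(-math.log2(priors[sym]) for sym in chunk)

def total_cost(data, priors, chunk_size=1 << 20, workers=8):
    # Chunks share no coder state, so they can run in any order, on any
    # worker. (Call from under `if __name__ == "__main__":` when the
    # spawn start method is in effect.)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_cost_bits, chunks, [priors] * len(chunks)))
```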
10.
11.
Alistair Moffat 《Software》1999,29(7):647-659
In 1994 Peter Fenwick at the University of Auckland devised an elegant mechanism for tracking the cumulative symbol frequency counts that are required for adaptive arithmetic coding. His structure spends O(log n) time per update when processing the sth symbol in an alphabet of n symbols. In this note we propose a small but significant alteration to this mechanism, and reduce the running time to O(log (1+s)) time per update. If a probability‐sorted alphabet is maintained, so that symbol s in the alphabet is the sth most frequent, the cost of processing each symbol is then linear in the number of bits produced by the arithmetic coder. Copyright © 1999 John Wiley & Sons, Ltd.
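For reference, Fenwick's structure is what is now called a Fenwick tree (binary indexed tree): cumulative-frequency queries and point updates in logarithmic time. A minimal sketch of the classic structure (Moffat's reordering refinement is not reproduced here):

```python
class FenwickTree:
    """Fenwick (binary indexed) tree over symbol frequencies, 1-indexed.
    cumfreq(s) and update(s, delta) both run in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, s, delta=1):
        """Add delta to the frequency of symbol s."""
        while s <= self.n:
            self.tree[s] += delta
            s += s & (-s)        # step to the next node covering s

    def cumfreq(self, s):
        """Sum of frequencies of symbols 1..s (what an arithmetic coder needs)."""
        total = 0
        while s > 0:
            total += self.tree[s]
            s -= s & (-s)        # drop the lowest set bit
        return total
```

The O(log(1+s)) bound in the note comes from keeping the alphabet probability-sorted, so frequent symbols sit at small indices where the traversal loops are short.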
12.
Hung‐Yan Gu 《Software》2005,35(11):1027-1039
In this paper, a large‐alphabet‐oriented scheme is proposed for both Chinese and English text compression. Our scheme parses Chinese text with the alphabet defined by Big‐5 code, and parses English text with some rules designed here. Thus, the alphabet used for English is not a word alphabet. After a token is parsed out from the input text, zero‐, first‐, and second‐order Markov models are used to estimate the occurrence probabilities of this token. Then, the probabilities estimated are blended and accumulated in order to perform arithmetic coding. To implement arithmetic coding under a large alphabet and probability‐blending condition, a way to partition count‐value range is studied. Our scheme has been programmed and can be executed as a software package. Then, typical Chinese and English text files are compressed to study the influences of alphabet size and prediction order. On average, our compression scheme can reduce a text file's size to 33.9% for Chinese and to 23.3% for English text. These rates are comparable with or better than those obtained by popular data compression packages. Copyright © 2005 John Wiley & Sons, Ltd.
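The blending step is the interesting part: higher-order predictions carry more weight when available, with order-0 as the fallback. A simplified sketch of linear blending across orders 0-2 with fixed weights, which is cruder than the paper's scheme (all names invented for illustration):

```python
from collections import Counter, defaultdict

class BlendedModel:
    """Linearly blends order-0/1/2 Markov estimates for the next token."""

    def __init__(self, weights=(0.1, 0.3, 0.6)):
        self.w = weights
        self.o0 = Counter()
        self.o1 = defaultdict(Counter)   # last token      -> next-token counts
        self.o2 = defaultdict(Counter)   # last two tokens -> next-token counts

    def prob(self, history, token, vocab_size):
        def est(counter):
            total = sum(counter.values())
            # Laplace smoothing keeps unseen events at nonzero probability
            return (counter[token] + 1) / (total + vocab_size)
        p0 = est(self.o0)
        p1 = est(self.o1[history[-1:]])   # history is a tuple of tokens
        p2 = est(self.o2[history[-2:]])
        return self.w[0] * p0 + self.w[1] * p1 + self.w[2] * p2

    def update(self, history, token):
        self.o0[token] += 1
        self.o1[history[-1:]][token] += 1
        self.o2[history[-2:]][token] += 1
```

Because each smoothed estimate sums to one over the vocabulary, any convex combination of them does too, so the blend can be fed directly to an arithmetic coder.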
13.
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full-text retrieval environments compression not only saves space, it can also yield faster query processing - a win-win situation.
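Canonical Huffman coding is what enables the fast decompression: only the code lengths need storing, and codes are reassigned in a fixed order. A small sketch of canonical code assignment from known lengths (assuming the lengths came out of an ordinary Huffman construction; names are illustrative):

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes given {symbol: code_length}.
    Symbols are sorted by (length, symbol); each code is the previous
    code plus one, left-shifted whenever the length grows."""
    code = 0
    prev_len = 0
    out = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)   # append zeros when lengths increase
        out[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return out

# Example: lengths from some Huffman run over four words
print(canonical_codes({"the": 1, "of": 2, "said": 3, "zebra": 3}))
# -> {'the': '0', 'of': '10', 'said': '110', 'zebra': '111'}
```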
14.
15.
Juan Antonio Pérez-Ortiz, Jorge Calera-Rubio, Mikel L. Forcada 《Neural Processing Letters》2001,14(2):127-140
Arithmetic coding is one of the most outstanding techniques for lossless data compression. It attains its good performance with the help of a probability model which indicates at each step the probability of occurrence of each possible input symbol given the current context. The better this model, the greater the compression ratio achieved. This work analyses the use of discrete-time recurrent neural networks and their capability for predicting the next symbol in a sequence in order to implement that model. The focus of this study is on online prediction, a task much harder than the classical offline grammatical inference with neural networks. The results obtained show that recurrent neural networks have no problem when the sequences come from the output of a finite-state machine, easily giving high compression ratios. When compressing real texts, however, the dynamics of the sequences seem to be too complex to be learned online correctly by the net.
16.
To improve lossless compression, this work analyses the basic principle of the BWT, reviews and compares the performance of Huffman coding, arithmetic coding, LZ77 and LZW, and then studies BWT in combination with multi-order arithmetic coding and LZW coding. The results show that when a file larger than the BWT block size is first preprocessed with BWT and then compressed, the compression ratio improves markedly.
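For context, the BWT (Burrows-Wheeler transform) itself: sort all rotations of the terminated input and keep the last column, which clusters symbols with similar contexts so a later coder sees long runs. A naive quadratic sketch (real implementations build a suffix array instead; the '\0' sentinel is an assumption):

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform: O(n^2 log n), fine for illustration.
    The unique sentinel marks where the original string starts."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(repr(bwt("banana")))   # 'annb\x00aa' -- the a's have been clustered
```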
17.
Chinese text compression has so far received little attention, yet as a supporting technology for many computer application systems its importance is beyond doubt. This paper surveys current text compression techniques in light of the characteristics of Chinese text, pointing out the average compression ratio theoretically attainable for Chinese text (>3.9) and the level current compression algorithms actually reach (around 1.6). In addition, research directions for Chinese text compression and several typical applications are discussed.
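A back-of-the-envelope reading of those two numbers (my arithmetic, not a derivation given in the paper): in a two-byte character encoding each Chinese character costs 16 bits, so the quoted ratios correspond to effective code lengths of

\[
\frac{16\ \text{bits/char}}{3.9} \approx 4.1\ \text{bits/char},
\qquad
\frac{16\ \text{bits/char}}{1.6} = 10\ \text{bits/char}.
\]

That is, the theoretical bound presumes a model approaching the contextual entropy of Chinese, while the roughly 1.6 achieved in practice corresponds to little better than zeroth-order character statistics (commonly estimated at around 9.6-9.8 bits per character).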