Similar Documents
20 similar documents found (search time: 31 ms)
1.
Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence x over a finite alphabet are compressed into binary sequences by some one-to-one mapping. No a priori information about x is available at the encoder, which must therefore adopt a universal data-compression algorithm. It is known that if the universal Lempel-Ziv (LZ) data compression algorithm is successively applied to N-blocks, then the best error-free compression for the particular individual sequence x is achieved as N tends to infinity. The best possible compression that may be achieved by any universal data compression algorithm for finite N-blocks is discussed. It is demonstrated that context-tree coding essentially achieves it. Next, consider a device called a classifier (or discriminator) that observes an individual training sequence x. The classifier's task is to examine individual test sequences of length N and decide whether the test N-sequence has the same features as those that are captured by the training sequence x, or is sufficiently different, according to some appropriate criterion. Here again, it is demonstrated that a particular universal context classifier with a storage-space complexity that is linear in N is essentially optimal. This may contribute a theoretical "individual sequence" justification for the Probabilistic Suffix Tree (PST) approach in learning theory and in computational biology.

2.
Coding theorems for individual sequences
A quantity called the finite-state complexity is assigned to every infinite sequence of elements drawn from a finite set. This quantity characterizes the largest compression ratio that can be achieved in accurate transmission of the sequence by any finite-state encoder (and decoder). Coding theorems and converses are derived for an individual sequence without any probabilistic characterization, and universal data compression algorithms are introduced that are asymptotically optimal for all sequences over a given alphabet. The finite-state complexity of a sequence plays a role similar to that of entropy in classical information theory (which deals with probabilistic ensembles of sequences rather than an individual sequence). For a probabilistic source, the expectation of the finite-state complexity of its sequences is equal to the source's entropy. The finite-state complexity is of particular interest when the source statistics are unspecified.

3.
A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform and then losslessly encoded. In this paper, a greedy grammar transform is first presented; this grammar transform constructs sequentially a sequence of irreducible grammars from which the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms, a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm, are then developed. These algorithms combine the power of arithmetic coding with that of string matching. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source. Moreover, it is proved that their worst case redundancies among all individual sequences of length n are upper-bounded by c log log n/log n, where c is a constant. Simulation results show that the proposed algorithms outperform the Unix Compress and Gzip algorithms, which are based on LZ78 and LZ77, respectively.
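As a concrete illustration of the grammar-transform idea, here is a toy Re-Pair-style transform in Python. This is not the paper's greedy irreducible-grammar transform (and it omits the arithmetic coding stage entirely); it merely shows how a sequence becomes a small grammar from which the original can be reconstructed.

```python
# Toy Re-Pair-style grammar transform: repeatedly replace the most frequent
# adjacent pair with a fresh nonterminal until no pair repeats.
from collections import Counter

def grammar_transform(seq):
    rules = {}                  # nonterminal -> (left symbol, right symbol)
    body = list(seq)
    next_nt = 0
    while True:
        pairs = Counter(zip(body, body[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nt = ("N", next_nt)     # fresh nonterminal
        next_nt += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(body):    # left-to-right replacement of the pair
            if i + 1 < len(body) and (body[i], body[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(body[i])
                i += 1
        body = out
    return body, rules

def expand(sym, rules):
    """Recursively expand a symbol back into terminals."""
    if sym in rules:
        a, b = rules[sym]
        return expand(a, rules) + expand(b, rules)
    return [sym]

start, rules = grammar_transform("abababab")
assert [s for sym in start for s in expand(sym, rules)] == list("abababab")
```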

4.
A new notion of empirical informational divergence (relative entropy) between two individual sequences is introduced. If the two sequences are independent realizations of two finite-order, finite alphabet, stationary Markov processes, the empirical relative entropy converges to the relative entropy almost surely. This empirical divergence is based on a version of the Lempel-Ziv data compression algorithm. A simple universal algorithm for classifying individual sequences into a finite number of classes, which is based on the empirical divergence, is introduced. The algorithm discriminates between the classes whenever they are distinguishable by some finite-memory classifier for almost every given training set and almost any test sequence from these classes. It is universal in the sense that it is independent of the unknown sources.
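The cross-parsing construction behind this divergence admits a compact sketch. The simplified self-parsing rule and the normalization in `zm_divergence` below follow one common reading of the Lempel-Ziv-based estimator and should be treated as assumptions, not the paper's exact definitions.

```python
import math

def lz_phrase_count(z):
    """Simplified LZ self-parsing: each phrase is the shortest prefix of the
    remainder of z that has not appeared in the already-seen prefix."""
    n, i, c = len(z), 0, 0
    while i < n:
        l = 1
        while i + l <= n and z[i:i + l] in z[:i]:
            l += 1
        i += l
        c += 1
    return c

def cross_phrase_count(z, y):
    """Parse z sequentially into the longest phrases that occur anywhere in
    y; an unmatched symbol forms a phrase of length one."""
    i, c = 0, 0
    while i < len(z):
        l = 0
        while i + l < len(z) and z[i:i + l + 1] in y:
            l += 1
        i += max(l, 1)
        c += 1
    return c

def zm_divergence(z, y):
    """Empirical divergence estimate of z with respect to y (bits/symbol);
    the normalization here is an assumed, commonly used form."""
    n, c = len(z), lz_phrase_count(z)
    return (cross_phrase_count(z, y) * math.log2(n) - c * math.log2(c)) / n
```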

5.
In this correspondence, we present a new universal entropy estimator for stationary ergodic sources, prove almost sure convergence, and establish an upper bound on the convergence rate for finite-alphabet finite memory sources. The algorithm is motivated by data compression using the Burrows-Wheeler block sorting transform (BWT). By exploiting the property that the BWT output sequence is close to a piecewise stationary memoryless source, we can segment the output sequence and estimate probabilities in each segment. Experimental results show that our algorithm outperforms Lempel-Ziv (LZ) string-matching-based algorithms.
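A minimal sketch of the pipeline: a naive BWT (sorted rotations), then per-segment empirical entropies averaged over the output. The fixed segment length is a placeholder assumption; the paper's estimator derives its segmentation from the data.

```python
import math
from collections import Counter

def bwt(s):
    """Naive Burrows-Wheeler transform via sorted rotations (O(n^2 log n))."""
    s = s + "\x00"                           # unique end-of-string sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def empirical_entropy(segment):
    """Zeroth-order empirical entropy of a segment, in bits/symbol."""
    counts, n = Counter(segment), len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def bwt_entropy_estimate(s, seg_len=64):
    """Average per-segment entropy of the BWT output, weighted by length."""
    out = bwt(s)
    segments = [out[i:i + seg_len] for i in range(0, len(out), seg_len)]
    return sum(len(g) * empirical_entropy(g) for g in segments) / len(out)
```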

6.
The context-tree weighting method: basic properties
Describes a sequential universal data compression procedure for binary tree sources that performs the "double mixture." Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded-memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. The computational and storage complexity of the proposed procedure are both linear in the source sequence length. The authors derive a natural upper bound on the cumulative redundancy of the method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. The upper bound on the redundancy shows that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.
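The weighting recursion is simple enough to show directly. The sketch below recomputes the weighted block probability of a binary string from its context counts; the paper's procedure is sequential and linear-time, and the all-zero initial context here is a common convention assumed for illustration.

```python
import math
from collections import defaultdict

def kt(a, b):
    """Krichevsky-Trofimov block probability of a zeros and b ones."""
    p = 1.0
    for i in range(a):
        p *= i + 0.5
    for j in range(b):
        p *= j + 0.5
    for k in range(a + b):
        p /= k + 1
    return p

def ctw_probability(x, D=3):
    """Weighted CTW block probability P_w(x) of a binary sequence x."""
    counts = defaultdict(lambda: [0, 0])      # context suffix -> [n0, n1]
    hist = [0] * D + list(x)                  # all-zero past assumed
    for t, sym in enumerate(x):
        ctx = tuple(reversed(hist[t:t + D]))  # most recent symbol first
        for d in range(D + 1):
            counts[ctx[:d]][sym] += 1
    def pw(node):                             # the "double mixture" recursion
        a, b = counts[node]
        pe = kt(a, b)
        if len(node) == D:                    # leaf: KT estimate only
            return pe
        return 0.5 * pe + 0.5 * pw(node + (0,)) * pw(node + (1,))
    return pw(())

# code length in bits for a short example (use log-domain KT for long inputs)
print(-math.log2(ctw_probability([0, 1, 0, 1, 0, 1, 0, 1], D=2)))
```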

7.
MPEG-2 based lossless video compression
The authors describe an efficient algorithm design for lossless video compression by using MPEG-2 as a basic research platform. Starting from MPEG motion estimation and compensation, the proposed algorithm focuses on a context tree design to fine-tune the statistics and thus optimise the estimation of conditional probabilities to drive an arithmetic coder. In comparison with the existing work on context tree design, the proposed algorithm features: (i) prefix sequence matching to locate the statistics model at the internal node nearest to the stopping point, where the successful match of the context sequence is broken; (ii) traversing the context tree along a fixed order of context structure with a maximum number of four motion compensated errors; and (iii) context thresholding to quantise the higher end of error values into a single statistics cluster. As a result, the proposed algorithm is able to achieve competitive processing speed, low computational complexity and high compression performance, which bridges the gap between universal statistics modelling and practical compression techniques. When JPEG-LS and CALIC, the existing state-of-the-art in lossless compression of still images, are applied to those motion compensated error-frames as well as individual non-predicted frames to formulate benchmarks, the authors' experiments illustrate that the proposed algorithm outperforms JPEG-LS by up to 24% and CALIC by up to 22%, yet the processing time ranges from less than 2 s per frame to 6 s per frame on a typical PC computing platform.

8.
Two universal lossy data compression schemes, one with fixed rate and the other with fixed distortion, are presented, based on the well-known Lempel-Ziv algorithm. In the case of fixed rate R, the universal lossy data compression scheme works as follows: first pick a codebook Bn consisting of all reproduction sequences of length n whose Lempel-Ziv codeword length is ≤ nR, and then use Bn to encode the entire source sequence n-block by n-block. This fixed-rate data compression scheme is universal in the sense that for any stationary, ergodic source or for any individual sequence, the sample distortion performance as n → ∞ is given almost surely by the distortion-rate function. A similar result is shown in the context of fixed-distortion lossy source coding.
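A toy rendering of the fixed-rate scheme. The LZ78 codeword-length formula below (about c(log2 c + 1) bits for c phrases) is a standard approximation, assumed here; note that at toy block lengths the LZ78 overhead forces a rate budget above 1 bit/symbol, and the interesting regime R < 1 emerges only as n grows.

```python
import itertools
import math

def lz78_phrases(s):
    """Number of phrases in the LZ78 incremental parsing of s."""
    seen, cur, c = set(), "", 0
    for ch in s:
        cur += ch
        if cur not in seen:
            seen.add(cur)
            cur, c = "", c + 1
    return c + (1 if cur else 0)

def lz78_bits(s):
    """Approximate LZ78 codeword length in bits (assumption for this toy)."""
    c = lz78_phrases(s)
    return c * (math.log2(c) + 1) if c else 0.0

def codebook(n, R):
    """B_n: all length-n binary strings with approximate LZ length <= n*R."""
    return [s for s in ("".join(t) for t in itertools.product("01", repeat=n))
            if lz78_bits(s) <= n * R]

def encode_block(block, B):
    """Minimum-Hamming-distortion codeword in B for one source n-block."""
    return min(B, key=lambda w: sum(a != b for a, b in zip(w, block)))

B = codebook(8, 1.5)      # exponential enumeration: toy block lengths only
print(len(B), encode_block("10110100", B))
```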

9.
Motivated by the evident success of context-tree based methods in lossless data compression, we explore, in this correspondence, methods of the same spirit in universal prediction of individual sequences. By context-tree prediction, we refer to a family of prediction schemes, where at each time instant t, after having observed all outcomes of the data sequence x_1, ..., x_{t-1}, but not yet x_t, the prediction is based on a "context" (or a state) that consists of the k most recent past outcomes x_{t-k}, ..., x_{t-1}, where the choice of k may depend on the contents of a possibly longer, though limited, portion of the observed past, x_{t-k_max}, ..., x_{t-1}. This is different from the study reported in the paper by Feder, Merhav, and Gutman (1992), where general finite-state predictors as well as "Markov" (finite-memory) predictors of fixed order were studied in the regime of individual sequences. Another important difference between this study and the work of Feder, Merhav, and Gutman is the asymptotic regime. While in their work the resources of the predictor (i.e., the number of states or the memory size) were kept fixed regardless of the length N of the data sequence, here we investigate situations where the number of contexts, or states, is allowed to grow concurrently with N. We are primarily interested in the following fundamental question: What is the critical growth rate of the number of contexts, below which the performance of the best context-tree predictor is still universally achievable, but above which it is not? We show that this critical growth rate is linear in N. In particular, we propose a universal context-tree algorithm that essentially achieves optimum performance as long as the growth rate is sublinear, and show that, on the other hand, this is impossible in the linear case.

10.
The sliding-window version of the Lempel-Ziv data-compression algorithm (LZ1) has found many applications recently (e.g., the Stacker program for personal computers and the new Microsoft MS-DOS 6.2). Other versions of the Lempel-Ziv data-compression algorithm (LZ2) became an integral part of international standards for data transmission modems and proved themselves to be highly successful. The purpose of this paper is to give an intuitive overview of universal, noiseless data compression of sequences as well as 2-D images, by following the lines of approach which characterize the family of LZ universal codes and by further extending this approach so as to yield some new results.

11.
A universal finite memory source
An irreducible parameterization for a finite memory source is constructed in the form of a tree machine. A universal information source for the set of finite memory sources is constructed by a predictive modification of an earlier studied algorithm, Context. It is shown that this universal source incorporates any minimal data-generating tree machine in an asymptotically optimal manner in the following sense: the negative logarithm of the probability it assigns to any long typical sequence, generated by any tree machine, approaches that assigned by the tree machine at the best possible rate.

12.
We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either Θ(log N) or Ω(√(ℒ log N)), depending on the loss function, where N is the number of predictors in the comparison class and ℒ is the length of the sequence to be predicted. The former case was shown previously by Vovk (1990); we give a simplified analysis with an explicit closed form for the constant in the minimax regret formula, and give a probabilistic argument that shows this constant is the best possible. Some weak regularity conditions are imposed on the loss function in obtaining these results. We also extend our analysis to the case of predicting arbitrary sequences that take real values in the interval [0,1].
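For the finite comparison class, the classical exponentially weighted average forecaster attains regret of order √(ℒ log N) for bounded convex losses. A minimal sketch, with a standard textbook learning rate rather than anything taken from the paper:

```python
import math

def ewa_total_loss(expert_preds, outcomes, eta=None,
                   loss=lambda p, y: (p - y) ** 2):
    """Exponentially weighted average forecaster over N experts.
    expert_preds[i][t] is expert i's prediction at time t."""
    N, T = len(expert_preds), len(outcomes)
    if eta is None:
        eta = math.sqrt(8 * math.log(N) / T)   # standard tuning, assumed
    w = [1.0] * N
    total = 0.0
    for t in range(T):
        W = sum(w)
        p = sum(w[i] * expert_preds[i][t] for i in range(N)) / W
        total += loss(p, outcomes[t])
        for i in range(N):                     # multiplicative weight update
            w[i] *= math.exp(-eta * loss(expert_preds[i][t], outcomes[t]))
    return total
```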

13.
14.
Better OPM/L Text Compression
An OPM/L data compression scheme suggested by Ziv and Lempel, LZ77, is applied to text compression. A slightly modified version suggested by Storer and Szymanski, LZSS, is found to achieve compression ratios as good as most existing schemes for a wide range of texts. LZSS decoding is very fast, and comparatively little memory is required for encoding and decoding. Although the time complexity of LZ77 and LZSS encoding is O(M) for a text of M characters, straightforward implementations are very slow. The time-consuming step of these algorithms is a search for the longest string match. Here a binary search tree is used to find the longest string match, and experiments show that this results in a dramatic increase in encoding speed. The binary tree algorithm can be used to speed up other OPM/L schemes, and other applications where a longest string match is required. Although the LZSS scheme imposes a limit on the length of a match, the binary tree algorithm will work without any limit.
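A bare-bones LZSS encoder/decoder makes the setting concrete. The naive longest-match scan below is exactly the step the paper accelerates with a binary search tree; the window and match-length parameters are illustrative.

```python
def lzss_encode(data, window=4096, min_match=3, max_match=18):
    """Emit literals or (offset, length) tokens for a string."""
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):   # naive longest-match scan
            l = 0
            while (l < max_match and i + l < len(data)
                   and data[j + l] == data[i + l]):
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= min_match:
            out.append(("match", best_off, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def lzss_decode(tokens):
    buf = []
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):              # handles overlapping copies
                buf.append(buf[-off])
    return "".join(buf)

assert lzss_decode(lzss_encode("abracadabra abracadabra")) == \
       "abracadabra abracadabra"
```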

15.
Universal prediction of individual sequences
The problem of predicting the next outcome of an individual binary sequence using finite memory is considered. The finite-state predictability of an infinite sequence is defined as the minimum fraction of prediction errors that can be made by any finite-state (FS) predictor. It is proven that this FS predictability can be achieved by universal sequential prediction schemes. An efficient prediction procedure based on the incremental parsing procedure of the Lempel-Ziv data compression algorithm is shown to achieve asymptotically the FS predictability. Some relations between compressibility and predictability are discussed, and the predictability is proposed as an additional measure of the complexity of a sequence.
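One way to make the incremental-parsing predictor concrete: walk the LZ78 parse tree, guess the majority continuation at the current node, and update counts. The actual scheme randomizes its guesses to attain the finite-state predictability; this deterministic majority vote is a simplification.

```python
def lz_predict(x):
    """Prediction errors of a deterministic incremental-parsing predictor
    on a binary sequence x (0/1 values)."""
    root = {}                          # bit -> [child_dict, count]
    node, errors = root, 0
    for bit in x:
        if node:
            guess = max(node, key=lambda b: node[b][1])
        else:
            guess = 0                  # no statistics at this node yet
        errors += (guess != bit)
        if bit in node:
            node[bit][1] += 1
            node = node[bit][0]        # continue along the current phrase
        else:
            node[bit] = [{}, 1]
            node = root                # phrase complete: restart at the root
    return errors

print(lz_predict([0, 1] * 32))         # error count falls well below n/2
```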

16.
We consider zero-delay joint source-channel coding of individual source sequences for a general known channel. Given an arbitrary finite set of schemes with finite-memory (not necessarily time-invariant) decoders, a scheme is devised that does essentially as well as the best in the set on all individual source sequences. Using this scheme, we construct a universal zero-delay joint source-channel coding scheme that is guaranteed to achieve, asymptotically, the performance of the best zero-delay encoding-decoding scheme with a finite-state encoder and a Markov decoder, on all individual sequences. For the case where the channel is a discrete memoryless channel (DMC), we construct an implementable zero-delay joint source-channel coding scheme that is based on the "follow the perturbed leader" scheme of György et al. for lossy source coding of individual sequences. Our scheme is guaranteed to attain asymptotically the performance of the best in the set of all encoding-decoding schemes with a "symbol-by-symbol" decoder (and arbitrary encoder), on all individual sequences.
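The "follow the perturbed leader" building block can be sketched in a few lines: at each step, act like the expert whose cumulative loss minus a fresh random perturbation is smallest. The exponential perturbation and its scale below are standard Kalai-Vempala-style choices, assumed here rather than taken from the paper.

```python
import random

def fpl_choices(loss_matrix, eta=1.0):
    """loss_matrix[t][i] = loss of expert i at step t; returns the expert
    chosen at each step by follow-the-perturbed-leader."""
    N = len(loss_matrix[0])
    cum = [0.0] * N
    choices = []
    for losses in loss_matrix:
        # fresh exponential perturbation every round
        perturbed = [cum[i] - random.expovariate(eta) for i in range(N)]
        choices.append(min(range(N), key=lambda i: perturbed[i]))
        for i in range(N):
            cum[i] += losses[i]
    return choices
```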

17.
A universal variable-to-fixed length algorithm for binary memoryless sources which converges to the entropy of the source at the optimal rate is known. We study the problem of universal variable-to-fixed length coding for the class of Markov sources with finite alphabets. We give an upper bound on the performance of the code for large dictionary sizes and show that the code is optimal in the sense that no codes exist that have better asymptotic performance. The optimal redundancy is shown to be H log log M/log M, where H is the entropy rate of the source and M is the code size. This result is analogous to Rissanen's (1984) result for fixed-to-variable length codes. We investigate the performance of a variable-to-fixed coding method which does not need to store the dictionaries, either at the coder or the decoder. We also consider the performance of both these source codes on individual sequences. For individual sequences we bound the performance in terms of the best code length achievable by a class of coders. All the codes that we consider are prefix-free and complete.
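For contrast with the Markov case studied here, the classical variable-to-fixed construction for memoryless sources (Tunstall's algorithm) is easy to state: repeatedly expand the most probable dictionary word until the dictionary holds M words, then map each word to a fixed-length index.

```python
import heapq
import math

def tunstall(probs, M):
    """Tunstall dictionary for a memoryless source.
    probs: dict symbol -> probability; returns the M dictionary words."""
    heap = [(-p, s) for s, p in probs.items()]   # max-heap via negated probs
    heapq.heapify(heap)
    # each expansion removes one leaf and adds |alphabet| leaves
    while len(heap) + len(probs) - 1 <= M:
        negp, word = heapq.heappop(heap)         # most probable leaf
        for s, p in probs.items():               # extend it by one symbol
            heapq.heappush(heap, (negp * p, word + s))
    return sorted(word for _, word in heap)

words = tunstall({"a": 0.7, "b": 0.3}, M=8)
print(words, "index length:", math.ceil(math.log2(len(words))), "bits")
```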

18.
The compression performance of grammar-based codes is revisited from a new perspective. Previously, the compression performance of grammar-based codes was evaluated against that of the best arithmetic coding algorithm with finite contexts. In this correspondence, we first define semifinite-state sources and finite-order semi-Markov sources. Based on the definitions of semifinite-state sources and finite-order semi-Markov sources, and the idea of run-length encoding (RLE), we then extend traditional RLE algorithms to context-based RLE algorithms: RLE algorithms with k contexts and RLE algorithms of order k, where k is a nonnegative integer. For each individual sequence x, let r*_{sr,k}(x) and r*_{sr|k}(x) be the best compression rate given by RLE algorithms with k contexts and by RLE algorithms of order k, respectively. It is proved that for any x, r*_{sr,k}(x) is no greater than the best compression rate among all arithmetic coding algorithms with k contexts. Furthermore, it is shown that there exist stationary, ergodic semi-Markov sources for which the best RLE algorithms without any context outperform the best arithmetic coding algorithms with any finite number of contexts. Finally, we show that the worst case redundancies of grammar-based codes against r*_{sr,k}(x) and r*_{sr|k}(x) among all length-n individual sequences x from a finite alphabet are upper-bounded by d_1 log log n/log n and d_2 log log n/log n, respectively, where d_1 and d_2 are constants. This redundancy result is stronger than all previous corresponding results.
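A sketch of what "RLE with contexts" can look like. The context here is simply the previous run's symbol, and the rate estimate assumes ideal entropy coding of (symbol, run-length) pairs within each context; both choices are illustrative, not the paper's definitions.

```python
import itertools
import math
from collections import defaultdict

def context_rle(seq):
    """Group the runs of seq by context (here: the previous run's symbol)."""
    runs_by_ctx = defaultdict(list)
    ctx = None                           # sentinel context for the first run
    for sym, grp in itertools.groupby(seq):
        runs_by_ctx[ctx].append((sym, sum(1 for _ in grp)))
        ctx = sym
    return runs_by_ctx

def rle_rate_estimate(seq):
    """Empirical bits/symbol if each context's runs were entropy-coded."""
    bits = 0.0
    for ctx, runs in context_rle(seq).items():
        counts = defaultdict(int)
        for r in runs:
            counts[r] += 1
        m = len(runs)
        bits -= sum(c * math.log2(c / m) for c in counts.values())
    return bits / len(seq)

print(rle_rate_estimate("aaaabbbaaaabbb"))   # perfectly periodic runs -> 0.0
```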

19.
Traditional lossless compression algorithms perform poorly on screen images. Based on the characteristics of typical screen images, and taking the LZ4HC (LZ4 High Compression) algorithm as the implementation basis, this paper proposes a string-matching-based lossless compression algorithm for screen images with high performance and low complexity (String Matching with High Performance and Low Complexity, SMHPLC). Compared with traditional dictionary-coding lossless compression algorithms, the new algorithm takes the pixel as the unit of search and matching, jointly optimizes the coding of three parameters — the unmatched-string length, the matched-string length, and the match offset — and applies mapped coding to these parameters. Experimental results show that SMHPLC combines the advantages of high performance and low complexity, greatly reducing coding complexity while improving coding efficiency. Using the AVS2 common test sequences of the moving text and graphics category as test material, for the YUV and RGB formats SMHPLC saves 22.4% and 21.2% in overall bit rate relative to LZ4HC, while reducing encoding complexity by 34.6% and 46.8%, respectively.
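The token structure described above resembles LZ4's literal-run-plus-match format, with pixels rather than bytes as the matching unit. A structural sketch only (SMHPLC's joint and mapped parameter coding is not reproduced; the parameters are illustrative):

```python
def pixel_lz_tokens(pixels, window=1024, min_match=4):
    """Parse a pixel sequence into (literal_run, match_offset, match_length)
    tokens; a final token may carry only literals (offset = length = 0)."""
    i, lit_start, tokens, n = 0, 0, [], len(pixels)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):   # naive pixel-unit search
            l = 0
            while i + l < n and pixels[j + l] == pixels[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= min_match:
            tokens.append((pixels[lit_start:i], best_off, best_len))
            i += best_len
            lit_start = i
        else:
            i += 1
    if lit_start < n:
        tokens.append((pixels[lit_start:], 0, 0))
    return tokens
```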

20.
Inspired by theoretical results on universal modeling, a general framework for sequential modeling of gray-scale images is proposed and applied to lossless compression. The model is based on stochastic complexity considerations and is implemented with a tree structure. It is efficiently estimated by a modification of the universal algorithm Context. Several variants of the algorithm are described. The sequential, lossless compression schemes obtained when the context modeler is used with an arithmetic coder are tested with a representative set of gray-scale images. The compression ratios are compared with those obtained with state-of-the-art algorithms available in the literature, with the results of the comparison consistently favoring the proposed approach.
