Similar Documents
20 similar documents found.
1.
A new notion of empirical informational divergence (relative entropy) between two individual sequences is introduced. If the two sequences are independent realizations of two finite-order, finite-alphabet, stationary Markov processes, the empirical relative entropy converges to the relative entropy almost surely. This empirical divergence is based on a version of the Lempel-Ziv data compression algorithm. A simple universal algorithm for classifying individual sequences into a finite number of classes, which is based on the empirical divergence, is introduced. The algorithm discriminates between the classes whenever they are distinguishable by some finite-memory classifier, for almost every given training set and almost any test sequence from these classes. It is universal in the sense that it is independent of the unknown sources.
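
A minimal Python sketch of the cross-parsing idea behind such an estimator (an illustrative reconstruction under simple assumptions, not the authors' exact definition): parse the test sequence z into the fewest phrases that each occur in the reference sequence x; the phrase count, suitably normalized, estimates the cross entropy, and subtracting a self-parsing estimate of the entropy of z leaves an empirical divergence.

    import math, random

    def cross_parse_count(z, x):
        # Parse z into the fewest phrases, each the longest prefix of the
        # remainder of z that occurs as a substring of x (or a single
        # symbol when even that one symbol is absent from x).
        c, i, n = 0, 0, len(z)
        while i < n:
            j = i + 1
            while j <= n and z[i:j] in x:
                j += 1
            c += 1
            i = max(j - 1, i + 1)
        return c

    def empirical_divergence(z, x):
        # Cross-entropy estimate (bits/symbol) minus a self-parsing
        # entropy estimate for z; both normalizations are illustrative.
        n = len(z)
        h_cross = cross_parse_count(z, x) * math.log2(len(x)) / n
        h_self = cross_parse_count(z[n // 2:], z[:n // 2]) * math.log2(n // 2) / (n - n // 2)
        return h_cross - h_self

    random.seed(0)
    x = ''.join(random.choice('aab') for _ in range(5000))  # P(a) ≈ 2/3
    z = ''.join(random.choice('abb') for _ in range(5000))  # Q(a) ≈ 1/3
    print(empirical_divergence(z, x))                # clearly positive
    print(empirical_divergence(x[2500:], x[:2500]))  # near zero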

2.
In a causal source coding system, the reconstruction of the present source sample is restricted to be a function of the present and past source samples, while the code stream itself may be noncausal and have variable rate. Neuhoff and Gilbert showed that for memoryless sources, optimum performance among all causal source codes is achieved by time-sharing at most two memoryless codes (quantizers) followed by entropy coding. In this work, we extend Neuhoff and Gilbert's result in the limit of small distortion (high resolution) to two new settings. First, we show that at high resolution, an optimal causal code for a stationary source with finite differential entropy rate consists of a uniform quantizer followed by a (sequence) entropy coder. This implies that the price of causality at high resolution is approximately 0.254 bit, i.e., the space-filling loss of the uniform quantizer. Then, we consider individual sequences and introduce a deterministic analogue of differential entropy, which we call "Lempel-Ziv differential entropy." We show that for any bounded individual sequence with finite Lempel-Ziv differential entropy, optimum high-resolution performance among all finite-memory variable-rate causal codes is achieved by dithered scalar uniform quantization followed by Lempel-Ziv coding. As a by-product, we also prove an individual-sequence version of the Shannon lower bound.
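
The 0.254-bit price of causality quoted above is the space-filling loss of the scalar uniform quantizer, (1/2) log2(2πe/12); a one-line numerical check (my arithmetic, not part of the abstract):

    import math
    # High-resolution loss of uniform scalar quantization vs. the Shannon
    # lower bound: 0.5 * log2(2*pi*e/12) ≈ 0.2546 bits per sample.
    print(0.5 * math.log2(2 * math.pi * math.e / 12))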

3.
The Lempel-Ziv data compression algorithm has the property that for finite-memory sources the redundancy ρ_n (defined as the difference between the average code rate and the entropy when the memory size is n) is O(log log n / log n). We suggest a new version of the algorithm with redundancy ρ_n = O(1/log n).
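
For concreteness, a small Python sketch of plain LZ78 incremental parsing (the baseline algorithm, not the paper's improved version): the phrase count c gives a code length of roughly c(log2 c + const) bits, and the per-symbol rate approaches the entropy with the slow gap discussed above.

    import math, random

    def lz78_phrase_count(s):
        # Incremental parsing: each phrase is the longest previously
        # seen phrase extended by one new symbol.
        seen, phrase, c = {""}, "", 0
        for ch in s:
            if phrase + ch in seen:
                phrase += ch
            else:
                seen.add(phrase + ch)
                c += 1
                phrase = ""
        return c + (1 if phrase else 0)

    random.seed(1)
    s = ''.join(random.choice('aaab') for _ in range(100000))  # H ≈ 0.811 bits
    c = lz78_phrase_count(s)
    rate = c * (math.log2(c) + 1) / len(s)  # crude code length per symbol
    print(c, rate)  # rate > H; the gap shrinks only like log log n / log n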

4.
A random number generator generates fair coin flips by processing deterministically an arbitrary source of nonideal randomness. An optimal random number generator generates asymptotically fair coin flips from a stationary ergodic source at a rate of bits per source symbol equal to the entropy rate of the source. Since optimal noiseless data compression codes produce incompressible outputs, it is natural to investigate their capabilities as optimal random number generators. We show under general conditions that optimal variable-length source codes asymptotically achieve optimal variable-length random bit generation in a rather strong sense. In particular, we show in what sense the Lempel-Ziv (1978) algorithm can be considered an optimal universal random bit generator from arbitrary stationary ergodic random sources with unknown distributions.
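
The intuition can be seen with any strong LZ-based compressor; here is a toy check using zlib's DEFLATE (an LZ77-based coder standing in for the LZ78 algorithm the paper analyzes): the output bits produced by compressing a heavily biased source are close to fair.

    import random, zlib

    random.seed(2)
    # Biased i.i.d. source: byte 0 with probability 0.9, byte 1 otherwise.
    data = bytes(0 if random.random() < 0.9 else 1 for _ in range(200000))
    out = zlib.compress(data, 9)
    ones = sum(bin(b).count('1') for b in out)
    print(ones / (8 * len(out)))  # ≈ 0.5: the output bits are nearly fair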

5.
The redundancy problem of the fixed-database Lempel-Ziv (1977) algorithm is considered. It is demonstrated that for a class of φ-mixing sources, which includes Markov sources, unifilar sources, and finite-state sources as special cases, the redundancy of the fixed-database Lempel-Ziv algorithm with database size n is lower-bounded by H (log log n)/log n + O((log log log n)/log n) and upper-bounded by 2H (log log n)/log n + O((log log log n)/log n), where H is the entropy rate of the source. The method of proof is new and uses the concept of variable-length to variable-length codes.

6.
An upper bound on the probability of a sequence drawn from a finite-state source is derived. The bound is given in terms of the number of phrases obtained by parsing the sequence according to the Lempel-Ziv (LZ) incremental parsing rule, and is universal in the sense that it does not depend on the statistical parameters that characterize the source. This bound is used to derive an upper bound on the redundancy of the LZ universal data compression algorithm applied to finite-state sources, which depends on the length N of the sequence, on the number K of states of the source, and, possibly, on the source entropy. A variation of the LZ algorithm is presented, and an upper bound on its redundancy is derived for finite-state sources. A method to derive tighter implicit upper bounds on the redundancy of both algorithms is also given, and it is shown that for the proposed variation this bound is smaller than for the original LZ algorithm, for every value of N and K.

7.
We derive several almost-sure results related to the sliding-window Lempel-Ziv (SWLZ) algorithm. A principal result is a path-wise lower bound on the redundancy whose main term equals (1/2)h·(log2 log2 n_w)/(log2 n_w), where n_w is the sliding-window size and h is the entropy rate. This bound is off by a factor of two from the main term in the lower bound of A. J. Wyner and the work of Yang and Kieffer, which hold in the expected sense for the fixed-database Lempel-Ziv algorithm (FDLZ). Another aspect of the present work studies the asymptotic behavior of the ratio of the number of phrases to the length of the parsed string for any finite sliding-window size; here we exploit the theory of asymptotically mean stationary processes of Gray and Kieffer and some results of Kieffer and Rahe. In all cases it is assumed that the source is stationary, and in the most restrictive case that it is an irreducible and aperiodic Markov chain; some of the results hold for sources that have exponential rates for entropy and, more generally, in the ergodic setting.

8.
Entropy and data compression schemes   (cited by 7: 0 self-citations, 7 by others)
Some new ways of defining the entropy of a process by observing a single typical output sequence, as well as a new kind of Shannon-McMillan-Breiman theorem, are presented. This provides a new and conceptually very simple way of estimating the entropy of an ergodic stationary source, as well as new insight into the workings of such well-known data compression schemes as the Lempel-Ziv algorithm.
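
One well-known estimator in this spirit (a hedged sketch of the match-length idea; the exact estimators in the paper differ): the length L_i of the longest prefix of the sequence starting at position i that already occurs in the first i symbols grows like (log i)/h, so log2(i)/L_i estimates the entropy rate h.

    import math, random

    def longest_match_len(s, i):
        # Longest prefix of s[i:] that occurs somewhere in s[:i].
        L = 0
        while i + L < len(s) and s[i:i + L + 1] in s[:i]:
            L += 1
        return L

    random.seed(3)
    s = ''.join(random.choice('aaab') for _ in range(5000))  # H ≈ 0.811 bits
    pos = range(1000, 2000)
    est = sum(math.log2(i) / max(longest_match_len(s, i), 1) for i in pos) / len(pos)
    print(est)  # in the vicinity of the true entropy rate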

9.
Two universal lossy data compression schemes, one with fixed rate and the other with fixed distortion, are presented, based on the well-known Lempel-Ziv algorithm. In the case of fixed rate R, the universal lossy data compression scheme works as follows: first pick a codebook B_n consisting of all reproduction sequences of length n whose Lempel-Ziv codeword length is at most nR, and then use B_n to encode the entire source sequence n-block by n-block. This fixed-rate data compression scheme is universal in the sense that for any stationary, ergodic source or for any individual sequence, the sample distortion performance as n→∞ is given almost surely by the distortion-rate function. A similar result is shown in the context of fixed-distortion lossy source coding.

10.
The fixed-database version of the Lempel-Ziv algorithm closely resembles many versions that appear in practice. We ascertain several key asymptotic properties of the algorithm as applied to sources with finite memory. First, we determine that for a dictionary of size n, the algorithm achieves a redundancy ρ_n = H (log log n)/log n + o((log log n)/log n), where H is the entropy of the process. This is the first nontrivial lower bound on any Lempel-Ziv-type compression scheme. We then find the limiting distribution and all moments of the lengths of the phrases by comparing them to a random-walk-like variable with well-known behavior.

11.
In this correspondence, we present a new universal entropy estimator for stationary ergodic sources, prove almost-sure convergence, and establish an upper bound on the convergence rate for finite-alphabet, finite-memory sources. The algorithm is motivated by data compression using the Burrows-Wheeler block-sorting transform (BWT). By exploiting the property that the BWT output sequence is close to a piecewise-stationary memoryless source, we can segment the output sequence and estimate probabilities in each segment. Experimental results show that our algorithm outperforms Lempel-Ziv (LZ) string-matching-based algorithms.
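
A toy version of this pipeline in Python (illustrative only: the BWT is computed naively from sorted rotations, and the segments here have fixed length, whereas the paper segments the BWT output adaptively):

    import math, random
    from collections import Counter

    def bwt(s):
        # Burrows-Wheeler transform via sorted rotations; O(n^2 log n),
        # fine for a toy example.
        s = s + '\x00'  # unique sentinel
        return ''.join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))

    def h0(seg):
        # Zeroth-order empirical entropy of a segment (bits/symbol).
        n = len(seg)
        return -sum(c / n * math.log2(c / n) for c in Counter(seg).values())

    def bwt_entropy_estimate(s, seg_len=200):
        t = bwt(s)
        segs = [t[i:i + seg_len] for i in range(0, len(t), seg_len)]
        return sum(len(g) * h0(g) for g in segs) / len(t)

    random.seed(4)
    # First-order Markov source that strongly prefers repeating itself.
    out, x = [], 'a'
    for _ in range(4000):
        x = x if random.random() < 0.9 else ('b' if x == 'a' else 'a')
        out.append(x)
    print(bwt_entropy_estimate(''.join(out)))  # ≈ 0.47 bits/symbol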

12.
For Part I, see ibid., vol. 46, pp. 755-788 (2000). The concept of context-free grammar (CFG)-based coding is extended to the case of countable-context models, yielding context-dependent grammar (CDG)-based coding. Given a countable-context model, a greedy CDG transform is proposed. Based on this greedy CDG transform, two universal lossless data compression algorithms, an improved sequential context-dependent algorithm and a hierarchical context-dependent algorithm, are then developed. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source with a finite alphabet. Moreover, it is proved that these algorithms' worst-case redundancies among all individual sequences of length n from a finite alphabet are upper-bounded by d log log n / log n, as long as the number of distinct contexts grows with the sequence length n in the order O(n^α), where 0 < α < 1 and d are positive constants. It is further shown that for some nonstationary sources, the proposed context-dependent algorithms can achieve better expected redundancies than any existing CFG-based codes, including the Lempel-Ziv (1978) algorithm, the multilevel pattern matching algorithm, and the context-free algorithms in Part I of this series of papers.
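
The CDG transform itself is context-dependent; to convey the flavor of grammar-based coding generally, here is a minimal Re-Pair-style greedy grammar transform in Python (a CFG-based scheme closer to the Part I codes, offered only as an illustration of the family):

    from collections import Counter

    def grammar_transform(seq):
        # Greedily replace the most frequent adjacent pair with a fresh
        # nonterminal until no pair repeats.
        seq, rules, fresh = list(seq), {}, 0
        while len(seq) > 1:
            pair, freq = Counter(zip(seq, seq[1:])).most_common(1)[0]
            if freq < 2:
                break
            nt = ('N', fresh); fresh += 1
            rules[nt] = pair
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(nt); i += 2
                else:
                    out.append(seq[i]); i += 1
            seq = out
        return seq, rules

    start, rules = grammar_transform("abcabcabcabc")
    print(len(start), len(rules))  # a short start string plus a few rules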

13.
The continuous wavelet transform is obtained as a maximum entropy solution of the corresponding inverse problem. It is well known that although a signal can be reconstructed from its wavelet transform, the expansion is not unique due to the redundancy of continuous wavelets. Hence, the inverse problem has no unique solution. If we want to recognize one solution as “optimal,” then an appropriate decision criterion has to be adopted. We show here that the continuous wavelet transform is an “optimal” solution in a maximum entropy sense.
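
For reference, the transform in question is the standard continuous wavelet transform (the formula below is the textbook definition, not taken from the abstract):

\[
W_x(a,b) \;=\; \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt,
\]

and it is precisely the redundancy of the two-parameter family (a, b) that makes the inverse problem underdetermined.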

14.
The Lempel-Ziv codes are universal variable-to-fixed-length codes that have become virtually standard in practical lossless data compression. For any given source output string from a Markov or unifilar source, we upper-bound the difference between the number of binary digits needed to encode the string and the self-information of the string. We use this result to demonstrate that for unifilar or Markov sources, the redundancy of encoding the first n letters of the source output with the Lempel-Ziv incremental parsing rule (LZ'78), the Welch modification (LZW), or a new variant is O((ln n)^(-1)), and we upper-bound the exact form of convergence. We conclude by considering the relationship between the code length and the empirical entropy associated with a string.
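
A compact LZW sketch that makes the final comparison concrete (illustrative; coding each phrase index with ceil(log2 |dictionary|) bits is one simple convention among several):

    import math, random
    from collections import Counter

    def lzw_code_length(s):
        # Encode s with LZW over its own alphabet; return total bits,
        # charging ceil(log2(dictionary size)) bits per emitted index.
        d = {ch: i for i, ch in enumerate(sorted(set(s)))}
        bits, w = 0, ""
        for ch in s:
            if w + ch in d:
                w += ch
            else:
                bits += math.ceil(math.log2(len(d)))
                d[w + ch] = len(d)
                w = ch
        return bits + (math.ceil(math.log2(len(d))) if w else 0)

    random.seed(5)
    s = ''.join(random.choice('aaab') for _ in range(50000))
    n = len(s)
    h_emp = -sum(c / n * math.log2(c / n) for c in Counter(s).values())
    print(lzw_code_length(s) / n, h_emp)  # rate → empirical entropy, gap O(1/ln n)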

15.
The sliding-window version of the Lempel-Ziv data-compression algorithm (LZ1) has found many applications recently (e.g., the Stacker program for personal computers and Microsoft MS-DOS 6.2). Other versions of the Lempel-Ziv data-compression algorithm (LZ2) became an integral part of international standards for data-transmission modems and proved themselves to be highly successful. The purpose of this paper is to give an intuitive overview of universal, noiseless data compression of sequences as well as 2-D images, by following the line of approach that characterizes the family of LZ universal codes and by further extending this approach so as to yield some new results.
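
A minimal sliding-window (LZ1-style) parser in Python, to make the mechanism concrete (a toy: production coders such as those named above add hashing, minimum match lengths, and entropy coding of the tokens):

    def lz77_parse(s, window=4096):
        # Greedy parsing into (offset, match_length, next_char) triples.
        tokens, i = [], 0
        while i < len(s):
            best_len, best_off = 0, 0
            for j in range(max(0, i - window), i):
                L = 0
                while i + L < len(s) - 1 and s[j + L] == s[i + L]:
                    L += 1
                if L > best_len:
                    best_len, best_off = L, i - j
            tokens.append((best_off, best_len, s[i + best_len]))
            i += best_len + 1
        return tokens

    print(lz77_parse("abracadabra abracadabra"))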

16.
A multidimensional incremental parsing algorithm (MDIP) for multidimensional discrete sources, as a generalization of the Lempel-Ziv coding algorithm, is investigated. It consists of three essential component schemes: maximum decimation matching, a hierarchical structure for multidimensional source coding, and dictionary augmentation. As a counterpart of the longest-match search in the Lempel-Ziv algorithm, two classes of maximum decimation matching are studied. Also, the underlying behavior of the dictionary augmentation scheme for estimating the source statistics is examined. For an m-dimensional source, m augmentative patches are appended to the dictionary at each coding epoch, thus requiring the transmission of a substantial amount of information to the decoder. The hierarchical structure of the source coding algorithm resolves this issue by successively incorporating lower dimensional coding procedures in the scheme. With regard to universal lossy source coders, we propose two distortion functions: the local average distortion and the local minimax distortion with a set of threshold levels for each source symbol. For performance evaluation, we implemented three image compression algorithms based upon the MDIP: one lossless and two lossy. The lossless image compression algorithm does not perform better than Lempel-Ziv-Welch coding, but experimentally shows efficiency in capturing the source structure. The two lossy image compression algorithms are implemented using the two distortion functions, respectively. The algorithm based on the local average distortion is efficient at minimizing signal distortion, but the images produced with the local minimax distortion show good perceptual fidelity relative to other compression algorithms. Our insights inspire future research on feature extraction for multidimensional discrete sources.

17.
The authors estimate the order of a finite Markov source based on empirically observed statistics. The performance criterion adopted is to minimize the probability of underestimating the model order while keeping the overestimation probability exponent at a prescribed level. A universal, asymptotically optimal test, in the sense just defined, is proposed for the case where a given integer is known to be an upper bound on the true order. For the case where such a bound is unavailable, an alternative rule based on the Lempel-Ziv data compression algorithm is shown to be asymptotically optimal as well, and computationally more efficient.
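
As a rough illustration of order estimation from empirical statistics (a generic penalized empirical-entropy rule, not the authors' optimal test): choose the order k minimizing the empirical conditional entropy plus a complexity penalty.

    import math, random
    from collections import Counter

    def cond_entropy(s, k):
        # Empirical entropy of the next symbol given the previous k symbols.
        n = len(s) - k
        ctx = Counter(s[i:i + k + 1] for i in range(n))
        marg = Counter(s[i:i + k] for i in range(n))
        return -sum(c / n * math.log2(c / marg[w[:-1]]) for w, c in ctx.items())

    def estimate_order(s, kmax=5, alphabet=2):
        n = len(s)
        def score(k):  # BIC-style penalty on the number of free parameters
            return cond_entropy(s, k) + (alphabet ** k) * (alphabet - 1) * math.log2(n) / (2 * n)
        return min(range(kmax + 1), key=score)

    random.seed(6)
    # Order-2 binary source: next bit = XOR of the last two, flipped w.p. 0.1.
    bits = [0, 1]
    for _ in range(20000):
        b = bits[-1] ^ bits[-2]
        bits.append(b if random.random() < 0.9 else 1 - b)
    print(estimate_order(''.join(map(str, bits))))  # expected output: 2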

18.
Nonasymptotic coding and converse theorems are derived for universal data compression algorithms in cases where the training sequence (“history”) that is available to the encoder consists of the most recent segment of the input data string that has been processed, but is not large enough to yield the ultimate compression, namely, the entropy of the source.

19.
The horizontal inhomogeneity of the atmosphere within a satellite microwave radiometer's field of view (FOV) has always been considered a source of rainfall retrieval errors. The hydrometeor profile retrieval algorithm presented here exploits it to obtain an approximation of a radiative transfer model that allows relatively simple inversion. The atmosphere within the FOV is treated as a combination of horizontally homogeneous domains. Assuming that one of a set of known “basic” hydrometeor profiles occurs in each domain, the inverse problem is reduced to a determination of “beamfilling coefficients.” The online procedure includes determination of the beamfilling coefficients and of a footprint-averaged hydrometeor profile as a linear combination of the “basic” ones. Off-line procedures involve the selection of a minimum number of necessary “basic” brightness-temperature vectors and correction of the “basic” hydrometeor profiles to provide the best retrieval accuracy for a given cloud/radiative simulation. The performance of the algorithm is tested on both numerical simulations and TRMM/TMI data. Numerical simulation has allowed a comparison of the information content of radiometer measurements from SSM/I, TMI, and the future AMSR. The effectiveness of the algorithm is being tested for integrated rain water and rain rate retrievals from TRMM TMI measurements.
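
A sketch of the beamfilling-coefficient step as described (the channel count, the “basic” brightness-temperature vectors, and the nonnegativity constraint below are illustrative assumptions): express the observed footprint-averaged brightness temperatures as a nonnegative linear combination of basic vectors and solve by constrained least squares.

    import numpy as np
    from scipy.optimize import nnls

    # Columns: hypothetical "basic" brightness-temperature vectors (K),
    # one per horizontally homogeneous domain type; rows are channels.
    T_basic = np.array([[180.0, 220.0, 260.0],
                        [200.0, 240.0, 270.0],
                        [150.0, 210.0, 255.0],
                        [170.0, 230.0, 265.0]])
    T_obs = 0.6 * T_basic[:, 0] + 0.3 * T_basic[:, 1] + 0.1 * T_basic[:, 2]

    coeffs, _ = nnls(T_basic, T_obs)  # beamfilling coefficients, >= 0
    print(coeffs)                     # ≈ [0.6, 0.3, 0.1]
    # The footprint-averaged hydrometeor profile is then the same linear
    # combination of the "basic" profiles.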

20.
This paper proposes decomposing the Markov random field model of the residual redundancy in coded images into Markov chains along four directions, and performing joint source-channel decoding by combining this simplified model with the soft output of low-density parity-check (LDPC) decoding. The correlations present simultaneously along multiple directions in the decomposed source are treated as a special, naturally occurring form of channel coding; the forward-backward algorithm, the sum-product algorithm, and the soft output of the channel decoder are used to decode the source symbols serially and in parallel, respectively. Simulation results show that, compared with traditional joint decoding algorithms based on the Markov random field model, the proposed joint decoding algorithm reduces complexity while improving the peak signal-to-noise ratio of the reconstructed images.
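
To make the chain-decoding step concrete, here is a generic forward-backward pass over one directional Markov chain driven by soft channel evidence (a sketch with made-up parameters; the algorithm described above fuses four such directional chains with LDPC soft output):

    import numpy as np

    def forward_backward(llr, trans):
        # Posterior probabilities of binary source symbols on a Markov
        # chain, given per-symbol channel LLRs and a transition matrix.
        n = len(llr)
        like = np.column_stack([np.exp(llr / 2), np.exp(-llr / 2)])
        alpha, beta = np.zeros((n, 2)), np.ones((n, 2))
        alpha[0] = like[0] / like[0].sum()
        for t in range(1, n):                  # forward recursion
            alpha[t] = like[t] * (alpha[t - 1] @ trans)
            alpha[t] /= alpha[t].sum()
        for t in range(n - 2, -1, -1):         # backward recursion
            beta[t] = trans @ (like[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum()
        post = alpha * beta
        return post / post.sum(axis=1, keepdims=True)

    trans = np.array([[0.9, 0.1], [0.1, 0.9]])    # source favors runs
    llr = np.array([2.0, 1.5, -0.2, -2.5, -1.0])  # channel soft output
    print(forward_backward(llr, trans))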
