Similar Articles
20 similar articles found (search time: 78 ms)
1.
Asymptotic properties of data compression and suffix trees
Recently, Wyner and Ziv (see ibid., vol. 35, p. 1250-8, 1989) proved that the typical length of a repeated subword found within the first n positions of a stationary ergodic sequence is (1/h) log n in probability, where h is the entropy of the alphabet. This finding was used to obtain several insights into certain universal data compression schemes, most notably the Lempel-Ziv data compression algorithm. Wyner and Ziv also conjectured that their result can be extended to the stronger almost sure convergence. In this paper, we settle this conjecture in the negative in the so-called right domain asymptotic, that is, during a dynamic phase of expanding the database. We prove, under an additional assumption involving mixing conditions, that the length of a typical repeated subword oscillates almost surely (a.s.) between (1/h1) log n and (1/h2) log n, where 0 < h2 ≤ h1 < ∞. We also show that the length of the nth block in the Lempel-Ziv parsing algorithm exhibits similar behavior. We relate our findings to some problems on digital trees, namely the asymptotic behavior of a (noncompact) suffix tree built from the suffixes of a random sequence. We prove that the height and the shortest feasible path in a suffix tree are typically (1/h2) log n (a.s.) and (1/h1) log n (a.s.), respectively.
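As an illustration (not part of the paper), the repeated-subword quantity can be measured directly. A minimal Python sketch that finds the longest substring occurring at least twice in a random binary prefix and compares it with log2 n, i.e., (1/h) log n for a fair binary source with h = log 2:

```python
import math
import random

def longest_repeat(s: str) -> int:
    """Length of the longest substring occurring at least twice in s."""
    def has_repeat(k: int) -> bool:
        seen = set()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if w in seen:
                return True
            seen.add(w)
        return False

    lo, hi = 0, max(len(s) - 1, 0)
    while lo < hi:  # binary search: has_repeat is monotone in k
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

random.seed(1)
n = 4096
seq = "".join(random.choice("01") for _ in range(n))
# For a fair binary source h = log 2, so (1/h) log n = log2 n = 12 here;
# the longest repeat (suffix-tree height) is typically somewhat larger.
print(longest_repeat(seq), math.log2(n))
```

The naive search is O(n^2) per probe; the point is only to make the (1/h) log n scaling observable on small inputs.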

2.
The fixed-database version of the Lempel-Ziv algorithm closely resembles many versions that appear in practice. We ascertain several key asymptotic properties of the algorithm as applied to sources with finite memory. First, we determine that for a dictionary of size n, the algorithm achieves a redundancy ρn = H log log n / log n + o(log log n / log n), where H is the entropy of the process. This is the first nontrivial lower bound on the redundancy of any Lempel-Ziv-type compression scheme. We then find the limiting distribution and all moments of the lengths of the phrases by comparing them to a random-walk-like variable with well-known behavior.

3.
A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv (1977) data compression scheme. Among other things, we consider the typical length of an approximately repeated pattern within the first n positions of a stationary mixing sequence where D percent of mismatches is allowed. We prove that there exists a constant r0(D) such that the length of such an approximately repeated pattern converges in probability to (1/r0(D)) log n (pr.), but it almost surely oscillates between (1/r-∞(D)) log n and (2/r1(D)) log n, where r-∞(D) > r0(D) > r1(D)/2 are some constants. These constants are natural generalizations of Renyi entropies to the lossy environment. More importantly, we show that the compression ratio of a lossy data compression scheme based on such approximate pattern matching is asymptotically equal to r0(D). We also establish the asymptotic behavior of the so-called approximate waiting time Nl, which is defined as the time until a pattern of length l repeats approximately for the first time. We prove that (log Nl)/l → r0(D) (pr.) as l → ∞. In general, r0(D) > R(D), where R(D) is the rate-distortion function. Thus, for stationary mixing sequences, we settle in the negative the problem investigated by Steinberg and Gutman by showing that a lossy extension of the Wyner-Ziv (1989) scheme cannot be optimal.

4.
Consider approximate (lossy) matching of a source string ~P (i.e., distributed according to P), with a random codebook generated from reproduction distribution Q, at a specified distortion d. Previous work determined the minimum coding rate R1 = R(P, Q, d) for this setting. We observe that for a large word length and with high probability, the matching codeword is typical with a distribution Q1 which is different from Q. If a new random codebook is generated ~Q1, then the source string will favor codewords which are typical with a new distribution Q2, resulting in a minimum coding rate R2 = R(P, Q1, d), and so on. We show that the sequences of distributions Q1, Q2, ... and rates R1, R2, ..., generated by this procedure, converge to an optimum reproduction distribution Q* and the rate-distortion function R(P, d), respectively. We also derive a fixed rate-distortion-slope version of this natural type selection process. In the latter case, an iteration of the process stochastically simulates an iteration of the Blahut-Arimoto (1972) algorithm for rate-distortion function computation (without recourse to prior knowledge of the underlying source distribution). To strengthen these limit statements, we also characterize the steady-state error of these procedures when iterating at a finite string length. Implications of the main results provide fresh insights into the workings of lossy variants of the Lempel-Ziv algorithm for adaptive compression.
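The Blahut-Arimoto iteration that this natural type selection process stochastically simulates can be sketched directly. A minimal fixed-slope implementation (an illustration under assumed inputs, not the authors' code; the binary source with Hamming distortion is an assumed example):

```python
import numpy as np

def blahut_arimoto(p, d_mat, beta, iters=200):
    """Fixed-slope Blahut-Arimoto: returns a (rate, distortion) point on the
    R(D) curve for source distribution p, distortion matrix d_mat, slope beta."""
    q = np.full(d_mat.shape[1], 1.0 / d_mat.shape[1])  # initial reproduction distribution Q
    A = np.exp(-beta * d_mat)
    for _ in range(iters):
        # conditional distribution of the reproduction symbol given the source symbol
        w = q * A
        w /= w.sum(axis=1, keepdims=True)
        q = p @ w  # update the reproduction marginal (the "type selection" step)
    D = float(np.sum(p[:, None] * w * d_mat))
    R = float(np.sum(p[:, None] * w * np.log(w / q)))  # mutual information, in nats
    return R, D

p = np.array([0.5, 0.5])                 # fair binary source (assumed example)
d = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hamming distortion
R, D = blahut_arimoto(p, d, beta=2.0)
print(R, D)  # a point on R(D) = ln 2 - h(D) for this source
```

Each update of q mirrors the abstract's Q → Q1 → Q2 → ... sequence of reproduction distributions converging to Q*.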

5.
This paper establishes the following results concerning the estimation of the power spectrum of a single, deterministic, infinitely long signal. a) If Sx is the signal's power spectral density, correlogram spectral estimates obtained from increasingly longer signal segments tend to Sx * Φ/2π in the L1-norm, where Φ is the Fourier transform of the window used to generate the estimates. b) The L1-norm of Sx − Sx * Φ/2π can be made arbitrarily small by an appropriate choice of window. It is further shown that the accuracy of the spectral estimates generated by a given window is related to a newly introduced function, termed the windowing error kernel, and that this function yields bounds on the asymptotic error of the estimates. As an example, correlogram spectral estimates are used to analyze spectral regrowth in an amplifier.
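A minimal numerical sketch of a windowed correlogram estimate (illustrative only; the Bartlett lag window and the test tone are assumptions, not taken from the paper):

```python
import numpy as np

def correlogram_psd(x, window):
    """Windowed correlogram estimate: FFT of the windowed autocorrelation sequence."""
    n = len(x)
    # biased autocorrelation estimate r[k] for lags k = -(n-1)..(n-1)
    r = np.correlate(x, x, mode="full") / n
    # center the zero lag at index 0 before transforming
    return np.fft.fft(np.fft.ifftshift(r * window))

n = 256
t = np.arange(n)
x = np.cos(2 * np.pi * 0.2 * t)        # deterministic test signal at 0.2 cycles/sample
w = np.bartlett(2 * n - 1)             # lag window (choice is an assumption)
S = correlogram_psd(x, w).real
print(np.argmax(S[:n]))                # peak bin near 0.2 * (2n - 1)
```

Windowing the lag sequence is what produces the Sx * Φ/2π smoothing described in result a).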

6.
7.
The phase transformation and stability of TiSi2 on n+ diffusions are investigated. Narrower n+ diffusions require higher anneal temperatures, or longer anneal times, than wider diffusions for complete transformation from the high-resistivity C49 phase to the low-resistivity C54 phase. A model is presented which explains this in terms of the probability of forming C54 nuclei on narrow diffusions and the influence of diffusion width on C54 grain size. The result is that more C49 and C54 nucleation events are required to completely transform narrow lines. For thin TiSi2 (40 nm), there is a narrow process window for achieving complete transformation without causing agglomeration of the TiSi2. The process window decreases with decreasing silicide thickness. A significantly larger process window is achieved with short-time rapid annealing. Similar studies are performed for CoSi2 on n+ and p+ diffusions. No linewidth dependence is observed for the transformation from CoSix to CoSi2. There is a broad process window from 575°C to 850°C using furnace annealing, for which the low-resistivity phase is obtained without causing agglomeration.

8.
Bounds on the minimum support weights
The minimum support weight, dr(C), of a linear code C over GF(q) is the minimal size of the support of an r-dimensional subcode of C. A number of bounds on dr(C) are derived, generalizing the Plotkin bound and the Griesmer bound, as well as giving two new existential bounds. As the main result, it is shown that there exist codes of any given rate R whose ratio dr/d1 is lower-bounded by a number ranging from (q^r − 1)/(q^r − q^(r−1)) to r, depending on R.
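For reference, the classical Griesmer bound that this paper generalizes is easy to compute (an illustrative sketch of the standard bound, not of the paper's generalized support-weight bounds):

```python
import math

def griesmer_length(k: int, d: int, q: int = 2) -> int:
    """Griesmer bound: any linear [n, k, d] code over GF(q) has
    n >= sum_{i=0}^{k-1} ceil(d / q^i)."""
    return sum(math.ceil(d / q**i) for i in range(k))

# A binary code with k = 4 and d = 8 needs length at least 8 + 4 + 2 + 1 = 15.
print(griesmer_length(4, 8))
```

The support-weight bounds of the paper reduce to this expression in the r = 1 (minimum distance) case.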

9.
Iterative turbo processing between detection and decoding shows near-capacity performance on a multiple-antenna system. Combining iterative processing with optimum front-end detection is particularly challenging because the front-end maximum a posteriori (MAP) algorithm has a computational complexity that is exponential. Suboptimum detectors such as the soft interference cancellation linear minimum mean square error (SIC-LMMSE) detector, with near front-end-MAP performance, have been proposed in the literature. The asymptotic computational complexity of SIC-LMMSE is O(nt^2 nr + nt nr^3 + nt Mc 2^Mc) per detection-decoding cycle, where nt is the number of transmit antennas, nr is the number of receive antennas, and Mc is the modulation size. A lower complexity detector is the hard interference cancellation LMMSE (HIC-LMMSE) detector. HIC-LMMSE has asymptotic complexity of O(nt^2 nr + nt Mc 2^Mc) but suffers extra performance degradation. In this paper, two front-end detection algorithms are introduced that achieve asymptotic computational complexities of O(nt^2 nr + nt nr^2 [Γ(β) + 1] + nt Mc 2^Mc), where Γ(β) is a function with discrete output {-1, 2, 3, ..., nt}, and O(nt Mc 2^Mc), respectively. Simulation results demonstrate that the proposed low-complexity detection algorithms offer exactly the same performance as their full-complexity counterparts in an iterative receiver while being computationally more efficient.

10.
Source coding, large deviations, and approximate pattern matching
We present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the asymptotic equipartition property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case, we review how the AEP underlies the analysis of the Lempel-Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel-Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case, we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. Its relationship with Barron's (1985) and Orey's (1985, 1986) generalized AEP is also discussed. The lossy AEP is applied to (i) prove strengthened versions of Shannon's (1948, 1974) direct source-coding theorem and universal coding theorems; (ii) characterize the performance of "mismatched" codebooks in lossy data compression; (iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel-Ziv schemes); and (iv) determine the first-order asymptotics of waiting times between stationary processes. A refinement to the lossy AEP is then presented, and it is used to (i) prove second-order (direct and converse) lossy source-coding theorems, including universal coding theorems; (ii) characterize which sources are quantitatively easier to compress; (iii) determine the second-order asymptotics of waiting times between stationary processes; and (iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Finally, we discuss extensions of the above framework and results to random fields.

11.
In this paper, we settle a long-standing open problem concerning the average redundancy rn of the Lempel-Ziv'78 (LZ78) code. We prove that for a memoryless source the average redundancy rate attains asymptotically E[rn] = (A + δ(n))/log n + O(log log n/log^2 n), where A is an explicitly given constant that depends on source characteristics, and δ(x) is a fluctuating function with a small amplitude. We also derive the leading term for the kth moment of the number of phrases. We conclude by conjecturing a precise formula for the expected redundancy of a Markovian source. The main result of this paper is a consequence of the second-order properties of the Lempel-Ziv algorithm obtained by Jacquet and Szpankowski (1995). These findings have been established by analytical techniques of the precise analysis of algorithms. We give a brief survey of these results since they are interesting in their own right and shed some light on the probabilistic behavior of pattern-matching-based data compression.
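The number of phrases whose moments are analyzed here comes from the LZ78 incremental parsing rule: each phrase is the longest previously seen phrase extended by one symbol. A minimal sketch (illustrative, not the paper's analysis):

```python
def lz78_phrases(s: str):
    """Incremental (LZ78) parsing of s into distinct phrases."""
    dictionary = {"": 0}          # phrase -> index
    phrases, cur = [], ""
    for ch in s:
        if cur + ch in dictionary:
            cur += ch             # keep extending the current match
        else:
            dictionary[cur + ch] = len(dictionary)
            phrases.append(cur + ch)
            cur = ""
    if cur:
        phrases.append(cur)       # possibly incomplete final phrase
    return phrases

print(lz78_phrases("aababcbabcc"))  # ['a', 'ab', 'abc', 'b', 'abcc']
```

The code length is roughly (number of phrases) × (log of the dictionary size), which is where the redundancy analysis starts.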

12.
Upper and lower bounds on the capacity of a continuous-time additive white Gaussian noise (AWGN) channel with bilevel (±√P) input signals subjected to a minimum inter-transition time (Tmin) constraint are derived. The channel model and input constraints reflect basic features of certain magnetic recording systems. The upper bounds are based on Duncan's relation between the average mutual information in an AWGN regime and the mean-square error (MSE) of an optimal causal estimator. Evaluating or upper-bounding the MSE of suboptimal causal estimators yields the desired upper bounds. The lower bound is found by invoking the extended "Mrs. Gerber's" lemma and asymptotic properties of the entropy of max-entropic bipolar (d, k) codes. Asymptotic results indicate that at low SNR = P Tmin/N0, with N0 designating the one-sided noise power spectral density, the capacity tends to P/N0 nats per second (nats/s), i.e., it equals the capacity in the simplest average-power-limited case. At high SNR, the capacity behaves asymptotically as Tmin^-1 ln(SNR/ln SNR) (nats/s), demonstrating the degradation relative to Tavg^-1 ln SNR, which is the known asymptotic behavior of the capacity with a bilevel average inter-transition time (Tavg) restricted channel input. Additional lower bounds are obtained by considering specific signaling formats such as pulsewidth modulation. The effect of mild channel filtering on the lower bounds on capacity is also addressed, and novel techniques to lower-bound the capacity in this case are introduced.

13.
New converses in the theory of identification via channels
New converses for identification via arbitrary single-user and multiple-access channels, with finite first- and second-type probabilities of error, are developed. For the arbitrary single-user channel, it is shown that the (λ1, λ2)-identification capacity is upper-bounded by the λ-capacity, and the optimistic (λ1, λ2)-identification capacity is upper-bounded by the optimistic λ-capacity, for any λ > λ1 + λ2. The bounds become tight in the limit of vanishing probabilities of error, thus generalizing previous results by Han and Verdu (1992), who showed that the identification capacity is equal to the transmission capacity for channels satisfying the strong converse of the channel coding theorem. A by-product of the new identification converses is a general formula for the optimistic λ-capacity. An outer bound on the (λ1, λ2)-identification capacity region of an arbitrary multiple-access channel is developed. A consequence of this bound is that the identification capacity region is equal to the transmission capacity region for any stationary, finite-memory multiple-access channel. The key tool in proving these bounds is the partial resolvability of a channel, a new notion in resolvability theory, which deals with approximation of the output statistics on a suitably chosen part of the output alphabet. This notion of approximation enables us to get sharp bounds on identification for arbitrary channels, and to extend these bounds to the multiple-access channel.

14.
Two universal lossy data compression schemes, one with fixed rate and the other with fixed distortion, are presented, based on the well-known Lempel-Ziv algorithm. In the case of fixed rate R, the universal lossy data compression scheme works as follows: first pick a codebook Bn consisting of all reproduction sequences of length n whose Lempel-Ziv codeword length is ≤ nR, and then use Bn to encode the entire source sequence n-block by n-block. This fixed-rate data compression scheme is universal in the sense that for any stationary, ergodic source or for any individual sequence, the sample distortion performance as n → ∞ is given almost surely by the distortion-rate function. A similar result is shown in the context of fixed-distortion lossy source coding.

15.
We analyze the cycle-slipping behavior of the Mth-power nondecision-aided (NDA) feedforward carrier synchronizer for M-ary phase-shift keying (MPSK). The averaging filter is a block window accumulator, and postprocessing is used to eliminate equivocation. An asymptotic (high Es/N0) expression for the cycle-slip probability is derived and verified by computer simulation. A "hybrid" approach, combining analytical results with importance sampling simulation techniques, allows for the fast and accurate determination of the cycle-slip probability for all Es/N0 values of practical interest.

16.
The sliding-window version of the Lempel-Ziv data-compression algorithm (sometimes called LZ'77) has been thrust into prominence recently. A version of this algorithm is used in the highly successful "Stacker" program for personal computers. It is also incorporated into Microsoft's MS-DOS 6. Although other versions of the Lempel-Ziv algorithm are known to be optimal in the sense that they compress a data source to its entropy, optimality in this sense has never been demonstrated for this version. In this self-contained paper, we describe the algorithm and show that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy. The proof is surprisingly general, applying to all finite-alphabet stationary ergodic sources.
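The core step of the sliding-window (LZ'77) algorithm is the longest-match search against the window. A naive sketch of that step (illustrative only; practical implementations use hash chains or suffix structures):

```python
def lz77_longest_match(window: str, lookahead: str):
    """Find the longest prefix of `lookahead` that occurs in `window`.
    Returns (backward offset from the end of the window, match length)."""
    for length in range(len(lookahead), 0, -1):
        pos = window.find(lookahead[:length])
        if pos != -1:
            return len(window) - pos, length
    return 0, 0  # no match: the encoder would emit a literal symbol

print(lz77_longest_match("abracadabra", "abrax"))  # matches "abra": (11, 4)
```

Each emitted (offset, length) pair replaces `length` source symbols, and the theorem above says the resulting rate approaches the entropy as the window grows.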

17.
This work describes two algorithms designed for remote calibration of an Nc-element active phased-array antenna. These algorithms involve transmission of N ≥ Nc time-multiplexed orthogonally encoded signals. The received signals are coherently detected, accumulated in vector form, and decoded with the inverse of the orthogonal encoding matrix. The unitary transform encoding (UTE) algorithm is most suited for digital beamforming, as it requires additional encoding hardware for an analog implementation. The control circuit encoding (CCE) algorithm is ideally suited for analog beamformers, as it requires no additional encoding hardware. The CCE method encodes phased-array elemental signals using a Hadamard matrix to control the switching of intrinsic phase-shifter delay circuits. The UTE and CCE algorithms can reduce the average measurement integration times for the complete set of calibration parameters by a factor of approximately Nc relative to the corresponding values for single-element calibration procedures. This is significant for satellite systems, as calibration must be performed in a time window short enough that the process can be treated as stationary. Proofs are given that the orthogonal codes satisfy the mathematical lower bounds for the asymptotic forms of the calibration parameter estimation variances.
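The idea of encoding elemental signals with a Hadamard matrix and decoding with its inverse can be sketched numerically. This is a toy 4-element example with hypothetical complex gains, not the paper's measurement model:

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

# Hypothetical 4-element array: unknown complex element gains to be calibrated.
gains = np.array([1.0 + 0.1j, 0.9 - 0.2j, 1.1 + 0.0j, 0.95 + 0.05j])
H = hadamard(4)
measurements = H @ gains            # each row: coherent sum with +/- element toggles
recovered = np.linalg.inv(H) @ measurements
print(np.allclose(recovered, gains))
```

Because all elements contribute to every measurement, each gain is effectively averaged over Nc looks, which is the source of the ~Nc integration-time reduction.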

18.
In a causal source coding system, the reconstruction of the present source sample is restricted to be a function of the present and past source samples, while the code stream itself may be noncausal and have variable rate. Neuhoff and Gilbert showed that for memoryless sources, optimum performance among all causal source codes is achieved by time-sharing at most two memoryless codes (quantizers) followed by entropy coding. In this work, we extend Neuhoff and Gilbert's result in the limit of small distortion (high resolution) to two new settings. First, we show that at high resolution, an optimal causal code for a stationary source with finite differential entropy rate consists of a uniform quantizer followed by a (sequence) entropy coder. This implies that the price of causality at high resolution is approximately 0.254 bit, i.e., the space-filling loss of the uniform quantizer. Then, we consider individual sequences and introduce a deterministic analogue of differential entropy, which we call "Lempel-Ziv differential entropy." We show that for any bounded individual sequence with finite Lempel-Ziv differential entropy, optimum high-resolution performance among all finite-memory variable-rate causal codes is achieved by dithered scalar uniform quantization followed by Lempel-Ziv coding. As a by-product, we also prove an individual-sequence version of the Shannon lower bound.
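The dithered scalar uniform quantizer at the heart of the individual-sequence result can be sketched as follows (subtractive dither; an illustration, not the paper's full coding system):

```python
import numpy as np

def dithered_uniform_quantize(x, step, rng):
    """Subtractive dithered scalar quantization: the reconstruction error is
    uniform on [-step/2, step/2] and independent of the input."""
    z = rng.uniform(-step / 2, step / 2, size=np.shape(x))  # dither shared with the decoder
    q = step * np.round((x + z) / step)  # uniform quantizer applied to the dithered input
    return q - z                         # decoder subtracts the same dither

rng = np.random.default_rng(0)
x = rng.normal(size=10000)
step = 0.5
err = dithered_uniform_quantize(x, step, rng) - x
print(float(np.max(np.abs(err))), float(np.var(err)))  # bounded by step/2; variance near step^2/12
```

The quantizer indices would then be fed to a Lempel-Ziv coder; the step^2/12 error variance is where the 0.254-bit space-filling loss comes from.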

19.
Irregular sampling theorems for wavelet subspaces
From the Paley-Wiener 1/4-theorem, the finite-energy signal f(t) can be reconstructed from its irregularly sampled values f(k + δk) if f(t) is band-limited and sup_k |δk| < 1/4. We consider signals in wavelet subspaces and wish to recover the signals from their irregular samples by using scaling functions. The method of estimating the upper bound of sup_k |δk| such that the irregularly sampled signals can be recovered is therefore very important. Following the work done by Liu and Walter (see J. Fourier Anal. Appl., vol. 2, no. 2, p. 181-9, 1995), we present an algorithm which can estimate a proper upper bound of sup_k |δk|. Compared to the Paley-Wiener 1/4-theorem, this theorem can relax the upper bound for sampling in some wavelet subspaces.

20.
We present capacity-achieving multilevel run-length-limited (ML-RLL) codes that can be decoded by a sliding window of size 2.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)