Similar Literature
Found 20 similar documents (search time: 421 ms)
1.
A string dictionary is a basic tool for storing a set of strings in many kinds of applications. Recently, many applications have needed space-efficient dictionaries to handle very large datasets. In this paper, we propose new compressed string dictionaries based on improved double-array tries. The double-array trie is a data structure that supports extremely fast string lookup, but its space efficiency is low. We introduce techniques to overcome this disadvantage. Experimental evaluations show that our dictionaries provide the fastest lookup among state-of-the-art compressed string dictionaries, while their space efficiency remains competitive in many cases.
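The double-array idea behind this entry can be illustrated with a minimal, uncompressed sketch (not the paper's improved variant): a trie is packed into two parallel arrays BASE and CHECK, and the transition from state s on character c lands at t = BASE[s] + c, valid only if CHECK[t] == s. The array size and builder below are illustrative.

```python
from collections import deque

def build_double_array(keys):
    """Build toy BASE/CHECK arrays for a trie over `keys`."""
    trie = {0: {}}                 # node id -> {char code: child node id}
    terminal, next_id = set(), 1
    for key in keys:
        node = 0
        for ch in key:
            c = ord(ch)
            if c not in trie[node]:
                trie[node][c] = next_id
                trie[next_id] = {}
                next_id += 1
            node = trie[node][c]
        terminal.add(node)
    size = 4096                    # illustrative fixed capacity
    base, check = [0] * size, [-1] * size
    pos = {0: 0}                   # trie node -> double-array state
    q = deque([0])
    while q:
        node = q.popleft()
        children = sorted(trie[node])
        if not children:
            continue
        b = 1                      # smallest BASE with no slot collisions
        while any(check[b + c] != -1 for c in children):
            b += 1
        base[pos[node]] = b
        for c in children:
            child = trie[node][c]
            check[b + c] = pos[node]
            pos[child] = b + c
            q.append(child)
    return base, check, {pos[n] for n in terminal}

def lookup(key, base, check, term_states):
    """Follow t = BASE[s] + c; the move is valid iff CHECK[t] == s."""
    s = 0
    for ch in key:
        t = base[s] + ord(ch)
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return s in term_states
```

Lookup cost is one array probe per character, which is why double-array tries are among the fastest dictionary structures; the space problem the paper attacks comes from the empty slots the naive packing leaves behind.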

2.
Suffix trees are among the most important data structures in stringology, with a number of applications in flourishing areas like bioinformatics. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. A smaller suffix tree representation could fit in a faster memory, outweighing by far the theoretical slowdown brought by the space reduction. We present a novel compressed suffix tree, which is the first achieving at the same time sublogarithmic complexity for the operations, and space usage that asymptotically goes to zero as the entropy of the text does. The main ideas in our development are compressing the longest common prefix information, totally getting rid of the suffix tree topology, and expressing all the suffix tree operations using range minimum queries and a novel primitive called next/previous smaller value in a sequence. Our solutions to those operations are of independent interest.
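The next-smaller-value primitive named above has a classic O(n) stack-based solution for the static, all-positions variant, sketched here (the paper's contribution is answering such queries directly over a compressed sequence, which is considerably more involved):

```python
def next_smaller_values(a):
    """For each i, index of the next j > i with a[j] < a[i], or len(a) if none.

    One left-to-right pass: the stack holds indices whose NSV is still
    unknown; their values are non-decreasing from bottom to top.
    """
    n = len(a)
    nsv = [n] * n
    stack = []
    for j in range(n):
        while stack and a[stack[-1]] > a[j]:
            nsv[stack.pop()] = j   # a[j] is the first smaller value to the right
        stack.append(j)
    return nsv
```

Applied to the LCP array of a text, this primitive (together with range minimum queries) suffices to simulate suffix-tree navigation, which is what lets the authors discard the explicit tree topology.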

3.
Sparse representation and dictionary learning have attracted much research attention over the last couple of decades and have provided state-of-the-art results in many fields such as denoising, classification, inpainting and compression. However, general dictionary learning methods such as the Method of Optimal Directions and the Recursive Least Squares Dictionary Learning Algorithm can be computationally expensive, owing to the large number of free variables to be learned. Moreover, the signal class sometimes has an obviously repetitive structure that could benefit from a structured dictionary. One way to deal with these shortcomings is to impose a structure on the dictionary itself: for example, the dictionary can be sparse or the atoms can be shift-invariant. In practice, imposing a structure means limiting the number of free variables. There are many examples of structured dictionaries, such as the double sparsity model or shift-invariant dictionaries. We have recently proposed a closed-form solution for imposing arbitrary structures onto a dictionary, called Flexible Structure Dictionary Learning. In this paper, we use this method to impose a shift-invariant structure when training a dictionary. This structure allows us not only to simplify the original solution, making it computationally feasible for large signals, but also to extend the concept of shift-invariance to variable-sized shifts in different atoms. The proposed dictionary update step finds all the free variables in all the atoms jointly, whereas some shift-invariant structured dictionaries in the recent literature update one atom at a time. We have compared the proposed method with a general dictionary learning method and another shift-invariant method. Results show that signal approximation is a promising application.
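The sparse-coding step that dictionary learning methods alternate with can be illustrated by plain (non-orthogonal) matching pursuit: greedily pick the atom most correlated with the residual, subtract its contribution, repeat. Atoms are assumed unit-norm; the dictionary below is a toy example, not one learned by the methods in the abstract.

```python
def matching_pursuit(dictionary, signal, n_atoms):
    """Greedy sparse coding: returns {atom index: coefficient} and the residual."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_atoms):
        best, best_c = None, 0.0
        for j, atom in enumerate(dictionary):      # atoms assumed unit-norm
            c = sum(a * r for a, r in zip(atom, residual))
            if abs(c) > abs(best_c):
                best, best_c = j, c
        if best is None:                           # residual orthogonal to all atoms
            break
        residual = [r - best_c * a for a, r in zip(dictionary[best], residual)]
        coeffs[best] = coeffs.get(best, 0.0) + best_c
    return coeffs, residual
```

Dictionary learning wraps a loop like this: code all training signals sparsely, then update the dictionary atoms to reduce the residuals, which is the step the structured methods above constrain.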

4.
Computing a sparse representation (SR) over an exemplar dictionary is time consuming and computationally expensive for large dictionary sizes, and storing the dictionary requires a large amount of memory. To reduce latency and achieve some diversity, an ensemble of exemplar-dictionary-based language identification (LID) systems is explored. Full diversity is obtained if each exemplar dictionary contains only one feature vector from each language class; achieving it requires a large number of dictionaries, and hence computing the SR of a test utterance as many times. Another way to reduce latency is to use a learned dictionary. However, a learned dictionary may contain an unequal number of atoms per class, and there is no guarantee that every language class is represented; this depends entirely on the amount of data and its variation. Motivated by this, a language-specific dictionary is learned for each class, and these are concatenated to form a single learned dictionary. Furthermore, to overcome the drawbacks of the ensemble exemplar-dictionary LID system, we investigate an ensemble of learned-exemplar-dictionary LID systems. The proposed approach achieves the same diversity and latency as the ensemble of exemplar dictionaries with a reduced number of learned dictionaries. The proposed techniques are applied to two spoken utterance representations, the i-vector and the JFA latent vector. Experiments are performed on the 2007 NIST LRE, 2009 NIST LRE and AP17-OLR datasets in the closed-set condition.

5.
Sparse coding has recently been a hot topic in visual tasks in image processing and computer vision. It has applications and brings benefits in reconstruction-like tasks as well as in classification-like tasks. However, for binary classification problems there are several choices for learning and using dictionaries that have not been studied. In particular, how single-dictionary and dual-dictionary approaches compare in terms of classification performance is largely unexplored. We compare three single-dictionary strategies and two dual-dictionary strategies for the problem of pedestrian classification ("pedestrian" vs. "background" images). In each of these five cases, images are represented as the sparse coefficients induced by the respective dictionaries, and these coefficients are the input to a regular classifier both for training and for the subsequent classification of novel unseen instances. Experimental results with the INRIA pedestrian dataset suggest, on the one hand, that dictionaries learned from only one of the classes, even the background class, are enough to obtain competitive classification performance. On the other hand, while better performance is generally obtained when instances of both classes are used for dictionary learning, the representation induced by a single dictionary learned from instances of both classes provides comparable or even superior performance to the representations induced by two dictionaries learned separately from the pedestrian and background classes.

6.
7.
Face recognition has attracted extensive interest due to its wide applications. However, there are many challenges in real-world scenarios. For example, relatively few samples may be available for training, and face images collected from surveillance cameras may exhibit complex variations (e.g. illumination, expression, occlusion and pose). To address these challenges, in this paper we propose learning class-specific and intra-class variation dictionaries separately. Specifically, we first develop a discriminative class-specific dictionary that amplifies the differences between training classes. We impose a constraint on the sparse coefficients which guarantees that the sparse representation coefficients have small within-class scatter and large between-class scatter. Moreover, we introduce a new intra-class variation dictionary based on the assumption that similar variations from different classes may share some common features. The intra-class variation dictionary not only captures the inner relationships among variations, but also addresses the limitation of manually designed dictionaries that are person-specific. Finally, we apply the combined dictionary to adaptively represent face images. Experiments conducted on the AR, CMU-PIE, FERET and Extended Yale B databases show the effectiveness of the proposed method.

8.
The representation of large subsets of the World Wide Web in the form of a directed graph has been extensively used to analyze the structure, behavior, and evolution of these so-called Web graphs. However, interesting Web graphs are very large and their classical representations do not fit into the main memory of typical computers, whereas the required graph algorithms perform inefficiently on secondary memory. Compressed graph representations drastically reduce space requirements while allowing efficient navigation in compressed form. While the most basic navigation operation is retrieving the successors of a node, several important Web graph algorithms require support for extended queries, such as finding the predecessors of a node, checking the presence of a link, or retrieving links between ranges of nodes. These are seldom supported by compressed graph representations. This paper presents the k2-tree, a novel Web graph representation based on a compact tree structure that takes advantage of the large empty areas of the graph's adjacency matrix. The representation not only retrieves successors and predecessors in symmetric fashion, but is also particularly efficient at checking for specific links between nodes or between ranges of nodes, and at listing the links between ranges. Compared to the best representations in the literature supporting successor and predecessor queries, our technique offers the lowest space usage (1–3 bits per link) while supporting fast navigation to predecessors and successors (28 μs per neighbor retrieved) and sharply outperforming the others on the extended queries. The representation is also of general interest and can be used to compress other kinds of graphs and data structures.
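The k2-tree construction and navigation can be sketched in a toy, uncompressed form: the adjacency matrix is split level by level into k×k grids of submatrices, one bit per submatrix, and a link query descends by rank on the internal bitmap T until it hits the leaf bitmap L. Real implementations use succinct rank/select structures; here rank is computed naively, and n is assumed to be a power of k.

```python
def build_k2_tree(matrix, k=2):
    """Level-order bitmaps: T = internal levels concatenated, L = leaf level."""
    n = len(matrix)
    levels, current = [], [(0, 0, n)]   # submatrices (row, col, size) per level
    while current:
        bits, nxt = [], []
        for (r, c, size) in current:
            sub = size // k
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i * sub, c + j * sub
                    one = any(matrix[x][y]
                              for x in range(rr, rr + sub)
                              for y in range(cc, cc + sub))
                    bits.append(1 if one else 0)
                    if one and sub > 1:          # only nonempty areas get children
                        nxt.append((rr, cc, sub))
        levels.append(bits)
        current = nxt
    return [b for lvl in levels[:-1] for b in lvl], levels[-1]

def has_link(T, L, n, p, q, k=2):
    """Test link (p, q) by descending one child per level."""
    B = T + L
    x, size = 0, n
    while True:
        size //= k
        i = k * (p // size) + (q // size)     # child quadrant holding (p, q)
        if not B[x + i]:
            return False                       # empty area: no link
        if x + i >= len(T):                    # bit lives in the leaf bitmap L
            return True
        x = sum(T[:x + i + 1]) * k * k         # children start after rank1(T, x+i) groups
        p, q = p % size, q % size
```

The large empty areas of a Web graph's adjacency matrix collapse into single 0 bits near the root, which is where the 1–3 bits-per-link figure comes from.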

9.
10.
Dictionary learning algorithms for sparse representation

11.
The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. Existing algorithms that solve the 2-dimensional dictionary matching problem all require working space proportional to the size of the dictionary. This paper presents the first efficient 2-dimensional dictionary matching algorithm that operates in small space. Given d patterns, D = {P1, …, Pd}, each of size m×m, and a text T of size n×n, our algorithm finds all occurrences of Pi, 1 ≤ i ≤ d, in T. The preprocessing of the dictionary forms a compressed self-index of the patterns, after which the original dictionary may be discarded. Our algorithm uses O(dm log dm) extra bits of space. The time complexity of our algorithm is close to linear, O(dm² + n²τ log σ), where τ is the time it takes to access a character in the compressed self-index and σ is the size of the alphabet. Using recent results, τ is at most sub-logarithmic.
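For contrast with this 2-D small-space result, the classic 1-D dictionary-matching baseline is the Aho–Corasick automaton, which also scans the text once for all patterns simultaneously but stores the whole dictionary as an explicit trie. A compact sketch:

```python
from collections import deque

def aho_corasick(patterns):
    """Build the goto/fail/output automaton over the pattern trie."""
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(idx)
    q = deque(goto[0].values())            # depth-1 states keep fail = 0
    while q:                               # BFS so fail links of shallower states exist
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][ch] if ch in goto[f] and goto[f][ch] != t else 0
            out[t] |= out[fail[t]]         # inherit matches ending at the fail state
    return goto, fail, out

def search(text, patterns):
    """Return (start position, pattern index) for every occurrence in `text`."""
    goto, fail, out = aho_corasick(patterns)
    hits, s = [], 0
    for pos, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return hits
```

The trie plus fail links is exactly the dictionary-sized working space that the paper's compressed self-index replaces.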

12.
To improve the efficiency of text data storage in Symbian S60 databases while keeping database applications responsive, this paper studies the characteristics of such databases and the "dictionary code" compression algorithm, and proposes record-level text compression for the database: the dictionary embedded in files compressed by the "dictionary code" algorithm is extracted and stored and maintained independently. With this method, the compressed data in a record is decompressed only when the application actually accesses that record, so the database responds quickly, occupies less memory, and loads faster at application startup. The method is particularly advantageous for database applications with short records and large volumes of text data.

13.
Positional ranking functions, widely used in web search engines and related search systems, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time–space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data, aiming to answer the question of whether positional data should be indexed, and how. We show that there is a wide range of practical time–space trade-offs. Moreover, we show that using about 1.30 times the space of the positional data, we can store everything needed for efficient query processing, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which use about half the space of current state-of-the-art alternatives with little impact on query processing time.
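The positional data at the heart of this trade-off can be illustrated by what it enables: phrase matching by intersecting per-document position lists. The postings layout below is a plain in-memory sketch, not the paper's compressed representation.

```python
def phrase_hits(postings, terms):
    """Documents and start positions where `terms` occur consecutively.

    `postings` maps term -> {doc id: sorted list of word positions}.
    """
    # Candidate documents must contain every term.
    docs = set.intersection(*(set(postings[t]) for t in terms))
    hits = {}
    for d in sorted(docs):
        # A phrase starts at p if term i appears at p + i for every i.
        starts = [p for p in postings[terms[0]][d]
                  if all(p + i in set(postings[t][d])
                         for i, t in enumerate(terms))]
        if starts:
            hits[d] = starts
    return hits
```

A nonpositional index stores only the document sets, which is why dropping or compressing the position lists is where the roughly 3x space overhead can be recovered.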

14.
王成语, 李伟红. Journal of Computer Applications (计算机应用), 2011, 31(8): 2115-2118
The difficulty in sparse representation of faces over an over-complete dictionary lies in constructing the dictionary. To address this problem, the dual-density dual-tree complex wavelet transform (DD-DT CWT) is first applied to extract high-frequency subbands of face images at different scales, and the subbands with larger energy are then selected, according to the criterion of the most even energy distribution, to form an over-complete dictionary at the corresponding scale. Meanwhile, the DD-DT CWT subband features of a test sample are regarded as a linear combination of atoms from the over-complete dictionary, and recognition is performed by combining the sparse representations over multiple dictionaries. Experiments on the AR face database show that the method is an effective approach to face feature representation and classification.

15.
From compressed sensing to low-rank matrix recovery: theory and applications
This paper surveys the basic theory and typical applications of compressed sensing, matrix rank minimization, and low-rank matrix recovery. Compressed sensing based on convex optimization, together with the matrix rank minimization and low-rank matrix recovery problems derived from it, has been a research hotspot in recent years, with important and successful applications in signal processing, recommender systems, high-dimensional data analysis, image processing, computer vision, and many other fields. These applications typically involve analyzing and processing high-dimensional data, and must fully and properly exploit properties of the data such as sparsity, or the low rank of the matrices formed from it. Although minimizing objective functions such as sparsity or matrix rank is NP-hard in the worst case, under certain reasonable assumptions the optimal solution of the original problem can be recovered exactly by replacing the objective with its convex relaxation and applying convex optimization; many efficient convex optimization algorithms exist for this purpose and scale to large problems. This paper first reviews the basic theory of compressed sensing, matrix rank minimization, and low-rank matrix recovery, then introduces by example their typical applications in image processing, computer vision, and computational photography, and concludes with an outlook on future research in these areas.
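The convex-relaxation strategy this survey describes can be illustrated on the l1 case: ISTA (iterative soft-thresholding) minimizes 0.5·||Ax − b||² + λ·||x||₁ by alternating a gradient step with the soft-thresholding proximal operator. A pure-Python toy sketch; the matrix, step size, iteration count and tolerances below are illustrative, and the step size must be below 1/L for L the largest eigenvalue of AᵀA.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def ista(A, b, lam, step, iters=3000):
    """Iterative soft-thresholding for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [ai - bi for ai, bi in zip(matvec(A, x), b)]   # residual Ax - b
        g = matvec(At, r)                                   # gradient A^T (Ax - b)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                           step * lam)
    return x
```

On an underdetermined system with a sparse ground truth, the l1 term drives the spurious coordinates to exactly zero, which is the phenomenon the recovery guarantees in the survey formalize.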

16.
Super-resolution (SR) of remote sensing images is significant for improving the accuracy of target identification and for image fusion. Conventional fusion-based methods inevitably distort spectral information; a feasible solution to this problem is single-image super-resolution. In this work, we propose a single-image approach to super-resolution of multiband remote sensing images. The method combines EMD (Empirical Mode Decomposition), compressed sensing and PCA for dictionary learning and super-resolution reconstruction of remote sensing color images. First, the original image is decomposed by EMD into a series of IMFs (Intrinsic Mode Functions) according to their frequency components, and super-resolution is applied only to IMF1, which contains the high-frequency component. Then the K-SVD algorithm is used to learn overcomplete dictionaries, and the OMP (Orthogonal Matching Pursuit) algorithm is used to reconstruct IMF1. Finally, the up-scaled IMF1 is combined with the other IMFs to obtain the super-resolved image. For multiband image reconstruction, a PCA transform is first applied to the multiband image, the first principal component (PC1) is used to learn the overcomplete dictionaries, and the obtained dictionaries are then used for super-resolution reconstruction of each spectral band. Geoeye-1 panchromatic and multispectral images are used as experimental data to demonstrate the effectiveness of the proposed algorithm. The results show that the proposed method effectively recovers detail within the images.

17.
High-resolution (HR) infrared (IR) images play an important role in many areas. However, it is difficult to obtain images at a desired resolution level because of hardware limitations and the imaging environment, so improving the spatial resolution of infrared images has become increasingly urgent. Methods based on sparse coding have been successfully used in single-image super-resolution (SR) reconstruction. However, existing sparse-representation-based SR methods for IR images usually encounter three problems. First, IR images often lack detailed information, which leads to unsatisfying reconstruction results with conventional methods. Second, existing dictionary learning methods in SR aim at learning a universal and over-complete dictionary to represent various image structures; however, a large number of different structural patterns exist in an image, and one dictionary is not capable of capturing all of them. Finally, the optimization for dictionary learning and image reconstruction requires highly intensive computation, which restricts practical application in real-time systems. To overcome these problems, we propose a fast IR image SR scheme. First, we integrate information from visible (VI) images and IR images to improve the resolution of IR images, because images acquired by different sensors provide complementary information about the same scene. Second, we divide the training patches into several clusters and learn a dictionary for each cluster, in order to provide each patch with a more accurate dictionary. Finally, we propose a method of Soft-assignment based Multiple Regression (SMR), which reconstructs a high-resolution patch using the dictionaries corresponding to its K nearest training-patch clusters. The method has low computational complexity and is readily suitable for real-time processing applications.
Numerous experiments validate that this scheme yields better results in terms of quantitative measures and visual perception than many state-of-the-art methods, while maintaining a relatively low time complexity. Since the main computation of this scheme is matrix multiplication, it can easily be implemented in an FPGA system.

18.
A class dictionary defines all data structures that appear in a program, as well as a language for describing data specified by those data structures. We demonstrate that class dictionaries are ideal for simplifying object-oriented programming. Our class-dictionary-based approach to object-oriented programming is independent of any particular programming language, so it is applicable to a large variety of object-oriented systems. The experience of designing and using over one hundred class dictionaries has resulted in a set of useful design techniques. This novel approach to object-oriented programming makes interesting links between language design, data structure design, and database design.

19.
Double-array structures have been widely used to implement dictionaries with string keys. Although the space efficiency of dynamic double-array dictionaries tends to decrease with key updates, we can still maintain high efficiency using existing methods. However, these methods have practical problems of time and functionality. This paper presents several efficient rearrangement methods to solve these problems. Through experiments using real-world datasets, we demonstrate that the proposed rearrangement methods are much more practical than existing methods.

20.
In machine translation, collocation dictionaries are important for selecting accurate target words. However, if the dictionary is too large it can decrease the efficiency of translation. This paper presents a method for developing a compact collocation dictionary for transitive verb–object pairs in English–Korean machine translation without losing translation accuracy. We use WordNet to calculate the semantic distance between words, and k-nearest-neighbor learning to select the translations. The entries in the dictionary are minimized to balance the trade-off between translation accuracy and time. We have performed several experiments on a selected set of verbs extracted from a raw corpus of over 3 million words. The results show that in real-time translation environments the size of a collocation dictionary can be reduced to 40% of its original size without a significant decrease in accuracy.
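The selection step described here can be sketched as k-NN over a path-based semantic distance. Everything below is illustrative: the hypernym table stands in for WordNet, and the entries and translation labels are hypothetical, not the paper's data.

```python
def path_distance(hypernyms, w1, w2):
    """Edge-count distance between two words in a toy hypernym tree."""
    def chain(w):
        out = [w]
        while w in hypernyms:
            w = hypernyms[w]
            out.append(w)
        return out
    a1, a2 = chain(w1), chain(w2)
    for i, w in enumerate(a1):        # first common ancestor on w1's chain
        if w in a2:
            return i + a2.index(w)
    return len(a1) + len(a2)          # no common ancestor: maximal distance

def select_translation(verb_object, entries, hypernyms, k=3):
    """Pick a target word by majority vote among the k nearest object nouns.

    `entries` is the collocation dictionary: (verb, object noun, translation).
    """
    verb, obj = verb_object
    cands = sorted((path_distance(hypernyms, obj, o), t)
                   for v, o, t in entries if v == verb)
    votes = {}
    for _, t in cands[:k]:
        votes[t] = votes.get(t, 0) + 1
    return max(votes, key=votes.get)
```

Because unseen objects are resolved through their nearest stored neighbors, entries whose neighborhoods always agree can be dropped, which is what makes the compacted dictionary possible.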


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号