Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
This work concisely reviews and unifies the analysis of different variants of neural associative networks consisting of binary neurons and synapses (Willshaw model). We compute storage capacity, fault tolerance, and retrieval efficiency and point out problems of the classical Willshaw model such as limited fault tolerance and restriction to logarithmically sparse random patterns. Then we suggest possible solutions employing spiking neurons, compression of the memory structures, and additional cell layers. Finally, we discuss from a technical perspective whether distributed neural associative memories have any practical advantage over localized storage, e.g., in compressed look-up tables.
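The Willshaw model's clipped Hebbian storage and threshold retrieval can be sketched in a few lines of NumPy (a minimal illustration, not the paper's code; the function names and tiny patterns are assumptions):

```python
import numpy as np

def store(pairs, n_in, n_out):
    """Clipped Hebbian learning: W[i, j] = 1 if any pair (x, y) has x[i] = y[j] = 1."""
    W = np.zeros((n_in, n_out), dtype=np.uint8)
    for x, y in pairs:
        W |= np.outer(x, y).astype(np.uint8)  # binary OR, not additive, synapses
    return W

def retrieve(W, x, k):
    """Threshold the dendritic sums at k, the number of active input units."""
    return (x @ W >= k).astype(np.uint8)

# One sparse hetero-associative pair
x = np.array([1, 1, 0, 0], dtype=np.uint8)
y = np.array([0, 1, 0, 1], dtype=np.uint8)
W = store([(x, y)], 4, 4)
print(retrieve(W, x, k=2))  # recovers y
```

Retrieval with a partial cue works the same way with a lower threshold, which is where the model's fault tolerance (and its limits, as discussed above) comes from.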

2.
Research in robust data structures can be done both by theoretical analysis of properties of abstract implementations and by empirical study of real implementations. Empirical study requires a support environment for the actual implementation. In particular, if the response of the implementation to errors is being studied, a mechanism must exist for artificially injecting appropriate kinds of errors. This paper discusses techniques used in empirical investigations of data structure robustness, with particular reference to tools developed for this purpose at the University of Waterloo.
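As a toy illustration of artificial error injection (not the Waterloo tools themselves; all names here are hypothetical), one can corrupt a pointer in a linked list and check whether a structural audit detects the damage:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def build(values):
    """Build a singly linked list from a Python list."""
    head = None
    for v in reversed(values):
        node = Node(v)
        node.next = head
        head = node
    return head

def inject_pointer_fault(head, index):
    """Artificially break the structure: make node `index` point back to the head."""
    node = head
    for _ in range(index):
        node = node.next
    node.next = head  # introduces a cycle, one typical kind of injected error

def audit(head, expected_count):
    """Detect cycles or truncation by bounding the traversal at expected_count."""
    seen, node = 0, head
    while node is not None and seen <= expected_count:
        node = node.next
        seen += 1
    return seen == expected_count

lst = build([1, 2, 3, 4])
assert audit(lst, 4)            # healthy structure passes
inject_pointer_fault(lst, 2)
print(audit(lst, 4))            # False: the audit catches the injected error
```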

3.
Text retrieval systems require an index to allow efficient retrieval of documents at the cost of some storage overhead. This paper proposes a novel full-text indexing model for Chinese text retrieval based on the concept of the adjacency matrix of a directed graph. Using this indexing model, on one hand, retrieval systems need to keep only the indexing data, instead of both the indexing data and the original text data as traditional retrieval systems always do. On the other hand, occurrences of an index term are identified by the labels of the so-called s-strings in which the index term appears, rather than by its positions as in traditional indexing models. Consequently, the system space cost as a whole can be reduced drastically while retrieval efficiency remains satisfactory. Experiments over several real-world Chinese text collections are carried out to demonstrate the effectiveness and efficiency of this model. In addition to Chinese, the proposed indexing model is also effective and efficient for text retrieval in other Oriental languages, such as Japanese and Korean. It is especially useful in digital library application areas where storage resources are very limited (e.g., e-books and CD-based text retrieval systems).

4.
S. K. K. C. Y., Data & Knowledge Engineering, 2008, 67(3): 362-380
We present a set of time-efficient approaches to index objects moving on the plane to efficiently answer range queries about their future positions. Our algorithms are based on previously described solutions as well as on the employment of efficient access methods. Finally, an experimental evaluation is included that shows the performance, scalability and efficiency of our methods.

5.
Query expansion methods have been extensively studied in information retrieval. This paper proposes a query expansion method, HQE, which combines ontology-based collaborative filtering with neural networks. In the HQE method, ontology-based collaborative filtering is used to analyze semantic relationships in order to find similar users, and radial basis function (RBF) networks are used to acquire the most relevant web documents and their corresponding terms from these similar users' queries. The method improves precision and requires users to provide less initial query information than traditional collaborative filtering methods.

6.
This paper proposes a novel text representation and matching scheme for Chinese text retrieval. At present, the indexing methods of Chinese retrieval systems are either character-based or word-based. The character-based indexing methods, such as bi-gram or tri-gram indexing, have high false drops due to the mismatches between queries and documents. On the other hand, it is difficult to efficiently identify all the proper nouns, domain terminology, and phrases in word-based indexing systems. The new indexing method uses both proximity and mutual information of the word pairs to represent the text content, so as to overcome the high false drop, new word, and phrase problems that exist in character-based and word-based systems. The evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best results of TREC-5.
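The mutual-information component can be illustrated with a small sketch: pointwise mutual information over adjacent word pairs (the paper's exact formula and segmentation may differ; this is a standard PMI estimate, not the authors' implementation):

```python
import math
from collections import Counter

def pmi_pairs(tokens):
    """PMI of each adjacent word pair, estimated from the token stream itself."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, m = len(tokens), len(tokens) - 1
    return {
        pair: math.log2((c / m) / ((unigrams[pair[0]] / n) * (unigrams[pair[1]] / n)))
        for pair, c in bigrams.items()
    }

tokens = "information retrieval system for chinese information retrieval".split()
scores = pmi_pairs(tokens)
print(round(scores[("information", "retrieval")], 2))  # → 2.03
```

Pairs whose PMI exceeds a threshold would then be kept as indexing units, alongside a proximity measure as described above.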

7.
8.
9.
In this paper we present a concise O(n) implementation of Cleary's algorithm for generating a sequence of restricted rotations between any two binary trees. The algorithm is described directly in terms of the binary trees, without using any intermediate representation.
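For background, the basic move such a sequence is built from, a single rotation, can be written directly on tuple-encoded binary trees (Cleary's restriction rules are not reproduced here; the encoding `(left, value, right)` with `None` leaves is an assumption for illustration):

```python
def rotate_left(tree):
    """Rotate left at the root: (A, x, (B, y, C)) -> ((A, x, B), y, C)."""
    left, x, (rl, y, rr) = tree
    return ((left, x, rl), y, rr)

def rotate_right(tree):
    """Inverse of rotate_left: ((A, x, B), y, C) -> (A, x, (B, y, C))."""
    (ll, x, lr), y, right = tree
    return (ll, x, (lr, y, right))

t = (None, 1, (None, 2, (None, 3, None)))
assert rotate_right(rotate_left(t)) == t  # the two rotations are inverses
print(rotate_left(t))
```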

10.
11.
The diffusion of the World Wide Web (WWW) and the consequent increase in the production and exchange of textual information demand the development of effective information retrieval systems. The HyperText Markup Language (HTML) constitutes a common basis for generating documents over the Internet and intranets. By means of HTML the author can organize the text into subparts delimited by special tags; these subparts are then visualized by the HTML browser in distinct ways, i.e. with distinct typographical formats. In this paper a model for indexing HTML documents is proposed which exploits the role of tags in encoding the importance of their delimited text. Central to our model is a method to compute the significance degree of a term in a document by weighting the term instances according to the tags in which they occur. The proposed indexing model is based on a contextual weighted representation of the document under consideration, by means of which a set of (normalized) numerical weights is assigned to the various tags containing the text. The weighted representation is contextual in the sense that the set of numerical weights assigned to the various tags and the respective text depends (besides on the tags themselves) on the particular document considered. By means of the contextual weighted representation our indexing model reflects not only the general syntactic structure of the HTML language but also the information conveyed by the particular way in which the author instantiates that general structure in the document under consideration. We discuss two different forms of contextual weighting: the first is based on a linear weighted representation and is closer to the standard model of universal (i.e. non-contextual) weighting; the second is based on a more complex nonlinear weighted representation and has a number of novel and interesting features.
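The core idea of weighting term instances by their enclosing tag can be sketched as follows (a minimal, non-contextual baseline of the kind the paper improves upon; the tag weights below are illustrative assumptions, not the paper's learned contextual weights):

```python
from collections import defaultdict

# Hypothetical, universal (non-contextual) tag weights
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5, "body": 1.0}

def term_significance(occurrences):
    """occurrences: list of (term, tag) pairs extracted from one HTML document.

    Each instance contributes the weight of the tag it occurs in."""
    scores = defaultdict(float)
    for term, tag in occurrences:
        scores[term] += TAG_WEIGHTS.get(tag, 1.0)
    return dict(scores)

doc = [("retrieval", "title"), ("retrieval", "body"), ("index", "body")]
print(term_significance(doc))  # "retrieval" outweighs "index"
```

In the contextual model described above, the weight table itself would be derived per document rather than fixed globally.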

12.
In this paper, a new algorithm for content-based image indexing and retrieval is presented. The proposed method is based on a combination of multiresolution image decomposition and color correlation histogram. According to the new algorithm, wavelet coefficients of the image are computed first using a directional wavelet transform such as Gabor wavelets. A quantization step is then applied before computing one-directional autocorrelograms of the wavelet coefficients. Finally, index vectors are constructed using these one-directional wavelet correlograms. The retrieval results obtained by applying our new method to a 1000-image database demonstrate a significant improvement in effectiveness and efficiency compared to indexing and retrieval methods based on the image color correlogram or the wavelet transform alone.
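The quantization and one-directional autocorrelogram steps can be sketched on a stand-in coefficient array (the Gabor decomposition is omitted; the two-level quantization and the conditional-probability form of the correlogram are illustrative assumptions):

```python
import numpy as np

def autocorrelogram(levels, n_levels, distance):
    """P(levels[i + distance] == q | levels[i] == q) for each quantization level q."""
    hits = np.zeros(n_levels)
    counts = np.zeros(n_levels)
    for a, b in zip(levels, levels[distance:]):
        counts[a] += 1
        if a == b:
            hits[a] += 1
    # avoid division by zero for levels that never occur
    return np.divide(hits, counts, out=np.zeros(n_levels), where=counts > 0)

coeffs = np.array([0.1, 0.12, 0.8, 0.85, 0.2, 0.15])  # stand-in for one subband
levels = np.digitize(coeffs, bins=[0.5])              # 2-level quantization
acg = autocorrelogram(levels, n_levels=2, distance=1)
print(acg)
```

The index vector would concatenate such correlograms over several subbands and distances.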

13.
Speed and storage capacity are very important issues in information retrieval systems. In natural language analysis, the double array is a well-known data structure for implementing the trie, which is a widely used approach to retrieving strings in a dictionary. Moreover, the double array provides fast access to a matrix table with the compactness of a list form. In order to realize a very compact structure for information retrieval, this paper presents a compression method that divides the constructed trie into several pieces (pages). This compression method enables us to reduce the number of bits representing entries of the double array. However, the resulting trie must trace through the pages, which slows retrieval because of the state connections. To solve this problem, this paper proposes a new trie construction method that compresses the trie and minimizes the number of state connections. Experimental results on a large set of keys show that the storage capacity is reduced to 50%. Moreover, the new approach has the same retrieval speed as the old one.

14.
Reliability is an important research topic in distributed computing systems consisting of a large number of processors. To achieve reliability, the fault-tolerance scheme of the distributed computing system must be revised. This kind of problem is known as the Byzantine agreement (BA) problem. It requires all fault-free processors to agree on a common value, even if some components are corrupt. Consequently, there have been significant studies of this agreement problem in distributed systems. However, traditional BA protocols focus on running ⌊(n−1)/3⌋+1 rounds of message exchange continuously to make each fault-free processor reach an agreement; since a large number of messages results in a large protocol overhead, those protocols are inefficient, especially for network environments with a large number of processors. In this study, we propose a novel and efficient protocol to reduce the number of messages. Our protocol collects, compares, and replaces the received values to identify the reliable processors and replace the values sent by unreliable processors. Subsequently, each processor can agree on a common value through three rounds of message exchange. Furthermore, the proposed protocol can use the minimum number of messages to tolerate the maximum number of faulty components in a distributed system.
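The "collect, compare, and replace" idea can be illustrated with a highly simplified majority step (the actual three-round protocol and its reliability analysis are in the paper; this sketch, with its matrix encoding, is an assumption):

```python
from collections import Counter

def agree(received):
    """received[i][j] = value processor i holds as coming from processor j.

    Each column is voted on; the majority value replaces whatever a
    faulty sender managed to inject into individual processors' views."""
    n = len(received)
    common = []
    for j in range(n):
        column = [received[i][j] for i in range(n)]
        common.append(Counter(column).most_common(1)[0][0])
    return common

# 4 processors; processor 3 is faulty, so its row (its own view) is arbitrary
received = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
print(agree(received))  # every fault-free processor computes the same vector
```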

15.
Data fusion in information retrieval has been investigated by many researchers and a number of data fusion methods have been proposed. However, questions such as why data fusion can increase effectiveness, and under which conditions data fusion methods are favorable, remain poorly resolved at best. In this paper, we formally describe data fusion under a geometric framework, in which each component result returned from an information retrieval system for a given query is represented as a point in a multi-dimensional space. The Euclidean distance is the measure by which the effectiveness and similarity of search results are judged. This allows us to explain all component results and fused results using geometrical principles. In such a framework, score-based data fusion becomes a deterministic problem. Several interesting features of the centroid-based data fusion method and the linear combination method are discussed. Nevertheless, in retrieval evaluation, ranking-based measures are the most popular. Therefore, this paper investigates the relation and correlation between the Euclidean distance and several typical ranking-based measures. We find that a very strong correlation exists between them, meaning that the theorems and observations obtained using the Euclidean distance remain valid when ranking-based measures are used. The proposed framework enables us to better understand score-based data fusion and to use score-based data fusion methods more precisely and effectively in various ways.
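The geometric view can be sketched concretely: each system's result is a point (a vector of per-document scores), centroid fusion averages those points, and effectiveness is the Euclidean distance to an ideal point (the three-document scores below are made-up illustrations):

```python
import math

def centroid(results):
    """Component-wise average of several score vectors."""
    return [sum(scores) / len(results) for scores in zip(*results)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

ideal = [1.0, 1.0, 0.0]          # hypothetical perfect scores for 3 documents
system_a = [0.9, 0.4, 0.1]
system_b = [0.5, 0.8, 0.3]
fused = centroid([system_a, system_b])
# By convexity, the centroid is never farther from the ideal point
# than the worst component result
print(euclidean(fused, ideal) <= max(euclidean(system_a, ideal),
                                     euclidean(system_b, ideal)))
```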

16.
17.
Storing and querying high-dimensional data are important problems in designing an information retrieval system. Two crucial issues, time and space efficiency, must be considered when evaluating the performance of such a system. The KDB-tree and its variants have been reported to perform well as index structures for retrieving multidimensional data. However, they all suffer from low storage utilization caused by imperfect "splitting policies." Unnecessary splits increase the size of the index structure and deteriorate the performance of the system. In this paper, a new data insertion algorithm with a better splitting policy is proposed, which packs as many data entries into the leaf nodes as possible. Our new index scheme can increase storage utilization up to nearly 100% and reduce the index to a smaller size. As a result, both time and space efficiency are significantly improved. Analytical and experimental results show that our indexing method outperforms the traditional KDB-tree and its variants.

18.
Performing exhaustive relevance judgments is one of the most challenging tasks in the construction of an IR test collection, especially when the collection is composed of millions of documents. Pooling (or system pooling), which is basically a method for selecting documents to assess, is a solution to this challenge. In this paper, to form such an assessment pool, a new rank-based document selection criterion, called the expected level of importance (ELI), is introduced. The results of experiments performed using TREC 5, 6, 7, and 8 data show that by using a pool in which the documents are sorted in decreasing order of their calculated ELI scores, relevance judgments can be made efficiently with minimal human effort, while maintaining the size and the effectiveness of the resulting test collection. The criterion we propose can be adapted directly to the traditional TREC pooling practice in favor of efficiency, with no additional cost.
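The ELI formula itself is the paper's contribution and is not reproduced here; as a hedged stand-in, this sketch scores each pooled document by summed reciprocal ranks across runs and sorts the pool so that assessors judge likely-relevant documents first:

```python
def rank_pool(runs):
    """runs: list of ranked lists of document ids (rank 1 first).

    Returns the pooled documents sorted by a simple rank-based score
    (a stand-in for ELI): the sum of 1/rank over all runs."""
    score = {}
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            score[doc] = score.get(doc, 0.0) + 1.0 / rank
    return sorted(score, key=score.get, reverse=True)

runs = [["d1", "d2", "d3"], ["d1", "d4", "d2"]]
print(rank_pool(runs))  # d1 first: it tops both runs; the tail is judged only if budget remains
```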

19.
20.
We propose a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. In contrast to piecewise constant approximation (PCA) techniques that approximate each time series with constant-value segments, the proposed method, Piecewise Vector Quantized Approximation, uses the closest (based on a distance measure) codeword from a codebook of key-sequences to represent each segment. The new representation is symbolic and allows the application of text-based retrieval techniques to time series similarity analysis. Experiments on real and simulated datasets show that the proposed technique generally outperforms PCA techniques in clustering and similarity searches.
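The quantization step can be sketched as follows: each fixed-length segment is replaced by the index of the nearest codeword (the hand-made codebook and Euclidean distance are assumptions; in the paper the codebook is learned from key-sequences):

```python
import numpy as np

def pvqa(series, codebook, seg_len):
    """Map each fixed-length segment to the index of its nearest codeword."""
    symbols = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        seg = series[start:start + seg_len]
        dists = np.linalg.norm(codebook - seg, axis=1)  # distance to each codeword
        symbols.append(int(np.argmin(dists)))
    return symbols

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])  # 3 key-sequences
series = np.array([0.1, -0.1, 0.9, 1.1, 0.95, 0.05])
print(pvqa(series, codebook, seg_len=2))  # → [0, 1, 2]
```

The resulting symbol string is what makes text-based retrieval techniques applicable to the time series.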
