Similar literature (20 results)
1.
A new dual wing harmonium model is proposed for document retrieval. It integrates term frequency features and term connection features into a low-dimensional semantic space without increasing the computational load. Terms and vectorized graph connectionists are extracted from the graph representation of a document using a weighted feature extraction method. We then develop a new dual wing harmonium model that projects these multiple features onto low-dimensional latent topics under different probability distribution assumptions. The contrastive divergence algorithm is used for efficient learning and inference. Extensive experiments and comparative results suggest that the proposed method is accurate and computationally efficient for document retrieval.

2.
A new approach for content-based image retrieval (CBIR) is described. In this study, a tree-structured image representation together with a multi-layer self-organizing map (MLSOM) is proposed for efficient image retrieval. In the proposed tree-structured image representation, the root node contains the global features, while child nodes contain the local region-based features. This approach hierarchically integrates more information about image content and achieves better retrieval accuracy than global or region features used individually. The MLSOM provides effective compression and organization of tree-structured image data. This enables the retrieval system to operate much faster than directly comparing query images with all images in the database. The proposed method also adopts a relevance feedback scheme to improve the retrieval accuracy by a respectable margin. Our results indicate that the proposed image retrieval system is robust against different types of image alterations. Comparative results corroborate that the proposed CBIR system is promising in terms of accuracy, speed and robustness.

3.
This paper presents a multi-level matching method for document retrieval (DR) using a hybrid document similarity. Documents are represented by a multi-level structure comprising a document level and a paragraph level. This multi-level representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms can. The matching between documents is then transformed into an optimization problem based on the Earth Mover's Distance (EMD). A hybrid similarity is used to combine the global and local semantics of documents and improve retrieval accuracy. We report an extensive experimental study and verification. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.

4.
5.
For querying structured and semistructured data, data retrieval and document retrieval are two valuable and complementary techniques that have not yet been fully integrated. In this paper, we introduce integrated information retrieval (IIR), an XML-based retrieval approach that closes this gap. We introduce the syntax and semantics of an extension of the XQuery language called XQuery/IR. The extended language realizes IIR and thereby allows users to formulate new kinds of queries by nesting ranked document retrieval and precise data retrieval queries. Furthermore, we detail index structures and efficient query processing approaches for implementing XQuery/IR. Based on a new identification scheme for nodes in node-labeled tree structures, the extended index structures require only a fraction of the space of comparable index structures that only support data retrieval.

6.
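A component description method based on the frame knowledge representation used in artificial intelligence is proposed to address component reuse problems such as component description, classification, and retrieval. Exploiting the inference properties of frame representation, a component search and matching algorithm based on rule-based reasoning and functional granularity is established, which improves the efficiency and accuracy of component search.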

7.
D.  Y.  B.  J. -M. 《Data & Knowledge Engineering》2003,46(3):345-375
The main contribution of this paper is to lay down a conceptual framework for document semantics modeling. This framework provides a generic graphical knowledge representation model based on Sowa’s conceptual structures. Modeling primitives are introduced to represent factual and ontological knowledge that can be expressed in electronic documents. Binding features are proposed so as to keep knowledge representation and knowledge formulation linked together.

This framework may be applied to various domains and may accept, for this purpose, many different ontological extensions. Thus an extension is provided so as to properly handle the particular kind of knowledge encountered in the legal domain.


8.
We have developed a novel system for content-based image retrieval in large, unannotated databases. The system is called PicSOM, and it is based on tree-structured self-organizing maps (TS-SOMs). Given a set of reference images, PicSOM is able to retrieve another set of images that are similar to the given ones. Each TS-SOM is formed with a different image feature representation, such as color, texture, or shape. A new technique introduced in PicSOM facilitates automatic combination of responses from multiple TS-SOMs and their hierarchical levels. This mechanism adapts to the user's preferences in selecting which images resemble each other. Thus, the mechanism implements a relevance feedback technique for content-based image retrieval. The image queries are performed through the World Wide Web and are iteratively refined as the system exposes more images to the user.

9.
Consumers' purchasing behavior has clearly changed in recent years with developments in social economics. This change is evident in the decreasing share of planned purchases and the increase in unplanned (or spontaneous) purchases. This act of spontaneous or otherwise unplanned purchasing is called “impulse buying”. However, buying under these conditions costs more money and always comes with negative responses, such as complaints and regret. Therefore, we propose and have designed a new merchandise recommendation system, the Mobile Merchandise Evaluation Service Platform (MMESP). This is a three-tier system composed of the Real-time Merchandise Identifying System (RMIS), the Real-time Merchandise Evaluation System (RMES), and the Real-time Merchandise Recommendation System (RMRS). With this system, Mobile Users (MUs) take pictures of merchandise and send them to MMESP; RMIS integrates Region Adjacency Graph (RAG) and Self-Organizing Maps (SOM) to gather information on the merchandise from those photographs; and RMES and RMRS provide Intelligence Agents (IAs) and Multiple Document Summarization (MDS) to summarize recommendations on merchandise for MUs, all in real time.

10.
11.
A new differential LSI space-based probabilistic document classifier (total citations: 1; self-citations: 0; citations by others: 1)
We have developed a new, effective probabilistic classifier for document classification by introducing the concept of differential document vectors and DLSI (differential latent semantic indexing) spaces. A combined use of the projections on and the distances to the DLSI spaces introduced from the differential document vectors improves the adaptability of the LSI (latent semantic indexing) method by capturing unique characteristics of documents. Using the intra- and extra-document statistics, both a simple posteriori calculation on a small example and an experiment on the large Reuters-21578 database demonstrate the advantage of the DLSI space-based probabilistic classifier over the LSI space-based classifier in classification performance.

12.
The list of documents returned by Internet search engines in response to a query can be quite overwhelming. There is an increasing need to organise this information and present it in a more compact and efficient manner. This paper describes a method for automatically clustering World Wide Web documents according to their relevance to the user's information needs, using a hybrid neural network. The objective is to reduce the time and effort the user has to spend to find the information sought. Clustering documents by features representative of their contents (in this case, key words and phrases) increases the effectiveness and efficiency of the search process. It is shown that a two-dimensional visual presentation of information on retrieved documents, instead of the traditional linear listing, can create a more user-friendly interface between a search engine and the user.

13.
14.
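Both the arbitrariness of humming and errors in music feature extraction algorithms degrade the performance of query-by-humming music retrieval systems. To address these problems, vowel-frame detection is used to obtain fairly accurate note boundaries and thereby segment the notes. Relative pitch and duration are then extracted from the segmented notes to form a symbolic description. Finally, the symbolic descriptions around the extreme points of pitch and duration in the hummed fragment are used as features and matched against the data in the database to obtain the most similar candidate music. Experiments show that, for untrained hummers, the method achieves a top-1 matching accuracy of over 70%, and its matching speed is far better than that of traditional methods; the retrieval performance basically meets the needs of practical applications.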
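
The relative pitch and duration description mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the exact encoding, so the choices here are assumptions: notes are given as (MIDI pitch, duration in seconds) pairs, relative pitch is the semitone difference between consecutive notes, and relative duration is the ratio of consecutive durations. The function name `relative_description` is hypothetical.

```python
def relative_description(notes):
    """Convert segmented notes into a key- and tempo-independent description.

    notes: list of (midi_pitch, duration_seconds) tuples, in time order.
    Returns (intervals, ratios): semitone steps and duration ratios between
    each pair of consecutive notes. This encoding is an assumption, not the
    authors' published scheme.
    """
    intervals = []
    ratios = []
    for (prev_pitch, prev_dur), (pitch, dur) in zip(notes, notes[1:]):
        intervals.append(pitch - prev_pitch)   # transposition-invariant
        ratios.append(dur / prev_dur)          # tempo-invariant
    return intervals, ratios


# The same melody hummed two semitones higher and twice as slowly
# yields the same relative description.
original = [(60, 0.5), (62, 0.5), (59, 1.0)]
hummed = [(62, 1.0), (64, 1.0), (61, 2.0)]
print(relative_description(original) == relative_description(hummed))
```

Because only differences and ratios are kept, a hummer who starts on the wrong key or sings at a different speed still produces a matching description, which is the usual motivation for relative features in query by humming.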

15.
Text categorization plays an important role in applications where information is filtered, monitored, personalized, categorized, organized or searched. Feature selection remains an effective and efficient technique in text categorization. Feature selection metrics are commonly based on the term frequency or document frequency of a word. We focus on the relative importance of these frequencies for feature selection metrics. For this purpose, the document frequency based metrics of discriminative power measure and GINI index were examined with term frequency. The metrics were compared and analyzed on the Reuters-21578 dataset. Experimental results revealed that the term frequency based metrics may be useful especially for smaller feature sets. Two characteristics of term frequency based metrics were observed by analyzing the scatter of features among classes and the rate at which information in data was covered. These characteristics may contribute toward their superior performance for smaller feature sets.
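
To make the contrast between document frequency and term frequency concrete, the following is a minimal sketch of a Gini-style feature score computed from either kind of count. The abstract does not give the exact formula, so this uses one simple purity formulation as an assumption: for a term t, the score is the sum over classes c of P(c|t)^2, where P(c|t) is estimated from counts. The function name `gini_scores` and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict


def gini_scores(docs, use_term_frequency=False):
    """Score each term by the purity of its class distribution.

    docs: list of (tokens, label) pairs.
    use_term_frequency: if False, each document containing the term counts
    once (document frequency); if True, every occurrence counts.
    Returns {term: sum_c P(c|t)^2}; 1.0 means the term occurs in one class only.
    """
    counts = defaultdict(Counter)  # term -> class label -> count
    for tokens, label in docs:
        for term, n in Counter(tokens).items():
            counts[term][label] += n if use_term_frequency else 1
    scores = {}
    for term, per_class in counts.items():
        total = sum(per_class.values())
        scores[term] = sum((c / total) ** 2 for c in per_class.values())
    return scores


docs = [
    (["oil", "oil", "price"], "crude"),
    (["price", "wheat"], "grain"),
    (["oil"], "grain"),
]
print(gini_scores(docs)["oil"])                           # document frequency
print(gini_scores(docs, use_term_frequency=True)["oil"])  # term frequency
```

In the toy data, "oil" appears in one document of each class, so the document frequency version treats it as uninformative, while the term frequency version notices that it is repeated in the "crude" document and scores it slightly higher.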

16.
In certain bilingual and multi-lingual societies, translated legal documents are as important as the original legal documents because they have the same legal status as the originals. However, there is little reported work on the retrieval and management of bilingual legal documents. We describe the design and development of a bilingual document retrieval and management prototype, called ELDoS, which is used by court interpreters and judges from the Hong Kong Judiciary. Since the speed of retrieval is a major concern for user acceptance, and therefore for widespread deployment of the system, the architecture of the prototype is designed to balance the workload of the client and server. Extensible Markup Language (XML) is used to mark up the bilingual legal documents for a variety of document retrieval and management tasks. XML enables the use of Extensible Stylesheet Language Transformations (XSLT) to align bilingual data in the client, instead of the server, and improve alignment speed linearly with respect to the size of the document, using a high-end PC, when the server has no concurrent access. The design of the interface was continually improved after extensive consultation with court interpreters and after the user acceptance tests. In our evaluation, the facilities for highlighting translated terms have a macro-averaged precision of 90+% and a macro-averaged recall of 80+%, which were considered acceptable by our users. We believe that the experience in the design and development of this prototype is applicable to other language pairs as well as to other domains. Copyright © 2002 John Wiley & Sons, Ltd.
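
For readers unfamiliar with the macro-averaged figures quoted in this abstract, here is a minimal sketch of how such averages are typically computed: precision and recall are calculated per item (for example, per document) and then averaged with equal weight. The per-item unit and the handling of empty sets are assumptions, and the function name `macro_precision_recall` is hypothetical; the abstract does not describe the authors' evaluation procedure in this detail.

```python
def macro_precision_recall(results):
    """Macro-average precision and recall over a list of items.

    results: list of (retrieved, relevant) pairs, each a set of term ids.
    Precision and recall are computed per item and then averaged, so every
    item carries the same weight regardless of how many terms it has.
    An empty retrieved or relevant set contributes 0.0 (an assumption).
    """
    precisions = []
    recalls = []
    for retrieved, relevant in results:
        hits = len(retrieved & relevant)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    n = len(results)
    return sum(precisions) / n, sum(recalls) / n


results = [
    ({"a", "b"}, {"a"}),   # precision 0.5, recall 1.0
    ({"c"}, {"c", "d"}),   # precision 1.0, recall 0.5
]
print(macro_precision_recall(results))
```

A micro-average would instead pool all hits before dividing, which lets long documents dominate; the macro-average reported in the abstract weights each item equally.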

17.
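To address the inherent difficulty of expressing similarity in medical image retrieval, as well as the influence of noise, a method is proposed that diffuses similarities over a tensor product graph and uses the contextual information of other data points to improve the texton-based pairwise similarity measure. First, a statistical texton method is used to describe and extract medical image features, and the pairwise similarity of images is obtained by weighting texton similarities. Then, the tensor product graph is used to propagate similarities along the intrinsic manifold of the data points, yielding a global similarity measure. Experimental results on ImageCLEFmed 2009 show that the algorithm improves class-average precision by 32% compared with a Gabor-based retrieval algorithm and by 19% compared with a retrieval algorithm based on the scale-invariant feature transform (SIFT), and that it is well suited to medical image retrieval.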

18.
This research extends text mining and information retrieval research to the digital forensic text string search process. Specifically, we used a self-organizing neural network (a Kohonen Self-Organizing Map) to conceptually cluster search hits retrieved during a real-world digital forensic investigation. We measured the information retrieval effectiveness (e.g., precision, recall, and overhead) of the new approach and compared it with that of the current approach. The empirical results indicate that the clustering process significantly reduces the information retrieval overhead of the digital forensic text string search process, which is currently a very burdensome endeavor.

19.
Text in images and video contains important information for visual content understanding, indexing, and recognition. Extracting this information involves preprocessing, localization and extraction of the text from a given image. In this paper, we propose a novel expiration code detection and recognition algorithm that uses Gabor features and collaborative representation based classification. The proposed system consists of four steps: expiration code location, character isolation, Gabor feature extraction and character recognition. For expiration code detection, the Gabor energy (GE) and the maximum energy difference (MED) are extracted. The performance of the recognition algorithm is tested over three Gabor features: GE, magnitude response (MR) and imaginary response (IR). The Gabor features are classified with a collaborative representation based classifier (GCRC). To encompass all frequencies and orientations, downsampling and principal component analysis (PCA) are applied in order to reduce the dimensionality of the feature space. The effectiveness of the proposed localization algorithm is highlighted and compared with other existing methods. Extensive testing shows that the suggested detection scheme outperforms existing methods in terms of detection rate on a large image database. GCRC also shows very competitive results compared with Gabor feature sparse representation based classification (GSRC), and the proposed system outperforms the nearest neighbor (NN) classifier and the collaborative representation based classification (CRC).

20.
Finding efficient, effective ways to compare graphs arising from recognition processes with their corresponding ground-truth graphs is an important step toward more rigorous performance evaluation. In this paper, we examine in detail the graph probing paradigm we first put forth in the context of our work on table understanding and later extended to HTML-coded Web pages. We present a formalism showing that graph probing provides a lower bound on the true edit distance between two graphs. From an empirical standpoint, the results of two simulation studies and an experiment using scanned pages show that graph probing correlates well with the latter measure. Moreover, our technique is very fast; graphs with tens or hundreds of thousands of vertices can be compared in mere seconds. Ease of implementation, scalability, and speed of execution make graph probing an attractive alternative for graph comparison.
Received: 1 October 2002; Accepted: 15 January 2003; Published online: 6 February 2004. Correspondence to: D. Lopresti
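
The idea of probing can be illustrated with a minimal sketch. The abstract does not list the probe classes or the edit model the authors use, so this is an assumption: each probe asks "how many vertices have degree d?" in an undirected graph, and the probing distance is the sum of absolute differences between the two graphs' answers. The function names `degree_histogram` and `probe_distance` are hypothetical.

```python
from collections import Counter


def degree_histogram(vertices, edges):
    """Answer every degree probe at once: map degree -> number of vertices."""
    degree = Counter({v: 0 for v in vertices})  # keep isolated vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def probe_distance(graph_a, graph_b):
    """Sum of absolute differences between the two degree histograms.

    Each graph is a (vertices, edges) pair. Runs in time linear in the
    size of the graphs, with no vertex matching required.
    """
    hist_a = degree_histogram(*graph_a)
    hist_b = degree_histogram(*graph_b)
    return sum(abs(hist_a[d] - hist_b[d]) for d in set(hist_a) | set(hist_b))


path = ({"a", "b", "c"}, [("a", "b"), ("b", "c")])
triangle = ({"a", "b", "c"}, [("a", "b"), ("b", "c"), ("a", "c")])
print(probe_distance(path, path))      # identical graphs
print(probe_distance(path, triangle))  # graphs differing by one edge
```

In this sketch a single edge insertion changes the degree of two vertices and can therefore move the distance by as much as 4, so any bound relating this quantity to the edit distance has to include a constant factor. The speed comes from never solving the vertex correspondence problem: only summary counts are compared.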
