Found 20 similar documents (search time: 62 ms)
1.
To retrieve Web documents of interest, most Web users rely on Web search engines. All existing search engines provide a query facility that lets users search for the desired documents using keywords. However, when a search engine retrieves a long list of Web documents, the user might need to browse through each retrieved document to determine which ones are of interest. We observe two kinds of problems in the retrieval of Web documents: (1) an inappropriate selection of keywords by the user; and (2) poor precision in the retrieved Web documents. To solve these problems, we propose an automatic binary-categorization method for recognizing multiple-record Web documents of interest, which appear often in advertisement Web pages. Our categorization method uses application ontologies and is based on two information retrieval models, the Vector Space Model (VSM) and the Clustering Model (CM). We analyze and cull Web documents to just those applicable to a particular application ontology. The culling analysis (i) uses CM to find a virtual centroid for the records in a Web document, (ii) computes a vector in a multi-dimensional space for this centroid, and (iii) compares the vector with the predefined ontology vector of the same multi-dimensional space using VSM, considering the magnitudes of the vectors as well as the angle between them. Our experimental results show an average of 90% recall and 97% precision in recognizing Web documents belonging to the same category (i.e., domain of interest). Thus our categorization discards very few documents it should have kept and keeps very few it should have discarded.
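The three culling steps in the abstract above (centroid, vector, comparison by angle and magnitude) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the magnitude-ratio test, and both thresholds (`min_cos`, `min_ratio`) are assumptions chosen for the example.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length record vectors (the 'virtual centroid')."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is the zero vector."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def magnitude_ratio(a, b):
    """Ratio of the smaller to the larger magnitude, in [0, 1] (assumed measure)."""
    na, nb = norm(a), norm(b)
    return min(na, nb) / max(na, nb) if max(na, nb) > 0 else 0.0

def is_relevant(records, ontology_vec, min_cos=0.8, min_ratio=0.5):
    """Keep a document only if its centroid is close to the ontology vector
    in both angle and magnitude. Thresholds are hypothetical."""
    c = centroid(records)
    return cosine(c, ontology_vec) >= min_cos and magnitude_ratio(c, ontology_vec) >= min_ratio
```

Checking the magnitude in addition to the angle lets the filter reject a document that mentions the right terms in the right proportions but far too sparsely.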
2.
《Knowledge》2005,18(2-3):117-124
In this paper we propose an approach for refining a document ranking by learning filtering rulesets through relevance feedback. This approach includes two important procedures. One is a filtering method, which can be incorporated into any kind of information retrieval system. The other is a learning algorithm that builds a set of filtering rules, each of which specifies a condition for identifying relevant documents using combinations of characteristic words. Our approach is useful not only for overcoming the limitations of the vector space model, but also for exploiting the tags of semi-structured documents such as Web pages. Through experiments we show that our approach improves the performance of relevance feedback in two types of IR systems: one adopting the vector space model and one based on a Web search engine.
3.
Document similarity search finds documents similar to a given query document and returns them to the user as a ranked list; it is widely used in text and Web systems such as digital libraries and search engines. Traditional retrieval models, including Okapi's BM25 model and Smart's vector space model with length normalization, can handle this problem to some extent by treating the query document as a long query. In practice, the Cosine measure is considered the best model for document similarity search because of its ability to measure the similarity between two documents. In this paper, the quantitative performance of the above models is compared experimentally. Because the Cosine measure cannot reflect the structural similarity between documents, a new retrieval model based on TextTiling is proposed. The proposed model takes into account the subtopic structures of documents. It first splits the documents into text segments with TextTiling and calculates the similarities for different pairs of text segments in the documents. Finally, the overall similarity between the documents is obtained by combining the similarities of the segment pairs using an optimal matching method. Experimental results show that: 1) the popular retrieval models (Okapi's BM25 model and Smart's vector space model with length normalization) do not perform well for document similarity search; 2) the proposed model based on TextTiling is effective and outperforms the other models, including the Cosine measure; 3) the methods chosen for the three components of the proposed model are appropriate.
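The segment-then-match idea in the abstract above might look roughly like this. It is only a sketch under stated assumptions: blank-line paragraph splitting stands in for TextTiling, the optimal matching is found by brute force over permutations (practical only for a handful of segments), and normalizing by the larger segment count is a choice made for this example rather than something the abstract specifies.

```python
import math
from collections import Counter
from itertools import permutations

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(text):
    """Placeholder segmenter: blank-line-separated paragraphs (not TextTiling)."""
    return [Counter(p.lower().split()) for p in text.split("\n\n") if p.strip()]

def matched_similarity(doc_a, doc_b):
    """Best one-to-one pairing of segments, scored by summed cosine similarity."""
    segs_a, segs_b = segment(doc_a), segment(doc_b)
    if not segs_a or not segs_b:
        return 0.0
    short, long_ = sorted((segs_a, segs_b), key=len)
    best = max(
        sum(cosine(s, long_[j]) for s, j in zip(short, perm))
        for perm in permutations(range(len(long_)), len(short))
    )
    # Unmatched segments of the longer document contribute nothing (assumed).
    return best / len(long_)
```

Because segments are matched as a set, two documents that cover the same subtopics in a different order still score highly.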
4.
In this paper, we present a new query reweighting method for document retrieval. The proposed method uses genetic algorithms to reweight a user's query vector, based on the user's relevance feedback, to improve the performance of document retrieval systems. It encodes the user's query vector into chromosomes and uses genetic algorithms to search for the optimal weights of the query terms. After the best chromosome is found, the method decodes it back into the user's query vector, which is then used for retrieval. The proposed method can thus find the best weights of the query terms in the user's query vector based on the user's relevance feedback, and it can increase the precision and recall of the document retrieval system.
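A toy version of genetic query reweighting, as described in the abstract above, could be sketched as below. Everything specific here is an assumption for illustration: the chromosome is simply the list of term weights in [0, 1], the fitness is the mean inner-product score of relevant documents minus that of non-relevant ones, and the operators (truncation selection, one-point crossover, Gaussian mutation) are generic choices, not the paper's.

```python
import random

def score(query, doc):
    """Inner-product match score between a query vector and a document vector."""
    return sum(q * d for q, d in zip(query, doc))

def fitness(query, relevant, nonrelevant):
    """Assumed fitness: relevant documents should outscore non-relevant ones."""
    rel = sum(score(query, d) for d in relevant) / len(relevant)
    non = sum(score(query, d) for d in nonrelevant) / len(nonrelevant)
    return rel - non

def reweight(query, relevant, nonrelevant, generations=50, pop_size=20, seed=0):
    """Evolve term weights; requires at least two query terms."""
    rng = random.Random(seed)
    n = len(query)

    def fit(chromosome):
        return fitness(chromosome, relevant, nonrelevant)

    # Seed the population with the original query plus random chromosomes.
    pop = [list(query)] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        parents = pop[: pop_size // 2]          # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)                # mutate one gene, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fit)
```

Because the original query is in the initial population and the best half always survives, the returned weights are never worse than the original query under this fitness.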
5.
6.
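Search engines return too much information and cannot tailor results to users' interests, which makes it hard for users to find documents of interest in a simple way. Personalized recommendation is an effective way to reduce the burden of information retrieval on users. This paper combines content filtering with document clustering to implement a personalized recommendation system based on search results, which organizes search results automatically by clustering and proactively recommends documents of interest to the user. A probabilistic user interest model is built, and content filtering is applied on top of STC clustering of the search results. Experiments show that the probabilistic model represents users' interests and their changes better than the vector space model.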
7.
《Computer Communications》2013,36(1):90-104
The success and intensive use of social networks make strategies for efficient document location a hot topic of research. In this paper, we propose a common vector space to describe documents and users in order to create a social network based on affinities, and we explore epidemic routing to recommend documents according to users' interests. Furthermore, we propose the creation of a SoftDHT structure to improve the recommendation results. Using these mechanisms, an efficient document recommender system can be provided that quickly organizes users into clusters based on their affinity while preventing the creation of unlinked communities. We show through simulations that the proposed system has a short convergence time and a high recall ratio.
8.
A Knowledge-Based Approach to Effective Document Retrieval (total citations: 3, self-citations: 0, citations by others: 3)
This paper presents a knowledge-based approach to effective document retrieval. This approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to specify precisely and accurately the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of this approach. Algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently.
9.
Michael Johnson Farshad Fotouhi Sorin Drăghici Ming Dong Duo Xu 《Multimedia Tools and Applications》2004,24(2):155-188
This paper describes our research into a query-by-semantics approach to searching the World Wide Web. This research extends existing work, which had focused on a query-by-structure approach for the Web. We present a system that allows users not only to request documents containing specific content information, but also to specify that the documents be of a certain type. The system captures and utilizes structure information as well as content during a distributed query of the Web. The system also gives users the option of creating their own document types by providing the system with example documents. In addition, although the system still gives users the option of dynamically querying the Web, the incorporation of a document database has improved the response time of the search process. Based on the extensive testing and validation presented herein, it is clear that a system that incorporates structure and document semantic information into the query process can significantly improve search results over standard keyword search.
10.
11.
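To address the problems that existing ciphertext retrieval schemes in cloud computing environments do not support semantic expansion of search keywords, lack precision, and do not support ranking of search results, a rankable ciphertext retrieval scheme supporting semantic expansion of search keywords is proposed. First, the term frequency-inverse document frequency (TF-IDF) method is used to compute the relevance score between each keyword and its document; keywords in different zones of a document are assigned different position weights, a zone-weighted scoring method is used to compute the position weight score, and the product of the relevance score and the position weight score is set as the value at the keyword's position in the document index vector. Second, the search keywords entered by an authorized user are semantically expanded using the WordNet semantic network to obtain a set of semantically expanded search keywords; the edit distance formula is used to compute the similarity between keywords in this set, and the similarity value is set as the value at the search keyword's position in the document retrieval vector. Finally, a secure index and a document retrieval trapdoor are generated by encryption, inner products are computed under the vector space model (VSM), and the inner product results serve as the basis for ranking the documents in ciphertext retrieval. Theoretical analysis and experimental simulation show that the proposed scheme is secure under the known ciphertext model and the known background knowledge model, and that it is able to rank search results. Compared with the multi-keyword ranked ciphertext search (MRSE) scheme, the proposed scheme supports semantic expansion of keywords and returns more accurate and reliable query results, while its retrieval time differs little from that of MRSE.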
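The plaintext scoring pipeline described above (TF-IDF relevance times a zone weight in the index vector, similarity-weighted expanded keywords in the query vector, ranking by inner product) can be sketched as follows. The encryption layer is deliberately left out, and the zone names, zone weights, and the way expansion similarities are supplied (a plain dictionary) are assumptions for this example only.

```python
import math

# Hypothetical position weights per document zone.
ZONE_WEIGHT = {"title": 2.0, "body": 1.0}

def build_index(docs, vocab):
    """docs: list of {zone: [tokens]}. Returns one index vector per document,
    where each entry is TF-IDF relevance times the summed zone weight."""
    n_docs = len(docs)
    doc_freq = {t: sum(any(t in toks for toks in d.values()) for d in docs) for t in vocab}
    index = []
    for d in docs:
        tokens = [t for toks in d.values() for t in toks]
        vec = []
        for term in vocab:
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0
            zone = sum(ZONE_WEIGHT[z] for z, toks in d.items() if term in toks)
            vec.append(tf * idf * zone)
        index.append(vec)
    return index

def rank(index, vocab, query_weights):
    """query_weights: {term: weight}, e.g. 1.0 for typed keywords and a
    similarity in (0, 1) for semantically expanded ones. Returns document
    positions sorted by inner-product score, best first."""
    q = [query_weights.get(t, 0.0) for t in vocab]
    scores = [sum(a * b for a, b in zip(vec, q)) for vec in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)
```

For instance, a query typed as `cloud` and expanded to `search` with an assumed similarity of 0.5 would be passed as `{"cloud": 1.0, "search": 0.5}`.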
12.
O. Cordón F. Moya C. Zarco 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):308-319
Relevance feedback techniques have proven to be a powerful means of improving the results obtained when a user submits a query to an information retrieval system such as a World Wide Web search engine. These techniques modify the user's original query, taking into account the user's relevance judgements on the retrieved documents, to make it more similar to the documents judged relevant. The newly generated query can then retrieve new relevant documents, improving the retrieval process by increasing recall. However, although powerful relevance feedback techniques have been developed for the vector space information retrieval model, and some of them have been translated to the classical Boolean model, such tools are lacking for more advanced and powerful information retrieval models such as the fuzzy one. In this contribution we introduce a relevance feedback process for extended Boolean (fuzzy) information retrieval systems based on a hybrid evolutionary algorithm combining simulated annealing and genetic programming components. The performance of the proposed technique is compared with the only previously existing approach to this task, Kraft et al.'s method, showing that our proposal outperforms the latter in accuracy and sometimes also in time consumption. Moreover, it is shown how the adaptation of the retrieval threshold by the relevance feedback mechanism allows system effectiveness to be increased.
13.
Park L.A.F. Ramamohanarao K. Palaniswami M. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(5):529-539
Current document retrieval methods use a vector space similarity measure to score the relevance of documents to a specific query. The central problem with these methods is that they neglect any spatial information within the documents in question. We present a new method, called Fourier Domain Scoring (FDS), which takes advantage of this spatial information, via the Fourier transform, to give a more accurate ordering of relevance to a document set. We show that FDS gives an improvement in precision over the vector space similarity measures for the common case of Web-like queries, and that it gives similar results to the vector space measures for longer queries.
14.
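Traditional searchable encryption algorithms for cloud computing do not semantically expand query keywords, which causes a semantic gap between the user's query intent and the returned results; moreover, their relevance ranking of search results is not sufficiently sound, so they cannot meet users' needs for intelligent search. To address this, a searchable encryption method supporting semantics is proposed. The method uses an ontology knowledge base to semantically expand user queries and uses semantic similarity to control the number of expansion terms, preventing too many expansion terms from degrading retrieval precision. In addition, the method partitions the document vectors and query vectors into blocks to construct corresponding marker vectors that filter out irrelevant documents, and it introduces semantic similarity, keyword position weighting scores, and keyword-document relevance as factors in the query-document similarity score, thereby ranking search results effectively. Experimental results show that the method significantly improves the ranking of search results while also improving retrieval efficiency, and increases user satisfaction.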
15.
16.
17.
Although a technique of relevance feedback is common in the field of information retrieval (IR), the feedback is usually done
by means of query refinement; restructuring of the information space has not been attempted yet. The restructuring not only
allows useful applications such as clustering but also is indispensable for IR if a modeling function employs correlation
of terms. In this paper we present a new method of relevance feedback through the restructuring of the information space.
Our method adapts the document space to the user’s mental model by manipulating a dictionary vector. Therefore, the user’s viewpoint
is preserved after a series of retrieval processes and reused for retrieval performed later. We show its effectiveness through
retrieval experiments on FAQ (Frequently Asked Questions) documents.
Tomoko Murakami: She obtained her bachelor’s degree in Engineering from Aoyama Gakuin University in 1996, and her master’s degree in Media
and Governance from Keio University in 1998. In 1998 she joined Human Interface Laboratory, Corporate Research & Development
Center, Toshiba Corporation, Kawasaki, Japan. Her research interests are in Machine Learning, especially Inductive Logic Programming.
She is a member of JSAI.
Ryohei Orihara, Ph.D.: He is a research scientist at Human Interface Laboratory, Corporate Research & Development Center, Toshiba Corporation.
He obtained his bachelor’s degree and master’s degree in Engineering and Ph.D. from University of Tsukuba in 1986, 1988 and
1999 respectively. His current research interests include machine learning, creativity support system, analogical reasoning
and metaphor understanding. He was a visiting researcher at University of Toronto from 1993 to 1995. He is a member of IPSJ,
JSAI and JSSST. He is presently on the editorial committee of the Journal of JSAI.
18.
Frederick Knabe Daniel Tunkelang 《IT Professional》2007,9(1):21-28
Today, digitally stored information isn't only ubiquitous, it's also increasing in volume at an exponential rate. And not only is the volume increasing, but so is the variety, as well as the ways of combining information from different sources to derive insights. Not surprisingly, our most pressing technological and business problem is finding what we need in this sea of information. The dominant paradigm for addressing this problem is information retrieval (Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto, ACM Press, 1999). In this paradigm, the user enters a query (typically a few words typed into a search box), and the system retrieves documents matching the query, ranking the matches based on an estimate of their relevancy to the query. If the system finds many matches, the user sees only the highest-ranked matches. The popularity of Web search systems such as Google shows that the information retrieval paradigm can be effective. An information access framework empowers users by explicitly focusing on the interaction between users and the system. The key problem for information access systems isn't guessing which matching document is most relevant, but establishing a dialogue in which users progressively communicate their information goals while the system provides immediate, incremental feedback that guides users in the pursuit of those goals.
19.
In this paper, we propose CYBER, a CommunitY Based sEaRch engine, for information retrieval utilizing community feedback information in a DHT network. In CYBER, each user is associated with a set of user profiles that capture his/her interests. Likewise, a document is associated with a set of profiles, one for each indexed term. A document profile is updated by users who query on the term and consider the document a relevant answer. Thus, the profile acts as a consolidation of users' feedback from the same community and reflects their interests. In this way, when one user finds a document to be relevant, another user in the same community issuing a similar query will benefit from the feedback provided by the earlier user. Hence, the search quality in terms of both precision and recall is improved. Moreover, we further improve the effectiveness of CYBER by introducing an index tuning technique. By choosing the indexing terms more carefully, community-based relevance feedback is utilized both in building and refining indices and in re-evaluating queries. We first propose a naive scheme, CYBER+, which involves an index tuning technique based on past queries only and then re-evaluates queries in a separate step. We then propose a more complex scheme, CYBER++, which refines its index based on both past queries and relevance feedback. As the index is built with more selective and accurate terms, the search performance is further improved. We conduct a comprehensive experimental study, and the results show the effectiveness of our schemes.
20.
Steve Benford Dave Snowdon Chris Greenhalgh Rob Ingram Ian Knox Chris Brown 《Computer Graphics Forum》1995,14(3):349-360
We present a virtual reality application called VR-VIBE which is intended to support the co-operative browsing and filtering of large document stores. VR-VIBE extends a visualisation approach proposed in a previous two-dimensional system called VIBE into three dimensions, allowing more information to be visualised at one time and supporting more powerful styles of interaction. The essence of VR-VIBE is that multiple users can explore the results of applying several simultaneous queries to a corpus of documents. By arranging the queries into a spatial framework, the system shows the relative attraction of each document to each query by its spatial position and also shows the absolute relevance of each document to all of the queries. Users may then navigate the space, select individual documents, control the display according to a dynamic relevance threshold and dynamically drag the queries to new positions to see the effect on the document space. Co-operative browsing is supported by directly embodying users and providing them with the ability to interact over live audio connections and to attach brief textual annotations to individual documents. Finally, we conclude with some initial observations gleaned from our experience of constructing VR-VIBE and using it in the laboratory setting.
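One simple way to picture the spatial framework described above is to pin each query at a point in 3D and place each document at the relevance-weighted average of those points, hiding documents whose total relevance falls below a threshold. The sketch below follows that reading; the weighted-average placement rule, the use of summed relevance as "absolute relevance", and all names are assumptions for illustration and are not taken from the VR-VIBE paper.

```python
def place(relevances, anchors):
    """Position a document at the relevance-weighted mean of the query anchors.

    relevances: one non-negative score per query.
    anchors: one (x, y, z) position per query.
    Returns None for a document that matches no query.
    """
    total = sum(relevances)
    if total == 0:
        return None
    return tuple(
        sum(r * a[i] for r, a in zip(relevances, anchors)) / total
        for i in range(3)
    )

def visible(relevances, threshold):
    """Dynamic relevance threshold: show only sufficiently relevant documents."""
    return sum(relevances) >= threshold
```

Under this rule, dragging a query anchor to a new position moves every document attracted to it, which matches the interaction the abstract describes.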