Similar articles
20 similar articles found (search time: 15 ms)
1.
We seek to leverage an expert user's knowledge about how information is organized in a domain and how information is presented in typical documents within a particular domain-specific collection, to effectively and efficiently meet the expert's targeted information needs. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, where each class has an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive searching study that compared the use of semantic components in a system with full text and keyword indexing, where we extended the query language to allow users to search using semantic components, to a base system that did not have semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate the systems from a session-based perspective, evaluating not only the results of individual queries but also the results of multiple queries during a single interactive query session.

2.
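Most existing spatial keyword query processing schemes support only location proximity and textual similarity matching; they cannot return objects that are semantically close to the query but differ from it in form. Moreover, current spatial-textual index structures cannot handle the numeric attributes of spatial objects. To address these problems, this paper proposes a spatial keyword query method that supports semantic approximate queries. First, word embedding is used to expand the user's original query, generating a set of query keywords semantically related to the original keywords. Second, a hybrid index structure, AIR-Tree, is proposed that supports both textual and semantic matching and uses the Skyline method to process numeric attributes. Finally, AIR-Tree is used for query matching, returning the top-k ordered spatial objects most relevant to the query conditions. Experimental analysis and results show that, compared with existing methods of the same kind, the proposed method achieves higher execution efficiency and better user satisfaction; query efficiency with the AIR-Tree index is 3.6% higher than with the IRS-Tree index, and query result accuracy is 10.14% and 16.15% higher than with the IR-Tree and IRS-Tree indexes, respectively.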
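
The abstract above mentions Skyline processing of numeric attributes. As a rough illustration of what a skyline is, and not of how AIR-Tree computes it, here is a minimal quadratic-time sketch. It assumes that every attribute is "smaller is better"; the function names `dominates` and `skyline` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every attribute and strictly better in one.

    Assumption for this sketch: smaller values are better for all attributes.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs for five spatial objects.
    objects = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(skyline(objects))
```

An index-based method would prune dominated objects during tree traversal instead of comparing all pairs; the naive scan only shows the dominance test itself.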

3.
Providing top-k typical relevant keyword queries would benefit users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach that can provide typical queries semantically related to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries is measured by using a semantic matrix, in which the coupling relationships between keywords in queries are preserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries in the query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. The efficiency of the query typicality analysis and the top-k query selection algorithm is also demonstrated.

4.
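Web database users usually express their query intent with keywords they are familiar with, which may yield results that do not satisfy their needs well. Providing them with top-k candidate queries that are semantically related to the initial query and diverse therefore helps users broaden their knowledge and express their query intent more accurately and completely. This paper proposes a top-k diversified keyword query recommendation method. (1) The co-occurrence frequency and association relationships of different keywords in the query history are used to evaluate the intra-coupling and inter-coupling relationships between keywords. (2) A semantic matrix is built from the coupling relationships between keywords, and the semantic matrix together with a kernel function method is used to evaluate the semantic relatedness between different keyword queries. To quickly return the top-k candidate queries that are related to the initial query and diverse, the typicality of queries is analyzed with a probability density function based on the semantic relatedness between queries, and an approximation algorithm is used to find the typical queries in the query history. From all the typical queries, a small number of representative queries are selected, and for each representative query a corresponding query sequence is built according to the semantic relatedness between the other typical queries and that representative query. When a new query arrives, its semantic relatedness to the representative queries is evaluated, and the threshold algorithm (TA) is then run over the pre-built query sequences to quickly select the top-k diversified candidate queries semantically related to the given query. Experimental results and analysis show that the proposed keyword coupling computation and query semantic relatedness evaluation are highly accurate, and that the top-k diversified selection method is effective and efficient.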
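
The threshold algorithm (TA) named in the abstract above can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each pre-built sequence is modeled as a list of `(query, score)` pairs sorted by descending score, scores are non-negative, an item missing from a list scores 0 there, and the per-list scores are combined by a plain sum (the abstract does not say how they are aggregated). The name `threshold_topk` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]


def threshold_topk(lists: List[ScoredList], k: int) -> List[Tuple[str, float]]:
    """Top-k items by summed score, scanning the sorted lists in parallel."""
    lookup = [dict(lst) for lst in lists]            # random access per list
    totals: Dict[str, float] = {}                    # aggregate score of seen items
    depth = max((len(lst) for lst in lists), default=0)

    for i in range(depth):
        last_seen = []
        for lst in lists:
            if i < len(lst):
                item, score = lst[i]                 # sorted access
                last_seen.append(score)
                if item not in totals:
                    totals[item] = sum(t.get(item, 0.0) for t in lookup)
            else:
                last_seen.append(0.0)                # exhausted list contributes 0
        # No unseen item can score higher than the sum of the last seen scores.
        threshold = sum(last_seen)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                               # early termination
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    sequences = [
        [("q1", 0.75), ("q2", 0.5), ("q3", 0.125)],
        [("q2", 0.75), ("q3", 0.5), ("q1", 0.25)],
    ]
    print(threshold_topk(sequences, k=2))
```

In this toy input the scan stops after the second row of each list, because the second-best aggregate seen so far already reaches the threshold; that early stop is what makes TA attractive over scoring every candidate.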

5.
Keyword search is an effective paradigm for information discovery and has recently been introduced for querying XML documents. Scoring of XML search results is an important issue in XML keyword search. The traditional “bag-of-words” model cannot differentiate the roles of keywords or the relationships between keywords, and is therefore not well suited to XML keyword queries. In this paper, we present a new scoring method based on a novel query model, called keyword query with structure (QWS), which is specially designed for XML keyword queries. The QWS model takes a new view of a keyword query: it is a composition of several query units, each representing a query condition. We believe that this method captures the semantic relevance of the search results. The paper first introduces an algorithm that reformulates a keyword query as a QWS. Then, a scoring method is presented which measures the relevance of search results according to how many of the query conditions are matched and how well. The scoring method is also extended to clusters of search results. Experimental results verify the effectiveness of our methods.

6.
We present a new text-to-image re-ranking approach for improving the relevancy rate in searches. In particular, we focus on the fundamental semantic gap that exists between the low-level visual features of the image and high-level textual queries by dynamically maintaining a connected hierarchy in the form of a concept database. For each textual query, we take the results from popular search engines as an initial retrieval, followed by a semantic analysis to map the textual query to higher level concepts. In order to do this, we design a two-layer scoring system which can identify the relationship between the query and the concepts automatically. We then calculate the image feature vectors and compare them with the classifier for each related concept. An image is relevant only when it is related to the query both semantically and content-wise. The second feature of this work is that we loosen the requirement for query accuracy from the user, which makes it possible to perform well on users’ queries containing less relevant information. Thirdly, the concept database can be dynamically maintained to satisfy the variations in user queries, which eliminates the need for human labor in building a sophisticated initial concept database. We designed our experiment using complex queries (based on five scenarios) to demonstrate how our retrieval results are a significant improvement over those obtained from current state-of-the-art image search engines.

7.
The problem of word mismatch in information retrieval (IR) occurs because users often use different words to describe concepts in their queries than authors use to describe the same concepts in their documents. Query expansion is used to deal with the mismatch between author and user vocabularies. To support query expansion, indices on words related by lexical semantics and syntactical co-occurrence need to be maintained. Two issues become paramount in supporting query expansion: the size of index tables and the query processing overhead. In this paper, we propose to use the notion of multi-granularity for more efficient indexing and query processing while maintaining the same degrees of precision and recall. We also describe extensions of this technique to handle: (1) query relaxation for words with multiple senses and with other semantic relationships; (2) progressive processing of queries with top N results; and (3) progressive processing of queries that specify the importance of each keyword.

8.
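Li Yan, Zhang Bowen, Hao Hongwei. 《计算机应用》 (Journal of Computer Applications), 2016, 36(9): 2526-2530
To address the lack of semantic association between expansion terms and the original query when traditional query expansion methods are applied in specialized domains, a query expansion method based on semantic vector representation is proposed. First, a semantic vector representation model is built; by learning the contextual semantics of words in the corpus, it produces a semantic vector representation for each word. Second, the semantic similarity between words is computed from their semantic vector representations. Then, the words semantically most similar to the query terms are selected as expansion terms and added to the original query. Finally, a biomedical document retrieval system is built on the proposed query expansion method, and comparative experiments and significance analysis are carried out against traditional query expansion methods based on Wikipedia or WordNet and against the systems that took part in the BioASQ 2014-2015 competitions. The experimental results show that retrieval with semantic-vector-based query expansion outperforms the traditional query expansion methods, improving average precision by at least 1 percentage point, and that it achieves significant improvements over the competition systems.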
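
The expansion step described in the abstract above (compute similarity between word vectors, then add the most similar words to the query) might look roughly like the sketch below. It assumes cosine similarity as the similarity measure and uses tiny hand-made two-dimensional vectors in place of vectors learned from a corpus; `expand_query`, `per_term` and the toy vocabulary are hypothetical names for illustration only.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms: List[str],
                 vectors: Dict[str, Sequence[float]],
                 per_term: int = 1) -> List[str]:
    """Append, for each query term, its `per_term` nearest words by cosine."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue                      # no vector learned for this term
        candidates = [
            (cosine(vectors[term], vec), word)
            for word, vec in vectors.items()
            if word not in expanded       # skip words already in the query
        ]
        candidates.sort(reverse=True)     # most similar first
        expanded.extend(word for _, word in candidates[:per_term])
    return expanded


if __name__ == "__main__":
    toy_vectors = {
        "tumor": (1.0, 0.0),
        "neoplasm": (0.9, 0.1),
        "gene": (0.0, 1.0),
        "dna": (0.1, 0.9),
    }
    print(expand_query(["tumor", "gene"], toy_vectors))
```

A real system would load trained embeddings and probably apply a similarity cut-off so that weakly related words are not added.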

9.
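Semantic retrieval is a very promising approach to meeting the accuracy and user-friendliness requirements of information retrieval. Knowledge documents are annotated with subject terms, and a two-level index structure from tokens to subject terms to knowledge documents is built. For a user search, the query terms are converted to subject terms, semantic similarity is computed, and documents are ranked by a semantic similarity algorithm. The knowledge-document-based semantic retrieval system has been deployed and used in a corporate group, where the top five results satisfied 90% of all user queries, indicating that this method is an effective approach to semantic retrieval.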
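
The two-level index described above (token to subject term, subject term to document) can be sketched as two dictionaries. The abstract does not specify its semantic similarity algorithm, so this sketch ranks documents by the number of matched subject terms as a stand-in; the class `TwoLevelIndex` and its method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


class TwoLevelIndex:
    """Token -> subject term -> document, as two inverted maps."""

    def __init__(self) -> None:
        self.token_to_subjects: Dict[str, Set[str]] = defaultdict(set)
        self.subject_to_docs: Dict[str, Set[str]] = defaultdict(set)

    def add_subject(self, subject: str, tokens: Iterable[str]) -> None:
        """Register the tokens that should map to a subject term."""
        for token in tokens:
            self.token_to_subjects[token].add(subject)

    def annotate(self, doc_id: str, subjects: Iterable[str]) -> None:
        """Record that a document is annotated with the given subject terms."""
        for subject in subjects:
            self.subject_to_docs[subject].add(doc_id)

    def search(self, query_tokens: Iterable[str]) -> List[str]:
        """Map tokens to subject terms, then rank documents by matched terms."""
        subjects: Set[str] = set()
        for token in query_tokens:
            subjects |= self.token_to_subjects.get(token, set())
        scores: Counter = Counter()
        for subject in subjects:
            for doc_id in self.subject_to_docs.get(subject, set()):
                scores[doc_id] += 1       # stand-in for semantic similarity
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = TwoLevelIndex()
    index.add_subject("information retrieval", ["search", "retrieval"])
    index.add_subject("ontology", ["ontology", "semantic"])
    index.annotate("d1", ["information retrieval", "ontology"])
    index.annotate("d2", ["ontology"])
    print(index.search(["semantic", "search"]))
```

Routing every query through subject terms keeps the second-level index small, at the cost of depending on the quality of the token-to-subject mapping.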

10.
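Intelligent retrieval of e-government documents based on the Semantic Web   Total citations: 7 (self-citations: 0, citations by others: 7)
Yang Fang, Yang Zhenshan. 《计算机应用》 (Journal of Computer Applications), 2005, 25(10): 2434-2435
Based on the characteristics of e-government documents, the feature values of the document collection and of the retrieval request are computed using an e-government thesaurus. The similarity computation between the document collection and the retrieval request is discussed, which is used to find the documents matching the request. Based on the semantic organization of e-government document metadata, the retrieval of e-government document metadata is studied. The metadata of the retrieved documents is organized semantically, so that intelligent retrieval is achieved on the basis of semantic reasoning.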

11.
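Spatial keyword search aims to find points of interest (POIs) that satisfy the user's query intent and are spatially close, and it is widely used in applications such as map search. Traditional spatial keyword search methods consider only the textual match between keywords and POIs and ignore the semantics of the query, which causes relevant results to be missed and irrelevant results to be introduced. To overcome these limitations, a semantic-enhanced spatial keyword search method, S3 (semantic-enhanced spatial keyword search), is proposed. The method analyzes the semantic information contained in the query keywords and ranks POIs effectively by combining semantic relevance and spatial distance. S3 faces two main technical challenges. (1) How to analyze semantic information. S3 introduces a knowledge base to semantically enrich the POI data and proposes a graph-based semantic distance measure; combining semantic distance and spatial distance, S3 gives a comprehensive ranking scheme for POIs. (2) How to return top-k search results instantly on large-scale data. To meet this challenge, a new hybrid semantic-spatial index structure, GRTree (graph rectangle tree), is proposed, and effective pruning strategies are studied. Experiments on large-scale real data sets show that S3 not only returns more relevant results but also has good efficiency and scalability.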
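
To make the idea of ranking by combined semantic and spatial distance concrete, here is a small sketch. Everything specific in it is an assumption rather than the S3 method: semantic distance is taken as the hop count between concepts in a toy knowledge graph, spatial distance is Euclidean, and the two are mixed with a hypothetical weight `alpha`.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]
Poi = Tuple[str, str, Tuple[float, float]]   # (name, concept, location)


def hop_distance(graph: Graph, source: str, target: str) -> float:
    """Number of edges on a shortest path (BFS); infinity if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf


def rank_pois(pois: List[Poi], query_concept: str,
              query_loc: Tuple[float, float], graph: Graph,
              alpha: float = 0.5) -> List[Poi]:
    """Sort POIs by alpha * semantic + (1 - alpha) * spatial distance.

    Keep 0 < alpha <= 1 so that unreachable concepts (infinite semantic
    distance) sort last instead of producing NaN scores.
    """
    def score(poi: Poi) -> float:
        _, concept, loc = poi
        semantic = hop_distance(graph, query_concept, concept)
        spatial = math.dist(query_loc, loc)
        return alpha * semantic + (1 - alpha) * spatial

    return sorted(pois, key=score)


if __name__ == "__main__":
    toy_graph = {"coffee": ["cafe"], "cafe": ["coffee", "bakery"],
                 "bakery": ["cafe"]}
    toy_pois = [("A", "bakery", (0.0, 1.0)),
                ("B", "cafe", (0.0, 1.0)),
                ("C", "gas station", (0.0, 0.5))]
    for poi in rank_pois(toy_pois, "coffee", (0.0, 0.0), toy_graph):
        print(poi[0])
```

In the toy data the nearest POI ("C") ranks last because its concept is unrelated to the query, which is the behavior that pure text-and-distance matching would miss.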

12.
Users accustomed to keyword-based search often cannot formulate formal queries in a semantic retrieval system because they lack general knowledge of the underlying knowledge base. They want results that are more accurate and better matched to their search intent while still using the same keywords as before, without needing to know which technology the retrieval system is based on. Semantic analysis of the ambiguous keywords entered by such users requires an ontological knowledge base built on refined metadata, together with a keyword semantic analysis technique that draws on this well-established knowledge base to reflect the user's search intent and generate accurate results. In this paper, therefore, we limit knowledge base construction to multimedia content metadata, implement an applicable prototype, and evaluate its performance in an environment equivalent to a Smart TV. The user's search keywords are semantically analyzed, evaluated and used for recommendation through the proposed ontological knowledge base framework, so that accurate search results matching the user's search intent can be provided.

13.
Online information repositories commonly provide keyword search facilities through textual query languages based on Boolean logic. However, there is evidence to suggest that the syntactic demands of such languages can lead to user errors and adversely affect the time that it takes users to form queries. Users also face difficulties because of the conflict in semantics between AND and OR when used in Boolean logic and in the English language. Analysis of usage logs for the New Zealand Digital Library (NZDL) shows that few Boolean queries contain more than three terms, that use of the intersection operator dominates, and that query refinement is common. We suggest that graphical query languages, in particular Venn-like diagrams, can alleviate the problems that users experience when forming Boolean expressions with textual languages. A study of the utility of Venn diagrams for query specification indicates that with little or no training users can interpret and form Venn-like diagrams in a consistent manner which accurately correspond to Boolean expressions. We describe VQuery, a Venn-diagram based user interface to the NZDL. In a study which compared VQuery with a standard textual Boolean interface, users took significantly longer to form queries and produced more erroneous queries when using VQuery. We discuss the implications of these results and suggest directions for future work. Received: 15 December 1997 / Revised: June 1999

14.
Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web should be able to search and query data spread over potentially large numbers of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work investigates a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. Wikipedia-based semantic relatedness measures address limitations of existing approaches that rely on WordNet-based similarity measures or term expansion. Experimental results using the query mechanism to answer 50 natural language queries over DBpedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and an average recall of 57.2%.
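
For readers unfamiliar with the metrics quoted above, the following sketch shows the standard definitions of reciprocal rank, precision and recall. It is not the evaluation code of the paper; the function names and the toy runs are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if none is returned."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked answers, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def precision(returned: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of returned answers that are relevant."""
    if not returned:
        return 0.0
    return sum(1 for item in returned if item in relevant) / len(returned)


def recall(returned: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of relevant answers that were returned."""
    if not relevant:
        return 0.0
    return sum(1 for item in relevant if item in returned) / len(relevant)


if __name__ == "__main__":
    toy_runs = [(["a", "b"], {"a"}), (["x", "y"], {"y"}), (["p"], {"q"})]
    print(mean_reciprocal_rank(toy_runs))
    print(precision(["a", "b", "c", "d"], {"a", "c", "e"}))
    print(recall(["a", "b", "c", "d"], {"a", "c", "e"}))
```

Averaging precision and recall over all queries gives per-collection figures of the kind reported in the abstract.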

15.
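Li Qiushi, Wang Qiuyue, Wang Shan. 《软件学报》 (Journal of Software), 2012, 23(8): 2002-2017
Compared with plain-text document collections, semi-structured XML document collections annotated with semantic tags help an information retrieval system better understand the documents to be retrieved. Likewise, structured queries, such as SQL, XQuery and XPath, express the user's query intent more clearly than pure keyword queries. Both can help an information retrieval system achieve better retrieval precision. Keyword queries, however, are still widely used because they are simple and easy to use. The XNodeRelation algorithm is proposed to automatically infer the structural information (condition/target node types) of keyword queries. Compared with existing inference algorithms, it infers the user's query intent by combining the schema and statistical information of the XML document collection with the context in which the query keywords occur and the associations among them. Extensive experiments verify the effectiveness of the algorithm.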

16.
Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.

17.
A common task of Web users is querying structured information from Web pages. For realizing this interesting scenario we propose a novel query processor for systematically discovering instances of semantic relations in Web search results and joining these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards search results to a relation extractor, and then combines relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. Thereby, our query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user defined data sources may not return at least k complete result tuples. Therefore we propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during the query execution process. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.

18.
In this paper, we propose a multimodal query suggestion method for video search which leverages multimodal processing to improve the quality of search results. When users type general or ambiguous textual queries, our system MQSS provides keyword suggestions and representative image examples in an easy-to-use dropdown, which helps users specify their search intent more precisely and effortlessly. It is a powerful complement to the initial queries. After the queries are formulated as multimodal queries (i.e., text and image), they are passed to individual search models, such as text-based, concept-based and visual example-based search models. We then apply a multimodal fusion method to aggregate the results of these search models. The effectiveness of MQSS is demonstrated by evaluations over a web video data set.

19.
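Xia Meicui, Shi Hongtao. 《计算机应用》 (Journal of Computer Applications), 2015, 35(10): 2915-2919
To improve the accuracy of Web information retrieval, an efficient information query method based on the Semantic Web is proposed. First, the semantic paths between target resources and query keywords are extracted from the ontology base, and the weight of each semantic path is computed by analyzing the weights and discriminative power of the properties it contains. Then, the semantic relevance between each resource and each keyword is computed from the weight, number and specificity of the semantic paths between them, and the semantic relevance between each resource and the keyword set is computed by also taking into account the coverage and discriminative power of the keywords. Finally, all resources are ranked and output according to this relevance. Experimental results show that, compared with the three Semantic Web query algorithms OntoLook, tf*idf and TMSubtree, the proposed method improves average precision by 69.0, 25.0 and 21.0 percentage points, average recall by 77.1, 28.3 and 24.3 percentage points, and average F-measure by 72.4, 26.4 and 22.4 percentage points, respectively. The results show that the method not only effectively improves the accuracy of semantic queries but also performs well in querying implicit information.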

20.
Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as content-based semantic search. In this paper, we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place indexes of semantically close files into the same peer nodes with high probability by exploiting information retrieval algorithms and locality sensitive hashing. A query for retrieving semantically close files is answered with high recall by consulting only a small number (e.g., 10–20) of nodes that store the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations, we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited for a query is about 10–20, independent of the overlay size.
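
The core idea above, hashing semantically close items to the same node with high probability, can be illustrated with one common locality sensitive hashing family. The sketch assumes random-hyperplane hashing over term-weight vectors, which may differ from the hash family used in the paper, and it maps a signature to a node with a simple modulo instead of a real DHT key space. All names (`make_hyperplanes`, `signature`, `node_for`) are hypothetical.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Draw `bits` random hyperplanes (normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> int:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    sig = 0
    for plane in planes:
        dot = sum(a * b for a, b in zip(vec, plane))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def node_for(vec: Sequence[float], planes: List[List[float]],
             num_nodes: int) -> int:
    """Map a vector to a node id; vectors with equal signatures collide."""
    return signature(vec, planes) % num_nodes


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, bits=8, seed=42)
    doc = [1.0, 2.0, 0.5]
    same_direction = [2.0, 4.0, 1.0]      # same term proportions, longer file
    print(node_for(doc, planes, 16), node_for(same_direction, planes, 16))
```

Vectors separated by a small angle agree on most signature bits, so they tend to land on the same node; practical systems usually use several independent hash tables and probe a few nodes per query to raise recall.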
