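Semantics-Based Concept Query Expansion   Total citations: 1, self-citations: 1, citations by others: 1
To address the low precision and low recall of current information retrieval systems, this paper analyzes the methods commonly used in such systems and proposes a semantics-based concept query expansion method. The method uses a concept semantic space to expand the concepts in a user's query, with the aim of improving both precision and recall. Experimental results show that, compared with traditional methods, it substantially improves the precision and recall of user searches.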

A search query, being a very concise grounding of user intent, could potentially have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such possibilities so that the user is likely to be satisfied, whatever her intended interpretation. Diversified Query Expansion is the problem of diversifying query expansion suggestions, so that the user can specialize the query to better suit her intent, even before perusing search results. In this paper, we consider the usage of semantic resources and tools to arrive at improved methods for diversified query expansion. In particular, we develop two methods, which leverage Wikipedia and pre-learnt distributional word embeddings, respectively. Both approaches operate on a common three-phase framework: first taking a set of informative terms from the search results of the initial query, then building a graph, followed by using a diversity-conscious node ranking to prioritize candidate terms for diversified query expansion. Our methods differ in the second phase: the first method, Select-Link-Rank (SLR), links terms with Wikipedia entities to accomplish graph construction, whereas the second method, Select-Embed-Rank (SER), constructs the graph using similarities between distributional word embeddings. Through an empirical analysis and user study, we show that SLR outperforms state-of-the-art diversified query expansion methods, thus establishing that Wikipedia is an effective resource to aid diversified query expansion. Our empirical analysis also illustrates that SER outperforms the baselines convincingly, asserting that it is the best available method for those cases where SLR is not applicable; these include narrow-focus search systems where a relevant knowledge base is unavailable. Our SLR method is also seen to outperform a state-of-the-art method in the task of diversified entity ranking.
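
The third phase of the framework above, diversity-conscious ranking of candidate expansion terms, can be illustrated with a minimal sketch. This is not the graph-based node ranking used by SLR or SER; it swaps in a simple greedy Maximal Marginal Relevance (MMR) selection over toy embedding vectors. The function names, the vectors, and the `trade_off` parameter are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversified_terms(query_vec, candidates, k=2, trade_off=0.5):
    """Greedy MMR selection (a stand-in technique, not the paper's ranking).

    candidates maps each candidate term to its (hypothetical) embedding.
    Each step picks the term that is most relevant to the query while
    being least similar to the terms already selected.
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:

        def mmr(term):
            relevance = cosine(query_vec, remaining[term])
            redundancy = max(
                (cosine(remaining[term], candidates[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected


# Toy 2-D embeddings: "car" and "automobile" are near-duplicates.
toy = {
    "car": (1.0, 0.2),
    "automobile": (1.0, 0.25),
    "cat": (0.2, 1.0),
}
print(diversified_terms((1.0, 1.0), toy, k=2))
```

With these toy vectors the second pick is the term that differs from the first, rather than its near-duplicate, which is the behavior a diversity-aware ranking is meant to produce.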

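Research on Semantic Expansion Techniques Combined with a Concept Semantic Space   Total citations: 2, self-citations: 0, citations by others: 2
Wang Lei, Huang Guangjun 《Computer Engineering and Applications》2012,48(35):106-109,193
Query expansion adds related words or phrases to the original query terms in order to overcome the ambiguity of natural language and better describe the query intent. Expanding query terms within a concept semantic space makes it possible to fully exploit the degree of association among the query terms and to capture the query intent as a whole. Using the contextual relations and similarity relations in the WordNet semantic dictionary, a semantic tree is built for each original query term, and these trees are traced upward to their roots to build a complete concept semantic space. Co-occurrence information is used as the feature parameter to filter the words in the expansion source, so as to avoid the query drift caused by over-expansion. A dynamic observation-window weighting model is also introduced to strengthen how well co-occurrence information represents the association between words. Experimental results show that the expansion quality of this algorithm is clearly better than that of the traditional pseudo-relevance feedback algorithm.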

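Semantic Web Document Retrieval Based on a Semantic Query Ontology   Total citations: 1, self-citations: 0, citations by others: 1
The development of the Semantic Web has created a need to retrieve Semantic Web documents. To let users form semantic queries without specialized knowledge or skills, an ontology-based algorithm is proposed for retrieving Semantic Web documents from a structured knowledge base. By mapping natural-language query terms to word senses to construct a query ontology, and then retrieving the Semantic Web documents most similar to that query ontology, the method improves retrieval precision for Semantic Web documents and lets users make better use of semantic retrieval services.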

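To address problems of existing dense passage retrieval (DPR) models, such as inefficient negative sampling and a tendency to overfit, a dense passage retrieval model based on query semantic characteristics (Q-DPR) is proposed. First, for the negative sampling process, a negative sampling method based on neighboring queries is proposed: by retrieving neighboring queries, it quickly builds high-quality negative samples and so reduces training cost. Second, to counter overfitting, a contrastive-learning-based query self-supervision method is proposed: by establishing a self-supervised contrastive loss between queries, it mitigates overfitting to the training labels and thereby improves retrieval accuracy. Q-DPR performs well on MSMARCO, a large dataset for open-domain question answering, achieving a mean reciprocal rank of 0.348 and a recall of 0.975. The experimental results show that the model reduces training overhead while also improving retrieval performance.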
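
For readers unfamiliar with the mean reciprocal rank metric reported above, the following is a minimal sketch of how it is commonly computed. The cutoff of 10 is an assumption (the abstract does not state one), and the document identifiers are made up for illustration.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets, cutoff=10):
    """Average, over queries, of 1/rank of the first relevant result.

    ranked_lists: one ranked list of document ids per query.
    relevant_sets: one set of relevant document ids per query.
    cutoff: only the top `cutoff` results are inspected (assumed value).
    A query with no relevant result in the top `cutoff` contributes 0.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked[:cutoff], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Three hypothetical queries: first hit at rank 2, at rank 1, and no hit.
runs = [["d3", "d1", "d2"], ["d5", "d4"], ["d9"]]
gold = [{"d1"}, {"d5"}, {"d0"}]
print(mean_reciprocal_rank(runs, gold))  # (1/2 + 1 + 0) / 3 = 0.5
```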

《Computers in Industry》2014,65(6):937-951
Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very efficient when dealing with texts, applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless, one appealing way of improving the results could be to consider query expansions that aim at adding, automatically or semi-automatically, additional information to the query to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevancy of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first one, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term weighting measure, it aims at assigning a weight to terms according to their relatedness to queries. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables retrieval of relevant passages.

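Research on Question Semantic Expansion Based on Mutual Information   Total citations: 1, self-citations: 0, citations by others: 1
Users tend to search for the information they need with very few keywords, which inevitably leads to a mismatch between the information they are looking for and the information returned. To address this, a mutual-information-based question semantic expansion model (QSE_BMI) is proposed. Its advantage is that, given an interest model defined by the user and the input query question, it retrieves relevant information that matches the user's interests and meets the user's needs.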
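
As a rough illustration of how mutual information can drive query expansion, the sketch below scores term pairs by pointwise mutual information (PMI) over document-level co-occurrence and expands a query with its positively associated terms. It is a generic sketch, not the QSE_BMI model: it omits the user interest model entirely, and the function names and toy corpus are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """PMI for every pair of terms that co-occur in at least one document.

    PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated from document frequencies.
    """
    n_docs = len(documents)
    term_count = Counter()
    pair_count = Counter()
    for doc in documents:
        terms = set(doc)
        term_count.update(terms)
        pair_count.update(frozenset(p) for p in combinations(sorted(terms), 2))
    pmi = {}
    for pair, joint in pair_count.items():
        a, b = tuple(pair)
        pmi[pair] = math.log((joint * n_docs) / (term_count[a] * term_count[b]))
    return pmi


def expand(query_terms, pmi, top_k=2):
    """Return up to top_k new terms with positive PMI to any query term."""
    scores = Counter()
    for pair, score in pmi.items():
        for q in query_terms:
            if q in pair:
                (other,) = pair - {q}
                if other not in query_terms and score > 0:
                    scores[other] += score
    return [term for term, _ in scores.most_common(top_k)]


# Toy corpus: each document is a list of terms.
corpus = [
    ["jaguar", "car", "speed"],
    ["jaguar", "car"],
    ["jaguar", "cat", "jungle"],
    ["python", "snake", "jungle"],
]
table = pmi_table(corpus)
print(expand(["jaguar"], table, top_k=3))
```

Terms whose PMI with the query term is zero or negative (here, "jungle") are left out, which is one simple way to limit the drift that over-expansion can cause.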

Users accustomed to keyword-based search have difficulty formulating formal queries in a semantic retrieval system because they lack general knowledge of the underlying knowledge base. They want results that are more accurate and better match their search intent while using the same keywords as before, without having to know which technology the retrieval system relies on. Semantic analysis of the ambiguous keywords entered by such users requires an ontological knowledge base built on refined metadata, together with a keyword semantic analysis technique that infers the user's search intent from this well-established knowledge base and generates accurate search results. In this paper, therefore, knowledge base construction is limited to multimedia content metadata; an applicable prototype has been implemented and its performance evaluated in an environment equivalent to a Smart TV. The user's search keywords are semantically analyzed, evaluated, and used for recommendation through the proposed ontological knowledge base framework so that accurate search results matching the user's search intent can be provided.

Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has been often problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer’s personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer’s personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. By user profile learning, user’s interests are captured from the relevant documents. During a personalized query expansion process, the learned user profile is used to reflect user’s interests. Simultaneously, user’s searching intent, which is implicitly inferred from the user’s task context, is also considered. To retrieve relevant documents, an expanded query in which both user’s interests and intents are reflected is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. 
Reflecting a user’s information needs precisely has been identified to be the most important factor underlying this notable improvement.

The experience of a user of major search engines or other web information retrieval services looking for information in the Basque language is far from satisfactory: they only return pages with exact matches but no inflections (necessary for an agglutinative language like Basque), many results in other languages (no search engine gives the option to restrict its results to Basque), etc. This paper proposes using morphological query expansion and language-filtering words in combination with the APIs of search engines as a very cost-effective solution to build appropriate web search services for Basque. The implementation details of the methodology (choosing the most appropriate language-filtering words, the number of them, the most frequent inflections for the morphological query expansion, etc.) have been specified by corpora-based studies. The improvements produced have been measured in terms of precision and recall both over corpora and real web searches. Morphological query expansion can improve recall up to 47 % and language-filtering words can raise precision from 15 % to around 90 %, although with a loss in recall of about 30–35 %. The proposed methodology has already been successfully used in the Basque search service Elebila (http://www.elebila.eu) and the web-as-corpus tool CorpEus (http://www.corpeus.org), and the approach could be applied to other morphologically rich or under-resourced languages as well.
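
The combination of morphological query expansion and language-filtering words described above can be sketched as a query-string builder. This is only an illustration under assumptions: the `OR` syntax, the quoting, the chosen inflections of "etxe", and the filter words are examples, not the settings derived in the paper or the syntax of any particular search engine API.

```python
def build_query(lemma, inflections, filter_words):
    """Build an expanded search query string.

    lemma: the word the user typed.
    inflections: inflected forms to OR together with the lemma.
    filter_words: frequent words of the target language, appended as
        required terms to bias results toward that language.
    """
    forms = [lemma] + [form for form in inflections if form != lemma]
    expansion = " OR ".join(f'"{form}"' for form in forms)
    filters = " ".join(f'"{word}"' for word in filter_words)
    return f"({expansion}) {filters}".strip()


# Illustrative Basque example: "etxe" (house) with two inflected forms,
# plus two very frequent Basque words as language filters.
print(build_query("etxe", ["etxea", "etxeak"], ["eta", "da"]))
# ("etxe" OR "etxea" OR "etxeak") "eta" "da"
```

Adding more filter words tends to raise precision at the cost of recall, which matches the trade-off the abstract reports.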

Following the rapid development of the Internet, particularly web page interaction technology, distance e-learning has become increasingly realistic and popular. To solve the problems associated with sharing and reusing teaching materials in different e-learning systems, several standard formats, including SCORM, IMS, LOM, and AICC, have recently been proposed by different international organizations. SCORM LOM, namely learning object metadata, facilitates the indexing and searching of learning objects in a learning object repository through extended sharing and searching features. However, LOM is weak in semantic awareness. Most information retrieval systems assume that users have cognitive ability regarding their needs. However, in e-learning systems, users may have no idea of what they are looking for or of the learning object metadata. This study presents an ontological approach for semantic-aware learning object retrieval. This approach has two significant novel features: a fully automatic ontology-based query expansion algorithm for inferring and aggregating user intention based on the original short query, and an “ambiguity removal” procedure for correcting inappropriate user query terms. This approach is sufficiently generic to be embedded in other LOM-based search mechanisms for semantic-aware learning object retrieval. Focused on digital learning material and contrasted with traditional keyword-based search technologies, the proposed approach has experimentally demonstrated significantly improved retrieval precision and recall.

The expansion of the Internet has made the task of searching a crucial one. Internet users, however, have to make a great effort in order to formulate a search query that returns the required results. Many methods have been devised to assist in this task by helping the users modify their query to give better results. In this paper we propose an interactive method for query expansion. It is based on the observation that documents are often found to contain terms with high information content, which can summarise their subject matter. We present experimental results, which demonstrate that our approach significantly shortens the time required in order to accomplish a certain task by performing web searches.

Multimedia Tools and Applications - This paper proposes a qualitative knowledge-driven semantic modelling approach for image understanding and retrieval. The similarity measure is calculated for...

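Efficient enterprise information retrieval has become both a focus and a difficulty in the field of information retrieval. This paper discusses the development of technologies related to enterprise information retrieval, and designs and implements a concept-based enterprise information retrieval system. The system uses a query expansion algorithm to semantically expand the keywords entered by the user: it looks up synonyms in a domain dictionary, finds associated words by learning from a specified document collection, and lets users define their own associated words for expansion, thereby achieving concept search in the true sense. The system design takes full account of adaptability and platform independence; its structure, with independent layers, makes the system dictionary replaceable, so the system can be used for enterprise information queries across different industries and platforms, and is particularly suitable for lightweight, simple applications in small and medium-sized enterprises.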

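This paper analyzes in detail the traditional ways of storing XML electronic medical records (EMR). Traditional EMR storage has several drawbacks: modifying table structures is costly, system maintenance is difficult and burdensome, indexes cannot be built on XML documents to speed up queries, XML data resources cannot be fully exploited, and decomposing documents usually loses detail. To address these drawbacks, a new native XML storage method for electronic medical records is proposed. This method not only reduces system complexity but, combined with the industry's emerging XML data manipulation languages, also makes it easy to apply the information in medical records to other areas such as medical information statistics and clinical diagnostic assistance, thereby broadening the application scope of electronic medical records.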


The World Wide Web (WWW) comprises a wide range of information, and it mainly operates on the principle of keyword matching, which often reduces the accuracy of information retrieval. Automatic query expansion is one of the primary methods for information retrieval, and it handles the vocabulary mismatch problem that information retrieval systems often face when retrieving an appropriate document using keywords. This paper proposes a novel hybrid COOT-based Cat and Mouse Optimization (CMO) algorithm, named hybrid COOT-CMO, for selecting optimal candidate terms in the automatic query expansion process. To improve the accuracy of the CMO algorithm, its parameters are tuned with the help of the Coot algorithm. The most suitable expanded query is identified from the available expanded query sets, also known as candidate query pools. All feasible combinations in this candidate query pool should be obtained from the top retrieved documents. Benchmark datasets such as the GOV2 Test Collection, the Cranfield Collections, and the NTCIR Test Collection are used to assess the performance of the proposed hybrid COOT-CMO method for automatic query expansion. The proposed method surpasses existing state-of-the-art techniques on several performance measures, such as F-score, precision, and mean average precision (MAP).


We propose the application of a novel sub-ontology extraction methodology for achieving interoperability and improving the semantic validity of information retrieval in the medical information systems (MIS) domain. The system offers advanced profiling of a user’s field of specialization by exploiting the concept of sub-ontology extraction, i.e., each sub-ontology may subsequently represent a particular user profile. Semantic profiling of a user’s field of specialization or interest is necessary functionality in any medical domain information retrieval system; this is because the (structural and semantic) extent of information sources is massive and individual users are only likely to be interested in specific parts of the overall knowledge documents on the basis of their area of specialization. The prototypical system, OntoMOVE, has been specifically designed for application in the medical information systems domain. OntoMOVE utilizes semantic web standards like RDF(S) and OWL in addition to medical domain standards and vocabularies encompassed by the UMLS knowledge sources.


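Cross-lingual word embedding aims to use word embeddings from resource-rich languages to improve word embeddings for resource-poor languages. Existing methods learn a mapping between two embedding spaces to align words; among them, generative adversarial network methods achieve good performance without using an alignment dictionary. However, on distant language pairs, lacking the guidance of a seed dictionary, the mapping is learned from the global distance between the embedding spaces alone, so multiple candidate word pairs are possible and accurate alignment is difficult. To address this, a semi-supervised cross-lingual word embedding method based on dual-discriminator adversarial training is proposed. Building on the existing adversarial model, a fine-grained discriminator shared by the two mapping directions is added, forming an adversarial model with two discriminators. In addition, a negative-sample dictionary is introduced to supplement the pre-aligned dictionary, and the fine-grained discriminator is used for semi-supervised adversarial learning, reducing the likelihood of generating multiple word pairs and improving alignment accuracy. Experiments on 2 cross-lingual datasets show that the proposed method effectively improves cross-lingual word embedding performance.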


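To address the shortcomings of current mainstream web search engines, such as poor personalization and low retrieval precision, this paper improves on existing techniques and presents a personalized query expansion method that combines the web pages a user has previously shown interest in with the user's past query terms. When a user enters query terms into a search engine, the method uses the learned interest model of the current user to dynamically determine the user's latent interests and compute the relatedness between terms, then submits an appropriate set of expanded query terms to the search engine, so that different users entering the same query terms receive different results. Experiments verify the effectiveness of the algorithm, and retrieval precision is clearly higher than with the original method.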
