Similar literature
1.
Given a question and its answer candidates (together called a QA corpus), answer selection is the task of identifying the answers most relevant to the question. Answer selection is widely used in question answering, web search, and similar applications. Current deep neural network models primarily use local features extracted from the input question-answer pairs (QA pairs). However, the global features contained in QA corpora are under-utilized, and we argue that these global features contribute substantially to the answer selection task. To verify this point of view, we propose a novel model that combines local and global features for answer selection. In our model, two different global feature extractors are employed to extract statistical global features and deep global features from a QA corpus, respectively. Furthermore, we investigate the integration of these global features with local features in various experimental settings: statistical global features, deep global features, and a combination of statistical and deep global features. Our experimental results show that the global features are effective for answer selection. Our model obtains new state-of-the-art results on two public answer selection datasets and performs especially well on YahooCQA, where it achieves 9.2 and 6% higher precision@1 (P@1) and mean reciprocal rank (MRR) scores than previously published models.
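The two metrics reported in this abstract, P@1 and MRR, can be sketched as follows. This is a minimal illustration under assumptions: each question is represented as a list of 0/1 relevance labels ordered by the model's ranking, and the function names and example data are hypothetical, not taken from the paper.

```python
from typing import List


def precision_at_1(ranked_labels: List[List[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is a correct answer."""
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)


def mean_reciprocal_rank(ranked_labels: List[List[int]]) -> float:
    """Average of 1/rank of the first correct answer (0 if none is correct)."""
    total = 0.0
    for labels in ranked_labels:
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


# Hypothetical rankings for three questions (1 = correct candidate).
example = [
    [1, 0, 0],  # correct answer ranked first
    [0, 1, 0],  # correct answer ranked second
    [0, 0, 0],  # no correct answer retrieved
]
p_at_1 = precision_at_1(example)
mrr = mean_reciprocal_rank(example)
print(f"P@1 = {p_at_1:.3f}, MRR = {mrr:.3f}")
```

With these labels, one question in three has a correct top candidate, and the reciprocal ranks are 1, 1/2, and 0.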

2.
With the rapid growth of Arabic electronic data on the web, information extraction, one of the major challenges in question answering, is essential for building document corpora. Indeed, corpus construction is a research topic that currently appears alongside other major conference themes in natural language processing (NLP), such as information retrieval (IR), question answering (QA), and automatic summarization (AS). In general, a question-answering system returns passages that answer the user's questions. For these passages to be truly informative, the system needs access to an underlying knowledge base, which requires the construction of a corpus. The aim of our research is to build an Arabic question-answering system. Analyzing the question must be the first step. Next, it is essential to retrieve a passage from the web that can serve as an appropriate answer. In this paper, we propose a method to analyze the question and retrieve the answer passage in the Arabic language. For question analysis, five factual question types are processed. Additionally, we experiment with generating a logic representation from the declarative form of each question. Several studies that deal with logic-based approaches to question answering in languages other than Arabic are discussed. This representation is very promising because it later helps in selecting a justifiable answer. 64% of the questions were correctly analyzed and translated into logic form, and the automatically generated text passages achieved 87% accuracy and a c@1 score of 98%.

3.
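Research on a web-based Chinese question answering system and its information extraction algorithm   Total citations: 24, self-citations: 3, citations by others: 21
A question answering system answers questions that users pose in natural language with accurate, concise answers. Most current question answering systems use large-scale text collections as the knowledge base from which answers are extracted, while the rich resources of the web offer another good knowledge source, one that is very effective for short, fact-based questions. This paper briefly reviews the state of research on web-based question answering systems and analyzes the characteristics of web information. We propose an answer extraction method based on sentence similarity computation and, on this basis, implement a web-based Chinese question answering system. The system uses only the snippets in the results returned by web search engines as the resource for answer extraction, which saves the time needed to download and analyze the source web pages. Experimental results show that the system performs notably well on questions about person names, quantities, and times, and reaches an MRR of 0.51 on the test question set.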

4.
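At present, most research on question answering targets question-answer pairs in formal text. In contrast to previous work, this paper focuses on informal-text question-answer pairs found on social media. In informal text, a question text may contain several questions and an answer text may contain several answers. To handle this situation, we propose a new task, question-answer matching: for each question in the question text, find the sentences in the answer text that are related to it. First, we collect large-scale informal-text question-answer pairs from a product Q&A website and build a product question-answer matching corpus on that basis. Second, to address the noise present in informal text, we propose a context-dependent question-answer matching method based on an attention mechanism. Experimental results show that the proposed method effectively improves question-answer matching performance on informal text.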

5.
Community Question-Answering platforms are massive knowledge bases of question-answer pairs produced by their members. In order to provide a vibrant service, they are compelled to provide answers to newly posted questions as soon as possible. However, since their dynamics require their own users to answer questions, there is an inherent delay between posting time and the arrival of good answers. In fact, many of these new questions might have already been asked and satisfactorily answered in the past. Ergo, one of the pressing needs of these services is capitalizing on good answers given to related resolved questions across their large-scale knowledge base. To that end, current approaches have studied the effectiveness of human-generated web queries across search logs in fetching related questions and potential good answers from these community archives. However, this kind of strategy is not suitable for questions without click-through data, in particular those recently posted, which limits its ability to provide them with real-time answers. In this paper, we propose an approach to finding related questions across the cQA knowledge base, which automatically generates effective search strings directly from question titles and bodies. In so doing, we automatically construct a massive corpus of related questions on top of the relationships yielded by their click-through graph, and afterwards generate candidate queries by inspecting dependency paths across the title and body of each question. Then, we utilize this corpus for automatically annotating the retrieval power of each of these candidates. With this labelled corpus, we study the effectiveness of several learning-to-rank models enriched with assorted linguistically motivated properties, thus deducing the linguistic structure of automatically generated search strings that are effective in finding related questions. Since these models are inferred solely from each question itself, they can be used when search log data (i.e., web queries) is unavailable. Overall, our experiments underline the effectiveness of our approach; in particular, our outcomes indicate that named entity recognition is instrumental in structuring and recognizing effective queries of 2–5 terms. Furthermore, we carry out experiments considering and ignoring question bodies, and we show that profiting only from question titles is more promising, but the most effective queries are harder to detect. Conversely, adding question bodies makes the retrieval of past related questions noisier, but their content helps to generalize models capable of identifying more effective candidates.

6.
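A question answering system can understand a user's question and return the answer directly. Most existing question answering systems are domain-specific and can only answer questions in a particular domain. This paper presents a method for implementing an open-domain question answering system based on a large-scale knowledge base. The system first identifies the subject of the question by combining custom-dictionary word segmentation with a CRF model; second, it links the subject in the question to entities in the knowledge base using fuzzy matching; then it identifies the predicate in the question through similarity computation, rule matching, and other methods, and associates it with the attributes of the knowledge base entity; finally, it performs entity disambiguation and answer retrieval. The system achieves an average F-measure of 0.6956, which indicates that the proposed method is feasible for open-domain question answering over a knowledge base.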

7.
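Research progress in information retrieval technology based on automatic question answering systems   Total citations: 2, self-citations: 0, citations by others: 2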
汤庸  林鹭贤  罗烨敏  潘炎 《计算机应用》2008,28(11):2745-2748
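Automatic question answering gives a definite answer to a question that a user poses in natural language. In recent years it has drawn growing attention from researchers in information retrieval and natural language processing. A typical automatic question answering system usually contains components for question analysis, passage retrieval, and answer selection. This paper introduces the latest research progress in automatic question answering and the related international conferences, focuses on the main functions and common methods of four active techniques (question classification, query expansion, passage retrieval, and answer selection), and closes with some open problems and an outlook.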

8.
吴勇 《计算机时代》2011,(2):11-12,16
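A question-answer retrieval system for web forums was built using the forums' question-and-answer resources as the data source. The system involves data collection, data processing, answer extraction, index ranking, and question mapping. The work focuses on answer extraction, the technique that determines system performance. During answer extraction, Ranking SVM is used to rank the reply posts, yielding a total order over all replies to a question; the top few items of that order are then extracted as the best answers.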

9.
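Focusing on question-answering resources in social groups, this paper proposes a method for acquiring question-answer pairs from social groups, consisting mainly of question identification and answer acquisition. It analyzes the characteristics of three question identification methods: rule-based, deep learning-based, and a combination of the two. Answer acquisition builds on a deep learning model whose learning objective is to distinguish how relevant positive and negative example replies are to the question; each candidate answer is scored and ranked by its relevance to the question. Reply order and co-occurring-word features are then introduced to adjust the base score, producing a second round of scoring and ranking. The experimental results...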

10.
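An algorithm for automatically generating a concept semantic network from a structured corpus   Total citations: 4, self-citations: 0, citations by others: 4
Concept semantic networks were proposed to address the vocabulary mismatch problem in information retrieval and are one of the basic ways to improve retrieval effectiveness. With natural-language-oriented online question answering as the application background, this paper proposes an algorithm for automatically generating a concept semantic network from a semi-structured corpus. By analyzing how the corpus is composed, it applies different templates for document extraction to different types of concept relation and sets different window units for computing the relatedness between concepts. After threshold filtering and role conversion, it obtains concept relations of various types, and on that basis optimizes and adjusts the semantic network. Experimental results show that the concept semantic network obtained by this algorithm effectively improves question retrieval.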

11.
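A question-answering pattern extraction technique based on unsupervised learning   Total citations: 4, self-citations: 0, citations by others: 4
This paper proposes a question-answering pattern extraction technique based on an unsupervised learning algorithm, which extracts answer patterns from the Internet for use in Chinese question answering systems. The algorithm avoids the shortcomings of supervised learning algorithms: it does not need users to supply <question, answer> pairs as a training set, only two or more example questions for each question type. It then learns the answer patterns for that question type through the steps of web retrieval, topic partitioning, pattern extraction, vertical clustering, and horizontal clustering. Experimental results show that the proposed unsupervised method for learning question-answering patterns is effective and that pattern-matching-based answer extraction substantially improves the performance of Chinese question answering systems.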

12.
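An LDA-based method for computing question similarity in community question answering   Total citations: 2, self-citations: 0, citations by others: 2
Traditional question answering (QA) systems simply return the answer to a question directly and offer no user interaction, whereas community question answering (CQA) systems contain a large number of question-answer pairs that can be exploited. This paper proposes an LDA-based matching framework for the problem of matching similar questions. It computes question similarity from three aspects (the statistical, semantic, and topic information of the questions) and combines them into an overall similarity. Experiments were run on a real annotated data set extracted from Yahoo! Answers, and the final results show that the proposed method performs well.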

13.
We propose a semantic passage segmentation method for a Question Answering (QA) system. We define a semantic passage as sentences grouped by semantic coherence, determined by the topic assigned to individual sentences. Topic assignments are made by a sentence classifier based on a statistical classification technique, Maximum Entropy (ME), combined with multiple linguistic features. We ran experiments to evaluate the proposed method and its impact on two application tasks, passage retrieval and template-filling for question answering. The experimental results show that our semantic passage retrieval method using topic matching is more useful than fixed-length passage retrieval. In the template-filling task used for information extraction in the QA system, the value of the sentence topic assignment method was reinforced.

14.
Traditional search engines return a large number of relevant web pages rather than accurate answers. In a question answering system, however, users can pose questions in everyday sentences; the system analyzes and comprehends these questions and returns answers to users directly. To address the problems of the current network environment, such as low question answering precision, imperfect expression of domain knowledge, a low reuse rate, and the lack of reasonable theoretical reference models, we put forward an information integration method for the semantic web based on pervasive agent ontology (the SWPAO method), which integrates, analyzes, and processes enormous amounts of web information and extracts answers on the basis of semantics. Following the SWPAO method, we mainly study concept extraction based on uniform semantic term mining, multi-point construction of the pervasive agent ontology, and answer extraction based on semantic inference. We also present the structural model of a question answering system that applies the ontology: it uses the OWL language to describe the domain knowledge base, from which it infers and extracts answers with the Jena inference engine, so that the precision of question answering in the QA system can be improved. In system testing, precision reached 86% and recall 93%. The experiment indicates that the method is feasible and that it offers a useful reference and a basis for further study of question answering systems.

15.
Question-answering (QA) models find answers to a given question. The need to find answers automatically is growing, because doing so over large-scale QA data sets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate on the entire dataset or on datasets within a category that the question writer specifies manually. In contrast, we aim to automatically find the category to which the question belongs by employing a text classification model, and to find the answer corresponding to the question within that category. Thanks to the text classification model, we can effectively reduce the search space for finding the answers to a given question. Therefore, the proposed model improves the accuracy of the QA matching model and significantly reduces the model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which improves performance compared to the individual embedding models. Using real-world QA data sets, we evaluate the performance of the proposed QA matching model. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of model inference speed, our model is 2.61 to 5.07 times faster than the existing models, owing to the effective reduction of the search space by the text classification model.
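The idea of classifying a question first and then searching only within the predicted category can be sketched as follows. Everything here is an illustrative assumption: the keyword-overlap classifier stands in for the paper's text classification model, Jaccard token overlap stands in for its ensemble sentence embeddings, and the archive contents are invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical archive: category -> list of (archived question, answer).
ARCHIVE: Dict[str, List[Tuple[str, str]]] = {
    "travel": [
        ("how do i renew a passport", "Apply at the passport office."),
        ("what is the baggage limit", "Usually 23 kg per bag."),
    ],
    "cooking": [
        ("how long to boil an egg", "About ten minutes."),
    ],
}

# Hypothetical keyword sets used by the toy classifier.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "travel": {"passport", "baggage", "flight"},
    "cooking": {"boil", "egg", "bake"},
}


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization."""
    return set(text.lower().split())


def classify(question: str) -> str:
    """Toy classifier: pick the category sharing the most keywords."""
    q = tokens(question)
    return max(CATEGORY_KEYWORDS, key=lambda c: len(q & CATEGORY_KEYWORDS[c]))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap, a simple stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer(question: str) -> Tuple[str, str]:
    """Search only the predicted category instead of the whole archive."""
    category = classify(question)
    q = tokens(question)
    _, best_answer = max(
        ARCHIVE[category], key=lambda qa: jaccard(q, tokens(qa[0]))
    )
    return category, best_answer


print(answer("how can i renew my passport"))
```

Restricting the similarity search to one category is what shrinks the search space; the quality of the result then depends on how often the classifier picks the right category.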

16.
Mining linguistic browsing patterns in the world wide web   Total citations: 2, self-citations: 0, citations by others: 2
World Wide Web applications have grown very rapidly and have made a significant impact on computer systems. Among them, web browsing for useful information may be the most common. Because of its heavy use, efficient and effective web retrieval has become a very important research topic in this field. Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use data mining techniques to discover relevant browsing behavior from log data on web servers, which can help in making rules for the retrieval of web pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is thus proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the maximum cardinality in later mining processes, so that the number of fuzzy regions to be processed equals the number of pages. Computational time can thus be greatly reduced. The mined patterns exhibit the browsing behavior and can be used to provide appropriate suggestions to web-server managers.
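The step of turning numeric browsing times into linguistic terms and keeping only the term with the maximum cardinality per page can be sketched as follows. The membership functions, their breakpoints (20, 70, and 120 seconds), the term names, and the sample log are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def memberships(seconds: float) -> Dict[str, float]:
    """Map a browsing time to fuzzy membership degrees.

    Assumed shapes: 'short' falls from 1 at 20 s to 0 at 70 s, 'middle' is
    a triangle peaking at 70 s, and 'long' rises from 0 at 70 s to 1 at 120 s.
    """
    short = max(0.0, min(1.0, (70.0 - seconds) / 50.0))
    middle = max(0.0, 1.0 - abs(seconds - 70.0) / 50.0)
    long_ = max(0.0, min(1.0, (seconds - 70.0) / 50.0))
    return {"short": short, "middle": middle, "long": long_}


def dominant_term(times: List[float]) -> str:
    """Return the linguistic term with the largest summed membership.

    The sum over all visits is the scalar cardinality of each fuzzy region;
    keeping only the maximum leaves one fuzzy region per page.
    """
    cardinality = {"short": 0.0, "middle": 0.0, "long": 0.0}
    for t in times:
        for term, degree in memberships(t).items():
            cardinality[term] += degree
    return max(cardinality, key=lambda term: cardinality[term])


# Hypothetical browsing times (seconds) per page, as read from a server log.
log = {"page_a": [30.0, 45.0, 20.0], "page_b": [110.0, 130.0, 95.0]}
regions = {page: dominant_term(times) for page, times in log.items()}
print(regions)
```

Because each page contributes a single region, later mining steps work with as many fuzzy regions as there are pages, which is the source of the reduced computation time the abstract mentions.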

17.
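The reading comprehension (RC) task aims to understand a document and return the answer sentence for a given question. This paper proposes a method that makes full use of external resources to improve RC system performance, and it improves performance on both the Remedia and ChungHwa corpora. In particular, a performance analysis of the RC system on the Remedia corpus shows that a performance improvement of 24.1% is attributable to web-based answer pattern matching and one of 11.1% to the linguistic feature matching strategy. A t-test was also carried out; its results show that the gains from answer pattern matching, linguistic feature matching, and lexical semantic association inference are significant.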

18.
The World Wide Web (WWW) today is so vast that it has become more and more difficult to find answers to questions using standard search engines. Current search engines can return ranked lists of documents, but they do not deliver direct answers to the user. The goal of Open Domain Question Answering (QA) systems is to take a natural language question, understand the meaning of the question, and present a short answer as a response based on a repository of information. In this paper we present QARAB, a QA system that combines techniques from Information Retrieval and Natural Language Processing. This combination enables domain independence. The system takes natural language questions expressed in the Arabic language and attempts to provide short answers in Arabic. To do so, it attempts to discover what the user wants by analyzing the question and a variety of candidate answers from a linguistic point of view.

19.
Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response for the user, frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for the answer in a set of documents. Because Chinese text retrieval has been developed only recently, and because of various specific characteristics of the Chinese language, the approaches to it are quite different from those proposed for Western languages. Thus, an architecture that augments existing search engines is developed to support Chinese natural language question answering. This paper describes a new approach to building a Chinese question-answering system, a general-purpose, fully automated Chinese question-answering system available on the web. In this approach, we attempt to represent Chinese text by its characteristics, convert the Chinese text into ERE (E: entity, R: relation) relation data lists, and then answer the question through the ERE relation model. The system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question-answering accuracy can be greatly improved by analyzing more and more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.

20.
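Question answering for reading comprehension in the college entrance examination (Gaokao) Chinese-language test is harder than ordinary reading comprehension question answering: understanding the abstractly worded questions requires deeper linguistic analysis, extracting candidate answer sentences depends more on analyzing their association with the question, and ranking candidate answer sentences depends more on the semantic relatedness among the answer sentences. To this end, this paper proposes extracting candidate answer sentences using frame semantic matching and frame semantic relations, and introduces a manifold ranking model at the ranking stage, which propagates ranking scores through the frame semantic relatedness between answer sentences; the four highest-scoring sentences (Top-4) are finally selected as the answer. On Beijing Gaokao Chinese reading comprehension questions from the past 12 years, precision and recall reach 53.65% and 79.06%, respectively.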
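The score propagation step of a manifold ranking model can be sketched as follows. This is a generic version of the commonly used update f ← αSf + (1 − α)y with a symmetrically normalized affinity matrix S, not the paper's exact formulation: the relatedness values, the initial scores, and the parameter α = 0.85 are invented for illustration and do not come from frame semantic analysis.

```python
import math
from typing import List


def manifold_rank(
    affinity: List[List[float]],
    initial: List[float],
    alpha: float = 0.85,
    iterations: int = 100,
) -> List[float]:
    """Propagate initial ranking scores over a sentence relatedness graph.

    affinity[i][j] is the (symmetric) relatedness between sentences i and j,
    with zeros on the diagonal; initial[i] is the starting score of sentence i.
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    # Symmetric normalization: S = D^(-1/2) W D^(-1/2).
    s = [
        [
            affinity[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(initial)
    for _ in range(iterations):
        scores = [
            alpha * sum(s[i][j] * scores[j] for j in range(n))
            + (1.0 - alpha) * initial[i]
            for i in range(n)
        ]
    return scores


# Hypothetical relatedness between three candidate answer sentences.
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
# Only sentence 0 matches the question directly.
y = [1.0, 0.0, 0.0]

scores = manifold_rank(W, y)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
top_k = ranking[:4]  # the abstract keeps the Top-4 sentences
print(ranking, top_k)
```

In this toy graph, sentence 1 receives no initial score but is strongly related to sentence 0, so propagation ranks it above the weakly related sentence 2.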
