Found 20 similar documents; search time: 15 ms
1.
Given a question and its answer candidates (together termed a QA corpus), answer selection is the task of identifying the most relevant answers to the question. Answer selection is widely used in question answering, web search, and so on. Current deep neural network models primarily utilize local features extracted from input question‐answer pairs (QA pairs). However, the global features contained in QA corpora are under‐utilized, and we argue that these global features substantially contribute to the answer selection task. To verify this point of view, we propose a novel model that combines local and global features for answer selection. In our model, two different global feature extractors are employed to extract statistical global features and deep global features from a QA corpus, respectively. Furthermore, we investigate the integration of these global features with local features in various experimental settings: statistical global features, deep global features, and a combination of statistical and deep global features. Our experimental results show that the global features are effective for answer selection. Our model obtains new state‐of‐the‐art results on two public answer selection datasets and performs especially well on YahooCQA, where it achieves 9.2% and 6% higher precision@1 (P@1) and mean reciprocal rank (MRR) scores, respectively, than previously published models.
2.
Wided Bakari, Patrice Bellot, Mahmoud Neji 《International Journal of Speech Technology》2017,20(2):339-353
With the growth of Arabic electronic data on the web, information extraction, one of the major challenges of question answering, is essential for building a corpus of documents. Corpus building is a research topic currently featured among the major themes of natural language processing (NLP) conferences, alongside information retrieval (IR), question answering (QA), automatic summarization (AS), etc. Generally, a question-answering system provides various passages to answer user questions. To make these passages truly informative, the system needs access to an underlying knowledge base, which requires the construction of a corpus. The aim of our research is to build an Arabic question-answering system. Analyzing the question must be the first step. Next, it is essential to retrieve a passage from the web that can serve as an appropriate answer. In this paper, we propose a method to analyze the question and retrieve the answer passage in the Arabic language. For question analysis, five factual question types are processed. Additionally, we experiment with generating a logic representation from the declarative form of each question. Several studies dealing with logic approaches in question answering have addressed languages other than Arabic. This representation is very promising because it later helps in selecting a justifiable answer. The proportion of questions correctly analyzed and translated into logic form reached 64%. The automatically generated text passages achieved 87% accuracy and a c@1 score of 98%.
3.
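Research on a Web-Based Chinese Question Answering System and Information Extraction Algorithms  Total citations: 24, self-citations: 3, citations by others: 21
A question answering (QA) system answers questions posed by users in natural language with accurate, concise answers. Most current QA systems use large text collections as the knowledge base for answer extraction, but the rich resources on the web offer another good knowledge source that is particularly effective for short, fact-based questions. This paper briefly reviews the state of research on web-based QA systems and analyzes the characteristics of web information. We propose an answer extraction method based on sentence similarity computation and, on this basis, implement a web-based Chinese QA system. The system uses only the snippets in the results returned by web search engines as the resource for answer extraction, which saves the time needed to download and analyze the source web pages. Experimental results show that the system performs notably well on questions about person names, quantities, and times, and achieves an MRR of 0.51 on the test question set.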
4.
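Most current research on question answering targets QA pairs in formal text. In contrast to previous work, this paper focuses on informal-text QA pairs found on social media. In informal text, a question text may contain multiple questions and an answer text may contain multiple answers. To address this, we propose a new task, question-answer matching: for each question in the question text, find the sentences in the answer text that are relevant to it. First, we collect large-scale informal-text QA pairs from a product QA website and build a product QA matching corpus on this basis. Second, to handle the noise present in informal text, we propose an attention-based, context-dependent QA matching method. Experimental results show that the proposed method effectively improves QA matching performance on informal text.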
5.
Community Question-Answering (cQA) platforms are massive knowledge bases of question-answer pairs produced by their members. In order to provide a vibrant service, they are compelled to provide answers to newly posted questions as soon as possible. However, since they rely on their own users to answer questions, there is an inherent delay between posting time and the arrival of good answers. In fact, many of these new questions might have already been asked and satisfactorily answered in the past. Hence, one of the pressing needs of these services is capitalizing on good answers given to related resolved questions across their large-scale knowledge base. To that end, current approaches have studied the effectiveness of human-generated web queries across search logs in fetching related questions and potential good answers from these community archives. However, this kind of strategy is not suitable for questions without click-through data, in particular those recently posted, which limits the ability to provide them with real-time answers. In this paper, we propose an approach to finding related questions across the cQA knowledge base that automatically generates effective search strings directly from question titles and bodies. To do so, we automatically construct a massive corpus of related questions on top of the relationships yielded by their click-through graph, and then generate candidate queries by inspecting dependency paths across the title and body of each question. Next, we use this corpus to automatically annotate the retrieval power of each of these candidates. With this labelled corpus, we study the effectiveness of several learning-to-rank models enriched with assorted linguistically motivated properties, thereby deducing the linguistic structure of automatically generated search strings that are effective in finding related questions. Since these models are inferred solely from each question itself, they can be used when search log data (i.e., web queries) is unavailable. Overall, our experiments underline the effectiveness of our approach; in particular, our outcomes indicate that named entity recognition is instrumental in structuring and recognizing effective queries of 2–5 terms. Furthermore, we carry out experiments considering and ignoring question bodies, and we show that profiting only from question titles is more promising, but the most effective queries are harder to detect. Conversely, adding question bodies makes the retrieval of past related questions noisier, but their content helps to generalize models capable of identifying more effective candidates.
6.
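A question answering system understands a user's question and returns the answer directly. Most existing QA systems are domain-oriented and can only answer questions in a specific domain. This paper presents a method for implementing an open-domain QA system based on a large-scale knowledge base. The system first identifies the subject of the question by combining word segmentation with a custom dictionary and a CRF model; second, it links the subject in the question to entities in the knowledge base by fuzzy matching; then it identifies the predicate in the question through several methods, including similarity computation and rule matching, and associates it with attributes of the knowledge-base entities; finally, it performs entity disambiguation and answer retrieval. The system achieves an average F-measure of 0.6956, indicating that the proposed method is feasible for open-domain question answering over a knowledge base.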
7.
8.
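Using the question-and-answer resources of forums as the data source, we built a web forum QA retrieval system. The system involves data collection, data processing, answer extraction, index ranking, and question mapping. We focus on answer extraction, which determines system performance. During answer extraction, Ranking SVM is used to rank the reply posts, yielding a total order over all replies to a question; the top few items of this ordering are then extracted as the best answers.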
9.
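Focusing on the QA resources in social chat groups, we propose a method for acquiring QA pairs from such groups, consisting mainly of question identification and answer acquisition. We analyze the characteristics of three question identification approaches: rule-based, deep-learning-based, and a combination of the two. Answer acquisition builds on a deep learning model whose learning objective is to distinguish the relevance of positive and negative example answers to the question, and scores and ranks each candidate answer by its relevance to the question. Answer order and co-occurring-word features are then introduced to adjust the base scores for a second round of scoring and ranking. Experimental results...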
10.
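An Automatic Generation Algorithm for Concept Semantic Networks Based on a Structured Corpus  Total citations: 4, self-citations: 0, citations by others: 4
Concept semantic networks were proposed to address the vocabulary mismatch problem in information retrieval and are one of the basic ways to improve retrieval effectiveness. With natural-language web-based question answering as the application background, we propose an algorithm for automatically generating a concept semantic network from a semi-structured corpus. By analyzing how the corpus is composed, the algorithm uses different templates for document extraction for different types of concept relations and sets different window units for computing the relatedness between concepts; then, after threshold filtering and role conversion, it obtains the various types of concept relations, on which basis the semantic network is optimized and adjusted. Experimental results show that the concept semantic network produced by the algorithm effectively improves question retrieval.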
11.
12.
13.
We propose a semantic passage segmentation method for a Question Answering (QA) system. We define a semantic passage as a group of sentences with semantic coherence, determined by the topic assigned to each sentence. Topics are assigned by a sentence classifier based on a statistical classification technique, Maximum Entropy (ME), combined with multiple linguistic features. We ran experiments to evaluate the proposed method and its impact on two application tasks, passage retrieval and template-filling for question answering. The experimental results show that our semantic passage retrieval method using topic matching is more useful than fixed-length passage retrieval. The template-filling task, used for information extraction in the QA system, reinforced the value of the sentence topic assignment method.
14.
Traditional search engines return a large number of related web pages rather than accurate answers. In a question answering system, by contrast, users can pose questions in everyday language; the system analyzes and interprets these questions and returns answers directly. To address problems in the current network environment, such as low question-answering precision, imperfect expression of domain knowledge, low reuse rate, and the lack of reasonable theoretical reference models, we put forward a semantic web information integration method based on pervasive agent ontology (SWPAO), which integrates, analyzes, and processes enormous amounts of web information and extracts answers on the basis of semantics. Guided by the SWPAO method, we mainly study concept extraction based on uniform semantic term mining, multi-point pervasive agent ontology construction, and answer extraction based on semantic inference. We also present the structural model of a question answering system that applies the ontology; it uses OWL to describe the domain knowledge base, from which it infers and extracts answers with the Jena inference engine, thereby improving question-answering precision. In system testing, precision reached 86% and recall 93%. The experiment indicates that the method is feasible and offers a useful reference for further study of question answering systems.
15.
Question-answering (QA) models find answers to a given question. The need to find answers automatically is growing, because doing so over large-scale QA datasets is both important and challenging. In this paper, we deal with the QA pair matching approach in QA models, which finds the most relevant question and its recommended answer for a given question. Existing studies of this approach operate on the entire dataset or on the datasets within a category that the question writer specifies manually. In contrast, we aim to automatically find the category to which the question belongs by employing a text classification model, and to find the answer corresponding to the question within that category. The text classification model lets us effectively reduce the search space for finding answers to a given question. As a result, the proposed model improves the accuracy of the QA matching model and significantly reduces model inference time. Furthermore, to improve the performance of finding similar sentences in each category, we present an ensemble embedding model for sentences, which outperforms the individual embedding models. We evaluate the proposed QA matching model on real-world QA datasets. The accuracy of our final ensemble embedding model based on the text classification model is 81.18%, which outperforms the existing models by 9.81 to 14.16 percentage points. Moreover, in terms of inference speed, our model is 2.61 to 5.07 times faster than the existing models, owing to the effective reduction of the search space by the text classification model.
16.
Mining linguistic browsing patterns in the world wide web  Total citations: 2, self-citations: 0, citations by others: 2
Hong T.-P., Lin K.-Y., Wang S.-L. 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):329-336
World-wide-web applications have grown very rapidly and have made a significant impact on computer systems. Among them, web
browsing for useful information may be most commonly seen. Due to its tremendous amounts of use, efficient and effective web
retrieval has thus become a very important research topic in this field. Data mining is the process of extracting desirable
knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use the data mining techniques
to discover relevant browsing behavior from log data in web servers, thus being able to help make rules for retrieval of web
pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected
are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is
thus proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the
maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number
of the pages. Computational time can thus be greatly reduced. The patterns mined out thus exhibit the browsing behavior and
can be used to provide some appropriate suggestions to web-server managers.
17.
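The reading comprehension (RC) task aims to understand a document and return the answer sentence for a given question. We propose a method that makes full use of external resources to improve RC system performance, yielding improvements on both the Remedia and ChungHwa corpora. In particular, performance analysis of the RC system on the Remedia corpus shows that a 24.1% performance improvement is attributable to web-based answer pattern matching and an 11.1% improvement to the linguistic feature matching strategy. We also ran t-tests, which show that the improvements obtained from answer pattern matching, linguistic feature matching, and lexical semantic association inference are significant.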
18.
Bassam Hammo, Saleem Abuleil, Steven Lytinen, Martha Evens 《Computers and the Humanities》2004,38(4):397-415
The World Wide Web (WWW) today is so vast that it has become more and more difficult to find answers to questions using standard search engines. Current search engines can return ranked lists of documents, but they do not deliver direct answers to the user. The goal of Open Domain Question Answering (QA) systems is to take a natural language question, understand the meaning of the question, and present a short answer as a response based on a repository of information. In this paper we present QARAB, a QA system that combines techniques from Information Retrieval and Natural Language Processing. This combination enables domain independence. The system takes natural language questions expressed in the Arabic language and attempts to provide short answers in Arabic. To do so, it attempts to discover what the user wants by analyzing the question and a variety of candidate answers from a linguistic point of view.
19.
Gai-Tai Huang, Hsiu-Hsen Yao 《Journal of Computer Science and Technology》2004,19(4):0-0
Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response, frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for it in a set of documents. Because Chinese text retrieval was developed only recently, and because of various specific characteristics of the Chinese language, approaches to it differ considerably from those proposed for Western languages. Thus, an architecture that augments existing search engines is developed to support Chinese natural language question answering. This paper describes a new approach to building a Chinese question-answering system, which is a general-purpose, fully automated Chinese question-answering system available on the web. In this approach, we represent Chinese text by its characteristics, convert it into ERE (E: entity, R: relation) relation data lists, and then answer the question through the ERE relation model. The system performs quite well given the simplicity of the techniques used. Experimental results show that question-answering accuracy can be greatly improved by analyzing more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.
20.