Found 20 similar articles (search time: 0 ms)
Video question answering aims to pinpoint answers in response to user-specified questions. However, most question answering technologies involve integrating rich, specific external knowledge such as syntactic parsers, which are often unavailable for many languages. In this paper, we present a new string pattern matching-based passage ranking algorithm for extending traditional text Q/A toward videoQ/A. Users interact with our videoQ/A system through natural language questions, and the system returns three sentence-length passages with corresponding video clips as answers. We collected 45 GB of Discovery videos and 253 Chinese questions for evaluation. The experimental results showed that our method outperformed six top-performing ranking models. It is 7.39% better than the second-best method (language model-based) in relative MRR score and 6.12% better in precision rate. We also show that using a trained Chinese word segmentation tool decreased the overall videoQ/A performance: most ranking algorithms dropped at least 10% in relative MRR, precision, and answer pattern recall rates.
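
The abstract above reports results as mean reciprocal rank (MRR) and as relative improvements between methods. A minimal sketch of both calculations follows. The function names, the passage identifiers, and the cutoff of three returned passages per question are illustrative assumptions, not the authors' evaluation code.

```python
def mean_reciprocal_rank(ranked_results, gold_answers):
    """Average of 1/rank of the first correct passage for each question.

    ranked_results: one ranked list of passage ids per question.
    gold_answers:   one set of correct passage ids per question.
    A question with no correct passage in its list contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for passages, gold in zip(ranked_results, gold_answers):
        for rank, passage in enumerate(passages, start=1):
            if passage in gold:
                total += 1.0 / rank
                break  # only the first correct passage counts
    return total / len(ranked_results)


def relative_improvement(score, baseline):
    """Relative gain of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


# Hypothetical example: three questions, three returned passages each.
ranked = [["p1", "p2", "p3"], ["p4", "p5", "p6"], ["p7", "p8", "p9"]]
gold = [{"p1"}, {"p6"}, {"p0"}]  # correct at rank 1, at rank 3, and not found
mrr = mean_reciprocal_rank(ranked, gold)  # (1 + 1/3 + 0) / 3
```

A relative figure such as the 7.39% in the abstract is this kind of ratio between two scores, not a difference in absolute points.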

We have empirically compared two classes of technologies capable of locating potentially malevolent online content: 1) popular keyword searching, currently widely used by law enforcement and the general public, and 2) emerging question answering (QA). The Google search engine exemplified the first approach. To exemplify the second, we further advanced the pattern-based probabilistic QA approach and implemented a proof-of-concept prototype capable of finding web pages that provide the answers to given questions, including non-factual ones (e.g. “How to build a pipe bomb?”). The answers to those questions typically indicate the presence of malevolent content. Our findings suggest that QA technology can be a good addition to traditional keyword searching for the task of locating malevolent online content and, possibly, for the more general task of interactive online information exploration.

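Research progress in information retrieval technology based on automatic question answering systems   Cited by: 2 (self-citations: 0, by others: 2)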
汤庸  林鹭贤  罗烨敏  潘炎 《计算机应用》2008,28(11):2745-2748
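Automatic question answering gives a definite answer to a question that the user poses in natural language. In recent years, it has attracted growing attention from researchers in information retrieval and natural language processing. A typical automatic question answering system usually comprises components such as question analysis, passage retrieval, and answer selection. This paper introduces the latest research progress in automatic question answering and the related international conferences, focuses on the main functions and common methods of four hot techniques (question classification, query expansion, passage retrieval, and answer selection), and finally presents some open problems and an outlook.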

Regulations play an important role in assuring the quality of a building’s construction and minimizing its adverse environmental impacts. Engineers and other practitioners need to retrieve regulatory information to ensure a building conforms to specified standards. Despite the availability of search engines and digital databases that can store regulations, engineers are often unable to retrieve information for domain-specific needs in a timely manner. As a consequence, users have to browse and filter information themselves, which can be a time-consuming process. This research develops a robust end-to-end methodology to improve the efficiency and effectiveness of answering queries pertaining to building regulations. The methodology integrates information retrieval with a deep learning model for Natural Language Processing (NLP) to provide precise and rapid answers to users’ questions from a collection of building regulations. The methodology is evaluated and a prototype system for answering queries is developed. The paper’s contribution is therefore twofold, as it develops: (1) a methodology that combines NLP and deep learning to address queries raised about building regulations; and (2) a chatbot-style question answering system, which we refer to as QAS4CQAR. Our proposed methodology has powerful feature representation and learning capability and can therefore potentially be adapted to building regulations in other jurisdictions.

The availability of large amounts of open, distributed, and structured semantic data on the web has no precedent in the history of computer science. In recent years, there have been important advances in semantic search and question answering over RDF data. In particular, natural language interfaces to online semantic data have the advantage that they can exploit the expressive power of Semantic Web data models and query languages while hiding their complexity from the user. However, despite the increasing interest in this area, no evaluations so far have systematically assessed this kind of system, in contrast to traditional question answering and search interfaces to document spaces. To address this gap, we have set up a series of evaluation challenges for question answering over linked data. The main goal of the challenge was to gain insight into the strengths, capabilities, and current shortcomings of question answering systems as interfaces to query linked data sources, as well as to benchmark how these interaction paradigms deal with the fact that the amount of RDF data available on the web is very large and heterogeneous with respect to the vocabularies and schemas used. Here, we report on the results from the first and second of these evaluation campaigns. We also discuss how the second evaluation addressed some of the issues and limitations that arose from the first one, as well as the open issues to be addressed in future competitions.

The semantic web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet, and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it also includes a learning component, which ensures that the performance of the system improves over time in response to the particular community jargon used by end users.

The use of computer applications in the construction industry is increasing, as is the complexity of software applications, which makes it difficult for project personnel to maintain familiarity with them. Furthermore, the causes of practical problems, such as project delays and cost overruns, are often not derivable from the output of most software. A question answering system provides a means for directly extracting knowledge from this output. This paper begins with an examination of the issues involved in building such a system. An emerging industry standard, ifcXML, is adopted as the knowledge representation format, thereby reducing the effort necessary to build a knowledge base. We then explore the mechanisms that use information in the knowledge base for question understanding. A prototype system has been built and tested to illustrate its usefulness for project management applications.

Question Answering Systems (QAS) are receiving increasing attention from IS researchers, particularly those in the information retrieval and natural language processing communities. Evaluation of an IS's success and user satisfaction are important issues, especially for emerging online service systems using the Internet. Although many QAS have been implemented, little work has been done on the development of an evaluation model for them. Our purpose was to develop a validated instrument to measure user satisfaction with QAS (USQAS). The proposed validated instrument was intended as a reference for the design of QAS from a user's perspective.

We propose a semantic passage segmentation method for a Question Answering (QA) system. We define a semantic passage as sentences grouped by semantic coherence, determined by the topic assigned to individual sentences. Topics are assigned by a sentence classifier based on a statistical classification technique, Maximum Entropy (ME), combined with multiple linguistic features. We ran experiments to evaluate the proposed method and its impact on two application tasks, passage retrieval and template-filling for question answering. The experimental results show that our semantic passage retrieval method using topic matching is more useful than fixed-length passage retrieval. In the template-filling task used for information extraction in the QA system, the value of the sentence topic assignment method was reinforced.
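
The idea of grouping sentences into passages by assigned topic can be sketched as follows. This is only an illustration under stated assumptions: the topic labels are taken as given (the abstract obtains them from a Maximum Entropy classifier, which is not reproduced here), and the sketch assumes that a passage is a run of consecutive sentences sharing one topic, which the abstract does not state explicitly. The function name `segment_by_topic` is hypothetical.

```python
from itertools import groupby


def segment_by_topic(sentences, topics):
    """Group consecutive sentences that share a topic label into passages.

    sentences: list of sentence strings, in document order.
    topics:    list of topic labels, one per sentence (assumed to come
               from some upstream sentence classifier).
    Returns a list of (topic, [sentences]) tuples in document order.
    """
    if len(sentences) != len(topics):
        raise ValueError("each sentence needs exactly one topic label")
    passages = []
    # groupby merges only adjacent items with equal keys, so a topic that
    # reappears later in the document starts a new passage.
    for topic, group in groupby(zip(topics, sentences), key=lambda pair: pair[0]):
        passages.append((topic, [sentence for _, sentence in group]))
    return passages


# Hypothetical example with hand-assigned labels.
doc = ["The dam was built in 1936.", "It took five years.", "It holds back Lake Mead.", "Construction cost $49 million."]
labels = ["history", "history", "geography", "history"]
passages = segment_by_topic(doc, labels)
```

Unlike fixed-length windows, the passage boundaries here fall wherever the topic label changes, so passage length varies with the text.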

王宇  王芳 《计算机应用研究》2020,37(6):1769-1773
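Community question answering systems are full of noise, which makes it difficult for users to retrieve information, and most previous question retrieval models work at the word level. To address these problems, a sentence-level question retrieval model is built. Drawing on the sentence category knowledge in the hierarchical network of concepts (HNC) theory, the new model computes the similarity between questions at three levels: pragmatic, syntactic, and semantic. A question classification algorithm determines the question categories of the query question and the candidate questions, which yields the pragmatic similarity between questions; the structure of the sentence category expressions and the composition of the semantic chunks are used to compute the syntactic and semantic similarity, respectively. Experiments on a real dataset show that the new model based on HNC sentence categories improves the accuracy of question retrieval results.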

张继燕  欧莹元 《软件》2013,34(5):155-156
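Starting from an analysis of the goals of the Information Management and Information Systems major, this paper establishes the position of the course Information Storage and Retrieval within that major and then describes the course's multidisciplinary character. It analyzes the main textbooks currently used in universities, selects the one best suited to the major, and, for the selected textbook, describes the course's teaching content and teaching methods.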

The rise of the Social Web and advances in the Semantic Web provide unprecedented possibilities for the development of novel methods to enhance the information retrieval (IR) process by including varying degrees of semantics. We shed light on the corresponding notion of semantically enhanced information retrieval by presenting state-of-the-art techniques in related research areas. We describe techniques based on the main processes of a typical IR workflow and map them onto three main types of semantics, which vary from formal semantic knowledge representations and content-based semantics to social semantics emerging through usage and user interactions.

With the number of documents describing real-world events and event-oriented information needs rapidly growing on a daily basis, the need for efficient retrieval and concise presentation of event-related information is becoming apparent. Nonetheless, the majority of information retrieval and text summarization methods rely on shallow document representations that do not account for the semantics of events. In this article, we present event graphs, a novel event-based document representation model that filters and structures the information about events described in text. To construct the event graphs, we combine machine learning and rule-based models to extract sentence-level event mentions and determine the temporal relations between them. Building on event graphs, we present novel models for information retrieval and multi-document summarization. The information retrieval model measures the similarity between queries and documents by computing graph kernels over event graphs. The extractive multi-document summarization model selects sentences based on the relevance of the individual event mentions and the temporal structure of events. Experimental evaluation shows that our retrieval model significantly outperforms well-established retrieval models on event-oriented test collections, while the summarization model outperforms competitive models from shared multi-document summarization tasks.

We use information from the Web for performing our daily tasks more and more often. Locating the right resources to help us do so is a daunting task, especially given the present rate of growth of the Web and the many different kinds of resources available. The task of search engines is to assist us in finding those resources that are apt for our given tasks. In this paper we propose to use the notion of quality as a metric for estimating the aptness of online resources for individual searchers. The formal model for quality presented in this paper is firmly grounded in the literature. It is based on the observation that objects (dubbed artefacts in our work) can play different roles (i.e., perform different functions). An artefact can be of high quality in one role but of poor quality in another. Moreover, the notion of quality is highly personal. Our quality computations for estimating the aptness of resources for searches use the notion of linguistic variables from the field of fuzzy logic. After presenting our model for quality, we also show how manipulation of online resources by means of transformations can influence the quality of these resources.

《Computers in Industry》2014,65(6):937-951
Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very efficient when dealing with texts, when applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless, one appealing way of improving the results could be to consider query expansions that automatically or semi-automatically add information to the query to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevancy of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first one, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term weighting measure, it aims at assigning a weight to terms according to their relatedness to queries. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables retrieval of relevant passages.
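
The pseudo relevance feedback step described above can be illustrated with a generic sketch: take the top-ranked passages from a first retrieval pass, score the terms that co-occur with the query terms, and append the best-scoring ones to the query. The scoring used here is plain co-occurrence counting, which is an assumed stand-in and not the TRQ measure, whose definition the abstract does not give. The function name, parameters, and log lines are all hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_passages, top_k=2, n_expansion=2):
    """Expand a query with terms from the top-k passages of a first pass.

    query_terms:     list of lowercase query terms.
    ranked_passages: passages (strings) ordered by first-pass relevance.
    Candidate terms are scored by how often they appear in top-k passages
    that also contain at least one query term (a stand-in for TRQ).
    """
    query = set(query_terms)
    counts = Counter()
    for passage in ranked_passages[:top_k]:
        tokens = passage.lower().split()
        if query & set(tokens):
            counts.update(t for t in tokens if t not in query)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_expansion]]


# Hypothetical log lines returned by a first retrieval pass.
passages = ["timeout error on node7", "node7 timeout retry", "disk full"]
expanded = expand_query(["timeout"], passages, top_k=2, n_expansion=1)
```

In this toy run, `node7` co-occurs with the query term in both top passages, so it is the term added to the query.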

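Microsoft XiaoIce has triggered a new wave of research on question answering systems. As a new form of information retrieval, a question answering system can interact with users directly in natural language in a user-friendly way. A Web-based question answering system can obtain all kinds of relevant information from the open Internet through search engines and return accurate answers expressed in natural language to the user, so such systems combine the advantages of search engines and question answering systems. First, the research background and development history of Web-based question answering systems are outlined. Then, the architecture of Web-based question answering systems and the research progress in their three key technologies (question analysis, information retrieval, and answer extraction) are described in detail. On this basis, the problems facing Web-based question answering systems are analyzed. Finally, future development trends of Web-based question answering systems are discussed.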

The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas and each one may only contain a part of the answer for a certain user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the different datasets on both the schema and instance levels. We present Sina, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-language queries into conjunctive SPARQL queries over a set of interlinked data sources. Sina uses a hidden Markov model to determine the most suitable resources for a user-supplied query from different datasets. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets to query. We evaluate Sina over three different datasets. We can answer 25 queries from QALD-1 correctly. Moreover, we perform as well as the best question answering system from the QALD-3 competition by answering 32 questions correctly while also being able to answer queries on distributed sources. We study the runtime of SINA in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data.
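
A conjunctive SPARQL query, as mentioned in the abstract above, is a set of triple patterns that must all match. The sketch below shows only that final assembly step, assuming the keywords have already been mapped to resources and arranged into triple patterns; the hidden Markov model disambiguation and the federation across datasets are not reproduced. The helper name and the `example.org` URIs are hypothetical.

```python
def build_conjunctive_query(triple_patterns, select_vars):
    """Assemble a SPARQL SELECT query from a list of triple patterns.

    triple_patterns: list of (subject, predicate, object) strings, each
                     already written as a variable (?x) or an <IRI>.
    select_vars:     list of variables to return, e.g. ["?book"].
    All patterns sit in one basic graph pattern, so they are combined
    conjunctively: a solution must satisfy every pattern.
    """
    if not triple_patterns or not select_vars:
        raise ValueError("need at least one triple pattern and one variable")
    body = " .\n  ".join(" ".join(triple) for triple in triple_patterns)
    return f"SELECT DISTINCT {' '.join(select_vars)} WHERE {{\n  {body} .\n}}"


# Hypothetical resources for the keywords "books", "author", "Dublin".
patterns = [
    ("?book", "<http://example.org/author>", "?person"),
    ("?person", "<http://example.org/birthPlace>", "<http://example.org/Dublin>"),
]
query = build_conjunctive_query(patterns, ["?book"])
```

The shared variable `?person` is what joins the two patterns, which is the role the abstract assigns to links between resources.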

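The development of pre-trained language models has greatly advanced the task of machine reading comprehension. To make full use of the shallow features in pre-trained language models and to further improve the accuracy of the answers predicted by question answering models, a three-stage question answering model based on BERT is proposed. First, three stages (pre-answering, re-answering, and answer adjustment) are designed on the basis of BERT; then, in the pre-answering stage, the input of the BERT embedding layer is treated as shallow features for preliminary answer generation; next, in the re-answering stage...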

A short text modeling method combining semantic and statistical information   Cited by: 2 (self-citations: 0, by others: 2)
A novel modeling method for a collection of short text snippets is presented in this paper to measure the similarity between pairs of snippets. The method takes account of both the semantic and statistical information within the short text snippets, and consists of three steps. Given a set of raw short text snippets, it first establishes the initial similarity between words by using a lexical database. The method then iteratively calculates both word similarity and short text similarity. Finally, a proximity matrix is constructed based on word similarity and used to convert the raw text snippets into vectors. Word similarity and text clustering experiments show that the proposed short text modeling method improves the performance of existing text-related information retrieval (IR) techniques.
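
The final step described above, using a word proximity matrix to turn snippets into vectors, can be sketched as follows. This is a simplified illustration under stated assumptions: the word similarity values are hand-picked rather than derived from a lexical database, the iterative refinement step is omitted, and the conversion is assumed to be a product of term counts with the proximity matrix. All names are hypothetical.

```python
import math


def snippet_vector(tokens, vocab, word_sim):
    """Map a tokenized snippet to a vector smoothed by word similarity.

    vocab:    list of vocabulary words (fixes the vector dimensions).
    word_sim: square matrix, word_sim[i][j] = similarity of vocab[i], vocab[j].
    The result is the term-count vector multiplied by the proximity matrix,
    so a word also contributes weight to the words it is similar to.
    """
    counts = [tokens.count(word) for word in vocab]
    size = len(vocab)
    return [sum(counts[i] * word_sim[i][j] for i in range(size)) for j in range(size)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hand-picked similarities, for illustration only.
vocab = ["car", "automobile", "banana"]
word_sim = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = snippet_vector(["car"], vocab, word_sim)
b = snippet_vector(["automobile"], vocab, word_sim)
c = snippet_vector(["banana"], vocab, word_sim)
```

With raw term counts, the snippets "car" and "automobile" share no terms and have zero cosine similarity; after smoothing they are close, while "banana" stays orthogonal to both.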
