Similar Literature
Found 19 similar documents; search took 203 ms
1.
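Natural language generation (NLG) uses methods from artificial intelligence and linguistics to automatically produce understandable natural-language text. NLG lowers the barrier to communication between humans and computers, is widely used in areas such as automated news writing and chatbots, and has become one of the research hotspots in artificial intelligence. The paper first lists the current mainstream NLG methods and models and compares their strengths and weaknesses in detail. It then summarizes and analyzes the application areas, open problems, and current research progress for three kinds of NLG: text-to-text, data-to-text, and image-to-text. Next, it describes the evaluation methods commonly used for these generation techniques and their scope of applicability. Finally, it presents the development trends and research difficulties of current NLG technology.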

2.
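Image caption generation and description is the task of having a computer automatically translate an image into natural language. The research has broad application prospects in areas such as visual assistance for humans and the development of intelligent human-machine environments, and it also supports research on tasks such as image retrieval, high-level visual semantic reasoning, and personalized description. Image data is highly nonlinear and complex, whereas human natural language is abstract and logically rigorous, so having a computer automatically abstract and summarize image content is very challenging. This paper describes the simple image captioning and description task, analyzes simple-description generation methods based on handcrafted features, and reviews deep-feature-based methods, including those based on global visual features, on visual feature selection and optimization, and on optimization strategies. For fine-grained image description, it analyzes the main current models and methods for image "dense captioning" and structured description. It also analyzes image description methods that incorporate emotional information and personalized expression. Throughout the analysis, it points out the shortcomings of the current image captioning and description methods and proposes possible research trends and solutions. It introduces in detail the datasets commonly used in the field, such as MS COCO2014 (Microsoft Common Objects in Context) and Flickr30K, and compares the performance on these datasets of representative models for simple image description, dense and paragraph description, and emotional image description. Because visual data is complex and natural language is abstract, many problems remain to be solved in feature extraction and representation, the selection and embedding of semantic vocabulary, dataset construction, and description evaluation, particularly for image description that incorporates emotion and personalized expression.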

3.
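Modern Natural Language Generation (《现代自然语言生成》) systematically summarizes modern natural language generation technology, represented by neural networks, and introduces the basic ideas, models, algorithms, and frameworks of natural language generation, progressing from basic to advanced material. To give readers a fuller understanding of the technology, the book organizes existing techniques by basic models, optimization methods, generation modes, and generation mechanisms, and also covers common generation tasks and evaluation methods. The book includes both the basic knowledge of modern natural language generation...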

4.
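Natural language generation (NLG) is a subclass of natural language processing (NLP) tasks and is a challenging one. With the extensive application of deep learning in NLP, deep learning has become the main approach to the various tasks in NLG. The main NLG tasks include question answering, summary generation, review generation, machine translation, and generative dialogue. Traditional generation models rely on the input text and generate text from limited knowledge. Knowledge-enhanced methods have been introduced to address this problem. The paper first introduces the research background and important models of NLG. It then surveys methods for improving model performance in natural language processing, along with the methods and architectures for integrating internal knowledge (such as keyword extraction to enhance generation, or generation around topic words) and external knowledge (such as external knowledge graphs) into the text generation process. Finally, by analyzing some of the problems facing generation tasks, it discusses future challenges and research directions.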

5.
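Rule generation in the plane geometry domain mainly involves extracting the intrinsic relationships within the domain or solving problems. At present, key rules depend mainly on being written by domain experts, which is neither extensible nor sustainable. By analyzing plane geometry theorems described in natural language and building the corresponding object and relation models, the paper proposes a method for automatically extracting and generating the rules that correspond to the geometric relation models. Using machine proof of plane geometry theorems as an example, it verifies the feasibility of the method. With improvements, the method can be further extended to the automatic generation of rules in other domains.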

6.
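Natural Language Generation Technology and an Application Example   Total citations: 4; self-citations: 0; citations by others: 4
Natural language generation is one of the two major areas of natural language processing. Many researchers abroad are working on NLG technology. This paper mainly introduces implementation methods for automatic text generators. It first briefly describes the three main tasks of automatic text generation, then describes in detail four commonly used generator implementation techniques and their advantages and disadvantages. Finally, it discusses a concrete example: the implementation model of an automatic weather forecast generation system.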

7.
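Question generation refers to a machine actively asking questions about a passage of text, producing a natural-language question. Neural question generation uses fully end-to-end training, with a neural network converting a document and an answer into a question, and is an emerging and important research direction in natural language processing. The paper first gives a brief introduction to neural question generation, including basic concepts, mainstream frameworks, and evaluation methods. It then introduces the key problems of this research direction, including input modeling, long-text processing, ...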

8.
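Natural language is the crystallization of human wisdom and civilization and an important vehicle for natural communication between people; enabling machines to understand human language is regarded as a crown jewel of artificial intelligence. Using advanced technologies such as deep learning, natural language understanding, and natural language generation, machines can provide people with valuable services such as automatic text review, content error correction, entity search, intelligent recommendation, and article writing, and can replace humans in some repetitive work. One of the goals of natural language processing is to build a bridge for communication between humans and machines while greatly improving enterprise management efficiency.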

9.
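Computer Engineering (《计算机工程》), 2018, (4): 231-235
Machine understanding of Tibetan sentences suffers from poor flexibility and high complexity. To address this, the paper designs an automatic paraphrase generation method that targets the different ways of expressing Tibetan sentences with the same meaning. It analyzes Tibetan sentence patterns and the chunks within a sentence, and uses a recursive full-permutation algorithm to generate paraphrases. Experimental results show that, unlike paraphrase generation methods for other languages, this method can generate from a single sentence one, several, or even thousands of paraphrases with the same meaning, depending on the number of chunks in the Tibetan sentence, with an accuracy of 93.4%. It can be applied to Tibetan-Chinese machine translation, machine translation evaluation, Tibetan question answering systems, and other areas.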
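The recursive full-permutation step named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which chunks may move or how candidates are filtered, so the rule used here (the predicate chunk stays sentence-final and every other chunk may be reordered) is an assumption made only for illustration, and the names `permute_chunks` and `paraphrase_candidates` are hypothetical.

```python
from typing import List


def permute_chunks(chunks: List[str]) -> List[List[str]]:
    """Recursively enumerate every ordering of the given chunks."""
    if len(chunks) <= 1:
        return [list(chunks)]
    orderings = []
    for i, head in enumerate(chunks):
        # Fix one chunk in front, then permute the remaining chunks.
        rest = chunks[:i] + chunks[i + 1:]
        for tail in permute_chunks(rest):
            orderings.append([head] + tail)
    return orderings


def paraphrase_candidates(movable: List[str], predicate: str) -> List[str]:
    """Build candidate sentences from all orderings of the movable chunks.

    Assumption (not from the abstract): the predicate chunk is kept at the
    end of the sentence and only the other chunks are permuted.
    """
    return [" ".join(order + [predicate]) for order in permute_chunks(movable)]


if __name__ == "__main__":
    # Placeholder chunk labels, not real Tibetan text.
    for sentence in paraphrase_candidates(["AGENT", "PATIENT", "TIME"], "VERB"):
        print(sentence)
```

The number of orderings grows factorially with the number of movable chunks (seven chunks give 7! = 5040 orderings), which is consistent with the abstract's remark that one sentence can yield thousands of paraphrases.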

10.
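Automatic argumentative essay generation is a highly challenging task in natural language generation. Unlike the generation of poetry or stories, the generated essay needs semantically clear sentences and a clear argumentative structure, and must express the core claim reasonably. These characteristics make it difficult for existing pre-trained models to model and generate such essays accurately, so traditional retrieval-based methods have become the main way to tackle the problem. However, previous methods consider only semantic relatedness during sentence retrieval and ranking and neglect the discrimination of logical argumentative relations, which leads to problems such as semantic incoherence and inverted argumentative logic. To address these problems, the paper applies natural language inference to the task of discriminating the logic of argumentative relations and proposes a discrimination method based on explicit semantic structure; the new model outperforms previous natural language inference models on an argument discrimination dataset. The discrimination results are also used as explicit features in a sentence ranking model for argumentative essays, which effectively reduces the ranking model's logical inconsistencies on an argumentative essay generation dataset and further improves the overall performance of the essay generation system.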

11.
This paper presents a literature review in the field of summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussion artifacts. From Jan. 2010 to Apr. 2016, numerous summarization techniques, approaches, and tools have been proposed to satisfy the ongoing demand of improving software performance and quality and facilitating developers in understanding the problems at hand. Since the aforementioned artifacts contain both structured and unstructured data at the same time, researchers have applied different machine learning and data mining techniques to generate summaries. Therefore, this paper first intends to provide a general perspective on the state of the art, describing the type of artifacts, approaches for summarization, as well as the common portions of experimental procedures shared among these artifacts. Moreover, we discuss the applications of summarization, i.e., what tasks at hand have been achieved through summarization. Next, this paper presents tools that are generated for summarization tasks or employed during summarization tasks. In addition, we present different summarization evaluation methods employed in selected studies as well as other important factors that are used for the evaluation of generated summaries such as adequacy and quality. Moreover, we briefly present modern communication channels and complementarities with commonalities among different software artifacts. Finally, some thoughts about the challenges applicable to the existing studies in general as well as future research directions are also discussed. The survey of existing studies will allow future researchers to have a wide and useful background knowledge on the main and important aspects of this research field.

12.
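As the Internet produces more and more text data, text information overload is becoming increasingly serious, and it is necessary to apply a kind of "dimensionality reduction" to all kinds of text. Text summarization is one important means of doing so, and it is also one of the hot and difficult topics in artificial intelligence research. Text summarization aims to convert a text or a collection of texts into a short summary containing the key information. In recent years, language model pre-training has advanced the state of the art in many natural language processing tasks, including sentiment analysis, question answering, natural language inference, named entity recognition, text similarity, and text summarization. This paper reviews the classic earlier methods of text summarization and the pre-training-based methods of recent years, organizes the datasets and evaluation methods for text summarization, and finally summarizes the current challenges and development trends.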

13.
As information is available in abundance for every topic on the Internet, condensing the important information in the form of a summary would benefit a number of users. Hence, there is growing interest among the research community in developing new approaches to automatically summarize text. An automatic text summarization system generates a summary, i.e. a short text that includes all the important information of the document. Since the advent of text summarization in the 1950s, researchers have been trying to improve techniques for generating summaries so that the machine-generated summary matches the human-made summary. Summaries can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex as they need extensive natural language processing. Therefore, the research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful summaries. Over the past decade, several extractive approaches have been developed for automatic summary generation that implement a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent extractive text summarization approaches developed in the last decade. Their needs are identified and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field. Therefore, both intrinsic and extrinsic methods of summary evaluation are described in detail, along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally, this paper concludes with a discussion of useful future directions that can help researchers to identify areas where further research is needed.

14.
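A summary of source code can help software developers understand the code quickly and help maintainers complete maintenance tasks faster. However, writing summaries by hand is costly and inefficient, so people have tried to use computers to generate summaries for source code automatically. In recent years, neural-network-based code summarization has become the mainstream technique in automatic source code summarization research and a research hotspot in software engineering. The paper first explains the concept of a code summary and the definition of automatic code summarization, reviews automatic code summarization techniques...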

15.
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system where the user can cap the size of the final summary. It is a direct system where no machine learning is involved. We use a two-pass algorithm: in pass one, we produce a primary summary using Rhetorical Structure Theory (RST); in pass two, we assign a score to each of the sentences in the primary summary. These scores help us in generating the final summary. For the final output, sentences are selected with the objective of maximizing the overall score of the summary, whose size should not exceed the user-selected limit. We used Rouge to evaluate our system-generated summaries of various lengths against those written by a (human) news editorial professional. Experiments on sample texts show our system to outperform some of the existing Arabic summarization systems, including those that require machine learning.
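The final selection step in this abstract (maximize the total score of the chosen sentences while keeping the summary within a user-selected size) has the shape of a 0/1 knapsack problem. The abstract does not say how the authors solve it, so the dynamic-programming approach below is an assumption, as are the use of word count as the size measure and the hypothetical name `select_sentences`. Scores are taken as given, standing in for the second-pass scoring.

```python
from typing import List, Tuple


def select_sentences(sentences: List[Tuple[str, float]], max_words: int) -> List[str]:
    """Pick the subset of (text, score) pairs with the highest total score
    whose combined word count does not exceed max_words.

    Sketch only: a standard 0/1 knapsack dynamic program, with word count
    as the size measure (an assumption, not the paper's stated method).
    """
    sizes = [len(text.split()) for text, _ in sentences]
    # best[w] holds (total score, chosen indices) for a budget of w words.
    best = [(0.0, [])] * (max_words + 1)
    for i, (_, score) in enumerate(sentences):
        size = sizes[i]
        # Walk budgets downwards so each sentence is used at most once.
        for w in range(max_words, size - 1, -1):
            candidate = best[w - size][0] + score
            if candidate > best[w][0]:
                best[w] = (candidate, best[w - size][1] + [i])
    # Restore the original document order of the chosen sentences.
    return [sentences[i][0] for i in sorted(best[max_words][1])]


if __name__ == "__main__":
    scored = [("a b c", 3.0), ("d e", 2.5), ("f g", 2.0), ("h", 0.5)]
    print(select_sentences(scored, max_words=4))
```

With a four-word budget, the example picks the two two-word sentences (total score 4.5) over the single highest-scoring sentence plus the one-word sentence (total score 3.5), which is the case a greedy pick by score would get wrong.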

16.
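Application of Machine Learning to the Recognition of Chinese Connectives   Total citations: 2; self-citations: 0; citations by others: 2
Connectives account for a large proportion of some Chinese argumentative texts, so they can play a very important role in the analysis of such texts. This paper mainly discusses how to apply machine learning to the disambiguation of Chinese connectives: the motivation, the method, and the results. Based on 80 already-processed Chinese corpus texts, we extracted training and test sets for machine learning and ran tests with C4.5, obtaining a recognition accuracy above 80%. In the later part of the paper, we also explain and analyze the machine learning results from a linguistic perspective.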

17.
Most existing automatic text summarization algorithms target multiple documents of relatively short length, and are thus difficult to apply directly to novel documents, which are long and free in structure. In this paper, aiming at novel documents, we propose a topic-modeling-based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality, and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary's readability. We experimentally evaluate our proposed approach on a real novel dataset. The experimental results show that, compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality.

18.
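To address the problem that traditional word vectors cannot effectively represent polysemous words during automatic text summarization, which reduces the accuracy and readability of the summaries, the paper proposes a method for building an automatic text summarization model based on BERT (Bidirectional Encoder Representations from Transformers). The method introduces the BERT pre-trained language model to enhance the semantic representation of word vectors and feeds the generated word vectors into a Seq2Seq model for training to form an automatic text summarization model, enabling fast generation of text summaries. Experimental results show that the model effectively improves the accuracy and readability of generated summaries on the Gigaword dataset and can be used for automatic text summary generation.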

19.
Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings and k-cores. An additional summarizer was created which selects the highest ranked sentences in the 14 systems, as in a voting system. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts therefore appears suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features.
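The network construction described in this abstract (one node per sentence, an edge between two sentences that share a meaningful noun) and the simplest of the listed metrics, node degree, can be sketched in a few lines of Python. This is an illustrative sketch, not the CN-Summ implementation: noun extraction is assumed to have been done already, ties are broken by sentence position as an arbitrary choice, and the name `degree_summary` is hypothetical.

```python
from itertools import combinations
from typing import List, Set


def degree_summary(sentence_nouns: List[Set[str]], k: int) -> List[int]:
    """Return the indices of the k sentences with the highest node degree.

    sentence_nouns[i] is the set of meaningful nouns in sentence i
    (assumed to be extracted beforehand). Two sentences are connected
    when their noun sets overlap.
    """
    degree = [0] * len(sentence_nouns)
    for i, j in combinations(range(len(sentence_nouns)), 2):
        if sentence_nouns[i] & sentence_nouns[j]:
            degree[i] += 1
            degree[j] += 1
    # Rank by degree (highest first); earlier sentences win ties.
    ranked = sorted(range(len(degree)), key=lambda i: (-degree[i], i))
    # Emit the selected sentences in their original document order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    nouns = [{"network", "text"}, {"text"}, {"metric"}, {"network", "metric"}]
    print(degree_summary(nouns, k=2))
```

In the example, sentences 0 and 3 each share nouns with two other sentences, so they are the ones selected for a two-sentence summary.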
