Similar Documents
A total of 19 similar documents were found.
1.
《现代自然语言生成》 (Modern Natural Language Generation) systematically surveys modern natural language generation techniques, represented by neural-network approaches, and introduces the basic ideas, models, algorithms, and frameworks of natural language generation, moving from the basics to more advanced topics. To give readers a fuller picture of natural language generation, the book organizes existing techniques along dimensions such as base models, optimization methods, generation modes, and generation mechanisms, and also covers common generation tasks and evaluation methods.

2.
This paper introduces a method for automatic code generation. It proposes a verb-centered, attribute-based approach to semantic processing. Guided by this idea, a knowledge base and a library of semantic processing rules are built, and the semantic processing of words in restricted natural language sentences is studied in detail. Finally, restricted natural language understanding is applied to automatic code generation: the verbs in normalized restricted Chinese sentences are classified and assigned attribute concepts; the restricted sentences are semantically analyzed against the knowledge base and rule base and converted into an intermediate language; and, combined with a customizable template method, code is generated automatically by a program generation engine.
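
As a rough illustration of the verb-centered, template-driven pipeline described above, the sketch below maps a classified verb to a code template and fills it from extracted attributes. All names here (verb classes, templates, the example record) are hypothetical placeholders; the paper's actual knowledge base and rule base are far richer.

```python
# A minimal sketch, assuming a verb->template knowledge base and already-parsed
# attributes; the real system derives these from semantic analysis of restricted
# Chinese sentences. All class names and templates here are hypothetical.

VERB_TEMPLATES = {
    "read":  'value = read_channel("{device}", {channel})',
    "set":   'set_channel("{device}", {channel}, {value})',
    "delay": "time.sleep({seconds})",
}

def generate_statement(verb_class: str, attributes: dict) -> str:
    """Fill the template associated with a verb class using sentence attributes."""
    template = VERB_TEMPLATES[verb_class]
    return template.format(**attributes)

# Example: an intermediate-language record produced by the semantic analyzer.
record = {"verb_class": "read", "attributes": {"device": "DMM1", "channel": 3}}
print(generate_statement(record["verb_class"], record["attributes"]))
# -> value = read_channel("DMM1", 3)
```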

3.
Existing unconstrained natural language information hiding methods based on text generation mainly use different text generation models to produce stego text under the control of the secret message; the stego text they generate is of good quality and offers high embedding capacity. However, most of these methods are limited to generating short stego texts, and overall text quality and inter-sentence semantic coherence drop sharply as sentence length grows. Unlike unconstrained methods, existing constrained natural language information hiding methods can perform information hiding in long-text generation tasks for specific scenarios, offering better linguistic imperceptibility and security. To make constrained methods more generally applicable across application scenarios, this paper proposes a general sequence-to-stego-sequence framework consisting of a language encoder and a steganographer, which transforms one constrained information sequence into a stego text sequence. Taking summarization as an example, and using the sequence-to-stego-sequence model as the basic framework, the paper proposes a novel constrained natural language information hiding method. The method introduces an attention optimization unit into the language encoder to improve feature learning, and in the steganographer combines a copy mechanism with a newly designed adaptive steganographic coding method based on multi-candidate optimization, so that the steganographer can adaptively choose different output optimization strategies according to the probability distribution of candidate word sequences and the secret bits to be embedded; stego text quality is improved by outputting multiple candidate sequences and by embedding information only at suitable positions at embedding time. Experimental results show that the proposed method can, by optimizing the language encoder and...
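
The core embedding mechanism, letting the choice among candidate tokens carry secret bits, can be sketched generically as fixed-length block coding over the top candidates at each generation step. This is a simplified stand-in for the paper's adaptive multi-candidate coding, and the token-probability interface below is an assumption.

```python
# Minimal sketch of bit-block steganographic token selection, assuming a language
# model that exposes next-token probabilities. The paper's adaptive multi-candidate
# scheme is more elaborate; this only shows the embedding principle.

def embed_step(probs: dict, secret_bits: str, bits_per_step: int = 2) -> tuple:
    """Pick one of the top 2**bits_per_step candidates using the next secret bits."""
    pool = sorted(probs, key=probs.get, reverse=True)[: 2 ** bits_per_step]
    chunk = secret_bits[:bits_per_step].ljust(bits_per_step, "0")
    token = pool[int(chunk, 2)]          # the candidate's rank encodes the bits
    return token, secret_bits[bits_per_step:]

probs = {"the": 0.40, "a": 0.25, "this": 0.20, "that": 0.15}  # hypothetical LM output
token, remaining = embed_step(probs, "10")
print(token)  # -> "this" (rank 0b10 = 2 in the sorted candidate pool)
```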

4.
刘奇, 马娆, 俞凯. 《计算机学报》 (Chinese Journal of Computers), 2022, 45(2): 289-301
Natural language generation is currently a very important and challenging class of artificial intelligence tasks. The Long Short-Term Memory (LSTM) language model is currently the dominant model for natural language generation. However, the LSTM language model is trained with a word-level cross-entropy criterion, which leads to the exposure bias problem. In addition, natural language generation tasks are generally evaluated with sequence-level metrics such as the BLEU score or the word error rate, which...
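
A minimal PyTorch sketch of the word-level cross-entropy training referred to above: the LSTM is always conditioned on the ground-truth history (teacher forcing), which is exactly where the exposure bias comes from, since at inference time the model must condition on its own previous predictions instead. The sizes and data are arbitrary placeholders.

```python
# Minimal sketch (assumed hyperparameters, random dummy data) of word-level
# cross-entropy training for an LSTM language model; the teacher forcing below
# is what causes exposure bias at generation time.
import torch
import torch.nn as nn

vocab, emb, hid = 1000, 64, 128
embed = nn.Embedding(vocab, emb)
lstm = nn.LSTM(emb, hid, batch_first=True)
proj = nn.Linear(hid, vocab)
loss_fn = nn.CrossEntropyLoss()
params = list(embed.parameters()) + list(lstm.parameters()) + list(proj.parameters())
optim = torch.optim.Adam(params)

tokens = torch.randint(0, vocab, (8, 21))        # a dummy batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # ground-truth history -> next word

hidden, _ = lstm(embed(inputs))
logits = proj(hidden)                            # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()
optim.step()
print(float(loss))
```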

5.
The task of image caption generation and description is to have a computer automatically translate an image into natural language. This research has broad application prospects in areas such as visual assistance for humans and intelligent human-computer environments, and it also supports work on image retrieval, high-level visual semantic reasoning, and personalized description. Image data are highly nonlinear and complex, while human natural language is abstract and logically rigorous, so having a computer automatically abstract and summarize image content is very challenging. This paper surveys the task of simple image caption generation and description, analyzes caption generation methods based on hand-crafted features, and reviews deep-feature-based methods, including those based on global visual features, on visual feature selection and optimization, and on optimization strategies. For fine-grained image description, it analyzes the main current models and methods for dense captioning and structured description. It also analyzes image description methods that incorporate sentiment information and personalized expression. In the course of this analysis, the shortcomings of current caption generation and description methods are pointed out, and possible research directions and solutions are proposed. Commonly used datasets such as MS COCO2014 (Microsoft common objects in context) and Flickr30K are introduced in detail, and the performance of representative models for simple image description, dense and paragraph description, and sentiment-aware description is compared on these datasets. Owing to the complexity of visual data and the abstractness of natural language, and especially for image description that integrates sentiment and personalized expression, many problems remain to be solved in feature extraction and representation, selection and embedding of semantic vocabulary, dataset construction, and description evaluation.

6.
Natural language generation technology and an application example (cited 4 times in total: 0 self-citations, 4 by others)
Natural language generation is one of the two major areas of natural language processing, and many researchers abroad are working on NLG technology. This paper mainly introduces methods for implementing automatic text generators. It first briefly describes the three main tasks of automatic text generation, then describes four commonly used generator implementation techniques and their respective advantages and disadvantages. Finally, it presents the implementation model of a concrete example: an automatic weather forecast generation system.
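
Template filling is the simplest of the generator techniques such a survey typically covers, and it is enough to sketch a weather-report generator like the one mentioned above. The data fields and wording below are invented for illustration, not taken from the cited system.

```python
# Minimal data-to-text sketch in the spirit of a weather-forecast generator;
# the field names and phrasing are invented, not those of the cited system.
def weather_report(city: str, low: int, high: int, condition: str) -> str:
    trend = "warm" if high >= 25 else "cool"
    return (f"{city}: {condition} today, with temperatures between "
            f"{low} and {high} degrees Celsius. Expect a {trend} afternoon.")

print(weather_report("Beijing", 18, 27, "partly cloudy"))
```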

7.
The reliability analysis of a space tracking, telemetry, and command (TT&C) system bears on whether a space flight mission can be completed successfully; by analyzing different TT&C schemes, weak points can be identified and the top-level design of the TT&C system optimized. Because of the complexity of TT&C system composition and mission requirements, describing the system and building models manually is not only inefficient but also makes model correctness hard to guarantee. Based on the Extensible Markup Language (XML), this paper proposes a standardized description method for TT&C resource metadata and TT&C missions, with computer-aided generation of a Markov-based system reliability model. An algorithm for automatically generating the Markov model is proposed, and a worked example demonstrates the effectiveness of the standardized description method and of the automatic model generation algorithm.
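
As a toy version of the kind of Markov reliability model generated by such a tool, the sketch below builds a two-state (operational/failed, repairable) transition-rate matrix and evaluates the state probabilities over time. The failure and repair rates are made up, and the cited work derives a much larger state space automatically from the XML resource descriptions.

```python
# Toy Markov availability model (assumed failure/repair rates); the cited work
# generates far larger state spaces automatically from XML descriptions of
# TT&C resources and missions.
import numpy as np
from scipy.linalg import expm

lam, mu = 1e-3, 1e-1                 # failure and repair rates per hour (hypothetical)
Q = np.array([[-lam,  lam],          # state 0: operational
              [  mu,  -mu]])         # state 1: failed (repairable)

p0 = np.array([1.0, 0.0])            # start in the operational state
for t in (10, 100, 1000):
    p_t = p0 @ expm(Q * t)           # transient state probabilities at time t
    print(f"t={t:4d} h  P(operational)={p_t[0]:.4f}")
```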

8.
Natural Language Generation (NLG) is a subclass of Natural Language Processing (NLP) tasks, and a challenging one. With the wide adoption of deep learning in natural language processing, it has become the main approach to the various tasks of natural language generation, which include question answering, summarization, review generation, machine translation, and generative dialogue. Traditional generation models rely on the input text and generate output from limited knowledge. To address this, knowledge-enhanced methods have been introduced. This paper first introduces the research background and key models of natural language generation, then reviews methods for improving model performance, as well as methods and architectures for integrating internal knowledge (e.g., extracting keywords or centering on topic words to enhance generation) and external knowledge (e.g., external knowledge graphs) into the text generation process. Finally, by analyzing problems faced by generation tasks, it discusses future challenges and research directions.

9.
Natural Language Generation (NLG) uses methods from artificial intelligence and linguistics to automatically produce understandable natural language text. NLG lowers the barrier to communication between humans and computers and is widely used in areas such as automated news writing and chatbots, making it one of the research hotspots of artificial intelligence. This paper first lists the current mainstream NLG methods and models and compares their strengths and weaknesses in detail; it then summarizes and analyzes the application domains, open problems, and current research progress of three kinds of NLG technology: text-to-text, data-to-text, and image-to-text; it further describes the common evaluation methods for these generation techniques and their scope of applicability; finally, it presents the development trends and research difficulties of current NLG technology.

10.
陈粤, 孟晓风, 边泽强. 《计算机工程与设计》 (Computer Engineering and Design), 2007, 28(20): 4833-4835, 4870
This paper discusses a test program development environment based on a Frame structure, in which test requirements are described in natural language and artificial intelligence techniques are used to automatically generate computer-language code from those requirements. For the implementation, it presents a method for describing test requirements with a frame-style structure, and a method that analyzes the test flow using automatic syntax analysis and realizes automatic code generation through an inference engine, thereby effectively improving the development efficiency and quality of test program sets.

11.
This paper presents a literature review in the field of summarizing software artifacts, focusing on bug reports, source code, mailing lists, and developer discussions. From Jan. 2010 to Apr. 2016, numerous summarization techniques, approaches, and tools were proposed to satisfy the ongoing demand for improving software performance and quality and for helping developers understand the problems at hand. Since the aforementioned artifacts contain both structured and unstructured data at the same time, researchers have applied different machine learning and data mining techniques to generate summaries. Therefore, this paper first intends to provide a general perspective on the state of the art, describing the types of artifacts, approaches for summarization, and the common portions of experimental procedures shared among these artifacts. Moreover, we discuss the applications of summarization, i.e., what tasks at hand have been achieved through summarization. Next, this paper presents tools that were built for summarization tasks or employed during summarization tasks. In addition, we present the different summarization evaluation methods employed in the selected studies, as well as other important factors used for the evaluation of generated summaries, such as adequacy and quality. Moreover, we briefly present modern communication channels and the complementarities and commonalities among different software artifacts. Finally, some thoughts about the challenges applicable to the existing studies in general, as well as future research directions, are also discussed. The survey of existing studies will allow future researchers to build a broad and useful background knowledge of the main and important aspects of this research field.

12.
As the Internet produces ever more text data, text information overload is becoming increasingly serious, and some form of "dimensionality reduction" of texts of all kinds is necessary; text summarization is one important means to that end, and one of the research hotspots and difficulties of artificial intelligence. Text summarization aims to condense a text or a collection of texts into a short summary containing the key information. In recent years, the pre-training of language models has raised the state of the art on many natural language processing tasks, including sentiment analysis, question answering, natural language inference, named entity recognition, text similarity, and text summarization. This paper reviews classical text summarization methods and recent pre-training-based methods, organizes the datasets and evaluation methods used for text summarization, and finally summarizes the current challenges and development trends of the field.

13.
As information is available in abundance for every topic on the internet, condensing the important information into a summary would benefit many users. Hence, there is growing interest in the research community in developing new approaches to automatically summarize text. An automatic text summarization system generates a summary, i.e., a short text that includes all the important information of the document. Since the advent of text summarization in the 1950s, researchers have been trying to improve techniques for generating summaries so that machine-generated summaries match human-made ones. Summaries can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex, as they require extensive natural language processing. Therefore, the research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful results. Over the past decade, several extractive approaches have been developed for automatic summary generation that implement a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent extractive text summarization approaches developed in the last decade. Their needs are identified, and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field; therefore, both intrinsic and extrinsic methods of summary evaluation are described in detail, along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally, this paper concludes with a discussion of useful future directions that can help researchers identify areas where further research is needed.

14.
Summaries of source code help software developers understand code quickly and help maintainers complete maintenance tasks faster. However, writing summaries by hand is costly and inefficient, so researchers have tried to use computers to generate summaries for source code automatically. In recent years, neural-network-based code summarization has become the mainstream technique for automatic source code summarization and a research hotspot in software engineering. This paper first explains the concept of code summaries and the definition of automatic code summarization, then reviews automatic code summarization techniques...

15.
Automatic text summarization is an essential tool in this era of information overload. In this paper we present an automatic extractive Arabic text summarization system in which the user can cap the size of the final summary. It is a direct system: no machine learning is involved. We use a two-pass algorithm: in the first pass, we produce a primary summary using Rhetorical Structure Theory (RST); in the second pass, we assign a score to each sentence in the primary summary. These scores help us generate the final summary. For the final output, sentences are selected with the objective of maximizing the overall score of the summary, whose size must not exceed the user-selected limit. We used ROUGE to evaluate our system-generated summaries of various lengths against those produced by a (human) news editorial professional. Experiments on sample texts show that our system outperforms some existing Arabic summarization systems, including those that require machine learning.
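
The final selection step, maximizing the total sentence score without exceeding the user-chosen size cap, can be sketched as a simple greedy pick over (sentence, score) pairs. The scores below are hypothetical, and the paper's RST-based scoring is not reproduced.

```python
# Greedy selection of scored sentences under a word-count cap; a simplified
# stand-in for the paper's score-maximizing final pass (scores are hypothetical).
def select_summary(scored_sentences, max_words):
    """scored_sentences: list of (sentence, score). Pick sentences greedily by
    score density until the word budget is exhausted, then restore text order."""
    chosen, used = [], 0
    ranked = sorted(enumerate(scored_sentences),
                    key=lambda x: x[1][1] / max(len(x[1][0].split()), 1),
                    reverse=True)
    for idx, (sent, _score) in ranked:
        words = len(sent.split())
        if used + words <= max_words:
            chosen.append((idx, sent))
            used += words
    return " ".join(s for _, s in sorted(chosen))

sents = [("The system parses the document with RST.", 0.9),
         ("Scores are assigned in a second pass.", 0.7),
         ("Unrelated background details follow.", 0.1)]
print(select_summary(sents, max_words=15))
```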

16.
Application of machine learning to the recognition of Chinese conjunction words (cited 2 times in total: 0 self-citations, 2 by others)
Conjunction words account for a large proportion of some Chinese argumentative texts, so they can play a very important role in analyzing such texts. This paper mainly discusses how machine learning can be applied to disambiguating Chinese conjunction words: why, how, and with what results. From 80 already annotated Chinese texts we extracted a training set and a test set for machine learning and ran experiments with C4.5, achieving a recognition accuracy above 80%. At the end of the paper, we also interpret and analyze the machine learning results from a linguistic perspective.
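
C4.5 itself is not available in common Python libraries, but an entropy-criterion decision tree in scikit-learn approximates the setup: contextual features of a conjunction candidate go in, and the tree decides whether it actually functions as a conjunction. The features and data below are invented placeholders, not the paper's 80-text corpus.

```python
# Approximation of the C4.5 experiment with scikit-learn's entropy-criterion
# decision tree (C4.5 is not available directly); the features and labels below
# are invented placeholders, not the annotated 80-text corpus used in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical features per candidate word: [position in sentence, POS code of
# the previous word, POS code of the next word, sentence length bucket]
X = rng.integers(0, 10, size=(200, 4))
y = rng.integers(0, 2, size=200)        # 1 = true conjunction use, 0 = not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```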

17.
Most existing automatic text summarization algorithms target collections of relatively short documents and are thus difficult to apply directly to novels, which are long and loosely structured. In this paper, focusing on novels, we propose a topic-modeling-based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality, and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidates and thus generate an initial summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve readability. We evaluate our approach experimentally on a real novel dataset. The experimental results show that, compared with other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality.
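
A bare-bones version of the first step, using LDA topic words to pick candidate sentences, might look like the following. The toy "novel" and parameter choices are assumptions, and the paper's importance function and readability smoothing are omitted.

```python
# Bare-bones sketch of topic-word-driven candidate sentence extraction with LDA;
# the toy text and parameters are assumptions, and the paper's importance
# function and smoothing stages are not reproduced.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

sentences = [
    "The captain studied the old sea charts by candlelight",
    "A storm gathered over the harbor as the crew slept",
    "The merchant counted coins and worried about the voyage",
    "Charts and storms meant nothing to the sleeping town",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = np.array(vec.get_feature_names_out())
topic_words = set()
for topic in lda.components_:
    topic_words.update(vocab[topic.argsort()[-3:]])   # top 3 words per topic

# Candidate sentences = those covering topic words, scored by distinct coverage.
def coverage(sentence):
    return len(topic_words & set(sentence.lower().split()))

candidates = sorted(sentences, key=coverage, reverse=True)[:2]
print(topic_words)
print(candidates)
```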

18.
To address the problem that traditional word vectors cannot effectively represent polysemous words in automatic text summarization, which lowers the accuracy and readability of the resulting summaries, this paper proposes a method for building an automatic text summarization model based on BERT (Bidirectional Encoder Representations from Transformers). The method introduces the BERT pre-trained language model to enrich the semantic representation of word vectors, feeds the resulting vectors into a Seq2Seq model for training to form an automatic summarization model, and thereby enables fast generation of text summaries. Experimental results show that the model effectively improves the accuracy and readability of generated summaries on the Gigaword dataset and can be used for automatic text summarization.
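
Hugging Face's EncoderDecoderModel offers a convenient generic way to pair a BERT encoder with a decoder for Seq2Seq summarization; the checkpoint names and settings below are placeholders, and this is not the cited paper's exact architecture or training setup.

```python
# Generic BERT-as-encoder Seq2Seq sketch with Hugging Face Transformers; the
# checkpoints and settings are placeholders, not the cited configuration, and
# the decoder cross-attention is untrained here (Gigaword training is omitted).
from transformers import BertTokenizerFast, EncoderDecoderModel

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased")
model.config.decoder_start_token_id = tok.cls_token_id
model.config.eos_token_id = tok.sep_token_id
model.config.pad_token_id = tok.pad_token_id

text = "the quick brown fox jumps over the lazy dog near the river bank"
inputs = tok(text, return_tensors="pt")
summary_ids = model.generate(inputs.input_ids, max_length=16, num_beams=2)
print(tok.decode(summary_ids[0], skip_special_tokens=True))
```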

19.
Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings, and k-cores. An additional summarizer was created that selects the highest-ranked sentences across the 14 systems, as in a voting scheme. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts therefore appears suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features.
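
The degree-based variant of such a network summarizer is easy to sketch with networkx: sentences become nodes, an edge joins two sentences that share a content word, and the highest-degree sentences form the extract. The crude word filter below is a stand-in for the linguistic preprocessing (noun identification) used in CN-Summ.

```python
# Degree-based sketch of a complex-network extractive summarizer (CN-Summ style):
# nodes are sentences, edges join sentences sharing a content word. The crude
# stop-word filter is a stand-in for real POS tagging of nouns.
import itertools
import networkx as nx

sentences = [
    "Complex networks can represent texts for summarization.",
    "Each sentence becomes a node in the network.",
    "Edges connect sentences that share meaningful nouns.",
    "Node degree then ranks sentences for the extract.",
]

def content_words(sentence):
    stop = {"the", "a", "an", "that", "for", "in", "can", "then"}
    return {w.strip(".").lower() for w in sentence.split()} - stop

G = nx.Graph()
G.add_nodes_from(range(len(sentences)))
for i, j in itertools.combinations(range(len(sentences)), 2):
    if content_words(sentences[i]) & content_words(sentences[j]):
        G.add_edge(i, j)

top = sorted(G.nodes, key=lambda n: G.degree[n], reverse=True)[:2]
print([sentences[i] for i in sorted(top)])
```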
