Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
The goal of abstractive multi-document summarization is to automatically produce a condensed version of the document text while preserving its significant information. Most graph-based extractive methods represent sentences as bags of words and rely on content-similarity measures, which may fail to detect semantically equivalent redundant sentences. On the other hand, graph-based abstractive methods depend on a domain expert to build a semantic graph from a manually created ontology, which requires time and effort. This work presents a semantic-graph approach with an improved ranking algorithm for abstractive multi-document summarization. The semantic graph is built from the source documents so that its nodes denote predicate argument structures (PASs), the semantic structures of sentences, identified automatically via semantic role labeling, while its edges carry similarity weights computed from PAS semantic similarity. To reflect the influence of both the individual document and the document set on each PAS, the graph edges are further augmented with PAS-to-document and PAS-to-document-set relationships. The important graph nodes (PASs) are ranked with the improved graph ranking algorithm, redundant PASs are reduced by re-ranking with maximal marginal relevance, and summary sentences are finally generated from the top-ranked PASs using language generation. Experiments were conducted on DUC-2002, a standard dataset for document summarization, and the findings show that the proposed approach outperforms other summarization approaches.
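The redundancy-reduction step described above relies on maximal marginal relevance (MMR). Below is a minimal, generic sketch of greedy MMR re-ranking, not the paper's implementation; the item ids, salience scores, and similarity function are placeholders:

```python
def mmr_rerank(candidates, scores, sim, k=3, lam=0.7):
    """Greedy maximal marginal relevance: trade off salience vs. redundancy.

    candidates: list of item ids; scores: id -> salience score;
    sim(a, b): similarity in [0, 1] between two items;
    lam: trade-off between salience (lam=1) and diversity (lam=0).
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr(c):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * scores[c] - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected
```

With `lam=1.0` this degenerates to plain salience ranking; lowering `lam` penalizes items similar to those already selected.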

2.
We propose a framework for abstractive multi-document summarization that selects summary content not from the source-document sentences themselves but from a semantic representation of the source documents. In this framework, the contents of the source documents are represented as predicate argument structures obtained through semantic role labeling. Content selection is performed by ranking the predicate argument structures on optimized features, and language generation produces sentences from the selected structures. The framework differs from other abstractive approaches in several respects: it employs semantic role labeling for the semantic representation of text; it analyzes the source text semantically, using a semantic similarity measure to cluster semantically similar predicate argument structures across the text; and it ranks the predicate argument structures on features weighted by a genetic algorithm (GA). Experiments are carried out on DUC-2002, a standard corpus for text summarization, and the results indicate that the proposed approach performs better than other summarization systems.
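Feature weighting by a genetic algorithm, as used above, can be sketched generically. The toy GA below evolves a weight vector so that the weighted feature ranking recovers a known good item; the population size, one-point crossover, mutation rate, and fitness definition are illustrative assumptions, not the paper's configuration:

```python
import random

def rank(items, weights):
    # Score each item (a feature tuple) by a weighted sum of its features.
    return sorted(items, key=lambda f: -sum(w * x for w, x in zip(weights, f)))

def fitness(weights, items, gold_top):
    # Fraction of known-good items recovered in the top of the ranking.
    top = rank(items, weights)[:len(gold_top)]
    return sum(1 for f in top if tuple(f) in gold_top) / len(gold_top)

def evolve(items, gold_top, pop_size=20, gens=30, seed=0):
    rng = random.Random(seed)
    n = len(items[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda w: -fitness(w, items, gold_top))
        parents = pop[:pop_size // 2]           # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # random-reset mutation
                child[rng.randrange(n)] = rng.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, items, gold_top))
```

In the paper the fitness would instead measure summary quality against reference summaries; here a fixed gold set stands in for that signal.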

3.
Abstractive Chinese Summarization Incorporating Topic-Keyword Information
With the rapid development of big data and artificial-intelligence technology, research on automatic summarization is evolving from extractive toward abstractive methods, with the aim of producing higher-quality, naturally fluent summaries. In recent years, deep learning has increasingly been applied to abstractive summarization, where the attention-based sequence-to-sequence model has become one of the most widely used architectures, achieving notable results especially on sentence-level tasks such as headline generation and sentence compression. However, most existing neural abstractive models distribute attention uniformly over the whole text, without distinguishing the important topic information it contains. This paper therefore proposes a new multi-attention sequence-to-sequence model that incorporates topic-keyword information: a joint attention mechanism combines the information of important keywords under the text's topics with the text's semantic information to guide summary generation. Experimental results on the NLPCC 2017 Chinese single-document summarization evaluation dataset verify the effectiveness and superiority of the proposed method.
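The joint-attention idea, blending ordinary attention with a distribution concentrated on topic keywords, can be illustrated as follows. The linear mixing form and the weight `gamma` are assumptions for illustration; the paper's exact joint-attention formula may differ:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def joint_attention(scores, keyword_mask, gamma=0.5):
    """Blend ordinary attention with a topic-keyword distribution.

    scores: raw alignment scores for each source position;
    keyword_mask: 1.0 where the position is a topic keyword, else 0.0;
    gamma: mixing weight for the keyword channel (an assumed form).
    """
    base = softmax(scores)
    kw_total = sum(keyword_mask)
    if kw_total == 0:
        return base                       # no keywords: fall back to base attention
    kw = [m / kw_total for m in keyword_mask]
    return [(1 - gamma) * b + gamma * k for b, k in zip(base, kw)]
```

Because both inputs are probability distributions, the mixture still sums to one, and keyword positions receive boosted attention mass.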

4.
Automatic text summarization aims to condense a given text into a short summary that effectively reflects the core content of the original. Abstractive summarization, which can paraphrase the original with more flexible and varied wording, has become a research hotspot in the field. However, because existing abstractive models recombine original words and introduce new ones when producing summary sentences, the generated summaries are often incoherent and hard to read. In addition, conventional supervised training based on labeled data…

5.
Text summarization is the process of automatically creating a shorter version of one or more text documents. It is an important way of finding relevant information in large text libraries or on the Internet. Text summarization techniques are essentially classified as extractive or abstractive: extractive techniques summarize by selecting sentences from the documents according to some criteria, while abstractive summaries attempt to improve coherence among sentences by eliminating redundancies and clarifying the context of sentences. Within extractive summarization, sentence scoring is the most widely used technique. This paper describes and performs a quantitative and qualitative assessment of 15 sentence-scoring algorithms available in the literature, evaluated on three different datasets (news, blogs, and articles). In addition, directions for improving the sentence-extraction results obtained are suggested.
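As an example of the sentence-scoring family surveyed here, a classic Luhn-style word-frequency scorer is one of the simplest representatives; the regex tokenization and sentence splitting below are illustrative simplifications:

```python
import re
from collections import Counter

def score_sentences(text, stopwords=frozenset()):
    """Luhn-style word-frequency scoring: a sentence's score is the sum of
    corpus frequencies of its content words, normalized by sentence length.
    Returns sentences sorted from highest to lowest score."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    words = [w for w in re.findall(r'[a-z]+', text.lower()) if w not in stopwords]
    freq = Counter(words)

    def score(sent):
        toks = [w for w in re.findall(r'[a-z]+', sent.lower()) if w not in stopwords]
        return sum(freq[t] for t in toks) / (len(toks) or 1)

    return sorted(sentences, key=score, reverse=True)
```

Sentences containing frequent content words float to the top; a summary is then the first k sentences, usually restored to document order.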

6.
庞超  尹传环 《计算机科学》2018,45(1):144-147, 178
Automatic text summarization is an important research topic in natural language processing. By implementation it divides into extractive and abstractive summarization; an abstractive summary re-expresses the central content and concepts of the original document in a different form, so its words need not appear in the source. This paper proposes a classification-based abstractive summarization model that combines a recurrent-neural-network encoder-decoder with a classification component and makes full use of supervision signals to capture more summary characteristics. An attention mechanism within the encoder-decoder lets the model locate the central content of the original text more precisely. Both parts of the model can be trained and optimized jointly on large datasets, and training is simple and effective. The proposed model shows excellent automatic-summarization performance.

7.
陈伟  杨燕 《计算机应用》2021,41(12):3527-3533
Summary generation is a hot topic in natural language processing and of significant research value. Abstractive models based on the Seq2Seq framework achieve good results, while extractive methods have the potential to mine effective features and extract a document's important sentences; how to use extractive methods to improve abstractive ones is therefore a promising research direction. This paper proposes a model that fuses the two. First, the TextRank algorithm, combined with topic similarity, extracts the most significant sentences of the document. Then an abstractive Seq2Seq-based framework that fuses the semantics of the extracted information performs the summary-generation task, with a pointer network introduced to handle out-of-vocabulary (OOV) words during training. The summary obtained from these steps is validated on the CNN/Daily Mail dataset. Results show that the proposed model improves over the traditional TextRank algorithm on ROUGE-1, ROUGE-2, and ROUGE-L, confirming the effectiveness of combining extractive and abstractive methods for summary generation.
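The TextRank step can be sketched as a plain power iteration over a sentence-similarity matrix, following the standard TextRank update; the topic-similarity fusion described above is omitted, and the similarity matrix is supplied by the caller:

```python
def textrank(sim, d=0.85, iters=50):
    """Power iteration over a sentence-similarity matrix (TextRank).

    sim: square matrix of non-negative similarities (sim[j][i] is the
    weight of the edge from sentence j to sentence i);
    d: damping factor. Returns one salience score per sentence."""
    n = len(sim)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j == i or sim[j][i] == 0:
                    continue
                # Contribution of j is its score, split over j's outgoing weight.
                out = sum(sim[j][k] for k in range(n) if k != j)
                if out:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    return scores
```

The top-scoring sentences are then taken as the extractive candidates.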

8.
Automatic summarization of texts is now crucial for several information-retrieval tasks, owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing a piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings, and k-cores; an additional summarizer selects the highest-ranked sentences across the 14 systems, as in a voting scheme. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. Representing texts as complex networks therefore appears suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features.
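The simplest of the network metrics above, node degree, already yields a working summarizer. The sketch below connects sentences that share content words, a stand-in for the paper's "meaningful nouns", which would require a POS tagger; it is an illustration of the idea, not the CN-Summ implementation:

```python
import re

def degree_summarizer(sentences, k=2, stopwords=frozenset()):
    """Connect sentences sharing content words and select the k
    highest-degree nodes, returned in original document order."""
    bags = [set(re.findall(r'[a-z]+', s.lower())) - stopwords
            for s in sentences]
    # Degree = number of other sentences with at least one shared word.
    degree = [sum(1 for j, other in enumerate(bags)
                  if j != i and bags[i] & other)
              for i in range(len(bags))]
    top = sorted(range(len(sentences)), key=lambda i: -degree[i])[:k]
    return [sentences[i] for i in sorted(top)]
```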

9.
With the continuous growth of online news articles comes the need for an efficient abstractive summarization technique to counter information overload. Abstractive summarization is highly complex, requiring deeper understanding and proper reasoning to compose its own summary outline, and the task is framed as seq2seq modeling. Existing seq2seq methods perform well on short sequences, but on long sequences performance degrades owing to the high computational cost; hence a two-phase self-normalized deep neural document-summarization model, consisting of an improvised extractive cosine-normalization phase and a seq2seq abstractive phase, is proposed in this paper. The novelty is to parallelize sequence-computation training by incorporating a feed-forward, self-normalized neural network in the extractive phase using Intra Cosine Attention Similarity (Ext-ICAS) with sentence dependency position; no explicit normalization technique is required. The proposed abstractive Bidirectional Long Short-Term Memory (Bi-LSTM) encoder sequence model performs better than a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder, with minimum training loss and fast convergence. The model was evaluated on the Cable News Network (CNN)/Daily Mail dataset, achieving an average ROUGE score of 0.435, and computational training in the extractive phase was reduced by 59% in the average number of similarity computations.
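The cosine-similarity computations at the heart of the extractive phase can be sketched as follows. Here the Ext-ICAS scoring is reduced to each sentence's average pairwise cosine similarity to the others, an illustrative simplification of the paper's intra-attention formulation; sentence vectors are assumed to be given:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def intra_cosine_scores(sent_vecs):
    """Score each sentence by its average cosine similarity to every
    other sentence: sentences close to many others are more central."""
    n = len(sent_vecs)
    return [sum(cosine(sent_vecs[i], sent_vecs[j])
                for j in range(n) if j != i) / ((n - 1) or 1)
            for i in range(n)]
```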

10.
The proposed summarization method takes the sentence as the basic extraction unit and topic words of interest as the sentences' weighted features. Sentences are clustered by latent semantics, yielding a semantic structure that plays an important role in improving summary quality, and a relatively objective and effective summary-evaluation method is also proposed. Experiments show that the method is effective.

11.
Extractive methods select sentences from the source text and tend to produce redundant information; abstractive methods can generate words absent from the source but suffer from grammatical problems and poor naturalness. BERT, a bidirectional Transformer model, has shown excellent performance on natural-language-understanding tasks, but its application to text generation remains to be explored. To address these problems, a pretraining-based three-stage composite summarization model (TSPT) is proposed that combines extractive and abstractive methods: bidirectional contextual word vectors produced by pretraining on the source text are passed through a sigmoid function to score sentences and extract the key ones, and in the generation stage the key sentences are rewritten as a cloze task to produce the final summary. Experimental results show that the model achieves good results on the CNN/Daily Mail dataset.

12.
To address insufficient semantic understanding, disfluent summary sentences, and limited accuracy in abstractive summarization within natural language processing (NLP), a new abstractive-summarization solution is proposed, comprising an improved word-vector generation technique and an abstractive summarization model. The improved word-vector technique starts from vectors produced by the Skip-Gram method and, reflecting the characteristics of summaries, introduces three word features: part of speech, word frequency, and inverse document frequency, effectively improving word understanding. The proposed Bi-MulRnn+ abstractive summarization model builds on a sequence-to-sequence (seq2seq) autoencoder structure and introduces an attention mechanism, gated recurrent units (GRU), a bidirectional recurrent neural network (BiRnn), a multi-layer recurrent neural network (MultiRnn), and beam search, improving the accuracy and fluency of the generated summaries. Experimental results on the Large-scale Chinese Short Text Summarization (LCSTS) dataset show that the solution effectively addresses abstractive summarization of short texts, performs well under the ROUGE evaluation standard, and improves summary accuracy and sentence fluency.
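Two of the three word features appended to the Skip-Gram vectors, word frequency and inverse document frequency, can be computed as below; the part-of-speech feature is omitted since it requires a tagger, and documents are assumed to be pre-tokenized lists of words:

```python
import math
from collections import Counter

def word_features(docs):
    """Corpus-level TF and IDF features per word.

    docs: list of tokenized documents (lists of words).
    Returns {word: {'tf': corpus frequency / total tokens,
                    'idf': log(num docs / num docs containing word)}}."""
    n = len(docs)
    df = Counter()       # document frequency
    total = Counter()    # corpus-wide term frequency
    tokens = 0
    for doc in docs:
        df.update(set(doc))
        total.update(doc)
        tokens += len(doc)
    return {w: {'tf': total[w] / tokens, 'idf': math.log(n / df[w])}
            for w in total}
```

A word appearing in every document gets IDF zero, so such features down-weight function words even without a stopword list.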

13.
As information is available in abundance on the Internet for every topic, condensing the important information into a summary would benefit many users; hence there is growing interest in the research community in developing new approaches to automatic text summarization. An automatic text summarization system generates a summary, i.e., a short text that includes all the important information of the document. Since the advent of text summarization in the 1950s, researchers have been trying to improve summary-generation techniques so that machine-generated summaries match human-made ones. Summaries can be generated through extractive as well as abstractive methods; abstractive methods are highly complex, as they need extensive natural language processing, so the research community focuses more on extractive summaries, trying to achieve more coherent and meaningful results. Over the last decade, several extractive approaches implementing a range of machine-learning and optimization techniques have been developed for automatic summary generation. This paper presents a comprehensive survey of the extractive text summarization approaches developed in that decade, identifying their needs and listing their advantages and disadvantages in a comparative manner; a few abstractive and multilingual approaches are also covered. Summary evaluation is another challenging issue in this field, so both intrinsic and extrinsic evaluation methods are described in detail, along with the relevant evaluation conferences and workshops, and evaluation results of extractive approaches are presented on some shared DUC datasets. Finally, the paper concludes with useful future directions that can help researchers identify areas where further work is needed.

14.
Extractive summarization aims to automatically produce a short summary of a document by concatenating several sentences taken exactly from the original material. Owing to their simplicity and ease of use, extractive methods have become the dominant paradigm in text summarization. In this paper we address sentence scoring, a key step of extractive summarization. Specifically, we propose a novel word-sentence co-ranking model named CoRank, which combines the word-sentence relationship with a graph-based unsupervised ranking model. CoRank is quite concise when viewed as matrix operations, and its convergence can be theoretically guaranteed. Moreover, a redundancy-elimination technique is presented as a supplement to CoRank, further enhancing the quality of automatic summarization; CoRank can thus serve as an important building block of intelligent summarization systems. Experimental results on two real-life datasets comprising nearly 600 documents demonstrate the effectiveness of the proposed methods.
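The word-sentence mutual-reinforcement idea behind CoRank can be sketched as an alternating iteration over a sentence-word incidence matrix: a sentence is important if it contains important words, and a word is important if it occurs in important sentences. This simplified sketch omits the graph-based ranking term the paper folds into its update:

```python
def corank(incidence, iters=50):
    """Alternating word-sentence reinforcement iteration.

    incidence[i][j] = 1 if sentence i contains word j, else 0.
    Returns (sentence_scores, word_scores), each L1-normalized."""
    ns, nw = len(incidence), len(incidence[0])
    s = [1.0] * ns
    w = [1.0] * nw
    for _ in range(iters):
        # Sentence score: sum of scores of the words it contains.
        s = [sum(incidence[i][j] * w[j] for j in range(nw)) for i in range(ns)]
        norm = sum(s) or 1.0
        s = [x / norm for x in s]
        # Word score: sum of scores of the sentences containing it.
        w = [sum(incidence[i][j] * s[i] for i in range(ns)) for j in range(nw)]
        norm = sum(w) or 1.0
        w = [x / norm for x in w]
    return s, w
```

The normalization after each half-step keeps the iteration bounded, mirroring the convergence argument made for the matrix form.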

15.
The technology of automatic document summarization is maturing and may provide a solution to the information-overload problem. Document summarization now plays an important role in information retrieval: with a large volume of documents, presenting the user with a summary of each greatly facilitates finding the desired ones. Document summarization is the process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization produces a summary delivering the majority of the information content of a set of documents on an explicit or implicit main topic. Our study focuses on sentence-based extractive document summarization and proposes a generic summarization method based on sentence clustering. The approach continues the sentence-clustering-based extractive summarization methods proposed in Alguliev [Alguliev, R. M., Aliguliyev, R. M., & Bagirov, A. M. (2005). Global optimization in the summarization of text documents. Automatic Control and Computer Sciences, 39, 42–47], Aliguliyev [Aliguliyev, R. M. (2006). A novel partitioning-based clustering method and generic document summarization. In Proceedings of the 2006 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology (WI–IAT 2006 Workshops) (WI–IATW'06), 18–22 December (pp. 626–629), Hong Kong, China], Alguliev and Alyguliev [Alguliev, R. M., & Alyguliev, R. M. (2007). Summarization of text-based documents with a determination of latent topical sections and information-rich sentences. Automatic Control and Computer Sciences, 41, 132–140], and Aliguliyev [Aliguliyev, R. M. (2007). Automatic document summarization by sentence extraction. Journal of Computational Technologies, 12, 5–15].
The purpose of the present paper is to show that the summarization result depends not only on the optimized function but also on the similarity measure. Experimental results on the open benchmark datasets DUC01 and DUC02 show that the proposed approach improves performance compared to state-of-the-art summarization approaches.

16.
肖升  何炎祥 《计算机应用研究》2012,29(12):4507-4511
Extraction is a convenient way to realize Chinese automatic summarization: according to extraction rules, several original sentences are selected to form the summary directly. By optimizing the input matrix and the key-sentence selection algorithm, an improved latent-semantic-analysis method for Chinese extractive summarization is proposed. The method first builds a multi-valued input matrix based on the vector space model; it then applies latent semantic analysis to the matrix to obtain the semantic relatedness between sentences and latent concepts (abstract representations of topic information); finally, it selects key sentences with an improved selection algorithm. Experimental results show average precision, recall, and F-measure of 75.9%, 71.8%, and 73.8%, respectively; compared with existing methods of the same kind, the improved method is fully unsupervised, achieves a considerable gain in overall efficiency, and has greater application potential.

17.
Sentence extraction is a widely adopted text summarization technique in which the most important sentences are extracted from the document(s) and presented as a summary. The first step toward sentence extraction is to rank sentences in order of their importance in the summary. This paper proposes a novel graph-based ranking method, iSpreadRank, to perform this task. iSpreadRank models a set of topic-related documents as a sentence-similarity network and, based on this network model, exploits spreading-activation theory to apply a general concept from social network analysis: the importance of a node in a network (here, a sentence) is determined not only by the number of nodes to which it connects, but also by the importance of those connected nodes. The algorithm recursively re-weights sentence importance by spreading sentence-specific feature scores throughout the network to adjust the importance of other sentences, yielding a ranking that indicates the sentences' relative importance. The paper also develops an approach to produce a generic extractive summary from the inferred ranking. The proposed method is evaluated on the DUC 2004 dataset and found to perform well: it obtains a ROUGE-1 score of 0.38068, a difference of only 0.00156 from the best participant in the DUC 2004 evaluation.
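The spreading-activation re-weighting can be sketched as follows. The retention factor `alpha` and the degree-normalized spread are illustrative assumptions rather than iSpreadRank's exact update; the key property shown is that well-connected nodes accumulate activation from their neighbours:

```python
def spread_rank(adj, seed_scores, alpha=0.5, iters=20):
    """Spreading activation over an adjacency matrix.

    adj[j][i] = 1 if node j links to node i; seed_scores are the
    sentence-specific feature scores; each node keeps (1 - alpha) of
    its seed score and receives alpha-weighted activation spread from
    its neighbours, split over each neighbour's out-degree."""
    n = len(adj)
    scores = list(seed_scores)
    for _ in range(iters):
        new = []
        for i in range(n):
            received = 0.0
            for j in range(n):
                deg = sum(adj[j])
                if adj[j][i] and deg:
                    received += scores[j] / deg
            new.append((1 - alpha) * seed_scores[i] + alpha * received)
        scores = new
    return scores
```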

18.
Summarizing long documents has long been a difficult problem in automatic summarization: existing methods suffer from low accuracy and redundancy when processing long texts. Given the strong performance of topic models in multi-document summarization, they are introduced here into the long-document summarization task. Moreover, neither a purely extractive nor a purely abstractive method alone can cope with the complexity of long texts, so the two are combined: a topic-aware hybrid model for long documents that couples extraction with generation is proposed. Its effectiveness is verified on the TTNews and CNN/Daily Mail datasets, where the ROUGE scores of the generated summaries improve by one to two percentage points over comparable models and the summaries are more readable.

19.
Text summarization is the process of condensing the content of a text and extracting its main content to form a summary. Existing summarization models usually treat content selection and summary generation separately; although this effectively improves sentence compression and fusion, part of the textual information is lost during extraction, lowering accuracy. Based on a pretrained model and a Transformer document-level sentence encoder, a staged summarization model combining content extraction with summary generation is proposed. The BERT model is self-supervised on a large corpus to obtain word representations rich in semantic information. Building on the Transformer structure, a fully connected network classifier assigns each sentence one of three labels, extracting the set of source sentences corresponding to each summary sentence. A pointer-generator network then compresses each sentence set into a single summary sentence, shortening both the output and input sequences. Experimental results show that, compared with generating the full summary directly, the model raises the average sentence-level F1 of ROUGE-1, ROUGE-2, and ROUGE-L by 1.69 percentage points, effectively improving the accuracy of the generated sentences.

20.
A case public-opinion summary extracts, from a cluster of news texts concerning a specific legal case, several sentences that capture its topic information. Case public-opinion summarization can be viewed as domain-specific multi-document summarization; compared with the general task, the topic information can be characterized by case elements that run through the whole text cluster. Within the cluster, sentences are related to one another, and case elements are related to sentences to varying degrees; these relations play an important role in extracting summary sentences. A case-text summarization method based on graph convolution over a case-element sentence-relation graph is therefore proposed: the multi-text cluster is modeled as a graph in which sentences are the main nodes and words and case elements are auxiliary nodes that strengthen the relations between sentences, with multiple features used to compute the relations between different node types. A graph convolutional network then learns the sentence-relation graph and classifies sentences to obtain candidate summary sentences; de-duplication and ranking finally yield the case public-opinion summary. Experiments on a collected case public-opinion summarization dataset show that the method outperforms the baseline models and that introducing case elements and the sentence-relation graph clearly benefits case multi-document summarization.
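The graph-convolution step at the core of the method can be illustrated by a single mean-aggregation message-passing step; the learned weight matrices, nonlinearity, and heterogeneous node types (words and case elements as auxiliary nodes) of the full model are omitted here:

```python
def gcn_propagate(adj, feats):
    """One mean-aggregation message-passing step, the core of a graph
    convolution layer minus learned weights and activation: each node's
    new feature vector is the average over itself and its neighbours.
    A trained GCN layer would multiply the result by a weight matrix
    and apply a nonlinearity."""
    n = len(adj)
    dim = len(feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]] + [i]   # add self-loop
        out.append([sum(feats[j][d] for j in neigh) / len(neigh)
                    for d in range(dim)])
    return out
```

Stacking several such steps lets a sentence node absorb information from sentences several hops away via the shared auxiliary nodes.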


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号