Related Articles
A total of 20 related articles were found.
1.
Event detection is a fundamental information extraction task, which has been explored largely in the context of question answering, topic detection and tracking, knowledge base population, news recommendation, and automatic summarization. In this article, we explore an event detection framework to improve a key phrase-guided centrality-based summarization model. Event detection is based on the fuzzy fingerprint method, which is able to detect all types of events in the ACE 2005 Multilingual Corpus. Our base summarization approach is a two-stage method that starts by extracting a collection of key phrases that will be used to help the centrality-as-relevance retrieval model. We explored three different ways to integrate event information, achieving state-of-the-art results in text and speech corpora: (1) filtering of nonevents, (2) event fingerprints as features, and (3) combination of filtering of nonevents and event fingerprints as features.
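As a rough illustration of the fuzzy fingerprint idea, the sketch below builds per-event-type fingerprints as ranked keyword lists and scores a new sentence against them; the toy training pairs, the fingerprint size K, and the linear membership function are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal sketch of fuzzy-fingerprint event detection on toy data.
from collections import Counter

K = 5  # fingerprint size (top-K terms per event type); an assumption

# Hypothetical training data: (tokenized sentence, event type) pairs.
train = [
    (["troops", "attacked", "the", "village"], "Conflict"),
    (["the", "army", "attacked", "rebels"], "Conflict"),
    (["she", "was", "born", "in", "Boston"], "Life"),
    (["he", "died", "at", "home"], "Life"),
]

def build_fingerprints(samples, k=K):
    """Rank the most frequent terms per event type; rank defines fuzzy membership."""
    counts = {}
    for tokens, label in samples:
        counts.setdefault(label, Counter()).update(tokens)
    fingerprints = {}
    for label, ctr in counts.items():
        top = [t for t, _ in ctr.most_common(k)]
        # Linearly decaying membership: rank 0 -> 1.0, rank k-1 -> 1/k.
        fingerprints[label] = {t: 1.0 - i / k for i, t in enumerate(top)}
    return fingerprints

def score(tokens, fingerprint):
    """Similarity of a sentence to one event-type fingerprint."""
    return sum(fingerprint.get(t, 0.0) for t in set(tokens))

fps = build_fingerprints(train)
sentence = ["rebels", "attacked", "a", "convoy"]
best = max(fps, key=lambda lbl: score(sentence, fps[lbl]))
print(best)  # -> "Conflict" on this toy data
```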

2.
Timestamped evolutionary summarization is a recently proposed natural language processing task. In essence it is multi-document summarization, and its objects of study are hot news stories covered continuously on the Internet. Addressing the dynamic evolution, dynamic association, and information redundancy that characterize online news coverage of events, this paper proposes an evolutionary summarization method based on local-global topic relations: the method divides a news event into several distinct subtopics and, on top of temporal evolution, also models the topic evolution among subtopics, finally outputting news headlines as the summary. Experimental results show that the method is effective and, when news headlines are used as both input and output, achieves significant improvements on the ROUGE metrics over current mainstream multi-document summarization and evolutionary summarization methods.
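The subtopic-plus-timeline idea can be pictured with a small sketch: cluster dated headlines into subtopics and emit one headline per subtopic in chronological order. The headlines, dates, and cluster count below are invented for illustration; the paper's local-global topic model is far richer than plain K-means over TF-IDF.

```python
# A minimal sketch of subtopic-based timeline summarization on toy data.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical dated headlines from one evolving news event.
docs = [
    ("2020-01-01", "Storm forms over the Pacific"),
    ("2020-01-02", "Storm strengthens into hurricane"),
    ("2020-01-04", "Hurricane makes landfall on the coast"),
    ("2020-01-05", "Coastal towns begin cleanup after landfall"),
    ("2020-01-07", "Government announces relief funding"),
    ("2020-01-08", "Relief funding reaches affected towns"),
]

texts = [t for _, t in docs]
X = TfidfVectorizer().fit_transform(texts)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One representative headline per subtopic: here simply the earliest one,
# so the summary follows the event's temporal evolution.
summary = {}
for (date, text), label in zip(docs, labels):
    if label not in summary or date < summary[label][0]:
        summary[label] = (date, text)

for date, text in sorted(summary.values()):
    print(date, text)
```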

3.
Information ordering is a nontrivial task in multi‐document summarization (MDS), which typically relies on the traditional vector space model (VSM) notorious for semantic deficiency. In this article, we propose a novel event‐enriched VSM to alleviate the problem by building event semantics into sentence representations. The mediation of event information between sentence and term, especially in the news domain, has an intuitive appeal as well as technical advantage in common sentence‐level operations such as sentence similarity computation. Inspired by the block‐style writing by humans, we base the sentence ordering algorithm on sentence clustering. To accommodate the complexity introduced by event information, we adopt a soft‐to‐hard clustering strategy on the event and sentence levels, using expectation–maximization clustering and K‐means, respectively. For the purpose of cluster‐based sentence ordering, the event‐enriched VSM enables us to design an ordering algorithm to enhance event coherence computed between sentence and sentence–context pairs. Drawing on the findings of earlier research, we also incorporate topic continuity measures and time information into the scheme. We evaluate the performance of the model and its variants automatically and manually, with experimental results showing clear advantage of the event‐based model over baseline and non‐event‐based models in information ordering for multi‐document news summarization. We are confident that the event‐enriched VSM has even greater potential in summarization and beyond, which awaits further research. © 2014 Wiley Periodicals, Inc.
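A toy rendering of the event-enriched VSM: each sentence vector concatenates a term block with an event block, so cosine similarity rewards shared events as well as shared terms. The term list, event labels, and scaling factor are assumptions for illustration only.

```python
# A minimal sketch of event-enriched sentence vectors and their effect
# on sentence similarity, the quantity the ordering algorithm relies on.
import numpy as np

TERMS = ["storm", "landfall", "relief", "funding"]
EVENTS = ["E1_landfall", "E2_relief"]  # hypothetical events found upstream

def sentence_vector(term_counts, event_weights, event_factor=2.0):
    """Concatenate a bag-of-terms block with a (scaled) event block."""
    term_block = np.array([term_counts.get(t, 0) for t in TERMS], float)
    event_block = event_factor * np.array(
        [event_weights.get(e, 0.0) for e in EVENTS])
    return np.concatenate([term_block, event_block])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

s1 = sentence_vector({"storm": 1, "landfall": 1}, {"E1_landfall": 1.0})
s2 = sentence_vector({"landfall": 1}, {"E1_landfall": 1.0})
s3 = sentence_vector({"relief": 1, "funding": 1}, {"E2_relief": 1.0})

# Sentences about the same event score higher, which is what the
# cluster-based ordering algorithm exploits for event coherence.
print(cosine(s1, s2), cosine(s1, s3))  # high vs. low
```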

4.
New event detection (NED), which is crucial to firms’ environmental surveillance, requires timely access to and effective analysis of live streams of news articles from various online sources. These news articles, available in unprecedented frequency and quantity, are difficult to sift through manually. Most existing techniques for NED are full-text-based; typically, they perform full-text analysis to measure the similarity between a new article and previous articles. This full-text-based approach is potentially ineffective, because a news article often contains sentences that are less relevant to the focal event being reported, and including these sentences in the similarity estimation can impair the effectiveness of NED. To address this limitation and support NED more effectively and efficiently, this study proposes and develops a summary-based event detection method that first selects the relevant sentences of each article as a summary, then uses the resulting summaries to detect new events. We empirically evaluate our proposed method against several prevalent full-text-based techniques, including a vector space model and two deep-learning-based models. Our evaluation results confirm that the proposed method provides greater utility for detecting new events from online news articles. This study demonstrates the value and feasibility of the text summarization approach for detecting new events from live streams of online news articles, proposes a new method that is more effective and efficient than the benchmark techniques, and contributes to NED research in several important ways.
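The two-step pipeline described here, summarize first and compare summaries later, can be sketched with off-the-shelf TF-IDF, though the paper's sentence selection and similarity machinery differ in detail; the k and threshold values below are assumptions.

```python
# A minimal sketch of summary-based new event detection: an article is
# reduced to its most central sentences, and it is flagged as a new event
# when its summary is insufficiently similar to all previous summaries.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, k=2):
    """Keep the k sentences closest to the article centroid."""
    X = TfidfVectorizer().fit_transform(sentences)
    centroid = np.asarray(X.mean(axis=0))
    scores = cosine_similarity(X, centroid).ravel()
    top = sorted(np.argsort(scores)[-k:])
    return " ".join(sentences[i] for i in top)

def is_new_event(summary, past_summaries, threshold=0.3):
    if not past_summaries:
        return True
    X = TfidfVectorizer().fit_transform(past_summaries + [summary])
    sims = cosine_similarity(X[-1], X[:-1]).ravel()
    return sims.max() < threshold

article = ["A quake struck the region early Monday.",
           "Buildings collapsed near the epicenter.",
           "In other news, the local team won again."]
summary = summarize(article)
print(summary)
print(is_new_event(summary, ["Hurricane makes landfall on the coast."]))
```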

5.
Comparative news summarization aims to highlight the commonalities and differences between two comparable news topics by using human-readable sentences. The summary ought to focus on the salient comparative aspects of both topics, and at the same time, it should describe the representative properties of each topic appropriately. In this study, we propose a novel approach for generating comparative news summaries. We consider cross-topic pairs of semantic-related concepts as evidences of comparativeness and consider topic-related concepts as evidences of representativeness. The score of a summary is estimated by summing up the weights of evidences in the summary. We formalize the summarization task as an optimization problem of selecting proper sentences to maximize this score and address the problem by using a mixed integer programming model. The experimental results demonstrate the effectiveness of our proposed model.
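A crude sketch of the evidence-scoring idea follows: identical concepts appearing in both topics stand in for semantically related cross-topic pairs, and a candidate cross-topic sentence pair is scored by the evidences it covers. In the paper, such weights feed a mixed integer program that selects sentences under a length budget; the concept extraction, relatedness test, and trade-off weight below are simplifications.

```python
# A minimal sketch of comparativeness/representativeness evidence scoring.
from collections import Counter
from itertools import product

topic_a = ["apple launches new phone", "apple phone sales rise"]
topic_b = ["rival launches new tablet", "tablet sales fall"]

def concepts(sentences):
    stop = {"new", "the", "a"}
    return Counter(w for s in sentences for w in s.split() if w not in stop)

ca, cb = concepts(topic_a), concepts(topic_b)

# Comparativeness evidence: identical concepts appearing in both topics
# (a stand-in for semantic relatedness), weighted by joint frequency.
comparative = {(w, w): ca[w] * cb[w] for w in ca.keys() & cb.keys()}

def summary_score(sent_a, sent_b):
    """Score one cross-topic sentence pair by the evidences it covers."""
    wa, wb = set(sent_a.split()), set(sent_b.split())
    comp = sum(v for (x, y), v in comparative.items() if x in wa and y in wb)
    rep = sum(ca[w] for w in wa) + sum(cb[w] for w in wb)
    return comp + 0.5 * rep  # trade-off weight is an assumption

best = max(product(topic_a, topic_b), key=lambda p: summary_score(*p))
print(best)
```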

6.
Event similarity computation
The computation of similarity between events plays an important role in building event ontologies and is a prerequisite for event-ontology applications such as event-based information retrieval, automatic question answering, and automatic summarization. This paper addresses the following problems: defining inter-event similarity on the basis of a given event model, and, drawing on the characteristics of each event element, proposing an event similarity computation method that combines syntax, semantics, word sequences, and temporal relations. Experimental data and fuzzy analysis show that the method is reasonable and accurate.
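A minimal sketch of the combination idea: per-element similarities (here action words, participants, and time) are mixed with fixed weights into one event similarity score. The elements, component measures, and weights are illustrative assumptions, not the paper's definitions.

```python
# A minimal sketch of weighted, element-wise event similarity.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def time_similarity(t1, t2, scale=7.0):
    """Decay with the gap in days between the two events."""
    return 1.0 / (1.0 + abs(t1 - t2) / scale)

def event_similarity(e1, e2, weights=(0.4, 0.3, 0.3)):
    w_action, w_part, w_time = weights
    return (w_action * jaccard(e1["action"], e2["action"])
            + w_part * jaccard(e1["participants"], e2["participants"])
            + w_time * time_similarity(e1["day"], e2["day"]))

e1 = {"action": ["attack"], "participants": ["army", "village"], "day": 0}
e2 = {"action": ["attack", "raid"], "participants": ["army"], "day": 2}
print(round(event_similarity(e1, e2), 3))
```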

7.
Online news has become one of the major channels for Internet users to get news, and news websites are flooded daily with large numbers of articles. Huge amounts of online news are generated and updated every day, and processing and analyzing this large corpus is an important challenge. It needs to be tackled with big data techniques that process large volumes of data within limited run times. Also, since we are heading into a social-media data explosion, techniques such as text mining and social network analysis need to be taken seriously into consideration. In this work we focus on one of the most common daily activities: web news reading. News websites produce thousands of articles covering a wide spectrum of topics or categories, which can be considered a big data problem. To extract useful information, these news articles need to be processed with big data techniques. In this context, we present an approach for classifying huge amounts of news articles into various categories (topic areas) based on the text content of the articles. Since these categories are constantly updated with new articles, our approach is based on Evolving Fuzzy Systems (EFS), which can update in real time the model that describes a category according to the changes in the content of the corresponding articles. The novelty of the proposed system lies in the treatment of the web news articles for use by these systems and in their implementation and adjustment for this task. Our proposal not only classifies news articles but also creates human-interpretable models of the different categories. The approach has been successfully tested using real online news.
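To picture the evolving aspect, the sketch below keeps one incrementally updated prototype per category and classifies by nearest prototype; it is a simple stand-in for the EFS rule base in the paper, not an implementation of it.

```python
# A minimal nearest-prototype stand-in for evolving category models:
# each category's prototype drifts as new labeled articles arrive.
import numpy as np

class EvolvingCategories:
    def __init__(self, dim):
        self.protos = {}   # category -> running mean vector
        self.counts = {}
        self.dim = dim

    def update(self, category, x):
        """Fold a new labeled article vector into its category prototype."""
        x = np.asarray(x, float)
        n = self.counts.get(category, 0)
        p = self.protos.get(category, np.zeros(self.dim))
        self.protos[category] = (p * n + x) / (n + 1)
        self.counts[category] = n + 1

    def classify(self, x):
        x = np.asarray(x, float)
        return min(self.protos, key=lambda c: np.linalg.norm(self.protos[c] - x))

model = EvolvingCategories(dim=3)
model.update("sports", [1, 0, 0])
model.update("economy", [0, 1, 1])
model.update("sports", [0.9, 0.1, 0])   # category drifts as news arrives
print(model.classify([1, 0.2, 0]))      # -> "sports"
```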

8.
In this paper, we present a novel business network construction approach in which the nodes of the network correspond to the names of the companies in a particular stock market index and its links show the co-occurrence of two company names in daily news. Our approach consists of two phases in which the search for company names in the news articles and the construction of the network are performed, respectively. To increase the quality of the results, each article is first classified as business news or not, and only the articles classified as business news are considered for network construction. The resulting network presents a visualization of the business events and company relationships during the corresponding time period. We study both co-occurrences and single occurrences of company names in the articles scanned in our analysis.
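The construction itself is straightforward to sketch with networkx: nodes are company names, edge weights count the articles in which two names co-occur, and single occurrences show up as isolated nodes. The name list and articles below are toys, and the business-news filtering step is omitted.

```python
# A minimal sketch of co-occurrence business network construction.
from itertools import combinations
import networkx as nx

companies = ["Acme", "Globex", "Initech"]
articles = [
    "Acme and Globex announce a joint venture",
    "Globex shares fall after Acme earnings report",
    "Initech opens a new office",
]

G = nx.Graph()
G.add_nodes_from(companies)
for text in articles:
    mentioned = [c for c in companies if c in text]
    for a, b in combinations(mentioned, 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Single occurrences are tracked too (isolated or weakly connected nodes).
print(G.edges(data=True))   # [('Acme', 'Globex', {'weight': 2})]
print(list(nx.isolates(G))) # ['Initech']
```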

9.
There are many real applications in which the decision-making process depends on a model built by collecting information from different data sources. Take the stock market as an example: the decision-making process depends on a model influenced by factors such as stock prices, exchange volumes, market indices (e.g., the Dow Jones Index), news articles, and government announcements (e.g., an increase in stamp duty). Nevertheless, modeling the stock market is a challenging task because (1) the process underlying market states (rise state/drop state) is a stochastic process, which is hard to capture with a deterministic approach, and (2) the market state is invisible yet influenced by visible market information such as stock prices and news articles. In this paper, we propose an approach to model the stock market process by using a Non-homogeneous Hidden Markov Model (NHMM), which takes both stock prices and news articles into consideration. A unique feature of our approach is that it is event-driven: when building the NHMM, we identify the events associated with a specific stock using a set of bursty features (keywords) that have a significant impact on stock price changes. We apply the model to predict the trend of future stock prices, and the encouraging results indicate that our proposed approach is practically sound and highly effective.
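The bursty-feature step can be illustrated on its own: a keyword is bursty in a time window when its frequency there far exceeds its background rate. The corpus, window, and ratio threshold below are assumptions, and the NHMM that consumes such features in the paper is omitted.

```python
# A minimal sketch of bursty-keyword identification on toy news text.
from collections import Counter

background = ["earnings steady", "market calm", "trading quiet",
              "earnings steady", "market calm"]
window = ["stamp duty increase shocks market",
          "stamp duty rise hits trading",
          "market reacts to stamp duty"]

def rates(docs):
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

bg, win = rates(background), rates(window)
RATIO = 3.0  # burstiness threshold (assumption)
eps = 1e-6
bursty = sorted(w for w, r in win.items() if r / bg.get(w, eps) > RATIO)
print(bursty)  # e.g. ['duty', 'stamp', ...] -- event-associated keywords
```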

10.
Online information is growing enormously day by day thanks to the World Wide Web. Search engines often provide users with abundant collections of articles, in particular news articles retrieved from different news sources reporting on the same event. In this work, we aim to produce high-quality multi-document news summaries by taking into account the generic components of a news story within a specific domain. We also present an effective method, named Genetic-Case Base Reasoning, to identify cross-document relations from unannotated texts. Following that, we propose a new sentence scoring model based on fuzzy reasoning over the identified cross-document relations. The experimental findings show that the proposed approach performed better than conventional graph-based and cluster-based approaches.

11.
12.
13.
In this paper, we propose an unsupervised text summarization model that generates a summary by extracting salient sentences from the given document(s). In particular, we model text summarization as an integer linear programming problem. One advantage of this model is that it can directly discover the key sentences in the given document(s) and cover their main content. The model also guarantees that the summary cannot contain multiple sentences that convey the same information. The proposed model is quite general and can be used for both single- and multi-document summarization. We implemented our model on the multi-document summarization task; experimental results on the DUC2005 and DUC2007 datasets showed that our approach outperforms the baseline systems.
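A minimal sketch of this formulation using the PuLP library: binary variables select sentences, the objective sums relevance scores, a word budget bounds the summary, and pairwise constraints forbid selecting two near-duplicate sentences. The scores, budget, and redundant pair below are assumptions.

```python
# A minimal sketch of extractive summarization as an integer linear
# program (requires the pulp package and its bundled CBC solver).
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

sentences = ["A quake struck the region.",
             "An earthquake hit the region.",       # near-duplicate of 0
             "Rescue teams arrived overnight.",
             "Officials promised rebuilding funds."]
relevance = [0.9, 0.85, 0.7, 0.6]        # e.g. centrality scores
lengths = [len(s.split()) for s in sentences]
BUDGET = 12                               # max summary length in words
redundant = {(0, 1)}                      # pairs over a similarity threshold

prob = LpProblem("summarization", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(len(sentences))]
prob += lpSum(relevance[i] * x[i] for i in range(len(sentences)))
prob += lpSum(lengths[i] * x[i] for i in range(len(sentences))) <= BUDGET
for i, j in redundant:
    prob += x[i] + x[j] <= 1              # no two redundant sentences
prob.solve()

summary = [s for s, v in zip(sentences, x) if v.value() == 1]
print(" ".join(summary))
```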

14.
With the continuous growth of online news articles arises the need for an efficient abstractive summarization technique to counter information overload. Abstractive summarization is highly complex and requires deeper understanding and proper reasoning to come up with its own summary outline. The task is framed as sequence-to-sequence (seq2seq) modeling. Existing seq2seq methods perform well on short sequences, but on long sequences performance degrades due to high computation cost; hence, a two-phase self-normalized deep neural document summarization model, consisting of an improvised extractive cosine-normalization phase and a seq2seq abstractive phase, is proposed in this paper. The novelty is to parallelize sequence computation during training by incorporating a feed-forward, self-normalized neural network in the extractive phase using Intra Cosine Attention Similarity (Ext-ICAS) with sentence dependency position; no explicit normalization technique is required. The proposed abstractive Bidirectional Long Short-Term Memory (Bi-LSTM) encoder sequence model performs better than a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder, with lower training loss and faster convergence. The model was evaluated on the Cable News Network (CNN)/Daily Mail dataset, achieving an average ROUGE score of 0.435, while computation in the extractive phase was reduced by 59% in terms of the average number of similarity computations.
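A simplified reading of the Ext-ICAS scoring, with the dependency-position terms omitted: each sentence attends to every other sentence via cosine similarity, and its salience is the attention mass it accumulates; TF-IDF vectors stand in here for the model's learned representations.

```python
# A minimal sketch of intra-sentence cosine attention for the extractive phase.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc = ["The council approved the new budget.",
       "The budget increases school funding.",
       "School funding had been cut last year.",
       "Meanwhile, the weather was unusually warm."]

X = TfidfVectorizer().fit_transform(doc)
sim = cosine_similarity(X)
np.fill_diagonal(sim, 0.0)          # no self-attention
salience = sim.sum(axis=1)          # total attention each sentence receives

k = 2
top = sorted(np.argsort(salience)[-k:])  # keep original sentence order
print([doc[i] for i in top])
```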

15.
Chinese abstractive summarization incorporating topic keyword information
With the rapid development of big data and artificial intelligence technologies, automatic summarization research is evolving from extractive toward abstractive summarization, with the aim of generating higher-quality, natural, and fluent summaries. In recent years, deep learning has increasingly been applied to abstractive summarization, where the attention-based sequence-to-sequence model has become one of the most widely used models, achieving remarkable results especially on sentence-level generation tasks such as news headline generation and sentence compression. However, most existing neural abstractive models distribute attention uniformly over the whole text and make no fine distinction for the important topical information it contains. In view of this, this paper proposes a new multi-attention sequence-to-sequence model that incorporates topic keyword information: through a joint attention mechanism, the information of important topical keywords is combined with the semantic information of the text to guide summary generation. Experimental results on the NLPCC 2017 Chinese single-document summarization evaluation dataset verify the effectiveness and superiority of the proposed method.
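The joint-attention idea can be caricatured in a few lines: the decoder's base attention over source positions is boosted at positions holding topic keywords and renormalized. The boost factor and keyword set are assumptions; the paper learns this combination rather than fixing it by hand.

```python
# A minimal numpy sketch of biasing attention toward topic keywords.
import numpy as np

src = ["the", "ministry", "raised", "the", "education", "budget"]
keywords = {"education", "budget"}            # topic keywords found upstream

base_attn = np.array([0.05, 0.25, 0.30, 0.05, 0.20, 0.15])  # from decoder
boost = np.array([2.0 if w in keywords else 1.0 for w in src])

joint = base_attn * boost
joint /= joint.sum()                           # renormalize to a distribution
print(list(zip(src, joint.round(3))))
```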

16.
With the high availability of digital video content on the Internet, users need more assistance in accessing digital videos. Much research has been done on video summarization and semantic video analysis to help satisfy these needs. These works develop condensed versions of a full-length video stream by identifying the most important and pertinent content within the stream. Most existing work in these areas focuses on event mining. Event mining from video streams improves the accessibility and reusability of large media collections, and it has been an active area of research with notable recent progress. It spans a wide range of multimedia domains such as surveillance, meetings, broadcast news, sports, documentaries, and films, as well as personal and online media collections. Given the variety and abundance of event mining techniques, in this paper we suggest an analytical framework to classify them and to evaluate them against important functional measures. This framework could enable empirical and technical comparison of event mining methods and the development of more efficient structures in the future.

17.
We present an optimization-based unsupervised approach to automatic document summarization. In the proposed approach, text summarization is modeled as a Boolean programming problem. The model attempts to optimize three properties, namely, (1) relevance: the summary should contain informative textual units that are relevant to the user; (2) redundancy: the summary should not contain multiple textual units that convey the same information; and (3) length: the summary is bounded in length. The approach is applicable to both single- and multi-document summarization. In both tasks, documents are split into sentences during preprocessing, salient sentences are selected from the document(s), and the summary is generated by threading the selected sentences in the order in which they appear in the original document(s). We implemented our model on the multi-document summarization task. Comparing our methods to several existing summarization methods on the open DUC2005 and DUC2007 data sets, we found that our method improves the summarization results significantly. This is because, first, when extracting summary sentences, the method considers not only the relevance scores of sentences to the whole sentence collection but also how representative the sentences are of the topic; second, when generating a summary, it also deals with the problem of repetition of information. The methods were evaluated using the ROUGE-1, ROUGE-2 and ROUGE-SU4 metrics. We also demonstrate that the summarization result depends on the similarity measure: experiments showed that a combination of symmetric and asymmetric similarity measures yields better results than either used separately.
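The closing observation about similarity measures can be made concrete: below, cosine over term sets plays the symmetric part and containment (how much of one sentence's vocabulary the other covers) plays the asymmetric part, mixed with an assumed weight.

```python
# A minimal sketch of combining symmetric and asymmetric similarity.
def cosine_sets(a, b):
    a, b = set(a), set(b)
    return len(a & b) / ((len(a) * len(b)) ** 0.5) if a and b else 0.0

def containment(a, b):
    """Asymmetric: fraction of a's terms covered by b."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0

def combined(a, b, lam=0.5):
    return lam * cosine_sets(a, b) + (1 - lam) * containment(a, b)

s1 = "rescue teams reached the flooded town".split()
s2 = "teams reached the town".split()
print(round(combined(s1, s2), 3), round(combined(s2, s1), 3))  # asymmetric
```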

18.
This paper addresses automatic headline generation for news in the Party-building domain and proposes Tri-PCN, an automatic text summarization model that incorporates a pointer network. Compared with conventional encoder-decoder summarization models, a Party-building headline generation model must additionally (1) extract features from longer text sequences and (2) retain key Party-building information. To cope with the longer sequences faced in Party-building news compared with ordinary summarization tasks, the model uses a Transformer to extract multi-level global text features in the decoding stage. To retain key Party-building information during headline generation, the copy mechanism of the pointer-generator network is introduced, so that keyword information can be copied directly from the news text. With ROUGE as the evaluation metric, experiments show that the proposed Tri-PCN model clearly outperforms the baseline models on automatic summarization in the Party-building news domain and achieves better results than the other models.
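The copy mechanism at the heart of such models follows the standard pointer-generator formulation: the final word distribution mixes the decoder's vocabulary distribution with the attention over source tokens, gated by p_gen, so out-of-vocabulary domain terms can be copied verbatim. All numbers below are toys.

```python
# A minimal numpy sketch of the pointer-generator copy step.
import numpy as np

vocab = ["<unk>", "committee", "meeting", "held"]
src = ["party", "committee", "meeting"]      # "party" is out-of-vocabulary

p_vocab = np.array([0.1, 0.3, 0.4, 0.2])     # decoder's vocab distribution
attn = np.array([0.6, 0.3, 0.1])             # attention over source tokens
p_gen = 0.4                                   # generation gate in [0, 1]

# Extend the vocabulary with source-only words, then mix the two sources
# of probability mass as in the pointer-generator formulation.
ext_vocab = vocab + [w for w in src if w not in vocab]
final = np.zeros(len(ext_vocab))
final[: len(vocab)] = p_gen * p_vocab
for a, w in zip(attn, src):
    final[ext_vocab.index(w)] += (1 - p_gen) * a

print(ext_vocab[int(final.argmax())])  # "party" can win despite being OOV
```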

19.
HE Ruifang, DUAN Shaoyang. Journal of Software, 2019, 30(4): 1015-1030
Event extraction aims to extract information of interest from unstructured text and present it to users in a structured form. Most current Chinese event extraction systems adopt a sequential pipeline model that first identifies event triggers and then identifies event arguments; this easily produces cascading errors, and downstream tasks cannot feed information back to help the upstream tasks. Treating event extraction as a sequence labeling task, this paper builds a joint Chinese event extraction model based on CRF multi-task learning and extends the plain CRF-based joint model in two ways. First, a classified training strategy resolves the multi-label problem of event arguments in the joint model (when one event mention contains several events, the same entity often plays different roles in different events). Second, since event subtypes under the same event supertype have highly correlated arguments, a multi-task learning method is proposed for mutually reinforcing joint learning across the subtypes, effectively alleviating the corpus sparsity caused by classified training. Experiments on the ACE 2005 Chinese corpus demonstrate the effectiveness of the method.
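Casting event extraction as sequence labeling can be sketched with a linear-chain CRF (here via the sklearn-crfsuite package) that tags triggers and arguments in one pass; the two training sentences, the features, and the BIO-style tag set are invented, and the paper's multi-task extensions are not reproduced.

```python
# A minimal sketch of joint trigger/argument tagging with a CRF
# (requires sklearn-crfsuite).
import sklearn_crfsuite

def features(tokens, i):
    return {
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<s>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "</s>",
    }

sents = [["troops", "attacked", "the", "village"],
         ["rebels", "attacked", "a", "convoy"]]
tags = [["B-Attacker", "B-Trigger", "O", "B-Target"],
        ["B-Attacker", "B-Trigger", "O", "B-Target"]]

X = [[features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)

test = ["soldiers", "attacked", "a", "camp"]
print(crf.predict([[features(test, i) for i in range(len(test))]]))
```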

20.
In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.
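The combination of multiple reference selections can be sketched as a vote: posts marked by at least half of the raters form the combined reference, against which a system selection is scored with precision and recall. The votes and threshold below are toy assumptions.

```python
# A minimal sketch of combining rater selections and scoring a system summary.
posts = ["p1", "p2", "p3", "p4", "p5"]

# Each rater marks the posts they consider most important in the thread.
rater_selections = [{"p1", "p2"}, {"p1", "p3"}, {"p1", "p2", "p4"}]

votes = {p: sum(p in r for r in rater_selections) for p in posts}
majority = {p for p, v in votes.items() if v >= len(rater_selections) / 2}

system = {"p1", "p4"}  # hypothetical automatic summary
tp = len(system & majority)
precision = tp / len(system)
recall = tp / len(majority) if majority else 0.0
print(majority, round(precision, 2), round(recall, 2))
```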
