Similar Documents
20 similar documents found (search time: 671 ms).
1.
Building on research into automatic summarization, and targeting narrative texts, this work takes the event as the basic semantic unit and proposes an event-based method for multi-topic automatic summarization. An event-network text representation model is built from events and the relations between them, and a community-detection algorithm is used to partition sub-events into topics. Experimental results show that the method achieves high precision, recall, and F-measure, and summarizes document content more effectively.
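The abstract does not name the community-detection algorithm used to split sub-event topics. As a hedged sketch, an event network can be partitioned with simple label propagation over an adjacency structure; all event IDs below are hypothetical, and real systems typically use stronger algorithms such as Louvain modularity optimization:

```python
import random
from collections import Counter

def label_propagation(adj, seed=0, iters=20):
    """Partition an event graph into communities via simple label propagation.

    adj: dict mapping each event ID to the set of related events.
    Returns a dict event -> community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(iters):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            # Adopt the most frequent label among neighbours
            # (ties broken deterministically by label string).
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]
            if labels[v] != best:
                labels[v], changed = best, True
        if not changed:
            break
    return labels

# Two loosely connected clusters of events (hypothetical event IDs).
adj = {
    "e1": {"e2", "e3"}, "e2": {"e1", "e3"}, "e3": {"e1", "e2"},
    "e4": {"e5", "e6"}, "e5": {"e4", "e6"}, "e6": {"e4", "e5"},
}
labels = label_propagation(adj)
```

In the event-network setting each node would be an extracted event and each edge an event relation; the two triangles above end up in two distinct communities, standing in for two sub-event topics.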

2.
Research on Event-Based Text Representation (cited by 1)
Building on research into traditional text representation models, and targeting narrative texts, this paper takes the event as the basic semantic unit and, combining the strengths of graph-structured representations, proposes an event-based text representation called the event network. The method represents a text by its events and the relations between them, and thus preserves much of the text's structural and semantic information. Experimental results show that automatic summarization based on this representation performs well.

3.
Knowledge is information that has been contextualised in a certain domain, to be used or applied. It represents the basic core of our Cultural Heritage, and Natural Language provides us with prime, versatile means of construing experience at multiple levels of organization. The field of natural language generation is concerned with creating texts that convey information contained in other kinds of sources (numerical data, graphics, taxonomies and ontologies, or even other texts), with the aim of making such texts indistinguishable, as far as possible, from those created by humans. Knowledge extraction, on the other hand, based on text mining and text analysis tasks as examples of the many applications born from computational linguistics, provides summarization, categorization, and topic extraction from textual resources using linguistic concepts that deal with the imprecision and ambiguity of human language. This paper presents a research activity focused on exploring and scientifically describing the structure and organization of the knowledge involved in generating textual resources. To this end, a novel multidimensional model for the representation of conceptual knowledge is proposed. Furthermore, a real case study in the Cultural Heritage domain is described to demonstrate the effectiveness and feasibility of the proposed model and approach.

4.
This paper presents a state-of-the-art review of feature extraction for soccer video summarization research. All existing approaches to event detection, video summarization based on the video stream, and the application of text sources in event detection are surveyed. With regard to the current challenges of automatic, real-time provision of summary videos, different computer vision approaches are discussed and compared. Audio and video feature extraction methods, and their combination with textual methods, are investigated. Available commercial products are presented to better clarify the boundaries of this domain, and future directions for improving existing systems are suggested.

5.
Event summarization is the task of generating a single, concise textual representation of an event. It does not usually consider the multiple development phases of an event, yet news articles about long, complicated events often involve several phases, so traditional approaches to event summarization generally have difficulty capturing event phases effectively. In this paper, we define the task of Event Phase Oriented News Summarization (EPONS), in which a summary contains multiple timelines, each corresponding to an event phase. We model the semantic relations of news articles via a graph model called the Temporal Content Coherence Graph. A structural clustering algorithm, EPCluster, separates news articles into groups corresponding to event phases. We apply a vertex-reinforced random walk to rank news articles, and the ranking results are further used to create timelines. Extensive experiments conducted on multiple datasets show the effectiveness of our approach.
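EPONS ranks articles with a vertex-reinforced random walk. As a simplified sketch, a plain (non-reinforced) random-walk ranking over an article similarity graph can be computed by power iteration; the article IDs and similarity values below are invented:

```python
def random_walk_rank(sim, damping=0.85, iters=50):
    """Rank items by a PageRank-style random walk over a similarity graph.

    sim: dict-of-dicts of nonnegative similarities between article IDs.
    (The paper's vertex-reinforced walk additionally feeds visit counts
    back into the transition probabilities; this is the plain variant.)
    """
    nodes = sorted(sim)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            incoming = 0.0
            for u in nodes:
                total = sum(sim[u].values())
                if total > 0 and v in sim[u]:
                    # Node u spreads its rank along normalised similarities.
                    incoming += rank[u] * sim[u][v] / total
            new[v] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

# Toy similarity graph: a1 and a2 are strongly related, a3 is peripheral.
sim = {
    "a1": {"a2": 0.9, "a3": 0.1},
    "a2": {"a1": 0.9, "a3": 0.2},
    "a3": {"a1": 0.1, "a2": 0.2},
}
rank = random_walk_rank(sim)
```

The weakly connected article `a3` receives the lowest rank, which is the behaviour a timeline builder would rely on when picking representative articles per phase.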

6.
Environmental scanning, the acquisition and use of information about events, trends, and relationships in an organization's external environment, permits an organization to adapt to its environment and to develop effective responses to secure or improve its position in the future. An event detection technique that identifies the onset of new events in a stream of news stories would facilitate an organization's environmental scanning. However, traditional event detection techniques generally adopt a feature co-occurrence approach, identifying whether a news story contains an unseen event by comparing feature similarity between the new story and past stories. Such feature-based techniques suffer greatly from the word mismatch and inconsistent orientation problems, and do not directly support event categorization or news story filtering. In this study, we developed an information extraction-based event detection (NEED) technique that combines information extraction and text categorization to address the problems inherent in traditional feature-based event detection. Using a traditional feature-based event detection technique (INCR) as a benchmark, empirical evaluation showed that the proposed NEED technique improves detection effectiveness as measured by the tradeoff between miss and false-alarm rates.
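The feature co-occurrence baseline that this abstract critiques can be sketched as first-story detection with a cosine-similarity threshold; the toy stories and the threshold value are illustrative only:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def detect_new_events(stories, threshold=0.3):
    """Flag a story as a new event if it is dissimilar to all past stories."""
    seen, flags = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        is_new = all(cosine(vec, past) < threshold for past in seen)
        flags.append(is_new)
        seen.append(vec)
    return flags

stories = [
    "earthquake strikes city overnight",
    "earthquake city damage reported overnight",
    "election results announced today",
]
flags = detect_new_events(stories)  # [True, False, True]
```

This makes the word-mismatch problem concrete: a follow-up story that paraphrases the first event with different vocabulary would wrongly be flagged as new, which motivates the extraction-based NEED approach.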

7.
We present an optimization-based unsupervised approach to automatic document summarization, in which text summarization is modeled as a Boolean programming problem. The model optimizes three properties: (1) relevance: the summary should contain informative textual units relevant to the user; (2) redundancy: the summary should not contain multiple textual units that convey the same information; and (3) length: the summary is bounded in length. The approach applies to both single- and multi-document summarization. In both tasks, documents are split into sentences during preprocessing, salient sentences are selected, and the summary is generated by threading the selected sentences in the order they appear in the original document(s). We implemented our model for the multi-document summarization task. Comparing our methods to several existing summarization methods on the open DUC2005 and DUC2007 data sets, we found that our method improves summarization results significantly. This is because, first, when extracting summary sentences, the method considers not only each sentence's relevance score against the whole sentence collection but also how representative the sentence is of the topic; second, when generating a summary, it also deals with repetition of information. The methods were evaluated using the ROUGE-1, ROUGE-2, and ROUGE-SU4 metrics. We also demonstrate that the summarization result depends on the similarity measure: combining symmetric and asymmetric similarity measures yields better results than using either separately.
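Boolean programs balancing relevance, redundancy, and length are often approximated greedily. The sketch below is such a greedy approximation, not the paper's exact formulation; the sentences, relevance scores, and word budget are invented:

```python
def greedy_summary(sentences, relevance, sim, budget):
    """Greedy approximation of the Boolean-programming objective:
    maximise total relevance, penalise redundancy against sentences
    already chosen, and respect a summary length budget (in words).
    """
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining:
        def gain(i):
            # Marginal gain = relevance minus worst redundancy overlap.
            redundancy = max((sim(sentences[i], sentences[j])
                              for j in chosen), default=0.0)
            return relevance[i] - redundancy
        best = max(remaining, key=gain)
        length = (sum(len(sentences[j].split()) for j in chosen)
                  + len(sentences[best].split()))
        if length > budget or gain(best) <= 0:
            break
        chosen.append(best)
        remaining.remove(best)
    # Thread selected sentences in original document order.
    return [sentences[i] for i in sorted(chosen)]

def overlap(a, b):
    """Jaccard word overlap as a simple redundancy measure."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

sents = ["cats sleep a lot", "cats sleep often", "dogs bark loudly"]
summary = greedy_summary(sents, relevance=[0.9, 0.8, 0.7],
                         sim=overlap, budget=8)
```

Here the near-duplicate second sentence is skipped in favour of the less relevant but non-redundant third one, illustrating the relevance/redundancy/length trade-off of the Boolean model.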

8.
Text summarization is either extractive or abstractive. Extractive summarization selects the most salient pieces of information (words, phrases, and/or sentences) from a source document without adding any external information. Abstractive summarization builds an internal representation of the source document so as to produce a faithful summary of the source; in this case, external text can be inserted into the generated summary. Because of the complexity of the abstractive approach, the vast majority of work in text summarization has adopted an extractive approach.

In this work, we focus on concept fusion and generalization, i.e. replacing different concepts appearing in a sentence by one concept that covers the meanings of all of them, an operation that can be used as part of an abstractive text summarization system. The main goal of this contribution is to enrich research on abstractive text summarization with a novel approach that generalizes sentences using semantic resources. The work should be useful for intelligent systems more generally, since it introduces a means of shortening sentences by producing more general sentences (hence abstractions of them). It could be used, for instance, to display shorter texts in applications for mobile devices, and it should improve the quality of generated summaries by mentioning key (general) concepts. One can also imagine using the approach in reasoning systems where different concepts appearing in the same context are related to one another with the aim of finding a more general representation, for example in goal formulation, expert systems, scenario recognition, and cognitive reasoning generally.

We present our methodology for the generalization and fusion of concepts appearing in sentences. This is achieved through (1) the detection and extraction of what we define as generalizable sentences, and (2) the generation and reduction of the space of generalization versions. We introduce two approaches designed to select the best sentences from the space of generalization versions. The first, using four NLTK corpora, estimates the "acceptability" of a given generalization version; the second is machine-learning based and uses contextual and specific features. The recall, precision, and F1-score measures resulting from the evaluation of the concept generalization and fusion approach are presented.
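Concept fusion via a semantic resource can be illustrated with a toy hypernym taxonomy standing in for WordNet; the taxonomy below is hand-made for the example, not taken from any real resource:

```python
# Hand-made hypernym taxonomy (child -> parent), standing in for WordNet.
HYPERNYMS = {
    "apple": "fruit", "pear": "fruit", "fruit": "food",
    "carrot": "vegetable", "vegetable": "food", "food": "entity",
}

def ancestors(word):
    """Chain from a word up to the taxonomy root, starting with the word."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

def generalize(words):
    """Fuse several concepts into their most specific common hypernym."""
    common = set(ancestors(words[0]))
    for w in words[1:]:
        common &= set(ancestors(w))
    if not common:
        return None
    # The most specific shared ancestor is the one nearest the first word.
    return min(common, key=lambda c: ancestors(words[0]).index(c))

g = generalize(["apple", "pear"])      # "fruit"
g2 = generalize(["apple", "carrot"])   # "food"
```

Applied to a sentence, this lets "He bought apples and pears" be shortened to "He bought fruit"; with a real resource one would use, e.g., NLTK's WordNet `lowest_common_hypernyms`.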

9.
Text summarization and classification are core techniques for analyzing huge amounts of text data in big data environments. As the need to read texts on smartphones, tablets, and televisions as well as personal computers continues to grow, both techniques become more important, and both perform essential processing for text analysis in many applications.

Traditionally, text summarization and classification have been treated as separate research fields. However, they can help each other: summarization can make use of category information from classification, and classification can make use of summary information from summarization. We therefore propose an effective integrated learning framework that uses both summary and category information. In this framework, the feature-weighting method for summarization uses a language model to combine feature distributions in each category and text, while the method for classification uses the sentence importance scores estimated by the summarizer.

In experiments, the integrated framework outperforms standalone summarization and classification. The framework is also easy to implement and language-independent, because it relies only on simple statistical approaches and a POS tagger.
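One way to realise the combined feature weighting described above is to interpolate category-level and text-level unigram distributions when scoring sentences. This is a hedged sketch with invented tokens and an invented interpolation weight, not the paper's exact model:

```python
from collections import Counter

def unigram(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_score(sentence, text_lm, category_lm, lam=0.5):
    """Score a sentence by linearly interpolating the category-level and
    text-level unigram distributions of its words."""
    toks = sentence.lower().split()
    return sum(lam * category_lm.get(w, 0.0) + (1 - lam) * text_lm.get(w, 0.0)
               for w in toks) / max(len(toks), 1)

# Invented tokens: a finance "category" corpus and one document's text.
category_tokens = "stock market rises stock falls trading volume".split()
text_tokens = "the market closed higher after heavy trading".split()
cat_lm, txt_lm = unigram(category_tokens), unigram(text_tokens)

s1 = sentence_score("market trading was heavy", txt_lm, cat_lm)
s2 = sentence_score("weather was sunny today", txt_lm, cat_lm)
```

A sentence sharing vocabulary with both the category and the document (`s1`) outscores an off-topic sentence (`s2`), which is the signal the integrated framework feeds back into classification.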

10.
The rapidly growing number of newswire stories stored in electronic form raises new challenges for information retrieval technology. Traditional query-driven retrieval is not suitable for generic queries; it is desirable to have an intelligent system that automatically locates topically related events or topics in a continuous stream of newswire stories. This is the goal of automatic event detection. We propose a new approach to event detection from multilingual newswire stories. Unlike traditional methods that employ simple keyword matching, our method makes use of concept terms and named entities such as person, location, and organization names. Concept terms of a story are derived from statistical context analysis between sentences in the news story and stories in the concept database. We have conducted a set of experiments to study the effectiveness of our approach. The results show that detection using concept terms together with story keywords outperforms traditional methods that use only keyword representation. © 2001 John Wiley & Sons, Inc.

11.
12.
Extracting features from a text and using them to represent it is fundamental to text mining. This work reveals the semantic property that narrative texts are composed of events, and presents a concept-lattice-based representation model for narrative text. To capture the fact that some elements of an event change over time or as actions occur, an element transition system is defined to describe this dynamic process.

13.
Text mining techniques have recently been employed to classify and summarize user reviews on mobile application stores. However, due to the inherently diverse and unstructured nature of user-generated online text, text-based review mining techniques often produce excessively complicated models that are prone to overfitting. In this paper, we propose a novel approach, based on frame semantics, for app review mining. Semantic frames help generalize from raw text (individual words) to more abstract scenarios (contexts); this lower-dimensional representation of text is expected to enhance the predictive capabilities of review mining techniques and reduce the chance of overfitting. Our analysis is two-fold. First, we investigate the performance of semantic frames in classifying informative user reviews into various categories of actionable software maintenance requests. Second, we propose and evaluate multiple summarization algorithms for generating concise, representative summaries of informative reviews. Three datasets of app store reviews, sampled from a broad range of application domains, are used in our experimental analysis. The results show that semantic frames enable an efficient and accurate review classification process; in review summarization tasks, however, text-based summarization generates more comprehensive summaries than frame-based summarization. Finally, we introduce MARC 2.0, a review classification and summarization suite that implements the algorithms investigated in our analysis.

14.
Automatic summarization should obtain similarity measures as accurate as possible to determine sentence or paragraph weights, but the widely used vector-space-model computations ignore the order of words within sentences, paragraphs, and texts. This paper proposes a new similarity measure based on adjacent word-order groups and applies it to automatic summarization: a clustering-based method builds vector representations of word-order groups to characterize sentences, paragraphs, and texts, and similarities computed over groups of different lengths are combined by linear interpolation. A new weighting index for sentences and paragraphs, based on the cumulative importance of the word-order groups they contain, is also proposed. Experiments show that using word-order information effectively improves summarization quality.
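Combining similarities over word groups of different lengths by linear interpolation can be sketched as follows; the interpolation weights and sentences are illustrative, and the paper's clustering-based group representation is omitted:

```python
def ngrams(tokens, n):
    """Contiguous word groups of length n, preserving word order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_sim(a, b, n):
    """Jaccard similarity between the length-n word groups of two texts."""
    ga, gb = set(ngrams(a, n)), set(ngrams(b, n))
    return len(ga & gb) / max(len(ga | gb), 1)

def interpolated_sim(a, b, weights=(0.5, 0.3, 0.2)):
    """Linearly interpolate similarities over word groups of length 1..3,
    so that word order contributes to the final score."""
    return sum(w * ngram_sim(a, b, n) for n, w in enumerate(weights, start=1))

s1 = "the cat sat on the mat".split()
s2 = "the cat sat on the mat".split()
s3 = "mat the on sat cat the".split()   # same words, shuffled order

same = interpolated_sim(s1, s2)
shuffled = interpolated_sim(s1, s3)
```

A plain bag-of-words cosine would score `s1` and `s3` as identical; the interpolated measure penalises the shuffled sentence because its bigrams and trigrams no longer match, which is exactly the word-order signal the abstract argues for.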

15.
We propose a framework for abstractive summarization of multiple documents, which selects summary content not from the source sentences themselves but from a semantic representation of the source documents. In this framework, the contents of the source documents are represented as predicate-argument structures obtained by semantic role labeling. Content is selected by ranking the predicate-argument structures on optimized features, and language generation produces sentences from the selected structures. Our framework differs from other abstractive summarization approaches in several respects: it employs semantic role labeling for the semantic representation of text; it analyzes the source text semantically, using a semantic similarity measure to cluster semantically similar predicate-argument structures across the text; and it ranks the predicate-argument structures on features weighted by a genetic algorithm (GA). Experiments are carried out on DUC-2002, a standard corpus for text summarization. Results indicate that the proposed approach outperforms other summarization systems.

16.
A Concept-Based Visual Representation Mechanism for Chinese Text (cited by 1)
To support browsing the growing number of online Chinese texts on the Internet, this paper presents a concept-based visual representation mechanism for Chinese text that organizes and presents texts and text collections intuitively. The basic idea is as follows: first, texts are classified on the basis of concept expansion; then, using the feature extraction and summarization methods proposed in this paper, labeled information about each text's category, summary, and body is obtained, so that texts can be browsed selectively by category and summary.

17.
Microblog data are real-time and dynamic, and analyzing them makes it possible to detect real-world events. At the same time, the massive volume, short texts, and rich social relations of microblog data pose new challenges for event detection. Taking into account the textual features of microblog data (reposts, comments, embedded links, hashtags, named entities, etc.) together with their semantic, temporal, and social-relation characteristics, this paper proposes an effective event detection algorithm for microblogs (event detection in microblogs, EDM). It also proposes a method for building event summaries by extracting key event elements: keywords, named entities, posting time, and user sentiment polarity. Experimental comparison with an LDA (latent Dirichlet allocation)-based event detection algorithm shows that EDM achieves better detection results and provides more intuitive, readable event summaries.

18.
As the Internet produces ever more text data, information overload grows increasingly severe, and "dimensionality reduction" of texts of all kinds has become essential. Text summarization is one of the key techniques for this, and one of the hot and difficult research topics in artificial intelligence: it aims to condense a text or a collection of texts into a short summary containing the key information. In recent years, the pretraining of language models has advanced the state of the art in many natural language processing tasks, including sentiment analysis, question answering, natural language inference, named entity recognition, text similarity, and text summarization. This paper surveys classic summarization methods and recent pretraining-based methods, organizes the datasets and evaluation methods used for text summarization, and concludes by summarizing the current challenges and development trends of the field.

19.
Traditional Seq2Seq models for text summarization cannot accurately extract the key information of a text and cannot handle words outside the vocabulary. This paper proposes a pointer generator network (PGN) model based on Fastformer that combines extractive and abstractive summarization. The model first uses Fastformer to efficiently obtain contextual word embeddings, then uses the pointer generator network to either copy words from the source text or generate new summary tokens from the vocabulary, addressing the OOV (out of vocabulary) problem common in summarization. A coverage mechanism tracks the attention distributions of past time steps and dynamically adjusts word importance, alleviating word repetition. Finally, beam search is introduced in the decoding stage so the decoder obtains more accurate summaries. Experiments on the vehicle-diagnosis dialogue dataset provided by 汽车大师 on Baidu AI Studio show that the proposed Fastformer-PGN model outperforms the baseline models on Chinese text summarization.
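The coverage mechanism mentioned above penalises attention that revisits already-covered source positions. A minimal numeric sketch of the coverage loss, with invented attention weights, is:

```python
def coverage_penalty(attention_steps):
    """Coverage loss in the pointer-generator style: at each decoding step,
    accumulate sum(min(attention, coverage)) per source position, where
    coverage is the running sum of past attention distributions."""
    coverage = [0.0] * len(attention_steps[0])
    loss = 0.0
    for attn in attention_steps:
        loss += sum(min(a, c) for a, c in zip(attn, coverage))
        coverage = [c + a for c, a in zip(coverage, attn)]
    return loss

# Repeatedly attending to source token 0 is penalised...
repetitive = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# ...while spreading attention over the source is not.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

lr = coverage_penalty(repetitive)  # 1.0
ls = coverage_penalty(spread)      # 0.0
```

Adding this penalty to the training loss discourages the decoder from attending to the same source words at every step, which is how the model above suppresses repeated words in generated summaries.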

20.
Existing software for handling textual variants suffers from a number of faults and is generally designed for a narrow range of text types. This paper develops a new data structure for variants, suitable for a wider range of texts, which also solves most of the problems associated with the representation of variant data. A prototype applet that can graphically display the new data structure is described, as is the current state of the editor being developed from it.
