Similar Literature
20 similar documents found (search time: 187 ms)
1.
Intra-topic Event Detection Based on Key Terms   Cited by: 1 (self-citations: 0, other citations: 1)
Every day, media outlets produce a large volume of news reports, calling for an automated analysis method that presents news to users in a clearer organizational form. Most existing work partitions news into flat topics; however, a topic is not merely a simple collection of news stories but is composed of a series of interrelated events. Because events within a topic are often highly similar, intra-topic event detection suffers from poor precision. To overcome this problem, we propose an event detection and relation discovery method based on event term committees: the core terms of each event are mined first and then used for event detection and relation discovery. Experimental results on two datasets from the Linguistic Data Consortium (LDC) show that the proposed method significantly improves upon existing approaches.

2.
The goal of new event detection (NED) is to detect, from one or more news sources, the first story reporting a news topic. The traditional vector space model represents text features with single words. Taking word position and other content-bearing information into account, this paper proposes a word-pair representation of text, normalizes the extracted word pairs with the HowNet resource, and finally optimizes the weight parameters of word pairs with different part-of-speech combinations across news categories. Experiments on an existing breaking-news corpus show that the improved method is clearly effective and yields a measurable performance gain.
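The word-pair representation described in this abstract can be sketched in a few lines. Everything below (the window size, the novelty threshold, the toy stories) is illustrative and not taken from the paper, which additionally normalizes pairs with HowNet and tunes part-of-speech weights:

```python
from collections import Counter
import math

def pair_features(tokens, window=3):
    """Represent a story by word pairs co-occurring within a small window.
    Pairs are order-normalized so (a, b) and (b, a) count as the same feature."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            pairs[tuple(sorted((w, v)))] += 1
    return pairs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_new_event(story, history, threshold=0.2):
    """Flag a story as a new event if its best pair-vector match
    against all previous stories falls below the threshold."""
    feats = pair_features(story)
    best = max((cosine(feats, pair_features(h)) for h in history), default=0.0)
    return best < threshold
```

A story sharing no word pairs with the history scores zero and is flagged as new; a near-duplicate matches an earlier story and is not.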

3.
The goal of new event detection (New Event Detection, NED) is to detect, from one or more news sources, the first story reporting a news topic. Preliminary experiments show that an important property of an event is the specific time at which it occurs, making time a key marker for distinguishing events. We therefore propose the Temporal Topic Model (TTM). TTM segments topics and stories into events, each corresponding to a time expression that describes when the event occurred. Based on the frequency and growth rate of a time expression within a topic, TTM estimates the probability that the corresponding event is a seed event or a related novel event. The frequency and growth rate of time expressions are also used to weight events in relevance matching. On this basis, NED relies on temporal characteristics to quickly suppress matches between events occurring at different times, and uses seed and novel events to adjust the weight allocation in relevance judgments.

4.
张秀华  云红艳  贺英  胡欢 《计算机与数字工程》2021,49(6):1143-1147,1280
News event detection is a natural language processing task that aims to detect news events from a stream of news text and assign each event a topic. Hand-crafting features for news events is time-consuming and labor-intensive, and traditional methods, which detect news events by the spatial distance between them, easily misjudge highly similar yet distinct events as the same event. To address these problems, this paper builds a news event detection model from a bidirectional long short-term memory network with an attention mechanism, uses deep learning to learn deep features of news text, and constructs a news event modeling application system on top of the detection model. Experiments show that the proposed method outperforms traditional methods in precision and recall and can identify news events accurately.

5.
With the rapid growth of news websites, online news and comment data have surged, bringing a wealth of valuable information. News keeps people informed of current affairs at home and abroad, while comments reflect people's opinions on events, which matters for applications such as public opinion analysis and comment recommendation. However, news comments are numerous, noisy, and usually short, making it hard to quickly and intuitively discover what commenters care about. This paper proposes EWMD-AP, a clustering method for news comments that automatically mines the public's points of concern about an event. It computes distances between comments with a Word Mover's Distance (WMD) strengthened by enhanced weight vectors, then clusters the comments with Affinity Propagation (AP) to obtain concern clusters and their representative comments from the noisy comment stream. In particular, the enhanced weight vector replaces the term-frequency weight vector of traditional WMD and combines three components: a word importance coefficient that couples part-of-speech and textual-expression features, a de-backgrounding coefficient that treats the news body as the comments' background, and a TF-IDF coefficient. Comparative experiments on 24 news comment datasets show that EWMD-AP clusters news comments better than traditional algorithms such as Kmeans and Mean Shift as well as recent algorithms such as Density Peaks.
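A rough sketch of the distance-then-cluster pipeline this abstract describes, with two deliberate simplifications: toy 2-D vectors stand in for real word embeddings, and a greedy single-pass clusterer stands in for Affinity Propagation. The relaxed WMD, the vocabulary, and all thresholds are illustrative, not from the paper:

```python
import math

# Toy 2-D word vectors standing in for real embeddings (assumption).
VEC = {"price": (1.0, 0.0), "cost": (0.9, 0.1), "tax": (0.8, 0.3),
       "goal": (0.0, 1.0), "match": (0.1, 0.9), "team": (0.2, 1.0)}

def relaxed_wmd(doc_a, doc_b, weights=None):
    """Relaxed Word Mover's Distance: each word in doc_a travels to its
    nearest word in doc_b; travel costs are averaged under per-word weights
    (uniform here; the paper uses its enhanced weight vector instead)."""
    w = weights or {t: 1.0 for t in doc_a}
    total = sum(w.values())
    cost = sum(w[t] * min(math.dist(VEC[t], VEC[u]) for u in doc_b) for t in doc_a)
    return cost / total

def cluster(comments, threshold=0.5):
    """Greedy single-pass clustering on symmetrized relaxed WMD
    (a stand-in for Affinity Propagation)."""
    clusters = []
    for c in comments:
        for cl in clusters:
            d = max(relaxed_wmd(c, cl[0]), relaxed_wmd(cl[0], c))
            if d < threshold:
                cl.append(c)
                break
        else:
            clusters.append([c])
    return clusters
```

On comments drawn from two themes (prices vs. football), the sketch separates them into two clusters.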

6.
To address the shortcoming that traditional feature weighting methods do not fully consider semantic information between words or class distribution information, this paper proposes a short-text feature extraction method that fuses word co-occurrence distance with class information. On one hand, the number of words separating two terms in the same short text is taken as their co-occurrence distance to compute their relatedness, and the frequency with which the two terms co-occur yields each word's association weight. On the other hand, an improved expected cross entropy computes each word's weight within a class. Integrating the two gives the weights of all words in a class; the words in all classes are then sorted by weight in descending order, and the top K are selected as the new feature set. Experiments show that the method effectively improves short-text feature extraction.
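The two weighting components can be sketched as follows. The decay function, the gap window, and this simplified form of expected cross entropy are illustrative assumptions; the paper's "improved" variant is not specified in the abstract:

```python
import math
from collections import defaultdict

def cooccur_weight(doc, max_gap=4):
    """Association weight from co-occurrence distance: word pairs separated
    by fewer words in the same short text contribute more (1 / (1 + gap))."""
    w = defaultdict(float)
    for i, a in enumerate(doc):
        for j in range(i + 1, min(i + 1 + max_gap, len(doc))):
            s = 1.0 / (1 + (j - i - 1))
            w[a] += s
            w[doc[j]] += s
    return w

def expected_cross_entropy(term, cls, docs):
    """Simplified ECE term weight: P(t) * P(c|t) * log(P(c|t) / P(c)).
    docs is a list of (token_list, class_label) pairs."""
    n = len(docs)
    with_t = [d for d in docs if term in d[0]]
    if not with_t:
        return 0.0
    p_t = len(with_t) / n
    p_c_t = sum(1 for d in with_t if d[1] == cls) / len(with_t)
    p_c = sum(1 for d in docs if d[1] == cls) / n
    return p_t * p_c_t * math.log(p_c_t / p_c) if p_c_t else 0.0
```

A term concentrated in one class gets a positive ECE weight for that class; a term absent from the class scores zero.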

7.
Automatic Topic Detection Based on Incremental Clustering   Cited by: 1 (self-citations: 0, other citations: 1)
张小明  李舟军  巢文涵 《软件学报》2012,23(6):1578-1587
With the explosive growth of online information, collecting and organizing relevant information is increasingly difficult. Topic detection and tracking (TDT) is a research direction proposed to solve this problem. Topic detection, one of the major TDT tasks, clusters stories that discuss the same topic. Despite years of research, topic detection faces growing challenges from ever-changing online information. This paper proposes an automatic topic detection method based on incremental clustering that aims to improve detection efficiency and automatically determine the number of topics in a text collection. It computes feature weights with an improved weighting algorithm, raises clustering accuracy by adaptively refining text features with strong topic-discriminating power, uses BIC during clustering to determine the number of topics, and exploits topic continuity to pre-cluster documents and thereby speed up detection. Experimental results on the TDT-4 corpus show that the method substantially improves both the efficiency and the accuracy of topic detection.
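The incremental core of such a method can be sketched in a single pass over the story stream. The similarity threshold below is a simplification standing in for the paper's BIC-based model selection, and all values are illustrative:

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def incremental_cluster(stories, threshold=0.3):
    """Single-pass incremental clustering: each story joins the best-matching
    topic centroid above the threshold, or seeds a new topic. The number of
    topics emerges from the data rather than being fixed in advance."""
    topics = []  # list of (centroid Counter, member stories)
    for s in stories:
        vec = Counter(s)
        best, best_sim = None, threshold
        for t in topics:
            sim = cosine(vec, t[0])
            if sim > best_sim:
                best, best_sim = t, sim
        if best is None:
            topics.append((vec, [s]))
        else:
            best[0].update(vec)   # centroid absorbs the new story
            best[1].append(s)
    return topics
```

Two storm stories merge into one topic while an unrelated election story seeds a second, so the topic count is discovered automatically.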

8.
Biomedical Event Trigger Detection Based on a Hybrid Model   Cited by: 1 (self-citations: 0, other citations: 1)
Semantic ambiguity makes biomedical event trigger detection more difficult. To cope with this ambiguity and improve detection performance, this paper proposes a hybrid model that combines rich features with learners of different types. The method combines a support vector machine (SVM) classifier and a Random Forest classifier and uses rich features to detect triggers, assigning an event type to each candidate word. Experiments on the datasets of the BioNLP 2009 shared task show that the method is effective and feasible.
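The combination step of such a hybrid can be sketched as weighted soft voting over the two classifiers' class probabilities. The weight, labels, and probability values below are illustrative; the abstract does not specify how the paper combines its learners:

```python
def soft_vote(proba_a, proba_b, labels, w=0.5):
    """Pick the label with the highest weighted average of two classifiers'
    class probabilities (e.g., SVM and Random Forest outputs)."""
    return max(labels, key=lambda c: w * proba_a.get(c, 0.0) + (1 - w) * proba_b.get(c, 0.0))
```

With equal weights the more confident Random Forest wins; shifting the weight toward the SVM flips the decision, which is how such ensembles trade off their members.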

9.
New Event Detection Based on Divide-and-Conquer Matching of Sub-topics   Cited by: 4 (self-citations: 0, other citations: 4)
洪宇  张宇  范基礼  刘挺  李生 《计算机学报》2008,31(4):687-695
New event detection (NED), an important task in topic detection and tracking, monitors news streams in real time and identifies new topics in them. Existing methods describe topics and stories as single-structure feature vectors for matching, so sub-topics act as noise to one another and form spurious semantics that mislead new-topic identification. To remedy this defect, this paper proposes an NED method based on divide-and-conquer matching of sub-topics: topics and stories are partitioned into sub-topics, and a new-topic identification model is built from the proportion and distribution of related sub-topics. Experiments on TDT4 and TDT5 show significant improvement, with a minimum detection cost of 0.4061 and a corresponding miss rate of 0.1859.

10.
Breaking news events now spread ever faster and more widely. Accurately and quickly extracting event triggers and their arguments helps decision makers analyze public-opinion trends and guide public discourse. Existing event extraction methods mostly extract event arguments from a single sentence, whereas the arguments of breaking news events are often spread across multiple sentences. This paper therefore proposes a joint extraction method for breaking news events based on graph attention networks, organized in three stages: TextRank-based event-sentence extraction, document-level joint event extraction with graph attention networks, and event completion. After the main event of a news article is extracted, event extraction is run over the whole article, and the main event is completed using the similarity between candidate events' event vectors and their argument similarity to the main event. Experimental results show F1 scores of 83.2% for trigger extraction and 59.1% for argument-role extraction on the DUEE1.0 dataset, and 82.7% and 58.7% respectively on the Chinese Emergency Corpus, validating the soundness and effectiveness of the model.
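The first stage, TextRank-based event-sentence extraction, can be sketched as PageRank over a word-overlap sentence graph. The similarity function, damping factor, and iteration count below are standard TextRank defaults used for illustration, not details from the paper:

```python
import math

def textrank(sentences, d=0.85, iters=30):
    """Plain TextRank: build a sentence graph weighted by word overlap
    (normalized by log sentence lengths), then iterate PageRank scores."""
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            overlap = len(set(sentences[i]) & set(sentences[j]))
            denom = math.log(len(sentences[i]) + 1) + math.log(len(sentences[j]) + 1)
            sim[i][j] = overlap / denom if denom else 0.0
    scores = [1.0] * n
    for _ in range(iters):
        nxt = []
        for i in range(n):
            rank = sum(sim[j][i] / sum(sim[j]) * scores[j]
                       for j in range(n) if sim[j][i] and sum(sim[j]))
            nxt.append((1 - d) + d * rank)
        scores = nxt
    return scores
```

Sentences that share vocabulary with the rest of the article rank highest and are kept as event sentences; an isolated off-topic sentence sinks to the damping floor.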

11.
Research on New Event Detection Based on News Elements   Cited by: 1 (self-citations: 0, other citations: 1)
薛晓飞  张永奎  任晓东 《计算机应用》2008,28(11):2975-2977
The goal of new event detection (NED) is to detect the first story reporting the seed event of a news topic. Considering the role that the basic elements of news play in a story, this method improves the traditional term-frequency/inverse-document-frequency (TF-IDF) model through feature weighting, extracts time and location information from news reports, computes content, time, and location similarity separately, and combines the three to detect new events. Experiments confirm that the method is effective.
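The combination step can be sketched as a weighted sum of the three similarities. The weights, the decay form of the time similarity, and the seven-day scale are all hypothetical choices for illustration; the abstract does not give the paper's actual functions:

```python
def combined_similarity(content_sim, time_sim, place_sim, weights=(0.6, 0.2, 0.2)):
    """Linear combination of content, time, and place similarity
    (the 0.6/0.2/0.2 split is an assumed weighting)."""
    wc, wt, wp = weights
    return wc * content_sim + wt * time_sim + wp * place_sim

def time_similarity(day_a, day_b, scale=7.0):
    """Similarity that decays with the gap in days between two reports;
    identical dates give 1.0, a one-scale gap gives 0.5."""
    return 1.0 / (1.0 + abs(day_a - day_b) / scale)
```

A story is then flagged as a new event when its best combined similarity against all earlier stories falls below a threshold, just as in single-similarity NED.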

12.
New event detection (NED), which is crucial to firms' environmental surveillance, requires timely access to and effective analysis of live streams of news articles from various online sources. These news articles, available at unprecedented frequency and in unprecedented quantity, are difficult to sift through manually. Most existing techniques for NED are full-text-based; typically, they perform full-text analysis to measure the similarity between a new article and previous articles. This full-text-based approach is potentially ineffective, because a news article often contains sentences that are less relevant to defining the focal event being reported, and including these less relevant sentences in the similarity estimation can impair the effectiveness of NED. To address this limitation and support NED more effectively and efficiently, this study proposes and develops a summary-based event detection method that first selects relevant sentences of each article as a summary, then uses the resulting summaries to detect new events. We empirically evaluate the proposed method against several prevalent full-text-based techniques, including a vector space model and two deep-learning-based models. Our evaluation results confirm that the proposed method provides greater utility for detecting new events from online news articles. This study demonstrates the value and feasibility of text summarization for detecting new events from live streams of online news articles, proposes a new method more effective and efficient than the benchmark techniques, and contributes to NED research in several important ways.
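The summarize-then-compare idea can be sketched end to end. The frequency-based sentence scorer, the two-sentence summary length, and the novelty threshold are illustrative stand-ins for the paper's actual summarization method:

```python
import math
from collections import Counter

def summarize(sentences, k=2):
    """Keep the k sentences whose words are most frequent across the article
    (a simple stand-in for the paper's relevant-sentence selection)."""
    df = Counter(w for s in sentences for w in set(s))
    ranked = sorted(sentences, key=lambda s: -sum(df[w] for w in s) / max(len(s), 1))
    return [w for s in ranked[:k] for w in s]

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_new(article, past_summaries, threshold=0.25):
    """Compare the new article's summary against past summaries only,
    ignoring the less relevant sentences of the full texts."""
    summ = Counter(summarize(article))
    best = max((cosine(summ, Counter(p)) for p in past_summaries), default=0.0)
    return best < threshold
```

Only the summary words enter the similarity estimate, so off-event sentences (here the rescue-teams sentence) never dilute the comparison.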

13.
Environmental scanning, the acquisition and use of information about events, trends, and relationships in an organization's external environment, permits an organization to adapt to its environment and to develop effective responses that secure or improve its position in the future. Event detection techniques that identify the onset of new events from streams of news stories can facilitate an organization's environmental scanning. However, traditional event detection techniques generally adopt a feature co-occurrence approach, identifying whether a news story contains an unseen event by comparing feature similarity between the new story and past news stories. Such feature-based event detection techniques suffer greatly from the word-mismatch and inconsistent-orientation problems and do not directly support event categorization and news story filtering. In this study, we developed an information extraction-based event detection (NEED) technique that combines information extraction and text categorization techniques to address the problems inherent in traditional feature-based event detection techniques. Using a traditional feature-based event detection technique (i.e., INCR) as the benchmark, empirical evaluation showed that the proposed NEED technique improved the effectiveness of event detection as measured by the tradeoff between miss and false-alarm rates.

14.
Near-to-eye displays (NEDs) have unique optical properties that require different characterization techniques from direct-view display measurements. Here, a new version of a NED measurement system is introduced, and optical measurements of five commercially available consumer NED products are discussed. Luminance, focal distance, qualified viewing space, angular properties, and interocular differences are among the measured quantities. In addition, these results are compared to extensive subjective studies. The main intention is not to benchmark the different products but to show that display measurements are important for NEDs. According to the results, determining a NED's characteristics helps to predict subjective experiences, but the relation between subjective and objective findings is rather complex and depends on several NED-, user-, and task-related features. The measured characteristics indicate that with a conventional biocular NED system approach, using two microdisplays and their enlarging optics, it is a design and manufacturing challenge to build an ergonomically satisfactory NED device that fits everyone.

15.
The primary purpose of the current study is to explore whether emotional-display behavior varies on different forms of CMC in a context of one-to-one online chat. Eighty college students (40 males and 40 females) participated in this experiment, and participants were randomly and equally assigned to one of the four different chat conditions (i.e., joint-view, no-view, view-in, and view-out), manipulating visibility (whether or not participants could see their chat partner) and monitorability (whether or not participants could be monitored by their chat partner). In an assigned chat condition, participants were asked to read, consecutively, two different emotional (happy and disgusting) stories typed by their chat partner. The emotional behavior participants displayed while reading the emotional stories was measured by self-reports and a facial-action coding system. Results reveal (1) no main effects for visibility and monitorability on the degree of social presence; (2) significant differences in the use of emotion-management techniques in response to happy and disgust emotions, respectively; and (3) less likelihood of a facial expression of disgust in the monitored conditions than in the unmonitored conditions. The results indicate that there are some differences between text-based chat and video-based chat in terms of emotional-display behavior. These findings make meaningful contributions to the ongoing debate regarding communication behavior in CMC.

16.
The automatic generation of summaries using cases (GARUCAS) environment was designed as an intelligent system to help one learn to summarize narrative texts by means of examples within a case-based reasoning (CBR) approach. Each example, modeled as a case, contains a conceptual representation of the initial textual state, the different steps of the summarization method, and the representation of the final textual state obtained. The CBR approach allows the environment to summarize new texts in order to produce new text summarization examples with respect to some predefined educational objectives. Within GARUCAS, this approach is used at two levels: an event level (EL) in order to identify essential elements of a story, and the clause level (CL) to make the summary more readable. The purpose of this article is to describe the GARUCAS environment and the model used to build story summarization examples and summarize new texts. This model is based on important psycholinguistic work concerning event and narrative structures and text revision rules. An experiment was conducted with 12 short stories. The GARUCAS environment can classify the stories according to their structure analogy and reuse the summarization method of the most similar text. Such an approach can be reused for any kind of texts or summary types. © 2003 Wiley Periodicals, Inc.

17.
In this paper, we propose a novel non-expected route travel time (NERTT) model, which belongs to the class of rank-dependent expected utility models. The NERTT consists of two parts: the route travel time distribution and the distortion function. With a strictly increasing and strictly concave distortion function, we can prove that the route travel time in the proposed model is risk-averse, which is the main focus of this paper. We show two different reductions from the NERTT model to the travel time budget model and the mean-excess travel time model, one based on properly selected distortion functions and the other on a general distortion function. Moreover, the behavioral inconsistency of the expected utility model in route choice can be overcome with the proposed model. The NERTT model can also be generalized to a non-expected disutility (NED) model, and a relationship can be shown between the NED model and the route choice model based on cumulative prospect theory. This indicates that the proposed model has some generality. Finally, we develop a non-expected risk-averse user equilibrium model and formulate it as a variational inequality (VI) problem. A heuristic gradient projection algorithm with column generation is used to solve the VI. The proposed model and algorithm are tested on some hypothetical traffic networks and on some large-scale traffic networks.

18.
A hybrid formal theory of arguments, stories and criminal evidence   Cited by: 1 (self-citations: 1, other citations: 0)
This paper presents a theory of reasoning with evidence in order to determine the facts in a criminal case. The focus is on the process of proof, in which the facts of the case are determined, rather than on related legal issues such as the admissibility of evidence. In the literature, two approaches to reasoning with evidence can be distinguished, one argument-based and one story-based. In an argument-based approach to reasoning with evidence, the reasons for and against the occurrence of an event, e.g., based on witness testimony, are central. In a story-based approach, evidence is evaluated and interpreted from the perspective of the factual stories as they may have occurred in a case, e.g., as they are defended by the prosecution. In this paper, we argue that both arguments and narratives are relevant and useful in reasoning with and interpreting evidence. Therefore, a hybrid approach is proposed and formally developed, doing justice to both the argument-based and the narrative-based perspectives. Through the formalization of the theory and the associated graphical representations, our proposal forms the basis for the design of software developed as a tool to make sense of the evidence in complex cases.

19.

The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.


20.

Context

Variability management (VM) is one of the most important activities of software product-line engineering (SPLE), which aims to develop software-intensive systems using platforms and mass customization. VM encompasses the activities of eliciting and representing variability in software artefacts, establishing and managing dependencies among different variabilities, and supporting the exploitation of the variabilities for building and evolving a family of software systems. The software product line (SPL) community has devoted a huge amount of effort to developing approaches to variability-related challenges over the last two decades, and several dozen VM approaches have been reported. However, there has been no systematic effort to study how the reported VM approaches have been evaluated.

Objective

The objectives of this research are to review the status of evaluation of reported VM approaches and to synthesize the available evidence about the effects of the reported approaches.

Method

We carried out a systematic literature review of the VM approaches in SPLE reported from the 1990s until December 2007.

Results

We selected 97 papers according to our inclusion and exclusion criteria. The selected papers appeared in 56 publication venues. We found that only a small number of the reviewed approaches had been evaluated using rigorous scientific methods. A detailed investigation of the reviewed studies employing empirical research methods revealed significant quality deficiencies in various aspects of the used quality assessment criteria. The synthesis of the available evidence showed that all studies, except one, reported only positive effects.

Conclusion

The findings from this systematic review show that a large majority of the reported VM approaches have not been sufficiently evaluated using scientifically rigorous methods. The available evidence is sparse, and the quality of the presented evidence is quite low. The findings highlight the areas in need of improvement, i.e., rigorous evaluation of VM approaches. However, the reported evidence is quite consistent across different studies, which means the proposed approaches may be very beneficial when applied properly in appropriate situations. Hence, it can be concluded that further investigations need to pay more attention to the contexts under which different approaches can be most beneficial.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号