首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the difference among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In such a way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.  相似文献   

2.
In this paper, a new multi-document summarization framework which combines rhetorical roles and corpus-based semantic analysis is proposed. The approach is able to capture the semantic and rhetorical relationships between sentences so as to combine them to produce coherent summaries. Experiments were conducted on datasets extracted from web-based news using standard evaluation methods. Results show the promise of our proposed model as compared to state-of-the-art approaches.  相似文献   

3.
黄丽雯  钱微 《计算机应用》2006,26(11):2626-2627,2630
提出了一种对HITS算法进行改进的新方法,本方法将文档内容与一些启发信息如“短语”,“句子长度”和“首句优先”等结合,用于发现多文档子主题,并且将文档子主题特征转换成图节点进行排序。通过对DUC2004数据的实验,结果显示本方法是一种有效的多文本摘要方法。  相似文献   

4.
多文档自动摘要技术可以向用户提供一个简洁、全面的摘要信息,因此研究多文档自动摘要技术具有很重要的意义.本文提出了一种上下文敏感的基于词频统计的多文档自动摘要生成方案.该方案利用高频词的重要作用统计高频词信息,同时具备上下文敏感的特性.它具有简单易行,运行速度快,效果好等特点.实验结果证明,取得了很好的ROUGE成绩.  相似文献   

5.
设计并实现了一个基于维基百科的抽取式多文档自动摘要系统。使用ROUGE评测工具对使用维基百科前后的摘要进行对比实验。实验结果表明,维基百科能较大程度地提高多文档摘要的质量。  相似文献   

6.
主题模型LDA的多文档自动文摘   总被引:3,自引:0,他引:3  
近年来使用概率主题模型表示多文档文摘问题受到研究者的关注.LDA (latent dirichlet allocation)是主题模型中具有代表性的概率生成性模型之一.提出了一种基于LDA的文摘方法,该方法以混乱度确定LDA模型的主题数目,以Gibbs抽样获得模型中句子的主题概率分布和主题的词汇概率分布,以句子中主题权重的加和确定各个主题的重要程度,并根据LDA模型中主题的概率分布和句子的概率分布提出了2种不同的句子权重计算模型.实验中使用ROUGE评测标准,与代表最新水平的SumBasic方法和其他2种基于LDA的多文档自动文摘方法在通用型多文档摘要测试集DUC2002上的评测数据进行比较,结果表明提出的基于LDA的多文档自动文摘方法在ROUGE的各个评测标准上均优于SumBasic方法,与其他基于LDA模型的文摘相比也具有优势.  相似文献   

7.
With the number of documents describing real-world events and event-oriented information needs rapidly growing on a daily basis, the need for efficient retrieval and concise presentation of event-related information is becoming apparent. Nonetheless, the majority of information retrieval and text summarization methods rely on shallow document representations that do not account for the semantics of events. In this article, we present event graphs, a novel event-based document representation model that filters and structures the information about events described in text. To construct the event graphs, we combine machine learning and rule-based models to extract sentence-level event mentions and determine the temporal relations between them. Building on event graphs, we present novel models for information retrieval and multi-document summarization. The information retrieval model measures the similarity between queries and documents by computing graph kernels over event graphs. The extractive multi-document summarization model selects sentences based on the relevance of the individual event mentions and the temporal structure of events. Experimental evaluation shows that our retrieval model significantly outperforms well-established retrieval models on event-oriented test collections, while the summarization model outperforms competitive models from shared multi-document summarization tasks.  相似文献   

8.
The massive quantity of data available today in the Internet has reached such a huge volume that it has become humanly unfeasible to efficiently sieve useful information from it. One solution to this problem is offered by using text summarization techniques. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or in the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents, trying to avoid the typical problems of this type of summarization: information redundancy and diversity. Such a purpose is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistic similarities and linguistic treatment. The DUC 2002 dataset was used to assess the performance of the proposed system, surpassing DUC competitors by a 50% margin of f-measure, in the best case.  相似文献   

9.
Zhu  Xing  Xu  Qiang  Tang  Minggao  Li  Huajin  Liu  Fangzhou 《Neural computing & applications》2018,30(12):3825-3835

A novel hybrid model composed of least squares support vector machines (LSSVM) and double exponential smoothing (DES) was proposed and applied to calculate one-step ahead displacement of multifactor-induced landslides. The wavelet de-noising and Hodrick-Prescott filter methods were used to decompose the original displacement time series into three components: periodic term, trend term and random noise, which respectively represent periodic dynamic behaviour of landslides controlled by the seasonal triggers, the geological conditions and the random measuring noise. LSSVM and DES models were constructed and trained to forecast the periodic component and the trend component, respectively. Models’ inputs include the seasonal triggers (e.g. reservoir level and rainfall data) and displacement values which are measurable variables in a specific prior time. The performance of the hybrid model was evaluated quantitatively. Calculated displacement from the hybrid model is excellently consistent with actual monitored value. Results of this work indicate that the hybrid model is a powerful tool for predicting one-step ahead displacement of landslide triggered by multiple factors.

  相似文献   

10.
Multi-document summarization is the process of extracting salient information from a set of source texts and present that information to the user in a condensed form. In this paper, we propose a multi-document summarization system which generates an extractive generic summary with maximum relevance and minimum redundancy by representing each sentence of the input document as a vector of words in Proper Noun, Noun, Verb and Adjective set. Five features, such as TF_ISF, Aggregate Cross Sentence Similarity, Title Similarity, Proper Noun and Sentence Length associated with the sentences, are extracted, and scores are assigned to sentences based on these features. Weights that can be assigned to different features may vary depending upon the nature of the document, and it is hard to discover the most appropriate weight for each feature, and this makes generation of a good summary a very tough task without human intelligence. Multi-document summarization problem is having large number of decision parameters and number of possible solutions from which most optimal summary is to be generated. Summary generated may not guarantee the essential quality and may be far from the ideal human generated summary. To address this issue, we propose a population-based multicriteria optimization method with multiple objective functions. Three objective functions are selected to determine an optimal summary, with maximum relevance, diversity, and novelty, from a global population of summaries by considering both the statistical and semantic aspects of the documents. Semantic aspects are considered by Latent Semantic Analysis (LSA) and Non Negative Matrix Factorization (NMF) techniques. Experiments have been performed on DUC 2002, DUC 2004 and DUC 2006 datasets using ROUGE tool kit. Experimental results show that our system outperforms the state of the art works in terms of Recall and Precision.  相似文献   

11.
Most existing research on applying the matrix factorization approaches to query-focused multi-document summarization (Q-MDS) explores either soft/hard clustering or low rank approximation methods. We employ a different kind of matrix factorization method, namely weighted archetypal analysis (wAA) to Q-MDS. In query-focused summarization, given a graph representation of a set of sentences weighted by similarity to the given query, positively and/or negatively salient sentences are values on the weighted data set boundary. We choose to use wAA to compute these extreme values, archetypes, and hence to estimate the importance of sentences in target documents set. We investigate the impact of using the multi-element graph model for query focused summarization via wAA. We conducted experiments on the data of document understanding conference (DUC) 2005 and 2006. Experimental results evidence the improvement of the proposed approach over other closely related methods and many of state-of-the-art systems.  相似文献   

12.
邓箴  包宏 《计算机与应用化学》2012,29(11):1384-1386
提出了一种基于词汇链抽取,文法分析的抽取文本代表词条的多文档摘要生成的方法。通过计算词义相似度构建词汇链,结合词频与位置特征进行文本代表词条成员的选择,将含有词条权值高的句子经过聚类形成多文档文摘句集合,然后进行质心句的抽取和排序,生成多文档文摘。该方法不仅考虑了词汇之间的语义信息,还考虑了词条对文本的代表成度,能够改善文摘句抽取的性能。实验结果表明,与单纯的由关键词确定文摘的方法相比,召回率和准确率都有不少的提高。  相似文献   

13.
多文档文摘评价标准的研究   总被引:1,自引:0,他引:1       下载免费PDF全文
多文档自动文摘是自然语言处理领域的一个重要研究方向。但对于多文档文摘的评价方法仍然存在方法单一,缺乏统一标准的问题。针对这些问题,就多文档文摘信息覆盖度尝试性地提出一套标准。该标准将涉及以下几个重要参数:改进BLEU参数(改进召回率),与原文档有效词覆盖度,高频词覆盖度。实验证明利用该标准能准确反映出文摘系统在信息覆盖度方面的优劣,并且接近人工评价结果。  相似文献   

14.
The volume of text data has been growing exponentially in the last years, mainly due to the Internet. Automatic Text Summarization has emerged as an alternative to help users find relevant information in the content of one or more documents. This paper presents a comparative analysis of eighteen shallow sentence scoring techniques to compute the importance of a sentence in the context of extractive single- and multi-document summarization. Several experiments were made to assess the performance of such techniques individually and applying different combination strategies. The most traditional benchmark on the news domain demonstrates the feasibility of combining such techniques, in most cases outperforming the results obtained by isolated techniques. Combinations that perform competitively with the state-of-the-art systems were found.  相似文献   

15.

Service availability plays a vital role on computer networks, against which Distributed Denial of Service (DDoS) attacks are an increasingly growing threat each year. Machine learning (ML) is a promising approach widely used for DDoS detection, which obtains satisfactory results for pre-known attacks. However, they are almost incapable of detecting unknown malicious traffic. This paper proposes a novel method combining both supervised and unsupervised algorithms. First, a clustering algorithm separates the anomalous traffic from the normal data using several flow-based features. Then, using certain statistical measures, a classification algorithm is used to label the clusters. Employing a big data processing framework, we evaluate the proposed method by training on the CICIDS2017 dataset and testing on a different set of attacks provided in the more up-to-date CICDDoS2019. The results demonstrate that the Positive Likelihood Ratio (LR+) of our method is approximately 198% higher than the ML classification algorithms.

  相似文献   

16.
A hybrid machine learning approach to network anomaly detection   总被引:3,自引:0,他引:3  
Zero-day cyber attacks such as worms and spy-ware are becoming increasingly widespread and dangerous. The existing signature-based intrusion detection mechanisms are often not sufficient in detecting these types of attacks. As a result, anomaly intrusion detection methods have been developed to cope with such attacks. Among the variety of anomaly detection approaches, the Support Vector Machine (SVM) is known to be one of the best machine learning algorithms to classify abnormal behaviors. The soft-margin SVM is one of the well-known basic SVM methods using supervised learning. However, it is not appropriate to use the soft-margin SVM method for detecting novel attacks in Internet traffic since it requires pre-acquired learning information for supervised learning procedure. Such pre-acquired learning information is divided into normal and attack traffic with labels separately. Furthermore, we apply the one-class SVM approach using unsupervised learning for detecting anomalies. This means one-class SVM does not require the labeled information. However, there is downside to using one-class SVM: it is difficult to use the one-class SVM in the real world, due to its high false positive rate. In this paper, we propose a new SVM approach, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning and low false alarm capability, similar to that of a supervised SVM approach.We use the following additional techniques to improve the performance of the proposed approach (referred to as Anomaly Detector using Enhanced SVM): First, we create a profile of normal packets using Self-Organized Feature Map (SOFM), for SVM learning without pre-existing knowledge. Second, we use a packet filtering scheme based on Passive TCP/IP Fingerprinting (PTF), in order to reject incomplete network traffic that either violates the TCP/IP standard or generation policy inside of well-known platforms. Third, a feature selection technique using a Genetic Algorithm (GA) is used for extracting optimized information from raw internet packets. Fourth, we use the flow of packets based on temporal relationships during data preprocessing, for considering the temporal relationships among the inputs used in SVM learning. Lastly, we demonstrate the effectiveness of the Enhanced SVM approach using the above-mentioned techniques, such as SOFM, PTF, and GA on MIT Lincoln Lab datasets, and a live dataset captured from a real network. The experimental results are verified by m-fold cross validation, and the proposed approach is compared with real world Network Intrusion Detection Systems (NIDS).  相似文献   

17.

Accurate and real-time product demand forecasting is the need of the hour in the world of supply chain management. Predicting future product demand from historical sales data is a highly non-linear problem, subject to various external and environmental factors. In this work, we propose an optimised forecasting model - an extreme learning machine (ELM) model coupled with the Harris Hawks optimisation (HHO) algorithm to forecast product demand in an e-commerce company. ELM is preferred over traditional neural networks mainly due to its fast computational speed, which allows efficient demand forecasting in real-time. Our ELM-HHO model performed significantly better than ARIMA models that are commonly used in industries to forecast product demand. The performance of the proposed ELM-HHO model was also compared with traditional ELM, ELM auto-tuned using Bayesian Optimisation (ELM-BO), Gated Recurrent Unit (GRU) based recurrent neural network and Long Short Term Memory (LSTM) recurrent neural network models. Different performance metrics, i.e., Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE) and Mean Percentage Error (MPE) were used for the comparison of the selected models. Horizon forecasting at 3 days and 7 days ahead was also performed using the proposed approach. The results revealed that the proposed approach is superior to traditional product demand forecasting models in terms of prediction accuracy and it can be applied in real-time to predict future product demand based on the previous week’s sales data. In particular, considering RMSE of forecasting, the proposed ELM-HHO model performed 62.73% better than the statistical ARIMA(7,1,0) model, 40.73% better than the neural network based GRU model, 34.05% better than the neural network based LSTM model, 27.16% better than the traditional non-optimised ELM model with 100 hidden nodes and 11.63% better than the ELM-BO model in forecasting product demand for future 3 months. The novelty of the proposed approach lies in the way the fast computational speed of ELMs has been combined with the accuracy gained by tuning hyperparameters using HHO. An increased number of hyperparameters has been optimised in our methodology compared to available models. The majority of approaches to improve the accuracy of ELM so far have only focused on tuning the weights and the biases of the hidden layer. In our hybrid model, we tune the number of hidden nodes, the number of input time lags and even the type of activation function used in the hidden layer in addition to tuning the weights and the biases. This has resulted in a significant increase in accuracy over previous methods. Our work presents an original way of performing product demand forecasting in real-time in industry with highly accurate results which are much better than pre-existing demand forecasting models.

  相似文献   

18.
Multimedia Tools and Applications - In recent years, one of the forgery detection methods is the Copy-Move Forgery Detection (CMFD), which is mostly used approach amongst all the forgery...  相似文献   

19.
It is very important for financial institutions to develop credit rating systems to help them to decide whether to grant credit to consumers before issuing loans. In literature, statistical and machine learning techniques for credit rating have been extensively studied. Recent studies focusing on hybrid models by combining different machine learning techniques have shown promising results. However, there are various types of combination methods to develop hybrid models. It is unknown that which hybrid machine learning model can perform the best in credit rating. In this paper, four different types of hybrid models are compared by ‘Classification + Classification’, ‘Classification + Clustering’, ‘Clustering + Classification’, and ‘Clustering + Clustering’ techniques, respectively. A real world dataset from a bank in Taiwan is considered for the experiment. The experimental results show that the ‘Classification + Classification’ hybrid model based on the combination of logistic regression and neural networks can provide the highest prediction accuracy and maximize the profit.  相似文献   

20.
Journal of Intelligent Information Systems - Nowadays, data scientists prefer “easy” high-level languages like R and Python, which accomplish complex mathematical tasks with a few lines...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号