Similar literature
 20 similar documents found (search time: 15 ms)
1.
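The main task of chunk parsing is to identify and delimit chunks, which simplifies syntactic parsing to some extent. To address the difficulties of chunking long sentences, this paper proposes a chunking method based on a divide-and-conquer strategy. The method first identifies the maximal noun phrases in a sentence and, based on the result, decomposes the sentence into maximal-noun-phrase parts and a sentence-frame part. It then analyzes each type of unit with a different model and combines the results to complete the chunking process. Decomposing the whole sentence into smaller chunking units reduces sentence complexity. Experiments on the Penn Chinese Treebank CTB4 data set show an average F1 of 91.79% across chunk types, better than other current chunking methods.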

2.
Mining With Noise Knowledge: Error-Aware Data Mining   Total citations: 1 (self-citations: 0, citations by others: 1)
Real-world data mining deals with noisy information sources where data collection inaccuracy, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are to adopt data cleansing approaches to enhance data consistency, or simply to treat noisy data as quality sources and feed them into the data mining algorithms. Either way may substantially sacrifice mining performance. In this paper, we consider an error-aware (EA) data mining design, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume that such noise knowledge is available in advance, and we propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore original data distributions, which are further used to rectify the model built from noise-corrupted data. We materialize this concept in the proposed EA naive Bayes classification algorithm. Experimental comparisons on real-world datasets demonstrate the effectiveness of this design.
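The idea of using noise knowledge to restore an original distribution can be illustrated with a small sketch. This is not the paper's EA naive Bayes algorithm. It assumes one categorical attribute, a known noise rate, and uniform noise (a corrupted value is replaced by any other value with equal probability); the function name `restore_distribution` is hypothetical.

```python
import numpy as np

def restore_distribution(observed, noise_rate):
    """Estimate the clean value distribution of one categorical attribute.

    Assumption (not from the paper): with probability `noise_rate` a value
    is replaced, uniformly at random, by one of the other k - 1 values.
    """
    q = np.asarray(observed, dtype=float)
    k = q.size
    # M[i, j] = P(observed value j | true value i) under the assumed noise model
    M = np.full((k, k), noise_rate / (k - 1))
    np.fill_diagonal(M, 1.0 - noise_rate)
    # The observed distribution satisfies q = M^T p, so solve for the clean p.
    p = np.linalg.solve(M.T, q)
    # Sampling error can push estimates slightly below zero; clip and renormalize.
    p = np.clip(p, 0.0, None)
    return p / p.sum()

restored = restore_distribution([0.59, 0.24, 0.17], noise_rate=0.2)
```

Under these assumptions the restored probabilities could stand in for the raw frequency estimates that a naive Bayes classifier would otherwise compute from the corrupted data. The linear system is singular when `1 - noise_rate` equals `noise_rate / (k - 1)`, in which case the observations carry no information about the true values.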

3.
Attack mitigation schemes actively throttle attack traffic generated in distributed denial-of-service (DDoS) attacks. This paper presents attack diagnosis (AD), a novel attack mitigation scheme that adopts a divide-and-conquer strategy. AD combines the concepts of pushback and packet marking, and its architecture is in line with the ideal DDoS attack countermeasure paradigm: attack detection is performed near the victim host and packet filtering is executed close to the attack sources. AD is a reactive defense mechanism that is activated by a victim host after an attack is detected. By instructing its upstream routers to mark packets deterministically, the victim can trace back one attack source and command an AD-enabled router close to the source to filter the attack packets. This process isolates one attacker and throttles it, and is repeated until the attack is mitigated. We also propose an extension to AD called parallel attack diagnosis (PAD) that is capable of throttling traffic coming from a large number of attackers simultaneously. AD and PAD are analyzed and evaluated using the Skitter Internet map, Lumeta's Internet map, and the 6-degree complete tree topology model. Both schemes are shown to be robust against IP spoofing and to incur low false positive ratios.

4.
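周庆华, 齐治昌. 《计算机科学》 (Computer Science), 2002, 29(Z1): 244-245
1. Introduction. The development of e-commerce has greatly changed how enterprises operate and has done much to improve their competitiveness. Because of the nature of the Internet, e-commerce lets an enterprise extend its customer base from a local area to the whole world with ease and at very low cost. At the same time, because the enterprise and its customers communicate over the Internet rather than face to face, some weaknesses of e-commerce are also obvious.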

5.
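Building Data Mining Solutions with SQL Server 2000   Total citations: 3 (self-citations: 0, citations by others: 3)
Microsoft SQL Server 2000 is the first release to include data mining features. Microsoft's data mining solution is based on the OLE DB for Data Mining specification. OLE DB is an industry standard defined by Microsoft and supported by a range of data mining ISVs. The specification introduces a new SQL-like language for data mining that makes it easier for database developers to build data mining applications. This paper presents an example of building a data mining application with SQL Server 2000.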

6.
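A Study on Selecting the Multidimensional Data Mining Space with Neural Networks   Total citations: 1 (self-citations: 0, citations by others: 1)
Multidimensional data mining is an important decision-support technique built on data warehouse systems. Unfortunately, the complexity of multidimensional data makes such mining inefficient and impractical. By analyzing the multidimensional data mining model, this paper shows that choosing the mining space is the step that decides whether mining succeeds. On that basis it proposes a neural network model for selecting the mining space and demonstrates on a practical example that the model finds the correct mining space. Finally, the paper discusses the model's strengths and weaknesses.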

7.
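High quality and efficiency in data analysis and prediction are very important, especially in complex data environments. Building the mining model with the analytic hierarchy process (AHP Construct Mining Component, ACMC) makes data mining more intuitive, which is a clear advantage. The ACMC strategy also extends the original analytic hierarchy concept. This paper studies and analyzes the practicality of ACMC in complex data environments.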

8.
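Raw data sets contain a large amount of noisy data and are very large, so mining association rules from them directly hurts both accuracy and efficiency. This paper proposes a mining strategy that first clusters the raw data and then extracts association rules. The strategy reduces interference from noisy data to some extent, eliminates redundant attributes of the data objects, and improves the effectiveness of rule mining.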

9.
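This paper proposes an uncertain intervention analysis model. The main contributions are: (1) an uncertain surveillance model for analyzing multidimensional uncertain data; (2) an uncertain intervention strategy based on uncertain surveillance points, together with a mining and evaluation algorithm; (3) extensive experimental comparison of the two proposed algorithms on real and simulated data, verifying that the proposed intervention strategy evaluation and optimization algorithm achieves high accuracy, is three orders of magnitude more efficient than the naive method, and is suitable for large-scale intervention evaluation in practical systems.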

10.
In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using $\Delta \mathrm{BIC}$, a widely adopted distance measure between two audio segments. We compare our approaches to three popular distance-based approaches, namely, Chen and Gopalakrishnan's window-growing-based approach, Siegler's fixed-size sliding window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
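The recursive partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the paper's three approaches: it uses the common full-covariance Gaussian form of ΔBIC, splits a window at the point of maximum ΔBIC when that value is positive, and recurses on both halves. The names `delta_bic` and `find_changes`, the `margin` of 20 frames, and the penalty weight `lam` are all hypothetical choices.

```python
import numpy as np

def delta_bic(x, i, lam=1.0):
    """Delta-BIC for splitting feature frames x (n x d) at index i.

    Positive values favor modeling x[:i] and x[i:] as two Gaussians,
    i.e. a change point at i.
    """
    n, d = x.shape

    def logdet(seg):
        # Maximum-likelihood (biased) full covariance of the segment
        cov = np.cov(seg, rowvar=False, bias=True).reshape(d, d)
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra mean vector and covariance matrix
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet(x) - i * logdet(x[:i]) - (n - i) * logdet(x[i:]))
    return gain - lam * penalty

def find_changes(x, start=0, end=None, margin=20, lam=1.0):
    """Recursively split x[start:end] wherever the best Delta-BIC is positive."""
    if end is None:
        end = len(x)
    if end - start < 2 * margin:
        return []  # too short to estimate two covariances reliably
    window = x[start:end]
    candidates = range(margin, len(window) - margin + 1)
    scores = [delta_bic(window, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    if scores[best] <= 0:
        return []  # the window looks homogeneous
    split = start + candidates[best]
    return (find_changes(x, start, split, margin, lam)
            + [split]
            + find_changes(x, split, end, margin, lam))
```

For example, calling `find_changes` on a feature matrix whose first and second halves are drawn from Gaussians with clearly different means should return a change point near the boundary. Raising `lam` makes the detector more conservative.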

11.
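A Traditional Chinese Medicine Data Mining Platform Based on the Strategy Pattern   Total citations: 1 (self-citations: 0, citations by others: 1)
With the development of data mining technology and the steady progress of informatization in traditional Chinese medicine (TCM), many data mining methods have been applied to TCM research. This paper studies the use of the strategy pattern, an object-oriented software design pattern, in designing and developing a research software platform for data mining, and outlines the platform design. On this basis it proposes an approach to TCM data mining research: encapsulate TCM problems (data) and data mining methods (algorithms) behind a uniform interface, so that different mining methods can be tried on one class of TCM problem and one mining method can be applied to different TCM problems. A TCM data mining platform built on this approach is used for data mining research in TCM-related fields.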
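A minimal sketch of the strategy pattern as described above, in which mining algorithms sit behind one interface so they can be swapped over the same data. All class names (`MiningStrategy`, `ItemFrequency`, `PairCooccurrence`, `MiningPlatform`) and the sample records are hypothetical; the abstract does not describe the platform's actual classes.

```python
from abc import ABC, abstractmethod
from collections import Counter
from itertools import combinations

class MiningStrategy(ABC):
    """Uniform interface that every mining algorithm implements."""

    @abstractmethod
    def mine(self, records):
        """Return a mapping from patterns to counts for the given records."""

class ItemFrequency(MiningStrategy):
    """Counts in how many records each item appears."""

    def mine(self, records):
        return Counter(item for record in records for item in set(record))

class PairCooccurrence(MiningStrategy):
    """Counts in how many records each unordered pair of items appears together."""

    def mine(self, records):
        return Counter(
            pair
            for record in records
            for pair in combinations(sorted(set(record)), 2)
        )

class MiningPlatform:
    """Context object: holds a strategy and delegates the mining call to it."""

    def __init__(self, strategy):
        self.strategy = strategy

    def run(self, records):
        return self.strategy.mine(records)

# Hypothetical records, e.g. the items listed in each prescription
records = [["herb_a", "herb_b"], ["herb_a", "herb_c"], ["herb_a", "herb_b"]]
frequencies = MiningPlatform(ItemFrequency()).run(records)
pairs = MiningPlatform(PairCooccurrence()).run(records)
```

Because the platform depends only on the `mine` interface, trying a different algorithm on the same problem means passing a different strategy object, and applying one algorithm to a different problem means passing different records.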

12.
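Because high-voltage substations produce large volumes of alarm data and operation and maintenance staff inspect inefficiently, this paper proposes an alarm data analysis method based on association rules. Using the precursor-successor relationships hidden in large volumes of alarm data, the method analyzes the alarm messages generated after a fault, which helps identify the fault's root cause. Drawing on the grid company's evaluation standard for power equipment state quantities, it combines the relationships among alarm events with the weight coefficients of the inspection units to determine the inspection priority order after frequent system alarms, improving inspection efficiency. Finally, the method is illustrated with nearly one year of alarm data from a transformer at a substation; the causes of that transformer's frequent alarms are analyzed and an improved inspection strategy is proposed.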
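The precursor-successor relationship mentioned above can be illustrated with a simple counting sketch. It is an assumption-laden toy, not the paper's method: an alarm `b` counts as a successor of `a` if it occurs within a fixed time `window` after an occurrence of `a`, and a rule is kept when its confidence reaches `min_conf`. The function name `successor_rules` and the alarm labels are hypothetical.

```python
from collections import Counter

def successor_rules(events, window, min_conf=0.5):
    """Find precursor -> successor alarm rules.

    events: list of (timestamp, alarm_type) tuples sorted by timestamp.
    Returns {(precursor, successor): confidence}.
    """
    occurrences = Counter(alarm for _, alarm in events)
    followed = Counter()
    for idx, (t, a) in enumerate(events):
        seen = set()  # count each successor type once per precursor occurrence
        for t2, b in events[idx + 1:]:
            if t2 - t > window:
                break  # events are sorted, so later ones are out of range too
            if b != a and b not in seen:
                seen.add(b)
                followed[(a, b)] += 1
    rules = {}
    for (a, b), count in followed.items():
        confidence = count / occurrences[a]
        if confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules

# Hypothetical alarm log: (time in seconds, alarm type)
log = [(0, "A"), (1, "B"), (10, "A"), (11, "B"), (20, "A"), (30, "C")]
rules = successor_rules(log, window=2)
```

In this toy log, alarm `B` follows two of the three occurrences of alarm `A` within the window, so the only rule kept is `A -> B`. Ranking inspection units by such rules combined with unit weights, as the abstract describes, would be a further step not shown here.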

13.
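Techniques and Strategies for Data Mining with SQL Server 2000   Total citations: 2 (self-citations: 0, citations by others: 2)
OLE DB is an industry standard defined by Microsoft and supported by many third-party data mining vendors. OLE DB for Data Mining represents data mining model objects as tables and provides a SQL-like DDL that makes it easier for developers to build data mining applications. The paper gives an example of building a data mining application with SQL Server 2000.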

14.
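曹洁. 《电脑开发与应用》 (Computer Development & Applications), 2010, 23(5): 44-46, 49
The main research goal of search strategies for data mining algorithms is to widen the user base of data mining systems so that ordinary users can operate them easily. A case base is built to store expert experience; cases are represented with an object-oriented method; a fuzzy quotient space describes the organization of the case base; and statistical heuristic search is used for case retrieval, which narrows the retrieval range, speeds up problem solving, and improves efficiency and accuracy. The approach is demonstrated on a bank account manager's analysis of customer churn groups, verifying the accuracy of the case-based-reasoning search strategy for data mining algorithms.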

15.
Designing an efficient credit card fraud detection technique is particularly challenging because of two striking characteristics of the data: class imbalance and a non-stationary environment. These issues limit how well machine learning algorithms can detect fraud in credit card datasets. Research in credit card fraud detection has focused on detecting fraudulent transactions by analyzing concepts of normality and abnormality. The balancing strategy designed in this paper can facilitate classification and retrieval problems in this domain. We consider the classification problem in a supervised learning scenario by creating a contrast vector for each customer based on the customer's historical behavior. The proposed model is evaluated on a real credit card dataset provided by FICO and is found to perform significantly better than other state-of-the-art classifiers.

16.
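卢铮松, 赵洁. 《计算机工程》 (Computer Engineering), 2009, 35(20): 81-82
The paper analyzes a large body of billing data accumulated by a heating company for its heating customers. It builds a data warehouse and uses data generalization to construct a data mining model of heating customers, generates association rules with a frequent itemset method, and derives payment-time characteristics with a decision tree algorithm, obtaining the habitual payment periods of users in different districts and of different types. The data mining model is evaluated, and the four billing decision recommendations it yields achieved good results in practice.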

17.
18.
This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.

19.
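陈元, 陈文伟. 《计算机工程》 (Computer Engineering), 2000, 26(10): 9-10, 85
By defining an SQL data mining extractor, the paper designs a framework for the interface between data mining algorithms and database management systems. It uses a common data mining algorithm, naive Bayes, to demonstrate the applicability of this standard SQL data mining extractor.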

20.
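1. Introduction. Knowledge discovery in databases (KDD), sometimes also called data mining (DM), has been applied in many fields and has attracted wide attention. Building a data warehouse is the first step in data mining. A data warehouse is defined as a subject-oriented, integrated, time-variant, non-volatile collection of data used to support organizational decision making. Data warehousing is an important strategy for integrating an organization's heterogeneous information sources and for performing online analytical processing (OLAP) and data mining. Unfortunately, data quality in data mining has not received enough attention. …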
