Similar literature
 20 similar documents found
1.
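Data mining technology   Total citations: 13, self-citations: 0, citations by others: 13
Data mining technology is currently a hot research topic in the database and artificial intelligence fields. To give readers a broad overview of the state of the field, and drawing on a large body of literature, this paper first briefly introduces the overall state of data mining research in China and abroad, including the background from which data mining emerged, its application areas, its classification, and the main mining techniques. Drawing on the authors' own research, it then discusses in more detail the mining of association rules, the mining of classification rules, the mining of outlier data, and cluster analysis. It introduces the main research results in association rule mining, points out the shortcomings of the measures used to evaluate association rules and how they can be improved, and proposes a method for evaluating the accuracy of classification patterns. Finally, it describes applications of data mining in scientific research, financial investment, marketing, insurance, manufacturing, and communication network management, and gives an outlook on the technology's application prospects.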

2.
This work explores the use of characterization features extracted from breast-mass contours obtained by automated segmentation methods for classifying masses in mammograms according to their diagnosis (benign or malignant). Two sets of mass contours were obtained via two segmentation methods (a dynamic-programming-based method and a constrained region-growing method), and simplified versions of these contours (modeling the contours as ellipses) were employed to extract a set of six features designed to characterize mass margins (contrast between foreground region and background region, coefficient of variation of edge strength, two measures of the fuzziness of mass margins, a measure of spiculation based on relative gradient orientation, and a measure of spiculation based on edge-signature information). Three popular classifiers (a Bayesian classifier, Fisher's linear discriminant, and a support vector machine) were then used to predict the diagnosis of a set of 349 masses based on each of these features and on some combinations of them. The systems (each consisting of a segmentation method, a feature set, and a classifier) were compared with each other in terms of their performance on the diagnosis of the set of breast masses. It was found that, although there was a percent difference of about 14% in the average segmentation quality between methods, this translated into an average percent difference of only 4% in classification performance. It was also observed that the spiculation feature based on edge-signature information was distinctly better than the rest of the features, although it is not very robust to changes in the quality of the segmentation. All systems were more effective at predicting the diagnosis of benign masses than that of malignant masses, resulting in low sensitivity and high specificity values (e.g. 0.6 and 0.8, respectively), since the positive class in the classification experiments is the set of malignant masses. It was concluded that features extracted from automated contours can contribute to the diagnosis of breast masses in screening programs by correctly identifying a majority of benign masses.
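
The sensitivity and specificity figures quoted above can be illustrated with a minimal sketch. It assumes binary labels with malignant as the positive class, as the abstract states; the function name `sensitivity_specificity` and the toy labels are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): fraction of positive cases found.
    Specificity = TN / (TN + FP): fraction of negative cases found.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that never occurs in y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels: 1 = malignant (positive), 0 = benign.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # prints: 0.6 0.8
```

With malignant as the positive class, a classifier that is better at recognizing benign masses misses more malignant ones, which is why sensitivity comes out lower than specificity in this toy case.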

3.
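Although techniques for diagnosing and treating breast cancer continue to advance, early detection of breast lesions remains the main way to stop the disease. The presence of a mass in breast tissue is an important sign of breast cancer. This study investigates the classification of benign and malignant tumors using auto-associative neural networks and multilayer perceptrons. The experimental results show that, with training and testing on the DDSM database, the CAD system achieved a high sensitivity (TP) and a low false-positive rate (FP); a test classification rate of 91% was obtained at a training classification rate of 100%; and the area under the ROC curve reached a maximum of about 0.948.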

4.

The incidence of breast cancer in women has increased significantly in recent years. Mammography (breast X-ray imaging) is considered the most effective, low-cost, and reliable method for early detection of breast cancer. Although general rules for differentiating between benign and malignant breast lesions exist, only 15–30% of masses referred for surgical biopsy are actually malignant. A physician's experience in detecting breast cancer can be assisted by computerized feature extraction and classification algorithms. A computer-aided classification system was used to help diagnose abnormalities faster than a traditional screening program, without the drawbacks attributable to human factors. In this work, an approach is proposed to develop a computer-aided classification system for cancer detection from digital mammograms. The proposed system consists of three major steps. The first step is extraction of a region of interest (ROI) of 256 × 256 pixels. The second step is feature extraction; we used a set of 26 features and found that they are capable of differentiating between normal and cancerous breast tissue, which helps minimize the classification error. The third step is classification; we used association rule mining to classify tissue as normal or cancerous. The proposed system was shown to have great potential for cancer detection from digital mammograms.


5.
In this paper we propose a machine learning approach to classify melanocytic lesions as malignant or benign using dermoscopic images. The lesion features used in the classification framework are inspired by the border, texture, color, and structures used in popular dermoscopy algorithms that clinicians perform by visual inspection. The main weakness of dermoscopy algorithms is the selection of a set of weights and thresholds that appear not to be robust or independent of population. Machine learning techniques make it possible to overcome this issue. The proposed method is designed and tested on an image database composed of 655 images of melanocytic lesions: 544 benign lesions and 111 malignant melanomas. After an image pre-processing stage that includes hair-removal filtering, each image is automatically segmented using well-known image segmentation algorithms. Each lesion is then characterized by a feature vector that contains shape, color, and texture information, as well as local and global parameters. The detection of particular dermoscopic patterns associated with melanoma is also addressed, and its inclusion in the classification framework is discussed. The learning and classification stage is performed using AdaBoost with C4.5 decision trees. For the automatically segmented database, classification delivered a specificity of 77% for a sensitivity of 90%. The same classification procedure applied to images manually segmented by an experienced dermatologist yielded a specificity of 85% for a sensitivity of 90%.

6.
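Zhang Cheng, Zheng Cheng. 《微机发展》 (Microcomputer Development), 2007, 17(7): 60-62
Association rules are an important topic in data mining research. Many algorithms assume that the underlying associations in the data are stable over time. In real-world domains, however, data have their own characteristics, and associations can change dramatically over time. Existing data mining algorithms do not take changes in associations into account, which leads to severe performance degradation, especially when the mined association rules are used for classification and prediction. Although mining changes in associations is an important problem, because the future must be predicted from past historical data, existing data mining algorithms are not suited to the task. This paper introduces a fuzzy data mining algorithm to discover changes in association rules over time. Based on the mined fuzzy rules, it is possible to predict how association rules will change in the future. Experiments demonstrate the effectiveness of the algorithm.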

7.
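Compared with natural images, X-ray breast images have a rather monotonous color range, the edges of breast masses are blurred, and benign and malignant masses have similar textures and are hard to tell apart. This paper proposes a method, based on a deep convolutional learning network, suited to classifying X-ray images of breast masses. The main contributions are as follows: (1) a scheme for extracting feature maps of breast images at multiple convolution granularities, in which convolution kernels of different sizes extract feature maps at different granularities to obtain richer breast image features; (2) embedding the discrimination method in the optimization model, that is, designing a new objective function that amplifies classification errors differentially, thereby increasing the penalty for misclassification and guiding the model toward the minimum classification error. With training on a public breast X-ray image data set and cross-validation, the AUC reaches 0.7129, which is better than the best existing breast image classification method, and the method shows strong robustness.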

8.
Image mining deals with the extraction of implicit knowledge, image-data relationships, and other patterns not explicitly stored in images. It is an extension of data mining to the image domain. The main objective of this paper is to apply image mining to breast mammograms in order to classify and detect cancerous tissue. Mammogram images can be classified into normal, benign, and malignant classes. A total of 26 features, including histogram intensity features and gray-level co-occurrence matrix features, are extracted from mammogram images. A hybrid approach to feature selection is proposed, which removes approximately 75% of the features, and a new decision tree is used for classification. Most notably, the branch and bound algorithm used for feature selection provides the optimal features, and it has not previously been applied to gray-level co-occurrence matrix feature selection from mammograms. Experiments were carried out on a data set of 300 images of different types taken from MIAS, with the aim of improving accuracy by generating a minimum number of rules that cover more patterns. The accuracy obtained by this method is approximately 97.7%, which is highly encouraging.

9.
Association rule mining and classification are important tasks in data mining. Using association rules has proved to be a good approach to classification. In this paper, we propose an accurate classifier based on class association rules (CARs), called CAR-IC, which introduces a new pruning strategy for mining CARs that allows building specific rules with high confidence. Moreover, we propose and prove three propositions that support the use of a confidence threshold for computing rules that avoids ambiguity at the classification stage. This paper also presents a new way of ordering the set of CARs based on rule size and confidence. Finally, we define a new coverage strategy, which reduces the number of non-covered unseen transactions during the classification stage. Results over several datasets show that CAR-IC beats the best CAR-based classifiers reported in the literature.

10.
Association rule mining is a data mining technique for discovering useful and novel patterns or relationships in databases. These rules are simple to infer and intuitive, and they can easily be used for classification in any domain that requires explanation of, and investigation into, how the classification works. Examples of such areas are medicine, agriculture, and education. For such a system to find wide adoption, it should give output that is correct and comprehensible. The amount of data has been growing very fast, and so has the search space of these problems, so traditional methods need to change. This paper discusses a rule mining classifier called DA-AC (dynamic adaptive-associative classifier), which is based on a Dynamic Particle Swarm Optimizer. Owing to its seeding method, exemplar selection, adaptive parameters, dynamic reconstruction of regions, and velocity update, it avoids premature convergence and provides a better value in every dimension. Quality evaluation is done both for individual rules and for entire rulesets. Experiments were conducted over fifteen benchmark datasets to evaluate the performance of the proposed algorithm in comparison with six other state-of-the-art non-associative classifiers and eight associative classifiers. Results demonstrate the competitive performance of the proposed DA-AC with predictive accuracy and number of mined patterns as the evaluation parameters. The method was then applied to predict the life expectancy of postoperative thoracic surgery patients.

11.
Mammography (breast X-ray) is considered the most effective, low-cost, and reliable method for early detection of breast cancer. Although general rules for differentiating between benign and malignant breast lesions exist, only 15–30% of masses referred for surgical biopsy are actually malignant. In this work, an approach is proposed to develop a computer-aided classification system for cancer detection from digital mammograms. The proposed system consists of three major steps. The first step is extraction of a region of interest (ROI) of 256 × 256 pixels. The second step is feature extraction; we used a set of 19 GLCM and GLRLM features, and these 19 features, extracted from the gray-level run-length matrix and the gray-level co-occurrence matrix, could distinguish malignant masses from benign masses with an accuracy of 96.7%. Further analysis was carried out using only 12 of the 19 features: 5 extracted from the GLCM and 7 extracted from the GLRL matrix. The 12 selected features are Energy, Inertia, Entropy, Maxprob, Inverse, SRE, LRE, GLN, RLN, LGRE, HGRE, and SRLGE; ARM with these 12 features as predictors can distinguish malignant from benign mass images with an accuracy of 93.6%. Further analysis showed that the area under the receiver operating characteristic curve was 0.995, which indicates that the classification accuracy is good or very good. Based on these data, it was concluded that texture analysis based on GLCM and GLRLM could distinguish malignant images from benign images with considerably good results. The third step is classification; we used a decision tree on image content to classify masses as normal or cancerous. The proposed system was shown to have great potential for cancer detection from digital mammograms.
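
Four of the GLCM features named above (Energy, Inertia, Entropy, Maxprob) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give its formulas, so the commonly used textbook definitions are assumed here (Energy as the sum of squared probabilities, Entropy in bits), and the helper names `glcm` and `glcm_features`, the single pixel offset, and the 2 × 2 toy image are hypothetical.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    Returns a dict mapping (gray_i, gray_j) to the probability that a
    pixel with level gray_i has a neighbor at (dx, dy) with level gray_j.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Four texture features computed from a normalized GLCM (assumed definitions)."""
    return {
        "energy": sum(v * v for v in p.values()),
        "inertia": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "entropy": -sum(v * math.log2(v) for v in p.values()),
        "maxprob": max(p.values()),
    }


# Hypothetical 2 x 2 image with two gray levels.
features = glcm_features(glcm([[0, 0], [1, 1]]))
print(features)
```

In practice a library routine would normally be used, and the GLCM would be accumulated over several offsets and angles; the single horizontal offset here keeps the sketch short.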

12.
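A fast association rule mining algorithm based on distributed databases   Total citations: 8, self-citations: 0, citations by others: 8
Association rule discovery is an important research topic in data mining. As the volume of data in databases keeps growing, association rule discovery over large data sets is receiving increasing attention, and distributed association rule discovery is an effective way to address the problem. In association rule mining algorithms for distributed databases, the time cost comes mainly from two sources: (1) determining the frequent itemsets; and (2) the volume of network communication. To address the first, this paper proposes a binary-form algorithm for generating candidate frequent itemsets and computing their support counts. The algorithm needs only logical operations such as OR, AND, and XOR on the objects being mined, which significantly reduces the difficulty of implementation. Combining this algorithm with the DMA algorithm yields an improved algorithm, FDMA. Theoretical analysis and experimental results show that FDMA greatly improves the efficiency of association rule mining and that the algorithm is effective and feasible.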
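
The idea of counting support with logical operations on a binary encoding can be sketched as follows. This is not the FDMA algorithm, whose details the abstract does not give; it is one plausible encoding, assumed for illustration, in which each transaction and each itemset is an integer bitmask. The names `encode` and `support_count` and the toy data are hypothetical.

```python
def encode(items, index):
    """Encode a collection of items as an integer bitmask (one bit per item)."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]  # OR sets the bit for this item
    return mask


def support_count(transaction_masks, itemset_mask):
    """Count transactions that contain every item in the itemset.

    A transaction contains the itemset exactly when ANDing the two masks
    leaves the itemset mask unchanged.
    """
    return sum(
        1 for t in transaction_masks if (t & itemset_mask) == itemset_mask
    )


# Hypothetical toy database over three items.
index = {"a": 0, "b": 1, "c": 2}
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
masks = [encode(t, index) for t in transactions]

count_ab = support_count(masks, encode(["a", "b"], index))
print(count_ab)  # prints: 2
```

Because containment reduces to one AND and one comparison per transaction, no set objects need to be built or compared during counting.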

13.
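Mammography is the main method for early detection of breast cancer, but its results are limited to a large extent by the radiologist's clinical diagnostic experience. Research on automatic classification of mammograms with convolutional neural networks can offer radiologists a second opinion for clinical diagnosis; however, the edges of breast cancer masses are blurred and the features of benign and malignant masses differ only slightly, so the classification task is very challenging. To improve the accuracy of mammogram classification, an improved and optimized algorithm based on the Xception model is proposed: the residual connection module of the model is improved, and a Squeeze-and-excitation (SE) attention mechanism is embedded to optimize the model. The optimized Xception model is combined with transfer learning to extract mammogram features, and the fully connected layers are optimized for image classification. Experiments use the public breast cancer image database CBIS-DDSM, with mammograms automatically classified as benign or malignant. The experimental results show that the method effectively improves the model's classification performance, with accuracy and AUC reaching 97.46% and 99.12%, respectively.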

14.
15.
Association rule mining is an important topic in data mining. The problem is to discover all (or almost all) associations among items in the transaction database that satisfy some user-specified constraints. Usually, the constraints are related to minimal support and minimal confidence. Class association rules (CARs) are a special type of association rule that can be applied to classification problems. Previous research showed that classification based on association rules has higher accuracy than can be achieved with an inductive learning algorithm or C4.5. As such, many methods have been proposed for mining CARs, although these use batch processing. However, datasets often change, with records added and/or deleted, and consequently updating CARs is a challenging problem. This paper proposes an efficient method for updating CARs when records are deleted. First, we use an MECR-tree to store nodes for the original dataset. The information in the nodes of this tree is updated based on the deleted records. Second, the concept of pre-large itemsets is used to avoid rescanning the original dataset. Finally, we propose an algorithm to efficiently update and generate CARs. We also analyze the time complexity to show the efficiency of our proposed algorithm. The experimental results show that the proposed method outperforms mining CARs from the dataset after record deletion.
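
The minimal-support and minimal-confidence constraints mentioned above rest on two standard measures, sketched here for a rule A → B. This is a generic illustration, not the MECR-tree method of the paper; the function names, the toy transactions, and the rule itself are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B together) / support(A) for the rule A -> B."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_a


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup = support(transactions, {"bread", "milk"})        # 2 of 4 transactions
conf = confidence(transactions, {"bread"}, {"milk"})  # 0.5 / 0.75
print(sup, conf)
```

A rule is reported only if both values meet the user-specified thresholds; for a class association rule, the consequent is restricted to a class label.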

16.
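Knowledge discovery in spatial data based on association rules and its implementation   Total citations: 4, self-citations: 0, citations by others: 4
Spatial data mining is the extraction of implicit knowledge, spatial relationships, and other patterns stored in a spatial database. Spatial association rules are an important form of spatial data mining, and using them is a good way to turn the data in a spatial database into knowledge. Building on an analysis of spatial association rules, this paper uses a progressive-refinement mining algorithm based on association rules to derive knowledge from a spatial database, and demonstrates the feasibility of the method with a worked example.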

17.
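Association rule mining is a classic data mining method, and more and more enterprises regard it as an indispensable tool for strategic analysis. Current association rule mining methods produce too many rules, which makes them hard for users to understand in practice, so studying methods for reducing association rule sets has practical value. This paper studies how the prime attributes contained in the keys of a database schema affect the rules produced by Apriori-based association rule mining: partial functional dependencies cause redundant information to appear frequently in the data set being mined and produce association rules of no practical value, and identifying and eliminating such rules reduces the rule set. Finding all prime attributes, like finding all candidate keys, is an NP-hard problem, so an algorithm is proposed that determines prime attributes by verification against a single candidate key. On this basis, a reduction algorithm for association rule mining based on prime-attribute determination is designed and implemented, and the concluding experiments verify its effectiveness.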

18.
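A new multidimensional association rule mining algorithm   Total citations: 12, self-citations: 0, citations by others: 12
Association rules are an important topic in data mining. This paper presents a multidimensional association rule mining algorithm that combines a genetic algorithm with an ant algorithm. The new algorithm exploits the good global search capability that genetic and ant algorithms share, and overcomes the weak local search capability of genetic algorithms and the slow search speed of ant algorithms. Experimental results show that the new algorithm performs well in mining multidimensional association rules with sparse characteristics.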

19.
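Classification rule mining based on an incremental genetic algorithm   Total citations: 12, self-citations: 1, citations by others: 11
Classification knowledge discovery is an important data mining task, and research into high-performance, highly scalable classification algorithms is currently one of the main problems facing data mining. Combining genetic algorithms with the classification rule mining problem, this paper proposes an incremental classification rule mining method based on a genetic algorithm and demonstrates its effectiveness with a worked example. It also proposes a classification rule reduction method that makes the mined results more concise and easier to understand.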

20.
Association rule mining is one of the most popular data analysis methods for discovering associations within data. Association rule mining algorithms have been applied to various datasets because of their practical usefulness. Little attention has been paid, however, to how association mining techniques can be applied to analyze questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all the different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would never have been found by previous mining algorithms.
