Similar documents
20 similar documents found (search time: 15 ms)
1.
Disease prediction helps physicians make informed decisions about the type of treatment. Jaundice is the most common condition requiring medical attention in newborns. Although most newborns develop some degree of jaundice, a high bilirubin level puts a newborn at risk of bilirubin encephalopathy and kernicterus, which are rare but still occur in Egypt. This paper presents a new weighted rough set framework for early intervention and prevention of neurological dysfunction and kernicterus, the catastrophic sequelae of neonatal jaundice. The results show that the weighted rough set provides significantly more accurate and reliable predictions than well-known algorithms such as weighted SVM and decision trees, given that physicians have no prior estimate of the probability that jaundice will appear.

2.
Class imbalance limits the performance of most learning algorithms: they cannot cope with large differences between the number of samples in each class, resulting in low predictive accuracy on the minority class. Several papers have proposed algorithms aiming at more balanced performance; however, balancing the per-class recognition accuracies very often harms the global accuracy, since the accuracy on the minority class increases while the accuracy on the majority class decreases. This paper proposes an approach to overcome this limitation: for each classification act, it chooses between the output of a classifier trained on the original skewed distribution and the output of a classifier trained with a learning method that addresses the problem of imbalanced data. The choice is driven by a parameter whose value maximizes, on a validation set, two objective functions: the global accuracy and the per-class accuracies. Experiments on ten public datasets with different proportions between the majority and minority classes show that the proposed approach provides more balanced recognition accuracies than classifiers trained with traditional methods for imbalanced data, as well as higher global accuracy than classifiers trained on the original skewed distribution.
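A minimal sketch of the per-sample switching rule this abstract describes, with synthetic scores standing in for the two classifiers' outputs (all numbers, thresholds, and the specific combined objective are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic validation set: 90 majority (class 0) vs 10 minority (class 1).
y_val = np.array([0] * 90 + [1] * 10)

# Stand-ins for the two classifiers' minority-class scores: the skew-trained
# model under-scores the minority; the imbalance-aware model over-scores the
# majority. All numbers here are illustrative.
score_skew = np.clip(0.15 * y_val + rng.normal(0.2, 0.10, 100), 0, 1)
score_bal = np.clip(0.45 * y_val + rng.normal(0.3, 0.15, 100), 0, 1)
pred_skew = (score_skew >= 0.5).astype(int)
pred_bal = (score_bal >= 0.5).astype(int)

def objective(y_true, y_pred):
    # Average of global accuracy and mean per-class accuracy -- the two
    # criteria the paper's parameter has to balance.
    glob = np.mean(y_true == y_pred)
    per_class = np.mean([np.mean(y_pred[y_true == c] == c) for c in (0, 1)])
    return 0.5 * (glob + per_class)

def combine(t):
    # Per-sample switch: trust the imbalance-aware output only when its
    # minority score clears the threshold t, else keep the skew-trained one.
    return np.where(score_bal >= t, pred_bal, pred_skew)

# Tune the switching parameter on the validation set.
best_t = max(np.linspace(0, 1, 101), key=lambda t: objective(y_val, combine(t)))
combined = combine(best_t)
print(best_t, objective(y_val, combined))
```

Because the sweep includes t = 0 (always trust the balanced model) and t = 1 (almost always trust the skew-trained model), the tuned switch can never do worse on the validation objective than either endpoint.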

3.
MGRS: A multi-granulation rough set
The original rough set model, developed by Pawlak, is mainly concerned with the approximation of sets described by a single binary relation on the universe; in the view of granular computing, classical rough set theory is built on a single granulation. This paper extends Pawlak's rough set model to a multi-granulation rough set model (MGRS), in which set approximations are defined using multiple equivalence relations on the universe. A number of important properties of MGRS are obtained, and it is shown that some properties of Pawlak's rough set theory are special instances of those of MGRS. Moreover, several important measures, such as the accuracy measure α, the quality of approximation γ and the precision of approximation π, are presented and re-interpreted in terms of a classical set-based measure, the Marczewski-Steinhaus metric and the inclusion degree measure. A concept of approximation reduct is introduced to describe the smallest attribute subset that preserves the lower and upper approximations of all decision classes in MGRS. Finally, we discuss how to extract decision rules using MGRS: unlike the "AND" decision rules of Pawlak's rough set model, decision rules in MGRS take the "OR" form. Several pivotal algorithms are also designed, which help apply this theory to practical issues. The multi-granulation rough set model provides an effective approach to problem solving in the context of multiple granulations.
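A small sketch of the optimistic multi-granulation approximations over two equivalence relations, assuming the standard MGRS definitions (the universe, relations, and target set below are toy examples, not the paper's):

```python
def partition(universe, rel):
    """Equivalence classes of x ~ y iff rel(x) == rel(y)."""
    out = {}
    for x in universe:
        out.setdefault(rel(x), set()).add(x)
    return list(out.values())

def mgrs_lower(universe, X, relations):
    # Optimistic MGRS lower approximation: x qualifies if its class under
    # AT LEAST ONE granulation fits inside X -- the source of the "OR" rules.
    parts = [partition(universe, r) for r in relations]
    return {x for x in universe
            if any(next(b for b in p if x in b) <= X for p in parts)}

def mgrs_upper(universe, X, relations):
    # Dual upper approximation: x qualifies if its class under EVERY
    # granulation meets X.
    parts = [partition(universe, r) for r in relations]
    return {x for x in universe
            if all(next(b for b in p if x in b) & X for p in parts)}

U = set(range(8))
X = {0, 1, 2, 5}

def r1(x): return x // 2   # granulation 1: {0,1}, {2,3}, {4,5}, {6,7}
def r2(x): return x % 2    # granulation 2: evens vs odds

print(mgrs_lower(U, X, [r1, r2]))   # {0, 1}
print(mgrs_upper(U, X, [r1, r2]))   # {0, 1, 2, 3, 4, 5}
```

Here {0, 1} lies in the lower approximation because its class under r1 is contained in X, even though neither class under r2 is, which is exactly the disjunctive ("OR") character of MGRS rules.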

4.
When handling class-imbalanced data, borderline instances of the minority class are especially prone to misclassification. To reduce the impact of class imbalance on classifier performance, an adaptive borderline sampling algorithm (AB-SMOTE) is proposed. AB-SMOTE adaptively samples the borderline examples of the minority class, improving the balance and effectiveness of the data set. AB-SMOTE is then combined with a data-cleaning technique to form ABTAdaBoost, an ensemble algorithm based on AdaBoost. ABTAdaBoost consists of three stages: the first applies AB-SMOTE to the training set to reduce its class imbalance; the second uses Tomek links data cleaning to remove noise and the overlapping examples produced by sampling, effectively improving data usability; the third uses the AdaBoost ensemble algorithm to build an ensemble of N weak classifiers. Experiments with J48 decision trees and naive Bayes as base classifiers on 12 UCI data sets show that ABTAdaBoost outperforms the other algorithms compared in predictive performance.
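The first two stages can be illustrated with plain SMOTE interpolation followed by Tomek-link cleaning (a simplified sketch: AB-SMOTE's adaptive borderline weighting and the final AdaBoost stage are omitted, and all data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)

def smote(X_min, n_new, k=3):
    # Plain SMOTE: each synthetic point interpolates between a minority
    # sample and one of its k nearest minority neighbours.
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]          # skip self at column 0
    idx = rng.integers(0, len(X_min), n_new)
    nbr = nn[idx, rng.integers(0, k, n_new)]
    lam = rng.random((n_new, 1))
    return X_min[idx] + lam * (X_min[nbr] - X_min[idx])

def tomek_links(X, y):
    # Majority-class points that are mutual nearest neighbours with a point
    # of the other class -- the overlap/noise removed in stage two.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return {i for i in range(len(X))
            if nn[nn[i]] == i and y[i] != y[nn[i]] and y[i] == 0}

# Toy imbalanced set: 20 majority (class 0) vs 5 minority (class 1).
X_maj = rng.normal(0.0, 1.0, (20, 2))
X_min = rng.normal(2.0, 1.0, (5, 2))
X_new = smote(X_min, 15)                 # rebalance to 20 vs 20
X = np.vstack([X_maj, X_min, X_new])
y = np.array([0] * 20 + [1] * 20)

links = tomek_links(X, y)
keep = [i for i in range(len(X)) if i not in links]
X_clean, y_clean = X[keep], y[keep]
print(len(X_new), len(links))
```

Only majority-side link members are dropped here, so the minority class keeps its full post-sampling size, matching the intent of cleaning after oversampling.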

5.
A minimal decision rule mining algorithm based on rough set theory
QIAN Jin, MENG Xiang-ping, LIU Da-you, YE Fei-yue. Control and Decision, 2007, 22(12): 1368-1372
This paper studies the discernibility matrix of rough set theory, extends it to a class feature matrix, and proposes a minimal decision rule mining algorithm based on rough set theory. The algorithm splits the original decision table into equivalent sub-tables according to the decision attribute, then mines minimal decision rules from each class feature matrix with the help of core attributes and an attribute frequency function. Compared with the discernibility matrix, the class feature matrix effectively reduces storage space and time complexity and strengthens the generalization ability of the rules. Experimental results show that the rules obtained by the proposed algorithm are more concise and efficient.
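For reference, the discernibility matrix and core computation that this line of work starts from can be sketched as follows (the decision table is a toy example; the paper's class feature matrix refinement is not reproduced):

```python
# Toy decision table: columns are condition attributes a, b, c plus a
# decision d. Not the paper's data.
rows = [
    # a  b  c  decision
    (0, 0, 1, 'no'),
    (0, 1, 1, 'yes'),
    (1, 0, 0, 'no'),
    (1, 1, 0, 'yes'),
]
attrs = ['a', 'b', 'c']

def discernibility_matrix(rows, attrs):
    # Entry (i, j): attributes whose values distinguish objects i and j,
    # recorded only when their decisions differ.
    m = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i][-1] != rows[j][-1]:
                m[(i, j)] = {a for k, a in enumerate(attrs)
                             if rows[i][k] != rows[j][k]}
    return m

def core(matrix):
    # Core attributes appear as singleton entries: dropping one would make
    # some pair of differently-decided objects indistinguishable.
    return set().union(*(e for e in matrix.values() if len(e) == 1))

m = discernibility_matrix(rows, attrs)
print(m)
print(core(m))   # {'b'}
```

In this table attribute b alone separates every pair of objects with different decisions, so it is the (only) core attribute.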

6.
Incremental techniques handle newly added data in a dynamic database without re-running the original algorithm from scratch. There are numerous studies of incremental rough-set-based approaches; however, they are applied to traditional rough-set rule induction, which may generate redundant, unfocused rules, and they do not verify the classification of a decision table. These earlier incremental approaches are also inefficient on large databases. This paper proposes an incremental rule-extraction algorithm, built on a previous rule-extraction algorithm, to resolve these issues. With this algorithm, when a new object is added to an information system it is unnecessary to recompute the rule sets from the very beginning: the approach updates the rule sets by partially modifying the originals, which increases efficiency and is especially useful when extracting rules from a large database.

7.
Hepatitis is a disease seen at all ages. Hepatitis alone does not usually have a lethal effect, but early diagnosis and treatment are crucial because it triggers other diseases. In this study, a new hybrid medical decision support system based on rough sets (RS) and the extreme learning machine (ELM) is proposed for the diagnosis of hepatitis. RS-ELM consists of two stages: in the first, redundant features are removed from the data set with the RS approach; in the second, classification is carried out by ELM using the remaining features. The hepatitis data set from the UCI machine learning repository is used to test the proposed hybrid model. A major part of the data set (48.3%) contains missing values. Because removing the missing values would cause data loss, feature selection is performed in the first stage without deleting them; in the second stage, classification is performed by ELM after removing missing values from the reduced, sub-featured data sets of different dimensions. The results show that RS-ELM achieves the highest classification accuracy of 100.00% and is considerably more successful than the other methods in the literature. Furthermore, the most significant features for the diagnosis of hepatitis are determined. The proposed method should also be useful in similar medical applications.
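The ELM stage itself is simple enough to sketch: a fixed random hidden layer with the output weights fit in closed form by least squares (a generic sketch on synthetic two-class data; the paper's rough-set feature reduction and the hepatitis data are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)

class ELM:
    """Minimal extreme learning machine: random, fixed hidden layer;
    only the output weights are fit, in closed form."""
    def __init__(self, n_hidden=20):
        self.n_hidden = n_hidden

    def _h(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        # Output weights by linear least squares -- the step that makes
        # ELM training non-iterative.
        self.beta, *_ = np.linalg.lstsq(self._h(X), y, rcond=None)
        return self

    def predict(self, X):
        return (self._h(X) @ self.beta > 0.5).astype(int)

# Toy two-class problem (a stand-in for the reduced feature set).
X0 = rng.normal(-1, 0.5, (40, 4))
X1 = rng.normal(+1, 0.5, (40, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

elm = ELM(n_hidden=30).fit(X, y)
acc = (elm.predict(X) == y).mean()
print(acc)
```

On this well-separated toy problem the closed-form fit reaches near-perfect training accuracy, which is the speed/accuracy trade-off that motivates pairing ELM with an RS feature-reduction front end.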

8.
Research and application of neural networks based on rough set theory
ZHANG Ying, LI Chen. Control and Decision, 2007, 22(4): 462-464
To compensate for the black-box nature of neural networks and improve their performance, rough set theory is combined with neural networks and a rough-set-based neural network architecture is proposed. First, rough set theory guides the selection and determination of the network's initial parameters, giving each parameter a physical interpretation; the rough neural network is then trained to minimize the system output error until it meets the performance requirements. Experimental results show that the rough neural network completes data mining tasks well and achieves high classification accuracy.

9.
To solve the training-set selection problem for MC-CNN stereo matching, a weighted metric combining correlation comparison, cosine similarity and structural similarity is studied. The weighting coefficients of the three are determined experimentally, and their weighted value measures both the mutual similarity between a training set and the data distribution of the images to be matched and the self-similarity of the training set itself; the data set whose sum of mutual similarity and self-similarity is highest is selected as the training set. Using InStereo2k images and real captured im…

10.
Multi-Source Information Fusion (MSIF) is a comprehensive, interdisciplinary subject, also referred to as multi-sensor information fusion, which originated in the 1970s. Nowadays, data types and updates are becoming ever more varied and frequent, bringing new challenges for fusing multi-source data. Consequently, constructing MSIF models suited to different scenarios and applying different fusion technologies are core problems that urgently need solving. Rough set theory (RST) provides a computing paradigm for uncertain data modeling and reasoning, especially for classification with noisy, inaccurate or incomplete data. Moreover, with the rapid development of MSIF in recent years, RST-based learning methodologies are becoming increasingly mature and systematic, revealing a framework not previously described in the literature. To clarify the approaches and applications of MSIF in the RST research community, this paper reviews existing models and technologies from the perspectives of the MSIF model (homogeneous and heterogeneous), multi-view rough set information fusion models (multi-granulation, multi-scale and multi-view decision fusion), the parallel-computing information fusion model, incremental learning fusion technology and cluster-ensemble fusion technology. Finally, RST-based MSIF research directions and challenges are discussed. By providing a state-of-the-art understanding of the specialized literature, this survey will directly help researchers understand the development of MSIF under RST.

11.
In incremental learning, when a new example is added to a decision table, the usual way to obtain a minimal set of decision rules is to recompute over all the data in the table. This is clearly inefficient and also unnecessary. Starting from rough set theory, this paper proposes a criterion for minimal recomputation and, on that basis, gives an improved incremental learning algorithm that, to a certain extent, outperforms traditional incremental learning algorithms.

12.
With the development of computer networks, traditional computer security theory can no longer keep up with the changing network environment, and traditional intrusion detection systems fall short in effectiveness, adaptability and extensibility. Theories such as neural networks, genetic algorithms and rough sets have therefore been introduced into intrusion detection to improve its performance. This paper discusses effective means of improving intrusion detection capability.

13.
The classification of imbalanced data is a major challenge for machine learning. This paper presents a fuzzy total-margin support vector machine (FTM-SVM) for the class imbalance learning (CIL) problem in the presence of outliers and noise. The method incorporates the total-margin algorithm, different cost functions and an appropriate fuzzification of the penalty into FTM-SVM, formulated for the nonlinear case. Suitable fuzzy membership functions are used to assign membership values, yielding six FTM-SVM settings. The proposed method is evaluated on two artificial data sets and 16 real-world imbalanced data sets. Experimental results show that FTM-SVM achieves higher G_Mean and F_Measure values than several existing CIL methods. Based on the overall results, FTM-SVM is effective for the CIL problem, especially when outliers and noise are present in the data.

14.
Neighborhood rough set based heterogeneous feature subset selection
Feature subset selection is an important preprocessing step for pattern recognition, machine learning and data mining. Most research focuses on homogeneous feature selection, i.e., purely numerical or purely categorical features. This paper introduces a neighborhood rough set model to deal with heterogeneous feature subset selection. Since the classical rough set model can only evaluate categorical features, it is generalized with neighborhood relations into a neighborhood rough set model, which degrades to the classical model when the neighborhood size is set to zero. The neighborhood model reduces numerical and categorical features by assigning different thresholds to different kinds of attributes. In this model the sizes of the neighborhood lower and upper approximations of the decision reflect the discriminating capability of a feature subset; the size of the lower approximation is computed as the dependency between the decision and the condition attributes. The neighborhood dependency is used to evaluate the significance of a subset of heterogeneous features and to construct forward feature subset selection algorithms. The proposed algorithms are compared with some classical techniques, and experimental results show that the neighborhood-model-based method deals with heterogeneous data more flexibly.
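The dependency-driven forward selection described above can be sketched as follows, for purely numerical features under a Euclidean delta-neighborhood (the threshold, data, and stopping rule are illustrative assumptions; the paper's per-attribute-type thresholds are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(7)

def dependency(X, y, feats, delta=0.3):
    # gamma(feats): fraction of samples whose delta-neighbourhood (over the
    # chosen features) is pure in the decision class, i.e. the relative size
    # of the neighbourhood lower approximation of the decision.
    Z = X[:, feats]
    d = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    pure = sum(np.all(y[d[i] <= delta] == y[i]) for i in range(len(X)))
    return pure / len(X)

def forward_select(X, y, delta=0.3):
    # Greedy forward selection: repeatedly add the feature that most
    # increases the neighbourhood dependency, stop when nothing helps.
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        g, f = max((dependency(X, y, selected + [f], delta), f)
                   for f in remaining)
        if g <= best:
            break
        best = g
        selected.append(f)
        remaining.remove(f)
    return selected, best

# Toy data: feature 0 determines the class, features 1-2 are noise.
y = np.array([0] * 30 + [1] * 30)
X = np.column_stack([
    y + rng.normal(0, 0.05, 60),   # informative
    rng.random(60),                # noise
    rng.random(60),                # noise
])

sel, gamma = forward_select(X, y)
print(sel, gamma)
```

On this toy data the informative feature alone already makes every neighborhood pure, so the greedy loop selects it and stops, leaving the noise features out.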

15.
Fuzzy rough set theory for the interval-valued fuzzy information systems
The rough set was originally proposed by Pawlak as a formal tool for modeling and processing incomplete information in information systems; in 1990, Dubois and Prade introduced rough fuzzy sets and fuzzy rough sets as fuzzy extensions of rough sets. The aim of this paper is to present a new extension of rough set theory that integrates the classical Pawlak rough set theory with interval-valued fuzzy set theory: an interval-valued fuzzy rough set model built on interval-valued fuzzy information systems, defined in this paper by a binary interval-valued fuzzy relation RF(i)(U×U) on the universe U. Several properties of the rough set model are given, and its relationships to other rough set models are examined. Furthermore, knowledge reduction is discussed for classical Pawlak information systems and for interval-valued fuzzy information systems, and knowledge reduction theorems for interval-valued fuzzy information systems are established.
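For orientation, the Dubois-Prade fuzzy rough approximation operators that this extension builds on are (a standard form, not the paper's interval-valued construction, which applies such bounds to the endpoints of interval memberships):

```latex
% Fuzzy rough approximations of a fuzzy set A by a fuzzy relation R on U.
\underline{R}A(x) = \inf_{y \in U} \max\bigl(1 - R(x,y),\, A(y)\bigr),
\qquad
\overline{R}A(x) = \sup_{y \in U} \min\bigl(R(x,y),\, A(y)\bigr).
```

When R is a crisp equivalence relation and A a crisp set, these reduce to Pawlak's lower and upper approximations, which is the degenerate case the paper's model must also recover.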

16.
An image edge detection algorithm is studied and proposed for preprocessing images contaminated by impulse noise, facilitating subsequent operations such as image analysis and understanding. Unlike typical approaches, the algorithm starts from the edge characteristics of the image: a set of constraints on edge points, determined by those characteristics, forms the basis of the algorithm, and rough set theory is used to handle the problems related to these constraints, from which the complete edge detection algorithm is built. Computer simulations show that the algorithm effectively extracts edge information from noisy images and largely overcomes the noise sensitivity of traditional algorithms.

17.
Objective: Image thresholding converts a gray-scale image into a binary image and is widely used in many fields. Because of the uncertainty inherent in practical engineering applications, automatic threshold selection remains a challenging problem. An adaptive method using rough sets is proposed for automatic image thresholding. Methods: The method analyzes a rough-set-based image representation framework and establishes the relationship between image granule size and local gray-level standard deviation; the optimal partition granularity is obtained by minimizing an adaptive rough granularity criterion. At that granularity, the upper and lower approximation sets of the image object and background, and their rough uncertainty, are constructed; the optimal gray threshold is obtained by searching the gray levels for maximum rough entropy. The boundary between object and background is treated as a transition region, and its mean gray level is used as the threshold to binarize the image. Results: The proposed method was compared on several images, in three groups, against three classical thresholding methods and one rough-set-based method. Its binary images are visually far better than those of the traditional rough-set thresholding method. Misclassification rate, mean structural similarity, false negative rate and false positive rate were also used to quantify the comparisons. Qualitative and quantitative results show that the proposed method segments images with high quality and stable performance. Conclusion: The method adapts well, is reasonable and effective, and can serve as a strong complement to existing classical methods.

18.
An RBF network design method integrating rough set theory is proposed. Attributes are discretized by Boolean logical reasoning to obtain an initial set of decision modes; the similarity of the initial decision modes is measured by a dissimilarity degree and used for clustering, and the clustered decision modes are used to construct the RBF network. To speed up training, the hidden-layer parameters are trained with the BP algorithm and the output weights with linear least-squares filtering. Experimental results show that RBF networks designed this way have a compact structure and good generalization, and the hybrid learning algorithm converges faster than plain BP alone.
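The output-weight half of that hybrid scheme is just a linear solve once the hidden layer is fixed, which can be sketched on a toy regression problem (evenly spaced centers stand in for the paper's rough-set-guided clustering; all values are illustrative):

```python
import numpy as np

def rbf_design(X, centers, sigma):
    # Gaussian hidden-layer activations of an RBF network.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy 1-D regression target.
X = np.linspace(0, 2 * np.pi, 80)[:, None]
y = np.sin(X[:, 0])

# Stand-in for the rough-set-guided initialisation: centers are simply
# evenly spaced here, with a hand-picked width.
centers = np.linspace(0, 2 * np.pi, 10)[:, None]
sigma = 0.7

# Output weights in closed form by linear least squares -- the step the
# paper uses instead of iterating BP on the output layer.
H = rbf_design(X, centers, sigma)
w, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ w
mse = np.mean((y - y_hat) ** 2)
print(mse)
```

Because this step is a single linear solve, it converges in one shot regardless of how the hidden-layer parameters were initialized, which is why the hybrid scheme beats plain BP on speed.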

19.
Discretization of continuous attributes is one of the main problems in applying rough set theory. A rough-set-based method for discretizing continuous attributes is proposed. Concepts such as the principal generalized decision are introduced first; building on a data filtering method, attributes are discretized by merging equivalence classes. Experiments show that rules extracted from data preprocessed with this discretization method have good classification and prediction accuracy.

20.
Semi-supervised outlier detection based on fuzzy rough C-means clustering
This paper presents a fuzzy rough semi-supervised outlier detection (FRSSOD) approach that uses a small number of labeled samples together with fuzzy rough C-means clustering. The method introduces an objective function that minimizes the sum of squared clustering errors, the deviation from the known labeled examples, and the number of outliers. Each cluster is represented by a center, a crisp lower approximation and a fuzzy boundary obtained by fuzzy rough C-means clustering, and only points located in the boundary are considered for reassignment as outliers. As a result, the method obtains better clustering of normal points and better outlier-detection accuracy. Experimental results show that, on average, the proposed method keeps or improves detection precision, reduces the false alarm rate, and reduces the number of candidate outliers to be examined.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号