Similar Articles
20 similar articles found.
1.
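Journal of Frontiers of Computer Science and Technology, 2017, (10): 1652-1661
People tend to describe a rule with a small number of representative features and ignore minor, redundant information. The classic interval type-2 TSK (Takagi-Sugeno-Kang) fuzzy system uses the full feature space of the data in both the rule antecedents and the rule consequents; for high-dimensional data this tends to increase system complexity and reduce interpretability. To address this, an interval type-2 fuzzy subspace zero-order TSK system is proposed. In the rule antecedents, fuzzy subspace clustering is combined with grid partitioning to generate sparse, regular rule centers; in the rule consequents, a simplified zero-order form is used. The result is an interval type-2 fuzzy system whose rules have more concise semantics. Experiments on synthetic and real data show that the method classifies well and is more interpretable.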
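As a minimal sketch of how such a system produces an output, the Python below evaluates a zero-order interval type-2 TSK model in which each rule uses only its own feature subset. The rule dictionaries, the Gaussian membership functions with an uncertain width, the product t-norm, and the Nie-Tan type reduction (used here in place of the usual iterative Karnik-Mendel procedure) are all assumptions for illustration; the paper's combination of fuzzy subspace clustering and grid partitioning for placing rule centers is not reproduced.

```python
import math

def gauss(x, center, sigma):
    """Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def it2_tsk0(x, rules):
    """Zero-order interval type-2 TSK output with per-rule feature subspaces.

    Each rule lists only the features it uses; all other features are ignored.
    Gaussian sets have a fixed center and an uncertain width [sigma_lo, sigma_hi].
    Type reduction uses the Nie-Tan closed form (mean of the lower and upper
    firing strengths) instead of the iterative Karnik-Mendel procedure.
    """
    num = den = 0.0
    for r in rules:
        f_lo = f_hi = 1.0
        for d, c, s_lo, s_hi in zip(r["features"], r["centers"], r["sigma_lo"], r["sigma_hi"]):
            f_lo *= gauss(x[d], c, s_lo)  # narrower width -> lower membership function
            f_hi *= gauss(x[d], c, s_hi)  # wider width -> upper membership function
        w = 0.5 * (f_lo + f_hi)
        num += w * r["consequent"]  # zero-order consequent: a constant
        den += w
    return num / den if den > 0 else 0.0

rules = [
    {"features": [0], "centers": [0.0], "sigma_lo": [0.5], "sigma_hi": [1.0], "consequent": -1.0},
    {"features": [1, 2], "centers": [1.0, 1.0], "sigma_lo": [0.5, 0.5],
     "sigma_hi": [1.0, 1.0], "consequent": 1.0},
]
print(it2_tsk0([0.0, 0.0, 0.0], rules))  # negative -> class -1
print(it2_tsk0([3.0, 1.0, 1.0], rules))  # positive -> class +1
```

For two-class labels ±1, the sign of the output can serve as the class decision.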

2.
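To further improve the generalization ability of the Takagi-Sugeno-Kang (TSK) fuzzy classifier on imbalanced data sets while preserving its good semantic interpretability, and inspired by ensemble learning, a deep TSK fuzzy classifier for imbalanced data (A Deep TSK Fuzzy Classifier for Imbalanced Data, ID-TSK-FC) is proposed. ID-TSK-FC consists mainly of one imbalanced global linear regression sub-classifier (Imbalanced Global Linear Regression Sub-Classifier, IGLRc) and several imbalanced TSK fuzzy sub-classifiers (Imbalanced TSK Fuzzy Sub-Classifier, I-TSK-FC). Following the human cognitive pattern of proceeding "from global and coarse to local and fine" and the principle of stacked generalization, ID-TSK-FC first trains an IGLRc on all original training samples to obtain a global, coarse classification result. Then, based on the IGLRc output, it identifies the training samples that follow a nonlinear distribution. On these nonlinearly distributed samples, it generates several local I-TSK-FCs in a stacked deep structure to obtain local, refined results. Finally, for the outputs of the stacked IGLRc and all I-TSK-FCs, a voting principle based on minimum distance is used to obtain ID...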

3.
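A Survey of Imbalanced Data Classification   Citations: 2 (self-citations: 1, by others: 1)
Zhao Nan, Zhang Xiaofang, Zhang Lijun. Computer Science, 2018, 45(Z6): 22-27, 57
In many application domains the class distribution of the data is imbalanced, and classifying such data correctly is a research hotspot in data mining and machine learning. Classic classification algorithms ignore class imbalance and assume that all misclassifications cost the same, so they perform poorly on imbalanced data. Different imbalanced-data methods have been proposed for each step of the classification process. This paper categorizes and analyzes the research of recent years, systematically introduces the methods from the perspectives of feature selection, data distribution adjustment, classification algorithms, and evaluation of classification results, and discusses directions for further research.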
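The survey treats evaluation of classification results as one of its axes. As a small illustration (not taken from the paper), the sketch below reports overall accuracy next to minority-class recall, F1 and G-mean; the function name and the toy 95:5 data are hypothetical.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy plus measures that expose poor minority-class performance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class recall (TPR)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall (TNR)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }

# 95 majority (0) and 5 minority (1) samples; a classifier that always predicts 0
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(imbalance_metrics(y_true, y_pred))
# accuracy is 0.95, yet recall, F1 and G-mean are all 0.0
```

Because this classifier never predicts the minority class, accuracy stays high while every minority-sensitive measure drops to zero, which is why such measures are preferred for imbalanced data.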

4.
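Existing classification algorithms are usually biased when mining imbalanced data: their classification and prediction performance on positive samples (usually the more important class) is worse than on negative samples. To address this, an imbalanced-data classification method is proposed. The method separates the two classes with a hypersphere at the maximum separation ratio and introduces class weight factors and sample fuzzy memberships, so that it accounts both for the different importance of the classes and for the different contributions of individual samples to their class. This improves classification and prediction performance on the positive class as well as overall generalization. Experiments on synthetic data and real UCI data verify the effectiveness of the method.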

5.
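Research on Classification Methods for Imbalanced Data Sets   Citations: 2 (self-citations: 0, by others: 2)
When handling imbalanced classification problems, traditional classification algorithms tend to favor the majority class, which results in low classification accuracy on the minority class. For imbalanced data classification, this paper first introduces existing performance evaluation measures; it then introduces commonly used data sampling methods and existing classification methods; finally, it introduces hybrid methods that combine data sampling with classification methods.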
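The paper does not name specific sampling methods. As one well-known data-level example, the sketch below implements SMOTE-style oversampling: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. The function name, its parameters and the toy data are assumptions.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # k nearest minority-class neighbours of each minority sample (excluding itself)
    neighbours = []
    for i, a in enumerate(minority):
        ranked = sorted((sqdist(a, b), j) for j, b in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Interpolating between real minority samples, rather than duplicating them as random oversampling does, is meant to reduce overfitting to repeated points.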

6.
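FE-SVDD Algorithm for Imbalanced Data Classification   Citations: 1 (self-citations: 0, by others: 1)
Existing support vector data description (SVDD) algorithms are usually biased on imbalanced data sets. To address this, building on PCA feature extraction and SVDD classification theory, an FE-SVDD algorithm for imbalanced data classification is proposed. The method performs principal component analysis on the samples of each of the two classes, obtains the principal eigenvalues of each, and redefines the  value in SVDD according to the sample sizes and eigenvalues. Experiments on artificial sample sets and UCI data sets verify the effectiveness of the method.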

7.
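Research Status of Imbalanced Data Classification   Citations: 6 (self-citations: 3, by others: 6)
Imbalanced data are widespread in practical applications and pose a challenge to machine learning, and how to handle them effectively has become a new research hotspot. This paper surveys the state of research in this new field, including its latest research topics, methods, and results.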

8.
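A Survey of Imbalanced Data Classification Methods   Citations: 1 (self-citations: 0, by others: 1)
With the rapid development of information technology, data in every field are being generated, collected, and stored at an unprecedented rate, and processing data intelligently to exploit the valuable information they contain has become a hotspot of both theoretical and applied research. Data classification, as a fundamental data processing method, is widely used in intelligent data processing. Traditional classification methods usually assume that classes are balanced and that misclassification costs are equal. Real-world data, however, are often imbalanced: one class has fewer samples than the others, and this minority class carries a higher misclassification cost. When traditional algorithms are applied to imbalanced data, the numerical skew between majority and minority classes means that maximizing overall accuracy biases the model toward the majority class and neglects the minority class, so minority-class accuracy is low. Designing classification algorithms for imbalanced data that achieve good accuracy on both the majority and the minority classes has become a research hotspot in machine learning, and a series of effective methods has emerged. This paper therefore gives a fairly comprehensive review of existing imbalanced data classification methods, summarizing and comparing them at the data preprocessing level, the feature level, and the classification algorithm level, and, in light of current research hotspots in machine learning, discusses the challenges these methods face. Finally, it outlines future research directions for imbalanced data classification.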

9.
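When classic data-driven TSK fuzzy systems are trained on high-dimensional data, the rule antecedents use too many features, which reduces the interpretability and conciseness of the rules. To address this, a feature extraction mechanism based on the subspace property of fuzzy subspace clustering is added to the TSK model, and ridge regression is then used to learn the consequents, yielding a method for constructing zero-order ridge regression TSK models based on fuzzy subspace clustering. The method not only extracts important subspace features for the rules but can also extract different features for different rules. Experiments on synthetic and real data sets confirm the advantages of the proposed method.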
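A minimal sketch of the consequent-learning step, assuming the rules' centers, widths and feature subsets are already known (here they are set by hand, standing in for the output of fuzzy subspace clustering): with the normalized firing strengths collected in a matrix Φ, the zero-order consequents solve the ridge problem c = (ΦᵀΦ + λI)⁻¹Φᵀy. The helper names, the Gaussian memberships and λ = 0.01 are hypothetical choices.

```python
import math

def firing_matrix(X, rules):
    """Normalised rule firing strengths; each rule uses only its own feature subset."""
    Phi = []
    for x in X:
        f = []
        for r in rules:
            v = 1.0
            for d, c, s in zip(r["features"], r["centers"], r["widths"]):
                v *= math.exp(-0.5 * ((x[d] - c) / s) ** 2)
            f.append(v)
        total = sum(f) or 1.0
        Phi.append([v / total for v in f])
    return Phi

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def ridge_consequents(Phi, y, lam=1e-2):
    """Zero-order consequents: c = (Phi^T Phi + lam I)^-1 Phi^T y."""
    K = len(Phi[0])
    A = [[sum(p[i] * p[j] for p in Phi) + (lam if i == j else 0.0) for j in range(K)]
         for i in range(K)]
    b = [sum(p[i] * t for p, t in zip(Phi, y)) for i in range(K)]
    return solve(A, b)

def predict(X, rules, c):
    return [sum(p * ci for p, ci in zip(row, c)) for row in firing_matrix(X, rules)]

# Hypothetical rules: both use only feature 0; feature 1 is ignored as irrelevant.
rules = [
    {"features": [0], "centers": [0.0], "widths": [0.3]},
    {"features": [0], "centers": [1.0], "widths": [0.3]},
]
X = [[0.0, 5.0], [0.1, -3.0], [0.9, 2.0], [1.0, 7.0]]
y = [-1.0, -1.0, 1.0, 1.0]
c = ridge_consequents(firing_matrix(X, rules), y)
```

The ridge term λI keeps the linear system well conditioned even when two rules fire almost identically on the training data.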

10.
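A Balanced Fuzzy Support Vector Machine for Imbalanced Data Classification   Citations: 1 (self-citations: 1, by others: 0)
Qin Chuandong, Liu Sanyang, Zhang Shifang. Computer Science, 2012, 39(6): 188-190, 212
For classification problems on data sets with a large class imbalance ratio, a balanced fuzzy support vector machine is proposed that uses properties of the sample points to construct a class imbalance adjustment factor and fuzzy memberships. First, the sample covariance matrix is computed to obtain the class imbalance adjustment factor; then the fuzzy membership of each sample point is computed, giving each sample's contribution to the separating hyperplane. The class imbalance adjustment factor and the fuzzy memberships jointly weight the classifier's error term. Results show that this balanced fuzzy SVM classifies well on problems with a large class imbalance ratio.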
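The abstract does not give the formulas for the class imbalance adjustment factor or the fuzzy membership, so the sketch below substitutes common stand-ins: inverse class frequency for the factor (the paper derives it from the sample covariance matrix instead) and a distance-to-class-mean membership. The two are multiplied into each sample's error-term weight, and a linear soft-margin SVM is trained by full-batch subgradient descent rather than by solving the dual QP. All names and parameter values are hypothetical.

```python
import math

def fuzzy_memberships(X, y, delta=1e-3):
    """Membership 1 - d_i / (r + delta); d_i = distance to own-class mean, r = max d_i."""
    s = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        mean = [sum(X[i][d] for i in idx) / len(idx) for d in range(len(X[0]))]
        dist = {i: math.dist(X[i], mean) for i in idx}
        radius = max(dist.values())
        for i in idx:
            s[i] = 1.0 - dist[i] / (radius + delta)
    return s

def train_balanced_fsvm(X, y, C=1.0, epochs=2000, lr=0.01):
    """Linear SVM (labels +1/-1) with per-sample costs C * class_factor * membership.

    Minimises 0.5*||w||^2 + sum_i cost_i * max(0, 1 - y_i (w.x_i + b))
    by full-batch subgradient descent.
    """
    n, dim = len(X), len(X[0])
    # Hypothetical class-imbalance factor: inverse class frequency.
    factor = {label: n / (2.0 * y.count(label)) for label in set(y)}
    s = fuzzy_memberships(X, y)
    cost = [C * factor[y[i]] * s[i] for i in range(n)]
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = w[:], 0.0  # gradient of the regulariser 0.5*||w||^2
        for i in range(n):
            if y[i] * (sum(wd * xd for wd, xd in zip(w, X[i])) + b) < 1:
                # active hinge term: subgradient is -cost_i * y_i * (x_i, 1)
                for d in range(dim):
                    gw[d] -= cost[i] * y[i] * X[i][d]
                gb -= cost[i] * y[i]
        w = [wd - lr * g for wd, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wd * xd for wd, xd in zip(w, x)) + b >= 0 else -1

majority = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
            [0.2, 0.3], [0.3, 0.1], [0.1, 0.4], [0.4, 0.2]]
minority = [[3.0, 3.0], [3.5, 3.0], [3.0, 3.5]]
X = majority + minority
y = [-1] * len(majority) + [1] * len(minority)
w, b = train_balanced_fsvm(X, y)
```

With this membership choice, the sample farthest from its class mean gets a weight close to zero and barely influences the hyperplane, while the inverse-frequency factor raises the total weight of the minority class.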

11.
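The distinctive advantage of fuzzy systems is their high interpretability. However, traditional clustering-based fuzzy systems usually use all features of the input space and often produce overlapping fuzzy sets, which limits their interpretability; moreover, on high-dimensional data such systems use a large number of features, making the rules overly complex. To address this, a knowledge-embedded Bayesian Mamdani-Assilian type fuzzy system (knowledge embedded Bayesian Mamdani-Assilian type fuzzy system, KE-B-MA) is explored. First, KE-B-MA uses the DC (don't care) approach to perform knowledge-embedded fuzzy set partitioning, which effectively guides the choice of membership function centers and input features, so the resulting rules can correspond to different feature subspaces. Second, based on Bayesian inference, KE-B-MA uses the Markov chain Monte Carlo (MCMC) method to learn the antecedent and consequent parameters of the fuzzy rules simultaneously, and the result is a globally optimal solution. Experimental results show that, compared with several classic fuzzy systems, KE-B-MA achieves satisfactory classification performance with greater interpretability and clarity.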
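To show what a don't-care (DC) antecedent does, the sketch below lets each rule give a fuzzy set only for the features it uses and DC elsewhere; a DC antecedent matches any input with degree 1, so the rule effectively lives in its own feature subspace. The Gaussian sets, the product t-norm and the single-winner class decision are assumptions, and the MCMC learning of the rule parameters is not shown.

```python
import math

DONT_CARE = None  # antecedent that matches every input value with degree 1

def membership(x, fuzzy_set):
    """Gaussian membership grade; a DC antecedent always returns 1."""
    if fuzzy_set is DONT_CARE:
        return 1.0
    center, width = fuzzy_set
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def firing_strength(x, antecedents):
    """Product t-norm over all features (DC features contribute a factor of 1)."""
    strength = 1.0
    for xi, fs in zip(x, antecedents):
        strength *= membership(xi, fs)
    return strength

def classify(x, rules):
    """Single-winner reasoning: the class of the most strongly fired rule."""
    return max(rules, key=lambda r: firing_strength(x, r["if"]))["then"]

rules = [
    {"if": [(0.0, 0.5), DONT_CARE, DONT_CARE], "then": "A"},  # uses feature 0 only
    {"if": [DONT_CARE, (1.0, 0.5), (1.0, 0.5)], "then": "B"},  # uses features 1 and 2
]
print(classify([0.1, 5.0, -3.0], rules))  # A
print(classify([2.0, 1.0, 1.1], rules))   # B
```

Each rule then reads as a short statement such as "if x0 is near 0 then A", which is where the interpretability gain comes from.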

12.
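Clustering Ensemble Model Based on Latent Variables   Citations: 5 (self-citations: 1, by others: 5)
Wang Hongjun, Li Zhishu, Cheng Yang, Zhou Peng, Zhou Wei. Journal of Software, 2009, 20(4): 825-833
Clustering ensembles have become an active research topic in machine learning because they can protect private information, process data in a distributed manner, and reuse knowledge; in addition, noise and outliers have little effect on their results. The main contributions are twofold. First, the paper analyzes the advantages of treating each base clusterer as an attribute of the original data and finds that clustering ensemble algorithms built this way have good scalability and flexibility. Second, on this basis, it builds the latent variable cluster ensemble (LVCE) probabilistic model for clustering ensembles and gives a Markov chain Monte Carlo (MCMC) algorithm for the LVCE model. Experimental results show that the MCMC algorithm for the LVCE model performs clustering ensembles effectively and also reflects how compactly the data are clustered.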

13.
Item response theory is one of the modern test theories with applications in educational and psychological testing. Recent developments made it possible to characterize some desired properties in terms of a collection of manifest ones, so that hypothesis tests on these traits can, in principle, be performed. But the existing test methodology is based on asymptotic approximation, which is impractical in most applications since the required sample sizes are often unrealistically huge. To overcome this problem, a class of tests is proposed for making exact statistical inference about four manifest properties: covariances given the sum are non-positive (CSN), manifest monotonicity (MM), conditional association (CA), and vanishing conditional dependence (VCD). One major advantage is that these exact tests do not require large sample sizes. As a result, tests for CSN and MM can be routinely performed in empirical studies. For testing CA and VCD, the exact methods are still impractical in most applications, due to the unusually large number of parameters to be tested. However, exact methods are still derived for them as an exploration toward practicality. Some numerical examples with applications of the exact tests for CSN and MM are provided.
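For context, manifest monotonicity (MM) says that the proportion of examinees answering an item correctly should not decrease as their rest score (the total score on the other items) increases. The sketch below only computes the observed rest-score curve and lists adjacent decreases; it is not the paper's exact test, which decides whether such decreases are statistically significant. The 0/1 response matrix and the function names are hypothetical.

```python
from collections import defaultdict

def rest_score_curve(responses, item):
    """Observed proportion correct on `item` for each rest score (sum of the other items)."""
    tally = defaultdict(lambda: [0, 0])  # rest score -> [correct, examinees]
    for row in responses:
        rest = sum(row) - row[item]
        tally[rest][0] += row[item]
        tally[rest][1] += 1
    return {r: correct / total for r, (correct, total) in sorted(tally.items())}

def mm_violations(curve):
    """Pairs of adjacent rest scores where the observed proportion decreases."""
    scores = sorted(curve)
    return [(a, b) for a, b in zip(scores, scores[1:]) if curve[b] < curve[a]]

responses = [
    [0, 0, 0], [0, 0, 0], [1, 1, 0],
    [0, 1, 0], [1, 1, 1], [1, 0, 1],
]
curve = rest_score_curve(responses, item=0)
print(curve, mm_violations(curve))
```

In small samples some rest-score groups contain only a handful of examinees, so an observed decrease can easily be noise; that is the gap an exact test is meant to close.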

14.
A Markov chain Monte Carlo method has previously been introduced to estimate weighted sums in multiplicative weight update algorithms when the number of inputs is exponential. However, the original algorithm still required extensive simulation of the Markov chain in order to get accurate estimates of the weighted sums. We propose an optimized version of the original algorithm that produces exactly the same classifications while often using fewer Markov chain simulations. We also apply three other sampling techniques and empirically compare them with the original Metropolis sampler to determine how effective each is in drawing good samples in the least amount of time, in terms of accuracy of weighted sum estimates and in terms of Winnow’s prediction accuracy. We found that two other samplers (Gibbs and Metropolized Gibbs) were slightly better than Metropolis in their estimates of the weighted sums. For prediction errors, there is little difference between any pair of MCMC techniques we tested. Also, on the data sets we tested, we discovered that all approximations of Winnow have no disadvantage when compared to brute force Winnow (where weighted sums are exactly computed), so generalization accuracy is not compromised by our approximation. This is true even when very small sample sizes and mixing times are used. An early version of this paper appeared as Tao and Scott (2003).
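For reference, the brute-force baseline the paper compares against is ordinary Winnow with exactly computed weighted sums. The sketch below learns the disjunction x0 OR x2 over six Boolean inputs; the threshold θ = n/2, the promotion factor α = 2 and the epoch limit are hypothetical choices, and the MCMC estimation of the sums is not reproduced.

```python
from itertools import product

def weighted_sum(w, x):
    # computed exactly here; the paper's MCMC samplers estimate this sum instead
    return sum(wi for wi, xi in zip(w, x) if xi)

def winnow_train(examples, n, alpha=2.0, epochs=50):
    """Winnow on 0/1 inputs: promote active weights on false negatives, demote on false positives."""
    w = [1.0] * n
    theta = n / 2.0  # hypothetical threshold choice
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            pred = 1 if weighted_sum(w, x) >= theta else 0
            if pred != y:
                mistakes += 1
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        if mistakes == 0:  # a full pass without mistakes: every example is now correct
            break
    return w, theta

# Target concept: x0 OR x2 over n = 6 Boolean inputs.
n = 6
examples = [(x, int(x[0] or x[2])) for x in product((0, 1), repeat=n)]
w, theta = winnow_train(examples, n)
```

The multiplicative updates only touch the weights of active inputs, which is what makes it worthwhile to estimate the weighted sum by sampling when the expanded input space is exponentially large.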

15.
This paper presents a new glottal inverse filtering (GIF) method that utilizes a Markov chain Monte Carlo (MCMC) algorithm. First, initial estimates of the vocal tract and glottal flow are evaluated by an existing GIF method, iterative adaptive inverse filtering (IAIF). Simultaneously, the initially estimated glottal flow is synthesized using the Rosenberg–Klatt (RK) model and filtered with the estimated vocal tract filter to create a synthetic speech frame. In the MCMC estimation process, the first few poles of the initial vocal tract model and the RK excitation parameter are refined in order to minimize the error between the synthetic and original speech signals in the time and frequency domains. MCMC approximates the posterior distribution of the parameters, and the final estimate of the vocal tract is found by averaging the parameter values of the Markov chain. Experiments with synthetic vowels produced by a physical modeling approach show that the MCMC-based GIF method gives more accurate results than two known reference methods.
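The estimation idea in the last steps (explore the parameters with MCMC, then average the chain) can be sketched generically with a random-walk Metropolis sampler. The toy below fits the decay rate of a synthetic exponential instead of vocal-tract poles and Rosenberg-Klatt parameters; the Gaussian likelihood on the time-domain residual, the flat prior, the step size and the burn-in length are all assumptions.

```python
import math
import random

def metropolis_fit(t, observed, model, theta0, step=0.05, noise=0.05,
                   n_iter=20000, burn_in=5000, seed=1):
    """Random-walk Metropolis over one parameter; returns the post-burn-in chain mean."""
    rng = random.Random(seed)

    def log_post(theta):
        # Gaussian likelihood of the residual with a flat prior (both assumptions)
        sse = sum((o - model(ti, theta)) ** 2 for ti, o in zip(t, observed))
        return -sse / (2.0 * noise ** 2)

    theta, lp = theta0, log_post(theta0)
    total, kept = 0.0, 0
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # accept with probability min(1, exp(lp_new - lp))
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        if i >= burn_in:
            total += theta
            kept += 1
    return total / kept

# Synthetic "observed" signal: exp(-2 t) plus Gaussian noise (hypothetical data).
rng = random.Random(0)
t = [0.05 * i for i in range(40)]
observed = [math.exp(-2.0 * ti) + rng.gauss(0.0, 0.05) for ti in t]
k_hat = metropolis_fit(t, observed, lambda ti, k: math.exp(-k * ti), theta0=1.0)
print(k_hat)  # close to the true decay rate 2.0
```

Averaging the chain gives an estimate of the posterior mean rather than a single best-fit value, which is the choice the abstract describes for the final vocal-tract estimate.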

16.
In Bayesian signal processing, all the information about the unknowns of interest is contained in their posterior distributions. The unknowns can be parameters of a model, or a model and its parameters. In many important problems, these distributions are impossible to obtain in analytical form. An alternative is to generate their approximations by Monte Carlo-based methods like Markov chain Monte Carlo (MCMC) sampling, adaptive importance sampling (AIS) or particle filtering (PF). While MCMC sampling and PF have received considerable attention in the literature and are reasonably well understood, the AIS methodology remains relatively unexplored. This article reviews the basics of AIS and provides a comprehensive survey of the state of the art of the topic. Some of its most relevant implementations are revisited and compared through computer simulation examples.
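A minimal sketch of the adaptive idea, assuming a single Gaussian proposal adapted by weighted moment matching (one of the simplest AIS variants, not any specific scheme from the review): draw from the proposal, compute self-normalized importance weights against an unnormalized target, and refit the proposal's mean and standard deviation to the weighted sample. The function names and parameter values are hypothetical.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def adaptive_is(log_target, mu=0.0, sigma=5.0, n=2000, iters=10, seed=0):
    """Gaussian-proposal AIS; returns (estimate of E[x], final proposal mean, final proposal sd)."""
    rng = random.Random(seed)
    estimate = mu
    for _ in range(iters):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        logw = [log_target(x) - normal_logpdf(x, mu, sigma) for x in xs]
        m = max(logw)  # shift before exponentiating to avoid overflow
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]  # self-normalised weights
        estimate = sum(wi * x for wi, x in zip(w, xs))
        var = sum(wi * (x - estimate) ** 2 for wi, x in zip(w, xs))
        mu, sigma = estimate, max(math.sqrt(var), 1e-3)  # adapt the proposal
    return estimate, mu, sigma

# Unnormalised target: a Gaussian with mean 3 and standard deviation 0.5.
est, mu, sigma = adaptive_is(lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2)
```

Self-normalization means the target's normalizing constant is never needed, which matches the setting where only an unnormalized posterior is available.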

17.
In many situations it is important to be able to propose N independent realizations of a given distribution law. We propose a strategy for making N parallel Markov chain Monte Carlo (MCMC) chains interact in order to get an approximation of an independent N-sample of a given target law. In this method each individual chain proposes candidates for all other chains. We prove that the set of interacting chains is itself an MCMC method for the product of N target measures. Compared to independent parallel chains this method is more time consuming, but we show through examples that it possesses many advantages. This approach is applied to a biomass evolution model.

18.
This paper focuses on estimating sample selection models with two incidentally truncated outcomes and two corresponding selection mechanisms. The method of estimation is an extension of the Markov chain Monte Carlo (MCMC) sampling algorithm from Chib (2007) and Chib et al. (2009). Contrary to conventional data augmentation strategies when dealing with missing data, the proposed algorithm augments the posterior with only a small subset of the total missing data caused by sample selection. This results in improved convergence of the MCMC chain and decreased storage costs, while maintaining tractability in the sampling densities. The methods are applied to estimate the effects of residential density on vehicle miles traveled and vehicle holdings in California.

19.
Lee, Herbert K. H. Machine Learning, 2003, 50(1-2): 197-212
While many implementations of Bayesian neural networks use large, complex hierarchical priors, in much of modern Bayesian statistics, noninformative (flat) priors are very common. This paper introduces a noninformative prior for feed-forward neural networks, describing several theoretical and practical advantages of this approach. In particular, a simpler prior allows for a simpler Markov chain Monte Carlo algorithm. Details of MCMC implementation are included.

20.
We show that sampling with a biased Metropolis scheme is essentially equivalent to using the heatbath algorithm. However, the biased Metropolis method can also be applied when an efficient heatbath algorithm does not exist. This is first illustrated with an example from high energy physics (lattice gauge theory simulations). We then illustrate the Rugged Metropolis method, which is based on a similar biased updating scheme, but aims at very different applications. The goal of such applications is to locate the most likely configurations in a rugged free energy landscape, which is most relevant for simulations of biomolecules.
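The equivalence claimed in the first sentence can be checked on a single discrete variable. In the sketch below, a biased Metropolis step proposes a new state from a bias distribution and accepts it with the Metropolis-Hastings correction; when the bias equals the exact conditional distribution, the acceptance ratio is always 1, so every step is a heatbath draw. With a cruder bias (uniform here) the step still samples the target but sometimes rejects. The energies, β and function names are hypothetical, and the Rugged Metropolis method is not shown.

```python
import math
import random

def boltzmann(energies, beta):
    """Exact conditional distribution: p(s) proportional to exp(-beta * E(s))."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [wi / z for wi in w]

def heatbath_step(probs, rng):
    """Heatbath: draw the new state directly from the conditional distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def biased_metropolis_step(state, probs, bias, rng):
    """Propose from `bias`, accept with min(1, p(new) b(old) / (p(old) b(new)))."""
    new = rng.choices(range(len(bias)), weights=bias)[0]
    ratio = (probs[new] * bias[state]) / (probs[state] * bias[new])
    return (new if rng.random() < min(1.0, ratio) else state), ratio

rng = random.Random(42)
probs = boltzmann([0.0, 0.5, 1.0, 2.0], beta=1.0)

# Exact bias: the ratio is always 1, so every proposal is accepted (a heatbath update).
state, ratios = 0, []
for _ in range(1000):
    state, r = biased_metropolis_step(state, probs, probs, rng)
    ratios.append(r)

# Crude uniform bias: occasional rejections, same stationary distribution.
counts, state = [0] * 4, 0
for _ in range(100000):
    state, _ = biased_metropolis_step(state, probs, [0.25] * 4, rng)
    counts[state] += 1
```

The closer the bias is to the true conditional, the fewer rejections occur, which is why a good bias recovers heatbath efficiency even where no direct heatbath sampler is available.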


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417