1.
《计算机科学与探索》2017,(10):1652-1661
People tend to describe a rule with a small number of representative features and to ignore minor, redundant information. The classical interval type-2 TSK (Takagi-Sugeno-Kang) fuzzy system uses the full feature space in both the rule antecedents and the rule consequents; for high-dimensional data this easily increases system complexity and sacrifices interpretability. To address this, an interval type-2 fuzzy-subspace zero-order TSK system is proposed. In the rule antecedents, fuzzy subspace clustering is combined with grid partitioning to generate sparse and regular rule centers; in the rule consequents, a simplified zero-order form is used, yielding an interval type-2 fuzzy system with more concise rule semantics. Experimental results on synthetic and real-world data show that the method achieves good classification performance and better interpretability.
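For readers unfamiliar with the model family, the following is a minimal numpy sketch of interval type-2, zero-order TSK inference: each rule carries a Gaussian antecedent with an uncertain width (the interval) and a crisp scalar consequent (the zero order). The rule centers, widths, and consequents are illustrative placeholders, not values learned by the paper's fuzzy-subspace/grid method.

```python
import numpy as np

# Minimal sketch: interval type-2, zero-order TSK inference.
# Each rule has a Gaussian antecedent with an uncertain width
# [sigma_lo, sigma_hi] (interval type-2) and a crisp scalar consequent
# (zero-order). Centers, widths, and consequents are illustrative.

centers = np.array([[0.2, 0.3], [0.8, 0.7]])   # one row per rule
sigma_lo, sigma_hi = 0.1, 0.3                  # uncertain membership width
consequents = np.array([0.0, 1.0])             # zero-order rule outputs

def it2_tsk_predict(x):
    d2 = np.sum((x - centers) ** 2, axis=1)
    # Lower and upper firing strengths of each rule (narrow and wide Gaussians).
    f_lo = np.exp(-d2 / (2 * sigma_lo ** 2))
    f_hi = np.exp(-d2 / (2 * sigma_hi ** 2))
    # Simple type reduction: average the lower and upper weighted means.
    y_lo = np.sum(f_lo * consequents) / (np.sum(f_lo) + 1e-12)
    y_hi = np.sum(f_hi * consequents) / (np.sum(f_hi) + 1e-12)
    return 0.5 * (y_lo + y_hi)

print(it2_tsk_predict(np.array([0.75, 0.75])))  # close to rule 2 -> near 1.0
```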
2.
To further improve the generalization ability of Takagi-Sugeno-Kang (TSK) fuzzy classifiers on imbalanced data sets while preserving their good semantic interpretability, and inspired by ensemble learning, a deep TSK fuzzy classifier for imbalanced data (ID-TSK-FC) is proposed. ID-TSK-FC consists mainly of one imbalanced global linear regression sub-classifier (IGLRc) and several imbalanced TSK fuzzy sub-classifiers (I-TSK-FC). Following the human cognitive behavior of going "from global coarseness to local fineness" and the principle of stacked generalization, ID-TSK-FC first trains an IGLRc on all original training samples to obtain a globally coarse classification result. Based on the output of the IGLRc, it then identifies the nonlinearly distributed training samples, on which several local I-TSK-FCs are generated in a stacked deep structure to obtain locally fine results. Finally, the outputs of the stacked IGLRc and all I-TSK-FCs are combined by a minimum-distance voting principle to obtain the ID...
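A heavily simplified sketch of the global-coarse / local-fine stacking idea described above (numpy only): a least-squares linear sub-classifier is fitted on all samples, samples it leaves near the decision boundary stand in for the "nonlinearly distributed" ones, and a second sub-classifier is stacked on them. The ambiguity threshold, the toy data, and the use of plain least squares are assumptions for illustration; this is not the paper's ID-TSK-FC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data with labels in {-1, +1}; class +1 is the rarer class.
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] ** 2 > 1.2, 1.0, -1.0)  # nonlinear boundary

def fit_linear(X, y):
    """Least-squares linear sub-classifier with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ w

# Stage 1: global, coarse linear sub-classifier on all samples.
w_global = fit_linear(X, y)
score_global = predict_linear(w_global, X)

# Stage 2: samples the global model is unsure about (small |score|) stand in
# for the "nonlinearly distributed" region; a local model is trained on them
# using augmented inputs (original features + global score), in the spirit
# of stacked generalization.
ambiguous = np.abs(score_global) < 0.5
X_local = np.hstack([X[ambiguous], score_global[ambiguous, None]])
w_local = fit_linear(X_local, y[ambiguous])

# Final decision: local model where it applies, global model elsewhere.
final = np.sign(score_global)
final[ambiguous] = np.sign(predict_linear(w_local, X_local))
print("training accuracy:", np.mean(final == y))
```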
3.
4.
5.
6.
7.
8.
A Survey of Classification Methods for Imbalanced Data
With the rapid development of information technology, data in every field are being generated, collected, and stored at an unprecedented rate, and how to process these data intelligently so as to exploit the valuable information they contain has become a hot topic in both theory and applications. As a fundamental data-processing method, classification is widely used in intelligent data processing. Traditional classification methods usually assume that the class distribution is balanced and that misclassification costs are equal. In reality, however, data are often imbalanced: one class has far fewer samples than the others, and this minority class carries a higher misclassification cost. When traditional classification algorithms are applied to imbalanced data, the numerical skew between the majority and minority classes means that maximizing overall accuracy biases the model toward the majority class and neglects the minority class, so minority-class accuracy is low. How to design classification algorithms for imbalanced data that preserve the accuracy of both the majority and the minority class has become a research focus in machine learning, and a series of effective imbalanced-data classification methods has emerged. This paper gives a fairly comprehensive review of existing methods, summarizing and comparing them at the data-preprocessing level, the feature level, and the classification-algorithm level, discusses the remaining challenges in the light of current machine learning research, and finally outlines future research directions for imbalanced data classification.
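As a concrete example of the data-preprocessing level mentioned in the survey, here is a minimal sketch of random oversampling of the minority class, one of the simplest rebalancing techniques (not code from the survey itself).

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Naive data-level rebalancing: duplicate minority-class samples
    (with replacement) until all classes have the same size."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=n_max - n, replace=True) if n < n_max else []
        keep = np.concatenate([idx, np.asarray(extra, dtype=int)])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.vstack(X_parts), np.concatenate(y_parts)

# Usage: 95 majority vs 5 minority samples become 95 vs 95.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = random_oversample(X, y, rng=2)
print(np.bincount(y_bal))   # [95 95]
```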
9.
When a classical data-driven TSK fuzzy system is trained on high-dimensional data, the rule antecedents use too many features, which reduces the interpretability and conciseness of the rules. To address this, the subspace property of the fuzzy subspace clustering algorithm is exploited to add a feature-extraction mechanism to the TSK model, and ridge regression is further used to learn the consequents, giving a zero-order ridge-regression TSK modeling method based on fuzzy subspace clustering. The method not only extracts the important subspace features for each rule, but can also extract different features for different rules. Experimental results on synthetic and real-world data sets confirm the advantages of the proposed method.
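A minimal sketch of the consequent-learning step: in a zero-order TSK model the prediction is a firing-strength-weighted sum of scalar consequents, so with the rule centers fixed the consequents can be obtained by ridge regression on the normalized firing strengths. The centers, width, and penalty below are illustrative; in the paper they would come from fuzzy subspace clustering.

```python
import numpy as np

# Zero-order TSK model whose scalar consequents are learned by ridge
# regression. Rule centers would normally come from (fuzzy subspace)
# clustering; here they are fixed placeholders on a toy regression task.

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=200)   # toy target

centers = np.array([[-0.75, 0.0], [-0.25, 0.0], [0.25, 0.0], [0.75, 0.0]])
sigma = 0.4
lam = 1e-2                                                  # ridge penalty

def firing(X):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    f = np.exp(-d2 / (2 * sigma ** 2))
    return f / f.sum(axis=1, keepdims=True)   # normalized firing strengths

# Zero-order prediction is y_hat = F @ p, so ridge regression on F gives p.
F = firing(X)
p = np.linalg.solve(F.T @ F + lam * np.eye(len(centers)), F.T @ y)

y_hat = firing(X) @ p
print("training RMSE:", np.sqrt(np.mean((y_hat - y) ** 2)))
```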
10.
11.
The distinctive strength of fuzzy systems is their high interpretability, but traditional clustering-based fuzzy systems usually need all features of the input space and often exhibit overlapping fuzzy sets, so their interpretability is limited; moreover, when handling high-dimensional data such systems use a large number of features, which makes the rules overly complex. To address this, a knowledge-embedded Bayesian MA-type fuzzy system (knowledge-embedded Bayesian Mamdani-Assilian type fuzzy system, KE-B-MA) is investigated. First, KE-B-MA uses the DC (don't care) approach for knowledge-embedded fuzzy set partitioning, which effectively guides the choice of membership-function centers and of input-space features, so that the resulting rules can correspond to different feature subspaces. Second, KE-B-MA learns the antecedent and consequent parameters of the fuzzy rules jointly by Bayesian inference with a Markov chain Monte Carlo (MCMC) method, and the result obtained is a globally optimal solution. Experimental results show that, compared with several classical fuzzy systems, KE-B-MA achieves satisfactory classification performance with stronger interpretability and clarity.
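A small sketch of the "don't care" (DC) idea: features marked DC in a rule contribute membership 1, so each rule effectively lives in its own feature subspace. The centers, widths, and DC masks are illustrative, and the Bayesian/MCMC learning of KE-B-MA is not shown.

```python
import numpy as np

# "Don't care" (DC) antecedents: a rule only constrains the features it
# actually uses; DC features contribute membership 1, so different rules
# can use different feature subspaces. All values are illustrative.

centers = np.array([[0.2, 0.0, 0.9],
                    [0.7, 0.4, 0.0]])
use = np.array([[True, False, True],     # rule 1 ignores feature 2
                [True, True, False]])    # rule 2 ignores feature 3
sigma = 0.2

def rule_firing(x):
    m = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))
    m = np.where(use, m, 1.0)            # DC features -> membership 1
    return m.prod(axis=1)                # product t-norm over used features

x = np.array([0.25, 0.9, 0.85])
print(rule_firing(x))   # rule 1 fires strongly; feature 2 is irrelevant to it
```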
12.
Cluster ensembles have become an active research topic in machine learning because they can protect private information, process data in a distributed fashion, and reuse existing knowledge; in addition, their results are relatively insensitive to noise and outliers. The main contributions are as follows. First, the advantages of treating each base clusterer as an attribute of the original data are analyzed, showing that cluster-ensemble algorithms built this way have good scalability and flexibility. Second, on this basis, a latent variable cluster ensemble (LVCE) probabilistic model is built for cluster ensembling, together with a Markov chain Monte Carlo (MCMC) algorithm for the LVCE model. Experimental results show that the MCMC algorithm for the LVCE model performs cluster ensembling well and also reflects how tightly the data are clustered.
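A short sketch of the representation behind the first contribution: each base clustering becomes one categorical attribute of the data, here one-hot encoded, from which a simple co-association consensus can be computed. The base labels are made up, and the LVCE model itself is not implemented.

```python
import numpy as np

# Treat each base clustering as one categorical attribute of the data.
# base_labels[r, i] is the cluster id that base clusterer r assigns to
# sample i. One-hot encoding the attributes gives a matrix a consensus
# method (or a probabilistic model such as LVCE) can work on; the
# co-association matrix below is a common simple consensus built from it.

base_labels = np.array([[0, 0, 1, 1, 2],      # base clusterer 1
                        [1, 1, 0, 0, 0],      # base clusterer 2
                        [0, 0, 0, 1, 1]])     # base clusterer 3
n_clusterers, n_samples = base_labels.shape

# One-hot "attribute" matrix: one block of columns per base clusterer.
blocks = []
for r in range(n_clusterers):
    k = base_labels[r].max() + 1
    blocks.append(np.eye(k)[base_labels[r]])
H = np.hstack(blocks)                          # shape (n_samples, total clusters)

# Co-association: fraction of base clusterers that put two samples together.
co = H @ H.T / n_clusterers
print(co.round(2))
```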
13.
Item response theory is one of the modern test theories with applications in educational and psychological testing. Recent developments made it possible to characterize some desired properties in terms of a collection of manifest ones, so that hypothesis tests on these traits can, in principle, be performed. But the existing test methodology is based on asymptotic approximation, which is impractical in most applications since the required sample sizes are often unrealistically huge. To overcome this problem, a class of tests is proposed for making exact statistical inference about four manifest properties: covariances given the sum are non-positive (CSN), manifest monotonicity (MM), conditional association (CA), and vanishing conditional dependence (VCD). One major advantage is that these exact tests do not require large sample sizes. As a result, tests for CSN and MM can be routinely performed in empirical studies. For testing CA and VCD, the exact methods are still impractical in most applications, due to the unusually large number of parameters to be tested. However, exact methods are still derived for them as an exploration toward practicality. Some numerical examples with applications of the exact tests for CSN and MM are provided.
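As a rough illustration of one of the manifest properties, the sketch below performs a purely descriptive check related to manifest monotonicity (MM) on simulated item responses: for each item, the proportion correct should be non-decreasing in the rest score. This is not one of the exact tests proposed in the paper; the Rasch-type simulation and the group-size cutoff are assumptions.

```python
import numpy as np

# Descriptive check related to manifest monotonicity (MM): for each item,
# P(item correct | rest score) should be non-decreasing in the rest score.
# This is only an empirical summary, not an exact test.

rng = np.random.default_rng(0)
n_persons, n_items = 500, 5
theta = rng.normal(size=n_persons)                 # latent ability
diff = np.linspace(-1, 1, n_items)                 # item difficulties
prob = 1 / (1 + np.exp(-(theta[:, None] - diff)))  # Rasch-type model
X = (rng.random((n_persons, n_items)) < prob).astype(int)

for i in range(n_items):
    rest = X.sum(axis=1) - X[:, i]                 # rest score for item i
    props = [X[rest == s, i].mean() for s in np.unique(rest)
             if (rest == s).sum() >= 20]           # skip sparse score groups
    monotone = all(a <= b + 1e-12 for a, b in zip(props, props[1:]))
    print(f"item {i}: P(correct | rest score) non-decreasing: {monotone}")
```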
14.
A Markov chain Monte Carlo method has previously been introduced to estimate weighted sums in multiplicative weight update algorithms when the number of inputs is exponential. However, the original algorithm still required extensive simulation of the Markov chain in order to get accurate estimates of the weighted sums. We propose an optimized version of the original algorithm that produces exactly the same classifications while often using fewer Markov chain simulations. We also apply three other sampling techniques and empirically compare them with the original Metropolis sampler to determine how effective each is in drawing good samples in the least amount of time, in terms of accuracy of weighted sum estimates and in terms of Winnow's prediction accuracy. We found that two other samplers (Gibbs and Metropolized Gibbs) were slightly better than Metropolis in their estimates of the weighted sums. For prediction errors, there is little difference between any pair of MCMC techniques we tested. Also, on the data sets we tested, we discovered that all approximations of Winnow have no disadvantage when compared to brute force Winnow (where weighted sums are exactly computed), so generalization accuracy is not compromised by our approximation. This is true even when very small sample sizes and mixing times are used.
An early version of this paper appeared as Tao and Scott (2003).
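A generic comparison of a Metropolis sampler and a Gibbs sampler on a small binary model, in the spirit of the sampler comparison above but not the paper's Winnow setting; the target distribution, its parameters, and the estimated quantity are illustrative.

```python
import numpy as np

# Generic sampler comparison on a small binary model. Target:
#   pi(z) ∝ exp( h·z + J * sum_i z_i z_{i+1} ),  z in {0,1}^n
# Both samplers estimate E[ sum_i z_i ] under pi.

rng = np.random.default_rng(0)
n, J = 10, 0.5
h = rng.normal(scale=0.5, size=n)

def log_pi(z):
    return h @ z + J * np.sum(z[:-1] * z[1:])

def metropolis(n_steps):
    z, total = rng.integers(0, 2, n).astype(float), 0.0
    for _ in range(n_steps):
        i = rng.integers(n)
        z_new = z.copy()
        z_new[i] = 1 - z_new[i]                         # single-bit-flip proposal
        if np.log(rng.random()) < log_pi(z_new) - log_pi(z):
            z = z_new
        total += z.sum()
    return total / n_steps

def gibbs(n_sweeps):
    z, total = rng.integers(0, 2, n).astype(float), 0.0
    for _ in range(n_sweeps):
        for i in range(n):
            nb = (z[i - 1] if i > 0 else 0.0) + (z[i + 1] if i < n - 1 else 0.0)
            p1 = 1.0 / (1.0 + np.exp(-(h[i] + J * nb)))  # exact full conditional
            z[i] = float(rng.random() < p1)
        total += z.sum()
    return total / n_sweeps

print("Metropolis estimate of E[sum z]:", round(metropolis(20_000), 3))
print("Gibbs estimate of E[sum z]:     ", round(gibbs(20_000), 3))
```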
15.
《Computer Speech and Language》2014,28(5):1139-1155
This paper presents a new glottal inverse filtering (GIF) method that utilizes a Markov chain Monte Carlo (MCMC) algorithm. First, initial estimates of the vocal tract and glottal flow are evaluated by an existing GIF method, iterative adaptive inverse filtering (IAIF). Simultaneously, the initially estimated glottal flow is synthesized using the Rosenberg–Klatt (RK) model and filtered with the estimated vocal tract filter to create a synthetic speech frame. In the MCMC estimation process, the first few poles of the initial vocal tract model and the RK excitation parameter are refined in order to minimize the error between the synthetic and original speech signals in the time and frequency domain. MCMC approximates the posterior distribution of the parameters, and the final estimate of the vocal tract is found by averaging the parameter values of the Markov chain. Experiments with synthetic vowels produced by a physical modeling approach show that the MCMC-based GIF method gives more accurate results compared to two known reference methods.
16.
In Bayesian signal processing, all the information about the unknowns of interest is contained in their posterior distributions. The unknowns can be parameters of a model, or a model and its parameters. In many important problems, these distributions are impossible to obtain in analytical form. An alternative is to generate their approximations by Monte Carlo-based methods like Markov chain Monte Carlo (MCMC) sampling, adaptive importance sampling (AIS) or particle filtering (PF). While MCMC sampling and PF have received considerable attention in the literature and are reasonably well understood, the AIS methodology remains relatively unexplored. This article reviews the basics of AIS as well as provides a comprehensive survey of the state-of-the-art of the topic. Some of its most relevant implementations are revisited and compared through computer simulation examples.
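A minimal adaptive importance sampling sketch in the population Monte Carlo style reviewed by such surveys: sample from a Gaussian proposal, weight by target over proposal, and adapt the proposal to the weighted moments. The one-dimensional mixture target, sample sizes, and the moment-matching adaptation rule are illustrative choices, not a specific scheme from the article.

```python
import numpy as np

# Minimal adaptive importance sampling (AIS) sketch, population-Monte-Carlo
# style: draw from a Gaussian proposal, weight by target/proposal, then move
# the proposal to the weighted sample moments and repeat.

rng = np.random.default_rng(0)

def target_pdf(x):
    # Unnormalized target: two-component Gaussian mixture (means -1 and 2).
    return 0.7 * np.exp(-0.5 * (x + 1.0) ** 2) + 0.3 * np.exp(-0.5 * (x - 2.0) ** 2)

mu, sigma = 5.0, 3.0                        # deliberately poor initial proposal
for it in range(20):
    x = rng.normal(mu, sigma, size=500)
    proposal_pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    w = target_pdf(x) / proposal_pdf        # importance weights
    w /= w.sum()                            # self-normalize
    mu = np.sum(w * x)                      # adapt proposal location
    sigma = max(np.sqrt(np.sum(w * (x - mu) ** 2)), 0.5)   # adapt scale, with a floor

# The adapted mean is a self-normalized IS estimate of E[x] (true value -0.1 here).
print("adapted proposal (mean, std):", round(float(mu), 3), round(float(sigma), 3))
```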
17.
Fabien Campillo, Rivo Rakotozafy, Vivien Rossi 《Mathematics and computers in simulation》2009,79(12):3424
In many situations it is important to be able to propose N independent realizations of a given distribution law. We propose a strategy for making N parallel Monte Carlo Markov chains (MCMC) interact in order to get an approximation of an independent N-sample of a given target law. In this method each individual chain proposes candidates for all other chains. We prove that the set of interacting chains is itself a MCMC method for the product of N target measures. Compared to independent parallel chains this method is more time consuming, but we show through examples that it possesses many advantages. This approach is applied to a biomass evolution model.
18.
Phillip Li 《Computational statistics & data analysis》2011,55(2):1099-1108
This paper focuses on estimating sample selection models with two incidentally truncated outcomes and two corresponding selection mechanisms. The method of estimation is an extension of the Markov chain Monte Carlo (MCMC) sampling algorithm from Chib (2007) and Chib et al. (2009). Contrary to conventional data augmentation strategies when dealing with missing data, the proposed algorithm augments the posterior with only a small subset of the total missing data caused by sample selection. This results in improved convergence of the MCMC chain and decreased storage costs, while maintaining tractability in the sampling densities. The methods are applied to estimate the effects of residential density on vehicle miles traveled and vehicle holdings in California.
19.
While many implementations of Bayesian neural networks use large, complex hierarchical priors, in much of modern Bayesian statistics, noninformative (flat) priors are very common. This paper introduces a noninformative prior for feed-forward neural networks, describing several theoretical and practical advantages of this approach. In particular, a simpler prior allows for a simpler Markov chain Monte Carlo algorithm. Details of MCMC implementation are included.
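A toy sketch of the idea: with a flat (noninformative) prior the log-posterior of a small feed-forward network reduces to its log-likelihood, which a plain random-walk Metropolis sampler can explore. The network size, noise level, step size, and burn-in are illustrative and unrelated to the paper's actual prior and sampler.

```python
import numpy as np

# MCMC for a tiny feed-forward network under a flat (noninformative) prior:
# the log-posterior is just the log-likelihood, explored by random-walk
# Metropolis. All settings are illustrative.

rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

n_hidden = 5
n_params = n_hidden * 2 + n_hidden + 1     # input weights, biases, output weights, bias

def forward(theta, X):
    W1 = theta[:n_hidden]; b1 = theta[n_hidden:2 * n_hidden]
    W2 = theta[2 * n_hidden:3 * n_hidden]; b2 = theta[-1]
    return np.tanh(X @ W1[None, :] + b1) @ W2 + b2

def log_post(theta):                        # flat prior => log-likelihood only
    resid = y - forward(theta, X)
    return -0.5 * np.sum(resid ** 2) / 0.1 ** 2

theta = rng.normal(scale=0.5, size=n_params)
lp = log_post(theta)
samples = []
for step in range(20_000):
    prop = theta + 0.02 * rng.normal(size=n_params)   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if step >= 10_000 and step % 50 == 0:             # thinned post-burn-in samples
        samples.append(forward(theta, X))

post_mean = np.mean(samples, axis=0)                  # posterior-mean fit
print("RMSE of posterior-mean fit:", round(float(np.sqrt(np.mean((post_mean - y) ** 2))), 3))
```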
20.
We show that sampling with a biased Metropolis scheme is essentially equivalent to using the heatbath algorithm. However, the biased Metropolis method can also be applied when an efficient heatbath algorithm does not exist. This is first illustrated with an example from high energy physics (lattice gauge theory simulations). We then illustrate the Rugged Metropolis method, which is based on a similar biased updating scheme, but aims at very different applications. The goal of such applications is to locate the most likely configurations in a rugged free energy landscape, which is most relevant for simulations of biomolecules.
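A toy illustration of the relationship described above: for a single discrete variable, a heatbath update samples the exact conditional distribution, while a biased Metropolis update proposes from a rough approximation of it and corrects with an accept/reject step, so both leave the same target invariant. The four-state energies and the coarse proposal table are made-up examples, far simpler than lattice gauge theory or biomolecular simulations.

```python
import numpy as np

# Toy single-variable example: target pi(s) ∝ exp(-E(s)) over four states.
# Heatbath draws s directly from pi; biased Metropolis proposes from a
# rough approximation q of pi and corrects with a Metropolis accept step.

rng = np.random.default_rng(0)
E = np.array([0.0, 0.8, 1.5, 2.5])                 # state energies
pi = np.exp(-E); pi /= pi.sum()                    # exact target distribution

q = np.exp(-np.round(E)); q /= q.sum()             # proposal from coarsely rounded energies

def heatbath_sample(n):
    # Heatbath: sample the exact conditional (here, the full target) each step.
    return rng.choice(len(pi), size=n, p=pi)

def biased_metropolis_sample(n):
    s, out = 0, np.empty(n, dtype=int)
    for t in range(n):
        y = rng.choice(len(q), p=q)                # biased (approximate) proposal
        accept = min(1.0, (pi[y] * q[s]) / (pi[s] * q[y]))
        if rng.random() < accept:
            s = y
        out[t] = s
    return out

n = 50_000
print("target pi:               ", pi.round(3))
print("heatbath frequencies:    ", (np.bincount(heatbath_sample(n), minlength=4) / n).round(3))
print("biased Metropolis freqs: ", (np.bincount(biased_metropolis_sample(n), minlength=4) / n).round(3))
```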