Similar literature
20 similar documents found
1.
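A discussion of classification methods for imbalanced data sets   Total citations: 2 (self-citations: 1, citations by others: 1)
Because the class distribution in such data sets is highly skewed, many classification algorithms fail on imbalanced data sets, yet the minority class usually carries significant real-world importance. Improving minority-class classification performance on imbalanced data sets has therefore become a research focus in recent years. This paper discusses in detail the nature of the imbalanced classification problem, the factors that affect it, the methods commonly used, the usual evaluation criteria, and the open problems and challenges.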

2.
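Research on classification methods for imbalanced data sets   Total citations: 2 (self-citations: 0, citations by others: 2)
Traditional classification algorithms tend to favor the majority class when handling imbalanced data, which leads to low classification accuracy on the minority class. For imbalanced data classification, this paper first introduces the existing performance measures, then the commonly used data-sampling methods and existing classification methods, and finally hybrid methods that combine data sampling with classification methods.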

3.
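Because data sets for used-car recommendation are imbalanced, used-car recommendation can be treated as an imbalanced classification problem and addressed with methods developed for that problem. This paper studies data set reconstruction for imbalanced data classification. By analyzing the characteristics and shortcomings of the Synthetic Minority Over-sampling Technique (SMOTE), it proposes the Synthetic Minority Over-sampling Technique Filter (SmoteFilter), which filters the samples synthesized by SMOTE to reduce the noise among them and improve the "quality" of the training samples. Experiments with a support vector machine compare data synthesized by SMOTE and by SmoteFilter; the results show that, relative to conventional SMOTE oversampling, SmoteFilter improves the prediction precision for the minority class and the overall prediction performance in used-car recommendation.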
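
The SMOTE step that this abstract builds on can be sketched as follows. This is a minimal pure-Python illustration of the standard interpolation idea only; the function name `smote`, its parameters, and the toy points are assumptions for illustration, and the filtering stage of SmoteFilter is not shown because the abstract does not describe it.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper: a sketch of the basic SMOTE idea, not the
    SmoteFilter method from the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (base excluded).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the segment between base and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=10, k=2)
```

Every synthetic point lies on a segment between two existing minority points, which is why noisy or borderline minority samples can produce noisy synthetic ones; that is the weakness a filtering step would target.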

4.
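Class imbalance is a common problem in data mining; a skewed data distribution degrades classifier performance. Convolutional neural networks (CNNs), an efficient data mining tool, are widely used for classification tasks, but when their training is adversely affected by imbalanced data, classification accuracy on the minority class drops. For binary imbalanced classification, this paper proposes a hybrid method based on a cost-sensitive CNN. First, the density peaks clustering algorithm is combined with SMOTE to preprocess the data by oversampling, reducing the degree of imbalance of the original data set. Then, following the cost-sensitive idea, different classes in the imbalanced data are given different weights and, taking into account the Euclidean distance between predicted values and label values, the majority and minority classes are assigned different cost losses; this yields a cost-sensitive CNN model that improves the network's recognition rate on the minority class. Six different data sets are used to validate the method. The experimental results show that the proposed method improves the classification performance of CNN models on imbalanced data.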
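
The cost-sensitive idea in this abstract can be illustrated with a class-weighted squared-error loss. The sketch below is an assumption-laden example: the inverse-frequency weighting and the names `class_weights` and `weighted_squared_error` are hypothetical, and the paper's actual loss function is not given in the abstract.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: rarer classes get larger weights (assumed scheme)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def weighted_squared_error(y_true, y_score, weights):
    """Mean squared distance between score and label, scaled per class."""
    total = sum(weights[t] * (t - s) ** 2 for t, s in zip(y_true, y_score))
    return total / len(y_true)


labels = [0, 0, 0, 1]            # three majority samples, one minority sample
weights = class_weights(labels)  # the minority class receives the larger weight
loss = weighted_squared_error(labels, [0.0, 0.0, 0.0, 0.0], weights)
```

With these weights, misclassifying the single minority sample costs three times as much as misclassifying one majority sample, which pushes a model trained on this loss away from always predicting the majority class.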

5.
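Dynamic imbalanced data classification is an important research problem in online learning and class-imbalance learning; it deals with data streams whose class distribution is highly skewed. Such problems are common in practice, for example fault diagnosis in real-time control and monitoring systems and intrusion detection in computer networks. Because dynamic data streams exhibit both concept drift and class imbalance, a data stream classification algorithm must handle both. To this end, a method is proposed that processes imbalanced data while detecting concept drift. The method uses the Kappa coefficient to detect concept drift, then checks the balance ratio, and updates the classifier with an imbalanced data classification method. Experimental results show that, under several evaluation metrics, the algorithm achieves good classification performance on imbalanced data streams.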
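
The Kappa coefficient mentioned here is Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement between labels and predictions and $p_e$ is the agreement expected by chance. The sketch below computes it on one window of a stream; the drift rule `kappa_dropped` and its `tolerance` are hypothetical, since the abstract does not state the detection threshold.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between true labels and predictions for one window."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        # Degenerate window with a single class everywhere; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def kappa_dropped(reference_kappa, current_kappa, tolerance=0.2):
    """Hypothetical drift rule: flag drift when kappa falls by more than tolerance."""
    return reference_kappa - current_kappa > tolerance


window_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
```

Kappa is a natural choice on skewed streams because plain accuracy stays high when a classifier only predicts the majority class, whereas kappa discounts that chance agreement.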

6.
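Data produced in many real-world domains usually has multiple classes and is imbalanced. In multi-class imbalanced classification, problems such as class overlap, noise, and multiple minority classes reduce classifier capability, and solving the multi-class imbalance problem effectively has become an important research topic in machine learning and data mining. Based on recent literature on multi-class imbalanced classification, this survey analyzes and summarizes the methods from two perspectives, data preprocessing and algorithm-level classification, and analyzes all the algorithms in detail in terms of strengths, weaknesses, and data sets. For data preprocessing, it covers oversampling, undersampling, hybrid sampling, and feature selection, and compares the performance of algorithms evaluated on the same data sets. Algorithm-level methods are introduced and analyzed from three angles: base classifier optimization, ensemble learning, and multi-class decomposition techniques. Finally, future research directions for multi-class imbalanced data classification are summarized.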

7.
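To address the poor generalization of learners on imbalanced data, this paper proposes an imbalanced classification method based on virtual center reduction (IC_VCR). The method first clusters the majority-class data in an imbalanced binary classification sample set, then computes virtual cluster centers and lets these centers stand in for the majority-class samples during learning. This compresses the majority class so that the two classes become roughly balanced in size, which improves classification performance on imbalanced data. Experimental results show that IC_VCR effectively improves generalization on imbalanced data sets while also achieving high learning efficiency.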
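
The reduction step described above can be sketched as clustering the majority class and keeping only the cluster means. The abstract does not name the clustering algorithm or the number of clusters, so the use of k-means and the choice of one cluster per minority sample are assumptions, as are the function names.

```python
import math
import random


def kmeans_centers(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster means ("virtual" centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Each center becomes the mean of its cluster; an empty cluster
        # keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


def reduce_majority(majority, minority, seed=0):
    """Replace the majority class by as many virtual centers as minority samples."""
    k = min(len(minority), len(majority))
    return kmeans_centers(majority, k, seed=seed)


majority = [(float(i), 0.0) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 5.0)]
virtual_centers = reduce_majority(majority, minority)
```

The centers are called virtual because a cluster mean need not coincide with any real sample; training then uses `virtual_centers` together with the untouched minority samples.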

8.
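陈刚, 吴振家. 《控制与决策》 (Control and Decision), 2020, 35(3): 763-768
Classification of imbalanced data is an important research topic in machine learning. In an imbalanced data set, the minority class has markedly fewer training samples than the majority class, so classification results tend to be biased toward the majority class. For this problem, a probability-enhancement algorithm based on the Gaussian mixture model and expectation-maximization (GMM-EM) is proposed. First, the probability density function of the minority-class data is estimated with a Gaussian mixture model (GMM) fitted by the expectation-maximization (EM) algorithm. Second, based on the property that samples of high probability density are better able to generate new samples than samples of low probability density, an oversampling algorithm based on the minority-class density function is built; it keeps the probability distribution of the minority class consistent before and after balancing, so that the minority class is balanced in terms of the statistical properties of the data set. Finally, a decision tree classifier is applied to the balanced data set and the classification results are assessed with evaluation metrics. Classification experiments on 8 data sets selected from the UCI and KEEL repositories show that the proposed algorithm is more effective than existing algorithms.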

9.
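An overview of training methods for imbalanced data   Total citations: 7 (self-citations: 0, citations by others: 7)
张琦, 吴斌, 王柏. 《计算机科学》 (Computer Science), 2005, 32(10): 181-186
Real-world data classification applications often face imbalanced data, in which samples of one class far outnumber those of the other; examples include fraud detection and text classification. The minority-class samples usually carry great influence and value and are the main objects of interest; they are called the positive class, and the other class is called the negative class. Positive and negative samples can differ enormously in number, which makes training on imbalanced data challenging. Traditional machine learning algorithms may produce results biased toward the majority class, so prediction performance on the positive class can be poor. This paper analyzes the various causes of poor classification performance on imbalanced data and lists a range of solutions that address them.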

10.
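Imbalanced data is abundant in real life. Most traditional classification algorithms assume a balanced class distribution or equal misclassification costs, so minority-class samples tend to be misclassified when these algorithms are applied to imbalanced data. To address this, a new imbalanced data classification algorithm based on cost-sensitive ensemble learning, NIBoost (New Imbalanced Boost), is proposed on the basis of cost-sensitive theory. First, in each iteration an oversampling algorithm adds a certain number of minority-class samples to balance the data set, and a classifier is trained on the new data set. Second, this classifier is used to classify the data set, giving the predicted class label of each sample and the classifier's error rate. Finally, the classifier's weight coefficient and the new weight of each sample are computed from the error rate and the predicted labels. The experiments use decision trees and naive Bayes as weak learners. Results on UCI data sets show that, with a decision tree as the base classifier, NIBoost improves on RareBoost by up to 5.91 percentage points in F-value, up to 7.44 percentage points in G-mean, and up to 4.38 percentage points in AUC; the new algorithm therefore has some advantage on imbalanced data classification.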
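
Two of the metrics reported above, F-value and G-mean, follow directly from the confusion-matrix counts: F-value is the harmonic mean of precision and recall on the positive (minority) class, and G-mean is the geometric mean of the recall on each class. The sketch below uses the standard definitions; the function name and the toy labels are illustrative, and AUC is omitted for brevity.

```python
import math


def f_value_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-value, G-mean) for a binary problem with the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0

    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


f_value, g_mean = f_value_and_g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Both metrics drop to zero when a classifier never predicts the minority class, which is why they are preferred over plain accuracy on imbalanced data.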

11.
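Production rules are a commonly used, highly readable form of knowledge representation for inference in knowledge base systems and offer great advantages when building such systems. This paper proposes a knowledge acquisition method based on scenarios and rule acquisition templates, with processing experts for a polymer composite material as the knowledge source. By analyzing and recording the domain experts' design process, problem-solving process, and actions, the method refines the domain problem hierarchically into a series of sub-problems and, within each sub-problem scenario, acquires rule-based knowledge using the scenario model and the knowledge acquisition templates. The method helps domain experts, once the structure of the domain knowledge is clear, to progressively uncover fine-grained rule-based knowledge.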

12.
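§1. Introduction. Many problems in scientific and engineering computing reduce to boundary value problems for partial differential equations on unbounded domains; numerically solving unbounded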

13.
In this paper, we consider continuous dependence of the optimal control on the actuator domain, which varies as an open subset of the spatial domain, for a multi-dimensional heat equation. Both time optimal control and norm optimal control problems are considered. The reason for treating these two problems together is that they are actually equivalent: the energy used to drive the system to the target set in the minimal time interval is actually the minimal energy for driving the system to the target set in this minimal time interval, and vice versa. It is shown that both the optimal control and the optimal cost are continuous with respect to the open controlled actuator domain under the Lebesgue measure.

14.
In this article we further investigate the solution of linear second order elliptic boundary value problems by distributed Lagrange multipliers based fictitious domain methods. The following issues are addressed: (i) Derivation of the fictitious domain formulations. (ii) Finite element approximation. (iii) Iterative solution of the resulting finite dimensional problems (of the saddle-point type) by preconditioned conjugate gradient and Lanczos algorithms.

15.
In this paper we study the problem of designing and specifying standard program components applicable to a wide variety of tasks; we choose for this study the specific problem domain of data structures for general searching problems. Within this domain Bentley and Saxe [1] have developed transformations for converting solutions of simple searching problems to solutions of more complex problems. We discuss one of those transformations, specify precisely the transformation and its conditions of applicability, and prove its correctness; we accomplish this by casting it in terms of abstract data types–specifically by using the Alphard form mechanism. The costs of the structures derived by this transformation are only slightly greater than the costs of the original structures, and the correctness of the transformation definition together with the correctness of the original structure assure the correctness of the derived structure. The transformation we describe has already been used to develop a number of new algorithms, and it represents a new level of generality in software engineering tools.

16.
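This paper addresses cross-domain access control for Web services. It first analyzes the problem systematically, then combines the SAML identity authentication mechanism and XACML access control policies with ABAC to propose an attribute-based cross-domain access control model, achieving single sign-on, multi-point authentication, and cross-domain access on a distributed platform.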

17.
The process of microplanning in natural language generation (NLG) encompasses a range of problems in which a generator must bridge underlying domain‐specific representations and general linguistic representations. These problems include constructing linguistic referring expressions to identify domain objects, selecting lexical items to express domain concepts, and using complex linguistic constructions to concisely convey related domain facts. In this paper, we argue that such problems are best solved through a uniform, comprehensive, declarative process. In our approach, the generator directly explores a search space for utterances described by a linguistic grammar. At each stage of search, the generator uses a model of interpretation, which characterizes the potential links between the utterance and the domain and context, to assess its progress in conveying domain‐specific representations. We further address the challenges for implementation and knowledge representation in this approach. We show how to implement this approach effectively by using the lexicalized tree‐adjoining grammar (LTAG) formalism to connect structure to meaning and using modal logic programming to connect meaning to context. We articulate a detailed methodology for designing grammatical and conceptual resources which the generator can use to achieve desired microplanning behavior in a specified domain. In describing our approach to microplanning, we emphasize that we are in fact realizing a deliberative process of goal‐directed activity. As we formulate it, interpretation offers a declarative representation of a generator's communicative intent. It associates the concrete linguistic structure planned by the generator with inferences that show how the meaning of that structure communicates needed information about some application domain in the current discourse context. Thus, interpretations are plans that the microplanner constructs and outputs. 
At the same time, communicative intent representations provide a rich and uniform resource for the process of NLG. Using representations of communicative intent, a generator can augment the syntax, semantics, and pragmatics of an incomplete sentence simultaneously, and can work incrementally toward solutions for the various problems of microplanning.

18.
In this paper we study domain decomposition methods for solving some elliptic problem arising from flows in heterogeneous porous media. Due to the multiple scale nature of the elliptic coefficients arising from the heterogeneous formations, the construction of efficient domain decomposition methods for these problems requires a coarse solver which is adaptive to the fine scale features, [4]. We propose the use of a multiscale coarse solver based on a finite volume – finite element formulation. The resulting domain decomposition methods seem to induce a convergence rate nearly independent of the aspect ratio of the extreme permeability values within the substructures. A rigorous convergence analysis based on the Schwarz framework is carried out, and we demonstrate the efficiency and robustness of the preconditioner through numerical experiments which include problems with multiple scale coefficients, as well as problems with continuous scales. Communicated by: G. Wittum

19.
Developing GIS Applications with Objects: A Design Patterns Approach   Total citations: 1 (self-citations: 0, citations by others: 1)
In this paper we present an object-oriented approach for designing GIS applications; it combines well known software engineering practices with the use of design patterns as a conceptual tool to cope with recurrent problems appearing in the GIS domain. Our approach allows the designer to decouple the conceptual definition of application objects from their spatial representation. In this way, GIS applications can evolve smoothly, because maintenance is achieved by focusing on different concerns at different times. We show that our approach is also useful to support spatial features in conventional applications built with object-oriented technology. The structure of this paper is as follows: We first introduce design patterns, an efficient strategy to record design experience; then we discuss the most common design problems a developer of GIS applications must face. The core of our method is then presented by explaining how the use of decorators helps in extending objects to incorporate spatial attributes and behavior. Next, we analyze some recurrent design problems in the GIS domain and present some new patterns addressing those problems. Some further work is finally discussed.

20.
In non-binary constraint satisfaction problems, the study of local consistencies that only prune values from domains has so far been largely limited to generalized arc consistency or weaker local consistency properties. This is in contrast with binary constraints where numerous such domain filtering consistencies have been proposed. In this paper we present a detailed theoretical, algorithmic and empirical study of domain filtering consistencies for non-binary problems. We study three domain filtering consistencies that are inspired by corresponding variable based domain filtering consistencies for binary problems. These consistencies are stronger than generalized arc consistency, but weaker than pairwise consistency, which is a strong consistency that removes tuples from constraint relations. Among other theoretical results, and contrary to expectations, we prove that these new consistencies do not reduce to the variable based definitions of their counterparts on binary constraints. We propose a number of algorithms to achieve the three consistencies. One of these algorithms has a time complexity comparable to that for generalized arc consistency despite performing more pruning. Experiments demonstrate that our new consistencies are promising as they can be more efficient than generalized arc consistency on certain non-binary problems.
