Similar literature
1.
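Combined support vector machine classification and its application to text classification   Total citations: 3, self-citations: 0, citations by others: 3
Standard support vector machines are sensitive to outliers and noise, and their decisions are clearly biased toward the larger classes. To address this, a doubly weighted support vector machine is proposed that accounts for both sample differences and class differences. A method is also given that combines the proximal support vector machine with a support-vector identification algorithm to identify outliers and compute sample importance weights. The new classification algorithm, which combines the doubly weighted support vector machine with the proximal support vector machine, is particularly suited to text classification problems with large sample sizes, uneven sample quality, and class imbalance. Experiments show that the new algorithm improves the generalization performance of the classifier and achieves higher precision and recall than traditional methods.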

2.
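CSNN based on error-correcting codes and its application to remote sensing image classification   Total citations: 1, self-citations: 0, citations by others: 1
The combined single-output neural network (CSNN) overcomes the inherent shortcomings of the BP neural network: its network structure is determined in advance, its classification behavior is easy to interpret, and it parallelizes well. However, its classification accuracy is slightly lower than that of a BPNN whose structure has been selected. Error-correcting codes can improve the classification accuracy of CSNN. First, a set of class codewords is determined from the number of classes and the required error-correcting capability, with each codeword corresponding to one class. Each SNN subnetwork is trained on the same bit position across these codewords, which fixes the network structure and the binary function learned by each subnetwork. To classify a sample of unknown class, the outputs of the SNNs are assembled into an output code, the Hamming distance between this output code and each class codeword is computed, and the sample is assigned the class of the nearest codeword. The classification behavior of the error-correcting-code-based CSNN is easily converted into a rule set and is highly comprehensible. The network was applied to remote sensing image classification and compared with other classification algorithms. The results show that, with error-correcting codes, CSNN retains all of its original advantages while its classification accuracy improves significantly.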
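
The decoding step described in this abstract (assemble the subnetwork outputs into an output code, then pick the class whose codeword is nearest in Hamming distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, the 7-bit codewords, and the function names are all hypothetical, and the per-bit SNN subnetworks are replaced by a ready-made list of bits.

```python
from typing import Dict, List

# Hypothetical codewords (not from the paper): three classes, 7 bits each.
# Any two codewords differ in 4 positions, so one flipped bit is still decoded correctly.
CLASS_CODES: Dict[str, List[int]] = {
    "water":  [0, 0, 0, 0, 0, 0, 0],
    "forest": [1, 1, 1, 1, 0, 0, 0],
    "urban":  [0, 0, 1, 1, 1, 1, 0],
}


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(output_code: List[int], class_codes: Dict[str, List[int]] = CLASS_CODES) -> str:
    """Return the class whose codeword is closest to the output code."""
    return min(class_codes, key=lambda label: hamming(output_code, class_codes[label]))


# One bit of the "forest" codeword is flipped, as if one subnetwork had erred.
noisy = [1, 1, 1, 0, 0, 0, 0]
print(decode(noisy))
```

In a full system each bit of `noisy` would come from one binary subnetwork; the longer the codewords and the larger their minimum pairwise distance, the more subnetwork errors the decoder can absorb.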

3.
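To address the poor performance of automatic classification of electric power archives, an automatic classification method based on multiple feature selection is proposed. First, the text content of the archives is extracted, segmented into words, and stripped of stop words, and the archive text is represented with the vector space model. Second, multiple feature selection techniques are used to extract several features: document frequency, chi-square test, normalized difference, Gini index, and information gain. Finally, the similarity between an archive document and each class is determined from the features, and the class of the archive is assigned by comparison with a classification threshold. Experimental results show that the proposed method misclassifies fewer archives than traditional methods and has broad application prospects in automatic classification of electric power archives.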
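
Of the feature selection measures listed in this abstract, the chi-square test is easy to show in a few lines. The sketch below uses the standard 2x2 chi-square statistic for a term and a class; it is an illustration under assumptions, not the paper's code. The toy documents, the labels, and the function names are hypothetical, and documents are assumed to be already segmented into token sets.

```python
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 term/class table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # the term or the class is degenerate, so there is no evidence
    return n * (a * d - b * c) ** 2 / denom


def term_scores(docs: List[Set[str]], labels: List[str], target: str) -> Dict[str, float]:
    """Score every term in the corpus against one target class."""
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == target)
        d = len(docs) - a - b - c
        scores[term] = chi_square(a, b, c, d)
    return scores


# Hypothetical toy corpus: four tokenized documents in two classes.
docs = [{"transformer", "fault"}, {"transformer", "load"},
        {"invoice", "payment"}, {"invoice", "load"}]
labels = ["equipment", "equipment", "finance", "finance"]
scores = term_scores(docs, labels, "equipment")
print(sorted(scores, key=scores.get, reverse=True))
```

A term that appears in every document of one class and nowhere else scores highest, while a term spread evenly across classes scores zero; the document frequency of a term is simply `a + b` in the same table.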

4.
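This paper proposes a method, based on fuzzy similarity measurement, for recognizing multi-font Chinese characters and digits over a small number of classes. Through fuzzy-logic processing, the method converts the binarized image of a character directly into a fuzzy template based on a nonlinear weighted similarity function. Recognition is then performed through statistics of the classification fuzzy model, hierarchical combination of similarity-measurement templates, and rule-based classification. Experiments show that the method works well for recognizing multi-font Chinese characters and digits over a small number of classes.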

5.
王强  关毅  王晓龙 《自动化学报》2007,33(8):809-816
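An algorithm for eliminating class noise (ECN) during text classification is proposed, based on the class attributes of text features. By analyzing the class-indicating information contained in key text features, the algorithm actively predicts the set of classes to which a text may belong, which reduces the number of classifiers involved in the decision, lowers classification latency, and improves classification accuracy. Experiments on Chinese and English test corpora show that the algorithm achieves F-scores of 0.76 and 0.93 respectively, with a clear improvement in classifier running efficiency and good overall performance. Further experiments show that the algorithm scales well: combined with a feedback learning strategy, classification performance improves further, with F-scores reaching 0.806 and 0.943.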

6.
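To address the dependence of current text classification on the vector space model and the shortcomings of document frequency (DF) feature extraction in binary classification, a binary classification method based on a class space model with difference frequency is proposed. The method breaks free of the limitations of the vector space model, uses a difference-frequency method that improves on DF for feature extraction, and implements binary classification. Experimental results show that the improved method is effective: precision, recall, and F1 all improve, and classification accuracy increases. The method is also worth considering for binary classification in other domains.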

7.
KNN algorithm based on differences in feature entropy correlation   Total citations: 1, self-citations: 0, citations by others: 1
周靖  刘晋胜 《计算机工程》2011,37(17):146-148
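The traditional K-nearest neighbor (KNN) method is prone to cases in which a sample cannot be classified or is classified wrongly. To address this, feature entropy is combined with KNN in a new classification algorithm (FECD-KNN). The algorithm uses entropy as the class correlation degree and computes sample distance from the differences between these values. Entropy theory is used to define the classification correlation degree, and the differences between correlation degrees measure how strongly each feature affects classification, which establishes an intrinsic link between the distance measure and the classes. Simulation results show that, compared with KNN and Entropy-KNN, FECD-KNN improves classification accuracy while maintaining efficiency.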

8.
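Soft sensing of side-draw product quality in an atmospheric distillation column based on a multiple neural network structure   Total citations: 1, self-citations: 0, citations by others: 1
Because the feedstock and products of the atmospheric column vary frequently, a multiple neural network structure is proposed for building a soft-sensor model of side-draw product quality. Input samples are classified with a data classification technique based on the Mahalanobis distance. The soft-sensor model is corrected using laboratory assay values of product quality. Practical application shows that the soft sensor with the multiple neural network structure achieves high accuracy.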

9.
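Classification is one of the core techniques in data mining research. Obtaining a well-performing classifier requires a large number of training samples, but labeling samples is very resource-intensive, and labeling multi-label samples is harder still. To reduce labeling cost as far as possible, the samples that best represent class information must be found. In SVM-based classification, the larger the classifier margin, the worse the classification accuracy. An active learning method based on expected margin is proposed: given the current classifier, it selects the samples that shrink the classification margin fastest. Experiments show that the expected-margin strategy learns better than strategies based on decision value or on posterior probability.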

10.
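To improve the classification accuracy of remote sensing imagery, a hybrid multiple-classifier combination algorithm is proposed that works at two levels, the abstract level and the measurement level. The algorithm uses the classification results of the different sub-classifiers and their per-class accuracies, sets a threshold on per-class accuracy, and selects the best sub-classifier to obtain the final result for some of the classes. It then combines the classifiers using the Bagging algorithm at the abstract level and maximum confidence at the measurement level. The algorithm was applied to classification of Beijing-1 remote sensing imagery. The results show that both the overall accuracy and the per-class accuracy improve markedly over the selected sub-classifiers, making it a new and effective algorithm.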
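
The two combination levels named in this abstract can be illustrated with a short sketch: at the abstract level each sub-classifier contributes only a label and the labels are combined by majority vote (the rule Bagging uses), while at the measurement level each sub-classifier contributes per-class confidences and the single most confident prediction wins. The abstract does not say how the two levels interact, so falling back to maximum confidence when the vote is tied is purely an assumption here, as are all function names and the example values.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Abstract level: the most frequent label, or None if the top count is tied."""
    ranked = Counter(labels).most_common()
    best_count = ranked[0][1]
    winners = [label for label, count in ranked if count == best_count]
    return winners[0] if len(winners) == 1 else None


def max_confidence(scores: List[Dict[str, float]]) -> str:
    """Measurement level: the label with the highest confidence from any classifier."""
    best_label, best_score = "", float("-inf")
    for per_class in scores:
        for label, confidence in per_class.items():
            if confidence > best_score:
                best_label, best_score = label, confidence
    return best_label


def combine(labels: List[str], scores: List[Dict[str, float]]) -> str:
    """Assumed hybrid rule: vote first, use maximum confidence to break ties."""
    voted = majority_vote(labels)
    return voted if voted is not None else max_confidence(scores)


# Hypothetical outputs of two sub-classifiers for one pixel.
labels = ["water", "forest"]
scores = [{"water": 0.6, "forest": 0.4}, {"water": 0.2, "forest": 0.8}]
print(combine(labels, scores))
```

The per-class accuracy threshold that the abstract uses to pick a best sub-classifier for some classes is a separate, earlier step and is not modeled in this sketch.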

11.
This paper proposes an evolutionary approach for discovering differences in the usage of words to facilitate collaboration among people. In general, different people seem to have different ways of conception and thus can have different concepts even of the same thing. When people try to communicate their concepts with words, such differences in meaning and usage can lead to misunderstanding in communication, which can hinder their collaboration. In our approach each granule of classification knowledge from users is structured into a decision tree so that differences in the usage of words can be discovered as differences in the structure of decision trees. By treating each granule of classification knowledge (i.e., decision tree) as an individual in a Genetic Algorithm (GA), evolution is carried out with respect to both the classification efficiency of each individual and diversity as a population, so that the granule for classification is gradually evolved with diverse structure. Experiments were carried out on motor diagnosis cases with artificially encoded differences in the usage of words, and the results show the effectiveness of the proposed evolutionary approach.

12.
阂华松  胥贵萍 《计算机科学》2011,38(12):239-241,262
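In fully automated Information Lifecycle Management (ILM), the value of data and the way that value changes over time are the key basis for data tiering. Unlike most previous data value models, which consider factors such as file usage, this work builds on the ILM value model and takes into account how changes in on-disk data distribution over time affect data value. It proposes ILDM (Information Lifecycle Dynamic Management), a dynamic value model for data lifecycle management that jointly considers recency of use, frequency of use, and data distribution. Experiments show that ILDM effectively reduces the data migration workload and improves system resource utilization.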

13.
Artificial neural networks (ANN) are widely used in data classification problems. The backpropagation algorithm is the classical technique for training artificial neural networks. Since this algorithm has many disadvantages, the training of neural networks has also been implemented with binary and real-coded genetic algorithms. These algorithms can be used to solve classification problems. The real-coded genetic algorithm has been compared with other training methods in only a few works. It is known that comparing approaches is as important as proposing a new classification approach. For this reason, in this study, a large-scale comparison of the performance of neural network training methods is carried out on data classification datasets. The experimental comparison covers different real classification data taken from the literature and a simulation study. A comparative analysis on the real data sets and simulation data shows that the real-coded genetic algorithm may offer an efficient alternative to traditional training methods for the classification problem.

14.
刘杉  侯整风 《计算机工程》2010,36(19):99-101
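To give packet classification fast point location and good scalability, a new flow classification algorithm based on computational geometry is proposed, combining the cross-producting table with linear search. The algorithm adjusts storage usage by controlling the number of rules, so that more and more rules are found through one-dimensional data-structure searches, which further reduces the storage required by the cross-producting table. Experimental results show that the algorithm improves both the storage performance and the time performance of cross-producting.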

15.
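Application of sequential pattern mining to personalized e-commerce services   Total citations: 1, self-citations: 0, citations by others: 1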
靳明霞  李玉华  管建军 《微机发展》2006,16(10):233-236
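The problems facing e-commerce development and the characteristics of personalized services are analyzed, and a way of applying Web usage mining to personalized e-commerce services is proposed. Research on personalized services based on Web mining is discussed, and the mining process is described in detail. Finally, a method that combines sequential patterns with classification to deliver personalized services is discussed. The personalized information obtained with these algorithms captures user interest patterns accurately and allows the organization of Web information resources to be updated effectively, which improves the efficiency of network information services and provides users with adaptive, intelligent, "one-to-one" personalized services.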

16.
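The supporting environment for cultural relic digitization contains databases and applications for online transaction processing. In addition, to support the work of professional cultural relic researchers, a cultural relic information data warehouse needs to be built. Methods and techniques for building such a data warehouse are discussed, and a star schema is given. The data warehouse supports online analytical processing. The classification of cultural relics is explored in a preliminary way using the data warehouse, which illustrates its value for research.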

17.
In this paper, we investigate the relationship between automatically extracted behavioral characteristics derived from rich smartphone data and self-reported Big-Five personality traits (extraversion, agreeableness, conscientiousness, emotional stability and openness to experience). Our data stem from smartphones of 117 Nokia N95 smartphone users, collected over a continuous period of 17 months in Switzerland. From the analysis, we show that several aggregated features obtained from smartphone usage data can be indicators of the Big-Five traits. Next, we describe a machine learning method to detect the personality trait of a user based on smartphone usage. Finally, we study the benefits of using gender-specific models for this task. Apart from a psychological viewpoint, this study facilitates further research on the automated classification and usage of personality traits for personalizing services on smartphones.

18.
Currently, high-dimensional data such as image data is widely used in the domain of pattern classification and signal processing. When using high-dimensional data, feature analysis methods such as PCA (principal component analysis) and LDA (linear discriminant analysis) are usually required in order to reduce memory usage or computational complexity as well as to increase classification performance. We propose a feature analysis method for dimension reduction based on a data generation model that is composed of two types of factors: class factors and environment factors. The class factors, which are prototypes of the classes, contain important information required for discriminating between various classes. The environment factors, which represent distortions of the class prototypes, need to be diminished for obtaining high class separability. Using the data generation model, we aimed to exclude environment factors and extract low-dimensional class factors from the original data. By performing computational experiments on artificial data sets and real facial data sets, we confirmed that the proposed method can efficiently extract low-dimensional features required for classification and has a better performance than the conventional methods.

19.
罗弦  查志勇  徐焕  刘芬  詹伟 《计算机测量与控制》2017,25(10):278-280, 288
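As modern network technology advances, the volume of system data keeps growing. Traditional automatic big-data classification and processing systems no longer meet the needs of current users: their software and hardware designs are simplistic, and they suffer from high energy consumption, slow classification, long processing times, and high memory usage. A cloud-computing-based design for an automatic big-data classification and processing system is therefore proposed. First, the hardware structure of the system is designed, consisting mainly of a data collector, a data processor, and an automatic data storage module, each of which is described in detail. Then, an algorithm that extracts data by time-domain features is used to extract frequency-domain feature data, which implements the software design of the automatic classification and processing system. Finally, the performance of the two systems is compared experimentally. The results show that the cloud-computing-based system has low resource usage and low memory consumption, and its database memory is larger. The system not only improves the accuracy of automatic data classification but also speeds up classification, giving the system better classification performance.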

20.
Pattern classification applications can be found everywhere, especially the ones that use computer vision. What makes them difficult to embed is the fact that they often require a lot of computational resources. Embedded computer vision has been applied in many contexts, such as industrial or home automation, robotics, and assistive technologies. This work performs a design space exploration in an image classification system and embeds a computer vision application into a minimum resource platform, targeting wearable devices. The feature extractor and the classifier are evaluated for memory usage and computation time. A method is proposed to optimize such characteristics, leading to a reduction of over 99% in computation time and 92% in memory usage, with respect to a standard implementation. Experimental results in an ARM Cortex-M platform showed a total classification time of 0.3 s, maintaining the same accuracy as in the simulation performed. Furthermore, less than 20 KB of data memory was required, which is the most limited resource available in low-cost and low-power microcontrollers. The target application, used for the experimental evaluation, is a crosswalk detector used to help visually impaired persons.
