Similar literature
 20 similar documents found; search took 15 ms
1.
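New developments in classification methods: a research survey   Total citations: 7, self-citations: 0, citations by others: 7
Classification is one of the important tasks in data mining and a widely studied problem in related fields such as machine learning, pattern recognition, and artificial intelligence. It has a wide range of practical applications, including medical diagnosis, credit assessment, and shopping selection. In recent years, as new techniques have continually emerged in these fields, classification methods have also seen new developments. This paper summarizes these developments in some detail and outlines the trends in the development of classification methods.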

2.
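Zheng Mingjie, Song Yuqing, Liu Yi. 《计算机科学》 (Computer Science), 2015, 42(12): 8-12, 31
Lung sound signals are physiological acoustic signals produced as the human respiratory system exchanges air with the outside environment; they carry a large amount of physiological and pathological information and therefore have high research value. In recent years, the rising incidence of respiratory diseases caused by environmental problems such as frequent haze has sharply increased the demand for fast and accurate diagnosis of lung diseases. Lung auscultation, being quick, convenient, and non-invasive, has again attracted wide attention, and the development of automatic lung sound diagnosis will undoubtedly be of great help in diagnosing lung diseases. Advances in hardware such as electronic stethoscopes and other signal acquisition technologies have further promoted research on modern lung sound signal analysis and recognition. This paper introduces the concept of lung sounds and computer-based lung sound signal processing and pattern recognition techniques, summarizes recent developments in machine-learning-based lung sound classification, and finally discusses trends in the research and application of lung sound classification.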

3.
Artificial intelligence for digital games constitutes the implementation of a set of algorithms and techniques from both traditional and modern artificial intelligence in order to provide solutions to a range of game-dependent problems. However, the majority of current approaches lead to predefined, static and predictable game agent responses, with no ability to adjust during game-play to the behaviour or playing style of the player. Machine learning techniques provide a way to improve the behavioural dynamics of computer-controlled game agents by facilitating the automated generation and selection of behaviours, thus enhancing the capabilities of digital game artificial intelligence and providing the opportunity to create more engaging and entertaining game-play experiences. This paper provides a survey of the current state of academic machine learning research for digital game environments, with respect to the use of techniques from neural networks, evolutionary computation and reinforcement learning for game agent control.

4.
Intersection-closed classes of concepts arise naturally in many contexts and have been intensively studied in computational learning theory. In this paper, we study intersection-closed classes that contain the concepts invariant under an operation satisfying a certain algebraic condition. We give a learning algorithm in the exact model with equivalence queries for such classes. This algorithm utilizes a novel encoding scheme, which we call a signature.

5.
A program has been developed which derives classification rules from empirical observations and expresses these rules in a knowledge representation format called 'counting criteria'. Decision rules derived in this format are often more comprehensible than rules derived by existing machine learning programs such as AQ11. Use of the program is illustrated by the inference of discrimination criteria for certain types of bacteria based upon their biochemical characteristics. The program may be useful for the conceptual analysis of data and for the automatic generation of prototype knowledge bases for expert systems.

6.
We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the generation of strings of symbols, which have been used in the context of speech and handwriting recognition, and bioinformatics. Recent work on learning PDFAs from random examples has used the KL-divergence as the error measure; here we use the variation distance. We build on recent work by Clark and Thollard, and show that the use of the variation distance allows simplifications to be made to the algorithms, and also a strengthening of the results; in particular, using the variation distance, we obtain polynomial sample size bounds that are independent of the expected length of strings.
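
The two error measures contrasted in this abstract can be compared on a toy example. The sketch below is only an illustration under stated assumptions: the distributions `p` and `q` are hypothetical finite distributions over strings (not PDFAs and not data from the paper), and the variation distance is taken as the L1 distance, which some authors scale by 1/2.

```python
import math

def l1_distance(p, q):
    """L1 (variation) distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

def kl_divergence(p, q):
    """KL(p || q); infinite when q gives zero mass to a string p can emit."""
    total = 0.0
    for s, ps in p.items():
        if ps == 0.0:
            continue
        qs = q.get(s, 0.0)
        if qs == 0.0:
            return math.inf
        total += ps * math.log(ps / qs)
    return total

# Hypothetical toy distributions over a few strings (assumed for illustration).
p = {"a": 0.5, "ab": 0.3, "abb": 0.2}
q = {"a": 0.5, "ab": 0.5}

d_l1 = l1_distance(p, q)
d_kl_pq = kl_divergence(p, q)
d_kl_qp = kl_divergence(q, p)
```

In this toy case the L1 distance stays bounded even though `q` assigns no mass to the string "abb", whereas KL(p || q) becomes infinite. This is one intuition for why the choice of error measure can change what has to be estimated.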

7.
This article addresses the problem of identifying the most likely music performer, given a set of performances of the same piece by a number of skilled candidate pianists. We propose a set of very simple features for representing stylistic characteristics of a music performer, introducing ‘norm-based’ features that relate to a kind of ‘average’ performance. A database of piano performances of 22 pianists playing two pieces by Frédéric Chopin is used in the presented experiments. Due to the limitations of the training set size and the characteristics of the input features we propose an ensemble of simple classifiers derived by both subsampling the training set and subsampling the input features. Experiments show that the proposed features are able to quantify the differences between music performers. The proposed ensemble can efficiently cope with multi-class music performer recognition under inter-piece conditions, a difficult musical task, displaying a level of accuracy unlikely to be matched by human listeners (under similar conditions).

8.
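A weighted support vector machine classification algorithm   Total citations: 17, self-citations: 1, citations by others: 17
A weighted C-SVM classification algorithm is proposed and its performance is analyzed theoretically. By introducing class weight factors and sample weight factors, the algorithm supports both class weighting and sample weighting. Experimental results show that the algorithm can effectively address the classification errors caused by imbalanced class sizes as well as the misclassification of important samples.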
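
The abstract does not give the objective function. As a hedged illustration only, one common way to write a soft-margin SVM with both kinds of weights is shown below; the symbols $\lambda_{y_i}$ (class weight factor), $s_i$ (sample weight factor) and the feature map $\phi$ are assumed notation, not taken from the paper.

```latex
\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i=1}^{n} \lambda_{y_i}\, s_i\, \xi_i
\quad \text{subject to} \quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,n.
\]
```

Setting all $\lambda_{y_i} = s_i = 1$ recovers the standard C-SVM. Under this assumed formulation, a larger $\lambda$ for the minority class or a larger $s_i$ for an important sample makes the corresponding margin violations more costly.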

9.
Patient no-shows have significant adverse effects on healthcare systems. Therefore, predicting patients’ no-shows is necessary to use their appointment slots effectively. In the literature, filter feature selection methods have been prominently used for patient no-show prediction. However, filter methods are less effective than wrapper methods. This paper presents new wrapper methods based on three variants of the proposed algorithm, Opposition-based Self-Adaptive Cohort Intelligence (OSACI). The three variants of OSACI are referred to in this paper as OSACI-Init, OSACI-Update, and OSACI-Init_Update, which are formed by the integration of Self-Adaptive Cohort Intelligence (SACI) with three Opposition-based Learning (OBL) strategies, namely OBL initialization, OBL update, and OBL initialization and update, respectively. The performance of the proposed algorithms was examined and compared with that of Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and SACI in terms of AUC, sensitivity, specificity, dimensionality reduction, and convergence speed. Patient no-show data of a primary care clinic in upstate New York was used in the numerical experiments. The results showed that the proposed algorithms outperformed the other compared algorithms by achieving higher dimensionality reduction and better convergence speed while achieving comparable AUC, sensitivity, and specificity scores.
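
The abstract names OBL initialization without describing it. The sketch below shows only the generic opposition-based idea for binary feature masks, not the OSACI algorithm itself: the opposite of a point x in [a, b] is a + b - x, which for a 0/1 mask is its complement. The function names and the toy fitness are hypothetical.

```python
import random

def opposite(mask):
    """Opposite of a binary vector: lower + upper - x with bounds 0 and 1."""
    return [1 - bit for bit in mask]

def obl_initialize(pop_size, n_features, fitness, rng):
    """Draw a random population, add each opposite, keep the fittest pop_size."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    candidates = population + [opposite(m) for m in population]
    candidates.sort(key=fitness, reverse=True)
    return candidates[:pop_size]

def toy_fitness(mask):
    """Hypothetical fitness: reward features 0 and 2, penalise mask size."""
    return 2 * (mask[0] + mask[2]) - 0.5 * sum(mask)

initial = obl_initialize(pop_size=4, n_features=5,
                         fitness=toy_fitness, rng=random.Random(0))
```

In a wrapper method the fitness would instead come from training and scoring a classifier on the selected features, which is the expensive step that a better starting population aims to reduce.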

10.
This paper presents a system for monitoring and prognostics of machine conditions using soft computing (SC) techniques. The machine condition is assessed through a suitable ‘monitoring index’ extracted from the vibration signals. The progression of the monitoring index is predicted using an SC technique, namely adaptive neuro-fuzzy inference system (ANFIS). Comparison with a machine learning method, namely support vector regression (SVR), is also presented. The proposed prediction procedures have been evaluated through benchmark data sets. The prognostic effectiveness of the techniques has been illustrated through previously published data on several types of faults in machines. The performance of SVR was found to be better than ANFIS for the data sets used. The results are helpful in understanding the relationship of machine conditions, the corresponding indicating features, the level of damage/degradation and their progression.

11.
Practising to operate an unknown system and observing the input and output of the system, in a sense, helps to optimally control that system. The acquired knowledge is, in turn, used to solve future analogous control problems. This means that it is very important to know how to memorize the acquired knowledge and to utilize it for learning. In this paper, we propose a new knowledge representation and reasoning method and develop a learning machine (KBLC: Knowledge-Based Learning Controller) by using them. A simple implementation has been constructed that demonstrates the feasibility of building such a machine.

12.
It is important to develop a reliable system for predicting bacterial virulent proteins for finding novel drug/vaccine and for understanding virulence mechanisms in pathogens. In this work we have proposed a bacterial virulent protein prediction method based on an ensemble of classifiers where the features are extracted directly from the amino acid sequence of a given protein. It is well known in the literature that the features extracted from the evolutionary information of a given protein are better than the features extracted from the amino acid sequence. Our method tries to fill the gap between the amino acid sequence based approaches and the evolutionary information based approaches. An extensive evaluation according to a blind testing protocol, where the parameters of the system are calculated using the training set and the system is validated in three different independent datasets, has demonstrated the validity of the proposed method.

13.
The performance of eight machine learning classifiers was compared on three aphasia-related classification problems. The first problem contained naming data of aphasic and non-aphasic speakers tested with the Philadelphia Naming Test. The second problem included the naming data of Alzheimer and vascular disease patients tested with the Finnish version of the Boston Naming Test. The third problem included aphasia test data of patients suffering from four different aphasic syndromes tested with the Aachen Aphasia Test. The first two data sets were small. Therefore, the data used in the tests were artificially generated from the original confrontation naming data of 23 and 22 subjects, respectively. The third set contained aphasia test data of 146 aphasic speakers and was used as such in the experiments. With the first and the third data set the classifiers could successfully be used for the task, while the results with the second data set were less encouraging. However, based on the results, no single classifier performed exceptionally well with all data sets, suggesting that the selection of the classifier used for classification of aphasic data should be based on the experiments performed with the data set at hand.

14.
We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence and model class that ensure that a class of natural estimators gives near-optimal smoothing. In the case of i.i.d. random noise, we show that the accuracy of these estimators depends on a measure of complexity of the model class involving covering numbers. Our formulation and results are quite general and are related to a number of problems in learning, prediction, and estimation. As a special case, we consider an application to output smoothing for certain classes of linear and nonlinear systems. The performance of output smoothing is given in terms of natural complexity parameters of the model class, such as bounds on the order of linear systems, the -norm of the impulse response of stable linear systems, or the memory of a Lipschitz nonlinear system satisfying a fading memory condition.

15.
Steels of different classes (austenitic, martensitic, pearlitic, etc.) have different applications and characteristic areas of properties. In the present work two methods are used to predict steel class, based on the composition and heat treatment parameters: the physically-based Calphad method and a data-driven machine learning method. They are applied to the same dataset, collected from open sources (mostly steels for high-temperature applications). Classification accuracy of 93.6% is achieved by the machine learning model, trained on the concentration of three elements (C, Cr, Ni) and heat treatment parameters (heating temperatures). The Calphad method gives 76% accuracy, based on the temperature and cooling rate. The reasons for misclassification by both methods are discussed, and it is shown that part of them is caused by ambiguity/inaccuracy in the data or limitations of the models used. For the rest of the cases reasonable classification accuracy is demonstrated. We suggest that the reason for the superiority of the machine learning classifier is the small variation in the data used, which indeed does not change the steel class: the properties of steel should be insensitive to the details of the manufacturing process.

16.
We show that halfspaces in n dimensions can be PAC-learned with respect to the uniform distribution with accuracy ε and confidence δ using examples.

17.
During their synthesis, a large fraction of proteins are directed to the secretory pathway. There are several models that aim to distinguish between different destinations along this pathway; however, they rarely distinguish between known stages of this translocation process. This paper presents a translocation probability function which models the protein SRP-recruitment process, the first stage of the secretory pathway. It unifies groups of proteins with distinct final destinations, allowing more specific sorting to be done in due course, mirroring the hierarchical nature of secretory translocation. We apply conditional random fields to evaluate the prediction accuracy of a full sequence model. Introducing the translocation function substantially improves prediction accuracy compared to a model based on properties that are relevant to the subsequent stages and final destinations only. For the discrimination of secretory, signal peptide (SP)-equipped proteins and non-secretory proteins a correlation coefficient of 0.98 is achieved, a level of performance that is only met by specialized SP predictors. Transmembrane proteins cause considerable confusion in signal peptide predictors, but fit naturally into our transparent design and reduce the performance of the translocation function only slightly. The proposed function and model assist efforts to uncover localization and function for the growing amount of protein sequence data. Applying our model we estimate with high confidence that about 27% of the human and 29% of the mouse proteins are associated with the secretory pathway.

18.
We study a model of probably exactly correct (PExact) learning that can be viewed either as the Exact model (learning from equivalence queries only) relaxed so that counterexamples to equivalence queries are distributionally drawn rather than adversarially chosen or as the probably approximately correct (PAC) model strengthened to require a perfect hypothesis. We also introduce a model of probably almost exactly correct (PAExact) learning that requires a hypothesis with negligible error and thus lies between the PExact and PAC models. Unlike the Exact and PExact models, PAExact learning is applicable to classes of functions defined over infinite instance spaces. We obtain a number of separation results between these models. Of particular note are some positive results for efficient parallel learning in the PAExact model, which stand in stark contrast to earlier negative results for efficient parallel Exact learning.

19.
Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process and if not precise may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way. Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability in the decision-making process; and (ii) makes it possible to postpone the final decision related to rejected modules for an expert analysis or even for another classifier using extra domain knowledge. Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods were built upon the weighted extreme learning machine (ELM) with reject option that makes it possible to postpone the final decision of non-classified modules, the rejected ones, to another moment. While rejoELM aims to maximize the accuracy for a rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option for imbalanced datasets. Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as a performance metric. Conclusion: It is concluded that rejoELM is a valid alternative for classification with reject option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
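
The reject-option idea in this abstract can be illustrated with a generic confidence threshold. This is a minimal sketch under stated assumptions and not the rejoELM or IrejoELM method: the classifier is assumed to output a probability that a module is defect-prone, and the probabilities, labels and threshold values below are made up for illustration.

```python
def classify_with_reject(prob_defective, threshold):
    """Return 1 (defect-prone), 0 (clean), or None (rejected) for one module."""
    confidence = max(prob_defective, 1.0 - prob_defective)
    if confidence < threshold:
        return None  # defer to an expert or to another classifier
    return 1 if prob_defective >= 0.5 else 0

def evaluate(probs, labels, threshold):
    """Accuracy on accepted modules and the fraction of modules rejected."""
    decisions = [classify_with_reject(p, threshold) for p in probs]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    rejection_rate = 1.0 - len(accepted) / len(labels)
    if accepted:
        accuracy = sum(d == y for d, y in accepted) / len(accepted)
    else:
        accuracy = float("nan")
    return accuracy, rejection_rate

# Hypothetical classifier outputs and true labels for five modules.
probs = [0.95, 0.10, 0.55, 0.40, 0.80]
labels = [1, 0, 0, 1, 1]

acc_reject, rate_reject = evaluate(probs, labels, threshold=0.7)
acc_plain, rate_plain = evaluate(probs, labels, threshold=0.5)
```

In this toy data the two low-confidence modules are exactly the ones the plain classifier gets wrong, so rejecting them raises accuracy on the accepted modules at the cost of a non-zero rejection rate. This is the accuracy versus rejection rate trade-off the abstract refers to.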

20.
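Locating sentiment orientation quickly, accurately, and comprehensively in large volumes of Internet text is a major challenge in the field of big data technology. Text sentiment classification methods fall broadly into two categories: those based on semantic understanding and those based on supervised machine learning. The advantage of semantic understanding is that it can classify the sentiment of text from different domains, but it is easily affected by the varied sentence patterns and collocations of Chinese, so its classification accuracy is not high. Supervised machine learning can reach relatively high sentiment classification accuracy, but a classifier that performs well in one domain does not adapt to sentiment classification in a new domain. Building on feature dimensionality reduction of high-dimensional text using information gain, this work combines optimized semantic understanding with machine learning to design a new machine learning framework for Chinese sentiment classification that incorporates semantic understanding. Several groups of comparative experiments based on this framework confirm high and stable classification accuracy for text across different domains.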
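
The abstract mentions information gain for feature dimensionality reduction without giving details. The sketch below shows the textbook definition only, applied to term presence in a document; the toy documents and labels are hypothetical and the paper's actual preprocessing is not known.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder

# Hypothetical tokenised documents with sentiment labels.
docs = [{"good", "phone"}, {"good", "movie"}, {"bad", "phone"}, {"bad", "movie"}]
labels = ["pos", "pos", "neg", "neg"]

ig_good = information_gain(docs, labels, "good")
ig_phone = information_gain(docs, labels, "phone")
```

Terms would then be ranked by information gain and only the top-scoring ones kept as features. In this toy corpus the sentiment word "good" separates the classes perfectly, while the domain word "phone" carries no information about the label.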
