Similar Literature — 20 results found
1.
Low back disorders (LBDs) due to manual material lifting tasks have become a significant issue that affects the quality of life of the industrial workforce in the U.S. and has an enormous economic impact. For the last three decades researchers have been trying to understand the phenomenon of LBDs and to develop practical guidelines that could prevent these injuries or limit their severity after they have occurred. One research stream has concentrated on creating and testing various classification models based on the landmark Marras data set. The goal of these models was to categorize manual lifting jobs as low risk or high risk with respect to LBDs. This paper summarizes and critiques the previous approaches, as some of them yielded unrealistically high classification accuracy rates. The paper also proposes an adaptive neuro-fuzzy inference system (ANFIS) to classify tasks as high risk or low risk. To the best of our knowledge, ANFIS has not yet been used in this context, nor for classification of a binary target variable. The paper also compares the classification performance of different parameters and configurations of ANFIS. The ANFIS model appears to be a viable option for risk classification, as it exhibits classification accuracy rates consistent with several previous studies. More importantly, ANFIS generates easy-to-interpret control surfaces, membership functions, and fuzzy rules, allowing deeper insight into the relationships between risk factors that interact in complex and nonlinear ways. Such insights could prove very useful for the much-needed efforts to better understand LBDs.

2.
Although technology has developed rapidly, manual material handling (MMH) tasks are still common activities in most industries. According to recent surveys, MMH tasks remain one of the main causes of occupational low back disorders (LBDs). It is critical to discriminate as accurately as possible between MMH jobs that place workers at high versus low risk of LBDs. In this study, the risk of occupational LBDs is classified by support vector machines (SVMs) considering both trunk-motion variables and workplace variables, which have been used extensively to identify LBD risk. The LBDs-SVM model outperformed existing models in terms of accuracy, correctly classifying 88.5% of high-risk cases under 10-fold cross-validation. In other words, the proposed model correctly classified an average of 29.2 out of 33 high-risk cases, which are more critical to identify than low-risk cases. The results indicate that SVM is a better classifier for LBD risk than the other existing methods in the literature.
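A minimal sketch of the SVM risk classifier described above, using scikit-learn. The Marras data is not reproduced here, so a synthetic stand-in with five numeric risk-factor features (e.g. lift rate, trunk velocity) is generated; the RBF kernel and 10-fold cross-validation mirror the study's setup, but all numbers below are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                     # synthetic trunk-motion / workplace features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # 1 = high risk, 0 = low risk (toy rule)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=10)      # 10-fold cross-validation
mean_acc = scores.mean()
```

On real data one would feed the measured trunk-motion and workplace variables into `X` and the epidemiologically determined risk labels into `y`.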

3.
Manual material handling (MMH) tasks, the leading cause of low back disorders (LBDs), are still used extensively in industry despite advances in technology. Classifying industrial jobs by LBD risk is of great significance for preventing injuries and designing workplaces. In this article, industrial jobs are classified into two categories, low risk and high risk, using ant colony optimization (ACO). The ACO classifier (ACOCLASS) obtained better results than previous studies that used the same experimental data. The resulting classification schema can guide ergonomic interventions for future reductions in low back injuries. © 2008 Wiley Periodicals, Inc.

4.
Ensemble learning is widely used to improve classification accuracy, and recent studies have shown that building ensemble classifiers with multi-modal perturbation strategies can further improve classification performance. This paper proposes an ensemble pruning algorithm based on approximate reduction and optimal sampling (EPA_AO). In EPA_AO, we design a multi-modal perturbation strategy to construct distinct individual classifiers; the strategy perturbs the attribute space and the training set simultaneously, thereby increasing the diversity of the individual classifiers. The evidential K-nearest-neighbour (KNN) algorithm is used to train the individual classifiers, and EPA_AO is compared with existing algorithms of the same type on multiple UCI data sets. Experimental results show that EPA_AO is an effective ensemble learning method.

5.
Advocating the Use of Imprecisely Observed Data in Genetic Fuzzy Systems
In our opinion, and in accordance with current literature, the precise contribution of genetic fuzzy systems to the corpus of the machine learning theory has not been clearly stated yet. In particular, we question the existence of a set of problems for which the use of fuzzy rules, in combination with genetic algorithms, produces more robust models, or classifiers that are inherently better than those arising from the Bayesian point of view. We will show that this set of problems actually exists, and comprises interval and fuzzy valued datasets, but it is not being exploited. Current genetic fuzzy classifiers deal with crisp classification problems, where the role of fuzzy sets is reduced to give a parametric definition of a set of discriminant functions, with a convenient linguistic interpretation. Provided that the customary use of fuzzy sets in statistics is vague data, we propose to test genetic fuzzy classifiers over imprecisely measured data and design experiments well suited to these problems. The same can be said about genetic fuzzy models: the use of a scalar fitness function assumes crisp data, where fuzzy models, a priori, do not have advantages over statistical regression.

6.
Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier using positive and unlabeled examples. The primary challenge of this problem, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. Firstly, we identify many more reliable negative documents with an improved 1-DNF algorithm at a very low error rate. Secondly, we build a set of classifiers by iteratively applying the SVM algorithm on a training set that is augmented during iteration. Thirdly, different from previous PU-oriented text classification work, we construct the final classifier from the weighted vote of all classifiers generated in the iteration steps instead of choosing one of them. Finally, we discuss an approach based on PSO (Particle Swarm Optimization) to learn this weighted vote, discovering the best combination of weights. In addition, we built a focused crawler based on link contexts guided by different classifiers to evaluate our method. Several comprehensive experiments have been conducted using the Reuters data set and thousands of web pages. Experimental results show that our method improves performance (F1-measure) compared with PEBL, and a focused web crawler guided by our PSO-based classifier outperforms several other classifiers in both harvest rate and target recall.
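The PU-learning pipeline above can be sketched as follows, with heavy simplifications: the unlabeled points farthest from the positive centroid stand in for the improved 1-DNF reliable-negative extraction, a single linear SVM replaces the iterative ensemble, and the PSO-weighted vote is omitted. Synthetic vectors stand in for document features; nothing here is the paper's actual algorithm.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
pos = rng.normal(loc=2.0, size=(50, 4))                      # labeled positive documents
unlabeled = np.vstack([rng.normal(loc=2.0, size=(50, 4)),    # hidden positives
                       rng.normal(loc=-2.0, size=(50, 4))])  # hidden negatives

# Stand-in for the 1-DNF step: treat the unlabeled points farthest from the
# positive centroid as reliable negatives.
centroid = pos.mean(axis=0)
dist = np.linalg.norm(unlabeled - centroid, axis=1)
reliable_neg = unlabeled[np.argsort(dist)[-40:]]

X = np.vstack([pos, reliable_neg])
y = np.array([1] * len(pos) + [0] * len(reliable_neg))
clf = LinearSVC().fit(X, y)
pred = clf.predict(unlabeled)        # classify all unlabeled documents
```

The full method would re-augment the negative set with confidently classified documents and retrain iteratively, keeping every intermediate classifier for the final weighted vote.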

7.
Objective: Fine-grained classification has attracted growing attention in recent years; its difficulty lies in the very small differences between target classes. We propose a classification-error-guided hierarchical bilinear convolutional neural network model. Method: The core idea is to separately retrain and reclassify the classes that the bilinear CNN (B-CNN) tends to confuse. First, to identify the easily confused classes, a classification-error-guided clustering algorithm is proposed, based on the constrained Laplacian rank (CLR) clustering model, whose central affinity matrix is constructed from the classification-error matrix. Second, a new hierarchical B-CNN model is built on top of the clustering results. Results: Experiments with the classification-error-guided hierarchical B-CNN on the CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-Cars benchmarks raised accuracy from 84.35%, 83.56%, and 89.45% (single-layer B-CNN) to 84.67%, 84.11%, and 89.78% respectively, validating the effectiveness of the method. Conclusion: Guiding clustering with the classification-error matrix targets the classification problem directly, unlike affinity matrices built from feature similarity, and effectively improves accuracy on easily confused classes. By grouping, retraining, and reclassifying targets that are easily confused, the method performs well on very similar targets and is well suited to fine-grained classification.

8.
A Text Classification Algorithm Based on Hidden Markov Models
杨健  汪海航 《计算机应用》2010,30(9):2348-2350
Several mature algorithms for automatic text classification have emerged in recent years, but most are based on statistical probability models and establish no connection to the syntax and semantics of the text itself. This paper proposes applying hidden Markov model (HMM) sequence analysis to automatic text classification. First, a set of feature words representing each document category is constructed; the feature-word sequence of a category serves as the observation sequence of that category's HMM classifier, while the HMM's state-transition sequence implicitly represents how the content of documents in that category forms and evolves. At classification time, a test document is assigned the label of the HMM classifier with the largest generation probability. The classifier models built this way reflect, to some degree, the syntactic and semantic characteristics of documents in different categories, support multi-category automatic text classification, and classify efficiently.
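The per-class HMM scoring described above can be sketched as follows: one HMM per category, with the test document assigned to the category whose HMM yields the highest generation probability, computed with the scaled forward algorithm. The two two-state HMMs over a two-word vocabulary below use hand-set parameters (in practice they would be trained with Baum-Welch on each category's feature-word sequences).

```python
import numpy as np

def forward_loglik(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM)."""
    alpha = start * emit[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha = alpha / s
    return loglik

start = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3],
                  [0.3, 0.7]])
emit_A = np.array([[0.8, 0.2],       # class-A HMM favours feature word 0
                   [0.6, 0.4]])
emit_B = np.array([[0.2, 0.8],       # class-B HMM favours feature word 1
                   [0.4, 0.6]])

doc = [0, 0, 1, 0, 0]                # test document as feature-word indices
scores = {c: forward_loglik(doc, start, trans, e)
          for c, e in [("A", emit_A), ("B", emit_B)]}
label = max(scores, key=scores.get)  # class with largest generation probability
```

Because the document consists mostly of word 0, the class-A HMM assigns it the higher likelihood and the document is labeled "A".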

9.
Traditional approaches to text data stream classification usually require the manual labeling of a number of documents, which is an expensive and time-consuming process. In this paper, to overcome this limitation, we propose to classify text streams by keywords, without labeled documents, so as to reduce the burden of manual labeling. We build our base text classifiers with the help of keywords and unlabeled documents, and utilize classifier-ensemble algorithms to cope with concept drift in text data streams. Experimental results demonstrate that the proposed method can build good classifiers from keywords without manual labeling, and that the ensemble-based algorithm detects and adapts to concept drift in the streams well, performing better than the single-window algorithm.

10.
This paper proposes a look-up-table colour-classification method over a hybrid colour space based on a linear classifier. The method addresses the problem that the discriminative power of look-up-table colour classification is limited by the choice of colour space and thresholds, making similar colours hard to separate. The linear-classifier idea from pattern recognition is applied to build the look-up table's mapping, and using the HSI and YUV colour spaces together improves the table's ability to distinguish similar colours. Experimental results show that the method has simple construction rules and intuitive effects, discriminates well between similar colours, and is suitable for fast colour segmentation of colour images.

11.
Classification problems have a long history in the machine learning literature. One of the simplest, and yet most consistently well-performing, classifiers is the Naïve Bayes model. However, an inherent problem with these classifiers is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated (which is often the case in practice) it can reduce classification accuracy due to "information double-counting" and interaction omission. In this paper we focus on a relatively new set of models, termed Hierarchical Naïve Bayes models. Hierarchical Naïve Bayes models extend the modeling flexibility of Naïve Bayes models by introducing latent variables to relax some of the independence statements in these models. We propose a simple algorithm for learning Hierarchical Naïve Bayes models in the context of classification. Experimental results show that the learned models can significantly improve classification accuracy as compared to other frameworks.

12.
In this article, the task of remote-sensing image classification is tackled with local maximal margin approaches. First, we introduce a set of local kernel-based classifiers that alleviate the computational limitations of local support vector machines (SVMs), maintaining at the same time high classification accuracies. Such methods rely on the following idea: (a) during training, build a set of local models covering the considered data and (b) during prediction, choose the most appropriate local model for each sample to evaluate. Additionally, we present a family of operators on kernels aiming to integrate the local information into existing (input) kernels in order to obtain a quasi-local (QL) kernel. To compare the performances achieved by the different local approaches, an experimental analysis was conducted on three distinct remote-sensing data sets. The obtained results show that interesting performances can be achieved in terms of both classification accuracy and computational cost.

13.
Mapping of patterns and spatial distribution of land-use/cover (LULC) has long been based on remotely sensed data. In the recent past, efforts to improve the reliability of LULC maps have seen a proliferation of image classification techniques. Despite these efforts, derived LULC maps are still often judged to be of insufficient quality for operational applications, due to disagreement between generated maps and reference data. In this study we pursued two objectives: first, to test the new-generation multispectral RapidEye imagery classification output using machine-learning random forest (RF) and support vector machine (SVM) classifiers in a heterogeneous coastal landscape; and second, to determine the importance of different RapidEye bands on classification output. Accuracy of the derived thematic maps was assessed by computing confusion matrices of the classifiers' cover maps against respective independent validation data sets. Overall classification accuracies of 93.07% with a kappa value of 0.92, and 91.80% with a kappa value of 0.92, were achieved using RF and SVM, respectively. In this study, RF and SVM performed comparably, as demonstrated by the results of McNemar's test (Z = 1.15). An evaluation of different RapidEye bands using the two classifiers showed that incorporation of the red-edge band has a significant effect on the overall classification accuracy for vegetation cover types. Consequently, the pursuit of high classification accuracy using high-spatial-resolution imagery on complex landscapes remains paramount.
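The accuracy and kappa figures quoted above are computed mechanically from a confusion matrix of map labels against reference labels. The sketch below does this for a toy three-class map-vs-reference comparison (not the RapidEye data) with scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Toy reference (ground-truth) and classified-map labels for 10 pixels, 3 classes
reference = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
predicted = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

acc = accuracy_score(reference, predicted)       # overall classification accuracy
kappa = cohen_kappa_score(reference, predicted)  # chance-corrected agreement
```

Kappa discounts the agreement expected by chance from the class marginals, which is why a map can have high overall accuracy but a much lower kappa when one class dominates.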

14.
Kawakita M  Eguchi S 《Neural computation》2008,20(11):2792-2838
We propose a local boosting method for classification problems, borrowing from the idea of the local likelihood method. Our proposal, local boosting, includes a simple localization device for computational feasibility. We prove the Bayes risk consistency of local boosting in the framework of probably approximately correct (PAC) learning. Inspection of the proof provides a useful viewpoint for comparing ordinary boosting and local boosting with respect to the estimation error and the approximation error. Both boosting methods have Bayes risk consistency if their approximation errors decrease to zero. Compared to ordinary boosting, local boosting may perform better by controlling the trade-off between the estimation error and the approximation error. Ordinary boosting with complicated base classifiers, or other strong classification methods including kernel machines, may have classification performance comparable to local boosting with simple base classifiers such as decision stumps. Local boosting, however, has an advantage with respect to interpretability: with simple base classifiers it offers a simple way to specify which features are informative and how their values contribute to a classification rule, even if only locally. Several numerical studies on real data sets confirm these advantages of local boosting.

15.
Literature on supervised machine-learning (ML) approaches for classifying text-based safety reports in the construction sector has been growing. Recent studies have emphasized the need to build ML approaches that balance high classification accuracy with performance on management criteria, such as resource intensiveness. However, despite being highly accurate, the extensively studied supervised ML approaches may not perform well on management criteria, as many factors contribute to their resource intensiveness. Alternatively, the potential for semi-supervised ML approaches to achieve balanced performance has rarely been explored in the construction safety literature. The current study contributes to this scarce knowledge by demonstrating the applicability of a state-of-the-art semi-supervised learning approach, i.e., Yet Another Keyword Extractor (YAKE) integrated with Guided Latent Dirichlet Allocation (GLDA), for construction safety report classification. Construction-safety-specific knowledge is extracted as keywords through YAKE, relying on accessible literature with minimal manual intervention. Keywords from YAKE are then seeded into the GLDA model for automatic classification of safety reports without requiring a large quantity of prelabeled data. The YAKE-GLDA classification performance (F1 score of 0.66) is superior to existing unsupervised methods on the benchmark data containing injury narratives from the Occupational Safety and Health Administration (OSHA). The YAKE-GLDA approach is also applied to near-miss safety reports from a construction site, and demonstrates a high degree of generality through a moderately high F1 score of 0.86 for a few categories in the near-miss data.
The current research demonstrates that, unlike existing supervised approaches, the semi-supervised YAKE-GLDA approach can consistently achieve reasonably good classification performance across various construction-specific safety datasets while remaining resource-efficient. Results from an objective comparative and sensitivity analysis contribute much-needed insights into the functioning and applicability of YAKE-GLDA. The results will help construction organizations implement and optimize an efficient ML-based knowledge-mining strategy for domains beyond safety, and across sites where the availability of a pre-labeled dataset is a significant limitation.

16.
Adapted One-versus-All Decision Trees for Data Stream Classification
One-versus-all (OVA) decision trees learn k individual binary classifiers, each one distinguishing the instances of a single class from the instances of all other classes. OVA thus differs from existing data stream classification schemes, most of which use multiclass classifiers that discriminate among all the classes at once. This paper advocates some outstanding advantages of OVA for data stream classification. First, there is low error correlation and hence high diversity among OVA's component classifiers, which leads to high classification accuracy. Second, OVA is adept at accommodating the new class labels that often appear in data streams. However, many challenges remain in deploying traditional OVA for classifying data streams. First, as every instance is fed to all component classifiers, OVA is known as an inefficient model. Second, OVA's classification accuracy is adversely affected by the imbalanced class distribution in data streams. This paper addresses those key challenges and consequently proposes a new OVA scheme adapted for data stream classification. Theoretical analysis and empirical evidence reveal that the adapted OVA can offer faster training, faster updating, and higher classification accuracy than many existing popular data stream classification algorithms.
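The basic OVA decomposition (k binary trees, one per class) can be sketched with scikit-learn's `OneVsRestClassifier`; the stream-specific adaptations in the paper (incremental updating, imbalance handling) are not shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)               # 3-class stand-in for a stream snapshot
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

ova = OneVsRestClassifier(DecisionTreeClassifier(random_state=0)).fit(Xtr, ytr)
n_binary = len(ova.estimators_)                 # one binary tree per class
acc = ova.score(Xte, yte)
```

Accommodating a new class label in this scheme only requires training one additional binary tree, rather than rebuilding a monolithic multiclass model.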

17.
Gas and oil reservoirs have been the focus of modeling efforts for decades as an attempt to locate zones with high volumes. Certain subsurface layers and layer sequences, such as those containing shale, are known to be impermeable to gas and/or liquid. Oil and natural gas then become trapped by these layers, making it possible to drill wells to reach the supply, and extract for use. The drilling of these wells, however, is costly. In this paper, we utilize multi-agent machine learning and classifier combination to learn rock facies sequences from wireline well log data. The paper focuses on how to construct a successful set of classifiers, which periodically collaborate, to increase the classification accuracy. Utilizing multiple, heterogeneous collaborative learning agents is shown to be successful for this classification problem. Utilizing the Multi-Agent Collaborative Learning Architecture, 84.5% absolute accuracy was obtained, an improvement of about 6.5% over the best results achieved by the Kansas Geological Survey with the same data set. A number of heuristics are presented for constructing teams of multiple collaborative classifiers for predicting rock facies.

18.
On the use of ROC analysis for the optimization of abstaining classifiers
Classifiers that refrain from classification in certain cases can significantly reduce the misclassification cost. However, the parameters for such abstaining classifiers are often set in a rather ad-hoc manner. We propose a method to optimally build a specific type of abstaining binary classifiers using ROC analysis. These classifiers are built based on optimization criteria in the following three models: cost-based, bounded-abstention and bounded-improvement. We show that selecting the optimal classifier in the first model is similar to known iso-performance lines and uses only the slopes of ROC curves, whereas selecting the optimal classifier in the remaining two models is not straightforward. We investigate the properties of the convex-down ROCCH (ROC Convex Hull) and present a simple and efficient algorithm for finding the optimal classifier in these models, namely, the bounded-abstention and bounded-improvement models. We demonstrate the application of these models to effectively reduce misclassification cost in real-life classification systems. The method has been validated with an ROC building algorithm and cross-validation on 15 UCI KDD datasets. An early version of this paper was published at ICML2005. Action Editor: Johannes Fürnkranz.
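The cost-based model can be illustrated with a crude sketch: given class-probability scores, abstain inside a band [lo, hi] and pay a fixed abstention cost, choosing the band by brute-force search rather than via the ROCCH geometry the paper actually uses. All data and costs below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(size=500)                       # classifier's P(y=1) outputs
y = (rng.uniform(size=500) < scores).astype(int)     # labels consistent with the scores

C_ERR, C_ABSTAIN = 1.0, 0.2                          # misclassification vs abstention cost

def expected_cost(lo, hi):
    """Average cost when abstaining on scores inside [lo, hi]."""
    abstain = (scores >= lo) & (scores <= hi)
    pred = (scores > hi).astype(int)                 # below lo -> 0, above hi -> 1
    errors = (pred[~abstain] != y[~abstain]).sum()
    return (C_ERR * errors + C_ABSTAIN * abstain.sum()) / len(y)

grid = np.linspace(0.0, 1.0, 21)
best_cost, best_lo, best_hi = min(
    (expected_cost(lo, hi), lo, hi) for lo in grid for hi in grid if lo <= hi)

pred_all = (scores > 0.5).astype(int)                # never abstain, threshold 0.5
cost_no_abstain = C_ERR * (pred_all != y).mean()
```

Whenever the abstention cost is below the expected error cost near the decision boundary, the optimal band is non-degenerate and the abstaining classifier is cheaper than the always-predicting one.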

19.
Selective Bayesian Classifiers for Incomplete Data
Selective classifiers effectively improve classification accuracy and efficiency by removing irrelevant and redundant attributes from the data set, and a number of them have accordingly been developed. However, because handling incomplete data is complex, most of them target complete data only. For various reasons, real-world data are usually incomplete and contain many redundant or irrelevant attributes; just as with complete data, these attributes sharply degrade classification performance, so selective classifiers for incomplete data are an important research topic. By analysing previous ways of handling incomplete data during classification, this paper proposes two selective Bayesian classifiers for incomplete data: SRBC and CBSRBC. SRBC is built on a robust Bayesian classifier, while CBSRBC extends SRBC using the chi-square (χ²) statistic. Experiments on 12 standard incomplete data sets show that both methods greatly reduce the number of attributes while significantly improving classification accuracy and stability. Overall, CBSRBC outperforms SRBC in classification accuracy and running efficiency, while SRBC requires fewer pre-specified thresholds.
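The χ²-based attribute-selection step that distinguishes CBSRBC can be sketched on complete data (the robust-Bayes handling of missing values in SRBC is not reproduced here): rank attributes by the χ² statistic, keep the top k, then fit a naive Bayes classifier on the reduced attribute set.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)              # non-negative features, as chi2 requires
X_sel = SelectKBest(chi2, k=20).fit_transform(X, y)

acc_all = cross_val_score(MultinomialNB(), X, y, cv=5).mean()
acc_sel = cross_val_score(MultinomialNB(), X_sel, y, cv=5).mean()
n_kept = X_sel.shape[1]
```

Even after discarding most of the 64 attributes, the Bayes classifier retains most of its accuracy, which mirrors the abstract's claim that heavy attribute reduction need not hurt (and can help) classification.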

20.
Many techniques have been proposed for credit risk prediction, from statistical models to artificial intelligence methods. However, very few research efforts have been devoted to dealing with the presence of noise and outliers in the training set, which may strongly affect the performance of the prediction model. Accordingly, the aim of the present paper is to systematically investigate whether the application of filtering algorithms leads to an increase in the accuracy of instance-based classifiers in the context of credit risk assessment. The experimental results with 20 different algorithms and 8 credit databases show that the filtered sets perform significantly better than the non-preprocessed training sets when using the nearest-neighbour decision rule. The experiments also allow us to identify which techniques are most robust and accurate when confronted with noisy credit data.
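Filtering a noisy training set before applying the nearest-neighbour rule can be sketched with a simple edited-nearest-neighbours pass (drop a point when most of its neighbours disagree with its label), standing in for the 20 filtering algorithms compared in the paper; the data below is synthetic with injected label noise, not a credit database.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=8, flip_y=0.2,
                           random_state=0)           # flip_y injects label noise
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Edited nearest neighbours: drop a training point when the majority of its
# 3 nearest neighbours (excluding itself) disagree with its label.
nn = KNeighborsClassifier(n_neighbors=4).fit(Xtr, ytr)
_, idx = nn.kneighbors(Xtr)                          # idx[:, 0] is the point itself
neigh_labels = ytr[idx[:, 1:]]
keep = (neigh_labels == ytr[:, None]).sum(axis=1) >= 2
Xf, yf = Xtr[keep], ytr[keep]

acc_raw = KNeighborsClassifier(n_neighbors=1).fit(Xtr, ytr).score(Xte, yte)
acc_filtered = KNeighborsClassifier(n_neighbors=1).fit(Xf, yf).score(Xte, yte)
```

The 1-NN rule is especially sensitive to mislabeled neighbours, which is why the paper finds filtering particularly beneficial for the nearest-neighbour decision rule.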

