Similar literature
 20 similar documents found.
1.
The aim of this paper is to propose a new hybrid data mining model, based on a combination of various feature selection and ensemble learning classification algorithms, to support the decision-making process. The model is built in several stages. In the first stage, the initial dataset is preprocessed; apart from applying different preprocessing techniques, we paid great attention to feature selection. Five different feature selection algorithms were applied, and their results, based on the ROC and accuracy measures of a logistic regression algorithm, were combined using different voting types. We also proposed a new voting method, called if_any, that outperformed all other voting methods as well as the results of any single feature selection algorithm. In the next stage, four different classification algorithms (generalized linear model, support vector machine, naive Bayes, and decision tree) were run on the dataset obtained from the feature selection process. These classifiers were combined into eight different ensemble models using the soft voting method. On a real dataset, the experimental results show that the hybrid model based on features selected by the if_any voting method and the GLM + DT ensemble achieves the highest performance and outperforms all other ensemble and single-classifier models.
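The voting step over feature subsets can be illustrated with a short sketch. The abstract does not define if_any, so the code below assumes it means "keep a feature if any selector chose it" (a union); the `majority` and `all` rules, the function name `vote_features`, and the toy feature names are likewise illustrative assumptions, not the paper's implementation.

```python
from collections import Counter


def vote_features(selections, rule="if_any"):
    """Combine feature subsets chosen by several selectors.

    selections: list of sets of feature names, one set per selector.
    rule: "if_any" (assumed here to mean union), "majority", or "all".
    """
    # Count how many selectors picked each feature.
    counts = Counter(f for chosen in selections for f in set(chosen))
    n = len(selections)
    if rule == "if_any":
        threshold = 1
    elif rule == "majority":
        threshold = n // 2 + 1
    elif rule == "all":
        threshold = n
    else:
        raise ValueError(f"unknown rule: {rule}")
    return sorted(f for f, c in counts.items() if c >= threshold)


# Hypothetical output of five selectors on a toy dataset.
selections = [
    {"age", "income"},
    {"age", "balance"},
    {"age", "income", "tenure"},
    {"income"},
    {"age"},
]

print(vote_features(selections, "if_any"))
print(vote_features(selections, "majority"))
print(vote_features(selections, "all"))
```

Under this reading, if_any is the most permissive rule: it never discards a feature that some selector found useful, at the cost of a larger feature set.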

2.
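Cai Tie, Wu Xing, Li Ye. 《计算机应用》 (Journal of Computer Applications), 2008, 28(8): 2091-2093
To construct diverse base classifiers for ensemble learning, a base-classifier construction method based on data discretization is proposed and applied to support vector machine ensembles. The method processes the training set with rough-set and Boolean-reasoning discretization algorithms, which effectively removes irrelevant and redundant attributes and improves both the accuracy and the diversity of the base classifiers. Experimental results show that the proposed method outperforms the traditional ensemble learning algorithms Bagging and AdaBoost.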

3.
Activity recognition aims to detect physical activities performed by humans, such as walking, sitting, and jogging. With the widespread adoption and use of mobile devices in daily life, several advanced applications of activity recognition have been implemented and distributed all over the world. In this study, we explored the power of the ensemble-of-classifiers approach for accelerometer-based activity recognition and built a novel activity prediction model based on machine learning classifiers. Our approach uses J48 decision tree, Multi-Layer Perceptron (MLP), and Logistic Regression techniques and combines these classifiers with the average-of-probabilities combination rule. The publicly available activity recognition dataset known as WISDM (Wireless Sensor Data Mining), which includes information from thirty-six users, was used in the experiments. According to the experimental results, our model performs better than the MLP-based recognition approach suggested in a previous study. These results strongly suggest that researchers apply the ensemble-of-classifiers approach to the activity recognition problem.
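The average-of-probabilities combination rule mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the class labels, and the probability values are made up for the example, and the three rows merely stand in for the outputs of three base classifiers.

```python
def average_of_probabilities(prob_vectors):
    """Average the per-class probabilities produced by several classifiers."""
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all classifiers must score the same classes")
    n = len(prob_vectors)
    return [sum(p[k] for p in prob_vectors) / n for k in range(n_classes)]


def predict(prob_vectors, labels):
    """Return the label with the highest averaged probability."""
    avg = average_of_probabilities(prob_vectors)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


labels = ["walking", "sitting", "jogging"]
# Hypothetical outputs of three base classifiers for one accelerometer window.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]

print(predict(probs, labels))
```

Unlike hard majority voting, this rule lets a confident classifier outweigh two hesitant ones, because the averaged scores keep each model's confidence.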

4.
Cyber security classification algorithms usually operate on datasets with many missing features and strongly unbalanced classes. To cope with these issues, we designed a distributed genetic programming (GP) framework, named CAGE-MetaCombiner, which adopts a meta-ensemble model to operate efficiently with missing data. Each ensemble evolves a function for combining the classifiers, which does not need any extra training phase on the original data. Therefore, when the data change, the function can be recomputed incrementally with moderate computational effort; this aspect, together with the advantages of running on parallel/distributed architectures, makes the algorithm suitable for the real-time constraints typical of a cyber security problem. In addition, an important cyber security problem concerning the classification of the users or the employers of an e-payment system is illustrated, to show the relevance of the case in which entire sources of data or groups of features are missing. Finally, the capacity of the approach to handle groups of missing features and unbalanced datasets is validated on many artificial datasets and on two real datasets, and it is compared with some similar approaches.

5.
Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied because of additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining that addresses these challenges is presented, together with related research results from previous work and our recent developments in data mining on text-based, web-based, image-based, and network-based construction databases.

6.
Molecular-level diagnostics based on microarray technologies can offer a methodology for precise, objective, and systematic cancer classification. Genome-wide expression patterns generally consist of thousands of genes. Because not all genes are associated with a cancer, it is desirable to extract a few significant genes for accurate diagnosis. In this paper, we use representative gene vectors that are highly discriminatory for cancer classes and extract multiple significant gene subsets, one for each representative vector. An ensemble of neural networks trained on these gene subsets is then proposed to classify a sample into one of several cancer classes. The performance of the proposed method is systematically evaluated on three different cancer types: leukemia, colon, and B-cell lymphoma.

7.
Today, construction planning and scheduling is almost always performed manually by experienced practitioners. The knowledge of those individuals is materialized, maintained, and propagated through master schedules and look-ahead plans. While historical project schedules are available, manually mining their embedded knowledge to create generic work templates for future projects or to revise look-ahead schedules is very difficult, time-consuming, and error-prone. The rigid work templates from prior research also do not scale to cover the inter- and intra-class variability in historical schedule activities. This paper aims to fulfill these needs with a new method that automatically learns construction knowledge from historical project planning and scheduling records and digitizes that knowledge in a flexible and generalizable data schema. Specifically, we present Dynamic Process Templates (DPTs) based on a novel vector representation for construction activities, where the sequencing knowledge is modeled with generative Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). Our machine learning models are exhaustively tested and validated on a diverse dataset of 32 schedules obtained from real-world projects. The experimental results show that our method is capable of learning planning and sequencing knowledge with high accuracy across different projects. The benefits for automated project planning and scheduling, schedule quality control, and automated generation of project look-aheads are discussed in detail.

8.
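Following the idea of building intrusion detection systems with classification techniques, a model of an intrusion detection system based on Bayesian classification is constructed. This paper proposes a method that uses unlabeled data to improve the performance of the Bayesian classifier, which can greatly improve the accuracy and efficiency of the intrusion detection system.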

9.
This paper addresses supervised learning in which the class memberships of the training data are subject to ambiguity. The problem is tackled within the frameworks of ensemble learning and the Dempster-Shafer theory of evidence. The initial labels of the training data are ignored and, using the prototypes of the main classes, each training pattern is reassigned to one class or to a subset of the main classes, depending on the level of ambiguity in its class label. A multilayer perceptron neural network is employed to learn the characteristics of the data with the new labels, and for a given test pattern its outputs are treated as a basic belief assignment. Experiments with artificial and real data demonstrate that taking the ambiguity in the labels of the learning data into account can give better classification results than single and ensemble classifiers that solve the classification problem using data with the initial imperfect labels.

10.
Existing attempts to automate construction document analysis are limited in understanding the varied semantic properties of different documents. Due to the semantic conflicts, the construction specification review process is still conducted manually in practice despite the promising performance of the existing approaches. This research aimed to develop an automated system for reviewing construction specifications by analyzing the different semantic properties using natural language processing techniques. The proposed method analyzed varied semantic properties of 56 different specifications from five different countries in terms of vocabulary, sentence structure, and the organizing styles of provisions. First, the authors developed a semantic thesaurus for construction terms including 208 word-replacement rules based on Word2Vec embedding to understand the different vocabularies. Second, the authors developed a named entity recognition model based on bi-directional long short-term memory with a conditional random field layer, which identified the required keywords from given provisions with an averaged F1 score of 0.928. Third, the authors developed a provision-pairing model based on Doc2Vec embedding, which identified the most relevant provisions with an average accuracy of 84.4%. The web-based prototype demonstrated that the proposed system can facilitate the construction specification review process by reducing the time spent, supplementing the reviewer’s experience, enhancing accuracy, and achieving consistency. The results contribute to risk management in the construction industry, with practitioners being able to review construction specifications thoroughly in spite of tight schedules and few available experts.

11.
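During instant noodle packaging, one or more of the three seasoning packets are often missing, and detection currently relies mainly on manual inspection. A recognition technique based on feature classification in the HSI color model is therefore proposed. The technique has been successfully trial-run on an instant noodle production line. An on-site test lasting 8 hours and covering 60,000 packs shows that the method has good real-time performance and high accuracy, fully meets the production process requirements, increases the speed of the whole line, and reduces the workers' workload.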
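As background for the HSI color model that the method above relies on, the following sketch converts one RGB pixel to hue, saturation, and intensity using the commonly cited textbook formulas. It is only an illustration under that assumption: the abstract does not say which conversion or which color features the authors used, and the function name `rgb_to_hsi` is hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        # Pure black: hue and saturation are undefined, report 0 by convention.
        return 0.0, 0.0, 0.0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        # Gray pixel (r == g == b): hue is undefined, report 0 by convention.
        return 0.0, saturation, intensity
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / den))
    theta = math.degrees(math.acos(cos_theta))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, rgb_to_hsi(*rgb))
```

Because hue is largely separated from brightness in this model, a color-based classifier can be less sensitive to lighting changes than one working directly on RGB values.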

12.
3D point cloud data obtained from laser scans, images, and videos can provide accurate and fast records of the 3D geometries of construction-related objects. The construction industry has therefore been using point cloud data for a variety of purposes, including 3D model reconstruction, geometry quality inspection, and construction progress tracking. Although a number of studies on applying point cloud data in the construction industry have been reported in recent decades, there has been no systematic review that summarizes these applications and points out the research gaps and future research directions. This paper therefore aims to provide a thorough review of the applications of 3D point cloud data in the construction industry and to recommend future research directions in this area. A total of 197 research papers, published within the fifteen-year period from 2004 to 2018, were collected through a two-fold literature search. Based on the collected papers, applications of 3D point cloud data in the construction industry are reviewed in three categories: (1) 3D model reconstruction, (2) geometry quality inspection, and (3) other applications. Following the literature review, this paper discusses the acquisition and processing of point cloud data, focusing on how to perform both properly to meet the needs of the intended construction applications. Specifically, for data acquisition, the determination of the required point cloud data quality and of the data acquisition parameters is discussed; for data processing, the extraction and use of semantic information and the platforms for data visualization and processing are discussed. Based on the review of applications and the subsequent discussion, research gaps and future research directions are recommended, including (1) application-oriented data acquisition, (2) semantic enrichment for as-is BIM, (3) geometry quality inspection in the fabrication phase, and (4) real-time visualization and processing.

13.
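Because access control must process large volumes of data, existing automatic access control models suffer from high false-alarm rates and poor real-time performance. The strength of data mining is its ability to discover features and patterns in large volumes of data according to rules. This paper proposes an automatic access control model based on the idea of parallel combined-classification data mining.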

14.
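Research on an image-based association rule discovery algorithm   Total citations: 6; self-citations: 0; citations by others: 6
This paper deals with knowledge discovery in image databases and presents a mining algorithm for discovering association rules in two-dimensional color images. The algorithm consists of four main steps: feature extraction, object identification, auxiliary image creation, and object mining. The focus is on mining the image content itself, without involving knowledge from other domains. The paper concludes with an analysis of the proposed algorithm.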

15.
A new region-based lossy compression scheme for color images is proposed. The segmentation method belongs to the split-and-merge category. Splitting is carried out using the watershed transform. In the merging stage, a fuzzy color-preserving rule-based system and a novel one-dimensional graph structure are introduced to provide accurate results with reduced computational complexity. The compression part is based on the Shape Adaptive DCT with the ΔDC correction method. The quantization matrices have been designed according to the properties of the employed transform. Promising perceptual results in the low bit rate range, compared with previously reported compression methods, are also presented.

16.
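A survey of classification algorithms in data mining   Total citations: 17; self-citations: 0; citations by others: 17
Classification is an important research area in data mining, machine learning, and pattern recognition. By analyzing and comparing representative, well-performing classification algorithms currently used in data mining, the characteristics of each algorithm are summarized, providing a basis for users to choose algorithms and for researchers to improve them. In addition, five criteria for evaluating classifiers are proposed to help researchers develop new and effective algorithms.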

17.
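Edge extraction from binary and gray-scale images using cellular neural networks   Total citations: 6; self-citations: 0; citations by others: 6
Edges are an important feature of images. When cellular neural networks are used to extract image edges, the choice of network parameters is a key issue. To extract edges effectively, a simple and practical set of cellular neural network parameters is selected based on a high-pass filter template. The parameters are first used to detect edges in binary images; on this basis, the edges of a gray-scale image are then extracted by combining the edge detection results from each bit plane of the gray values. Comparison with the traditional Sobel and LoG edge extraction methods shows that the method is effective. Because cellular neural networks offer high-speed parallel computation and are easy to implement in hardware, the method has greater potential for real-time image processing.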

18.
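Restoration of missing information in local image regions   Total citations: 6; self-citations: 0; citations by others: 6
When some pixels of an image are damaged, this is called information loss. To repair multi-point connected information loss, a texture fitting scheme is proposed. Based on the overall correlation of pixels within a local region and the texture continuity of the underlying image, the known pixels are fitted iteratively and the missing data at multiple points are repaired simultaneously. Experimental results show that the repaired regions have smooth boundaries and continuous texture.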

19.
A procedure for modelling the distribution of solar spectral irradiance is proposed. It uses both statistical and data mining techniques. As a result, the solar spectral irradiance distribution can be simulated from some astronomical parameters and from the meteorological parameters of solar irradiance, temperature, and humidity. With these parameters, the average photon energy and the normalization factor, which characterise the solar spectra, are estimated. First, the Kolmogorov–Smirnov two-sample test is used to analyse and compare all measured spectra. The k-means data mining technique is subsequently used to cluster all measurements. We found that three clusters are enough to characterise all observed spectra. Finally, an artificial neural network and a multivariate linear regression are fitted to simulate the solar spectral distribution matching given meteorological parameters. The results show that over 99.98% of the cumulative probability distribution functions of the measured spectra are the same as the simulated ones.
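The Kolmogorov–Smirnov two-sample comparison used in the first step rests on a simple statistic: the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic D (no p-value) on made-up samples; the function name and the data are illustrative assumptions, not the authors' code or measurements.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Return D = max |F_a(x) - F_b(x)| over all observed points x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs are step functions, so the maximum gap
    # is always reached at one of the observed values.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical samples standing in for two measured distributions.
sample_a = [1, 2, 3, 4]
sample_b = [3, 4, 5, 6]
print(ks_two_sample_statistic(sample_a, sample_b))
```

A value of D near 0 means the two empirical distributions are nearly indistinguishable, while a value of 1 means the samples do not overlap at all.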

20.