Similar Articles
20 similar articles found (search time: 31 ms)
1.
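Estimating task execution time is a prerequisite for workflow scheduling in cloud data center environments. Existing methods for predicting workflow task execution time do not effectively extract features from categorical and numerical data. To address this, a prediction method based on multi-dimensional feature fusion is proposed. First, a stacked residual recurrent network with an attention mechanism is constructed to map categorical data from a high-dimensional sparse feature space into a low-dimensional dense feature space, which strengthens the interpretation of categorical data and effectively extracts categorical features. Second, the eXtreme Gradient Boosting (XGBoost) algorithm is used to discretize and encode numerical data; sparsifying the dense input vectors improves the nonlinear expressive power of the numerical features. On this basis, a multi-dimensional heterogeneous feature fusion strategy is designed that fuses the extracted categorical and numerical features with the samples' original input features, and a prediction model built on the fused features is established to accurately predict cloud workflow task execution times. Finally, simulation experiments are conducted on a real cloud data center cluster dataset. The results show that, compared with existing baseline algorithms, the method achieves higher prediction accuracy and can be used for big-data-driven prediction of cloud workflow task execution times.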
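
The fusion step can be sketched as a plain concatenation of three views of one task record. This is a minimal illustration under loudly stated assumptions: the small embedding table stands in for the attention-based recurrent network, and the two hand-written decision stumps stand in for a trained XGBoost model whose leaf indices are one-hot encoded. All names, vectors, and thresholds are hypothetical.

```python
# HYPOTHETICAL dense vectors standing in for the learned categorical encoder.
EMBEDDINGS = {
    "map": [0.2, -0.1],
    "reduce": [0.7, 0.4],
}

# HYPOTHETICAL depth-1 trees standing in for a trained gradient-boosting model.
# Each tree is (feature_index, threshold): leaf 0 if value <= threshold, else leaf 1.
TREES = [(0, 10.0), (1, 0.5)]


def leaf_one_hot(numeric, trees):
    """Encode numeric features as the concatenated one-hot leaf index of every tree."""
    code = []
    for feature, threshold in trees:
        leaf = 0 if numeric[feature] <= threshold else 1
        code.extend([1.0 if i == leaf else 0.0 for i in range(2)])
    return code


def fuse(task_type, numeric):
    """Concatenate categorical embedding, sparse leaf encoding and raw numeric inputs."""
    return EMBEDDINGS[task_type] + leaf_one_hot(numeric, TREES) + list(numeric)


print(fuse("reduce", [12.0, 0.3]))
```

The fused vector would then be fed to whatever regressor predicts execution time; that downstream model is not shown.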

2.
In this paper, we propose a nonlinear feature extraction method for regression problems that reduces the dimensionality of the input space. A feature extraction method called LDAr, a regression version of linear discriminant analysis, was proposed previously. Here, LDAr is generalized to a nonlinear discriminant analysis using the so-called kernel trick. The basic idea is to map the input space into a high-dimensional feature space whose variables are nonlinear transformations of the input variables, and then to maximize, in that feature space, the ratio of the distances between samples with large differences in target value to the distances between samples with small differences in target value. It is well known that the distribution of face images under perceivable variation in translation, rotation, and scaling is highly nonlinear, and that face alignment is a complex regression problem. We applied the proposed method to various regression problems, including face alignment, and achieved better performance than conventional linear feature extraction methods.
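
The ratio being maximized can be made concrete for the linear case. The sketch below evaluates it for a single projection direction w: squared projected distances of "far" pairs (target difference above a threshold tau) divided by those of "near" pairs. The threshold tau and the squared-distance form are assumptions for illustration; the kernel version described above would replace x with a nonlinear feature map and is not shown.

```python
def ldar_ratio(X, y, w, tau):
    """Ratio of squared projected distances: far-target pairs over near-target pairs.

    tau is a HYPOTHETICAL threshold separating large from small target differences.
    """
    far, near = 0.0, 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            # Projection of the difference vector onto direction w.
            proj = sum(wk * (a - b) for wk, a, b in zip(w, X[i], X[j]))
            if abs(y[i] - y[j]) > tau:
                far += proj ** 2
            else:
                near += proj ** 2
    return far / near if near > 0 else float("inf")


# Toy data in which the target depends only on the first coordinate.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [0, 1, 0, 1]
print(ldar_ratio(X, y, [1, 1], 0.5), ldar_ratio(X, y, [0, 1], 0.5))
```

A direction with more weight on the target-relevant coordinate scores higher, which is the behavior the criterion rewards.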

3.
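In the K-modes algorithm, the original distance measure between categorical variables cannot reflect the differences between attribute values. To address this, a distance measure based on the intermediate results of a naive Bayes classifier is proposed. The measure constructs feature vectors that represent the categorical variables and takes the Euclidean distance between these vectors as the distance between the variables. The proposed measure was plugged into the K-modes clustering algorithm and compared with other measures on several public UCI datasets; the experimental results show that it is more effective.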
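
One plausible reading of "naive Bayes intermediate results" is the table of class-conditional frequencies P(value | class). The sketch below, offered as an assumption rather than the paper's exact construction, represents each categorical value by the vector of those frequencies across classes and measures value-to-value distance with the Euclidean norm. Where the class labels come from in an unsupervised K-modes setting (a chosen attribute, or current cluster assignments) is also an assumption.

```python
import math
from collections import Counter


def value_vectors(values, labels):
    """Map each categorical value to [P(value | c) for each class c] (ASSUMED form)."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    joint = Counter(zip(values, labels))
    return {
        v: [joint[(v, c)] / class_counts[c] for c in classes]
        for v in set(values)
    }


def value_distance(vectors, a, b):
    """Euclidean distance between the vectors of two categorical values."""
    return math.dist(vectors[a], vectors[b])


colours = ["red", "red", "blue", "green", "green", "blue"]
labels = ["A", "A", "A", "B", "B", "B"]
vecs = value_vectors(colours, labels)
print(value_distance(vecs, "red", "blue"), value_distance(vecs, "red", "green"))
```

In K-modes, the dissimilarity between a record and a mode would then be the sum of these per-attribute value distances instead of the usual 0/1 mismatch count.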

4.
Categorical data clustering is a difficult and challenging task because categorical attributes have no natural order. This study therefore proposes a two-stage method, the partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA), for categorical data. In the first stage, PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters. In the merge stage, two of the clusters generated in the first stage are selected based on their inter-cluster distances and merged into one cluster; this is repeated until the number of clusters equals the predetermined number. Afterwards, particular instances in each cluster are considered for reassignment to other clusters based on intra-cluster distances. PM-FGCA is evaluated on ten categorical datasets from the UCI Machine Learning Repository and compared with existing methods, including the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and the non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Three clustering validation indices are used: two external indices, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), and one internal index, the Davies–Bouldin (DB) index. The experimental results show that PM-FGCA outperforms the benchmark methods on the tested indices.
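
The merge stage described above can be sketched as a greedy loop. As assumptions, the inter-cluster distance here is the average Hamming (simple-matching) distance between records, and the input clusters are given directly rather than produced by the fuzzy genetic first stage; the later reassignment step is not shown.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def avg_linkage(c1, c2):
    """ASSUMED inter-cluster distance: mean Hamming distance over all record pairs."""
    return sum(hamming(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def merge_until(clusters, k):
    """Repeatedly merge the closest pair of clusters until only k remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: avg_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected by the pop
    return clusters


start = [[("a", "x")], [("a", "y")], [("b", "z")], [("b", "z")]]
print(merge_until(start, 2))
```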

5.
This paper introduces a self-organizing map dedicated to clustering, analyzing, and visualizing categorical data. Usually, topological maps handle categorical data with an encoding stage: the categorical data are transformed into numerical vectors, and traditional numerical algorithms (SOM) are run. In this paper, we propose a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no encoding of the variables is needed. We evaluate the effectiveness of our model on four examples using real data. Our experiments show that the model produces good-quality results on categorical data.
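
A map whose neurons are probability tables can be sketched in a few functions. This is an illustrative assumption, not the paper's formalism: the best-matching unit is the neuron under which a record has the highest log-likelihood, and training moves each table toward the observed category with a Gaussian neighborhood on a one-dimensional chain of neurons. The learning rate, neighborhood width, and map size are hypothetical.

```python
import math
import random


def init_map(n_neurons, categories, seed=0):
    """Each neuron holds one probability table (dict) per categorical variable."""
    rng = random.Random(seed)
    neurons = []
    for _ in range(n_neurons):
        tables = []
        for cats in categories:
            raw = [rng.random() + 0.5 for _ in cats]
            total = sum(raw)
            tables.append({c: r / total for c, r in zip(cats, raw)})
        neurons.append(tables)
    return neurons


def log_likelihood(tables, record):
    return sum(math.log(table[value]) for table, value in zip(tables, record))


def bmu(neurons, record):
    """Best-matching unit: the neuron under which the record is most probable."""
    return max(range(len(neurons)), key=lambda i: log_likelihood(neurons[i], record))


def train(neurons, data, epochs=20, lr=0.3, sigma=1.0):
    """Pull tables toward observed categories; each update keeps a table summing to 1."""
    for _ in range(epochs):
        for record in data:
            winner = bmu(neurons, record)
            for i, tables in enumerate(neurons):
                # Gaussian neighborhood on a 1-D chain of neurons.
                h = lr * math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for table, value in zip(tables, record):
                    for cat in table:
                        target = 1.0 if cat == value else 0.0
                        table[cat] += h * (target - table[cat])
    return neurons
```

For example, train(init_map(3, [["a", "b"], ["x", "y"]]), data) trains a three-neuron chain on records such as ("a", "x"). Because every update is a convex step toward a one-hot vector with h < 1, probabilities stay strictly positive and the log-likelihood is always defined.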

6.
The skyline has been widely recognized as useful for multi-criteria decision-making applications. While most existing work computes skylines in various contexts, in this paper we consider a novel problem: how far is a point from the skyline? We propose the notion of skyline distance, which measures the minimum cost of upgrading a point to the skyline under a given cost function. Skyline distance can be regarded as a measure of multidimensional competence and can be used to rank possible choices in recommendation systems. Computing skyline distances efficiently is far from trivial and cannot be handled by any straightforward extension of existing skyline computation methods. To tackle this problem, we systematically explore several directions. We first present a dynamic programming method. Then, we investigate the boundary of skylines and develop a sort-projection method that uses the skyline boundary to calculate skyline distances. Finally, we develop a space partitioning method to further improve performance. Extensive experimental results show that our methods are efficient and scalable.
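
The definition of skyline distance can be checked with a brute-force sketch. This is not any of the paper's three methods. It assumes smaller values are better, an L1 cost (the sum of decreases), and that ties count as escaping domination, so the value returned is the infimum cost. Under those assumptions every optimal coordinate is either the point's own value or some skyline coordinate below it, so a finite candidate grid suffices.

```python
from itertools import product


def dominates(a, b):
    """a dominates b when it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_distance(p, points):
    """Minimum L1 cost of lowering p's coordinates until it escapes every skyline point."""
    sky = skyline(points)
    # Candidate values per dimension: p's own value or a skyline value below it.
    candidates = [sorted({p[d]} | {q[d] for q in sky if q[d] < p[d]}) for d in range(len(p))]
    best = None
    for upgraded in product(*candidates):
        # Escape q if some coordinate is at least as good as q's (ties allowed).
        if all(any(u <= qd for u, qd in zip(upgraded, q)) for q in sky):
            cost = sum(a - b for a, b in zip(p, upgraded))
            best = cost if best is None else min(best, cost)
    return best


points = [(1, 4), (4, 1), (5, 5)]
print(skyline_distance((5, 5), points))
```

The grid grows exponentially with dimensionality, which is why the efficient methods in the abstract are needed.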

7.
Clustering is one of the most popular techniques in data mining. Its goal is to identify distinct groups in a dataset. Many clustering algorithms have been published, but they are often limited to either numeric or categorical data, whereas most real-world data are mixed, containing both numeric and categorical attributes. In this paper, we propose CAVE, a clustering algorithm based on variance and entropy that is capable of mining mixed data. Variance is used to measure the similarity of the numeric part of the data. To express the similarity between categorical values, the distance hierarchy has been proposed; accordingly, the similarity of the categorical part is measured by entropy weighted by the distances in the hierarchies. A new validity index for evaluating clustering results is also proposed. The effectiveness of CAVE is demonstrated through a series of experiments on synthetic and real datasets, in comparison with several traditional clustering algorithms. An application that mines a mixed dataset for customer segmentation and catalog marketing is also presented.
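
The idea of scoring a clustering with variance on the numeric part and entropy on the categorical part can be sketched as a single cost function. This is a simplified assumption: it uses plain Shannon entropy without the distance-hierarchy weighting CAVE applies, it assumes records of the form (number, category), and the size weighting and unweighted sum of the two terms are hypothetical choices.

```python
import math
from collections import Counter


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def entropy(values):
    """Shannon entropy in bits (no distance-hierarchy weighting)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def clustering_cost(clusters):
    """Size-weighted sum of numeric variance plus categorical entropy; lower is better."""
    total = 0.0
    for records in clusters:
        nums = [r[0] for r in records]
        cats = [r[1] for r in records]
        total += len(records) * (variance(nums) + entropy(cats))
    return total


good = [[(1.0, "a"), (1.0, "a")], [(5.0, "b"), (5.0, "b")]]
bad = [[(1.0, "a"), (5.0, "b")], [(1.0, "a"), (5.0, "b")]]
print(clustering_cost(good), clustering_cost(bad))
```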

8.
Ordered weighted averaging (OWA) operators and their weighting vectors are very important in many applications. We show that directly using Minkowski distances (including the Manhattan and Euclidean distances) as the distance between two OWA operators is not reasonable. In this study, we propose standard distance measures for any two OWA operators and then a standard metric space for the set of all n-dimensional OWA operators. We analyze and discuss some properties of the introduced OWA metric and further propose a metric space of Choquet integrals represented by their underlying fuzzy measures. Some applications of OWA distances in decision making are also presented.
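
For reference, the operator whose weighting vectors these distances compare is the standard OWA aggregation: sort the arguments in descending order and take the weighted sum with a non-negative weight vector that sums to 1. The sketch below shows only that standard definition; the paper's proposed OWA distance is not reproduced.

```python
def owa(weights, values):
    """OWA aggregation: weights apply to the values after sorting them in descending order."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))


# (1, 0, 0) gives the maximum, (0, 0, 1) the minimum, uniform weights the mean.
print(owa([1.0, 0.0, 0.0], [3, 7, 5]), owa([0.0, 0.0, 1.0], [3, 7, 5]))
```

Because the weights attach to rank positions rather than to specific arguments, two weighting vectors can look far apart coordinate-wise while producing similar aggregation behavior, which is the kind of mismatch that motivates a dedicated OWA metric.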

9.
An external context, such as weather conditions or lighting, influences classification results, but it is frequently omitted from the mathematical model of the problem at hand. Our aim is to propose a mathematical model that extends the Bayesian pattern recognition problem by incorporating external context variables. These variables are introduced as functions that influence the parameters of the class distributions. We prove that context variables influence the shape or position of the optimal class-separating surface without enlarging the dimensionality of the pattern space. Thus, the proposed extended Bayesian model can be treated as a fusion of patterns and external context variables embedded into the same pattern space. We then propose learning algorithms for neural network classifiers that take context variables into account.
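
A one-dimensional toy model shows how a context variable can move the optimal decision boundary without adding a pattern dimension. The assumptions are hypothetical: two Gaussian classes with equal variances and equal priors, whose means depend linearly on a scalar context c. Under those assumptions the Bayes rule reduces to choosing the nearer class mean, so the boundary is the midpoint of the two context-dependent means.

```python
# HYPOTHETICAL class-mean models: mean_k(c) = a_k + b_k * c for classes 0 and 1.
PARAMS = [(0.0, 1.0), (4.0, 1.0)]


def class_means(context, params=PARAMS):
    return [a + b * context for a, b in params]


def boundary(context, params=PARAMS):
    """Equal variances and priors: the Bayes boundary is the midpoint of the two means."""
    m0, m1 = class_means(context, params)
    return (m0 + m1) / 2


def classify(x, context, params=PARAMS):
    """Bayes decision under the same assumptions: pick the class with the nearer mean."""
    m0, m1 = class_means(context, params)
    return 0 if abs(x - m0) <= abs(x - m1) else 1


# The same pattern x = 3.0 is assigned differently under two contexts.
print(classify(3.0, 0.0), classify(3.0, 2.0))
```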

10.
Several grammar-based genetic programming algorithms have been proposed in the literature to automatically generate heuristics for hard optimization problems. These approaches specify the algorithmic building blocks, and the ways in which they can be combined, in a grammar; the best heuristic for the problem being tackled is found by an evolutionary algorithm that searches the algorithm design space defined by the grammar. In this work, we propose a novel representation of the grammar as a sequence of categorical, integer, and real-valued parameters. We then use a tool for automatic algorithm configuration to search for the best algorithm for the problem at hand. Our experimental evaluation on the one-dimensional bin packing problem and the permutation flowshop problem with a weighted tardiness objective shows that the proposed approach produces better algorithms than grammatical evolution, a well-established variant of grammar-based genetic programming. The improvement stems both from the proposed representation and from the method used to search the algorithm design space.
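
The representation idea, in which algorithm design choices become named parameters that a configurator searches, can be sketched for one-dimensional bin packing. Everything here is a hypothetical miniature: the parameter space has only two categorical parameters (real grammars also yield integer and real-valued ones), and exhaustive enumeration stands in for the automatic configuration tool used in the paper.

```python
from itertools import product

# HYPOTHETICAL parameter space derived from a tiny "grammar" of packing heuristics.
PARAM_SPACE = {
    "order": ["as_given", "decreasing"],
    "rule": ["first_fit", "best_fit"],
}


def pack(items, capacity, order, rule):
    """Run the heuristic selected by the parameters; return the number of bins used."""
    seq = sorted(items, reverse=True) if order == "decreasing" else list(items)
    free = []  # remaining capacity of each open bin
    for item in seq:
        fits = [i for i, f in enumerate(free) if f >= item]
        if not fits:
            free.append(capacity - item)
            continue
        i = fits[0] if rule == "first_fit" else min(fits, key=lambda k: free[k])
        free[i] -= item
    return len(free)


def configure(items, capacity):
    """Exhaustive search over the parameter space, standing in for a configurator."""
    best = None
    for values in product(*PARAM_SPACE.values()):
        config = dict(zip(PARAM_SPACE, values))
        score = pack(items, capacity, **config)
        if best is None or score < best[0]:
            best = (score, config)
    return best


print(configure([2, 5, 4, 7, 1, 3, 8], 10))
```

With a realistic grammar the space is far too large to enumerate, which is where a dedicated configuration tool becomes necessary.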

11.
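Cognitive science suggests that face image retrieval based on manifold learning can accurately reflect the intrinsic similarity of face images and the nature of human visual perception. To improve face image retrieval performance, a relevance-feedback-based high-dimensional indexing method for faces, NDL, is proposed. On top of this index, a similarity query in manifold space, the virtual k-nearest-neighbor (Vk-NN) query, is proposed, designed specifically for NDL-based face retrieval. First, the similarity between every pair of face images is computed under a threshold constraint, producing a two-dimensional distance map called the neighbor distance list (NDL). The distance values are then indexed with a B+-tree. Finally, the Vk-NN query in the high-dimensional manifold space is converted into a B+-tree-based query in one-dimensional space. Experiments show that the NDL index retrieves significantly more efficiently than sequential search in manifold space and is particularly suited to retrieval over massive face image collections.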
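
The abstract does not specify how the Vk-NN query works, so the sketch below only illustrates the indexing idea: pairwise distances below a threshold are stored as (distance, i, j) entries sorted by distance, which turns lookups into one-dimensional range queries. As assumptions, a sorted Python list searched with bisect stands in for the B+-tree, plain Euclidean distance stands in for manifold-based similarity, and the helper names are hypothetical.

```python
import bisect
import math


def build_ndl(vectors, threshold):
    """Keep (distance, i, j) for every pair within the threshold, sorted by distance."""
    entries = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d = math.dist(vectors[i], vectors[j])
            if d <= threshold:
                entries.append((d, i, j))
    entries.sort()
    return entries


def range_lookup(ndl, low, high):
    """1-D range query on the distance key (bisect stands in for a B+-tree search)."""
    keys = [e[0] for e in ndl]  # a real index would store the keys, not rebuild them
    return ndl[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]


def neighbors(ndl, i, k):
    """Up to k nearest stored neighbors of image i, scanning in distance order."""
    out = []
    for d, a, b in ndl:
        if i in (a, b):
            out.append((d, b if a == i else a))
            if len(out) == k:
                break
    return out


ndl = build_ndl([(0, 0), (1, 0), (0, 2), (10, 10)], 3.0)
print(range_lookup(ndl, 1.5, 2.5), neighbors(ndl, 0, 2))
```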

12.
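Most existing methods for imputing incomplete data are limited to a single type of missing variable and perform relatively poorly on large-scale data. To handle missing values of mixed-type variables in real big data, this paper proposes a new model, SXGBI (Spark-based eXtreme Gradient Boosting Imputation), which imputes incomplete data in which continuous and categorical missing variables coexist and which generalizes to fast processing of big data. By improving the ensemble learning method XGBoost, the approach combines several imputation algorithms into an ensemble learner, and it is parallelized with the Spark distributed computing framework so that it runs well on Spark clusters. Experiments show that as the missing rate increases, SXGBI achieves better imputation results than the other imputation methods tested on the RMSE, PFC, and F1 metrics. In addition, it can be applied effectively to large-scale datasets.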
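
Two of the evaluation metrics named above can be written down directly: RMSE over the imputed continuous entries and PFC, the proportion of falsely classified imputed categorical entries. The sketch pairs them with a simple mean/mode baseline imputer so the metrics have something to score. That baseline is an assumption for illustration and is not SXGBI.

```python
import math
from collections import Counter


def impute_mean_mode(column, is_numeric):
    """Baseline (NOT SXGBI): fill None with the column mean or the most frequent value."""
    observed = [v for v in column if v is not None]
    if is_numeric:
        fill = sum(observed) / len(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def rmse(true_vals, imputed, mask):
    """Root-mean-square error over the entries that were missing (mask True)."""
    errs = [(t - p) ** 2 for t, p, m in zip(true_vals, imputed, mask) if m]
    return math.sqrt(sum(errs) / len(errs))


def pfc(true_vals, imputed, mask):
    """Proportion of falsely classified entries among the imputed categorical ones."""
    pairs = [(t, p) for t, p, m in zip(true_vals, imputed, mask) if m]
    return sum(t != p for t, p in pairs) / len(pairs)


mask = [False, True, False, True]
num = impute_mean_mode([1.0, None, 3.0, None], is_numeric=True)
cat = impute_mean_mode(["a", None, "a", None], is_numeric=False)
print(rmse([1.0, 2.0, 3.0, 4.0], num, mask), pfc(["a", "b", "a", "a"], cat, mask))
```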

13.
The k-Nearest Neighbour (k-NN) estimation and prediction technique is widely used to produce pixel-level predictions and areal estimates of continuous forest variables such as area and volume, often by sub-categories such as species. An advantage of k-NN is that the same parameters (e.g., k-value, distance metric, weight vector for the feature space variables) can be used for all variables, whether continuous or categorical. An obvious question is the degree to which accuracy can be improved if the k-NN estimation parameters are tailored to specific variable groups, such as volumes by tree species or categorical variables. We investigated the prediction of categorical forest attribute variables from satellite image spectral data using k-NN, with the weight vector for the ancillary variables optimised by a genetic algorithm. We tested several genetic algorithm fitness functions, all derived from well-known accuracy measures. For a Finnish test site, the categorical forest attribute variables were site fertility and tree species dominance; for an Italian test site, they were forest type and conifer/broad-leaved dominance. The results for both test sites were validated using independent data sets. Our results indicate that using the genetic algorithm to optimise the weight vector for prediction of a single forest attribute variable had a slight positive effect on the prediction accuracies for other variables. Errors can be further decreased if the optimisation is done by variable groups.
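
The combination of a weighted-distance k-NN classifier with a genetic algorithm that tunes the feature weights can be sketched compactly. Assumptions: leave-one-out overall accuracy is the fitness (the paper tested several accuracy-based fitness functions), and truncation selection, one-point crossover, Gaussian mutation, and all population settings are hypothetical choices. Seeding the population with uniform weights and keeping the top half each generation guarantees the returned weights score at least as well as unweighted k-NN on the training data.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, weights, k):
    """Majority vote among the k nearest rows under a weighted Euclidean distance."""
    dists = sorted(
        (math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x))), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, weights, k):
    """Leave-one-out overall accuracy, used here as the GA fitness (an assumption)."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], weights, k)
        hits += pred == y[i]
    return hits / len(X)


def ga_weights(X, y, k=3, pop_size=12, generations=15, seed=1):
    """Evolve a feature-weight vector in [0, 1]^dim with an elitist GA."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(w):
        return loo_accuracy(X, y, w, k)

    # Include uniform weights so the result never scores below unweighted k-NN.
    population = [[1.0] * dim] + [[rng.random() for _ in range(dim)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            j = rng.randrange(dim)
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0.0, 0.2)))  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

On toy data where one feature is informative and another is large-scale noise, the evolved weights can down-weight the noisy feature, which is the effect the weight-vector optimisation aims for.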

14.
Genomic data, and biomedical data more generally, are often high-dimensional. An input selection procedure can serve two objectives: highlighting the relevant variables (genes) and possibly improving classification results. In this paper, we propose a wrapper approach to gene selection for classifying gene expression data, using simulated annealing together with supervised classification. The approach performs global combinatorial searches through the space of all possible input subsets, handles numerical, categorical, or mixed inputs, and finds (sub-)optimal input subsets that yield low classification errors. The method was tested on publicly available bioinformatics datasets using support vector machines and on a mixed-type dataset using classification trees. We also propose some heuristics that speed up convergence. The experimental results highlight the method's ability to select minimal sets of relevant features.
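
A wrapper search with simulated annealing can be sketched over feature subsets represented as sets of indices. Several pieces are swapped in and should be read as assumptions: a leave-one-out 1-nearest-neighbor error replaces the support vector machines and classification trees used in the paper, a small per-feature penalty of 0.01 favors smaller subsets, and the move (flip one feature in or out), starting temperature, and geometric cooling schedule are hypothetical.

```python
import math
import random


def loo_1nn_error(X, y, subset):
    """Leave-one-out error of 1-NN using only the features in subset."""
    if not subset:
        return 1.0
    errors = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[j][f] - xi[f]) ** 2 for f in subset),
        )
        errors += y[nearest] != y[i]
    return errors / len(X)


def anneal(X, y, steps=200, t0=1.0, cooling=0.97, penalty=0.01, seed=0):
    """Simulated annealing over subsets; cost = wrapper error + penalty * subset size."""
    rng = random.Random(seed)
    dim = len(X[0])

    def cost_of(s):
        return loo_1nn_error(X, y, s) + penalty * len(s)

    current = {f for f in range(dim) if rng.random() < 0.5}
    cur_cost = cost_of(current)
    best, best_cost = set(current), cur_cost
    t = t0
    for _ in range(steps):
        candidate = current ^ {rng.randrange(dim)}  # flip one feature in or out
        cost = cost_of(candidate)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = set(candidate), cost
        t *= cooling
    return sorted(best), best_cost
```

The returned pair is the best subset visited and its penalized cost; a real wrapper would also re-validate that subset on held-out data.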

15.
Teng Zi (滕辎), Yu Xiaosheng (于晓升), Wu Chengdong (吴成东). Control and Decision (控制与决策), 2024, 39(5): 1469-1477
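To reconstruct low-resolution face images under strong noise and blur, a noisy-face super-resolution algorithm based on multi-level latent-space information constraints is proposed. First, a feature distillation network that extracts useful facial information is designed, and invalid information such as noise is removed using a statistical anti-interference model and a latent-space feature contrast algorithm, yielding a face information extraction model that is highly robust to noise. Then, a face reconstruction network is designed that uses the extracted facial features to reconstruct high-resolution face images. Finally, a face identity embedding model and a discrete wavelet transform model further constrain the identity information and spatial structure of the reconstructed face from a hyperspherical identity metric space and from the wavelet domain, respectively. Experimental results show that the proposed algorithm not only effectively removes facial noise in high-noise environments but also improves face image resolution, achieving a higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), and has good practical value.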
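
Two quantitative pieces mentioned above can be sketched on tiny grayscale images stored as nested lists. PSNR follows its standard definition, 10·log10(MAX²/MSE). The wavelet-domain constraint is illustrated, as an assumption about its form, by the mean absolute difference between one-level orthonormal Haar subbands of the reference and reconstructed images. SSIM and the identity-embedding constraint are not shown.

```python
import math


def psnr(ref, out, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq = [(a - b) ** 2 for row_r, row_o in zip(ref, out) for a, b in zip(row_r, row_o)]
    mse = sum(sq) / len(sq)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def haar2d(img):
    """One-level orthonormal 2-D Haar transform of an image with even height and width.

    Returns approximation (LL) and detail subbands; LH/HL naming varies between libraries.
    """
    h, w = len(img), len(img[0])
    bands = {k: [[0.0] * (w // 2) for _ in range(h // 2)] for k in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            bands["LL"][i // 2][j // 2] = (a + b + c + d) / 2
            bands["LH"][i // 2][j // 2] = (a + b - c - d) / 2
            bands["HL"][i // 2][j // 2] = (a - b + c - d) / 2
            bands["HH"][i // 2][j // 2] = (a - b - c + d) / 2
    return bands


def wavelet_l1(ref, out):
    """ASSUMED wavelet constraint: mean absolute difference over all Haar coefficients."""
    r, o = haar2d(ref), haar2d(out)
    total, count = 0.0, 0
    for key in r:
        for row_r, row_o in zip(r[key], o[key]):
            for x, y in zip(row_r, row_o):
                total += abs(x - y)
                count += 1
    return total / count


ref = [[10, 10], [10, 10]]
out = [[10, 10], [10, 12]]
print(psnr(ref, out), wavelet_l1(ref, out))
```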

16.
17.
18.
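To monitor the operating state of marine sewage treatment units, an equipment operating-state classification model based on orthogonal state bases with a time factor is proposed. Based on the operating characteristics and conditions of the equipment system and the temporal attributes of each state, the value characteristics of the training-set variables are determined, and the input vectors are mapped into an orthogonal, complete feature space to generate an orthogonal state basis for each data class, from which a discriminant matrix is obtained. A time phase factor is then introduced to form time-dependent orthogonal state basis vectors, enabling the classification of unknown states...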
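
The abstract is truncated and does not define the time phase factor, so the sketch below covers only the orthogonal-basis part, under assumptions: one prototype vector per operating state (for example a class mean) is orthonormalized with Gram–Schmidt, and an unknown sample is assigned to the state whose basis vector gives the largest absolute projection. The prototypes and state labels are hypothetical. Gram–Schmidt is order-dependent, so later states' basis vectors differ from their raw prototypes.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalize prototype vectors in order; raises if they are linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            raise ValueError("state prototypes must be linearly independent")
        basis.append([x / norm for x in w])
    return basis


def classify(x, basis, labels):
    """Assign x to the state whose basis vector has the largest absolute projection."""
    scores = [abs(sum(a * b for a, b in zip(x, e))) for e in basis]
    return labels[scores.index(max(scores))]


# HYPOTHETICAL prototypes for two operating states.
basis = gram_schmidt([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(classify([0.2, 0.9, 0.1], basis, ["normal", "fault"]))
```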

19.
In this paper, we propose a robust Kalman filter and smoother for errors-in-variables (EIV) state-space models subject to observation noise with outliers. We introduce the EIV problem with outliers and then present the minimum covariance determinant (MCD) estimator, which is highly robust in the sense that it protects the estimate from outliers. We then propose a randomized algorithm to find the MCD estimate. Because the uniform sampling method has a high computational cost and may lead to biased estimates, we apply a sub-sampling method instead. A Monte Carlo simulation shows the efficiency of the proposed algorithm. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society
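
The MCD idea can be sketched in one dimension, where the covariance determinant reduces to a variance. This illustration combines random starting subsets with concentration steps (C-steps) in the style of the FAST-MCD algorithm: each step keeps the h points closest to the current subset mean, which never increases the subset variance. The subset size h, the number of starts, and the data are assumptions, and neither the paper's sub-sampling scheme nor the Kalman filter itself is reproduced.

```python
import random


def c_steps(data, subset, h, max_iter=50):
    """Concentration steps: keep the h points nearest the current mean until stable."""
    for _ in range(max_iter):
        m = sum(subset) / h
        new = sorted(data, key=lambda x: (x - m) ** 2)[:h]
        if sorted(new) == sorted(subset):
            break
        subset = new
    m = sum(subset) / h
    var = sum((x - m) ** 2 for x in subset) / h
    return m, var


def mcd_1d(data, h, n_starts=20, seed=0):
    """Randomized 1-D MCD: best (location, variance) over random starting subsets."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        m, var = c_steps(data, rng.sample(data, h), h)
        if best is None or var < best[1]:
            best = (m, var)
    return best


data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0, 12.0]  # two gross outliers
print(mcd_1d(data, h=5))
```

The resulting location ignores the outliers, whereas the ordinary sample mean of this data would be pulled above 3.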

20.
This paper presents a data-driven approach for obtaining a measure of comparability between groups from observational data. The main idea is to use the general framework of conditional multiple correspondence analysis to investigate the dependence between a set of observable categorical covariates X and a treatment-assignment indicator variable T, in order to obtain a global measure of between-group comparability based on their dependence structure. We then propose a strategy for finding treatment groups that are directly comparable with respect to pre-treatment characteristics, on which local causal effects can be estimated.
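
As a much simpler stand-in for the conditional multiple correspondence analysis described above (a swapped-in technique, not the paper's method), the dependence between one categorical covariate and the treatment indicator can be summarized with Cramér's V, computed from the chi-square statistic of their contingency table. Values near 0 suggest the groups are balanced on that covariate; values near 1 indicate strong dependence, meaning the groups are not directly comparable on it.

```python
import math
from collections import Counter


def cramers_v(x, t):
    """Cramér's V between a categorical covariate x and a treatment indicator t."""
    n = len(x)
    rows, cols = sorted(set(x)), sorted(set(t))
    joint = Counter(zip(x, t))
    row_tot, col_tot = Counter(x), Counter(t)
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Covariate fully determined by treatment vs. covariate independent of treatment.
print(cramers_v(["u", "u", "v", "v"], [1, 1, 0, 0]), cramers_v(["u", "v", "u", "v"], [1, 1, 0, 0]))
```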

