1.

李慧芳 黄姜杭 徐光浩 夏元清 《自动化学报》2023,49(1):67-78

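Estimating task execution time is a prerequisite for workflow scheduling in cloud data centers. Existing methods for predicting workflow task execution time do not extract features from categorical and numerical data effectively; to address this, a prediction method based on multi-dimensional feature fusion is proposed. First, a stacked residual recurrent network with an attention mechanism maps categorical data from a high-dimensional sparse feature space to a low-dimensional dense feature space, which strengthens the parsing of categorical data and extracts categorical features effectively. Second, the extreme gradient boosting algorithm is used to discretize and encode the numerical data; sparsifying the input vectors of the dense space improves the nonlinear expressiveness of the numerical features. On this basis, a multi-dimensional heterogeneous feature fusion strategy is designed that fuses the extracted categorical and numerical features with the original input features of each sample, and a prediction model built on the fused features achieves accurate prediction of cloud workflow task execution time. Finally, simulation experiments are carried out on a real cloud data center cluster dataset. The results show that the method achieves higher prediction accuracy than existing baseline algorithms and can be used for big-data-driven prediction of cloud workflow task execution time.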

2.

Kernel discriminant analysis for regression problems

Nojun Kwak 《Pattern recognition》2012,45(5):2019-2031

In this paper, we propose a nonlinear feature extraction method for regression problems to reduce the dimensionality of the input space. Previously, a feature extraction method LDAr, a regressional version of the linear discriminant analysis, was proposed. In this paper, LDAr is generalized to a nonlinear discriminant analysis by using the so-called kernel trick. The basic idea is to map the input space into a high-dimensional feature space where the variables are nonlinear transformations of input variables. Then we try to maximize the ratio of distances of samples with large differences in the target value and those with small differences in the target value in the feature space. It is well known that the distribution of face images, under a perceivable variation in translation, rotation, and scaling, is highly nonlinear and the face alignment problem is a complex regression problem. We have applied the proposed method to various regression problems including face alignment problems and achieved better performances than those of conventional linear feature extraction methods.

3.

K-modes clustering algorithm based on Bayesian distance

赵亮 刘建辉 张昭昭 《计算机工程与科学》2017,39(1):188-193

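The original distance measure between categorical variables in the K-modes algorithm cannot reflect differences between attribute values. To address this, a distance measure based on the intermediate results of the naive Bayes classifier is proposed. The measure constructs a feature vector representing each categorical variable and takes the Euclidean distance between these vectors as the distance between the variables. The proposed measure is plugged into the K-modes clustering algorithm and compared with other measures on several public UCI datasets; the experimental results show that it is more effective.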

4.

Partition-and-merge based fuzzy genetic clustering algorithm for categorical data

《Applied Soft Computing》2019

Categorical data clustering is a difficult and challenging task due to the special characteristic of categorical attributes: no natural order. Thus, this study aims to propose a two-stage method named partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. The proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the first stage. Then, the merge stage selects two of the clusters generated in the first stage based on their inter-cluster distances and merges them into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered for re-assignment to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from the UCI machine learning repository. In order to evaluate the clustering performance, the proposed PM-FGCA is compared with some existing methods such as the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and the non-dominated sorting genetic algorithm using fuzzy membership chromosomes. The Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and the Davies–Bouldin (DB) index are selected as three clustering validation indices, covering both external indices (i.e., ARI and NMI) and an internal index (i.e., DB). The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.
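
As a worked illustration of the external validation index named in this abstract, the following minimal sketch computes the Adjusted Rand Index (ARI) from two label lists with the standard pair-counting formula. The function name `adjusted_rand_index` and the handling of degenerate inputs are illustrative assumptions; nothing here is taken from the PM-FGCA paper itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index via pair counting over the contingency table."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # Assumption: with fewer than two items there are no pairs to compare.
        return 1.0
    # Pairs of items that share a cell of the contingency table.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs of items that share a cluster in each partition separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Assumption: degenerate partitions (e.g. one cluster each) count as agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index depends only on which items are grouped together, so relabelling the clusters of either partition leaves the value unchanged.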

5.

Visualization and clustering of categorical data with probabilistic self-organizing map (total citations: 1; self-citations: 1; citations by others: 0)

Mustapha Lebbah Khalid Benabdeslem 《Neural computing & applications》2010,19(3):393-404

This paper introduces a self-organizing map dedicated to clustering, analysis and visualization of categorical data. Usually, when dealing with categorical data, topological maps use an encoding stage: categorical data are changed into numerical vectors and traditional numerical algorithms (SOM) are run. In the present paper, we propose a novel probabilistic formalism of Kohonen map dedicated to categorical data where neurons are represented by probability tables. We do not need to use any coding to encode variables. We evaluate the effectiveness of our model in four examples using real data. Our experiments show that our model provides a good quality of results when dealing with categorical data.

6.

Skyline distance: a measure of multidimensional competence

Jin Huang Bin Jiang Jian Pei Jian Chen Yong Tang 《Knowledge and Information Systems》2013,34(2):373-396

Skyline has been widely recognized as being useful for multi-criteria decision-making applications. While most of the existing work computes skylines in various contexts, in this paper, we consider a novel problem: how far away a point is from the skyline? We propose a novel notion of skyline distance that measures the minimum cost of upgrading a point to the skyline given a cost function. Skyline distance can be regarded as a measure of multidimensional competence and can be used to rank possible choices in recommendation systems. Computing skyline distances efficiently is far from trivial and cannot be handled by any straightforward extension of the existing skyline computation methods. To tackle this problem, we systematically explore several directions. We first present a dynamic programming method. Then, we investigate the boundary of skylines and develop a sort-projection method that utilizes the skyline boundary in calculating skyline distances. Last, we develop a space partitioning method to further improve the performance. We report extensive experiment results which show that our methods are efficient and scalable.
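
To make the underlying notion of a skyline concrete, here is a minimal sketch of the dominance test and a brute-force skyline computation. It assumes that larger values are better in every dimension, and it does not implement the skyline distance or any of the paper's three methods; the names `dominates` and `skyline` are illustrative.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Brute-force skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Assumed toy data: two criteria, larger is better on both.
candidates = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]
front = skyline(candidates)
```

A point such as `(2, 2)` lies off the skyline because `(3, 3)` dominates it; the skyline distance described in the abstract asks how cheaply such a point could be upgraded until nothing dominates it.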

7.

Mining of mixed data with application to catalog marketing

《Expert systems with applications》2007,32(1):12-23

Clustering is one of the most popular techniques in data mining. The goal of clustering is to identify distinct groups in a dataset. Many clustering algorithms have been published so far, but often limited to numeric or categorical data. However, most real world data are mixed, numeric and categorical. In this paper, we propose a clustering algorithm CAVE which is based on variance and entropy, and is capable of mining mixed data. The variance is used to measure the similarity of the numeric part of the data. To express the similarity between categorical values, distance hierarchy has been proposed. Accordingly, the similarity of the categorical part is measured based on entropy weighted by the distances in the hierarchies. A new validity index for evaluating the clustering results has also been proposed. The effectiveness of CAVE is demonstrated by a series of experiments on synthetic and real datasets in comparison with that of several traditional clustering algorithms. An application of mining a mixed dataset for customer segmentation and catalog marketing is also presented.

8.

The Metric Space of Ordered Weighted Average Operators with Distance Based on Accumulated Entries

LeSheng Jin Radko Mesiar 《International Journal of Intelligent Systems》2017,32(7):665-675

Ordered weighted average (OWA) operators with their weighting vectors are very important in many applications. We show that directly taking Minkowski distances (including Manhattan distance and Euclidean distance) as the distances for any two OWA operators is not reasonable. In this study, we propose the standard distance measures for any two OWA operators and then propose a standard metric space for the set of all n‐dimension OWA operators. We analyze and discuss some properties of the introduced OWA metric and further propose a metric space of Choquet integrals represented by the underlying fuzzy measures. Some applications in decision making of OWA distances are also presented in this study.
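
The point about Minkowski distances can be illustrated with a small sketch. It evaluates an OWA operator and compares the Manhattan distance taken directly on the weighting vectors with the same distance taken on their accumulated (cumulative-sum) entries. The exact form and normalization of the paper's distance are not given in the abstract, so `accumulated_distance` is an assumed, simplified stand-in.

```python
from itertools import accumulate


def owa(weights, values):
    """OWA aggregation: weights are applied to the values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def accumulated_distance(w1, w2):
    """Assumed variant: Manhattan distance between the cumulative sums of the weights."""
    return manhattan(list(accumulate(w1)), list(accumulate(w2)))


# Three OWA operators on 3 inputs: maximum, median, minimum.
w_max, w_med, w_min = [1, 0, 0], [0, 1, 0], [0, 0, 1]
```

Taken directly on the weights, the Manhattan distance from `w_max` to `w_med` equals the distance from `w_max` to `w_min`, even though the median is intuitively closer to the maximum than the minimum is. On the accumulated entries the two distances differ, which is the kind of ordering a distance between OWA operators should respect.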

9.

Classifiers sensitive to external context – theory and applications to video sequences

Ewaryst Rafajłowicz 《Expert Systems》2012,29(1):84-104

An external context like weather conditions, lighting, etc. influences classification results, but it is frequently omitted in a mathematical model of the problem at hand. Our aim is to propose a mathematical model, which extends the Bayesian problem of pattern recognition by incorporating external context variables. They are implanted as functions, which influence parameters of class distributions. We prove that context variables influence a shape or a position of the optimal class separating surface, without enlarging the dimensionality of a pattern space. Thus, one can treat the proposed extended Bayesian model as a fusion of patterns and external context variables, embedded into the same pattern space. Then, learning algorithms for neural network classifiers are proposed, which take context variables into account.

10.

Grammar-based generation of stochastic local search heuristics through automatic algorithm configuration tools

《Computers & Operations Research》2014

Several grammar-based genetic programming algorithms have been proposed in the literature to automatically generate heuristics for hard optimization problems. These approaches specify the algorithmic building blocks and the way in which they can be combined in a grammar; the best heuristic for the problem being tackled is found by an evolutionary algorithm that searches in the algorithm design space defined by the grammar. In this work, we propose a novel representation of the grammar by a sequence of categorical, integer, and real-valued parameters. We then use a tool for automatic algorithm configuration to search for the best algorithm for the problem at hand. Our experimental evaluation on the one-dimensional bin packing problem and the permutation flowshop problem with weighted tardiness objective shows that the proposed approach produces better algorithms than grammatical evolution, a well-established variant of grammar-based genetic programming. The reasons behind such improvement lie both in the representation proposed and in the method used to search the algorithm design space.

11.

Interactive face image indexing based on manifold space

庄毅 胡华 袁承祥 蒋国昌 胡海洋 琚春华 《计算机研究与发展》2010,47(Z1)

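Cognitive science indicates that face image retrieval based on manifold learning accurately reflects the intrinsic similarity of face images and the nature of human visual perception. A high-dimensional face indexing method based on relevance feedback, NDL, is proposed to improve the performance of face image retrieval. On top of this index, a similarity query in manifold space, the virtual k-nearest-neighbor query (Vk-NN), is proposed; it is designed specifically for NDL-based face retrieval. First, the similarity of every pair of face images is computed under a threshold constraint to build a two-dimensional distance graph called the neighbor distance list (NDL). The distance values are then indexed with a B+-tree. Finally, a Vk-NN query in the high-dimensional manifold space is converted into a B+-tree-based query in one-dimensional space. Experiments show that retrieval with the NDL index in manifold space is markedly more efficient than sequential retrieval, and that the index is particularly well suited to retrieval over massive face image collections.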

12.

An ensemble imputation method for incomplete data in the Spark environment

邹萌萍 彭敦陆 《小型微型计算机系统》2021,(1):111-116

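Most existing imputation methods for incomplete data are limited to a single type of missing variable and perform comparatively poorly on large-scale data. To handle missing mixed-type variables in real big data, this paper proposes a new model, SXGBI (Spark-based eXtreme Gradient Boosting Imputation), which is suited to imputing incomplete data in which continuous and categorical missing variables coexist and which generalizes to fast processing of big data. The method improves on the ensemble learning method XGBoost, combines several imputation algorithms into an ensemble learner, and is parallelized on the Spark distributed computing framework so that it runs well on a Spark cluster. Experiments show that, as the missing rate increases, SXGBI achieves better imputation results than the other methods tested on the RMSE, PFC, and F1 metrics. It can also be applied effectively to large-scale datasets.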

13.

Predicting categorical forest variables using an improved k-Nearest Neighbour estimator and Landsat imagery (total citations: 2; self-citations: 0; citations by others: 2)

Erkki O. Tomppo Caterina Gagliano Flora De Natale Matti Katila Ronald E. McRoberts 《Remote sensing of environment》2009,113(3):500-3174

The k-Nearest Neighbour (k-NN) estimation and prediction technique is widely used to produce pixel-level predictions and areal estimates of continuous forest variables such as area and volume, often by sub-categories such as species. An advantage of k-NN is that the same parameters (e.g., k-value, distance metric, weight vector for the feature space variables) can be used for all variables, whether continuous or categorical. An obvious question is the degree to which accuracy can be improved if the k-NN estimation parameters are tailored for specific variable groups such as volumes by tree species or categorical variables. We investigated prediction of categorical forest attribute variables from satellite image spectral data using k-NN with optimisation of the weight vector for the ancillary variables obtained using a genetic algorithm. We tested several genetic algorithm fitness functions, all derived from well-known accuracy measures. For a Finnish test site, the categorical forest attribute variables were site fertility and tree species dominance, and for an Italian test site, the variables were forest type and conifer/broad-leaved dominance. The results for both test sites were validated using independent data sets. Our results indicate that use of the genetic algorithm to optimize the weight vector for prediction of a single forest attribute variable had a slight positive effect on the prediction accuracies for other variables. Errors can be further decreased if the optimisation is done by variable groups.
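
The role of the feature weight vector in k-NN prediction of a categorical variable can be sketched as follows. This is a minimal illustration with a weighted Euclidean distance and a majority vote; the data, the class names, and the two hand-picked weight vectors are assumptions, and the genetic-algorithm search for the weights described in the abstract is not implemented.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_predict_category(query, samples, labels, weights, k=3):
    """Predict a class by majority vote among the k nearest training samples."""
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Assumed toy data: two spectral features per pixel, one class label each.
samples = [(0.1, 0.9), (0.2, 0.1), (0.9, 0.8), (0.8, 0.2)]
labels = ["pine", "pine", "spruce", "spruce"]
query = (0.25, 0.82)

by_first_feature = knn_predict_category(query, samples, labels, weights=(1, 0))
by_second_feature = knn_predict_category(query, samples, labels, weights=(0, 1))
```

The two calls return different classes for the same query, which shows why a weight vector tuned for one target variable need not suit another.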

14.

Simulated annealing for supervised gene selection

Maurizio Filippone Francesco Masulli Stefano Rovetta 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(8):1471-1482

Genomic data, and more generally biomedical data, are often characterized by high dimensionality. An input selection procedure can attain the two objectives of highlighting the relevant variables (genes) and possibly improving classification results. In this paper, we propose a wrapper approach to gene selection in classification of gene expression data using simulated annealing along with supervised classification. The proposed approach can perform global combinatorial searches through the space of all possible input subsets, can handle cases with numerical, categorical or mixed inputs, and is able to find (sub-)optimal subsets of inputs giving low classification errors. The method has been tested on publicly available bioinformatics data sets using support vector machines and on a mixed type data set using classification trees. We also propose some heuristics able to speed up the convergence. The experimental results highlight the ability of the method to select minimal sets of relevant features.
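
A wrapper search of this kind can be sketched with a generic simulated annealing loop over feature masks. The sketch below is an assumption-laden illustration, not the authors' procedure: the single-bit-flip neighbourhood, the geometric cooling schedule, and the toy cost function `toy_cost` (which stands in for the classification error of a trained classifier) are all hypothetical choices.

```python
import math
import random


def anneal_feature_subset(n_features, cost, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search feature masks (lists of booleans) for a low-cost subset."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour: flip the membership of one randomly chosen feature.
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = not candidate[flip]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost


RELEVANT = {0, 3}  # hypothetical indices of the informative features


def toy_cost(mask):
    """Stand-in for classifier error: penalize missing relevant and extra features."""
    missing = sum(1 for i in RELEVANT if not mask[i])
    extra = sum(1 for i, on in enumerate(mask) if on and i not in RELEVANT)
    return missing + 0.1 * extra


best_mask, best_cost = anneal_feature_subset(6, toy_cost)
```

In a real wrapper setting, `cost` would train and validate a classifier on the selected inputs, which makes each evaluation expensive; that is where convergence heuristics such as those mentioned in the abstract matter.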

15.

Noisy face super-resolution algorithm based on multi-level latent space information constraints

滕辎 于晓升 吴成东 《控制与决策》2024,39(5):1469-1477

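To reconstruct low-resolution face images under strong noise and blur, a noisy face super-resolution algorithm based on multi-level latent space information constraints is proposed. First, a feature distillation network is designed to extract useful face information; a statistical anti-interference model and a latent-space feature contrast algorithm remove noise and other invalid information, yielding a face information extraction model that is highly robust to noise. Then, a face reconstruction network is designed that uses the extracted face features to reconstruct a high-resolution face image. Finally, a face identity embedding model and a discrete wavelet transform model further constrain the identity information and the spatial structure of the reconstructed face in the hyperspherical identity metric space and in the wavelet domain, respectively. Experimental results show that the proposed algorithm not only removes face noise effectively in high-noise environments but also improves face image resolution, achieving a higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), and is therefore of practical value.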

16.

Image-based 3D model retrieval using manifold learning

Pan-pan Mu San-yuan Zhang Yin Zhang Xiu-zi Ye Xiang Pan 《浙江大学学报:C卷英文版》2018,19(11):1397-1408

17.

Robust analysis and control of parameter-dependent uncertain descriptor systems (total citations: 1; self-citations: 0; citations by others: 1)

Gabriela Iuliana Bara 《Systems & Control Letters》2011,60(5):356-364

18.

Time-dependent orthogonal basis classification model for the operating state of sewage treatment equipment

王荣秀 曹晓莉 孙怀义 胡卫军 江朝元 《计算机工程》2011,37(17):233-235,238

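To monitor the operating state of marine sewage treatment equipment, a classification model of equipment operating states based on orthogonal state bases with a time factor is proposed. According to the operating characteristics and conditions of the equipment system and the time attributes of each state, the value characteristics of the training-set variables are determined; the input vectors are mapped into an orthogonal, complete feature space to generate an orthogonal state basis for each data class, from which a discriminant matrix is obtained; and a time phase factor is introduced to form time-dependent orthogonal state basis vectors, so that unknown states can be classified....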

19.

Robust Kalman filter and smoother for errors‐in‐variables state space models with observation outliers based on the minimum‐covariance determinant estimator

Jaafar Almutawa 《Asian journal of control》2011,13(4):513-521

In this paper, we propose a robust Kalman filter and smoother for the errors‐in‐variables (EIV) state space models subject to observation noise with outliers. We introduce the EIV problem with outliers and then present the minimum covariance determinant (MCD) estimator which is a highly robust estimator in terms of protecting the estimate from the outliers. Then, we propose the randomized algorithm to find the MCD estimate. However, the uniform sampling method has a high computational cost and may lead to biased estimates; therefore, we apply the sub‐sampling method. A Monte Carlo simulation result shows the efficiency of the proposed algorithm. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

20.

A new data mining approach to estimate causal effects of policy interventions

F. Camillo Ida D'Attoma 《Expert systems with applications》2010,37(1):171-181

This paper presents a data-driven approach that enables one to obtain a measure of comparability between groups in the presence of observational data. The main idea lies in the use of the general framework of conditional multiple correspondence analysis as a tool for investigating the dependence relationship between a set of observable categorical covariates X and an assignment-to-treatment indicator variable T, in order to obtain a global measure of comparability between groups according to their dependence structure. Then, we propose a strategy that enables one to find treatment groups, directly comparable with respect to pre-treatment characteristics, on which to estimate local causal effects.