Similar articles
1.
This paper considers the problems that arise when a data set following a circular distribution contains outliers. Robust estimates of the unknown parameters are obtained using the weighted likelihood and minimum disparity methods, each of which is defined for a general parametric family of circular data. The class of power divergences and the related residual adjustment function are investigated in order to improve the performance of the two methods, which are studied for the von Mises (circular normal) and wrapped normal distributions. The techniques are illustrated via two examples based on a real data set and a Monte Carlo study, which also enables the discussion of various computational aspects.

2.
A new algorithm is developed to train feed-forward neural networks for non-linear input-to-output mappings with small, incomplete data sets in arbitrary distributions. The developed Training-EStimation-Training (TEST) algorithm consists of three steps: (1) training with the complete portion of the training data set, (2) estimating the missing attributes with the trained neural networks, and (3) re-training the neural networks with the whole data set. Error back-propagation is still applicable for estimating the missing attributes. Unlike other training methods for missing data, it does not assume data distribution models, which may not be appropriate for small training sets. The TEST algorithm is first tested on the Iris benchmark data. By randomly removing some attributes from the complete data set and estimating their values later, the accuracy of the TEST algorithm is demonstrated. It is then applied to the Diabetes benchmark data, about 50% of which contain missing attributes. Compared with other existing algorithms, the proposed TEST algorithm achieves much better recognition accuracy on test data.

3.
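For Bayesian network parameter learning from small data sets, constrained maximum likelihood (CML) and qualitative maximum a posteriori (QMAP) are two methods that accommodate constraints well. Depending on the sample size, the number of constraints, and the parameter location, each method outperforms the other in some cases, which makes it hard to choose between them. This paper therefore proposes an adaptive parameter learning method. First, two sets of parameters are learned with CML and QMAP. Next, sample weights, constraint weights, and parameter-location weights are computed with a custom procedure based on rejection-acceptance sampling and the idea of spatial maximum a posteriori estimation. Finally, a new parameter solution is computed from these parameters and weights. Experiments show that, under all conditions, the accuracy of the parameters obtained by the proposed method approaches or even exceeds that of the better of CML and QMAP.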

4.
The behavioral approach to systems theory, put forward 40 years ago by Jan C. Willems, takes a representation-free perspective of a dynamical system as a set of trajectories. Until recently it was an unorthodox niche of research, but it has gained renewed interest with the newly emerged data-driven paradigm, for which it is uniquely suited because the representation-free perspective pairs well with recently developed computational methods. A result derived in the behavioral setting, which became known as the fundamental lemma, started a new class of subspace-type data-driven methods. The fundamental lemma gives conditions for a non-parametric representation of a linear time-invariant system by the image of a Hankel matrix constructed from raw time series data. This paper reviews the fundamental lemma, its generalizations, and related data-driven analysis, signal processing, and control methods. A prototypical signal processing problem reviewed in the paper is missing data estimation, which includes simulation, state estimation, and output tracking control as special cases. The direct data-driven control methods using the fundamental lemma and the non-parametric representation are loosely classified as implicit and explicit approaches. Representative examples are data-enabled predictive control (an implicit method) and data-driven linear quadratic regulation (an explicit method). These methods are equally amenable to certainty-equivalence and to robust control. Emphasis is put on the robustness of the methods under noise. The methods allow for theoretical certification, are computationally tractable, require a small amount of data in comparison with machine learning methods, and are robustly implementable in real time on complex physical systems.
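As a rough illustration of the Hankel matrix mentioned in this abstract, the sketch below builds a depth-L Hankel matrix from a scalar time series and checks the usual persistency-of-excitation rank condition. It is a minimal sketch, not the paper's method: the function names (hankel, rank, persistently_exciting) are hypothetical, signals are assumed scalar and integer- or rational-valued, and exact arithmetic is used so the rank is not affected by rounding.

```python
from fractions import Fraction


def hankel(w, depth):
    """Hankel matrix with `depth` rows built from the scalar series w."""
    cols = len(w) - depth + 1
    return [[Fraction(w[i + j]) for j in range(cols)] for i in range(depth)]


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [row[:] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def persistently_exciting(u, order):
    """True if the depth-`order` Hankel matrix of u has full row rank."""
    return rank(hankel(u, order)) == order


print(persistently_exciting([1, 0, 0, 0, 1], 2))  # an impulse-like input
print(persistently_exciting([1, 1, 1, 1, 1], 2))  # a constant input
```

A constant input yields identical Hankel rows, so it fails the rank test, while the impulse-like input passes it. Vector-valued signals would need block rows, which this sketch leaves out.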

5.
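Local subspace clustering   Total citations: 6, self-citations: 1, citations by others: 5
刘展杰, 陈晓云. 《自动化学报》 (Acta Automatica Sinica), 2016, 42(8): 1238-1247
Existing subspace clustering methods usually assume that the data are globally linear and represent each sample as a linear combination of the other samples, so common subspace clustering methods do not work well on nonlinear data. To overcome the limitation of global linear representation, and drawing on ideas from manifold learning, a k-nearest-neighbor local linear representation replaces the global one. Combined with sparse subspace clustering and least squares subspace clustering, this yields local sparse subspace clustering and local least squares subspace clustering, collectively called local subspace clustering. Experiments on two-moon data, 6 image data sets, and 4 gene expression data sets show that the method is effective.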

6.
Three edge correction methods for (marked) spatio-temporal point processes are proposed. All are based on the idea of placing an approximation of the expected behaviour of the process at hand (simulated realisations) outside the study region, where it interacts with the data during estimation. These methods are applied to the so-called growth-interaction model. The specific choices of growth function and interaction function are motivated purely by the forestry applications considered. The parameters of the growth and interaction functions, i.e. the parameters related to the development of the marks, are estimated using the least-squares approach together with the proposed edge corrections. Finally, the edge-corrected estimation methods are applied to a data set of Swedish Scots pine.

7.
Given the current expansion of the computer vision field, applications that rely on extracting biometric information such as facial gender for access control, security, or marketing purposes are becoming more common. A typical gender classifier requires many training samples to learn as many distinguishable features as possible. However, collecting facial images from individuals is usually a sensitive task, and it might violate either an individual's privacy or a specific data privacy law. To bridge the gap between privacy and the need for many facial images for deep learning training, an artificially generated dataset of facial images is proposed. We acquire a pre-trained Style-Generative Adversarial Network (StyleGAN) generator and use it to create a dataset of facial images. We label the images according to the observed gender using a set of criteria that differentiate male and female facial features. We use this manually labelled dataset to train three facial gender classifiers: a custom-designed network and two pre-trained networks based on the Visual Geometry Group designs (VGG16 and VGG19). We cross-validate these three classifiers on two separate datasets containing labelled images of actual subjects. For testing, we use the UTKFace and the Kaggle gender datasets. Our experimental results suggest that training on a set of artificial images produces accuracies comparable to those of existing state-of-the-art methods, which use actual images of individuals. The average classification accuracy of each classifier is between 94% and 95%, which is similar to existing proposed methods.

8.
In recent years, data quality issues have attracted wide attention. Data quality problems are mainly caused by dirty data. Many methods for dirty data management have been proposed, one of which is the entity-based relational database, in which one tuple represents an entity. Traditional query optimizations are not suitable for the new entity-based model, so new query optimizations need to be developed. In this paper, we propose a new histogram-based query selectivity estimation strategy and focus on solving the overestimation that traditional methods lead to. We prove that our approaches are unbiased. Experimental results on both real and synthetic data sets show that our approaches give good estimates with low error.

9.
Cluster analysis is a useful tool for data analysis. Clustering methods partition a data set into clusters such that data points in the same cluster are as similar as possible to each other and data points in different clusters are as dissimilar as possible. The mean shift, originally a kernel-type weighted mean procedure, has been proposed as a clustering algorithm. However, most mean shift-based clustering (MSBC) algorithms are designed for numeric data. Circular data, that is, directional data on the plane, are widely used in data analysis. In this paper, we propose an MSBC algorithm for circular data. Three mean shift implementation procedures (nonblurring, blurring, and general) are compared; the blurring mean shift procedure performs best and is recommended. The proposed MSBC for circular data does not require the number of clusters to be given. It automatically finds a final cluster number with good cluster centers. Several numerical examples and comparisons with existing clustering methods demonstrate the effectiveness and superiority of the proposed method.
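To make the idea of mean shift on angles concrete, here is a minimal sketch of a nonblurring mean shift for circular data. It is an illustration under stated assumptions, not the paper's algorithm: the von Mises-shaped kernel weight exp(kappa * cos(difference)), the concentration parameter kappa, the merge threshold min_sep, and the function names are all choices made here. Each point is moved to the kernel-weighted circular mean of the fixed data until it settles on a mode, and nearby modes are then merged so that the number of clusters falls out of the procedure.

```python
import math


def wrapped_diff(a, b):
    """Signed angular difference a - b wrapped into (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def circular_mean_shift(angles, kappa=10.0, iters=100, tol=1e-10):
    """Nonblurring mean shift: the data stay fixed, each start point moves."""
    modes = []
    for theta in angles:
        for _ in range(iters):
            weights = [math.exp(kappa * math.cos(a - theta)) for a in angles]
            s = sum(w * math.sin(a) for w, a in zip(weights, angles))
            c = sum(w * math.cos(a) for w, a in zip(weights, angles))
            new_theta = math.atan2(s, c)  # weighted circular mean
            converged = abs(wrapped_diff(new_theta, theta)) < tol
            theta = new_theta
            if converged:
                break
        modes.append(theta)
    return modes


def group_modes(modes, min_sep=0.1):
    """Merge modes closer than min_sep radians into one cluster center."""
    centers = []
    for m in modes:
        if all(abs(wrapped_diff(m, c)) >= min_sep for c in centers):
            centers.append(m)
    return centers


angles = [-0.1, 0.0, 0.1, 3.0, 3.1, 3.2]  # two groups, in radians
centers = group_modes(circular_mean_shift(angles))
print(len(centers), centers)
```

The blurring variant recommended in the abstract would instead update the whole data set at every iteration; the weighted circular mean step stays the same.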

10.
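A discussion of a data reduction method   Total citations: 3, self-citations: 0, citations by others: 3
The importance of rough set theory in data mining is increasingly recognized, and several effective methods have been developed, such as the data analysis method and the discernibility matrix method. Building on these two methods, this paper proposes a simple dissimilarity matrix and uses it for data reduction.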

11.
A new missing data algorithm, ARFIL, gives good results in spectral estimation. The log likelihood of a multivariate Gaussian random variable can always be written as a sum of conditional log likelihoods. For a complete set of autoregressive AR(p) data, the best predictor in the likelihood requires only p previous observations. If observations are missing, the best AR predictor in the likelihood will in general include all previous observations. Using only those observations that fall within a finite time interval approximates this likelihood. The resulting non-linear estimation algorithm requires no user-provided starting values. In various simulations, the spectral accuracy of robust maximum likelihood methods was much better than the accuracy of other spectral estimates for randomly missing data.
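The decomposition this abstract relies on is the chain rule for a joint density. Written out, with x_1, ..., x_N denoting the observations and p the AR order (notation assumed here, not taken from the paper):

```latex
\[
  \log p(x_1,\dots,x_N)
    = \log p(x_1) + \sum_{n=2}^{N} \log p\bigl(x_n \mid x_{n-1},\dots,x_1\bigr)
\]
% For complete Gaussian AR(p) data and n > p, the conditioning set
% shrinks to the p most recent observations:
\[
  p\bigl(x_n \mid x_{n-1},\dots,x_1\bigr)
    = p\bigl(x_n \mid x_{n-1},\dots,x_{n-p}\bigr)
\]
```

When some observations are missing, the second identity no longer holds for the observed subsequence, which is why the conditioning set is truncated to a finite time interval as an approximation.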

12.
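石振国, 孙景玉. 《计算机应用研究》 (Application Research of Computers), 2021, 38(5): 1520-1523, 1528
Because sensors have limited battery and storage capacity, it is not possible to replenish their energy and collect the sensed data they generate on a continuous basis. To address this, the paper studies periodic energy replenishment and data collection and proposes methods for charging and data collection, comprising a grid-based algorithm (GBA), a dominating-set-based algorithm (DSBA), and a circle-intersection-based algorithm (CIBA). An anchor set is found using one of these three methods or a pairwise combination of them, and a mobile device scheduling algorithm then dispatches the minimum number of mobile devices to visit the generated anchors. Simulation results confirm the effectiveness of the proposed methods. Compared with the joint energy and data acquisition (JEDA) algorithm and the smallest enclosing circle (SEC) algorithm, the proposed CIBA requires the fewest mobile devices and the shortest total travel distance, and shows good overall performance.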

13.
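This paper brings together the basic concepts of data structures such as linear lists, circular lists, and matrices with those of storage schemes such as compressed storage and indexed storage. Motivated by the problem of processing extremely large volumes of data in an interactive computer-aided mine planning system, it derives the concept and definition of compressed circular matrix storage. This storage structure retains the flexibility of linear and circular lists, compresses storage space effectively, and locates data nodes as quickly as indexed storage does. It is one of the effective storage structures for processing large amounts of piecewise-repeated data.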

14.
Interval methods have been shown to be efficient, robust, and reliable for solving difficult set-membership localization problems. However, they are unsuitable in a probabilistic context, where approximating an unbounded probability density function by a set cannot be accepted. This paper proposes a new probabilistic approach that makes it possible to use classical set-membership localization methods, which are robust with respect to outliers. The approach is illustrated on two simulated examples.

15.
Context: Along with expert judgment, analogy-based estimation, and algorithmic methods (such as Function point analysis and COCOMO), Least Squares Regression (LSR) has been one of the most commonly studied software effort estimation methods. However, an effort estimation model using LSR, a single LSR model, is highly affected by the data distribution. Specifically, if the data set is scattered and the data do not sit closely on the single LSR model line (do not closely map to a linear structure) then the model usually shows poor performance. In order to overcome this drawback of the LSR model, a data partitioning-based approach can be considered as one of the solutions to alleviate the effect of data distribution. Even though clustering-based approaches have been introduced, they still have potential problems to provide accurate and stable effort estimates.
Objective: In this paper, we propose a new data partitioning-based approach to achieve more accurate and stable effort estimates via LSR. This approach also provides an effort prediction interval that is useful to describe the uncertainty of the estimates.
Method: Empirical experiments are performed to evaluate the performance of the proposed approach by comparing with the basic LSR approach and clustering-based approaches, based on industrial data sets (two subsets of the ISBSG (Release 9) data set and one industrial data set collected from a banking institution).
Results: The experimental results show that the proposed approach not only improves the accuracy of effort estimation more significantly than that of other approaches, but it also achieves robust and stable results according to the degree of data partitioning.
Conclusion: Compared with the other considered approaches, the proposed approach shows a superior performance by alleviating the effect of data distribution that is a major practical issue in software effort estimation.

16.
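Algorithms for collecting and analyzing user data under local differential privacy have been extended to key-value data. However, the domain size and sparsity of such data, together with the local perturbation mechanism, directly limit the accuracy of collection and analysis. To address the inability of existing mechanisms to handle the collection of such data effectively, an effective histogram-based collection and analysis algorithm is proposed, HISKV (histogram-based key-value data collection...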

17.
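苏亮, 邹鹏, 贾焰. 《自动化学报》 (Acta Automatica Sinica), 2008, 34(3): 360-366
The result set of a skyline query consists of all objects in a data set that are not "dominated" by any other object. In recent years, its promising applications in online services, decision support, and real-time monitoring have made it a research focus in data management and data mining. In practice, users usually expect skyline results to be delivered quickly and progressively, while the continuous, massive, and high-dimensional nature of streaming data makes mining a sparse skyline set, with a controlled loss of query quality, a valuable and challenging problem. This paper first proposes a novel concept, the sparse skyline (Sparse-skyline), which uses one skyline object to represent all skyline objects within its ε-neighborhood. It then gives two online algorithms that adaptively adjust query quality using the correlation between data dimensions. Finally, theoretical analysis and experimental results show that, compared with existing skyline mining algorithms, the proposed methods perform well and efficiently and are better suited to data stream applications.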
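The two definitions in this abstract, dominance and the ε-neighborhood representative, can be sketched in a few lines. This is a simplified batch illustration, not the paper's online algorithms: it assumes smaller values are better in every dimension, uses Euclidean distance for the ε-neighborhood, and picks representatives greedily in input order. The function names are hypothetical.

```python
import math


def dominates(q, p):
    """q dominates p if q is no worse in every dimension and better in one."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def skyline(points):
    """All points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sparse_skyline(points, eps):
    """Keep one representative per eps-neighborhood of skyline points."""
    representatives = []
    for p in skyline(points):
        # A skyline point already within eps of a representative is dropped.
        if all(math.dist(p, r) > eps for r in representatives):
            representatives.append(p)
    return representatives


points = [(1, 5), (2, 4), (2.1, 3.9), (3, 3), (4, 4), (5, 1)]
print(skyline(points))
print(sparse_skyline(points, eps=0.5))
```

In this example (4, 4) is dominated by (3, 3) and leaves the skyline, and (2.1, 3.9) is absorbed by its neighbor (2, 4) in the sparse version. A streaming implementation would have to maintain these sets incrementally, which is outside the scope of this sketch.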

18.
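纪滨. 《微机发展》 (Microcomputer Development), 2008, 18(2): 126-128
With the rise of data mining, many classification and prediction methods have appeared. Data mining research is mostly carried out on relational databases, which makes rough set methods very convenient to apply. A relational table can be viewed as a decision table in rough set theory, and rough-set-based data mining has advantages that traditional mining tools lack. Rough set theory is a mathematical tool for handling uncertain and imprecise problems. This paper introduces the basic theory of rough sets through examples and describes in detail, through an example, how rules are obtained with a variable precision rough set model on the basis of attribute reduction of the decision table. The example shows that rough set theory is very effective for data mining on incomplete information systems.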
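The view of a relational table as a decision table can be illustrated with a small sketch of lower and upper approximations. This is a minimal sketch under assumptions made here, not the paper's procedure: rows are (condition-tuple, decision) pairs, the toy table and attribute values are invented for illustration, and the variable precision model is parameterized by a hypothetical inclusion threshold beta (beta = 1.0 gives the classical lower approximation).

```python
from collections import defaultdict


def equivalence_classes(rows):
    """Group row indices that share identical condition attribute values."""
    classes = defaultdict(set)
    for idx, (conditions, _decision) in enumerate(rows):
        classes[conditions].add(idx)
    return list(classes.values())


def approximations(rows, target_decision, beta=1.0):
    """Return (beta-lower, upper) approximations of a decision class."""
    target = {i for i, (_c, d) in enumerate(rows) if d == target_decision}
    lower, upper = set(), set()
    for cls in equivalence_classes(rows):
        inclusion = len(cls & target) / len(cls)
        if inclusion >= beta:  # class is (almost) entirely inside the target
            lower |= cls
        if inclusion > 0:      # class overlaps the target at all
            upper |= cls
    return lower, upper


# Toy decision table (illustrative values): (temperature, cough) -> diagnosis
rows = [
    (("high", "yes"), "flu"),
    (("high", "yes"), "flu"),
    (("high", "yes"), "none"),
    (("low", "no"), "none"),
    (("high", "no"), "flu"),
    (("high", "no"), "none"),
]
print(approximations(rows, "flu"))            # classical rough set
print(approximations(rows, "flu", beta=0.6))  # variable precision
```

With beta = 1.0 no equivalence class lies wholly inside the "flu" class, so the lower approximation is empty. Relaxing beta to 0.6 admits the class in which two of three rows are "flu", which is the kind of tolerance that lets rules be extracted from inconsistent tables.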

19.
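The minimal consistent cover method proposed by Martin Ester is applied to rule extraction from incomplete rule sets. The incomplete data are first preprocessed in two different ways; then, by defining consistency between data items, the covering problem of rule extraction is transformed into a partitioning problem. Tests on two UCI data sets demonstrate the effectiveness of the method.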

20.
Exploiting mobile elements (MEs) to accomplish data collection in wireless sensor networks (WSNs) can improve the energy efficiency of sensor nodes and prolong network lifetime. However, it leads to large data collection latency for the network, which is unacceptable for data-critical applications. In this paper, we address this problem by minimizing the traveling length of MEs. Our methods consist of two main steps: we first construct a virtual grid network and select the minimal stop point set (SPS) from it; then, we schedule the MEs optimally based on the SPS in order to minimize their traveling length. Different implementations of the genetic algorithm (GA) are used to solve the problem. Our methods are evaluated by extensive simulations. The results show that these methods can greatly reduce the traveling length of MEs and decrease the data collection latency.
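The first step described in this abstract, choosing stop points on a virtual grid so that every sensor is within range of one, can be sketched as a set cover problem. The sketch below is an assumption-laden illustration: it uses a greedy heuristic rather than the genetic algorithm the authors report, so the result is a small but not necessarily minimal set, and the names grid_points, greedy_stop_points, and comm_range are hypothetical. Field dimensions and spacing are assumed to be integers.

```python
import math


def grid_points(width, height, spacing):
    """Candidate stop points on a square grid covering the field."""
    return [(x, y)
            for x in range(0, width + 1, spacing)
            for y in range(0, height + 1, spacing)]


def greedy_stop_points(sensors, candidates, comm_range):
    """Greedy set cover: repeatedly take the point covering most sensors."""
    uncovered = set(range(len(sensors)))
    chosen = []
    while uncovered:
        def gain(c):
            return sum(1 for i in uncovered
                       if math.dist(c, sensors[i]) <= comm_range)
        best = max(candidates, key=gain)
        covered = {i for i in uncovered
                   if math.dist(best, sensors[i]) <= comm_range}
        if not covered:
            raise ValueError("some sensors are out of range of every grid point")
        chosen.append(best)
        uncovered -= covered
    return chosen


sensors = [(0, 0), (1, 0), (10, 10)]
stops = greedy_stop_points(sensors, grid_points(10, 10, 5), comm_range=2)
print(stops)
```

The second step, ordering the stop points into short tours for the mobile elements, is a routing problem and is not covered here.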
