Similar Documents
1.
Through analysis of fuzzy data, certain connections between such data and the data of probability and statistics are identified, and linear estimation results for fuzzy data are obtained, which provides useful reference for the study of fuzzy probability and fuzzy-probability regression coefficients.

2.
Robust Optical Flow Computation Based on Least-Median-of-Squares Regression (Cited: 4; self: 1, others: 3)
An optical flow estimation technique is presented which is based on the least-median-of-squares (LMedS) robust regression algorithm, enabling more accurate flow estimates to be computed in the vicinity of motion discontinuities. The flow is computed in a blockwise fashion using an affine model. Through the use of overlapping blocks coupled with a block shifting strategy, redundancy is introduced into the computation of the flow. This eliminates the blocking effects common in most other techniques based on blockwise processing and also allows flow to be accurately computed in regions containing three distinct motions. A multiresolution version of the technique is also presented, again based on LMedS regression, which enables image sequences containing large motions to be handled effectively. An extensive set of quantitative comparisons with a wide range of previously published methods is carried out using synthetic, realistic (computer-generated images of natural scenes with known flow) and natural images. Both angular and absolute flow errors are calculated for those sequences with known optical flow. The displaced frame difference error, used extensively in video compression, is used for those natural scenes with unknown flow. In all of the sequences tested, a comparison with those methods that produce a dense flow field (greater than 80% spatial coverage) shows that the LMedS technique produces the least error irrespective of the error measure used.
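The core of the method above is LMedS regression. Below is a minimal, self-contained sketch of generic LMedS linear fitting, the robust estimator the paper applies blockwise to an affine flow model; the affine flow constraints, block overlap and multiresolution scheme are not reproduced, and `n_trials` and the toy data are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of least-median-of-squares (LMedS) regression: repeatedly
# fit a minimal random subset and keep the fit with the smallest median
# squared residual over all points.
import numpy as np

def lmeds_fit(A, b, n_trials=500, rng=None):
    """Fit x minimizing the median of squared residuals of A @ x - b."""
    rng = np.random.default_rng(rng)
    n, p = A.shape
    best_x, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)    # minimal subset
        try:
            x = np.linalg.solve(A[idx], b[idx])
        except np.linalg.LinAlgError:
            continue                                   # degenerate sample
        med = np.median((A @ x - b) ** 2)              # median squared residual
        if med < best_med:
            best_x, best_med = x, med
    return best_x, best_med

# Toy usage: a line fit contaminated with 20% gross outliers.
rng = np.random.default_rng(0)
t = rng.uniform(0, 10, 200)
y = 2.0 * t + 1.0 + rng.normal(0, 0.1, 200)
y[:40] += 20.0                                         # inject outliers
A = np.column_stack([t, np.ones_like(t)])
coef, med = lmeds_fit(A, y)
print(coef)    # close to [2.0, 1.0] despite the outliers
```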

3.
Mining Anomalous Patterns in Time Series Based on Linear Shapes (Cited: 1; self: 0, others: 1)
For time series data whose subsequences are of equal length, this paper proposes an anomalous-pattern discovery method based on sequence deviation. Unlike traditional methods built on a specific model, the method first segments each subsequence linearly and computes the slope of each segment; the slopes are then discretized, and the symbol sequence formed by the discretized symbols represents the trend of the original subsequence. On this basis, the sequence deviation is defined and computed. The method needs no predefined model and avoids the time cost of finding anomalies through pairwise comparison of sequences.
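A minimal sketch of the pipeline the abstract describes (segment, fit slopes, discretize, score deviation), assuming an element-wise modal symbol string as the reference; the paper's exact deviation measure, segment length and slope bins are not reproduced here.

```python
# Illustrative sketch: linear segmentation, slope discretization into
# symbols, and a simple deviation score against the modal symbol string.
import numpy as np

def slope_symbols(subseq, seg_len=10, bins=(-0.5, -0.05, 0.05, 0.5)):
    """Split a subsequence into segments, fit a slope to each, discretize."""
    symbols = []
    for start in range(0, len(subseq) - seg_len + 1, seg_len):
        seg = subseq[start:start + seg_len]
        slope = np.polyfit(np.arange(seg_len), seg, 1)[0]   # least-squares slope
        symbols.append(int(np.digitize(slope, bins)))
    return np.array(symbols)

def deviation_scores(subsequences, **kw):
    """Score each subsequence by how far its symbol string deviates from the
    element-wise modal symbol string of the collection."""
    sym = np.array([slope_symbols(s, **kw) for s in subsequences])
    mode = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, sym)
    return (sym != mode).mean(axis=1)          # fraction of deviating symbols

# Usage: the subsequence with an injected trend change gets the top score.
rng = np.random.default_rng(1)
data = [np.cumsum(rng.normal(0, 0.1, 100)) for _ in range(20)]
data[7] += np.linspace(0, 8, 100)              # anomalous upward drift
print(np.argmax(deviation_scores(data)))       # -> 7 (typically)
```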

4.
New Algorithm for Computing Cube on Very Large Compressed Data Sets (Cited: 2; self: 0, others: 2)
Data compression is an effective technique for improving the performance of data warehouses. Since the cube operation represents the core of online analytical processing in data warehouses, developing efficient algorithms for computing cubes on compressed data warehouses is a major challenge. To our knowledge, very few cube computation techniques have been proposed for compressed data warehouses in the literature to date. This paper presents a novel algorithm to compute cubes on compressed data warehouses. The algorithm operates directly on compressed data sets without first decompressing them, and it is applicable to a large class of mapping-complete data compression methods. The complexity of the algorithm is analyzed in detail. Analytical and experimental results show that the algorithm is more efficient than other existing cube algorithms. In addition, a heuristic algorithm for generating an optimal plan for computing the cube is also proposed.

5.
Large Scale Kernel Regression via Linear Programming (Cited: 1; self: 0, others: 1)
The problem of tolerant data fitting by a nonlinear surface induced by a kernel-based support vector machine is formulated as a linear program with fewer variables than other linear programming formulations. A generalization of the linear programming chunking algorithm for arbitrary kernels is implemented for solving problems with very large datasets, wherein chunking is performed on both data points and problem variables. The proposed approach tolerates a small, parametrically adjusted error while fitting the given data. This leads to improved fitting of noisy data (over ordinary least-error solutions), as demonstrated computationally. Comparative numerical results indicate an average time reduction as high as 26.0% over other formulations, with a maximal time reduction of 79.7%. Additionally, linear programs with as many as 16,000 data points and more than a billion nonzero matrix elements are solved.
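For orientation, a generic epsilon-insensitive, 1-norm-regularized linear-programming formulation of kernel regression of the kind the abstract refers to might look as follows; the paper's specific variable-reduced formulation and chunking scheme are not reproduced:

\[
\min_{\alpha,\,b,\,s}\;\; \|\alpha\|_{1} \;+\; C\sum_{i=1}^{m} s_i
\qquad \text{s.t.}\quad
\Bigl|\,\sum_{j=1}^{m}\alpha_j K(x_i,x_j) + b - y_i \Bigr| \;\le\; \varepsilon + s_i,
\qquad s_i \ge 0,\;\; i = 1,\dots,m.
\]

Introducing nonnegative auxiliary variables for the |α_j| terms and splitting the absolute-value constraints yields a standard linear program; ε is the tolerated fitting error that is adjusted parametrically.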

6.
Technical Note: Naive Bayes for Regression (Cited: 1; self: 0, others: 1)
Frank, Eibe; Trigg, Leonard; Holmes, Geoffrey; Witten, Ian H. Machine Learning, 2000, 41(1): 5-25.
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in that case predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces model trees (decision trees with linear regression functions at the leaves). Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
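A minimal sketch of the idea described above: model p(y) and each p(x_j | y) with Gaussian kernel density estimates and predict the posterior mean over a grid of candidate target values. The bandwidths, grid size and the conditional-KDE form are illustrative assumptions rather than the paper's settings.

```python
# Naive Bayes for regression, sketched with Gaussian kernel density
# estimates: posterior over candidate targets is p(y) * prod_j p(x_j | y),
# and the prediction is the posterior mean over a grid.
import numpy as np

def nb_regression_predict(X, y, x_query, h_x=0.3, h_y=0.3, n_grid=200):
    grid = np.linspace(y.min(), y.max(), n_grid)       # candidate target values
    # prior p(y): KDE over observed targets
    wy = np.exp(-0.5 * ((grid[:, None] - y) / h_y) ** 2)          # (grid, n)
    log_post = np.log(wy.mean(axis=1) + 1e-300)
    # likelihoods p(x_j | y): conditional KDEs weighting training cases by
    # how close their target is to each candidate y (Nadaraya-Watson style)
    for j in range(X.shape[1]):
        kx = np.exp(-0.5 * ((x_query[j] - X[:, j]) / h_x) ** 2)   # (n,)
        cond = (wy * kx).sum(axis=1) / (wy.sum(axis=1) + 1e-300)
        log_post += np.log(cond + 1e-300)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float((grid * post).sum())                   # posterior mean

# Toy usage on a noisy quadratic target.
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(400, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.1, 400)
print(nb_regression_predict(X, y, np.array([1.0, 0.0])))   # roughly 1.0
```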

7.
Outlier detection is one of the hot topics in data management, with wide applications in medical diagnosis, financial fraud detection, environmental monitoring, and other areas. With the spread of sensors and similar devices for data collection, uncertainty in data has been found to be pervasive. Compared with certain (deterministic) data, mining the potentially valuable information hidden in uncertain data sets is much more difficult. To address this, a fast outlier detection algorithm for uncertain data, FODU (Fast Outlier Detection approach on Uncertain data sets), is proposed. An index construction strategy based on hierarchical partitioning is given; this index structure not only overcomes the limitations of traditional indexes in managing multidimensional data, but also supports fast spatial pruning. To mine uncertain outliers quickly, an efficient filtering method is proposed that removes a large amount of redundant computation through two stages, batch filtering and per-point filtering, thereby improving detection efficiency. To avoid the blow-up of the possible-world space, a method for computing the outlier probability of each data object is given. Experiments verify the effectiveness of the proposed algorithm; the results show that, compared with existing work, it significantly improves the efficiency of outlier detection on uncertain data.

8.
苗宇, 苏宏业, 褚健. 自动化学报 (Acta Automatica Sinica), 2009, 35(6): 707-716.
The quality of process data in chemical plants seriously affects the benefits and performance obtained from activities such as performance monitoring, online optimization, and control. Since many chemical processes exhibit nonlinear dynamic behavior, techniques such as the extended Kalman filter (EKF) and nonlinear dynamic data reconciliation (NDDR) have been developed to improve data quality. Recently, the recursive nonlinear dynamic data reconciliation (RNDDR) technique was proposed, combining the advantages of EKF and NDDR; however, RNDDR cannot handle measurements containing gross errors. In this paper, a support vector (SV) regression approach to recursive simultaneous data reconciliation and gross error detection in nonlinear dynamic systems is proposed. SV regression is a compromise between empirical risk and structural risk and, for data reconciliation, is robust to both random errors and gross errors. By replacing the maximum likelihood estimation in RNDDR with minimization of the structural risk, the proposed method achieves not only recursive nonlinear dynamic data reconciliation but also simultaneous gross error detection. Simulation results on a nonlinear dynamic system show that, within a recursive real-time estimation framework, the proposed method is robust, stable, and accurate for simultaneous data reconciliation and gross error detection, and that it can also deliver better control performance.

9.
A Weighted Robust Support Vector Regression Method (Cited: 8; self: 0, others: 8)
张讲社, 郭高. 计算机学报 (Chinese Journal of Computers), 2005, 28(7): 1171-1177.
A class of weighted robust support vector regression methods (WRSVR) based on soft rejection of outliers is presented. The basic idea is to first obtain an approximate regression function using standard support vector regression (SVR); based on this approximate model, a weighted SVR objective is constructed and solved with efficient SVR techniques to obtain a new approximate model; the new model is then used to construct another weighted SVR objective, which is solved to obtain a more accurate approximation; this process is repeated until convergence. The purpose of the weighting is to softly reject outliers. The method is conceptually simple, strongly robust, and easy to implement. Experiments show that WRSVR is more robust than standard SVR, the robust support vector network (RSVR), and weighted least squares support vector machines (WLS-SVM), and that its approximation accuracy is far less affected by outliers than SVM, RSVR, and WLS-SVM.
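A simplified sketch of the iterative reweighting idea, using scikit-learn's SVR with per-sample weights; the Huber-style down-weighting below is a stand-in for the paper's soft-rejection weights, and all constants are illustrative assumptions.

```python
# Iteratively reweighted SVR: fit, compute residuals, down-weight points
# with large residuals, refit, and repeat until the weights stabilise.
import numpy as np
from sklearn.svm import SVR

def weighted_robust_svr(X, y, n_iter=5, c=1.345):
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
    w = np.ones(len(y))
    for _ in range(n_iter):
        model.fit(X, y, sample_weight=w)               # weighted SVR fit
        r = y - model.predict(X)
        s = np.median(np.abs(r)) / 0.6745 + 1e-12      # robust scale (MAD)
        u = np.abs(r) / (c * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)           # down-weight outliers
    return model

# Toy usage: a sine curve with a few gross outliers injected.
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 6, 200))[:, None]
y = np.sin(X[:, 0]) + rng.normal(0, 0.05, 200)
y[::25] += 3.0                                         # gross outliers
fit = weighted_robust_svr(X, y)
print(float(np.abs(fit.predict([[1.5]]) - np.sin(1.5))))   # small error
```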

10.
AFM (Atomic Force Microscope) images often exhibit a tilted or curved background. The tilt originates from the angle between the probe and the sample surface or from bending introduced by the XYZ scanner. This paper applies the robust MM-estimation algorithm to two-dimensional background fitting of AFM images to remove the background tilt, with the fast-S estimator used for initialization to shorten the computation time. Experimental results show that, compared with traditional methods, this method levels AFM images more effectively.
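A minimal stand-in sketch of robust background levelling: the paper's MM-estimator with fast-S initialization is not reproduced here; instead, scipy's soft-L1 robust loss is used to fit and subtract a first-order (tilted-plane) background.

```python
# Robust plane fit and subtraction for a tilted image background.  The
# soft-L1 loss is a simplified substitute for the MM / fast-S estimators
# described in the abstract; f_scale and the toy image are illustrative.
import numpy as np
from scipy.optimize import least_squares

def level_afm_image(img):
    """Robustly fit z = a*x + b*y + c and subtract it from the image."""
    ny, nx = img.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    x, y, z = xx.ravel(), yy.ravel(), img.ravel()

    def residuals(p):
        a, b, c = p
        return a * x + b * y + c - z

    fit = least_squares(residuals, np.zeros(3), loss="soft_l1", f_scale=1.0)
    a, b, c = fit.x
    return img - (a * xx + b * yy + c)

# Toy usage: a tilted background plus a tall surface feature that would bias
# an ordinary least-squares plane fit.
img = 0.02 * np.arange(64)[None, :] + 0.01 * np.arange(64)[:, None]
img[20:30, 20:30] += 5.0
levelled = level_afm_image(img)
print(np.abs(levelled[0]).max())   # small: this background row is nearly flat
```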

11.
Imbalanced-data classification problems occur frequently in many business applications, yet the problem remains far from well solved. Besides directly predicting the class label of a data point, many applications also care about how accurate that prediction is. However, much existing research focuses mainly on classification accuracy and neglects the accuracy of the predicted class probabilities. To address this, a new linear regression algorithm is proposed. Within the framework of generalized linear models, it combines the generalized extreme value (GEV) distribution as the link function with a calibration loss as the objective, yielding a convex optimization problem; the asymmetry of the GEV distribution is exploited to handle imbalanced classification. In addition, since the shape parameter of the GEV distribution strongly affects modeling accuracy, two parameter search methods are proposed. Experiments on both synthetic and real data sets show that the proposed algorithm achieves excellent classification performance and accurate class probability predictions.
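The GEV link the abstract refers to is the generalized extreme value distribution function applied to the linear predictor:

\[
P(y=1 \mid x) \;=\; F_{\mathrm{GEV}}\bigl(x^{\top}\beta\bigr)
\;=\; \exp\!\Bigl\{-\bigl(1+\xi\,x^{\top}\beta\bigr)_{+}^{-1/\xi}\Bigr\},
\]

where ξ is the shape parameter. As ξ → 0 this reduces to the Gumbel case exp(−e^{−x^⊤β}) (the log-log link), while ξ ≠ 0 gives the asymmetric links exploited for imbalanced classes. This is just the standard GEV distribution function; the calibration loss and the two shape-parameter search methods of the paper are not reproduced here.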

12.
In data-intensive computing environments, the massive volume, high dimensionality, and distributed storage of data pose new challenges for the design and implementation of data mining algorithms. An outlier mining algorithm that combines grid techniques with a density-based method is proposed on top of the MapReduce model. The algorithm works in two steps: in the Map phase, a grid technique removes large amounts of normal data that cannot possibly be outliers and sends representative-point information to the master node; in the Reduce phase, a density-based clustering method with an improved selection of core objects is used, so that outliers of arbitrary shape can be mined. Experimental results show that the method mines outliers effectively in data-intensive computing environments.
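A single-machine sketch of the two phases described above: grid pruning of points in dense cells (the Map step) followed by density-based scoring of the surviving candidates (the Reduce step). The MapReduce plumbing, representative points and the paper's improved core-object selection are omitted; the cell size, k and thresholds are illustrative assumptions.

```python
# Grid pruning + density-based scoring for outlier mining, on one machine.
import numpy as np
from collections import Counter

def grid_density_outliers(X, cell=1.0, dense_count=20, k=5, n_out=10):
    # Map-style pruning: points in sufficiently dense grid cells cannot be outliers.
    cell_ids = [tuple(c) for c in np.floor(X / cell).astype(int)]
    counts = Counter(cell_ids)
    candidates = np.array([i for i, cid in enumerate(cell_ids)
                           if counts[cid] < dense_count])
    # Reduce-style scoring: distance to the k-th nearest neighbour in the
    # full data set as a simple density-based outlier score.
    d = np.linalg.norm(X[candidates, None, :] - X[None, :, :], axis=2)
    score = np.sort(d, axis=1)[:, k]
    order = np.argsort(score)[::-1]
    return candidates[order[:n_out]]

# Toy usage: two tight Gaussian blobs plus a few scattered distant points.
rng = np.random.default_rng(4)
blobs = np.vstack([rng.normal(0, 0.3, (300, 2)), rng.normal(5, 0.3, (300, 2))])
noise = rng.uniform(-10, 15, (8, 2))
X = np.vstack([blobs, noise])
print(grid_density_outliers(X, n_out=8))   # mostly indices >= 600 (the noise)
```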

13.
This paper introduces three ways to build linear regression models in Excel, analyzes the advantages and disadvantages of each, and discusses their respective scopes of application, so that different users can choose the method that suits their own needs and circumstances.

14.
刘群. 计算机科学 (Computer Science), 2004, 31(Z2): 185-186.
1. Introduction. With the rapid growth of the information and service resources provided by the Internet, many powerful search engines search Web documents by content, keywords, and similar means, but unfortunately the results returned often fail to satisfy users. Cluster analysis can, when the characteristics of a data set are unknown, use an unsupervised learning process to gain a preliminary understanding of the distribution and grouping properties of the data; however, the choice of clustering model and the accuracy of the clustering results both affect the quality of the entire knowledge discovery process.

15.
To weaken the influence of gyro drift on the north-seeking accuracy of fiber-optic gyroscopes, robust estimation is applied to the gyro signal on top of wavelet threshold denoising. A spectrum analysis is first performed to determine the appropriate multiresolution analysis scales and the treatment of the high-frequency coefficients at each scale: at scales where noise dominates, the high-frequency coefficients are set to zero, while scales containing both noise and useful signal are processed with wavelet threshold denoising. Robust estimation is then applied to the gyro data: the observation residuals from an initial robust estimate are used with the median method to obtain the variance factor, and a hybrid algorithm is adopted that combines an initial value with a high breakdown (contamination) rate with iterative solution by the IGG III scheme. Computation results show that this approach effectively resists the influence of abnormal disturbances and improves north-seeking precision.
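A sketch of the wavelet stage of the pipeline above, using PyWavelets: the finest (noise-dominated) detail level is zeroed and the remaining detail levels are soft-thresholded. The spectrum analysis, the robust IGG III estimation and the median-based variance factor are not shown, and the wavelet family, level count and threshold rule are illustrative assumptions.

```python
# Wavelet threshold denoising: decompose, zero the finest detail level,
# soft-threshold the intermediate levels, reconstruct.
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA_L, cD_L, ..., cD_1]
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise scale from finest level
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))      # universal threshold
    coeffs[-1] = np.zeros_like(coeffs[-1])                 # noise-dominated scale: zeroed
    coeffs[1:-1] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:-1]]
    return pywt.waverec(coeffs, wavelet)[:len(signal)]

# Toy usage: a slow drifting component buried in white noise.
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 2048)
clean = 0.5 * np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(0, 0.2, t.size)
print(np.std(wavelet_denoise(noisy) - clean))              # well below 0.2
```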

16.
胡云, 王崇骏, 谢俊元, 吴骏, 周作建. 软件学报 (Journal of Software), 2013, 24(11): 2710-2720.
Community evolution patterns in temporal data sets are an important area in the study and application of network behavior dynamics. Outlier detection based on community evolution can not only discover novel abnormal behavior patterns but also help to understand community evolution trends more accurately. Using changes in members' community affiliations, the concept of a community-evolution migration matrix is proposed, and several properties of the migration matrix and its relationship to the evolution of community structure are studied and revealed. Robust regression with M-estimation is used to further refine the migration matrix and reduce the interference of anomalous points, while community-evolution outliers are characterized and defined. Since complex networks contain a large number of randomly drifting marginal individuals, the defined outliers take into account both the change of a member's role within the community and its deviation from the overall migration pattern of the community. The evolution outlier detection algorithm proposed on this basis can adapt to various community evolution trends and more effectively focus on and discover the abnormal evolution behavior of important members in large-scale social networks. Experimental results show that the proposed method can discover important outlying evolution patterns from large-scale social network evolution sequences, and that these patterns find reasonable explanations in reality.
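A minimal sketch of the migration-matrix idea: entry M[i, j] is the fraction of members of community i at time t that belong to community j at time t+1. The M-estimation refinement and the outlier definitions from the paper are not reproduced.

```python
# Row-normalised community migration matrix from two consecutive snapshots.
import numpy as np

def migration_matrix(labels_t, labels_t1, n_communities):
    M = np.zeros((n_communities, n_communities))
    for a, b in zip(labels_t, labels_t1):
        M[a, b] += 1
    row_sums = M.sum(axis=1, keepdims=True)
    return M / np.where(row_sums == 0, 1, row_sums)   # row-normalised

# Toy usage: community 0 largely migrates into community 2.
labels_t  = [0, 0, 0, 0, 1, 1, 2, 2, 2]
labels_t1 = [2, 2, 2, 0, 1, 1, 2, 2, 1]
print(migration_matrix(labels_t, labels_t1, 3))
# A member whose individual move disagrees strongly with its community's row
# of the matrix is a natural candidate for an evolution outlier.
```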

17.
This paper studies regression under the unknown-but-bounded-error (UBBE) assumption while taking the model's domain of applicability into account. The concept of best prediction parameters is proposed, and basic criteria and methods for determining them are given. Results from practical case studies show that, compared with regression methods that ignore the model's domain of applicability, the new method improves the prediction accuracy of the model.

18.
Exploratory data analysis is a widely used technique to determine which factors have the most influence on data values in a multi-way table, or which cells in the table can be considered anomalous with respect to the other cells. In particular, median polish is a simple yet robust method for performing exploratory data analysis. Median polish is resistant to holes in the table (cells that have no values), but it may require many iterations through the data. This makes it difficult to apply median polish to large multidimensional tables, since the I/O requirements may be prohibitive. This paper describes a technique that uses median polish over an approximation of a datacube, easing the burden of I/O. The cube approximation is obtained by fitting log-linear models to the data. The results are tested for quality using a variety of measures. The technique scales to large datacubes and gives a good approximation of the results that would have been obtained by median polish on the original data.
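For reference, a standard two-way median polish (the exploratory technique the abstract builds on; the log-linear datacube approximation itself is not shown): row and column medians are swept out iteratively, leaving overall, row and column effects plus residuals, and unusually large residuals flag anomalous cells.

```python
# Two-way median polish in the style of Tukey's procedure.
import numpy as np

def median_polish(table, n_iter=10):
    resid = np.array(table, dtype=float)
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        r = np.median(resid, axis=1)                   # sweep row medians
        row_eff += r
        resid -= r[:, None]
        overall += np.median(col_eff)                  # recentre column effects
        col_eff -= np.median(col_eff)
        c = np.median(resid, axis=0)                   # sweep column medians
        col_eff += c
        resid -= c[None, :]
        overall += np.median(row_eff)                  # recentre row effects
        row_eff -= np.median(row_eff)
    return overall, row_eff, col_eff, resid

# Toy usage: one grossly inflated cell stands out in the residuals.
table = np.add.outer([10.0, 20.0, 30.0], [1.0, 2.0, 3.0, 4.0])
table[1, 2] += 50.0
overall, rows, cols, resid = median_polish(table)
print(np.unravel_index(np.abs(resid).argmax(), resid.shape))   # -> (1, 2)
```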

19.
Outlier Detection in Distributed Data Streams Based on Kernel Density Estimation (Cited: 2; self: 1, others: 2)
Research on mining algorithms for data streams has received increasing attention. For distributed data stream environments, an outlier detection algorithm for distributed data streams based on kernel density estimation is proposed. The algorithm treats the data stream at each distributed node as a subset of the global data stream and, through communication between the distributed nodes and a central node, maintains a density estimate of the global data stream. Based on this estimate, each distributed node detects outliers in its local stream, yielding the set of outliers with respect to the global stream. The interaction between nodes and the details of the outlier detection algorithm are discussed, and experiments verify the applicability and effectiveness of the algorithm.
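A single-node sketch of the core idea: score points against a Gaussian kernel density estimate and flag those with the lowest estimated density. The distributed maintenance of a global estimate through node-coordinator communication is not shown; the bandwidth (scipy's default rule) and the cutoff fraction are illustrative.

```python
# Kernel-density-based outlier detection: low-density points are outliers.
import numpy as np
from scipy.stats import gaussian_kde

def kde_outliers(X, contamination=0.02):
    kde = gaussian_kde(X.T)                       # gaussian_kde expects shape (d, n)
    density = kde(X.T)                            # estimated density at every point
    cutoff = np.quantile(density, contamination)
    return np.where(density <= cutoff)[0]

# Toy usage: points drawn from a Gaussian plus a few distant outliers.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.uniform(8, 12, (10, 2))])
print(kde_outliers(X, contamination=10 / 510))    # mostly indices >= 500
```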
