期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A classification EM algorithm for binned data

Allou Samé 《Computational statistics & data analysis》2006,51(2):466-480

A real-time flaw diagnosis application for pressurized containers using acoustic emissions is described. The pressurized containers used are cylindrical tanks containing fluids under pressure. The surface of the pressurized containers is divided into bins, and the number of acoustic signals emanating from each bin is counted. Spatial clustering of high density bins using mixture models is used to detect flaws. A dedicated EM algorithm can be derived to select the mixture parameters, but this is a greedy algorithm since it requires the numerical computation of integrals and may converge only slowly. To deal with this problem, a classification version of the EM (CEM) algorithm is defined, and using synthetic and real data sets, the proposed algorithm is compared to the CEM algorithm applied to classical data. The two approaches generate comparable solutions in terms of the resulting partition if the histogram is sufficiently accurate, but the algorithm designed for binned data becomes faster when the number of available observations is large enough. 相似文献

2.

基于区间数的不确定性数据聚类算法:UD-OPTICS

吴翠先何少元《计算机工程与科学》2019,41(7):1303-1311

在不确定性数据聚类算法的研究中,普遍需要假设不确定性数据服从某种分布,继而获得表示不确定性数据的概率密度函数或概率分布函数,然而这种假设很难保证与实际应用系统中的不确定性数据分布一致。现有的基于密度的算法对初始参数敏感,在对密度不均匀的不确定性数据聚类时,无法发现任意密度的类簇。鉴于这些不足,提出基于区间数的不确定性数据对象排序识别聚类结构算法(UD-OPTICS)。该算法利用区间数理论,结合不确定性数据的相关统计信息来更加合理地表示不确定性数据,提出了低计算复杂度的区间核心距离与区间可达距离的概念与计算方法,将其用于度量不确定性数据间的相似度,拓展类簇与对象排序识别聚类结构。该算法可很好地发现任意密度的类簇。实验结果表明,UD-OPTICS算法具有较高的聚类精度和较低的复杂度。相似文献

3.

Improved integral equation solution for the first passage time of leaky integrate-and-fire neurons

Dong Y Mihalas S Niebur E 《Neural computation》2011,23(2):421-434

An accurate calculation of the first passage time probability density (FPTPD) is essential for computing the likelihood of solutions of the stochastic leaky integrate-and-fire model. The previously proposed numerical calculation of the FPTPD based on the integral equation method discretizes the probability current of the voltage crossing the threshold. While the method is accurate for high noise levels, we show that it results in large numerical errors for small noise. The problem is solved by analytically computing, in each time bin, the mean probability current. Efficiency is further improved by identifying and ignoring time bins with negligible mean probability current. 相似文献

4.

Histogram-based embedding for learning on statistical manifolds

Yue Zhang Chuancai Liu Jian Zou 《Pattern Analysis & Applications》2016,19(1):21-40

A novel binning and learning framework is presented for analyzing and applying large data sets that have no explicit knowledge of distribution parameterizations, and can only be assumed generated by the underlying probability density functions (PDFs) lying on a nonparametric statistical manifold. For models’ discretization, the uniform sampling-based data space partition is used to bin flat-distributed data sets, while the quantile-based binning is adopted for complex distributed data sets to reduce the number of under-smoothed bins in histograms on average. The compactified histogram embedding is designed so that the Fisher–Riemannian structured multinomial manifold is compatible to the intrinsic geometry of nonparametric statistical manifold, providing a computationally efficient model space for information distance calculation between binned distributions. In particular, without considering histogramming in optimal bin number, we utilize multiple random partitions on data space to embed the associated data sets onto a product multinomial manifold to integrate the complementary bin information with an information metric designed by factor geodesic distances, further alleviating the effect of over-smoothing problem. Using the equipped metric on the embedded submanifold, we improve classical manifold learning and dimension estimation algorithms in metric-adaptive versions to facilitate lower-dimensional Euclidean embedding. The effectiveness of our method is verified by visualization of data sets drawn from known manifolds, visualization and recognition on a subset of ALOI object database, and Gabor feature-based face recognition on the FERET database. 相似文献

5.

基于联系数的位置不确定性数据UCNK-Means聚类算法

王骏黄德才《计算机科学》2016,43(Z11):436-442

摘要位置不确定性数据的聚类是一个新的不确定性数据聚类问题。其聚类方法主要包括获取对象的概率密度函数,通过积分计算对象间的期望距离来进行聚类分析和以区间数表示对象,通过区间数的系列运算来进行聚类分析这两大类。前者存在概率密度函数获取困难、计算复杂、实用性不强的缺陷;后者在区间数转化为实数过程中,忽略了区间数变化范围对聚类效果的影响,其聚类质量不佳。鉴于此,提出一种基于联系数的不确定对象聚类新算法UCNK-Means。该算法用联系数巧妙地表示不确定性对象,并专门定义了对象间的联系距离,运用联系数态势值比较联系距离大小,克服了现有算法的不足。仿真实验表明,UCNK-Means具有聚类精度高、计算复杂度低、实用性强的特点。相似文献

6.

Efficiently extendible mappings for balanced data distribution 总被引：2，自引：0，他引：2

D. M. Choy R. Fagin L. Stockmeyer 《Algorithmica》1996,16(2):215-232

In data storage applications, a large collection of consecutively numbered data “buckets” are often mapped to a relatively small collection of consecutively numbered storage “bins.” For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the “goodness” of a mapping method include:

Thetime (number of operations) needed to compute the mapping.
Thestorage needed to store a representation of the mapping.
Thebalance of the mapping, i.e., the extent to which all bins receive the same number of buckets.
The cost ofrelocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is needed due to an expansion of the number of bins or the number of buckets.

One contribution of this paper is to give a new mapping method, theInterval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, ifm is the number of times that the number of bins and/or buckets has increased, then the time complexity isO(logm) and the storage isO(m ²). Another contribution of the paper is to identify the concept of ahistory-independent mapping, meaning informally that the mapping does not “remember” the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase. 相似文献

7.

Variable-sized object packing and its applications to instruction cache design

Rung-Bin Lin 《Computers & Electrical Engineering》2008,34(5):438-444

In this paper we study the problem of packing a sequence of objects into bins. The objects are all either divisible or indivisible and occur in accordance with a certain probability distribution. We would like to find the average number of entries wasted in a bin if objects are indivisible and the probability of splitting the last object in a bin if objects are divisible. We solve this problem under a unified formulation by modeling a packing process as a Markov chain whose state transition probabilities are derived from an application of the partitions of integers. An application of this study to instruction cache design shows that a line size of 16 bytes has minimized the probability of splitting the last ×86 instruction in a cache line. For micro-op cache design, a line size of four entries has minimized the number of entries wasted per line. 相似文献

8.

Quantization-based clustering algorithm

Zhiwen Yu Author Vitae Hau-San Wong Author Vitae 《Pattern recognition》2010,43(8):2698-2711

In this paper, a quantization-based clustering algorithm (QBCA) is proposed to cluster a large number of data points efficiently. Unlike previous clustering algorithms, QBCA places more emphasis on the computation time of the algorithm. Specifically, QBCA first assigns the data points to a set of histogram bins by a quantization function. Then, it determines the initial centers of the clusters according to this point distribution. Finally, QBCA performs clustering at the histogram bin level, rather than the data point level. We also propose two approaches to improve the performance of QBCA further: (i) a shrinking process is performed on the histogram bins to reduce the number of distance computations and (ii) a hierarchical structure is constructed to perform efficient indexing on the histogram bins. Finally, we analyze the performance of QBCA theoretically and experimentally and show that the approach: (1) can be easily implemented, (2) identifies the clusters effectively and (3) outperforms most of the current state-of-the-art clustering approaches in terms of efficiency. 相似文献

9.

Minimum Cross-Entropy Approximation for Modeling of Highly Intertwining Data Sets at Subclass Levels

Qiuming Zhu 《Journal of Intelligent Information Systems》1998,11(2):139-152

We study the problem of how to accurately model the data sets that contain a number of highly intertwining sets in terms of their spatial distributions. Applying the Minimum Cross-Entropy minimization technique, the data sets are placed into a minimum number of subclass clusters according to their high intraclass and low interclass similarities. The method leads to a derivation of the probability density functions for the data sets at the subclass levels. These functions then, in combination, serve as an approximation to the underlying functions that describe the statistical features of each data set. 相似文献

10.

Exploiting user behavior learning for personalized trajectory recommendations

Xiao PAN Lei WU Fenjie LONG Ang MA 《Frontiers of Computer Science》2022,16(3):163610

相似文献

11.

The min-conflict packing problem

Ali KhanaferFrançois Clautiaux Saïd HanafiEl-Ghazali Talbi 《Computers & Operations Research》2012,39(9):2122-2132

In the classical bin-packing problem with conflicts (BPC), the goal is to minimize the number of bins used to pack a set of items subject to disjunction constraints. In this paper, we study a new version of BPC: the min-conflict packing problem (MCBP), in which we minimize the number of violated conflicts when the number of bins is fixed. In order to find a tradeoff between the number of bins used and the violation of the conflict constraints, we also consider a bi-objective version of this problem. We show that the special structure of its Pareto front allows to reformulate the problem as a small set of MCBP. We solved these two problems through heuristics, column-generation methods, and a tabu search. Computational experiments are reported to assess the quality of our methods. 相似文献

12.

高斯混合概率假设密度无序估计分布式融合

孔云波冯新喜乔向东刘钊《控制理论与应用》2015,32(4):464-471

针对分布式传感器网络中多目标随机集状态混合无序估计问题,本文提出了一种基于高斯混合概率假设密度无序估计分布式融合算法.在高斯混合概率假设密度滤波器的框架下,首先基于概率假设密度递推滤波特性,建立适用于多目标随机集状态混合无序估计的最新可利用估计判别机制,然后利用扩展协方差交叉融合算法对经过最新可利用估计判别机制获得的无序概率假设密度强度估计进行融合处理,针对融合过程中高斯分量快速增长的问题,在保证信息损失最小的前提下,对融合过程的不同环节实施高斯混合分量裁剪操作,给出了一种多级分层分量裁剪算法.最后,仿真实验验证了文中所提的算法的有效性和可行性. 相似文献

13.

非均匀杂波中的直接变换域STAP算法研究

程向娇《计算机工程与应用》2015,51(5):217-221

降维空时自适应处理和降秩空时自适应处理均需要已知杂波协方差矩阵,或者通过参考单元对杂波协方差矩阵进行估计,在非均匀杂波环境中无法获取足够多的有效样本,使得算法性能急剧下降。提出一种直接变换域空时自适应处理算法,不需要对杂波协方差矩阵估计,在角度-多普勒域中的通道之间进行自适应处理,采用最小均方准则求解,能够适用于非均匀杂波环境中目标检测。仿真结果表明,提出的算法相比于直接数据域算法,抑制杂波能力更强,对于目标角度偏差更具稳健性。相似文献

14.

Analysis of voice and low-priority data traffic by means of brisk periods and slack periods

Y.K. Tham J.N.P. Hume 《Computer Communications》1983,6(1):14-22

Differentiating data traffic in an integrated voice and data network into high and low priority classes, it is noted that spare voice transmission capacity may be used to transmit low priority data packets, which can tolerate large delays. This paper introduces the notion of brisk periods and slack periods for studying the fluctuation in the number of voice calls in progress and the number of customers, in general, in a multiserver queue with Poisson arrivals and exponential service times. Recurrence relations are derived for the Laplace-Stieltjes transform of the probability density functions and the mean and variance of the length of brisk periods and slack periods of voice calls and the mean and variance of the increase in low priority data packets during brisk periods of voice calls. Computation of spare transmission capacity available during slack periods of voice calls is outlined. 相似文献

15.

基于维度扩展的Radviz可视化聚类分析方法

周芳芳李俊材黄伟王俊韡赵颖《软件学报》2016,27(5):1127-1139

Radviz是一种多维数据可视化技术,它通过径向投影机制将多维数据映射到低维空间,使具有相似特征的数据点投影到相近位置,从而形成可视化聚类效果.Radviz圆周上的维度排列顺序对数据投影结果影响很大,提出将原始维度划分为多个新维度来拓展Radviz圆周上的维度排序空间,从而获得比原始维度条件下更好的可视化聚类效果.该维度划分方法首先计算数据在每个原始维度的概率分布直方图,然后使用均值漂移算法对直方图进行划分,最后根据划分结果将原始维度扩展为多个新维度.提出使用Dunn指数和正确率来量化评估Radviz可视化聚类效果.进行了多组对比实验,结果表明,维度扩展有利于多维数据在Radviz投影中获得更好的可视化聚类效果. 相似文献

16.

基于高斯混合模型的非平衡数据对称翻转算法

陈刚王丽娟《信息与控制》2020,(2):203-209,218

针对传统分类器对于非平衡数据的分类效果存在的问题,提出了一种基于高斯混合模型-期望最大化(GMM-EM)的对称翻转算法.该算法的核心思想是基于概率论中的"3σ法则"使数据达到平衡.首先,利用高斯混合模型和EM算法得到多数类与少数类数据的密度函数;其次,以少数类数据的均值为对称中心,根据"3σ法则"确定多数类侵入少数类的翻转边界,进行数据翻转,同时剔除与翻转区间中少数类原始数据数据重复的点;此时,若两类数据不平衡,则在翻转区域内使用概率密度增强方法使数据达到平衡.最后,从UCI、KEEL数据库中选取的14组数据使用决策树分类器对平衡后的数据进行分类,实例分析表明了该算法的有效性. 相似文献

17.

基于mean-shift算法的目标跟踪方法

叶佳张建秋《传感技术学报》2006,19(6):2621-2624,2629

针对雷达多目标跟踪提出一种基于mean-shift[算法的目标跟踪方法.首次将mean-shift的方法应用于目标的数据关联,找出源于目标的观测值后对其进行Kalman滤波,从而估计出目标运动的轨迹,实现目标跟踪.MST跳出传统思维框架,首次利用概率密度分布的不同来区分服从不同参数分布的数据,从整体上对观测数据进行整合再结合最邻近法完成数据关联,该方法具有计算速度快,跟踪效果好的特点. 相似文献

18.

改进型平均移位柱状图估算概率密度并对互信息作相关分析 总被引：1，自引：0，他引：1

韩敏梁志平《控制理论与应用》2011,28(6):845-850

将平均移位柱状图(averaged shifted histogram,ASH)概率密度估计中二次型平滑权值与均匀权值进行结合,提出一种改进的概率密度估计方法:IASH(improved averaged shifted histogram).通过相应区间内样本数目的方差确定原平滑权值与均匀权值之间的比例系数,动态的改变平滑权值:对ASH概率密度估计中边缘值部分的平滑权值按比例进行补偿,改善过平滑的问题,提高了IASH概率密度估计的精度.在此基础上应用互信息进行变量间的相关性分析,选择输入变量,实现多元时间序列的预测.采用人工数据和实际Housing数据进行仿真分析,仿真结果验证了改进后方法的有效性. 相似文献

19.

PALIREL,a Computer Program for Analyzing Particle-to-Membrane Relations,with Emphasis on Electron Micrographs of Immunocytochemical Preparations and Gold Labeled Molecules

《Computers and biomedical research》1999,32(2):93-122

Many vital substances, such as receptors, transporters, and ion channels, in cells occur associated with membranes. To an increasing extent their precise localization is demonstrated by immunocytochemical methods including labeling with gold particles followed by electron microscopy. PALIREL has primarily been developed to facilitate such research, enabling rapid analysis of topographic relations of particles (gold or others) to neighboring linear interfaces (membranes). After digitization of membranes and particles, the program particularly allows computation of (1) the particle number and number per unit length of membrane, in individual bins (membrane lengths) interactively defined along the membrane; (2) the distance of each particle from the membrane; (3) the particle number, and the density (number per μm²), in zones defined along (over and under) the membrane; and (4) the particle number and density in “zonebins” resulting from zones and bins being defined simultaneously. If there occurs, somewhere in the membrane, a segment of different nature, such as a synapse, the quantitative data may be had separately for that and the adjoining parts of the membrane. PALIREL allows interactive redefinition of bins, zones, or objects (particle-line files) while other definitions are retained. The results can be presented on the screen as tables and histograms and be printed on request. A dedicated graphic routine permits inspection on screen of lines, particles, zones, and bins. PALIREL is equally applicable to biological investigations of other kinds, in which the topographic relations of points (structures represented as points) to lines (boundaries) are to be examined. PALIREL is available from the authors on a noncommercial basis. 相似文献

20.

Bayesian density estimation from grouped continuous data

Philippe Lambert Paul H.C. Eilers 《Computational statistics & data analysis》2009,53(4):1388-1399

Grouped data occur frequently in practice, either because of limited resolution of instruments, or because data have been summarized in relatively wide bins. A combination of the composite link model with roughness penalties is proposed to estimate smooth densities from such data in a Bayesian framework. A simulation study is used to evaluate the performances of the strategy in the estimation of a density, of its quantiles and first moments. Two illustrations are presented: the first one involves grouped data of lead concentration in the blood and the second one the number of deaths due to tuberculosis in The Netherlands in wide age classes. 相似文献