Similar Literature
20 similar documents found
1.
A conditional density function, which describes the relationship between response and explanatory variables, plays an important role in many analysis problems. In this paper, we propose a new kernel-based parametric method to estimate conditional density. An exponential function is employed to approximate the unknown density, and its parameters are computed from the given explanatory variable via a nonlinear mapping using kernel principal component analysis (KPCA). We develop a new kernel function, a variant of polynomial kernels, to be used in KPCA. The proposed method is compared with the Nadaraya-Watson estimator through numerical simulations and real data. Experimental results show that the proposed method outperforms the Nadaraya-Watson estimator in terms of revised mean integrated squared error (RMISE), and is therefore an effective method for estimating conditional densities.
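The Nadaraya-Watson estimator used as the baseline in this comparison is a standard construction. A minimal sketch of such a conditional density estimator is given below, with illustrative Gaussian kernels and untuned bandwidths; this is the benchmark, not the authors' KPCA-based method:

```python
import numpy as np

def nw_conditional_density(x0, y_grid, X, Y, hx=0.5, hy=0.5):
    """Nadaraya-Watson style conditional density estimate f(y | x0).

    Weights each training pair (X_i, Y_i) by a Gaussian kernel in x,
    then smooths the responses Y_i with a Gaussian kernel in y.
    The bandwidths hx, hy are illustrative, not tuned values.
    """
    wx = np.exp(-0.5 * ((x0 - X) / hx) ** 2)           # kernel weights in x
    wx /= wx.sum()                                      # normalise the weights
    # Gaussian kernel in y, evaluated on a grid of candidate responses
    ky = np.exp(-0.5 * ((y_grid[:, None] - Y[None, :]) / hy) ** 2)
    ky /= (hy * np.sqrt(2 * np.pi))
    return ky @ wx                                      # weighted mixture over samples

# Toy usage: Y depends on X through a noisy sine relationship.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 500)
Y = np.sin(X) + 0.2 * rng.standard_normal(500)
y_grid = np.linspace(-2, 2, 201)
dens = nw_conditional_density(np.pi / 2, y_grid, X, Y)
print(y_grid[np.argmax(dens)])   # should be near sin(pi/2) = 1
```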

2.
While most previous work on Bayesian fault diagnosis and control loop diagnosis uses discretized evidence (an example of evidence being a monitor reading), discretizing continuous evidence can result in information loss. This paper proposes the use of kernel density estimation, a non-parametric technique for estimating the density functions of continuous random variables. Kernel density estimation requires the selection of a bandwidth parameter, used to specify the degree of smoothing, and a number of bandwidth selection techniques (optimal Gaussian, sample-point adaptive, and smoothed cross-validation) are discussed and compared. Because kernel density estimation is known to have reduced performance in high dimensions, this paper also discusses a number of existing preprocessing methods that can be used to reduce the dimensionality (grouping according to dependence, and independent component analysis). Bandwidth selection and dimensionality reduction techniques are tested on a simulation and an industrial process.
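Two of the selectors mentioned above, the optimal-Gaussian rule of thumb and a likelihood cross-validation rule, are standard and easy to sketch. The version below is a hedged illustration using a plain leave-one-out likelihood criterion rather than the smoothed or sample-point adaptive variants discussed in the paper:

```python
import numpy as np

def gaussian_kde(x_eval, data, h):
    """Evaluate a 1-D Gaussian KDE with bandwidth h at the points x_eval."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def silverman_bandwidth(data):
    """Optimal-Gaussian (rule-of-thumb) bandwidth, one of the selectors discussed."""
    return 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)

def loo_cv_bandwidth(data, grid):
    """Pick the bandwidth maximising the leave-one-out log-likelihood.

    A plain likelihood cross-validation selector, shown only for contrast
    with the rule of thumb; not the paper's smoothed cross-validation rule.
    """
    best_h, best_ll = None, -np.inf
    for h in grid:
        ll = 0.0
        for i in range(len(data)):
            rest = np.delete(data, i)
            ll += np.log(gaussian_kde(data[i:i + 1], rest, h)[0] + 1e-300)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h

rng = np.random.default_rng(1)
sample = rng.normal(0, 1, 200)
print(silverman_bandwidth(sample), loo_cv_bandwidth(sample, np.linspace(0.1, 1.0, 10)))
```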

3.
We describe a fast, data-driven bandwidth selection procedure for kernel conditional density estimation (KCDE). Specifically, we give a Monte Carlo dual-tree algorithm for efficient, error-controlled approximation of a cross-validated likelihood objective. While exact evaluation of this objective has an unscalable O(n²) computational cost, our method is practical and shows speedup factors as high as 286,000 when applied to real multivariate datasets containing up to one million points. In absolute terms, computation times are reduced from months to minutes. This enables applications at much greater scale than previously possible. The core idea in our method is to first derive a standard deterministic dual-tree approximation, whose loose deterministic bounds we then replace with tight, probabilistic Monte Carlo bounds. The resulting Monte Carlo dual-tree algorithm exhibits strong error control and high speedup across a broad range of datasets several orders of magnitude greater in size than those reported in previous work. The cost of this high acceleration is the loss of the formal error guarantee of the deterministic dual-tree framework; however, our experiments show that error is still amply controlled by our Monte Carlo algorithm, and the many-order-of-magnitude speedups are worth this sacrifice in the large-data case, where cross-validated bandwidth selection for KCDE would otherwise be impractical.

4.
Standard fixed symmetric kernel-type density estimators are known to encounter problems for positive random variables with a large probability mass close to zero. It is shown that, in such settings, alternatives based on asymmetric gamma kernel estimators are superior, but that they also differ in asymptotic and finite sample performance depending on the shape of the density near zero and the exact form of the chosen kernel. Therefore, a refined version of the gamma kernel with an additional tuning parameter, adjusted according to the shape of the density close to the boundary, is suggested. A data-driven method for the appropriate choice of the modified gamma kernel estimator is also provided. An extensive simulation study compares the performance of this refined estimator to those of standard gamma kernel estimates and standard boundary-corrected and adjusted fixed kernels. It is found that the finite sample performance of the proposed new estimator is superior in all settings. Two empirical applications based on high-frequency stock trading volumes and realized volatility forecasts demonstrate the usefulness of the proposed methodology in practice.
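The standard gamma kernel estimator used as the baseline here replaces the symmetric kernel with a gamma density whose shape depends on the evaluation point, so that no kernel mass falls below zero. A hedged sketch of that baseline, without the additional tuning parameter proposed in the abstract, could look like this:

```python
import numpy as np
from scipy.stats import gamma

def gamma_kernel_density(x_eval, data, b=0.1):
    """Standard gamma kernel estimator for densities supported on [0, inf).

    At each evaluation point x the kernel is a gamma density with shape
    x / b + 1 and scale b, evaluated at the data; this keeps all kernel
    mass on the positive half-line and avoids boundary bias at zero.
    The refined, shape-adaptive version described in the abstract adds a
    further tuning parameter and is not reproduced here.
    """
    x_eval = np.asarray(x_eval, dtype=float)
    out = np.empty_like(x_eval)
    for j, x in enumerate(x_eval):
        out[j] = gamma.pdf(data, a=x / b + 1.0, scale=b).mean()
    return out

rng = np.random.default_rng(2)
sample = rng.exponential(1.0, 1000)          # lots of probability mass near zero
xs = np.linspace(0.0, 3.0, 7)
print(gamma_kernel_density(xs, sample))      # should track exp(-x) roughly
```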

5.
A new semiparametric dynamic copula model is proposed where the marginals are specified as parametric GARCH-type processes, and the dependence parameter of the copula is allowed to change over time in a nonparametric way. A straightforward two-stage estimation method is given by local maximum likelihood for the dependence parameter, conditional on consistent first-stage estimates of the marginals. The properties of the estimator are characterized in terms of bias and variance, and the bandwidth selection problem is discussed. The proposed estimator attains the semiparametric efficiency bound, and its superiority is demonstrated through simulations. Finally, the wide applicability of the model in financial time series is illustrated, and it is compared with traditional models based on conditional correlations.

6.
A novel non-parametric density estimator is developed based on geometric principles. A penalised centroidal Voronoi tessellation forms the basis of the estimator, which allows the data to self-organise in order to minimise estimate bias and variance. This approach is a marked departure from the usual methods based on local averaging, and has the advantage of being naturally adaptive to local sample density (scale invariance). The estimator does not require the introduction of a plug-in kernel, thus avoiding assumptions about kernel symmetry and shape. A numerical experiment is conducted to illustrate the behaviour of the estimator, and its characteristics are discussed.

7.
A method for designing near-optimal nonlinear classifiers, based on a self-organizing technique for estimating probability density functions when only weak assumptions are made about the densities, is described. The method avoids the disadvantages of other existing methods by parametrizing a set of component densities from which the actual densities are constructed. The parameters of the component densities are optimized by a self-organizing algorithm, reducing the labeling of design samples to a minimum. All the required computations are realized with the simple sum-of-product units commonly used in connectionist models. The density approximations produced by the method are illustrated in two dimensions for a multispectral image classification task. The practical use of the method is illustrated by a small speech recognition problem. Related issues of invariant projections, cross-class pooling of data, and subspace partitioning are discussed.

8.
Kernel density estimation is a popular and widely used non-parametric method for data-driven density estimation. Its appeal lies in its simplicity and ease of implementation, as well as its strong asymptotic results regarding convergence to the true data distribution. However, a major difficulty is the setting of the bandwidth, particularly in high dimensions and with a limited amount of data. An approximate Bayesian method is proposed, based on the Expectation-Propagation algorithm with a likelihood obtained from a leave-one-out cross-validation approach. The proposed method yields an iterative procedure for approximating the posterior distribution of the inverse bandwidth. The approximate posterior can be used to estimate the model evidence for selecting the structure of the bandwidth and to approach online learning. Extensive experimental validation shows that the proposed method is competitive in terms of performance with state-of-the-art plug-in methods.
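The Expectation-Propagation machinery is specific to the paper and not reproduced here, but its main ingredient, a leave-one-out cross-validation likelihood over the inverse bandwidth, can be illustrated with a crude grid approximation to the posterior. The Gamma prior and the grid are illustrative assumptions, not the proposed EP procedure:

```python
import numpy as np

def loo_log_likelihood(data, beta):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with inverse bandwidth beta."""
    h = 1.0 / beta
    n = len(data)
    ll = 0.0
    for i in range(n):
        u = (data[i] - np.delete(data, i)) / h
        dens = np.exp(-0.5 * u ** 2).sum() / ((n - 1) * h * np.sqrt(2 * np.pi))
        ll += np.log(dens + 1e-300)
    return ll

def grid_posterior(data, betas, prior_shape=2.0, prior_rate=1.0):
    """Unnormalised posterior over the inverse bandwidth, evaluated on a grid.

    The Gamma(prior_shape, prior_rate) prior is an illustrative choice, not
    the prior used in the paper, and brute-force grid evaluation stands in
    for the Expectation-Propagation approximation.
    """
    log_prior = (prior_shape - 1) * np.log(betas) - prior_rate * betas
    log_post = log_prior + np.array([loo_log_likelihood(data, b) for b in betas])
    log_post -= log_post.max()                        # stabilise before exponentiating
    post = np.exp(log_post)
    return post / (post.sum() * (betas[1] - betas[0]))  # normalise on the grid

rng = np.random.default_rng(3)
sample = rng.normal(0, 1, 150)
betas = np.linspace(0.5, 10.0, 40)
posterior = grid_posterior(sample, betas)
print(betas[np.argmax(posterior)])                    # posterior mode of the inverse bandwidth
```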

9.
The polynomial classifier (PC) that takes the binomial terms of reduced subspace features as inputs has shown superior performance to multilayer neural networks in pattern classification. In this paper, we propose a class-specific feature polynomial classifier (CFPC) that extracts class-specific features from class-specific subspaces, unlike the ordinary PC, which uses a class-independent subspace. The CFPC can be viewed as a hybrid of the ordinary PC and the projection distance method. The class-specific features better separate one class from the others, and the incorporation of the class-specific projection distance further improves the separability. The connecting weights of the CFPC are efficiently learned class-by-class to minimize the mean square error on training samples. To justify the promise of the CFPC, we have conducted experiments on handwritten digit recognition and numeral string recognition on the NIST Special Database 19 (SD19). The digit recognition task was also benchmarked on two standard databases, USPS and MNIST. The results show that the performance of the CFPC is superior to that of the ordinary PC, and is competitive with support vector classifiers (SVCs).

10.
A bootstrap-based methodology is developed for parameter estimation and polyspectral density estimation in the case where the approximating model of the underlying stochastic process is of non-minimum-phase autoregressive moving-average (ARMA) type, given a finite realisation of a single time series. The method is based on a minimum-phase/maximum-phase decomposition of the system function, together with a time reversal step, for the parameter and polyspectral confidence interval estimation. Simulation examples are provided to illustrate the proposed method.

11.
In this paper we propose a Gaussian-kernel-based online kernel density estimator which can be used for online probability density estimation and online learning. Our approach generates a Gaussian mixture model of the observed data and allows online adaptation from positive as well as negative examples. The adaptation from negative examples is realized by a novel concept of unlearning in mixture models. Low complexity of the mixtures is maintained through a novel compression algorithm. In contrast to existing approaches, our approach does not require fine-tuning of parameters for a specific application, does not assume specific forms of the target distributions, and places no temporal constraints on the observed data. The strength of the proposed approach is demonstrated with examples of online estimation of complex distributions, an example of unlearning, and with interactive learning of basic visual concepts.
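The compression and unlearning steps are specific to the paper, but the basic idea of maintaining an online Gaussian mixture whose size is capped by merging nearby components can be sketched as follows. The moment-matching merge of the closest pair is a simplification and the unlearning step is omitted:

```python
import numpy as np

class OnlineGaussianKDE:
    """Simplified 1-D online Gaussian mixture density estimate.

    Each observation adds one Gaussian component with a fixed kernel
    variance; when the number of components exceeds max_components the
    two closest means are merged by moment matching. This is only a
    rough stand-in for the compression algorithm in the abstract.
    """

    def __init__(self, kernel_var=0.05, max_components=20):
        self.kernel_var = kernel_var
        self.max_components = max_components
        self.w, self.mu, self.var = [], [], []      # weights, means, variances

    def update(self, x):
        self.w.append(1.0)
        self.mu.append(float(x))
        self.var.append(self.kernel_var)
        if len(self.w) > self.max_components:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        mu = np.array(self.mu)
        # find the pair of components whose means are closest together
        i, j = min(((a, b) for a in range(len(mu)) for b in range(a + 1, len(mu))),
                   key=lambda p: abs(mu[p[0]] - mu[p[1]]))
        w = self.w[i] + self.w[j]
        m = (self.w[i] * self.mu[i] + self.w[j] * self.mu[j]) / w
        v = (self.w[i] * (self.var[i] + (self.mu[i] - m) ** 2) +
             self.w[j] * (self.var[j] + (self.mu[j] - m) ** 2)) / w
        for k in sorted((i, j), reverse=True):
            del self.w[k], self.mu[k], self.var[k]
        self.w.append(w); self.mu.append(m); self.var.append(v)

    def pdf(self, x):
        w = np.array(self.w); w = w / w.sum()
        mu, var = np.array(self.mu), np.array(self.var)
        return np.sum(w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))

model = OnlineGaussianKDE()
for obs in np.random.default_rng(4).normal(0.0, 1.0, 500):
    model.update(obs)
print(model.pdf(0.0))   # should be roughly 0.4 for a standard normal
```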

12.
When analysing the movements of an animal, a common task is to generate a continuous probability density surface that characterises the spatial distribution of its locations, termed a home range. Traditional kernel density estimation (KDE), the Brownian bridge kernel method, and time-geographic density estimation are all commonly used for this purpose, although their applicability in some practical situations is limited. Other studies have argued that KDE is inappropriate for analysing moving objects, while the latter two methods are only suitable for tracking data collected at intervals frequent enough that an object’s movement pattern can be adequately represented by a space–time path created by connecting consecutive points. This research formulates and evaluates KDE using generalised movement trajectories approximated by Delaunay triangulation (KDE-DT) as a method for analysing infrequently sampled animal tracking data. In this approach, a DT is constructed from a point pattern of tracking data in order to approximate the network of movement trajectories for an animal. This network represents the generalised movement patterns of an animal rather than its specific, individual trajectories between locations. Kernel density estimates are then calculated with distances measured using that network. First, this paper describes the method and applies it to generate a probability density surface for a Florida panther from radio-tracking data collected three times per week. Second, the performance of the technique is evaluated in the context of delineating wildlife home ranges and core areas from simulated animal locational data. The results of the simulations suggest that KDE-DT produces more accurate home range estimates than traditional KDE, which was evaluated with the same data in a previous study. In addition to animal home range analysis, the technique may be useful for characterising a variety of spatial point patterns generated by objects that move through continuous space, such as pedestrians or ships.

13.
The Field Estimator for Arbitrary Spaces (FiEstAS) computes the continuous probability density field underlying a given discrete data sample in multiple, non-commensurate dimensions. The algorithm works by constructing a metric-independent tessellation of the data space based on recursive binary splitting. Individual, data-driven bandwidths are assigned to each point, scaled so that a constant “mass” M0 is enclosed. Kernel density estimation may then be performed for different kernel shapes, and a combination of balloon and sample-point estimators is proposed as a compromise between resolution and variance. A bias correction is evaluated for the particular (yet common) case where the density is computed exactly at the locations of the data points rather than at an uncorrelated set of locations. By default, the algorithm combines a top-hat kernel with M0=2.0 with the balloon estimator and applies the corresponding bias correction. These settings are shown to yield reasonable results for a simple test case, a two-dimensional ring, which illustrates the performance for oblique distributions, as well as for a six-dimensional Hernquist sphere, a fairly realistic model of the dynamical structure of stellar bulges in galaxies and dark matter haloes in cosmological N-body simulations. Results for different parameter settings are discussed in order to provide a guideline for selecting an optimal configuration in other cases. Source code is available upon request.

14.
To monitor crowd density at important venues in real time, take effective measures to disperse high-density crowds, and avoid the loss of life and property caused by accidents due to excessive crowding, a crowd density estimation method based on the completed local binary pattern is proposed. The method extracts three local texture features from crowd images, builds a 3-D joint histogram statistical model, and classifies crowd density levels with a chi-square-distance nearest-neighbour rule, enabling crowd density monitoring in a given scene. Comparative experiments show that the method can classify crowd images of any density level with high accuracy and good real-time performance, while remaining robust to the scene background.
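The chi-square-distance nearest-neighbour step is the most generic part of this pipeline and is easy to illustrate. The sketch below classifies precomputed texture histograms standing in for the completed local binary pattern features, which are not computed here; the three density levels and the 8-bin histograms are made-up placeholders:

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two normalised histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify_density_level(query_hist, train_hists, train_labels):
    """Assign the label of the training histogram with the smallest
    chi-square distance to the query (1-nearest-neighbour rule)."""
    dists = [chi_square_distance(query_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]

# Toy usage with made-up 8-bin histograms standing in for texture features.
rng = np.random.default_rng(5)
train_hists = [rng.dirichlet(np.ones(8)) for _ in range(30)]
train_labels = [i % 3 for i in range(30)]            # three hypothetical density levels
query = rng.dirichlet(np.ones(8))
print(classify_density_level(query, train_hists, train_labels))
```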

15.
For ARMAX systems with an unknown noise distribution, an adaptive nonparametric noise density estimation method is proposed in which the global and local bandwidths of the Gaussian kernel are adjusted dynamically according to the estimation errors, giving an adaptive estimate of the unknown noise density. By minimising the likelihood function, an iterative parameter identification algorithm based on the estimated noise density is derived; the convergence of the algorithm is analysed and a sufficient condition for convergence is given. Simulation results show that the proposed algorithm has strong noise immunity and good convergence when the system noise distribution is unknown.

16.
There has been an important emergence of applications in which data arrives in an online, time-varying fashion (e.g. computer network traffic, sensor data, web searches, ATM transactions), and it is not feasible to exchange or store all the arriving data in traditional database systems in order to operate on it. For this kind of application, as for traditional static database schemes, density estimation is a fundamental building block for data analysis. A novel online approach for probability density estimation based on wavelet bases, suitable for applications involving rapidly changing streaming data, is presented. The proposed approach is based on a recursive formulation of the wavelet-based orthogonal estimator using a sliding window, and includes an optimised procedure for re-evaluating only the relevant scaling and wavelet functions each time new data items arrive. The algorithm is tested and compared using both simulated and real-world data.
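The recursive sliding-window idea can be illustrated with the simplest possible orthogonal basis, Haar scaling functions on [0, 1], with coefficients updated incrementally as items enter and leave the window. This is a much cruder basis than the wavelet estimator proposed in the paper, and the resolution level and window size below are illustrative:

```python
import numpy as np
from collections import deque

class SlidingWindowHaarDensity:
    """Orthogonal-series density estimate on [0, 1] with Haar scaling functions.

    The estimate is f(x) = sum_k c_k * phi_{j,k}(x), where phi_{j,k} equals
    2^{j/2} on [k 2^{-j}, (k+1) 2^{-j}) and zero elsewhere. The empirical
    coefficients c_k are updated recursively as points enter and leave a
    sliding window.
    """

    def __init__(self, level=4, window=200):
        self.scale = 2.0 ** (level / 2.0)           # value of phi on its support
        self.bins = 2 ** level
        self.window = deque()
        self.window_size = window
        self.coef_sums = np.zeros(self.bins)        # running sums of phi_{j,k}(x_i)

    def _bin(self, x):
        return min(int(x * self.bins), self.bins - 1)

    def update(self, x):
        self.window.append(x)
        self.coef_sums[self._bin(x)] += self.scale
        if len(self.window) > self.window_size:     # drop the oldest observation
            old = self.window.popleft()
            self.coef_sums[self._bin(old)] -= self.scale

    def pdf(self, x):
        coeffs = self.coef_sums / len(self.window)  # c_k = (1/N) sum_i phi_{j,k}(x_i)
        return coeffs[self._bin(x)] * self.scale    # only one basis function is non-zero at x

est = SlidingWindowHaarDensity()
for u in np.random.default_rng(6).beta(2.0, 5.0, 2000):
    est.update(u)
print(est.pdf(0.2))   # near the Beta(2, 5) density at 0.2 (about 2.5)
```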

17.
A statistical classification scheme for a given set of data requires knowledge of the probability distribution of the observations. Traditional approaches to this problem have revolved around choosing various parametric forms for the probability distribution and evaluating them by goodness-of-fit methods. Among the difficulties with this approach are that it is time-consuming, it may not lead to satisfactory results, and it may lie beyond the statistical expertise of many practitioners. In this paper, the authors consider the use of a recently developed nonparametric probability density estimator in classification schemes with a mean squared error loss criterion. Classical parametric approaches are compared to the nonparametric method on simulated data on the basis of the misclassification probability. Real data from the medical and biological sciences are also used to illustrate the usefulness of the nonparametric method.
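A density-based classifier of the kind discussed here combines class-conditional density estimates with class priors through Bayes' rule. A minimal sketch with Gaussian kernels and a rule-of-thumb bandwidth follows; it is not the specific estimator used by the authors:

```python
import numpy as np

def kde_pdf(x, data, h):
    """1-D Gaussian kernel density estimate evaluated at a scalar x."""
    u = (x - data) / h
    return np.exp(-0.5 * u ** 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def kde_classify(x, samples_by_class):
    """Assign x to the class with the largest prior * class-conditional density."""
    n_total = sum(len(s) for s in samples_by_class.values())
    best_label, best_score = None, -np.inf
    for label, data in samples_by_class.items():
        h = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)   # rule-of-thumb bandwidth
        score = (len(data) / n_total) * kde_pdf(x, data, h)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rng = np.random.default_rng(7)
classes = {"A": rng.normal(-1.0, 0.5, 100), "B": rng.normal(1.5, 0.7, 120)}
print(kde_classify(0.0, classes), kde_classify(1.4, classes))   # expect A then B
```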

18.
The problems arising when there are outliers in a data set that follows a circular distribution are considered. A robust estimate of the unknown parameters is obtained using the methods of weighted likelihood and minimum disparity, each of which is defined for a general parametric family of circular data. The class of power divergences and the related residual adjustment function are investigated in order to improve the performance of the two methods, which are studied for the von Mises (circular normal) and the wrapped normal distributions. The techniques are illustrated via two examples based on a real data set and a Monte Carlo study, which also enables the discussion of various computational aspects.

19.
In both nonparametric density estimation and regression, the so-called boundary effects, i.e. the increase in bias and variance due to one-sided data information, can be quite serious. For estimation performed on transformed variables this problem can easily be amplified and may substantially distort the final estimates, and consequently the conclusions. After a brief review of some existing methods, a new, straightforward and very simple boundary correction is proposed, applying local bandwidth variation at the boundaries. The statistical behaviour is discussed and the performance for density and regression estimation is studied for small and moderate sample sizes. In a simulation study this method is shown to perform very well. Furthermore, it appears to be excellent for estimating the world income distribution and Engel curves in economics.
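The boundary problem itself is easy to demonstrate, and the classical reflection correction, which is not the local-bandwidth scheme proposed in the paper, makes a convenient point of comparison:

```python
import numpy as np

def kde(x_eval, data, h):
    """Plain 1-D Gaussian KDE; biased downwards near a boundary at zero."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def kde_reflected(x_eval, data, h):
    """Reflection boundary correction for densities supported on [0, inf):
    the sample is mirrored about zero so no kernel mass leaks past the boundary."""
    augmented = np.concatenate([data, -data])
    u = (x_eval[:, None] - augmented[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(8)
sample = rng.exponential(1.0, 2000)       # true density at 0 is 1
h = 0.2
print(kde(np.array([0.0]), sample, h))            # roughly 0.5: half the kernel mass is lost
print(kde_reflected(np.array([0.0]), sample, h))  # close to 1 after reflection
```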

20.
A new method of kernel density estimation with a varying, adaptive window size is proposed. It is based on the so-called intersection of confidence intervals (ICI) rule. Several examples of the proposed method are given for different types of densities, and the quality of the adaptive density estimate is assessed by means of numerical simulations.
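The ICI rule itself can be sketched in a few lines: for an increasing sequence of bandwidths one tracks the running intersection of pointwise confidence intervals and keeps the largest bandwidth for which that intersection stays non-empty. The Gaussian kernel, the variance proxy, and the threshold gamma below are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def kde_point(x, data, h):
    """1-D Gaussian KDE value and a rough standard-deviation proxy at a single point x."""
    u = (x - data) / h
    k = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
    est = k.mean()
    # crude variance proxy: var(f_hat) ~ f(x) * ||K||^2 / (n h), with ||K||^2 = 1/(2 sqrt(pi))
    std = np.sqrt(max(est, 1e-12) / (2 * np.sqrt(np.pi) * len(data) * h))
    return est, std

def ici_estimate(x, data, bandwidths, gamma=2.0):
    """Intersection-of-confidence-intervals (ICI) bandwidth choice at point x.

    Bandwidths are taken in increasing order; the largest one whose confidence
    interval still intersects all previous intervals is selected. gamma controls
    the interval width and is an illustrative threshold value.
    """
    lower, upper = -np.inf, np.inf
    chosen_est = None
    for h in sorted(bandwidths):
        est, std = kde_point(x, data, h)
        lower = max(lower, est - gamma * std)
        upper = min(upper, est + gamma * std)
        if lower > upper:            # intersection became empty: stop, keep the previous bandwidth
            break
        chosen_est = est
    return chosen_est

rng = np.random.default_rng(9)
sample = rng.normal(0.0, 1.0, 1000)
bandwidths = 0.05 * 1.5 ** np.arange(8)
print(ici_estimate(0.0, sample, bandwidths))   # close to the N(0,1) density 0.399 at 0
```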
