20 similar documents were retrieved; search took 46 ms.
1.
Neural Computing and Applications - Tackling air pollution has become of utmost importance over the last few decades. Different statistical as well as deep learning methods have been proposed till...
2.
Machine Learning - Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules...
3.
This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In the classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to construct the desired classification function, a learning machine computes statistical invariants that are specific to the problem, and then minimizes the expected error in a way that preserves these invariants; it is thus both data- and invariant-driven learning. From a mathematical point of view, methods of the classical paradigm employ mechanisms of strong convergence of approximations to the desired function, whereas methods of the new paradigm employ both strong and weak convergence mechanisms. This can significantly increase the rate of convergence.
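As a hedged illustration of what an invariant looks like (the notation below follows the standard LUSI formulation and is an assumption, since the abstract itself gives no formulas): for chosen predicate functions \(\psi_k\), the learning machine restricts its estimate \(f\) so that empirical averages computed with \(f\) match those computed with the observed labels \(y_i\),

\[ \frac{1}{\ell}\sum_{i=1}^{\ell}\psi_k(x_i)\, f(x_i) \;\approx\; \frac{1}{\ell}\sum_{i=1}^{\ell}\psi_k(x_i)\, y_i, \qquad k = 1,\dots,m, \]

and the expected error is then minimized subject to these \(m\) constraints.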
4.
Inductive learning is a method for automated knowledge acquisition. It converts a set of training data into a knowledge structure. In the process of knowledge induction, statistical techniques can play a major role in improving performance. In this paper, we investigate the competition and integration between traditional statistical and inductive learning methods. First, the competition between these two approaches is examined. Then, a general framework for integrating them is presented. This framework suggests three possible integrations: (1) statistical methods as preprocessors for inductive learning, (2) inductive learning methods as preprocessors for statistical classification, and (3) the combination of the two methods to develop new algorithms. Finally, empirical evidence concerning these three possible integrations is discussed. The general conclusion is that algorithms integrating statistical and inductive learning concepts are likely to make the most improvement in performance.
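As a hedged sketch of integration (1), statistical methods as preprocessors for inductive learning, the following fragment screens attributes with a univariate F-test before inducing a decision tree; the dataset, the number of retained attributes, and the tree settings are illustrative assumptions rather than anything from the paper.

```python
# Hypothetical illustration: univariate statistical screening before tree induction.
# The dataset, the number of retained attributes, and the tree depth are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Statistical preprocessor: keep the 10 attributes with the strongest F-statistic.
model = make_pipeline(SelectKBest(f_classif, k=10),
                      DecisionTreeClassifier(max_depth=4, random_state=0))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```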
5.
The demand for development of good quality software has seen rapid growth in the last few years. This is leading to an increase in the use of machine learning methods for analyzing and assessing public domain data sets. These methods can be used to develop models for estimating software quality attributes such as fault proneness, maintenance effort, and testing effort. Software fault prediction in the early phases of software development can help and guide software practitioners to focus the available testing resources on the weaker areas during development. This paper analyses and compares a statistical method and six machine learning methods for fault prediction. These methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling, and Gene Expression Programming) are empirically validated to find the relationship between static code metrics and the fault proneness of a module. In order to assess and compare the models built using regression and the machine learning methods, we used two publicly available data sets, AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (measured from Receiver Operating Characteristic (ROC) analysis). The study confirms the predictive capability of the machine learning methods for software fault prediction. The results show that the Area Under the Curve of the model built using the Decision Tree method is 0.8 and 0.9 (for the AR1 and AR6 data sets, respectively), making it a better model than those built using logistic regression and the other machine learning methods.
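The kind of ROC/AUC comparison described above can be sketched as follows; the synthetic class-imbalanced data stands in for the AR1/AR6 static code metrics, and all model settings are assumptions rather than the study's configuration.

```python
# Hypothetical AUC comparison of a decision tree against logistic regression.
# Synthetic data stands in for the AR1/AR6 static code metrics.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2],
                           random_state=0)  # imbalanced, as fault data tends to be
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
          "logistic regression": LogisticRegression(max_iter=1000)}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]          # fault-proneness scores
    print(name, "AUC =", round(roc_auc_score(y_te, scores), 3))
```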
6.
Lexical databases following the wordnet paradigm capture information about words, word senses, and their relationships. A large number of existing tools and datasets are based on the original WordNet, so extending the landscape of resources aligned with WordNet leads to great potential for interoperability and to substantial synergies. Wordnets are being compiled for a considerable number of languages; however, most have yet to reach a comparable level of coverage. We propose a method for automatically producing such resources for new languages based on WordNet, and analyse the implications of this approach both from a linguistic perspective and by considering natural language processing tasks. Our approach takes advantage of the original WordNet in conjunction with translation dictionaries. A small set of training associations is used to learn a statistical model for predicting associations between terms and senses. The associations are represented using a variety of scores that take into account structural properties as well as semantic relatedness and corpus frequency information. Although the resulting wordnets are imperfect in terms of their quality and coverage of language-specific phenomena, we show that they constitute a cheap and suitable alternative for many applications, both for monolingual tasks and for cross-lingual interoperability. Apart from analysing the resources directly, we conducted tests on semantic relatedness assessment and cross-lingual text classification with very promising results.
8.
Motion estimation is of great importance to video coding, and motion estimation methods based on parametric models have been attracting increasing attention; the choice of the parametric model is the key to such methods. On this basis, a model selection method grounded in statistical principles is proposed. Starting from a small amount of image data, it estimates the model parameters, analyses the prediction risk and error of each candidate approximate model, and selects the optimal model, namely the one that best matches the actual evolution of the object being predicted; this model is then used to perform motion estimation on unknown objects. Experimental results show that this method is reliable and practical for motion estimation on real image sequences.
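A minimal sketch of selecting a parametric model by prediction risk on a short data stream; the polynomial motion models and the synthetic displacements are assumptions used only to illustrate the selection step, not the models used in the paper.

```python
# Hypothetical illustration of selecting a parametric (polynomial) motion model
# by held-out prediction error.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(20, dtype=float)                     # frame indices
true_motion = 0.5 * t + 0.02 * t**2                # assumed true displacement
observed = true_motion + rng.normal(0, 0.3, t.size)

t_fit, d_fit = t[:15], observed[:15]               # small data stream for fitting
t_val, d_val = t[15:], observed[15:]               # held-out frames estimate risk

best_order, best_risk = None, np.inf
for order in (1, 2, 3, 4):
    coeffs = np.polyfit(t_fit, d_fit, order)       # parameter estimation
    risk = np.mean((np.polyval(coeffs, t_val) - d_val) ** 2)
    print(f"order {order}: prediction risk {risk:.3f}")
    if risk < best_risk:
        best_order, best_risk = order, risk
print("selected model order:", best_order)
```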
9.
Multipath is one of the main causes of degraded position accuracy in the Global Navigation Satellite System (GNSS) because portions of the signals can be reflected by high buildings in dense urban areas. Multipath mitigation techniques based on hardware enhancement or signal processing help to improve GNSS accuracy for high-precision surveying. A Geographic Information System (GIS) is also used in the signal propagation model to predict multipath effects. In addition to these existing approaches, we found that spatial statistical methods are useful in multipath mitigation because multipath produces a unique spatial distribution of user positions. In this paper, we present a spatial statistics-based simulation system for mitigating multipath and improving the accuracy of GNSS positioning. Multipath tends to be associated with spatial outliers in simulated user positions (SUPs) and contributes little to the spatial clustering of SUPs. Using these spatial characteristics, we developed a method called the “satellite participation ratio in outliers versus cluster” (SPROC) to identify multipath satellites. Once the identified multipath satellites are excluded, a user position is determined using the mean spatial center of the SUPs from the remaining satellites. The effects of such multipath mitigation were validated by examining whether the SPROC method correctly identified multipath satellites and by comparing the position errors with and without SPROC. We demonstrated the applicability of our system with a simulation experiment using a precise ephemeris for the Global Positioning System (GPS) and the orbital parameters for the proposed constellations of GALILEO and the Quasi-Zenith Satellite System (QZSS). In our simulation of Shinjuku ward in Tokyo, user-equivalent range error (UERE) sources other than multipath were generated synthetically and added to the multipath delay. The pattern of position errors with and without SPROC showed that the improvements in accuracy were considerable at locations close to buildings.
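A loose, hypothetical sketch of the outlier-versus-cluster bookkeeping behind a SPROC-style ratio; the data layout, outlier rule, and decision threshold are assumptions for illustration and do not reproduce the paper's implementation.

```python
# Hypothetical sketch: flag simulated user positions (SUPs) as spatial outliers,
# then score each satellite by how often it participates in outlier solutions
# versus clustered ones. All thresholds and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_sups, sats = 200, ["G01", "G07", "G12", "E03", "J01"]

# Each SUP is a 2D position plus the subset of satellites used to compute it.
positions = rng.normal(0, 1.0, (n_sups, 2))
used = rng.random((n_sups, len(sats))) < 0.6       # which satellites were used
used[:, 1] = rng.random(n_sups) < 0.3              # G07 used less often here

# Pretend G07 is a multipath satellite: solutions using it are displaced.
positions[used[:, 1]] += np.array([8.0, 5.0])

center = np.median(positions, axis=0)
dist = np.linalg.norm(positions - center, axis=1)
outlier = dist > np.percentile(dist, 80)           # crude outlier rule

for j, sat in enumerate(sats):
    in_out = used[outlier, j].mean()               # participation among outliers
    in_clu = used[~outlier, j].mean()              # participation among the cluster
    ratio = in_out / max(in_clu, 1e-9)
    flag = "  <- suspect multipath" if ratio > 1.5 else ""
    print(f"{sat}: ratio {ratio:.2f}{flag}")
```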
10.
The paper deals with the problem of reconstructing a continuous 1D function from discrete noisy samples. The measurements may also be indirect in the sense that the samples may be the output of a linear operator applied to the function. Bayesian estimation provides a unified treatment of this class of problems. We show that a rigorous Bayesian solution can be efficiently implemented by resorting to a Markov chain Monte Carlo (MCMC) simulation scheme. In particular, we discuss how the structure of the problem can be exploited in order to improve the computational and convergence performance. The effectiveness of the proposed scheme is demonstrated on two classical benchmark problems as well as on the analysis of IVGTT (intravenous glucose tolerance test) data, a complex identification-deconvolution problem concerning the estimation of the insulin secretion rate following the administration of an intravenous glucose injection.
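A minimal sketch of the Bayesian reconstruction idea using a random-walk Metropolis sampler; the roughness prior, step size, and test function are assumptions, and the scheme is far simpler than the MCMC machinery discussed in the paper.

```python
# Minimal random-walk Metropolis sketch for reconstructing a smooth 1D function
# from noisy samples under a Gaussian roughness prior. Everything is illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0, 0.2, x.size)             # noisy direct samples
sigma, tau = 0.2, 50.0                             # noise std, smoothness weight
D2 = np.diff(np.eye(x.size), n=2, axis=0)          # second-difference operator

def log_post(f):
    like = -0.5 * np.sum((y - f) ** 2) / sigma**2
    prior = -0.5 * tau * np.sum((D2 @ f) ** 2)     # penalize roughness
    return like + prior

f = y.copy()
lp = log_post(f)
samples = []
for it in range(20000):
    prop = f + rng.normal(0, 0.02, f.size)         # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:        # Metropolis accept/reject
        f, lp = prop, lp_prop
    if it > 5000 and it % 50 == 0:
        samples.append(f.copy())

post_mean = np.mean(samples, axis=0)
print("RMSE of posterior mean vs truth:", np.sqrt(np.mean((post_mean - truth) ** 2)))
```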
12.
Discrete event simulations (DES) provide a powerful means for modeling complex systems and analyzing their behavior. DES capture all possible interactions between the entities they manage, which makes them highly expressive but also compute-intensive. These computational requirements often impose limitations on the breadth and/or depth of research that can be conducted with a discrete event simulation. This work describes our approach for leveraging the vast quantity of computing and storage resources available in both private organizations and public clouds to enable real-time exploration of discrete event simulations. Rather than directly targeting simulation execution speeds, we autonomously generate and execute novel scenario variants to explore a representative subset of the simulation parameter space. The corresponding outputs from this process are analyzed and used by our framework to produce models that accurately forecast simulation outcomes in real time, providing interactive feedback and facilitating exploratory research. Our framework distributes the workloads associated with generating and executing scenario variants across a range of commodity hardware, including public and private cloud resources. Once the models have been created, we evaluate their performance and improve prediction accuracy by employing dimensionality reduction techniques and ensemble methods. To make these models highly accessible, we provide a user-friendly interface that allows modelers and epidemiologists to modify simulation parameters and see projected outcomes in real time.
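A hedged sketch of the forecasting side of such a framework: an ensemble surrogate trained on previously executed scenario variants predicts outcomes for new parameter settings in real time. The synthetic data and the choice of PCA plus gradient boosting are assumptions, not the framework's actual components.

```python
# Hypothetical surrogate model that forecasts simulation outcomes from scenario
# parameters; the data here are synthetic stand-ins for executed scenario variants.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
params = rng.uniform(0, 1, (2000, 12))             # sampled simulation parameters
outcome = (params[:, 0] * 3 + np.sin(params[:, 1] * 6)
           + rng.normal(0, 0.1, 2000))             # stand-in simulation output

X_tr, X_te, y_tr, y_te = train_test_split(params, outcome, random_state=0)
surrogate = make_pipeline(PCA(n_components=6),     # dimensionality reduction
                          GradientBoostingRegressor(random_state=0))
surrogate.fit(X_tr, y_tr)
print("held-out R^2:", round(surrogate.score(X_te, y_te), 3))

# At query time the surrogate returns a forecast instantly instead of re-running
# the discrete event simulation for the new parameter setting.
print("forecast for a new scenario:", surrogate.predict(X_te[:1]))
```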
13.
The approach to process monitoring known as multivariate statistical process control (MSPC) has developed as a distinct technology, closely related to the field of fault detection and isolation. A body of technical research and industrial applications indicates a unique applicability to complex large-scale processes, but relatively little attention has been paid to generic live process issues. In this paper, the impact of various classes of generic abnormality in the operation of continuous process plants on MSPC monitoring is investigated. It is shown how the effectiveness of the MSPC approach may be understood in terms of model- and signal-based fault detection methods, and how the multivariate tools may be configured to maximize their effectiveness. A brief review of MSPC for the process industries is also presented, indicating the current state of the art.
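A minimal sketch of PCA-based MSPC monitoring with Hotelling's T^2 and SPE (Q) statistics; the data, the number of retained components, and the empirical control limits are illustrative assumptions.

```python
# Minimal PCA-based multivariate SPC sketch: Hotelling's T^2 and SPE (Q) charts.
# In-control data, component count, and 99th-percentile limits are assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (500, 8)) @ rng.normal(0, 1, (8, 8))   # in-control data

pca = PCA(n_components=3).fit(normal)

def t2_spe(X):
    scores = pca.transform(X)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)     # Hotelling T^2
    resid = X - pca.inverse_transform(scores)
    spe = np.sum(resid**2, axis=1)                               # SPE / Q statistic
    return t2, spe

t2_ref, spe_ref = t2_spe(normal)
t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)

faulty = rng.normal(0, 1, (20, 8)) @ rng.normal(0, 1, (8, 8)) + 4  # shifted batch
t2_new, spe_new = t2_spe(faulty)
print("alarms:", np.sum((t2_new > t2_lim) | (spe_new > spe_lim)), "of", len(faulty))
```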
14.
Leaf Area Index (LAI) is one of the most important variables characterizing land surface vegetation and its dynamics. Data from many satellites, such as the Moderate Resolution Imaging Spectroradiometer (MODIS), have been used to generate LAI products. It is important to characterize their spatial and temporal variations by developing mathematical models from these products. In this study, we aim to model MODIS LAI time series and further predict future values by decomposing the LAI time series of each pixel into several components: trend, intra-annual variations, seasonal cycle, and stochastic stationary or irregular parts. Three such models that can characterize the non-stationary time series data and predict future values are explored: Dynamic Harmonic Regression (DHR), STL (Seasonal-Trend decomposition procedure based on Loess), and Seasonal AutoRegressive Integrated Moving Average (SARIMA). The preliminary results using six years (2001-2006) of the MODIS LAI product indicate that all these methods are effective in modeling LAI time series and predicting 2007 LAI values reasonably well. The SARIMA model gives the best prediction, DHR produces the smoothest curve, and STL is more sensitive to noise in the data. These methods work best for land cover types with pronounced seasonal variations.
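A hedged sketch of the STL and SARIMA pieces using statsmodels on a synthetic LAI-like series; the 46-step annual period (8-day composites) and the model orders are assumptions rather than the settings used in the study.

```python
# Hypothetical STL decomposition and SARIMA forecast on a synthetic LAI-like series.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
period = 46                                        # ~46 8-day composites per year
t = np.arange(period * 6)                          # six years, e.g. 2001-2006
lai = (2.0 + 0.001 * t + 1.5 * np.sin(2 * np.pi * t / period)
       + rng.normal(0, 0.15, t.size))
series = pd.Series(lai, index=pd.RangeIndex(t.size))

stl = STL(series, period=period).fit()             # trend / seasonal / remainder
print("remainder std:", round(float(stl.resid.std()), 3))

sarima = SARIMAX(series, order=(1, 0, 1),
                 seasonal_order=(1, 1, 1, period)).fit(disp=False)
forecast = sarima.forecast(steps=period)           # predict the following year
print("first forecast values:", forecast.values[:3].round(2))
```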
15.
Cloud computing is a very attractive research topic. Many studies have examined the infrastructure as a service and software as a service aspects of cloud computing; however, few studies have focused on platform as a service (PaaS). According to recent reports, demand for enterprise PaaS solutions will increase continuously. However, different sectors require different types of PaaS applications and computing resources. Therefore, an evaluation and ranking framework for PaaS solutions according to application needs is required. To address this need, this study presents the most essential aspects of PaaS solutions and provides a framework for evaluating the performance of PaaS providers. It also proposes a suitable set of benchmarking algorithms that can help determine the most appropriate PaaS provider based on different resource needs and application requirements. Performance evaluations of three well-known cloud computing PaaS providers were conducted using the analytic hierarchy process and the logic scoring of preference methods.
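A minimal sketch of the analytic hierarchy process step: criterion weights are taken from the principal eigenvector of a pairwise comparison matrix and used to rank providers. The comparison values and provider scores are illustrative assumptions.

```python
# Hypothetical AHP sketch: derive criterion weights from a pairwise comparison
# matrix via its principal eigenvector, then rank three fictitious PaaS providers.
import numpy as np

# Pairwise comparisons of three criteria, e.g. compute, storage, response time.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
vals, vecs = np.linalg.eig(A)
w = np.real(vecs[:, np.argmax(np.real(vals))])
weights = w / w.sum()                              # normalized criterion weights

# Per-criterion scores for providers P1, P2, P3 (rows), already normalized.
scores = np.array([[0.5, 0.3, 0.2],
                   [0.3, 0.4, 0.5],
                   [0.2, 0.3, 0.3]])
ranking = scores @ weights                         # weighted overall score
for name, val in zip(["P1", "P2", "P3"], ranking):
    print(name, round(float(val), 3))
```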
16.
Pattern Analysis and Applications - Time–frequency representations of the speech signals provide dynamic information about how the frequency component changes with time. In order to process...
17.
The Journal of Supercomputing - The distributed denial-of-service (DDoS) attack is a security challenge for the software-defined network (SDN). The different limitations of the existing DDoS...
18.
The re-identification problem is to match objects across multiple but possibly disjoint fields of view for the purpose of sequential authentication over space and time. Detection and seeding for initialization do not presume known identity and allow for re-identification of objects and/or faces whose identity might remain unknown. Specific functionalities involved in re-identification include clustering and selection, recognition-by-parts, anomaly and change detection, sampling and tracking, fast indexing and search, sensitivity analysis, and their integration for the purpose of identity management. As re-identification processes data streams and involves change detection and on-line adaptation, three complementary statistical learning frameworks, driven by randomness for the purpose of robust prediction, are advanced here to support the functionalities listed earlier and combinations thereof. The intertwined learning frameworks employed are those of (a) semi-supervised learning (SSL); (b) transduction; and (c) conformal prediction. The overall architecture proposed is data-driven and modular, on one side, and discriminative and progressive, on the other side. The architecture is built around autonomic computing and W5+. Autonomic computing or self-management provides for closed-loop control. W5+ answers questions related to What data to consider for sampling and collection, When to capture the data and from Where, and How to best process the data. The Who (is) query is about identity for biometrics, and the Why question serves explanation purposes. The challenge addressed throughout is that of evidence-based management to progressively collect and add value to data in order to generate knowledge that leads to purposeful and gainful action, including active learning, for the overall purpose of re-identification. A venue for future research includes adversarial learning, where re-identification is possibly “distracted” using deliberately corrupt information.
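A hedged sketch of one of the three frameworks named above, split conformal prediction; the classifier, nonconformity score, and dataset are assumptions used only to show how calibrated prediction sets are produced.

```python
# Hypothetical split conformal prediction: calibrate a nonconformity threshold,
# then emit prediction sets with target coverage 1 - alpha.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)

# Nonconformity: one minus the probability assigned to the true class.
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]
alpha = 0.1
q = np.quantile(cal_scores, np.ceil((len(y_cal) + 1) * (1 - alpha)) / len(y_cal))

# Prediction set = all labels whose nonconformity does not exceed the threshold.
test_probs = clf.predict_proba(X_test)
pred_sets = [np.where(1.0 - p <= q)[0].tolist() for p in test_probs]
covered = np.mean([yt in s for yt, s in zip(y_test, pred_sets)])
print("empirical coverage:", round(float(covered), 3))
```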
19.
Maintenance technologies have progressed from a time-based to a condition-based manner. The fundamental idea of condition-based maintenance (CBM) is built on the real-time diagnosis of impending failures and/or the prognosis of the residual lifetime of equipment by monitoring health conditions using various sensors. The success of CBM, therefore, hinges on the capability to develop accurate diagnosis/prognosis models. Even though there may be an unlimited number of methods to implement models, the models can normally be classified into two categories in terms of their origins: those using physical principles and those using historical observations. We have focused on the latter (sometimes referred to as empirical models based on statistical learning) because of practical benefits such as context-free applicability, configuration flexibility, and customization adaptability. While several pilot-scale systems using empirical models have been applied to work sites in Korea, it should be noted that these do not seem to be generally competitive against conventional physical models. As a result of investigating the bottlenecks of previous attempts, we have recognized the need for a novel strategy for grouping correlated variables such that an empirical model can incorporate not only statistical correlation but also some degree of physical knowledge of the system. Detailed examples of the problems are as follows: (1) omission of important signals from a group caused by a lack of observations, (2) problems with time-delayed signals, and (3) problems in choosing the optimal kernel bandwidth. This paper presents an improved statistical learning framework including the proposed strategy, together with case studies illustrating the performance of the method.
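A hedged sketch of one possible way to group correlated sensor signals before building empirical models, using hierarchical clustering on a correlation distance; the clustering criterion and cut threshold are assumptions, not the grouping strategy proposed in the paper.

```python
# Hypothetical grouping of correlated sensor signals via hierarchical clustering
# on a correlation distance; thresholds and data are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
base = rng.normal(0, 1, (1000, 3))                 # three latent process drivers
signals = np.column_stack([base[:, 0],
                           base[:, 0] + 0.05 * rng.normal(size=1000),
                           base[:, 1],
                           base[:, 1] * 1.2,
                           base[:, 2],
                           rng.normal(size=1000)])

corr = np.corrcoef(signals, rowvar=False)
dist = 1.0 - np.abs(corr)                          # correlation distance
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=0.3, criterion="distance")  # cut tree at assumed threshold
print("signal -> group:", dict(enumerate(groups.tolist(), start=1)))
```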
20.
Over the period 1987-1991, a series of theoretical and experimental results suggested that multilayer perceptrons (MLP) are an effective family of algorithms for the smooth estimation of high-dimensional probability density functions that are useful in continuous speech recognition. The early form of this work focused on hidden Markov models (HMM) that are independent of phonetic context. More recently, the theory has been extended to context-dependent models. The authors review the basic principles of their hybrid HMM/MLP approach and describe a series of improvements that are analogous to the system modifications instituted for the leading conventional HMM systems over the last few years. Some of these methods directly trade off computational complexity for reduced requirements on memory and memory bandwidth. Results are presented on the widely used Resource Management speech database distributed by the US National Institute of Standards and Technology.
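A minimal numeric sketch of the hybrid idea of turning MLP posteriors into scaled state likelihoods by dividing by the class priors; the numbers are made up for illustration.

```python
# Hypothetical scaled-likelihood computation in a hybrid HMM/MLP recognizer.
import numpy as np

priors = np.array([0.5, 0.3, 0.2])                 # state (phone) priors from data
mlp_posteriors = np.array([[0.7, 0.2, 0.1],        # per-frame P(state | acoustics)
                           [0.2, 0.5, 0.3]])

# Bayes' rule up to a constant: p(x | state) is proportional to
# P(state | x) / P(state), so the posterior/prior ratio can replace the
# emission likelihood fed to Viterbi decoding in an HMM.
scaled_likelihoods = mlp_posteriors / priors
log_likelihoods = np.log(scaled_likelihoods)
print(np.round(log_likelihoods, 3))
```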