Similar articles
20 similar articles found (search time: 46 ms)
1.
Sampling is a fundamental method for generating data subsets. As many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis performance. However, sampling a minimum subset while maintaining probability distributions is still an open problem. In this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem that requires the sampled subset to pass the distribution test. Furthermore, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood functions. Experiments on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1% and 96.7% of the samples based on their Bayesian networks (ASIA, ALARM, HEPAR2, and ANDES, respectively). When subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
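The chi-square distribution test at the heart of this formulation can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the paper's algorithm; the function names, toy data, and the hardcoded critical value (5.991 for α = 0.05 with 2 degrees of freedom) are assumptions for the example:

```python
import random

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def subset_preserves_distribution(population, subset, categories, critical):
    """True if the subset passes a chi-square goodness-of-fit test
    against the category proportions of the full population."""
    n = len(subset)
    pop_freq = {c: population.count(c) / len(population) for c in categories}
    observed = [sum(1 for x in subset if x == c) for c in categories]
    expected = [pop_freq[c] * n for c in categories]
    return chi_square_stat(observed, expected) <= critical

random.seed(0)
population = ["a"] * 500 + ["b"] * 300 + ["c"] * 200
subset = random.sample(population, 100)
# Critical value 5.991: alpha = 0.05, 2 degrees of freedom.
passed = subset_preserves_distribution(population, subset, ["a", "b", "c"], 5.991)
```

A sampling algorithm in the spirit of the abstract would score candidate subsets with this statistic and keep only those that pass.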

2.
To address the problem that insufficient diversity in existing ensemble learning limits the ensemble's effectiveness, an ensemble learning method based on probability calibration is proposed, together with two methods for reducing the impact of multicollinearity. First, the probabilities output by the original classifiers are calibrated with several different calibration methods; the calibrated probabilities produced in this step are then used for a second stage of learning that predicts the final result. The different calibration methods used in the first step supply stronger diversity for the ensemble learning of the second step. To address the multicollinearity between calibrated and original probabilities, two methods are proposed: choose-best and bootstrap. The choose-best method selects, for each base classifier, the best among the original classifier and its calibrated variants for the ensemble; the bootstrap method samples with replacement from the whole set of base classifiers and ensembles the sampled classifiers. Experiments show that simple probability-calibration ensembles improve performance only marginally, while the choose-best and bootstrap methods yield substantial improvements. These results indicate that probability calibration provides stronger diversity for ensemble learning, and that the accompanying multicollinearity problem can be effectively resolved by sampling and similar methods.
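The two-stage idea (calibrate, then ensemble) can be illustrated with one simple calibrator. This sketch uses histogram binning, one of several standard calibration methods, and averages the raw and calibrated probabilities; the paper's actual calibrators and second-stage learner are not reproduced, and the toy scores and labels are invented for the example:

```python
def histogram_binning(probs, labels, n_bins=5):
    """Fit a histogram-binning calibrator: map each probability bin to
    the empirical positive rate observed in that bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(y)
    # Fall back to the bin midpoint when a bin holds no calibration data.
    return [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]

def calibrate(p, table):
    n_bins = len(table)
    return table[min(int(p * n_bins), n_bins - 1)]

# Toy calibration set: raw scores from one base classifier plus true labels.
raw = [0.1, 0.2, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
table = histogram_binning(raw, labels)

# Ensemble step: average the raw and calibrated probability for a new score.
p_new = 0.7
ensemble_p = (p_new + calibrate(p_new, table)) / 2
```

Because the raw score and its calibrated version are strongly correlated, averaging them directly is exactly where the multicollinearity issue the abstract describes arises.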

3.
Active learning reduces the sample complexity of a learning algorithm by actively selecting which examples to label. In contrast to the version-space-halving strategy commonly adopted by current active learning algorithms, this paper proposes a strategy that shrinks the version space by more than half, which avoids the strong assumptions that the halving strategy requires. Based on this strategy, a heuristic active learning algorithm (CBMPMS) is implemented that selects as training examples those most likely to be misclassified. The algorithm scores each candidate example by the entropy of the difference between the class probabilities predicted by the current learner and by a committee of hypotheses randomly drawn from the version space. Experiments on UCI datasets show that the algorithm outperforms related work on most datasets.
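The committee-disagreement selection step can be sketched compactly. This is an illustrative simplification, not CBMPMS itself: it scores each unlabeled example by the entropy of the committee-averaged class distribution and picks the most uncertain one; the toy committee probabilities are invented:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(committee_probs):
    """Return the index of the unlabeled example whose committee-averaged
    class distribution has the highest entropy (most disagreement)."""
    scores = []
    for per_example in committee_probs:
        n_classes = len(per_example[0])
        avg = [sum(m[c] for m in per_example) / len(per_example)
               for c in range(n_classes)]
        scores.append(entropy(avg))
    return max(range(len(scores)), key=scores.__getitem__)

# Three unlabeled examples, a committee of two members, two classes each.
committee_probs = [
    [[0.9, 0.1], [0.8, 0.2]],    # committee agrees: low entropy
    [[0.6, 0.4], [0.4, 0.6]],    # committee split: high entropy
    [[0.95, 0.05], [0.9, 0.1]],
]
pick = select_most_uncertain(committee_probs)  # index 1, the split example
```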

4.
Importance sampling is a technique that is commonly used to speed up Monte Carlo simulation of rare events. However, little is known regarding the design of efficient importance sampling algorithms in the context of queueing networks. The standard approach, which simulates the system using an a priori fixed change of measure suggested by large deviation analysis, has been shown to fail in even the simplest network settings. Estimating probabilities associated with rare events has been a topic of great importance in queueing theory, and in applied probability at large. In this article, we analyse the performance of an importance sampling estimator for a rare event probability in a Jackson network. The article considers strict deadlines in a two-node Jackson network with feedback whose arrival and service rates are modulated by an exogenous finite-state Markov process. We estimate the probability of network blocking for various sets of parameters, as well as the probability of customers missing their deadlines under different loads and deadlines. Finally, we show that the probability of total population overflow can be affected by the deadline values, service rates and arrival rates.
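The basic mechanics of importance sampling with a fixed change of measure can be shown in a much simpler setting than a Jackson network. This sketch (an assumption-laden illustration, not the paper's estimator) estimates the rare tail probability P(X > a) for a standard normal X by sampling under a shifted law N(a, 1) and reweighting with the likelihood ratio:

```python
import math
import random

def is_tail_estimate(a, n, seed=0):
    """Importance-sampling estimate of P(X > a) for X ~ N(0, 1), using the
    tilted sampling law N(a, 1) so the rare region is hit frequently."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(a, 1.0)              # draw under the tilted law N(a, 1)
        if y > a:
            # Likelihood ratio phi(y) / phi_a(y) = exp(a^2/2 - a*y).
            total += math.exp(a * a / 2 - a * y)
    return total / n

est = is_tail_estimate(3.0, 20000)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed form, about 1.35e-3
```

Naive Monte Carlo would need millions of samples to see this event often enough; here nearly half the tilted samples land in the tail, and the weights correct the bias.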

5.
In this article, the problem of robust sampled-data H∞ output tracking control is investigated for a class of nonlinear networked systems with stochastic sampling and time-varying norm-bounded uncertainties. For technical simplicity, only two different sampling periods are considered, whose occurrence probabilities are known constants satisfying a Bernoulli distribution; the results can be extended to the case of multiple stochastic sampling periods. By way of an input delay approach, the probabilistic system is transformed into a stochastic continuous time-delay system. A new linear matrix inequality-based procedure is proposed for designing state-feedback controllers that guarantee that the closed-loop networked system with stochastic sampling tracks the output of a given reference model well in the H∞ sense. Conservatism is reduced by taking the sampling probabilities into account. Both network-induced delays and packet dropouts are considered. Finally, an illustrative example is given to show the usefulness and effectiveness of the proposed H∞ output tracking design.

6.
Balanced sampling is a very efficient sampling design when the variable of interest is correlated with the auxiliary variables on which the sample is balanced. A procedure to select balanced samples in a stratified population has previously been proposed. Unfortunately, this procedure becomes very slow as the number of strata increases, and it even fails to select samples for some large numbers of strata. A new algorithm to select balanced samples in a stratified population is proposed. This new procedure is much faster than the existing one when the number of strata is large. Furthermore, it makes it possible to select samples for some large numbers of strata, which was impossible with the existing method. Balanced sampling can then be applied to a highly stratified population when only a few units are selected in each stratum. Finally, this algorithm turns out to be valuable for many applications, for instance the handling of nonresponse.
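The stratified setting the abstract describes, many strata with only a few units drawn from each, can be illustrated with plain proportional-allocation stratified sampling. This is a deliberately simple sketch, not the balanced-sampling algorithm of the paper; the toy population and stratum labels are invented:

```python
import random

def stratified_sample(population, strata_key, n, seed=0):
    """Draw roughly n units with proportional allocation across strata;
    every stratum receives at least one unit."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(strata_key(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(n * len(units) / len(population)))
        sample.extend(rng.sample(units, min(k, len(units))))
    return sample

# Toy population: 60 units in a "north" stratum, 40 in a "south" stratum.
population = [("north", i) for i in range(60)] + [("south", i) for i in range(40)]
sample = stratified_sample(population, lambda u: u[0], n=10)
```

Balanced sampling additionally forces the sample to reproduce population totals of auxiliary variables within each stratum, which is where the computational difficulty addressed by the paper comes from.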

7.
In this paper, a digital filter bank structure is proposed for the reconstruction of uniformly sampled bandlimited signals from their N-th order nonuniform samples. The proposed filter bank structure is arrived at by incorporating polyphase-domain filtering operations and discrete Fourier transform (DFT) modulation into an existing filter bank framework. An idea is also presented by which uniform samples can be reconstructed from N-th order nonuniform samples using structures based on recurrent nonuniform sampling. A comparison of the computational complexity and the signal-to-noise ratio (SNR) performance is also given for various structures existing in the literature.

8.
Importance sampling is a technique that is commonly used to speed up Monte Carlo simulation of rare events. The standard approach, which simulates the system using an a priori fixed change of measure, has been shown to fail in even the simplest network settings. Estimating probabilities associated with rare events has been a topic of great importance in queueing theory, and in applied probability at large. In this paper, we estimate the probability of two rare events, known as total population overflow and individual buffer overflow, in an open Jackson network in which customers must receive the needed service within a definite deadline. We use parallel computing to implement the estimator. Moreover, we consider the effect of various network parameters on the aforementioned overflow probabilities, and we also show how these parameters affect the probability of missing the deadline.

9.
The sequential probability ratio test is widely used in in-situ monitoring, anomaly detection, and decision making for electronics, structures, and process controls. However, because model parameters for this method, such as the system disturbance magnitudes and the false and missed alarm probabilities, are selected by users primarily based on experience, the actual false and missed alarm probabilities are typically higher than the users' requirements. This paper presents a systematic method to select model parameters for the sequential probability ratio test by using a cross-validation technique. The presented method can improve the accuracy of the sequential probability ratio test by reducing the false and missed alarm probabilities caused by improper model parameters. A case study of anomaly detection in resettable fuses is used to demonstrate the application of the cross-validation method to select model parameters for the sequential probability ratio test.
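The test itself is short enough to sketch. This is a textbook Wald SPRT for Bernoulli observations, shown only to make the role of the user-chosen parameters (p0, p1, α, β) concrete; the parameter-selection procedure of the paper is not reproduced:

```python
import math

def sprt(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for Bernoulli data:
    H0: p = p0 versus H1: p = p1.
    Returns ('H0' | 'H1' | 'continue', number of samples consumed)."""
    lower = math.log(beta / (1 - alpha))     # accept-H0 boundary
    upper = math.log((1 - beta) / alpha)     # accept-H1 boundary
    llr = 0.0
    for i, x in enumerate(samples, 1):
        # Add this observation's log-likelihood ratio.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr <= lower:
            return "H0", i
        if llr >= upper:
            return "H1", i
    return "continue", len(samples)

decision, n_used = sprt([1, 1, 1, 1, 1, 1], p0=0.1, p1=0.5)
```

As the abstract notes, the achieved error rates depend on how well p0, p1, α and β match reality, which is precisely what cross-validation is used to tune.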

10.
符永铨  王意洁  周婧 《软件学报》2009,20(3):630-643
To address the problem of scalable, fast, unbiased sampling in unstructured P2P systems, a sampling method called SMARW, based on adaptive random walks over multiple peers, is proposed. In this method, a group of temporary peers is selected via proxy random walks to carry out the sampling process, producing a tunable number of sample nodes per round and thereby increasing sampling speed; the sample nodes produced in each round then serve as the temporary peers for the next round. This simple scheme keeps the system's load balance near-optimal. In addition, SMARW uses an adaptive, distributed random-walk correction procedure to speed up the convergence of the sampling process. Theoretical analysis and simulations show that SMARW achieves a high degree of unbiased sampling together with near-optimal system load balancing.
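The standard building block behind unbiased random-walk sampling on a P2P overlay can be sketched without SMARW's multi-peer and correction machinery. The following is the classical Metropolis-Hastings random walk, in which a move from node u to a neighbor v is accepted with probability min(1, deg(u)/deg(v)); this makes the stationary distribution uniform over nodes, i.e. sampling is unbiased. The tiny test graph is invented for the example:

```python
import random

def mh_random_walk(adj, start, steps, seed=0):
    """Metropolis-Hastings random walk on an undirected graph whose
    stationary distribution is uniform over the nodes."""
    rng = random.Random(seed)
    u = start
    for _ in range(steps):
        v = rng.choice(adj[u])
        # Degree-correcting acceptance step; rejection means staying at u.
        if rng.random() < min(1.0, len(adj[u]) / len(adj[v])):
            u = v
    return u

# Small test graph: a triangle {0, 1, 2} plus a degree-1 node 3 hanging off 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
counts = {node: 0 for node in adj}
for trial in range(2000):
    counts[mh_random_walk(adj, 0, 25, seed=trial)] += 1
# Each node is hit roughly 500 times despite the unequal degrees.
```

A plain random walk would instead over-sample high-degree nodes in proportion to their degree; the acceptance step is what removes that bias.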

11.
This paper deals with state estimation for systems under measurement noise whose mean and covariance change with Markov transition probabilities. The minimum variance estimate of the state involves consideration of a prohibitively large number of sequences, so the usual computation method becomes impractical. In the algorithm proposed here, the estimate is calculated with a relatively small number of sequences sampled at random from the set of a large number of sequences. The average risk of the algorithm is shown to converge to the optimal average risk as the number of sampled sequences increases. An ideal sampling probability yielding very fast convergence is found. This probability is approximated, in a minimum mean squared sense, by a probability according to which sequences can be sampled sequentially and with great ease. This policy of determining the sampling probability makes it possible to design practical and efficient algorithms. Digital simulation results show good performance of the proposed algorithm.

12.
Due to the application-specific nature of wireless sensor networks, the sensitivity to coverage and data-reporting latency varies with the type of application. In light of this, algorithms and protocols should be application-aware to achieve optimum use of the highly limited resources in sensors and hence to increase overall network performance. This paper proposes a probabilistic constrained random sensor selection (CROSS) scheme for application-aware sensing coverage with the goal of maximizing the network lifetime. The CROSS scheme randomly selects in each round approximately k data-reporting sensors, sufficient for a user/application-specified desired sensing coverage (DSC), while maintaining a minimum distance between any pair of the selected k sensors. We exploit the Poisson sampling technique to enforce the minimum distance. Consequently, CROSS improves the spatial regularity of the randomly selected k sensors and hence the fidelity of satisfying the DSC in each round, and the connectivity among the selected sensors increases. We also introduce an algorithm to compute the desired minimum distance to be enforced between any pair of sensors. Finally, we present a probabilistic analytical model to measure the impact of the Poisson sampling technique on selecting k sensors, along with the optimality of the desired minimum distance computed by the proposed algorithm.
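The minimum-distance constraint can be illustrated with a simple hard-core thinning: visit candidate sensors in random order and keep one only if it is at least d_min away from every sensor kept so far. This is a generic sketch in the spirit of Poisson/hard-core sampling, not the CROSS algorithm, and the positions, k, and d_min values are invented:

```python
import math
import random

def select_sensors(positions, k, d_min, seed=0):
    """Randomly pick up to k sensors such that every pair of selected
    sensors is at least d_min apart (hard-core thinning)."""
    rng = random.Random(seed)
    order = list(range(len(positions)))
    rng.shuffle(order)
    chosen = []
    for i in order:
        x, y = positions[i]
        if all(math.hypot(x - positions[j][0], y - positions[j][1]) >= d_min
               for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen

random.seed(1)
positions = [(random.random(), random.random()) for _ in range(200)]
chosen = select_sensors(positions, k=10, d_min=0.15)
```

Compared with unconstrained random selection, the kept sensors are spread more regularly over the field, which is exactly the spatial-regularity benefit the abstract attributes to CROSS.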

13.
Big-data environments contain much redundant and noisy data, which inflates storage costs and degrades learning accuracy. To select representative samples effectively while improving learning accuracy and reducing training time, an incremental SVM learning algorithm based on selective sampling is proposed. The algorithm adopts Markov sampling: during sampling, the current decision model is used to compute the transition probability between samples, and this probability determines whether a sample is accepted as training data, so that representative samples are selected. The algorithm is compared with other incremental SVM learning algorithms on nine benchmark datasets, with the regularization parameter chosen by ten-fold cross-validation. Numerical experiments show that the algorithm improves learning accuracy while substantially reducing the total sampling-plus-training time and the total number of support vectors.

14.
The extended union model (EUM) was recently proposed and shown to be effective in handling short-time temporal corruption. Because of the computational complexity, the EUM probability can only be computed over groups of consecutive observations (called segments), and recognition can only be performed under an N-best re-scoring paradigm. In this paper, we introduce a hidden variable called the "pattern of corruption" and re-formulate the extended union model as marginalizing over possible patterns of corruption, with likelihood computed via the missing feature theory. We then introduce a recursive relationship between the EUM probabilities of two successive observation sequences that greatly simplifies the EUM probability computation. This makes it possible to compute the EUM probability over a long sequence. Using this recursive relationship, the EUM probability over frames, called the "frame-based EUM", can easily be computed. To simplify EUM-based recognition, we propose an approximate, dynamic programming-based EUM recognition algorithm, called the frame-based EUM Viterbi algorithm (FEVA), that performs recognition directly instead of via N-best re-scoring. Experimental results on digit recognition under added impulsive noise show that both the frame-based EUM and the FEVA outperform the segment-based EUM.

15.
《国际计算机数学杂志》2012,89(8):1565-1572
Recently, the estimation of a population quantile has received considerable attention. Existing quantile estimators generally assume that values of an auxiliary variable are known for the entire population, and most of them are defined under simple random sampling without replacement. Assuming two-phase sampling for stratification with arbitrary sampling designs in each of the two phases, a new quantile estimator and its variance estimator are defined. The proposed estimators can be used when population auxiliary information is not available, which is a common situation in practice. Desirable properties such as unbiasedness are derived. The suggested estimators are compared numerically with an alternative stratification estimator and its variance estimator, and desirable results are observed. Confidence intervals based upon the proposed estimators are also defined, and they are compared via simulation studies with the confidence intervals based upon the stratification estimator. The proposed confidence intervals give desirable coverage probabilities with the smallest interval lengths.

16.
We propose a dynamic model for the evolution of an open animal population that is subject to an environmental catastrophe. The model incorporates a capture-recapture experiment often conducted for studying wildlife populations, and enables inferences on the population size and the possible effect of the catastrophe. A Bayesian approach is used to model unobserved quantities in the problem as latent variables, and Markov chain Monte Carlo (MCMC) is used for posterior computation. Because the particular interrelationship between observed and latent variables negates the feasibility of standard MCMC methods, we propose a hybrid Monte Carlo approach that integrates a Gibbs sampler with the strategies of sequential importance sampling (SIS) and acceptance-rejection (AR) sampling for model estimation. We develop results on how to construct effective proposal densities for the SIS scheme. The approach is illustrated through a simulation study, and is applied to data from a mountain pygmy possum (Burramys parvus) population that was affected by a bushfire.

17.
ProbLog is a recently introduced probabilistic extension of Prolog (De Raedt, et al. in Proceedings of the 20th international joint conference on artificial intelligence, pp. 2468–2473, 2007). A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, with these probabilities mutually independent. The semantics of ProbLog is then defined by the success probability of a query in a randomly sampled program. This paper introduces the theory compression task for ProbLog, which consists of selecting the subset of clauses of a given ProbLog program that maximizes the likelihood w.r.t. a set of positive and negative examples. Experiments in the context of discovering links in real biological networks demonstrate the practical applicability of the approach.
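The success-probability semantics can be made concrete by brute-force enumeration: sum the probability of every total choice of probabilistic facts (every "world") in which the query succeeds. This is only an illustrative sketch, exponential in the number of facts and nothing like ProbLog's actual BDD-based inference; the fact names and probabilities model a hypothetical query path(a, c) reachable via edges a-b-c or directly a-c:

```python
from itertools import product

def success_probability(prob_facts, query_holds):
    """Success probability of a query under ProbLog-style semantics: the
    total probability of all worlds (subsets of facts) where it succeeds."""
    names = list(prob_facts)
    total = 0.0
    for world in product([True, False], repeat=len(names)):
        chosen = {name for name, keep in zip(names, world) if keep}
        p = 1.0
        for name, keep in zip(names, world):
            p *= prob_facts[name] if keep else 1 - prob_facts[name]
        if query_holds(chosen):
            total += p
    return total

# Hypothetical probabilistic edges; the query succeeds via ab-bc or ac.
facts = {"ab": 0.8, "bc": 0.6, "ac": 0.3}
p = success_probability(
    facts, lambda w: ("ab" in w and "bc" in w) or "ac" in w)
# By inclusion-exclusion: 0.48 + 0.3 - 0.48 * 0.3 = 0.636
```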

18.
The archetypical topology optimization problem concerns designing the layout of material within a given region of space so that some performance measure is extremized. To improve manufacturability and reduce manufacturing costs, restrictions on the possible layouts may be imposed. Among such restrictions, constraining the minimum length scales of different regions of the design has a significant place. Within the density-filter-based topology optimization framework, the most commonly used definition is that a region has a minimum length scale not less than D if any point within that region lies within a sphere with diameter D > 0 that is completely contained in the region. In this paper, we propose a variant of this minimum length scale definition for subsets of a convex (possibly bounded) domain. We show that sets with positive minimum length scale are characterized as being morphologically open. As a corollary, we find that sets where both the interior and the exterior have positive minimum length scales are characterized as being simultaneously morphologically open and (essentially) morphologically closed. For binary designs in the discretized setting, the latter translates into the requirement that the opening of the design equal the closing of the design. To demonstrate the capability of the developed theory, we devise a method that heuristically promotes designs that are binary and have positive minimum length scales (possibly measured in different norms) on both phases for minimum compliance problems. The obtained designs are almost binary and possess minimum length scales on both phases.
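The morphological-opening characterization is easy to see in one dimension: opening (erosion followed by dilation) removes any solid feature thinner than the structuring element, so a design equals its own opening exactly when every feature meets the length scale. This is an illustrative 1-D sketch with a truncated window at the boundaries, not the paper's formulation:

```python
def erode(x, r):
    """Binary erosion of a 1-D array with a window of half-width r."""
    n = len(x)
    return [all(x[j] for j in range(max(0, i - r), min(n, i + r + 1)))
            for i in range(n)]

def dilate(x, r):
    """Binary dilation of a 1-D array with a window of half-width r."""
    n = len(x)
    return [any(x[j] for j in range(max(0, i - r), min(n, i + r + 1)))
            for i in range(n)]

def opening(x, r):
    """Morphological opening: erosion, then dilation."""
    return dilate(erode(x, r), r)

# One feature wider than the structuring element (kept by the opening)
# and one single-cell feature (removed, since it violates the length scale).
design = [0, 1, 1, 1, 1, 0, 0, 1, 0, 0]
opened = opening([bool(v) for v in design], r=1)
```

Comparing `opened` with the original design flags exactly the under-sized features, mirroring the discrete opening-equals-closing criterion described in the abstract.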

19.
Motivated by the practical need, in a field trial, to synchronously collect gas samples at multiple positions and multiple times inside a certain armored vehicle, a gas sampling system with a distributed multi-point layout was designed. Based on the working principle of solenoid valves (open when energized, closed when de-energized) and the delay-control principle of time relays, the system uses solenoid valves to start and stop sampling into vacuum negative-pressure sampling bottles of a specified size, and time relays to automatically open multiple sampling lines in a preset sequence. In pre-trial testing, conducted in two stages (subsystem tests followed by integrated system tests), the stability of the system and the reliability of its sampling were tested comprehensively. The results show that the system operates stably, the collected samples are credible, the sample volume meets the requirements of instrumental analysis, and the experimental error introduced by the sampling system is small, so the trial results are trustworthy. The system fully meets the trial requirement of automatic, synchronous, multi-position, multi-time gas sampling inside a moving armored vehicle, providing the material basis for the trial and effectively solving the problem of synchronous multi-position, multi-time gas collection in a confined space.

20.
A desirable feature of a global sampling design for estimating forest cover change from satellite imagery is the ability to adapt the design to obtain precise regional estimates, where a region may be a country, state, province, or conservation area. A sampling design stratified by an auxiliary variable correlated with forest cover change has this adaptability. A global stratified random sample can be augmented by additional sample units within a region selected by the same stratified protocol, and the resulting sample constitutes a stratified random sample of the region. Stratified sampling allows increasing the sample size in a region by a few to many additional sample units. The additional sample units can be effectively allocated to strata to reduce the standard errors of the regional estimates, even though these strata were not initially constructed for the objective of regional estimation. A complete-coverage map of deforestation within the Brazilian Legal Amazon (BLA) is used as a population to evaluate the precision of regional estimates obtained by augmenting a global stratified random sample. The standard errors of the regional estimates for the BLA and states within the BLA obtained from the augmented stratified design were generally smaller than those attained by simple random sampling and systematic sampling.
