The generalized likelihood ratio (GLR) test is a widely used method for detecting abrupt changes in linear systems and signals. In this paper the marginalized likelihood ratio (MLR) test is introduced to eliminate three shortcomings of GLR while preserving its applicability and generality. First, the need for a user-chosen threshold is eliminated in MLR. Second, the noise levels need not be known exactly and may even change over time, which means that MLR is robust. Finally, a very efficient exact implementation with linear-in-time complexity for batch-wise data processing is developed. This should be compared to the quadratic-in-time complexity of the exact GLR.
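
To make the threshold issue concrete, here is a minimal sketch of a GLR statistic for a single abrupt mean shift in a batch of samples. It is not the MLR method of the abstract: the function name `glr_mean_change`, the Gaussian noise with known standard deviation `sigma`, and the single-change assumption are all illustrative choices.

```python
def glr_mean_change(x, sigma=1.0):
    """GLR statistic for one abrupt mean shift (illustrative sketch).

    Assumes independent Gaussian noise with known standard deviation
    `sigma` and unknown means before and after the change.
    Returns (statistic, k), where the change is placed before x[k].
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    total = sum(x)
    prefix = 0.0
    best_stat, best_k = 0.0, None
    for k in range(1, n):
        prefix += x[k - 1]
        mean_before = prefix / k
        mean_after = (total - prefix) / (n - k)
        # Twice the log-likelihood ratio for a change at position k.
        stat = k * (n - k) / n * (mean_after - mean_before) ** 2 / sigma ** 2
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


stat, k = glr_mean_change([0.0] * 10 + [5.0] * 10)
# A change is declared only if `stat` exceeds a user-chosen threshold,
# which is the design parameter the abstract says MLR removes.
```

The statistic also scales with `1 / sigma ** 2`, so a misjudged noise level shifts it relative to any fixed threshold, which is the second shortcoming the abstract mentions.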

Control charts based on the generalized likelihood ratio test (GLRT) are attractive from both theoretical and practical points of view. Most existing works on detecting changes in the process mean and variance assume that the shifts remain constant over time. The case of patterned mean and variance changes has not been well discussed. In this research, we propose a new control chart that integrates the exponentially weighted moving average (EWMA) procedure with the GLRT statistics to monitor processes with patterned mean and variance shifts. An attractive advantage of our control chart is its reference-free property. Owing to the good properties of the GLRT and EWMA procedures, our simulation results show that the proposed chart detects various types of shifts effectively and robustly. The implementation of the proposed control chart is illustrated with a real data example from chemical process control.
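
As background for the EWMA component, the following is a minimal sketch of a plain EWMA chart for the process mean. It does not reproduce the proposed EWMA-GLRT chart: the function name `ewma_chart`, the smoothing constant `lam`, the limit width `L`, and the known in-control mean `mu0` and standard deviation `sigma` are assumptions made for illustration.

```python
import math


def ewma_chart(x, mu0, sigma, lam=0.2, L=3.0):
    """Return one out-of-control flag per observation (illustrative sketch).

    Uses the standard recursion z_t = lam * x_t + (1 - lam) * z_{t-1}
    with time-varying control limits around the in-control mean mu0.
    """
    z = mu0
    flags = []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        flags.append(abs(z - mu0) > half_width)
    return flags


flags = ewma_chart([0, 0, 0, 0, 5, 5, 5, 5], mu0=0.0, sigma=1.0)
```

Smaller values of `lam` give more weight to past observations, which generally helps with small sustained shifts at the cost of slower reaction to large ones.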

Hypothesis testing is one of the most significant facets of statistical inference and, like other real-world problems, is affected by uncertain conditions. The aim of this paper is to develop hypothesis testing based on the likelihood ratio test in a fuzzy environment, where both the hypotheses under study and the sample data are assumed to be fuzzy. The main idea is to employ Zadeh’s extension principle. To this end, a pair of non-linear programming problems is used to obtain the membership function of the likelihood ratio test statistic. The membership function is then compared with the critical value of the test in order to assess the acceptability of the fuzzy null hypothesis under consideration. In this step, two distinct procedures are applied. In the first procedure, a ranking method for fuzzy numbers is used to make an absolute decision about the acceptability of the fuzzy null hypothesis. In the second procedure, the membership degrees of acceptance and rejection of the fuzzy null hypothesis are first derived using the resolution identity, and a relative decision on acceptance or rejection is then made based on some arbitrary decision rules. The flexibility of the proposed approach in testing fuzzy hypotheses with vague data is demonstrated using some numerical examples.

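周典瑞, 周莲英. 《计算机应用》 (Journal of Computer Applications), 2013, 33(8): 2208-2211
To address the low precision and low efficiency of approximately duplicate record detection algorithms on massive data, a comprehensive weighting method and a string-length-based filtering method are used to detect approximately duplicate records in a data set. The comprehensive weighting method computes the weight of each attribute by combining user experience with mathematical statistics. The string-length-based filtering method uses the length difference between strings to terminate the edit distance computation early during similarity detection, reducing the number of records to be matched. Experimental results show that the weight vector computed by the comprehensive weighting method reflects the importance of each attribute more comprehensively and accurately, and that string-length-based filtering reduces the comparison time between records, so the approach effectively solves the problem of detecting approximately duplicate records in massive data.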
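
The length-based filter can be sketched as follows. This is one plausible reading of the idea, not the authors' algorithm: because the edit distance between two strings is never smaller than their length difference, a pair whose lengths differ by more than the allowed distance can be rejected without running the dynamic program. The names `edit_distance`, `is_similar` and `max_dist` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def is_similar(a, b, max_dist):
    """True if the edit distance between a and b is at most max_dist."""
    # The length gap is a lower bound on the edit distance, so a large
    # gap rules the pair out before any table is filled in.
    if abs(len(a) - len(b)) > max_dist:
        return False
    return edit_distance(a, b) <= max_dist


is_similar("kitten", "sitting", 3)
is_similar("abc", "abcdefgh", 2)  # rejected by the length filter alone
```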

A new method is proposed for identifying clusters in spatial point processes. It relies on a specific ordering of events and the definition of area spacings which have the same distribution as one-dimensional spacings. Then the spatial clusters are detected using a scan statistic adapted to the analysis of one-dimensional point processes. This flexible spatial scan test seems to be very powerful against any arbitrarily-shaped cluster alternative. These results have applications in epidemiological studies of rare diseases.

We introduce a nonparametric test intended for large-scale simultaneous inference in situations where the utility of distribution-free tests is limited because of their discrete nature. Such situations are frequently dealt with in microarray analysis where the number of tests is much larger than the sample size. The proposed test statistic is based on a certain distance between the distributions from which the samples under study are drawn. In a simulation study, the proposed permutation test is compared with permutation counterparts of the t-test and the Kolmogorov–Smirnov test. The usefulness of the proposed test is discussed in the context of microarray gene expression data and illustrated with an application to real datasets.
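
For readers unfamiliar with permutation tests, here is a minimal sketch of a two-sample Monte Carlo permutation test. It uses the absolute difference in means as the statistic, not the distance-based statistic proposed in the abstract; the function name `permutation_test` and the parameters `n_perm` and `seed` are illustrative.

```python
import random


def permutation_test(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    # The +1 terms count the observed labelling as one permutation.
    return (count + 1) / (n_perm + 1)


p = permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
```

With small samples the p-value can take only a limited number of distinct values, which is the discreteness problem the abstract refers to.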

The asymptotic distribution of the likelihood ratio test statistic in two-sample testing problems for hidden Markov models is derived when allowing for unequal sample sizes as well as for different families of state-dependent distributions. In both cases under regularity conditions the limit distribution is a standard χ²-distribution, and in particular does not depend on the ratio of the distinct sample sizes. In a simulation study, the finite sample properties are investigated, and the methodology is illustrated in an application to modeling the movement of Drosophila larvae.

Microarray data have significant potential in clinical medicine, but they typically contain a large number of genes relative to the number of samples. Finding a subset of discriminatory genes (features) through intelligent algorithms has become a trend. Building a disease prognosis expert system on this basis would greatly benefit clinical medicine. In addition, the fewer genes are selected, the lower the cost of the disease prognosis expert system, so what is needed is a small gene set with high classification accuracy. In this paper, a multi-objective model is built according to the analytic hierarchy process (AHP), which treats classification accuracy as absolutely more important than the number of selected genes. A multi-objective heuristic algorithm called MOEDA, an improvement of the Univariate Marginal Distribution Algorithm, is proposed to solve the model. Two main rules are designed: the ‘Higher and Fewer Rule’, which is used for evaluating and sorting individuals, and the ‘Forcibly Decrease Rule’, which is used for generating potential individuals with high classification accuracy and fewer genes. The proposed method is tested on both binary-class and multi-class microarray datasets. The results show that the gene set selected by MOEDA not only yields higher accuracy but also remains small, which both saves computational time and improves the interpretability and applicability of the result with a simple classification model. The proposed MOEDA opens up a new way of applying heuristic algorithms to microarray gene expression data.

Varying-coefficient models are popular multivariate nonparametric fitting techniques. When all coefficient functions in a varying-coefficient model share the same smoothing variable, the available inference tools include the F-test, the sieve empirical likelihood ratio test and the generalized likelihood ratio (GLR) test. However, when the coefficient functions have different smoothing variables, these tools cannot be used directly to make inferences on the model because of the differences in the process of estimating the functions. In this paper, the GLR test is extended to models of the latter type by using efficient estimators of these coefficient functions. Under the null hypothesis, the newly proposed GLR test statistic asymptotically follows a χ²-distribution with scale constant and degrees of freedom independent of the nuisance parameters, which is known as the Wilks phenomenon. Further, we derive its asymptotic power, which is shown to achieve the optimal rate of convergence for nonparametric hypothesis testing. A simulation study is conducted to evaluate the test procedure empirically.

A paired data set is common in microarray experiments, where the data are often incompletely observed for some pairs due to various technical reasons. In microarray paired data sets, it is of main interest to detect differentially expressed genes, which are usually identified by testing the equality of means of expressions within a pair. While much attention has been paid to testing mean equality with incomplete paired data in previous literature, the existing methods commonly assume the normality of data or rely on the large sample theory. In this paper, we propose a new test based on permutations, which is free from the normality assumption and large sample theory. We consider permutation statistics with linear mixtures of paired and unpaired samples as test statistics, and propose a procedure to find the optimal mixture that minimizes the conditional variances of the test statistics, given the observations. Simulations are conducted for numerical power comparisons between the proposed permutation tests and other existing methods. We apply the proposed method to find differentially expressed genes for a colorectal cancer study.

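Gene Expression Profile Microarray Data Mining System*   Total citations: 1 (self-citations: 0, citations by others: 1)
李荣. 《计算机应用研究》 (Application Research of Computers), 2009, 26(8): 2938-2941
Gene chips are an important tool in genome research, and their data analysis depends heavily on data mining techniques. Combining data mining techniques with bioinformatics research, several data mining analysis models for gene expression profile microarray data and a corresponding data mining system were designed and implemented. The system has good scalability and entity independence, and the complex underlying data mining algorithms are transparent to the user.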

In microarray processing, the appearance of artifacts, donuts, and irregularly shaped spots is a problem. In current microarray analysis, most approaches stress the segmentation of pixel intensities rather than emphasizing ratio estimators. To avoid segmenting spot target areas and to minimize sensitivity to aberrant pixels, we propose a robust ratio estimator of gene expression via inverse-variance weighting. Moreover, a metric is proposed to evaluate the spot quality. Both the simulation and numerical examples explored reveal that the proposed algorithm is superior to existing approaches with respect to mean square error. The acceptance quality measure recommended confirms the validity of the proposed ratio estimator.
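
Inverse-variance weighting itself can be sketched in a few lines. The example below is a generic weighted mean, not the ratio estimator of the abstract: the function name `inverse_variance_mean` and its inputs (a list of independent estimates with known positive variances) are hypothetical.

```python
def inverse_variance_mean(estimates, variances):
    """Combine independent estimates using weights 1 / variance.

    Returns (combined_estimate, variance_of_combined_estimate).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equally sized, non-empty inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total


# A very noisy third value (variance 1e6) barely moves the result.
est, var = inverse_variance_mean([2.0, 2.0, 50.0], [1.0, 1.0, 1e6])
```

High-variance inputs receive small weights, which is why this kind of weighting reduces sensitivity to aberrant values.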

Searching for an effective dimension reduction space is an important problem in regression, especially for high-dimensional data such as microarray data. A major characteristic of microarray data is the small number of observations n and the very large number of genes p. This “large p, small n” paradigm makes discriminant analysis for classification difficult. One way to offset this dimensionality problem is to reduce the dimension. Supervised classification is understood as a regression problem with a small number of observations and a large number of covariates. A new approach to dimension reduction is proposed, based on a semi-parametric approach which uses local likelihood estimates for single-index generalized linear models. The asymptotic properties of this procedure are considered and its asymptotic performance is illustrated by simulations. Applications of this method to binary and multiclass classification of the three real data sets Colon, Leukemia and SRBCT are presented.

In this paper, a novel method for voiced-unvoiced decision within a pitch tracking algorithm is presented. Voiced-unvoiced decision is required for many applications, including modeling for analysis/synthesis, detection of model changes for segmentation purposes and signal characterization for indexing and recognition applications. The proposed method is based on the generalized likelihood ratio test (GLRT) and assumes colored Gaussian noise with unknown covariance. Under the voiced hypothesis, a harmonic plus noise model is assumed. The derived method is combined with a maximum a-posteriori probability (MAP) scheme to obtain a pitch and voicing tracking algorithm. The performance of the proposed method is tested using several speech databases for different levels of additive noise and phone speech conditions. Results show that the GLRT is robust to speaker and environmental conditions and performs better than existing algorithms.

Domain testing is designed to detect domain errors that result from a small boundary shift in a path domain. Although many researchers have studied domain testing, automatic domain test data generation for string predicates has seldom been explored. This paper presents a novel approach for the automatic generation of ON–OFF test points for string predicate borders, and describes a corresponding test data generator. Our empirical work is conducted on a set of programs with string predicates, where extensive trials have been done for each string predicate, and the results are analysed using the SPSS tool. Conclusions are drawn that: (i) the approach is promising and effective; (ii) there is a strong linear relationship between the performance of the test generator and the length of the target string in the predicate tested; and (iii) initial inputs, no shorter than the target string and with characters generated randomly, may enhance the performance in the test data generation for string predicates. Copyright © 2009 John Wiley & Sons, Ltd.

DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on the data, neither method always dominates the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.

In this paper we consider a new fault detection approach that merges the benefits of Gaussian process regression (GPR) with a generalized likelihood ratio test (GLRT). GPR is one of the most well-known machine learning techniques. It is simpler and generally more robust than other methods. To deal with both the high computational cost for large data sets and the time-varying dynamics of industrial processes, we consider a reduced and online version of the GPR method. The online reduced GPR (ORGPR) aims to select a reduced set of kernel functions to build the GPR model and apply it to online fault detection based on a GLRT chart. Compared with the conventional GPR technique, the proposed ORGPR method improves computational efficiency by decreasing the dimension of the kernel matrix. The developed ORGPR-based GLRT could improve fault detection efficiency since it is able to track the time-varying characteristics of the processes. The fault detection performance of the developed ORGPR-based GLRT method is evaluated using a Tennessee Eastman process. The simulation results show that the proposed method outperforms the conventional GPR-based GLRT technique.

Based on the Karhunen-Loeve expansion, the maximum likelihood ratio test for the stability of a sequence of Gaussian random processes is investigated. The likelihood function is based on the first p scores of eigenfunctions in the Karhunen-Loeve expansion for Gaussian random processes. Though the scores are unobservable, we show that the effect of the difference between the scores and their estimators is negligible as the sample size tends to infinity. The asymptotic distribution is proved to be the Gumbel extreme value distribution. Under the alternative, the test is shown to be consistent. For different choices of p, simulation results show that the test behaves quite well in finite samples. The test procedure is also applied to the annual temperature data of central England. The results show that the temperatures have risen in the last twenty years; however, there is no evidence that the autocovariance functions of the temperatures have changed over the range of the observations.

We address the problem of detecting “anomalies” in the network traffic produced by a large population of end-users following a distribution-based change detection approach. In the considered scenario, different traffic variables are monitored at different levels of temporal aggregation (timescales), resulting in a grid of variable/timescale nodes. For every node, a set of per-user traffic counters is maintained and then summarized into histograms for every time bin, obtaining a timeseries of empirical (discrete) distributions for every variable/timescale node. Within this framework, we tackle the problem of designing a formal Distribution-based Change Detector (DCD) able to identify statistically-significant deviations from the past behavior of each individual timeseries.

The Weighted Gene Regulatory Network (WGRN) problem consists in pruning a regulatory network obtained from DNA microarray gene expression data, in order to identify a reduced set of candidate elements which can explain the expression of all other genes. Since the problem appears to be particularly hard for general-purpose solvers, we develop a Greedy Randomized Adaptive Search Procedure (GRASP) and refine it with three alternative Path Relinking procedures. For comparison purposes, we also develop a Tabu Search algorithm with a self-adapting tabu tenure. The experimental results show that GRASP performs better than Tabu Search and that Path Relinking significantly contributes to its effectiveness.
