Similar Documents
 20 similar documents found (search time: 890 ms)
1.
Paired data sets are common in microarray experiments, where the data are often incompletely observed for some pairs due to various technical reasons. In microarray paired data, the main interest is to detect differentially expressed genes, which are usually identified by testing the equality of mean expression within a pair. While much attention has been paid to testing mean equality with incomplete paired data, the existing methods commonly assume normality of the data or rely on large-sample theory. In this paper, we propose a new test based on permutations, which requires neither the normality assumption nor large-sample theory. We consider permutation statistics with linear mixtures of paired and unpaired samples as test statistics, and propose a procedure to find the optimal mixture that minimizes the conditional variances of the test statistics given the observations. Simulations are conducted for numerical power comparisons between the proposed permutation tests and other existing methods. We apply the proposed method to find differentially expressed genes in a colorectal cancer study.
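As a rough illustration of the idea, the sketch below (not the authors' exact procedure: the mixing weight `w` is a fixed user-supplied constant rather than the variance-minimizing optimum, and all names are hypothetical) combines a sign-flip permutation of the complete pairs with a label permutation of the unpaired samples:

```python
import numpy as np

def mixed_perm_test(pre, post, x_only, y_only, w=0.5, n_perm=2000, seed=0):
    # Test statistic: a fixed linear mixture of the paired mean difference
    # and the unpaired mean difference (w is user-supplied here, not
    # variance-optimized as in the paper).
    rng = np.random.default_rng(seed)
    d = post - pre  # differences from the complete pairs

    def stat(d, x, y):
        return w * d.mean() + (1 - w) * (y.mean() - x.mean())

    obs = stat(d, x_only, y_only)
    pooled = np.concatenate([x_only, y_only])
    n_x = len(x_only)
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1, 1], size=len(d))   # sign-flip the paired diffs
        perm = rng.permutation(pooled)             # relabel the unpaired samples
        if abs(stat(signs * d, perm[:n_x], perm[n_x:])) >= abs(obs):
            count += 1
    return (count + 1) / (n_perm + 1)
```

Under the null of equal means, both the sign flips and the relabelings leave the joint distribution unchanged, which is what justifies pooling the two permutation schemes into one statistic.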

2.
The relationship between two sets of variables defined for the same individuals can be evaluated by the RV coefficient. However, it is impossible to assess from the RV value alone whether or not the two sets of variables are significantly correlated, which is why a test is required. Asymptotic tests do exist but fail in many situations, hence the interest in permutation tests. The main drawback of permutation tests, however, is that they are time-consuming. It is therefore interesting to approximate the permutation distribution with continuous distributions (without doing any permutation). The current approximations (normal approximation, a log-transformation, and Pearson type III approximation) are discussed and a new one is described: an Edgeworth expansion. Finally, these different approximations are compared on both simulations and a sensory example.
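For reference, the exact (Monte Carlo) permutation test that these approximations seek to avoid can be sketched as follows, assuming the standard definition of the RV coefficient on column-centred matrices (function names are illustrative):

```python
import numpy as np

def rv_coefficient(X, Y):
    # Column-centre both blocks, then RV = <XX', YY'> / (||XX'|| * ||YY'||).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sx, Sy = X @ X.T, Y @ Y.T
    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx ** 2) * np.sum(Sy ** 2))

def rv_perm_test(X, Y, n_perm=999, seed=0):
    # Permute the rows of X to break the row-wise pairing with Y.
    rng = np.random.default_rng(seed)
    obs = rv_coefficient(X, Y)
    count = sum(
        rv_coefficient(X[rng.permutation(len(X))], Y) >= obs
        for _ in range(n_perm)
    )
    return obs, (count + 1) / (n_perm + 1)
```

Each permutation requires two n-by-n matrix products, which is exactly the cost the continuous approximations are designed to remove.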

4.
The ANOVA method and permutation tests, two heritages of Fisher, have been extensively studied. Several permutation strategies have been proposed to obtain a distribution-free test for factors in a fixed-effect ANOVA (i.e., single-error-term ANOVA). The resulting tests are either approximate or exact. However, there exists no universal exact permutation test that can be applied to an arbitrary design to test a desired factor. An exact permutation strategy applicable to fixed-effect analysis of variance is presented. The proposed method can be used to test any factor, even in the presence of higher-order interactions. In addition, the method is applicable to unbalanced (all-cells-filled) designs, a very common situation in practice, and it is the first method with this capability. Simulation studies show that the proposed method has an actual level that stays remarkably close to the nominal level, and its power is always competitive. This is the case even with very small datasets, strongly unbalanced designs, and non-Gaussian errors. No other competitor shows such enviable behavior.

5.
In this paper, we study permutation flowshop problems with minimal and/or maximal time lags, where the time lags are defined between pairs of successive operations of jobs. Such constraints can model various industrial situations, for instance the production of perishable products. We present theoretical results for two-machine cases and prove that the two-machine permutation flowshop with constant maximal time lags is strongly NP-hard. We develop an optimal branch-and-bound procedure to solve the m-machine permutation flowshop problem with minimal and maximal time lags. We test several lower bounds and heuristics providing upper bounds on different classes of benchmarks, and carry out a performance analysis.

6.
Generalized additive models (GAMs) have distinct advantages over generalized linear models, as they allow investigators to make inferences about associations between outcomes and predictors without placing parametric restrictions on the associations. The variable of interest is often smoothed using locally weighted scatterplot smoothing (LOESS), and the optimal span (degree of smoothing) can be determined by minimizing the Akaike information criterion (AIC). A natural hypothesis when using GAMs is to test whether the smoothing term is necessary or if a simpler model would suffice. The statistic of interest is the difference in deviances between models including and excluding the smoothed term. As approximate chi-square tests of this hypothesis are known to be biased, permutation tests are a reasonable alternative. We compare the type I error rates of the chi-square test and of three permutation test methods using synthetic data generated under the null hypothesis. In each permutation method, a distribution of differences in deviances is obtained from 999 permuted datasets, and the null hypothesis is rejected if the observed statistic falls in the upper 5% of the distribution. The first test is a conditional permutation test using the optimal span size for the observed data; this span size is held constant for all permutations. This test is shown to have an inflated type I error rate. Alternatively, the span size can be fixed a priori so that the span selection technique does not rely on the observed data. This test is shown to be unbiased; however, the choice of span size is not clear. The third method is an unconditional permutation test where the optimal span size is selected for the observed and for each permuted dataset. This test is unbiased, though computationally intensive.
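The unconditional scheme can be sketched in a few lines. Purely for illustration, polynomial degree selected by AIC stands in for the LOESS span, and the "deviance difference" is the RSS reduction of the selected fit over a linear fit; all function names are hypothetical:

```python
import numpy as np

def fit_rss(x, y, degree):
    # Residual sum of squares of a polynomial fit (a stand-in for LOESS).
    coef = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coef, x)) ** 2))

def aic(x, y, degree):
    # Gaussian AIC up to a constant: n*log(RSS/n) + 2*(number of parameters).
    n = len(y)
    return n * np.log(fit_rss(x, y, degree) / n) + 2 * (degree + 1)

def unconditional_perm_test(x, y, degrees=(1, 2, 3, 4), n_perm=199, seed=0):
    # Unconditional scheme: the smoothing flexibility ("span") is re-selected
    # by AIC for every permuted dataset, not just for the observed one.
    rng = np.random.default_rng(seed)

    def dev_diff(y):
        best = min(degrees, key=lambda d: aic(x, y, d))
        return fit_rss(x, y, 1) - fit_rss(x, y, best)  # linear vs selected fit

    obs = dev_diff(y)
    count = sum(dev_diff(rng.permutation(y)) >= obs for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)
```

The conditional variant would compute `best` once from the observed data and reuse it inside the loop, which is precisely what inflates its type I error.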

7.
《Ergonomics》2012,55(10):1037-1052
This study investigated the relationship between cognitive abilities and driving behaviour in normal and hazardous driving situations. Driving component skills were measured in these two types of situations. Normal driving skills were evaluated through an on-street driving test, where search, speed control, and direction control were considered as driving component skills. Hazardous driving component skills were evaluated using a standardized video driving paradigm; the component skills evaluated in this paradigm were search, identify, predict, decide, and execute. A battery of predictive tests was administered to the participants, forty-two high-school students. Multiple regression analysis indicated that the measure of dynamic visual signal perception introduced in this study could be used as a predictor of driving performance in both situations. Driving component skills showed different effects between normal and hazardous situations.

8.
The microarray is an important and powerful tool for prescreening genes for further research. However, alternative solutions are needed to increase power in small microarray experiments: traditional parametric and even non-parametric tests lack power in such small experiments and have distributional problems. A mixture model is described that operates directly on expression differences, assuming that each gene is expressed or not under each of the two treatments, giving four combinations: (i) not expressed in either condition, (ii) expressed only under the first condition, (iii) expressed only under the second condition, and (iv) expressed under both conditions, and hence four possible clusters with two treatments. The approach is termed the Mean-Difference-Mixture-Model (MD-MM) method. Accuracy and power of the MD-MM were compared to other commonly used methods using simulations, microarray data, and quantitative real-time PCR (qRT-PCR). The MD-MM was found to be generally superior to the other methods in most situations. The advantage was greatest when there were few replicates, poor signal-to-noise ratios, or non-homogeneous variances.

9.
Massive spatio-temporal data have been collected from earth observation systems for monitoring changes in natural resources and the environment. To find the interesting dynamic patterns embedded in spatio-temporal data, there is an urgent need for detecting spatio-temporal clusters formed by objects with similar attribute values occurring together across space and time. Among clustering methods, density-based methods are widely used to detect such clusters because they are effective for finding arbitrarily shaped clusters and require little a priori knowledge (e.g. the number of clusters). However, a series of user-specified parameters is required to identify high-density objects and to determine cluster significance. In practice, it is difficult for users to determine the optimal clustering parameters; therefore, existing density-based clustering methods typically exhibit unstable performance. To overcome these limitations, a novel density-based spatio-temporal clustering method based on permutation tests is developed in this paper. High-density objects and cluster significance are determined from statistical information on the dataset. First, the density of each object is defined based on the local variance, and a fast permutation test is conducted to identify high-density objects. Then, a two-stage grouping strategy is applied to group high-density objects and their neighbors, so that spatio-temporal clusters are formed by minimizing the increase in inhomogeneity. Finally, another newly developed permutation test is conducted to evaluate cluster significance based on permutation of the cluster members. Experiments on both simulated and meteorological datasets show that the proposed method outperforms two state-of-the-art clustering methods, ST-DBSCAN and ST-OPTICS. The proposed method can not only identify inherent cluster patterns in spatio-temporal datasets, but also greatly alleviates the difficulty of selecting appropriate clustering parameters.

10.
Permutation flow shop scheduling is a well-known combinatorial optimization problem that arises in many manufacturing systems. Over the last few decades, permutation flow shop problems have been widely studied and solved as static problems. However, in many practical systems they are not really static but dynamic, where the challenge is to schedule n different products that must be produced on a permutation shop floor in a cyclical pattern. In this paper, we consider a make-to-stock production system, where three related issues must be addressed: the length of a production cycle, the batch size of each product, and the order of the products in each cycle. To deal with these tasks, we propose a genetic-algorithm-based lot scheduling approach with the objective of minimizing the sum of setup and holding costs. The proposed algorithm has been tested on scenarios from a real-world sanitaryware production system, and the experimental results illustrate that it obtains better results than traditional reactive approaches.

11.
There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels. Print-tip lowess normalization is widely used in situations where dye biases can depend on overall spot intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly when the error variability for each gene is heterogeneous over intensity ranges. We first develop support vector machine quantile regression (SVMQR) by extending support vector machine regression (SVMR) to the estimation of linear and nonlinear quantile regressions, and then propose new print-tip normalization methods based on SVMR and SVMQR. We apply the proposed normalization methods to previous cDNA microarray data from apolipoprotein AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. From our comparative analyses, we find that the proposed methods perform better than the existing print-tip lowess normalization method.

12.
Traditional multivariate tests such as Hotelling's test or Wilks' test are designed for classical problems, where the number of observations is much larger than the dimension of the variables. For high-dimensional data, however, this assumption no longer holds. In this article, we consider testing problems in high-dimensional MANOVA where the number of variables exceeds the sample size. To overcome the challenges of high dimensionality, we propose a new approach called a shrinkage-based regularization test, which is suitable for a variety of data structures including the one-sample problem and one-way MANOVA. Our approach uses ridge regularization to overcome the singularity of the sample covariance matrix and applies a soft-thresholding technique to reduce random noise and improve testing power. An appealing property of this approach is its ability to select the relevant variables that provide evidence against the hypothesis. We compare the performance of our approach with competing approaches on real microarray data and in simulation studies. The results illustrate that the proposed statistic maintains relatively high power in detecting a wide family of alternatives.

13.
This paper studies a new generalization of the regular permutation flowshop scheduling problem (PFSP), referred to as the distributed permutation flowshop scheduling problem or DPFSP. Under this generalization, we assume that there is a total of F identical factories or shops, each with m machines arranged in series. A set of n available jobs has to be distributed among the F factories, and then a processing sequence has to be derived for the jobs assigned to each factory. The optimization criterion is the minimization of the maximum completion time, or makespan, among the factories. This production setting is relevant in today's decentralized and globalized economy, where several production centers might be available to a firm. We characterize the DPFSP and propose six alternative mixed integer linear programming (MILP) models that are carefully and statistically analyzed for performance. We also propose two simple factory assignment rules together with 14 heuristics based on dispatching rules, effective constructive heuristics, and variable neighborhood descent methods. A comprehensive computational and statistical analysis is conducted to assess the performance of the proposed methods.
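A minimal sketch of the setting (not one of the paper's 14 heuristics; the greedy rule and function names are illustrative): each factory is an ordinary permutation flowshop, and a simple assignment rule appends each job to the factory whose resulting makespan is smallest.

```python
def flowshop_makespan(seq, p):
    # Standard permutation-flowshop recurrence; p[j][i] is the processing
    # time of job j on machine i, seq is the job order in one factory.
    m = len(p[0])
    comp = [0] * m  # completion time of the last scheduled job per machine
    for j in seq:
        for i in range(m):
            start = max(comp[i], comp[i - 1] if i else 0)
            comp[i] = start + p[j][i]
    return comp[-1]

def assign_jobs(p, F):
    # Greedy DPFSP assignment (a sketch): put each job where it hurts least.
    factories = [[] for _ in range(F)]
    for j in range(len(p)):
        best = min(range(F),
                   key=lambda f: flowshop_makespan(factories[f] + [j], p))
        factories[best].append(j)
    cmax = max(flowshop_makespan(seq, p) for seq in factories)
    return factories, cmax
```

With two jobs `p = [[3, 2], [2, 4]]` and two factories, the rule places each job in its own factory, cutting the single-factory makespan of 9 down to 6.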

14.
The class of bipartite permutation graphs is the intersection of two well known graph classes: bipartite graphs and permutation graphs. A complete bipartite decomposition of a bipartite permutation graph is proposed in this note. The decomposition gives a linear structure of bipartite permutation graphs, and it can be obtained in O(n) time, where n is the number of vertices. As an application of the decomposition, we show an O(n) time and space algorithm for finding a longest path in a bipartite permutation graph.

15.
A measure of cluster quality is often needed in DNA microarray data analysis. In this paper, we introduce a new cluster validity index that measures geometrical features of the data. The essential concept of this index is to evaluate the ratio of the squared total length of the data eigen-axes to the between-cluster separation. We show that this cluster validity index works well for data containing closely spaced clusters or clusters of different sizes. We verify the method using three simulated data sets, two real-world data sets, and two microarray data sets. The experimental results show that the proposed index is superior to five other cluster validity indices: the partition coefficient (PC), general silhouette index (GS), Dunn's index (DI), CH index, and I-index. We also give a theorem showing in which situations the proposed index works well.

16.
For the non-parametric Behrens-Fisher problem, a permutation test based on the studentized rank statistic of Brunner and Munzel is proposed. This procedure is applicable to count or ordered categorical data. By applying the central limit theorem of Janssen, it is shown that the asymptotic permutation distribution of this test statistic is standard normal. For very small and very different sample sizes, which occur frequently in medical and biological applications, an extensive simulation study suggests that this permutation test works well for data from several underlying distributions. The proposed test is applied to data from a clinical trial.
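A sketch of such a permutation test, using SciPy's implementation of the Brunner-Munzel statistic as the studentized rank statistic (the wrapper function name is hypothetical):

```python
import numpy as np
from scipy.stats import brunnermunzel

def bm_perm_test(x, y, n_perm=999, seed=0):
    # Two-sided permutation test: the studentized Brunner-Munzel rank
    # statistic is recomputed on label-shuffled pooled data.
    rng = np.random.default_rng(seed)
    obs = abs(brunnermunzel(x, y).statistic)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(brunnermunzel(perm[:n_x], perm[n_x:]).statistic) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Because the statistic is studentized, the permutation distribution remains valid under unequal variances, which is the point of the Behrens-Fisher setting.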

17.
Inferential methods known in the shape analysis literature use configurations of landmarks optimally superimposed by a least-squares procedure, or analyze matrices of interlandmark distances. For example, in the two-independent-sample case, a practical method for comparing the mean shapes in the two groups is to use the Procrustes tangent space coordinates (if the data are concentrated), calculate the Mahalanobis distance, and then the Hotelling T2-test statistic. Under the assumption of isotropy, another simple approach is to work with statistics based on the squared Procrustes distance and consider the Goodall F-test statistic. Despite their widespread use, it is well known that Hotelling's T2-test may not be very powerful unless a large number of observations is available, while the underlying model required by Goodall's F-test is very restrictive. For these reasons, an extension of the nonparametric combination (NPC) methodology to shape analysis is proposed. Focusing on the two-independent-sample case, the behaviour of some nonparametric permutation tests is evaluated through a comparative simulation study and an application to the Mediterranean monk seal skulls dataset, showing that the proposed tests are very powerful for both balanced and unbalanced sample sizes.

18.
This paper proposes a chaos-based image encryption scheme with a permutation-diffusion structure. In the proposed scheme, a large permutation with the same size as the plain-image is used to shuffle the positions of all image pixels. An effective method is also presented to construct the large permutation quickly and easily by combining several small permutations, where the small permutations are generated directly from a chaotic map. In the diffusion stage, each pixel is enciphered by exclusive-or with the previously ciphered pixel and a random number produced by the Logistic map with different initial conditions. Test results and analysis using several security measures show that the proposed scheme is efficient and reliable, and can be applied to real-time image encryption.
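A toy version of the permutation-diffusion structure (a sketch, not the paper's construction: the large permutation is derived here by sorting a single Logistic-map sequence rather than by combining small permutations, and the key values are arbitrary):

```python
import numpy as np

def logistic_sequence(x0, r, n, burn=100):
    # Iterate the Logistic map x <- r*x*(1-x), discarding a transient.
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    seq = np.empty(n)
    for i in range(n):
        x = r * x * (1 - x)
        seq[i] = x
    return seq

KEY = (0.3456, 3.99, 0.7891, 3.97)  # hypothetical key: two (x0, r) pairs

def encrypt(img, key=KEY):
    flat = img.ravel()
    n = flat.size
    perm = np.argsort(logistic_sequence(key[0], key[1], n))  # permutation stage
    shuffled = flat[perm]
    stream = (logistic_sequence(key[2], key[3], n) * 256).astype(np.uint8)
    cipher = np.empty(n, dtype=np.uint8)
    prev = np.uint8(0)
    for i in range(n):  # diffusion stage: chain each pixel to the previous one
        prev = shuffled[i] ^ prev ^ stream[i]
        cipher[i] = prev
    return cipher.reshape(img.shape)

def decrypt(cipher, key=KEY):
    flat = cipher.ravel()
    n = flat.size
    perm = np.argsort(logistic_sequence(key[0], key[1], n))
    stream = (logistic_sequence(key[2], key[3], n) * 256).astype(np.uint8)
    shuffled = np.empty(n, dtype=np.uint8)
    prev = np.uint8(0)
    for i in range(n):  # undo the diffusion chain
        shuffled[i] = flat[i] ^ prev ^ stream[i]
        prev = flat[i]
    inv = np.empty_like(perm)
    inv[perm] = np.arange(n)  # invert the permutation
    return shuffled[inv].reshape(cipher.shape)
```

The chaining in the diffusion stage means a one-pixel change in the plain-image propagates into all subsequent cipher pixels, which is the avalanche property such schemes aim for.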

19.
The paper presents a methodology for building sequential decision support systems based on decision theory using value of information (DT-VOI based SDSSs for short). DT-VOI based SDSSs support decision-makers in difficult problems of sequential decision-making. In particular, we consider the problem of building DT-VOI based SDSSs capable of supporting decisions in critical situations where (1) making a decision entails knowing the states of some critical hypotheses, and such knowledge is acquired by performing suitable tests; (2) test outcomes are uncertain; (3) performing a test in general entails some drawbacks, so that a trade-off exists between these drawbacks and the value of the information provided by the test; (4) performing a test has the side-effect of changing the expected benefit of performing other tests; and (5) exceptional situations alter default probability and utility values.

20.
Clustering techniques play an important role in analyzing the high-dimensional data that are common in high-throughput screening, such as microarray and mass spectrometry data. Effective use of the high dimensionality and of replications can help to increase clustering accuracy and stability. In this article, a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high-dimensional low-sample-size (HDLSS) data, which contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, models the data as coming from a mixture distribution and uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity to separate the mixture components. PPCLUST is able to efficiently cluster a large number of variables in the presence of very few replications. Inheriting the robustness of rank procedures, the new algorithm is robust to outliers and invariant to monotone transformations of the data. Numerical studies and an application to microarray gene expression data from a colorectal cancer study are discussed.
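The core idea, p-values from a rank test used as a similarity measure, can be illustrated with a greedy sketch (not the published PPCLUST algorithm; the function name, the Kruskal-Wallis choice, and the single-representative comparison are all simplifications):

```python
import numpy as np
from scipy.stats import kruskal

def pvalue_cluster(data, alpha=0.05):
    # Greedy partitioning in the spirit of PPCLUST: a variable joins an
    # existing cluster when a nonparametric rank test cannot reject that
    # its replications and those of the cluster's first member share a
    # common distribution; otherwise it starts a new cluster.
    clusters = []
    for var, reps in data.items():
        for members in clusters:
            if kruskal(reps, data[members[0]]).pvalue > alpha:
                members.append(var)
                break
        else:
            clusters.append([var])
    return clusters
```

Because only ranks enter the test, the grouping is unchanged by any monotone transformation of the data, mirroring the invariance property claimed for PPCLUST.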
