期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Statistical analysis in dBASE-compatible databases

M Hauer-Jensen 《Computer applications in the biosciences》1991,7(1):61-62

相似文献

2.

Permutation test for incomplete paired data with application to cDNA microarray data

Donghyeon Yu 《Computational statistics & data analysis》2012,56(3):510-521

A paired data set is common in microarray experiments, where the data are often incompletely observed for some pairs due to various technical reasons. In microarray paired data sets, it is of main interest to detect differentially expressed genes, which are usually identified by testing the equality of means of expressions within a pair. While much attention has been paid to testing mean equality with incomplete paired data in previous literature, the existing methods commonly assume the normality of data or rely on the large sample theory. In this paper, we propose a new test based on permutations, which is free from the normality assumption and large sample theory. We consider permutation statistics with linear mixtures of paired and unpaired samples as test statistics, and propose a procedure to find the optimal mixture that minimizes the conditional variances of the test statistics, given the observations. Simulations are conducted for numerical power comparisons between the proposed permutation tests and other existing methods. We apply the proposed method to find differentially expressed genes for a colorectal cancer study. 相似文献

3.

Quantifying effects in two-sample environmental experiments using bootstrap confidence intervals

《Environmental Modelling & Software》2007,22(1):84-96

Two-sample experiments (paired or unpaired) are often used to analyze treatment effects in life and environmental sciences. Quantifying an effect can be achieved by estimating the difference in center of location between a treated and a control sample. In unpaired experiments, a shift in scale is also of interest. Non-normal data distributions can thereby impose a serious challenge for obtaining accurate confidence intervals for treatment effects. To study the effects of non-normality we analyzed robust and non-robust measures of treatment effects: differences of averages, medians, standard deviations, and normalized median absolute deviations in case of unpaired experiments, and average of differences and median of differences in case of paired experiments. A Monte Carlo study using bivariate lognormal distributions was carried out to evaluate coverage performances and lengths of four types of nonparametric bootstrap confidence intervals, namely normal, Student's t, percentile, and BCa for the estimated measures. The robust measures produced smaller coverage errors than their non-robust counterparts. On the other hand, the robust versions gave average confidence interval lengths approximately 1.5 times larger. In unpaired experiments, BCa confidence intervals performed best, while in paired experiments, Student's t was as good as BCa intervals. Monte Carlo results are discussed and recommendations on data sizes are presented. In an application to physiological source–sink manipulation experiments with sunflower, we quantify the effect of an increased or decreased source–sink ratio on the percentage of unfilled grains and the dry mass of a grain. In an application to laboratory experiments with wastewater, we quantify the disinfection effect of predatory microorganisms. The presented bootstrap method to compare two samples is broadly applicable to measured or modeled data from the entire range of environmental research and beyond. 相似文献

4.

A class of discrete distributions arising from difference of two random variables

S.H. Ong K. Shimizu 《Computational statistics & data analysis》2008,52(3):1490-1499

This paper considers a class of distributions arising from the difference of two discrete random variables belonging to the Panjer family of distributions. Some distributional properties and computation of probabilities are discussed. Goodness of fit and tests of hypotheses involving the likelihood ratio, score and Wald tests have been considered. As an illustration, an application to paired count data is given. 相似文献

5.

Using integrated weighted survival difference for the two-sample censored data problem

Seung-Hwan Lee Eun-Joo Lee 《Computational statistics & data analysis》2008,52(9):4410-4416

For the two-sample censored data problem, Pepe and Fleming [Pepe, M.S., Fleming, T.R., 1989. Weighted Kaplan-Meier statistics: A class of distance tests for censored survival data. Biometrics 45, 497-507] introduced the weighted Kaplan-Meier (WKM) statistics. From these statistics we define stochastic processes which can be approximated by zero-mean martingales. Conditional distributions of the processes, given data, can be easily approximated through simulation techniques. Based on comparison of these processes, we construct a supremum test to assess the model adequacy. Monte Carlo simulations are conducted to evaluate and compare the size and power properties of the proposed test to the WKM and the log-rank tests. The procedures are illustrated using real data. 相似文献

6.

Neighborhood Correlation Analysis for Semi-paired Two-View Data

Xudong Zhou Xiaohong Chen Songcan Chen 《Neural Processing Letters》2013,37(3):335-354

Canonical correlation analysis (CCA) is a widely used technique for analyzing two datasets (two views of the same objects). However, CCA needs that the samples of the two views are fully-paired. Actually, we are often faced up with the semi-paired scenario where the number of available paired samples is limited and yet the number of unpaired samples is sufficient. For such a scenario, CCA is generally prone to overfitting and thus performs poorly, since its definition itself makes it only able to utilize those paired samples. To overcome such a shortcoming, several semi-paired variants of CCA have been proposed. However, unpaired samples in these methods are just used in the way of single-view leaning to capture individual views’ structure information for regularizing CCA. Intuitively, using unpaired samples in the way of two-view learning should be more natural and more attractive since CCA itself is a two-view learning method. As a result, a novel CCAs semi-paired variant named Neighborhood Correlation Analysis (NeCA), which uses unpaired samples in the two-view learning way, is developed through incorporating between-view neighborhood relationships into CCA. The relationships are acquired through leveraging within-view neighborhood relationships of each view’s all data (including paired and unpaired data) and between-view paired information. Thus, it can take more sufficient advantage of the unpaired samples and then mitigate overfitting effectively caused by the limited paired data. Promising experiments results on several popular multi-view datasets show its feasibility and effectiveness. 相似文献

7.

Robust panel unit root tests for cross-sectionally dependent multiple time series

Dong Wan Shin Sangun Park 《Computational statistics & data analysis》2010,54(11):2801-2813

Robust panel unit root tests are developed for cross-sectionally dependent multiple time series. The tests have limiting null distributions derived from standard normal distributions. A Monte Carlo experiment shows that the tests have better finite sample robust performance than existing tests. Some Latin American real exchange rates revealing many outlying observations are analyzed to check the purchasing power parity (PPP) theory. 相似文献

8.

ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing

Ciprian Docan Fan Zhang Tong Jin Hoang Bui Qian Sun Julian Cummings Norbert Podhorszki Scott Klasky Manish Parashar 《Concurrency and Computation》2015,27(14):3724-3745

Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership‐class resources has become a critical challenge. The data have to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, and so on. Several recent research efforts have addressed data‐related challenges at different levels. One attractive approach is to offload expensive input/output operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still have to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data‐processing code to the staging area instead of moving the data to the data‐processing code. Specifically, we describe the ActiveSpaces framework, which provides (1) programming support for defining the data‐processing routines to be downloaded to the staging area and (2) runtime mechanisms for transporting codes associated with these routines to the staging area, executing the routines on the nodes that are part of the staging area, and returning the results. We also present an experimental performance evaluation of ActiveSpaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade‐offs between transporting data and transporting the code required for data processing during coupling, and we characterize sweet spots for each option. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

9.

Inference procedures for the Birnbaum–Saunders distribution and its generalizations

Simos G. Meintanis 《Computational statistics & data analysis》2010,54(2):367-373

Goodness-of-fit tests are constructed for the two-parameter Birnbaum–Saunders distribution in the case where the parameters are unknown and are therefore estimated from the data. With each test the procedure starts by computing efficient estimators of the parameters. Then the data are transformed to normality and normality tests are applied on the transformed data, thereby avoiding reliance on parametric asymptotic critical values or the need for bootstrap computations. Two classes of tests are considered, the first class being the classical tests based on the empirical distribution function, while the other class utilizes the empirical characteristic function. All methods are extended to cover the case of generalized three-parameter Birnbaum–Saunders distributions. 相似文献

10.

Sample size determination for paired right-censored data based on the difference of Kaplan–Meier estimates

《Computational statistics & data analysis》2014

Sample size determination is essential to planning clinical trials. Jung (2008) established a sample size calculation formula for paired right-censored data based on the logrank test, which has been well-studied for comparing independent survival outcomes. An alternative to rank-based methods for independent right-censored data, advocated by Pepe and Fleming (1989), tests for differences between integrated weighted Kaplan–Meier estimates and is more sensitive to the magnitude of difference in survival times between groups. In this paper, we employ the concept of the Pepe–Fleming method to determine an adequate sample size by calculating differences between Kaplan–Meier estimators considering pair-wise correlation. We specify a positive stable frailty model for the joint distribution of paired survival times. We evaluate the performance of the proposed method by simulation studies and investigate the impacts of the accrual times, follow-up times, loss to follow-up rate, and sensitivity of power under misspecification of the model. The results show that ignoring the pair-wise correlation results in overestimating the required sample size. Furthermore, the proposed method is applied to two real-world studies, and the R code for sample size calculation is made available to users. 相似文献

11.

基于弱匹配概率典型相关性分析的图像自动标注

张博郝杰马刚史忠植《软件学报》2017,28(2):292-309

针对弱匹配多模态数据的相关性建模问题,提出了一种弱匹配概率典型相关性分析模型（semi-paired probabilistic CCA,简称SemiPCCA）.SemiPCCA模型关注于各模态内部的全局结构,模型参数的估计受到了未匹配样本的影响,而未匹配样本则揭示了各模态样本空间的全局结构.在人工弱匹配多模态数据集上的实验结果表明,SemiPCCA可以有效地解决传统CCA（canonical correlation analysis）和PCCA（probabilistic CCA）在匹配样本不足的情况下出现的过拟合问题,取得了较好的效果.提出了一种基于SemiPCCA的图像自动标注方法.该方法基于关联建模的思想,同时使用标注图像及其关键词和未标注图像学习视觉模态和文本模态之间的关联,从而能够更准确地对未知图像进行标注. 相似文献

12.

A goodness of fit test for the Pareto distribution in the presence of Type II censoring, based on the cumulative hazard function

Dayna P. Saldaña-Zepeda Humberto Vaquera-Huerta 《Computational statistics & data analysis》2010,54(4):833-1303

A goodness of fit test for the Pareto distribution, when the observations are subjected to Type II right censoring is proposed. The test statistic involves transformations of the original data and is based on the nonparametric Nelson-Aalen estimator of the cumulative hazard function. By Monte Carlo simulation, the empirical distribution of the test statistic is obtained and the power of the test is investigated for some alternative distributions. The power is compared with adaptations for Type II censored data of the Crámer-von Mises and Anderson-Darling tests, and a test based on Kullback-Leibler information. For some alternative distributions with monotone decreasing hazard function, the proposed test has higher power. The methodology is illustrated by reanalyzing two published data sets. 相似文献

13.

A compound class of Weibull and power series distributions

Alice Lemos Morais Wagner Barreto-Souza 《Computational statistics & data analysis》2011,55(3):1410-1425

In this paper we introduce the Weibull power series (WPS) class of distributions which is obtained by compounding Weibull and power series distributions, where the compounding procedure follows same way that was previously carried out by Adamidis and Loukas (1998). This new class of distributions has as a particular case the two-parameter exponential power series (EPS) class of distributions (Chahkandi and Ganjali, 2009), which contains several lifetime models such as: exponential geometric (Adamidis and Loukas, 1998), exponential Poisson (Kus, 2007) and exponential logarithmic (Tahmasbi and Rezaei, 2008) distributions. The hazard function of our class can be increasing, decreasing and upside down bathtub shaped, among others, while the hazard function of an EPS distribution is only decreasing. We obtain several properties of the WPS distributions such as moments, order statistics, estimation by maximum likelihood and inference for a large sample. Furthermore, the EM algorithm is also used to determine the maximum likelihood estimates of the parameters and we discuss maximum entropy characterizations under suitable constraints. Special distributions are studied in some detail. Applications to two real data sets are given to show the flexibility and potentiality of the new class of distributions. 相似文献

14.

Periodic Poisson Processes and Almost-lack-of-memory Distributions

B. N. Dimitrov V. V. Rykov Z. L. Krougly 《Automation and Remote Control》2004,65(10):1597-1610

Certain characterization properties of time-varying periodic Poisson flows are studied in terms of almost-lack-of-memory (ALM) distributions. Parameter estimation formulas are derived. A method for verifying the hypothesis on the membership of a sample to the class of ALM-distributions is developed. Algorithms for computing critical levels and power of the likelihood ratio test by the Monte Carlo method are designed. 相似文献

15.

Maximum-power validation of models without higher-order fitting

Torsten Bohlin 《Automatica》1978,14(2):137-146

A new solution is presented to the problem of validating optimally a given dynamic model against given long-sample observations. If the model can be parametrized and cast into a general innovations structure, i.e. if expressions for the one-step predictor and the prediction error covariances are available, a test can be constructed that has asymptotic maximum discriminating power, for the least favourable case that the difference to be detected between model and observed system is small. A class of alternative models must be specified, but, unlike in other optimal tests, it is not required also to fit a best model within this class. Since the alternative class may include models more complicated than that to be validated, the test can be used for recursive determination of structure and order. For linear transfer-function or polynomial-operator models the asymptotic maximum-power test does not require much more computing, and sometimes less, than the conventional tests of auto- and cross-correlation. Generally, the latter tests are less efficient, even for linear models, if these is some a priori knowledge about the structure. A simple example demonstrates that there are realistic cases where the asymptotic maximum-power test may be considerably better. 相似文献

16.

Another step to the full GPU implementation of the weather research and forecasting model

Juan Pablo Silva José Hagopian Marcel Burdiat Ernesto Dufrechou Martín Pedemonte Alejandro Gutiérrez Gabriel Cazes Pablo Ezzatti 《The Journal of supercomputing》2014,70(2):746-755

Uruguay is currently undergoing a gradual process of inclusion of wind energy in its matrix of electric power generation. In this context, a computational tool has been developed to predict the electrical power that will be injected into the grid. The tool is based on the Weather Research and Forecasting (WRF) numerical model, which is the performance bottleneck of the application. For this reason, and in line with several successful efforts of other researchers, this article presents advances in porting the WRF to GPU. In particular, we present the implementation of sintb and bdy_interp1 routines on GPU and the integration of these routines with previous efforts from other authors. The speedup values obtained for the newly ported routines on a Nvidia GeForce GTX 480 GPU are up to \(33.9\times \) when compared with the sequential WRF and \(9.2\times \) when compared with the four-threaded WRF. The integration of the newly ported routines along with previous works produces a reduction of more than a 30 % in the total runtime of the multi-core four-threaded WRF and of more than a 50 % in the single-threaded version. 相似文献

17.

Computing the nonparametric estimator of the survivor function when all observations are either left- or right-censored with tied observation times

《Computers & Operations Research》2002,29(5):423-431

A data set of missiles tested at various times consists entirely of left- and right-censored observations. We present an algorithm for computing the nonparametric maximum likelihood estimator of the survivor function. When there are a significant number of tied observations, the algorithm saves significant computation time over a direct implementation of the survivor function estimate given in Andersen and Rønn, (Biometrics 1995; 51:323–9).Scope and purposeThe algorithm presented here is of use to a modeler interested in computing the nonparametric maximum likelihood estimator of the survivor function for a data set that consists solely of left- and right-censored observations, which also contains tied observation times. This estimator is of use to a modeler in analyzing a data set of nonnegative response times, as in the case of a reliability engineer modeling component survival times or a biostatistician modeling patient survival times. 相似文献

18.

Bayesian nonlinear regression models with scale mixtures of skew-normal distributions: Estimation and case influence diagnostics

Vicente G. Cancho Dipak K. DeyVictor H. Lachos Marinho G. Andrade 《Computational statistics & data analysis》2011,55(1):588-602

The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. 相似文献

19.

Nonstationary model validation from finite data records

Baram Y. 《Automatic Control, IEEE Transactions on》1980,25(1):10-19

The problems associated with testing a dynamical model, using a data record of finite length, are insufficiency of the data for statistically meaningful decisions, coupling of mean, covariance, and correlation related errors, difficulty of detecting midcourse model departures, and inadequacy of traditional techniques for computing test power for given model alternatives. This paper attempts to provide a comprehensive analysis of nonstationary models via significance tests, specifically addressing these problems. Data records from single and from multiple system operations are analyzed, and the models considered are possibly varying both with respect to time and with respect to operations. Quadratic form distributions prove effective in the statistical analysis. 相似文献

20.

Value of information based scheduling of cloud computing resources

《Future Generation Computer Systems》2017

Traditionally, heavy computational tasks were performed on a dedicated infrastructure requiring a heavy initial investment, such as a supercomputer or a data center. Grid computing relaxed the assumptions of the fixed infrastructure, allowing the sharing of remote computational resources. Cloud computing brought these ideas into the commercial realm and allows users to request on demand an essentially unlimited amount of computing power. However, in contrast to previous assumptions, this computing power is metered and billed on an hour-by-hour basis.In this paper, we are considering applications where the output quality increases with the deployed computational power, a large class including applications ranging from weather prediction to financial modeling. We are proposing a computation scheduling that considers both the financial cost of the computation and the predicted financial benefit of the output, that is, its value of information (VoI). We model the proposed approach for an example of analyzing real-estate investment opportunities in a competitive environment. We show that by using the VoI-based scheduling algorithm, we can outperform minimalistic computing approaches, large but fixedly allocated data centers and cloud computing approaches that do not consider the VoI. 相似文献