Similar Literature
20 similar documents found.
1.
Sample size determination is essential to planning clinical trials. Jung (2008) established a sample size calculation formula for paired right-censored data based on the log-rank test, which has been well studied for comparing independent survival outcomes. An alternative to rank-based methods for independent right-censored data, advocated by Pepe and Fleming (1989), tests for differences between integrated weighted Kaplan–Meier estimates and is more sensitive to the magnitude of the difference in survival times between groups. In this paper, we employ the Pepe–Fleming approach to determine an adequate sample size by calculating differences between Kaplan–Meier estimators while accounting for pair-wise correlation. We specify a positive stable frailty model for the joint distribution of paired survival times. We evaluate the performance of the proposed method in simulation studies and investigate the impact of accrual time, follow-up time, and the loss-to-follow-up rate, as well as the sensitivity of power to model misspecification. The results show that ignoring the pair-wise correlation leads to overestimating the required sample size. The proposed method is applied to two real-world studies, and R code for the sample size calculation is made available to users.
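As a rough illustration of the Pepe–Fleming idea (not the paper's paired-data sample size formula), the sketch below integrates the difference between two Kaplan–Meier curves; the lifelines dependency, the simulated data, and all names are my own assumptions.

```python
# Sketch: integrated difference of Kaplan-Meier curves (Pepe-Fleming idea).
# Ignores pairing, weighting, and variance estimation; illustration only.
import numpy as np
from lifelines import KaplanMeierFitter

def integrated_km_difference(time1, event1, time2, event2, tau):
    """Integrate S1(t) - S2(t) over [0, tau] on a fine grid."""
    km1 = KaplanMeierFitter().fit(time1, event_observed=event1)
    km2 = KaplanMeierFitter().fit(time2, event_observed=event2)
    grid = np.linspace(0.0, tau, 1000)
    s1 = km1.survival_function_at_times(grid).to_numpy()
    s2 = km2.survival_function_at_times(grid).to_numpy()
    return np.trapz(s1 - s2, grid)

# Toy usage with simulated exponential survival times and random censoring.
rng = np.random.default_rng(0)
t1, t2 = rng.exponential(1.0, 100), rng.exponential(1.5, 100)
c1, c2 = rng.exponential(2.0, 100), rng.exponential(2.0, 100)
print(integrated_km_difference(np.minimum(t1, c1), (t1 <= c1).astype(int),
                               np.minimum(t2, c2), (t2 <= c2).astype(int),
                               tau=2.0))
```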

2.
Trace contaminants in water, including metals and organics, are often measured at concentrations low enough to be reported only as values below the instrument detection limit. Interpretation of these “less thans” is complicated when multiple detection limits occur. Statistical methods for multiply censored (multiple-detection-limit) datasets have been developed in medical and industrial statistics and can be employed to estimate summary statistics or to model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data, and it is applicable to any dataset with 0 to 80% of its values censored. These tools are part of a software library, or add-on package, for the R environment for statistical computing. The library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards.
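The tools described above handle multiple detection limits; the sketch below illustrates only the core ROS idea for the simpler single-detection-limit case. The lognormal assumption, the toy data, and all names are mine.

```python
# Sketch: regression on order statistics (ROS) with one detection limit.
# Assumes all censored values lie below a single limit and a lognormal model.
import numpy as np
from scipy import stats

def simple_ros(detected, n_censored):
    """Impute censored values by regressing log(detected) on normal scores."""
    n = len(detected) + n_censored
    det_sorted = np.sort(detected)
    # Plotting positions: the censored observations occupy the lowest ranks.
    pp_detected = (n_censored + np.arange(1, len(detected) + 1) - 0.5) / n
    res = stats.linregress(stats.norm.ppf(pp_detected), np.log(det_sorted))
    pp_censored = (np.arange(1, n_censored + 1) - 0.5) / n
    imputed = np.exp(res.intercept + res.slope * stats.norm.ppf(pp_censored))
    full_sample = np.concatenate([imputed, det_sorted])
    return full_sample.mean(), full_sample.std(ddof=1)

# Toy usage: 12 detected concentrations plus 5 nondetects below the limit.
mean, sd = simple_ros(detected=np.array([0.6, 0.8, 0.9, 1.1, 1.3, 1.4,
                                         1.8, 2.2, 2.5, 3.1, 4.0, 5.2]),
                      n_censored=5)
print(mean, sd)
```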

3.
In many applied situations, it is difficult to specify in advance the types of survival differences that may exist between two groups. It is therefore tempting to use tests that emphasize these differences but remain sensitive to a wide range of survival differences. In this paper such versatile tests are considered; their procedures are based on the simultaneous use of weighted log-rank statistics that are asymptotically normal under the null hypothesis of no difference between the two groups. Simulations are performed to examine the power of the tests for small and moderate sample sizes, with data ranging from uncensored to heavily censored. Implementation of the procedures is discussed in a real data example for illustration.
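A minimal sketch of one such versatile procedure, assuming Fleming–Harrington weights w(t) = S(t-)^p (1 - S(t-))^q: compute several weighted log-rank statistics and take the maximum of their absolute values. The weight choices and all names are mine, and the null distribution of the maximum still requires the joint normal approximation or simulation.

```python
# Sketch: maximum of several weighted log-rank statistics (two groups).
import numpy as np

def weighted_logrank_z(time, event, group, p=0.0, q=0.0):
    """Standardized weighted log-rank statistic; group is coded 0/1."""
    time, event, group = map(np.asarray, (time, event, group))
    num, var, s_pool = 0.0, 0.0, 1.0   # s_pool = pooled KM just before t
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n_risk = at_risk.sum()
        n_risk1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = s_pool ** p * (1.0 - s_pool) ** q
        num += w * (d1 - d * n_risk1 / n_risk)
        if n_risk > 1:
            var += w ** 2 * d * (n_risk1 / n_risk) * (1 - n_risk1 / n_risk) \
                   * (n_risk - d) / (n_risk - 1)
        s_pool *= 1.0 - d / n_risk
    return num / np.sqrt(var)

def versatile_statistic(time, event, group):
    """Max of |Z| over log-rank, early-difference and late-difference weights."""
    zs = [weighted_logrank_z(time, event, group, p, q)
          for p, q in [(0, 0), (1, 0), (0, 1)]]
    return max(abs(z) for z in zs)
```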

4.
In many applied situations, it is difficult to specify in advance the types of survival differences that may exist between two groups. It is therefore tempting to use tests that emphasize these differences but remain sensitive to a wide range of survival differences. In this paper such versatile tests are considered; their procedures are based on the simultaneous use of weighted log-rank statistics that are asymptotically normal under the null hypothesis of no difference between the two groups. Simulations are performed to examine the power of the tests for small and moderate sample sizes, with data ranging from uncensored to heavily censored. Implementation of the procedures is discussed in a real data example for illustration.

5.
The two-parameter Birnbaum-Saunders distribution has been used successfully to model fatigue failure times. Although censoring is typical in reliability and survival studies, little work has been published on the analysis of censored data for this distribution. In this paper, we address testing inference on the two parameters of the Birnbaum-Saunders distribution under type-II right-censored samples. The likelihood ratio statistic and a recently proposed statistic, the gradient statistic, provide a convenient framework for statistical inference in this case, since they do not require obtaining, estimating, or inverting an information matrix, which is an advantage in problems involving censored data. An extensive Monte Carlo simulation study is carried out to investigate and compare the finite-sample performance of the likelihood ratio and gradient tests. Our numerical results show evidence that the gradient test should be preferred. Further, we also consider the generalized Birnbaum-Saunders distribution under type-II right-censored samples and present Monte Carlo simulations for testing the parameters in this class of models using the likelihood ratio and gradient tests. Three empirical applications are presented.
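A minimal sketch of the type-II right-censored log-likelihood for a two-parameter Birnbaum-Saunders model, using SciPy's fatiguelife parameterization (shape alpha, scale beta); the likelihood ratio and gradient test statistics themselves are not reproduced, and the simulated data and names are my assumptions.

```python
# Sketch: ML fitting of a Birnbaum-Saunders model under type-II right censoring.
# scipy.stats.fatiguelife is the Birnbaum-Saunders distribution (shape c = alpha).
import numpy as np
from scipy import stats, optimize

def type2_censored_negloglik(params, observed_order_stats, n):
    """Negative log-likelihood (up to a constant) when only the m smallest
    of n failure times are observed."""
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    m = len(observed_order_stats)
    dist = stats.fatiguelife(c=alpha, scale=beta)
    ll = dist.logpdf(observed_order_stats).sum()
    ll += (n - m) * dist.logsf(observed_order_stats.max())   # censored tail
    return -ll

# Toy usage: simulate n = 50 failure times, keep the smallest m = 30.
rng = np.random.default_rng(1)
full = stats.fatiguelife(c=0.5, scale=2.0).rvs(size=50, random_state=rng)
obs = np.sort(full)[:30]
fit = optimize.minimize(type2_censored_negloglik, x0=np.array([1.0, 1.0]),
                        args=(obs, 50), method="Nelder-Mead")
print(fit.x)   # estimated (alpha, beta)
```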

6.
Quantile regression is a widely used regression technique that models the entire conditional distribution of the response variable. A natural extension to censored observations has been introduced using a reweighting scheme based on the Kaplan-Meier estimator. The same ideas can be applied to depth quantiles, which leads to regression quantiles for censored data that are robust to outliers in both the predictor and the response variable. For their computation, a fast algorithm over a grid of quantile values is proposed. The robustness of the method is shown in a simulation study and on two real data examples.
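A rough sketch of the Kaplan-Meier reweighting idea behind censored quantile regression: weight uncensored observations by the inverse of the estimated censoring survival function and minimize a weighted check (pinball) loss. The lifelines dependency, the toy data, and all names are assumptions; the robust depth-quantile variant of the paper is not reproduced.

```python
# Sketch: censored quantile regression via inverse-probability-of-censoring weights.
import numpy as np
from scipy.optimize import minimize
from lifelines import KaplanMeierFitter

def ipcw_quantile_regression(X, time, event, tau=0.5):
    """Fit a linear tau-quantile model to censored responses with KM weights."""
    # KM estimate of the censoring survival function G(t): censorings are "events".
    km_cens = KaplanMeierFitter().fit(time, event_observed=1 - event)
    G = km_cens.survival_function_at_times(time).to_numpy()
    weights = np.where(event == 1, 1.0 / np.clip(G, 1e-8, None), 0.0)

    Xd = np.column_stack([np.ones(len(time)), X])    # add intercept

    def weighted_pinball(beta):
        r = time - Xd @ beta
        return np.sum(weights * np.maximum(tau * r, (tau - 1) * r))

    return minimize(weighted_pinball, x0=np.zeros(Xd.shape[1]),
                    method="Nelder-Mead").x

# Toy usage: true median is 1.0 + 0.5 * x, with random right censoring.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, 200)
t_true = 1.0 + 0.5 * x + rng.normal(0, 0.2, 200)
c = rng.uniform(0.5, 4.0, 200)
beta_hat = ipcw_quantile_regression(x, np.minimum(t_true, c),
                                    (t_true <= c).astype(int))
print(beta_hat)   # roughly [1.0, 0.5]
```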

7.
This note examines the problem of statistical inference of the initial states of a linear discrete dynamic system based on a set of cross-sectional data. Several compressed data structures are proposed to reduce the amount of the cross-sectional data obtained from multiple independent experiments. It is shown that these data structures are sufficient statistics for estimating the mean and the covariance of the initial states, given the entire raw data from multiple experiments. Thus, the identification and the validation of these parameters can be performed with reduced data structures without referring back to the entire raw data and the original dynamics. For the identification of these parameters, the E-M procedure presented in [1] can be applied to this case. For the validation of these parameters having specified values, simple tests of "significance" type are proposed. The major advantage of these tests over the generalized likelihood ratio test is that their probability distributions are known and computable under both the null and the alternative hypotheses even for the finite sample case, i.e., the asymptotic assumption is not necessary.

8.
There has been a growing recognition that issues of data quality, which are routine in practice, can materially affect the assessment of learned model performance. In this paper, we develop some analytic results that are useful in sizing the biases associated with tests of discriminatory model power when these are performed using corrupt (“noisy”) data. As it is sometimes unavoidable to test models with data that are known to be corrupt, we also provide some guidance on interpreting results of such tests. In some cases, with appropriate knowledge of the corruption mechanism, the true values of the performance statistics such as the area under the ROC curve may be recovered (in expectation), even when the underlying data have been corrupted. We also provide estimators of the standard errors of such recovered performance statistics. An analysis of the estimators reveals interesting behavior including the observation that “noisy” data does not “cancel out” across models even when the same corrupt data set is used to test multiple candidate models. Because our results are analytic, they may be applied in a broad range of settings and this can be done without the need for simulation.
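The paper's analytic bias corrections are not reproduced here; the short simulation below merely illustrates the phenomenon it studies, namely how label noise attenuates the measured AUC. The flip rate, score model, and names are my own choices.

```python
# Simulation: measured AUC shrinks toward 0.5 when test labels are corrupted.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 20000
y_true = rng.integers(0, 2, n)
# A discriminating score: higher on average for the positive class.
score = rng.normal(loc=y_true * 1.5, scale=1.0)

flip = rng.random(n) < 0.15            # 15% of labels recorded incorrectly
y_noisy = np.where(flip, 1 - y_true, y_true)

print("AUC on clean labels :", round(roc_auc_score(y_true, score), 3))
print("AUC on noisy labels :", round(roc_auc_score(y_noisy, score), 3))
```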

9.
Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features, such as local extrema, are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests for modal regions. Distributional properties are given for the gradient and curvature estimators, and pointwise test statistics are derived. The hypothesis tests extend the two-dimensional feature significance ideas of Godtliebsen et al. [Godtliebsen, F., Marron, J.S., Chaudhuri, P., 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11, 1-21]. The theoretical framework is complemented by novel visualization for three-dimensional data. Applications to real data sets show that tests based on the kernel curvature estimators perform well in identifying modal regions, and these results can be enhanced by corresponding tests with kernel gradient estimators.

10.
Nateghi Haredasht, Fateme; Vens, Celine. Machine Learning (2022) 111(11): 4139-4157.

Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often organizational and financial obstacles that can lead to reduced data collection and, in turn, complicate subsequent analyses. In contrast, there is often plenty of baseline data available for patients with similar characteristics and background information, e.g., from patients who fall outside the study time window. In this article, we investigate whether we can benefit from including such unlabeled data instances to predict accurate survival times. In other words, we introduce a third level of supervision in the context of survival analysis: apart from fully observed and censored instances, we also include unlabeled instances. We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets. Our results demonstrate that all approaches are able to increase predictive performance on independent test data. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving large improvements compared with not using unlabeled data.

11.
Dallas and Rao (Biometrics 56 (2000) 154) proposed a class of permutation tests for testing the equality of two survival distributions based on randomly right censored survival time data consisting of both paired and unpaired observations. Data sets of this type can occur frequently in medical settings. Two members of this class were advocated for use due to their generally high power for detecting scale and location shifts in the exponential and log-logistic distributions for the survival times, and improved power over paired data test procedures that disregard unpaired observations. Because the computations for the tests become quite laborious as the sample sizes increase, computing routines are required for practical implementation of these tests. This paper provides computing routines to execute the tests.

12.
The GCD test and the Banerjee–Wolfe test are the two tests traditionally used to determine statement data dependence, subject to direction vectors, in automatic vectorization/parallelization of loops. In an earlier study, a sufficient condition for the accuracy of the Banerjee–Wolfe test was stated and proved. In that work, we only considered the case of general data dependence, i.e., the case of data dependence without direction vector information. In this paper, we extend the previous result to the case of data dependence subject to an arbitrary direction vector. We also state and prove a sufficient condition for the accuracy of a combination of the GCD and Banerjee–Wolfe tests. Furthermore, we show that the sufficient conditions for the accuracy of the Banerjee–Wolfe test and for the accuracy of a combination of the GCD and Banerjee–Wolfe tests are necessary conditions as well. Finally, we demonstrate how these results can be used in actual practice to obtain exact data dependence information.
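For concreteness, a small sketch of the two classical tests for a single-subscript dependence equation a*i - b*j = c with both loop indices running over 1..U; the function names and bounds are illustrative, and the direction-vector refinements discussed in the paper are omitted.

```python
# Sketch: GCD test and a Banerjee-style bounds test for the dependence
# equation a*i - b*j = c arising from accesses X[a*i + k1] and X[b*j + k2],
# with c = k2 - k1 and loop indices i, j ranging over 1..U.
from math import gcd

def gcd_test(a, b, c):
    """Integer solutions of a*i - b*j = c exist only if gcd(a, b) divides c."""
    return c % gcd(a, b) == 0

def banerjee_test(a, b, c, U):
    """c must also lie between the min and max of a*i - b*j over 1 <= i, j <= U."""
    lo = min(a * 1, a * U) - max(b * 1, b * U)
    hi = max(a * 1, a * U) - min(b * 1, b * U)
    return lo <= c <= hi

# Example: for i in 1..10 { X[2*i] = ...; ... = X[2*i + 1] }  ->  2*i - 2*j = 1
print(gcd_test(2, 2, 1))           # False: gcd rules out any dependence
print(banerjee_test(2, 2, 1, 10))  # True: bounds alone cannot rule it out
```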

13.
In clinical trials, information about certain time points may be of interest in making decisions about treatment effectiveness. Therefore, rather than comparing entire survival curves, researchers may wish to focus the comparison on fixed time points with potential clinical utility. For two independent samples of right-censored data, Klein et al. (2007) compared survival probabilities at a fixed time point by studying a number of tests based on transformations of the Kaplan-Meier estimators of the survival function. To compare the survival probabilities at a fixed time point for paired right-censored data or clustered right-censored data, however, their approach requires modification. In this paper, we extend the statistics to accommodate possible within-pair and within-cluster correlation. We use simulation studies to present comparative results. Finally, we illustrate the implementation of these methods using two real data sets.
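As background, here is a sketch of the independent-samples comparison that Klein et al. (2007) start from: Kaplan-Meier estimates at a fixed time with Greenwood variances, compared on the complementary log-log scale. The paired/clustered correlation adjustment of this paper is not included, and all names are mine.

```python
# Sketch: compare two independent KM survival estimates at a fixed time t0
# using the cloglog transformation; no pairing or clustering adjustment.
import numpy as np
from scipy import stats

def km_with_greenwood(time, event, t0):
    """Kaplan-Meier estimate S(t0) and its Greenwood variance."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=int)
    s, var_sum = 1.0, 0.0
    for t in np.unique(time[event == 1]):       # distinct event times, sorted
        if t > t0:
            break
        n_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        s *= 1.0 - d / n_risk
        var_sum += d / (n_risk * (n_risk - d))
    return s, s ** 2 * var_sum

def cloglog_test(time1, event1, time2, event2, t0):
    s1, v1 = km_with_greenwood(time1, event1, t0)
    s2, v2 = km_with_greenwood(time2, event2, t0)
    # Delta-method variance of log(-log S): Var(S) / (S * log S)^2.
    z = (np.log(-np.log(s1)) - np.log(-np.log(s2))) / np.sqrt(
        v1 / (s1 * np.log(s1)) ** 2 + v2 / (s2 * np.log(s2)) ** 2)
    return z, 2 * stats.norm.sf(abs(z))
```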

14.
A variation of maximum likelihood estimation (MLE) of parameters that uses the probability density functions of order statistics is presented. Results of this method are compared with traditional maximum likelihood estimation for complete and right-censored samples in a life test. Although the concept can be applied to most types of censored data sets, results are presented for the case of order statistic interval censoring, in which even a few order statistics estimate the parameters well compared with estimates from complete and right-censored samples. Distributions investigated include the exponential, Rayleigh, and normal distributions. Computation methods using A Probability Programming Language running in Maple are more straightforward than existing methods based on various numerical algorithms.

15.
The conventional convolutional neural network (CNN) has proven effective for synthetic aperture radar (SAR) target recognition. However, the relationship between different convolutional kernels is not taken into account, which limits the feature extraction capability of the convolutional layer to a certain extent. To address this problem, this paper presents a novel method named the weighted kernel CNN (WKCNN). WKCNN integrates a weighted kernel module (WKM) into the common CNN architecture. The WKM is proposed to model the interdependence between different kernels and thus improve the feature extraction capability of the convolutional layer. The WKM consists of variables and activations: each variable represents the weight of a convolutional kernel, and the activation is a mapping function that determines the range of the weight. To adjust the variables adaptively, a backpropagation (BP) algorithm for the WKM is derived. Training of the WKM is driven by optimizing the cost function according to the BP algorithm, and three training modes are presented and analysed. SAR target recognition experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset, and the results show the superiority of the proposed method.
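One plausible reading of the weighted kernel module, sketched in PyTorch: each convolutional kernel's output map is scaled by a learnable weight passed through a sigmoid activation. This is my interpretation of the description above, not the authors' released code, and the layer sizes are arbitrary.

```python
# Sketch: a "weighted kernel" convolution block - each output channel (kernel)
# is scaled by a learnable weight squashed through a sigmoid. Interpretation only.
import torch
import torch.nn as nn

class WeightedKernelConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
        # One scalar weight per convolutional kernel, trained by backpropagation.
        self.kernel_weights = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        y = self.conv(x)
        w = torch.sigmoid(self.kernel_weights)     # maps each weight into (0, 1)
        return y * w.view(1, -1, 1, 1)             # scale each channel's feature map

# Toy usage on a batch of single-channel 64x64 "SAR chips".
block = WeightedKernelConv(1, 16, kernel_size=3)
print(block(torch.randn(8, 1, 64, 64)).shape)      # torch.Size([8, 16, 64, 64])
```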

16.
In this paper, we establish several recurrence relations for the single and product moments of progressively Type-II right-censored order statistics from a half-logistic distribution. Used in a systematic recursive manner, these relations enable one to compute all the means, variances, and covariances of progressively Type-II right-censored order statistics from the half-logistic distribution for all sample sizes n, effective sample sizes m, and all progressive censoring schemes (R1, …, Rm). The results established here generalize the corresponding results for the usual order statistics due to Balakrishnan (1985). These moments are then used to derive best linear unbiased estimators of the scale and location-scale parameters of the half-logistic distribution, and a comparison of these estimators with the maximum likelihood estimates is made. Best linear unbiased prediction of censored failure times is discussed briefly. Finally, two numerical examples are presented to illustrate all the inferential methods developed here.
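For reference, the standard (unit-scale, zero-location) half-logistic density and distribution function underlying these recurrence relations are:

```latex
f(x) = \frac{2\,e^{-x}}{\left(1 + e^{-x}\right)^{2}}, \qquad
F(x) = \frac{1 - e^{-x}}{1 + e^{-x}}, \qquad x \ge 0 .
```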

17.
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture models (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are taken as the attributes of the secondary feature vector of each frame. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on the classification of phonemes within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives relative to MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained by using WKM clustering in the classification of front vowels.
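A minimal sketch of the clustering-based secondary feature extraction described above, using scikit-learn's GaussianMixture and KMeans. The diagonal covariance choice, the feature-vector layout, the toy data, and all names are my assumptions.

```python
# Sketch: build a low-dimensional "secondary" feature vector from a frame's
# spectro-temporal coefficients via GMM or (weighted) K-means clustering.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

def secondary_features_gmm(frame_points, n_clusters=4):
    """frame_points: (n_points, dim) spectro-temporal coefficients of one frame.
    Returns concatenated cluster means and (diagonal) covariances."""
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                          random_state=0).fit(frame_points)
    return np.concatenate([gmm.means_.ravel(), gmm.covariances_.ravel()])

def secondary_features_wkm(frame_points, weights, n_clusters=4):
    """Weighted K-means variant: sample_weight carries per-point weights."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(frame_points, sample_weight=weights)
    return km.cluster_centers_.ravel()

# Toy usage: 200 random "spectro-temporal" points of dimension 2.
pts = np.random.default_rng(4).normal(size=(200, 2))
print(secondary_features_gmm(pts).shape)                              # (16,)
print(secondary_features_wkm(pts, weights=np.abs(pts[:, 0])).shape)   # (8,)
```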

18.
Portmanteau test statistics are useful diagnostic tools for checking the adequacy of multivariate time series models. For stationary and partially non-stationary vector time series models, Duchesne and Roy [Duchesne, P., Roy, R., 2004. On consistent testing for serial correlation of unknown form in vector time series models. Journal of Multivariate Analysis 89, 148-180] and Duchesne [Duchesne, P., 2005a. Testing for serial correlation of unknown form in cointegrated time series models. Annals of the Institute of Statistical Mathematics 57, 575-595] proposed kernel-based test statistics, obtained by comparing the spectral density of the errors under the null hypothesis of non-correlation with a kernel-based spectral density estimator; these test statistics are asymptotically standard normal under the null hypothesis of non-correlation in the error term of the model. Following the method of Chen and Deo [Chen, W.W., Deo, R.S., 2004a. Power transformations to induce normality and their applications. Journal of the Royal Statistical Society, Ser. B 66, 117-130], we determine an appropriate power transformation to improve the normal approximation in small samples. Additional corrections for the mean and variance of the distance measures entering these test statistics are obtained. An alternative way to approximate the finite-sample distribution of the test statistics is the bootstrap; we introduce bootstrap-based versions of the original spectral test statistics. In a Monte Carlo study, comparisons are made under various alternatives between the original spectral test statistics, the new corrected test statistics, the bootstrap-based versions, and the classical Hosking portmanteau test statistic.

19.
The likelihood approach based on empirical distribution functions is a well-accepted statistical tool for testing. However, the proof schemes of Neyman–Pearson-type lemmas lead to consideration of density-based likelihood ratios in order to obtain powerful test statistics. In this article, we introduce a distribution-free, density-based likelihood technique for goodness-of-fit testing. We focus on tests for normality and uniformity, which are common tasks in applied studies. The well-known goodness-of-fit tests based on sample entropy are shown to be a product of the proposed empirical likelihood (EL) methodology. Although the efficiency of test statistics based on classes of entropy estimators has been widely addressed in the statistical literature, estimation of the sample entropy has not been defined invariantly, and hence produces tests that are difficult to apply in real data studies. The proposed EL approach yields clear forms of the entropy-based tests. Monte Carlo simulation results confirm the advantage of the proposed method from a power perspective. Real data examples illustrate the proposed approach in practice.
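For context, here is a sketch of the classical spacing-based sample entropy estimator (Vasicek's estimator) that underlies many entropy-based goodness-of-fit tests. The paper's density-based empirical likelihood statistics are not reproduced, and the window size heuristic and names are my own choices.

```python
# Sketch: Vasicek's spacing-based estimator of differential entropy, a
# building block of entropy-based goodness-of-fit tests for normality.
import numpy as np

def vasicek_entropy(x, m=None):
    """H_mn = (1/n) * sum log( n/(2m) * (x_(i+m) - x_(i-m)) ), indices clipped."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if m is None:
        m = max(1, int(np.sqrt(n)))      # a common heuristic window size
    upper = x[np.minimum(np.arange(n) + m, n - 1)]
    lower = x[np.maximum(np.arange(n) - m, 0)]
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

# Under normality, exp(H) / sample_sd is close to sqrt(2*pi*e) ~ 4.13;
# entropy-based normality tests reject when this ratio is too small.
rng = np.random.default_rng(5)
sample = rng.normal(size=500)
print(np.exp(vasicek_entropy(sample)) / sample.std(ddof=1))
```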

20.
NPSTAT compares the location (median), dispersion (variance), and overall shape of 2 or more groups of data (samples), using the Kruskal-Wallis and Van der Waerden tests (> 2 groups), Mann-Whitney and Kolmogorov-Smirnov tests (2 groups), and squared ranks test (2 or more groups). Exact or approximate significance levels of test statistics are calculated in all situations. Multivariate data are treated one variable at a time. NPSTAT reproduces results from textbooks despite previous inconsistencies with calculation and significance assessment methods. Commercial routines (e.g. in MINITAB, NAG) cover only 3 of the 5 tests programmed, but yield consistent results for these. NPSTAT is useful particularly for comparing nonnormally distributed data, multivariate data with missing values, and data measured only on an ordinal scale. It also can be used to assess outlying values.
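Three of the five tests that NPSTAT implements have direct SciPy counterparts; a quick sketch with made-up data follows (the Van der Waerden and squared ranks tests have no off-the-shelf SciPy function and are omitted).

```python
# Sketch: SciPy counterparts of three of the tests offered by NPSTAT.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
g1 = rng.lognormal(0.0, 1.0, 40)        # skewed, non-normal groups
g2 = rng.lognormal(0.2, 1.0, 40)
g3 = rng.lognormal(0.4, 1.0, 40)

print(stats.kruskal(g1, g2, g3))        # location, more than 2 groups
print(stats.mannwhitneyu(g1, g2))       # location, 2 groups
print(stats.ks_2samp(g1, g2))           # overall distribution shape, 2 groups
```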
