Similar Literature
20 similar documents found.
1.
Sample size determination is essential to planning clinical trials. Jung (2008) established a sample size calculation formula for paired right-censored data based on the log-rank test, which has been well studied for comparing independent survival outcomes. An alternative to rank-based methods for independent right-censored data, advocated by Pepe and Fleming (1989), tests for differences between integrated weighted Kaplan–Meier estimates and is more sensitive to the magnitude of the difference in survival times between groups. In this paper, we employ the Pepe–Fleming approach to determine an adequate sample size by calculating differences between Kaplan–Meier estimators while accounting for pair-wise correlation. We specify a positive stable frailty model for the joint distribution of paired survival times. We evaluate the performance of the proposed method in simulation studies and investigate the impact of accrual time, follow-up time, and the loss-to-follow-up rate, as well as the sensitivity of power to model misspecification. The results show that ignoring the pair-wise correlation leads to overestimating the required sample size. The proposed method is applied to two real-world studies, and R code for the sample size calculation is made available to users.
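As a rough illustration of the Pepe–Fleming idea (not the paper's paired-data sample size formula), the sketch below integrates the difference between two Kaplan–Meier curves; the lifelines dependency, the simulated data, and all names are my own assumptions.

```python
# Sketch: integrated difference of Kaplan-Meier curves (Pepe-Fleming idea).
# Ignores pairing, weighting, and variance estimation; illustration only.
import numpy as np
from lifelines import KaplanMeierFitter

def integrated_km_difference(time1, event1, time2, event2, tau):
    """Integrate S1(t) - S2(t) over [0, tau] on a fine grid."""
    km1 = KaplanMeierFitter().fit(time1, event_observed=event1)
    km2 = KaplanMeierFitter().fit(time2, event_observed=event2)
    grid = np.linspace(0.0, tau, 1000)
    s1 = km1.survival_function_at_times(grid).to_numpy()
    s2 = km2.survival_function_at_times(grid).to_numpy()
    return np.trapz(s1 - s2, grid)

# Toy usage with simulated exponential survival times and random censoring.
rng = np.random.default_rng(0)
t1, t2 = rng.exponential(1.0, 100), rng.exponential(1.5, 100)
c1, c2 = rng.exponential(2.0, 100), rng.exponential(2.0, 100)
print(integrated_km_difference(np.minimum(t1, c1), (t1 <= c1).astype(int),
                               np.minimum(t2, c2), (t2 <= c2).astype(int),
                               tau=2.0))
```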

2.
Trace contaminants in water, including metals and organics, are often measured at concentrations low enough to be reported only as values below the instrument detection limit. Interpretation of these “less thans” is complicated when multiple detection limits occur. Statistical methods for multiply censored (multiple-detection-limit) datasets have been developed in medical and industrial statistics and can be employed to estimate summary statistics or to model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data, and it is applicable to any dataset with 0 to 80% of its values censored. These tools are part of a software library, or add-on package, for the R environment for statistical computing. The library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards.
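The tools described above handle multiple detection limits; the sketch below illustrates only the core ROS idea for the simpler single-detection-limit case. The lognormal assumption, the toy data, and all names are mine.

```python
# Sketch: regression on order statistics (ROS) with one detection limit.
# Assumes all censored values lie below a single limit and a lognormal model.
import numpy as np
from scipy import stats

def simple_ros(detected, n_censored):
    """Impute censored values by regressing log(detected) on normal scores."""
    n = len(detected) + n_censored
    det_sorted = np.sort(detected)
    # Plotting positions: the censored observations occupy the lowest ranks.
    pp_detected = (n_censored + np.arange(1, len(detected) + 1) - 0.5) / n
    res = stats.linregress(stats.norm.ppf(pp_detected), np.log(det_sorted))
    pp_censored = (np.arange(1, n_censored + 1) - 0.5) / n
    imputed = np.exp(res.intercept + res.slope * stats.norm.ppf(pp_censored))
    full_sample = np.concatenate([imputed, det_sorted])
    return full_sample.mean(), full_sample.std(ddof=1)

# Toy usage: 12 detected concentrations plus 5 nondetects below the limit.
mean, sd = simple_ros(detected=np.array([0.6, 0.8, 0.9, 1.1, 1.3, 1.4,
                                         1.8, 2.2, 2.5, 3.1, 4.0, 5.2]),
                      n_censored=5)
print(mean, sd)
```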

3.
In many applied situations, it is difficult to specify in advance the types of survival differences that may exist between two groups. It is therefore tempting to use tests that emphasize these differences but remain sensitive to a wide range of survival differences. In this paper such versatile tests are considered; their procedures are based on the simultaneous use of weighted log-rank statistics that are asymptotically normal under the null hypothesis of no difference between the two groups. Simulations are performed to examine the power of the tests for small and moderate sample sizes, with data ranging from uncensored to heavily censored. Implementation of the procedures is discussed in a real data example for illustration.
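A minimal sketch of one such versatile procedure, assuming Fleming–Harrington weights w(t) = S(t-)^p (1 - S(t-))^q: compute several weighted log-rank statistics and take the maximum of their absolute values. The weight choices and all names are mine, and the null distribution of the maximum still requires the joint normal approximation or simulation.

```python
# Sketch: maximum of several weighted log-rank statistics (two groups).
import numpy as np

def weighted_logrank_z(time, event, group, p=0.0, q=0.0):
    """Standardized weighted log-rank statistic; group is coded 0/1."""
    time, event, group = map(np.asarray, (time, event, group))
    num, var, s_pool = 0.0, 0.0, 1.0   # s_pool = pooled KM just before t
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n_risk = at_risk.sum()
        n_risk1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = s_pool ** p * (1.0 - s_pool) ** q
        num += w * (d1 - d * n_risk1 / n_risk)
        if n_risk > 1:
            var += w ** 2 * d * (n_risk1 / n_risk) * (1 - n_risk1 / n_risk) \
                   * (n_risk - d) / (n_risk - 1)
        s_pool *= 1.0 - d / n_risk
    return num / np.sqrt(var)

def versatile_statistic(time, event, group):
    """Max of |Z| over log-rank, early-difference and late-difference weights."""
    zs = [weighted_logrank_z(time, event, group, p, q)
          for p, q in [(0, 0), (1, 0), (0, 1)]]
    return max(abs(z) for z in zs)
```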

4.
In many applied situations, it is difficult to specify in advance the types of survival differences that may exist between two groups. It is therefore tempting to use tests that emphasize these differences but remain sensitive to a wide range of survival differences. In this paper such versatile tests are considered; their procedures are based on the simultaneous use of weighted log-rank statistics that are asymptotically normal under the null hypothesis of no difference between the two groups. Simulations are performed to examine the power of the tests for small and moderate sample sizes, with data ranging from uncensored to heavily censored. Implementation of the procedures is discussed in a real data example for illustration.

5.
The two-parameter Birnbaum-Saunders distribution has been used successfully to model fatigue failure times. Although censoring is typical in reliability and survival studies, little work has been published on the analysis of censored data for this distribution. In this paper, we address testing inference on the two parameters of the Birnbaum-Saunders distribution under type-II right-censored samples. The likelihood ratio statistic and a recently proposed statistic, the gradient statistic, provide a convenient framework for statistical inference in this case, since they do not require obtaining, estimating, or inverting an information matrix, which is an advantage in problems involving censored data. An extensive Monte Carlo simulation study is carried out to investigate and compare the finite-sample performance of the likelihood ratio and gradient tests. Our numerical results show evidence that the gradient test should be preferred. Further, we also consider the generalized Birnbaum-Saunders distribution under type-II right-censored samples and present Monte Carlo simulations for testing the parameters in this class of models using the likelihood ratio and gradient tests. Three empirical applications are presented.
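A minimal sketch of the type-II right-censored log-likelihood for a two-parameter Birnbaum-Saunders model, using SciPy's fatiguelife parameterization (shape alpha, scale beta); the likelihood ratio and gradient test statistics themselves are not reproduced, and the simulated data and names are my assumptions.

```python
# Sketch: ML fitting of a Birnbaum-Saunders model under type-II right censoring.
# scipy.stats.fatiguelife is the Birnbaum-Saunders distribution (shape c = alpha).
import numpy as np
from scipy import stats, optimize

def type2_censored_negloglik(params, observed_order_stats, n):
    """Negative log-likelihood (up to a constant) when only the m smallest
    of n failure times are observed."""
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    m = len(observed_order_stats)
    dist = stats.fatiguelife(c=alpha, scale=beta)
    ll = dist.logpdf(observed_order_stats).sum()
    ll += (n - m) * dist.logsf(observed_order_stats.max())   # censored tail
    return -ll

# Toy usage: simulate n = 50 failure times, keep the smallest m = 30.
rng = np.random.default_rng(1)
full = stats.fatiguelife(c=0.5, scale=2.0).rvs(size=50, random_state=rng)
obs = np.sort(full)[:30]
fit = optimize.minimize(type2_censored_negloglik, x0=np.array([1.0, 1.0]),
                        args=(obs, 50), method="Nelder-Mead")
print(fit.x)   # estimated (alpha, beta)
```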

6.
Quantile regression is a widely used regression technique that models the entire conditional distribution of the response variable. A natural extension to censored observations has been introduced using a reweighting scheme based on the Kaplan-Meier estimator. The same ideas can be applied to depth quantiles, which leads to regression quantiles for censored data that are robust to outliers in both the predictor and the response variable. For their computation, a fast algorithm over a grid of quantile values is proposed. The robustness of the method is shown in a simulation study and on two real data examples.
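A rough sketch of the Kaplan-Meier reweighting idea behind censored quantile regression: weight uncensored observations by the inverse of the estimated censoring survival function and minimize a weighted check (pinball) loss. The lifelines dependency, the toy data, and all names are assumptions; the robust depth-quantile variant of the paper is not reproduced.

```python
# Sketch: censored quantile regression via inverse-probability-of-censoring weights.
import numpy as np
from scipy.optimize import minimize
from lifelines import KaplanMeierFitter

def ipcw_quantile_regression(X, time, event, tau=0.5):
    """Fit a linear tau-quantile model to censored responses with KM weights."""
    # KM estimate of the censoring survival function G(t): censorings are "events".
    km_cens = KaplanMeierFitter().fit(time, event_observed=1 - event)
    G = km_cens.survival_function_at_times(time).to_numpy()
    weights = np.where(event == 1, 1.0 / np.clip(G, 1e-8, None), 0.0)

    Xd = np.column_stack([np.ones(len(time)), X])    # add intercept

    def weighted_pinball(beta):
        r = time - Xd @ beta
        return np.sum(weights * np.maximum(tau * r, (tau - 1) * r))

    return minimize(weighted_pinball, x0=np.zeros(Xd.shape[1]),
                    method="Nelder-Mead").x

# Toy usage: true median is 1.0 + 0.5 * x, with random right censoring.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, 200)
t_true = 1.0 + 0.5 * x + rng.normal(0, 0.2, 200)
c = rng.uniform(0.5, 4.0, 200)
beta_hat = ipcw_quantile_regression(x, np.minimum(t_true, c),
                                    (t_true <= c).astype(int))
print(beta_hat)   # roughly [1.0, 0.5]
```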

7.
This note examines the problem of statistical inference of the initial states of a linear discrete dynamic system based on a set of cross-sectional data. Several compressed data structures are proposed to reduce the amount of the cross-sectional data obtained from multiple independent experiments. It is shown that these data structures are sufficient statistics for estimating the mean and the covariance of the initial states, given the entire raw data from multiple experiments. Thus, the identification and the validation of these parameters can be performed with reduced data structures without referring back to the entire raw data and the original dynamics. For the identification of these parameters, the E-M procedure presented in [1] can be applied to this case. For the validation of these parameters having specified values, simple tests of "significance" type are proposed. The major advantage of these tests over the generalized likelihood ratio test is that their probability distributions are known and computable under both the null and the alternative hypotheses even for the finite sample case, i.e., the asymptotic assumption is not necessary.

8.
There has been a growing recognition that issues of data quality, which are routine in practice, can materially affect the assessment of learned model performance. In this paper, we develop some analytic results that are useful in sizing the biases associated with tests of discriminatory model power when these are performed using corrupt (“noisy”) data. As it is sometimes unavoidable to test models with data that are known to be corrupt, we also provide some guidance on interpreting results of such tests. In some cases, with appropriate knowledge of the corruption mechanism, the true values of the performance statistics such as the area under the ROC curve may be recovered (in expectation), even when the underlying data have been corrupted. We also provide estimators of the standard errors of such recovered performance statistics. An analysis of the estimators reveals interesting behavior including the observation that “noisy” data does not “cancel out” across models even when the same corrupt data set is used to test multiple candidate models. Because our results are analytic, they may be applied in a broad range of settings and this can be done without the need for simulation.
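The paper's analytic bias corrections are not reproduced here; the short simulation below merely illustrates the phenomenon it studies, namely how label noise attenuates the measured AUC. The flip rate, score model, and names are my own choices.

```python
# Simulation: measured AUC shrinks toward 0.5 when test labels are corrupted.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 20000
y_true = rng.integers(0, 2, n)
# A discriminating score: higher on average for the positive class.
score = rng.normal(loc=y_true * 1.5, scale=1.0)

flip = rng.random(n) < 0.15            # 15% of labels recorded incorrectly
y_noisy = np.where(flip, 1 - y_true, y_true)

print("AUC on clean labels :", round(roc_auc_score(y_true, score), 3))
print("AUC on noisy labels :", round(roc_auc_score(y_noisy, score), 3))
```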

9.
Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features, such as local extrema, are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests for modal regions. Distributional properties are given for the gradient and curvature estimators, and pointwise test statistics are derived. The hypothesis tests extend the two-dimensional feature significance ideas of Godtliebsen et al. [Godtliebsen, F., Marron, J.S., Chaudhuri, P., 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11, 1-21]. The theoretical framework is complemented by novel visualization for three-dimensional data. Applications to real data sets show that tests based on the kernel curvature estimators perform well in identifying modal regions, and these results can be enhanced by corresponding tests with kernel gradient estimators.

10.
Nateghi Haredasht, Fateme; Vens, Celine. Machine Learning (2022) 111(11): 4139-4157.

Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often organizational and financial obstacles that can lead to reduced data collection and, in turn, complicate subsequent analyses. In contrast, there is often plenty of baseline data available for patients with similar characteristics and background information, e.g., from patients who fall outside the study time window. In this article, we investigate whether we can benefit from including such unlabeled data instances to predict accurate survival times. In other words, we introduce a third level of supervision in the context of survival analysis: apart from fully observed and censored instances, we also include unlabeled instances. We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets. Our results demonstrate that all approaches are able to increase predictive performance on independent test data. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving large improvements compared with not using unlabeled data.

11.
Dallas and Rao (Biometrics 56 (2000) 154) proposed a class of permutation tests for testing the equality of two survival distributions based on randomly right censored survival time data consisting of both paired and unpaired observations. Data sets of this type can occur frequently in medical settings. Two members of this class were advocated for use due to their generally high power for detecting scale and location shifts in the exponential and log-logistic distributions for the survival times, and improved power over paired data test procedures that disregard unpaired observations. Because the computations for the tests become quite laborious as the sample sizes increase, computing routines are required for practical implementation of these tests. This paper provides computing routines to execute the tests.

12.
The GCD test and the Banerjee–Wolfe test are the two tests traditionally used to determine statement data dependence, subject to direction vectors, in automatic vectorization/parallelization of loops. In an earlier study, a sufficient condition for the accuracy of the Banerjee–Wolfe test was stated and proved. In that work, we only considered the case of general data dependence, i.e., the case of data dependence without direction vector information. In this paper, we extend the previous result to the case of data dependence subject to an arbitrary direction vector. We also state and prove a sufficient condition for the accuracy of a combination of the GCD and Banerjee–Wolfe tests. Furthermore, we show that the sufficient conditions for the accuracy of the Banerjee–Wolfe test and for the accuracy of a combination of the GCD and Banerjee–Wolfe tests are necessary conditions as well. Finally, we demonstrate how these results can be used in actual practice to obtain exact data dependence information.
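For concreteness, a small sketch of the two classical tests for a single-subscript dependence equation a*i - b*j = c with both loop indices running over 1..U; the function names and bounds are illustrative, and the direction-vector refinements discussed in the paper are omitted.

```python
# Sketch: GCD test and a Banerjee-style bounds test for the dependence
# equation a*i - b*j = c arising from accesses X[a*i + k1] and X[b*j + k2],
# with c = k2 - k1 and loop indices i, j ranging over 1..U.
from math import gcd

def gcd_test(a, b, c):
    """Integer solutions of a*i - b*j = c exist only if gcd(a, b) divides c."""
    return c % gcd(a, b) == 0

def banerjee_test(a, b, c, U):
    """c must also lie between the min and max of a*i - b*j over 1 <= i, j <= U."""
    lo = min(a * 1, a * U) - max(b * 1, b * U)
    hi = max(a * 1, a * U) - min(b * 1, b * U)
    return lo <= c <= hi

# Example: for i in 1..10 { X[2*i] = ...; ... = X[2*i + 1] }  ->  2*i - 2*j = 1
print(gcd_test(2, 2, 1))           # False: gcd rules out any dependence
print(banerjee_test(2, 2, 1, 10))  # True: bounds alone cannot rule it out
```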

13.
In clinical trials, information about certain time points may be of interest in making decisions about treatment effectiveness. Therefore, rather than comparing entire survival curves, researchers may wish to focus the comparison on fixed time points with potential clinical utility. For two independent samples of right-censored data, Klein et al. (2007) compared survival probabilities at a fixed time point by studying a number of tests based on transformations of the Kaplan-Meier estimators of the survival function. To compare the survival probabilities at a fixed time point for paired right-censored data or clustered right-censored data, however, their approach requires modification. In this paper, we extend the statistics to accommodate possible within-pair and within-cluster correlation. We use simulation studies to present comparative results. Finally, we illustrate the implementation of these methods using two real data sets.
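As background, here is a sketch of the independent-samples comparison that Klein et al. (2007) start from: Kaplan-Meier estimates at a fixed time with Greenwood variances, compared on the complementary log-log scale. The paired/clustered correlation adjustment of this paper is not included, and all names are mine.

```python
# Sketch: compare two independent KM survival estimates at a fixed time t0
# using the cloglog transformation; no pairing or clustering adjustment.
import numpy as np
from scipy import stats

def km_with_greenwood(time, event, t0):
    """Kaplan-Meier estimate S(t0) and its Greenwood variance."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=int)
    s, var_sum = 1.0, 0.0
    for t in np.unique(time[event == 1]):       # distinct event times, sorted
        if t > t0:
            break
        n_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        s *= 1.0 - d / n_risk
        var_sum += d / (n_risk * (n_risk - d))
    return s, s ** 2 * var_sum

def cloglog_test(time1, event1, time2, event2, t0):
    s1, v1 = km_with_greenwood(time1, event1, t0)
    s2, v2 = km_with_greenwood(time2, event2, t0)
    # Delta-method variance of log(-log S): Var(S) / (S * log S)^2.
    z = (np.log(-np.log(s1)) - np.log(-np.log(s2))) / np.sqrt(
        v1 / (s1 * np.log(s1)) ** 2 + v2 / (s2 * np.log(s2)) ** 2)
    return z, 2 * stats.norm.sf(abs(z))
```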

14.
A variation of maximum likelihood estimation (MLE) of parameters that uses the probability density functions of order statistics is presented. Results of this method are compared with traditional maximum likelihood estimation for complete and right-censored samples in a life test. Although the concept can be applied to most types of censored data sets, results are presented for the case of order statistic interval censoring, in which even a few order statistics estimate the parameters well compared with estimates from complete and right-censored samples. Distributions investigated include the exponential, Rayleigh, and normal distributions. Computation methods using A Probability Programming Language running in Maple are more straightforward than existing methods based on various numerical algorithms.

15.
The conventional convolutional neural network (CNN) has proven effective for synthetic aperture radar (SAR) target recognition. However, the relationship between different convolutional kernels is not taken into account, which limits the feature extraction capability of the convolutional layer to a certain extent. To address this problem, this paper presents a novel method named the weighted kernel CNN (WKCNN). WKCNN integrates a weighted kernel module (WKM) into the common CNN architecture. The WKM is proposed to model the interdependence between different kernels and thus improve the feature extraction capability of the convolutional layer. The WKM consists of variables and activations: each variable represents the weight of a convolutional kernel, and the activation is a mapping function that determines the range of the weight. To adjust the variables adaptively, a backpropagation (BP) algorithm for the WKM is derived. Training of the WKM is driven by optimizing the cost function according to the BP algorithm, and three training modes are presented and analysed. SAR target recognition experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset, and the results show the superiority of the proposed method.
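One plausible reading of the weighted kernel module, sketched in PyTorch: each convolutional kernel's output map is scaled by a learnable weight passed through a sigmoid activation. This is my interpretation of the description above, not the authors' released code, and the layer sizes are arbitrary.

```python
# Sketch: a "weighted kernel" convolution block - each output channel (kernel)
# is scaled by a learnable weight squashed through a sigmoid. Interpretation only.
import torch
import torch.nn as nn

class WeightedKernelConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
        # One scalar weight per convolutional kernel, trained by backpropagation.
        self.kernel_weights = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        y = self.conv(x)
        w = torch.sigmoid(self.kernel_weights)     # maps each weight into (0, 1)
        return y * w.view(1, -1, 1, 1)             # scale each channel's feature map

# Toy usage on a batch of single-channel 64x64 "SAR chips".
block = WeightedKernelConv(1, 16, kernel_size=3)
print(block(torch.randn(8, 1, 64, 64)).shape)      # torch.Size([8, 16, 64, 64])
```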

16.
In this paper, we establish several recurrence relations for the single and product moments of progressively Type-II right-censored order statistics from a half-logistic distribution. Used in a systematic recursive manner, these relations enable one to compute all the means, variances, and covariances of progressively Type-II right-censored order statistics from the half-logistic distribution for all sample sizes n, effective sample sizes m, and all progressive censoring schemes (R1, …, Rm). The results established here generalize the corresponding results for the usual order statistics due to Balakrishnan (1985). These moments are then used to derive best linear unbiased estimators of the scale and location-scale parameters of the half-logistic distribution, and a comparison of these estimators with the maximum likelihood estimates is made. Best linear unbiased prediction of censored failure times is discussed briefly. Finally, two numerical examples are presented to illustrate all the inferential methods developed here.
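For reference, the standard (unit-scale, zero-location) half-logistic density and distribution function underlying these recurrence relations are:

```latex
f(x) = \frac{2\,e^{-x}}{\left(1 + e^{-x}\right)^{2}}, \qquad
F(x) = \frac{1 - e^{-x}}{1 + e^{-x}}, \qquad x \ge 0 .
```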

17.
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture models (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are taken as the attributes of the secondary feature vector of each frame. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on the classification of phonemes within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives relative to MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained by using WKM clustering in the classification of front vowels.
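A minimal sketch of the clustering-based secondary feature extraction described above, using scikit-learn's GaussianMixture and KMeans. The diagonal covariance choice, the feature-vector layout, the toy data, and all names are my assumptions.

```python
# Sketch: build a low-dimensional "secondary" feature vector from a frame's
# spectro-temporal coefficients via GMM or (weighted) K-means clustering.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

def secondary_features_gmm(frame_points, n_clusters=4):
    """frame_points: (n_points, dim) spectro-temporal coefficients of one frame.
    Returns concatenated cluster means and (diagonal) covariances."""
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                          random_state=0).fit(frame_points)
    return np.concatenate([gmm.means_.ravel(), gmm.covariances_.ravel()])

def secondary_features_wkm(frame_points, weights, n_clusters=4):
    """Weighted K-means variant: sample_weight carries per-point weights."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(frame_points, sample_weight=weights)
    return km.cluster_centers_.ravel()

# Toy usage: 200 random "spectro-temporal" points of dimension 2.
pts = np.random.default_rng(4).normal(size=(200, 2))
print(secondary_features_gmm(pts).shape)                              # (16,)
print(secondary_features_wkm(pts, weights=np.abs(pts[:, 0])).shape)   # (8,)
```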

18.
Portmanteau test statistics are useful diagnostic tools for checking the adequacy of multivariate time series models. For stationary and partially non-stationary vector time series models, Duchesne and Roy [Duchesne, P., Roy, R., 2004. On consistent testing for serial correlation of unknown form in vector time series models. Journal of Multivariate Analysis 89, 148-180] and Duchesne [Duchesne, P., 2005a. Testing for serial correlation of unknown form in cointegrated time series models. Annals of the Institute of Statistical Mathematics 57, 575-595] proposed kernel-based test statistics, obtained by comparing the spectral density of the errors under the null hypothesis of non-correlation with a kernel-based spectral density estimator; these test statistics are asymptotically standard normal under the null hypothesis of non-correlation in the error term of the model. Following the method of Chen and Deo [Chen, W.W., Deo, R.S., 2004a. Power transformations to induce normality and their applications. Journal of the Royal Statistical Society, Ser. B 66, 117-130], we determine an appropriate power transformation to improve the normal approximation in small samples. Additional corrections for the mean and variance of the distance measures entering these test statistics are obtained. An alternative way to approximate the finite-sample distribution of the test statistics is the bootstrap; we introduce bootstrap-based versions of the original spectral test statistics. In a Monte Carlo study, comparisons are made under various alternatives between the original spectral test statistics, the new corrected test statistics, the bootstrap-based versions, and the classical Hosking portmanteau test statistic.

19.
The likelihood approach based on empirical distribution functions is a well-accepted statistical tool for testing. However, the proof schemes of Neyman–Pearson-type lemmas lead to consideration of density-based likelihood ratios in order to obtain powerful test statistics. In this article, we introduce a distribution-free, density-based likelihood technique for goodness-of-fit testing. We focus on tests for normality and uniformity, which are common tasks in applied studies. The well-known goodness-of-fit tests based on sample entropy are shown to be a product of the proposed empirical likelihood (EL) methodology. Although the efficiency of test statistics based on classes of entropy estimators has been widely addressed in the statistical literature, estimation of the sample entropy has not been defined invariantly, and hence produces tests that are difficult to apply in real data studies. The proposed EL approach yields clear forms of the entropy-based tests. Monte Carlo simulation results confirm the advantage of the proposed method from a power perspective. Real data examples illustrate the proposed approach in practice.
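For context, here is a sketch of the classical spacing-based sample entropy estimator (Vasicek's estimator) that underlies many entropy-based goodness-of-fit tests. The paper's density-based empirical likelihood statistics are not reproduced, and the window size heuristic and names are my own choices.

```python
# Sketch: Vasicek's spacing-based estimator of differential entropy, a
# building block of entropy-based goodness-of-fit tests for normality.
import numpy as np

def vasicek_entropy(x, m=None):
    """H_mn = (1/n) * sum log( n/(2m) * (x_(i+m) - x_(i-m)) ), indices clipped."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if m is None:
        m = max(1, int(np.sqrt(n)))      # a common heuristic window size
    upper = x[np.minimum(np.arange(n) + m, n - 1)]
    lower = x[np.maximum(np.arange(n) - m, 0)]
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

# Under normality, exp(H) / sample_sd is close to sqrt(2*pi*e) ~ 4.13;
# entropy-based normality tests reject when this ratio is too small.
rng = np.random.default_rng(5)
sample = rng.normal(size=500)
print(np.exp(vasicek_entropy(sample)) / sample.std(ddof=1))
```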

20.
NPSTAT compares the location (median), dispersion (variance), and overall shape of 2 or more groups of data (samples), using the Kruskal-Wallis and Van der Waerden tests (> 2 groups), Mann-Whitney and Kolmogorov-Smirnov tests (2 groups), and squared ranks test (2 or more groups). Exact or approximate significance levels of test statistics are calculated in all situations. Multivariate data are treated one variable at a time. NPSTAT reproduces results from textbooks despite previous inconsistencies with calculation and significance assessment methods. Commercial routines (e.g. in MINITAB, NAG) cover only 3 of the 5 tests programmed, but yield consistent results for these. NPSTAT is useful particularly for comparing nonnormally distributed data, multivariate data with missing values, and data measured only on an ordinal scale. It also can be used to assess outlying values.
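Three of the five tests that NPSTAT implements have direct SciPy counterparts; a quick sketch with made-up data follows (the Van der Waerden and squared ranks tests have no off-the-shelf SciPy function and are omitted).

```python
# Sketch: SciPy counterparts of three of the tests offered by NPSTAT.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
g1 = rng.lognormal(0.0, 1.0, 40)        # skewed, non-normal groups
g2 = rng.lognormal(0.2, 1.0, 40)
g3 = rng.lognormal(0.4, 1.0, 40)

print(stats.kruskal(g1, g2, g3))        # location, more than 2 groups
print(stats.mannwhitneyu(g1, g2))       # location, 2 groups
print(stats.ks_2samp(g1, g2))           # overall distribution shape, 2 groups
```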
