Similar Articles
20 similar articles found.
1.
We present a Bayesian model for two-way ANOVA-type analysis of high-dimensional, small sample-size datasets with highly correlated groups of variables. Modern cellular measurement methods are a main application area; typically the task is differential analysis between diseased and healthy samples, complicated by additional covariates requiring a multi-way analysis. The main complication is the combination of high dimensionality and low sample size, which renders classical multivariate techniques useless. We introduce a hierarchical model which performs dimensionality reduction by assuming that the input variables come in similarly behaving groups, and then carries out an ANOVA-type decomposition on the reduced-dimensional latent variables. We apply the methods to study lipidomic profiles from a recent large-cohort human diabetes study.
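For intuition only, the sketch below mimics the grouped two-way analysis in a crude, non-Bayesian way: each (hypothetical) group of correlated variables is collapsed to its mean, and an ordinary two-way ANOVA is run on the reduced variable with statsmodels. The hierarchical Bayesian model of the paper is not reproduced; group names, effect sizes and factors are illustrative.

    # Crude non-Bayesian analogue of the grouped two-way analysis: collapse each
    # predefined group of correlated variables to its mean, then run a two-way ANOVA
    # per reduced variable.  All names and constants below are hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    n = 40                                           # small sample size
    groups = {"lipid_grp1": 30, "lipid_grp2": 25}    # variable groups (hypothetical)
    disease = rng.integers(0, 2, n)                  # factor 1: diseased vs healthy
    sex = rng.integers(0, 2, n)                      # factor 2: additional covariate

    for name, p in groups.items():
        common = rng.normal(size=(n, 1))             # shared factor -> correlated variables
        X = common + 0.3 * rng.normal(size=(n, p)) + disease[:, None] * 0.5
        latent = X.mean(axis=1)                      # crude dimensionality reduction
        df = pd.DataFrame({"y": latent, "disease": disease, "sex": sex})
        fit = smf.ols("y ~ C(disease) * C(sex)", data=df).fit()
        print(name, "\n", anova_lm(fit, typ=2), "\n")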

2.
In comparing the mean count of two independent samples, some practitioners would use the t-test or the Wilcoxon rank sum test, while others may use methods based on a Poisson model. It is not uncommon to encounter count data that exhibit overdispersion, where the Poisson model is no longer appropriate. This paper deals with methods for overdispersed data using the negative binomial distribution resulting from a Poisson-Gamma mixture. We investigate the small sample properties of the likelihood-based tests and compare their performance to that of the t-test and the Wilcoxon test. We also illustrate how these procedures may be used to compute power and sample sizes to design studies with response variables that are overdispersed count data. Although the methods are based on inferences about two independent samples, the sample size calculations may also be applied to problems comparing more than two independent samples. It will be shown that there is a gain in efficiency when using the likelihood-based methods compared to the t-test and the Wilcoxon test. In studies where each observation is very costly, the ability to derive smaller sample size estimates with the appropriate tests is not only statistically, but also financially, appealing.
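A small simulation along these lines (not the paper's exact procedure) compares a likelihood-ratio test from a negative binomial regression against the t-test and the Wilcoxon test on overdispersed counts; the group means and dispersion below are arbitrary choices.

    # Sketch: negative binomial likelihood-ratio test vs. t-test and Wilcoxon rank sum
    # test on simulated overdispersed counts (illustrative parameters only).
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(1)
    n_per_group = 30
    # Poisson-Gamma mixture: NB counts with means 4 and 6 (r = 2 gives overdispersion)
    y0 = rng.negative_binomial(n=2, p=2 / (2 + 4), size=n_per_group)   # mean 4
    y1 = rng.negative_binomial(n=2, p=2 / (2 + 6), size=n_per_group)   # mean 6
    y = np.concatenate([y0, y1])
    group = np.repeat([0, 1], n_per_group)

    full = sm.NegativeBinomial(y, sm.add_constant(group)).fit(disp=0)
    null = sm.NegativeBinomial(y, np.ones((len(y), 1))).fit(disp=0)
    lr = 2 * (full.llf - null.llf)
    print("NB LRT p-value:", stats.chi2.sf(lr, df=1))
    print("t-test p-value:", stats.ttest_ind(y0, y1).pvalue)
    print("Wilcoxon p-value:", stats.mannwhitneyu(y0, y1).pvalue)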

3.
We develop a Bayesian approach to sample size and power calculations for cross-sectional studies that are designed to evaluate and compare continuous medical tests. For studies that involve one test or two conditionally independent or dependent tests, we present methods that are applicable when the true disease status of sampled individuals will be available and when it will not. Within a hypothesis testing framework, we consider the goal of demonstrating that a medical test has area under the receiver operating characteristic (ROC) curve that exceeds a minimum acceptable level or another relevant threshold, and the goals of establishing the superiority or equivalence of one test relative to another. A Bayesian average power criterion is used to determine a sample size that will yield high posterior probability, on average, of a future study correctly deciding in favor of these goals. The impacts on Bayesian average power of prior distributions, the proportion of diseased subjects in the study, and correlation among tests are investigated through simulation. The computational algorithm we develop involves simulating multiple data sets that are fit with Bayesian models using Gibbs sampling, and is executed by using WinBUGS in tandem with R.
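As a rough frequentist stand-in for the Bayesian average power criterion, power can be estimated by simulation: draw binormal test scores, estimate the AUC empirically, and test it against a minimum acceptable level using the Hanley-McNeil standard error. All thresholds, effect sizes and sample sizes below are hypothetical, and this is not the paper's Gibbs-sampling algorithm.

    # Power-by-simulation for "AUC exceeds a minimum acceptable level AUC0", using the
    # empirical AUC and the Hanley-McNeil variance approximation (illustrative only).
    import numpy as np
    from scipy import stats

    def empirical_auc(x_dis, x_non):
        diff = x_dis[:, None] - x_non[None, :]
        return (diff > 0).mean() + 0.5 * (diff == 0).mean()

    def hanley_mcneil_se(auc, n1, n2):
        q1, q2 = auc / (2 - auc), 2 * auc**2 / (1 + auc)
        var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc**2) + (n2 - 1) * (q2 - auc**2)) / (n1 * n2)
        return np.sqrt(var)

    def power(n_dis, n_non, true_sep=1.2, auc0=0.75, alpha=0.05, n_sim=2000, seed=2):
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_sim):
            dis = rng.normal(true_sep, 1.0, n_dis)    # diseased scores
            non = rng.normal(0.0, 1.0, n_non)         # non-diseased scores
            a = empirical_auc(dis, non)
            z = (a - auc0) / hanley_mcneil_se(a, n_dis, n_non)
            hits += z > stats.norm.ppf(1 - alpha)
        return hits / n_sim

    for n in (40, 60, 80):
        print(n, "per group -> estimated power", power(n, n))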

4.
Sample size determination is essential to planning clinical trials. Jung (2008) established a sample size calculation formula for paired right-censored data based on the logrank test, which has been well studied for comparing independent survival outcomes. An alternative to rank-based methods for independent right-censored data, advocated by Pepe and Fleming (1989), tests for differences between integrated weighted Kaplan–Meier estimates and is more sensitive to the magnitude of the difference in survival times between groups. In this paper, we employ the concept of the Pepe–Fleming method to determine an adequate sample size by calculating differences between Kaplan–Meier estimators while accounting for pairwise correlation. We specify a positive stable frailty model for the joint distribution of paired survival times. We evaluate the performance of the proposed method by simulation studies and investigate the impact of accrual time, follow-up time, and the loss-to-follow-up rate, as well as the sensitivity of power under misspecification of the model. The results show that ignoring the pairwise correlation results in overestimating the required sample size. Furthermore, the proposed method is applied to two real-world studies, and the R code for sample size calculation is made available to users.

5.
Lehky SR. Neural Computation, 2004, 16(7): 1325-1343
A Bayesian method is developed for estimating neural responses to stimuli, using likelihood functions incorporating the assumption that spike trains follow either pure Poisson statistics or Poisson statistics with a refractory period. The Bayesian and standard estimates of the mean and variance of responses are similar and asymptotically converge as the size of the data sample increases. However, the Bayesian estimate of the variance of the variance is much lower. This allows the Bayesian method to provide more precise interval estimates of responses. Sensitivity of the Bayesian method to the Poisson assumption was tested by conducting simulations perturbing the Poisson spike trains with noise. This did not affect Bayesian estimates of mean and variance to a significant degree, indicating that the Bayesian method is robust. The Bayesian estimates were less affected by the presence of noise than estimates provided by the standard method.
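For the pure-Poisson case the Bayesian calculation is conjugate; a minimal sketch with a Gamma prior on the firing rate (illustrative numbers, and without the refractory-period likelihood used in the paper) looks like this:

    # Conjugate sketch for pure-Poisson spike counts: Gamma prior on the firing rate,
    # Gamma posterior, and a credible interval for the response.  Constants are illustrative.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    dt, n_trials, true_rate = 0.5, 20, 12.0        # trial length (s), trials, spikes/s
    counts = rng.poisson(true_rate * dt, n_trials)

    a0, b0 = 1.0, 0.1                              # weak Gamma(a0, b0) prior on the rate
    a_post = a0 + counts.sum()                     # posterior shape
    b_post = b0 + n_trials * dt                    # posterior rate parameter

    post = stats.gamma(a=a_post, scale=1.0 / b_post)
    print("posterior mean rate (spikes/s):", post.mean())
    print("95% credible interval:", post.interval(0.95))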

6.
The aim of dose-ranging phase I (resp. phase II) clinical trials is to rapidly identify the maximum tolerated dose (MTD) (resp. minimal effective dose (MED)) of a new drug or combination. For the conduct and analysis of such trials, Bayesian approaches such as the Continual Reassessment Method (CRM) have been proposed, based on a sequential design and analysis up to a completed fixed sample size. To optimize sample sizes, Zohar and Chevret have proposed stopping rules (Stat. Med. 20 (2001) 2827), the computation of which is not provided by available software. We present in this paper user-friendly software for the design and analysis of these Bayesian phase I (resp. phase II) dose-ranging clinical trials (BPCT). It allows the CRM to be carried out, with or without stopping rules, from the planning of the trial, with the choice of model parameterization based on its operating characteristics, up to the sequential conduct and analysis of the trial, with estimation at stopping of the MTD (resp. MED) of the new drug or combination.
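A minimal sketch of one CRM update step with the commonly used one-parameter power model is shown below; the skeleton, target, prior variance and observed data are illustrative, and neither the BPCT software nor the Zohar-Chevret stopping rules are reproduced.

    # One CRM update step with the power model p_i = s_i^exp(a), a ~ N(0, 1.34),
    # posterior computed by numerical integration; the dose closest to the target is chosen.
    import numpy as np
    from scipy import integrate, stats

    skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])   # prior toxicity guesses per dose
    target = 0.25
    n_tox = np.array([0, 0, 1, 2, 0])                     # toxicities observed per dose
    n_pat = np.array([3, 3, 3, 3, 0])                     # patients treated per dose

    def likelihood(a):
        p = skeleton ** np.exp(a)
        return np.prod(p ** n_tox * (1 - p) ** (n_pat - n_tox))

    prior = stats.norm(0, np.sqrt(1.34))
    post_unnorm = lambda a: likelihood(a) * prior.pdf(a)
    Z, _ = integrate.quad(post_unnorm, -10, 10)

    post_p = np.array([
        integrate.quad(lambda a, i=i: (skeleton[i] ** np.exp(a)) * post_unnorm(a) / Z, -10, 10)[0]
        for i in range(len(skeleton))
    ])
    next_dose = int(np.argmin(np.abs(post_p - target)))
    print("posterior toxicity estimates:", post_p.round(3), "-> next dose level:", next_dose + 1)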

7.
When measuring units are expensive or time consuming, while ranking them can be done easily, it is known that ranked set sampling (RSS) is preferred to simple random sampling (SRS). Available results for RSS are developed under specific parametric assumptions or are asymptotic in nature, with few results available for finite size samples when the underlying distribution of the observed data is unknown. We investigate the use of resampling techniques to draw inferences on population characteristics. To obtain standard error and confidence interval estimates we discuss and compare three methods of resampling a given ranked set sample. Chen et al. (2004, Ranked Set Sampling: Theory and Applications. Springer, New York) suggest a natural method to obtain bootstrap samples from each row of an RSS. We prove that this method is consistent for a location estimator. We propose two other methods that are designed to obtain more stratified resamples from the given sample. Algorithms are provided for these methods. We recommend a method that obtains a bootstrap RSS from the observations. We prove several properties of this method, including consistency for a location parameter. We define two types of L-estimators for RSS and obtain expressions for their exact moments. We discuss an application to obtain confidence intervals for the Winsorized mean of an RSS.
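A minimal sketch of the within-row bootstrap mentioned above (resample each rank stratum with replacement and recompute the RSS mean), on simulated data with perfect ranking:

    # Within-row bootstrap for a ranked set sample: each rank stratum (row) is resampled
    # with replacement and the RSS mean recomputed.  Data are simulated with perfect ranking.
    import numpy as np

    rng = np.random.default_rng(4)
    k, m = 4, 10                 # set size (ranks) x number of cycles
    # row r holds m independent copies of the r-th order statistic of a sample of size k
    rss = np.array([[np.sort(rng.normal(50, 10, k))[r] for _ in range(m)] for r in range(k)])

    def bootstrap_mean(rss_rows, n_boot=2000):
        means = np.empty(n_boot)
        for b in range(n_boot):
            resampled = [rng.choice(row, size=row.size, replace=True) for row in rss_rows]
            means[b] = np.mean(np.concatenate(resampled))
        return means

    boot = bootstrap_mean(rss)
    print("RSS mean:", rss.mean().round(3))
    print("bootstrap SE:", boot.std(ddof=1).round(3))
    print("95% percentile CI:", np.percentile(boot, [2.5, 97.5]).round(3))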

8.
Finite mixtures are widely used in the fields of information processing and data analysis. However, model selection, i.e., the selection of the number of components in the mixture for a given sample data set, is still a rather difficult task. Recently, Bayesian Ying-Yang (BYY) harmony learning has provided a new approach to Gaussian mixture modeling, with the attractive feature that model selection can be made automatically during parameter learning. In this paper, based on the same BYY harmony learning framework for finite mixtures, we propose an adaptive gradient BYY learning algorithm for Poisson mixtures with automated model selection. Simulation experiments demonstrate that this adaptive gradient BYY learning algorithm can automatically determine the number of actual Poisson components for a sample data set, with good estimation of the parameters of the original (true) mixture when the components are separated to a certain degree. Moreover, the adaptive gradient BYY learning algorithm is successfully applied to texture classification.
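For comparison, a plain EM algorithm for a Poisson mixture with the number of components fixed in advance is sketched below; this is the fixed-k baseline, whereas the BYY approach described above selects the number of components automatically during learning.

    # Standard EM for a k-component Poisson mixture (fixed k baseline, not the BYY algorithm).
    import numpy as np
    from scipy.stats import poisson

    rng = np.random.default_rng(5)
    data = np.concatenate([rng.poisson(3, 300), rng.poisson(12, 200)])   # true 2-component mixture

    def em_poisson_mixture(x, k, n_iter=200):
        w = np.full(k, 1.0 / k)                                   # mixing weights
        lam = np.quantile(x, np.linspace(0.2, 0.8, k)) + 1e-3     # spread-out initial rates
        for _ in range(n_iter):
            r = w * poisson.pmf(x[:, None], lam)                  # E-step: responsibilities
            r /= r.sum(axis=1, keepdims=True)
            w = r.mean(axis=0)                                    # M-step: weights
            lam = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)    # M-step: rates
        return w, lam

    weights, rates = em_poisson_mixture(data, k=2)
    print("mixing weights:", weights.round(3), "rates:", rates.round(3))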

9.
This paper presents an incremental algorithm for image classification problems. Virtual labels are automatically formed by clustering in the output space. These virtual labels are used for the process of deriving discriminating features in the input space. This procedure is performed recursively in a coarse-to-fine fashion resulting in a tree, performing incremental hierarchical discriminating regression (IHDR). Embedded in the tree is a hierarchical probability distribution model used to prune unlikely cases. A sample size dependent negative-log-likelihood (NLL) metric is introduced to deal with large sample-size cases, small sample-size cases, and unbalanced sample-size cases, measured among different internal nodes of the IHDR algorithm. We report the experimental results of the proposed algorithm for an OCR classification problem and an image orientation classification problem.

10.
We derive a profile-likelihood confidence interval and a score-based confidence interval to estimate the population prevalences, test sensitivities, and test specificities of two conditionally independent diagnostic tests when no gold standard is available. We are motivated by a real-data example on the study of the properties of two fallible diagnostic tests for bovine immunodeficiency virus. We compare the coverage and average width of the two new intervals with an interval based on the asymptotic normality of the maximum likelihood estimator and a Bayesian interval estimator via Monte Carlo simulation. We determine that for the parameter configurations considered here, the profile-likelihood, score, and Bayesian intervals all perform adequately in terms of coverage, but overall, the profile-likelihood interval performs best in terms of yielding at least nominal coverage with minimum expected width.

11.
Arto Klami. Machine Learning, 2013, 92(2-3): 225-250
Object matching refers to the problem of inferring unknown co-occurrence or alignment between observations or samples in two data sets. Given two sets of equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Given a distance measure, the problem would correspond to a linear assignment problem, the problem of finding a permutation that re-orders samples in one set to minimize the total distance. When no such measure is available, we need to consider more complex solutions. Typical approaches maximize statistical dependency between the two sets, whereas in this work we present a Bayesian solution that builds a joint model for the two sources. We learn a Bayesian canonical correlation analysis model that includes a permutation parameter for re-ordering the samples in one of the sets. We provide both variational and sampling-based inference for approximate Bayesian analysis, and demonstrate on three data sets that the resulting methods outperform earlier solutions.
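The linear assignment baseline mentioned in the abstract (applicable only when a distance measure is available, unlike the Bayesian CCA model of the paper) can be sketched as follows, on simulated data with a known true permutation:

    # Linear assignment baseline: with a known distance, matching reduces to finding the
    # permutation that minimizes total pairwise distance (Hungarian algorithm).
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(6)
    X = rng.normal(size=(50, 5))                              # samples in view 1
    perm_true = rng.permutation(50)
    Y = X[perm_true] + rng.normal(scale=0.1, size=(50, 5))    # shuffled, noisy view 2

    cost = cdist(X, Y)                                        # pairwise Euclidean distances
    row_ind, col_ind = linear_sum_assignment(cost)            # row i matched to column col_ind[i]
    correct = perm_true[col_ind] == np.arange(50)             # was Y[col_ind[i]] generated from X[i]?
    print("fraction of pairs recovered:", correct.mean())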

12.
In this paper we describe sample elimination for generating Poisson disk sample sets of a desired size. We introduce a greedy sample elimination algorithm that assigns a weight to each sample in a given set and eliminates the ones with greater weights in order to pick a subset of a desired size with the Poisson disk property, without having to specify a Poisson disk radius. This new algorithm is simple, computationally efficient, and it can work in any sampling domain, producing sample sets with more pronounced blue noise characteristics than dart throwing. Most importantly, it allows unbiased progressive (adaptive) sampling and it scales better to high dimensions than previous methods. However, it cannot guarantee maximal coverage. We provide a statistical analysis of our algorithm in 2D and higher dimensions as well as results from our tests with different example applications.
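A simplified, unoptimized sketch of weighted sample elimination in 2D follows; the published algorithm uses a heap and tuned weight functions, whereas this naive version recomputes weights after every removal, and the radius heuristic is only an assumption made here for illustration.

    # Naive greedy weighted sample elimination in 2D: weight each sample by its proximity
    # to surviving neighbours and repeatedly drop the most crowded one until the target
    # count is reached (illustrative only; not the paper's optimized heap-based algorithm).
    import numpy as np
    from scipy.spatial import cKDTree

    def eliminate_samples(points, target, alpha=8):
        pts = np.asarray(points, dtype=float)
        alive = np.ones(len(pts), dtype=bool)
        # heuristic: twice the hexagonal-packing Poisson disk radius for `target` points in a unit square
        r_max = 2.0 * np.sqrt(1.0 / (2.0 * np.sqrt(3.0) * target))
        while alive.sum() > target:
            idx = np.flatnonzero(alive)
            tree = cKDTree(pts[idx])
            pairs = np.array(list(tree.query_pairs(r_max)), dtype=int)
            w = np.zeros(len(idx))
            if len(pairs):
                d = np.linalg.norm(pts[idx[pairs[:, 0]]] - pts[idx[pairs[:, 1]]], axis=1)
                contrib = (1.0 - d / r_max) ** alpha          # closer neighbours weigh more
                np.add.at(w, pairs[:, 0], contrib)
                np.add.at(w, pairs[:, 1], contrib)
            alive[idx[np.argmax(w)]] = False                  # drop the most crowded sample
        return pts[alive]

    rng = np.random.default_rng(7)
    candidates = rng.random((2000, 2))                        # oversampled random points in [0,1]^2
    subset = eliminate_samples(candidates, target=500)
    print(subset.shape)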

13.
Many problems in vision can be formulated as Bayesian inference. It is important to determine the accuracy of these inferences and how they depend on the problem domain. In this paper, we provide a theoretical framework based on Bayesian decision theory which involves evaluating performance based on an ensemble of problem instances. We pay special attention to the task of detecting a target in the presence of background clutter. This framework is then used to analyze the detectability of curves in images. We restrict ourselves to the case where the probability models are ergodic (both for the geometry of the curve and for the imaging). These restrictions enable us to use techniques from large deviation theory to simplify the analysis. We show that the detectability of curves depends on a parameter K which is a function of the probability distributions characterizing the problem. At critical values of K the target becomes impossible to detect on average. Our framework also enables us to determine whether a simpler approximate model is sufficient to detect the target curve and hence clarify how much information is required to perform specific tasks. These results generalize our previous work (Yuille and Coughlan, 2000, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2):160–173) by placing it in a Bayesian decision theory framework, by extending the class of probability models which can be analyzed, and by analyzing the case where approximate models are used for inference.

14.
Efficient and accurate Bayesian Markov chain Monte Carlo methodology is proposed for the estimation of event rates under an overdispersed Poisson distribution. An approximate Gibbs sampling method and an exact independence-type Metropolis-Hastings algorithm are derived, based on a log-normal/gamma mixture density that closely approximates the conditional distribution of the Poisson parameters. This involves a moment matching process, with the exact conditional moments obtained employing an entropy distance minimisation (Kullback–Leibler divergence) criterion. A simulation study is conducted and demonstrates good Bayes risk properties and robust performance for the proposed estimators, as compared with other estimating approaches under various loss functions. Actuarial data on insurance claims are used to illustrate the methodology. The approximate analysis displays superior Markov chain Monte Carlo mixing efficiency, whilst providing almost identical inferences to those obtained with exact methods.
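A fully conjugate simplification of the overdispersed-Poisson setting (Poisson counts with Gamma-distributed rates and a Gamma hyperprior on the scale, sampled by plain Gibbs) illustrates what such samplers target; the paper's log-normal/gamma moment-matched algorithms are not reproduced, and the hyperparameters below are illustrative.

    # Plain Gibbs sampler for a conjugate Poisson-Gamma hierarchy:
    # y_i ~ Poisson(lam_i), lam_i ~ Gamma(shape, beta) with fixed shape, beta ~ Gamma(a0, b0).
    import numpy as np

    rng = np.random.default_rng(8)
    y = rng.poisson(rng.gamma(shape=2.0, scale=3.0, size=50))   # simulated overdispersed counts

    shape, a0, b0 = 2.0, 1.0, 1.0
    n_iter, burn = 5000, 1000
    n = len(y)
    beta = 1.0
    lam_means, beta_draws = [], []
    for it in range(n_iter):
        lam = rng.gamma(shape + y, 1.0 / (beta + 1.0))            # conjugate update of the rates
        beta = rng.gamma(a0 + n * shape, 1.0 / (b0 + lam.sum()))  # conjugate update of the scale
        if it >= burn:
            lam_means.append(lam.mean())
            beta_draws.append(beta)

    print("posterior mean event rate:", round(float(np.mean(lam_means)), 3))
    print("posterior mean of beta:", round(float(np.mean(beta_draws)), 3))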

15.
Bayesian inferences for complex models need to be made by approximation techniques, mainly by Markov chain Monte Carlo (MCMC) methods. For these models, sensitivity analysis is a difficult task. A novel, computationally low-cost approach to estimating local parametric sensitivities in Bayesian models is proposed. This method allows the sensitivity measures and their errors to be estimated from the same random sample that has been generated to estimate the quantity of interest. Conditions that allow a derivative-integral interchange in the operator of interest are required. Two illustrative examples are considered to show how sensitivity computations with respect to the prior distribution and the loss function are easily obtained in practice.
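The core idea can be illustrated with the identity that the derivative of a posterior expectation with respect to a prior hyperparameter equals the posterior covariance between the quantity of interest and the derivative of the log prior, estimated from the same sample. The sketch below uses a conjugate normal-normal model so the exact answer is available for checking; it is an illustration of the principle, not the paper's general method.

    # Local sensitivity of E[theta | y] to the prior mean mu0, estimated from one posterior
    # sample via Cov(theta, d log prior / d mu0), and checked against the closed-form value.
    import numpy as np

    rng = np.random.default_rng(9)
    sigma, tau, mu0 = 1.0, 2.0, 0.0
    y = rng.normal(1.5, sigma, size=20)

    prec = 1.0 / tau**2 + len(y) / sigma**2                   # posterior precision (conjugate)
    post_mean = (mu0 / tau**2 + y.sum() / sigma**2) / prec
    theta = rng.normal(post_mean, np.sqrt(1.0 / prec), size=100_000)   # stands in for MCMC output

    score = (theta - mu0) / tau**2                            # d log N(theta | mu0, tau^2) / d mu0
    sens_mc = np.cov(theta, score)[0, 1]                      # Cov(g(theta), score), here g = identity
    sens_exact = (1.0 / tau**2) / prec                        # d E[theta | y] / d mu0 in closed form
    print("MC estimate:", round(sens_mc, 4), " exact:", round(sens_exact, 4))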

16.
Drop-the-losers designs were introduced for normal distributions as a method of combining phase II and phase III clinical trials under a single protocol, with the purpose of more rapidly evaluating drugs by eliminating as much as possible the delays that typically occur between the two phases of clinical development. In the design, the sponsor would administer k treatments along with a control in the first stage. During a brief interim period, efficacy data would be used to select the best treatment (with a rule to deal with ties) for further evaluation against the control in a second stage. At the end of the study, data from both stages would be used to draw inferences about the selected treatment relative to the control, with adjustments made for selection between the two stages. Because the inferences are model based, exact confidence intervals can be determined for the parameter of interest. In the present case, the parameter of concern is the probability of a beneficial response, which is dichotomous in nature.

17.
In this study the authors analyse the International Software Benchmarking Standards Group data repository, Release 8.0. The data repository comprises project data from several different companies. However, the repository exhibits missing data, which must be handled in an appropriate manner; otherwise inferences may be biased and misleading. The authors re-examine a statistical model that explained about 62% of the variability in actual software development effort (Summary Work Effort), conditioned on a sample of 339 observations from the repository. This model included the covariates Adjusted Function Points and Maximum Team Size, and dependence on Language Type (with categories 2nd, 3rd and 4th Generation Languages and Application Program Generators) and Development Type (enhancement, new development and re-development). The authors now use Bayesian inference and the Bayesian statistical simulation program, BUGS, to impute missing data, avoiding deletion of observations with missing Maximum Team Size and increasing the sample size to 616. Provided that imputation does not introduce distributional biases, the accuracy of inferences made from models that fit the data will increase. As a consequence of imputation, models that fit the data and explain about 59% of the variability in actual effort are identified. These models enable new inferences to be made about Language Type and Development Type. The sensitivity of the inferences to alternative distributions for imputing missing data is also considered. Furthermore, the authors consider the impact of these distributions on the explained variability of actual effort and show how valid effort estimates can be derived to improve estimate consistency.

18.
Although a sequential test, in general, requires a smaller sample size on average, most clinical trials are not designed with a sequential approach. One of the reasons is that the sample size of a sequential test can exceed that required by an equivalent fixed-sample-size test. Truncation of a triangular sequential design is considered in this paper. The truncation is chosen so that the maximum sample size of the test becomes approximately equal to the sample size of a fixed-sample test with about the same power. The method is intended for one-sided group sequential tests in which the treatment group is compared to the control group with respect to the difference in proportions. The method is illustrated using a clinical trial of head injury patients. Comparisons with other group sequential tests suggest that the proposed method may provide a more efficient test in clinical trials.

19.
The nonhomogeneous Poisson process (NHPP), also known as the Weibull process with power law, has been widely used in modeling hardware reliability growth and detecting software failures. Although statistical inferences on the Weibull process have been studied extensively by various authors, relevant discussions on predictive analysis are scattered in the literature. It is well known that predictive analysis is very useful for determining when to terminate the development testing process. This paper presents some results about predictive analyses for Weibull processes. Motivated by the demand for developing complex, high-cost and high-reliability systems (e.g., weapon systems, aircraft generators, jet engines), we address several issues in single-sample and two-sample prediction associated closely with the development testing program. Bayesian approaches based on noninformative priors are adopted to develop explicit solutions to these problems. We apply our methodologies to two real examples from a radar system development and an electronics system development.
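As a point of reference, the standard (non-Bayesian) conditional MLEs for the power-law NHPP observed to time T, together with the implied expected number of failures in a future test interval, can be computed as follows; the failure times are illustrative and the paper's Bayesian predictive intervals with noninformative priors are not reproduced.

    # Point estimates for the power-law NHPP (Weibull process) with mean function
    # m(t) = lam * t^beta, from time-truncated data, plus the expected future failure count.
    import numpy as np

    failure_times = np.array([55., 166., 205., 341., 488., 567., 731., 1308., 2050., 2453.])  # hours (illustrative)
    T = 3000.0                    # end of the observation period
    n = len(failure_times)

    beta_hat = n / np.sum(np.log(T / failure_times))   # shape (reliability growth) parameter
    lam_hat = n / T**beta_hat                          # scale parameter

    s = 1000.0                    # length of the next test period
    expected_future = lam_hat * ((T + s)**beta_hat - T**beta_hat)
    print(f"beta_hat = {beta_hat:.3f}, lambda_hat = {lam_hat:.5f}")
    print(f"expected failures in the next {s:.0f} hours: {expected_future:.2f}")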

20.
Simplifying Inference in Multiply Sectioned Bayesian Networks
Multiply sectioned Bayesian networks (MSBNs) introduce modularity and object-oriented ideas, and are a powerful tool for modeling large, complex systems. At present, how to reduce the time and space complexity of local and global inference in MSBNs has become the key issue limiting their application. We first analyze the time and space complexity of two classical algorithms for inference in local Bayesian networks, prove that they are essentially equivalent, and give a unified theoretical explanation. We then show experimentally that the decisive factor in inference complexity is the induced width of the graph induced by the network model, and identify the family of Bayesian networks that admit exact inference. Finally, we analyze the feasibility of reducing the complexity of global inference in MSBNs and give guiding principles for simplifying global MSBN inference.
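To make the role of the induced width concrete, the sketch below moralizes a small hypothetical Bayesian network and computes the width induced by a min-degree elimination ordering using networkx; the cost of exact inference grows exponentially in this quantity.

    # Induced width under a min-degree elimination ordering: moralize the DAG, then
    # eliminate vertices greedily, connecting each eliminated vertex's remaining neighbours.
    import networkx as nx

    def moralize(dag):
        g = nx.Graph(dag.to_undirected())
        for child in dag.nodes:                        # marry the parents of every node
            parents = list(dag.predecessors(child))
            for i in range(len(parents)):
                for j in range(i + 1, len(parents)):
                    g.add_edge(parents[i], parents[j])
        return g

    def induced_width_min_degree(g):
        g = g.copy()
        width = 0
        while g.number_of_nodes():
            v = min(g.nodes, key=g.degree)             # min-degree elimination heuristic
            nbrs = list(g.neighbors(v))
            width = max(width, len(nbrs))              # neighbourhood size at elimination time
            g.add_edges_from((a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:])
            g.remove_node(v)
        return width

    dag = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D"), ("B", "E"), ("D", "F"), ("E", "F")])
    print("induced width:", induced_width_min_degree(moralize(dag)))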

