Similar Articles
20 similar articles were found.
1.
Water quality studies often include the analytical challenge of incorporating censored data and quantifying error of estimation. Many analytical methods exist for estimating distribution parameters when censored data are present. This paper presents a Bayesian-based hierarchical model for estimating the national distribution of the mean concentrations of chemicals occurring in U.S. public drinking water systems, using fluoride and thallium as examples. The data used are Safe Drinking Water Act compliance monitoring data (with a significant proportion of left-censored data). The model, which assumes log-normality, was evaluated using simulated data sets generated from a series of Weibull distributions to illustrate its robustness. The hierarchical model is easily implemented using the Markov chain Monte Carlo simulation method. In addition, the Bayesian method is able to quantify the uncertainty in the estimated cumulative distribution function. The estimated fluoride and thallium national distributions are presented. Results from this study can be used to develop prior distributions for future U.S. drinking water regulatory studies of contaminant occurrence.
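As a minimal illustration of the kind of computation involved (not the paper's hierarchical national model), the sketch below fits a single log-normal distribution to left-censored concentration data with a random-walk Metropolis sampler; the data values, detection limits, and priors are assumed for the example.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: detected concentrations and detection limits for non-detects.
detects = np.array([0.8, 1.2, 0.5, 2.1, 0.9])
dl_nondetects = np.array([0.4, 0.4, 0.25, 0.25])   # left-censored at these limits

def log_post(mu, log_sigma):
    sigma = np.exp(log_sigma)
    # detects contribute densities, non-detects contribute P(X < detection limit)
    ll = stats.norm.logpdf(np.log(detects), mu, sigma).sum()
    ll += stats.norm.logcdf(np.log(dl_nondetects), mu, sigma).sum()
    # weakly informative priors (assumed): mu ~ N(0, 10), log_sigma ~ N(0, 2)
    lp = stats.norm.logpdf(mu, 0, 10) + stats.norm.logpdf(log_sigma, 0, 2)
    return ll + lp

# Random-walk Metropolis sampling of (mu, log sigma) on the log-concentration scale.
theta = np.array([0.0, 0.0])
cur = log_post(*theta)
samples = []
for it in range(20000):
    prop = theta + rng.normal(scale=0.2, size=2)
    new = log_post(*prop)
    if np.log(rng.uniform()) < new - cur:
        theta, cur = prop, new
    if it >= 5000:                     # discard burn-in
        samples.append(theta.copy())
samples = np.array(samples)
print("posterior mean of log-scale mean:", samples[:, 0].mean())
print("posterior mean of log-scale sd:  ", np.exp(samples[:, 1]).mean())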

2.
Species sensitivity distributions (SSDs) are increasingly used to analyze toxicity data but have been criticized for a lack of consistency in data inputs, lack of relevance to the real environment, and a lack of transparency in implementation. This paper shows how the Bayesian approach addresses concerns arising from frequentist SSD estimation. Bayesian methodologies are used to estimate SSDs and compare results obtained with time-dependent (LC50) and time-independent (predicted no observed effect concentration) endpoints for the insecticide chlorpyrifos. Uncertainty in the estimation of each SSD is obtained either in the form of a pointwise percentile confidence interval computed by bootstrap regression or an associated credible interval. We demonstrate that uncertainty in SSD estimation can be reduced by applying a Bayesian approach that incorporates expert knowledge and that use of Bayesian methodology permits estimation of an SSD that is more robust to variations in data. The results suggest that even with sparse data sets theoretical criticisms of the SSD approach can be overcome.  相似文献   
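For comparison with the bootstrap side of the analysis described above, a frequentist sketch of fitting a log-normal SSD and bootstrapping the hazardous concentration for 5% of species (HC5) might look like the following; the LC50 values are invented for illustration and are not the chlorpyrifos data used in the study.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical LC50 values (ug/L), one per species.
lc50 = np.array([0.1, 0.3, 0.5, 1.2, 2.0, 4.5, 8.0, 15.0, 40.0, 120.0])
x = np.log(lc50)

def hc5(logged):
    mu, sd = logged.mean(), logged.std(ddof=1)
    return np.exp(mu + stats.norm.ppf(0.05) * sd)   # 5th percentile of the log-normal SSD

point = hc5(x)

# Non-parametric bootstrap for a pointwise interval on the HC5.
boot = np.array([hc5(rng.choice(x, size=x.size, replace=True)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"HC5 = {point:.3f} ug/L (95% bootstrap interval {lo:.3f}-{hi:.3f})")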

3.
Every year hundreds of thousands, if not millions, of samples are collected and analyzed to assess microbial contamination in food and water. The concentration of pathogenic organisms at the end of the production process is low for most commodities, so a highly sensitive screening test is used to determine whether the organism of interest is present in a sample. In some applications, samples that test positive are subjected to quantitation. The most probable number (MPN) technique is a common method to quantify the level of contamination in a sample because it is able to provide estimates at low concentrations. This technique uses a series of dilution count experiments to derive estimates of the concentration of the microorganism of interest. An application for these data is food-safety risk assessment, where the MPN concentration estimates can be fitted to a parametric distribution to summarize the range of potential exposures to the contaminant. Many different methods (e.g., substitution methods, maximum likelihood, and regression on order statistics) have been proposed to fit microbial contamination data to a distribution, but the development of these methods rarely considers how the MPN technique influences the choice of distribution function and fitting method. An often overlooked aspect when applying these methods is whether the data represent actual measurements of the average concentration of microorganisms per milliliter or whether they are real-valued estimates of the average concentration, as is the case with MPN data. In this study, we propose two methods for fitting MPN data to a probability distribution. The first method uses a maximum likelihood estimator that takes average concentration values as the data inputs. The second is a Bayesian latent variable method that uses the counts of the number of positive tubes at each dilution to estimate the parameters of the contamination distribution. The performance of the two fitting methods is compared for two data sets that represent Salmonella and Campylobacter concentrations on chicken carcasses. The results demonstrate a bias in the maximum likelihood estimator that increases with reductions in average concentration. The Bayesian method provided unbiased estimates of the concentration distribution parameters for all data sets. We provide computer code for the Bayesian fitting method.
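The MPN estimate itself comes from a simple binomial likelihood in which the probability that a tube is positive follows from a Poisson assumption, p = 1 - exp(-c*v). A minimal maximum likelihood sketch for a single dilution series (the tube counts and volumes below are assumed, not from the study) is:

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical 3-dilution MPN design: sample volume per tube (mL), tubes, positives.
volume   = np.array([0.1, 0.01, 0.001])
n_tubes  = np.array([3, 3, 3])
positive = np.array([3, 2, 0])

def neg_loglik(log_c):
    c = np.exp(log_c)                      # concentration (organisms/mL)
    p = 1.0 - np.exp(-c * volume)          # Poisson probability that a tube is positive
    p = np.clip(p, 1e-12, 1 - 1e-12)
    ll = positive * np.log(p) + (n_tubes - positive) * np.log(1.0 - p)
    return -ll.sum()

res = minimize_scalar(neg_loglik, bounds=(np.log(1e-3), np.log(1e5)), method="bounded")
print(f"MPN estimate: {np.exp(res.x):.1f} organisms/mL")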

4.
Characterizing contaminant occurrences in China's centralized source waters can provide an understanding of source water quality for stakeholders. The single-factor (i.e., worst contaminant) water-quality assessment method, commonly used in Chinese official analysis and publications, provides a qualitative summary of the country's water-quality status but does not specify the extent and degree of specific contaminant occurrences at the national level. Such information is needed for developing scientifically sound management strategies. This article presents a Bayesian hierarchical modeling approach for estimating contaminant concentration distributions in China's centralized source waters using arsenic and fluoride as examples. The data used are from the most recent national census of centralized source waters in 2006. The article uses three commonly used source water stratification methods to establish alternative hierarchical structures reflecting alternative model assumptions as well as competing management needs in characterizing pollutant occurrences. The results indicate that the probability of arsenic exceeding the standard of 0.05 mg/L is about 0.96-1.68% and the probability of fluoride exceeding 1 mg/L is about 9.56-9.96% nationally, both with strong spatial patterns. The article also discusses the use of the Bayesian approach for establishing a source water-quality information management system as well as other applications of our methods.  相似文献   
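Once a log-normal concentration distribution has been estimated, the exceedance probabilities quoted above are simply upper-tail probabilities of that distribution. A short sketch with purely illustrative parameter values (not those estimated in the article):

import numpy as np
from scipy import stats

# Hypothetical fitted log-normal parameters for arsenic concentrations (mg/L).
mu, sigma = np.log(0.004), 1.1            # illustrative values, not from the paper
threshold = 0.05                           # arsenic standard cited in the abstract
p_exceed = stats.norm.sf((np.log(threshold) - mu) / sigma)
print(f"P(concentration > {threshold} mg/L) = {p_exceed:.2%}")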

5.
In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. Without detailed biological information on the genes involved, the evaluation of the number of clusters becomes difficult, and we have to rely on an internal measure that is based on the distribution of the data of the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we proposed a modified Fuzzy Gap statistic (MFGS) and applied it to fuzzy k-means clustering. For estimating the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, compared with other internal measures, the MFGS showed a higher performance in terms of robustness against noise for estimating the optimal number of clusters. Moreover, it could be used to estimate the optimal number of clusters in experimental data sets. It was confirmed that the proposed MFGS is a useful method for estimating the number of clusters for microarray data sets.  相似文献   
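For reference, the crisp Gap statistic that the MFGS modifies compares the within-cluster dispersion of the data with that of uniform reference data. A minimal sketch using k-means inertia as the dispersion measure (a common simplification, not the fuzzy modification proposed in the study) is shown below.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def within_dispersion(X, k):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.inertia_                      # within-cluster sum of squares

def gap_statistic(X, k_max=8, n_ref=20):
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_wk = np.log(within_dispersion(X, k))
        # reference dispersions from data drawn uniformly over the bounding box
        ref = [np.log(within_dispersion(rng.uniform(lo, hi, size=X.shape), k))
               for _ in range(n_ref)]
        gaps.append(np.mean(ref) - log_wk)
        sks.append(np.std(ref) * np.sqrt(1 + 1.0 / n_ref))
    return np.array(gaps), np.array(sks)

# Toy data: three well-separated groups.
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (0, 3, 6)])
gaps, sks = gap_statistic(X)
# smallest k with Gap(k) >= Gap(k+1) - s(k+1)
k_hat = next((k for k in range(1, len(gaps)) if gaps[k - 1] >= gaps[k] - sks[k]), len(gaps))
print("estimated number of clusters:", k_hat)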

6.
《Journal of dairy science》1987,70(2):331-336
Exact relationships of quadratic forms for restricted maximum likelihood and the pseudo expectation methods were presented for balanced data and a model containing only one random factor. These relationships were extended to unbalanced data as approximations and schemes for improving the convergence rates of the two methods of variance component estimation were compared empirically using simulated data. The proposed scheme for restricted maximum likelihood was found inappropriate since it converged rapidly but to a different final estimate than usual restricted maximum likelihood. The scheme for the pseudo expectation method also converged rapidly and to the same final estimates as the usual pseudo expectation method, and hence is recommended as a means of obtaining a good prior for restricted maximum likelihood.  相似文献   

7.
Two integral methods for kinetic parameter estimation with linear temperature profiles were tested using simulated and experimental data sets. They were the second rational approximation method (SRAM) and the equivalent point method (EPM) using least squares nonlinear regression (LSNR) and weighted least squares nonlinear regression (WLSNR). For three simulated data sets, the SRAM with WLSNR yielded accurate parameter estimation. For the experimental data set, both SRAM and EPM using WLSNR yielded accurate parameter estimation. The standard errors in activation energy were 37.3% (for SRAM) and 46.7% (for EPM) lower than those of the differential method. The SRAM with WLSNR was the best parameter estimation procedure.
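The general idea of an integral method under a linear temperature profile can be sketched as follows: integrate the Arrhenius rate along the ramp and fit the rate constant and activation energy by weighted nonlinear regression. This is a generic numerical-integration version, not the SRAM or EPM formulations themselves, and the temperature profile, reference temperature, and data values are all assumed.

import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import cumulative_trapezoid

R = 8.314                       # J/(mol K)
T_ref = 373.15                  # K, reference temperature (assumed)
T0, beta = 343.15, 0.5          # linear profile T(t) = T0 + beta*t (assumed, K and K/min)

def model(t, k_ref, Ea):
    """First-order survival ratio under a linear temperature ramp (numerical integral)."""
    T = T0 + beta * t
    k = k_ref * np.exp(-Ea / R * (1.0 / T - 1.0 / T_ref))
    integral = cumulative_trapezoid(k, t, initial=0.0)
    return np.exp(-integral)

# Hypothetical concentration-ratio data measured along the ramp.
t_obs = np.linspace(0, 60, 13)
c_obs = model(t_obs, 0.05, 90e3) * np.exp(np.random.default_rng(1).normal(0, 0.02, t_obs.size))

# Weighted nonlinear regression: weights proportional to the response (assumed weighting).
popt, pcov = curve_fit(model, t_obs, c_obs, p0=[0.02, 70e3],
                       sigma=0.02 * c_obs, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
print(f"k_ref = {popt[0]:.4f} 1/min, Ea = {popt[1]/1000:.1f} kJ/mol (+/- {perr[1]/1000:.1f})")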

8.
The usefulness of the variance and covariance component estimation methods based on a threshold model was studied in a multiple-trait situation with two binary traits. Estimation equations that yield marginal maximum likelihood estimates of variance components on the underlying continuous variable scale and point estimates of location parameters with empirical Bayesian properties are described. Methods were tested on simulated data sets that were generated to exhibit three different incidences, 25, 15, and 5%. Results were compared with analyses of the same data sets with a REML method based on a normal distribution and a linear model. Heritabilities and residual correlations calculated from discrete observations were transformed to underlying parameters. In estimation of heritabilities, all methods performed equally well at all incidence levels and with no detectable bias. As suggested by threshold theory, the genetic correlation was accurately estimated directly from the observations without any need for correction for incidence. Marginal maximum likelihood estimates of genetic correlations were similar to linear model estimates; discrepancies from the true parameters were consistent with both methods. In estimation of residual correlations, the method with the linear model approach yielded satisfactory estimates only at the highest incidence level, 25%. For 5% incidence, the uncorrected estimate of residual correlation was 50% less than the true value, and after correction for incidence, the parameter was overestimated by 90%. The estimates of residual correlation from the threshold model were regarded as fair, except at the lowest level of incidence, where the estimate was 27% higher than the true value. Results indicated that when an accurate estimate of residual correlation is needed, the marginal maximum likelihood estimates are superior to the estimates calculated with the linear model. Using correction for the incidence level for residual correlation did not work well except at the highest incidence level.
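The transformation of observed-scale heritabilities to the underlying liability scale mentioned above is commonly done with the Dempster-Lerner formula, h2_liability = h2_observed * p(1 - p) / z^2, where p is the incidence and z the standard normal density at the threshold. A small sketch, assuming this is the transformation intended and using an illustrative observed heritability:

from scipy.stats import norm

def h2_liability(h2_observed, incidence):
    """Dempster-Lerner transformation from the observed (0/1) scale to the liability scale."""
    z = norm.pdf(norm.ppf(1.0 - incidence))   # normal density at the threshold
    return h2_observed * incidence * (1.0 - incidence) / z**2

for p in (0.25, 0.15, 0.05):
    print(f"incidence {p:.0%}: observed h2 of 0.10 -> liability h2 {h2_liability(0.10, p):.3f}")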

9.
Reliable survival parameter estimation is an essential part of building predictive models for microbial survival. It has been demonstrated that these parameters can be accurately identified using a one-step regression approach that fits a survival model to multiple dynamic data sets at once. However, the existing methods are not particularly user-friendly because their application requires relatively advanced computing skills. In this study, a recursive equation for the Weibull model was used to construct microbial survival curves under dynamic conditions. Based on this, a procedure was developed to estimate survival parameters by fitting the equation to dynamic survival data sets using the built-in functions and Solver of Microsoft Excel. The results showed that the method provides an easy and accurate way to estimate microbial survival parameters.
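A recursive Weibull survival calculation under a dynamic temperature profile can also be written in a few lines of code outside a spreadsheet. The sketch below uses a Mafart-type parameterization with a log-linear temperature dependence of the scale parameter; the parameterization, parameter values, and profile are illustrative assumptions, not necessarily the recursive equation used in the study.

import numpy as np

# Assumed Weibull parameters: shape p, and a log-linear dependence of the scale
# parameter delta on temperature with z-value z_T.
p, delta_ref, T_ref, z_T = 1.5, 8.0, 58.0, 6.0     # min, deg C

def delta(T):
    return delta_ref * 10.0 ** (-(T - T_ref) / z_T)

def dynamic_weibull(times, temps):
    """Recursive log10 survival ratio under a time-varying temperature profile."""
    y = np.zeros_like(times)                        # y = log10(N/N0)
    for i in range(1, times.size):
        d = delta(temps[i])
        t_star = d * (-y[i - 1]) ** (1.0 / p)       # time giving y[i-1] at the current T
        dt = times[i] - times[i - 1]
        y[i] = -((t_star + dt) / d) ** p
    return y

t = np.linspace(0, 20, 81)
T = 50.0 + 0.5 * t                                  # linear come-up profile (illustrative)
print("final log10 reduction:", dynamic_weibull(t, T)[-1])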

10.
《Journal of dairy science》1986,69(1):187-194
Two computationally simple methods for estimation of variances and covariances, with estimates always within the allowable parameter space, are presented for multiple traits. Both methods involve transformations of data to convert observations into independent sets of traits. A Monte Carlo simulation comparison of eight methods for estimating variances and covariances was conducted. The two proposed methods compared favorably with a general purpose restricted maximum likelihood method. However, one of the methods was slow to converge and requires additional work to make it more useful.  相似文献   

11.
Faced with the fragmented and heterogeneous character of knowledge regarding complex food systems, we have developed a practical methodology, in the framework of dynamic Bayesian networks associated with Dirichlet distributions, that can incrementally build and update model parameters each time new information becomes available, whatever its source and format. Starting from a given network structure, the method consists of using a priori Dirichlet distributions that may be assessed from the literature, empirical observations, expert opinions, existing models, etc. These are then successively updated using Bayesian inference and the expected a posteriori estimate each time new or additional information becomes available and can be formulated in a frequentist form. The method also makes it possible to take into account (1) uncertainties pertaining to the system and (2) the confidence level of the different sources of information. The aim is to enrich the model whenever a new piece of information is available, in order to improve the representation and thus provide a better understanding of the system. We illustrate the feasibility and practical use of our approach in a real case, namely the modelling of Camembert-type cheese ripening.
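The core of the updating step is the conjugacy of the Dirichlet distribution with multinomial counts: new frequency data simply add to the prior pseudo-counts, and the expected a posteriori probabilities follow directly. A minimal sketch with assumed numbers (the states and counts are invented, not those of the cheese-ripening case):

import numpy as np

# Assumed prior belief about the distribution of a qualitative state
# (e.g., three ripening classes), encoded as Dirichlet pseudo-counts.
alpha_prior = np.array([2.0, 6.0, 2.0])        # from expert opinion / literature (assumed)

# New information arrives as frequencies (counts) from observations.
new_counts = np.array([1, 14, 5])

alpha_post = alpha_prior + new_counts           # conjugate Dirichlet update
eap = alpha_post / alpha_post.sum()             # expected a posteriori probabilities
print("posterior pseudo-counts:", alpha_post)
print("expected a posteriori probabilities:", np.round(eap, 3))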

12.
Objective: To evaluate regional food safety status through a food safety evaluation index system. Methods: The constructed index system comprehensively considers the various factors that affect food safety, such as city size, food category, the hazard level of the tested items, and place of production, and the model is supported by multiple data sources. Results: Using the 2021 Guangxi food safety evaluation sampling data as an empirical case, the model was computed with both a classical large-sample estimation method and an empirical Bayes estimation method. The safety index results based on food category differed somewhat from the results based on simple pass rates, but both algorithms showed that the safety of catering food was significantly lower than that of the other major food categories. In addition, the Bayesian estimation method effectively solved the computational problems that arise when pass rates are 100% or when the values are generally very close to one another. Conclusion: The model can compute food safety index results from multiple perspectives and at different levels, such as by region, by food category, and by region and food category combined.
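One standard way to obtain the behaviour described for 100% pass rates is empirical Bayes shrinkage with a beta-binomial model; the sketch below (with invented inspection counts, and not necessarily the estimator used in the study) shows how extreme pass rates are pulled toward the overall level.

import numpy as np

# Hypothetical inspection results per food category: samples tested and samples passing.
n = np.array([400, 120, 60, 950])
x = np.array([396, 120, 60, 901])      # note two categories with 100% pass rates

# Method-of-moments beta prior estimated from the observed pass rates (requires some
# variation across groups); this is a simple empirical Bayes choice.
r = x / n
m, v = r.mean(), r.var(ddof=1)
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# Posterior mean pass rates shrink extreme (100%) rates toward the overall level.
posterior_rate = (a + x) / (a + b + n)
for raw, eb in zip(r, posterior_rate):
    print(f"raw {raw:.3%} -> empirical Bayes {eb:.3%}")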

13.
Single-step genomic prediction models utilizing both genotyped and nongenotyped animals are likely to become the prevailing tool in genetic evaluations of livestock. Various single-step prediction models have been proposed, based either on estimation of individual marker effects or on direct prediction via a genomic relationship matrix. In this study, a classical pedigree-based animal model, a regular single-step genomic BLUP (ssGBLUP) model, the algorithm for proven and young (APY) with 2 strategies for choosing core animals, and a single-step Bayesian regression (ssBR) model were compared for 305-d production traits (milk, fat, protein) in the Finnish Red dairy cattle population. A residual polygenic effect with 10% of total genetic variance was included in the single-step models to reduce inflation of genomic predictions. Validation reliability was calculated as the squared Pearson correlation coefficient between genomically enhanced breeding value (GEBV) and yield deviation for masked records of 2,056 validation cows from the last year in the data set investigated. The results showed that gains of 0.02 to 0.04 in validation reliability were achieved by using single-step methods compared with the classical animal model. The regular ssGBLUP model and the ssBR model with an extra polygenic effect yielded the same results. The APY methods yielded reliabilities similar to those of the regular ssGBLUP and ssBR. Exact prediction error variances of GEBV could be obtained by ssBR, avoiding the approximation methods used for ssGBLUP when inversion of the left-hand side of the mixed model equations is computationally infeasible for large data sets.

14.
Linear mixed models, for which the prior multivariate normal distributions of random effects are assumed to have a mean equal to 0, are commonly used in animal breeding. However, some statistical analyses (e.g., the consideration of a population under selection into a genomic scheme breeding, multiple-trait predictions of lactation yields, and Bayesian approaches integrating external information into genetic evaluations) need to alter both the mean and (co)variance of the prior distributions and, to our knowledge, most software packages available in the animal breeding community do not permit such alterations. Therefore, the aim of this study was to propose a method to alter both the mean and (co)variance of the prior multivariate normal distributions of random effects of linear mixed models while using currently available software packages. The proposed method was tested on simulated examples with 3 different software packages available in animal breeding. The examples showed the possibility of the proposed method to alter both the mean and (co)variance of the prior distributions with currently available software packages through the use of an extended data file and a user-supplied (co)variance matrix.  相似文献   

15.
Few studies have demonstrated changes in community structure along a contaminant plume in terms of phylogenetic, functional, and geochemical changes, and such studies are essential to understand how a microbial ecosystem responds to perturbations. Clonal libraries of multiple genes (SSU rDNA, nirK, nirS, amoA, pmoA, and dsrAB) were analyzed from groundwater samples (n = 6) that varied in contaminant levels, and 107 geochemical parameters were measured. Principal components analyses (PCA) were used to compare the relationships among the sites with respect to the biomarker (n = 785 for all sequences) distributions and the geochemical variables. A major portion of the geochemical variance measured among the samples could be accounted for by tetrachloroethene, 99Tc, NO3-, SO42-, Al, and Th. The PCA based on the distribution of unique biomarkers resulted in different groupings compared to the geochemical analysis, but when the SSU rRNA gene libraries were directly compared (ΔCxy values) the sites were clustered in a fashion similar to the geochemical measures. The PCAs based upon functional gene distributions each predicted different relationships among the sites, and comparisons of Euclidean distances based upon diversity indices for all functional genes (n = 432) grouped the sites by extreme or intermediate contaminant levels. The data suggested that the sites with low and high perturbations were functionally more similar than sites with intermediate conditions, and perhaps captured the overall community structure better than a single phylogenetic biomarker. Moreover, even though the background site was phylogenetically and geochemically distinct from the acidic sites, the extreme conditions of the acidic samples might be more analogous to the nutrient-limited conditions of the background site. An understanding of microbial community-level responses within an ecological framework would provide better insight for restoration strategies at contaminated field sites.

16.
A common problem in animal breeding research is estimation of variance and covariance components. Usual methods of estimation have been described by Henderson in 1953. In 1973 Henderson reported a computing algorithm for maximum likelihood estimation of variance components which may have been overlooked. This note reviews the Henderson computing technique and illustrates application of maximum likelihood estimation to two specific types of models for large, unbalanced data sets.  相似文献   

17.
Microbiological contamination data are often censored because of the presence of non-detects or because measurement outcomes are known only to be smaller than, greater than, or between certain boundary values imposed by the laboratory procedures. Therefore, it is not straightforward to fit distributions that summarize contamination data for use in quantitative microbiological risk assessment, especially when variability and uncertainty are to be characterized separately. In this paper, distributions are fitted using Bayesian analysis, and results are compared to those obtained with a methodology based on maximum likelihood estimation and the non-parametric bootstrap method. The Bayesian model is also extended hierarchically to estimate the effects of the individual elements of a covariate such as, for example, on a national level, the food processing company where the analyzed food samples were processed, or, on an international level, the geographical origin of the contamination data. Including this extra information allows a risk assessor to differentiate between several scenarios and increase the specificity of the estimate of the risk of illness, or to compare different scenarios with each other. Furthermore, inference is made on the predictive importance of several different covariates while taking uncertainty into account, making it possible to indicate which covariates are influential factors determining contamination.
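The maximum-likelihood side of such an analysis amounts to combining density contributions for detected values with cumulative-probability contributions for censored ones. A minimal sketch for a log-normal fit with left-censored non-detects (data values assumed; the bootstrap and hierarchical Bayesian extensions are omitted):

import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Hypothetical contamination data (cfu/g).
detects = np.array([12.0, 3.5, 40.0, 7.2, 1.9])
lod = 1.0                                   # non-detects reported only as "< 1 cfu/g"
n_nondetect = 7

def neg_loglik(params):
    mu, log_sd = params
    sd = np.exp(log_sd)
    ll = stats.norm.logpdf(np.log(detects), mu, sd).sum()
    ll += n_nondetect * stats.norm.logcdf(np.log(lod), mu, sd)   # left-censored contribution
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sd_hat = res.x[0], np.exp(res.x[1])
print(f"log-normal fit: mu = {mu_hat:.2f}, sigma = {sd_hat:.2f} (natural-log scale)")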

18.
The usefulness of risk assessment is limited by its ability to model and evaluate the uncertainty and variability of risk separately. Key contributors to variability and uncertainty in microbial risk assessment are likely to be growth variability between strains and growth model parameter uncertainty. In this paper, we propose a Bayesian procedure for growth parameter estimation that makes it possible to separate these two components by means of hyperparameters. This model incorporates in a single step the logistic equation with delay as a primary growth model and the cardinal temperature equation as a secondary growth model. The estimation of Listeria monocytogenes growth parameters in milk using literature data is presented as a detailed application. Although the model should ultimately be applied to genuine data, the proposed approach is shown to be convenient for estimating the variability and uncertainty of growth parameters separately, using a complete predictive microbiology model.
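The secondary model referred to above is typically the Rosso cardinal temperature model with inflection (CTMI). A short sketch of that equation, with purely illustrative cardinal values rather than the parameters estimated for Listeria monocytogenes in milk:

import numpy as np

def gamma_ctmi(T, Tmin, Topt, Tmax):
    """Rosso cardinal temperature model with inflection (a common secondary model)."""
    if T <= Tmin or T >= Tmax:
        return 0.0
    num = (T - Tmax) * (T - Tmin) ** 2
    den = (Topt - Tmin) * ((Topt - Tmin) * (T - Topt)
                           - (Topt - Tmax) * (Topt + Tmin - 2.0 * T))
    return num / den

# Illustrative cardinal values (assumed, not fitted): Tmin, Topt, Tmax in deg C, mu_opt in 1/h.
Tmin, Topt, Tmax, mu_opt = -1.7, 37.0, 45.5, 1.2
for T in (4, 10, 25, 37):
    print(f"T = {T:2d} C  mu_max ~ {mu_opt * gamma_ctmi(T, Tmin, Topt, Tmax):.3f} 1/h")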

19.
Several single-marker association and haplotypic analyses have been performed to identify susceptibility genes for various common diseases, but these candidate-gene approaches have not provided accurate and consistent evidence across analyses. This inconsistency is partly due to the fact that common diseases are caused by complex interactions among various genetic factors. Therefore, in this study, to evaluate exhaustive genotype or allele combinations, we applied the binomial and random permutation test (BRP) proposed by Tomita et al. [IPSJ Digital Courier, 2, 691-709 (2006)] to the association analysis between an Apolipoprotein L gene cluster and schizophrenia. Using seven representative single nucleotide polymorphisms (SNPs) selected on the basis of linkage disequilibrium evaluation, we analyzed 845 schizophrenic patients and 707 healthy controls, and validated risk and protective factors with two randomly divided data sets. The interactions were also analyzed with conventional methods for comparison. Regardless of which of the tested methods was used, no highly significant risk factor was commonly selected from both independent data sets. However, significant interactions for the protective factor against disease development were commonly obtained from both data sets by BRP analysis. In conclusion, although the causality of schizophrenia is considered too complex to identify a susceptibility interaction with a small sample size, the results suggest that, when BRP is used as a new exhaustive combination analysis method, healthy controls tend to share the same combinations of certain alleles or genotypes that protect against disease development.

20.
《International Dairy Journal》2005,15(6-9):631-643
Traditionally, trained, experienced judges have classified cheese (type and maturity) with the aid of chemical compositional data. More recently, classification has been attempted using casein, peptide and amino acid data produced using electrophoretic and chromatographic methods. Structure within these large data sets can be determined when they are objectively analysed using multivariate statistical methods. Data can also be correlated to sensory or instrumental textural information. This approach has also been applied to research, quality and regulatory issues associated with cheese, and a wide range of physico-chemical techniques and statistical methods has been used. There is no ‘best’ combination of analytical and statistical methods that can be recommended for every situation. However, for successful classification or differentiation, analytical and statistical methods should be carefully selected and the quality of the data set must be high. An inappropriate or poor data set cannot be ‘rescued’ by using a powerful statistical method.  相似文献   
