Similar Literature
20 similar documents found (search time: 31 ms)
1.
When analyzing high-throughput genomic data, the multiple comparison problem is most often addressed by estimating the false discovery rate (FDR), using methods such as the Benjamini & Hochberg procedure, the Benjamini & Yekutieli procedure, or the q-value method, or by controlling the family-wise error rate (FWER) using Holm's step-down method. To date, research studies comparing FDR/FWER methodologies have relied on limited simulation studies and/or have applied the methods to one or more microarray gene expression datasets. However, for microarray datasets the veracity of each null hypothesis tested is unknown, so an objective evaluation of performance cannot be rendered for application data. Because of the role of methylation in X-chromosome inactivation, we postulate that high-throughput methylation datasets provide an appropriate setting for assessing the performance of commonly used FDR methodologies. These datasets preserve the complex correlation structure between probes, offering an advantage over simulated datasets. Using several methylation datasets, commonly used FDR methods, including the q-value, Benjamini & Hochberg, and Benjamini & Yekutieli procedures, as well as Holm's step-down method, were applied to identify CpG sites that are differentially methylated when comparing healthy males to healthy females. The methods were compared with respect to their ability to identify CpG sites located on sex chromosomes as significant, by reporting the sensitivity, specificity, and observed FDR. These datasets are useful for characterizing the performance of multiple comparison procedures, and may find further utility in other tasks such as comparing the variable selection capabilities of classification methods and evaluating the performance of meta-analytic methods for microarray data.
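For reference, a minimal NumPy sketch of the Benjamini & Hochberg step-up procedure named above (a textbook illustration, not the code used in the study; the example p-values are hypothetical):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini & Hochberg step-up procedure.

    Returns a boolean mask of rejected null hypotheses at FDR level alpha.
    """
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)                    # ranks of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # largest i with p_(i) <= i*alpha/m
        reject[order[:k + 1]] = True         # reject all hypotheses up to rank k
    return reject

# Example: 10 hypothetical p-values, FDR controlled at 5%
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.59]
print(benjamini_hochberg(pvals))
```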

2.
The kriging estimator and its associated covariance model are introduced as a means of assessing the fidelity of spatial datasets that describe flow-fields in their entirety, and further as a means of interpreting and blending such datasets. In this manner, a method for comparing uncertain nodal data from numerical models and experimental flow-field anemometry is developed. For spatial datasets, this activity has heretofore been treated as a simple extension of established validation and verification methodologies, which were developed with the validation of scalar data (lift, drag, point velocity components, or pressure) in mind. It is demonstrated that a more complete comparison arises when the entire fields of data are instead correlated via spatial covariance functions. These spatial covariance functions then inform the subsequent estimation, smoothing, and blending of velocity fields, known as cokriging. In this paper, the theoretical model underlying kriging estimation is elucidated, and the techniques are demonstrated with reference to Laser Doppler anemometry and Finite Volume modelling of a subsonic flow of air around an experimental model. It is proposed that by developing spatial correlations between datasets, a more rigorous and flexible model for spatial comparison and validation of flow-fields emerges.
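To make the estimator concrete, here is a minimal sketch of simple kriging with a Gaussian covariance model; the covariance parameters and data points are assumptions, and the paper's cokriging of two datasets goes beyond this single-field case:

```python
import numpy as np

def gauss_cov(d, sigma2=1.0, length=0.5):
    # Gaussian (squared-exponential) covariance as a function of distance
    return sigma2 * np.exp(-(d / length) ** 2)

def simple_krige(X, y, Xq, mean=0.0):
    """Simple kriging (known mean) at query points Xq.

    X: (n, dim) observed locations, y: (n,) observed values.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = gauss_cov(D) + 1e-10 * np.eye(len(X))      # jitter for stability
    Dq = np.linalg.norm(Xq[:, None, :] - X[None, :, :], axis=-1)
    k = gauss_cov(Dq)                               # (m, n) cross-covariances
    w = k @ np.linalg.inv(K)                        # kriging weights
    est = mean + w @ (y - mean)
    var = gauss_cov(0.0) - np.sum(w * k, axis=1)    # kriging variance
    return est, var

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 0.5])
est, var = simple_krige(X, y, np.array([[0.5, 0.5]]))
print(est, var)
```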

3.
Mining high-utility patterns requires setting a suitable threshold, which is not easy for users: if the threshold is too small, a large number of low-utility patterns are produced, while if it is too large, no high-utility patterns may be generated at all. Top-k high-utility pattern mining was therefore proposed, where k denotes the number of patterns with the highest utility values. Moreover, most research on high-utility mining targets only static databases, whereas in practical applications new transactions are constantly added. To address these problems, an incremental top-k high-utility mining algorithm, TOPK-HUP-INS, is proposed. Using four effective strategies, the algorithm efficiently mines the user-specified number of high-utility patterns as new data arrive incrementally. Comparative experiments on different datasets show that TOPK-HUP-INS performs excellently in both time and space.
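The abstract does not spell out TOPK-HUP-INS or its four strategies; the following hypothetical brute-force sketch only illustrates the underlying notions of itemset utility and top-k selection, not the paper's algorithm:

```python
import heapq
from itertools import combinations

# Transactions: item -> purchased quantity; unit_profit: item -> external utility
transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 2, "c": 1}]
unit_profit = {"a": 5, "b": 2, "c": 1}

def itemset_utility(itemset, db):
    # Sum, over transactions containing the whole itemset, of quantity * profit
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total

def top_k_hui(db, k):
    items = sorted({i for t in db for i in t})
    scored = []
    for r in range(1, len(items) + 1):
        for iset in combinations(items, r):      # brute force; real miners prune
            scored.append((itemset_utility(iset, db), iset))
    return heapq.nlargest(k, scored)             # k itemsets of highest utility

print(top_k_hui(transactions, 3))
```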

4.
This paper proposes a probabilistic variant of the SOM-kMER (Self-Organising Map-kernel-based Maximum Entropy learning Rule) model for data classification. The classifier, known as pSOM-kMER (probabilistic SOM-kMER), is able to operate in a probabilistic environment and to implement the principles of statistical decision theory when undertaking classification problems. A distinctive feature of pSOM-kMER is its ability to reveal the underlying structure of data. In addition, the Receptive Field (RF) regions generated can be used for variable-kernel and non-parametric density estimation. Empirical evaluation using benchmark datasets shows that pSOM-kMER achieves good performance compared with a number of machine learning systems. The applicability of the proposed model as a useful data classifier is also demonstrated on a real-world medical data classification problem.

5.
We evaluate the utility of medium spatial resolution images from the Wide Field Sensor (WiFS) for estimating the area burned in a large fire. The performance of methodologies using these images is compared with similar methodologies using a high spatial resolution image from the Linear Imaging and Self Scanning Sensor (LISS-III) and other ancillary data. Both sensors are on board the Indian Remote Sensing Satellite 1C (IRS-1C). The post-fire LISS image was analysed by means of Matched Filtering (MF) techniques. Two WiFS images (pre- and post-fire) were analysed using MF techniques and also by means of changes in the Normalized Difference Vegetation Index (NDVI). Ground data were used to classify the three resulting thematic images into several post-fire classes. The results show a greater proportion of transition areas between burned and unburned places, and a slightly larger burned-area estimate, in the WiFS analysis than in the LISS analysis. Nevertheless, the results obtained, and the comparisons with ground data, indicate that burned-area estimation from medium spatial resolution images is a useful tool at regional and national scales.
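The NDVI change analysis rests on a simple band ratio; a minimal sketch, where the reflectance values and the burn threshold are hypothetical:

```python
import numpy as np

def ndvi(nir, red):
    # Normalized Difference Vegetation Index, guarded against division by zero
    return (nir - red) / np.maximum(nir + red, 1e-9)

# Hypothetical pre- and post-fire NIR/red reflectance rasters
pre_nir, pre_red = np.array([[0.5, 0.6]]), np.array([[0.2, 0.2]])
post_nir, post_red = np.array([[0.2, 0.6]]), np.array([[0.2, 0.2]])

d_ndvi = ndvi(pre_nir, pre_red) - ndvi(post_nir, post_red)
burned = d_ndvi > 0.3   # hypothetical NDVI-drop threshold for "burned"
print(d_ndvi, burned)
```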

6.
Registration of 3D data is a key problem in many applications in computer vision, computer graphics and robotics. This paper provides a family of minimal solutions for the 3D-to-3D registration problem in which the 3D data are represented as points and planes. Such scenarios occur frequently when a 3D sensor provides 3D points and our goal is to register them to a 3D object represented by a set of planes. In order to compute the 6 degrees-of-freedom transformation between the sensor and the object, we need at least six points on three or more planes. We systematically investigate and develop pose estimation algorithms for several configurations, including all minimal configurations, that arise from the distribution of points on planes. We also identify the degenerate configurations in such registrations. The underlying algebraic equations used in many registration problems are the same and we show that many 2D-to-3D and 3D-to-3D pose estimation/registration algorithms involving points, lines, and planes can be mapped to the proposed framework. We validate our theory in simulations as well as in three real-world applications: registration of a robotic arm with an object using a contact sensor, registration of planar city models with 3D point clouds obtained using multi-view reconstruction, and registration between depth maps generated by a Kinect sensor.
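The paper's minimal solvers are not reproduced here; as background, this sketch shows the standard small-angle linearized least-squares solve for point-to-plane registration that such configurations specialize (the correspondences are synthetic):

```python
import numpy as np

def register_points_to_planes(points, normals, offsets):
    """One Gauss-Newton step for point-to-plane alignment.

    Minimizes sum_i (n_i . (R p_i + t) - d_i)^2 with R ~ I + [w]_x.
    points: (n, 3), normals: (n, 3) unit normals, offsets: (n,) plane d_i.
    Returns the 6-vector [w; t] (rotation and translation update).
    """
    J = np.hstack([np.cross(points, normals), normals])  # (n, 6) Jacobian
    b = offsets - np.einsum("ij,ij->i", normals, points) # residuals
    x, *_ = np.linalg.lstsq(J, b, rcond=None)
    return x  # small rotation w (axis-angle) and translation t

# Six hypothetical point-on-plane correspondences, two per coordinate plane
pts = np.random.default_rng(0).normal(size=(6, 3))
nrm = np.tile(np.eye(3), (2, 1))
d = np.einsum("ij,ij->i", nrm, pts) + 0.01   # planes nearly containing the points
print(register_points_to_planes(pts, nrm, d))
```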

7.
Data detection in a relay-based communication system (RCS) is challenging because its end-to-end channel, which comprises a cascade of several channels, has unique statistical characteristics. Assuming different channel conditions, in this paper we address the problem of data detection in an RCS where one amplify-and-forward relay is used as an intermediate node between a transmitter and a receiver. Our approach is based on Bayesian methodologies: a variant of the Markov Chain Monte Carlo (MCMC) technique, known as Metropolis–Hastings-within-Gibbs, is applied to systems with quasi-static channel models, whereas a particle filtering technique is used for systems with fast-varying channels, yielding joint data detection and channel estimation algorithms. Providing detailed derivations, we present two algorithms for each channel condition by formulating the transmission process of the communication system in different ways. The effectiveness of our algorithms is demonstrated through computer simulations.

8.
Cloud detection from geostationary satellite multispectral images through statistical methodologies is investigated. Discriminant analysis methods are considered for this purpose, endowed with nonparametric density estimation and a linear transform into principal and independent components. The whole methodology is applied to the MSG-SEVIRI sensor using a set of test images covering central and southern Europe. "Truth" data for the learning phase of discriminant analysis are taken from the cloud mask product MOD35 for MODIS passes coinciding with the SEVIRI images. Performance of the discriminant analysis methods is estimated over sea/land and daytime/nighttime, both when the training and test datasets coincide and when they differ. Discriminant analysis shows very good performance in detecting clouds, especially over land. PCA and ICA are effective in improving detection.

9.
Software cost estimation is one of the critical tasks in project management. In a highly demanding and competitive market environment, software project managers need robust models and methodologies to accurately predict the cost of a new project. Analogy-based cost estimation is one of the widely used models that rely on historical project data. It measures the similarity of features between past and current projects and approximates the current project's cost from past ones. One shortcoming of analogy-based cost estimation is that it treats all project features as equal. However, these features may have different impacts on project cost depending on their relevance. In this research, we present two feature-weight assignment heuristics for cost estimation. We assign weights to the project features using a statistical technique, principal components analysis (PCA), which extracts optimal linear patterns from high-dimensional data. We test our proposed heuristics on public datasets and conclude that prediction performance in terms of MMRE and Pred(25) increases with a statistics-based assignment technique rather than a random assignment approach.
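The abstract does not detail the two heuristics; the sketch below shows one plausible PCA-derived feature weighting feeding a weighted-distance analogy estimator, with the weighting rule and data being assumptions of this illustration:

```python
import numpy as np

def pca_feature_weights(X):
    """Weight each feature by its absolute loadings on the principal components,
    scaled by explained variance (one plausible heuristic, not the paper's)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)       # explained variance per component
    w = np.abs(Vt.T) @ var_ratio          # aggregate loadings per feature
    return w / w.sum()

def analogy_estimate(X_hist, cost_hist, x_new, w, k=2):
    # Weighted Euclidean distance to historical projects; average k nearest costs
    d = np.sqrt(((X_hist - x_new) ** 2 * w).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return cost_hist[nearest].mean()

X = np.array([[10, 3, 1.0], [12, 4, 1.2], [30, 9, 2.0], [28, 8, 2.2]], float)
cost = np.array([100.0, 120.0, 300.0, 280.0])
w = pca_feature_weights(X)
print(w, analogy_estimate(X, cost, np.array([11, 3, 1.1]), w))
```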

10.
A new multi-sweep receiving scheme is proposed to improve the estimation accuracy of ionospheric phase-path distortion. The scheme requires no modification of the transmitter: by mixing the received signal with multiple sweep signals at the receiver alone, multi-channel data containing the same ionospheric phase-path distortion information can be separated out. These multi-channel data yield a better phase-distortion estimate than single-channel data. Theoretical derivation and simulation experiments verify the effectiveness of the proposed scheme.

11.
In this paper, a hybrid system is proposed for setting machining parameters from experimental data. A symbolic regression alpha–beta technique, driven by estimation of distribution algorithms, is used to build mathematical models. Every model is validated using statistical analysis, and evolutionary computation is then used to minimize or maximize the generated model. A practical case using measured data from two machining processes on three materials illustrates the utility of the expert system, which generates a set of parameters that improves the machining process.

12.
A novel robust observer, intended to solve the output-feedback chaos synchronization problem for the Master/Slave configuration, is proposed here. Assuming that the given Master system belongs to a specific class of feedback-linearized systems, our solution is based on the well-known Immersion and Invariance (I&I) method. The proposed observer performs asymptotic estimation of the Master system's underlying dynamics, and its effectiveness is illustrated via computer-based simulations involving both the Duffing oscillator and the Genesio & Tesi system.

13.
Unsupervised Multiway Data Analysis: A Literature Survey
Two-way arrays, or matrices, are often not enough to represent all the information in the data, and standard two-way analysis techniques applied to matrices may fail to find the underlying structures in multi-modal datasets. Multiway data analysis has recently become popular as an exploratory tool for discovering the structures in higher-order datasets, where data have more than two modes. We provide a review of significant contributions in the literature on multiway models and algorithms, as well as their applications in diverse disciplines including chemometrics, neuroscience, social network analysis, text mining, and computer vision.
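As a concrete taste of a multiway model, here is a minimal NumPy sketch of rank-R CP (PARAFAC) decomposition by alternating least squares; the implementation choices are mine, not the survey's:

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product; rows indexed by (i, j), i varying slowest
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, rank, n_iter=100, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor T by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(n, rank)) for n in T.shape)
    for _ in range(n_iter):
        # Solve for each factor in turn, holding the other two fixed
        A = T.reshape(T.shape[0], -1) @ khatri_rao(B, C) \
            @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.moveaxis(T, 1, 0).reshape(T.shape[1], -1) @ khatri_rao(A, C) \
            @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.moveaxis(T, 2, 0).reshape(T.shape[2], -1) @ khatri_rao(A, B) \
            @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Reconstruct a random rank-2 tensor to check the fit
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(n, 2)) for n in (4, 5, 6))
T = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C = cp_als(T, rank=2)
print(np.linalg.norm(T - np.einsum("ir,jr,kr->ijk", A, B, C)))
```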

14.
Titer estimation is one of the major components of immunoassay and vaccine development. The multiplexed in vitro opsonization assay (MOPA) is widely accepted for quantitating antibodies to serotype-specific Streptococcus pneumoniae capsular polysaccharides. Titer estimation based on the OPA is an important component of its standardization, and the chosen statistical method influences the accuracy and precision of the estimate. We evaluated five titer estimation methods for the pneumococcal OPA in terms of precision and accuracy, using three data sets generated by specifically designed experiments with both an eight-dilution and an eleven-dilution design. The bootstrap resampling technique was also used to assess the performance of the estimators. We concluded that the traditional direct method did not perform as well as the other four methods in terms of precision and accuracy. The Spearman–Kärber estimator may be biased upward for OPA titer estimation, and the four-parameter logistic (4PL) model is an alternative choice. The eleven-dilution design provided more information than the eight-dilution design and enhanced the precision of the estimators. UAB opsotiter, computer software using the statistical language R and Microsoft Excel®, was developed to implement OPA titer estimation.
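For illustration, a minimal sketch of fitting a four-parameter logistic (4PL) curve to kill-curve data and inverting it at 50% killing to read off a titer; the data and the 50% read-off are assumptions of this sketch, and UAB opsotiter's actual implementation (in R and Excel) is not reproduced here:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, a, d, c, b):
    # Four-parameter logistic: upper asymptote a, lower d, midpoint c, slope b
    return d + (a - d) / (1.0 + (x / c) ** b)

# Hypothetical percent-killing measurements over an 8-dilution series
dilution = np.array([4, 8, 16, 32, 64, 128, 256, 512], float)
kill_pct = np.array([92, 88, 80, 62, 40, 22, 10, 5], float)

popt, _ = curve_fit(four_pl, dilution, kill_pct,
                    p0=[95.0, 2.0, 50.0, 1.5], maxfev=10000)
a, d, c, b = popt

# Invert the fitted curve at 50% killing to get the opsonic titer
y50 = 50.0
titer = c * ((a - d) / (y50 - d) - 1.0) ** (1.0 / b)
print(f"Estimated titer (dilution at 50% kill): {titer:.1f}")
```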

15.
Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear inadequate to model the increasingly complicated relationship between project development cost and project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Data preprocessing has been claimed by many researchers to be a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims for an empirical assessment of the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: We first conduct a literature survey of recent publications using data preprocessing techniques, followed by a systematic empirical study analyzing the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques may significantly influence the final prediction, and sometimes have negative impacts on the prediction performance of ML methods. Conclusion: In order to reduce prediction errors and improve efficiency, preprocessing techniques must be carefully selected according to the characteristics of the machine learning methods and the datasets used for software cost estimation.
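A minimal scikit-learn sketch of the kind of comparison described, measuring how one preprocessing step (standardization) changes the error of a simple nearest-neighbour cost estimator; the data and pipeline are hypothetical:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical project features (size, team, complexity) and effort
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=0.8, size=(40, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=5.0, size=40)

for name, steps in {
    "raw": [("knn", KNeighborsRegressor(n_neighbors=3))],
    "scaled": [("scale", StandardScaler()),
               ("knn", KNeighborsRegressor(n_neighbors=3))],
}.items():
    score = cross_val_score(Pipeline(steps), X, y, cv=5,
                            scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {-score:.2f}")
```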

16.
According to the utilization law, throughput and utilization are linearly related, and their measurements can be used for the indirect estimation of service demands. In practice, however, hardware and software modifications as well as non-modeled loads due to periodic maintenance activities make the estimation process difficult and often impossible without manual intervention to analyze the data. Due to configuration changes, real-world datasets show that workload and utilization measurements tend to group themselves into multiple linear clusters. To estimate the service demands of the underlying performance models, the different configurations have to be identified. In this paper, we present an algorithm that, exploiting the timestamps associated with each throughput and utilization observation, identifies the different configurations of the system and estimates the corresponding service demands. Our proposal is based on robust estimation and inference techniques and is therefore suitable for analyzing contaminated datasets. Moreover, not only sudden and occasional changes of the system, but also recurring patterns in the system's behavior, due for instance to scheduled maintenance tasks, are detected. An efficient implementation of the algorithm has been made publicly available and, in this paper, its performance is assessed on synthetic as well as experimental data.
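The estimation rests on the utilization law U = D · X; a minimal sketch of per-configuration demand estimation by regression through the origin, where the configuration labels are assumed given (the paper's algorithm infers them from the timestamped data, robustly):

```python
import numpy as np

# Throughput X (req/s) and CPU utilization U, with a known configuration label
X = np.array([10, 20, 30, 12, 25, 33], float)
U = np.array([0.11, 0.21, 0.29, 0.24, 0.49, 0.68])
config = np.array([0, 0, 0, 1, 1, 1])   # assumed; the paper discovers clusters

for c in np.unique(config):
    x, u = X[config == c], U[config == c]
    # Utilization law U = D * X: least-squares slope through the origin
    D = (x @ u) / (x @ x)
    print(f"config {c}: service demand ~ {D * 1000:.1f} ms/request")
```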

17.
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real life data mining applications. In this article, we present two algorithms that extend the Squeezer algorithm to domains with mixed numeric and categorical attributes. The performance of the two algorithms has been studied on real and artificially generated datasets. Comparisons with other clustering algorithms illustrate the superiority of our approaches. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1077–1089, 2005.

18.
The problem of optimal sampling design for parameter estimation when data are generated by linear models is addressed. The measurements are assumed to be corrupted by an unknown but bounded additive noise. The sampling design assumes that the number of samples is unconstrained and no replication is allowed. Two main results are shown: 1) for particular classes of linear models, the optimal number of measurements is equal to the number of parameters, as in the statistical context; 2) the uncertainty intervals of the parameter estimates are bounded from above by quantities that can be computed a priori, knowing only the model and the error structure.
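In the unknown-but-bounded setting, the uncertainty interval of each parameter over the feasible set {theta : |y - Phi theta| <= eps} can be computed by linear programming; a minimal sketch with a hypothetical two-parameter model:

```python
import numpy as np
from scipy.optimize import linprog

# Linear model y = Phi @ theta + e with |e_i| <= eps (unknown-but-bounded noise)
rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(8), np.linspace(0, 1, 8)])   # intercept + slope
theta_true = np.array([1.0, 2.0])
eps = 0.1
y = Phi @ theta_true + rng.uniform(-eps, eps, size=8)

# Feasible set: Phi @ theta <= y + eps and -Phi @ theta <= -(y - eps)
A_ub = np.vstack([Phi, -Phi])
b_ub = np.concatenate([y + eps, -(y - eps)])

for j in range(Phi.shape[1]):
    c = np.zeros(Phi.shape[1])
    c[j] = 1.0
    lo = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).fun
    print(f"theta[{j}] in [{lo:.3f}, {hi:.3f}]")
```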

19.
This paper presents the motivation, development, and an application of a unique methodology for solving industrial optimization problems using existing legacy simulation software. The methodology is based on approximation models generated with design of experiments (DOE) methodologies and response surface methods applied to high-fidelity simulations, coupled with classical optimization methodologies. Several DOE plans are included in order to adopt the appropriate level of detail. The approximations are based on stochastic interpolation techniques or on classical least-squares methods. The optimization methods include both local and global techniques. Finally, an application from the plastic molding industry (process simulation) demonstrates the methodology and the software package.
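A minimal sketch of the response-surface step: fitting a quadratic surrogate to DOE samples by least squares, after which the cheap surrogate stands in for the expensive simulation during optimization (the design points and responses here are hypothetical):

```python
import numpy as np

# Two-factor central-composite-style DOE points and a measured response
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
              [0, 0], [1.4, 0], [-1.4, 0], [0, 1.4], [0, -1.4]], float)
y = np.array([10.2, 12.1, 11.0, 14.8, 13.5, 13.9, 9.8, 13.0, 11.2])

# Quadratic response surface basis: 1, x1, x2, x1*x2, x1^2, x2^2
A = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] * X[:, 1], X[:, 0] ** 2, X[:, 1] ** 2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def surface(x1, x2):
    # Evaluate the fitted surrogate at a new design point
    return beta @ np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

print(beta, surface(0.5, 0.5))
```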

20.
Statistical outlier detection using direct density ratio estimation
We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers. Our key idea is to use the ratio of training and test data densities as an outlier score. This approach is expected to have better performance even in high-dimensional problems since methods for directly estimating the density ratio without going through density estimation are available. Among various density ratio estimation methods, we employ the method called unconstrained least-squares importance fitting (uLSIF) since it is equipped with natural cross-validation procedures, allowing us to objectively optimize the value of tuning parameters such as the regularization parameter and the kernel width. Furthermore, uLSIF offers a closed-form solution as well as a closed-form formula for the leave-one-out error, so it is computationally very efficient and is scalable to massive datasets. Simulations with benchmark and real-world datasets illustrate the usefulness of the proposed approach.
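A compact sketch of the uLSIF closed form described above; here the kernel width, regularization, and centers are fixed rather than chosen by the cross-validation the paper relies on:

```python
import numpy as np

def ulsif_scores(X_train, X_test, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    """Unconstrained least-squares importance fitting (uLSIF).

    Estimates w(x) = p_train(x) / p_test(x); small scores on test points
    flag likely outliers. Width/regularization are fixed here, whereas the
    paper tunes them via its (leave-one-out) cross-validation formulas.
    """
    rng = np.random.default_rng(seed)
    centers = X_train[rng.choice(len(X_train),
                                 size=min(n_centers, len(X_train)),
                                 replace=False)]

    def phi(X):  # Gaussian kernel basis evaluated at X
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))

    Phi_te, Phi_tr = phi(X_test), phi(X_train)
    H = Phi_te.T @ Phi_te / len(X_test)      # second moment under test density
    h = Phi_tr.mean(axis=0)                  # first moment under train density
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return np.maximum(Phi_te @ alpha, 0.0)   # clipped density-ratio scores

# Inliers ~ N(0, I); the test set contains one shifted outlier
X_tr = np.random.default_rng(1).normal(size=(200, 2))
X_te = np.vstack([np.random.default_rng(2).normal(size=(50, 2)), [[6.0, 6.0]]])
scores = ulsif_scores(X_tr, X_te)
print(scores.argmin(), scores.min())   # the outlier should get the lowest score
```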
