Similar Documents
20 similar documents found (search time: 15 ms).
1.
In this article we derive likelihood-based confidence intervals for the risk ratio using over-reported two-sample binary data obtained via a double-sampling scheme. The risk ratio is defined as the ratio of two proportion parameters. By maximizing the full likelihood function, we obtain closed-form maximum likelihood estimators for all model parameters. In addition, we derive four confidence intervals: a naive Wald interval, a modified Wald interval, a Fieller-type interval, and an Agresti-Coull interval. All four confidence intervals are illustrated using cervical cancer data. Finally, we conduct simulation studies to assess and compare the coverage probabilities and average lengths of the four interval estimators. We conclude that the modified Wald interval, unlike the other three intervals, produces close-to-nominal confidence intervals under the various simulation scenarios examined here and is therefore preferred in practice.
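To make the naive Wald interval concrete, a minimal sketch in Python follows; it ignores the double-sampling correction for over-reporting that the paper derives, and the counts below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def wald_ci_risk_ratio(x1, n1, x2, n2, level=0.95):
    """Naive Wald confidence interval for the risk ratio p1/p2, built on the log scale."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Delta-method standard error of log(RR)
    se = np.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = norm.ppf(1 - (1 - level) / 2)
    return rr, rr * np.exp(-z * se), rr * np.exp(z * se)

# Hypothetical counts: 40/200 events in group 1 vs 25/220 events in group 2
print(wald_ci_risk_ratio(40, 200, 25, 220))
```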

2.
We describe trial applications of fuzzy sets to data retrieval. The objectives are to test their ability to achieve conceptual matching between retrieved objects and the user's intention, and to connect real data with symbolic notations. The proposed algorithm retrieves data that conceptually fit the meaning of the entered keyword. It uses fuzzy sets to handle word ambiguity, the main cause of vagueness in the meaning of a word, and is based on conceptual fuzzy sets (CFSs), which represent the meaning of a word by chaining other related words. Two trial applications of this algorithm to data retrieval are described. First, an application to image retrieval demonstrates a variant of data retrieval with conceptual matching and the transformation of numeric values into symbols. Second, an application to an agent that recommends TV programmes shows how CFSs can be fitted to an individual user's sense of a word through Hebbian learning.

3.
The need for large sample sizes to train, calibrate, and validate remote-sensing products has driven an emphasis towards rapid, and in many cases qualitative, field methods. Double-sampling is an option for calibrating less precise field measurements with data from a more precise method collected at a subset of sampling locations. While applicable to the creation of training and validation datasets for remote-sensing products, double-sampling has rarely been used in this context. Our objective was to compare vegetation indicators developed from a rapid qualitative (i.e. ocular estimation) field protocol with the quantitative field protocol used by the Bureau of Land Management’s Assessment, Inventory and Monitoring (AIM) programme to determine whether double-sampling could be used to adjust the qualitative estimates to improve the relationship between rapidly collected field data and high-resolution satellite imagery. We used beta regression to establish the relationship between the quantitative and qualitative estimates of vegetation cover from 50 field sites in the Piceance Basin of northwestern Colorado, USA. Using the defined regression models for eight vegetation indicators we adjusted the qualitative estimates and compared the results, along with the original measurements, to 5 m-resolution RapidEye satellite imagery. We found good correlation between quantitative and ocular estimates for dominant site components such as shrub cover and bare ground, but low correlations for minor site components (e.g. annual grass cover) or indicators where observers were required to estimate over multiple life forms (e.g. total canopy cover). Using the beta-regression models to adjust the qualitative estimates with the quantitative data significantly improved correlation with the RapidEye imagery for most indicators. As a means of improving training data for remote-sensing projects, double-sampling should be used where a strong relationship exists between quantitative and qualitative field techniques. Accordingly, ocular techniques should be used only when they can generate reliable estimates of vegetation cover.
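A minimal sketch of the calibration idea follows; a linear fit on the logit scale stands in for the beta regression used in the paper, and the paired cover fractions are hypothetical.

```python
import numpy as np

def logit(p, eps=1e-3):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical paired estimates of fractional shrub cover at double-sampled plots
ocular = np.array([0.10, 0.25, 0.40, 0.55, 0.70])   # rapid, qualitative protocol
quant  = np.array([0.08, 0.30, 0.38, 0.60, 0.75])   # quantitative protocol

# Fit quantitative ~ ocular on the logit scale (stand-in for beta regression)
slope, intercept = np.polyfit(logit(ocular), logit(quant), deg=1)

def adjust(ocular_estimate):
    """Map a rapid ocular estimate onto the quantitative scale."""
    return inv_logit(intercept + slope * logit(np.asarray(ocular_estimate)))

print(adjust([0.2, 0.5]))
```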

4.
In a two-sample location-scale model with censored data, the logrank test is asymptotically efficient when the error distribution is extreme minimum value. On the other hand, the Wilcoxon test is asymptotically efficient when the error distribution is logistic. We propose a pretest for choosing between logrank and Wilcoxon by determining if the error distribution is closer to extreme minimum value or logistic. This adaptive test is compared with the logrank and Wilcoxon tests through simulation.
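A rough sketch of the selection idea, under the simplifying assumption of uncensored location-scale residuals (the paper's pretest accounts for censoring): fit both candidate error distributions by maximum likelihood and keep whichever attains the higher log-likelihood.

```python
import numpy as np
from scipy.stats import gumbel_l, logistic  # extreme minimum value vs. logistic

def choose_test(residuals):
    """Return 'logrank' if the extreme-minimum-value fit wins, else 'wilcoxon'."""
    ll_ev = gumbel_l.logpdf(residuals, *gumbel_l.fit(residuals)).sum()
    ll_lo = logistic.logpdf(residuals, *logistic.fit(residuals)).sum()
    return "logrank" if ll_ev >= ll_lo else "wilcoxon"

rng = np.random.default_rng(0)
print(choose_test(gumbel_l.rvs(size=200, random_state=rng)))   # expected: logrank
print(choose_test(logistic.rvs(size=200, random_state=rng)))   # expected: wilcoxon
```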

5.
For the two-sample censored data problem, Pepe and Fleming [Pepe, M.S., Fleming, T.R., 1989. Weighted Kaplan-Meier statistics: A class of distance tests for censored survival data. Biometrics 45, 497-507] introduced the weighted Kaplan-Meier (WKM) statistics. From these statistics we define stochastic processes which can be approximated by zero-mean martingales. Conditional distributions of the processes, given data, can be easily approximated through simulation techniques. Based on comparison of these processes, we construct a supremum test to assess the model adequacy. Monte Carlo simulations are conducted to evaluate and compare the size and power properties of the proposed test to the WKM and the log-rank tests. The procedures are illustrated using real data.

6.
Interpretation of geophysical data is greatly aided by the combined analysis of data from diverse sources. Probability theory provides a general framework for integrating geophysical data sets. We discuss the application of joint and conditional probability density functions (PDFs) to the detection of anomalies and the prediction and interpolation of geo-variables. Density estimation techniques are discussed and illustrated on a geophysical data set from West Africa consisting of magnetic, elevation, and radiometric data.
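A minimal sketch of the joint-PDF idea, with a Gaussian kernel density estimate standing in for whatever density estimator the paper applies; the magnetic, elevation and radiometric values are synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Synthetic stand-ins for co-located magnetic, elevation and radiometric grids
background = rng.multivariate_normal([0, 0, 0], np.eye(3), size=2000).T  # shape (3, N)

kde = gaussian_kde(background)          # joint PDF estimate p(mag, elev, rad)

candidates = np.array([[0.1, 4.0],      # magnetic values of two candidate points
                       [0.2, 4.5],      # elevation values
                       [0.0, 5.0]])     # radiometric values
density = kde(candidates)
# Low joint density flags a point as anomalous relative to the background model
print(density, density < np.quantile(kde(background), 0.01))
```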

7.
An important issue that needs to be addressed when using data mining tools is the validity of the rules outside of the data set from which they are generated. Rules are typically derived from the patterns in a particular data set. When a new situation occurs, the change in the set of rules obtained from the new data set could be significant. In this paper, we provide a novel model for understanding how the differences between two situations affect the changes of the rules, based on the concept of finely partitioned groups that we call caucuses. Using this model, we provide a simple technique, called the combination data set, to obtain a good estimate of the set of rules for a new situation. Our approach works independently of the core mining process and can easily be implemented with all variations of rule mining techniques. Through experiments with real-life and synthetic data sets, we show the effectiveness of our technique in finding the correct set of rules under different situations.

8.
Handling of incomplete data sets using ICA and SOM in data mining
Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes an ISOM-DH model for handling incomplete data in data mining. When the data are dependent and non-Gaussian, the model can make full use of the information in the given data to estimate the missing values and can visualize the handled high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method and the standard SOM-based fuzzy map model, the ISOM-DH model can be applied to more cases, demonstrating its superiority. The correctness and reasonableness of the ISOM-DH model are also validated by the experiments carried out in this paper.

9.
The Ehlers fusion method, which combines a standard intensity-hue-saturation (IHS) transform with fast Fourier transform filtering, is a high spectral characteristics preservation algorithm for multitemporal and multisensor data sets. However, for data sets of more than three bands the fusion process is complicated, because the bands have to be fused three at a time, repeatedly, until all bands are fused. The hyper-spherical colour sharpening (HCS) fusion method can fuse a data set with an arbitrary number of bands. The HCS approach uses a transform between an n-dimensional Cartesian space and an n-dimensional hyper-spherical space to obtain one single intensity component and n − 1 angles. Moreover, from a structural point of view, the hyper-spherical colour space is very similar to the IHS colour space. Hence, we propose to combine the Ehlers fusion with an HCS transform to fuse n-band data sets with high spectral information preservation, even hyper-spectral images. A WorldView-2 data set comprising a panchromatic band and eight multispectral bands is used to demonstrate the effectiveness and quality of the new Ehlers–HCS fusion. The WorldView-2 image covers different landscapes such as agriculture, forest, water and urban areas. The fused images are visually and quantitatively analysed for spectral preservation and spatial improvement. Pros and cons of the applied fusion methods are discussed in relation to the different landscapes analysed. Overall, the Ehlers–HCS method shows its efficacy for n-band fusion.
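A minimal sketch of the hyper-spherical transform underlying HCS: n band values map to a single intensity (the radius) and n − 1 angles, and the inverse transform restores the bands once the intensity component has been sharpened. The 4-band pixel is hypothetical, and non-negative band values are assumed.

```python
import numpy as np

def to_hypersphere(x):
    """n Cartesian band values -> (intensity, n-1 angles)."""
    x = np.asarray(x, dtype=float)
    intensity = np.sqrt(np.sum(x ** 2))
    # Standard hyper-spherical angles: phi_k = atan2(norm of the remaining tail, x_k)
    angles = [np.arctan2(np.sqrt(np.sum(x[k + 1:] ** 2)), x[k]) for k in range(len(x) - 1)]
    return intensity, np.array(angles)

def to_cartesian(intensity, angles):
    """Inverse transform: rebuild the n band values from intensity and angles."""
    n = len(angles) + 1
    x = np.full(n, float(intensity))
    for k in range(n - 1):
        x[k] *= np.cos(angles[k])
        x[k + 1:] *= np.sin(angles[k])
    return x

pixel = np.array([0.42, 0.35, 0.28, 0.55])        # hypothetical 4-band pixel
i, phi = to_hypersphere(pixel)
print(np.allclose(to_cartesian(i, phi), pixel))    # True: the transform is invertible
```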

10.
Clustering categorical data sets using tabu search techniques
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than to objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is well suited to this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use, because data sets often contain categorical values. In this paper, we present a tabu-search-based clustering algorithm that extends the k-means paradigm to categorical domains and to domains with both numeric and categorical values. Using tabu-search-based techniques, our algorithm can explore the solution space beyond local optimality in order to find a global solution of the fuzzy clustering problem. The clustering results produced by the proposed algorithm are found to be very accurate.

11.
Loop closing in vision based SLAM applications is a difficult task. Comparing new image data with all previously acquired image data is practically impossible because of the high computational costs. Most approaches therefore compare new data with only a subset of the old data, for example by sampling the data over time or over space by using a position estimate. In this paper, we propose a more natural approach, which dynamically determines a subset of images that best describes the complete image data in the space of all previously seen images. The actual problem of finding such a subset is called the “Connected Dominating Set” (CDS) problem, which is well studied in the field of graph theory. Application on large indoor datasets results in approximately the same map using only 13% of the computational resources required for comparing with all previous images. It also outperforms other sampling approaches. The proposed method is particularly beneficial for realistic mapping scenarios including moving objects and persons, motion blur and changing light conditions.
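A minimal sketch of the key-frame selection idea; a greedy (non-connected) dominating set stands in for the connected dominating set used in the paper, and the image-similarity graph is synthetic.

```python
import numpy as np

def greedy_dominating_set(adjacency):
    """Greedy dominating set over a boolean adjacency matrix.
    The paper uses the connected variant (CDS); this is a simplified sketch."""
    n = len(adjacency)
    covered = np.zeros(n, dtype=bool)
    keyframes = []
    while not covered.all():
        # Pick the image that newly covers the most uncovered images (including itself)
        gain = [np.sum(~covered & (adjacency[i] | (np.arange(n) == i))) for i in range(n)]
        best = int(np.argmax(gain))
        keyframes.append(best)
        covered |= adjacency[best] | (np.arange(n) == best)
    return keyframes

# Hypothetical similarity graph: adjacency[i, j] = True if images i and j match
rng = np.random.default_rng(3)
sim = rng.random((30, 30)) > 0.8
adjacency = sim | sim.T
np.fill_diagonal(adjacency, False)
print(greedy_dominating_set(adjacency))   # indices of the selected representative images
```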

12.
Visualization of boundaries in volumetric data sets using LH histograms
A crucial step in volume rendering is the design of transfer functions that highlight those aspects of the volume data that are of interest to the user. For many applications, boundaries carry most of the relevant information. Reliable detection of boundaries is often hampered by limitations of the imaging process, such as blurring and noise. We present a method to identify the materials that form the boundaries. These materials are then used in a new domain that facilitates interactive and semiautomatic design of appropriate transfer functions. We also show how the obtained boundary information can be used in region-growing-based segmentation.

13.
For small-scale non-stationary data, the recently proposed time-adaptive support vector machine (TA-SVM) performs well: it solves multiple sub-classifiers simultaneously from a perspective that balances local and global optimization. For large data sets, however, its high computational cost limits its practicality. To address this shortcoming, a novel classification method for large non-stationary data sets is proposed by combining TA-SVM with core vector machine (CVM) theory, namely a TA-CVM based on the center-constrained minimum enclosing ball (CCMEB), abbreviated CCTA-CVM. The method has the advantage of asymptotically linear time complexity while inheriting the good performance of TA-SVM. Finally, experiments verify the effectiveness of the proposed method.

14.
Although graph-based relaxed clustering (GRC) is a spectral clustering algorithm that is straightforward and self-adaptive, it is sensitive to the parameters of the adopted similarity measure and also has a high time complexity of O(N³), which severely weakens its usefulness for large data sets. In order to overcome these shortcomings, after introducing certain constraints into GRC, an enhanced version of GRC [constrained GRC (CGRC)] is proposed to increase the robustness of GRC to the parameters of the adopted similarity measure; accordingly, a novel algorithm called fast GRC (FGRC), based on CGRC, is developed in this paper by using the core-set-based minimal enclosing ball approximation. A distinctive advantage of FGRC is that its asymptotic time complexity is linear in the data set size N. At the same time, FGRC also inherits the straightforwardness and self-adaptability of GRC, making the proposed FGRC a fast and effective clustering algorithm for large data sets. The advantages of FGRC are validated on various benchmark and real data sets.

15.
Multimedia Tools and Applications - File similarity is a numerical indicator of how much duplicated data exists between target files. With this information, we can reduce storage capacity with data...
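The abstract is truncated, but the general idea of scoring similarity by the fraction of shared content can be sketched as follows; fixed-size chunk hashes and a Jaccard-style ratio are assumptions, not the paper's exact scheme.

```python
import hashlib

def chunk_hashes(path, chunk_size=4096):
    """Hash fixed-size chunks of a file; shared hashes indicate duplicated data."""
    hashes = set()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hashes.add(hashlib.sha1(chunk).hexdigest())
    return hashes

def file_similarity(path_a, path_b):
    """Jaccard-style similarity: fraction of chunk hashes the two files share."""
    a, b = chunk_hashes(path_a), chunk_hashes(path_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical usage: a value near 1.0 suggests heavy duplication between the files
# print(file_similarity("backup_monday.img", "backup_tuesday.img"))
```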

16.
We use the Z-buffer technique for fast and efficient shadow generation. Volumetric data contain information about the grid points only. Such data do not provide surface information that could be projected immediately onto the shadow map. To solve this problem, we have implemented two techniques. The first uses a modified adaptive version of the well-known marching cubes algorithm for the special characteristics of medical data sets. The algorithm uses material properties for a precise representation of object boundaries, generating volumetric objects quickly and effectively. There are two representations of the same data set: we use a view-independent approximation to display shadows and the original representation of the volume for object visualization in full precision. The second algorithm uses a ray-tracing approach to create shadow maps. The same routine is used for object rendering, but is restricted to depth-value generation. Semitransparent objects are handled by storing an intensity profile in addition to the depth value.

17.
Support vector machines are among the most effective classification techniques, with high classification accuracy and good generalization ability, but their training process on large data sets remains very complex. To address this, a classification method based on the one-class support vector machine is proposed. A random selection algorithm is used to reduce the training set and thereby speed up training; at the same time, classification accuracy is preserved by recovering, from the original data, the neighbourhoods of the samples that fall in the intersection of the hyperspheres. Experiments show that the method can considerably reduce the computational complexity and thus speed up training on large data sets.
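A minimal sketch of the reduction step with scikit-learn: random subsampling followed by a one-class SVM fit. The neighbourhood-recovery step described in the abstract is not reproduced, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 10))              # synthetic large training set

# Randomly select a small subset to keep training cheap
subset = X[rng.choice(len(X), size=2_000, replace=False)]

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(subset)

# Score the full data set; points with negative decision values fall outside the
# learned region and could be revisited when refining the reduced training set
scores = ocsvm.decision_function(X)
print((scores < 0).mean())
```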

18.
Many real data sets in databases may vary dynamically. With such data sets, one has to run a knowledge acquisition algorithm repeatedly in order to acquire new knowledge. This is a very time-consuming process. To overcome this deficiency, several approaches have been developed to deal with dynamic databases. They mainly address knowledge updating from three aspects: the expansion of data, the increasing number of attributes and the variation of data values. This paper focuses on attribute reduction for data sets with dynamically varying data values. Information entropy is a common measure of uncertainty and has been widely used to construct attribute reduction algorithms. Based on three representative entropies, this paper develops an attribute reduction algorithm for data sets with dynamically varying data values. When a part of data in a given data set is replaced by some new data, compared with the classic reduction algorithms based on the three entropies, the developed algorithm can find a new reduct in a much shorter time. Experiments on six data sets downloaded from UCI show that the algorithm is effective and efficient.
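A minimal sketch of the entropy computation such reduction algorithms build on: Shannon conditional entropy of the decision attribute given a subset of condition attributes, over a small hypothetical decision table. The paper's incremental update under changing data values is not shown.

```python
from collections import Counter
from math import log2

def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) over a decision table given as a list of dicts."""
    n = len(rows)
    blocks = Counter(tuple(r[a] for a in attrs) for r in rows)
    h = 0.0
    for block, size in blocks.items():
        labels = Counter(r[decision] for r in rows
                         if tuple(r[a] for a in attrs) == block)
        h -= (size / n) * sum((c / size) * log2(c / size) for c in labels.values())
    return h

# Hypothetical decision table: weather attributes -> play decision
table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
# An attribute subset is a candidate reduct if it leaves the decision as
# predictable as the full attribute set does (conditional entropy unchanged)
print(conditional_entropy(table, ["windy"], "play"))     # 0.0: 'windy' determines 'play'
print(conditional_entropy(table, ["outlook"], "play"))   # 1.0: 'outlook' alone does not
```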

19.
At Los Alamos National Laboratory, geoscientists have assembled and integrated 30 geological, geophysical, and geochemical data sets with four Landsat bands for the Montrose 1° × 2° quadrangle, Colorado. Three graphical displays were developed to determine if visual analysis of the data facilitated interpretation. Two displays project the data spatially: gray-level maps project values of a single data set, and three-color overlays project the values of three data sets simultaneously. The third display, a three-dimensional plot, graphs three data sets and allows examination of relationships in parameter space. Two examples illustrate the potential applications of the display techniques. Uranium in sediments, uranium in waters, and equivalent uranium each provide unique information about uranium distribution in the quadrangle. However, the combined data convey more information than each data set separately. Copper, lead, and zinc displays allow identification of all the base-metal districts and convey information about the geochemical character of the deposits. Visual displays greatly increase the efficiency of analysis and the interpretability of diverse geologic data sets.
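A minimal sketch of the three-colour overlay display: three co-registered grids are normalized to [0, 1] and stacked as RGB channels; the arrays below are synthetic stand-ins for the uranium data sets.

```python
import numpy as np

def normalize(grid):
    g = np.asarray(grid, dtype=float)
    return (g - g.min()) / ((g.max() - g.min()) or 1.0)

rng = np.random.default_rng(7)
uranium_sediments  = rng.random((100, 100))   # synthetic stand-ins for the
uranium_waters     = rng.random((100, 100))   # three co-registered data sets
equivalent_uranium = rng.random((100, 100))

# Red/green/blue overlay: each data set drives one colour channel
rgb = np.dstack([normalize(uranium_sediments),
                 normalize(uranium_waters),
                 normalize(equivalent_uranium)])
# e.g. matplotlib.pyplot.imshow(rgb) would display the combined overlay
print(rgb.shape)  # (100, 100, 3)
```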

20.
A new efficient unsupervised feature selection method is proposed to handle nominal data without data transformation. The proposed feature selection method introduces a new data distribution factor to select appropriate clusters. The proposed method combines the compactness and separation together with a newly introduced concept of singleton item. This new feature selection method considers all features globally. It is computationally inexpensive and able to deliver very promising results. Eight datasets from the University of California Irvine (UCI) machine learning repository and a high-dimensional cDNA dataset are used in this paper. The obtained results show that the proposed method is very efficient and able to deliver very reliable results.
