Similar Documents
20 similar documents found (search time: 15 ms).
1.
In this article we derive likelihood-based confidence intervals for the risk ratio using over-reported two-sample binary data obtained via a double-sampling scheme. The risk ratio is defined as the ratio of two proportion parameters. By maximizing the full likelihood function, we obtain closed-form maximum likelihood estimators for all model parameters. In addition, we derive four confidence intervals: a naive Wald interval, a modified Wald interval, a Fieller-type interval, and an Agresti-Coull interval. All four confidence intervals are illustrated using cervical cancer data. Finally, we conduct simulation studies to assess and compare the coverage probabilities and average lengths of the four interval estimators. We conclude that the modified Wald interval, unlike the other three intervals, produces close-to-nominal confidence intervals under the various simulation scenarios examined here and is therefore preferred in practice.
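To make the naive Wald interval concrete, a minimal sketch in Python follows; it ignores the double-sampling correction for over-reporting that the paper derives, and the counts below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def wald_ci_risk_ratio(x1, n1, x2, n2, level=0.95):
    """Naive Wald confidence interval for the risk ratio p1/p2, built on the log scale."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Delta-method standard error of log(RR)
    se = np.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = norm.ppf(1 - (1 - level) / 2)
    return rr, rr * np.exp(-z * se), rr * np.exp(z * se)

# Hypothetical counts: 40/200 events in group 1 vs 25/220 events in group 2
print(wald_ci_risk_ratio(40, 200, 25, 220))
```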

2.
We describe trial applications of fuzzy sets to data retrieval. The objectives are to test their ability to achieve conceptual matching between retrieved objects and the user's intention, and to connect real data with symbolic notations. The proposed algorithm retrieves data that conceptually fit the meaning of the entered keyword. It uses fuzzy sets to handle word ambiguity, the main cause of vagueness in the meaning of a word, and is based on conceptual fuzzy sets (CFSs), which represent the meaning of a word by chaining other related words. Two trial applications of this algorithm to data retrieval are described. First, an application to image retrieval demonstrates a variant of data retrieval with conceptual matching and the transformation of numeric values into symbols. Second, an application to an agent that recommends TV programmes shows how CFSs can be fitted to an individual user's sense of a word through Hebbian learning.

3.
The need for large sample sizes to train, calibrate, and validate remote-sensing products has driven an emphasis towards rapid, and in many cases qualitative, field methods. Double-sampling is an option for calibrating less precise field measurements with data from a more precise method collected at a subset of sampling locations. While applicable to the creation of training and validation datasets for remote-sensing products, double-sampling has rarely been used in this context. Our objective was to compare vegetation indicators developed from a rapid qualitative (i.e. ocular estimation) field protocol with the quantitative field protocol used by the Bureau of Land Management’s Assessment, Inventory and Monitoring (AIM) programme to determine whether double-sampling could be used to adjust the qualitative estimates to improve the relationship between rapidly collected field data and high-resolution satellite imagery. We used beta regression to establish the relationship between the quantitative and qualitative estimates of vegetation cover from 50 field sites in the Piceance Basin of northwestern Colorado, USA. Using the defined regression models for eight vegetation indicators we adjusted the qualitative estimates and compared the results, along with the original measurements, to 5 m-resolution RapidEye satellite imagery. We found good correlation between quantitative and ocular estimates for dominant site components such as shrub cover and bare ground, but low correlations for minor site components (e.g. annual grass cover) or indicators where observers were required to estimate over multiple life forms (e.g. total canopy cover). Using the beta-regression models to adjust the qualitative estimates with the quantitative data significantly improved correlation with the RapidEye imagery for most indicators. As a means of improving training data for remote-sensing projects, double-sampling should be used where a strong relationship exists between quantitative and qualitative field techniques. Accordingly, ocular techniques should be used only when they can generate reliable estimates of vegetation cover.
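A minimal sketch of the calibration idea follows; a linear fit on the logit scale stands in for the beta regression used in the paper, and the paired cover fractions are hypothetical.

```python
import numpy as np

def logit(p, eps=1e-3):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical paired estimates of fractional shrub cover at double-sampled plots
ocular = np.array([0.10, 0.25, 0.40, 0.55, 0.70])   # rapid, qualitative protocol
quant  = np.array([0.08, 0.30, 0.38, 0.60, 0.75])   # quantitative protocol

# Fit quantitative ~ ocular on the logit scale (stand-in for beta regression)
slope, intercept = np.polyfit(logit(ocular), logit(quant), deg=1)

def adjust(ocular_estimate):
    """Map a rapid ocular estimate onto the quantitative scale."""
    return inv_logit(intercept + slope * logit(np.asarray(ocular_estimate)))

print(adjust([0.2, 0.5]))
```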

4.
In a two-sample location-scale model with censored data, the logrank test is asymptotically efficient when the error distribution is extreme minimum value. On the other hand, the Wilcoxon test is asymptotically efficient when the error distribution is logistic. We propose a pretest for choosing between logrank and Wilcoxon by determining if the error distribution is closer to extreme minimum value or logistic. This adaptive test is compared with the logrank and Wilcoxon tests through simulation.
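A rough sketch of the selection idea, under the simplifying assumption of uncensored location-scale residuals (the paper's pretest accounts for censoring): fit both candidate error distributions by maximum likelihood and keep whichever attains the higher log-likelihood.

```python
import numpy as np
from scipy.stats import gumbel_l, logistic  # extreme minimum value vs. logistic

def choose_test(residuals):
    """Return 'logrank' if the extreme-minimum-value fit wins, else 'wilcoxon'."""
    ll_ev = gumbel_l.logpdf(residuals, *gumbel_l.fit(residuals)).sum()
    ll_lo = logistic.logpdf(residuals, *logistic.fit(residuals)).sum()
    return "logrank" if ll_ev >= ll_lo else "wilcoxon"

rng = np.random.default_rng(0)
print(choose_test(gumbel_l.rvs(size=200, random_state=rng)))   # expected: logrank
print(choose_test(logistic.rvs(size=200, random_state=rng)))   # expected: wilcoxon
```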

5.
For the two-sample censored data problem, Pepe and Fleming [Pepe, M.S., Fleming, T.R., 1989. Weighted Kaplan-Meier statistics: A class of distance tests for censored survival data. Biometrics 45, 497-507] introduced the weighted Kaplan-Meier (WKM) statistics. From these statistics we define stochastic processes which can be approximated by zero-mean martingales. Conditional distributions of the processes, given data, can be easily approximated through simulation techniques. Based on comparison of these processes, we construct a supremum test to assess the model adequacy. Monte Carlo simulations are conducted to evaluate and compare the size and power properties of the proposed test to the WKM and the log-rank tests. The procedures are illustrated using real data.

6.
Interpretation of geophysical data is greatly aided by the combined analysis of data from diverse sources. Probability theory provides a general framework for integrating geophysical data sets. We discuss the application of joint and conditional probability density functions (PDFs) to the detection of anomalies and the prediction and interpolation of geo-variables. Density estimation techniques are discussed and illustrated on a geophysical data set from West Africa consisting of magnetic, elevation, and radiometric data.
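A minimal sketch of the joint-PDF idea, with a Gaussian kernel density estimate standing in for whatever density estimator the paper applies; the magnetic, elevation and radiometric values are synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Synthetic stand-ins for co-located magnetic, elevation and radiometric grids
background = rng.multivariate_normal([0, 0, 0], np.eye(3), size=2000).T  # shape (3, N)

kde = gaussian_kde(background)          # joint PDF estimate p(mag, elev, rad)

candidates = np.array([[0.1, 4.0],      # magnetic values of two candidate points
                       [0.2, 4.5],      # elevation values
                       [0.0, 5.0]])     # radiometric values
density = kde(candidates)
# Low joint density flags a point as anomalous relative to the background model
print(density, density < np.quantile(kde(background), 0.01))
```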

7.
An important issue that needs to be addressed when using data mining tools is the validity of the rules outside of the data set from which they are generated. Rules are typically derived from the patterns in a particular data set. When a new situation occurs, the change in the set of rules obtained from the new data set could be significant. In this paper, we provide a novel model for understanding how the differences between two situations affect the changes of the rules, based on the concept of finely partitioned groups that we call caucuses. Using this model, we provide a simple technique, called the combination data set, to obtain a good estimate of the set of rules for a new situation. Our approach works independently of the core mining process and can easily be implemented with all variations of rule mining techniques. Through experiments with real-life and synthetic data sets, we show the effectiveness of our technique in finding the correct set of rules under different situations.

8.
Handling of incomplete data sets using ICA and SOM in data mining
Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes an ISOM-DH model for handling incomplete data in data mining. When the data are dependent and non-Gaussian, the model can make full use of the information in the given data to estimate the missing values and can visualize the handled high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method and the standard SOM-based fuzzy map model, the ISOM-DH model can be applied to more cases, demonstrating its superiority. The correctness and reasonableness of the ISOM-DH model are also validated by the experiments carried out in this paper.

9.
The Ehlers fusion method, which combines a standard intensity-hue-saturation (IHS) transform with fast Fourier transform filtering, is a high spectral characteristics preservation algorithm for multitemporal and multisensor data sets. However, for data sets of more than three bands the fusion process is complicated, because the bands have to be fused three at a time, repeatedly, until all bands are fused. The hyper-spherical colour sharpening (HCS) fusion method can fuse a data set with an arbitrary number of bands. The HCS approach uses a transform between an n-dimensional Cartesian space and an n-dimensional hyper-spherical space to obtain one single intensity component and n − 1 angles. Moreover, from a structural point of view, the hyper-spherical colour space is very similar to the IHS colour space. Hence, we propose to combine the Ehlers fusion with an HCS transform to fuse n-band data sets with high spectral information preservation, even hyper-spectral images. A WorldView-2 data set comprising a panchromatic band and eight multispectral bands is used to demonstrate the effectiveness and quality of the new Ehlers–HCS fusion. The WorldView-2 image covers different landscapes such as agriculture, forest, water and urban areas. The fused images are visually and quantitatively analysed for spectral preservation and spatial improvement. Pros and cons of the applied fusion methods are discussed in relation to the different landscapes analysed. Overall, the Ehlers–HCS method shows its efficacy for n-band fusion.
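A minimal sketch of the hyper-spherical transform underlying HCS: n band values map to a single intensity (the radius) and n − 1 angles, and the inverse transform restores the bands once the intensity component has been sharpened. The 4-band pixel is hypothetical, and non-negative band values are assumed.

```python
import numpy as np

def to_hypersphere(x):
    """n Cartesian band values -> (intensity, n-1 angles)."""
    x = np.asarray(x, dtype=float)
    intensity = np.sqrt(np.sum(x ** 2))
    # Standard hyper-spherical angles: phi_k = atan2(norm of the remaining tail, x_k)
    angles = [np.arctan2(np.sqrt(np.sum(x[k + 1:] ** 2)), x[k]) for k in range(len(x) - 1)]
    return intensity, np.array(angles)

def to_cartesian(intensity, angles):
    """Inverse transform: rebuild the n band values from intensity and angles."""
    n = len(angles) + 1
    x = np.full(n, float(intensity))
    for k in range(n - 1):
        x[k] *= np.cos(angles[k])
        x[k + 1:] *= np.sin(angles[k])
    return x

pixel = np.array([0.42, 0.35, 0.28, 0.55])        # hypothetical 4-band pixel
i, phi = to_hypersphere(pixel)
print(np.allclose(to_cartesian(i, phi), pixel))    # True: the transform is invertible
```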

10.
Clustering categorical data sets using tabu search techniques
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than to objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is well suited to this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use, because data sets often contain categorical values. In this paper, we present a tabu-search-based clustering algorithm that extends the k-means paradigm to categorical domains and to domains with both numeric and categorical values. Using tabu-search-based techniques, our algorithm can explore the solution space beyond local optimality in order to find a global solution of the fuzzy clustering problem. The clustering results produced by the proposed algorithm are found to be very accurate.

11.
Loop closing in vision based SLAM applications is a difficult task. Comparing new image data with all previously acquired image data is practically impossible because of the high computational costs. Most approaches therefore compare new data with only a subset of the old data, for example by sampling the data over time or over space by using a position estimate. In this paper, we propose a more natural approach, which dynamically determines a subset of images that best describes the complete image data in the space of all previously seen images. The actual problem of finding such a subset is called the “Connected Dominating Set” (CDS) problem, which is well studied in the field of graph theory. Application on large indoor datasets results in approximately the same map using only 13% of the computational resources required for comparing with all previous images. It also outperforms other sampling approaches. The proposed method is particularly beneficial for realistic mapping scenarios including moving objects and persons, motion blur and changing light conditions.
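A minimal sketch of the key-frame selection idea; a greedy (non-connected) dominating set stands in for the connected dominating set used in the paper, and the image-similarity graph is synthetic.

```python
import numpy as np

def greedy_dominating_set(adjacency):
    """Greedy dominating set over a boolean adjacency matrix.
    The paper uses the connected variant (CDS); this is a simplified sketch."""
    n = len(adjacency)
    covered = np.zeros(n, dtype=bool)
    keyframes = []
    while not covered.all():
        # Pick the image that newly covers the most uncovered images (including itself)
        gain = [np.sum(~covered & (adjacency[i] | (np.arange(n) == i))) for i in range(n)]
        best = int(np.argmax(gain))
        keyframes.append(best)
        covered |= adjacency[best] | (np.arange(n) == best)
    return keyframes

# Hypothetical similarity graph: adjacency[i, j] = True if images i and j match
rng = np.random.default_rng(3)
sim = rng.random((30, 30)) > 0.8
adjacency = sim | sim.T
np.fill_diagonal(adjacency, False)
print(greedy_dominating_set(adjacency))   # indices of the selected representative images
```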

12.
Visualization of boundaries in volumetric data sets using LH histograms
A crucial step in volume rendering is the design of transfer functions that highlight those aspects of the volume data that are of interest to the user. For many applications, boundaries carry most of the relevant information. Reliable detection of boundaries is often hampered by limitations of the imaging process, such as blurring and noise. We present a method to identify the materials that form the boundaries. These materials are then used in a new domain that facilitates interactive and semiautomatic design of appropriate transfer functions. We also show how the obtained boundary information can be used in region-growing-based segmentation.

13.
For small-scale non-stationary data, the recently proposed time-adaptive support vector machine (TA-SVM) performs well: it solves multiple sub-classifiers simultaneously from a perspective that balances local and global optimization. For large data sets, however, its high computational cost limits its practicality. To address this shortcoming, a novel classification method for large non-stationary data sets is proposed by combining TA-SVM with core vector machine (CVM) theory, namely a TA-CVM based on the center-constrained minimum enclosing ball (CCMEB), abbreviated CCTA-CVM. The method has the advantage of asymptotically linear time complexity while inheriting the good performance of TA-SVM. Finally, experiments verify the effectiveness of the proposed method.

14.
Although graph-based relaxed clustering (GRC) is a spectral clustering algorithm that is straightforward and self-adaptive, it is sensitive to the parameters of the adopted similarity measure and also has a high time complexity of O(N³), which severely weakens its usefulness for large data sets. In order to overcome these shortcomings, after introducing certain constraints into GRC, an enhanced version of GRC [constrained GRC (CGRC)] is proposed to increase the robustness of GRC to the parameters of the adopted similarity measure; accordingly, a novel algorithm called fast GRC (FGRC), based on CGRC, is developed in this paper by using the core-set-based minimal enclosing ball approximation. A distinctive advantage of FGRC is that its asymptotic time complexity is linear in the data set size N. At the same time, FGRC also inherits the straightforwardness and self-adaptability of GRC, making the proposed FGRC a fast and effective clustering algorithm for large data sets. The advantages of FGRC are validated on various benchmark and real data sets.

15.
Multimedia Tools and Applications - File similarity is a numerical indicator of how much duplicated data exists between target files. With this information, we can reduce storage capacity with data...
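The abstract is truncated, but the general idea of scoring similarity by the fraction of shared content can be sketched as follows; fixed-size chunk hashes and a Jaccard-style ratio are assumptions, not the paper's exact scheme.

```python
import hashlib

def chunk_hashes(path, chunk_size=4096):
    """Hash fixed-size chunks of a file; shared hashes indicate duplicated data."""
    hashes = set()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hashes.add(hashlib.sha1(chunk).hexdigest())
    return hashes

def file_similarity(path_a, path_b):
    """Jaccard-style similarity: fraction of chunk hashes the two files share."""
    a, b = chunk_hashes(path_a), chunk_hashes(path_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical usage: a value near 1.0 suggests heavy duplication between the files
# print(file_similarity("backup_monday.img", "backup_tuesday.img"))
```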

16.
We use the Z-buffer technique for fast and efficient shadow generation. Volumetric data contain information about the grid points only. Such data do not provide surface information that could be projected immediately onto the shadow map. To solve this problem, we have implemented two techniques. The first uses a modified adaptive version of the well-known marching cubes algorithm for the special characteristics of medical data sets. The algorithm uses material properties for a precise representation of object boundaries, generating volumetric objects quickly and effectively. There are two representations of the same data set: we use a view-independent approximation to display shadows and the original representation of the volume for object visualization in full precision. The second algorithm uses a ray-tracing approach to create shadow maps. The same routine is used for object rendering, but is restricted to depth-value generation. Semitransparent objects are handled by storing an intensity profile in addition to the depth value.

17.
Support vector machines are among the most effective classification techniques, with high classification accuracy and good generalization ability, but their training process on large data sets remains very complex. To address this, a classification method based on the one-class support vector machine is proposed. A random selection algorithm is used to reduce the training set and thereby speed up training; at the same time, classification accuracy is preserved by recovering, from the original data, the neighbourhoods of the samples that fall in the intersection of the hyperspheres. Experiments show that the method can considerably reduce the computational complexity and thus speed up training on large data sets.
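A minimal sketch of the reduction step with scikit-learn: random subsampling followed by a one-class SVM fit. The neighbourhood-recovery step described in the abstract is not reproduced, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 10))              # synthetic large training set

# Randomly select a small subset to keep training cheap
subset = X[rng.choice(len(X), size=2_000, replace=False)]

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(subset)

# Score the full data set; points with negative decision values fall outside the
# learned region and could be revisited when refining the reduced training set
scores = ocsvm.decision_function(X)
print((scores < 0).mean())
```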

18.
Many real data sets in databases may vary dynamically. With such data sets, one has to run a knowledge acquisition algorithm repeatedly in order to acquire new knowledge. This is a very time-consuming process. To overcome this deficiency, several approaches have been developed to deal with dynamic databases. They mainly address knowledge updating from three aspects: the expansion of data, the increasing number of attributes and the variation of data values. This paper focuses on attribute reduction for data sets with dynamically varying data values. Information entropy is a common measure of uncertainty and has been widely used to construct attribute reduction algorithms. Based on three representative entropies, this paper develops an attribute reduction algorithm for data sets with dynamically varying data values. When a part of data in a given data set is replaced by some new data, compared with the classic reduction algorithms based on the three entropies, the developed algorithm can find a new reduct in a much shorter time. Experiments on six data sets downloaded from UCI show that the algorithm is effective and efficient.
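A minimal sketch of the entropy computation such reduction algorithms build on: Shannon conditional entropy of the decision attribute given a subset of condition attributes, over a small hypothetical decision table. The paper's incremental update under changing data values is not shown.

```python
from collections import Counter
from math import log2

def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) over a decision table given as a list of dicts."""
    n = len(rows)
    blocks = Counter(tuple(r[a] for a in attrs) for r in rows)
    h = 0.0
    for block, size in blocks.items():
        labels = Counter(r[decision] for r in rows
                         if tuple(r[a] for a in attrs) == block)
        h -= (size / n) * sum((c / size) * log2(c / size) for c in labels.values())
    return h

# Hypothetical decision table: weather attributes -> play decision
table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
# An attribute subset is a candidate reduct if it leaves the decision as
# predictable as the full attribute set does (conditional entropy unchanged)
print(conditional_entropy(table, ["windy"], "play"))     # 0.0: 'windy' determines 'play'
print(conditional_entropy(table, ["outlook"], "play"))   # 1.0: 'outlook' alone does not
```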

19.
At Los Alamos National Laboratory, geoscientists have assembled and integrated 30 geological, geophysical, and geochemical data sets with four Landsat bands for the Montrose 1° × 2° quadrangle, Colorado. Three graphical displays were developed to determine if visual analysis of the data facilitated interpretation. Two displays project the data spatially: gray-level maps project values of a single data set, and three-color overlays project the values of three data sets simultaneously. The third display, a three-dimensional plot, graphs three data sets and allows examination of relationships in parameter space. Two examples illustrate the potential applications of the display techniques. Uranium in sediments, uranium in waters, and equivalent uranium each provide unique information about uranium distribution in the quadrangle. However, the combined data convey more information than each data set separately. Copper, lead, and zinc displays allow identification of all the base-metal districts and convey information about the geochemical character of the deposits. Visual displays greatly increase the efficiency of analysis and the interpretability of diverse geologic data sets.
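A minimal sketch of the three-colour overlay display: three co-registered grids are normalized to [0, 1] and stacked as RGB channels; the arrays below are synthetic stand-ins for the uranium data sets.

```python
import numpy as np

def normalize(grid):
    g = np.asarray(grid, dtype=float)
    return (g - g.min()) / ((g.max() - g.min()) or 1.0)

rng = np.random.default_rng(7)
uranium_sediments  = rng.random((100, 100))   # synthetic stand-ins for the
uranium_waters     = rng.random((100, 100))   # three co-registered data sets
equivalent_uranium = rng.random((100, 100))

# Red/green/blue overlay: each data set drives one colour channel
rgb = np.dstack([normalize(uranium_sediments),
                 normalize(uranium_waters),
                 normalize(equivalent_uranium)])
# e.g. matplotlib.pyplot.imshow(rgb) would display the combined overlay
print(rgb.shape)  # (100, 100, 3)
```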

20.
A new efficient unsupervised feature selection method is proposed to handle nominal data without data transformation. The proposed feature selection method introduces a new data distribution factor to select appropriate clusters. The proposed method combines the compactness and separation together with a newly introduced concept of singleton item. This new feature selection method considers all features globally. It is computationally inexpensive and able to deliver very promising results. Eight datasets from the University of California Irvine (UCI) machine learning repository and a high-dimensional cDNA dataset are used in this paper. The obtained results show that the proposed method is very efficient and able to deliver very reliable results.
