Similar Documents
A total of 20 similar documents were found.
1.
Two approaches are presented for performing principal component analysis (PCA) on data that contain both outlying cases and missing elements. First, an eigendecomposition of a covariance matrix that can deal with such data is proposed, but this approach is not suited to data where the number of variables exceeds the number of cases. Alternatively, an expectation robust (ER) algorithm is proposed to adapt the existing methodology for robust PCA to data containing missing elements. An extensive simulation study shows that the ER approach performs well for all data sizes considered. Using simulations and an example, it is shown that, by virtue of the ER algorithm, the properties of the existing methods for robust PCA carry over to data with missing elements.
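A minimal sketch of the ER idea described above: alternate an expectation step that re-imputes the missing cells from the current PCA model with a robustness step that downweights cases with large residuals. The weighting scheme and all names here are illustrative assumptions, not the authors' exact algorithm.

    import numpy as np

    def er_robust_pca(X, k=2, n_iter=30, c=2.5):
        # Hypothetical ER loop: impute missing cells from the current PCA fit,
        # then re-estimate the model from robustly weighted cases.
        X = np.asarray(X, float)
        miss = np.isnan(X)
        Xc = np.where(miss, np.nanmean(X, axis=0), X)   # initial mean imputation
        w = np.ones(len(X))                             # case weights
        for _ in range(n_iter):
            mu = np.average(Xc, axis=0, weights=w)
            D = Xc - mu
            C = (D * w[:, None]).T @ D / w.sum()        # weighted covariance
            _, vecs = np.linalg.eigh(C)
            V = vecs[:, ::-1][:, :k]                    # top-k loadings
            T = D @ V                                   # scores
            fit = mu + T @ V.T
            Xc = np.where(miss, fit, X)                 # E-step: re-impute
            r = np.linalg.norm(Xc - fit, axis=1)        # residual distance per case
            s = np.median(r) / 0.6745 + 1e-12           # robust (MAD-type) scale
            w = np.where(r / s <= c, 1.0, (c / (r / s)) ** 2)  # Huber-type weights
        return mu, V, T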

2.
View materialization is an effective way to increase query efficiency in a data warehouse and to improve OLAP query performance. However, materializing all possible views in advance runs into the problem of insufficient storage space. Reducing query time by selecting a proper set of materialized views with a lower cost is therefore crucial for efficient data warehousing, and the costs of data warehouse creation, querying, and maintenance all have to be taken into account when views are materialized. In this paper, we propose efficient algorithms to select a proper set of materialized views, constrained by storage and cost considerations, to help speed up the entire data warehousing process. We derive a cost model for data warehouse querying and maintenance, as well as efficient view selection algorithms that effectively exploit gain and loss metrics. The main contribution of this paper is to speed up the selection of materialized views, which in turn greatly reduces the overall cost of data warehouse querying and maintenance.
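As background for how gain/loss-driven selection can work, here is a small greedy sketch under a storage budget. The View fields and the benefit-per-space heuristic are assumptions for illustration; the paper's algorithms and cost model are more elaborate.

    from dataclasses import dataclass

    @dataclass
    class View:                  # hypothetical candidate view record
        name: str
        size: float              # storage required if materialized
        gain: float              # query-cost saving if materialized
        loss: float              # added maintenance cost if materialized

    def select_views(candidates, budget):
        # Greedy heuristic: take views in order of net benefit per unit of
        # storage, skipping any view that no longer fits the budget.
        chosen, used = [], 0.0
        for v in sorted(candidates, key=lambda v: (v.gain - v.loss) / v.size,
                        reverse=True):
            if v.gain > v.loss and used + v.size <= budget:
                chosen.append(v)
                used += v.size
        return chosen

    views = [View("by_region", 40, 120, 10), View("by_month", 25, 60, 5),
             View("by_product", 60, 90, 30)]
    print([v.name for v in select_views(views, budget=80)])   # ['by_region', 'by_month']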

3.
This paper develops a decision support tool using an integrated analytic network process (ANP) and fuzzy data envelopment analysis (DEA) approach to deal effectively with a personnel selection problem drawn from an electric and machinery company in Taiwan. The company's current personnel selection procedure is a separate two-stage method; administrative practice shows that the separation between stages 1 and 2 reduces administration quality and can cause both displeasure for top managers and frustration for decision-makers. An illustrative example based on a simulated application demonstrates the implementation of the proposed approach, showing how it avoids the main drawback of the current method and, more importantly, deals with the personnel selection problem more convincingly and persuasively. This study supports the application of ANP and fuzzy DEA as decision support tools in personnel selection.

4.
An implementation of logical analysis of data
This paper describes a new, logic-based methodology for analyzing observations. The key features of this "logical analysis of data" (LAD) methodology are the discovery of minimal sets of features that are necessary for explaining all observations, and the detection of hidden patterns in the data that can distinguish observations describing "positive" outcome events from "negative" outcome events. Combinations of such patterns are used to develop general classification procedures. An implementation of this methodology is described, along with the results of numerical experiments demonstrating the classification performance of LAD in comparison with the reported results of other procedures. The final section describes three pilot studies applying LAD to oil exploration, psychometric testing, and the analysis of developments in the Chinese transitional economy. These pilot studies demonstrate not only the classification power of LAD but also its flexibility and its capability to provide solutions to various case-dependent problems.
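To make the notion of a "pattern" concrete: on binarized data, a pattern is a conjunction of literals that covers some positive observations and no negative ones. The following toy sketch grows one such pattern greedily from a seed positive case; it illustrates the concept, not the paper's implementation (and it assumes no negative case is identical to the seed).

    import numpy as np

    def greedy_pattern(Xpos, Xneg, seed_idx):
        # Grow a conjunction of (feature, value) literals taken from the seed
        # positive case, each time choosing the literal that excludes the most
        # remaining negative cases, until no negative case satisfies it.
        seed = Xpos[seed_idx]
        literals, neg = [], Xneg.copy()
        while len(neg):
            used = {j for j, _ in literals}
            kills = [np.sum(neg[:, j] != seed[j]) if j not in used else -1
                     for j in range(Xpos.shape[1])]
            j = int(np.argmax(kills))                 # most discriminating literal
            literals.append((j, int(seed[j])))
            neg = neg[neg[:, j] == seed[j]]           # keep negatives still covered
        return literals

    Xpos = np.array([[1, 0, 1], [1, 1, 1]])
    Xneg = np.array([[0, 0, 1], [1, 1, 0]])
    print(greedy_pattern(Xpos, Xneg, 0))              # [(0, 1), (1, 0)]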

5.
Data envelopment analysis (DEA) uses extreme observations to identify superior performance, which makes it vulnerable to outliers. This paper develops a unified model to identify both efficient and inefficient outliers in DEA. Finding both types is important because many post-efficiency analyses depend on the entire distribution of efficiency estimates, so outliers distinguished by poor performance can significantly alter the results. Besides allowing the identification of outliers, the described method is consistent with a relaxed set of DEA axioms. Several examples demonstrate the need to identify both efficient and inefficient outliers and the effectiveness of the proposed method. Applications of the model reveal that observations with low efficiency estimates are not necessarily outliers. In addition, a strategy to accelerate the computation is proposed that also applies to the detection of influential observations.
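For readers unfamiliar with the extreme-observation mechanics that make DEA outlier-sensitive, here is a standard input-oriented CCR envelopment LP (not the paper's unified outlier model) solved with SciPy; a score of 1 means the unit sits on the frontier defined by the extreme observations.

    import numpy as np
    from scipy.optimize import linprog

    def ccr_efficiency(X, Y, j0):
        # Input-oriented CCR model for unit j0. X: (m inputs x n units),
        # Y: (s outputs x n units). Decision variables: [theta, lambda_1..n].
        m, n = X.shape
        s = Y.shape[0]
        c = np.zeros(n + 1); c[0] = 1.0           # minimize theta
        A = np.zeros((m + s, n + 1)); b = np.zeros(m + s)
        A[:m, 0] = -X[:, j0]                      # sum lambda_j x_j <= theta x_0
        A[:m, 1:] = X
        A[m:, 1:] = -Y                            # sum lambda_j y_j >= y_0
        b[m:] = -Y[:, j0]
        res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * (n + 1))
        return res.fun                            # efficiency score in (0, 1]

    X = np.array([[2.0, 4.0, 3.0]])               # one input, three units
    Y = np.array([[1.0, 2.0, 3.0]])               # one output
    print([round(ccr_efficiency(X, Y, j), 3) for j in range(3)])  # [0.5, 0.5, 1.0]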

6.
Abstract data types for the logical modeling of complex data
In this paper we propose a logical data model for complex data. Our proposal extends the relational model by using abstract data types for domain specification, and an extended relational algebra is also introduced. The introduction of the parameterized type Geometry(S), where S is a ground set of elements, allows the representation of complex aggregated data. As an example, we discuss how our model supports the definition of geographical DBMSs. Moreover, to show the generality of our approach, we sketch how the model can be used in the framework of statistical applications.
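One way to read the parameterized type Geometry(S) is as an attribute domain whose values aggregate subsets of a ground set S. The sketch below is only a plausible rendering of that idea; the operations shown are assumptions, not the algebra defined in the paper.

    from dataclasses import dataclass
    from typing import FrozenSet, Generic, TypeVar

    S = TypeVar("S")   # the ground set of elements

    @dataclass(frozen=True)
    class Geometry(Generic[S]):
        # A value of Geometry(S) aggregates a subset of the ground set S, so a
        # relational attribute can hold complex (set-valued) data.
        elements: FrozenSet[S]

        def union(self, other: "Geometry[S]") -> "Geometry[S]":
            return Geometry(self.elements | other.elements)

        def intersect(self, other: "Geometry[S]") -> "Geometry[S]":
            return Geometry(self.elements & other.elements)

        def contains(self, other: "Geometry[S]") -> bool:
            return other.elements <= self.elements

    # Two "region" attribute values over a ground set of grid cells:
    a = Geometry(frozenset({(0, 0), (0, 1), (1, 0)}))
    b = Geometry(frozenset({(1, 0), (1, 1)}))
    print(a.intersect(b).elements)   # frozenset({(1, 0)})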

7.
The 'pattern decomposition method' (PDM) is a new analysis method originally developed for Landsat Thematic Mapper (TM) satellite data. By applying the PDM to radiospectrometer data of ground objects, 121-dimensional data in the wavelength region 350–2500 nm were successfully reduced to three-dimensional data: the nearly continuous spectral reflectance of land cover objects could be decomposed over three standard spectral patterns with an accuracy of 4.17% per degree of freedom. We introduce the concept of supplementary spectral patterns for the study of specific ground objects; as an example, we study a supplementary pattern that corrects the standard spectral pattern of vivid vegetation for the spectra of withered vegetation. A new Revised Vegetation Index based on Pattern Decomposition (RVIPD) for hyper-multi-spectral data is proposed as a simple function of the pattern decomposition coefficients, including the supplementary vegetation pattern. It was confirmed that the RVIPD varies linearly with the area cover ratio and with the vegetation quantum efficiency.
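The decomposition step itself is an ordinary least-squares fit of each spectrum onto the standard patterns. A minimal numerical sketch, with random stand-ins for the real standard patterns:

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.abs(rng.normal(size=(121, 3)))          # columns: three standard patterns
    true_c = np.array([0.2, 0.7, 0.1])
    r = P @ true_c + 0.01 * rng.normal(size=121)   # a measured 121-band spectrum

    # Reduce the 121-band spectrum to 3 decomposition coefficients:
    coeffs, *_ = np.linalg.lstsq(P, r, rcond=None)
    rel_err = np.linalg.norm(r - P @ coeffs) / np.linalg.norm(r)
    print(coeffs, rel_err)                         # coeffs close to [0.2, 0.7, 0.1]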

8.
Genetic algorithms for finding a minimal covering of a Boolean matrix are developed and studied. This problem arises in image recognition when methods of combinatorial (logical) analysis of information are used to synthesize recognition procedures.
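A toy version of such a genetic algorithm, shown only to fix ideas (the operators and parameters are illustrative, not those studied here): a chromosome is a bit mask over the columns of the Boolean matrix, and the fitness penalizes uncovered rows heavily so that feasible, small covers win out.

    import numpy as np

    def ga_min_cover(A, pop=40, gens=200, pmut=0.05, seed=0):
        rng = np.random.default_rng(seed)
        n_rows, n_cols = A.shape

        def cost(mask):                        # columns used, plus a heavy
            covered = A[:, mask].any(axis=1)   # penalty for each uncovered row
            return mask.sum() + n_cols * (~covered).sum()

        P = rng.random((pop, n_cols)) < 0.5    # random initial population
        for _ in range(gens):
            P = P[np.argsort([cost(m) for m in P])]         # elitist sort
            for i in range(pop // 2, pop):                  # refill worst half
                a = P[rng.integers(pop // 2)]
                b = P[rng.integers(pop // 2)]
                cut = rng.integers(1, n_cols)
                child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
                P[i] = child ^ (rng.random(n_cols) < pmut)  # bit-flip mutation
        best = min(P, key=cost)
        return np.flatnonzero(best)

    A = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=bool)
    print(ga_min_cover(A))    # e.g. [2 3]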

9.
Problems of increasing the efficiency of combinatorial logical data analysis in recognition problems are examined. A technique for correct conversion of the initial information to reduce its dimensionality is proposed, and results of testing this technique on real medical prognosis problems are given.

Djukova, Elena V. Born 1945. Graduated from Moscow State University in 1967. Candidate's degree in physics and mathematics in 1979; doctoral degree in physics and mathematics in 1997. Leading researcher at the Dorodnicyn Computing Center, Russian Academy of Sciences; lecturer at Moscow State University and at Moscow Pedagogical University. Scientific interests: discrete mathematics and mathematical methods of pattern recognition. Author of 70 papers.

Peskov, Nikolai V. Born 1978. Graduated from Moscow State University in 2000. Candidate's degree in 2004. Junior researcher at the Dorodnicyn Computing Center, Russian Academy of Sciences. Scientific interests: discrete mathematics and mathematical methods of pattern recognition. Author of ten papers.

Inyakin, Andrey S. Born 1978. Graduated from Moscow State University in 2000. Junior researcher at the Dorodnicyn Computing Center, Russian Academy of Sciences. Scientific interests: discrete mathematics and mathematical methods of pattern recognition. Author of ten papers.

Sakharov, Aleksei A. Born 1980. Graduated from Moscow State University in 2003. Graduate student at Moscow Pedagogical University. Scientific interests: discrete mathematics and mathematical methods of pattern recognition. Author of three papers.

10.
A methodology is presented for constructing an expectation robust algorithm for principal component regression. The presented method is the first multivariate regression method that can simultaneously resist outliers and cope with missing elements in the data. Simulations and an example illustrate the good statistical properties of the method.

11.
Petre, Niclas. Digital Signal Processing, 2006, 16(6): 712–734.
The spectral analysis of regularly sampled (RS) data is a well-established topic, and many useful methods are available for performing it under different sets of conditions. The same cannot be said about the spectral analysis of irregularly sampled (IS) data: despite a plethora of published works on this topic, the choice of a spectral analysis method for IS data is essentially limited, on either technical or computational grounds, to the periodogram and its variations. In our opinion this situation is far from satisfactory, given the importance of the spectral analysis of IS data for a considerable number of applications in fields as diverse as engineering, biomedicine, economics, astronomy, seismology, and physics. In this paper we introduce a number of IS data approaches that parallel the methods most commonly used for the spectral analysis of RS data: the periodogram (PER), the Capon method (CAP), the multiple-signal characterization method (MUSIC), and the estimation of signal parameters via rotational invariance technique (ESPRIT). The proposed IS methods are as simple as their RS counterparts, both conceptually and computationally; in particular, the fast algorithms derived for the implementation of the RS data methods can be used, mutatis mutandis, to implement the proposed parallel IS methods. Moreover, the expected performance-based ranking of the IS methods is the same as that of the parallel RS methods: all of them perform similarly on data consisting of well-separated sinusoids in noise; MUSIC and ESPRIT outperform the other methods for closely spaced sinusoids in white noise; and CAP outperforms PER for data whose spectrum has a small-to-medium dynamic range (MUSIC and ESPRIT should not be used in the latter case).
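As a reference point for the methods discussed, the plain periodogram extends directly to irregular sampling by evaluating it on an explicit frequency grid instead of via the FFT. A minimal sketch (the CAP/MUSIC/ESPRIT parallels are more involved):

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0, 100, 256))     # irregular sampling instants
    x = np.sin(2 * np.pi * 0.12 * t) + 0.5 * rng.normal(size=t.size)

    freqs = np.linspace(0.01, 0.5, 1000)
    # IS periodogram: |sum_n x(t_n) exp(-i 2 pi f t_n)|^2 / N on a grid of f
    per = np.array([abs(np.sum(x * np.exp(-2j * np.pi * f * t))) ** 2
                    for f in freqs]) / t.size

    print(freqs[np.argmax(per)])              # peaks near the true 0.12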

12.
Data analysis techniques have traditionally been conceived to cope with data described in terms of numeric vectors, because numeric vectors have a well-defined and clear geometric interpretation that facilitates analysis from the mathematical viewpoint. However, state-of-the-art research on topics of fundamental importance, such as smart grids, networks of dynamical systems, biochemical and biophysical systems, intelligent trading systems, multimedia content-based retrieval systems, and social network analysis, deals with structured and non-conventional information characterizing the data, providing richer and hence more complex patterns to be analyzed. As a consequence, representing patterns by complex (relational) structures and defining suitable, usually non-metric, dissimilarity measures is becoming consolidated practice in the related fields. However, as the data sources become more complex, the ability to judge data quality (or reliability), together with the related interpretability issues, can be seriously compromised. For this reason, automated methods able to synthesize relevant information while rigorously describing the uncertainty in the available datasets are very important: information granulation is the key aspect in the analysis of complex data. In this paper, we discuss our general viewpoint on the adoption of information granulation techniques in the general context of soft computing and pattern recognition, conceived as a fundamental approach to the challenging problem of automatic modeling of complex systems. We focus on the specific setting of processing so-called non-geometric data, which diverges significantly from what has been done so far in the related literature. We highlight the motivations and the founding concepts, and finally we provide a high-level conceptualization of the proposed data analysis framework.

13.
The Journal of Supercomputing - Data warehouses are very large databases that play a key role in intelligent decision making in enterprises. The bitmap join index selection problem is crucial in the...

14.
There is a small subset of any repairable component population that can develop a failure mode outside the scope of the standard repair and overhaul procedures, which makes these components "rogue". When this happens, a Darwinian-like "natural selection" phenomenon ensures that they are placed in the most disadvantageous position in the asset management program, negatively affecting multiple aspects of the operational and maintenance organizations. Rogue components have long plagued the airline industry and created havoc in its asset management programs. In this paper, we describe how these rogues develop, outline the natural selection process that leads to their hampering the asset management program, and examine some of the negative impacts that ensue. We then propose a condition-based maintenance (CBM) approach to control the development of these components, exploring the use of a supervised-learning data mining technique called logical analysis of data (LAD) for the purpose of detecting rogues within a population of repairable components. We apply the resulting LAD-based decision model to an inventory of turbo compressors belonging to an airline fleet. Finally, we evaluate the applicability of LAD to the rogue component detection problem and review its efficiency as a decision model for this type of problem.

15.
Neural Computing and Applications - Nowadays, crime is a major threat to society that affects the normal life of human beings all over the world. It is very important to make the world free...

16.
This paper presents the implementation of a novel multi-class diagnostic technique for the detection and identification of faults, based on an approach called logical analysis of data (LAD). LAD is a data mining, artificial intelligence approach based on pattern recognition. In the context of condition-based maintenance (CBM), historical data containing condition indices and the state of the machine are the inputs to LAD. After the training and testing phases, LAD generates patterns that characterize the faulty states according to the type of fault and that differentiate between these states and the normal state. These patterns are found by solving a mixed 0-1 integer linear programming problem; they are then used to detect and identify a future unknown state of the equipment. The diagnostic technique has already been tested on several well-known machine learning datasets, and the results show that its performance is comparable to that of other conventional approaches, such as neural networks and support vector machines, with the added advantage that the generated patterns, which are rules characterizing the fault types, are clearly interpretable. To demonstrate its merit in fault diagnosis, the technique is used to detect and identify faults in power transformers using dissolved gas analysis data. The paper concludes that multi-class LAD-based fault detection and identification is a promising diagnostic approach in CBM.

17.
The Journal of Supercomputing - Improving energy efficiency while guaranteeing quality of service (QoS) is one of the main challenges of efficient resource management of large-scale data...

18.
We propose a new penalized least squares approach to handling high-dimensional statistical analysis problems. Our proposed procedure can outperform the smoothly clipped absolute deviation (SCAD) penalty technique (Fan and Li, 2001) when the number of predictors p is much larger than the number of observations n, and/or when the correlation among predictors is high. The proposed procedure shares some of the properties of the SCAD penalty method, including sparsity and continuity, and is asymptotically equivalent to an oracle estimator. We show how the approach can be used to analyze high-dimensional data, e.g., microarray data, to construct a classification rule while automatically selecting significant genes. A simulation study and real data examples demonstrate the practical merits of the new method.
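For reference, the SCAD baseline the authors compare against has a closed-form thresholding rule in the univariate/orthonormal case (Fan and Li, 2001), which the sketch below implements; the paper's own penalty modifies this behavior.

    import numpy as np

    def scad_threshold(z, lam, a=3.7):
        # SCAD thresholding: soft-threshold small coefficients, interpolate in
        # the middle zone, and leave large coefficients unshrunk (near
        # unbiasedness). a = 3.7 is the value suggested by Fan and Li.
        z = np.asarray(z, float)
        soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
        mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
        return np.where(np.abs(z) <= 2 * lam, soft,
                        np.where(np.abs(z) <= a * lam, mid, z))

    print(scad_threshold(np.linspace(-6, 6, 7), lam=1.0))
    # [-6. -4. -1.  0.  1.  4.  6.]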

19.
In this paper, the feature selection problem is formulated as a multi-objective optimization problem, and new criteria are proposed to fulfill this goal. First, the data were pre-processed with a missing-value replacement scheme, a re-sampling procedure, a data type transformation procedure, and min-max normalization. A wide variety of classifiers and feature selection methods were then applied and evaluated. Finally, the paper presents comprehensive experiments showing the relative performance of the classification tasks. The experimental results reveal the success of the proposed methods on credit approval data; the numeric results also provide guidance for selecting feature selection methods and classifiers in the knowledge discovery process.
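A minimal sketch of the first two pre-processing steps named above (the replacement strategy shown, column means, is an assumption; re-sampling and type transformation are omitted):

    import numpy as np

    def preprocess(X):
        X = np.asarray(X, float)
        col_mean = np.nanmean(X, axis=0)
        X = np.where(np.isnan(X), col_mean, X)   # missing-value replacement
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
        return (X - lo) / span                   # min-max normalization to [0, 1]

    X = np.array([[1.0, 200.0], [np.nan, 400.0], [3.0, 300.0]])
    print(preprocess(X))   # [[0. 0.], [0.5 1.], [1. 0.5]]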
