Similar Documents
1.
2.
Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes and do not apply to categorical ones (e.g., binary, ordinal, and nominal), which are common in many applications. The main challenges are modeling spatial categorical dependency and achieving computational efficiency. This paper presents the first outlier detection framework for spatial categorical data. Specifically, a new metric, the Pair Correlation Ratio (PCR), is computed for each pair of category sets based on their co-occurrence frequencies at specific spatial distance ranges. The relevance between spatial objects is then calculated from the PCR values according to their spatial distances. The outlierness of each object is defined as the inverse of the average relevance between the object and its spatial neighbors, and the objects with the highest outlier scores are returned as spatial categorical outliers. A set of algorithms is further designed for single-attribute and multi-attribute spatial categorical datasets. Extensive experimental evaluations on both simulated and real datasets demonstrate the effectiveness and efficiency of the proposed approaches.
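The scoring pipeline can be sketched in a few lines for single-attribute data. The sketch assumes that PCR is the observed share of ordered category pairs within a fixed-width distance bin, divided by the share expected if categories were placed independently. The abstract does not give the exact definition. The names `pair_correlation_ratios`, `outlier_scores` and `bin_width` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def distance_bin(d, bin_width):
    # Hypothetical fixed-width distance ranges: [0, w), [w, 2w), ...
    return int(d // bin_width)


def pair_correlation_ratios(points, cats, bin_width):
    """Assumed PCR: observed share of ordered (a, b) pairs in a distance bin
    divided by the share expected if categories were placed independently."""
    n = len(points)
    p = {c: k / n for c, k in Counter(cats).items()}
    counts = defaultdict(Counter)  # bin -> Counter of (cat_i, cat_j)
    totals = Counter()             # bin -> number of ordered pairs
    for i in range(n):
        for j in range(n):
            if i != j:
                b = distance_bin(math.dist(points[i], points[j]), bin_width)
                counts[b][(cats[i], cats[j])] += 1
                totals[b] += 1
    return {(a, c, b): (k / totals[b]) / (p[a] * p[c])
            for b, ctr in counts.items() for (a, c), k in ctr.items()}


def outlier_scores(points, cats, k=3, bin_width=1.0):
    """Outlierness = 1 / average PCR-based relevance to the k nearest neighbors."""
    pcr = pair_correlation_ratios(points, cats, bin_width)
    scores = []
    for i, pi in enumerate(points):
        neigh = sorted((math.dist(pi, pj), j) for j, pj in enumerate(points) if j != i)[:k]
        rel = [pcr[(cats[i], cats[j], distance_bin(d, bin_width))] for d, j in neigh]
        avg = sum(rel) / len(rel)
        scores.append(math.inf if avg == 0 else 1.0 / avg)
    return scores
```

Take `pts = [(x, 0) for x in range(10)]` and `cats = list("AAAAABBABB")`. Here `outlier_scores(pts, cats, k=2)` gives the isolated `A` at index 7 the highest score (1.44). A and B are adjacent less often than independence would predict, so its relevance to both B neighbors is low.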

3.
We are concerned with the issue of detecting outliers and change points from time series. In the area of data mining, there has been increased interest in these issues. Outlier detection is related to fraud detection, rare event discovery, etc., while change-point detection is related to event/trend change detection, activity monitoring, etc. Although outlier detection and change-point detection have not been explicitly related in most previous work, this paper presents a unifying framework for dealing with both of them. In this framework, a probabilistic model of the time series is learned incrementally using an online discounting learning algorithm. This algorithm can track a drifting data source adaptively by gradually forgetting out-of-date statistics. A score for each data point is calculated in terms of its deviation from the learned model, with a higher score indicating a higher likelihood of being an outlier. By averaging the scores over a window of fixed length and sliding the window, we obtain a new time series consisting of moving-averaged scores. Change-point detection is then reduced to detecting outliers in that time series. We compare the performance of our framework with that of conventional methods to demonstrate its validity through simulations and experimental applications to incident detection in network security.
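The two-stage idea can be sketched as follows. As a simplification, a discounted Gaussian stands in for the paper's probabilistic model: its mean and variance are updated with a discount rate `r`, and the score is the negative log-likelihood. The class and function names are hypothetical.

```python
import math


class DiscountedGaussian:
    """Gaussian model whose mean/variance forget old data at discount rate r."""

    def __init__(self, r=0.05, mean=0.0, var=1.0):
        self.r, self.mean, self.var = r, mean, var

    def score(self, x):
        # Negative log-likelihood of x under the current model (higher = more unusual)
        return 0.5 * math.log(2 * math.pi * self.var) + (x - self.mean) ** 2 / (2 * self.var)

    def update(self, x):
        r = self.r
        self.mean = (1 - r) * self.mean + r * x
        self.var = max((1 - r) * self.var + r * (x - self.mean) ** 2, 1e-9)


def outlier_scores(series, r=0.05):
    model, scores = DiscountedGaussian(r), []
    for x in series:
        scores.append(model.score(x))  # score x before learning from it
        model.update(x)
    return scores


def moving_average(values, window):
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def change_point(series, window=5, r=0.05):
    """Second stage: outlier scores of the smoothed first-stage scores.
    Returns the index in `series` where the most anomalous window ends."""
    smoothed = moving_average(outlier_scores(series, r), window)
    second = outlier_scores(smoothed, r)
    return second.index(max(second)) + window - 1
```

Consider a series that jumps from around 0 to around 5 at index 60. Both the first-stage outlier score and `change_point(series)` peak at index 60. Averaging over `window` dilutes a single isolated spike. A level shift, by contrast, keeps first-stage scores high for several steps while the model re-adapts, and that is what the second stage picks up.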

4.
Data envelopment analysis (DEA) uses extreme observations to identify superior performance, which makes it vulnerable to outliers. This paper develops a unified model to identify both efficient and inefficient outliers in DEA. Finding both types is important because many post-analyses, performed after measuring efficiency, depend on the entire distribution of efficiency estimates. Thus, outliers distinguished by poor performance can significantly alter the results. Besides allowing the identification of outliers, the described method is consistent with a relaxed set of DEA axioms. Several examples demonstrate the need to identify both efficient and inefficient outliers and the effectiveness of the proposed method. Applications of the model reveal that observations with low efficiency estimates are not necessarily outliers. In addition, a strategy to accelerate the computation is proposed that can also be applied to influential observation detection.

5.
Robust TSK fuzzy modeling for function approximation with outliers
Takagi-Sugeno-Kang (TSK) fuzzy models have attracted great attention from the fuzzy modeling community due to their good performance in various applications. Most approaches for modeling TSK fuzzy rules define their fuzzy subspaces based on training data being close enough rather than having similar functions. Moreover, training data sets often contain outliers, which seriously affect least-squares error minimization in clustering and learning algorithms. A robust TSK fuzzy modeling approach is presented. In this approach, a clustering algorithm termed robust fuzzy regression agglomeration (RFRA) is proposed to define fuzzy subspaces in a fuzzy regression manner, with robustness against outliers. To obtain a more precise model, a robust fine-tuning algorithm is then employed. Various examples are used to verify the effectiveness of the proposed approach. In the simulation results, the proposed robust TSK fuzzy modeling shows superior performance over other approaches.
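The robust approach above builds on first-order TSK inference, sketched below. Each rule's firing strength is the product of Gaussian input memberships, and the output is the strength-weighted average of the rules' linear consequents. Two choices are assumptions here: the Gaussian membership shape, and the Huber weight, a generic way to down-weight outliers in least squares. Neither RFRA nor the fine-tuning algorithm is reproduced.

```python
import math


def gaussian(x, center, sigma):
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def tsk_predict(x, rules):
    """First-order TSK inference.

    Each rule is (centers, sigmas, coeffs, bias): its firing strength is the
    product of per-input Gaussian memberships, and its consequent is the
    linear function coeffs . x + bias. The output is the firing-strength-
    weighted average of the consequents."""
    num = den = 0.0
    for centers, sigmas, coeffs, bias in rules:
        w = 1.0
        for xi, c, s in zip(x, centers, sigmas):
            w *= gaussian(xi, c, s)
        y = sum(a * xi for a, xi in zip(coeffs, x)) + bias
        num += w * y
        den += w
    return num / den


def huber_weight(residual, delta=1.0):
    # Generic robust IRLS weight: full weight inside the band, delta/|r| outside,
    # so large (outlying) residuals pull a weighted least-squares fit less.
    a = abs(residual)
    return 1.0 if a <= delta else delta / a
```

Consider two rules of equal width, one centered at 0 with consequent y = x and one at 10 with consequent y = 2x + 1. The input 5 fires both equally, so `tsk_predict` returns the midpoint of 5 and 11, i.e. 8.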

6.
This study proposes a hybrid robust approach for constructing Takagi–Sugeno–Kang (TSK) fuzzy models with outliers. The approach consists of a robust fuzzy C-regression model (RFCRM) clustering algorithm in the coarse-tuning phase and an annealing robust back-propagation (ARBP) learning algorithm in the fine-tuning phase. The RFCRM clustering algorithm modifies the fuzzy C-regression models (FCRM) clustering algorithm by incorporating a robust mechanism, the input data distribution, and a robust similarity measure. Owing to the robust mechanism and the consideration of the input data distribution, RFCRM identifies the fuzzy subspaces and the parameters of the consequent functions simultaneously. As a result, the obtained model is not significantly affected by outliers. Furthermore, the robust similarity measure is used during clustering to reduce redundant clusters. Consequently, RFCRM can generate a better initialization for the TSK fuzzy model in the coarse-tuning phase. An ARBP algorithm is then employed to obtain a more precise model in the fine-tuning phase. Our simulation results show that the proposed robust TSK fuzzy modeling approach is superior to existing approaches in both learning speed and approximation accuracy.

7.
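The fuzzy C-means (FCM) algorithm requires the number of clusters to be set in advance. To address this, a new fuzzy cluster validity index is proposed. First, the variance of each attribute within a cluster is computed. Attributes with smaller variance are assigned larger weights and attributes with larger variance smaller weights, which yields an attribute-weighted FCM algorithm. Then, intra-cluster compactness and inter-cluster separation are computed from the membership matrix produced by the improved FCM algorithm. Finally, a new cluster validity index is defined from the intra-cluster compactness and inter-cluster separation. Experimental results show that the index can find a number of clusters that matches the natural distribution of the data. The attribute-weighted FCM algorithm can identify the relative importance of different attributes and improves clustering accuracy. The validity index defined on its membership matrix can discover the correct number of clusters, making the clustering process fully unsupervised.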
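The three steps can be sketched as one iteration of an attribute-weighted FCM. The inverse-variance weighting follows the description above. Several details are assumptions, since the abstract does not give the exact formulas:

- normalizing the weights to sum to 1 per cluster;
- the Xie-Beni-style ratio used in `compactness_separation`;
- all function names.

```python
import math


def weighted_fcm_step(X, V, U, m=2.0, eps=1e-12):
    """One iteration of an attribute-weighted fuzzy c-means (sketch).

    X: n points, V: c centers, U: n x c memberships. Returns (W, U, V)."""
    n, c, p = len(X), len(V), len(X[0])
    # 1) Per-cluster attribute weights: small fuzzy variance -> large weight.
    W = []
    for k in range(c):
        um = [U[i][k] ** m for i in range(n)]
        var = [sum(um[i] * (X[i][j] - V[k][j]) ** 2 for i in range(n)) / sum(um) + eps
               for j in range(p)]
        inv = [1.0 / v for v in var]
        W.append([w / sum(inv) for w in inv])
    # 2) Memberships from attribute-weighted squared distances (standard FCM rule).
    D = [[sum(W[k][j] * (X[i][j] - V[k][j]) ** 2 for j in range(p)) + eps
          for k in range(c)] for i in range(n)]
    U = [[1.0 / sum((D[i][k] / D[i][l]) ** (1.0 / (m - 1)) for l in range(c))
          for k in range(c)] for i in range(n)]
    # 3) Centers as membership-weighted means.
    V = []
    for k in range(c):
        um = [U[i][k] ** m for i in range(n)]
        V.append([sum(um[i] * X[i][j] for i in range(n)) / sum(um) for j in range(p)])
    return W, U, V


def compactness_separation(X, V, U, m=2.0):
    """Assumed index: mean fuzzy within-cluster scatter divided by the smallest
    squared distance between centers (Xie-Beni form); lower is better."""
    n, c = len(X), len(V)
    comp = sum(U[i][k] ** m * math.dist(X[i], V[k]) ** 2
               for i in range(n) for k in range(c)) / n
    sep = min(math.dist(V[a], V[b]) ** 2 for a in range(c) for b in range(c) if a != b)
    return comp / sep
```

To choose the number of clusters, one would run the iteration to convergence for several values of c and keep the c with the best index value.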

8.
A cluster validity index for fuzzy clustering
A new cluster validity index is proposed for the validation of partitions of object data produced by the fuzzy c-means algorithm. The proposed validity index uses a variation measure and a separation measure between two fuzzy clusters. A good fuzzy partition is expected to have a low degree of variation and a large separation distance. Testing of the proposed index and nine previously formulated indices on well-known data sets shows the superior effectiveness and reliability of the proposed index in comparison to other indices and the robustness of the proposed index in noisy environments.

9.
Modeling and querying fuzzy spatiotemporal databases
Modeling spatiotemporal data, in particular fuzzy and complex spatial objects representing geographic entities and relations, is a topic of great importance in geographic information systems, computer vision, environmental data management systems, and other fields. Because of complex requirements, it is challenging to represent spatiotemporal data and its features in databases and to query them effectively. This article presents a new approach to modeling and querying the spatiotemporal data of fuzzy spatial and complex objects and/or spatial relations. In our case study, we use a meteorological database application in an intelligent database architecture. This architecture combines an object-oriented database with a knowledge base for modeling and querying spatiotemporal objects.

10.
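Traditional support vector machines (SVMs) are sensitive to noise and outliers, so a fuzzy technique is adopted to remove noise and outliers from the samples. The fuzzy membership of each sample is determined from the closeness between samples. A threshold determined through training is then used to remove the noise and outliers that interfere with obtaining the optimal separating hyperplane. Experimental results show that, compared with the traditional SVM, the method improves the SVM's robustness to noise. They also show that, without loss of accuracy, the one-class classification method based on linear programming is much faster than the one based on quadratic programming.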
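The denoising step can be sketched minimally before any SVM training. The sketch assumes that a sample's closeness is the mean distance to its k nearest neighbors within the same class, mapped to a membership between 0 and 1. The exact membership formula, `k`, and the function names are assumptions. SVM training itself is omitted.

```python
import math


def knn_memberships(samples, k=3, delta=1e-6):
    """Fuzzy membership from closeness: points whose k nearest neighbors are
    far away get memberships near 0, tightly packed points near 1."""
    avg = []
    for i, a in enumerate(samples):
        d = sorted(math.dist(a, b) for j, b in enumerate(samples) if j != i)[:k]
        avg.append(sum(d) / len(d))
    worst = max(avg)
    return [1.0 - a / (worst + delta) for a in avg]


def remove_noise(samples, threshold, k=3):
    # Apply per class before SVM training; the threshold would be tuned on training data.
    s = knn_memberships(samples, k)
    return [x for x, si in zip(samples, s) if si >= threshold]
```

For the class `[(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]` with `k=2` and threshold 0.5, the four clustered points keep memberships of about 0.9. The point `(8, 8)` drops to nearly 0 and is removed.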

11.
12.
In this article, a cluster validity index and its fuzzification are described, which measure the goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy indicates the best partitioning. The index is defined as the product of three factors, whose maximization ensures the formation of a small number of compact clusters with large separation between at least two clusters. We use both the k-means and the expectation maximization algorithms as underlying crisp clustering techniques; for fuzzy clustering, we use the well-known fuzzy c-means algorithm. We compare the PBM-index with three other well-known measures: the Davies-Bouldin index, Dunn's index, and the Xie-Beni index. Results on several artificial and real-life data sets demonstrate the superiority of the PBM-index in appropriately determining the number of clusters.
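For a crisp partition, the PBM-index is commonly written as

PBM(K) = ((1/K) · (E_1/E_K) · D_K)^2

where:

- E_1 is the sum of distances of all points to the overall centroid;
- E_K is the sum of distances of points to their own cluster centers;
- D_K is the largest distance between two cluster centers.

The sketch below covers only the crisp case. The fuzzy variant replaces the crisp assignment in E_K with memberships u_kj. The function names are hypothetical.

```python
import math


def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def pbm_index(points, labels):
    """Crisp PBM-index ((1/K) * (E1/EK) * DK) ** 2; larger is better.
    Assumes K >= 2 and that not every point sits exactly on its center (EK > 0)."""
    clusters = sorted(set(labels))
    K = len(clusters)
    centers = {c: centroid([p for p, l in zip(points, labels) if l == c]) for c in clusters}
    z = centroid(points)
    e1 = sum(math.dist(p, z) for p in points)  # spread around the global center
    ek = sum(math.dist(p, centers[l]) for p, l in zip(points, labels))  # within-cluster spread
    dk = max(math.dist(a, b) for a in centers.values() for b in centers.values())
    return (e1 / (K * ek) * dk) ** 2
```

Take the four points `(0, 0), (0, 2), (10, 0), (10, 2)`. Splitting them into two columns scores 650, while splitting them into two rows scores about 1.04. The larger value selects the natural partition.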

13.
In the context of resistant learning, outliers are observations far away from the fitting function. This function is deduced from a subset of the given observations, and its form can adapt during the process. This study presents a resistant learning procedure for coping with outliers via a single-hidden-layer feed-forward neural network (SLFN). The procedure is guided by the least trimmed sum of squared residuals principle. Its key mechanisms are:

- an analysis mechanism that excludes potential outliers at early stages of the process;
- a modeling mechanism that deduces enough hidden nodes to fit the reference observations;
- an estimating mechanism that tunes the associated weights of the SLFN;
- a deletion-diagnostics mechanism that checks whether the resulting SLFN is stable.

The lake data set is used to demonstrate the resistant-learning performance of the proposed procedure.
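The least trimmed squares principle can be illustrated with a straight-line model instead of an SLFN, which keeps the example small. The sketch uses concentration steps, a standard way to decrease the trimmed sum: refit on the h best-fitting points until the subset stops changing. This is not the paper's procedure, and the function names are hypothetical.

```python
def ols(xs, ys):
    """Ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def trimmed_sse(a, b, xs, ys, h):
    # LTS objective: sum of the h smallest squared residuals
    return sum(sorted((y - a - b * x) ** 2 for x, y in zip(xs, ys))[:h])


def lts_line(xs, ys, h, start=None, max_iter=50):
    """Concentration steps: fit on the current subset, keep the h points with the
    smallest squared residuals, refit; stop when the subset no longer changes.
    Each step never increases the trimmed SSE."""
    idx = sorted(start if start is not None else range(len(xs)))
    for _ in range(max_iter):
        a, b = ols([xs[i] for i in idx], [ys[i] for i in idx])
        new = sorted(sorted(range(len(xs)), key=lambda i: (ys[i] - a - b * xs[i]) ** 2)[:h])
        if new == idx:
            break
        idx = new
    return a, b, idx
```

Take points on y = 2x + 1 for x = 0..9, with the last y replaced by 100, and set h = 8. Here `lts_line` returns a = 1.0 and b = 2.0 and excludes index 9 from the fitted subset.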

14.
Fuzzy regression (FR) has been demonstrated to be a promising technique for modeling manufacturing processes where data availability is limited. However, FR can only yield linear FR models, which have a high degree of fuzziness. It also ignores higher-order and interaction terms and the influence of outliers, all of which usually exist in manufacturing process data. Genetic programming (GP), on the other hand, can generate models with higher-order and interaction terms, but it cannot address the fuzziness of manufacturing process data. In this paper, genetic programming-based fuzzy regression (GP-FR) is proposed for modeling manufacturing processes. It combines the advantages of the two approaches to overcome the deficiencies of commonly used modeling methods. GP-FR uses GP to generate model structures based on a tree representation that can express interaction and higher-order terms. It uses an FR generator based on fuzzy regression to identify outliers in experimental data sets. It then determines the contribution and fuzziness of each term in the model using the experimental data with the outliers excluded. To evaluate the effectiveness of GP-FR in modeling manufacturing processes, it was used to model a non-linear system and an epoxy dispensing process. The results were compared with those of two commonly used FR methods, Tanaka's FR and Peters' FR. The models developed with GP-FR showed better prediction accuracy than the models based on the other two FR methods.
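For reference, a Tanaka-style fuzzy linear model uses symmetric triangular coefficients with center c_j and spread s_j. It maps an input x to a fuzzy output with center Σ c_j x_j and spread Σ s_j |x_j|, where x_0 = 1. The sketch below evaluates that output and its h-level interval. It flags an observation as an outlier when it falls outside the interval; this rule is an assumption for illustration, not GP-FR's outlier test. Fitting the coefficients, which is a linear program, is omitted.

```python
def fuzzy_output(centers, spreads, x):
    """Tanaka-style output for input x: returns (center, spread) of the
    symmetric triangular fuzzy number A0 + A1*x1 + ... + An*xn."""
    xs = [1.0] + list(x)  # x0 = 1 for the intercept
    center = sum(c * xi for c, xi in zip(centers, xs))
    spread = sum(s * abs(xi) for s, xi in zip(spreads, xs))
    return center, spread


def h_interval(center, spread, h=0.0):
    # Support (h=0) or h-level cut of a symmetric triangular fuzzy number
    half = (1.0 - h) * spread
    return center - half, center + half


def flag_outliers(centers, spreads, X, y, h=0.0):
    # Hypothetical rule: an observation outside its h-level interval is flagged
    flags = []
    for x, yi in zip(X, y):
        lo, hi = h_interval(*fuzzy_output(centers, spreads, x), h)
        flags.append(not lo <= yi <= hi)
    return flags
```

With coefficients A0 = (1.0, 0.5) and A1 = (2.0, 0.25), the input x = 2 gives center 5 and spread 1. The support is therefore [4, 6], and an observation of 9 would be flagged.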

15.
A transient stability index is described for the online security evaluation of electric power systems; it provides a measure of their level of security. A technique for constructing this index is developed by applying the principles of pattern recognition and fuzzy set theory. Stability is determined from the initial fault-on accelerations of the machine rotors. The analysis of two sample transmission systems is presented to illustrate the application of the developed index.

16.
Conventional portfolio optimization models assume that the future condition of the stock market can be accurately predicted from historical data. However, no matter how accurate the past data are, this premise does not hold in financial markets because of the high volatility of the market environment. This paper discusses the fuzzy portfolio optimization problem in which asset returns are represented by fuzzy data. A mean-absolute deviation risk function model and Zadeh's extension principle are used to solve the portfolio optimization problem with fuzzy returns. Since the parameters are fuzzy numbers, the portfolio return is a fuzzy number as well. A pair of two-level mathematical programs is formulated to calculate the upper and lower bounds of the portfolio return. Using the duality theorem and a variable transformation technique, this pair is transformed into a pair of ordinary one-level linear programs that can be solved directly. The calculated results conform to an essential idea in finance and economics: the greater the risk an investor is willing to take on, the greater the potential return. An example using data from the Taiwan Stock Exchange Corporation illustrates the whole approach to the fuzzy portfolio optimization problem.
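Over T return scenarios, the mean-absolute deviation risk of a portfolio with weights x_j is (1/T) Σ_t |Σ_j (r_jt - r̄_j) x_j|. The sketch below computes this risk. It also computes the α-cut of the fuzzy portfolio return, assuming fixed non-negative weights and triangular fuzzy expected returns (a, m, b). By the extension principle, that α-cut is just the weighted sum of the assets' α-cut endpoints. The paper's two-level programs, which also optimize the weights, are not reproduced. The function names are hypothetical.

```python
def mad_risk(returns, weights):
    """returns[t][j]: return of asset j in period t. MAD of the portfolio return."""
    T, n = len(returns), len(weights)
    means = [sum(r[j] for r in returns) / T for j in range(n)]
    return sum(abs(sum((r[j] - means[j]) * weights[j] for j in range(n)))
               for r in returns) / T


def fuzzy_return_cut(tri_returns, weights, alpha):
    """alpha-cut [lo, hi] of sum_j w_j * R_j for triangular R_j = (a, m, b), w_j >= 0."""
    lo = hi = 0.0
    for (a, m, b), w in zip(tri_returns, weights):
        lo += w * (a + alpha * (m - a))
        hi += w * (b - alpha * (b - m))
    return lo, hi
```

At α = 1 the interval collapses to the weighted sum of the modal returns. At α = 0 it spans the widest range of possible portfolio returns.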

17.
Outlier detection is an essential activity for obtaining valuable information to identify, for example, intrusions, faults, and system failures. This paper proposes a bio-inspired algorithm able to detect anomalous data in distributed systems. Each data object is associated with a mobile agent that follows the well-known bio-inspired flocking algorithm. The agents are randomly disseminated onto a virtual space, where they move autonomously to form one or more flocks. Through a tailored similarity function, agents associated with similar objects join the same flock, whereas agents associated with dissimilar objects do not join any flock. The outliers are the objects associated with isolated agents, or with agents grouped into flocks with fewer members than a given threshold. Experimental results on synthetic and real data sets confirm the validity of the approach.

18.
Determining topological relationships is one of the most important operations on fuzzy spatiotemporal data. Existing strategies impose strict restrictions on the structure and data types of fuzzy spatiotemporal data. They also fall short in handling fuzzy attribute extension and fuzzy time extension. To overcome these limitations, in this paper we first establish a fuzzy spatiotemporal data model based on XML. Then, we propose strategies for transforming two general fuzzy spatiotemporal data trees into one binary fuzzy spatiotemporal data tree. Next, we extend the region coding scheme to be compatible with fuzzy spatiotemporal data and propose an effective algorithm to match the desired twigs. Our approach adopts the XML twig pattern technique to determine topological relationships continuously, which reduces the unnecessary execution time of querying the desired nodes. More importantly, we use a pointer array to eliminate unnecessary execution time in twig matching. Finally, the experimental results demonstrate the performance advantages of our approach.
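The approach above extends region coding, which labels each node with (start, end, level) from a depth-first traversal. Structural relationships then reduce to number comparisons. Node a is an ancestor of d exactly when a.start < d.start and d.end < a.end. A parent additionally has a level exactly one less than d's. The sketch below shows only the classic crisp scheme. The fuzzy extension is not modeled, and the node labels are hypothetical.

```python
from itertools import count


def region_encode(tree):
    """tree = (label, [children]). Returns (label, start, end, level) tuples in
    document (pre-)order."""
    codes, ticks = [], count(1)

    def visit(node, level):
        label, children = node
        start = next(ticks)
        slot = len(codes)
        codes.append(None)  # reserve the slot; end is known only after the children
        for child in children:
            visit(child, level + 1)
        codes[slot] = (label, start, next(ticks), level)

    visit(tree, 0)
    return codes


def is_ancestor(a, d):
    return a[1] < d[1] and d[2] < a[2]


def is_parent(a, d):
    return is_ancestor(a, d) and d[3] == a[3] + 1
```

Because ancestor tests need only the stored numbers, a twig matcher can join node lists sorted by `start` without walking the tree again.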

19.
A spatiotemporal communication protocol for wireless sensor networks
In this paper, we present a spatiotemporal communication protocol for sensor networks, called SPEED. SPEED is specifically tailored to be a localized algorithm with minimal control overhead. End-to-end soft real-time communication is achieved by maintaining a desired delivery speed across the sensor network through a novel combination of feedback control and nondeterministic geographic forwarding. SPEED is a highly efficient and scalable protocol for sensor networks where the resources of each node are scarce. Theoretical analysis, simulation experiments, and a real implementation on Berkeley motes are provided to validate the claims.
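The sketch below simplifies SPEED-style forwarding:

1. Each neighbor's relay speed is its progress toward the destination divided by the estimated hop delay.
2. Only neighbors that meet the speed setpoint are eligible.
3. One eligible neighbor is picked at random.

Choosing with probability proportional to relay speed is an assumption here. The protocol's feedback control, which decides what happens when no neighbor qualifies, is not modeled; the function returns `None` in that case. The function names and the neighbor-table layout are hypothetical.

```python
import math
import random


def relay_speeds(self_pos, dest, neighbors):
    """neighbors: {node_id: (position, estimated_hop_delay_seconds)}.
    Relay speed = (own distance to dest - neighbor's distance to dest) / hop delay."""
    here = math.dist(self_pos, dest)
    return {n: (here - math.dist(pos, dest)) / delay
            for n, (pos, delay) in neighbors.items()}


def choose_next_hop(self_pos, dest, neighbors, setpoint, rng=random):
    speeds = relay_speeds(self_pos, dest, neighbors)
    candidates = {n: s for n, s in speeds.items() if s >= setpoint}
    if not candidates:
        return None  # real SPEED would consult its feedback loop here
    ids = list(candidates)
    # Nondeterministic choice, weighted toward faster relays (assumed weighting)
    return rng.choices(ids, weights=[candidates[n] for n in ids])[0]
```

Randomizing among qualifying neighbors, rather than always taking the fastest, spreads traffic over several paths and avoids overloading a single next hop.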

20.
Several statistical decision-making tools and methods are available to organize evidence, evaluate risks, and aid in decision making. Process capability indices are summary statistics that indicate process performance. In this paper, these indices are analyzed to obtain a new decision-making tool. The process accuracy index (Ca) measures the degree of process centering and gives alerts when the process mean departs from the target value. It focuses on the location of the process mean and the distance between the mean and the target value. We modify the traditional process accuracy index to obtain a new tool under fuzziness, in which the specification limits and the process mean can be defined as triangular or trapezoidal fuzzy numbers. The proposed tool is illustrated on a supplier selection problem.
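The traditional index is commonly written as

C_a = (μ - m)/d

where m = (USL + LSL)/2 is the specification midpoint and d = (USL - LSL)/2 is the half-width. Some texts take the absolute value. The sketch below computes the crisp index. It also computes a fuzzy C_a, assuming crisp specification limits and a triangular fuzzy mean (a, b, c). Because the map is linear with d > 0, this just transforms each vertex. Fuzzy specification limits, which the paper allows, would need α-cut arithmetic and are not covered. The function names are hypothetical.

```python
def ca(mu, lsl, usl):
    """Signed process accuracy index: 0 when the mean sits on the midpoint,
    +/-1 when it reaches a specification limit."""
    m = (usl + lsl) / 2
    d = (usl - lsl) / 2
    return (mu - m) / d


def fuzzy_ca(mu_tri, lsl, usl):
    # With crisp limits and d > 0 the map is increasing and linear, so a
    # triangular mean (a, b, c) maps vertex-by-vertex to a triangular Ca.
    return tuple(ca(v, lsl, usl) for v in mu_tri)
```

With limits 8 and 12, a crisp mean of 10.5 gives C_a = 0.25. A triangular mean of (9.5, 10.5, 11.0) gives the triangular C_a (-0.25, 0.25, 0.5).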

