
1.

Selective sampling for approximate clustering of very large data sets

Liang Wang James C. Bezdek Christopher Leckie Ramamohanarao Kotagiri 《International Journal of Intelligent Systems》2008,23(3):313-331

A key challenge in pattern recognition is how to scale the computational efficiency of clustering algorithms on large data sets. The extension of non‐Euclidean relational fuzzy c‐means (NERF) clustering to very large (VL = unloadable) relational data is called the extended NERF (eNERF) clustering algorithm, which comprises four phases: (i) finding distinguished features that monitor progressive sampling; (ii) progressively sampling from an N × N relational matrix R_N to obtain an n × n sample matrix R_n; (iii) clustering R_n with literal NERF; and (iv) extending the clusters in R_n to the remainder of the relational data. Previously published examples on several fairly small data sets suggest that eNERF is feasible for truly large data sets. However, it seems that phases (i) and (ii), i.e., finding R_n, are not very practical because the sample size n often turns out to be roughly 50% of N, and this over‐sampling defeats the whole purpose of eNERF. In this paper, we examine the performance of the sampling scheme of eNERF with respect to different parameters. We propose a modified sampling scheme for use with eNERF that combines simple random sampling with (parts of) the sampling procedures used by eNERF and a related algorithm sVAT (scalable visual assessment of clustering tendency). We demonstrate that our modified sampling scheme can eliminate over‐sampling of the original progressive sampling scheme, thus enabling the processing of truly VL data. Numerical experiments on a distance matrix of a set of 3,000,000 vectors drawn from a mixture of 5 bivariate normal distributions demonstrate the feasibility and effectiveness of the proposed sampling method. We also find that actually running eNERF on a data set of this size is very costly in terms of computation time. Thus, our results demonstrate that further modification of eNERF, especially the extension stage, will be needed before it is truly practical for VL data. © 2008 Wiley Periodicals, Inc.

2.

Extending fuzzy and probabilistic clustering to very large data sets

Richard J. Hathaway James C. Bezdek 《Computational statistics & data analysis》2006,51(1):215-234

Approximating clusters in very large (VL=unloadable) data sets has been considered from many angles. The proposed approach has three basic steps: (i) progressive sampling of the VL data, terminated when a sample passes a statistical goodness of fit test; (ii) clustering the sample with a literal (or exact) algorithm; and (iii) non-iterative extension of the literal clusters to the remainder of the data set. Extension accelerates clustering on all (loadable) data sets. More importantly, extension provides feasibility—a way to find (approximate) clusters—for data sets that are too large to be loaded into the primary memory of a single computer. A good generalized sampling and extension scheme should be effective for acceleration and feasibility using any extensible clustering algorithm. A general method for progressive sampling in VL sets of feature vectors is developed, and examples are given that show how to extend the literal fuzzy (c-means) and probabilistic (expectation-maximization) clustering algorithms onto VL data. The fuzzy extension is called the generalized extensible fast fuzzy c-means (geFFCM) algorithm and is illustrated using several experiments with mixtures of five-dimensional normal distributions.
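The non-iterative extension in step (iii) can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership formula with Euclidean distance and fuzzifier m = 2; the function name `extend_memberships` and the toy centers are hypothetical, and this is not claimed to be the exact geFFCM procedure.

```python
import math

def extend_memberships(point, centers, m=2.0):
    """Fuzzy memberships of one point to fixed cluster centers.

    Uses the standard FCM formula u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)).
    The centers are assumed to come from clustering a sample, so no
    iteration over the full data set is needed.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# Hypothetical centers found on a sample, and points left out of the sample.
centers = [(0.0, 0.0), (10.0, 0.0)]
remainder = [(1.0, 0.0), (5.0, 0.0), (9.0, 1.0)]
for p in remainder:
    print(p, [round(u, 3) for u in extend_memberships(p, centers)])
```

Each remaining point is processed once and independently, which is why an extension of this kind can be streamed over data that does not fit in memory.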

3.

A new formulation of the coVAT algorithm for visual assessment of clustering tendency in rectangular data

Timothy C. Havens James C. Bezdek 《International Journal of Intelligent Systems》2012,27(6):590-612

Since 1998, a graphical representation used in visual clustering called the reordered dissimilarity image or cluster heat map has appeared in more than 4000 biological or biomedical publications. These images are typically used to visually estimate the number of clusters in a data set, which is the most important input to most clustering algorithms, including the popularly chosen fuzzy c‐means and crisp k‐means. This paper presents a new formulation of a matrix reordering algorithm, coVAT, which is the only known method for providing visual clustering information on all four types of cluster structure in rectangular relational data. Finite rectangular relational data are an m × n array R of relational values between m row objects O_r and n column objects O_c. R presents four clustering problems: clusters in O_r, O_c, O_{r∪c}, and coclusters containing some objects from each of O_r and O_c. coVAT1 is a clustering tendency algorithm that provides visual estimates of the number of clusters to seek in each of these problems by displaying reordered dissimilarity images. We provide several examples where coVAT1 fails to do its job. These examples justify the introduction of coVAT2, a modification of coVAT1 based on a different reordering scheme. We offer several examples to illustrate that coVAT2 may detect coclusters in R when coVAT1 does not. Furthermore, coVAT2 is not limited to just relational data R. The R matrix can also take the form of feature data, such as gene microarray data where each data element is a real number: Positive values indicate upregulation, and negative values indicate downregulation. We show examples of coVAT2 on microarray data that indicate coVAT2 shows cluster tendency in these data. © 2012 Wiley Periodicals, Inc.

4.

Relational mountain (density) clustering method and web log analysis

Kuhu Pal Nikhil R. Pal James M. Keller James C. Bezdek 《International Journal of Intelligent Systems》2005,20(3):375-392

The mountain clustering method and the subtractive clustering method are useful methods for finding cluster centers based on local density in object data. These methods have been extended to shell clustering. In this article, we propose a relational mountain clustering method (RMCM), which produces a set of (proto)typical objects as well as a crisp partition of the objects generating the relation, using a new concept that we call relational density. We exemplify RMCM by clustering several relational data sets that come from object data. Finally, RMCM is applied to web log analysis, where it produces useful user profiles from web log data. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 375–392, 2005.

5.

Diagonal polynomials for small dimensions

J. S. Lew L. B. Morales A. Sánchez-Flores 《Theory of Computing Systems》1996,29(3):305-310

Here R and N denote respectively the real numbers and the nonnegative integers. Also 0 < n ∈ N, and s(x) = x₁ + ... + x_n when x = (x₁, ..., x_n) ∈ Rⁿ. A diagonal function of dimension n is a map f on Nⁿ (or any larger set) that takes Nⁿ bijectively onto N and, for all x, y in Nⁿ, has f(x) < f(y) whenever s(x) < s(y). We show that diagonal polynomials f of dimension n all have total degree n and have the same terms of that degree, so that the lower-degree terms characterize any such f. We call two polynomials equivalent if relabeling variables makes them identical. Then, up to equivalence, dimension two admits just one diagonal polynomial, and dimension three admits just two.
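As a concrete instance of the definition above, the classical Cantor pairing polynomial f(x, y) = (x + y)(x + y + 1)/2 + y is a diagonal polynomial of dimension two. The sketch below checks both defining properties on a finite triangle of lattice points; the helper name `is_diagonal_on_triangle` is hypothetical, and a finite check illustrates the definition without proving anything.

```python
from itertools import product

def cantor(x, y):
    """Cantor pairing polynomial (total degree 2, dimension 2)."""
    return (x + y) * (x + y + 1) // 2 + y

def is_diagonal_on_triangle(f, max_sum):
    """Check the diagonal-function properties for all points with x + y <= max_sum."""
    pts = [(x, y) for x, y in product(range(max_sum + 1), repeat=2)
           if x + y <= max_sum]
    values = sorted(f(x, y) for x, y in pts)
    # A diagonal function must map this triangle onto an initial segment of N.
    bijective = values == list(range(len(pts)))
    # Order property: a smaller coordinate sum forces a smaller value.
    ordered = all(f(*p) < f(*q) for p in pts for q in pts if sum(p) < sum(q))
    return bijective and ordered

print(is_diagonal_on_triangle(cantor, 20))
print(is_diagonal_on_triangle(lambda x, y: x + y, 5))  # not injective
```

Swapping x and y gives the other pairing polynomial, which is the same one up to the relabeling equivalence used in the abstract.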

6.

Two-Pass Rearrangeability in Faulty Benes Networks

Nabanita Das Jayasree Dattagupta 《Journal of Parallel and Distributed Computing》1996,35(2):191

Existing fault-tolerant routing schemes for Benes networks either consider only the control line stuck-at faults, or handle the switch faults by some graceful degradation routing schemes that reconfigure the network into a smaller system with minimal loss. Now, even in the presence of a single switch fault in an N × N Benes network B(n) (n = log₂ N), no N × N permutation can be realized in a single pass. In this paper, we attempt to characterize the switch fault sets in B(n) in the presence of which the network is always capable of realizing any arbitrary N × N permutation P in two passes, such that any source–destination path is set up in a single pass and no recirculation is needed, but the whole set of N source–destination paths of P is partitioned into two subsets that are realized in two successive passes. We propose an algorithm that detects whether the switch fault set present in a B(n) belongs to this class; if it does, we present another algorithm that computes the fault-tolerant routing to realize any arbitrary permutation P in two passes. This scheme enables us to make B(n) fault-tolerant in the presence of a restricted class of multiple switch faults, without any recirculation through intermediate nodes, or any reconfiguration of the system.

7.

Diagonal polynomials and diagonal orders on multidimensional lattices

L. B. Morales 《Theory of Computing Systems》1997,30(4):367-382

Here R and N denote the real numbers and the nonnegative integers, respectively. Also s(x) = x₁ + ··· + x_n when x = (x₁, …, x_n) in Rⁿ. A map f: Rⁿ → R is called a diagonal function of dimension n if f|Nⁿ is a bijection onto N and, for all x, y in Nⁿ, f(x) < f(y) when s(x) < s(y). Morales and Lew [6] constructed 2ⁿ⁻² inequivalent diagonal polynomial functions of dimension n for each n > 1. Here we use new combinatorial ideas to show that the number d_n of such functions is much greater than 2ⁿ⁻² for n > 3. These combinatorial ideas also give an inductive procedure to construct d_n+1 diagonal orderings of {1, …, n}.

8.

Generalized fuzzy c-means clustering strategies using L_p norm distances

Hathaway R.J. Bezdek J.C. Yingkang Hu 《Fuzzy Systems, IEEE Transactions on》2000,8(5):576-582

Fuzzy c-means (FCM) is a useful clustering technique. Modifications of FCM using L₁ norm distances increase robustness to outliers. Object and relational data versions of FCM clustering are defined for the more general case where the L_p norm (p⩾1) or semi-norm (00 in order to facilitate the empirical examination of the object data models. Both object and relational approaches are included in a numerical study

9.

Color indexing for efficient image retrieval

G. Phanendra Babu Babu M. Mehtre Mohan S. Kankanhalli 《Multimedia Tools and Applications》1995,1(4):327-348

Content based image retrieval is an active area of research. Many approaches have been proposed to retrieve images based on matching of some features derived from the image content. Color is an important feature of image content. The problem with many traditional matching-based retrieval methods is that the search time for retrieving similar images for a given query image increases linearly with the size of the image database. We present an efficient color indexing scheme for similarity-based retrieval which has a search time that increases logarithmically with the database size. In our approach, the color features are extracted automatically using a color clustering algorithm. Then the cluster centroids are used as representatives of the images in 3-dimensional color space and are indexed using a spatial indexing method that uses an R-tree. The worst case search time complexity of this approach is O(n_q log(N · n_avg)), where N is the number of images in the database, and n_q and n_avg are the number of colors in the query image and the average number of colors per image in the database, respectively. We present the experimental results for the proposed approach on two databases consisting of 337 Trademark images and 200 Flag images.

10.

Cost analysis of the longest-side (triangle bisection) refinement algorithm for triangulations

M.-C. Rivara M. Vemere 《Engineering with Computers》1996,12(3-4):224-234

The triangulation refinement problem, as formulated in the adaptive finite element setting (also useful in the rendering of complex scenes), is discussed. This can be formulated as follows: given a valid, non-degenerate triangulation of a polygonal region, construct a locally refined triangulation, with triangles of prescribed size in a refinement region R, and such that the smallest (or the largest) angle is bounded. To cope with this problem, longest-side refinement algorithms guarantee the construction of good quality irregular triangulations. This is due in part to their natural refinement propagation strategy farther than the (refinement) area of interest R. In this paper we prove that, asymptotically, the number N of points inserted in R to obtain triangles of prescribed size is optimal. Furthermore, in spite of the unavoidable propagation outside the refinement region R, the time cost of the algorithm is linear in N, independent of the size of the triangulation. Specifically, the number of points inserted outside R is of order O(n log₂ n), where N = O(n²). We prove the latter result for circular and rectangular refinement regions, which allows us to conclude that this is true for general convex refinement regions. We also include empirical evidence, both in two and three dimensions, which is in complete agreement with the theory, even for small values of N.

11.

On contra‐symmetry and MPT conditionality in fuzzy logic

E. Trillas C. Alsina E. Renedo A. Pradera 《International Journal of Intelligent Systems》2005,20(3):313-326

This article deals with the N‐contrapositive symmetry of fuzzy implication operators J verifying either Modus Ponens or Modus Tollens inequalities, in a similar and complementary framework to the one in which Fodor (“Contrapositive symmetry of fuzzy implications.” Fuzzy Set Syst 1995;69:141–156) first addressed the subject in fuzzy logic, that is, with the verification of J(a, b) = J(N(b), N(a)) for all a, b in [0,1] and some strong‐negation function N. This property corresponds to the classical p → q = ¬q → ¬p. The aim of this article is to study that property in relation to either Modus Ponens or Modus Tollens meta‐rules of inference when the functions J are taken among those that belong to the usual families of implications in fuzzy logic. That is, we study the contrapositive symmetry of S implications, R implications, Q implications, and Mamdani–Larsen operators that verify the Modus Ponens inequality, the Modus Tollens inequality, or both; this conditionality aspect is where the work complements Fodor's. Within this study new types of implication functions are introduced and analyzed. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 313–326, 2005.
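The property J(a, b) = J(N(b), N(a)) can be checked numerically on a grid. The sketch below is only an illustration with two textbook operators that the abstract does not single out: the Łukasiewicz implication, which has the property for N(x) = 1 - x, and the Gödel implication, which does not. Exact rational arithmetic avoids floating-point noise; the function names are hypothetical, and a grid check is evidence, not a proof.

```python
from fractions import Fraction

def neg(x):
    """Standard strong negation N(x) = 1 - x."""
    return 1 - x

def lukasiewicz(a, b):
    """Lukasiewicz implication J(a, b) = min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)

def goedel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return Fraction(1) if a <= b else b

def has_contrapositive_symmetry(J, N, grid):
    """True if J(a, b) == J(N(b), N(a)) for every pair on the grid."""
    return all(J(a, b) == J(N(b), N(a)) for a in grid for b in grid)

grid = [Fraction(i, 10) for i in range(11)]
print(has_contrapositive_symmetry(lukasiewicz, neg, grid))
print(has_contrapositive_symmetry(goedel, neg, grid))
```

For the Gödel operator a single counterexample suffices: J(0.5, 0.2) = 0.2, while J(N(0.2), N(0.5)) = J(0.8, 0.5) = 0.5.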

12.

An improved FCM algorithm with adaptive weights based on SA-PSO

Wu Ziheng Wu Zhongcheng Zhang Jun 《Neural computing & applications》2017,28(10):3113-3118

The fuzzy c-means (FCM) clustering algorithm, often used in pattern recognition, is an important method that has been successfully used in many practical applications. The FCM algorithm assumes that every data point is equally significant, which is inappropriate when the importance of each data point should be adjusted adaptively. In this paper, considering the different importance of each data point, a new clustering algorithm based on FCM is proposed, in which an adaptive weight vector W and an adaptive exponent p are introduced and the optimal values of the fuzziness parameter m and adaptive exponent p are determined by SA-PSO when the objective function reaches its minimum value. In this method, the particle swarm optimization (PSO) is integrated with simulated annealing (SA), which can improve the global search ability of PSO. Experimental results have demonstrated that the proposed algorithm can avoid local optima and significantly improve the clustering performance.


13.

Near‐noon albedo values of alfalfa and tall fescue grass derived from multispectral data

J. O. Payero C. M. U. Neale J. L. Wright 《International journal of remote sensing》2013,34(3):569-586

A remote sensing approach was applied to estimate near‐noon values of shortwave albedo (α), the fraction of solar radiation reflected by a surface, for alfalfa and tall fescue grass at Kimberly, Idaho. The approach was based on the (P/T) ratio, which is the ratio of the partial radiation (P) sensed by a multi‐band radiometer and the total incident radiation (T) in a given wavelength range. It was found that instead of being constant, as previously suggested, the upward component of the (P/T) ratio under clear‐sky conditions [(P/T)_u] followed a logistic growth function of solar altitude angle (Λ_z) for both crops (r² = 0.84). The downward component [(P/T)_d], on the other hand, linearly increased with Λ_z (r² = 0.83). By applying the (P/T) ratio methodology, using variable ratios, it was found that the diurnal pattern of clear‐sky α for both crops followed a decreasing function of Λ_z (r² = 0.80). Near‐noon α values for alfalfa estimated using remote sensing were linearly related to plant canopy height (h) (r² = 0.92), but not to Λ_z. For grass, on the other hand, the near‐noon α values obtained by remote sensing were not correlated with either h or Λ_z. The near‐noon α values for alfalfa obtained with remote sensing deviated considerably from those estimated using an empirical function of day of the year (DOY). For alfalfa, the near‐noon net radiation (R_n) values calculated using α values derived by remote sensing were better correlated to measured R_n values than those obtained using α estimated as a function of DOY. For grass, the α values derived from remote sensing did not significantly improve the accuracy of the calculated near‐noon R_n compared with using α values estimated as a function of Λ_z.

14.

Density-Weighted Fuzzy c-Means Clustering

《Fuzzy Systems, IEEE Transactions on》2009,17(1):243-252

In this short paper, a unified framework for performing density-weighted fuzzy c-means (FCM) clustering of feature and relational datasets is presented. The proposed approach consists of reducing the original dataset to a smaller one, assigning each selected datum a weight reflecting the number of nearby data, clustering the weighted reduced dataset using a weighted version of the feature or relational data FCM algorithm, and if desired, extending the reduced data results back to the original dataset. Several methods are given for each of the tasks of data subset selection, weight assignment, and extension of the weighted clustering results. The newly proposed weighted version of the non-Euclidean relational FCM algorithm is proved to produce results identical to those of its feature data analog for a certain type of relational data. Artificial and real data examples are used to demonstrate and contrast various instances of this general approach.
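The weighting idea can be sketched as follows. The abstract says several weight-assignment methods are given without naming them, so the rule used here (count the data points whose nearest selected datum is the one in question) is an assumption chosen for illustration, as are the function names and the weighted center update of the form sum(w·u^m·x) / sum(w·u^m).

```python
import math

def density_weights(data, selected):
    """Weight each selected datum by how many data points are nearest to it."""
    weights = [0] * len(selected)
    for x in data:
        nearest = min(range(len(selected)),
                      key=lambda i: math.dist(x, selected[i]))
        weights[nearest] += 1
    return weights

def weighted_center(points, weights, memberships, m=2.0):
    """Weighted FCM-style center: sum(w * u^m * x) / sum(w * u^m)."""
    coeffs = [w * (u ** m) for w, u in zip(weights, memberships)]
    total = sum(coeffs)
    dim = len(points[0])
    return tuple(sum(c * p[d] for c, p in zip(coeffs, points)) / total
                 for d in range(dim))

# Toy data: three points near the origin, one far away.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
selected = [(0.0, 0.0), (10.0, 10.0)]
w = density_weights(data, selected)
print(w)
print(weighted_center(selected, w, [1.0, 1.0]))
```

With full membership for both selected points, the center is pulled toward the denser region in proportion to the weights, which is the effect the reduced dataset is meant to preserve.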

15.

Clustering methods for geometric objects and applications to design problems

F. Dehne H. Noltemeier 《The Visual computer》1986,2(1):31-38

Clustering of geometric objects is a very familiar and important problem in many different areas of applications as well as in the theoretical foundation of some modern fields of computer science. This paper describes how design problems, especially the design of an assembly line, can be transformed into a clustering problem. In order to solve the problem for large sizes of input data we introduce a structure, called Voronoi Tree, which, applied to our real world data (assembly line design), not only dramatically reduced the time to get a feasible design of an assembly line, but also increased the value of the design by more than 30% (in comparison with standard design methods). In addition to this we introduce a clustering method which is of interest for those applications which can be transformed to planar clustering problems. In this particular case it is possible to compute an (hierarchically) optimized clustering with respect to a large class of clustering measures in time O(n·n^(1/2)·log³ n + U_F(n)·n·n^(1/2) + P_F(n)) [n: number of points; U_F(n), P_F(n) dependent on the chosen clustering measure].

16.

Comparisons of algorithms to estimate water turbidity in the coastal areas of China

Lufei Zheng Yan Zhou Deyong Sun Shengqiang Wang Wei Wu 《International journal of remote sensing》2016,37(24):6165-6186

Turbidity is an important indicator of water environments and water-quality conditions. Ocean colour remote sensing has proved to be an efficient way of monitoring water turbidity because of its wide synoptic coverage and repeated regular sampling. However, operational tasks are still challenging in high-turbidity waters, especially in estuaries and the coastal regions of China. In these areas, the existing algorithms derived from remote-sensing reflectance (R_rs) are usually invalid because it is difficult to correctly estimate the reflectance R_rs from satellite data such as Moderate Resolution Imaging Spectroradiometer (MODIS) data. A new algorithm that uses Rayleigh-corrected reflectance (R_rc) instead of R_rs has been recently introduced and was used to estimate water turbidity in Zhejiang (ZJ) coastal areas from Geostationary Ocean Color Imager (GOCI) data. The R_rc algorithm has previously shown a capability to estimate water turbidity. However, its performance still requires careful evaluation. In this article, we compared the new R_rc algorithm with two other existing algorithms. Differences among the three algorithms were assessed by comparing the results from using R_rc data and R_rs reflectance data derived from both GOCI and MODIS imagery data. The capability of the new R_rc algorithm to estimate water turbidity in larger areas and extended seasons in the coastal seas of China was also estimated. The results showed that the new R_rc algorithm is suitable for the coastal waters of China, especially for highly turbid waters.

17.

n^k‐bags

Kankana Chakrabarty Ioan Despi 《International Journal of Intelligent Systems》2007,22(2):223-236

We generalize the concept of bags by introducing the notion of n^k‐bags with membership functions ranging in P(N), the power set of positive integers. Consequently, a number of operations on n^k‐bags are defined and some characterizations are done. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 223–236, 2007.

18.

Regression analysis between environmental factors and earthquake-damaged trace: a case study in Wenchuan County, China

Yong Luo Zhi-Jun Zheng Tao She 《International journal of remote sensing》2013,34(22):7088-7098

This article discusses the integration of remote sensing (RS), geographic information system (GIS), GPS and ground survey technologies to establish the accuracy of supervised classification. N_e, the proportion of earthquake-damaged trace in a specific region, as a relative value expresses the influence of the characteristics of an area on the distribution of earthquake-damaged trace; it reflects the earthquake-damaged trace's distribution characteristics better. Four factors – distance to central fault, slope, elevation and distance to river – were selected to quantify the association between environmental factors and N_e by regression analysis. The results show the following. (1) The association between N_e and distance to river (R² = 0.99) and between N_e and distance to central fault (R² = 1.00) is better modelled by an inverse second-order line. N_e is largest within 0.5 km of the river and within 5 km of the central fault. When the distance is more than 2.0 km from the river and more than 30 km from the central fault, N_e dramatically decreases. (2) The association between N_e and elevation (R² = 0.94) is better modelled by the exponential decay line. With an increasing elevation, N_e decreases with the weakening of human activities. (3) The association between N_e and slope (R² = 0.93) is better modelled by exponential growth. With an increasing slope, N_e first decreases and then increases. (4) A possible mechanism is that the earthquake energy first acts on the area near the central fault. The erosion of the river leads to a steep slope; the unstable geologic structure is easily destroyed by the effect of seismic waves. Consequently, earthquake-damaged trace and N_e are greater in regions near the central fault and river bank than in other regions.

19.

Why clustering in function approximation? Theoretical explanation

Vladik Kreinovich Yeung Yam 《International Journal of Intelligent Systems》2000,15(10):959-966

Function approximation is a very important practical problem: in many practical applications, we know the exact form of the functional dependence y=f(x₁,…,x_n) between physical quantities, but this exact dependence is complicated, so we need a lot of computer space to store it, and a lot of time to process it, i.e., to predict y from the given x_i. It is therefore necessary to find a simpler approximate expression g(x₁,…,x_n)≈f(x₁,…,x_n) for this same dependence. This problem has been analyzed in numerical mathematics for several centuries, and it is, therefore, one of the most thoroughly analyzed problems of applied mathematics. There are many results related to approximation by polynomials, trigonometric polynomials, splines of different type, etc. Since this problem has been analyzed for so long, no wonder that for many reasonable formulations of the optimality criteria, the corresponding problems of finding the optimal approximations have already been solved. Lately, however, new clustering‐related techniques have been applied to solve this problem (by Yager, Filev, Chu, and others). At first glance, since for most traditional optimality criteria, optimal approximations are already known, the clustering approach can only lead to non‐optimal approximations, i.e., approximations of inferior quality. We show, however, that there exist new reasonable criteria with respect to which clustering‐based function approximation is indeed the optimal method of function approximation. © 2000 John Wiley & Sons, Inc.

20.

Retrieving seawater turbidity from Landsat TM data by regressions and an artificial neural network

T. Y. Gan O. A. Kalinga K. Ohgushi H. Araki 《International journal of remote sensing》2013,34(21):4593-4615

The radiance reflected at the sea surface (R_W(λ)) of the Ariake Sea, Japan, was first estimated by subtracting Lowtran 7 estimated Rayleigh and aerosol scattered radiances from Landsat Thematic Mapper measured radiance. Then R_W(λ) was averaged from 4×4 pixel windows centred at 33 sampling sites of the Ariake Sea and the data calibrated against the observed Secchi disk depth (SDD) using linear (LR) and nonlinear (NLR) regressions, and an artificial neural network (ANN) algorithm called the Modified Counter Propagation Network (MCPN). We found that at the validation stage, multi-date R_W(λ) data that are mainly based on the visible channels of Landsat Thematic Mapper (TM) predict more accurate and dependable SDDs than single-date R_W(λ) data. Furthermore, the NLR describes the SDD/R_W(λ) relationship more closely than the LR. As an ANN, MCPN possesses non-linearity, inter-connectivity, and an ability to learn and generalize information from complex or poorly understood systems, which enables it to even better represent the SDD/R_W(λ) relationship than the NLR. Our study confirms the feasibility of retrieving SDD (or turbidity) from Landsat TM data, and it seems that the calibrated MCPN and possibly NLR are portable temporally within the Ariake Sea. Lastly, the coefficient of efficiency E_f is a more stringent and probably a more accurate statistical measure than the popular coefficient of determination R².
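The closing remark about E_f and R² can be made concrete with a small sketch. The abstract gives no formulas, so two assumptions are made here: E_f is taken to be the Nash–Sutcliffe form 1 - SSE/SST, and R² is taken to be the squared Pearson correlation between observed and predicted values. The numbers are invented for illustration and are not from the study.

```python
def coefficient_of_efficiency(obs, pred):
    """Assumed Nash-Sutcliffe form: 1 - sum((o - p)^2) / sum((o - mean_o)^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)

# Predictions that track the observations perfectly but carry a constant bias.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed, predicted))
print(coefficient_of_efficiency(observed, predicted))
```

Under these assumptions a constant bias leaves R² at 1 while E_f drops to about 0.2, which is one sense in which E_f is the more stringent measure.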