
1.

Automatic clustering using an improved artificial bee colony optimization for customer segmentation

R. J. Kuo Ferani E. Zulvia 《Knowledge and Information Systems》2018,57(2):331-357

In cluster analysis, determining the number of clusters is an important issue because information about the most appropriate number of clusters does not exist in real-world problems. Automatic clustering is a clustering approach that automatically finds the most suitable number of clusters and divides the instances into the corresponding clusters. This study proposes a novel automatic clustering algorithm using a hybrid of an improved artificial bee colony optimization algorithm and the K-means algorithm (iABC). The proposed iABC algorithm improves the onlooker bee exploration scheme by directing the bees' movements to a better location. Instead of using a random neighborhood location, the improved onlooker bee considers the data centroid to find a better initial centroid for the K-means algorithm. To increase the efficiency of the improvement, the updating process is applied only to the worst cluster centroid. The proposed iABC algorithm is verified using several benchmark datasets. The computational results indicate that the proposed iABC algorithm outperforms the original ABC algorithm on the automatic clustering problem. Furthermore, the proposed iABC algorithm is used to solve the customer segmentation problem. The results reveal that the iABC algorithm gives better and more stable results than the original ABC algorithm.

2.

A closed asynchronous dynamic model of cellular learning automata and its application to peer-to-peer networks

Ali Mohammad Saghiri Mohammad Reza Meybodi 《Genetic Programming and Evolvable Machines》2017,18(3):313-349

Cellular Learning Automata (CLAs) are hybrid models obtained by combining Cellular Automata (CAs) and Learning Automata (LAs). These models can be either open or closed. In closed CLAs, the states of the neighboring cells of each cell, called its local environment, affect the action selection process of the LA of that cell, whereas in open CLAs each cell has, in addition to its local environment, an exclusive environment that is observed by the cell only and a global environment that can be observed by all the cells in the CLA. In dynamic models of CLAs, one of their aspects, such as structure, local rule, or neighborhood radius, may change during the evolution of the CLA. CLAs can also be classified as synchronous or asynchronous. In a synchronous CLA, all LAs in different cells are activated synchronously, whereas in an asynchronous CLA, the LAs in different cells are activated asynchronously. In this paper, a new closed asynchronous dynamic model of CLA is introduced in which the structure and the number of LAs in each cell may vary with time. To show the potential of the proposed model, a landmark clustering algorithm for solving the topology mismatch problem in unstructured peer-to-peer networks is proposed. To evaluate the proposed algorithm, computer simulations were conducted and the results were compared with those obtained for two existing algorithms for the topology mismatch problem. The proposed algorithm is shown to be superior to the existing algorithms with respect to communication delay and average round-trip time between peers within clusters.

3.

Efficient l_q norm based sparse subspace clustering via smooth IRLS and ADMM

Shenfen Kuang HongYang Chao Jun Yang 《Multimedia Tools and Applications》2017,76(22):23163-23185

Recently, sparse subspace clustering, as a subspace learning technique, has been successfully applied to several computer vision applications, e.g. face clustering and motion segmentation. The main idea of sparse subspace clustering is to learn an effective sparse representation that is used to construct an affinity matrix for spectral clustering. While most existing sparse subspace clustering algorithms and their extensions rely on convex relaxation, the non-convex and non-smooth l_q (0 < q < 1) norm has demonstrated better recovery performance. In this paper we propose an l_q norm based Sparse Subspace Clustering method (lqSSC), motivated by recent work showing that the l_q norm can enhance sparsity and approximates l₀ better than l₁ does. However, optimizing the l_q norm under multiple constraints is difficult. To solve this non-convex problem, we use the Alternating Direction Method of Multipliers (ADMM) for the l_q norm optimization, updating the variables in an alternating minimization way. ADMM splits the unconstrained optimization into multiple terms, such that the l_q norm term can be solved via Smooth Iterative Reweighted Least Squares (SIRLS), which is guaranteed to converge. Unlike traditional IRLS algorithms, the proposed algorithm is based on gradient descent with adaptive weights, making it well suited to the general sparse subspace clustering problem. Experiments on computer vision tasks (synthetic data, face clustering and motion segmentation) demonstrate that the proposed approach considerably improves clustering accuracy over convex subspace clustering methods.

4.

A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

Farhan Hassan Khan Usman Qamar Saba Bashir 《Knowledge and Information Systems》2017,50(3):851-881

The emergence of the MapReduce (MR) framework for scaling data mining and machine learning algorithms provides for Volume, while the handling of Variety and Velocity needs to be skilfully crafted into algorithms. So far, scalable clustering algorithms have focused solely on Volume, taking advantage of the MR framework. In this paper we present a MapReduce algorithm, data aware scalable clustering (DASC), which is capable of handling the 3 Vs of big data by virtue of being (i) single scan and distributed to handle Volume, (ii) incremental to cope with Velocity and (iii) versatile in handling numeric and categorical data to accommodate Variety. The DASC algorithm incrementally processes an infinitely growing data set stored on a distributed file system and delivers a quality clustering scheme while ensuring recency of patterns. The algorithm preserves an up-to-date synopsis of the data seen so far. Each new data increment is processed and merged with the synopsis. Since the synopsis itself may grow very large, the algorithm stores it as a file. This makes the DASC algorithm truly scalable. Exclusive clusters are obtained on demand by applying a connected component analysis (CCA) algorithm over the synopsis. CCA presents a subtle roadblock to effective parallelism during clustering. This problem is overcome by accomplishing the task in two stages. In the first stage, hyperclusters are identified based on prevailing data characteristics. The second stage uses this knowledge to determine the degree of parallelism, thereby making DASC data aware. Hyperclusters are distributed over the available compute nodes for discovering embedded clusters in parallel. The staged approach to clustering yields the dual advantage of improved parallelism and the desired complexity in the \(\mathcal {MRC}^0\) class. The DASC algorithm is empirically compared with the incremental Kmeans and Scalable Kmeans++ algorithms. Experimentation on real-world and synthetic data with approximately 1.2 billion data points demonstrates the effectiveness of the DASC algorithm. Empirical observations of DASC execution are in consonance with the theoretical analysis with respect to stability in resource utilization and execution time.

5.

Software cost estimation based on modified K-Modes clustering algorithm

Partha Sarathi Bishnu Vandana Bhattacherjee 《Natural computing》2016,15(3):415-422

Unsupervised techniques such as clustering may be used for software cost estimation in situations where parametric models are difficult to develop. This paper presents a software cost estimation model based on a modified K-Modes clustering algorithm. The aims of this paper are twofold: first, to present the modified K-Modes clustering, which enhances the simple K-Modes algorithm by using a proper dissimilarity measure for mixed data types; and second, to apply the proposed K-Modes algorithm to software cost estimation. We compared our modified K-Modes algorithm with existing algorithms on different software cost estimation datasets, and the results showed the effectiveness of the proposed algorithm.

6.

Mining coterie patterns from Instagram photo trajectories for recommending popular travel routes

Yaxin Yu Yuhai Zhao Ge Yu Guoren Wang 《Frontiers of Computer Science》2017,11(6):1007-1022

Instagram is a popular photo-sharing social application. It is widely used by tourists to record journey information such as location, time and interest. Consequently, a huge volume of geo-tagged photos with spatio-temporal information is generated along tourists' travel trajectories. Such Instagram photo trajectories consist of travel paths, travel density distributions, and traveller behaviors, preferences, and mobility patterns. Mining Instagram photo trajectories is thus very useful for many mobile and location-based social applications, including tour guide and recommender systems. However, we have not found any work that extracts interesting group-like travel trajectories from Instagram photos asynchronously taken by different tourists. Motivated by this, we propose a novel concept, the coterie, which reveals representative travel trajectory patterns hidden in Instagram photos taken by users at shared locations and paths. Our work includes (1) the discovery of coteries, (2) the discovery of closed coteries, and (3) the recommendation of popular travel routes based on closed coteries. For this, we first build a statistically reliable trajectory database from Instagram geo-tagged photos. These trajectories are then clustered by the DBSCAN method to find tourist density. Next, we transform each raw spatio-temporal trajectory into a sequence of clusters. All discriminative closed coteries are further identified by a Cluster-Growth algorithm. Finally, distance-aware and conformity-aware recommendation strategies are applied to closed coteries to recommend popular tour routes. Visualized demos and extensive experimental results demonstrate the effectiveness and efficiency of our methods.

7.

Outlier-eliminated k-means clustering algorithm based on differential privacy preservation

Qingying Yu Yonglong Luo Chuanming Chen Xintao Ding 《Applied Intelligence》2016,45(4):1179-1191

Individual privacy may be compromised during the process of mining for valuable information, and the potential for data mining is hindered by the need to preserve privacy. It is well known that k-means clustering algorithms based on differential privacy require preserving privacy while maintaining the availability of clustering. However, it is difficult to balance both aspects in traditional algorithms. In this paper, an outlier-eliminated differential privacy (OEDP) k-means algorithm is proposed that both preserves privacy and improves clustering efficiency. The proposed approach selects the initial centre points in accordance with the distribution density of data points, and adds Laplacian noise to the original data for privacy preservation. Both a theoretical analysis and comparative experiments were conducted. The theoretical analysis shows that the proposed algorithm satisfies ε-differential privacy. Furthermore, the experimental results show that, compared to other methods, the proposed algorithm effectively preserves data privacy and improves the clustering results in terms of accuracy, stability, and availability.

8.

Novel steganographic method based on generalized K-distance N-dimensional pixel matching

Bingwen Feng Wei Lu Wei Sun 《Multimedia Tools and Applications》2015,74(21):9623-9646

In this paper, a steganographic scheme adopting the concept of generalized K_d-distance N-dimensional pixel matching is proposed. The generalized pixel matching embeds a B-ary digit (B is a function of K and N) into a cover vector of length N, where the embedding distortion, measured by the order-d Minkowski distance, is no larger than K. In contrast to other pixel matching-based schemes, an N-dimensional reference table is used. By choosing d, K, and N adaptively, an embedding strategy suitable for arbitrary relative capacity can be developed. Additionally, an optimization algorithm, namely the successive iteration algorithm (SIA), is proposed to optimize the codeword assignment in the reference table. Thanks to the high-dimensional embedding and the optimization algorithm, nearly maximal embedding efficiency is achieved. Compared with other content-free steganographic schemes, the proposed scheme provides better image quality and statistical security. Moreover, the proposed scheme performs comparably to state-of-the-art content-based approaches when combined with image models.
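
The distortion constraint in this abstract can be illustrated with a minimal sketch. It only computes the order-d Minkowski distance between a cover vector and a modified vector and checks it against K; the function names are hypothetical, and the reference table and the SIA optimization described in the abstract are not reproduced here.

```python
def minkowski_distance(x, y, d):
    """Order-d Minkowski distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** d for a, b in zip(x, y)) ** (1.0 / d)


def within_distortion(cover, stego, d, K):
    """Check that the embedding distortion does not exceed K.

    `cover` and `stego` are hypothetical names for the pixel vector
    before and after embedding; a small tolerance absorbs rounding.
    """
    return minkowski_distance(cover, stego, d) <= K + 1e-9


cover = [10, 20, 30]
stego = [11, 19, 30]  # two pixels changed by one level each
print(minkowski_distance(cover, stego, 1))
print(within_distortion(cover, stego, d=1, K=2))
```

With d = 1 the distance is the total number of grey-level steps changed, so the example vector pair is admissible for K = 2 but not for K = 1.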

9.

A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets

Ankita Sinha Prasanta K. Jana 《The Journal of supercomputing》2018,74(4):1562-1579

Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and the solution space is large. A genetic algorithm deals effectively with a larger solution space and provides a better solution. In this paper, we propose a novel clustering algorithm for distributed datasets, using a combination of a genetic algorithm (GA) with Mahalanobis distance and the k-means clustering algorithm. The proposed algorithm has two phases; in phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as the fitness value in GA; it considers covariance between the data points and thus provides a better representation of the initial data. K-means with K-means++ initialization is applied in phase 2 on the intermediate output to get the final result. The proposed algorithm is implemented on the Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted on multiple real-life and synthetic datasets to measure the performance of the proposed algorithm. Results were compared with the MapReduce-based algorithms mrk-means, parallel k-means and scaling GA.

10.

A new cognitive filtering approach based on Freeman K3 Neural Networks

João Luís Garcia Rosa Denis R. M. Piazentin 《Applied Intelligence》2016,45(2):363-382

The huge volume of data across several domains demands the development of new, more efficient tools for search, analysis, and interpretation. Clustering approaches represent an important step in exploring the internal structure and relationships in datasets. In this study, the cognitively motivated neural network Freeman K₃-set was applied as a filter to preprocess the data, achieving better clustering performance. We combined K₃ with a variety of commonly used clustering algorithms and tested its performance using standard UCI datasets and also datasets from social networks. A comprehensive evaluation using a number of cluster validation measures shows significant improvement in the overall performance of the K₃-based clustering method for social data sets, for two types of clustering validation measures. Additionally, K₃ filtering results in a transparent representation of data, which leads to improved efficiency of the data processing algorithms used.

11.

Ultrasonic elastography optimization algorithm based on coded excitation and spatial compounding

Zhihong Zhang Limiao Li Huafu Liu 《Automatic Control and Computer Sciences》2017,51(2):133-140

Traditional elasticity imaging systems use short pulses with low sound power, causing the signal to be attenuated severely in deep zones. On the basis of the coded excitation and spatial composition theorems, an ultrasonic elastography optimization algorithm is proposed in this paper. It exploits the advantages of coded excitation and spatial compounding, such as high peak power and average sound power, suppresses speckle noise, and improves the imaging quality effectively. Specifically, a coded excitation system encodes the long pulses when transmitting, and then decodes the long pulses into short pulses upon receiving. This increases the average sound power of the beam without sacrificing the spatial resolution. An imaging system based on coded excitation can therefore achieve a good signal-to-noise ratio (SNR_e) and contrast-to-noise ratio (CNR_e) in deep zones below the detection surface. The proposed algorithm combines coded excitation with a filter-group based spatial compounding algorithm at the receiving terminal. Finally, experimental results show that the proposed algorithm yields a higher SNR_e and CNR_e than using chirp coded excitation or spatial compounding alone.

12.

α-Systems of differential inclusions and their unification

V. N. Ushakov S. A. Brykalov G. V. Parshikov 《Automation and Remote Control》2016,77(8):1480-1499

This paper introduces α-systems of differential inclusions on a bounded time interval [t₀, ϑ] and defines α-weakly invariant sets in [t₀, ϑ] × ℝⁿ, where ℝⁿ is a phase space of the differential inclusions. We study the problems connected with bringing the motions (trajectories) of the differential inclusions from an α-system to a given compact set M ⊂ ℝⁿ at the moment ϑ (the approach problems). The issues of extracting the solvability set W ⊂ [t₀, ϑ] × ℝⁿ in the problem of bringing the motions of an α-system to M and the issues of calculating the maximal α-weakly invariant set W^c ⊂ [t₀, ϑ] × ℝⁿ are also discussed. The notion of the quasi-Hamiltonian of an α-system (α-Hamiltonian) is proposed, which seems important for the problems of bringing the motions of the α-system to M.

13.

Minimizing the maximal weighted lateness of delivering orders between two railroad stations

D. I. Arkhipov A. A. Lazarev 《Automation and Remote Control》

We consider the planning problem for freight transportation between two railroad stations. We are required to fulfill orders (transport cars by trains) that arrive at arbitrary moments in time and have different values (weights). The speeds of trains moving between the stations may differ. We consider problem settings with both fixed and undefined departure times for the trains. For the problem with fixed train departure times we propose an algorithm for minimizing the weighted lateness of orders with time complexity of O(qn² log n) operations, where q is the number of trains and n is the number of orders. For the problem with undefined train departure and arrival times we construct a Pareto-optimal set of schedules with respect to the criteria wL_max and C_max in O(n² max{n log n, q log v}) operations, where v is the number of time windows during which the trains can depart. The proposed algorithm makes it possible to minimize both the weighted lateness wL_max and the total time C_max of fulfilling freight delivery orders.

14.

Minimization of the maximal lateness for a single machine

A. A. Lazarev D. I. Arkhipov 《Automation and Remote Control》2016,77(4):656-671

This paper considers the classical NP-hard problem 1|r_j|L_max of scheduling theory. An algorithm is presented that determines the optimal schedule for processing n jobs whose parameters satisfy a system of linear constraints. The polynomially solvable area of the problem 1|r_j|L_max is expanded. An algorithm is described that constructs a Pareto-optimal set of schedules by the criteria L_max and C_max in O(n³ log n) operations.
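
To make the two criteria concrete, the sketch below evaluates L_max and C_max for a given job order on a single machine with release times. It is only an illustration of the objective under the usual definitions (lateness = completion time minus due date); the job data, the function names, and the exponential brute-force search are assumptions for the example and are not the polynomial algorithm of the paper.

```python
from itertools import permutations


def evaluate_schedule(jobs, order):
    """Return (L_max, C_max) for processing `order` without preemption.

    jobs maps a job name to (release time r_j, processing time p_j,
    due date d_j). A job starts when the machine is free and the job
    has been released.
    """
    t = 0
    l_max = float("-inf")
    for name in order:
        r, p, d = jobs[name]
        t = max(t, r) + p          # completion time of this job
        l_max = max(l_max, t - d)  # lateness may be negative
    return l_max, t


def brute_force_best(jobs):
    """Try every order; feasible only for a handful of jobs."""
    return min(
        (evaluate_schedule(jobs, order) + (order,) for order in permutations(jobs)),
        key=lambda item: item[:2],
    )


# Hypothetical instance: name -> (r_j, p_j, d_j)
jobs = {"a": (0, 3, 4), "b": (1, 2, 3), "c": (4, 1, 6)}
print(evaluate_schedule(jobs, ["a", "b", "c"]))
print(brute_force_best(jobs))
```

In this instance the orders a, b, c and b, a, c reach the same L_max but differ in C_max, which is the kind of trade-off a Pareto-optimal set over the two criteria captures.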

15.

The weight-constrained maximum-density subtree problem and related problems in trees

Sun-Yuan Hsieh Ting-Yu Chou 《The Journal of supercomputing》2010,54(3):366-380

Given a tree T=(V,E) of n nodes such that each node v is associated with a value-weight pair (val_v, w_v), where the value val_v is a real number and the weight w_v is a non-negative integer, the density of T is defined as \(\frac{\sum_{v\in V}{\mathit{val}}_{v}}{\sum_{v\in V}w_{v}}\). A subtree of T is a connected subgraph (V′,E′) of T, where V′⊆V and E′⊆E. Given two integers w_min and w_max, the weight-constrained maximum-density subtree problem on T is to find a maximum-density subtree T′=(V′,E′) satisfying w_min ≤ ∑_{v∈V′} w_v ≤ w_max. In this paper, we first present an O(w_max n)-time algorithm to find a weight-constrained maximum-density path in a tree T, and then present an O(w_max² n)-time algorithm to find a weight-constrained maximum-density subtree in T. Finally, given a node subset S⊆V, we also present an O(w_max² n)-time algorithm to find a weight-constrained maximum-density subtree in T which covers all the nodes in S.
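
The density definition and the weight constraint can be checked on a toy instance as follows. This is a minimal sketch: the node values and weights are invented for illustration, the helper names are hypothetical, and the code does not verify connectivity or implement the dynamic-programming algorithms of the paper.

```python
def density(nodes, val, w):
    """Sum of node values divided by sum of node weights."""
    total_weight = sum(w[v] for v in nodes)
    if total_weight == 0:
        raise ValueError("density is undefined for zero total weight")
    return sum(val[v] for v in nodes) / total_weight


def within_weight(nodes, w, w_min, w_max):
    """Check the constraint w_min <= sum of weights <= w_max."""
    return w_min <= sum(w[v] for v in nodes) <= w_max


# Hypothetical path-shaped tree 1 - 2 - 3 with value-weight pairs.
val = {1: 4.0, 2: -1.0, 3: 6.0}
w = {1: 2, 2: 1, 3: 3}

print(density([1, 2, 3], val, w))  # whole tree
print(density([2, 3], val, w))     # subtree on nodes 2 and 3
print(within_weight([2, 3], w, w_min=3, w_max=5))
```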

16.

Asymptotic bounds for the rate of colored superimposed codes

V. S. Lebedev 《Problems of Information Transmission》2008,44(2):112-118

We introduce the notion of a q-ary (r₀, r₁, ..., r_{q−1}) superimposed code. We obtain upper and lower asymptotic bounds for the rate of these (colored) codes.

17.

Algorithm to determine ε-distance parameter in density based clustering

《Expert systems with applications》2014,41(6):2939-2946

The well-known clustering algorithm DBSCAN is founded on the density notion of clustering. However, the use of the global density parameter ε-distance makes DBSCAN unsuitable for datasets of varying density. Also, guessing a value for this parameter is not straightforward. In this paper, we generalise this algorithm in two ways. First, we adaptively determine the key input parameter ε-distance, which makes DBSCAN independent of domain knowledge, satisfying the unsupervised notion of clustering. Second, deriving ε-distance by checking the data distribution of each dimension makes the approach suitable for subspace clustering, which detects clusters enclosed in various subspaces of high-dimensional data. Experimental results illustrate that our approach can efficiently find clusters of varying sizes and shapes as well as varying densities.

18.

Bimagic Vertex Labelings

M. F. Semeniuta S. N. Nedilko V. N. Nedilko 《Cybernetics and Systems Analysis》2018,54(5):771-778

The notion of the equivalence of vertex labelings on a given graph is introduced. The equivalence of three bimagic labelings for regular graphs is proved. A particular solution is obtained for the problem of the existence of a 1-vertex bimagic vertex labeling of multipartite graphs, namely, for graphs isomorphic with K_{n, n, m}. It is proved that the sequence of bi-regular graphs K_n(ij) = ((K_{n−1} − M) + K₁) − (u_n u_i) − (u_n u_j) admits a 1-vertex bimagic vertex labeling, where u_i, u_j is any pair of non-adjacent vertices in the graph K_{n−1} − M, u_n is a vertex of K₁, and M is a perfect matching of the complete graph K_{n−1}. It is established that if an r-regular graph G of order n is distance magic, then the graph G + G has a 1-vertex bimagic vertex labeling with magic constants (n + 1)(n + r)/2 + n² and (n + 1)(n + r)/2 + nr. Two new types of graphs that do not admit 1-vertex bimagic vertex labelings are defined.

19.

Protecting query privacy with differentially private k-anonymity in location-based services

Jinbao Wang Zhipeng Cai Yingshu Li Donghua Yang Ji Li Hong Gao 《Personal and Ubiquitous Computing》2018,22(3):453-469

Nowadays, location-based services (LBS) help people in daily life by answering LBS queries. However, privacy issues, including location privacy and query privacy, arise at the same time. Existing works on protecting query privacy either rely on trusted servers or fail to provide a sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We identify the necessary and sufficient condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ϵ-DPkA with minimized ϵ. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.

20.

Quantum annealing for combinatorial clustering

Vaibhaw Kumar Gideon Bass Casey Tomlin Joseph Dulny III 《Quantum Information Processing》2018,17(2):39

Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver “qbsolv.” The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.