共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper we present an n^ O(k
1-1/d
) -time algorithm for solving the k -center problem in \reals
d
, under L
∈
fty - and L
2
-metrics. The algorithm extends to other metrics, and to the discrete k -center problem. We also describe a simple (1+ɛ) -approximation algorithm for the k -center problem, with running time O(nlog k) + (k/ɛ)^ O(k
1-1/d
) . Finally, we present an n^ O(k
1-1/d
) -time algorithm for solving the L -capacitated k -center problem, provided that L=Ω(n/k
1-1/d
) or L=O(1) .
Received July 25, 2000; revised April 6, 2001. 相似文献
2.
3.
4.
许多应用程序会产生大量的流数据,如网络流、web点击流、视频流、事件流和语义概念流。数据流挖掘已成为热点问题,其目标是从连续不断的流数据中提取隐藏的知识/模式。聚类作为数据流挖掘领域的一个重要问题,在近期被广泛研究。不同于传统的静态数据聚类问题,数据流聚类面临有限内存、一遍扫描、实时响应和概念漂移等许多约束。本文对数据流挖掘中的各种聚类算法进行了总结。首先介绍了数据流挖掘的约束;随后给出了数据流聚类的一般模型,并描述了其与传统数据聚类之间的关联;最后提出数据流聚类领域中进一步的研究热点和研究方向。 相似文献
5.
Clustering data streams has drawn lots of attention in the last few years due to their ever-growing presence. Data streams put additional challenges on clustering such as limited time and memory and one pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolving over time, and we do not have any knowledge about the number of clusters. In a data stream environment due to various factors, some noise appears occasionally. Density-based method is a remarkable class in clustering data streams, which has the ability to discover arbitrary shape clusters and to detect noise. Furthermore, it does not need the nmnber of clusters in advance. Due to data stream characteristics, the traditional density-based clustering is not applicable. Recently, a lot of density-based clustering algorithms are extended for data streams. The main idea in these algorithms is using density- based methods in the clustering process and at the same time overcoming the constraints, which are put out by data streanFs nature. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams, discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring algorithms' performance. It is hoped that this survey will serve as a steppingstone for researchers studying data streams clustering, particularly density-based algorithms. 相似文献
6.
数据清洗是建立数据仓库及进行数据挖掘的一个重要步骤。数据清洗的核心是检测近似重复记录,而聚类是将相似度高的数据对象聚集到一个类中的分析方法。本文描述的数 据清洗过程就基于聚类分析,它将基于密度的改进聚类算法ICAD应用到数据清洗过程中,该算法通过不断调节密度发现近似重复记录,快速完成大容量数据清洗任务。 相似文献
7.
魏丽 《数字社区&智能家居》2007,(11):637-639
聚类分析技术是数据挖据中的一种重要技术。本文介绍了数据挖掘对聚类的典型要求和聚类方法的分类,研究分析了聚类的主要算法.并从多个方面对这些算法的性能进行比较。 相似文献
8.
9.
10.
11.
魏丽 《数字社区&智能家居》2007,(21)
聚类分析技术是数据挖据中的一种重要技术.本文介绍了数据挖掘对聚类的典型要求和聚类方法的分类,研究分析了聚类的主要算法,并从多个方面对这些算法的性能进行比较. 相似文献
12.
数据流挖掘可有效解决大容量流式数据的知识发现问题,并已得到广泛研究.数据流的一个典型的例子是传感器采集的流式数据.然而,随着传感器网络的应用普及,这些流式数据在很多情况下是分布式采集和管理的,这就必然导致分布式地挖掘数据流的需求.分布式数据流挖掘的最大障碍是由分布式而导致的挖掘质量或者效率问题.为适应分布式数据流的聚类挖掘,探讨了分布式数据流的挖掘模型,并且基于该模型设计了对应的概要数据结构和关键的挖掘算法,给出了算法的理论评估或者实验验证.实验说明,提出的模型和算法可以有效地减少数据通信代价,并且能保证较高的全局模式的聚类质量. 相似文献
13.
The maximum satisfiability problem (MAX-SAT) is stated as follows: Given a boolean formula in CNF, find a truth assignment that satisfies the maximum possible number of its clauses. MAX-SAT is MAX-SNP-complete and received much attention recently. One of the challenges posed by Alber, Gramm and Niedermeier in a recent survey paper asks: Can MAX-SAT be solved in less than 2n “steps”? Here, n is the number of distinct variables in the formula and a step may take polynomial time of the input. We answered this challenge positively by showing that a popular algorithm based on branch-and-bound is bounded by O(2n) in time, where n is the maximum number of occurrences of any variable in the input.When the input formula is in 2-CNF, that is, each clause has at most two literals, MAX-SAT becomes MAX-2-SAT and the decision version of MAX-2-SAT is still NP-complete. The best bound of the known algorithms for MAX-2-SAT is O(m2m/5), where m is the number of clauses. We propose an efficient decision algorithm for MAX-2-SAT whose time complexity is bound by O(n2n). This result is substantially better than the previously known results. Experimental results also show that our algorithm outperforms any algorithm we know on MAX-2-SAT. 相似文献
14.
近年来,随着现实生活中数据量的不断增大,不一致数据的出现也越发频繁,这使得人工修正不一致数据变得更加耗时.而且,人工修正数据方法本身也存在着不可避免的人为操作错误,因此,这种修正方法不再可行.如何不提前修复不一致数据,直接在不一致数据上进行分类,是该文的核心研究内容.对决策树生成算法的目标函数进行改进,使其能够直接对不一致数据进行分类,并得到较好的分类结果.对约束条件中的特征对分类结果的影响进行了多方面衡量,从而调整该特征的影响因子,使得决策树的节点分割更加精确,分类效果更优. 相似文献
15.
随着数据信息的积累,如何从这些海量信息中有效地提取所需要的知识成为当前数据挖掘的重要内容。聚类作为数据挖掘的重要工具,通过将数据划分成多个类,使得类内数据尽可能相似,而类间数据的相似度尽可能小,已成为研究热点。本文总结在数据挖掘中的聚类算法,针对聚类中所存在的问题进行归纳,并对未来的研究进行了展望。 相似文献
16.
Graph homomorphism, also called H-coloring, is a natural generalization of graph coloring: There is a
homomorphism from a graph G to a complete graph on k vertices if and only if G is k-colorable. During recent years the topic
of exact (exponential-time) algorithms for NP-hard problems in general, and for graph coloring in particular, has led to
extensive research. Consequently, it is natural to ask how the techniques developed for exact graph coloring algorithms can
be extended to graph homomorphisms. By the celebrated result of Hell and Nesetril, for each fixed simple graph H, deciding
whether a given simple graph G has a homomorphism to H is polynomial-time solvable if
H is a bipartite graph, and NP-complete otherwise. The case where H is the cycle of length 5, is the first NP-hard case different
from graph coloring. We show that for an odd integer
, whether an input graph G with n vertices is homomorphic to the cycle of length k, can be decided in time
. We extend the results obtained for cycles, which are graphs of treewidth two, to graphs of bounded treewidth as follows:
if H is of treewidth at most t, then whether input graph G with n vertices is homomorphic to H can be decided in time
. 相似文献
17.
针对传统聚类算法在基因表达数据处理中的不足之处,讨论了与计算智能技术相关的两种算法:模糊C均值算法(FCM)和遗传K均值算法(GKA),对FCM算法中类别数c和模糊指数m的选取进行了比较深入的研究,最后用实验数据对各算法性能进行了分析和比较。 相似文献
18.
减少非技术损失(NTL)是实施智能电网所带来的潜在利益的重要组成部分。提出了一种基于智能电表数据的聚类算法来检测窃电和其他原因所导致的非技术性损失。通过对智能电表采集的数据进行聚类,提取正常用电行为的数据原型。然后对待检测数据样本和正常数据的聚类中心之间的距离进行计算,如果距离明显,则将其分类为NTL数据样本。最后对四种不同的异常用电指标进行空间分析,使分类结果更易于可视化。实验表明,基于GA聚类算法的NTL检测方法具有优于同类检测方法的性能。 相似文献
19.
把数据挖掘中K-中心点聚类算法应用于基于客户价值矩阵的客户价值细分中,建立一种零售业客户细分方法,为零售超市客户保持和营销提供决策依据,并用样本进行实验,得出结论. 相似文献
20.
数据挖掘技术中关联规则可以很好地发现数据项之间存在的相互关系,同时有大量的挖掘算法可供选择。聚类分析就是通过分析数据库中的记录数据,根据一定的分类规则合理地划分记录集合,确定每个记录所在类别。它所采用的分类规则是由聚类分析工具决定的。本文研究 聚类和关联规则的挖掘算法。 相似文献