Similar Documents
1.
Given a digraph (or an undirected graph) G = (V, E) with a set V of vertices v with nonnegative real costs w(v), a set E of edges, and a positive integer k, we deal with the problem of finding a minimum-cost subset S ⊆ V such that, for each vertex v ∈ V \ S, there are k vertex-disjoint paths from S to v. In this paper, we show that the problem can be solved by a greedy algorithm in … time in a digraph (or in … time in an undirected graph), where n = |V| and m = |E|. Based on this, given a digraph and two integers k and ℓ, we also give a polynomial-time algorithm for finding a minimum-cost subset S ⊆ V such that, for each vertex v ∈ V \ S, there are k vertex-disjoint paths from S to v as well as ℓ vertex-disjoint paths from v to S.
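To make the feasibility condition concrete, here is a minimal Python sketch, assuming "k vertex-disjoint paths from S to v" means paths that share no vertex except v (so they also start at distinct vertices of S). It counts such paths with a unit-capacity max-flow after splitting every vertex, and finds a minimum-cost S by exhaustive search. This brute force is only a correctness check for tiny graphs, not the greedy algorithm of the paper; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def max_disjoint_paths(n, edges, S, v):
    """Maximum number of vertex-disjoint paths from the set S to vertex v (v not in S).

    Each vertex x is split into an in-copy 2x and an out-copy 2x+1 joined by a
    capacity-1 arc, so no vertex other than v is shared by two paths.
    """
    src, sink = 2 * n, 2 * v
    cap, adj = {}, [set() for _ in range(2 * n + 1)]

    def add_arc(a, b):
        cap[(a, b)] = cap.get((a, b), 0) + 1
        cap.setdefault((b, a), 0)          # residual arc
        adj[a].add(b)
        adj[b].add(a)

    for x in range(n):
        add_arc(2 * x, 2 * x + 1)
    for a, b in edges:                     # directed edge a -> b
        add_arc(2 * a + 1, 2 * b)
    for s in S:
        add_arc(src, 2 * s)

    flow = 0
    while True:                            # Edmonds-Karp: BFS augmenting paths
        parent, queue = {src: None}, deque([src])
        while queue and sink not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:       # integer capacities: push one unit
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1


def min_cost_source_set(n, edges, w, k):
    """Exhaustive search over all subsets S (tiny graphs only)."""
    best = None
    for r in range(n + 1):
        for S in combinations(range(n), r):
            if all(max_disjoint_paths(n, edges, S, v) >= k
                   for v in range(n) if v not in S):
                cost = sum(w[x] for x in S)
                if best is None or cost < best[0]:
                    best = (cost, set(S))
    return best
```

For an undirected graph, pass each edge in both directions.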

2.
In this paper we present a peptide matching approach to the multiple comparison of a set of protein sequences. This approach consists of looking for all the words that are common to q of these sequences, where q is a parameter.

The comparison between words is done by using as reference an object called a model. In the case of proteins, a model is a product of subsets of the alphabet Σ of the amino acids. These subsets belong to a cover of Σ, that is, their union covers all of Σ. A word is said to be an instance of a model if it belongs to the model.

A further flexibility is introduced in the comparison by allowing up to e errors between a word and a model. These errors may be gaps or substitutions not allowed by the cover. In this setting, a word is said to be an occurrence of a model if the Levenshtein distance between it and some instance of the model is less than or equal to e. This corresponds to what we call a Set-Levenshtein distance between the occurrences and the model itself. Two words are said to be similar if there is at least one model of which both are occurrences. In the special case where e = 0, the occurrences of a model are simply its instances. If a model M has occurrences in at least q of the sequences of the set, M is said to occur in the set.

The algorithm presented here is an efficient and exact way of looking for all the models, of a fixed length k or of the greatest possible length k_max, that occur in a set of sequences. It is linear in the total length n of the sequences and proportional to (e + 2)(2e + 1)·k^(e+1)·p^(e+1)·g^k, where k ≪ n is a small value in all practical situations, p is the number of sets in the cover, and g is related to the cover's degree of nontransitivity.

Models are closely related to what is called a consensus in the biocomputing area, and covers are a good way of representing complex relationships between the amino acids.
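The Set-Levenshtein distance can be computed with the usual edit-distance dynamic program in which a model position "matches" a letter whenever the letter belongs to that position's subset; because each position chooses its letter independently, this yields the minimum distance over all instances of the model. The sketch below assumes unit costs for gaps and for substitutions outside the cover. The names `set_levenshtein` and `is_occurrence` are hypothetical, and this is not the authors' model-enumeration algorithm.

```python
def set_levenshtein(word, model):
    """Minimum edit distance between `word` and any instance of `model`.

    `model` is a list of sets of letters (one set per position). A letter that
    belongs to the position's set matches at cost 0; substitutions outside the
    set, insertions and deletions cost 1 each.
    """
    m = len(model)
    prev = list(range(m + 1))                  # row for the empty word prefix
    for i in range(1, len(word) + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            match = 0 if word[i - 1] in model[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # extra letter in the word
                         cur[j - 1] + 1,       # model position left unmatched (gap)
                         prev[j - 1] + match)  # match or substitution
        prev = cur
    return prev[m]


def is_occurrence(word, model, e):
    return set_levenshtein(word, model) <= e
```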


3.
This paper outlines an algorithm for optimum linear ordering (OLO) of a weighted parallel graph with O(n log k) worst-case time complexity and O(n + k log(n/k) log k) expected-case time complexity, where n is the total number of nodes and k is the number of chains in the parallel graph. Next, the two-layer OLO problem is considered, where the goal is to place the nodes linearly in two routing layers so as to minimize the total wire length. The two-layer problem is shown to subsume the max-cut problem, and a suitable heuristic algorithm is proposed. Experimental results on randomly generated samples show that the heuristic algorithm runs very fast and outputs optimum solutions in more than 90% of instances.
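As a baseline for small inputs, the sketch below finds an optimum linear ordering by exhaustive search, assuming the single-layer objective is the total weighted wire length Σ w(u, v)·|pos(u) − pos(v)|. It runs in factorial time and is not the O(n log k) algorithm for parallel graphs; the function name is hypothetical.

```python
from itertools import permutations


def optimum_linear_ordering(nodes, weighted_edges):
    """Brute-force ordering minimizing sum of w * |pos(u) - pos(v)| over (u, v, w) edges."""
    best_cost, best_order = None, None
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        cost = sum(w * abs(pos[u] - pos[v]) for u, v, w in weighted_edges)
        if best_cost is None or cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order
```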

4.
The Viterbi algorithm is formally modified to select the k state sequences with the highest a posteriori probabilities, where k is a prespecified positive integer. A parallel algorithm for hypercubes is then developed, along with a performance evaluation.
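A common way to obtain the k best state sequences is a list Viterbi recursion that keeps, for every state and time step, the k best partial paths ending there. Since the a posteriori probability of a path equals its joint probability divided by the constant P(observations), ranking by joint probability gives the same order. The sketch below is a sequential illustration under that reading and does not model the hypercube parallelization; names are hypothetical.

```python
import heapq
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def k_best_viterbi(obs, states, start_p, trans_p, emit_p, k):
    """Return up to k (log_prob, path) pairs with the highest joint probability."""
    # beams[s]: the k best (log_prob, path) pairs among partial paths ending in s
    beams = {s: [(_log(start_p[s]) + _log(emit_p[s][obs[0]]), (s,))] for s in states}
    for o in obs[1:]:
        beams = {
            s: heapq.nlargest(k, [
                (lp + _log(trans_p[r][s]) + _log(emit_p[s][o]), path + (s,))
                for r in states for lp, path in beams[r]
            ])
            for s in states
        }
    return heapq.nlargest(k, [c for s in states for c in beams[s]])
```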

5.
k-Anonymization with Minimal Loss of Information   Total citations: 3, self-citations: 0, citations by others: 3
The technique of k-anonymization allows the release of databases that contain personal information while ensuring some degree of individual privacy. Anonymization is usually performed by generalizing database entries. We formally study the concept of generalization and propose three information-theoretic measures for capturing the amount of information that is lost during the anonymization process. The proposed measures are more general and more accurate than those proposed by Meyerson and Williams and by Aggarwal et al. We study the problem of achieving k-anonymity with minimal loss of information. We prove that it is NP-hard and study polynomial approximations for the optimal solution. Our first algorithm gives an approximation guarantee of O(ln k) for two of our measures as well as for the previously studied measures. This improves the best-known O(k)-approximation in …. While the previous approximation algorithms relied on the graph representation framework, our algorithm relies on a novel hypergraph representation that enables the improvement in the approximation ratio from O(k) to O(ln k). As the running time of the algorithm is O(n^(2k)), we also show how to adapt the algorithm in … in order to obtain an O(k)-approximation algorithm that is polynomial in both n and k.
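To illustrate generalization and the k-anonymity requirement, here is a deliberately crude sketch that generalizes only by suppressing whole columns (replacing values with '*') and searches exhaustively for the fewest suppressed columns. Counting suppressed cells is a simple stand-in for information loss, not one of the paper's entropy-based measures, and the search is exponential rather than an O(ln k)-approximation; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_k_anonymous(rows, k):
    """Every combination of quasi-identifier values must occur at least k times."""
    return all(count >= k for count in Counter(rows).values())


def suppress(rows, columns):
    """Generalize by replacing every value in the given columns with '*'."""
    return [tuple("*" if i in columns else v for i, v in enumerate(r)) for r in rows]


def fewest_suppressed_columns(rows, k):
    """Exhaustively find a smallest set of columns whose suppression yields k-anonymity."""
    n_cols = len(rows[0])
    for r in range(n_cols + 1):
        for cols in combinations(range(n_cols), r):
            generalized = suppress(rows, set(cols))
            if is_k_anonymous(generalized, k):
                return set(cols), generalized
    return None  # only reached when there are fewer than k rows
```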

6.
吕亚丽, 苗钧重, 胡玮昕. Journal of Computer Applications (计算机应用), 2020, 40(12): 3430-3436
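Most graph-based semi-supervised learning methods, when measuring the similarity between samples, use neither the existing label information nor the label information obtained during label propagation; moreover, their similarity measures are relatively fixed and cannot effectively capture the similarity between samples whose distributions have complex and diverse structures. To address these problems, a graph-based semi-supervised learning algorithm that performs metric learning based on labels is proposed. First, a similarity measure between samples is given and used to construct the similarity matrix. Then, label propagation is performed on the similarity matrix, and the k samples with the lowest entropy are selected as newly determined label information. Finally, all label information is fully used to update the similarity measure, and the process is iterated until the labels of all samples have been learned. The proposed algorithm not only uses label information to improve the similarity measure between samples, but also makes full use of intermediate results to reduce the amount of labeled data that semi-supervised learning requires. Experimental results on six real datasets show that, in more than 95% of cases, the algorithm achieves higher classification accuracy than three traditional graph-based semi-supervised learning algorithms.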
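The propagate-then-select step of one iteration might look roughly like the sketch below. It assumes a fixed Gaussian kernel with bandwidth `sigma` as the initial similarity measure and a standard symmetrically normalized propagation iteration; the label-based metric-learning update that the paper applies between iterations is not reproduced. The function name and parameters are hypothetical.

```python
import numpy as np


def propagate_and_select(X, y, k, sigma=1.0, alpha=0.99, iters=200):
    """One propagate-then-select round.

    y holds class ids for labeled samples and -1 for unlabeled ones.
    Returns the indexes of the k lowest-entropy unlabeled samples and the
    labels predicted for them.
    """
    classes = np.unique(y[y >= 0])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))          # fixed Gaussian similarity matrix
    np.fill_diagonal(W, 0.0)
    d = 1.0 / np.sqrt(W.sum(axis=1))
    S = d[:, None] * W * d[None, :]             # D^-1/2 W D^-1/2
    Y = (y[:, None] == classes[None, :]).astype(float)   # one-hot seeds
    F = Y.copy()
    for _ in range(iters):                      # label propagation
        F = alpha * S @ F + (1 - alpha) * Y
    P = F / F.sum(axis=1, keepdims=True)
    entropy = -(P * np.log(P + 1e-12)).sum(axis=1)
    unlabeled = np.where(y < 0)[0]
    chosen = unlabeled[np.argsort(entropy[unlabeled])[:k]]
    return chosen, classes[P[chosen].argmax(axis=1)]
```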

7.
Many recent works have investigated finding tenuous groups, i.e., groups with few social interactions and weak relationships among members, for reviewer selection and psycho-educational group formation. However, the metrics used to measure tenuity (e.g., k-triangle, k-line, and k-tenuity) require a suitable value of k to be specified, which is difficult for users without background knowledge. Thus, in this paper we formulate the most tenuous group (MTG) query in terms of the group distance and the average group distance of a group, which measure its tenuity and eliminate the influence of the parameter k. To address the MTG problem, we first propose an exact algorithm, MTG-VDIS, which gives priority to vertices with large vertex distance when generating the result group and also uses effective filtering and pruning strategies. Since MTG-VDIS is not fast enough, we design an efficient exact algorithm, MTG-VDGE, which uses the degree metric to sort the vertices and adopts a new combination order, degree and reverse based branch and bound (DRBB). MTG-VDGE gives priority to vertices with small degree. For a large p, we further develop an approximation algorithm, MTG-VDLT, which discards candidate attendees with high degree to reduce the number of vertices to be considered. Experimental results on real datasets show that the proposed algorithms outperform existing approaches in both efficiency and group tenuity.
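The abstract does not define group distance precisely, so the sketch below assumes it is the minimum shortest-path (hop) distance between any two members, with the average pairwise distance as a tie-breaker. It enumerates every group of size p, which is feasible only for tiny graphs and is meant as a reference against which pruning-based algorithms could be checked; names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, s):
    """Hop distances from s to every reachable vertex."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def most_tenuous_group(adj, p):
    """Group of p >= 2 vertices maximizing (min pairwise distance, mean pairwise distance)."""
    inf = float("inf")
    D = {u: bfs_dist(adj, u) for u in adj}
    best, best_key = None, None
    for group in combinations(sorted(adj), p):
        ds = [D[u].get(v, inf) for u, v in combinations(group, 2)]
        key = (min(ds), sum(ds) / len(ds))   # group distance, then average distance
        if best_key is None or key > best_key:
            best, best_key = group, key
    return best, best_key
```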

9.
k-Anonymity is a method for providing privacy protection by ensuring that data cannot be traced to an individual. In a k-anonymous dataset, any identifying information occurs in at least k tuples. To achieve optimal and practical k-anonymity, many different kinds of algorithms with various assumptions and restrictions have recently been proposed, with different metrics to measure quality. This paper evaluates a family of clustering-based algorithms that are more flexible and that even attempt to improve precision by ignoring the restrictions of user-defined Domain Generalization Hierarchies. The evaluation of the new approaches with respect to cost metrics shows that metrics may behave differently with different algorithms and may not correlate with some applications' accuracy on output data.
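A minimal illustration of the clustering idea, assuming purely numeric quasi-identifiers: records are sorted, cut into groups of at least k, and each attribute in a group is generalized to its (min, max) interval, so no Domain Generalization Hierarchy is needed. Sorting is a crude stand-in for the clustering algorithms the paper evaluates; the output follows sorted order, and the names are hypothetical.

```python
from collections import Counter


def cluster_anonymize(records, k):
    """Sort numeric records, cut them into groups of at least k, and generalize
    every attribute of a group to its (min, max) interval."""
    if len(records) < k:
        raise ValueError("need at least k records")
    recs = sorted(records)
    groups = [recs[i:i + k] for i in range(0, len(recs), k)]
    if len(groups[-1]) < k:            # fold an undersized tail into its neighbor
        tail = groups.pop()
        groups[-1].extend(tail)
    out = []
    for g in groups:
        interval = tuple((min(col), max(col)) for col in zip(*g))
        out.extend([interval] * len(g))
    return out


def is_k_anonymous(rows, k):
    return all(c >= k for c in Counter(rows).values())
```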

10.
Support vector ordinal regression (SVOR) is a recently proposed ordinal regression (OR) algorithm. Despite its theoretical and empirical success, the method has one major bottleneck: its high computational complexity. In this brief, we propose block-quantized support vector ordinal regression (BQSVOR), an algorithm that is both practical and theoretically guaranteed, in which the kernel matrix K is approximated by a matrix K̃ composed of k² constant blocks. We provide a detailed theoretical justification of the approximation accuracy of BQSVOR. Moreover, we prove that the OR problem with the block-quantized kernel matrix K̃ can be solved by first separating the training samples into k clusters with kernel k-means and then performing SVOR on the k cluster representatives. Hence, the algorithm leads to an optimization problem that scales only with the number of clusters instead of the dataset size. Finally, experiments on several real-world datasets support the analysis and demonstrate that BQSVOR significantly improves the speed of SVOR with guaranteed accuracy.
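Given a cluster assignment (for example from kernel k-means, which is not implemented here), a block-quantized kernel can be formed by replacing each (cluster a, cluster b) block of K with a single constant. The sketch assumes that constant is the block mean, which minimizes the Frobenius error for a fixed partition; whether BQSVOR uses exactly this choice is an assumption, and the function name is hypothetical.

```python
import numpy as np


def block_quantize(K, labels, k):
    """Replace each (a, b) block of K by its mean, giving at most k*k distinct values.

    labels[i] in range(k) is the cluster of sample i.
    """
    K_tilde = np.empty_like(K, dtype=float)
    for a in range(k):
        ia = np.where(labels == a)[0]
        for b in range(k):
            ib = np.where(labels == b)[0]
            if len(ia) and len(ib):
                K_tilde[np.ix_(ia, ib)] = K[np.ix_(ia, ib)].mean()
    return K_tilde
```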

11.
k-trees have established themselves as useful data structures in pattern recognition. A fundamental operation regarding k-trees is the construction of a k-tree. We present a method to store an object as a set of rays and an algorithm to convert such a set into a k-tree. The algorithm is conceptually simple, works for any k, and builds a k-tree from the rays very fast. It produces a minimal k-tree and does not lead to intermediate storage swell.
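The sketch below assumes a hypothetical ray format: a ray is `(prefix, start, end)`, a run of object cells [start, end) along the last axis of a grid of side `size` (a power of two), where `prefix` fixes the other k − 1 coordinates and rays are pairwise disjoint. It builds the tree top-down and stops splitting as soon as a cell is uniformly full ("B") or empty ("W"), which yields a minimal tree, but it rescans all rays at every node, so it is a naive illustration rather than the paper's fast conversion algorithm.

```python
from itertools import product


def build_ktree(rays, k, size):
    """Return 'B' (full), 'W' (empty) or a list of 2**k child subtrees."""

    def covered(origin, s):
        # number of object cells inside the hypercube [origin, origin + s)^k
        total = 0
        for prefix, start, end in rays:
            if all(o <= c < o + s for o, c in zip(origin[:-1], prefix)):
                lo, hi = max(start, origin[-1]), min(end, origin[-1] + s)
                total += max(0, hi - lo)
        return total

    def node(origin, s):
        c = covered(origin, s)
        if c == 0:
            return "W"
        if c == s ** k:
            return "B"
        h = s // 2
        return [node(tuple(o + h * bit for o, bit in zip(origin, bits)), h)
                for bits in product((0, 1), repeat=k)]

    return node((0,) * k, size)
```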

12.
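To address the poor prediction accuracy of low-order Markov models and the high prediction sparsity rate of multi-order Markov models, a moving-object location prediction algorithm based on a Markov model and trajectory similarity (MMTS) is proposed. The method borrows the idea of the Markov model to model the historical trajectories of moving objects and treats trajectory similarity as an important factor in location prediction: the result set of the Markov prediction model serves as the candidate set, and the final prediction is obtained by combining it with the similarity factor. Experimental results show that, compared with the k-order Markov model, the prediction performance of the method is not strongly affected by changes in the training-sample size or in the order k, and that it raises prediction accuracy by more than 8% on average while greatly reducing the prediction sparsity rate of the k-order Markov model. The proposed method not only solves the high sparsity rate and insufficient accuracy of the k-order Markov model but also improves prediction stability.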
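One way to read "Markov candidates re-ranked by trajectory similarity" is sketched below: a k-th order Markov model proposes every location that followed the current k-location context in the history, and each observed transition is weighted by 1 plus the length of the common suffix between the query and the history preceding that transition. This suffix-based similarity is an assumption standing in for the paper's measure; names are hypothetical.

```python
from collections import defaultdict


def predict_next(history, query, k):
    """Rank candidate next locations as (score, location), best first."""
    context = tuple(query[-k:])
    scores = defaultdict(float)
    for traj in history:
        for i in range(k, len(traj)):
            if tuple(traj[i - k:i]) != context:
                continue                       # k-th order Markov candidate filter
            m = 0                              # length of the common suffix
            while m < i and m < len(query) and traj[i - 1 - m] == query[-1 - m]:
                m += 1
            scores[traj[i]] += 1 + m           # longer shared history -> more weight
    total = sum(scores.values())
    return sorted(((s / total, loc) for loc, s in scores.items()), reverse=True)
```

An empty result means the context never occurred, which is exactly the sparsity case the abstract describes.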

13.
Two parallel algorithms for finding a minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers with a fixed number of processors are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among the processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting message flow to neighboring processors in the hypercube topology. The correctness of our approach follows from a theorem stating that, with totally ordered edges, if an edge of an arbitrary subgraph does not belong to the subgraph's MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O((m log m)/p) time using p processors for p ≤ (m log m)/(n(1 + log(m/n))). The second algorithm, efficient for dense graphs, requires O(n²/p) time for p ≤ n/log n.
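The elimination theorem can be checked with a sequential simulation: each of p simulated processors keeps only the MSF of its share of the edges, and partial results are merged pairwise (as neighbors along successive hypercube dimensions would) until one forest remains. Ties are broken by (weight, u, v) to obtain the total order the theorem requires. Communication costs are not modeled, and the names are hypothetical.

```python
def kruskal(n, edges):
    """Minimum spanning forest of (u, v, w) edges under the total order (w, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    forest = []
    for u, v, w in sorted(edges, key=lambda e: (e[2], e[0], e[1])):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


def distributed_msf(n, edges, p):
    # each simulated processor eliminates the non-MSF edges of its share
    parts = [kruskal(n, edges[i::p]) for i in range(p)]
    # merge neighbors pairwise; discarded edges can never re-enter the MSF
    while len(parts) > 1:
        parts = [kruskal(n, parts[i] + parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    return parts[0]
```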

14.
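The k-nearest-neighbor (kNN) algorithm is commonly used to impute missing data, but because it must compute the similarity between every pair of records, imputation is time-consuming. To improve efficiency, LSH-kNN, a kNN imputation algorithm combined with locality-sensitive hashing (LSH), is proposed. First, locality-sensitive hashing is applied to the complete records (those with no missing values), providing an index for the subsequent approximate nearest-neighbor search. Second, corresponding locality-sensitive hashing methods are proposed for categorical, numerical, and mixed-type missing data; each incomplete record to be imputed is hashed, and its hash value is used to find candidate records that are likely to be similar to it. Finally, the similarity to each candidate is computed one by one to find the k most similar records, and the incomplete record is imputed according to the kNN algorithm. Experiments on four real datasets show that, compared with the classic kNN algorithm, LSH-kNN significantly improves imputation efficiency while keeping accuracy essentially unchanged.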
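A numeric-only sketch of the idea, assuming random-hyperplane (sign) hashing: complete records are bucketed once per pattern of observed attributes, an incomplete record is hashed on the attributes it does have, and exact distances are computed only within its bucket. The per-pattern tables, the fallback to a full scan when a bucket has fewer than k records, and the mean-of-neighbors fill are assumptions; the paper's hashing schemes for categorical and mixed data are not reproduced, and names are hypothetical.

```python
import numpy as np


def lsh_knn_impute(complete, incomplete, k=3, n_bits=4, seed=0):
    """Fill NaNs in `incomplete` with the mean of k neighbors found via LSH buckets."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, complete.shape[1]))   # random hyperplanes
    center = complete.mean(axis=0)
    tables = {}                      # one hash table per pattern of observed columns
    filled = incomplete.copy()
    for row in filled:               # rows are views, so edits land in `filled`
        obs = ~np.isnan(row)
        if obs.all():
            continue
        P, c = planes[:, obs], center[obs]
        pattern = tuple(obs)
        if pattern not in tables:    # index the complete records for this pattern
            table = {}
            for i, bits in enumerate((complete[:, obs] - c) @ P.T > 0):
                table.setdefault(tuple(bits), []).append(i)
            tables[pattern] = table
        key = tuple(P @ (row[obs] - c) > 0)
        cand = np.array(tables[pattern].get(key, []), dtype=int)
        if len(cand) < k:            # bucket too small: fall back to a full scan
            cand = np.arange(len(complete))
        dist = np.linalg.norm(complete[cand][:, obs] - row[obs], axis=1)
        nearest = cand[np.argsort(dist)[:k]]
        row[~obs] = complete[nearest][:, ~obs].mean(axis=0)
    return filled
```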

15.
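To address missing values in chemical process data, a data recovery algorithm for chemical processes based on locally weighted reconstruction is proposed, which preserves the local structural features of the data. Missing data points are located and marked with the symbol NaN (Not a Number), which splits the dataset into a complete dataset and an incomplete dataset. In order of completeness, each incomplete record is matched with its k nearest neighbors in the complete dataset; the weights of these k neighbors are obtained by minimizing the sum of squared errors, and the missing data points are reconstructed from the k neighbors and their weights. The algorithm was applied to two kinds of chemical process data at different missing rates and compared with two traditional data recovery algorithms, expectation-maximization principal component analysis (EM-PCA) and the mean-value (MA) method: it gives the smallest recovery error and improves computation speed by a factor of 2 on average relative to EM-PCA. The experimental results show that the locally weighted reconstruction algorithm recovers data effectively, improves data utilization, and is suitable for recovering missing data in nonlinear chemical processes.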
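The weight computation described above matches the locally linear embedding recipe: with the weights constrained to sum to 1, minimizing the squared reconstruction error on the observed attributes reduces to solving one small linear system, and the same weights then reconstruct the missing attributes. The sketch assumes Euclidean distance for the neighbor search, a small ridge term for numerical stability, and that each recovered row joins the complete set (which is what makes the processing order matter); names are hypothetical.

```python
import numpy as np


def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights summing to 1 that minimize ||x - sum_j w_j * neighbors[j]||^2."""
    Z = neighbors - x                            # neighbors relative to x
    C = Z @ Z.T                                  # local k x k Gram matrix
    C = C + reg * (np.trace(C) + 1e-12) * np.eye(len(C))   # keep C invertible
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()


def local_weighted_recover(X, k=3):
    """Recover NaN entries of X (rows = samples) by locally weighted reconstruction."""
    X = X.copy()
    missing = np.isnan(X)
    pool = list(X[~missing.any(axis=1)])         # the complete dataset
    # process incomplete rows from most complete to least complete
    for i in sorted(np.where(missing.any(axis=1))[0], key=lambda r: missing[r].sum()):
        obs = ~missing[i]
        P = np.array(pool)
        nearest = P[np.argsort(np.linalg.norm(P[:, obs] - X[i, obs], axis=1))[:k]]
        w = reconstruction_weights(X[i, obs], nearest[:, obs])
        X[i, ~obs] = w @ nearest[:, ~obs]
        pool.append(X[i].copy())                 # assumption: recovered rows join the pool
    return X
```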

16.
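To address the insufficient clustering accuracy and slow convergence of the K-means algorithm in big-data environments, a K-means algorithm based on optimized sampling clustering (OSCK) is proposed. First, the algorithm draws multiple samples from the massive data by probability sampling. Second, based on the principle of Euclidean-distance similarity to the optimal cluster centers, a model is built to evaluate the clustering results of the samples and to discard suboptimal sampled clustering results. Finally, the evaluated clustering results are combined by weighting to obtain the final k cluster centers, which are used as the cluster centers of the big dataset. Theoretical analysis and experimental results show that, for massive-data analysis, OSCK achieves better clustering accuracy than the comparison algorithms and is highly robust and scalable.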
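The three steps can be sketched as follows, under several assumptions not stated in the abstract: uniform random sampling stands in for the probability sampling, the mean squared error of each sample's clustering is the evaluation score, the better half of the runs is kept, and the surviving centers are aligned to the best run by nearest match and averaged with weights 1/score. Names are hypothetical; nearest-match alignment can map two centers to one, so an assignment solver would be more robust.

```python
import numpy as np


def kmeans(X, k, rng, iters=50):
    """Plain Lloyd iterations; returns (centers, sum of squared errors)."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[labels == j].mean(0) if (labels == j).any() else C[j]
                      for j in range(k)])
    labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    return C, ((X - C[labels]) ** 2).sum()


def sampled_kmeans(X, k, n_samples=8, frac=0.1, keep=0.5, seed=0):
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_samples):                   # 1) cluster several random samples
        idx = rng.choice(len(X), max(k, int(frac * len(X))), replace=False)
        C, sse = kmeans(X[idx], k, rng)
        runs.append((sse / len(idx), C))
    runs.sort(key=lambda run: run[0])            # 2) drop the worse samples
    runs = runs[:max(1, int(keep * len(runs)))]
    ref = runs[0][1]
    total, weight = np.zeros_like(ref), 0.0
    for score, C in runs:                        # 3) weighted merge of aligned centers
        match = ((ref[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        w = 1.0 / (score + 1e-12)
        total += w * C[match]
        weight += w
    return total / weight
```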

17.
The k-path partition problem is to partition a graph into the minimum number of paths, so that none of them has length more than k, for a given positive integer k. The problem is a generalization of the Hamiltonian path problem and the problem of partitioning a graph into the minimum number of paths. The k-path partition problem remains NP-complete on the class of chordal bipartite graphs if k is part of the input, and we show that it is NP-complete on the class of comparability graphs even for k=3. On the positive side, we present a polynomial-time solution for the problem, with any k, on bipartite permutation graphs, which form a subclass of chordal bipartite graphs.
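For checking results on tiny graphs, a minimum k-path partition can be computed exhaustively: enumerate the vertex sets of all simple paths with at most k vertices, then solve a set-partition recursion over bitmasks. The sketch assumes "length" counts vertices (so k = 3 means paths of at most three vertices) and runs in exponential time; it is not the polynomial algorithm for bipartite permutation graphs, and names are hypothetical.

```python
from functools import lru_cache


def min_k_path_partition(n, edges, k):
    """Minimum number of vertex-disjoint paths, each with at most k vertices,
    covering all n vertices of an undirected graph (tiny graphs only)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    path_sets = set()                     # bitmasks of vertex sets of simple paths

    def grow(last, mask, length):
        path_sets.add(mask)
        if length == k:
            return
        for w in adj[last]:
            if not (mask >> w) & 1:
                grow(w, mask | (1 << w), length + 1)

    for v in range(n):
        grow(v, 1 << v, 1)

    @lru_cache(maxsize=None)
    def best(mask):
        if mask == 0:
            return 0
        low = mask & -mask                # some path must cover the lowest vertex left
        return min(1 + best(mask & ~s) for s in path_sets
                   if s & low and (s & mask) == s)

    return best((1 << n) - 1)
```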

18.
Given a set of n points in the plane, a symmetric furthest-neighbor (SFN) pair of points p, q is one in which p and q are each other's furthest neighbor among the points of the set. A pair of points is antipodal if it admits parallel lines of support. In this paper it is shown that an SFN pair of the set is both a pair of extreme points of the set and an antipodal pair of the set. It is also shown that an asymmetric furthest-neighbor (ASFN) pair is not necessarily antipodal. Furthermore, if no two distances in the set are equal, it is shown that as many as, and no more than, n/2 pairs of points can be SFN pairs. A polygon is unimodal if, for each vertex p_k, k = 1, …, n, the distance function defined by the Euclidean distance between p_k and the remaining vertices (traversed in order) has only one local maximum. The fastest existing algorithms for computing all the ASFN or SFN pairs of a set of points, a simple polygon, or a convex polygon require O(n log n) running time. It is shown that the above results lead to an O(n) algorithm for computing all the SFN pairs of vertices of a unimodal polygon.
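The definitions can be checked directly with an O(n²) brute force, assuming all pairwise distances are distinct: compute each point's furthest neighbor and keep the pairs that are mutual. Because each point then has exactly one furthest neighbor, at most n/2 pairs can be mutual. This is not the O(n) algorithm for unimodal polygons; the function name is hypothetical.

```python
def sfn_pairs(points):
    """All symmetric furthest-neighbor pairs (i, j), i < j, of 2-D points in O(n^2)."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # furthest[i] is the index of the point furthest from point i
    furthest = [max((j for j in range(len(points)) if j != i),
                    key=lambda j: d2(points[i], points[j]))
                for i in range(len(points))]
    return sorted((i, j) for i, j in enumerate(furthest) if i < j and furthest[j] == i)
```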

19.
It is shown in (Huang and Lin, 1991) that the kth-order robust nonlinear servomechanism problem can be solved by a class of linear controllers called kth-order robust servo-regulator. It is further shown in (Huang, 1995) that the kth-order robust servo-regulator, under one additional condition, also solves the (exact) robust nonlinear servomechanism problem. This paper further shows that the minimal dimension of this class of linear controllers is equal to the degree of the minimal polynomial of the k-fold exosystem multiplied by p, where p is the dimension of the regulated output. This result, coupled with a characterization of the minimal polynomial of the k-fold exosystem leads to a straightforward and efficient procedure to synthesize a minimal dimension linear robust servo-regulator.
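The dimension formula reduces to computing the degree of a minimal polynomial, which equals the smallest d for which I, A, …, A^d are linearly dependent. The sketch below takes the k-fold exosystem matrix as a given input (its construction is not described in the abstract) and uses a numerical rank test, so it is only reliable for small, well-conditioned matrices; names are hypothetical.

```python
import numpy as np


def minimal_poly_degree(A, tol=1e-9):
    """Smallest d such that I, A, ..., A^d are linearly dependent."""
    n = A.shape[0]
    P = np.eye(n)
    powers = [P.ravel()]
    for d in range(1, n + 1):            # Cayley-Hamilton guarantees d <= n
        P = P @ A
        powers.append(P.ravel())
        if np.linalg.matrix_rank(np.array(powers), tol=tol) < len(powers):
            return d
    return n


def min_servo_regulator_dim(A_kfold, p):
    """Minimal controller dimension: deg(minimal polynomial) * p."""
    return minimal_poly_degree(A_kfold) * p
```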

20.
ANTS: Agents on Networks, Trees, and Subgraphs   Total citations: 1, self-citations: 0, citations by others: 1
Efficient exploration of large networks is a central issue in data mining and network maintenance applications. Most existing work distinguishes between the active 'searcher', which both executes the algorithm and holds the memory, and the passive 'searched graph', over which the searcher has no control at all. Large dynamic networks like the Internet, where the nodes are powerful computers and the links have narrow bandwidth and are heavily loaded, call for a different paradigm, in which a noncentralized group of one or more lightweight autonomous agents traverses the network in a completely distributed and parallelizable way. Potential advantages of such a paradigm would be fault tolerance against network and agent failures, and reduced load on the busy nodes due to the small amount of memory and computing resources required by the agent at each node. Algorithms for network covering based on this paradigm could be used in today's Internet to support data mining and network control algorithms. Recently, a vertex ant walk method has been suggested [I.A. Wagner, M. Lindenbaum, A.M. Bruckstein, Ann. Math. Artificial Intelligence 24 (1998) 211–223] for searching an undirected, connected graph by an a(ge)nt that walks along the edges of the graph, occasionally leaving 'pheromone' traces at nodes and using those traces to guide its exploration. It was shown there that the ant can cover a static graph within time n·d, where n is the number of vertices and d is the diameter of the graph. In this work we further investigate the performance of the method on dynamic graphs, where edges may appear or disappear during the search process.
In particular, we prove that (a) if a certain spanning subgraph S is stable during the period of covering, then the method is guaranteed to cover the graph within time n·d_S, where d_S is the diameter of S, and (b) if a failure occurs on each edge with probability p, then the expected cover time is bounded from above by n·d·(log Δ / log(1/p) + (1 + p)/(1 − p)), where Δ is the maximum vertex degree in the graph. We also show that (c) if G is a static tree, then it is covered within time 2n.
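A minimal simulation of the walk on a static graph, under our reading of the vertex ant walk rule: the ant stamps the current time on the vertex it occupies and then moves to the neighbor with the oldest stamp (unvisited vertices count as oldest), breaking ties by vertex id. Dynamic edges and failures are not modeled, the graph is assumed connected, and names are hypothetical.

```python
def vertex_ant_walk(adj, start):
    """Return the number of moves until every vertex of a connected graph is visited."""
    sigma = {v: -1 for v in adj}           # -1 marks a vertex never visited
    visited = {start}
    u, t = start, 0
    while True:
        sigma[u] = t                       # leave a 'pheromone' trace
        if len(visited) == len(adj):
            return t
        u = min(adj[u], key=lambda w: (sigma[w], w))   # oldest trace first
        t += 1
        visited.add(u)
```

On the star below the ant covers four vertices in 5 moves, within the 2n = 8 bound stated for trees.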
