首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
How can we discover interesting patterns from time-evolving high-speed data streams? How to analyze the data streams quickly and accurately, with little space overhead? How to guarantee the found patterns to be self-consistent? High-speed data stream has been receiving increasing attention due to its wide applications such as sensors, network traffic, social networks, etc. The most fundamental task on the data stream is frequent pattern mining; especially, focusing on recentness is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams, which is a deterministic version of our motivating algorithm TwSample providing theoretical guarantees based on item sampling. TwMinSwap improves TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory spaces and do not require any prior knowledge on the stream such as its length and the number of distinct items in the stream. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping self-consistency of the discovered itemsets, which is the most important property for reliable results, while using O(k) memory space with the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets with reasonably large sizes (at most 5–7) depending on datasets. Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real world data streams, including the difference of trends between the winner and the loser of U.S. presidential candidates, and temporal human contact patterns.  相似文献   

2.
Given a simple undirected graph G = (V, E) and an integer k < |V|, the Sparsest k-Subgraph problem asks for a set of k vertices which induces the minimum number of edges. As a generalization of the classical independent set problem, Sparsest k-Subgraph is ????-hard and even not approximable unless ?????? in general graphs. Thus, we investigate Sparsest k-Subgraph in graph classes where independent set is polynomial-time solvable, such as subclasses of perfect graphs. Our two main results are the ????-hardness of Sparsest k-Subgraph on chordal graphs, and a greedy 2-approximation algorithm. Finally, we also show how to derive a P T A S for Sparsest k-Subgraph on proper interval graphs.  相似文献   

3.
In this paper, we identify and solve a multi-join optimization problem for Arbitrary Feature-based social image Similarity JOINs(AFS-JOIN). Given two collections(i.e., R and S) of social images that carry both visual, spatial and textual(i.e., tag) information, the multiple joins based on arbitrary features retrieves the pairs of images that are visually, textually similar or spatially close from different users. To address this problem, in this paper, we have proposed three methods to facilitate the multi-join processing: 1) two baseline approaches(i.e., a naïve join approach and a maximal threshold(MT)-based), and 2) a Batch Similarity Join(BSJ) method. For the BSJ method, given m users’ join requests, they are first conversed and grouped into m″ clusters which correspond to m″ join boxes, where m > m″. To speedup the BSJ processing, a feature distance space is first partitioned into some cubes based on four segmentation schemes; the image pairs falling in the cubes are indexed by the cube tree index; thus BSJ processing is transformed into the searching of the image pairs falling in some affected cubes for m″ AFS-JOINs with the aid of the index. An extensive experimental evaluation using real and synthetic datasets shows that our proposed BSJ technique outperforms the state-of-the-art solutions.  相似文献   

4.
The k-truss of a graph is the largest edge-induced subgraph such that every edge is contained in at least k triangles within the subgraph, where a triangle is a cycle consisting of three vertices. As a new notion of cohesive subgraphs, truss has recently attracted a lot of research attentions in the database and data mining fields. At the same time, uncertainty is an intrinsic property of massive graph data, and truss decomposition (i.e., finding all k-trusses of a graph) has become a key primitive on uncertain graphs. In this paper, we study the truss decomposition problem on uncertain graphs, that is, finding all highly probable k-trusses of an uncertain graph. We first give an formal statement of the truss decomposition problem on uncertain graphs. Then, we prove that the truss decomposition of an uncertain graph attains two elegant properties, namely uniqueness and hierarchy. We show that the truss decomposition of an uncertain graph can be found in \(O(m^{1.5}Q)\) time by proposing an in-memory algorithm called \(\mathtt {TD_{mem}}\), where m is the number of edges of the uncertain graph, and Q is at most the maximum number of common neighbors of the endpoints of an edge. When an uncertain graph is too large to fit into main memory, we propose an external-memory algorithm \(\mathtt {TD_{I/O}}\) to find the truss decomposition of the uncertain graph. Extensive experiments have been carried out to evaluate the practical performance of the proposed algorithms. The experimental results verify that both \(\mathtt {TD_{mem}}\) and \(\mathtt {TD_{I/O}}\) are efficient when an uncertain graph is small enough to fit into main memory, and that \(\mathtt {TD_{I/O}}\) is much faster than \(\mathtt {TD_{mem}}\) when the graph is too large to fit into main memory.  相似文献   

5.
How do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores and emerging in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., maximum k such that each vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., maximum k such that the k-core exists) obeys a 3-to-1 power-law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs, (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12 \(\times \) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17 \(\times \) faster than its competitors with comparable accuracy.  相似文献   

6.
Despite many algorithms for embedding graphs on unbounded grids, only a few results on embedding graphs on restricted grids have been published. In this paper, we study the problem of embedding paths and cycles on solid grid graphs. We show that a cycle of length k is unit-length embeddable on a solid grid graph G if k is an even integer between four and the length of the longest cycle of G. In addition, our result shows that a path of length k is unit-length embeddable on G, between its two given vertices s and t, if \(k\le L\) and \(k\equiv L (\mathrm{mod}\ 2)\), in which L is the length of the longest path between s and t. Our presented two algorithms show that such embeddings can be found in linear time for cycles and quadratic time for paths, with respect to the size of graph G. In the case of rectangular grid graphs, the running time of the algorithms can be improved to O(k) and O\((k^2)\), respectively. In addition, we extend our results to \(m\times n\times o\) 3D grids. A application of our result is in the interconnection network mapping in parallel processing.  相似文献   

7.
An outer-connected dominating set in a graph G = (V, E) is a set of vertices D ? V satisfying the condition that, for each vertex v ? D, vertex v is adjacent to some vertex in D and the subgraph induced by V?D is connected. The outer-connected dominating set problem is to find an outer-connected dominating set with the minimum number of vertices which is denoted by \(\tilde {\gamma }_{c}(G)\). In this paper, we determine \(\tilde {\gamma }_{c}(S(n,k))\), \(\tilde {\gamma }_{c}(S^{+}(n,k))\), \(\tilde {\gamma }_{c}(S^{++}(n,k))\), and \(\tilde {\gamma }_{c}(S_{n})\), where S(n, k), S +(n, k), S ++(n, k), and S n are Sierpi\(\acute {\mathrm {n}}\)ski-like graphs.  相似文献   

8.
The ambiguity of a nondeterministic finite automaton (NFA) N for input size n is the maximal number of accepting computations of N for inputs of size n. For every natural number k we construct a family \((L_{r}^{k}\;|\;r\in \mathbb{N})\) of languages which can be recognized by NFA’s with size k?poly(r) and ambiguity O(n k ), but \(L_{r}^{k}\) has only NFA’s with size exponential in r, if ambiguity o(n k ) is required. In particular, a hierarchy for polynomial ambiguity is obtained, solving a long standing open problem (Ravikumar and Ibarra, SIAM J. Comput. 19:1263–1282, 1989, Leung, SIAM J. Comput. 27:1073–1082, 1998).  相似文献   

9.
In this paper we consider both the maximization variant Max Rep and the minimization variant Min Rep of the famous Label Cover problem. So far the best approximation ratios known for these two problems were \(O(\sqrt{n})\) and indeed some authors suggested the possibility that this ratio is the best approximation factor for these two problems. We show, in fact, that there are a O(n 1/3)-approximation algorithm for Max Rep and a O(n 1/3log?2/3 n)-approximation algorithm for Min Rep. In addition, we also exhibit a randomized reduction from Densest k-Subgraph to Max Rep, showing that any approximation factor for Max Rep implies the same factor (up to a constant) for Densest k-Subgraph.  相似文献   

10.
The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time.  相似文献   

11.
The Planar Feedback Vertex Set problem asks whether an n-vertex planar graph contains at most k vertices meeting all its cycles. The Face Cover problem asks whether all vertices of a plane graph G lie on the boundary of at most k faces of G. Standard techniques from parameterized algorithm design indicate that both problems can be solved by sub-exponential parameterized algorithms (where k is the parameter). In this paper we improve the algorithmic analysis of both problems by proving a series of combinatorial results relating the branchwidth of planar graphs with their face cover. Combining this fact with duality properties of branchwidth, allows us to derive analogous results on feedback vertex set. As a consequence, it follows that Planar Feedback Vertex Set and Face Cover can be solved in \(O(2^{15.11\cdot\sqrt{k}}+n^{2})\) and \(O(2^{10.1\cdot\sqrt {k}}+n^{2})\) steps, respectively.  相似文献   

12.
In its simplest form, the longest common substring problem is to find a longest substring common to two or multiple strings. Using (generalized) suffix trees, this problem can be solved in linear time and space. A first generalization is the k -common substring problem: Given m strings of total length n, for all k with 2≤km simultaneously find a longest substring common to at least k of the strings. It is known that the k-common substring problem can also be solved in O(n) time (Hui in Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pp. 230–243, Springer, Berlin, 1992). A further generalization is the k -common repeated substring problem: Given m strings T (1),T (2),…,T (m) of total length n and m positive integers x 1,…,x m , for all k with 1≤km simultaneously find a longest string ω for which there are at least k strings \(T^{(i_{1})},T^{(i_{2})},\ldots,T^{(i_{k})}\) (1≤i 1<i 2<???<i k m) such that ω occurs at least \(x_{i_{j}}\) times in \(T^{(i_{j})}\) for each j with 1≤jk. (For x 1=???=x m =1, we have the k-common substring problem.) In this paper, we present the first O(n) time algorithm for the k-common repeated substring problem. Our solution is based on a new linear time algorithm for the k-common substring problem.  相似文献   

13.
Let \(H_{1}, H_{2},\ldots ,H_{n}\) be separable complex Hilbert spaces with \(\dim H_{i}\ge 2\) and \(n\ge 2\). Assume that \(\rho \) is a state in \(H=H_1\otimes H_2\otimes \cdots \otimes H_n\). \(\rho \) is called strong-k-separable \((2\le k\le n)\) if \(\rho \) is separable for any k-partite division of H. In this paper, an entanglement witnesses criterion of strong-k-separability is obtained, which says that \(\rho \) is not strong-k-separable if and only if there exist a k-division space \(H_{m_{1}}\otimes \cdots \otimes H_{m_{k}}\) of H, a finite-rank linear elementary operator positive on product states \(\Lambda :\mathcal {B}(H_{m_{2}}\otimes \cdots \otimes H_{m_{k}})\rightarrow \mathcal {B}(H_{m_{1}})\) and a state \(\rho _{0}\in \mathcal {S}(H_{m_{1}}\otimes H_{m_{1}})\), such that \(\mathrm {Tr}(W\rho )<0\), where \(W=(\mathrm{Id}\otimes \Lambda ^{\dagger })\rho _{0}\) is an entanglement witness. In addition, several different methods of constructing entanglement witnesses for multipartite states are also given.  相似文献   

14.
The effect of three-spin interaction k on thermal entanglement between alternate qubits is studied using pairwise concurrence C and energy-level diagram. It is found that k breaks the symmetry about the effect of magnetic field h on C. It shifts a dip structure and gradually effaces a boot structure when \(\left| k \right| <\left| J \right| \) (J is spin exchange coupling). A peak with C maintains 1 appears and expands, elbowing the dip backwards when \(\left| k \right| >\left| J \right| \). A sudden change in the concurrence occurs around \(\left| k \right| =\left| J \right| \), \(h=-k\). Similar conclusions about nearest-neighbor qubits are directly given.  相似文献   

15.
Non-negative matrix factorization (NMF) is widely used in feature extraction and dimension reduction fields. Essentially, it is an optimization problem to determine two non-negative low rank matrices \(W_{m \times k}\) and \(H_{k \times n}\) for a given matrix \(A_{m \times n}\), satisfying \(A_{m \times n} \approx W_{m \times k}H_{k \times n}\). In this paper, a novel approach to improve the image decomposing and reconstruction effects by introducing the Singular Value Decomposing (SVD)-based initialization scheme of factor matrices W and H, and another measure called choosing rule to determine the optimum value of factor rank k, are proposed. The input image is first decomposed using SVD to get its singular values and corresponding eigenvectors. Then, the number of main components as the rank value k is extracted. Then, the singular values and corresponding eigenvectors are used to initialize W and H based on selected rank k. Finally, convergent results are obtained using multiplicative and additive update rules. However, iterative NMF algorithms’ convergence is very slow on most platforms limiting its practicality. To this end, a parallel implementation frame of described improved NMF algorithm using CUDA, a tool for algorithms parallelization on massively parallel processors, i.e., many-core graphics processors, is presented. Experimental results show that our approach can get better decomposing effect than traditional NMF implementations and dramatic accelerate rate comparing to serial schemes as well as existing distributed-system implementations.  相似文献   

16.
Mutually independent Hamiltonian cycles in dual-cubes   总被引:1,自引:0,他引:1  
The hypercube family Q n is one of the most well-known interconnection networks in parallel computers. With Q n , dual-cube networks, denoted by DC n , was introduced and shown to be a (n+1)-regular, vertex symmetric graph with some fault-tolerant Hamiltonian properties. In addition, DC n ’s are shown to be superior to Q n ’s in many aspects. In this article, we will prove that the n-dimensional dual-cube DC n contains n+1 mutually independent Hamiltonian cycles for n≥2. More specifically, let v i V(DC n ) for 0≤i≤|V(DC n )|?1 and let \(\langle v_{0},v_{1},\ldots ,v_{|V(\mathit{DC}_{n})|-1},v_{0}\rangle\) be a Hamiltonian cycle of DC n . We prove that DC n contains n+1 Hamiltonian cycles of the form \(\langle v_{0},v_{1}^{k},\ldots,v_{|V(\mathit{DC}_{n})|-1}^{k},v_{0}\rangle\) for 0≤kn, in which v i k v i k whenever kk′. The result is optimal since each vertex of DC n has only n+1 neighbors.  相似文献   

17.
In secure delegatable computation, computationally weak devices (or clients) wish to outsource their computation and data to an untrusted server in the cloud. While most earlier work considers the general question of how to securely outsource any computation to the cloud server, we focus on concrete and important functionalities and give the first protocol for the pattern matching problem in the cloud. Loosely speaking, this problem considers a text T that is outsourced to the cloud \({\textsc {S}}\) by a sender \({\textsc {SEN}}\). In a query phase, receivers \({\textsc {REC}}_1, \ldots , {\textsc {REC}}_l\) run an efficient protocol with the server \({\textsc {S}}\) and the sender \({\textsc {SEN}}\) in order to learn the positions at which a pattern of length m matches the text (and nothing beyond that). This is called the outsourced pattern matching problem which is highly motivated in the context of delegatable computing since it offers storage alternatives for massive databases that contain confidential data (e.g., health-related data about patient history). Our constructions are simulation-based secure in the presence of semi-honest and malicious adversaries (in the random oracle model) and limit the communication in the query phase to O(m) bits plus the number of occurrences—which is optimal. In contrast to generic solutions for delegatable computation, our schemes do not rely on fully homomorphic encryption but instead use novel ideas for solving pattern matching, based on a reduction to the subset sum problem. Interestingly, we do not rely on the hardness of the problem, but rather we exploit instances that are solvable in polynomial time. A follow-up result demonstrates that the random oracle is essential in order to meet our communication bound.  相似文献   

18.
Data mining for client behavior analysis has become increasingly important in business, however further analysis on transactions and sequential behaviors would be of even greater value, especially in the financial service industry, such as banking and insurance, government and so on. In a real-world business application of taxation debt collection, in order to understand the internal relationship between taxpayers’ sequential behaviors (payment, lodgment and actions) and compliance to their debt, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as the itemsets showing the difference/discrimination between two classes/datasets (Dong and Li, 1999). However, the existing CP mining methods which can only mine itemset patterns, are not suitable for mining sequential patterns, such as time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. Therefore, to address this issue, we develop a CSP mining approach, e C S P, by using an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose some heuristics and interestingness filtering criteria, and integrate them into the CSP-tree seamlessly to reduce the search space and to find business-interesting patterns as well. The performance of the proposed approach is evaluated on three real-world datasets. In addition, we use a case study to show how to implement the approach to analyse taxpayer behaviour. The results show a very promising performance and convincing business value.  相似文献   

19.
We present two parameterized algorithms for the Minimum Fill-in problem, also known as Chordal Completion: given an arbitrary graph G and integer k, can we add at most k edges to G to obtain a chordal graph? Our first algorithm has running time \(\mathcal {O}(k^{2}nm+3.0793^{k})\), and requires polynomial space. This improves the base of the exponential part of the best known parameterized algorithm time for this problem so far. We are able to improve this running time even further, at the cost of more space. Our second algorithm has running time \(\mathcal {O}(k^{2}nm+2.35965^{k})\) and requires \(\mathcal {O}^{\ast}(1.7549^{k})\) space. To achieve these results, we present a new lemma describing the edges that can safely be added to achieve a chordal completion with the minimum number of edges, regardless of k.  相似文献   

20.
In the framework of parameterized complexity, exploring how one parameter affects the complexity of a different parameterized (or unparameterized problem) is of general interest. A well-developed example is the investigation of how the parameter treewidth influences the complexity of (other) graph problems. The reason why such investigations are of general interest is that real-world input distributions for computational problems often inherit structure from the natural computational processes that produce the problem instances (not necessarily in obvious, or well-understood ways). The max leaf number ml(G) of a connected graph G is the maximum number of leaves in a spanning tree for G. Exploring questions analogous to the well-studied case of treewidth, we can ask: how hard is it to solve 3-Coloring, Hamilton Path, Minimum Dominating Set, Minimum Bandwidth or many other problems, for graphs of bounded max leaf number? What optimization problems are W[1]-hard under this parameterization? We do two things:
  1. (1)
    We describe much improved FPT algorithms for a large number of graph problems, for input graphs G for which ml(G)≤k, based on the polynomial-time extremal structure theory canonically associated to this parameter. We consider improved algorithms both from the point of view of kernelization bounds, and in terms of improved fixed-parameter tractable (FPT) runtimes O *(f(k)).
     
  2. (2)
    The way that we obtain these concrete algorithmic results is general and systematic. We describe the approach, and raise programmatic questions.
     
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号