Similar literature
1.
A social stream is a data stream that records a series of social entities and the dynamic interactions between pairs of entities. It can be used to model changes of entity states in numerous applications. Social streams, which combine graph and streaming data, pose a great challenge to efficient analytical query processing and are key to better understanding users’ behavior. Because of privacy and other related issues, a social stream generator is of great significance. A framework for a synthetic social stream generator (SSG) is proposed in this paper. The social streams generated by SSG can be tuned to capture several kinds of fundamental social stream properties, including patterns of users’ behavior and graph patterns. Extensive empirical studies with several real-life social stream data sets show that SSG produces data that fits real data better. It is also confirmed that SSG can generate social stream data continuously with stable throughput and memory consumption. Furthermore, we propose a parallel implementation of SSG based on an asynchronous parallel processing model and a delayed update strategy. Our experiments verify that the throughput of the parallel implementation increases linearly as nodes are added.

2.
A community within a graph can be broadly defined as a set of vertices that exhibits high cohesiveness (a relatively high number of edges within the set) and low conductance (a relatively low number of edges leaving the set). Community detection is a fundamental graph analytic that can be applied in several application domains, including social networks. In this context, communities are often overlapping, as a person can be involved in more than one community (e.g., friends and family), and evolving, since the structure of the network changes. We address the problem of streaming overlapping community detection, where the goal is to maintain communities in the presence of streaming updates so that they can be updated more efficiently than by recomputing them. To this end, we introduce SONIC, a find-and-merge type of community detection algorithm that can efficiently handle streaming updates. SONIC first detects when graph updates yield significant community changes. Upon detection, it updates the communities via an incremental merge procedure. The SONIC algorithm incorporates two additional techniques to speed up the incremental merge: min-hashing and inverted indexes. Results show that SONIC provides high-quality overlapping communities while handling streaming updates several orders of magnitude faster than alternatives that recompute from scratch.
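
The abstract names min-hashing as one of the techniques used to speed up the merge step. The following is a minimal sketch of the general min-hashing idea only, not of SONIC itself: it estimates the Jaccard similarity of two member sets from short signatures. The function names, the integer member ids, and the choice of 64 hash functions are all assumptions made for illustration.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime; member ids are assumed to be smaller than this


def make_hashes(k, seed=0):
    """Return k (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(members, hashes):
    """MinHash signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % PRIME for x in members) for a, b in hashes]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


# Hypothetical communities given as sets of integer vertex ids.
hashes = make_hashes(k=64)
community_a = {1, 2, 3, 4, 5, 6}
community_b = {4, 5, 6, 7, 8, 9}
estimate = estimated_jaccard(signature(community_a, hashes),
                             signature(community_b, hashes))
print(f"estimated overlap: {estimate:.2f} (exact Jaccard is 3/9)")
```

Comparing fixed-length signatures instead of full member sets is what makes this kind of overlap check cheap; how SONIC combines it with inverted indexes is not described in the abstract.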

3.
Agile development aims at rapidly developing software while embracing the continuous evolution of user requirements throughout the development process. User stories are the primary means of requirements collection and elicitation in agile development. A project can involve a large number of user stories, which should be clustered into groups based on the similarity of their functionality for systematic requirements analysis, effective mapping to developed features, and efficient maintenance. Nevertheless, user story clustering is currently conducted mainly by hand, which is time-consuming and subject to human bias. In this paper, we propose a novel approach for clustering user stories automatically on the basis of natural language processing. Specifically, the sentence patterns of each component in a user story are first analysed and determined so that the critical structure in the representative tasks can be automatically extracted based on the user story meta-model. The similarity of user stories is then calculated and used to generate a connected graph as the basis of automatic user story clustering. We evaluate the approach on thirteen datasets and compare it against ten baseline techniques. Experimental results show that our clustering approach has higher accuracy, recall and F1-score than these baselines. It is demonstrated that the proposed approach can significantly improve the efficacy of user story clustering and thus enhance the overall performance of agile development. The study also highlights promising research directions for more accurate requirements elicitation.

4.
The rapidly growing amount of newswire stories stored in electronic devices raises new challenges for information retrieval technology. Traditional query-driven retrieval is not suitable for generic queries. It is desirable to have an intelligent system to automatically locate topically related events or topics in a continuous stream of newswire stories. This is the goal of automatic event detection. We propose a new approach to performing event detection from multilingual newswire stories. Unlike traditional methods which employ simple keyword matching, our method makes use of concept terms and named entities such as person, location, and organization names. Concept terms of a story are derived from statistical context analysis between sentences in the news story and stories in the concept database. We have conducted a set of experiments to study the effectiveness of our approach. The results show that the performance of detection using concept terms together with story keywords is better than traditional methods which only use keyword representation. © 2001 John Wiley & Sons, Inc.

5.
Recent years have witnessed increased interest in exploiting automatic annotation techniques for managing and retrieving media content. Previous studies on automatic annotation usually rely on metadata, which is often unavailable. Multimedia content, however, usually gives rise to frequent preference-sensitive interactions in the online social networks of public social media platforms, and these interactions can be organized as an interaction graph for intensive study. Inspired by this observation, we propose a novel media annotation method based on the analysis of the streaming social interactions around media content instead of the metadata. The basic assumption of our approach is that different types of social media content attract latent social groups with different preferences and thus generate different preference-sensitive interactions, which appear as localized dense subgraphs with clear preferences. To this end, we first iteratively select nodes from streaming records to build the preference-sensitive subgraphs, then uniformly extract several static and topological features to describe these subgraphs, and finally integrate these features into a learning-to-rank framework for automatic annotation. Extensive experiments on several real-world data sets clearly show that the proposed approach outperforms the baseline methods by a significant margin.

6.
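A local buffering server is deployed to improve stream transmission quality. Streaming media is applied in an open network-based English teaching system to provide QoS management, solve the buffering problem of audio and video streams, and provide corresponding mechanisms that support streaming-media QoS in a networked environment. Experimental results show that the stream architecture achieves good streaming-media playback in the online teaching environment and guarantees the QoS of audio and video streams. The architecture also manages and controls streams well, thereby ensuring the transmission quality of multimedia courseware.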

7.
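A local buffering server is deployed to improve stream transmission quality, and streaming media is applied in an open network-based English teaching system to provide QoS management, solve the buffering problem of audio and video streams, and provide corresponding mechanisms that support streaming-media QoS in a networked environment. Experimental results show that the stream architecture achieves good streaming-media playback in the online teaching environment and guarantees the QoS of audio and video streams. The architecture also manages and controls streams well, thereby ensuring the transmission quality of multimedia courseware.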

8.
Compressed representations have become effective for storing and accessing large Web and social graphs in order to support various graph querying and mining tasks. Existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed to from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs and retrieving each neighbor of a node in 0.6–1.0 microseconds (µs). When supporting both out- and in-neighbor queries, our technique generally offers the best time when using little space. If the reduced graph is instead represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; yet the time per neighbor extracted rises to around 5–20 µs. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks it achieves 4–13 bpe and 8–12 µs per out/in-neighbor retrieved, which improves upon all existing representations.
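
To illustrate why virtual nodes reduce edges, here is a minimal sketch under simplifying assumptions: the dense subgraph is a complete bipartite pattern in which every node of a source set points to every node of a target set, and the graph is a plain Python set of `(u, v)` pairs. The function names and the use of `-1` as the virtual node id are hypothetical; the paper's actual discovery and encoding steps are not reproduced here.

```python
def compress_biclique(edges, sources, targets, virtual):
    """Replace every edge s -> t (s in sources, t in targets) by s -> virtual -> t."""
    clique = {(s, t) for s in sources for t in targets}
    if not clique <= edges:
        raise ValueError("sources x targets is not fully present in the graph")
    kept = edges - clique
    kept |= {(s, virtual) for s in sources}
    kept |= {(virtual, t) for t in targets}
    return kept


def out_neighbors(edges, node, virtual_nodes):
    """Out-neighbors of a real node, expanding virtual nodes one level."""
    result = set()
    for u, v in edges:
        if u != node:
            continue
        if v in virtual_nodes:
            result |= {t for w, t in edges if w == v}
        else:
            result.add(v)
    return result


# Three pages that all link to the same four pages, plus one unrelated edge.
sources, targets = {1, 2, 3}, {10, 11, 12, 13}
original = {(s, t) for s in sources for t in targets} | {(1, 2)}
compressed = compress_biclique(original, sources, targets, virtual=-1)
print(len(original), "edges before,", len(compressed), "edges after")
```

A pattern with |S| sources and |T| targets costs |S|·|T| edges when stored directly but only |S|+|T| edges through a virtual node, at the price of one extra hop when listing neighbors.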

9.
Zhang Fan, Zou Lei, Zeng Li, Gou Xiangyang 《World Wide Web》2020,23(2):873-903

A streaming graph is a graph formed by a sequence of incoming edges with time stamps. Unlike static graphs, a streaming graph is highly dynamic and time-related. Real-world streaming graphs, such as data on internet traffic, social network communication, and financial transactions, have high volume and velocity and can be challenging for classic graph data structures. Traditional graph storage models such as the adjacency matrix and the adjacency list are no longer sufficient for such large amounts of data and high-frequency updates, and most streaming graph structures support only specific graph algorithms. Here a new data structure is presented to meet the challenge: a double orthogonal list in hash table (Dolha), a high-speed and highly memory-efficient graph structure. Dolha has constant time cost for processing a single edge and near-linear space cost. Moreover, the time cost of neighborhood queries in Dolha is linear, which enables it to support most graph algorithms without extra cost. A persistent structure based on Dolha is also presented to handle sliding window updates and time-related queries.


10.
Periodic subgraph mining in dynamic networks
In systems of interacting entities such as social networks, interactions that occur regularly typically correspond to significant, yet often infrequent and hard to detect, interaction patterns. To identify such regular behavior in streams of dynamic interaction data, we propose a new mining problem of finding a minimal set of periodically recurring subgraphs to capture all periodic behavior in a dynamic network. We analyze the computational complexity of the problem and show that it is polynomial, unlike many related subgraph or itemset mining problems. We propose an efficient and scalable algorithm to mine all periodic subgraphs in a dynamic network. The algorithm makes a single pass over the data and is also capable of accommodating imperfect periodicity. We demonstrate the applicability of our approach on several real-world networks and extract interesting and insightful periodic interaction patterns. We also show that periodic subgraphs can be an effective way to uncover and characterize the natural periodicities in a system.

11.
Advances in technology coupled with the availability of low‐cost sensors have resulted in the continuous generation of large time series from several sources. In order to visually explore and compare these time series at different scales, analysts need to execute online analytical processing (OLAP) queries that include constraints and group‐by's at multiple temporal hierarchies. Effective visual analysis requires these queries to be interactive. However, while existing OLAP cube‐based structures can support interactive query rates, the exponential memory requirement to materialize the data cube is often unsuitable for large data sets. Moreover, none of the recent space‐efficient cube data structures allow for updates. Thus, the cube must be re‐computed whenever there is new data, making them impractical in a streaming scenario. We propose Time Lattice, a memory‐efficient data structure that makes use of the implicit temporal hierarchy to enable interactive OLAP queries over large time series. Time Lattice is a subset of a fully materialized cube and is designed to handle fast updates and streaming data. We perform an experimental evaluation which shows that the space efficiency of the data structure does not hamper its performance when compared to the state of the art. In collaboration with signal processing and acoustics research scientists, we use the Time Lattice data structure to design the Noise Profiler, a web‐based visualization framework that supports the analysis of noise from cities. We demonstrate the utility of Noise Profiler through a set of case studies.

12.
Clustering entities into dense parts is an important issue in social network analysis. Real social networks usually evolve over time, and efficiently clustering dynamic social networks remains an open problem. In this paper, a dynamic social network is modeled as an initial graph with an infinite change stream, called the change stream model, which naturally eliminates the parameter setting problem of the snapshot graph model. Based on the change stream model, the incremental version of the well-known k-clique clustering problem is studied, and incremental k-clique clustering algorithms are proposed based on a local DFS (depth-first search) forest updating technique. It is theoretically proved that the proposed algorithms outperform the corresponding static ones and the incremental spectral clustering algorithm in terms of time complexity. The practical performance of our algorithms is extensively evaluated and compared with the baseline algorithms on the ENRON and DBLP datasets. Experimental results show that the incremental k-clique clustering algorithms are much more efficient than the corresponding static ones, do not accumulate errors as the incremental spectral clustering algorithm does, and capture evolving details of the clusters that algorithms based on the snapshot graph model miss.

13.
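Frequent subgraph mining is an important problem in data mining with wide applications. An efficient MapReduce-based frequent subgraph mining algorithm, Cloud-GFSG (cloud-global frequent subgraph), is implemented on the Hadoop platform. Following the Apriori idea, when extending edges to generate new subgraphs the algorithm uses the already mined frequent subgraphs of order k-1 to generate frequent subgraphs of order k. It also checks whether a subgraph about to be generated by extension already exists and defines a representation rule for the generated frequent subgraphs, which guarantees that each frequent subgraph is recorded uniquely. Compared with similar algorithms, it is more general when mining frequent subgraphs and avoids producing large numbers of duplicate graphs during edge extension, which preserves correctness and significantly improves running efficiency.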

14.
An efficient overlay is a crucial component of wireless cooperative live video streaming networks, an emerging wireless streaming solution with ever-increasing storage and computation capabilities that provides scalability, autonomy, conservation of carrier-billed network bandwidth, service coverage extension, etc. Based on whether routes are pre-calculated and maintained or determined per hop in reaction to each data piece, the streaming overlay can be classified as unstructured, structured, or hybrid. We discuss the issues, properties and example approaches of each category in detail, and present quantitative and qualitative comparisons of their strengths and weaknesses in terms of system robustness, overlay maintenance complexity, delivery ratio, end-to-end delay, etc. Finally, we discuss some open issues and emerging areas regarding overlay construction.

15.
16.
彭慧丽  张啸剑  金凯忠 《计算机科学》2017,44(Z6):395-398, 423
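Item recommendation in social networks based on users' friend relationships may leak users' private user-item preferences. Traditional anonymization methods are inherently fragile because they rely heavily on assumptions about specific background knowledge. A differential-privacy-based social network item recommendation method, DPSR, is proposed; it partitions users with a clustering technique and perturbs the weights of user-item edges with the Laplace mechanism. To overcome the influence of outliers among edge weights on the recommendation results, an edge-weight clustering method based on k-medoids is proposed, which uses the exponential mechanism to select the median of the edge-weight set within each cluster. Experimental results show that DPSR outperforms comparable methods.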
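
The Laplace mechanism mentioned in this abstract can be sketched in a few lines. This is a generic illustration rather than the DPSR method: each user-item edge weight receives independent Laplace noise with scale sensitivity/epsilon. The function names, the example ratings, and the default sensitivity of 1.0 are assumptions; the clustering and exponential-mechanism steps of the paper are not shown.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_edge_weights(weights, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to every edge weight."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {edge: w + laplace_noise(scale, rng) for edge, w in weights.items()}


# Hypothetical user-item edge weights (e.g. ratings).
ratings = {("alice", "item1"): 4.0, ("alice", "item2"): 1.0, ("bob", "item1"): 5.0}
noisy = perturb_edge_weights(ratings, epsilon=0.5, seed=42)
for edge, weight in noisy.items():
    print(edge, round(weight, 2))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy; the appropriate sensitivity depends on how much one user's data can change a weight, which this sketch simply takes as a parameter.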

17.
Named entity recognition (NER) methods are regarded as an efficient strategy for extracting the entities relevant to answering a given query. The aim of this work is to exploit conventional NER methods for analyzing a large set of short microtexts, in particular microtexts streaming on online social media such as Twitter. To do so, this paper proposes three properties of contextual association among the microtexts that are used to discover contextual clusters of microtexts, which can be expected to improve the performance of NER tasks. As a case study, we have applied the proposed NER system to Twitter. Experimental results demonstrate the feasibility of the proposed method (a precision of around 90.3%) for extracting relevant information in online social network applications.

18.
19.
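The stream programming model is a parallel programming model that has been widely studied in recent years. It has been applied successfully on stream architectures built around software-managed stream memory such as stream register files. Some studies also indicate that the stream programming model is equally suitable for architectures with hardware-managed coherent caches. GPGPU, currently the most important application setting for the stream programming model, has also gradually introduced general-purpose data caches, so exploiting the cache locality of stream programs becomes the key to improving their performance on such architectures. Because of the special execution model of stream programs, the process by which reuse turns into locality differs from that of traditional serial programs, and traditional locality analysis methods cannot be applied to stream programs directly. Based on an in-depth analysis of how reuse turns into locality, the concept of "iteration order" is proposed to describe how stream and serial programs differ in this process. Traditional locality analysis theory is then extended toward parallelism in light of the execution characteristics of stream programs, and an iteration-order-based locality analysis method is given. In addition, two cache locality optimization methods for stream programs are proposed based on the locality analysis model. Validation on the GPGPUSim simulation platform shows that the quantitative analysis of stream program locality is effective and that the proposed optimizations effectively improve the cache locality and performance of stream programs.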

20.
Since today’s real-world graphs, such as social network graphs, are evolving all the time, it is of great importance to perform graph computations and analysis in these dynamic graphs. Due to the fact that many applications such as social network link analysis with the existence of inactive users need to handle failed links or nodes, decremental computation and maintenance for graphs is considered a challenging problem. Shortest path computation is one of the most fundamental operations for managing and analyzing large graphs. A number of indexing methods have been proposed to answer distance queries in static graphs. Unfortunately, there is little work on answering such queries for dynamic graphs. In this paper, we focus on the problem of computing the shortest path distance in dynamic graphs, particularly on decremental updates (i.e., edge deletions). We propose maintenance algorithms based on distance labeling, which can handle decremental updates efficiently. By exploiting properties of distance labeling in original graphs, we are able to efficiently maintain distance labeling for new graphs. We experimentally evaluate our algorithms using eleven real-world large graphs and confirm the effectiveness and efficiency of our approach. More specifically, our method can speed up index re-computation by up to an order of magnitude compared with the state-of-the-art method, Pruned Landmark Labeling (PLL).
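
For readers unfamiliar with distance labeling, the sketch below shows only how a distance query is answered from 2-hop style labels, the kind of index that PLL builds; it does not implement the decremental maintenance algorithms of the paper. Each vertex stores distances to a few hub vertices, and the answer is the smallest sum over hubs shared by both labels. The label contents are hand-built for a tiny example graph and the function name is hypothetical.

```python
import math


def query_distance(labels, s, t):
    """Shortest-path distance from labels: min over shared hubs of d(s,h) + d(h,t)."""
    label_s, label_t = labels[s], labels[t]
    if len(label_s) > len(label_t):
        label_s, label_t = label_t, label_s  # iterate over the smaller label
    best = math.inf
    for hub, dist in label_s.items():
        if hub in label_t:
            best = min(best, dist + label_t[hub])
    return best


# Hand-built labels for the undirected path a - b - c plus an isolated vertex d.
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"c": 0, "b": 1},
    "d": {"d": 0},
}
print(query_distance(labels, "a", "c"))
print(query_distance(labels, "a", "d"))
```

Deleting an edge can invalidate stored hub distances, which is why maintaining such labels under decremental updates is the hard part that the abstract addresses.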
