Similar Articles
20 similar articles found (search time: 15 ms)
1.
We propose a novel collaborative approach for document classification that combines the knowledge of multiple users to improve the organization of data such as personal document repositories or emails. To this end, we distribute locally built classification models in a network of participating users and combine the shared classifiers into more powerful meta models. To increase propagation efficiency, we select the most discriminative model components and transmit only those to other participants. In experiments on four large standard text classification collections, we study the resulting trade-offs between network cost and classification accuracy. The results show that the proposed model propagation has negligible communication costs and substantially outperforms current approaches in both efficiency and classification quality.
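The selective-propagation idea can be made concrete with a small sketch: each user shares only the strongest components of a locally trained linear model, and receivers average whatever arrives into a meta model. This is a minimal sketch assuming linear classifiers and top-k selection by weight magnitude; the function names are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch: propagate only the most discriminative components
# (largest-magnitude weights) of a linear classifier, then merge shared
# models into a meta model by averaging per-component.
import numpy as np

def select_components(weights: np.ndarray, k: int) -> dict[int, float]:
    """Keep only the k weights with the largest absolute value."""
    top = np.argsort(np.abs(weights))[-k:]
    return {int(i): float(weights[i]) for i in top}

def merge_models(shared: list[dict[int, float]], dim: int) -> np.ndarray:
    """Average the sparse weight vectors received from other users."""
    meta, counts = np.zeros(dim), np.zeros(dim)
    for model in shared:
        for i, w in model.items():
            meta[i] += w
            counts[i] += 1
    nonzero = counts > 0
    meta[nonzero] /= counts[nonzero]
    return meta

# Example: three users each share their 5 strongest components of a 100-dim model.
rng = np.random.default_rng(0)
local_models = [rng.normal(size=100) for _ in range(3)]
shared = [select_components(w, k=5) for w in local_models]
meta_model = merge_models(shared, dim=100)
```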

2.
Model-driven engineering (MDE) is increasingly accepted in industry as an effective approach for managing the full life cycle of software development. In MDE, software models are manipulated, evolved, and translated by model transformations (MT), up to code generation. Automatic deductive verification techniques have been proposed to guarantee that transformations satisfy correctness requirements (encoded as transformation contracts). However, to be transferable to industry, these techniques need to be scalable and provide the user with easily accessible feedback. In MT-specific languages like ATL, static trace information can be inferred (i.e., mappings between the types of generated elements and the rules that potentially generate those types). In this paper, we show that this information can be used to decompose the MT contract and, for each sub-contract, slice the MT down to only those rules that may be responsible for fulfilling it. Building on this contribution, we design a fault localization approach for MT and a technique that significantly enhances scalability when verifying large MTs against a large number of contracts. We implement both algorithms as extensions of the VeriATL verification system and show experimentally that they increase its industry readiness.
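The slicing step can be pictured with a toy sketch: given static trace information mapping rules to the target types they can generate, a contract that only constrains some types needs only the rules producing those types. The data structures and rule names below are illustrative assumptions, not VeriATL's API.

```python
# Static trace information: which transformation rules can generate
# which target types (rule names here are hypothetical).
rule_generates = {
    "Class2Table": {"Table"},
    "Attr2Column": {"Column"},
    "Ref2FKey":    {"Column", "ForeignKey"},
}

def slice_rules(contract_types: set[str]) -> set[str]:
    """Return only the rules that may contribute to fulfilling the contract."""
    return {rule for rule, types in rule_generates.items()
            if types & contract_types}

# A sub-contract about Columns only needs two of the three rules:
print(slice_rules({"Column"}))   # {'Attr2Column', 'Ref2FKey'}
```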

3.
Performance-heterogeneous multicore processors (HMPs for brevity), which consist of multiple cores with the same instruction set but different performance characteristics (e.g., clock speed, issue width), have attracted great interest because they can deliver higher performance per watt and per unit area than comparable homogeneous processors for programs with diverse architectural requirements. However, these power and area efficiencies are achieved only when workloads are matched with cores according to both the properties of the workload and the features of the cores.

4.
Epidemic-style (gossip-based) techniques have recently emerged as a class of scalable and reliable protocols for peer-to-peer multicast dissemination in large process groups. However, popular implementations of epidemic-style dissemination suffer from two major drawbacks. 1) Network overhead: when deployed at WAN or VPN scale, they generate a large number of packets that transit across the boundaries of multiple network domains (e.g., LANs, subnets, ASs), overloading core network elements such as bridges, routers, and associated links. 2) Lack of adaptivity: they impose the same load on process group members and the network even when failure rates (viz., packet losses, process failures) are low. In this paper, we describe two protocols to address these problems: 1) a hierarchical gossiping protocol and 2) an adaptive dissemination framework (for multicasts) that can use any gossiping primitive within it. These protocols work within a virtual peer-to-peer hierarchy called the leaf box hierarchy. Processes can be allocated to the leaf boxes of this structure in a topologically aware manner, so that protocols 1 and 2 produce low traffic across domain boundaries in the network and induce minimal overhead when there are no failures.
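The key trick behind topology-aware gossiping can be sketched in a few lines: with high probability a process gossips inside its own leaf box (e.g., its own subnet), and only rarely across higher-level domain boundaries. The hierarchy encoding and the probability below are illustrative assumptions, not the protocol's actual parameters.

```python
import random

def gossip_target(me: str, my_box: tuple[int, ...],
                  boxes: dict[tuple[int, ...], list[str]],
                  p_local: float = 0.9) -> str:
    """Pick a gossip target, preferring members of the sender's own leaf box."""
    if random.random() < p_local:
        candidates = [m for m in boxes[my_box] if m != me]
    else:  # rare cross-domain gossip keeps the hierarchy connected
        candidates = [m for box, ms in boxes.items() if box != my_box for m in ms]
    return random.choice(candidates)

# Two leaf boxes (e.g., two subnets); "p1" mostly gossips within its own box.
boxes = {(0, 0): ["p1", "p2", "p3"], (0, 1): ["p4", "p5"]}
targets = [gossip_target("p1", (0, 0), boxes) for _ in range(1000)]
print(sum(t in boxes[(0, 1)] for t in targets) / 1000)  # close to 0.1
```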

5.
RDF Site Summaries are an application of RDF on the Web that has grown considerably in popularity. However, the way RSS systems operate today limits their scalability. Current RSS feed aggregators follow a pull-based architecture model, which will not scale with the increasing number of RSS feeds becoming available on the Web. In this paper, we introduce G-ToPSS, a scalable publish/subscribe system for selective information dissemination. G-ToPSS sends only newly updated information to the interested user and follows a push-based architecture model. It is particularly well suited to applications that deal with large-volume content distribution from diverse sources. G-ToPSS also allows an ontology to be used to provide additional information about the disseminated data. We have implemented and experimentally evaluated G-ToPSS, and we provide results demonstrating its scalability compared with alternative approaches. In addition, we describe an application of G-ToPSS and RSS to a Web-based content management system that provides an expressive, efficient, and convenient update notification dissemination system.
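The push-based model can be illustrated with a toy matcher: subscriptions are triple patterns (with wildcards), and each newly published RDF-like triple is pushed only to the subscribers whose patterns match. This toy is an assumption for illustration, not G-ToPSS's graph-based matching algorithm.

```python
# A subscription is a (subject, predicate, object) pattern; None is a wildcard.
Subscription = tuple[str | None, str | None, str | None]

subscriptions: dict[str, Subscription] = {
    "alice": ("feed:cnn", "dc:subject", None),   # anything CNN tags
    "bob":   (None, "dc:creator", "Jane Doe"),   # anything by Jane Doe
}

def matches(pattern: Subscription, triple: tuple[str, str, str]) -> bool:
    return all(p is None or p == t for p, t in zip(pattern, triple))

def publish(triple: tuple[str, str, str]) -> None:
    """Push the new triple only to interested subscribers (no polling)."""
    for user, pattern in subscriptions.items():
        if matches(pattern, triple):
            print(f"push {triple} -> {user}")

publish(("feed:cnn", "dc:subject", "politics"))  # pushed to alice only
```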

6.
To achieve high classification efficiency in intrusion detection, this paper proposes a compressed model that combines horizontal compression with vertical compression. OneR is used as horizontal compression for attribute reduction, and affinity propagation is employed as vertical compression to select a small set of representative exemplars from large training data. To compress large volumes of training data scalably, a MapReduce-based parallelization of each step of the compression process is implemented and evaluated; common but efficient classification methods can then be applied directly to the compressed model. An experimental study on two publicly available intrusion detection datasets, KDD99 and CMDC2012, demonstrates that classification using the proposed compressed model can speed up the detection procedure by up to 184 times, at the cost of an average accuracy loss of less than 1%.
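The two compression directions can be sketched on a toy dataset: horizontal compression keeps the most predictive attributes via a OneR-style per-attribute error score, and vertical compression keeps only the exemplars chosen by affinity propagation. Parameter choices and helper names below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def oner_error(feature: np.ndarray, y: np.ndarray) -> float:
    """OneR-style score: error of predicting the majority class per feature value."""
    errors = 0
    for v in np.unique(feature):
        labels = y[feature == v]
        errors += len(labels) - np.bincount(labels).max()
    return errors / len(y)

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 6))       # discretized attributes
y = (X[:, 0] + X[:, 1] > 2).astype(int)     # only two attributes matter

# Horizontal: keep the 2 attributes with the lowest OneR error.
scores = [oner_error(X[:, j], y) for j in range(X.shape[1])]
X_h = X[:, np.argsort(scores)[:2]]

# Vertical: keep only the exemplar rows found by affinity propagation.
ap = AffinityPropagation(random_state=0).fit(X_h)
exemplars = ap.cluster_centers_indices_
X_compressed, y_compressed = X_h[exemplars], y[exemplars]
print(X.shape, "->", X_compressed.shape)
```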

7.
In this paper, we emphasize the importance of efficient debugging in formal verification and present capabilities that we have developed to aid debugging in Intel's Formal Verification Environment. We have given the name "Counter-Example Wizard" to the bundle of capabilities we developed to address the needs of the verification engineer in the context of counter-example diagnosis and rectification. The novel features of the Counter-Example Wizard are its multi-value counter-example annotation, constraint-based debugging, and multiple counter-example generation mechanisms. Our experience with the verification of real-life Intel designs shows that these capabilities complement one another and can help the verification engineer diagnose and fix a reported failure. We use real-life verification cases to illustrate how our system solution can significantly reduce the time spent in the loop of model checking, specification, and design modification.

8.
9.
10.
Graphs are natural candidates for modeling application domains such as social networks, pattern recognition, citation networks, and protein–protein interactions. One of the most challenging tasks in managing graphs is subgraph matching over data graphs, which attempts to find one-to-one correspondences, called solutions, between the query and data nodes. To compute solutions, most contemporary techniques use backtracking and recursion. An open research question is whether graphs can be matched based on parts, with local solutions combined to reach a global matching. In this paper, we present a graph-decomposition-based approach called SGMatch for matching graphs. We represent graphs in smaller units called graphlets and develop a matching technique that leverages this representation. Pruning strategies use a new notion of edge covering called minimum hub cover, together with metadata such as statistics and inverted indices, to reduce the number of matching candidates. Our evaluation of SGMatch against contemporary algorithms (VF2, GraphQL, QuickSI, GADDI, and SPath) shows that SGMatch substantially improves the performance of the current state of the art for larger query graphs with different structures, such as cliques, paths, and general subgraphs.
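To give a feel for the covering idea, here is a hedged sketch of a greedy hub cover: repeatedly pick the node whose incident edges cover the most still-uncovered edges until every edge is covered. This greedy approximation is an illustration only; SGMatch's exact minimum hub cover computation may differ.

```python
def greedy_hub_cover(edges: set[frozenset[str]]) -> set[str]:
    """Greedily choose 'hub' nodes until every edge touches some hub."""
    uncovered = set(edges)
    hubs: set[str] = set()
    while uncovered:
        # Count how many uncovered edges each node would newly cover.
        gain: dict[str, int] = {}
        for e in uncovered:
            for v in e:
                gain[v] = gain.get(v, 0) + 1
        best = max(gain, key=gain.get)
        hubs.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return hubs

# A small graph: a triangle plus a pendant edge.
edges = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
print(greedy_hub_cover(edges))  # e.g. {'c', 'a'} or {'c', 'b'}
```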

11.
Debugging a program can be viewed as performing queries and updates on a database that contains program source information as well as the state of the executing program. This approach integrates the facilities of a traditional debugger into a programming environment by providing access to runtime information through normal database query operations. We are building a programming environment in which all program information is stored in a relational database system. This system will provide the programmer with a simple yet powerful mechanism for describing debugging requests.
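A toy sketch of "debugging as database queries": snapshots of the running program's variables go into a relational table, and a debugging request becomes an ordinary SQL query. The schema below is an illustrative assumption, not the environment described in the paper.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE var_state (step INTEGER, func TEXT, name TEXT, value TEXT)")

# Record some execution history (normally captured by the runtime).
history = [
    (1, "main", "x", "0"),
    (2, "loop", "x", "1"),
    (3, "loop", "x", "2"),
    (4, "loop", "i", "7"),
]
db.executemany("INSERT INTO var_state VALUES (?, ?, ?, ?)", history)

# Debugging request: "when did x change inside loop()?"
for row in db.execute(
        "SELECT step, value FROM var_state WHERE func = 'loop' AND name = 'x'"):
    print(row)   # (2, '1') and (3, '2')
```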

12.
Meeyoung, W., Jennifer, Aman, Sue. Computer Networks, 2009, 53(16): 2825–2839
There is a growing need for large-scale distribution of real-time multicast data such as Internet TV channels and scientific and financial data feeds. Internet Service Providers (ISPs) face an urgent challenge in supporting these services: they need to design multicast routing paths that are reliable, cost-effective, and scalable. To meet the real-time constraint, the routing paths need to be robust against a single IP router or link failure, as well as against multiple simultaneous failures caused by shared fiber spans (SRLGs). Several algorithms have been proposed to solve this problem in the past. However, they are not suitable for today's large networks, because either they do not find a feasible solution at all or, if they do, they take a significant amount of time to arrive at high-quality solutions. In this paper, we present a new Integer Linear Programming (ILP) model for designing a cost-effective and robust multicast infrastructure. Our ILP model is extremely efficient and can be extended to produce quality-guaranteed network paths. We develop two heuristic algorithms for solving the ILP, which are guaranteed to find high-quality, feasible solutions even for very large networks. We evaluate the proposed algorithms using the topologies of four operational backbones and demonstrate their scalability. We also compare the capital expenditure of the resulting multicast designs with that of existing approaches. The evaluation not only confirms the efficacy of our algorithms but also shows that they significantly outperform existing schemes.
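To give a flavor of such a model, a heavily simplified ILP might select links x_e to build and, for each receiver r, route two paths f^{r,1}, f^{r,2} that share no SRLG. This toy formulation (with a stricter-than-necessary disjointness constraint) is an illustration under stated assumptions, not the paper's actual model:

```latex
\begin{aligned}
\min\; & \sum_{e \in E} c_e\, x_e
  && \text{minimize total link cost}\\
\text{s.t.}\;
& \sum_{e \in \delta^{+}(v)} f^{r,k}_e - \sum_{e \in \delta^{-}(v)} f^{r,k}_e
  = \begin{cases} 1 & v = s \\ -1 & v = r \\ 0 & \text{otherwise} \end{cases}
  && \forall r,\; k \in \{1,2\} \quad \text{(flow conservation)}\\
& \sum_{e \in g} \big( f^{r,1}_e + f^{r,2}_e \big) \le 1
  && \forall r,\; \forall g \in \mathcal{G} \quad \text{(SRLG-disjointness, simplified)}\\
& f^{r,k}_e \le x_e, \qquad f^{r,k}_e,\, x_e \in \{0,1\}
  && \text{(use only built links)}
\end{aligned}
```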

13.
14.
To address the poor scalability or low access efficiency of existing metadata management methods, this paper proposes an efficient and scalable metadata management method. Dynamic scalability of the metadata system is achieved through extendible hashing; a parallel location method enables efficient metadata-server lookup; and a dynamic K-ary tree organization of metadata speeds up the selection of metadata to migrate when metadata servers are added. Experimental results show that the method scales nearly linearly and that the time to select metadata for migration is close to zero, effectively solving the problem of efficient, dynamically scalable metadata management in cloud computing environments.
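A hedged sketch of locating a metadata server with extendible hashing: a directory of 2**depth slots is indexed by the low bits of the path's hash, and doubling the directory leaves existing lookups unchanged until individual buckets split. Names and the split policy are illustrative assumptions, not the paper's design.

```python
import hashlib

class ExtendibleDirectory:
    def __init__(self, depth: int, servers: list[str]):
        self.depth = depth
        # Slot i -> metadata server responsible for hashes whose low bits are i.
        self.table = [servers[i % len(servers)] for i in range(2 ** depth)]

    def locate(self, path: str) -> str:
        h = int.from_bytes(hashlib.md5(path.encode()).digest()[:8], "big")
        return self.table[h & (2 ** self.depth - 1)]

    def double(self) -> None:
        """Grow the directory; all existing entries stay valid (near-zero migration)."""
        self.table = self.table * 2
        self.depth += 1

d = ExtendibleDirectory(depth=2, servers=["mds0", "mds1"])
before = d.locate("/home/user/file.txt")
d.double()                      # make room for new servers
assert d.locate("/home/user/file.txt") == before  # lookups unchanged until a split
```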

15.
Efficient and scalable search on scale-free P2P networks
Unstructured peer-to-peer (P2P) systems (e.g., Gnutella) are characterized by uneven distributions of node connectivity and file sharing. The existence of "hub" nodes with a large number of connections and "generous" nodes that share many files significantly influences the performance of information search over P2P file-sharing networks. In this paper, we present a novel Scalable Peer-to-Peer Search (SP2PS) method with low maintenance overhead for resource discovery in scale-free P2P networks. Unlike existing search methods that employ a single heuristic to direct searches, SP2PS achieves better performance by considering both the number of shared files and the connectivity of each neighbouring node. SP2PS enables peer nodes to forward queries to the neighbours that are more likely to have the requested files and that can also help find the requested files in future hops. The proposed method has been simulated on different power-law networks with different forwarding degrees and distances. Our analytic and simulation results show that SP2PS achieves better performance than other related methods.
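An illustrative neighbour-scoring sketch in the spirit of SP2PS: rank each neighbour by a combination of how many files it shares and how connected it is, then forward the query to the top-ranked neighbours. The weighting scheme and names are assumptions, not the paper's exact heuristic.

```python
def rank_neighbours(neighbours: dict[str, tuple[int, int]],
                    alpha: float = 0.5, k: int = 2) -> list[str]:
    """neighbours maps node -> (shared_files, degree); return top-k forwarding targets."""
    def score(node: str) -> float:
        files, degree = neighbours[node]
        return alpha * files + (1 - alpha) * degree
    return sorted(neighbours, key=score, reverse=True)[:k]

# A generous node and a hub both outrank a poorly connected, stingy peer.
neighbours = {"hub": (5, 120), "generous": (400, 8), "plain": (3, 4)}
print(rank_neighbours(neighbours))  # ['generous', 'hub']
```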

16.
Local selection is a simple selection scheme in evolutionary computation. Individual fitnesses are accumulated over time and compared to a fixed threshold, rather than to each other, to decide who gets to reproduce. Local selection, coupled with fitness functions stemming from the consumption of finite shared environmental resources, maintains diversity in a way similar to fitness sharing. However, it is more efficient than fitness sharing and lends itself to parallel implementations for distributed tasks. While local selection is not prone to premature convergence, it applies minimal selection pressure to the population. Local selection is, therefore, particularly suited to Pareto optimization and to problem classes where diverse solutions must be covered. This paper introduces ELSA, an evolutionary algorithm employing local selection, and outlines three experiments in which ELSA is applied to multiobjective problems: a multimodal graph search problem and two Pareto optimization problems. In all these experiments, ELSA significantly outperforms other well-known evolutionary algorithms. The paper also discusses scalability, parameter dependence, and potential distributed applications of the algorithm.
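A minimal sketch of the mechanism: each candidate accumulates energy by consuming a finite resource tied to its "niche", clones itself when its energy crosses a fixed threshold, and dies at zero. Crowded niches yield less energy per occupant, which is what maintains diversity. All constants and the niche model are illustrative assumptions.

```python
import random

NICHES, THRESHOLD, COST, REFILL = 5, 10.0, 1.0, 30.0
population = [random.randint(0, NICHES - 1) for _ in range(20)]  # niche id per agent
energy = {i: 5.0 for i in range(len(population))}

for step in range(50):
    # Finite resource per niche, split among its occupants this step.
    occupants = {n: sum(1 for p in population if p == n) for n in set(population)}
    next_pop, next_energy = [], {}
    for i, niche in enumerate(population):
        e = energy[i] + REFILL / (NICHES * occupants[niche]) - COST
        if e <= 0:
            continue                      # agent dies
        if e >= THRESHOLD:                # reproduce, splitting energy with offspring
            next_pop.append(niche)
            next_energy[len(next_pop) - 1] = e / 2
            e /= 2
        next_pop.append(niche)
        next_energy[len(next_pop) - 1] = e
    population, energy = next_pop, next_energy

# Diverse niches persist because crowded niches pay out less energy per agent.
print(sorted(set(population)))
```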

17.
Two-tier streaming settings are a typical dynamic environment where continuous skylines represent an important semantic indicator for multiple attributes. To monitor skylines over the dynamic data in such settings, one needs to continuously update the skyline query results in order to reflect the new data values. This paper tackles the problem of continuous skyline monitoring on a central query server over dynamic data from multiple data sites. Simply sending the updates of tuple values to the server is cost-prohibitive. Therefore, we propose an approach that allows the central server to collaborate with the data sites to monitor the possible skyline changes. By doing so, the processing load is distributed over all the data sites instead of only on the central server. Furthermore, this collaborative approach minimizes the bandwidth consumption between the server and the data sites, which is often critical in a widely distributed environment such as a wide-area sensor network. We give theoretical upper bounds for the computation costs and communication costs of the proposed collaborative approach. We also conduct extensive experiments on both synthetic and real data sets. The experimental results demonstrate that our collaborative approach is efficient, scalable and well-balanced in terms of communication costs and computation costs.
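The core skyline operation behind such monitoring is a dominance test plus an incremental update when a site reports a new tuple (smaller is better in every dimension here). The server/site maintenance protocol is not shown; this is an assumption-laden toy.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_skyline(skyline: list[tuple[float, ...]],
                   new: tuple[float, ...]) -> list[tuple[float, ...]]:
    if any(dominates(s, new) for s in skyline):
        return skyline                       # new tuple is dominated: no change
    # Otherwise drop everything the new tuple dominates and add it.
    return [s for s in skyline if not dominates(new, s)] + [new]

skyline: list[tuple[float, ...]] = []
for t in [(3, 7), (5, 4), (4, 6), (2, 8), (1, 1)]:
    skyline = update_skyline(skyline, t)
print(skyline)   # [(1, 1)] -- it dominates every other tuple
```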

18.
Media authentication for wireless transmission is becoming an increasingly important issue. Authenticated media content frequently needs to be transcoded at intermediaries to accommodate heterogeneous applications. In this paper, a general and efficient authentication approach is proposed for scalable lossy media streams. First, a joint coding and stream authentication (JCSA) media transmission system is described for a heterogeneous wireless network. For the JCSA system, a novel structure-maintained packetization is designed to enable flexible transcoding. Second, to obtain optimal end-to-end quality and minimize the authentication overhead, a quality-optimized stream authentication (QOSA) framework is proposed for authenticating media content. Finally, an implementation of the proposed QOSA optimization framework on the Consultative Committee for Space Data Systems image data compression (CCSDS IDC) coder is presented, combining graph-based and error-correction-coding-based (ECC-based) approaches. Experimental results demonstrate that the scheme achieves the desired goal: it provides high robustness against packet loss at the cost of a very low overhead.
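A toy hash-chain sketch, the simplest special case of graph-based stream authentication: each packet embeds the hash of the next, so one signature on the first packet's digest amortizes over the whole stream. Real schemes use richer authentication graphs and ECC for loss robustness; names and the chain shape here are illustrative assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

packets = [b"hdr", b"seg1", b"seg2", b"seg3"]

# Build the chain back-to-front: packet i carries the hash of packet i+1.
authed = [packets[-1]]
for p in reversed(packets[:-1]):
    authed.insert(0, p + h(authed[0]))

root_digest = h(authed[0])        # this single digest would be signed once

def verify(stream: list[bytes], signed_root: bytes) -> bool:
    if h(stream[0]) != signed_root:
        return False
    for cur, nxt in zip(stream, stream[1:]):
        if cur[-32:] != h(nxt):   # embedded hash must match the following packet
            return False
    return True

print(verify(authed, root_digest))                                   # True
print(verify(authed[:1] + [b"tampered"] + authed[2:], root_digest))  # False
```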

19.
In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile, and completely automated, i.e., user independent. It performs adaptive binarization/quantization without penalizing information loss. This model can serve many purposes. For instance, we rely on it to carry out the first steps toward advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to demonstrate our contribution over the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts.
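For orientation, here is a generic color-quantization sketch (k-means in RGB space) of the kind of quantization step such a pipeline relies on; the paper's hierarchical, fully automated method is more elaborate. The cluster count and the toy image are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in for a page scan

pixels = image.reshape(-1, 3).astype(float)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)

# Replace every pixel by its cluster's mean color -> a 4-color version of the page.
quantized = km.cluster_centers_[km.labels_].reshape(image.shape).astype(np.uint8)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))   # 4
```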

20.
With the fast-growing number of images on photo-sharing websites such as Flickr and Picasa, there is an urgent need for scalable multi-label propagation algorithms for image indexing, management, and retrieval. It is well acknowledged that analysis at the semantic region level can greatly improve image annotation performance compared with analysis at the holistic image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges for most existing algorithms. In this work, we present a novel framework that effectively computes pairwise image similarity by accumulating information from semantic image regions. First, each image is encoded as a Bag-of-Regions based on multiple image segmentations. Second, all image regions are separated into buckets with an efficient locality-sensitive hashing (LSH) method, which guarantees high collision probabilities for similar regions. The k-nearest neighbors of each image and the corresponding similarities can then be efficiently approximated from these indexed patches. Finally, the sparse, region-aware image similarity matrix is fed into a multi-label extension of the entropic graph regularized semi-supervised learning algorithm [1]. In combination, these steps naturally yield the capability of handling large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the proposed framework for region-aware, scalable multi-label propagation.
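A hedged sketch of the LSH bucketing step: random-hyperplane (SimHash-style) codes send similar region descriptors to the same bucket with high probability, and image similarity can then be approximated by counting the buckets where two images' regions collide. Dimensions, bit count, and names are assumptions, not the paper's configuration.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
D, BITS = 32, 12
planes = rng.normal(size=(BITS, D))           # shared random hyperplanes

def lsh_key(region: np.ndarray) -> int:
    """SimHash-style bucket key: the sign pattern of the projections."""
    return int.from_bytes(np.packbits(planes @ region > 0).tobytes(), "big")

# Each image is a bag of region descriptors (here random, with one near-duplicate).
shared = rng.normal(size=D)
images = {
    "img_a": [shared + 0.01 * rng.normal(size=D), rng.normal(size=D)],
    "img_b": [shared + 0.01 * rng.normal(size=D), rng.normal(size=D)],
}

buckets: dict[int, set[str]] = defaultdict(set)
for name, regions in images.items():
    for r in regions:
        buckets[lsh_key(r)].add(name)

# Similarity ~ number of buckets where both images collide.
collisions = sum(1 for members in buckets.values() if len(members) > 1)
print(collisions)   # likely 1: the near-duplicate shared region collides
```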
