Similar Documents
Found 20 similar documents (search time: 455 ms).
1.
This article presents XML-based tools for parser generation and data binding generation. The underlying concept is that of transformation between formal languages, which is a form of meta-programming. We discuss the benefits of such a declarative approach with well-defined semantics: productivity, maintainability, verifiability, performance and safety.

2.
In this paper, a new approach called ‘instance variant nearest neighbor’ approximates the regression surface of a function using the concept of k nearest neighbors. Instead of a fixed k for the entire dataset, our assumption is that there is an optimal k for each data instance that best approximates the original function by fitting the local region. This approach can benefit noisy datasets, where local regions form data characteristics that differ from the major data clusters. We formulate finding such a k for each data instance as a combinatorial optimization problem, which we solve with particle swarm optimization. The particle swarm optimization is extended with a rounding scheme that rounds continuous-valued candidate solutions up or down to integers, i.e., to a number of neighbors k. We apply the new approach to five real-world regression datasets and compare its prediction performance with that of other function approximation algorithms, including the standard k nearest neighbor, the multi-layer perceptron, and support vector regression. The instance variant nearest neighbor outperforms these algorithms on several datasets and, in addition, produces consistent outputs on datasets where the other algorithms perform poorly.
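A minimal sketch of the idea, assuming nearest-integer rounding as the rounding scheme and a leave-self-out squared error as the PSO fitness; all names and hyper-parameters below are our own choices, not the authors' implementation:

```python
# Sketch only: per-instance k for k-NN regression chosen by PSO with
# nearest-integer rounding. Assumed names/hyper-parameters, not the paper's.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_predict(X, y, ks):
    """Predict each point from its own k_i nearest neighbours (self excluded)."""
    nn = NearestNeighbors(n_neighbors=int(ks.max()) + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return np.array([y[idx[i, 1:int(k) + 1]].mean() for i, k in enumerate(ks)])

def pso_instance_k(X, y, k_max=20, n_particles=30, iters=50,
                   w=0.7, c1=1.5, c2=1.5):
    n = len(X)
    pos = np.random.uniform(1, k_max, (n_particles, n))  # continuous k values
    vel = np.zeros_like(pos)
    pbest, pbest_err = pos.copy(), np.full(n_particles, np.inf)
    gbest, gbest_err = pos[0].copy(), np.inf
    for _ in range(iters):
        for p in range(n_particles):
            ks = np.clip(np.rint(pos[p]), 1, k_max)      # the rounding scheme
            err = np.mean((knn_predict(X, y, ks) - y) ** 2)
            if err < pbest_err[p]:
                pbest[p], pbest_err[p] = pos[p].copy(), err
            if err < gbest_err:
                gbest, gbest_err = pos[p].copy(), err
        r1, r2 = np.random.rand(2)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 1, k_max)
    return np.rint(gbest).astype(int)                    # one k per instance
```

Recomputing the neighbour lists inside the loop keeps the sketch short; a real implementation would cache them.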

3.
In this paper, we focus on the Conference Scheduling System, a case study at the Tool Contest of Graph-Based Tools (GraBaTs) 2008. We took part in the contest with our graph transformation tool AGG and the Eclipse-based EMF model transformation tool EMF Tiger. We present the features of both tools and evaluate their abilities to model the Conference Scheduling System and to deal with additional contest assignments like model instance generation, property verification, and interoperability.

4.
In big data sources, real-world entities are typically represented with a variety of schemata and formats (e.g., relational records, JSON objects, etc.). Different profiles (i.e., representations) of an entity often contain redundant and/or inconsistent information. Thus identifying which profiles refer to the same entity is a fundamental task (called Entity Resolution) to unleash the value of big data. The naïve all-pairs comparison solution is impractical on large data, hence blocking methods are employed to partition a profile collection into (possibly overlapping) blocks and limit the comparisons to profiles that appear in the same block together. Meta-blocking is the task of restructuring a block collection, removing superfluous comparisons. Existing meta-blocking approaches rely exclusively on schema-agnostic features, under the assumption that handling the schema variety of big data does not pay off for such a task. In this paper, we demonstrate how “loose” schema information (i.e., statistics collected directly from the data) can be exploited to enhance the quality of the blocks in a holistic loosely schema-aware (meta-)blocking approach that can be used to speed up your favorite Entity Resolution algorithm. We call it Blast (Blocking with Loosely-Aware Schema Techniques). We show how Blast can automatically extract the loose schema information by adopting an LSH-based step for efficiently handling volume and schema heterogeneity of the data. Furthermore, we introduce a novel meta-blocking algorithm that can be employed to efficiently execute Blast on MapReduce-like systems (such as Apache Spark). Finally, we experimentally demonstrate, on real-world datasets, how Blast outperforms the state-of-the-art (meta-)blocking approaches.
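As a rough illustration of the LSH-based "loose schema" step (not Blast's code), the sketch below groups attributes whose value-token sets collide in MinHash bands; the hash construction, band count, and grouping rule are all our assumptions:

```python
# Sketch only: group attributes whose value-token sets are MinHash-similar,
# yielding "loose" schema information usable by a schema-aware blocker.
import hashlib
from collections import defaultdict

NUM_PERM, BANDS = 64, 16                    # 4 rows per band

def minhash(tokens):
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens)
            for i in range(NUM_PERM)]

def lsh_group_attributes(attr_tokens):
    """attr_tokens: {attribute_name: non-empty set of value tokens}."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(set)
    for attr, tokens in attr_tokens.items():
        sig = minhash(tokens)
        for b in range(BANDS):              # attributes colliding in any band
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(attr)
    groups = defaultdict(set)
    for attrs in buckets.values():
        for a in attrs:
            groups[a] |= attrs              # candidate attribute groups
    return groups
```

Blocking keys can then be compared per attribute group instead of across the whole schema-agnostic token space.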

5.
Grammar deployment is the process of turning a given grammar specification into a working parser. The Grammar Deployment Kit (GDK for short) provides tool support for this process based on grammar engineering methods. We are mainly interested in the deployment of grammars for software renovation tools, that is, tools for software re- and reverse engineering. The current version of GDK is optimized for Cobol. We assume that grammar deployment starts from an initial grammar specification which may still be ambiguous or even incomplete. In practice, grammar deployment ties up unaffordable amounts of human effort because of the unavailability of suitable grammar specifications, the diversity and limitations of parsing technology, integration problems in developing software renovation functionality, and the lack of tools for and adherence to firm grammar engineering methods. GDK helps to largely automate grammar deployment by providing tool support for grammar adaptation and parser generation. We support different parsing technologies, among them btyacc, that is, yacc with backtracking. GDK is free software.

6.
We investigate the design of declarative, domain-specific languages for constructing interactive visualizations. By separating specification from execution, declarative languages can simplify development, enable unobtrusive optimization, and support retargeting across platforms. We describe the design of the Protovis specification language and its implementation within an object-oriented, statically-typed programming language (Java). We demonstrate how to support rich visualizations without requiring a toolkit-specific data model and extend Protovis to enable declarative specification of animated transitions. To support cross-platform deployment, we introduce rendering and event-handling infrastructures decoupled from the runtime platform, letting designers retarget visualization specifications (e.g., from desktop to mobile phone) with reduced effort. We also explore optimizations such as runtime compilation of visualization specifications, parallelized execution, and hardware-accelerated rendering. We present benchmark studies measuring the performance gains provided by these optimizations and compare performance to existing Java-based visualization tools, demonstrating scalability improvements exceeding an order of magnitude.
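Purely to illustrate the declarative principle (the specification is data, kept separate from execution, so a runtime may optimize or retarget it), a toy sketch in our own notation; it does not reflect the Protovis API:

```python
# Toy illustration of declarative specification (not the Protovis API):
# the visualization spec is plain data; the interpreter decides how to draw.
bar_spec = {
    "mark": "bar",
    "data": "sales",                            # dataset bound at runtime
    "encode": {"x": "month", "height": "revenue"},
}

def render(spec, datasets):
    """A trivial interpreter: the spec says *what*, this decides *how*."""
    for row in datasets[spec["data"]]:
        e = spec["encode"]
        print(f'{spec["mark"]}: x={row[e["x"]]}, height={row[e["height"]]}')

render(bar_spec, {"sales": [{"month": "Jan", "revenue": 3},
                            {"month": "Feb", "revenue": 5}]})
```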

7.
Huge volumes of data across several domains demand the development of new, more efficient tools for search, analysis, and interpretation. Clustering approaches represent an important step in exploring the internal structure and relationships in datasets. In this study, the cognitively motivated neural network Freeman K3-set was applied as a filter to preprocess the data, achieving better clustering performance. We combined K3 with a variety of commonly used clustering algorithms and tested its performance on standard UCI datasets as well as datasets from social networks. A comprehensive evaluation using a number of cluster validation measures shows significant improvement in the overall performance of the K3-based clustering method on social datasets, for two types of clustering validation measures. Additionally, K3 filtering results in a transparent representation of the data, which improves the efficiency of the data processing algorithms used.

8.
Sensor-based human activity recognition (HAR), with the ability to recognise human activities from wearable or embedded sensors, has been playing an important role in many applications including personal health monitoring, smart home, and manufacturing. The real-world, long-term deployment of these HAR systems drives a critical research question: how to evolve the HAR model automatically over time to accommodate changes in an environment or activity patterns. This paper presents an online continual learning (OCL) scenario for HAR, where sensor data arrives in a streaming manner and contains unlabelled samples from already learnt activities or from new activities. We propose a technique, OCL-HAR, that makes real-time predictions on the streaming sensor data while at the same time discovering and learning new activities. We have empirically evaluated OCL-HAR on four third-party, publicly available HAR datasets. Our results show that this OCL scenario is challenging for state-of-the-art continual learning techniques, which significantly underperform. Our technique OCL-HAR consistently outperforms them in all experiment setups, yielding improvements of up to 0.17 and 0.23 in micro and macro F1 scores.
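A hedged sketch of the online loop as we read it; the confidence-based novelty test, buffer size, and class names are our assumptions, not the OCL-HAR algorithm:

```python
# Sketch only: a streaming HAR loop with a confidence-based novelty buffer.
# The threshold, buffer size, and update rule are our assumptions.
import numpy as np

class OnlineHAR:
    def __init__(self, model, threshold=0.5, buffer_size=100):
        self.model = model            # any classifier exposing predict_proba
        self.threshold = threshold
        self.buffer = []              # windows suspected to be new activities
        self.buffer_size = buffer_size

    def step(self, window):
        proba = self.model.predict_proba([window])[0]
        if proba.max() < self.threshold:
            self.buffer.append(window)          # low confidence: maybe new
            if len(self.buffer) >= self.buffer_size:
                self.discover_and_learn()
            return None                         # abstain on this window
        return int(np.argmax(proba))            # real-time prediction

    def discover_and_learn(self):
        # Placeholder: cluster the buffered windows, mint new class ids,
        # and update the model incrementally (method-specific details).
        self.buffer.clear()
```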

9.
Computer Networks, 1999, 31(11–16): 1155–1169.
An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple sources, and to clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, ‘relationally complete’ query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents.
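XML-QL syntax is not reproduced here; instead, a hedged standard-library Python analogue of the extract-and-construct task the language targets:

```python
# Not XML-QL itself; a sketch of the pattern-match-and-construct task it
# addresses, using only Python's standard library.
import xml.etree.ElementTree as ET

src = ET.fromstring("""
<bib>
  <book><title>DB Systems</title><author>Ullman</author></book>
  <book><title>XML Data</title><author>Suciu</author></book>
</bib>""")

# "Query": match a pattern in the source, construct a new XML document.
out = ET.Element("authors")
for book in src.findall("book"):
    entry = ET.SubElement(out, "entry")
    entry.text = book.findtext("author")

print(ET.tostring(out, encoding="unicode"))
# <authors><entry>Ullman</entry><entry>Suciu</entry></authors>
```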

10.
This paper describes a new out-of-core multi-resolution data structure for real-time visualization, interactive editing and externally efficient processing of large point clouds. We describe an editing system that makes use of the novel data structure to provide interactive editing and preprocessing tools for large scanner data sets. Using the new data structure, we provide a complete tool chain for 3D scanner data processing, from data preprocessing and filtering to manual touch-up and real-time visualization. In particular, we describe an out-of-core outlier removal and bilateral geometry filtering algorithm, a toolset for interactive selection, painting, transformation, and filtering of huge out-of-core point-cloud data sets and a real-time rendering algorithm, which all use the same data structure as storage backend. The interactive tools work in real-time for small model modifications. For large scale editing operations, we employ a two-resolution approach where editing is planned in real-time and executed in an externally efficient offline computation afterwards. We evaluate our implementation on example data sets of sizes up to 63 GB, demonstrating that the proposed technique can be used effectively in real-world applications.

11.
The Gaussian process (GP) model is an elegant tool for probabilistic prediction. However, the high computational cost of GPs prohibits their practical application on large datasets. To address this issue, this paper develops a new sparse GP model, referred to as GPHalf. The key idea is to sparsify the GP model via the newly introduced ℓ1/2 regularization method. To achieve this, we represent the GP as a generalized linear regression model, then use the modified ℓ1/2 half-thresholding algorithm to optimize the corresponding objective function, thus yielding a sparse GP model. We prove that the proposed model converges to a sparse solution. Numerical experiments on both artificial and real-world datasets validate the effectiveness of the proposed model.
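In our own notation (an assumption, not the paper's formulas), writing the GP as a linear-in-the-weights model with kernel matrix Φ, the ℓ1/2-regularized objective has the form:

```latex
% Hedged reconstruction in our notation: y are the targets, \Phi the kernel
% matrix of the GP viewed as a generalized linear model, \lambda > 0.
\min_{w}\; \|y - \Phi w\|_2^2 \;+\; \lambda \sum_i |w_i|^{1/2}
```

The half-thresholding iteration alternates gradient steps on the squared-error term with a componentwise thresholding operator that sets small coefficients exactly to zero, which is what yields the sparse model.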

12.
This paper presents the design, implementation, and applications of a software testing tool, TAO, which allows users to specify and generate test cases and oracles in a declarative way. Extending a previous grammar-based test generation tool, TAO provides a declarative notation for defining denotational semantics on each productive grammar rule, such that when a test case is generated, its expected semantics is evaluated automatically as well, serving as its test oracle. TAO further provides a simple tagging mechanism to embed oracles into test cases, bridging the automation between test case generation and software testing. Two practical case studies illustrate how automated oracle generation can be effectively integrated with grammar-based test generation in different testing scenarios: locating fault-inducing input patterns in Java applications, and Selenium-based automated web testing.
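A minimal sketch of the core idea on a toy grammar of our own (not TAO's notation): each production carries a denotation, so generating a test case simultaneously yields its oracle:

```python
# Sketch only, toy grammar: each production returns (text, denotation), so a
# generated test case carries its expected value — the oracle — with it.
import random

def gen_expr(depth=0):
    """expr -> digit | '(' expr '+' expr ')'"""
    if depth > 2 or random.random() < 0.5:
        n = random.randint(0, 9)
        return str(n), n                        # a literal denotes itself
    lt, lv = gen_expr(depth + 1)
    rt, rv = gen_expr(depth + 1)
    return f"({lt}+{rt})", lv + rv              # '+' denotes addition

case, oracle = gen_expr()
print(case, "=>", oracle)                       # e.g. (3+(2+7)) => 12
```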

13.
Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Because real-world proprietary datasets are too large to be copied onto a test storage system, and most data cannot be shared due to privacy issues, a benchmark needs to generate data synthetically. To ensure that the result is accurate, it is necessary to generate data content based on the characterization of real-world data properties that influence storage system performance during the execution of a benchmark. The existing approach, called SDGen, cannot guarantee that the benchmark result is accurate for storage systems that have built-in word-based compressors, because SDGen characterizes the properties that influence compression performance only at the byte level and characterizes no properties at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds the word corpus by segmenting real-world text datasets, and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation, and a Monte Carlo approach is used to generate the synthetic data. The running time of generation depends only on the expected data size, so the time complexity of TextGen is O(n). To evaluate TextGen, we performed experiments on four real-world datasets. The results show that, compared with SDGen, the compression performance and compression ratio of the datasets generated by TextGen deviate less from those of the real-world datasets when end-tagged dense code, a representative word-based compressor, is evaluated.
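A hedged sketch of our reading of the generation core (not TextGen's code): fit a lognormal to the observed word frequencies by MLE, then Monte Carlo-map each lognormal draw to the word with the nearest observed frequency; the mapping rule is our assumption:

```python
# Sketch only: lognormal MLE fit on word frequencies, then Monte Carlo
# sampling of words; each output word is one draw, so generation is O(n)
# in the output size (up to the searchsorted lookup).
import numpy as np
from collections import Counter

def build_model(corpus_text):
    freq = Counter(corpus_text.split())
    vocab, counts = zip(*freq.items())
    counts = np.array(counts, dtype=float)
    logc = np.log(counts)
    mu, sigma = logc.mean(), logc.std() or 1.0   # MLE of a lognormal
    return np.array(vocab), counts, mu, sigma

def generate(vocab, counts, mu, sigma, n_words, rng=None):
    rng = rng or np.random.default_rng()
    draws = rng.lognormal(mu, sigma, size=n_words)
    order = np.argsort(counts)
    # Map each sampled frequency to the word with the nearest observed one.
    pos = np.searchsorted(counts[order], draws).clip(0, len(vocab) - 1)
    return " ".join(vocab[order[pos]])

vocab, counts, mu, sigma = build_model("the cat sat on the mat near the cat")
print(generate(vocab, counts, mu, sigma, 8))
```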

14.
In this paper, we present an innovative framework for efficiently monitoring Wireless Sensor Networks (WSNs). Our framework, coined KSpot, utilizes a novel top-k query processing algorithm we developed, in conjunction with the concept of in-network views, in order to minimize the cost of query execution. For ease of exposition, consider a set of sensors acquiring data from their environment at a given time instance. The generated information can conceptually be thought of as a horizontally fragmented base relation R. Furthermore, the results of a user-defined query Q, registered at some sink point, can conceptually be thought of as a view V. Maintaining consistency between V and R is very expensive in terms of communication and energy. Thus, KSpot focuses on a subset V′ ⊆ V that unveils only the k highest-ranked answers at the sink, for some user-defined parameter k. To illustrate the efficiency of our framework, we have implemented a real system in nesC, which combines the traditional advantages of declarative acquisition frameworks, like TinyDB, with the ideas presented in this work. Extensive real-world testing and experimentation with traces from UC-Berkeley, the University of Washington and Intel Research Berkeley show that KSpot provides up to 66% energy savings compared to TinyDB, minimizes both the size and number of packets transmitted over the network (by up to 77%), and prolongs the longevity of a WSN deployment to new scales.
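A toy illustration, in our own terms, of why maintaining only V′ is cheap: a reading is propagated only if it can enter the current top-k:

```python
# Sketch only: the sink keeps just the k highest-ranked answers (V'),
# suppressing any reading that cannot change them.
import heapq

class TopKView:
    def __init__(self, k):
        self.k = k
        self.heap = []                      # min-heap of (score, sensor_id)

    def offer(self, sensor_id, score):
        """True if this reading enters the top-k and must be propagated."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, sensor_id))
            return True
        if score > self.heap[0][0]:         # beats the current k-th answer
            heapq.heapreplace(self.heap, (score, sensor_id))
            return True
        return False                        # suppressed: saves radio traffic
```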

15.
Homotopy continuation provides a numerical tool for computing the equivalence of a smooth variety in an intersection product. Intersection theory provides a theoretical tool for relating the equivalence of a smooth variety in an intersection product to the degrees of the Chern classes of the variety. A combination of these tools leads to a numerical method for computing the degrees of Chern classes of smooth projective varieties in P^n. We illustrate the approach through several worked examples.

16.
The success of kernel-based learning methods depends on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce a new algorithm for kernel learning that combines a continuous set of base kernels, without the common step of discretizing the space of base kernels. We demonstrate that our new method achieves state-of-the-art performance across a variety of real-world datasets. Furthermore, we explicitly demonstrate the importance of combining the right dictionary of kernels, which is problematic for methods that combine a finite set of base kernels chosen a priori. Ours is not the first approach to work with continuously parameterized kernels; like earlier such work, we adopt a two-stage kernel learning approach, but we show that our method requires substantially less computation than previous approaches, making it more amenable to multi-dimensional parameterizations of base kernels, which we demonstrate.
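For orientation only, a two-stage sketch in our own terms: stage one weights RBF base kernels by kernel-target alignment (here sampled over a grid of bandwidths for simplicity, whereas the paper's point is to avoid exactly this discretization), stage two trains a standard kernel machine on the combination:

```python
# Sketch only: two-stage kernel learning with alignment-weighted RBF kernels.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def alignment(K, y):                  # y assumed in {-1, +1}
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

def two_stage_fit(X, y, gammas=np.logspace(-3, 2, 30)):
    # Stage 1: weight each base kernel by its kernel-target alignment.
    w = np.array([max(alignment(rbf_kernel(X, gamma=g), y), 0.0)
                  for g in gammas])
    w /= w.sum()
    K = sum(wi * rbf_kernel(X, gamma=g) for wi, g in zip(w, gammas))
    # Stage 2: a standard kernel machine on the learned combination.
    return SVC(kernel="precomputed").fit(K, y), w
```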

17.
Overlay networks create new networking services using nodes that communicate over pre-existing networks. They are often optimized for specific applications and targeted at niche vertical domains, but they lack the interoperability with which their functionalities could be shared. Mosaic is a declarative platform for constructing new overlay networks from multiple existing overlays, each possessing a subset of the desired new network’s characteristics. This paper focuses on the design and implementation of Mosaic: composition and deployment of control and/or data plane functions of different overlay networks, dynamic composition of overlay networks to meet changing application needs and network conditions, and seamless support for legacy applications. Mosaic overlays are specified using Mozlog, a new declarative language for expressing overlay properties independently of their particular implementation or underlying network. Mosaic is validated experimentally using compositions specified in Mozlog to create new overlay networks combining the functions of the i3 indirection overlay that supports mobility, the resilient overlay network (RON) for robust routing, and the Chord distributed hash table for scalable lookups. Mosaic uses runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability. We further demonstrate Mosaic’s dynamic composition capabilities by having Chord switch its underlay from IP to RON at runtime. Mosaic’s benefits come at a low performance cost, as demonstrated by measurements on both a local cluster environment and the PlanetLab global testbed.

18.
We present a unified declarative platform for specifying, implementing, and analyzing secure networked information systems. Our work builds upon techniques from logic-based trust management systems and declarative networking. We make the following contributions. First, we propose the Secure Network Datalog (SeNDlog) language that unifies Binder, a logic-based language for access control in distributed systems, and Network Datalog, a distributed recursive query language for declarative networks. SeNDlog enables network routing, information systems, and their security policies to be specified and implemented within a common declarative framework. Second, we extend existing distributed recursive query processing techniques to execute SeNDlog programs that incorporate secure communication via authentication and encryption among untrusted nodes. Third, we demonstrate the use of user-defined cryptographic functions for customizing the authentication and encryption mechanisms used for securing protocols. Finally, using a local cluster and the PlanetLab testbed, we perform a detailed performance study of a variety of secure networked systems implemented using our platform.

19.

Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data in order to achieve competitive performance. Since data annotation is a costly and labor-intensive process, it remains one of the major hurdles to their practical deployment. This study investigates the possibility of using active learning to reduce the costs of data annotation in the context of document image classification, one of the core components of modern document processing pipelines. The results demonstrate that by utilizing active learning (AL), deep document classification models can achieve performance competitive with models trained on fully annotated datasets and, in some cases, even surpass them while annotating only 15–40% of the total training dataset. Furthermore, this study demonstrates that modern AL strategies significantly outperform random querying, and in many cases achieve comparable performance to models trained on fully annotated datasets even in the presence of practical deployment issues such as data imbalance and annotation noise, and thus offer tremendous benefits in the real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at https://github.com/saifullah3396/doc_al.
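A hedged sketch of a pool-based AL loop with margin-based uncertainty sampling, one of the strategy families such studies evaluate; the model, seed size, budget, and batch size are our assumptions:

```python
# Sketch only: pool-based active learning with margin (uncertainty) sampling.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X_pool, y_pool, X_test, y_test,
                    seed=100, budget=0.4, batch=50):
    rng = np.random.default_rng(0)
    labelled = list(rng.choice(len(X_pool), size=seed, replace=False))
    clf = None
    while len(labelled) < budget * len(X_pool):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labelled],
                                                    y_pool[labelled])
        proba = np.sort(clf.predict_proba(X_pool), axis=1)
        margin = proba[:, -1] - proba[:, -2]    # small margin = uncertain
        margin[labelled] = np.inf               # never re-query labelled docs
        labelled.extend(np.argsort(margin)[:batch])  # "annotate" these
    return clf.score(X_test, y_test), len(labelled)
```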

20.
We devise a technique designed to remove the texturing artefacts that are typical of 3D models representing real-world objects acquired by photogrammetric techniques. Our technique leverages recent advancements in the inpainting of natural colour images, adapting them to this specific context. A neural network, modified and trained for our purposes, replaces the texture areas containing the defects, substituting them with new plausible patches of texels reconstructed from the surrounding surface texture. We train and apply the network model on locally reparametrized texture patches, so as to provide an input that simplifies the learning process, because it avoids texture seams, unused texture areas, background, depth jumps and so on. We automatically extract appropriate training data from real-world datasets. We show two applications of the resulting method: one, as a fully automatic tool, addressing all problems that can be detected by analysing the UV-map of the input model; and another, as an interactive semi-automatic tool, presented to the user as a 3D ‘fixing’ brush that removes artefacts from any zone the user paints on. We demonstrate our method on a variety of real-world inputs and provide a reference usable implementation.
