Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
The goal of this survey is to present the state of the art in instance matching benchmarks for Linked Data. We introduce the principles of benchmark design for instance matching systems, discuss the dimensions and characteristics of an instance matching benchmark, provide a comprehensive overview of existing benchmarks and benchmark generators together with their advantages and disadvantages, and outline the research directions that should be explored to create novel benchmarks answering the needs of the Linked Data paradigm.

2.
We propose a micro-benchmark for XML data management to aid engineers in designing improved XML processing engines. This benchmark is inherently different from application-level benchmarks, which are designed to help users choose between alternative products. We primarily attempt to capture the rich variety of data structures and distributions possible in XML, and to isolate their effects, without imitating any particular application. The benchmark specifies a single data set against which carefully specified queries can be used to evaluate system performance for XML data with various characteristics.
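To make the micro-benchmarking idea concrete, here is a minimal sketch (illustrative only; the element names and parameters are assumptions, not the benchmark's actual data set) that generates a synthetic XML tree with controlled depth and fanout, the kind of structural knob such a benchmark varies one at a time, and runs a single narrowly targeted query against it:

```python
import xml.etree.ElementTree as ET

def make_tree(depth: int, fanout: int) -> ET.Element:
    """Build a synthetic XML tree with controlled depth and fanout so that
    one structural characteristic can be varied in isolation."""
    root = ET.Element("node", attrib={"level": "0"})
    frontier = [root]
    for level in range(1, depth + 1):
        next_frontier = []
        for parent in frontier:
            for i in range(fanout):
                child = ET.SubElement(
                    parent, "node", attrib={"level": str(level), "idx": str(i)}
                )
                next_frontier.append(child)
        frontier = next_frontier
    return root

root = make_tree(depth=4, fanout=3)
# A carefully specified query: count the nodes at one fixed level.
print(len(root.findall(".//node[@level='3']")))  # 27 for depth=4, fanout=3
```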

3.
Graphics performance has become an important aspect of overall microcomputer performance, yet there is still no standard graphics benchmark for microcomputers. This paper presents a suite of graphics benchmark programs that we designed for microcomputer systems running under the Windows environment, intended to measure microcomputer graphics performance; by running this suite, we give a price-performance analysis of the graphics capabilities of several microcomputers.

4.
Weicker, R.P. (1990). Computer, 23(12), 65-75.
The three most often used benchmarks are characterized in detail, and users are warned about a number of pitfalls. Two of them, Whetstone and Dhrystone, are synthetic benchmarks: they were written solely for benchmarking purposes and perform no useful computation. Linpack was distilled out of a real, purposeful program that is now used as a benchmark. Some other benchmarks, namely the Livermore Fortran Kernels, the Stanford Small Programs Benchmark Set, the EDN benchmarks, the Sieve of Eratosthenes, Rhealstone, and the SPEC benchmarks, are briefly considered, and non-CPU influences on benchmark performance are discussed.
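As a hedged illustration of what "synthetic" means here, the sketch below times a loop of deliberately useless mixed operations and reports a loops-per-second score. It is a toy in the spirit of Whetstone and Dhrystone, not the actual benchmarks, whose statement mixes are precisely specified:

```python
import time

def synthetic_benchmark(iterations: int = 1_000_000) -> float:
    """Time a loop of mixed integer, string, and branch operations.

    The work is deliberately useless: only the loop's duration matters.
    """
    start = time.perf_counter()
    acc = 0
    text = "benchmark"
    for i in range(iterations):
        acc += i % 7                      # integer arithmetic
        if acc & 1:                       # branching
            acc -= 3
        s = text[: (i % len(text)) + 1]   # string slicing
        acc += len(s)
    elapsed = time.perf_counter() - start
    return iterations / elapsed           # "loops per second" score

print(f"{synthetic_benchmark():,.0f} synthetic loops/s")
```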

5.
Industry, academia, and end users urgently need a big data benchmark with which to evaluate existing big data systems, improve current techniques, and develop new ones. This paper reviews the main work of recent years on developing big data benchmarks and comparatively analyzes their features and shortcomings. On this basis, it raises a series of considerations for developing new big data benchmarks: 1) to evaluate both the individual component tools of a big data platform and the platform as a whole, component-oriented benchmarks and whole-platform benchmarks are needed, the latter being an organic combination of the former; 2) beyond SQL queries, the workload must include the various complex analysis functions required by big data analytics tasks, covering all kinds of application requirements; 3) in terms of evaluation metrics, besides performance metrics (response time and throughput), other metrics must be considered, including system scalability, fault tolerance, energy efficiency, and security.

6.
Benchmarks are vital tools in the performance measurement and evaluation of computer hardware and software systems. Standard benchmarks such as the TREC, TPC, SPEC, SAP, Oracle, Microsoft, IBM, Wisconsin, AS3AP, OO1, OO7, and XOO7 benchmarks have been used to assess system performance. These benchmarks are domain-specific in that they model typical applications and are tied to a problem domain. Test results from these benchmarks are estimates of possible system performance for certain pre-determined problem types. When the user domain differs from the standard problem domain, or when the application workload diverges from the standard workload, they do not provide an accurate way to measure the system performance of the user problem domain: performance on the actual domain's data and transactions may vary significantly from the standard benchmarks. In this research, we address the issue of domain-boundedness and workload-boundedness, which results in unrepresentative and irreproducible performance readings. We tackle the issue by proposing a domain-independent and workload-independent benchmark method developed from the perspective of the user requirements. We present a user-driven workload model that develops a benchmark through a process of workload requirements representation, transformation, and generation, aiming at a more generalized and precise evaluation method that derives test suites from the actual user domain and application. The benchmark method comprises three main components: a high-level workload specification scheme used to formalize the workload requirements, a translator used to transform the specification, and a set of generators used to produce the test database and the test suite. In web search, generic constructs are the main common carriers we adopt to capture and compose the workload requirements; we determined the requirements through an analysis of the literature. In this study, we conducted ten baseline experiments to validate the feasibility and validity of the benchmark method, using an experimental prototype built to execute them. Experimental results demonstrate that the method is capable of modeling the standard benchmarks as well as more general benchmark requirements.
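The pipeline of specification scheme, translator, and generators can be pictured with a small sketch. Everything below (table names, operation kinds, weights) is a hypothetical illustration of the approach, not the paper's actual specification language:

```python
import random

# Hypothetical high-level workload specification; names and numbers
# are illustrative, not from the paper.
WORKLOAD_SPEC = {
    "tables": {"orders": 100_000, "customers": 10_000},
    "operations": [
        {"kind": "point_lookup", "table": "customers", "weight": 0.6},
        {"kind": "range_scan", "table": "orders", "weight": 0.3},
        {"kind": "join", "tables": ["orders", "customers"], "weight": 0.1},
    ],
}

def generate_test_suite(spec: dict, n_queries: int = 5) -> list[str]:
    """Translate the declarative spec into concrete SQL test queries."""
    ops = spec["operations"]
    weights = [op["weight"] for op in ops]
    suite = []
    for _ in range(n_queries):
        op = random.choices(ops, weights=weights)[0]
        if op["kind"] == "point_lookup":
            suite.append(f"SELECT * FROM {op['table']} WHERE id = {random.randrange(10_000)}")
        elif op["kind"] == "range_scan":
            lo = random.randrange(90_000)
            suite.append(f"SELECT * FROM {op['table']} WHERE id BETWEEN {lo} AND {lo + 1000}")
        else:
            t1, t2 = op["tables"]
            suite.append(f"SELECT * FROM {t1} JOIN {t2} ON {t1}.cust_id = {t2}.id")
    return suite

for q in generate_test_suite(WORKLOAD_SPEC):
    print(q)
```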

7.
We consider the university course timetabling problem, which is one of the most studied problems in educational timetabling. In particular, we focus our attention on the formulation known as the curriculum-based course timetabling problem (CB-CTT), which has been tackled by many researchers and for which there are many available benchmarks. The contribution of this paper is twofold. First, we propose an effective and robust single-stage simulated annealing method for solving the problem. Second, we design and apply an extensive and statistically-principled methodology for the parameter tuning procedure. The outcome of this analysis is a methodology for modeling the relationship between search method parameters and instance features that allows us to set the parameters for unseen instances on the basis of a simple inspection of the instance itself. Using this methodology, our algorithm, despite its apparent simplicity, has been able to achieve high-quality results on a set of popular benchmarks. A final contribution of the paper is a novel set of real-world instances, which could be used as a benchmark for future comparison.
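For readers unfamiliar with the search method, here is a generic single-stage simulated annealing skeleton. It is a sketch of the general technique, not the authors' tuned algorithm; the timetabling-specific neighborhood and cost functions are what the paper actually contributes:

```python
import math
import random

def simulated_annealing(initial, neighbor, cost, t0=10.0, cooling=0.999, steps=100_000):
    """Generic single-stage simulated annealing skeleton.

    `neighbor` proposes a random move; worse moves are accepted with
    probability exp(-delta / T), and T decays geometrically.
    """
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - current_cost
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost

# Toy usage: minimize |x - 42| over integers.
best, c = simulated_annealing(
    initial=0,
    neighbor=lambda x: x + random.choice([-1, 1]),
    cost=lambda x: abs(x - 42),
)
print(best, c)
```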

8.
This paper presents a set of benchmarks and metrics for performance reporting in explicit-state parallel model checking algorithms. The benchmarks are selected for controllability, and the metrics are chosen to measure speedup and communication overhead. The benchmarks and metrics are used to compare two parallel model checking algorithms: partition and random walk. Implementations of the partition algorithm using synchronous and asynchronous communication are used. Metrics are reported for each benchmark and algorithm for up to 128 workstations using a network of dynamically loaded workstations. Empirical results show that load balancing becomes an issue for more than 32 workstations in the partition algorithm and that random walk is a reasonable, low-overhead approach for finding errors in large models. The synchronous implementation is consistently faster than the asynchronous one. The benchmarks, metrics, and results given here are intended to be a starting point for a larger discussion of performance reporting in parallel explicit-state model checking.
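The abstract does not spell out the metric definitions, so the following is an assumed but conventional formulation: speedup relative to a sequential run, and communication overhead as the fraction of wall-clock time spent on messaging. The numbers are hypothetical:

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Classic speedup metric: S(p) = T_1 / T_p."""
    return t_sequential / t_parallel

def communication_overhead(msg_time: float, total_time: float) -> float:
    """Fraction of wall-clock time spent communicating rather than
    exploring states."""
    return msg_time / total_time

# Hypothetical numbers for a 32-workstation partition run.
print(speedup(t_sequential=3600.0, t_parallel=150.0))           # 24x
print(communication_overhead(msg_time=40.0, total_time=150.0))  # ~0.27
```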

9.
Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate a set of large benchmark programs of up to 5M lines of code each, with which we evaluated different program analysis and testing tools and compilers. The generated benchmarks let us independently rediscover several issues in the evaluated tools.
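A minimal sketch of the stochastic-parse-tree idea follows, using a toy expression grammar. The real tool targets the full Java grammar; the productions and probabilities here are invented for illustration:

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# (production, probability) pairs.
GRAMMAR = {
    "stmt": [(["assign"], 0.6), (["if"], 0.3), (["block"], 0.1)],
    "assign": [(["x = ", "expr", ";"], 1.0)],
    "if": [(["if (", "expr", " > 0) ", "stmt"], 1.0)],
    "block": [(["{ ", "stmt", " ", "stmt", " }"], 1.0)],
    "expr": [(["x"], 0.5), (["1"], 0.3), (["x + 1"], 0.2)],
}

def expand(symbol: str, depth: int = 0, max_depth: int = 8) -> str:
    """Expand a nonterminal by sampling productions by their probabilities."""
    if symbol not in GRAMMAR:
        return symbol  # terminal token
    productions, weights = zip(*GRAMMAR[symbol])
    if depth >= max_depth:
        # Force the first (simplest) production to guarantee termination.
        chosen = productions[0]
    else:
        chosen = random.choices(productions, weights=weights)[0]
    return "".join(expand(s, depth + 1, max_depth) for s in chosen)

print(expand("stmt"))
```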

10.
Inability to identify weaknesses or to quantify advancements in software system robustness frequently hinders the development of robust software systems. Efforts have been made to develop benchmarks of software robustness to address this problem, but they all suffer from significant shortcomings. The paper presents the various features that are desirable in a benchmark of system robustness, and evaluates some existing benchmarks according to these features. A new hierarchically structured approach to building robustness benchmarks, which overcomes many deficiencies of past efforts, is also presented. This approach has been applied to building a hierarchically structured benchmark that tests part of the Unix file and virtual memory systems. The resultant benchmark has successfully been used to identify new response class structures that were not detected in a similar situation by other less organized techniques.
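In the spirit of such robustness benchmarks, the sketch below probes one OS interface with invalid parameter combinations and classifies the responses. It is a simplified illustration, not the paper's hierarchical test suite:

```python
import errno
import os

# Feed invalid parameter combinations to a file-system call and
# classify the system's response. (Illustrative cases only.)
INVALID_CASES = [
    ("empty path", lambda: os.open("", os.O_RDONLY)),
    ("bad flags", lambda: os.open("/tmp", -1)),
    ("nonexistent dir", lambda: os.open("/no/such/dir/file", os.O_RDONLY)),
]

for name, call in INVALID_CASES:
    try:
        fd = call()
        os.close(fd)
        print(f"{name}: accepted (potential robustness hole)")
    except OSError as e:
        print(f"{name}: clean error {errno.errorcode.get(e.errno, e.errno)}")
    except Exception as e:
        print(f"{name}: unexpected failure {type(e).__name__}")
```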

11.
We have witnessed exciting development of RAM technology in the past decade: memory sizes have grown rapidly while prices continue to decrease, so it is now feasible to deploy large amounts of RAM in a computer system. Several companies and research institutions have devoted substantial resources to developing in-memory databases (IMDBs), which execute queries after loading data into (virtual) memory in advance. The proliferation of in-memory databases makes it necessary to test and evaluate their performance objectively and fairly. Although existing database benchmarks such as the Wisconsin benchmark and the TPC-X series have achieved great success, they are not suitable for in-memory databases because they do not take the unique characteristics of an IMDB into account. In this study, we propose MemTest, a novel benchmark that addresses the major characteristics of an in-memory database. This benchmark constructs particular metrics covering the processing time, compression ratio, minimal memory space, and column strength of an in-memory database. We design a data model based on inter-bank transaction applications, and a data generator that supports uniform and skewed data distributions. The MemTest workload includes a set of queries and transactions against these metrics and the data model. Finally, we illustrate the efficacy of MemTest through implementations on two different in-memory databases.

12.
13.
The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors because they are neither diverse nor representative, and the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BenchIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks; they are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BenchIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.
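A single-layer microbenchmark of the kind described can be sketched in a few lines; the layer shape, timing loop, and GFLOP/s metric below are illustrative assumptions, not BenchIP's actual configuration:

```python
import time
import numpy as np

def microbenchmark_dense_layer(batch=64, in_dim=1024, out_dim=1024, repeats=10):
    """Time a single fully connected layer: the micro-benchmark style of
    isolating one network layer to expose a platform's bottleneck."""
    x = np.random.rand(batch, in_dim).astype(np.float32)
    w = np.random.rand(in_dim, out_dim).astype(np.float32)
    start = time.perf_counter()
    for _ in range(repeats):
        y = np.maximum(x @ w, 0.0)  # dense layer + ReLU
    elapsed = (time.perf_counter() - start) / repeats
    flops = 2 * batch * in_dim * out_dim
    return flops / elapsed / 1e9  # GFLOP/s

print(f"{microbenchmark_dense_layer():.1f} GFLOP/s")
```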

14.
By using the principle of fixed-time benchmarking, it is possible to compare a wide range of computers, from a small personal computer to the most powerful parallel supercomputer, on a single scale. Fixed-time benchmarks promise greater longevity than those based on a particular problem size and are more appropriate for "grand challenge" capability comparison. We present the design of a benchmark, SLALOM, that adjusts automatically to the computing power available and corrects several deficiencies in various existing benchmarks: it is highly scalable, solves a real problem, includes input and output times, and can be run on parallel computers of all kinds, using any convenient language. The benchmark provides an estimate of the size of problem solvable on scientific computers. It can also be used to demonstrate a new source of superlinear speedup in parallel computers. Results that span six orders of magnitude for contemporary computers of various architectures are presented.
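The fixed-time principle is easy to sketch: instead of timing a fixed problem, grow the problem until a fixed time budget is exhausted and report the largest size solved. The toy workload below merely stands in for SLALOM's real computation and is purely illustrative:

```python
import time

def fixed_time_benchmark(work, budget_s: float = 1.0) -> int:
    """Fixed-time benchmarking: find the largest problem size n solvable
    within a fixed time budget. `work(n)` must solve a problem of size n."""
    n = 1
    # Grow n geometrically until the budget is exceeded...
    while True:
        start = time.perf_counter()
        work(n)
        if time.perf_counter() - start > budget_s:
            break
        n *= 2
    # ...then binary-search the boundary between n/2 (fits) and n (doesn't).
    lo, hi = n // 2, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        start = time.perf_counter()
        work(mid)
        if time.perf_counter() - start <= budget_s:
            lo = mid
        else:
            hi = mid
    return lo  # largest size solved within the budget

# Toy workload: sum the first n square roots.
score = fixed_time_benchmark(lambda n: sum(i ** 0.5 for i in range(n)))
print(f"problem size solved in 1 s: {score:,}")
```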

15.
Most benchmarks are smaller than actual application programs. One reason is to improve benchmark universality by demanding resources every computer is likely to have. However, users dynamically increase the size of application programs to match the power available, whereas most benchmarks are static and of a size appropriate for computers available when the benchmark was created; this is particularly true for parallel computers. Thus, the benchmark overstates computer performance, since smaller problems spend more time in cache. Scalable benchmarks, such as HINT, examine the full spectrum of performance through various memory regimes, and express a superset of the information given by any particular fixed-size benchmark. Using 5,000 experimental measurements, we have found that performance on the NAS Parallel Benchmarks, SPEC, LINPACK, and other benchmarks is predicted accurately by subsets of the HINT performance curve. Correlations are typically better than 0.995. Predicted ranking is often perfect.
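Checking such a claim boils down to computing the correlation between fixed-size benchmark scores and a summary of the HINT curve over the matching memory regime. A minimal sketch with hypothetical numbers (not the paper's 5,000 measurements):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores for six machines: a fixed-size benchmark result
# and the mean of a HINT-curve subset over the matching memory regime.
fixed_size = [12.1, 19.8, 33.5, 41.0, 58.7, 90.2]
hint_subset = [0.9, 1.5, 2.6, 3.1, 4.4, 6.9]

print(f"r = {correlation(fixed_size, hint_subset):.4f}")
```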

16.
Benchmarking has proven to be crucial for investigating the behavior and performance of a system. However, the choice of relevant benchmarks still remains a challenge. To help the process of comparing and choosing among benchmarks, we propose a solution for automatic benchmark profiling. It computes unified benchmark profiles reflecting benchmarks' duration, function repartition, stability, CPU efficiency, parallelization, and memory usage. Our approach identifies the system information needed for profile computation and collects it from execution traces captured without modifying the benchmark code. It structures profile computation as a reproducible workflow for automatic trace analysis, which efficiently manages large trace volumes. In this paper, we report on the design and the implementation of our approach, which involves the collection and analysis of about 500 GB of trace data coming from two different platforms (an x86 desktop machine and the Juno SoC board). The computed benchmark profiles provide valuable insights about the benchmarks' behavior and help compare different benchmarks on the same platform, as well as the behavior of the same benchmark on different platforms.

17.
This paper presents a survey and an analysis, from different perspectives, of the XQuery benchmarks publicly available in 2006: XMach-1, XMark, X007, the Michigan benchmark, and XBench. We address three simple questions about these benchmarks: How are they used? What do they measure? What can one learn from using them? One focus of our analysis is to determine whether the benchmarks can be used for micro-benchmarking. Our conclusions are based on a usage analysis, on an in-depth analysis of the benchmark queries, and on experiments run on four XQuery engines: Galax, SaxonB, Qizx/Open, and MonetDB/XQuery.

18.
With the rapid development of cloud computing, cloud file systems play an increasingly important role in cloud infrastructure. Although many performance evaluation tools for cloud file systems already exist in industry, most of them focus only on traditional system performance metrics such as IOPS and throughput, and can hardly assess a cloud file system's performance isolation in multi-tenant environments. The dynamic and heterogeneous I/O workloads of cloud environments make it even more challenging to evaluate isolation accurately. This paper proposes a novel isolation metric model for cloud file systems and implements it in a benchmark tool named Porcupine. By simulating I/O requests with the characteristics of real workloads, Porcupine reproduces workload and performance behavior accurately and improves file-system testing efficiency. Experiments on the Ceph file system validate the effectiveness and accuracy of the proposed isolation metric model.
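The abstract does not give the metric model itself, so the sketch below shows one simple, assumed way to quantify multi-tenant performance isolation: the worst per-tenant ratio of co-located throughput to solo throughput. All names and numbers are hypothetical, not Porcupine's actual model:

```python
def isolation_score(solo_throughput: dict[str, float],
                    shared_throughput: dict[str, float]) -> float:
    """Worst per-tenant ratio of throughput under co-located load to
    throughput when running alone. 1.0 means perfect isolation; values
    near 0 mean heavy interference."""
    return min(shared_throughput[t] / solo_throughput[t] for t in solo_throughput)

# Hypothetical measurements (MB/s) for three tenants on one cluster.
solo = {"tenant_a": 480.0, "tenant_b": 510.0, "tenant_c": 495.0}
shared = {"tenant_a": 430.0, "tenant_b": 220.0, "tenant_c": 460.0}
print(f"isolation = {isolation_score(solo, shared):.2f}")  # dominated by tenant_b
```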

19.
Benchmarks are an important means of evaluating processor microarchitecture designs, yet current benchmarks cannot effectively and comprehensively evaluate microarchitectures designed for high-throughput applications. Targeting the characteristics of high-throughput applications, this paper therefore proposes HTC-MicroBench, a benchmark suite for evaluating such microarchitectures. First, it proposes a classification method for high-throughput applications based on application characteristics and uses it to classify high-throughput workloads. Second, it proposes a thread-based parallelization model for job-processing nodes, on which the design and implementation of HTC-MicroBench are based. Finally, HTC-MicroBench is evaluated experimentally on metrics such as job concurrency, inter-job coupling, and cache efficiency, and is used to assess the parallel speedup of two processors, TILE-Gx and Xeon. The results, showing high concurrency, low coupling, and the differing cache hit rates exhibited by the workload characteristics, demonstrate that HTC-MicroBench accurately characterizes high-throughput applications and can effectively evaluate microarchitectures designed for them.
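A thread-based job-processing node can be sketched as a worker pool consuming many small independent jobs, with jobs per second as the headline metric. The job body and sizes below are illustrative assumptions, not HTC-MicroBench's actual workloads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def job(n: int) -> int:
    return sum(i * i for i in range(n))  # small CPU-bound unit of work

def run_node(n_jobs: int = 2_000, workers: int = 8) -> float:
    """Dispatch independent jobs to a thread pool and report throughput.

    Note: in CPython the GIL limits CPU-bound thread scaling; real
    high-throughput nodes rely on native threads or multiple processes.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, [5_000] * n_jobs))
    return n_jobs / (time.perf_counter() - start)

print(f"{run_node():,.0f} jobs/s")
```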

20.
Benchmarking compares the outputs of different systems for a given set of input data in order to improve a system's performance. Faced with the lack of realistic, operational benchmarks that can be used for testing optimization methods and control systems in flexible systems, this paper proposes a benchmark system based on a real production cell. A three-step method is presented: data preparation, experimentation, and reporting. This benchmark allows the evaluation of static optimization performance using traditional operations research tools, and the evaluation of a control system's robustness in the face of unexpected events.
