Related Articles
A total of 20 related articles were retrieved.
1.
The instruction-level parallelism that out-of-order superscalar processors can extract is increasingly limited, and obtaining higher instruction parallelism requires adding ever more out-of-order execution and control resources. As processor architectures evolve, value prediction can achieve higher data parallelism on top of existing mainstream microarchitectures with less hardware overhead, further improving out-of-order execution performance. This paper proposes RH-VTAGE, a context-based value predictor driven by real history feedback, which controls prediction accuracy through a failure list and a prediction-accuracy table, reducing the pipeline-recovery cost of mispredictions. In addition, a real-history feedback control counter is added at the predictor's final stage, and adaptive confidence-control logic is designed that dynamically and probabilistically adjusts confidence for different types of instructions. Measured results show that, compared with other predictors, RH-VTAGE brings no significant improvement for integer programs but improves floating-point program performance by up to 31.2%.
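The abstract does not detail RH-VTAGE's internals; the following Python sketch only illustrates the adaptive confidence idea it describes: a saturating confidence counter that is incremented probabilistically, with a per-instruction-class increment probability. All class names, probabilities, and counter widths are illustrative assumptions, not values from the paper.

```python
import random

# Illustrative per-class increment probabilities (assumed, not from the paper):
# floating-point results are often more predictable, so they gain confidence faster.
INC_PROB = {"int": 0.25, "fp": 0.5, "load": 0.125}

class ConfidenceCounter:
    """Saturating counter; a predicted value is used only at full confidence."""
    def __init__(self, max_conf=7):
        self.conf = 0
        self.max_conf = max_conf

    def update(self, correct, insn_class):
        if correct:
            # Probabilistic increment: raising confidence slowly for classes
            # that historically mispredict keeps pipeline-recovery cost low.
            if random.random() < INC_PROB[insn_class]:
                self.conf = min(self.conf + 1, self.max_conf)
        else:
            self.conf = 0          # any misprediction resets confidence

    def predict_allowed(self):
        return self.conf == self.max_conf

cc = ConfidenceCounter()
cc.update(correct=True, insn_class="fp")
print(cc.predict_allowed())
```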

2.
Software monitoring with controllable overhead
We introduce the technique of software monitoring with controllable overhead (SMCO), which is based on a novel combination of supervisory control theory of discrete event systems and PID-control theory of discrete time systems. SMCO controls monitoring overhead by temporarily disabling monitoring of selected events for as short a time as possible under the constraint of a user-supplied target overhead o_t. This strategy is optimal in the sense that it allows SMCO to monitor as many events as possible, within the confines of o_t. SMCO is a general monitoring technique that can be applied to any system interface or API. We have applied SMCO to a variety of monitoring problems, including two highlighted in this paper: integer range analysis, which determines upper and lower bounds on integer variable values; and non-accessed period detection, which detects stale or underutilized memory allocations. We benchmarked SMCO extensively, using both CPU- and I/O-intensive workloads, which often exhibited highly bursty behavior. We demonstrate that SMCO successfully controls overhead across a wide range of target overhead levels; its accuracy monotonically increases with the target overhead; and it can be configured to distribute monitoring overhead fairly across multiple instrumentation points.
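As a rough illustration of the SMCO control loop (a sketch under assumptions, not the authors' implementation), the following proportional-integral controller disables monitoring whenever observed overhead exceeds the user-supplied target o_t and re-enables it as soon as the budget allows; the gains kp and ki are arbitrary.

```python
class OverheadController:
    """Toggle monitoring so observed overhead tracks the target o_t (PI control)."""
    def __init__(self, o_t, kp=0.5, ki=0.1):
        self.o_t, self.kp, self.ki = o_t, kp, ki
        self.integral = 0.0
        self.enabled = True

    def update(self, observed_overhead):
        error = self.o_t - observed_overhead   # positive -> budget to spare
        self.integral += error
        signal = self.kp * error + self.ki * self.integral
        # Monitoring stays on while the control signal says we are under budget.
        self.enabled = signal >= 0.0
        return self.enabled

# Usage: per time window, measure overhead and ask whether to monitor events.
ctl = OverheadController(o_t=0.10)            # 10% target overhead
for window_overhead in [0.02, 0.15, 0.30, 0.05]:
    print(ctl.update(window_overhead))
```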

3.
A Recommendation Algorithm Based on an Improved Bayesian Probability Model
To address the low prediction and recommendation accuracy of existing matrix-factorization-based collaborative filtering recommender systems, an improved matrix factorization method and collaborative filtering recommender system are proposed. First, the rating matrix is factorized into two non-negative matrices, and the ratings are normalized so that they carry probabilistic semantics. Then, variational inference is used to compute the posterior distribution of the Bayesian probability model. Finally, groups of users with the same preferences are searched and user preferences are predicted. In addition, exploiting the sparsity of user vectors, a recommendation decision algorithm with low computational complexity and low storage cost is designed. Experiments on three public datasets show that both the prediction performance and the recommendation effectiveness of the proposed algorithm outperform other prediction and recommendation algorithms.
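A minimal sketch of the factorization step, assuming numpy: the normalized rating matrix is split into two non-negative factors with Lee-Seung multiplicative updates. The paper's variational Bayesian inference and user-grouping steps are not reproduced, and treating zero entries as observed ratings is a further simplification.

```python
import numpy as np

def nmf(R, k=2, iters=200, eps=1e-9):
    """Factor a normalized rating matrix R (values in [0, 1]) into W @ H
    with non-negative factors, using Lee-Seung multiplicative updates."""
    m, n = R.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: ratings on a 1-5 scale, normalized to carry probabilistic meaning.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4]], dtype=float)
R = ratings / 5.0
W, H = nmf(R)
pred = (W @ H) * 5.0          # predicted ratings, including the missing (0) cells
print(np.round(pred, 1))
```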

4.
This paper demonstrates the use of a model-based evaluation approach for instrumentation systems (ISs). The overall objective of this study is to provide early feedback to tool developers regarding IS overhead and performance; such feedback helps developers make appropriate design decisions about alternative system configurations and task scheduling policies. We consider three types of system architectures: networks of workstations (NOW), symmetric multiprocessors (SMP), and massively parallel processing (MPP) systems. We develop a Resource OCCupancy (ROCC) model for an on-line IS for an existing tool and parameterize it for an IBM SP-2 platform. The model is simulated to answer several “what if” questions regarding two policies for scheduling instrumentation data forwarding: collect-and-forward (CF) and batch-and-forward (BF). In addition, this study investigates two alternatives for forwarding the instrumentation data on an MPP system: direct and binary-tree forwarding. Simulation results indicate that the BF policy can significantly reduce the overhead and that the tree forwarding configuration exhibits desirable scalability characteristics for MPP systems. Initial measurement-based testing results indicate more than a 60 percent reduction in direct IS overhead when the BF policy was added to the Paradyn parallel performance measurement tool.
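The trade-off between the CF and BF policies can be sketched with a toy cost model (all costs are assumed, not the ROCC model's parameters): each forwarded message pays a fixed cost, so batching amortizes it across many records.

```python
def forwarding_cost(n_records, per_msg_cost, per_record_cost, batch=1):
    """Total forwarding overhead: each message pays a fixed cost, each record
    a marginal cost. CF is batch=1; BF amortizes the fixed cost over a batch."""
    n_msgs = -(-n_records // batch)          # ceil division
    return n_msgs * per_msg_cost + n_records * per_record_cost

records = 100_000
cf = forwarding_cost(records, per_msg_cost=50.0, per_record_cost=1.0, batch=1)
bf = forwarding_cost(records, per_msg_cost=50.0, per_record_cost=1.0, batch=64)
print(f"CF={cf:.0f}  BF={bf:.0f}  reduction={(1 - bf/cf):.0%}")
```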

5.
Mobile platforms such as Android and iOS, which are based on typical operating systems, have been widely adopted in computing devices ranging from smart phones to smart TVs. Along with this, the need for a kernel instrumentation framework has grown for efficient development and debugging of the kernel itself and its components. Although existing approaches provide some information about the kernel state, including physical register values and a primitive memory map, it is hard for developers to understand and exploit that information. Moreover, the excessive analysis overhead of the existing approaches makes them impractical for real systems. Meanwhile, dynamic binary translation has been studied for analyzing user-level applications and is now widely used. In this paper, by extending the idea of dynamic binary translation from user-level applications to the kernel, we propose a new dynamic kernel instrumentation framework. Our framework focuses on modules such as device drivers, rather than the kernel itself, since modules comprise a large portion of OS development. Because kernel modules execute frequently, a dynamic kernel instrumentation framework must guarantee the quality of the translated target code; yet costly optimizations aimed at high execution performance can harm overall performance. Therefore, to improve the performance of both translation and execution, we suggest a lightweight translator based on a pseudo machine-instruction representation and table-based translation instead of a typical intermediate representation. We implement our framework on a Linux system, and our experimental evaluations show that it instruments targets quite effectively with nominal overhead.

6.
Packet delay (either one-way or round-trip time) is an important metric for measuring network performance in a highly dynamic environment such as the Internet. Many network applications are also sensitive to packet delay or delay variation in maintaining an acceptable level of quality for services such as VoIP and multimedia streaming. Packet delay is highly dynamic and should therefore be measured frequently, with results updated on a timely basis. Its measurement has thus attracted much interest in past years, with substantial research on measurement architectures as well as specific measurement techniques. However, reducing the network overhead of measurement while achieving a reasonable level of accuracy remains a challenge. In this paper, we propose delay estimation as an alternative to delay measurement for reducing measurement overhead and, in particular, examine the level of accuracy that delay estimation can achieve. With delay estimation, measurement nodes can be dynamically selected and activated, and other nodes can share measurement results by estimating delays: only a subset of network nodes performs actual measurement, while the desired accuracy is achieved by exploiting the correlation between delays and sharing measurement results. We illustrate how the packet delays of network nodes correlate based on topological properties and show how delays can be estimated from such correlation to meet accuracy requirements, making delay measurement in the Internet highly dynamic, adaptable to accuracy requirements, and reliable. We also present three application scenarios and an example to demonstrate the usefulness and effectiveness of delay estimation in measuring packet delays.
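A minimal sketch of the estimation idea, assuming numpy and a simple linear correlation between two topologically close nodes (the paper's actual correlation model is not reproduced): fit the relation on historical measurements, then estimate one node's delay from the other's fresh measurement.

```python
import numpy as np

# Historical one-way delays (ms) measured at two topologically close nodes.
# In the scheme described above, only node A stays activated for measurement;
# node B's delay is estimated from A's via their observed correlation.
delay_a = np.array([20.1, 25.3, 22.8, 30.4, 27.9, 21.5])
delay_b = np.array([31.0, 38.2, 34.5, 45.1, 41.8, 32.7])

slope, intercept = np.polyfit(delay_a, delay_b, 1)   # least-squares linear fit

def estimate_b(measured_a):
    """Estimate node B's delay from a fresh measurement at node A."""
    return slope * measured_a + intercept

print(f"b = {estimate_b(24.0):.1f} ms")
```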

7.
Tools based on dynamic instrumentation are widely used in program analysis, but they suffer from serious performance problems. Their overhead has two main components: the overhead of the instrumentation engine and that of the user-defined analysis routines. To reduce the latter, this paper first analyzes the composition of the overhead of dynamic-instrumentation-based tools and experimentally identifies several causes of the overhead and their impact on tool performance. Based on these findings, offline analysis is proposed to optimize tool performance, and parallel data collection is then used to improve performance further. With this approach, the CPU time consumed by the analysis routines is reduced by 5% to 15%.
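A minimal sketch of the offline-analysis idea in Python, assuming a threaded producer/consumer split: the instrumented code only enqueues raw events, while a separate collector thread performs the analysis off the critical path. The names and the histogram analysis are illustrative.

```python
import queue
import threading

events = queue.Queue(maxsize=65536)

def analyze_worker():
    """Consumes raw events off the critical path (the 'offline' analysis)."""
    hist = {}
    while True:
        ev = events.get()
        if ev is None:                # shutdown sentinel
            break
        hist[ev] = hist.get(ev, 0) + 1
    print(hist)

worker = threading.Thread(target=analyze_worker)
worker.start()

def probe(event_id):
    """Called from instrumented code: just record the event, never analyze."""
    events.put(event_id)

for pc in [0x400a, 0x400b, 0x400a]:   # stand-in for instrumented execution
    probe(pc)
events.put(None)
worker.join()
```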

8.
A dynamic translation system must perform an address translation every time it executes an indirect branch instruction; this process is one of the main sources of the translator's performance overhead. Translation systems without special hardware support commonly use software prediction to reduce the address-translation cost, but the low accuracy of software prediction limits the overall performance gain. The low-overhead correlated software prediction (LOCSP) algorithm uses code replicas to distinguish the different branching contexts of a predicted instruction, separating the dynamic execution paths that reach the instruction into several disjoint code-cache replicas, each with its own prediction chain. This realizes correlated prediction without increasing the dynamic instruction count and significantly improves software prediction accuracy. Guided by dynamic profiling, LOCSP applies correlated software prediction only to hard-to-predict hot indirect branches, further reducing prediction overhead. Experiments show that, compared with plain software prediction, LOCSP raises the average prediction accuracy from 58.9% to 82.2% and reduces the translation system's overall performance overhead by 19.3% on average and up to 41.9%, while the static code size grows by only 2.4% on average.
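The effect of separating execution paths can be sketched with a toy simulation (illustrative only; LOCSP operates on code-cache replicas, not Python dictionaries): a single last-target prediction chain thrashes when two paths alternate through one indirect branch, while per-path chains predict almost perfectly.

```python
from collections import defaultdict

# An indirect branch whose target depends on which caller path reached it.
trace = [("f", 0x10), ("g", 0x20), ("f", 0x10), ("g", 0x20)] * 100

def accuracy(keyed_by_path):
    last = defaultdict(lambda: None)   # prediction chain: last seen target
    hits = 0
    for path, target in trace:
        key = path if keyed_by_path else "branch"
        hits += last[key] == target
        last[key] = target
    return hits / len(trace)

print(f"single chain: {accuracy(False):.0%}")     # alternating targets thrash it
print(f"per-path replicas: {accuracy(True):.0%}")
```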

9.
In dynamic binary translation, identifying and generating hot paths is a key step in improving translator efficiency. Raising the hit rate of hot-path prediction requires collecting fairly detailed information while the program runs, which inevitably increases system overhead, so a trade-off between accuracy and overhead is necessary. Building on existing hot-path algorithms, this paper proposes an improved path-based hot-path identification and optimization algorithm and analyzes the results.
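A minimal sketch of a counter-based hot-path scheme in the spirit the abstract describes (the threshold and data structures are assumptions): execution counts are kept for potential path heads, and trace recording starts once a head crosses a hotness threshold.

```python
from collections import defaultdict

HOT_THRESHOLD = 50          # assumed value; real systems tune this trade-off

counters = defaultdict(int)
hot_heads = set()

def on_backward_branch(target_pc):
    """Profiling stub: a path head becomes hot after enough executions.
    A higher threshold costs less but delays optimization; a lower one
    risks compiling cold paths - the accuracy/overhead trade-off above."""
    if target_pc in hot_heads:
        return True                        # already translated/optimized
    counters[target_pc] += 1
    if counters[target_pc] >= HOT_THRESHOLD:
        hot_heads.add(target_pc)           # start recording the hot trace here
        return True
    return False
```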

10.
邹琼  伍鸣  胡伟武  章隆兵 《软件学报》2008,19(7):1581-1589
Frequent access to heap data is a major cost in Java programs. Researchers therefore have the virtual machine collect information about heap data accesses and then use prefetching or garbage collection to improve memory performance. The common collection methods are sampling and instrumentation, but neither satisfies both fine granularity and low overhead. To meet both requirements, this paper proposes an adaptive prefetching framework for virtual machines based on instrumentation analysis: the framework collects information via instrumentation, adaptively adjusts the instrumentation according to run-time feedback, and performs prefetch optimization. Experimental results show that adaptive prefetching improves SPEC JVM98 and DaCapo on a Pentium 4 by varying degrees, up to 18.1%, while the overhead is kept within 4.0%.

11.
Design and Implementation of a Software Testing Tool Based on Super-Block Dominator Graph Instrumentation
By using a super-block dominator graph to analyze reasonable probe-insertion positions, the number of instrumentation probes can be effectively reduced, lowering the impact of code instrumentation on the program. Based on this instrumentation principle, a software automatic testing tool for C (SAT) is designed. The paper describes the implementation of the tool's main functional modules, including the lexer/parser, the static analyzer, and the code instrumentor, and analyzes SAT's instrumentation performance.

12.
The Paradyn parallel performance measurement tool
Paradyn is a tool for measuring the performance of large-scale parallel programs. Our goal in designing a new performance tool was to provide detailed, flexible performance information without incurring the space (and time) overhead typically associated with trace-based tools. Paradyn achieves this goal by dynamically instrumenting the application and automatically controlling this instrumentation in search of performance problems. Dynamic instrumentation lets us defer insertion until the moment it is needed (and remove it when it is no longer needed); Paradyn's Performance Consultant decides when and where to insert instrumentation.

13.
While being monitored, instrumented long-running parallel applications generate huge amounts of instrumentation data. Processing and storing this data incurs overhead and perturbs the execution. A technique that eliminates unnecessary instrumentation data and lowers the intrusion without losing any performance information is valuable for tool developers. This paper presents a new algorithm for software instrumentation that measures the information content of the instrumentation data to be collected. The algorithm is based on the entropy concept from information theory, and it makes selective data collection possible for a time-driven software monitoring system.
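A minimal sketch of the entropy criterion, assuming windows of discrete samples: compute the Shannon entropy of each window and forward only windows that carry enough information. The threshold value is an illustrative assumption.

```python
import math
from collections import Counter

def entropy(window):
    """Shannon entropy (bits) of one window of instrumentation samples."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def should_collect(window, threshold=1.0):
    # Low entropy = the window is redundant with what we already know;
    # skip it to cut perturbation without losing performance information.
    return entropy(window) >= threshold

print(should_collect([5, 5, 5, 5, 5, 5, 5, 5]))   # False: no new information
print(should_collect([5, 9, 2, 5, 7, 9, 1, 4]))   # True: worth forwarding
```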

14.
Traditional location-based routing algorithms usually adopt distributed prediction mechanisms with low reliability; in particular, when the source node does not know the destination's location, it can only resort to flooding, which increases communication overhead. This paper proposes LAAR (Location-based Adaptive Ad hoc Routing), an adaptive routing mechanism based on location information. LAAR uses a hierarchical architecture to obtain a network-wide consistent topology view, eliminating the uncertainty introduced by distributed prediction, and avoids flooding by querying the destination's location before routing. LAAR combines multiple location-update mechanisms to keep location information accurate while bounding network overhead. Its adaptive adjustment mechanism couples node mobility with the route discovery process, dynamically tracking the destination node and improving routing performance. Simulation results show that as node speed increases, LAAR achieves lower control overhead than LAR, and it attains a higher packet delivery ratio at high node densities.

15.
We develop a model to predict the performance of synchronous discrete event simulation. Our model considers the two most important factors for the performance of synchronous simulation: load balancing and communication. The effect of load balancing in a synchronous simulation is computed using probability distribution models. We derive a formula that computes the cost of synchronous simulation by combining a communication model called LogGP with computation granularity. Even though the formula is simple, it effectively captures the most important factors in synchronous simulation and helps us predict the maximum speedup achievable. To examine the prediction model, we have simulated several large ISCAS logic circuits and a simple PCS network simulation on an SGI Origin 2000 and the Terascale Computing System (TCS) at the Pittsburgh Supercomputing Center. The experimental results show that our performance model accurately predicts the performance of synchronous simulation. The model is then used to analyze the effect of several factors that may improve the performance of synchronous simulation: problem size, load balancing, granularity, communication overhead, and partitioning.
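A rough sketch of such a cost formula (the paper's exact formula is not reproduced; all parameter values below are assumed): per synchronization round, the cost is the slowest processor's computation plus a LogGP-style communication term, and the predicted speedup is the sequential time over the parallel time.

```python
def round_cost(events_per_proc, grain, L, o, g, msgs):
    """Cost of one synchronous round (a LogGP-flavoured sketch):
    the slowest processor's computation plus its communication."""
    comp = max(events_per_proc) * grain          # load balance sets the max
    comm = msgs * (2 * o + g) + L                # send+recv overhead, gap, latency
    return comp + comm

def speedup(total_events, grain, events_per_proc, L, o, g, msgs, rounds):
    seq = total_events * grain
    par = rounds * round_cost(events_per_proc, grain, L, o, g, msgs)
    return seq / par

# Perfectly balanced vs. skewed load, same totals (all parameters assumed):
print(speedup(4000, 1.0, [10, 10, 10, 10], L=5, o=1, g=2, msgs=4, rounds=100))
print(speedup(4000, 1.0, [25, 5, 5, 5],   L=5, o=1, g=2, msgs=4, rounds=100))
```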

16.
Jie Yin  Chao Ma  Shi‐Min Hu 《Software》2016,46(3):341-360
Instrumentation is a powerful technique for monitoring, profiling, debugging, logging, and tracing software. To determine the instrumentation location, the user needs to know where the currently executed location is in the source code. Previous instrumentation approaches rely on debugging information to find the location in the source code. For fully optimized programs, debugging information is incomplete, which limits the applicability of those approaches. In this paper, we present pattern-based abstract syntax tree (PAST) instrumentation, an instrumentation methodology that accurately instruments fully optimized programs. The instrumentation location is specified in an intuitive way that matches the source code at the abstract syntax tree level. The program can be instrumented either at compile time using ordinary compilation or at run time using just-in-time compilation. Experimental results show that PAST accurately instruments the target program, with negligible run-time overhead when the inserted instrumentation performs no operation. We have implemented PAST on both x86-32 and x86-64 to show that it is easily portable across different architectures.

17.
The dynamic network topology of a MANET is one of its defining characteristics and a main factor affecting communication performance. The traditional way to obtain the network topology is to update network information periodically, which incurs substantial network overhead. This paper therefore proposes an efficient routing protocol that, under a mobility-prediction mechanism, works by selecting the most stable transmission path. Simulation results show that this approach achieves good network performance.

18.
One of the most important research issues in finance is building effective corporate bankruptcy prediction models, because they are essential for the risk management of financial institutions. Researchers have applied various data-driven approaches to enhance prediction performance, including statistical and artificial intelligence techniques, and many of them have proved useful. Case-based reasoning (CBR) is one of the most popular data-driven approaches because it is easy to apply, has no possibility of overfitting, and provides good explanations for its output. However, it has a critical limitation: its prediction performance is generally low. In this study, we propose a novel approach to enhance the prediction performance of CBR for corporate bankruptcy prediction: the simultaneous optimization of feature weighting and instance selection for CBR using genetic algorithms (GAs). Our model can improve prediction performance by referencing more relevant cases and eliminating noise. We apply our model to a real-world case. Experimental results show that the prediction accuracy of conventional CBR may be improved significantly by using our model. Our study suggests ways for financial institutions to build a bankruptcy prediction model that produces accurate results as well as good explanations for those results.
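A compact sketch of the proposed encoding, assuming numpy and toy data: a chromosome concatenates real-valued feature weights with instance-selection genes, and fitness is the accuracy of weighted 1-NN retrieval (the CBR reuse step) on a validation set. Population size, operators, and rates are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy case base: rows = past firms, columns = financial ratios; y = bankrupt?
X = rng.random((60, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.random(60) > 0.8).astype(int)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

def fitness(chrom):
    """Chromosome = 5 feature weights + 40 instance-selection genes.
    Fitness = accuracy of weighted 1-NN CBR retrieval on validation firms."""
    w, mask = chrom[:5], chrom[5:] > 0.5
    if not mask.any():
        return 0.0
    cases, labels = X_tr[mask], y_tr[mask]
    correct = 0
    for q, target in zip(X_va, y_va):
        d = ((cases - q) ** 2 * w).sum(axis=1)     # weighted distance
        correct += labels[np.argmin(d)] == target  # 1-NN reuse step
    return correct / len(y_va)

pop = rng.random((30, 45))
for gen in range(40):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]        # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 45)
        child = np.concatenate([a[:cut], b[cut:]]) # one-point crossover
        mut = rng.random(45) < 0.05                # light mutation
        child[mut] = rng.random(mut.sum())
        children.append(child)
    pop = np.array(children)

print("best validation accuracy:", max(fitness(c) for c in pop))
```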

19.
孙浩  李会朋  曾庆凯 《软件学报》2013,24(12):2767-2781
To reduce the run-time overhead of instrumentation-based integer-vulnerability checking, an information-flow-based instrumentation method for integer vulnerabilities is proposed. By restricting the scope of analysis, the set of analyzed objects is reduced to the dangerous integer operations on tainted information-flow paths, which lowers the static instrumentation density. A prototype system, DRIVER (detect and run-time check integer-based vulnerabilities with information flow), is implemented on the GCC platform. Experimental results show that the approach offers high precision, low overhead, and accurate localization.
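A toy illustration of the object-reduction idea (not DRIVER's GCC implementation): propagate taint over a minimal three-address IR and mark for instrumentation only the integer operations reachable from tainted inputs.

```python
# Toy three-address IR: (dest, op, src1, src2). Only ops fed by tainted
# (attacker-controlled) values receive a run-time overflow check.
ir = [
    ("a", "input", None, None),    # taint source (e.g., read from network)
    ("b", "const", 8,    None),
    ("c", "mul",   "a",  "b"),     # tainted * const -> dangerous, instrument
    ("d", "add",   "b",  "b"),     # const-only -> skip the check
    ("e", "add",   "c",  "d"),     # derived from tainted c -> instrument
]

tainted = set()
to_instrument = []
for dest, op, s1, s2 in ir:
    if op == "input":
        tainted.add(dest)
    elif op in ("add", "mul"):
        if s1 in tainted or s2 in tainted:
            tainted.add(dest)          # taint propagates through the op
            to_instrument.append(dest) # danger: tainted integer arithmetic

print("instrument only:", to_instrument)   # -> ['c', 'e'], not 'd'
```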

20.
Performance prediction is an important engineering tool that provides valuable feedback on design choices in program synthesis and machine architecture development. We present an analytic performance modeling approach that aims to minimize prediction cost while providing prediction accuracy sufficient to enable major code and data mapping decisions. Our approach is based on a performance simulation language called PAMELA. Apart from simulation, PAMELA features a symbolic analysis technique that enables PAMELA models to be compiled into symbolic performance models that trade prediction accuracy for the lowest possible solution cost. We demonstrate our approach through a large number of theoretical and practical modeling case studies, including six parallel programs and two distributed-memory machines. The average prediction error of our approach is less than 10 percent, while the average worst-case error is limited to 50 percent. This accuracy is shown to be sufficient to correctly select the best coding or partitioning strategy. For programs expressed in a high-level, structured programming model, such as data-parallel programs, symbolic performance modeling can be entirely automated. We report on experiments with a PAMELA model generator built within a data-parallel compiler for distributed-memory machines. Our results show that, with negligible program annotation, symbolic performance models are automatically compiled in seconds, while their solution cost is on the order of milliseconds.
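A minimal sketch of what a compiled symbolic performance model can look like, assuming sympy and a generic "compute a local block, then exchange halos" data-parallel loop; the shape of the cost expression is an assumption, not PAMELA's actual output.

```python
import sympy as sp

# Symbolic machine/program parameters: N data items, P processors,
# t_c per-item compute time, t_s per-message startup, t_w per-item transfer.
N, P, t_c, t_s, t_w = sp.symbols("N P t_c t_s t_w", positive=True)

# A PAMELA-style model of "compute local block, then exchange halos" might
# reduce, after symbolic analysis, to a closed form of this shape:
T_par = sp.ceiling(N / P) * t_c + 2 * (t_s + t_w)
T_seq = N * t_c
speedup = sp.simplify(T_seq / T_par)

# Evaluating the symbolic model costs milliseconds, versus a full simulation:
vals = {N: 10**6, P: 64, t_c: 1e-6, t_s: 5e-5, t_w: 1e-6}
print(speedup.subs(vals).evalf())     # predicted speedup for this mapping
```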
