Similar documents
20 similar documents found (search time: 31 ms).
1.
As highly parallel multicore machines become commonplace, programs must exhibit more concurrency to exploit the available hardware. Many multithreaded programming models already encourage programmers to create hundreds or thousands of short-lived threads that interact in complex ways. Programmers need to be able to analyze, tune, and troubleshoot these large-scale multithreaded programs. To address this problem, we present ThreadScope: a tool for tracing, visualizing, and analyzing massively multithreaded programs. ThreadScope extracts the machine-independent program structure from execution trace data produced by a variety of tracing tools and displays it as a graph of dependent execution blocks and memory objects, enabling identification of synchronization and structural problems, even if they did not occur in the traced run. It also uses graph-based analysis to identify potential problems. We demonstrate the use of ThreadScope to view program structure, memory access patterns, and synchronization problems in three programming environments and seven applications. Copyright © 2009 John Wiley & Sons, Ltd.

2.
《Micro, IEEE》2004,24(6):74-82
Memory latency dominates the performance of many applications on modern processors, despite advances in caches and prefetching techniques. Numerous prefetching techniques, both in hardware and software, try to alleviate the memory bottleneck. One such technique, known as helper threading, improves single-thread performance on a simultaneous multithreaded (SMT) architecture, which shares processor resources, including caches, among logical threads. It uses otherwise idle hardware thread contexts to execute speculative threads on behalf of the main thread. Helper threading accelerates a program by exploiting a processor's multithreading capability to run assist threads. Based on the helper threading usage model, virtual multithreading (VMT), a form of switch-on-event user-level multithreading, can improve performance for real-world workloads with a wall-clock speedup of 5.0 to 38.5 percent.
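
As a concrete (if simplified) illustration of the helper-threading idea described above, the sketch below uses an ordinary POSIX thread to walk a linked list slightly ahead of the main thread and issue prefetch hints. It is only a software analogue under stated assumptions: the list type, the `sum_list` workload, and the use of GCC's `__builtin_prefetch` are illustrative choices, not part of the VMT mechanism the article describes, which relies on the hardware thread contexts of an SMT processor.

```c
/* Rough software analogue of helper threading (not Intel's VMT):
 * a helper thread runs over a linked list and prefetches nodes so the
 * main thread is more likely to find them already in cache. */
#include <pthread.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    long payload;
} node_t;

static void *helper(void *arg)
{
    /* Speculative assist work: read-only traversal, issuing prefetch hints. */
    for (node_t *p = arg; p; p = p->next)
        __builtin_prefetch(p->next, 0, 1);   /* read access, low temporal locality */
    return NULL;
}

long sum_list(node_t *head)
{
    pthread_t tid;
    long sum = 0;

    /* Launch the assist thread; correctness does not depend on it. */
    pthread_create(&tid, NULL, helper, head);

    for (node_t *p = head; p; p = p->next)
        sum += p->payload;                   /* main-thread work */

    pthread_join(tid, NULL);
    return sum;
}
```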

3.
Multithreaded programs are especially difficult to test and debug. The aim of the paper is to present a new concept of multithreaded program analysis and debugging based on contextual visualisation of the program components that influence thread execution. For this purpose, a dedicated software package called MTV (multithreading viewer) has been designed and implemented. It operates above the run-time library level, and hence only a programmer's view of the execution of multiple threads of control can be analyzed. The paper presents the instrumentation of the tested program code and the communication and synchronization between the instrumented program and MTV. Next, a general concept of contextual visualisation of multithreaded programs is elaborated. A scheme of MTV's cooperation with the monitored program is discussed, the user interface is described, a representation of the multithreaded program state is shown, and MTV's capability to recognize certain classes of errors is specified and illustrated by a few examples. These examples are not intended to be exhaustive, but rather indicate the opportunities to exploit MTV for the analysis of complex applications. A short evaluation of the proposed contextual visualisation techniques as applied to multithreaded program analysis concludes the paper. © 1997 by John Wiley & Sons, Ltd.

4.
Spinning is a synchronization mechanism commonly used in applications and operating systems. Excessive spinning, however, often indicates performance or correctness (e.g., livelock) problems. Detecting whether applications and operating systems are spinning is essential for achieving high performance, especially in consolidated servers running virtual machines. Prior research has used source or binary instrumentation to detect spinning. However, these approaches place a significant burden on programmers and may even be infeasible in certain situations. In this paper, we propose efficient hardware to detect spinning in unmodified applications and operating systems. Based on this hardware, we develop 1) scheduling and power policies that adaptively manage resources for spinning threads, 2) system support that helps detect when a multithreaded program is livelocked, and 3) hardware performance counters that accurately reflect system performance. Using full-system simulation with SPEC OMP, SPLASH-2, and Wisconsin commercial workloads, we demonstrate that our mechanisms effectively improve the management of multithreaded systems.
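
To make concrete what "spinning" means in this context, here is a minimal test-and-test-and-set spinlock in C11; the busy-wait loop is exactly the pattern the proposed hardware would detect. This is a generic textbook lock, not the paper's detection mechanism or its policies.

```c
/* Minimal test-and-test-and-set spinlock: the kind of busy-wait loop that
 * spin-detection hardware targets. Generic illustration only. */
#include <stdatomic.h>

typedef struct { atomic_int held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* Spin on a plain load first to avoid hammering the cache line. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;   /* pure spinning: no useful work is done here */
        if (!atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
            return;   /* lock acquired */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

If the lock holder is descheduled (for example, its virtual CPU is preempted by a hypervisor), every waiter burns its whole time slice in the inner loop, which is the pathological case motivating adaptive scheduling and power policies for spinning threads.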

5.
Discovering and exploiting program phases
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the largest of scales (that is, over the program's complete execution). During one part of the execution, a program can be completely memory bound; in another, it can repeatedly stall on branch mispredicts. Average statistics gathered about a program might not give an accurate picture of where the real problems lie. This realization has ramifications for many architecture and compiler techniques, from how to best schedule threads on a multithreaded machine, to feedback-directed optimizations, power management, and the simulation and test of architectures. Taking advantage of time-varying behavior requires a set of automated analytic tools and hardware techniques that can discover similarities and changes in program behavior on the largest of time scales. The challenge in building such tools is that during a program's lifetime it can execute billions or trillions of instructions. How can high-level behavior be extracted from this sea of instructions? Some programs change behavior drastically, switching between periods of high and low performance, yet system design and optimization typically focus on average system behavior. It is argued that instead of assuming average behavior, it is now time to model and optimize phase-based program behavior.

6.
Efficient performance tuning of parallel programs is often hard. Optimization is often done after the program has been written, as a last effort to increase performance. With sequential programs, each (executed) code segment affects the completion time. In the case of a parallel program executed on a multiprocessor this is not always true, due to dependencies between the different threads. Thus, certain code segments of the execution may not affect the completion time of the program, and optimizing such code segments will not increase performance. In this paper we present an approach to optimizing performance by finding the extended critical path of the multithreaded program. The extended critical path analysis is a generalization of critical path analysis in the sense that it also deals with more threads than processors. We have implemented the extended critical path analysis in a performance optimization tool. The tool allows the user to determine the extended critical path of a multithreaded application written for the Solaris operating system, for any number of processors, based on execution on a single-processor workstation.
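
For readers unfamiliar with critical-path analysis, the sketch below computes the plain (non-extended) critical path of a small dependence graph of execution blocks: the longest chain of dependent work, which bounds completion time regardless of processor count. The block costs and dependence matrix are made-up example data, and the paper's extension for handling more threads than processors is not modeled here.

```c
/* Plain critical-path length over a DAG of execution blocks (longest path,
 * blocks already numbered in topological order). Sketch only. */
#include <stdio.h>

#define N 5   /* hypothetical number of blocks */

int main(void)
{
    /* cost[i] = execution time of block i; dep[i][j] != 0 means j depends on i */
    double cost[N] = {2.0, 3.0, 1.0, 4.0, 2.0};
    int dep[N][N] = {
        {0,1,1,0,0},   /* block 0 precedes blocks 1 and 2 */
        {0,0,0,1,0},
        {0,0,0,1,0},
        {0,0,0,0,1},
        {0,0,0,0,0},
    };
    double finish[N];
    double cp = 0.0;

    for (int j = 0; j < N; j++) {
        double start = 0.0;
        for (int i = 0; i < j; i++)
            if (dep[i][j] && finish[i] > start)
                start = finish[i];     /* cannot start before all predecessors finish */
        finish[j] = start + cost[j];
        if (finish[j] > cp)
            cp = finish[j];
    }
    printf("critical path length = %.1f\n", cp);  /* lower bound on completion time */
    return 0;
}
```

Only code segments on this longest chain can shorten the completion time; optimizing off-path segments cannot, which is the observation the tool described above builds on.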

7.
With the continued development of multicore technology, multithreading is used ever more widely in computer software. Because of nondeterministic execution, however, troubleshooting and debugging multithreaded programs is very difficult. Deterministic multithreading systems make multithreaded programs execute in a deterministic manner, i.e., repeated executions of the same multithreaded program follow the same order and produce the same results, which greatly simplifies troubleshooting and debugging. Deterministic multithreading systems, however, degrade the performance of multithreaded programs. This paper proposes a deterministic multithreaded scheduling algorithm based on longest-parallel-distance-first, which schedules the threads with the longest parallel distance first to reduce the total waiting time of threads and thereby improve the efficiency of multithreaded programs. Experimental results show that our method improves the performance of multithreaded programs by 10% and scales well.
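
The abstract does not spell out the scheduling algorithm, so the sketch below only illustrates the deterministic-execution skeleton such systems build on: threads are admitted to their scheduled regions in a fixed order (here a simple round-robin token), so repeated runs interleave identically. The longest-parallel-distance-first policy proposed in the paper would choose the order differently; this is a generic illustration, not the paper's method.

```c
/* Deterministic multithreading by token passing: threads enter their
 * "scheduled" regions in a fixed order, so every run interleaves the same
 * way. Generic skeleton only, not the paper's scheduling policy. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int turn = 0;                  /* whose turn it is, in a fixed order */

static void *worker(void *arg)
{
    int id = (int)(long)arg;

    pthread_mutex_lock(&m);
    while (turn != id)                /* wait for this thread's deterministic slot */
        pthread_cond_wait(&cv, &m);
    printf("thread %d runs its scheduled region\n", id);
    turn = (turn + 1) % NTHREADS;     /* hand the token to the next thread */
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```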

8.
The recent advent of multithreaded architectures holds many promises: the exploitation of intra-thread locality and the latency tolerance of multithreaded synchronization can result in more efficient processor utilization and higher scalability. The challenge for a code generation scheme is to make effective use of the underlying hardware by generating large threads with a large degree of internal locality, without limiting program-level parallelism or increasing latency. Top-down code generation, where threads are created directly from the compiler's intermediate form, is effective at creating relatively large threads. However, having only a limited view of the code at any one time limits the quality of the threads generated. These top-down generated threads can therefore be optimized by global, bottom-up optimization techniques. In this paper, we introduce the Pebbles multithreaded model of computation and analyze a code generation scheme whereby top-down code generation is combined with bottom-up optimizations. We evaluate the effectiveness of this scheme in terms of overall performance and specific thread characteristics such as size, length, instruction-level parallelism, number of inputs, and synchronization costs.

9.
周广川 《现代计算机》2011,(3):28-30,47
Debugging multithreaded applications is a challenging task. The mutual exclusion and synchronization techniques used by multithreaded applications make it difficult to inspect the program's runtime state during debugging, and thread timing and the interleaved execution of multiple threads further increase the complexity of debugging. By applying general techniques suited to debugging multithreaded applications, combined with the tools provided by the Visual Studio debugger, multithreaded applications can be debugged effectively.

10.
11.
Distributed concurrent computing based on lightweight processes can potentially address performance and functionality limits in heterogeneous systems. The TPVM framework, based on the notion of 'exportable services', is an extension to the PVM message-passing system, but uses threads as units of computing, scheduling, and parallelism. TPVM facilitates and supports three different distributed concurrent programming paradigms: (a) the traditional, task-based, explicit message-passing model; (b) a data-driven instantiation model that enables straightforward specification of computation based on data dependencies; and (c) a partial shared-address-space model via remote memory access, with naming and typing of distributed data areas. The latter models offer significantly different computing paradigms for network-based computing, while maintaining a close resemblance to, and building upon, the conventional PVM infrastructure in the interest of compatibility and ease of transition. The TPVM system comprises three basic modules: a library interface that provides access to thread-based distributed concurrent computing facilities, a portable thread interface module that abstracts the required thread-related services, and a thread server module that performs scheduling and system data management. System implementation as well as application experiences have been very encouraging, indicating the viability of the proposed models, the feasibility of portable and efficient threads systems for distributed computing, and the performance improvements that result from multithreaded concurrent computing. © 1998 John Wiley & Sons, Ltd.

12.
Hardware interrupts are widely used in the world's critical software systems to support preemptive threads, device drivers, operating system kernels, and hypervisors. Handling interrupts properly is an essential component of low-level system programming. Unfortunately, interrupts are also extremely hard to reason about: they dramatically alter the program control flow and complicate the invariants in low-level concurrent code (e.g., implementations of synchronization primitives). Existing formal verification techniques, including Hoare logic, typed assembly language, concurrent separation logic, and the assume-guarantee method, have consistently ignored the issues of interrupts; this severely limits the applicability and power of today's program verification systems. In this paper we present a novel Hoare-logic-like framework for certifying low-level system programs involving both hardware interrupts and preemptive threads. We show that enabling and disabling interrupts can be formalized precisely using simple ownership-transfer semantics, and the same technique also extends to the concurrent setting. By carefully reasoning about the interaction among interrupt handlers, context switching, and synchronization libraries, we are able, for the first time, to successfully certify a preemptive thread implementation and a large number of common synchronization primitives. Our work provides a foundation for reasoning about interrupt-based kernel programs and makes an important advance toward building fully certified operating system kernels and hypervisors.
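
The ownership-transfer reading of interrupt control can be pictured with a small uniprocessor-style sketch: while interrupts are masked, the running code logically owns the memory it shares with the handler, and re-enabling interrupts hands that ownership back. The `irq_disable`/`irq_enable` wrappers and the `pending_events` variable are hypothetical stand-ins for illustration, not an API or proof artifact from the paper.

```c
/* Uniprocessor-style sketch of ownership transfer via interrupt masking.
 * The wrappers below are placeholders; real kernel code would mask and
 * unmask local interrupts (e.g., cli/sti on x86). */
static void irq_disable(void) { /* placeholder: mask local interrupts */ }
static void irq_enable(void)  { /* placeholder: unmask local interrupts */ }

static volatile int pending_events;   /* shared with an interrupt handler */

void consume_events(void)
{
    irq_disable();              /* ownership of pending_events transfers to us */
    int n = pending_events;     /* safe: the handler cannot run concurrently   */
    pending_events = 0;
    irq_enable();               /* ownership transfers back to the handler     */

    /* ... process the n drained events outside the interrupt-disabled region ... */
    (void)n;
}
```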

13.
Modeling Multithreaded Applications Using Petri Nets
Since most modern computing systems contain multiple processing elements, applications rely on multithreaded programming techniques that allow a program to execute multiple tasks concurrently to take advantage of the processing capabilities. Multithreaded programs are more difficult to design and test because of the nondeterministic execution orders and the synchronization among threads. Different approaches can be used to test multithreaded applications. In our approach we use Petri nets to represent the key elements of the interactions among threads in order to identify potential problems such as race conditions, lost signals, and deadlocks. A tool called C2Petri has been developed which converts C-Pthreads programs to the equivalent Petri net model. This tool helps with the verification of Pthread-based programs. At present the tool has limited capabilities, which we hope to expand in the near future.
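
As an example of the defect classes mentioned (race conditions, lost signals, deadlocks), the Pthreads fragment below shows a classic lost signal that a Petri-net model of the thread interactions can expose: if the producer signals before the consumer is waiting and the wait is not guarded by a predicate loop, the consumer blocks forever. This is a generic illustration, not output of the C2Petri tool.

```c
/* A "lost signal" in C-Pthreads: the kind of synchronization defect that
 * modeling thread interactions as a Petri net can reveal. */
#include <pthread.h>

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int ready = 0;

void *producer(void *arg)
{
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_signal(&cv);    /* if nobody is waiting yet, the signal is lost */
    pthread_mutex_unlock(&m);
    return arg;
}

void *consumer_buggy(void *arg)
{
    pthread_mutex_lock(&m);
    pthread_cond_wait(&cv, &m);  /* BUG: may block forever if the signal already fired */
    pthread_mutex_unlock(&m);
    return arg;
}

void *consumer_fixed(void *arg)
{
    pthread_mutex_lock(&m);
    while (!ready)               /* predicate loop makes the wait safe */
        pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
    return arg;
}
```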

14.
In modern programming languages, concurrency control can be traced back to one of two different schools: actor-based message-passing concurrency and thread-based shared-state concurrency. This paper describes a linguistic symbiosis between two programming languages with such different concurrency models. More specifically, we describe a novel symbiosis between actors represented as event loops on the one hand and threads on the other. This symbiosis ensures that the invariants of the actor-based concurrency model are not violated by engaging in symbiosis with multithreaded programs. The proposed mapping is validated by means of a concrete symbiosis between AmbientTalk, a flexible, domain-specific language for writing distributed programs, and Java, a conventional object-oriented language. This symbiosis allows the domain-specific language to reuse existing software components written in a multithreaded language without sacrificing the beneficial event-driven properties of the actor concurrency model.

15.
An important aspect of understanding the behavior of applications with respect to their performance, overhead, and scalability characteristics is knowledge of their execution control flow. High-level knowledge of which functions or constructs were executed after which other constructs allows reasoning about temporal application characteristics such as cache reuse. This paper describes an approach to capture and visualize the execution control flow of OpenMP applications in a compact way. Our approach does not require a full trace of program execution events but is instead based on a straightforward extension to the summary data already collected by an existing profiling tool. In multithreaded applications each thread may define its own independent flow of control, complicating both the recording and the visualization of the execution dynamics. Our approach allows full flexibility with respect to independent threads. However, the most common usage models of OpenMP have threads operate in a largely uniform way, synchronizing frequently at sequence points and diverging only to operate on different data items in worksharing constructs. Our approach accounts for this by offering a simplified representation of the execution control flow for threads with similar behavior.
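
The "common usage model" referred to above looks roughly like the OpenMP fragment below: all threads traverse the same sequence of constructs, diverge only over data inside worksharing loops, and rejoin at the implicit barriers, so a single summarized control-flow path describes them well. The code is an illustrative example, not taken from the paper.

```c
/* Typical OpenMP structure: uniform control flow across threads, with
 * divergence confined to worksharing loops and re-synchronization at the
 * implicit barriers. Compile with -fopenmp. */
#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    static double a[N], b[N];

    #pragma omp parallel
    {
        #pragma omp for                  /* worksharing: threads split the iterations */
        for (int i = 0; i < N; i++)
            a[i] = i * 0.5;
        /* implicit barrier: a sequence point where all threads rejoin */

        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i];
    }                                    /* implicit barrier at the end of the region */

    printf("b[N-1] = %f\n", b[N - 1]);
    return 0;
}
```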

16.
JavaSplit is a portable runtime environment for distributed execution of standard multithreaded Java programs. It gains augmented computational power and increased memory capacity by distributing the threads and objects of an application among the available machines. Unlike previous works, which either forfeit Java portability by using a nonstandard Java Virtual Machine (JVM) or compromise transparency by requiring user intervention to make the application suitable for distributed execution, JavaSplit automatically executes standard multithreaded Java on any heterogeneous collection of Java-enabled machines. Each machine carries out its part of the computation using nothing but its local standard (unmodified) JVM. Neither the programmer nor the person submitting the program for execution needs to be aware of JavaSplit or its distributed nature. We evaluate the efficiency of JavaSplit on several combinations of operating systems, JVM implementations, and communication hardware.

17.
A number of highly-threaded, many-core architectures hide memory-access latency by low-overhead context switching among a large number of threads. The speedup of a program on these machines depends on how well the latency is hidden. If the number of threads were infinite, these machines could theoretically provide the performance predicted by the PRAM analysis of these programs. However, the number of threads per processor is not infinite, and is constrained by both hardware and algorithmic limits. In this paper, we introduce the Threaded Many-core Memory (TMM) model, which is meant to capture the important characteristics of these highly-threaded, many-core machines. Since we model some important machine parameters of these machines, we expect analysis under this model to provide a more fine-grained and accurate performance prediction than the PRAM analysis. We analyze four algorithms for the classic all-pairs shortest-paths problem under this model. We find that even when two algorithms have the same PRAM performance, our model predicts different performance for some settings of machine parameters. For example, for dense graphs, the dynamic programming algorithm and Johnson's algorithm have the same performance in the PRAM model. However, our model predicts different performance for large enough memory-access latency and validates the intuition that the dynamic programming algorithm performs better on these machines. We validate several predictions made by our model using empirical measurements on an instantiation of a highly-threaded, many-core machine, namely the NVIDIA GTX 480.
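
For reference, the dynamic programming algorithm mentioned above is the standard Floyd-Warshall recurrence, shown here in sequential form; the threaded/GPU mapping that the TMM analysis actually studies is not reproduced, and the graph size is an arbitrary placeholder.

```c
/* Standard Floyd-Warshall dynamic program for all-pairs shortest paths.
 * Sequential reference form only; the many-core mapping analyzed under the
 * TMM model is not shown. */
#include <float.h>

#define V 4   /* hypothetical vertex count */

void floyd_warshall(double d[V][V])
{
    /* On entry, d[i][j] holds edge weights (DBL_MAX for "no edge");
     * on exit, it holds shortest-path distances. */
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (d[i][k] != DBL_MAX && d[k][j] != DBL_MAX &&
                    d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}
```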

18.
Message passing interface (MPI) is the de facto standard for writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottlenecks or optimize programs. To effectively analyze and predict the performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th...
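
Although the abstract is cut off, the LogP family it cites is standard: a machine is summarized by latency L, per-message overhead o, gap g (the reciprocal of per-processor message bandwidth), and processor count P, giving the usual point-to-point cost estimates below. This is the textbook form of the model, not this paper's refinement of it.

```latex
% Basic LogP cost of small point-to-point messages (assuming g >= o):
\[
T_{\text{1 msg}} \;=\; o_{\text{send}} + L + o_{\text{recv}} \;\approx\; 2o + L,
\qquad
T_{n\ \text{msgs}} \;\approx\; (n-1)\,g + 2o + L .
\]
```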

19.
20.
Application of inter-thread communication techniques based on .NET
With the development of multicore technology, multithreaded programming is drawing increasing attention, and inter-thread communication is indispensable in multithreaded programs. In Windows-based multithreaded programs, common solutions for inter-thread communication suffer from problems that affect performance and safety, such as the foreground GUI thread appearing to freeze and one thread modifying another thread's internal data. This paper discusses these problems and presents corresponding solutions.
