
1.

Quantitative characterization and analysis of the I/O behavior of a commercial distributed-shared-memory machine

Bordawekar R.R. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(5):509-526

This paper presents a unified evaluation of the I/O behavior of a commercial clustered DSM machine, the HP Exemplar. Our study has the following objectives: 1) To evaluate the impact of different interacting system components, namely, architecture, operating system, and programming model, on the overall I/O behavior and identify possible performance bottlenecks, and 2) To provide hints to the users for achieving high out-of-box I/O throughput. We find that for the DSM machines that are built as a cluster of SMP nodes, integrated clustering of computing and I/O resources, both hardware and software, is not advantageous for two reasons. First, within an SMP node, the I/O bandwidth is often restricted by the performance of the peripheral components and cannot match the memory bandwidth. Second, since the I/O resources are shared as a global resource, the file-access costs become nonuniform and the I/O behavior of the entire system, in terms of both scalability and balance, degrades. We observe that the buffered I/O performance is determined not only by the I/O subsystem, but also by the programming model, global-shared memory subsystem, and data-communication mechanism. Moreover, programming-model support can be used effectively to overcome the performance constraints created by the architecture and operating system. For example, on the HP Exemplar, users can achieve high I/O throughput by using features of the programming model that balance the sharing and locality of the user buffers and file systems. Finally, we believe that at present, the I/O subsystems are being designed in isolation, and there is a need for mending the traditional memory-oriented design approach to address this problem.

2.

Compilation and Communication Strategies for Out-of-Core Programs on Distributed Memory Machines

Rajesh Bordawekar Alok Choudhary J. Ramanujam 《Journal of Parallel and Distributed Computing》1996,38(2):277

It is widely acknowledged that improving parallel I/O performance is critical for widespread adoption of high performance computing. In this paper, we show that communication in out-of-core distributed memory problems may require both interprocessor communication and file I/O. Thus, in order to improve I/O performance, it is necessary to minimize the I/O costs associated with a communication step. We present three methods for performing communication in out-of-core distributed memory problems. The first method, called the generalized collective communication method, follows a loosely synchronous model; computation and communication phases are clearly separated, and communication requires permutation of data in files. The second method, called the receiver-driven in-core communication, communicates only the in-core data. The third method, called the owner-driven in-core communication, goes even one step further and tries to identify the potential future use of data (by the recipients) while it is in the sender's memory. We provide performance results for two out-of-core applications: the two-dimensional FFT code, and the two-dimensional elliptic Jacobi solver.

3.

An algorithm for partitioning trees augmented with sibling edges

Rajesh Bordawekar Oded Shmueli 《Information Processing Letters》2008,108(3):136-142

We investigate a special case of the graph partitioning problem: the partitioning of a sibling graph which is an ordered tree augmented with edges connecting consecutive nodes that share a common parent. We describe the algorithm, XS, and present a proof of its correctness.
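The sibling-graph definition in this abstract can be illustrated with a short sketch. It only builds the edge sets of a sibling graph from an ordered tree; it does not attempt the XS partitioning algorithm, which the abstract does not describe. The function name `sibling_graph_edges` and the dictionary representation of the tree are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def sibling_graph_edges(
    children: Dict[Hashable, List[Hashable]]
) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (tree_edges, sibling_edges) for an ordered tree.

    `children` (a hypothetical representation) maps each parent to its
    children in left-to-right order.
    """
    tree_edges: Set[Edge] = set()
    sibling_edges: Set[Edge] = set()
    for parent, kids in children.items():
        # Ordinary parent-child edges of the tree.
        for kid in kids:
            tree_edges.add((parent, kid))
        # Extra edges connecting consecutive nodes with a common parent.
        for left, right in zip(kids, kids[1:]):
            sibling_edges.add((left, right))
    return tree_edges, sibling_edges


# Root "r" has children a, b, c; node "a" has children d, e.
example = {"r": ["a", "b", "c"], "a": ["d", "e"]}
tree_edges, sibling_edges = sibling_graph_edges(example)
```

A partitioning algorithm for such a graph would have to account for both edge sets, since a cut can separate a node from its parent, from its neighbouring sibling, or from both.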

4.

CellJoin: a parallel stream join operator for the cell processor

Buğra Gedik Rajesh R. Bordawekar Philip S. Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):501-519

Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high performance. On the downside, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software-managed local memory at the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally single-instruction multiple-data (SIMD) optimized operator code can be used to exploit data parallelism.
Our experimental results show that, following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of ≈13 GB/s) and significantly surpass the performance obtained from conventional high-end processors (supporting a combined input stream rate of 2,000 tuples/s using 15 min windows and without dropping any tuples, resulting in ≈8.3 times higher output rate compared to an SSE implementation on dual 3.2 GHz Intel Xeon).
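To make the idea of a windowed stream join over basic windows concrete, here is a minimal single-threaded sketch. It is not the CellJoin implementation and uses none of the Cell-specific techniques named in the abstract (pointer shifting, column-oriented layout, double buffering, SIMD). The class `BasicWindowBuffer`, the function `windowed_join`, the event layout, and the equality join condition are all assumptions chosen for illustration; timestamps are assumed non-negative and arriving in order.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Item = Tuple[float, int, str]  # (timestamp, join key, payload)


class BasicWindowBuffer:
    """Sliding time window stored as a queue of fixed-length basic windows."""

    def __init__(self, window: float, basic: float) -> None:
        self.window = window  # total window length in seconds
        self.basic = basic    # length of one basic window in seconds
        self.chunks: Deque[Tuple[int, List[Item]]] = deque()

    def insert(self, item: Item) -> None:
        chunk_id = int(item[0] // self.basic)
        if not self.chunks or self.chunks[-1][0] != chunk_id:
            self.chunks.append((chunk_id, []))
        self.chunks[-1][1].append(item)

    def expire(self, now: float) -> None:
        # Drop a whole basic window once every tuple in it is too old.
        while self.chunks and (self.chunks[0][0] + 1) * self.basic <= now - self.window:
            self.chunks.popleft()

    def probe(self, key: int, now: float) -> List[Item]:
        # Per-tuple timestamp check keeps results exact even though
        # expiry happens at basic-window granularity.
        return [it for _, chunk in self.chunks for it in chunk
                if it[1] == key and it[0] > now - self.window]


def windowed_join(events: Iterable[Tuple[int, float, int, str]],
                  window: float, basic: float) -> List[Tuple[str, str]]:
    """Join two streams (ids 0 and 1) on equal keys within `window` seconds."""
    buffers = (BasicWindowBuffer(window, basic), BasicWindowBuffer(window, basic))
    out: List[Tuple[str, str]] = []
    for sid, ts, key, payload in events:
        own, other = buffers[sid], buffers[1 - sid]
        other.expire(ts)
        for match in other.probe(key, ts):
            out.append((payload, match[2]) if sid == 0 else (match[2], payload))
        own.insert((ts, key, payload))
    return out


events = [(0, 0.0, 1, "a"), (1, 1.0, 1, "x"), (1, 12.0, 1, "y"), (0, 13.0, 1, "b")]
result = windowed_join(events, window=10.0, basic=2.0)
```

Grouping tuples into basic windows lets expiry discard a whole chunk at once instead of checking tuples one by one, and the chunks form natural units that could be handed to separate workers.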

5.

Positioning for a sustainable future—Role of chemical engineers in transforming pharmaceutical process development

Shailendra Bordawekar Moiz Diwan Nandkishor K. Nere 《American Institute of Chemical Engineers》2021,67(9):e17364


6.

Influence of support composition on the structure and reactivity of strontium base catalysts

S.V. Bordawekar E.J. Doskocil R.J. Davis 《Catalysis Letters》1997,44(3-4):193-199

Strontium was supported on a variety of carriers, including silica, alumina, titania and carbon, by impregnation and decomposition of an acetate precursor at 773 K. These supported samples were characterized by surface area measurements, stepwise temperature-programmed desorption of carbon dioxide and activity for the catalytic decomposition of 2-propanol. In some cases, infrared and X-ray absorption spectroscopy were used to identify surface species. Results from these techniques suggested that strontium supported on silica forms a weakly basic surface silicate phase that had low activity for 2-propanol dehydrogenation. On alumina and titania, strontium acetate decomposed to form supported basic carbonates that were moderately active for 2-propanol dehydrogenation. When rates are normalized by the base site density determined from CO₂ desorption, strontium supported on carbon was the most active sample for dehydrogenation of 2-propanol. These results suggest that the nature of supported alkaline earth catalysts is strongly dependent on the composition of the carrier. This revised version was published online in July 2006 with corrections to the Cover Date.