
1.

Boosting the performance of Myrinet networks

Flich J. Lopez P. Malumbres M.P. Duato J. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(7):693-709

Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. These networks allow the customer to connect processors using irregular topologies, providing the wiring flexibility, scalability and incremental expansion capability required in this environment. Some of these networks use source routing and wormhole switching. In particular, we are interested in Myrinet networks because they are a well-known commercial product and their behavior can be controlled by the software running on the network interfaces (the Myrinet Control Program, MCP). Usually, the Myrinet network uses up*/down* routing for computing the paths for every source-destination pair. In this paper, we propose an in-transit buffer (ITB) mechanism to improve the network performance. We apply the ITB mechanism to NOWs with up*/down* source routing, like the Myrinet, analyzing its behavior on networks with both regular and irregular topologies. The proposed scheme can be implemented on Myrinet networks by simply modifying the MCP, without changing the network hardware. We evaluate by simulation several networks with different traffic patterns using timing parameters taken from the Myrinet network. The results show that the current routing schemes used in Myrinet networks can be strongly improved by applying the ITB mechanism. In general, our proposed scheme is able to double the network throughput on medium and large NOWs. Finally, we present a first implementation of the ITB mechanism on a Myrinet network.
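The up*/down* rule mentioned in this abstract can be illustrated with a small sketch. It assumes the usual textbook formulation: a breadth-first spanning tree is built from a root switch, a hop is "up" when it moves toward the root (ties broken by lower switch ID), and a legal path never takes an "up" hop after a "down" hop. The function names and the switch-level model of in-transit buffers are hypothetical and are not taken from the paper or from the MCP.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every switch from the root of the BFS spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


def is_up(level, src, dst):
    """A hop is 'up' if it ends closer to the root; ties go to the lower ID."""
    return (level[dst], dst) < (level[src], src)


def is_legal_updown(level, path):
    """Legal up*/down* path: no 'up' hop may follow a 'down' hop."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        if is_up(level, src, dst):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


def split_at_itbs(level, path):
    """Cut a path into legal segments (assumption: one buffer per cut point)."""
    segments, current, gone_down = [], [path[0]], False
    for src, dst in zip(path, path[1:]):
        up = is_up(level, src, dst)
        if up and gone_down:
            # A down-to-up transition: close the segment and restart at src.
            segments.append(current)
            current, gone_down = [src], False
        if not up:
            gone_down = True
        current.append(dst)
    segments.append(current)
    return segments


# Four switches in a ring, rooted at switch 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
level = bfs_levels(adj, 0)
print(is_legal_updown(level, [1, 3, 2]))  # path goes down, then up
print(split_at_itbs(level, [1, 3, 2]))
```

In this toy ring the path 1, 3, 2 is forbidden by up*/down* but becomes two legal segments once it is split at switch 3, which is the intuition behind buffering a packet in transit.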

2.

Myrinet communication

Dubnicki C. Bilas A. Yuqun Chen Damianakis S.N. Kai Li 《Micro, IEEE》1998,18(1):50-52

In last year's IEEE Micro special issue on the Hot Interconnects IV Symposium, we discussed our experiences with client-server computing on the Paragon-based Shrimp multicomputer. Since then we implemented the same virtual memory-mapped communication (VMMC) mechanism on the Myrinet-based Shrimp multicomputer (a set of Pentium PCs connected by a Myrinet network). In both cases we achieved protected, user-level, end-to-end performance close to the hardware limits. However, VMMC imposes a copy for high-level connection-oriented communication libraries. Therefore, we extended the model, and designed and built a new implementation. This update reports our latest work with VMMC on the Myrinet-based Shrimp multicomputer, which we call VMMC-2.

3.

Boosting neural networks (total citations: 15; self-citations: 0; citations by others: 15)

Schwenk H Bengio Y 《Neural computation》2000,12(8):1869-1887

Boosting is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm, AdaBoost, has been applied with great success to several benchmark machine learning problems using mainly decision trees as base classifiers. In this article we investigate whether AdaBoost also works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and weighting the cost function. The results suggest that random resampling of the training data is not the main explanation of the success of the improvements brought by AdaBoost. This is in contrast to bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of on-line handwritten digits from more than 200 writers. A boosted multilayer network achieved 1.5% error on the UCI letters and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.
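The two training styles contrasted in this abstract, sampling the training set versus weighting the cost function, both start from the same per-example weights. The sketch below shows one round of the standard binary (discrete) AdaBoost weight update and then the two ways those weights could be consumed. It is an illustration under that assumption only: the paper may use a different AdaBoost variant, and all function names here are hypothetical.

```python
import math
import random


def adaboost_round(weights, mistakes):
    """One discrete AdaBoost update; mistakes[i] is True if example i was wrong."""
    err = sum(w for w, m in zip(weights, mistakes) if m)
    if not 0 < err < 0.5:
        raise ValueError("weak learner needs weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified examples are scaled up, correct ones scaled down.
    updated = [w * math.exp(alpha if m else -alpha)
               for w, m in zip(weights, mistakes)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


def resample(examples, weights, rng):
    """Option 1: draw a new training set according to the weights."""
    return rng.choices(examples, weights=weights, k=len(examples))


def weighted_cost(losses, weights):
    """Option 2: keep every example and weight its loss instead."""
    return sum(w * loss for w, loss in zip(weights, losses))


weights = [0.25, 0.25, 0.25, 0.25]
alpha, weights = adaboost_round(weights, [True, False, False, False])
print(alpha, weights)
print(resample(["a", "b", "c", "d"], weights, random.Random(0)))
```

After the update the misclassified example carries half of the total weight, so the next base classifier is pushed toward it whether the weights drive resampling or enter the cost function directly.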

4.

Boosting performance of multidimensional tores

V. S. Podlazov 《Automation and Remote Control》2017,78(1):167-179

This paper suggests a speed boosting technique for system area networks in massive parallel multiprocessor computers by decreasing the diameter and increasing the throughput of a pair of opposite simplex rings (a duplex ring), a couple and quadruple of such rings. The result is achieved through replacing a duplex ring with a pair of minimal switched multidimensional rings with different steps in each ring. The decreased diameter and increased throughput of rings appreciably reduce packet delivery delays in system area networks based on the pairs of such rings.

5.

Boosting the performance of shared memory multiprocessors

Stenstrom P. Brorsson M. Dahlgren F. Grahn H. Dubois M. 《Computer》1997,30(7):63-70

Proposed hardware optimizations to CC-NUMA machines (shared memory multiprocessors that use cache consistency protocols) can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each.

6.

Boosting MUC extraction in unsatisfiable constraint networks

Éric Grégoire Jean-Marie Lagniez Bertrand Mazure 《Applied Intelligence》2014,41(4):1012-1023

One very fertile domain of applied Artificial Intelligence is constraint solving technologies. Especially, constraint networks that concern problems that can be represented using discrete variables, together with constraints on allowed instantiation values for these variables. Every solution to a constraint network must satisfy every constraint. When no solution exists, the user might want to know the actual reasons leading to the absence of global solution. In this respect, extracting MUCs (Minimal Unsatisfiable Cores) from an unsatisfiable constraint network is a useful process when causes of unsatisfiability must be understood so that the network can be re-engineered and relaxed to become satisfiable. Despite bad worst-case computational complexity results, various MUC-finding approaches that appear tractable for many real-life instances have been proposed. Many of them are based on the successive identification of so-called transition constraints. In this respect, we show how local search can be used to possibly extract additional transition constraints at each main iteration step. In the general constraint networks setting, the approach is shown to outperform a technique based on a form of model rotation imported from the SAT-related technology and that also exhibits additional transition constraints. Our extensive computational experimentations show that this enhancement also boosts the performance of state-of-the-art DC(WCORE)-like MUC extractors.

7.

Boosting the accuracy of differentially private in weighted social networks

Wang Dan Long Shigong 《Multimedia Tools and Applications》2019,78(24):34801-34817

Social network not only helps people to build its internet applicable service, but also collects a large amount of user information (i.e., sensitive data), which...

8.

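A reduced user-space protocol based on Myrinet (total citations: 5; self-citations: 0; citations by others: 5)

Dong Chunlei, Zheng Weimin 《Journal of Software》1999,10(3):299-303

The communication subsystem is the main factor affecting the overall performance of a workstation cluster. After analyzing and comparing the performance of three commonly used networks, this article points out that upper-layer protocol processing is the main bottleneck for workstation cluster performance. A high-performance, user-level reduced communication protocol, RCP, was implemented on a cluster of eight Sun SPARC workstations connected by a 640 Mbps Myrinet. By trimming redundant protocol functions, reducing the number of data copies, and operating directly on hardware buffers, RCP achieves low latency and high efficiency. The round-trip latency of RCP is much lower than that of TCP/IP (200 μs vs. 1,540 μs), ...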

9.

Boosting quantum annealer performance via sample persistence

Hamed Karimi Gili Rosenberg 《Quantum Information Processing》2017,16(7):166

We propose a novel method for reducing the number of variables in quadratic unconstrained binary optimization problems, using a quantum annealer (or any sampler) to fix the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are usually much easier for the quantum annealer to solve, due to their being smaller and consisting of disconnected components. This approach significantly increases the success rate and number of observations of the best known energy value in samples obtained from the quantum annealer, when compared with calling the quantum annealer without using it, even when using fewer annealing cycles. Use of the method results in a considerable improvement in success metrics even for problems with high-precision couplers and biases, which are more challenging for the quantum annealer to solve. The results are further enhanced by applying the method iteratively and combining it with classical pre-processing. We present results for both Chimera graph-structured problems and embedded problems from a real-world application.
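A rough sketch of the variable-fixing idea follows: take the lowest-energy samples returned by any sampler and fix the variables whose value is identical across all of them. The selection rule, the `elite_fraction` parameter, and the function name are assumptions made for illustration; the abstract does not state the exact criteria the authors use.

```python
def persistent_variables(samples, energies, elite_fraction=0.2):
    """Return {variable index: value} for variables that agree across the
    lowest-energy fraction of the samples (hypothetical selection rule)."""
    ranked = sorted(zip(energies, samples), key=lambda pair: pair[0])
    keep = max(1, int(len(ranked) * elite_fraction))
    elite = [sample for _, sample in ranked[:keep]]
    fixed = {}
    for i in range(len(elite[0])):
        values = {sample[i] for sample in elite}
        if len(values) == 1:
            # Every elite sample agrees on this variable, so fix it.
            fixed[i] = values.pop()
    return fixed


# Four binary samples over three variables, with their energies.
samples = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
energies = [-3.0, -2.5, 1.0, 0.0]
print(persistent_variables(samples, energies, elite_fraction=0.5))
```

Fixing the agreed variables leaves a smaller problem over the remaining ones, which can then be sampled again; the abstract notes that the method can be applied iteratively in this way.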

10.

Boosting system performance with optimistic distributed protocols

《Computer》2001,34(12):80-86

Optimistic distributed protocols can dramatically improve system performance if the underlying system assumptions are sound and carry a high degree of probability. Optimistic protocols aggressively execute actions based on best-case system assumptions. Using optimistic protocols unquestionably involves tradeoffs, but if a protocol is well designed and the optimistic assumptions hold frequently enough, the gain in performance outweighs the overhead of repairing actions that execute incorrectly.

11.

Boosting performance of transactional memory through O-GEHL predictors

Ehsan Atoofian 《Microprocessors and Microsystems》2014

Time-based Software Transactional Memory (STM) exploits a global clock to validate transactional data and guarantee consistency of transactions. While this method is simple to implement it results in contentions over the clock if transactions commit simultaneously. The alternative method is thread local clock (TLC) which exploits local variables to maintain consistency of transactions. However, TLC may increase false aborts and degrade performance of STMs. In this paper, we analyze global clock and TLC in the context of STM systems, highlighting both the implementation trade-offs and the performance implications of the two techniques. We demonstrate that neither global clock nor TLC is optimum across applications. To counter this challenge, we introduce two optimization techniques: The first optimization technique is Adaptive Clock (AC) which dynamically selects one of the two validation techniques based on probability of conflicts. AC is a speculative approach and relies on software O-GEHL predictors to speculate future conflicts. The second optimization technique is AC+ which reduces timing overhead of O-GEHL predictors by implementing the predictors in hardware. In addition, we exploit information theory to eliminate unnecessary computational resources and reduce storage requirements of the O-GEHL predictors. Our evaluation with TL2 and Stamp benchmark suite reveals that AC is effective and improves execution time of transactional applications up to 65%.

12.

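Implementation of MPICH-2 on Myrinet

Yang Kaiji, Xu Fengyan, Ma Yunsheng 《Computer Engineering and Design》2007,28(13):3115-3118

Among the communication networks used in current parallel computing environments, the MPICH-1 parallel system can use both the Internet and the Myrinet gigabit packet-switched network, whereas the MPICH-2 parallel system can use only the Internet, so communication time limits overall system performance. This paper studies the job submission modes of MPICH-1 and MPICH-2 and presents an application that submits jobs over the Myrinet network in MPICH-2, thereby reducing communication time.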

13.

Virtual network transport protocols for Myrinet

Chun B.N. Mainwaring A.M. Culler D.E. 《Micro, IEEE》1998,18(1):53-63

Bringing direct and protected network multiprogramming into mainstream cluster computing requires innovations in three key areas: application programming interfaces, network virtualization systems, and lightweight communication protocols for high-speed interconnects. The AM-II API extends traditional active messages with support for client-server computing and facilitates the construction of parallel clients and distributed servers. Our virtual network segment driver enables a large number of arbitrary sequential and parallel applications to access network interface resources directly in a concurrent but fully protected manner. The NIC-to-NIC communication protocols provide reliable and at-most-once message delivery between communication endpoints. The NIC-to-NIC protocols perform well as the number of endpoints and the number of hosts in the cluster are scaled. The flexibility afforded by the underlying protocols enables a diverse set of timely research efforts. Other Berkeley researchers are actively using this system to investigate implicit techniques for the coscheduling of communicating processes, an essential part of high-performance communications in multiprogrammed clusters of uni- and multiprocessor servers. Other researchers are extending the active message protocols described here for clusters of symmetric multiprocessors, using so-called multiprotocol techniques and multiple network interfaces per machine.

14.

Boosting paraphrase detection through textual similarity metrics with abductive networks

《Applied Soft Computing》2015

A number of metrics have been proposed in the literature to measure text re-use between pairs of sentences or short passages. These individual metrics fail to reliably detect paraphrasing or semantic equivalence between sentences, due to the subjectivity and complexity of the task, even for human beings. This paper analyzes a set of five simple but weak lexical metrics for measuring textual similarity and presents a novel paraphrase detector with improved accuracy based on abductive machine learning. The objective here is 2-fold. First, the performance of each individual metric is boosted through the abductive learning paradigm. Second, we investigate the use of decision-level and feature-level information fusion via abductive networks to obtain a more reliable composite metric for additional performance enhancement. Several experiments were conducted using two benchmark corpora and the optimal abductive models were compared with other approaches. Results demonstrate that applying abductive learning has significantly improved the results of individual metrics and further improvement was achieved through fusion. Moreover, building simple models of polynomial functional elements that identify and integrate the smallest subset of relevant metrics yielded better results than those obtained from the support vector machine classifiers utilizing the same datasets and considered metrics. The results were also comparable to the best result reported in the literature even with larger number of more powerful features and/or using more computationally intensive techniques.

15.

Myrinet: a gigabit-per-second local area network (total citations: 2; self-citations: 0; citations by others: 2)

Boden N.J. Cohen D. Felderman R.E. Kulawik A.E. Seitz C.L. Seizovic J.N. Wen-King Su 《Micro, IEEE》1995,15(1):29-36

The Myrinet local area network employs the same technology used for packet communication and switching within massively parallel processors. In realizing this distributed MPP network, we developed specialized communication channels, cut-through switches, host interfaces, and software. To our knowledge, Myrinet demonstrates the highest performance per unit cost of any current LAN.

16.

Boosting trainees' expectations of success through knowledge of performance norms

Karen T. Hilling Andrew J. Tattersall 《Behaviour & Information Technology》1997,16(6):363-364

A field study of the role of performance norms in computer training shows that norms have an influential impact on expectations and anticipated satisfaction.

17.

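Multi-channel communication based on Myrinet/GM (total citations: 1; self-citations: 0; citations by others: 1)

Zhang Jichao, Shu Jiwu, Zheng Weimin, Chang Di 《Journal of Software》2003,14(2):278-284

The communication subsystem has an important effect on the computational efficiency of parallel systems, and large-scale applications place challenging demands on the communication performance and availability of parallel platforms. Multi-channel communication improves the communication performance and availability of a parallel system by using multiple network links in parallel. This paper first analyzes how multiple processes multiplexing the network affect communication performance. Then, building on the Myrinet/GM network platform, it proposes a strategy for dynamically selecting and allocating communication links at the network interface layer, and designs and implements MNC, a protocol layer that supports parallel communication over multiple Myrinet networks. MNC treats communicating processes equally and makes full use of the link resources of multiple Myrinet networks. On a PC cluster interconnected by two Myrinet links, MNC raises inter-process communication bandwidth by about 34% relative to a single link, effectively improving application-level communication performance.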

18.

On the performance of multicomputer interconnection networks

《Journal of Systems Architecture》2004,50(9):563-574

Several researchers have analysed the performance of k-ary n-cubes taking into account channel bandwidth constraints imposed by implementation technology, namely the constant wiring density and pin-out constraints for VLSI and multiple-chip technology respectively. For instance, Dally [IEEE Trans. Comput. 39(6) (1990) 775], Abraham [Issues in the architecture of direct interconnection networks schemes for multiprocessors, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1992], and Agrawal [IEEE Trans. Parallel Distributed Syst. 2(4) (1991) 398] have shown that low-dimensional k-ary n-cubes (known as tori) outperform their high-dimensional counterparts (known as hypercubes) under the constant wiring density constraint. However, Abraham and Agrawal have arrived at an opposite conclusion when they considered the constant pin-out constraint. Most of these analyses have assumed deterministic routing, where a message always uses the same network path between a given pair of nodes. More recent multicomputers have incorporated adaptive routing to improve performance. This paper re-examines the relative performance merits of the torus and hypercube in the context of adaptive routing. Our analysis reveals that the torus manages to exploit its wider channels under light traffic. As traffic increases, however, the hypercube can provide better performance than the torus. Our conclusion under the constant wiring density constraint is different from that of the works mentioned above because adaptive routing enables the hypercube to exploit its richer connectivity to reduce message blocking.
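The torus-versus-hypercube trade-off in this abstract comes down to a few structural quantities of a k-ary n-cube. The sketch below computes them for two networks of equal size. It assumes bidirectional links with wraparound, and models the constant pin-out constraint simply as a fixed pin budget per node divided evenly among its links; the function name and the pin budget of 64 are hypothetical.

```python
def kary_ncube(k, n, pins_per_node):
    """Basic properties of a k-ary n-cube with wraparound links."""
    nodes = k ** n
    # Longest shortest path: at most k // 2 hops in each of n dimensions.
    diameter = n * (k // 2)
    # With k == 2 the two neighbours in a dimension coincide.
    degree = n if k == 2 else 2 * n
    # Assumed pin-out model: a fixed pin budget shared by all links.
    channel_width = pins_per_node // degree
    return {"nodes": nodes, "diameter": diameter,
            "degree": degree, "channel_width": channel_width}


torus = kary_ncube(16, 2, pins_per_node=64)      # 16-ary 2-cube
hypercube = kary_ncube(2, 8, pins_per_node=64)   # 2-ary 8-cube
print(torus)
print(hypercube)
```

Both networks have 256 nodes. Under this simple model the torus gets twice the channel width, while the hypercube has half the diameter and twice as many links per node, which is the richer connectivity that the abstract says adaptive routing can exploit.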

19.

Measurement and Prediction of Communication Delays in Myrinet Networks

《Journal of Parallel and Distributed Computing》2001,61(11):1692-1704

This paper describes a series of experiments carried out to determine if it is possible to accurately predict the delays of inter-node communication in a PC cluster system interconnected with a Myrinet switch network. Prediction accuracy is affected not only by the software and hardware overhead involved in network communication, but also interference from concurrent message streams. Based on extensive measurements using a 14-node Myrinet cluster system, it is determined that (1) the simple linear model typically used to model communication delay in networks is insufficient and (2) communication delay behavior with n message streams sharing a common link is more complicated than a simple divide-by-n solution. A piecewise-linear model, based on parameters obtained through experiments, is proposed as a more accurate communication delay prediction method when there is no sharing of communication links. However, if two or more message streams share a common link, then the communication delay is more accurately predicted as being one of a set of discrete values.
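The difference between the simple linear delay model and a piecewise-linear one can be sketched as follows. All numbers (the breakpoint at 1024 bytes and the intercept and slope of each segment) are made-up placeholders rather than measurements from the paper, and the function names are hypothetical.

```python
from bisect import bisect_right


def linear_delay(size, latency, per_byte):
    """Simple linear model: fixed start-up latency plus a per-byte cost."""
    return latency + per_byte * size


def piecewise_linear_delay(size, breakpoints, segments):
    """Piecewise-linear model: a separate (intercept, slope) pair per
    message-size range; breakpoints must be sorted in ascending order."""
    idx = bisect_right(breakpoints, size)
    intercept, slope = segments[idx]
    return intercept + slope * size


# Placeholder parameters, in microseconds and microseconds per byte.
breakpoints = [1024]
segments = [(10.0, 0.010), (15.0, 0.008)]

for size in (512, 2048):
    print(size,
          linear_delay(size, latency=10.0, per_byte=0.010),
          piecewise_linear_delay(size, breakpoints, segments))
```

Messages below the breakpoint follow the first segment and larger ones the second, so the model can capture a change in per-byte cost that a single straight line would average away. This covers only the case with no link sharing; the abstract reports that shared links are better described by a set of discrete delay values.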

20.

Boosting performance in attack intention recognition by integrating multiple techniques

Hao Bai Kunsheng Wang Changzhen Hu Gang Zhang Xiaochuan Jing 《Frontiers of Computer Science in China》2011,5(1):109-118

Recognizing attack intention is crucial for security analysis. In recent years, a number of methods for attack intention recognition have been proposed. However, most of these techniques mainly focus on the alerts of an intrusion detection system and use algorithms of low efficiency that mine frequent attack patterns without reconstructing attack paths. In this paper, a novel and effective method is proposed, which integrates several techniques to identify attack intentions. Using this method, a Bayesian-based attack scenario is constructed, where frequent attack patterns are identified using an efficient data-mining algorithm based on frequent patterns. Subsequently, attack paths are rebuilt by recorrelating frequent attack patterns mined in the scenario. The experimental results demonstrate the capability of our method in rebuilding attack paths, recognizing attack intentions as well as in saving system resources. Specifically, to the best of our knowledge, the proposed method is the first to correlate complementary intrusion evidence with frequent pattern mining techniques based on the FP-Growth algorithm to rebuild attack paths and to recognize attack intentions.