20 similar documents found; search took 0 ms
1.
Chao-Tung Yang Shih-Yu Wang William Cheng-Chung Chu 《The Journal of supercomputing》2010,54(2):180-205
Co-allocation architecture was developed to enable parallel transfer of files from multiple replicas stored on different
servers. Several co-allocation strategies have been proposed and used to exploit the different transfer rates among various
client-server links and to address dynamic rate fluctuations by dividing files into multiple blocks of equal size. The paper
presents a dynamic file transfer scheme, called dynamic adjustment strategy (DAS), for co-allocation architectures that concurrently
transfer a file from multiple replicas stored on multiple servers within a data grid. The scheme overcomes the performance
obstacle posed by the idle waiting time of faster servers in co-allocation-based file transfers and therefore reduces
file transfer time. A tool with a user-friendly interface that can be used to manage replicas and downloads in a data
grid environment is also described. Experimental results show that DAS achieves high file transfer speeds
and reduces the time cost of reassembling data blocks.
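The core co-allocation idea can be sketched as a greedy simulation: the file is split into equal-size blocks, and each server is handed the next unassigned block as soon as it finishes its current one, so faster links carry more blocks and no server waits idle. The server rates and block count below are illustrative, not taken from the paper.

```python
# Greedy co-allocation sketch: each server receives the next block as soon as
# it becomes free; fast links end up transferring more blocks.
import heapq

def co_allocate(block_count, rates):
    """Return per-server block counts and the total finish time.

    rates maps server name -> blocks per second (illustrative values).
    """
    assigned = {s: 0 for s in rates}
    # priority queue of (time this server becomes free, server name)
    free_at = [(0.0, s) for s in sorted(rates)]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(block_count):
        t, server = heapq.heappop(free_at)
        t += 1.0 / rates[server]          # time to fetch one unit-size block
        assigned[server] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, server))
    return assigned, finish

counts, total = co_allocate(10, {"A": 4.0, "B": 1.0})
print(counts, round(total, 2))
```

With a 4:1 rate ratio the fast server takes 8 of the 10 blocks and both servers finish together, which is exactly the idle-time elimination the scheme targets.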
2.
Quantum Information Processing - We report a detailed analysis of the optical realization of the analogue algorithm described in the first paper of this series (Tamma in Quantum Inf Process...
3.
《Computers & chemistry》1993,17(2):203-207
We have implemented the Smith and Waterman dynamic programming algorithm on the massively parallel MP1104 computer from MasPar and compared its ability to detect remote protein sequence homologies with that of other commonly used database search algorithms. Dynamic programming algorithms are normally too computationally intensive to permit full database searches; however, on the MP1104 a search of the Swiss-Prot database takes about 15 s. This nearly interactive speed of database searching permits one to optimize the parameters for each query. Most of the common database search methods (FASTA, FASTDB and BLAST) gain their speed by using approximations such as word matching or eliminating gaps from the alignments, which prevents them from detecting remote homologies. By using queries from protein superfamilies containing a large number of family members of diverse similarities, we have measured the ability of each of these algorithms to detect the remotest members of each superfamily. Using these superfamilies, we have found that the algorithms, in order of decreasing sensitivity, are BLAZE, FASTDB, FASTA and BLAST. Hence massively parallel computers allow one to have maximal sensitivity and search speed simultaneously.
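The Smith-Waterman recurrence the abstract refers to fills a score matrix where each cell takes the best of a diagonal match/mismatch step, two gap steps, or zero (so alignments can start anywhere). A minimal scoring-only sketch, with illustrative match/mismatch/gap weights:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score via the O(len(a)*len(b)) DP recurrence."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "TACGTG"))  # exact 4-base substring -> 8
```

On a massively parallel machine like the MP1104, the anti-diagonals of H are computed concurrently, since cells on the same anti-diagonal are independent; this sequential version only shows the recurrence itself.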
4.
Mesh of trees (MOT) is well known for its small diameter, high bisection width, simple decomposability and area universality. On the other hand, OTIS (Optical Transpose Interconnection System) provides an efficient optoelectronic model for massively parallel processing systems. In this paper, we present OTIS-MOT as a competent candidate for a two-tier architecture that can take advantage of both the OTIS and the MOT. We show that an n^4-processor OTIS-MOT has diameter 8 log n + 1 (the base of the logarithm is assumed to be 2 throughout this paper) and fault diameter 8 log n + 2 under a single node failure. We establish other topological properties such as bisection width, multiple paths and modularity. We show that many communication as well as application algorithms can run on this network in comparable time or even faster than on other similar tree-based two-tier architectures. The communication algorithms, including row/column-group broadcast and one-to-all broadcast, are shown to require O(log n) time, multicast O(n^2 log n) time and the bit-reverse permutation O(n) time. Many parallel algorithms for various problems, such as finding polynomial zeros, sales forecasting, matrix-vector multiplication and DFT computation, are shown to map in O(log n) time. Sorting and prefix computation are also shown to run in O(log n) time.
5.
We develop several multipath reservation algorithms for in-advance scheduling of single and multiple file transfers in connection-oriented
optical networks. These algorithms consider the jobs one at a time or in a batch. The latter can be potentially useful to
minimize resource conflicts between multiple consecutive requests. Extensive simulations using both real-world networks
and random topologies show that the greedy strategy, which processes requests one at a time, can perform comparably to batch
scheduling and is significantly better in terms of computation time. Further, this strategy can be extended
to reduce path switching overheads.
6.
To provide a more robust context for personalization, we desire to extract a continuum of general to specific interests of
a user, called a user interest hierarchy (UIH). The higher-level interests are more general, while the lower-level interests
are more specific. A UIH can represent a user’s interests at different abstraction levels and can be learned from the contents
(words/phrases) in a set of web pages bookmarked by a user. We propose a divisive hierarchical clustering (DHC) algorithm
to group terms (topics) into a hierarchy where more general interests are represented by a larger set of terms. Our approach
does not need user involvement and learns the UIH “implicitly”. To enrich features used in the UIH, we used phrases in addition
to words. Our experiment indicates that DHC with the Augmented Expected Mutual Information (AEMI) correlation function and
MaxChildren threshold-finding method built more meaningful UIHs than the other combinations on average; using words and phrases
as features improved the quality of UIHs.
7.
We study the on-line scheduling on an unbounded parallel batch machine to minimize makespan of two families of jobs. In this
model, jobs arrive over time and jobs from different families cannot be scheduled in a common batch. We provide a best possible
on-line algorithm for the problem with competitive ratio
.
Research supported by NSFC (10671183), NSFC-RGC (70731160633) and SRFDP (20070459002).
8.
Krzysztof Kurowski Jarek Nabrzyski Ariel Oleksiak Jan Węglarz 《Journal of Scheduling》2008,11(5):371-379
In this paper we address a multicriteria scheduling problem for computational Grid systems. We focus on the two-level hierarchical
Grid scheduling problem, in which at the first level (the Grid level) a Grid broker makes scheduling decisions and allocates
jobs to Grid nodes. Jobs are then sent to the Grid nodes, where local schedulers generate local schedules for each node accordingly.
A general approach is presented taking into account preferences of all the stakeholders of Grid scheduling (end-users, Grid
administrators, and local resource providers) and assuming a lack of knowledge about job time characteristics. A single-stakeholder,
single-criterion version of the approach has been compared experimentally with the existing approaches.
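At the Grid level, a broker of the kind described can be sketched as scoring each node by a weighted combination of stakeholder criteria and dispatching the job to the best-scoring node. The criteria (expected wait, load, cost) and weights below are purely illustrative; the paper's model is more general and handles unknown job time characteristics.

```python
# Hypothetical multicriteria broker sketch: lower weighted score wins.
def pick_node(nodes, weights):
    """nodes: {name: {criterion: value}}; weights: {criterion: weight}."""
    def score(name):
        return sum(weights[c] * nodes[name][c] for c in weights)
    # sorted() makes ties deterministic by node name
    return min(sorted(nodes), key=score)

nodes = {
    "n1": {"wait": 5.0, "load": 0.9, "cost": 1.0},
    "n2": {"wait": 8.0, "load": 0.2, "cost": 1.5},
}
print(pick_node(nodes, {"wait": 0.5, "load": 2.0, "cost": 1.0}))
```

Once the broker picks a node, the local scheduler at that node would build its own schedule, reflecting the two-level hierarchy in the paper.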
9.
Pseudorandom number generators are required for many computational tasks, such as stochastic modelling and simulation. This paper investigates the serial and parallel implementation of a Linear Congruential Generator for Graphics Processing Units (GPUs) based on the binary representation of the normal number $\alpha _{2,3}$. We adapted two methods of modular reduction which allowed us to perform most operations in 64-bit integer arithmetic, improving on the original implementation based on 106-bit double-double operations and resulting in a four-fold increase in efficiency. We found that our implementation is faster than existing methods in the literature, and our generation rate is close to the limiting rate imposed by the efficiency of writing to a GPU's global memory.
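The 64-bit modular-reduction trick is easiest to see with a Mersenne modulus, where reduction needs only shifts, masks and adds instead of division. The sketch below uses a generic Lehmer-style multiplier with modulus 2^61 - 1 for illustration; it is not the paper's $\alpha_{2,3}$-based generator.

```python
# Shift/mask reduction modulo the Mersenne prime 2^61 - 1: since
# 2^61 ≡ 1 (mod M61), the high bits can simply be folded back in.
M61 = (1 << 61) - 1

def mod_m61(x):
    """Reduce x (< 2^122) modulo 2^61 - 1 without division."""
    x = (x & M61) + (x >> 61)
    return x - M61 if x >= M61 else x

def lcg(seed, a=48271, n=5):
    """Lehmer generator x_{k+1} = a * x_k mod M61 (multiplier illustrative)."""
    out, x = [], seed
    for _ in range(n):
        x = mod_m61(a * x)
        out.append(x)
    return out

print(lcg(1)[:3])
```

On a GPU each thread would run an independently seeded (or leapfrogged) stream of this recurrence; the reduction is what keeps every step in native 64-bit integer arithmetic.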
10.
《Artificial Intelligence in Engineering》1990,5(3):153-160
This paper presents a Model Builder for generating inspectable qualitative models of distribution networks. An implemented example in the domain of power distribution systems is described. The research is an extended result of the project to develop a model-based generic power distribution training system. The Model Builder has two prominent features: (1) a systematic approach to building inspectable qualitative models of distribution networks; and (2) a high-level user interface to enable non-AI personnel to create these models without programming. The implementation was done in LISP, effectively combining the object-oriented programming paradigm and a general-purpose graphics editor together in a unified environment. This research contributes to an improved understanding of methodologies for building inspectable qualitative models for a wide variety of distribution networks.
11.
This paper presents a parallel algorithm for fast word search to determine the set of biological words of an input DNA sequence.
The algorithm is designed to scale well on state-of-the-art multiprocessor/multicore systems for large inputs and large maximum
word sizes. The pattern exhibited by many sequential solutions to this problem is a repetitive execution over a large input
DNA sequence, and the generation of large amounts of output data to store and retrieve the words determined by the algorithm.
As we show, this pattern does not lend itself to straightforward standard parallelization techniques. The proposed algorithm
aims to achieve three major goals to overcome the drawbacks of embarrassingly parallel solution techniques: (i) to impose
a high degree of cache locality on a problem that, by nature, tends to exhibit nonlocal access patterns, (ii) to be lock free
or largely reduce the need for data access locking, and (iii) to enable an even distribution of the overall processing load
among multiple threads. We present an implementation and performance evaluation of the proposed algorithm on DNA sequences
of various sizes for different organisms on a dual processor quad-core system with a total of 8 cores. We compare the performance
of the parallel word search implementation with a sequential implementation and with an embarrassingly parallel implementation.
The results show that the proposed algorithm far outperforms the embarrassingly parallel strategy and achieves speed-ups
of up to 6.9 on our 8-core test system.
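The lock-free pattern the abstract describes can be sketched as follows: each worker counts the words (k-mers) of its own chunk into a private dictionary, and the partial counts are merged afterwards, so no shared counter is ever mutated concurrently. Chunks overlap by k-1 characters so that no word spanning a chunk boundary is lost. This is a pattern sketch, not the paper's cache-optimized algorithm.

```python
# Per-worker private counters + final merge: no locking on the hot path.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(seq, k):
    """Count all k-length words in one chunk into a private Counter."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def parallel_kmer_count(seq, k, workers=4):
    step = max(k, len(seq) // workers)
    # overlap chunks by k-1 so boundary-spanning words are counted once
    chunks = [seq[i:i + step + k - 1] for i in range(0, len(seq), step)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda c: count_chunk(c, k), chunks):
            total.update(partial)   # merge happens after the parallel phase
    return total

counts = parallel_kmer_count("ACGTACGTAC", 2)
print(counts["AC"], counts["CG"])
```

(Python threads illustrate the structure only; for real speed-up this pattern would run on native threads or processes, as in the paper's multicore implementation.)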
12.
To gain and retain advantage in a competitive business arena, a business cloud-computing platform must continuously strive to offer new services. Unfortunately, it is increasingly recognized by the industry that a cloud-computing platform cannot cover all aspects of the IT layers involved in infrastructure, platform and application. In practice, end users' requests are nearly unlimited, while the services held by a cloud-computing platform are relatively limited, both in service category and in service capacity. In view of this challenge, we investigate an elastic cloud platform that recruits outside services absent from the platform itself. Concretely, by dynamically hiring a qualified service on the Internet to stand in for a service the cloud platform lacks, an elastic cloud platform can provide nearly unlimited capabilities in an outsourcing fashion, e.g., computing power, storage and application functions. Finally, the validity of the method is evaluated through a case study.
13.
Keqin Li 《The Journal of supercomputing》2012,60(2):223-247
In this paper, scheduling parallel tasks on multiprocessor computers with dynamically variable voltage and speed is addressed
as combinatorial optimization problems. Two problems are defined, namely, minimizing schedule length with energy consumption
constraint and minimizing energy consumption with schedule length constraint. The first problem has applications in general
multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers
where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and
environments where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay
product is optimized by fixing one factor and minimizing the other. It is noticed that power-aware scheduling of parallel
tasks has rarely been discussed before. Our investigation in this paper makes an initial attempt at energy-efficient scheduling
of parallel tasks on multiprocessor computers with dynamic voltage and speed. Our scheduling problems contain three nontrivial
subproblems, namely, system partitioning, task scheduling, and power supplying. Each subproblem should be solved efficiently,
so that heuristic algorithms with overall good performance can be developed. The above decomposition of our optimization problems
into three subproblems makes design and analysis of heuristic algorithms tractable. A unique feature of our work is to compare
the performance of our algorithms with optimal solutions analytically and validate our results experimentally, not to compare
the performance of heuristic algorithms among themselves only experimentally. The harmonic system partitioning and processor
allocation scheme is used, which divides a multiprocessor computer into clusters of equal sizes and schedules tasks of similar
sizes together to increase processor utilization. A three-level energy/time/power allocation scheme is adopted for a given
schedule, such that the schedule length is minimized by consuming a given amount of energy or the energy consumed is minimized
without missing a given deadline. The performance of our heuristic algorithms is analyzed, and accurate performance bounds
are derived. Simulation data which validate our analytical results are also presented. It is found that our analytical results
provide very accurate estimation of the expected normalized schedule length and the expected normalized energy consumption
and that our heuristic algorithms are able to produce solutions very close to optimum.
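The energy/time trade-off underlying both problems is easy to illustrate in the standard speed-scaling model: running work w at speed s with power s**alpha takes w/s time and costs w*s**(alpha-1) energy, so for a sequential batch with total work W and energy budget E, convexity implies a single uniform speed s = (E/W)**(1/(alpha-1)) minimizes the schedule length. This is a toy single-processor check, not the paper's multiprocessor partitioning algorithm.

```python
# Uniform-speed optimum for minimizing schedule length under an energy budget,
# in the power = s**alpha speed-scaling model (alpha illustrative, typically 3).
def min_schedule_length(works, energy, alpha=3.0):
    W = sum(works)
    s = (energy / W) ** (1.0 / (alpha - 1.0))   # optimal uniform speed
    return W / s, s

length, speed = min_schedule_length([2.0, 3.0, 5.0], energy=40.0)
print(round(length, 3), round(speed, 3))
```

With W = 10 and E = 40 at alpha = 3 the optimal speed is 2, the schedule length is 5, and the energy spent, W * s**(alpha-1) = 40, exactly meets the budget, the "fix one factor, minimize the other" structure the abstract describes.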
14.
Protein structure prediction (PSP) is an open problem with many useful applications in disciplines such as medicine, biology
and biochemistry. As this problem presents a vast search space and the analysis of each protein structure requires a significant
amount of computing time, it is necessary to take advantage of high-performance parallel computing platforms as well as to
define efficient search procedures in the space of possible protein conformations. In this paper we compare two parallel procedures
for PSP which are based on different multi-objective optimization approaches, i.e. PAES (Knowles and Corne in Proc. Congr.
Evol. Comput. 1:98–105, 1999) and NSGA2 (Deb et al. in IEEE Trans. Evol. Comput. 6:182–197, 2002). Although both procedures include techniques to take advantage of known protein structures and strategies to simplify the
search space through the so-called rotamer library and adaptive mutation operators, they present different profiles with respect
to their implicit parallelism.
15.
Lee Sokjoon Seo Hwajeong Kwon Hyeokchan Yoon Hyunsoo 《The Journal of supercomputing》2019,75(8):4329-4349
The Journal of Supercomputing - Since the advent of deep belief network deep learning technology in 2006, artificial intelligence technology has been utilized in various convergence areas, such as...
16.
Kristine Dery Richard Hall Nick Wailes Sharna Wiblen 《The Journal of Strategic Information Systems》2013,22(3):225-237
Available evidence suggests that the adoption of IT-enabled Human Resource Information Systems (HRIS) has not produced the widely predicted transformation of Human Resources (HR) to a strategic business partner. We examine the relationship between HRIS and the HR function by applying actor-network theory (ANT) to an HRIS implementation project. The focus on how actor networks are formed and reformed during implementation may be particularly well suited to explaining why the original aims of the HRIS can be displaced or lost in translation. We suggest that the approach afforded by ANT enables us to better understand the ongoing and contingent process of HRIS implementations.
17.
Earlier approximate response time analysis (RTA) methods for tasks with offsets (transactional task model) exhibit two major
deficiencies: (i) They overestimate the calculated response times resulting in an overly pessimistic result. (ii) They suffer
from time complexity problems, resulting in an RTA method that may not be applicable in practice. This paper shows how these
two problems can be alleviated by a single fast-and-tight RTA method that combines the best of both worlds:
high-precision response times and fast approximate analysis.
Simulation studies, on randomly generated task sets, show that the response time improvement is significant, typically about 15%
tighter response times in 50% of the cases, resulting in about 12% higher admission probability for low priority tasks subjected
to admission control. Simulation studies also show that speedups of more than two orders of magnitude, for realistically sized
tasks sets, compared to earlier RTA analysis techniques, can be obtained.
Other improvements such as Palencia Gutiérrez, González Harbour (Proceedings of the 20th IEEE real-time systems symposium
(RTSS), pp. 328–339, 1999), Redell (Technical Report TRITA-MMK 2003:4, Dept. of Machine Design, KTH, 2003) are orthogonal and complementary, which means that our method can easily be incorporated into those methods as well. Hence, we
conclude that the fast-and-tight RTA method presented is the preferred analysis technique when tight response-time estimates
are needed, and that we do not need to sacrifice precision for analysis speed; both are obtained with one single method.
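The fixed point that offset-aware RTA methods refine is the classic recurrence R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated until it converges. The paper's transactional (offset) analysis tightens the interference term; the sketch below is the basic offset-free version for illustration.

```python
# Classic response-time analysis by fixed-point iteration.
import math

def response_times(tasks):
    """tasks: list of (C, T) pairs in decreasing priority order."""
    results = []
    for i, (c, _) in enumerate(tasks):
        r, prev = c, 0.0
        # iterate R = C + interference until stable (bounded to avoid
        # looping forever on an unschedulable set)
        while r != prev and r <= 10 ** 6:
            prev = r
            r = c + sum(math.ceil(prev / t) * cj for cj, t in tasks[:i])
        results.append(r)
    return results

print(response_times([(1, 4), (1, 6), (2, 10)]))
```

Each task's response time is then compared against its deadline for admission control, the step whose admission probability the simulation studies above measure.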
18.
Okon H. Akpan 《The Journal of supercomputing》2012,60(3):410-419
The focus of this study is the design of a parallel solution method that utilizes a fourth-order compact scheme. The applicability of the method is demonstrated on a time-dependent parabolic system with Neumann boundaries. The core of the parallel computing facilities used in the study is a 2-head-node, 224-compute-node Apple Xserve G5 multiprocessor. The system is first discretized in both time and space such that it remains in its stability regimes, before being solved with the method. The solution requires time marching in which every time step, h_t, calls for a single parallel solve of the intermediary subsystems generated. The solution uses p processors ranging in number from 3 to 63. The speedups, s_p, approach their limiting value of p only when p is small. The solution produces good computational results at large p, but poor results as p becomes progressively small. Also, the parallel solution produces accurate results yielding good speedups and efficiencies only when p is within some reasonable range of values. The intermediary systems generated by this method are linear and fine-grained; therefore, they are best suited for solution on massively parallel processors. The solution method proposed in this study is therefore expected to yield more impressive results if applied in a massively parallel computing environment.
19.
This paper presents a framework for allocating radio resources to the Access Points (APs) introducing an Access Point Controller
(APC). Radio resources can be either time slots or subchannels. The APC assigns subchannels to the APs using a dynamic subchannel
allocation scheme. The developed framework evaluates the dynamic subchannel allocation scheme for a downlink multicellular
Orthogonal Frequency Division Multiple Access (OFDMA) system. In the considered system, each AP and the associated Mobile
Terminals (MTs) are not operating on a frequency channel with fixed bandwidth, rather the channel bandwidth for each AP is
dynamically adapted according to the traffic load. The subchannel assignment procedure is based on quality estimates derived
from interference measurements and the current traffic load. The traffic load estimation is realized with the measurement
of the utilization of the assigned radio resources. The reuse partitioning for the radio resources is done by estimating mutual
Signal to Interference Ratio (SIR) of the APs. The developed dynamic subchannel allocation ensures Quality of Service (QoS),
better traffic adaptability, and higher spectrum efficiency with less computational complexity.
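The mutual-SIR reuse test at the heart of the partitioning step can be sketched very simply: two APs may share a subchannel only if the Signal to Interference Ratio estimated at each of them stays above a threshold. The power values and threshold below are illustrative placeholders, not figures from the paper.

```python
# Toy mutual-SIR reuse check (linear scale, illustrative numbers).
def can_reuse(p_signal_a, p_interf_a, p_signal_b, p_interf_b, sir_min=4.0):
    """True if both APs keep SIR >= sir_min when sharing a subchannel."""
    sir_a = p_signal_a / p_interf_a
    sir_b = p_signal_b / p_interf_b
    return sir_a >= sir_min and sir_b >= sir_min

print(can_reuse(8.0, 1.0, 10.0, 2.0))   # SIRs 8 and 5, both above threshold
print(can_reuse(8.0, 4.0, 10.0, 2.0))   # SIR at A drops to 2
```

In the full scheme, the APC would run this kind of test across AP pairs and then hand out subchannels in proportion to each AP's measured load.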
20.
The linear solve problems arising in chemical physics and many other fields involve large sparse matrices with a certain block structure, for which special block Jacobi preconditioners are found to be very efficient. In two previous papers [W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. I. Block Jacobi diagonalization, J. Comput. Phys. 219 (1) (2006) 185–197; W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. II. QMR linear solver, J. Comput. Phys. 219 (1) (2006) 198–209], a parallel implementation was presented. Excellent parallel scalability was observed for preconditioner construction, but not for the matrix–vector product itself. In this paper, we introduce a new algorithm with (1) greatly improved parallel scalability and (2) generalization to an arbitrary number of nodes and data sizes.
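A block Jacobi preconditioner keeps only the diagonal blocks of the matrix and applies their inverses to a vector. The sketch below uses explicit 2x2 blocks so the inverse is closed-form; the block size and numbers are illustrative, and a real implementation would factor much larger blocks in parallel.

```python
# Block Jacobi application: multiply each 2x2 diagonal block's inverse
# against the matching slice of the vector.
def apply_block_jacobi(diag_blocks, v):
    """diag_blocks: list of 2x2 blocks [[a, b], [c, d]] along the diagonal."""
    out = []
    for k, (row1, row2) in enumerate(diag_blocks):
        a, b = row1
        c, d = row2
        det = a * d - b * c                  # assumed nonsingular blocks
        x, y = v[2 * k], v[2 * k + 1]
        out.append((d * x - b * y) / det)    # inv(block) row 1 times (x, y)
        out.append((-c * x + a * y) / det)   # inv(block) row 2 times (x, y)
    return out

# For a matrix that IS block diagonal, the preconditioner solves exactly:
blocks = [[[2.0, 0.0], [0.0, 4.0]], [[1.0, 1.0], [0.0, 1.0]]]
print(apply_block_jacobi(blocks, [2.0, 4.0, 3.0, 1.0]))
```

Since each block is applied independently, the operation parallelizes trivially across nodes, which is why preconditioner construction and application scale so well compared to the global matrix-vector product.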