20 similar documents found; search took 0 ms
1.
Chao-Tung Yang Shih-Yu Wang William Cheng-Chung Chu 《The Journal of supercomputing》2010,54(2):180-205
Co-allocation architecture was developed to enable parallel transfer of files from multiple replicas stored on different
servers. Several co-allocation strategies have been proposed and used to exploit the different transfer rates among various
client-server links and to address dynamic rate fluctuations by dividing files into multiple blocks of equal size. The paper
presents a dynamic file transfer scheme, called dynamic adjustment strategy (DAS), for co-allocation architectures that concurrently
transfer a file from multiple replicas stored on multiple servers within a data grid. The scheme overcomes the performance
obstacle posed by the idle waiting time of faster servers in co-allocation-based file transfers and therefore reduces
file transfer time. A tool with a user-friendly interface that can be used to manage replicas and downloads in a data
grid environment is also described. Experimental results show that DAS achieves high file transfer speeds
and reduces the time cost of reassembling data blocks.
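The core co-allocation idea can be sketched as a greedy simulation: the file is split into equal-size blocks, and each server is handed the next unassigned block as soon as it finishes its current one, so faster links carry more blocks and no server waits idle. The server rates and block count below are illustrative, not taken from the paper.

```python
# Greedy co-allocation sketch: each server receives the next block as soon as
# it becomes free; fast links end up transferring more blocks.
import heapq

def co_allocate(block_count, rates):
    """Return per-server block counts and the total finish time.

    rates maps server name -> blocks per second (illustrative values).
    """
    assigned = {s: 0 for s in rates}
    # priority queue of (time this server becomes free, server name)
    free_at = [(0.0, s) for s in sorted(rates)]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(block_count):
        t, server = heapq.heappop(free_at)
        t += 1.0 / rates[server]          # time to fetch one unit-size block
        assigned[server] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, server))
    return assigned, finish

counts, total = co_allocate(10, {"A": 4.0, "B": 1.0})
print(counts, round(total, 2))
```

With a 4:1 rate ratio the fast server takes 8 of the 10 blocks and both servers finish together, which is exactly the idle-time elimination the scheme targets.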
2.
Quantum Information Processing - We report a detailed analysis of the optical realization of the analogue algorithm described in the first paper of this series (Tamma in Quantum Inf Process...
3.
《Computers & chemistry》1993,17(2):203-207
We have implemented the Smith and Waterman dynamic programming algorithm on the massively parallel MP1104 computer from MasPar and compared its ability to detect remote protein sequence homologies with that of other commonly used database search algorithms. Dynamic programming algorithms are normally too computationally intensive to permit full database searches; however, on the MP1104 a search of the Swiss-Prot database takes about 15 s. This nearly interactive speed of database searching permits one to optimize the parameters for each query. Most of the common database search methods (FASTA, FASTDB and BLAST) gain their speed by using approximations such as word matching or eliminating gaps from the alignments, which prevents them from detecting remote homologies. By using queries from protein superfamilies containing a large number of family members of diverse similarities, we have measured the ability of each of these algorithms to detect the remotest members of each superfamily. Using these superfamilies, we have found that the algorithms, in order of decreasing sensitivity, are BLAZE, FASTDB, FASTA and BLAST. Hence massively parallel computers allow one to have maximal sensitivity and search speed simultaneously.
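The Smith-Waterman recurrence the abstract refers to fills a score matrix where each cell takes the best of a diagonal match/mismatch step, two gap steps, or zero (so alignments can start anywhere). A minimal scoring-only sketch, with illustrative match/mismatch/gap weights:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score via the O(len(a)*len(b)) DP recurrence."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "TACGTG"))  # exact 4-base substring -> 8
```

On a massively parallel machine like the MP1104, the anti-diagonals of H are computed concurrently, since cells on the same anti-diagonal are independent; this sequential version only shows the recurrence itself.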
4.
Mesh of trees (MOT) is well known for its small diameter, high bisection width, simple decomposability and area universality. On the other hand, OTIS (Optical Transpose Interconnection System) provides an efficient optoelectronic model for massively parallel processing systems. In this paper, we present OTIS-MOT as a competent candidate for a two-tier architecture that can take advantage of both the OTIS and the MOT. We show that an n^4-processor OTIS-MOT has diameter 8 log n + 1 (the base of the logarithm is assumed to be 2 throughout this paper) and fault diameter 8 log n + 2 under a single node failure. We establish other topological properties such as bisection width, multiple paths and modularity. We show that many communication as well as application algorithms can run on this network in comparable time or even faster than on other similar tree-based two-tier architectures. The communication algorithms, including row/column-group broadcast and one-to-all broadcast, are shown to require O(log n) time, multicast O(n^2 log n) time and the bit-reverse permutation O(n) time. Many parallel algorithms for various problems, such as finding polynomial zeros, sales forecasting, matrix-vector multiplication and DFT computation, are shown to map in O(log n) time. Sorting and prefix computation are also shown to run in O(log n) time.
5.
We develop several multipath reservation algorithms for in-advance scheduling of single and multiple file transfers in connection-oriented
optical networks. These algorithms consider the jobs one at a time or in a batch. The latter can be potentially useful to
minimize resource conflicts between multiple consecutive requests. Extensive simulations using both real-world networks
and random topologies show that the greedy strategy, which processes requests one at a time, can perform comparably to batch
scheduling and is significantly better in terms of computation time. Further, this strategy can be extended
to reduce path switching overheads.
6.
To provide a more robust context for personalization, we desire to extract a continuum of general to specific interests of
a user, called a user interest hierarchy (UIH). The higher-level interests are more general, while the lower-level interests
are more specific. A UIH can represent a user’s interests at different abstraction levels and can be learned from the contents
(words/phrases) in a set of web pages bookmarked by a user. We propose a divisive hierarchical clustering (DHC) algorithm
to group terms (topics) into a hierarchy where more general interests are represented by a larger set of terms. Our approach
does not need user involvement and learns the UIH “implicitly”. To enrich features used in the UIH, we used phrases in addition
to words. Our experiment indicates that DHC with the Augmented Expected Mutual Information (AEMI) correlation function and
MaxChildren threshold-finding method built more meaningful UIHs than the other combinations on average; using words and phrases
as features improved the quality of UIHs.
7.
We study the on-line scheduling on an unbounded parallel batch machine to minimize makespan of two families of jobs. In this
model, jobs arrive over time and jobs from different families cannot be scheduled in a common batch. We provide a best possible
on-line algorithm for the problem with competitive ratio
.
Research supported by NSFC (10671183), NSFC-RGC (70731160633) and SRFDP (20070459002).
8.
Krzysztof Kurowski Jarek Nabrzyski Ariel Oleksiak Jan Węglarz 《Journal of Scheduling》2008,11(5):371-379
In this paper we address a multicriteria scheduling problem for computational Grid systems. We focus on the two-level hierarchical
Grid scheduling problem, in which at the first level (the Grid level) a Grid broker makes scheduling decisions and allocates
jobs to Grid nodes. Jobs are then sent to the Grid nodes, where local schedulers generate local schedules for each node accordingly.
A general approach is presented taking into account preferences of all the stakeholders of Grid scheduling (end-users, Grid
administrators, and local resource providers) and assuming a lack of knowledge about job time characteristics. A single-stakeholder,
single-criterion version of the approach has been compared experimentally with the existing approaches.
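At the Grid level, a broker of the kind described can be sketched as scoring each node by a weighted combination of stakeholder criteria and dispatching the job to the best-scoring node. The criteria (expected wait, load, cost) and weights below are purely illustrative; the paper's model is more general and handles unknown job time characteristics.

```python
# Hypothetical multicriteria broker sketch: lower weighted score wins.
def pick_node(nodes, weights):
    """nodes: {name: {criterion: value}}; weights: {criterion: weight}."""
    def score(name):
        return sum(weights[c] * nodes[name][c] for c in weights)
    # sorted() makes ties deterministic by node name
    return min(sorted(nodes), key=score)

nodes = {
    "n1": {"wait": 5.0, "load": 0.9, "cost": 1.0},
    "n2": {"wait": 8.0, "load": 0.2, "cost": 1.5},
}
print(pick_node(nodes, {"wait": 0.5, "load": 2.0, "cost": 1.0}))
```

Once the broker picks a node, the local scheduler at that node would build its own schedule, reflecting the two-level hierarchy in the paper.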
9.
Pseudorandom number generators are required for many computational tasks, such as stochastic modelling and simulation. This paper investigates the serial and parallel implementation of a Linear Congruential Generator for Graphics Processing Units (GPUs) based on the binary representation of the normal number $\alpha _{2,3}$. We adapted two methods of modular reduction which allowed us to perform most operations in 64-bit integer arithmetic, improving on the original implementation based on 106-bit double-double operations and resulting in a four-fold increase in efficiency. We found that our implementation is faster than existing methods in the literature, and our generation rate is close to the limiting rate imposed by the efficiency of writing to a GPU's global memory.
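The 64-bit modular-reduction trick is easiest to see with a Mersenne modulus, where reduction needs only shifts, masks and adds instead of division. The sketch below uses a generic Lehmer-style multiplier with modulus 2^61 - 1 for illustration; it is not the paper's $\alpha_{2,3}$-based generator.

```python
# Shift/mask reduction modulo the Mersenne prime 2^61 - 1: since
# 2^61 ≡ 1 (mod M61), the high bits can simply be folded back in.
M61 = (1 << 61) - 1

def mod_m61(x):
    """Reduce x (< 2^122) modulo 2^61 - 1 without division."""
    x = (x & M61) + (x >> 61)
    return x - M61 if x >= M61 else x

def lcg(seed, a=48271, n=5):
    """Lehmer generator x_{k+1} = a * x_k mod M61 (multiplier illustrative)."""
    out, x = [], seed
    for _ in range(n):
        x = mod_m61(a * x)
        out.append(x)
    return out

print(lcg(1)[:3])
```

On a GPU each thread would run an independently seeded (or leapfrogged) stream of this recurrence; the reduction is what keeps every step in native 64-bit integer arithmetic.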
10.
《Artificial Intelligence in Engineering》1990,5(3):153-160
This paper presents a Model Builder for generating inspectable qualitative models of distribution networks. An implemented example in the domain of power distribution systems is described. The research is an extended result of the project to develop a model-based generic power distribution training system. The Model Builder has two prominent features: (1) a systematic approach to building inspectable qualitative models of distribution networks; and (2) a high-level user interface to enable non-AI personnel to create these models without programming. The implementation was done in LISP, effectively combining the object-oriented programming paradigm and a general-purpose graphics editor together in a unified environment. This research contributes to an improved understanding of methodologies for building inspectable qualitative models for a wide variety of distribution networks.
11.
This paper presents a parallel algorithm for fast word search to determine the set of biological words of an input DNA sequence.
The algorithm is designed to scale well on state-of-the-art multiprocessor/multicore systems for large inputs and large maximum
word sizes. The pattern exhibited by many sequential solutions to this problem is a repetitive execution over a large input
DNA sequence, and the generation of large amounts of output data to store and retrieve the words determined by the algorithm.
As we show, this pattern does not lend itself to straightforward standard parallelization techniques. The proposed algorithm
aims to achieve three major goals to overcome the drawbacks of embarrassingly parallel solution techniques: (i) to impose
a high degree of cache locality on a problem that, by nature, tends to exhibit nonlocal access patterns, (ii) to be lock free
or largely reduce the need for data access locking, and (iii) to enable an even distribution of the overall processing load
among multiple threads. We present an implementation and performance evaluation of the proposed algorithm on DNA sequences
of various sizes for different organisms on a dual processor quad-core system with a total of 8 cores. We compare the performance
of the parallel word search implementation with a sequential implementation and with an embarrassingly parallel implementation.
The results show that the proposed algorithm far outperforms the embarrassingly parallel strategy and achieves speed-ups
of up to 6.9 on our 8-core test system.
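The lock-free pattern the abstract describes can be sketched as follows: each worker counts the words (k-mers) of its own chunk into a private dictionary, and the partial counts are merged afterwards, so no shared counter is ever mutated concurrently. Chunks overlap by k-1 characters so that no word spanning a chunk boundary is lost. This is a pattern sketch, not the paper's cache-optimized algorithm.

```python
# Per-worker private counters + final merge: no locking on the hot path.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(seq, k):
    """Count all k-length words in one chunk into a private Counter."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def parallel_kmer_count(seq, k, workers=4):
    step = max(k, len(seq) // workers)
    # overlap chunks by k-1 so boundary-spanning words are counted once
    chunks = [seq[i:i + step + k - 1] for i in range(0, len(seq), step)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda c: count_chunk(c, k), chunks):
            total.update(partial)   # merge happens after the parallel phase
    return total

counts = parallel_kmer_count("ACGTACGTAC", 2)
print(counts["AC"], counts["CG"])
```

(Python threads illustrate the structure only; for real speed-up this pattern would run on native threads or processes, as in the paper's multicore implementation.)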
12.
To gain and retain advantage in a competitive business arena, a business cloud-computing platform must continuously strive to offer new services. Unfortunately, it is increasingly recognized by the industry that a cloud-computing platform cannot cover all aspects of the IT layers involved in infrastructure, platform and application. In practice, end users' requests are nearly unlimited, while the services held by a cloud-computing platform are relatively limited, both in service category and in service capacity. In view of this challenge, we investigate an elastic cloud platform that recruits outside services absent from the platform itself. Concretely, by dynamically hiring a qualified service on the Internet to stand in for a service the cloud platform lacks, an elastic cloud platform can provide nearly unlimited capabilities in an outsourcing fashion, e.g., computing power, storage and application functions. Finally, the validity of the method is evaluated through a case study.
13.
Keqin Li 《The Journal of supercomputing》2012,60(2):223-247
In this paper, scheduling parallel tasks on multiprocessor computers with dynamically variable voltage and speed is addressed
as combinatorial optimization problems. Two problems are defined, namely, minimizing schedule length with energy consumption
constraint and minimizing energy consumption with schedule length constraint. The first problem has applications in general
multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers
where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and
environments where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay
product is optimized by fixing one factor and minimizing the other. It is noticed that power-aware scheduling of parallel
tasks has rarely been discussed before. Our investigation in this paper makes an initial attempt at energy-efficient scheduling
of parallel tasks on multiprocessor computers with dynamic voltage and speed. Our scheduling problems contain three nontrivial
subproblems, namely, system partitioning, task scheduling, and power supplying. Each subproblem should be solved efficiently,
so that heuristic algorithms with overall good performance can be developed. The above decomposition of our optimization problems
into three subproblems makes design and analysis of heuristic algorithms tractable. A unique feature of our work is to compare
the performance of our algorithms with optimal solutions analytically and validate our results experimentally, not to compare
the performance of heuristic algorithms among themselves only experimentally. The harmonic system partitioning and processor
allocation scheme is used, which divides a multiprocessor computer into clusters of equal sizes and schedules tasks of similar
sizes together to increase processor utilization. A three-level energy/time/power allocation scheme is adopted for a given
schedule, such that the schedule length is minimized by consuming a given amount of energy or the energy consumed is minimized
without missing a given deadline. The performance of our heuristic algorithms is analyzed, and accurate performance bounds
are derived. Simulation data which validate our analytical results are also presented. It is found that our analytical results
provide very accurate estimation of the expected normalized schedule length and the expected normalized energy consumption
and that our heuristic algorithms are able to produce solutions very close to optimum.
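The energy/time trade-off underlying both problems is easy to illustrate in the standard speed-scaling model: running work w at speed s with power s**alpha takes w/s time and costs w*s**(alpha-1) energy, so for a sequential batch with total work W and energy budget E, convexity implies a single uniform speed s = (E/W)**(1/(alpha-1)) minimizes the schedule length. This is a toy single-processor check, not the paper's multiprocessor partitioning algorithm.

```python
# Uniform-speed optimum for minimizing schedule length under an energy budget,
# in the power = s**alpha speed-scaling model (alpha illustrative, typically 3).
def min_schedule_length(works, energy, alpha=3.0):
    W = sum(works)
    s = (energy / W) ** (1.0 / (alpha - 1.0))   # optimal uniform speed
    return W / s, s

length, speed = min_schedule_length([2.0, 3.0, 5.0], energy=40.0)
print(round(length, 3), round(speed, 3))
```

With W = 10 and E = 40 at alpha = 3 the optimal speed is 2, the schedule length is 5, and the energy spent, W * s**(alpha-1) = 40, exactly meets the budget, the "fix one factor, minimize the other" structure the abstract describes.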
14.
Protein structure prediction (PSP) is an open problem with many useful applications in disciplines such as medicine, biology
and biochemistry. As this problem presents a vast search space and the analysis of each protein structure requires a significant
amount of computing time, it is necessary to take advantage of high-performance parallel computing platforms as well as to
define efficient search procedures in the space of possible protein conformations. In this paper we compare two parallel procedures
for PSP which are based on different multi-objective optimization approaches, i.e. PAES (Knowles and Corne in Proc. Congr.
Evol. Comput. 1:98–105, 1999) and NSGA2 (Deb et al. in IEEE Trans. Evol. Comput. 6:182–197, 2002). Although both procedures include techniques to take advantage of known protein structures and strategies to simplify the
search space through the so-called rotamer library and adaptive mutation operators, they present different profiles with respect
to their implicit parallelism.
15.
Lee Sokjoon Seo Hwajeong Kwon Hyeokchan Yoon Hyunsoo 《The Journal of supercomputing》2019,75(8):4329-4349
The Journal of Supercomputing - Since the advent of deep belief network deep learning technology in 2006, artificial intelligence technology has been utilized in various convergence areas, such as...
16.
Kristine Dery Richard Hall Nick Wailes Sharna Wiblen 《The Journal of Strategic Information Systems》2013,22(3):225-237
Available evidence suggests that the adoption of IT-enabled Human Resource Information Systems (HRIS) has not produced the widely predicted transformation of Human Resources (HR) to a strategic business partner. We examine the relationship between HRIS and the HR function by applying actor-network theory (ANT) to an HRIS implementation project. The focus on how actor networks are formed and reformed during implementation may be particularly well suited to explaining why the original aims of the HRIS can be displaced or lost in translation. We suggest that the approach afforded by ANT enables us to better understand the ongoing and contingent process of HRIS implementations.
17.
Earlier approximate response time analysis (RTA) methods for tasks with offsets (transactional task model) exhibit two major
deficiencies: (i) They overestimate the calculated response times resulting in an overly pessimistic result. (ii) They suffer
from time complexity problems, resulting in an RTA method that may not be applicable in practice. This paper shows how these
two problems can be alleviated by a single fast-and-tight RTA method that combines the best of both worlds:
high-precision response times and fast approximate analysis.
Simulation studies, on randomly generated task sets, show that the response time improvement is significant, typically about 15%
tighter response times in 50% of the cases, resulting in about 12% higher admission probability for low priority tasks subjected
to admission control. Simulation studies also show that speedups of more than two orders of magnitude, for realistically sized
tasks sets, compared to earlier RTA analysis techniques, can be obtained.
Other improvements such as Palencia Gutiérrez, González Harbour (Proceedings of the 20th IEEE real-time systems symposium
(RTSS), pp. 328–339, 1999), Redell (Technical Report TRITA-MMK 2003:4, Dept. of Machine Design, KTH, 2003) are orthogonal and complementary, which means that our method can easily be incorporated into those methods as well. Hence, we
conclude that the fast-and-tight RTA method presented is the preferred analysis technique when tight response-time estimates
are needed, and that we do not need to sacrifice precision for analysis speed; both are obtained with one single method.
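The fixed point that offset-aware RTA methods refine is the classic recurrence R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated until it converges. The paper's transactional (offset) analysis tightens the interference term; the sketch below is the basic offset-free version for illustration.

```python
# Classic response-time analysis by fixed-point iteration.
import math

def response_times(tasks):
    """tasks: list of (C, T) pairs in decreasing priority order."""
    results = []
    for i, (c, _) in enumerate(tasks):
        r, prev = c, 0.0
        # iterate R = C + interference until stable (bounded to avoid
        # looping forever on an unschedulable set)
        while r != prev and r <= 10 ** 6:
            prev = r
            r = c + sum(math.ceil(prev / t) * cj for cj, t in tasks[:i])
        results.append(r)
    return results

print(response_times([(1, 4), (1, 6), (2, 10)]))
```

Each task's response time is then compared against its deadline for admission control, the step whose admission probability the simulation studies above measure.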
18.
Okon H. Akpan 《The Journal of supercomputing》2012,60(3):410-419
The focus of this study is the design of a parallel solution method that utilizes a fourth-order compact scheme. The applicability of the method is demonstrated on a time-dependent parabolic system with Neumann boundaries. The core of the parallel computing facilities used in the study is a 2-head-node, 224-compute-node Apple Xserve G5 multiprocessor. The system is first discretized in both time and space such that it remains in its stability regimes, before being solved with the method. The solution requires time marching in which every time step, h_t, calls for a single parallel solve of the intermediary subsystems generated. The solution uses p processors ranging in number from 3 to 63. The speedups, s_p, approach their limiting value of p only when p is small. The solution produces good computational results at large p, but poor results as p becomes progressively small. Also, the parallel solution produces accurate results yielding good speedups and efficiencies only when p is within some reasonable range of values. The intermediary systems generated by this method are linear and fine-grained; therefore, they are best suited for solution on massively parallel processors. The solution method proposed in this study is therefore expected to yield more impressive results if applied in a massively parallel computing environment.
19.
This paper presents a framework for allocating radio resources to the Access Points (APs) introducing an Access Point Controller
(APC). Radio resources can be either time slots or subchannels. The APC assigns subchannels to the APs using a dynamic subchannel
allocation scheme. The developed framework evaluates the dynamic subchannel allocation scheme for a downlink multicellular
Orthogonal Frequency Division Multiple Access (OFDMA) system. In the considered system, each AP and the associated Mobile
Terminals (MTs) are not operating on a frequency channel with fixed bandwidth, rather the channel bandwidth for each AP is
dynamically adapted according to the traffic load. The subchannel assignment procedure is based on quality estimates derived
from interference measurements and the current traffic load. The traffic load estimation is realized with the measurement
of the utilization of the assigned radio resources. The reuse partitioning for the radio resources is done by estimating mutual
Signal to Interference Ratio (SIR) of the APs. The developed dynamic subchannel allocation ensures Quality of Service (QoS),
better traffic adaptability, and higher spectrum efficiency with less computational complexity.
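The mutual-SIR reuse test at the heart of the partitioning step can be sketched very simply: two APs may share a subchannel only if the Signal to Interference Ratio estimated at each of them stays above a threshold. The power values and threshold below are illustrative placeholders, not figures from the paper.

```python
# Toy mutual-SIR reuse check (linear scale, illustrative numbers).
def can_reuse(p_signal_a, p_interf_a, p_signal_b, p_interf_b, sir_min=4.0):
    """True if both APs keep SIR >= sir_min when sharing a subchannel."""
    sir_a = p_signal_a / p_interf_a
    sir_b = p_signal_b / p_interf_b
    return sir_a >= sir_min and sir_b >= sir_min

print(can_reuse(8.0, 1.0, 10.0, 2.0))   # SIRs 8 and 5, both above threshold
print(can_reuse(8.0, 4.0, 10.0, 2.0))   # SIR at A drops to 2
```

In the full scheme, the APC would run this kind of test across AP pairs and then hand out subchannels in proportion to each AP's measured load.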
20.
The linear solve problems arising in chemical physics and many other fields involve large sparse matrices with a certain block structure, for which special block Jacobi preconditioners are found to be very efficient. In two previous papers [W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. I. Block Jacobi diagonalization, J. Comput. Phys. 219 (1) (2006) 185–197; W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. II. QMR linear solver, J. Comput. Phys. 219 (1) (2006) 198–209], a parallel implementation was presented. Excellent parallel scalability was observed for preconditioner construction, but not for the matrix–vector product itself. In this paper, we introduce a new algorithm with (1) greatly improved parallel scalability and (2) generalization to an arbitrary number of nodes and data sizes.
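A block Jacobi preconditioner keeps only the diagonal blocks of the matrix and applies their inverses to a vector. The sketch below uses explicit 2x2 blocks so the inverse is closed-form; the block size and numbers are illustrative, and a real implementation would factor much larger blocks in parallel.

```python
# Block Jacobi application: multiply each 2x2 diagonal block's inverse
# against the matching slice of the vector.
def apply_block_jacobi(diag_blocks, v):
    """diag_blocks: list of 2x2 blocks [[a, b], [c, d]] along the diagonal."""
    out = []
    for k, (row1, row2) in enumerate(diag_blocks):
        a, b = row1
        c, d = row2
        det = a * d - b * c                  # assumed nonsingular blocks
        x, y = v[2 * k], v[2 * k + 1]
        out.append((d * x - b * y) / det)    # inv(block) row 1 times (x, y)
        out.append((-c * x + a * y) / det)   # inv(block) row 2 times (x, y)
    return out

# For a matrix that IS block diagonal, the preconditioner solves exactly:
blocks = [[[2.0, 0.0], [0.0, 4.0]], [[1.0, 1.0], [0.0, 1.0]]]
print(apply_block_jacobi(blocks, [2.0, 4.0, 3.0, 1.0]))
```

Since each block is applied independently, the operation parallelizes trivially across nodes, which is why preconditioner construction and application scale so well compared to the global matrix-vector product.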