Abstract: | In a shared-nothing parallel database system, a join operation is split into a set of tasks that are allocated to the nodes of the system and executed concurrently and independently. While parallel processing can greatly reduce the completion time of a join operation, system performance may degrade because of load imbalance across the nodes caused by data skew in the relations. Load-balanced join processing uses various techniques to distribute the load evenly among the nodes and hence improves overall system performance. In this paper, the basic issues in designing load-balanced parallel join algorithms are identified. From the solutions to those issues, a large set of load-balanced join algorithms can be constructed. The performance of four representative algorithms (two dynamic load-balancing algorithms proposed in this paper and two static load-balancing algorithms adapted from similar algorithms in the literature) is studied and compared with that of a parallel join algorithm that does not balance the join load. The results of our study clearly show the benefits of load balancing. This study also demonstrates that the dynamic load-balancing techniques proposed in this paper not only are feasible but also provide good system performance. |