20 similar documents found
1.
2.
3.
4.
Michel Jambu, Computers & Geosciences, 1981, 7(3): 297-310
A rapid hierarchical classification program enables the clustering of 5000 elements in only a few minutes of central processor time on an IBM 370/168 computer. The program's algorithm, based on the reducibility axiom in graph theory, is related to the criterion of correspondence analysis. Its application to a set of hydrogeological data is described briefly.
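The paper's graph-based reducibility algorithm is not reproduced here; as a generic illustration of hierarchical classification at this scale, a minimal SciPy sketch (Ward's criterion on synthetic data, not the paper's method):

```python
# A minimal illustration of hierarchical classification, not the paper's
# graph-based reducibility algorithm: scipy's linkage() with Ward's
# criterion clusters a few thousand points in seconds on modern hardware.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))          # 5000 elements, 3 features

Z = linkage(X, method="ward")           # full hierarchical tree
labels = fcluster(Z, t=10, criterion="maxclust")  # cut into 10 clusters
print(labels[:10])
```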
5.
本文通过对IBM Power与DEC Alpha RISC微处理器的设计策略及对性能的影响做了系统的分析与比较。 相似文献
6.
7.
A general class of methods for (partial) rotation of a set of (loading) matrices to maximal agreement has been available in the literature since the 1980s. It contains a generalization of canonical correlation analysis as a special case. However, various other generalizations of canonical correlation analysis have been proposed. A new general class of methods for each such alternative generalization of canonical correlation is proposed. Together, these general classes of methods form a superclass of methods that strike a compromise between explaining the variance within sets of variables and explaining the agreement between sets of variables, as illustrated in some examples. Furthermore, one general algorithm for finding the solutions for all methods in all general classes is offered. As a consequence, for all methods in the superclass of methods, algorithms are available at once. For the existing methods, the general algorithm usually reduces to the standard algorithms employed in these methods, and thus the algorithms for all these methods are shown to be related to each other.
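To make the canonical-correlation special case concrete, a NumPy sketch of plain CCA via the SVD of the whitened cross-covariance (a textbook formulation, not the paper's general algorithm; all names below are illustrative):

```python
# Plain canonical correlation analysis, the special case mentioned above,
# via SVD of the whitened cross-covariance; not the paper's general method.
import numpy as np

def inv_sqrt(S, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, eps, None)
    return V @ np.diag(w ** -0.5) @ V.T

def cca(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy = X.T @ X / n, Y.T @ Y / n
    Sxy = X.T @ Y / n
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)   # canonical correlations

rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))                    # signal shared by both sets
X = np.hstack([Z, rng.normal(size=(500, 2))])
Y = np.hstack([Z @ rng.normal(size=(2, 2)), rng.normal(size=(500, 2))])
print(cca(X, Y))                                 # first two correlations near 1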
8.
Privacy-preserving distributed mining of association rules on horizontally partitioned data
Kantarcioglu M., Clifton C., IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9): 1026-1037
Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data, and even some types of information about the data. We address secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.
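One standard building block in this setting is a secure sum of locally held counts. The toy Python sketch below shows only that idea (the paper's full protocol combines several primitives, including commutative encryption for set union; this is not that protocol):

```python
# Toy secure sum over a ring of parties: each party adds its private value
# to a running total that starts from a random mask known only to the
# initiator, so no party ever sees another party's individual value.
import random

def secure_sum(private_values, modulus=2**32):
    mask = random.randrange(modulus)
    running = mask
    for v in private_values:           # each party adds in turn
        running = (running + v) % modulus
    return (running - mask) % modulus  # initiator removes its mask

counts = [12, 7, 30]                   # e.g. local support counts per site
print(secure_sum(counts))              # 49, without revealing any single count
```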
9.
10.
A data acquisition, display and plotting program for the IBM PC
A program, AQ, has been developed to perform analog-to-digital (A/D) conversions on IBM PC products using the Data Translation DT2801-A or DT2801 boards. This program supports all of the triggered and continuous A/D modes of these boards. Additional subroutines for managing data files and displaying acquired data have also been developed. These programs have been written so that a minimum number of keystrokes is required for their operation. Parameter files simplify reconfiguring the program for various data acquisition tasks.
11.
Executing different fingerprint-image matching algorithms on large data sets reveals that the match and non-match similarity scores have no specific underlying distribution function. Analyzing such algorithms on large data sets therefore requires a nonparametric approach that makes no assumption about these irregularly discrete distribution functions. A precise receiver operating characteristic (ROC) curve can be constructed from the true accept rate (TAR) of the match similarity scores and the false accept rate (FAR) of the non-match similarity scores. The area under such an ROC curve, computed using the trapezoidal rule, is equivalent to the Mann-Whitney statistic formed directly from the match and non-match similarity scores. The Z statistic, formulated using the areas under two ROC curves along with their variances and the correlation coefficient, is then applied to test the significance of the difference between the two curves. Four examples from the extensive testing of commercial fingerprint systems at the National Institute of Standards and Technology are provided. The nonparametric approach presented in this article can also be employed in the analysis of other large biometric data sets.
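The stated equivalence between the trapezoidal area under the ROC curve and the Mann-Whitney statistic can be checked numerically. A NumPy sketch on synthetic scores (the distributions below are illustrative, not NIST data):

```python
# Numerical check: the trapezoidal area under the empirical ROC curve
# equals the normalized Mann-Whitney statistic computed directly from
# the match and non-match similarity scores.
import numpy as np

rng = np.random.default_rng(2)
match = rng.normal(2.0, 1.0, 300)      # genuine-pair similarity scores
nonmatch = rng.normal(0.0, 1.0, 500)   # impostor-pair similarity scores

# Mann-Whitney: fraction of (match, non-match) pairs ranked correctly,
# counting ties as 1/2.
diff = match[:, None] - nonmatch[None, :]
auc_mw = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

# ROC built by sweeping a threshold over all observed scores, then the
# trapezoidal rule applied to (FAR, TAR) pairs.
grid = np.sort(np.concatenate([match, nonmatch]))[::-1]
tar = np.concatenate([[0.0], [np.mean(match >= t) for t in grid]])
far = np.concatenate([[0.0], [np.mean(nonmatch >= t) for t in grid]])
auc_trap = np.sum(np.diff(far) * (tar[1:] + tar[:-1]) / 2)

print(auc_mw, auc_trap)   # the two areas agree to floating-point precision
```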
12.
The problem of optimizing communications during the execution of a program on a parallel computer with distributed memory is investigated. Conditions are formulated for determining when data broadcast and translation can be organized. The proposed conditions are stated in a form suitable for practical application and can be used for automated parallelization of programs.
This work was done within the framework of the State Program of Fundamental Studies of the Republic of Belarus (under the code name “Mathematical structures 21”) with the partial support of the Foundation for Fundamental Studies of the Republic of Belarus (grant F03-062).
Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 166–182, March–April 2006.
13.
Self-organising maps (SOM) have become a commonly used cluster analysis technique in data mining. However, SOM cannot process incomplete data. To extend the data mining capability of SOM, this study proposes an SOM-based fuzzy map model for mining incomplete data sets. Under this model, incomplete data are translated into fuzzy data and used to generate fuzzy observations. These fuzzy observations, along with the observations that have no missing values, are then used to train the SOM and generate fuzzy maps. Compared with the standard SOM approach, the fuzzy maps generated by the proposed method provide more information for knowledge discovery.
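For readers unfamiliar with the base technique, a minimal NumPy SOM is sketched below; the paper's contribution (converting incomplete records into fuzzy observations before training) would be layered on top of a map like this. All parameter values are illustrative:

```python
# A minimal self-organising map: repeatedly pick a sample, find its
# best-matching unit, and pull that unit and its grid neighbours toward
# the sample with a decaying learning rate and neighbourhood radius.
import numpy as np

def train_som(data, rows=5, cols=5, iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        d = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)   # best-matching unit
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        h = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

data = np.random.default_rng(1).random((200, 3))
som = train_som(data)
print(som.shape)   # (5, 5, 3): one prototype vector per map unit
```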
14.
Per Ling, The Journal of Supercomputing, 1993, 7(3): 323-355
Fortran 77 implementations of the Level 3 Basic Linear Algebra Subprograms (BLAS) in double precision, structured and tuned to achieve high performance on the IBM 3090 VF, are presented. The implementations are designed to exploit the memory hierarchy and the vector processor efficiently. Efficient cache reuse is provided by a method for matrix blocking adapted to the memory hierarchy. Vector registers and compound vector instructions are used efficiently through carefully designed Fortran code constructs. Performance results generally show speed comparable to the highly tuned IBM ESSL library. In some cases our implementations are actually faster than ESSL. The generality of the program design and the use of Fortran 77 make the implementations portable and well suited to serve as design platforms for other machines with similar architectures.
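The matrix-blocking idea is easy to see in a toy form. A NumPy transcription follows (the real code is tuned Fortran 77 for the 3090 VF; the block size here is arbitrary):

```python
# Cache blocking for C = A @ B: compute the product block by block so
# that each pair of small blocks stays resident in cache while it is
# being reused, the same idea the paper applies to the Level 3 BLAS.
import numpy as np

def blocked_gemm(A, B, bs=64):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # each update touches only bs-by-bs blocks
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

rng = np.random.default_rng(0)
A, B = rng.random((200, 150)), rng.random((150, 120))
print(np.allclose(blocked_gemm(A, B), A @ B))   # True
```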
15.
Dong J.X., Krzyzak A., Suen C.Y., IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(4): 603-618
Training a support vector machine on a data set of huge size with thousands of classes is a challenging problem. This paper proposes an efficient algorithm to solve it. The key idea is to introduce a parallel optimization step to quickly remove most of the nonsupport vectors: block diagonal matrices are used to approximate the original kernel matrix so that the original problem can be split into hundreds of subproblems, which can be solved much more efficiently. In addition, effective strategies such as kernel caching and efficient computation of the kernel matrix are integrated to speed up the training process. Our analysis shows that the time complexity of the proposed algorithm grows linearly with the number of classes and the size of the data set. The experiments investigate many appealing properties of the proposed algorithm and show that it scales much better than Libsvm, SVMlight, and SVMTorch. Good generalization performance on several large databases has also been achieved.
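The filtering step can be sketched with scikit-learn: train on blocks as if the kernel matrix were block diagonal, keep only the support vectors each block produces, and retrain on that smaller set. This is a hedged illustration of the idea, not the paper's parallel solver:

```python
# Block-diagonal filtering sketch: solve independent sub-SVMs, keep the
# union of their support vectors, then train the final SVM on survivors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=6000, n_features=20, random_state=0)

blocks = np.array_split(np.arange(len(X)), 8)   # 8 independent subproblems
keep = []
for idx in blocks:                              # could run in parallel
    sub = SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])
    keep.extend(idx[sub.support_])              # survivors of this block

final = SVC(kernel="rbf", gamma="scale").fit(X[keep], y[keep])
print(len(keep), "candidates retained of", len(X))
```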
16.
A software package, OMNILAB, has been written for the IBM PC, XT, and AT computers as a general-purpose system for data collection, analysis, and display. The program supports collection of data from a variety of absorbance detectors, from pH meters, and from other instruments that output time-varying analog voltages in the millivolt or volt range. The program includes capabilities for averaging data, baseline subtraction, integration of curves, and versatile formatting for both video and hardcopy display of data.
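The post-processing steps mentioned (baseline subtraction, curve integration) look like this in a NumPy sketch on a hypothetical signal; the original is a compiled IBM PC package, so this is only an illustration of the analysis:

```python
# Linear baseline subtraction followed by trapezoidal peak integration,
# the kind of post-processing described above, on a synthetic signal.
import numpy as np

t = np.linspace(0, 10, 1001)                       # seconds
signal = np.exp(-((t - 5) ** 2)) + 0.02 * t + 0.1  # peak on a drifting baseline

# Fit the baseline to the flat regions at both ends, then subtract it.
flat = (t < 1) | (t > 9)
slope, intercept = np.polyfit(t[flat], signal[flat], 1)
corrected = signal - (slope * t + intercept)

# Trapezoidal rule over the corrected curve.
area = np.sum(np.diff(t) * (corrected[1:] + corrected[:-1]) / 2)
print(round(area, 3))                              # ~ sqrt(pi) = 1.772
```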
17.
Data envelopment analysis (DEA) is computationally intensive. This work conclusively answers questions about the computational performance and scale limits of the standard LP-based procedures currently used. DEA problems with up to 15K entities have been documented, and it is not hard to imagine problem sizes increasing as new, more sophisticated applications are found for DEA. This work reports on a comprehensive computational study involving DEA problems with up to 100K DMUs. We explore the impact of different LP algorithms, including interior point methods, as well as accelerators such as advanced basis starts and DEA-specific enhancements such as “restricted basis entry” (RBE). Our results demonstrate that solution times grow roughly quadratically with problem size and that massive problems can be solved efficiently. We propose ideas for extending DEA into a data mining tool.
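For context, DEA solves one LP per DMU, which is why the work above matters at scale. A SciPy sketch of the standard input-oriented CCR envelopment model (the textbook formulation, not the paper's accelerated solver; the data are synthetic):

```python
# Input-oriented CCR DEA, one LP per DMU:
#   min theta  s.t.  X @ lam <= theta * x0,  Y @ lam >= y0,  lam >= 0
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, dmu):
    m, n = X.shape                             # m inputs, n DMUs
    s = Y.shape[0]                             # s outputs
    c = np.r_[1.0, np.zeros(n)]                # variables: [theta, lam_1..lam_n]
    A_ub = np.vstack([
        np.hstack([-X[:, [dmu]], X]),          # X lam - theta * x0 <= 0
        np.hstack([np.zeros((s, 1)), -Y]),     # -Y lam <= -y0
    ])
    b_ub = np.r_[np.zeros(m), -Y[:, dmu]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.fun                             # efficiency score in (0, 1]

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(2, 30))           # 2 inputs, 30 DMUs
Y = rng.uniform(1, 10, size=(3, 30))           # 3 outputs
print([round(ccr_efficiency(X, Y, j), 3) for j in range(3)])
```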
18.
L. Poladian, L.S. Jermiin, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2006, 10(4): 359-368
Evolutionary relationships among species are usually (1) illustrated by means of a phylogenetic tree and (2) inferred by optimising some measure of fitness, such as the total evolutionary distance between species or the likelihood of the tree (given a model of the evolutionary process and a data set). The combinatorial complexity of inferring the topology of the best tree makes phylogenetic inference an ideal candidate for evolutionary algorithms. However, difficulties arise when different data sets provide conflicting information about the inferred 'best' tree(s). We apply the techniques of multi-objective optimisation to phylogenetic inference for the first time. We use the simplest model of evolution and a four-species problem to illustrate the method.
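The multi-objective ingredient, in isolation, is Pareto dominance: with conflicting data sets there is a front of trees, not a single winner. A small Python sketch with hypothetical scores (lower is better; not the paper's evolutionary algorithm):

```python
# Pareto-front filter: keep every candidate that no other candidate
# dominates (at least as good on both objectives, strictly better on one).
import numpy as np

def pareto_front(scores):
    front = []
    for i, s in enumerate(scores):
        dominated = any(np.all(t <= s) and np.any(t < s)
                        for j, t in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front

rng = np.random.default_rng(3)
scores = rng.random((20, 2))     # e.g. -log-likelihood under data sets 1 and 2
print(pareto_front(scores))      # the mutually non-dominated candidate trees
```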
19.
20.
In practice, there are many binary classification problems, such as credit risk assessment and medical testing to determine whether a patient has a certain disease. Different problems, however, have different characteristics that affect their difficulty. One important characteristic is the degree of imbalance between the two classes in a data set. For data sets with different degrees of imbalance, are the commonly used binary classification methods still feasible? In this study, various binary classification models, from traditional statistical methods to newly emerged artificial intelligence methods, such as linear regression, discriminant analysis, decision trees, neural networks, and support vector machines, are reviewed, and their performance in terms of classification accuracy and area under the Receiver Operating Characteristic (ROC) curve is tested and compared on fourteen data sets with different degrees of imbalance. The results help in selecting appropriate methods for problems with different degrees of imbalance.
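A compact scikit-learn version of such a comparison, on one synthetic imbalanced data set (the study uses fourteen real ones), showing why both metrics are reported: accuracy rewards always predicting the majority class, while ROC AUC does not.

```python
# Compare two classifiers on a 95/5 imbalanced data set, reporting both
# accuracy and area under the ROC curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
    model.fit(Xtr, ytr)
    acc = accuracy_score(yte, model.predict(Xte))
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(type(model).__name__, round(acc, 3), round(auc, 3))
```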