1.
We propose an approximate computation technique for inter-object distances of binary data sets. Our approach is based on locality-sensitive hashing. We randomly select a number of projections of the data set and group objects into buckets based on the hash values of these projections. For each pair of objects, occurrences in the same bucket are counted, and the exact Hamming distance is approximated from the number of co-occurrences across all buckets. We parallelize the computation using two main schemes. The first assigns each random subspace to a processor for calculating a local co-occurrence matrix; the local co-occurrence matrices are then combined into the final co-occurrence matrix. The second method provides the same distance approximation in longer runtimes by limiting the total message size in a parallel computing environment, which is especially useful for very large data sets that generate immense message traffic. Our methods produce very accurate results, scale well with the number of objects, and tolerate processor failures. Experimental evaluations on supercomputers and on workstations with several processors demonstrate the usefulness of our methods.
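The bucket-counting idea can be sketched for a single pair of binary vectors. This is a minimal illustration assuming bit-sampling projections; the function name and parameters are mine, and the actual method counts co-occurrences for all pairs at once and distributes the subspaces across processors:

```python
import random

def approx_hamming(a, b, n_proj=200, k=4, seed=0):
    """Estimate the Hamming distance between two equal-length binary
    vectors via bit-sampling LSH: for each random k-bit projection,
    record whether the two vectors hash to the same bucket, then invert
    the collision probability (1 - d/n)**k to recover the distance d."""
    n = len(a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_proj):
        idx = [rng.randrange(n) for _ in range(k)]  # one random k-bit projection
        if all(a[i] == b[i] for i in idx):          # same bucket in this subspace?
            hits += 1
    f = hits / n_proj                               # observed collision frequency
    if f == 0:
        return n  # no collisions observed: distance is on the order of n
    return round(n * (1 - f ** (1 / k)))
```

More projections (`n_proj`) tighten the estimate at the cost of more hashing work, which is exactly the work the paper spreads over processors.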
2.
Szymon Jaroszewicz, Tobias Scheffer, Dan A. Simovici. Data Mining and Knowledge Discovery, 2009, 18(1): 56-100
We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the interestingness of an attribute set as the divergence between its behavior as observed in the data and the behavior that can be explained by the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference unfeasible, and a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with a prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case study sheds light on the practical usefulness of the approach.
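The divergence between observed and model-explained behavior can be illustrated with a small sketch using Kullback-Leibler divergence; the function names and the use of a callable model are my assumptions, and the paper works with inference in a full graphical model rather than a fixed probability function:

```python
from collections import Counter
from math import log2

def interestingness(rows, attrs, model_prob):
    """KL divergence (in bits) between the empirical joint distribution
    of the attribute set `attrs` in `rows` and the distribution the
    background model predicts (`model_prob` maps value tuples to
    probabilities). Larger values mean the data deviates more from
    what the current model can explain."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    div = 0.0
    for vals, c in counts.items():
        p = c / n                 # observed probability of this value tuple
        q = model_prob(vals)      # probability the model assigns to it
        div += p * log2(p / q)
    return div
```

For example, if two binary attributes are perfectly correlated in the data but the model assumes independence with uniform marginals, the divergence is 1 bit.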
3.
Alioune Ngom, Corina Reischer, Dan A. Simovici, Ivan Stojmenović. Neural Processing Letters, 2000, 12(1): 71-90
The (n,k,s)-perceptrons partition the input space V ⊆ R^n into s+1 regions using s parallel hyperplanes. Their learning abilities are examined in this research paper. The previously studied homogeneous (n,k,k-1)-perceptron learning algorithm is generalized to the permutably homogeneous (n,k,s)-perceptron learning algorithm with a guaranteed convergence property. We also introduce a high-capacity learning method that learns any permutably homogeneously separable k-valued function given as input.
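The region computed by such a perceptron can be sketched as follows: s parallel hyperplanes w·x = t_1 < ... < t_s split R^n into s+1 regions, and the output is the index of the region containing the input. The weights and thresholds below are hypothetical, and the learning algorithm itself is not shown:

```python
import bisect

def multilevel_perceptron(w, thresholds):
    """(n,k,s)-style multiple-valued perceptron: s parallel hyperplanes
    w.x = t_i partition the input space into s+1 regions; the returned
    function maps an input x to the index (0..s) of its region."""
    ts = sorted(thresholds)
    def f(x):
        a = sum(wi * xi for wi, xi in zip(w, x))  # activation w.x
        return bisect.bisect_right(ts, a)         # region index in 0..s
    return f

# Hypothetical (2,_,2)-perceptron: two thresholds give three regions.
f = multilevel_perceptron([1.0, 1.0], [0.0, 2.0])
```

Because all s hyperplanes share the weight vector w, learning only has to fit w and the s thresholds, which is what the convergence result concerns.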
4.
We introduce purity dependencies as generalizations of functional dependencies in relational databases, starting from the notion of an impurity measure. The impurity measure of a subset of a set relative to a partition of that set, and the relative impurity of two partitions, allow us to define the relative impurity of two attribute sets of a table of a relational database and to introduce purity dependencies. We discuss properties of these dependencies that generalize similar properties of functional dependencies, and we highlight their relevance for approximate classifications. Finally, an algorithm that mines datasets for these dependencies is presented.
Received: 4 July 2000 / 16 November 2001
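A minimal sketch of a relative impurity computation, instantiating the impurity measure with the Gini index (one common choice; the paper defines a general family of impurity measures, and the names here are mine):

```python
from collections import Counter, defaultdict

def gini(counts):
    """Gini impurity of a block, from the sizes of its sub-blocks."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def relative_impurity(rows, X, Y):
    """Impurity of the partition induced by attribute set Y inside each
    block of the partition induced by X, weighted by block size.
    A value of 0 means X functionally determines Y; small positive
    values indicate an approximate (purity) dependency."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in X)].append(tuple(r[a] for a in Y))
    n = len(rows)
    return sum(len(b) / n * gini(Counter(b).values())
               for b in blocks.values())
```

When the classical functional dependency X -> Y holds, every X-block is pure and the measure collapses to 0, which is how purity dependencies generalize functional ones.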
5.
Evaluation of automatic text summarization is a challenging task due to the difficulty of computing the similarity of two texts. In this paper, we define a new dissimilarity measure, compression dissimilarity, to compute the dissimilarity between documents. We then propose a new automatic evaluation method based on compression dissimilarity. The proposed method is a complete "black box" and needs no preprocessing steps. Experiments show that compression dissimilarity clearly distinguishes automatic summaries from human summaries. The compression dissimilarity measure can evaluate an automatic summary by comparing it with high-quality human summaries, or with its original document. The evaluation results are highly correlated with human assessments, and the correlation between the compression dissimilarity of summaries and the compression dissimilarity of documents can serve as a meaningful measure of the consistency of an automatic text summarization system.
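A compression-based dissimilarity in this spirit can be sketched with the standard normalized compression distance; this is an illustration of the general idea, not necessarily the paper's exact definition of compression dissimilarity:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: if compressing x and y together
    saves little over compressing them separately, the texts share
    little structure and the distance approaches 1; near-identical
    texts yield a value close to 0."""
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

The appeal for summary evaluation is exactly the "black box" property: the measure needs no tokenization, stemming, or other preprocessing of either text.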
6.
S. Jaroszewicz, D. A. Simovici, W. P. Kuo, L. Ohno-Machado. IEEE Transactions on Biomedical Engineering, 2004, 51(7): 1095-1102
Increasing interest in new pattern recognition methods has been motivated by bioinformatics research. The analysis of gene expression data originating from microarrays constitutes an important application area for classification algorithms and illustrates the need for identifying important predictors. We show that the Goodman-Kruskal coefficient can be used for constructing minimal classifiers for tabular data, and we give an algorithm that constructs such classifiers.
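As an illustration, one standard form of the coefficient, the Goodman-Kruskal lambda, measures the proportional reduction in error when predicting a class from an attribute. This is a sketch; the paper's exact variant of the coefficient and its classifier-construction algorithm are not reproduced here:

```python
from collections import Counter, defaultdict

def goodman_kruskal_lambda(pairs):
    """Goodman-Kruskal lambda for (x, y) observations: the proportional
    reduction in prediction error for the class y when the predictor x
    is known. 1.0 means x predicts y perfectly (a candidate feature for
    a minimal classifier); 0.0 means x is useless for predicting y."""
    n = len(pairs)
    y_counts = Counter(y for _, y in pairs)
    baseline_err = n - max(y_counts.values())  # errors guessing the modal class
    if baseline_err == 0:
        return 1.0  # the class is constant, hence trivially predictable
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    # Errors when guessing the modal class within each x-group.
    cond_err = n - sum(max(c.values()) for c in by_x.values())
    return (baseline_err - cond_err) / baseline_err
```

A greedy search for the smallest attribute set driving such a coefficient to 1 is one natural route to minimal classifiers over tabular (e.g. discretized expression) data.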