Found 20 similar documents; search took 10 ms
1.
Georgios Drakopoulos, Eleanna Kafeza, Phivos Mylonas, Lazaros Iliadis 《Neural computing & applications》2021,33(23):16363-16375
Neural Computing and Applications - Graph signal processing has recently emerged as a field with applications across a broad spectrum of areas, including brain connectivity networks, logistics and...
2.
于艳东 《网络安全技术与应用》2014,(6):93-93,95
Program-code similarity measurement is a key technique for detecting plagiarism and duplication and for verifying the originality of student assignments; it can also support automated correction of the submitted work. By studying the application of algorithms to code similarity measurement, teachers can be assisted in effectively gauging the degree of similarity between students' programs and thus in detecting similar code in student assignments, making teaching assessment more scientific and credible and serving the social and educational goals of respecting originality and encouraging innovation.
3.
4.
Machine Learning - Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer...
5.
The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memory available on current computers, and the need for efficient processing of spatial joins, suggest that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and provide recommendations for the choice of a join algorithm depending upon the dimensionality of the data as well as the expected selectivity of the join. We show that the two new proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join.
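The grid-based join idea this entry describes can be sketched in a few lines. The function below is a minimal illustration only (the 2-D epsilon-join setting and all names are assumptions, not the paper's actual interface):

```python
from collections import defaultdict
from itertools import product

def grid_join(R, S, eps):
    """Epsilon-join of two 2-D point sets using a uniform grid with
    cell side eps: each S point probes only its own cell and the 8
    neighboring cells, so candidate pairs stay local to the grid."""
    cell = lambda p: (int(p[0] // eps), int(p[1] // eps))
    grid = defaultdict(list)
    for r in R:
        grid[cell(r)].append(r)
    result = []
    for s in S:
        cx, cy = cell(s)
        for dx, dy in product((-1, 0, 1), repeat=2):
            for r in grid.get((cx + dx, cy + dy), []):
                if (r[0] - s[0]) ** 2 + (r[1] - s[1]) ** 2 <= eps * eps:
                    result.append((r, s))
    return result
```

Because the cell side equals the join radius, a match can only lie in the 3x3 neighborhood, which is what keeps the candidate set small for low-dimensional data.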
6.
We define the problem of bounded similarity querying in time-series databases, which generalizes earlier notions of similarity querying. Given a (sub)sequence S, a query sequence Q, lower and upper bounds on shifting and scaling parameters, and a tolerance ε, S is considered boundedly similar to Q if S can be shifted and scaled within the specified bounds to produce a modified sequence S′ whose distance from Q is within ε. We use similarity transformation to formalize the notion of bounded similarity. We then describe a framework that supports the resulting set of queries; it is based on a fingerprint method that normalizes the data and saves the normalization parameters. For off-line data, we provide an indexing method with a single index structure and search technique for handling all the special cases of bounded similarity querying. Experimental investigations find the performance of our method to be competitive with earlier, less general approaches.
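In code, the bounded-similarity test might look like the following sketch. The Euclidean distance and the clamp-after-fit heuristic are my assumptions (the exact optimum over box-constrained scale and shift needs a small constrained solve); the function names are hypothetical:

```python
import numpy as np

def boundedly_similar(S, Q, scale_bounds, shift_bounds, eps):
    """Check whether S can be scaled and shifted within the given
    bounds so that the transformed sequence a*S + b lies within
    Euclidean distance eps of Q."""
    S, Q = np.asarray(S, float), np.asarray(Q, float)
    # Unconstrained least-squares fit of a*S + b ~ Q ...
    A = np.vstack([S, np.ones_like(S)]).T
    (a, b), *_ = np.linalg.lstsq(A, Q, rcond=None)
    # ... then clamp the parameters to the user-supplied bounds
    # (a heuristic stand-in for a proper constrained solve).
    a = min(max(a, scale_bounds[0]), scale_bounds[1])
    b = min(max(b, shift_bounds[0]), shift_bounds[1])
    dist = np.linalg.norm(a * S + b - Q)
    return dist <= eps, (a, b, dist)
```

When the fitted parameters already fall inside the bounds, the clamp is a no-op and the answer is exact.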
7.
Similarity coefficients (also known as coefficients of association) are important measurement techniques used to quantify the extent to which objects resemble one another. Due to privacy concerns, the data owner might not want to participate in any similarity measurement if the original dataset will be revealed or could be derived from the final output. There are many different measurements used for numerical, structural and binary data. In this paper, we particularly consider the computation of similarity coefficients for binary data. A large number of studies related to similarity coefficients have been performed. Our objective in this paper is not to design a specific similarity coefficient. Rather, we demonstrate how to compute similarity coefficients in a secure and privacy-preserving environment. In our protocol, a client and a server jointly participate in the computation. At the end of the protocol, the client obtains all summation variables needed for the computation while the server learns nothing. We incorporate cryptographic methods in our protocol to protect the original dataset and all other intermediate results. Note that our protocol also supports dissimilarity coefficients.
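The "summation variables" this entry refers to are, for binary data, the four contingency counts from which most similarity coefficients derive; once the client holds them, any coefficient can be evaluated locally. A plain (non-cryptographic) sketch with hypothetical names:

```python
def binary_summations(x, y):
    """The four summation variables for two binary vectors:
    a = 1/1 matches, b = 1/0, c = 0/1, d = 0/0."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d

def jaccard(a, b, c, d):
    # Ignores 0/0 agreements; 0.0 when there are no 1s at all.
    return a / (a + b + c) if a + b + c else 0.0

def simple_matching(a, b, c, d):
    # Counts both 1/1 and 0/0 agreements.
    return (a + d) / (a + b + c + d)
```

In the paper's setting, the cryptographic protocol would deliver a, b, c, d to the client without revealing either party's raw vectors.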
8.
Locally adaptive metrics for clustering high dimensional data (total citations: 2, self-citations: 0, by others: 2)
Carlotta Domeniconi, Dimitrios Gunopulos, Sheng Ma, Bojun Yan, Muna Al-Razgan, Dimitris Papadopoulos 《Data mining and knowledge discovery》2007,14(1):63-97
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector whose values capture the relevance of features within the corresponding cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competitive methods, using both synthetic and real datasets. In particular, our results show the feasibility of the proposed technique to perform simultaneous clustering of genes and conditions in gene expression data, and clustering of very high-dimensional data such as text data.
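The per-cluster weight vector idea can be illustrated concisely. The exponential weighting of within-cluster variance below is one common choice, an assumption on my part rather than the paper's exact update rule, and the function names are hypothetical:

```python
import numpy as np

def weighted_distance(x, centroid, w):
    """Per-cluster weighted Euclidean distance: features with larger
    weight w_j count more when assigning points to this cluster."""
    return np.sqrt(np.sum(w * (x - centroid) ** 2))

def feature_weights(points, centroid, h=1.0):
    """Turn per-dimension within-cluster spread into relevance
    weights: dimensions along which the cluster is tight get high
    weight, spread-out dimensions get low weight; normalized to
    sum to 1."""
    var = np.mean((points - centroid) ** 2, axis=0)
    w = np.exp(-var / h)
    return w / w.sum()
```

A cluster that is tight in dimension 0 but spread in dimension 1 thus measures distance mostly along dimension 0, which is what lets each cluster live in its own soft subspace.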
9.
Scene analysis is a relevant way of gathering information about the structure of an audio stream. For content extraction purposes, it also provides prior knowledge that can be taken into account in order to provide more robust results for standard classification approaches. In order to perform such scene analysis, we believe that the notion of temporality is important. Consequently, we study in this paper a new way of modeling the evolution over time of the frequency and amplitude parameters of spectral components. We evaluate its benefits by considering its ability to automatically gather the components of the same sound source. The evaluation of the proposed metric shows that it achieves good performance and takes better account of micro-modulations.
10.
A similarity join uses a similarity function to measure the degree of similarity between data items and joins those pairs that satisfy a threshold. Many similarity-join algorithms already exist for the MapReduce framework, but shortcomings remain: heavy use of indexes inflates time and space costs, and existing algorithms cannot efficiently perform similarity joins over incrementally growing datasets. Targeting massive incremental datasets, this work uses sampling to obtain effective pivots, forms more reasonable partitions, builds partition indexes and assignment rules, and performs the similarity join for newly added data. Experiments show that the algorithm effectively solves the similarity-join problem for massive incremental datasets and verify that building the partition index improves the efficiency of similarity joins over newly added data.
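The sampling-and-pivot step this abstract describes can be sketched as follows; the function names and the single-machine setting are illustrative assumptions (the paper works in MapReduce), but the routing logic is the same: new records go to the partition of their nearest pivot, so only that partition needs to be searched.

```python
import random

def pivot_partitions(data, num_pivots, dist, seed=0):
    """Sample pivots from the data and assign every record to the
    partition of its nearest pivot. Newly arriving records can be
    routed by the same rule without repartitioning everything."""
    rng = random.Random(seed)
    pivots = rng.sample(data, num_pivots)
    parts = {i: [] for i in range(num_pivots)}
    for x in data:
        i = min(range(num_pivots), key=lambda j: dist(x, pivots[j]))
        parts[i].append(x)
    return pivots, parts
```

Candidate pairs for the join are then generated within (and, with a boundary margin, between adjacent) partitions instead of across the whole dataset.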
11.
12.
Indexing high-dimensional data for main-memory similarity search (total citations: 1, self-citations: 0, by others: 1)
As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR+-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR+-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR+-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR+-tree consistently outperforms other index structures.
13.
Efficient similarity search for market basket data (total citations: 2, self-citations: 0, by others: 2)
Alexandros Nanopoulos, Yannis Manolopoulos 《The VLDB Journal: The International Journal on Very Large Data Bases》2002,11(2):138-152
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of the proposed scheme over the state-of-the-art method.
Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 / Published online: September 25, 2002
14.
15.
The authors present case study applications of statistical methods for the analysis of software metrics data which recognize the discrete nature of such data. A procedure is also described which allows a component of complexity independent of size to be extracted from the usual Halstead's metrics and McCabe's cyclomatic number. The methods described are different from the usual regression and non-parametric methods previously applied to software metrics. With the software quality practitioner in mind, the paper explores how these new methods are helpful in understanding the relationships between software metrics.
16.
The individuality of production devices should be taken into account when statistical models are designed for parallelized devices. In the present work, a new clustering method, referred to as NC-spectral clustering, is proposed for discriminating the individuality of production devices. The key idea is to classify samples according to the differences of the correlation among measured variables, since the individuality of production devices is expressed by the correlation. In the proposed NC-spectral clustering, the nearest correlation (NC) method and spectral clustering are integrated. The NC method generates the weighted graph that expresses the correlation-based similarities between samples, and the constructed graph is partitioned by spectral clustering. A new statistical process monitoring method and a new soft-sensor design method are proposed on the basis of NC-spectral clustering. The usefulness of the proposed methods is demonstrated through a numerical example and a case study of parallelized batch processes.
17.
Dr. H. Späth 《Computing》1973,11(2):175-177
A set of n ordered real numbers is partitioned by complete enumeration into k clusters such that the total, over all clusters, of the sum of squared deviations from the cluster mean is minimized.
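Because the numbers are ordered, an optimal partition uses contiguous segments, so complete enumeration reduces to trying every placement of k-1 cut points. A small sketch (names are illustrative; this is exponential in k, as the abstract's method is):

```python
from itertools import combinations

def sse(seg):
    """Sum of squared deviations of a segment from its mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def best_partition(values, k):
    """Complete enumeration over all ways to cut the sorted sequence
    into k contiguous clusters; returns the partition minimizing
    total within-cluster squared deviation."""
    values = sorted(values)
    n = len(values)
    best, best_cost = None, float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        parts = [values[bounds[i]:bounds[i + 1]] for i in range(k)]
        cost = sum(sse(p) for p in parts)
        if cost < best_cost:
            best, best_cost = parts, cost
    return best, best_cost
```

Later work replaced the enumeration with dynamic programming, but for the small n of 1973 the exhaustive search above is exactly what the note computes.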
18.
Yogendra Narain Singh, Phalguni Gupta 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(3):449-460
This paper proposes new techniques to delineate P and T waves efficiently from heartbeats. The delineation results have been found to be optimal and stable in comparison to other published results. These delineators are used along with the QRS complex to extract features of three classes (time interval, amplitude, and angle) from clinically dominant fiducials on each heartbeat of the electrocardiogram (ECG). A new identification system is proposed that uses these features to decide the identity of an individual with respect to a given database. The system has been tested on a set of 250 ECG recordings prepared from 50 individuals of Physionet. Matching decisions are made on the basis of correlation between heartbeat features among individuals. The proposed system achieves an equal error rate of less than 1.01 with an accuracy of 99%.
19.
Bin Cui, Beng Chin Ooi, Jianwen Su, K.-L. Tan 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(3):339-353
In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., >32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. We present a novel index structure, called the Δ-tree, to speed up high-dimensional queries in a main memory environment. The Δ-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The reduced dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently, as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the features of the data set and, hence, reduces the search space. We also propose an extension, called the Δ+-tree, that globally clusters the data space and then partitions clusters into small regions. The Δ+-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the Δ+-tree is superior in most cases.
20.
Jiajia Xu, Weiming Zhang, Ruiqi Jiang, Xiaocheng Hu, Nenghai Yu 《Multimedia Tools and Applications》2017,76(14):15491-15511
Until now, most reversible data hiding (RDH) techniques have been evaluated by peak signal-to-noise ratio (PSNR), which is based on mean squared error (MSE). Unfortunately, MSE turns out to be an extremely poor measure when the purpose is to predict perceived signal fidelity or quality. The structural similarity (SSIM) index has gained widespread popularity as an alternative motivating principle for the design of image quality measures. How to exploit the characteristics of SSIM when designing an RDH algorithm is therefore critical. In this paper, we propose an optimal RDH algorithm under a structural similarity constraint. First, we derive the metric of the structural similarity constraint, and further prove that it does not satisfy the non-crossing-edges property. Second, we construct the rate-distortion function of the optimal structural similarity constraint, which is equivalent to minimizing the average distortion for a given embedding rate, and then obtain the optimal transition probability matrix under the structural similarity constraint. Compared with previous RDH methods, ours improves SSIM by about 1.89% on average. Experiments show that the proposed method outperforms the state of the art in terms of SSIM.
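For reference, the SSIM measure this entry builds on combines luminance, contrast, and structure terms. A global (single-window) form is sketched below; the paper operates on windowed SSIM, so this is a simplification, and the constants follow the commonly used defaults:

```python
import numpy as np

def ssim(x, y, L=255, k1=0.01, k2=0.03):
    """Global SSIM between two images (no sliding window), using the
    standard formula with stabilizing constants c1, c2 derived from
    the dynamic range L."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical pair scores exactly 1; unlike MSE, the score degrades with structural change (via the covariance term) rather than raw pixel error, which is why the paper's rate-distortion optimization targets it directly.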