20 similar documents retrieved.
1.
Georgios Drakopoulos; Eleanna Kafeza; Phivos Mylonas; Lazaros Iliadis. Neural Computing & Applications, 2021, 33(23): 16363-16375
Graph signal processing has recently emerged as a field with applications across a broad spectrum of domains including brain connectivity networks, logistics and...
2.
于艳东. 《网络安全技术与应用》, 2014, (6): 93-93, 95
Program code similarity measurement is a key technique for detecting plagiarism and duplication and for verifying the originality of student assignments; it can also be used to automatically mark the assignments under review. Studying how such algorithms are applied to code similarity measurement can help teachers effectively gauge the degree of similarity between students' programming assignments and thus detect similar program code in student work, making teaching assessment more rigorous and credible and serving the social and educational goals of respecting originality and encouraging innovation.
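The abstract leaves the measurement itself unspecified; a common baseline for code similarity is Jaccard overlap of token n-grams (shingles), sketched below. The tokenizer and the shingle size n are assumptions for illustration, not the method evaluated in the paper.

```python
import re

def tokenize(code: str) -> list[str]:
    """Very simple lexer: identifiers, numbers, and punctuation become tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def shingles(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """All contiguous n-grams of the token stream."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def code_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of token n-gram sets; 1.0 means identical shingle sets."""
    sa, sb = shingles(tokenize(a), n), shingles(tokenize(b), n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

print(code_similarity("for i in range(10): s += i",
                      "for j in range(10): s += j"))
```

Token-level shingling catches renamed-identifier plagiarism better than raw string comparison, which is why it is a common first step in assignment-checking tools.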
3.
4.
Machine Learning - Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer...
5.
6.
We define the problem of bounded similarity querying in time-series databases, which generalizes earlier notions of similarity querying. Given a (sub)sequence S, a query sequence Q, lower and upper bounds on shifting and scaling parameters, and a tolerance ε, S is considered boundedly similar to Q if S can be shifted and scaled within the specified bounds to produce a modified sequence S′ whose distance from Q is within ε. We use similarity transformation to formalize the notion of bounded similarity. We then describe a framework that supports the resulting set of queries; it is based on a fingerprint method that normalizes the data and saves the normalization parameters. For off-line data, we provide an indexing method with a single index structure and search technique for handling all the special cases of bounded similarity querying. Experimental investigations find the performance of our method to be competitive with earlier, less general approaches.
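To make the definition concrete: S is boundedly similar to Q when some scale a and shift b inside the given bounds bring a·S + b within distance ε of Q. The sketch below checks this with a least-squares fit clamped to the bounds, a simplifying heuristic; the paper's fingerprint normalization and index structure are not reproduced here.

```python
import numpy as np

def boundedly_similar(S, Q, a_bounds, b_bounds, eps):
    """Check whether some scale a and shift b within the given bounds
    make S' = a*S + b lie within Euclidean distance eps of Q.
    Heuristic: take the least-squares (a, b), clamp a, refit b, clamp b."""
    S, Q = np.asarray(S, float), np.asarray(Q, float)
    A = np.column_stack([S, np.ones_like(S)])
    (a, b), *_ = np.linalg.lstsq(A, Q, rcond=None)
    a = np.clip(a, *a_bounds)
    b = np.clip(Q.mean() - a * S.mean(), *b_bounds)  # best shift for the clamped scale
    dist = np.linalg.norm(a * S + b - Q)
    return dist <= eps, (a, b, dist)

S = [1.0, 2.0, 3.0, 4.0]
Q = [2.1, 4.0, 6.2, 7.9]          # roughly 2*S
ok, (a, b, d) = boundedly_similar(S, Q, a_bounds=(0.5, 3.0),
                                  b_bounds=(-1.0, 1.0), eps=0.5)
print(ok, a, b, d)
```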
7.
Similarity coefficients (also known as coefficients of association) are important measurement techniques used to quantify the extent to which objects resemble one another. Due to privacy concerns, the data owner might not want to participate in any similarity measurement if the original dataset will be revealed or could be derived from the final output. There are many different measurements used for numerical, structural and binary data. In this paper, we particularly consider the computation of similarity coefficients for binary data. A large number of studies related to similarity coefficients have been performed. Our objective in this paper is not to design a specific similarity coefficient. Rather, we are demonstrating how to compute similarity coefficients in a secure and privacy-preserving environment. In our protocol, a client and a server jointly participate in the computation. At the end of the protocol, the client will obtain all summation variables needed for the computation while the server learns nothing. We incorporate cryptographic methods in our protocol to protect the original dataset and all other intermediate results. Note that our protocol also supports dissimilarity coefficients.
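The summation variables in question are the four contingency counts of a pair of binary vectors: a (1/1 matches), b (1/0), c (0/1), and d (0/0). The plaintext computation they feed looks as follows; the paper's actual contribution, the cryptographic protocol that yields these counts without exposing the data, is omitted here.

```python
def contingency(x: list[int], y: list[int]):
    """Counts for two equal-length binary vectors:
    a = 1/1 matches, b = 1/0, c = 0/1, d = 0/0 matches."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d

def jaccard(a, b, c, d):          # ignores 0/0 agreements
    return a / (a + b + c) if a + b + c else 0.0

def simple_matching(a, b, c, d):  # counts 0/0 agreements as similarity
    return (a + d) / (a + b + c + d)

x = [1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 1, 0, 1]
cnt = contingency(x, y)
print(cnt, jaccard(*cnt), simple_matching(*cnt))
```

Any coefficient (or dissimilarity coefficient) expressible in a, b, c, d can be computed once the client holds the four counts, which is why the protocol only needs to deliver these sums.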
8.
The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memory available on current computers, and the need for efficient processing of spatial joins suggest that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and provide recommendations for the choice of a join algorithm depending upon the dimensionality of the data as well as the expected selectivity of the join. We show that the two new proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join.
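Read broadly, a grid join hashes points into cells of side ε so that each point needs to be compared only against points in its own and adjacent cells. A minimal sketch under that reading follows; the paper's Grid-join and EGO*-join are far more engineered, and, as the abstract notes, the right choice depends on dimensionality and selectivity.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def grid_epsilon_join(R, S, eps):
    """All pairs (r, s) with Euclidean distance <= eps, via a uniform grid
    of cell side eps: a point can only match points in adjacent cells."""
    grid = defaultdict(list)
    for s in S:
        cell = tuple(floor(c / eps) for c in s)
        grid[cell].append(s)
    result = []
    for r in R:
        cell = tuple(floor(c / eps) for c in r)
        for offset in product((-1, 0, 1), repeat=len(cell)):
            for s in grid[tuple(c + o for c, o in zip(cell, offset))]:
                if dist(r, s) <= eps:
                    result.append((r, s))
    return result

R = [(0.1, 0.1), (0.9, 0.9)]
S = [(0.15, 0.12), (0.5, 0.5)]
print(grid_epsilon_join(R, S, eps=0.1))
```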
9.
Lessons learned from completed projects are valuable resources for planning of new projects. A quantitative similarity measurement between construction projects can improve knowledge reuse practices. The information and documents of a similar past project can be retrieved to resolve the challenges in a new project. This paper introduces a novel method for measuring the similarity of construction projects based on semantic comparison of their work breakdown structure (WBS). The WBS of a project should theoretically encompass a hierarchical decomposition of the total scope of the project's work, so it can serve as an appropriate representative of the project. The proposed method measures the semantic similarity between WBS of projects by means of natural language processing techniques. This method was implemented based on three metrics: node, structural, and total similarity. Each of these metrics calculates a quantitative similarity score between 0 and 1. The method was assessed using fifteen test samples with promising results in compliance with similarity properties. In addition, precision and recall of the method were evaluated in retrieving similar past projects. The results illustrate that the structural similarity slightly outperforms the other metrics.
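The abstract does not spell out the three metrics; a plausible minimal form of the node-similarity component is lexical overlap between WBS node labels, averaged over best matches, as sketched below. The structural and total metrics, and the underlying NLP techniques, are not reproduced.

```python
def label_sim(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity between two WBS node labels."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def node_similarity(wbs1: list[str], wbs2: list[str]) -> float:
    """Average best-match label similarity, symmetrized; value in [0, 1]."""
    def directed(src, dst):
        return sum(max(label_sim(s, d) for d in dst) for s in src) / len(src)
    return (directed(wbs1, wbs2) + directed(wbs2, wbs1)) / 2

w1 = ["site preparation", "concrete foundation works", "steel structure erection"]
w2 = ["site clearing and preparation", "foundation concrete works"]
print(round(node_similarity(w1, w2), 3))
```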
10.
This paper presents a new technique for clustering either object or relational data. First, the data are represented as a matrix D of dissimilarity values. D is reordered to D* using a visual assessment of cluster tendency algorithm. If the data contain clusters, they are suggested by visually apparent dark squares arrayed along the main diagonal of an image I(D*) of D*. The suggested clusters in the object set underlying the reordered relational data are found by defining an objective function that recognizes this blocky structure in the reordered data. The objective function is optimized when the boundaries in I(D*) are matched by those in an aligned partition of the objects. The objective function combines measures of contrast and edginess and is optimized by particle swarm optimization. We prove that the set of aligned partitions is exponentially smaller than the set of partitions that needs to be searched if clusters are sought in D. Six numerical examples are given to illustrate various facets of the algorithm.
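The reordering step (a visual assessment of cluster tendency, VAT, algorithm) can be sketched compactly: start from an extreme object and repeatedly append the unselected object closest to the selected set, which pulls members of a cluster next to one another and yields the dark diagonal blocks. A minimal version under that description follows; the contrast/edginess objective and the particle swarm optimizer are not shown.

```python
import numpy as np

def vat_order(D: np.ndarray) -> list[int]:
    """VAT-style reordering of a dissimilarity matrix D: start from one end
    of the largest dissimilarity, then greedily append the nearest object."""
    n = len(D)
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        nxt = min(remaining, key=lambda j: min(D[i, j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Two obvious clusters: {0, 1} and {2, 3}.
D = np.array([[0.0, 0.1, 0.9, 0.8],
              [0.1, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.05],
              [0.8, 0.9, 0.05, 0.0]])
order = vat_order(D)
print(order, "\n", D[np.ix_(order, order)])  # blocky structure along the diagonal
```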
11.
Locally adaptive metrics for clustering high dimensional data
Carlotta Domeniconi; Dimitrios Gunopulos; Sheng Ma; Bojun Yan; Muna Al-Razgan; Dimitris Papadopoulos. Data Mining and Knowledge Discovery, 2007, 14(1): 63-97
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates to each cluster a weight vector whose values capture the relevance of features within the corresponding cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competitive methods, using both synthetic and real datasets. In particular, our results show the feasibility of the proposed technique to perform simultaneous clustering of genes and conditions in gene expression data, and clustering of very high-dimensional data such as text data.
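In the spirit of the method, one common formulation gives feature j of a cluster a weight that decays exponentially with the within-cluster dispersion along j, so compact dimensions dominate the cluster's distance function. The sketch below assumes that formulation; it illustrates the idea of per-cluster weight vectors, not the paper's exact update rules.

```python
import numpy as np

def cluster_feature_weights(X: np.ndarray, h: float = 1.0) -> np.ndarray:
    """Weights for one cluster's points X (rows = points, cols = features):
    features with low within-cluster dispersion get high relevance."""
    dispersion = ((X - X.mean(axis=0)) ** 2).mean(axis=0)
    w = np.exp(-dispersion / h)
    return w / w.sum()                      # normalize to a weight vector

def weighted_distance(x, centroid, w):
    """Per-cluster weighted Euclidean distance used for assignment."""
    return np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(centroid)) ** 2))

# A cluster that is tight on feature 0 and spread out on feature 1:
X = np.array([[1.0, 0.0], [1.1, 5.0], [0.9, -4.0], [1.0, 2.0]])
w = cluster_feature_weights(X)
print(w)  # feature 0 dominates, so it drives the distance
print(weighted_distance([1.0, 10.0], X.mean(axis=0), w))
```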
12.
Scene analysis is a relevant way of gathering information about the structure of an audio stream. For content extraction purposes, it also provides prior knowledge that can be taken into account in order to provide more robust results for standard classification approaches. In order to perform such scene analysis, we believe that the notion of temporality is important. Consequently, we study in this paper a new way of modeling the evolution over time of the frequency and amplitude parameters of spectral components. We evaluate its benefits by considering its ability to automatically gather the components of the same sound source. The evaluation of the proposed metric shows that it achieves good performance and takes better account of micro-modulations.
13.
A similarity join uses a similarity function to measure how alike data items are and joins those pairs that satisfy a given condition. Many similarity join algorithms already exist for the MapReduce framework, but shortcomings remain: heavy indexing inflates time and space overhead, and existing algorithms cannot efficiently perform similarity joins over incrementally growing datasets. This paper targets massive incremental datasets: sampling is used to obtain effective pivots and form more reasonable partitions, and a partition index and assignment rules are built to carry out similarity joins over the newly added data. Experiments show that the algorithm effectively solves the similarity join problem for massive incremental datasets and confirm that building the partition index improves the efficiency of similarity joins over new data.
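A minimal sketch of the pivot/partition idea in the abstract: sample pivots, index every record under its most similar pivot, and route each newly arriving record through the same index so that only one partition is probed. The similarity function and assignment rule here are assumptions, and the MapReduce plumbing is omitted.

```python
import random

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def build_partitions(records: list[set], num_pivots: int, seed: int = 0):
    """Sample pivots, then index each record under its most similar pivot."""
    random.seed(seed)
    pivots = random.sample(records, num_pivots)
    parts = {i: [] for i in range(num_pivots)}
    for r in records:
        best = max(range(num_pivots), key=lambda i: jaccard(r, pivots[i]))
        parts[best].append(r)
    return pivots, parts

def join_new_record(r: set, pivots, parts, threshold: float):
    """Incremental similarity join: only the nearest pivot's partition is probed."""
    best = max(range(len(pivots)), key=lambda i: jaccard(r, pivots[i]))
    return [s for s in parts[best] if jaccard(r, s) >= threshold]

data = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8}]
pivots, parts = build_partitions(data, num_pivots=2)
print(join_new_record({1, 2, 3, 4}, pivots, parts, threshold=0.5))
```

Routing to a single partition can miss borderline pairs; an exact scheme would replicate records near partition boundaries, which is presumably what the paper's assignment rules address.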
14.
Efficient similarity search for market basket data
Alexandros Nanopoulos; Yannis Manolopoulos. The VLDB Journal (The International Journal on Very Large Data Bases), 2002, 11(2): 138-152
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of the proposed scheme over the state-of-the-art method.
Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 / Published online: September 25, 2002
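The abstract does not detail the representation; a standard building block for similarity queries over basket data is an inverted index from item to transaction ids, which produces candidates without scanning the whole database. The sketch below is that generic technique, not the paper's proposed scheme.

```python
from collections import defaultdict

def build_inverted_index(baskets: dict[int, set]):
    """Map each item to the set of transaction ids containing it."""
    index = defaultdict(set)
    for tid, items in baskets.items():
        for item in items:
            index[item].add(tid)
    return index

def similar_baskets(query: set, baskets, index, threshold: float):
    """Candidates share at least one item with the query; verify with Jaccard."""
    candidates = set().union(*(index.get(i, set()) for i in query))
    hits = []
    for tid in candidates:
        items = baskets[tid]
        sim = len(query & items) / len(query | items)
        if sim >= threshold:
            hits.append((tid, sim))
    return sorted(hits, key=lambda t: -t[1])

baskets = {1: {"milk", "bread"}, 2: {"milk", "bread", "eggs"}, 3: {"beer"}}
index = build_inverted_index(baskets)
print(similar_baskets({"milk", "bread"}, baskets, index, threshold=0.5))
```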
15.
Indexing high-dimensional data for main-memory similarity search
As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR+-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR+-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR+-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR+-tree consistently outperforms other index structures.
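The geometric fact underlying bounding-sphere indexes is that the distance from a query q to any point inside a sphere is at least d(q, center) − radius, so an entire subtree can be discarded once this lower bound exceeds the current best distance. A bare-bones illustration follows; the CSR+-tree's quantization (QBSs/QBRs) and decompression-free distance computation are not shown.

```python
from math import dist

def sphere_lower_bound(q, center, radius):
    """Smallest possible distance from q to any point in the sphere."""
    return max(0.0, dist(q, center) - radius)

def prune_or_visit(q, spheres, best_so_far):
    """Return only the spheres whose contents could beat the current best."""
    return [s for s in spheres if sphere_lower_bound(q, s[0], s[1]) < best_so_far]

spheres = [((0.0, 0.0), 1.0), ((10.0, 10.0), 2.0)]   # (center, radius)
print(prune_or_visit((0.5, 0.5), spheres, best_so_far=3.0))
# Only the first sphere survives; the second is pruned without touching its points.
```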
16.
A number of metrics have been proposed in the literature to measure text re-use between pairs of sentences or short passages. These individual metrics fail to reliably detect paraphrasing or semantic equivalence between sentences, due to the subjectivity and complexity of the task, even for human beings. This paper analyzes a set of five simple but weak lexical metrics for measuring textual similarity and presents a novel paraphrase detector with improved accuracy based on abductive machine learning. The objective here is twofold. First, the performance of each individual metric is boosted through the abductive learning paradigm. Second, we investigate the use of decision-level and feature-level information fusion via abductive networks to obtain a more reliable composite metric for additional performance enhancement. Several experiments were conducted using two benchmark corpora and the optimal abductive models were compared with other approaches. Results demonstrate that applying abductive learning has significantly improved the results of individual metrics and further improvement was achieved through fusion. Moreover, building simple models of polynomial functional elements that identify and integrate the smallest subset of relevant metrics yielded better results than those obtained from the support vector machine classifiers utilizing the same datasets and considered metrics. The results were also comparable to the best results reported in the literature, even against approaches using a larger number of more powerful features and/or more computationally intensive techniques.
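The five metrics are not enumerated in the abstract; typical weak lexical metrics, combined with the simplest possible decision-level fusion (averaging), might look like the sketch below. The chosen metrics, weights, and threshold are assumptions, and the abductive-network fusion itself is not reproduced.

```python
def word_jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def containment(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    m = min(len(wa), len(wb))
    return len(wa & wb) / m if m else 0.0

def length_ratio(a, b):
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb) if max(la, lb) else 0.0

def is_paraphrase(a, b, threshold=0.55):
    """Decision-level fusion of weak lexical metrics by simple averaging."""
    scores = [word_jaccard(a, b), containment(a, b), length_ratio(a, b)]
    return sum(scores) / len(scores) >= threshold, scores

print(is_paraphrase("the cat sat on the mat", "a cat was sitting on the mat"))
```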
17.
18.
One of the key challenges to realize automated processing of the information on the Web, which is the central goal of the Semantic Web, is related to the entity matching problem. There are a number of tools that reliably recognize named entities, such as persons, companies, geographic locations, in Web documents. The names of these extracted entities are, however, non-unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem concerns identifying the entities that refer to the same real-world entity. This problem is very similar to the entity resolution problem studied in relational databases; however, there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the entities. Similarity functions try to capture the degree of belief about the equivalence of two entities, so they play a crucial role in entity matching. The accuracy of the similarity functions depends heavily on the applied assessment techniques, but also on some specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method relies on the combination of multiple evidences, with the help of estimated quality of the individual similarity values and with particular attention to missing information that is common in the Web context. We study the effectiveness of our method in two specific instances of the general entity matching problem, namely the person name disambiguation and the Twitter message classification problem. In both cases, using our techniques in a very simple algorithmic framework we obtained better results than the state-of-the-art methods.
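The combination principle described, multiple evidences weighted by their estimated quality with missing values excluded rather than scored as zero, can be phrased compactly as below. The attribute names, comparators, and quality estimates are illustrative assumptions.

```python
def combined_similarity(e1: dict, e2: dict, comparators: dict, quality: dict) -> float:
    """Quality-weighted average of per-attribute similarities.
    Attributes missing from either entity are skipped, not scored as 0."""
    total, weight = 0.0, 0.0
    for attr, compare in comparators.items():
        if e1.get(attr) is None or e2.get(attr) is None:
            continue                              # missing evidence: abstain
        total += quality[attr] * compare(e1[attr], e2[attr])
        weight += quality[attr]
    return total / weight if weight else 0.0      # no shared evidence: unknown

exact = lambda a, b: 1.0 if a == b else 0.0
token_overlap = lambda a, b: (len(set(a.split()) & set(b.split()))
                              / len(set(a.split()) | set(b.split())))

comparators = {"name": token_overlap, "city": exact, "employer": exact}
quality = {"name": 0.5, "city": 0.2, "employer": 0.3}  # assumed reliabilities
p1 = {"name": "John A. Smith", "city": "Boston", "employer": None}
p2 = {"name": "John Smith", "city": "Boston"}
print(combined_similarity(p1, p2, comparators, quality))
```

Skipping missing attributes (instead of penalizing them) matters on Web data, where incomplete pages would otherwise drag every similarity score toward zero.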
19.
20.
The authors present case study applications of statistical methods for the analysis of software metrics data which recognize the discrete nature of such data. A procedure is also described which allows a component of complexity independent of size to be extracted from the usual Halstead metrics and McCabe's cyclomatic number. The methods described are different from the usual regression and non-parametric methods previously applied to software metrics. With the software quality practitioner in mind, the paper explores how these new methods are helpful in understanding the relationships between software metrics.
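One generic way to extract a size-independent component from a complexity metric, consistent with what the abstract alludes to, is to regress the metric on program size and keep the residuals. Whether this matches the authors' exact procedure is not stated in the abstract; the sketch is only an illustration of the idea.

```python
import numpy as np

def size_independent_component(size, complexity):
    """Residuals of complexity after a least-squares fit on size:
    what remains of the metric once its size-driven part is removed."""
    size, complexity = np.asarray(size, float), np.asarray(complexity, float)
    slope, intercept = np.polyfit(size, complexity, deg=1)
    return complexity - (slope * size + intercept)

loc = [120, 300, 450, 800, 1000]              # module sizes (lines of code)
cyclomatic = [8, 15, 30, 38, 52]              # McCabe's numbers for the modules
print(size_independent_component(loc, cyclomatic).round(2))
```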