共查询到20条相似文献,搜索用时 31 毫秒
1.
The concept of support is central to data mining. While the definition of support in transaction databases is intuitive and
simple, that is not the case in graph datasets and databases. Most mining algorithms require the support of a pattern to be
no greater than that of its subpatterns, a property called anti-monotonicity, or admissibility. This paper examines the requirements for admissibility of a support measure. Support measures for mining
graphs are usually based on the notion of an instance graph---a graph representing all the instances of the pattern in a database
and their intersection properties. Necessary and sufficient conditions for support measure admissibility, based on operations
on instance graphs, are developed and proved. The sufficient conditions are used to prove admissibility of one support measure—the
size of the independent set in the instance graph. Conversely, the necessary conditions are used to quickly show that some
other support measures, such as weighted count of instances, are not admissible.
*Partially supported by the KITE consortium under contract to the Israeli Ministry of Trade and Industry, and by the Paul Ivanier
Center for Robotics and Production Management. 相似文献
2.
We introduce a new filtering algorithm, called IDL(d)-filtering, for a global constraint dedicated to the graph isomorphism problem—the goal of which is to decide if two given
graphs have an identical structure. The basic idea of IDL(d)-filtering is to label every vertex with respect to its relationships with other vertices around it in the graph, and to
use these labels to filter domains by removing values that have different labels. IDL(d)-filtering is parameterized by a positive integer value d which gives a limit on the distance between a vertex to be labelled and the set of vertices considered to build its label.
We experimentally compare different instantiations of IDL(d)-filtering with state-of-the-art dedicated algorithms and show that IDL(d)-filtering is more efficient on regular sparse graphs and competitive on other kinds of graphs. 相似文献
3.
Mining patterns from graph traversals 总被引:12,自引:0,他引:12
In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models, involves the determination of frequently occurring access sequences. In this paper, the problem of finding traversal patterns from such collections is examined. The determination of patterns is based on the graph structure of the model. For this purpose, three algorithms, one which is level-wise with respect to the lengths of the patterns and two which are not are presented. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take into account this fact. The performance of all algorithms and their sensitivity to several parameters is examined experimentally. 相似文献
4.
Ad hoc networks are self-configurable networks with dynamic topologies. All involved nodes in the network share the responsibility for routing, access, and communications. The mobile ad hoc network can be considered as a short-lived collection of mobile nodes communicating with each other. Such networks are more vulnerable to security threats than traditional wireless networks because of the absence of the fixed infrastructure. For providing secure communications in such networks, lots of mechanisms have been proposed since the early 1990s, which also have to deal with the limitations of the mobile ad hoc networks, including high power saving and low bandwidth. Besides, public key infrastructure (PKI) is a well-known method for providing confidential communications in mobile ad hoc networks. In 2004, Varadharajan et al. proposed a secure communication scheme for cluster-based ad hoc networks based on PKI. Since the computation overheads of the PKI cryptosystem are heavy for each involved communicating node in the cluster, we propose an ID-based version for providing secure communications in ad hoc networks. Without adopting PKI cryptosystems, computation overheads of involved nodes in our scheme can be reduced by 25% at least. 相似文献
5.
知识图谱(KG)能够缓解协同过滤算法存在的数据稀疏和冷启动问题,在推荐领域被广泛地研究和应用。现有的很多基于KG的推荐模型混淆了用户物品二部图中的协同过滤信息和KG中实体间的关联信息,导致学习到的用户向量和物品向量无法准确表达其特征,甚至引入与用户、物品无关的信息从而干扰推荐。针对上述问题提出一种融合协同信息的知识图注意力网络(KGANCF)。首先,为了避免KG实体信息的干扰,网络的协同过滤层从用户物品二部图中挖掘出用户和物品的协同过滤信息;然后,在知识图注意力嵌入层中应用图注意力机制,从KG中继续提取与用户和物品密切相关的属性信息;最后,在预测层将用户物品的协同过滤信息和KG中的属性信息融合,得到用户和物品最终向量表示,进而预测用户对物品的评分。在MovieLens-20M和Last.FM数据集上进行了实验,与协同知识感知注意力网络(CKAN)相比,KGANCF在MovieLens-20M数据集上的F1分数提升了1.1个百分点,曲线下面积(AUC)提升了0.6个百分点;而在KG相对稀疏的Last.FM数据集上,模型的F1分数提升了3.3个百分点,AUC提升了8.5个百分点。实验结果表明,KGANCF能够有效提高推荐结果的准确度,在KG稀疏的数据集上显著优于协同知识嵌入(CKE)、知识图谱卷积网络(KGCN)、知识图注意网络(KGAT)和CKAN模型。 相似文献
6.
在推荐系统中,利用图卷积网络等方法提取图的高阶信息缓解了冷启动问题。为了在此基础上融合神经网络协同过滤的深层特征提取能力,提出一种基于图卷积的双通道协同过滤推荐算法(GCNCF-2C)。首先,将推荐问题分为上游任务和下游任务;其次,在上游任务中,预训练编码器利用包含残差的一维卷积层和多个图卷积层在两个独立通道中对节点特征和图高阶特征进行分离提取,形成节点的特征表示;最后,解码器通过节点特征进行评级预测,进行端到端的训练。在数据集MovieLens-100K和MovieLens-1M上的实验表明,该算法相比于基线模型在两个数据集上的RMSE指标平均提高1.72%和1.76%,MAE指标平均提高2.7%和1.98%,同时在基于用户和项目的冷启动实验中RMSE指标平均提高5.9%,具有更好的综合性能。 相似文献
7.
利用知识图谱进行推荐的一个巨大挑战在于如何获取项目的结构化知识并对其进行语义特征提取.针对这一问题,提出了一种基于知识图嵌入的协同过滤推荐算法(KGECF).首先从Freebase知识图谱中提取与项目相关的知识信息,并与历史交互项目进行链接构建子知识库;然后通过基于TransR的Xavier-TransR方法得到子知识库中实体、关系表征;设计一种端到端的联合学习模型,将结构化信息与历史偏好信息嵌入到统一的向量空间中;最后利用协同过滤方法进一步计算这些向量并生成精确的推荐列表.在MovieLens-1 M和Amazon-book两个公开数据集上的实验表明,该算法在推荐准确率、召回率、F1值和NDCG四个指标上均优于基线方法,能够集成大规模的结构化和非结构化数据,同时获得高精度的推荐结果. 相似文献
8.
利用向量空间模型表示的文本邮件数据具有高维性, 不利于邮件过滤模型的建立, 需要对数据进行降维处理。最大间隔Semi-NMF(max-margin semi-nonnegative matrix factorization, MNMF)能够同时实现维数约减和邮件分类, 而图正则化NMF能保持数据空间的几何结构。基于以上两种NMF改进模型, 提出了图正则化MNMF(graph regularized MNMF, GMNMF)算法, 并设计了一个迭代的求解算法。将GMNMF算法及其他相关算法用于中文垃圾邮件过滤实验, 结果表明GMNMF算法构建的过滤模型要优于其他较好的算法构建的过滤模型。 相似文献
9.
10.
针对GDSF替换算法中对访问频率缺少预测的不足,提出了一种基于协同过滤的GDSF缓存替换算法(GDSF-CF)。该算法考虑了Web对象之间相似性与用户访问时间间隔,运用协同过滤算法生成Web对象的预测访问频率,并采用齐普夫定律参数对GDSF算法的目标函数进行了改进。当需要进行缓存替换时,利用目标函数价值计算缓存空间中的每个Web对象缓存价值,将最小缓存价值的Web对象进行替换。仿真实验结果表明,该算法的命中率HR和字节命中率BHR都有较大提升。 相似文献
11.
通过分析在用户评分数据极端稀疏的情况下,现有的基于项目评分预测的协同过滤推荐算法中项目之间的相似性度量不准确以及新项目的冷开始问题,提出了一种优化的基于项目评分预测的协同过滤推荐算法。该算法在计算项目之间的相似性时,既考虑了项目的评分相似性,又考虑了项目的特征属性相似性。实验表明,优化后的算法使计算出的项目之间的相似性更准确,并有效地解决了新项目的推荐问题,使得数据稀疏性对推荐结果的负面影响变小,显著提高了系统的推荐质量。 相似文献
12.
Data co-clustering refers to the problem of simultaneous clustering of two data types. Typically, the data is stored in a
contingency or co-occurrence matrix C where rows and columns of the matrix represent the data types to be co-clustered. An entry C
ij
of the matrix signifies the relation between the data type represented by row i and column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering rows and
columns of the data matrix. In this paper, we present a novel graph theoretic approach to data co-clustering. The two data
types are modeled as the two sets of vertices of a weighted bipartite graph. We then propose Isoperimetric Co-clustering Algorithm
(ICA)—a new method for partitioning the bipartite graph. ICA requires a simple solution to a sparse system of linear equations
instead of the eigenvalue or SVD problem in the popular spectral co-clustering approach. Our theoretical analysis and extensive
experiments performed on publicly available datasets demonstrate the advantages of ICA over other approaches in terms of the
quality, efficiency and stability in partitioning the bipartite graph. 相似文献
13.
面向元数据流,提出有效评测用户订阅的方法.设计了索引结构对订阅进行分组索引,消除了一个订阅因为包含多个谓词而造成的多次索引、计数和比较;设计了新的基于分组的过滤算法,该算法通过缓存谓词匹配结果使得谓词匹配结果得以在订阅过滤过程中传播,取得了很高的过滤性能.实验结果表明,该系统可以有效地处理达上百万订阅的负载量,实验中引进提取词干和消除停用词,极大提高系统的查全率和精度. 相似文献
14.
We present a new block adaptive algorithm as a variant of the Toeplitz-preconditioned block conjugate gradient (TBCG) algorithm. The proposed algorithm is formulated by combining TBCG algorithm with a data-reusing scheme that is realized by processing blocks of data in an overlapping manner, as in the optimum block adaptive shifting (OBAS) algorithm. Simulation results show that the proposed algorithm is superior to the block conjugate gradient shifting (BCGS), TBCG and Toeplitz-OBAS (TOBAS) algorithms in both convergence rate and tracking property of input signal conditioning. 相似文献
15.
准确而积极地向用户提供他们可能感兴趣的信息或服务是推荐系统的主要任务。协同过滤是采用得最广泛的推荐算法之一,而数据稀疏的问题往往严重影响推荐质量。为了解决这个问题,提出了基于二分图划分联合聚类的协同过滤推荐算法。首先将用户与项目构建成二分图进行联合聚类,从而映射到低维潜在特征空间;其次根据聚类结果改进2种相似性计算策略:簇偏好相似性和评分相似性,并将二者相结合。基于结合的相似性,分别采用基于用户和项目的方法来获得对未知目标评分的预测。最后,将这些预测结果进行融合。实验结果表明,所提算法比最新的联合聚类协同过滤推荐算法具有更好的性能。 相似文献
16.
As we are in the big data age, graph data such as user networks in Facebook and Flickr becomes large. How to reduce the visual complexity of a graph layout is a challenging problem. Clustering graphs is regarded as one of effective ways to address this problem. Most of current graph visualization systems, however, directly use existing clustering algorithms that are not originally developed for the visualization purpose. For graph visualization, a clustering algorithm should meet specific requirements such as the sufficient size of clusters, and automatic determination of the number of clusters. After identifying the requirements of clustering graphs for visualization, in this paper we present a new clustering algorithm that is particularly designed for visualization so as to reduce the visual complexity of a layout, together with a strategy for improving the scalability of our algorithm. Experiments have demonstrated that our proposed algorithm is capable of detecting clusters in a way that is required in graph visualization. 相似文献
17.
We present an algorithm to solve the graph isomorphism problem for the purpose of object recognition. Objects, such as those which exist in a robot workspace, may be represented by labelled graphs (graphs with attributes on their nodes and/or edges). Thereafter, object recognition is achieved by matching pairs of these graphs. Assuming that all objects are sufficiently different so that their corresponding representative graphs are distinct, then given a new graph, the algorthm efficiently finds the isomorphic stored graph (if it exists). The algorithm consists of three phases: preprocessing, link construction, and ambiguity resolution. Results from experiments on a wide variety and sizes of graphs are reported. Results are also reported for experiments on recognising graphs that represent protein molecules. The algorithm works for all types of graphs except for a class of highly ambiguous graphs which includes strongly regular graphs. However, members of this class are detected in polynomial time, which leaves the option of switching to a higher complexity algorithm if desired. 相似文献
18.
针对目前协同过滤推荐算法存在的数据稀疏性问题和可扩展性问题,本文进行了相关研究。针对稀疏性问题,在传统的皮尔逊相关相似度中引入交占比系数计算用户间直接相似度,该方法缓解了用户间共同评分项的占比问题;提出一种基于图游走的间接相似度计算方法,该方法根据用户间的直接相似度建立用户网络图,在用户网络图上通过游走计算用户间的间接相似度,并进行推荐。在Spark平台上实现本文方法的并行化,缓解了数据规模增加带来的可扩展性问题。实验结果表明:本文提出的算法在不同数据集上均取得了良好效果,有效地提高了推荐准确度,并且在分布式环境下具有良好的可扩展性。 相似文献
19.
《Information Processing Letters》2014,114(10):573-578
This letter derives a data filtering based least squares iterative identification algorithm for output error autoregressive systems. The basic idea is to use the data filtering technique to transform the original identification model to an equation error model and to estimate the parameters of this model. The proposed algorithm is more efficient and can produce more accurate parameter estimation than the existing least squares iterative algorithm. 相似文献
20.
Sinichiro Kawano 《Information Processing Letters》2003,85(6):333-337
In this paper, we consider a greedy algorithm for thickness of graphs. The greedy algorithm we consider here takes a maximum planar subgraph away from the current graph in each iteration and repeats this process until the current graph has no edge. The greedy algorithm outputs the number of iterations which is an upper bound of thickness for an input graph G=(V,E). We show that the performance ratio of the greedy algorithm is . 相似文献