Similar literature
 Found 20 similar documents; search time: 62 ms
1.
We address the task of multi-target regression, where we generate global models that simultaneously predict multiple continuous variables. We use ensembles of generalized decision trees, called predictive clustering trees (PCTs), in particular bagging and random forests (RF) of PCTs and extremely randomized PCTs (extra PCTs). We add another dimension of randomization to these ensemble methods by learning individual base models that consider random subsets of target variables, while leaving the input space randomizations (in RF PCTs and extra PCTs) intact. Moreover, we propose a new ensemble prediction aggregation function, where the final ensemble prediction for a given target is influenced only by those base models that considered it during learning. An extensive experimental evaluation on a range of benchmark datasets has been conducted, where the extended ensemble methods were compared to the original ensemble methods, individual multi-target regression trees, and ensembles of single-target regression trees in terms of predictive performance, running times and model sizes. The results show that the proposed ensemble extension can yield better predictive performance, reduce learning time or both, without a considerable change in model size. The newly proposed aggregation function gives the best results when used with extremely randomized PCTs. We also include a comparison with three competing methods, namely random linear target combinations and two variants of random projections.
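
The aggregation rule described in entry 1 can be illustrated with a minimal sketch. The abstract states only that a target's ensemble prediction is influenced by the base models that considered that target during learning; the plain averaging used here, the function name `aggregate_by_target_subset`, and the nested-list data layout are assumptions made for illustration, not the authors' implementation.

```python
def aggregate_by_target_subset(predictions, target_masks):
    """Average each target over only the base models that considered it.

    predictions:  predictions[m][s][t] is model m's prediction for sample s, target t.
    target_masks: target_masks[m][t] is True if model m considered target t
                  during learning (hypothetical bookkeeping, assumed here).
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    n_targets = len(predictions[0][0])

    # Number of base models that considered each target.
    counts = [
        sum(1 for m in range(n_models) if target_masks[m][t])
        for t in range(n_targets)
    ]
    if any(c == 0 for c in counts):
        raise ValueError("every target must be covered by at least one base model")

    result = []
    for s in range(n_samples):
        row = []
        for t in range(n_targets):
            # Sum only the predictions of models whose target subset included t.
            total = sum(
                predictions[m][s][t]
                for m in range(n_models)
                if target_masks[m][t]
            )
            row.append(total / counts[t])
        result.append(row)
    return result
```

With two base models, where only the second one considered the second target, the second target's prediction comes from that model alone, while the first target is averaged over both.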

2.
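Traditional machine learning algorithms struggle to handle network data with autocorrelation, and most existing network learning algorithms are classification algorithms, with few regression algorithms. To address regression prediction on network data while accounting for the autocorrelation between data instances, an iterative weighted linear regression algorithm (IWR) is proposed. The algorithm adopts the collective learning framework of iterative classification algorithms: in each iteration, the instances to be predicted are fed one by one into a local regression model to update their target attribute values, until a set goal is reached. Experiments on spatial network and social network datasets show that, compared with traditional regression algorithms and the NCLUS algorithm, IWR effectively reduces prediction error.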

3.
Sensor networks, communication and financial networks, web and social networks are becoming increasingly important in our day-to-day life. They contain entities which may interact with one another. These interactions are often characterized by a form of autocorrelation, where the value of an attribute at a given entity depends on the values at the entities it is interacting with. In this situation, the collective inference paradigm offers a unique opportunity to improve the performance of predictive models on network data, as interacting instances are labeled simultaneously by dealing with autocorrelation. Several recent works have shown that collective inference is a powerful paradigm, but it has mainly been developed for fully labeled training networks. In practice, while it may be cheap to acquire the network topology, it may be costly to acquire node labels for training. In this paper, we examine how to explicitly consider autocorrelation when performing regression inference within network data. In particular, we study transductive collective regression in the common situation where the network is sparsely labeled. We present an algorithm, called CORENA (COllective REgression in Network dAta), to assign a numeric label to each instance in the network. Specifically, we iteratively augment the representation of each instance with instances sharing correlated representations across the network. In this way, the proposed learning model is able to capture autocorrelations of labels over a group of related instances and feed back the more reliable labels predicted by the transduction in the labeled network. Empirical studies demonstrate that the proposed approach can boost regression performance in several spatial and social tasks.

4.
Kocev, Dragi; Ceci, Michelangelo; Stepišnik, Tomaž. Machine Learning, 2020, 109(11): 2213-2241

We address the task of learning ensembles of predictive models for structured output prediction (SOP). We focus on three SOP tasks: multi-target regression (MTR), multi-label classification (MLC) and hierarchical multi-label classification (HMC). In contrast to standard classification and regression, where the output is a single (discrete or continuous) variable, in SOP the output is a data structure: a tuple of continuous variables (MTR), a tuple of binary variables (MLC) or a tuple of binary variables with hierarchical dependencies (HMC). SOP is gaining increasing interest in the research community due to its applicability in a variety of practically relevant domains. In this context, we consider the Extra-Tree ensemble learning method, the overall top performer in the DREAM4 and DREAM5 challenges for gene network reconstruction. We extend this method for SOP tasks and call the extension Extra-PCTs ensembles. As base predictive models we propose using predictive clustering trees (PCTs), a generalization of decision trees for predicting structured outputs. We conduct a comprehensive experimental evaluation of the proposed method on a collection of 41 benchmark datasets: 21 for MTR, 10 for MLC and 10 for HMC. We first investigate the influence of the size of the ensemble and the size of the feature subset considered at each node. We then compare the performance of Extra-PCTs to other ensemble methods (random forests and bagging), as well as to single PCTs. The experimental evaluation reveals that the Extra-PCTs achieve optimal performance in terms of predictive power and computational cost, with 50 base predictive models across the three tasks. The recommended values for feature subset sizes vary across the tasks, and also depend on whether the dataset contains only binary and/or sparse attributes. The Extra-PCTs give better predictive performance than a single tree (the differences are typically statistically significant). Moreover, the Extra-PCTs are the best performing ensemble method (except for the MLC task, where performances are similar to those of random forests), and Extra-PCTs can be used to learn good feature rankings for all of the tasks considered here.

5.
Unlike traditional (multi-class) learning approaches that assume label independence, multi-label learning approaches must deal with the existing label dependencies and relations. Many approaches try to model these dependencies in the process of learning and integrate them in the final predictive model, without making a clear distinction between the learning process and the process of modeling the label dependencies. Also, the label relations incorporated in the learned model are not directly visible and cannot be (re)used in conjunction with other learning approaches. In this paper, we investigate the use of label hierarchies in multi-label classification, constructed in a data-driven manner. We first consider flat label sets and construct label hierarchies from the label sets that appear in the annotations of the training data by using a hierarchical clustering approach. The obtained hierarchies are then used in conjunction with hierarchical multi-label classification (HMC) approaches (two local model approaches for HMC, based on SVMs and PCTs, and two global model approaches, based on PCTs for HMC and ensembles thereof). The experimental results reveal that the use of the data-derived label hierarchy can significantly improve the performance of single predictive models in multi-label classification as compared to the use of a flat label set, while this improvement is not preserved for the ensemble models.

6.
Methods that address the task of multi-target regression on data streams are relatively weakly represented in the current literature. We present several different approaches to learning trees and ensembles of trees for multi-target regression based on the Hoeffding bound. First, we introduce a local method, which learns multiple single-target trees to produce multiple predictions, which are then aggregated into a multi-target prediction. We follow with a tree-based method (iSOUP-Tree) which learns trees that predict all of the targets at once. We then introduce iSOUP-OptionTree, which extends iSOUP-Tree through the use of option nodes. We continue with ensemble methods, and describe the use of iSOUP-Tree as a base learner in the online bagging and online random forest ensemble approaches. We describe an evaluation scenario, and present and discuss the results of the described methods, most notably in terms of predictive performance and the use of computational resources. Finally, we present two case studies where we evaluate the introduced methods in terms of their efficiency and viability of application to real world domains.
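
Entry 6 builds its trees on the Hoeffding bound. As a hedged illustration, the sketch below computes the standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and applies a generic split test of the kind used in Hoeffding trees. The function names and the "best minus second-best score" test are assumptions for illustration; the abstract does not specify the split heuristic that iSOUP-Tree uses.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability at least 1 - delta, the true
    mean of a variable with range `value_range` is at least the mean of n
    independent observations minus epsilon."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_score, second_score, value_range, delta, n):
    """Generic Hoeffding-tree test (assumed form): split once the observed gap
    between the two best candidate splits exceeds the bound."""
    return (best_score - second_score) > hoeffding_bound(value_range, delta, n)
```

The bound shrinks as more examples arrive, so a small but persistent advantage of one candidate split eventually becomes sufficient to trigger a split.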

7.
8.
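The clustering characteristics of high-dimensional data are usually difficult to observe directly. When the data are constructed as a complex network, the topology among the nodes reflects the relationships between samples. Performing community detection on the nodes of the network gives a more intuitive clustering of the data. A low-randomness label propagation clustering algorithm based on network community detection is proposed. First, the dataset is constructed as a sparse, fully connected network using the radius and nearest-neighbor methods. Next, node labels are preprocessed according to node similarity so that similar nodes share the same label. Node influence values are then used to improve the label propagation process, reducing the randomness of label selection. Finally, communities are optimized and merged based on cohesion to improve community quality. Experiments on real and synthetic datasets show that the algorithm adapts well to various types of data.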

9.
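Community structure partitioning method in semi-supervised mode   (total citations: 1; self-citations: 0; citations by others: 1)
To classify networks that mix labeled and unlabeled nodes, an information-passing classification algorithm based on semi-supervised learning is presented. The algorithm first determines the classification parameters of the unlabeled nodes in the network, and then classifies all nodes through a finite number of iterative computations over all unlabeled nodes. Analysis of the experimental data shows that the algorithm performs well in semi-supervised classification.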

10.
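Many link prediction algorithms based on network structure information use the degree of node clustering to evaluate the similarity between nodes and then perform link prediction. However, such algorithms focus only on the clustering coefficients of nodes in the network and do not consider how the link clustering coefficients between the predicted nodes and their common neighbors affect node similarity. To address this problem, a link prediction algorithm that fuses the node clustering coefficient and the asymmetric link clustering coefficient is proposed. First, the clustering coefficient of each common neighbor is computed, and the two asymmetric link clustering coefficients associated with the common neighbor are used to compute the average link clustering coefficient of the predicted nodes. Then, the two clustering coefficients are fused based on Dempster-Shafer evidence theory into a comprehensive measure, which is applied to the intermediate probability model (IMP) to obtain a new node similarity index (IMP_DS). Experiments on nine network datasets show that the algorithm's area under the receiver operating characteristic (ROC) curve (AUC) and precision are better than those of the common neighbors (CN), Adamic-Adar (AA), and resource allocation (RA) indices and of the common-neighbor-based intermediate probability model (IMP_CN).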
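
The baseline indices named in entry 10 have standard definitions, and a minimal sketch of them may help when reading the comparison. It covers only the CN, AA, and RA baselines, not the proposed IMP_DS index, whose details the abstract does not give. The function name `neighbor_indices` and the dict-of-sets graph representation are assumptions for illustration; the graph is taken to be undirected with no self-loops, and u and v are distinct nodes.

```python
import math


def neighbor_indices(adjacency, u, v):
    """Return (CN, AA, RA) similarity scores for the node pair (u, v).

    adjacency: dict mapping each node to the set of its neighbors
               (undirected graph, no self-loops; u != v).
    """
    common = adjacency[u] & adjacency[v]

    # Common neighbors (CN): number of shared neighbors.
    cn = len(common)

    # Adamic-Adar (AA): shared neighbors weighted by 1 / log(degree).
    # A shared neighbor is adjacent to both u and v, so its degree is >= 2
    # and log(degree) is strictly positive.
    aa = sum(1.0 / math.log(len(adjacency[z])) for z in common)

    # Resource allocation (RA): shared neighbors weighted by 1 / degree.
    ra = sum(1.0 / len(adjacency[z]) for z in common)

    return cn, aa, ra
```

All three scores grow with the number of shared neighbors; AA and RA additionally down-weight shared neighbors that have many connections of their own, with RA penalizing high degree more strongly.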

11.
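With the development of the information society, network security is becoming increasingly important, and accurately obtaining the geographic location of network entities helps in managing networks better. The existing classic network entity geolocation methods based on topology heuristic clustering cluster network entities using a partition based on network structure; because they do not consider the specific characteristics of the network topology, the final results have large errors. To solve this problem, a motif-based method for partitioning the network topology of a target region is proposed. Based on the observation that the target network topology exhibits high clustering among local nodes, the method introduces the concept of a "motif", mining and analyzing motif structures in the target network topology. Then, borrowing the idea of initial seed expansion from local community detection methods in complex network research, it expands from motif structures as initial seeds, partitioning the nodes closely connected to the motifs into multiple sets. Finally, the partitioned node sets are geolocated using landmarks and public IP geolocation databases, and the location of each set is taken as the geographic location of the nodes in it, achieving batch geolocation of network entities. Experiments on the network topologies of Hong Kong and Taiwan show that, compared with the classic HC-Based and NNC methods, the method improves geolocation accuracy by about 25% and 16%, respectively, and can geolocate more network entities in batch.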

12.
Modeling spatial context (e.g., autocorrelation) is a key challenge in classification problems that arise in geospatial domains. The Markov random field (MRF) is a popular model for incorporating spatial context into image segmentation and land-use classification problems. The spatial autoregression (SAR) model, which is an extension of the classical regression model for incorporating spatial dependence, is popular for prediction and classification of spatial data in regional economics, natural resources, and ecological studies. There is little literature comparing these alternative approaches to facilitate the exchange of ideas. We argue that the SAR model makes more restrictive assumptions about the distribution of feature values and class boundaries than MRF. The relationship between SAR and MRF is analogous to the relationship between regression and Bayesian classifiers. This paper provides comparisons between the two models using a probabilistic and an experimental framework.

13.
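Existing link prediction algorithms based on network representation learning mainly construct feature vectors by capturing the neighborhood topology of network nodes. Such algorithms usually learn only from the single neighborhood topology of a node, and pay insufficient attention to the similarity of multiple nodes in terms of link structure. To address this problem, a link prediction model based on the densely connected convolutional network (DenseNet), called DenseNet-LP, is proposed. First, node representation vectors are generated with the network representation learning algorithm node2vec, and these vectors are used to map the structural information of network nodes into three-dimensional feature data. Then, a densely connected convolutional network captures the features of the link structure, and a binary classification model is built to perform link prediction. Experiments on four public datasets show that, compared with network representation learning algorithms, the proposed model improves the area under the ROC curve (AUC) of link prediction by up to 18 percentage points.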

14.
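To address the low accuracy of multi-target facial expression classification and recognition algorithms in real-world environments, a facial expression detection algorithm based on an improved faster region-based convolutional neural network (Faster RCNN) is proposed. The algorithm uses a two-stage detection network to recognize and localize multiple targets in expression recognition, and replaces the original feature extraction module with a densely connected module, which fuses multi-level feature information, increases network depth, and avoids vanishing gradients. It adopts soft non-maximum suppression (soft...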

15.
In designing wireless sensor networks for image transmission, it is important to reduce energy dissipation and prolong network lifetime. This paper reviews existing clustering algorithms applied in heterogeneous sensor networks and then puts forward an energy-efficient prediction clustering algorithm, which adapts to sensor networks that are heterogeneous in energy and objects. The algorithm enables the nodes to select the cluster head according to factors such as energy and communication cost, so that nodes with higher residual energy have a higher probability of becoming a cluster head than those with lower residual energy, and the network energy is dissipated uniformly. In order to reduce energy consumption when broadcasting in the clustering phase and to prolong network lifetime, an energy consumption prediction model is established for regular data acquisition nodes. Simulation results and the application in image clustering show that, compared with current clustering algorithms, this algorithm achieves longer sensor network lifetime, higher energy efficiency, and superior network monitoring quality.
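
The cluster-head selection idea in entry 15 can be sketched in a heavily simplified form. The abstract says only that nodes with higher residual energy are more likely to become cluster heads and that communication cost is also a factor; the sketch below assumes a probability directly proportional to residual energy and ignores communication cost entirely. The function names are hypothetical and the rule is an illustration, not the paper's algorithm.

```python
import random


def election_probabilities(residual_energy):
    """Map each node to a cluster-head probability proportional to its
    residual energy (assumed rule; communication cost is not modeled)."""
    total = sum(residual_energy.values())
    if total <= 0:
        raise ValueError("total residual energy must be positive")
    return {node: energy / total for node, energy in residual_energy.items()}


def elect_cluster_head(residual_energy, rng=random):
    """Draw one cluster head at random, weighted by residual energy."""
    probs = election_probabilities(residual_energy)
    nodes = list(probs)
    weights = [probs[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible in simulations.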

16.
Compared with conventional graph data analysis methods, graph embedding algorithms provide a new graph data analysis strategy. They aim to encode graph nodes into vectors so that graph data can be mined or analyzed more effectively using neural network related technologies. Graph embedding methods have significantly improved some classic tasks, such as node classification, link prediction, and traffic flow prediction. Although earlier research has made substantial breakthroughs in graph embedding, the node embedding problem over temporal graphs has seldom been studied. In this study, we propose an adaptive temporal graph embedding (ATGED), which attempts to encode temporal graph nodes into vectors by combining previous research with information propagation characteristics. First, an adaptive clustering method is proposed to handle the situation in which node activity frequency varies across types of graphs. Then, a new node walk strategy is designed to store the time sequence between nodes, and the walking list is stored in a bidirectional multi-tree during the walking process so that complete walking lists can be obtained quickly. Last, based on the basic walking characteristics and graph topology, an important node sampling strategy is proposed to train a satisfactory neural network as quickly as possible. Extensive experiments demonstrate that the proposed method surpasses existing embedding methods in terms of node clustering, reachability prediction, and node classification in temporal graphs.

17.
李瑾, 潘宏, 刘中兵. 《计算机应用》 (Journal of Computer Applications), 2012, 32(7): 1840-1843
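The clustering mechanism in mobile Ad Hoc networks (MANET) is studied, and a combined-weight clustering algorithm based on the connected dominating set (WCACDS) is proposed, comprising a clustering algorithm and a cluster-structure maintenance strategy. A combined weight of three factors (node mobility, minimum average transmission power, and energy consumption rate) quantifies the overall performance of a node, and an improved algorithm for finding the connected dominating set is used to cluster the nodes, so that stronger nodes serve as cluster heads and the number of clusters is reduced. Simulation results show that the proposed algorithm helps improve the network's load-balancing capability and enhances its robustness and stability.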

18.
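Most existing link prediction methods for directed networks consider only link direction information and reciprocal link information, ignoring the contributions of node importance and degree-related clustering, which lowers prediction accuracy. To address these shortcomings, a link prediction index for directed networks based on node centrality and degree-related clustering is proposed. First, node centrality is used to count the neighbors of any node as a measure of the node's influence. Second, the node degree-related clustering coefficient method is extended to directed networks to evaluate a node's clustering ability, and is combined with the network assortativity coefficient to obtain the high clustering ability of node pairs. Finally, the two kinds of information are fused into a parameterized link prediction index for directed networks. In comparisons with recent representative prediction indices on six real-world directed networks, the proposed index improves AUPR and AUC by 33% and 1.6%, respectively.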

19.
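To address the energy holes that appear in wireless sensor networks when nodes are overloaded and energy consumption is unbalanced, a game model for cluster-head election is built on evolutionary game theory, and an optimal clustering algorithm for wireless sensor networks based on evolutionary games is proposed. The payoff function of the cluster-head evolutionary game is designed from the node's residual energy, data reception energy consumption, and data forwarding energy consumption, and an optimal transmission power control mechanism is applied to the selection of cluster members, forming a stable, connected clustered network structure. Simulations show that the algorithm balances node load and thus network energy, effectively mitigates the premature appearance of energy holes, and thereby extends network lifetime.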

20.
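Dynamic information networks pose a highly challenging new problem in the field of complex networks; their dynamic evolution is temporal, complex, and changeable. Structure is the most basic feature of a network and the basis for network modeling and analysis, so studying the evolution of network structure is important for fully understanding the behavioral tendencies of complex systems. "Roles" are used to quantify the structure of a dynamic network, yielding a role model of the dynamic network. Applying and improving the "problem transformation" idea from multi-label classification, the role prediction problem in dynamic networks is treated as a multi-target regression problem: a model is built with historical network data as training data to predict the likely role distribution of the network at a future time, and MTR-RP, a dynamic network role prediction method based on multi-target regression, is proposed. The method overcomes the neglect of the time factor in transition-matrix-based methods and also considers possible dependencies among the multiple prediction targets. Experimental results show that the proposed MTR-RP method gives more accurate and more stable predictions.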
