Similar literature
 Found 20 similar documents; search took 31 ms
1.
Functional Trees   Cited by: 1 (self-citations: 0, citations by others: 1)
In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. In the regression setting, model tree algorithms also explore multiple representation languages, but by using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, leaf nodes, or both nodes and leaves in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions restricting linear combinations to inner nodes and leaves. The experimental evaluation shows that all functional tree variants exhibit similar performance, with advantages in different datasets. In this study there is a marginal advantage of the full model. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that in the datasets under study and for classification and regression, the use of multivariate decision nodes has more impact on the bias component of the error, while the use of multivariate decision leaves has more impact on the variance component.

2.
宋超  徐新  桂容  谢欣芳  徐丰 《计算机应用》2017,37(1):244-250
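To make full use of the ability of different polarimetric features of polarimetric synthetic aperture radar (SAR) images to characterize different types of ground objects, a polarimetric SAR feature analysis and classification method based on a multi-layer support vector machine (SVM) is proposed. The method first determines, through feature analysis, the best feature subset for each ground-object type; it then uses a hierarchical classification tree to perform SVM classification layer by layer according to the feature subset of each ground-object type; finally, it obtains the overall classification result. Classification experiments on RadarSAT-2 polarimetric SAR images show that the proposed method achieves a classification accuracy of about 85% for four classes of ground objects (water, farmland, forest and urban area) and an overall classification accuracy of 86%. The algorithm makes full use of the characteristics of the different ground-object types, improving classification accuracy while also reducing the time complexity of the algorithm.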

3.
孔芝  袁航  王立夫  郭戈 《自动化学报》2022,48(4):1048-1059
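Interactions between complex systems can be described by complex networks. When certain nodes in a complex network are attacked or damaged, network failures occur and the controllability of the whole network changes. The failure of different nodes affects network controllability differently. This paper proposes a classification of network nodes that divides the nodes into nine types according to edge direction and matching relations, and gives an algorithm for identifying node types. In addition, based on this classification, the paper gives, for the case where nodes of a certain type in the complex network fail, ...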

4.
A survey of mining algorithms for concept-drifting data streams   Cited by: 1 (self-citations: 0, citations by others: 1)
丁剑  韩萌  李娟 《计算机科学》2016,43(12):24-29, 62
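A data stream is a new type of data model characterized by being dynamic, unbounded, high-dimensional, ordered, high-speed and changing. In real data-stream environments, some data distributions change over time, that is, they exhibit concept drift; such streams are called evolving data streams or concept-drifting data streams. Methods that process data-stream models therefore need to handle time and space constraints and adapt to concept changes. This paper surveys the concept drift problem and the classification, clustering and pattern mining of concept-drifting data streams. It first introduces the types of concept drift and commonly used methods for detecting concept change. To cope with concept drift, data-stream mining often uses a sliding-window model to process recent transactions. Common models for data-stream classification include single-classifier models and ensemble models, and common methods include decision trees and classification association rules. Data-stream clustering approaches are usually either k-means-based or non-k-means-based. Pattern mining can provide useful information for classification, clustering and association rules. Patterns in concept-drifting data streams include frequent patterns, sequential patterns, episodes, pattern trees, pattern graphs and high-utility patterns. Finally, frequent pattern mining algorithms and high-utility pattern mining algorithms are described in detail.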
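
The sliding-window model named in the abstract above can be illustrated with a minimal sketch. It is not taken from any of the surveyed algorithms: the class name `SlidingWindowCounter`, its parameters, and the choice to count each item once per transaction are assumptions made for illustration. The window keeps only the most recent transactions, so counts from older data (possibly generated under an outdated concept) are forgotten.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Item frequencies over the most recent `window_size` transactions.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # recent transactions, oldest first
        self.counts = Counter()  # item -> number of windowed transactions containing it

    def add(self, transaction):
        items = set(transaction)  # count each item once per transaction
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:  # drop items that no longer occur in the window
                if self.counts[item] == 0:
                    del self.counts[item]

    def frequent_items(self, min_support):
        """Items contained in at least `min_support` windowed transactions."""
        return {item for item, c in self.counts.items() if c >= min_support}
```

Calling `add` once per arriving transaction and `frequent_items` on demand gives a one-pass procedure whose memory use is bounded by the window size.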

5.
Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in a heterogeneous information network (HIN), and calculate the similarity between different types of objects. We propose a meta-path-based outlier detection method (MPOutliers) in heterogeneous information networks to deal with these problems in one go under a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It discovers the semantic information among nodes in heterogeneous networks, instead of only considering the network structure. It also computes the closeness degree between nodes of the same type, which extends the whole heterogeneous network. Moreover, each node is assigned a reliable weighting to measure its authority degree. Substantial experiments on two real datasets (AMiner and Movies dataset) show that our proposed method is very effective and efficient for outlier detection.

6.
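To extend the lifetime of wireless sensor networks, a lifetime-optimizing routing algorithm based on the shortest path tree (LORA_SPT) is proposed. The algorithm introduces the concept of node classification and constructs a weight function based on several factors, including a link energy consumption factor, a node's own residual energy factor, a neighbor-node residual energy factor and a type weight factor. Different weight factors are used for different types of nodes. Finally, Dijkstra's algorithm is used to build the shortest path tree, and all nodes, along the shortest path tree, ...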
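
The last step of the abstract above, building a shortest path tree with Dijkstra's algorithm, can be sketched as follows. This is a generic textbook version, not the LORA_SPT implementation: the abstract is truncated and does not give the weight function, so the edge weights are simply taken as non-negative numbers supplied by the caller. The function name `shortest_path_tree` and the graph representation are assumptions.

```python
import heapq


def shortest_path_tree(graph, sink):
    """Dijkstra's algorithm rooted at `sink`.

    graph: {node: {neighbor: weight}} with non-negative weights
           (assumed to come from some energy-aware weight function).
    Returns (dist, parent); parent[v] is the next hop from v towards the sink.
    """
    dist = {sink: 0.0}
    parent = {sink: None}
    heap = [(0.0, sink)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent
```

Following `parent` from any node yields its route to the sink, which is how data would be forwarded along the tree.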

7.
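Descriptions of classification algorithms usually lack quantitative analysis. Taking k-nearest neighbors, support vector machines and decision trees as its subjects, this paper quantitatively analyzes how algorithm parameters, data noise and the number of nodes affect classification accuracy and running time. It first studies these algorithms and the role of their parameters and selects optimal parameters, analyzes the effect of different noise on classification accuracy, and then analyzes the effect of the number of nodes on classification accuracy and the change in running time. Simulation experiments on these topics are carried out with the Scikit-learn module; the results clearly show the classification behavior of the algorithms under different parameter settings and provide guidance for research on classifying real data.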

8.
In this paper, we propose a multiple-metric learning algorithm to jointly learn a set of optimal homogeneous/heterogeneous metrics in order to fuse the data collected from multiple sensors for joint classification. The learned metrics have the potential to perform better than the conventional Euclidean metric for classification. Moreover, in the case of heterogeneous sensors, the learned metrics can be quite different from one another, each adapted to its type of sensor. By learning the multiple metrics jointly within a single unified optimization framework, we can learn better metrics to fuse the multi-sensor data for joint classification. Furthermore, we also exploit multi-metric learning in a kernel-induced feature space to capture the non-linearity in the original feature space via kernel mapping.

9.
Wireless sensor network (WSN) technology has been evolving very quickly in recent years. Sensors are constantly increasing in sensing, processing, storage, and communication capabilities. In many WSNs that are used in environmental, commercial and military applications, the sensors are aligned linearly due to the linear nature of the structure or area that is being monitored, forming a special class of these networks. We defined these in a previous paper as Linear Sensor Networks (LSNs), and provided a classification of the different types of LSNs. A pure multihop approach to route the data all the way along the linear network (e.g. oil, gas and water pipeline monitoring, border monitoring, road-side monitoring, etc.), which can extend for hundreds or even thousands of kilometers, can be very costly from an energy dissipation point of view. In order to significantly reduce the energy consumption used in data transmission and extend the network lifetime, we present a framework for monitoring linear infrastructures using LSNs where data collection and transmission are done using Unmanned Aerial Vehicles (UAVs). The system defines four types of nodes: sensor nodes (SNs), relay nodes (RNs), UAVs, and sinks. The SNs use a classic WSN multihop routing approach to transmit their data to the nearest RN, which acts as a cluster head for its surrounding SNs. Then, a UAV moves back and forth along the linear network and transports the data that is collected by the RNs to the sinks located at both ends of the LSN. We name this network architecture UAV-based LSNs (ULSNs). This approach leads to considerable savings in node energy consumption, due to a significant reduction of the transmission ranges of the SN and RN nodes and the use of a one-hop transmission to communicate the data from the RNs to the UAV.
Furthermore, the strategy reduces the interference between the RNs caused by hidden terminal and collision problems that would be expected if a pure multihop approach were used at the RN level. In addition, three different UAV movement approaches are presented, simulated, and analyzed in order to measure system performance under various network conditions.

10.
We address the problem of multi-label classification in heterogeneous graphs, where nodes belong to different types and different types have different sets of classification labels. We present a novel approach that aims to classify nodes based on their neighborhoods. We model the mutual influence of nodes as a random walk in which the random surfer aims at distributing class labels to nodes while walking through the graph. When viewing class labels as “colors”, the random surfer is essentially spraying different node types with different color palettes; hence the name Graffiti of our method. In contrast to previous work on topic-based random surfer models, our approach captures and exploits the mutual influence of nodes of the same type based on their connections to nodes of other types. We show important properties of our algorithm such as convergence and scalability. We also confirm the practical viability of Graffiti by an experimental study on subsets of the popular social networks Flickr and LibraryThing. We demonstrate the superiority of our approach by comparing it to three other state-of-the-art techniques for graph-based classification.

11.
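Trust decision model and dynamics analysis for WSNs based on evolutionary game theory   Cited by: 1 (self-citations: 0, citations by others: 1)
To address trust decision-making and dynamic evolution when trust relationships are established between nodes in wireless sensor networks (WSNs), an incentive mechanism bound to node trust degree is introduced, and a trust game model for WSN nodes is established to reflect the bounded rationality shown during trust establishment and the payoff of each round of the game. Based on evolutionary game theory, the evolution of the nodes' trust strategy choices is studied, the replicator dynamics equations of WSN node trust evolution are given, and theorems on reaching evolutionarily stable strategies under different parameter conditions are proposed and proved, providing a theoretical basis for the design of WSN trust mechanisms. Experiments demonstrate the conclusions of the theorems and the effect of the incentive mechanism.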

12.
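A wireless sensor network is a distributed network exposed to an open environment; its nodes are mutually independent and it lacks central and monitoring nodes, which makes it highly vulnerable to attacks by malicious nodes. To detect malicious nodes among the large number of sensor nodes in a wireless sensor network, a malicious node detection method based on multi-class classification is proposed. Given that the types of a small number of sensor nodes are known, the proposed method extracts the sensor node attributes related to the known malicious node types and establishes, for the entire ...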

13.
We study the static load balancing problem in a distributed computer system that consists of a set of heterogeneous computer systems interconnected by a star network with two-way traffic. We formulate the static load balancing problem as a nonlinear optimization problem which minimizes the mean response time. We prove that in the optimal solution the satellite nodes in the star network can be divided into four different types: the idle source nodes, the active source nodes, the neutral nodes, and the sink nodes. The necessary and sufficient conditions for optimal solution are studied. An efficient algorithm of complexity O(n) is proposed for the static load balancing of an n-satellite system. The effects of link communication time on optimal load balancing in a star network are also studied by parametric analysis. By employing the proposed algorithm, a significant system performance improvement over that without load balancing is illustrated in a numerical example. The numerical example also shows that the effects of the link communication time in a star network are large.

14.
This paper proposes a model for a special case of the machine interference problem (MIP), where each of N identical machines randomly requests several different service types. Each request for a service type is fulfilled by an operator who can provide only one service type. The model allows the calculation of the expected interference (waiting) time in the queue for each service type, according to the multinomial distribution. The uniqueness of the model is that under its assumptions internal service order and queue discipline are not needed for the interference calculations. The model requires as inputs only the machine runtime and the average time of each service type that is needed to produce one unit. These inputs can be obtained by a common work measurement. The model enables practitioners to determine the optimal numbers of operators that are needed for each service type in order to minimize the cost per unit or maximize the profit, or to set other performance measures. To demonstrate the applicability of the model, a theoretical analysis and a case study are presented.

15.
This paper highlights some of the key issues involved in developing a real schedule generation architecture in an E-manufacturing environment. The high cost and long cycle time of developing shop floor control systems, together with the lack of robust system integration capabilities, are some of the major deterrents in the development of the underlying architecture. We conceptualize a robust framework capable of providing flexibility to the system, communication among various entities, and intelligent decision making. Owing to its fast communication, distributed control and autonomous character, an agent-oriented architecture has been preferred here to address the scheduling problem in E-manufacturing. An integer programming based model with the dual objectives of minimizing the makespan and increasing the system throughput has been formulated for determining the optimal part type sequence from the part type pool. It is very difficult to appraise all possible combinations of the operation-machine allocations in order to accomplish the above objectives. A combinatorial auction-based heuristic has been proposed to reduce large search spaces and to obtain optimal or near-optimal solutions of operation-machine allocations of given part types with tool slots and available machine time as constraints. We have further shown the effects of exceeding the planning horizon, due to urgency of part types or overtime given to complete the part type processing on the shop floor, and observed a significant increase in system throughput.

16.
In almost all applications of queueing network models it is assumed that for each customer the service times at different network nodes are independent. But service times in, for instance, computer and communication networks are typically essentially determined by properties like message or packet lengths that do not change substantially on the route through the network. Therefore, the service times of any customer in a queueing network are likely to be correlated, which can significantly influence quality of service (QoS) properties and performance measures such that results obtained with the independence assumption may be misleading. We consider delays in a series of queues with correlated service times at each network node where for each customer the service time at the first node is a random variable and the successive service times are correlated with the one at the first node. A recursive scheme for delays is provided. This scheme is used in order to efficiently conduct a simulation study where two types of correlation are studied, namely identical service times, and service times with an additional Gaussian noise. The simulation study focuses on comparisons of end-to-end delays for independent service times at different nodes and correlated service times, respectively. It turns out that for both correlation types, in light traffic the delays in case of correlated service times are larger than for independent service times by a factor that first increases with increasing traffic intensity up to a maximum value approached in medium traffic after which it decreases quickly and drops down to become significantly smaller than one in heavy traffic. This effect intensifies with increasing number of network nodes and depends, as well as the crossover point from which on correlated service times yield smaller delays, on the distribution of the service times at the first node.
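
The abstract above does not spell out its recursive scheme, so the following is only a sketch of the standard departure-time recursion for a series of single-server FIFO queues with unlimited buffers, which may differ from the authors' formulation. Customer k starts service at node n once it has left node n-1 and customer k-1 has left node n. The function names and the helper `identical_service` (the "identical service times" correlation type) are hypothetical.

```python
def end_to_end_delays(arrivals, service):
    """End-to-end delays in a series of single-server FIFO queues.

    arrivals[k]:   arrival time of customer k at the first node (non-decreasing).
    service[k][n]: service time of customer k at node n.
    Assumes unlimited buffers and no travel time between nodes.
    """
    if not arrivals:
        return []
    num_nodes = len(service[0])
    # Departure time of the previous customer at each node.
    prev_departure = [float("-inf")] * num_nodes
    delays = []
    for k, arrival in enumerate(arrivals):
        t = arrival  # time at which customer k reaches the next node
        for n in range(num_nodes):
            start = max(t, prev_departure[n])  # wait for the server to be free
            t = start + service[k][n]
            prev_departure[n] = t
        delays.append(t - arrival)
    return delays


def identical_service(first_node_times, num_nodes):
    """Fully correlated case: each customer keeps its first-node service time."""
    return [[s] * num_nodes for s in first_node_times]
```

Feeding the same arrival sequence through `end_to_end_delays` once with `identical_service(...)` and once with independently drawn service times gives the kind of comparison the abstract describes.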

17.
This study presents a simulation optimization approach for a hybrid flow shop scheduling problem in a real-world semiconductor back-end assembly facility. The complexity of the problem is determined based on demand and supply characteristics. Demand varies with orders characterized by different quantities, product types, and release times. Supply varies with the number of flexible manufacturing routes but is constrained in a multi-line/multi-stage production system that contains certain types and numbers of identical and unrelated parallel machines. An order is typically split into separate jobs for parallel processing and subsequently merged for completion to reduce flow time. Split jobs that apply the same qualified machine type per order are compiled for quality and traceability. The objective is to achieve the feasible minimal flow time by determining the optimal assignment of the production line and machine type at each stage for each order. A simulation optimization approach is adopted due to the complex and stochastic nature of the problem. The approach includes a simulation model for performance evaluation, an optimization strategy with application of a genetic algorithm, and an acceleration technique via an optimal computing budget allocation. Furthermore, scenario analyses of the different levels of demand, product mix, and lot sizing are performed to reveal the advantage of simulation. This study demonstrates the value of the simulation optimization approach for practical applications and provides directions for future research on the stochastic hybrid flow shop scheduling problem.

18.
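Kubernetes is an open-source platform for managing containerized applications. In the priority (scoring) stage, its default scheduling algorithm uses only two resources, CPU and memory, as the scoring metrics for compute nodes, and it ignores the fact that different types of Pods occupy node resources in different proportions. This can easily drive one resource to a performance bottleneck and leave the node's resource usage unbalanced. To address these problems, this paper adds bandwidth and disk capacity to the original Kubernetes resource metrics, considering that the proportions in which the four resources (CPU, memory, bandwidth and disk capacity) are occupied on a node affect node performance and may cause applications in a Pod to run abnormally or even cause the Pod to be killed, which harms the overall reliability of the cluster. The paper classifies Pods awaiting creation as compressible-resource-consuming, incompressible-resource-consuming or balanced, sets a corresponding weight for each Pod type, and finally uses an improved bald eagle search algorithm (TBESK) to find the optimal node for scheduling. Experimental results show that, as the number of Pods in the cluster keeps increasing and under heavy cluster load, the comprehensive load standard deviation of the TBESK algorithm improves by 24% compared with the default scheduling algorithm.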
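
The evaluation metric in the abstract above, a load standard deviation over four resources with per-Pod-type weights, can be illustrated with a small sketch. The abstract gives neither the exact formula nor the details of the TBESK search, so everything here is an assumption: the weighted-mean definition of node load, the function names, and in particular the plain greedy search in `best_node`, which stands in for the paper's improved bald eagle search and is not that algorithm. It does not use the Kubernetes API.

```python
from statistics import pstdev

RESOURCES = ("cpu", "memory", "bandwidth", "disk")


def node_load(utilization, weights):
    """Weighted mean utilization of one node (values in [0, 1]); assumed definition."""
    total = sum(weights[r] for r in RESOURCES)
    return sum(weights[r] * utilization[r] for r in RESOURCES) / total


def cluster_load_stddev(nodes, weights):
    """Population standard deviation of node loads; lower means better balance."""
    return pstdev([node_load(u, weights) for u in nodes.values()])


def best_node(nodes, pod_request, weights):
    """Greedy stand-in for the search step: try each node, keep the best balance."""
    best_name, best_score = None, None
    for name, util in nodes.items():
        placed = {r: util[r] + pod_request[r] for r in RESOURCES}
        if any(v > 1.0 for v in placed.values()):
            continue  # the Pod does not fit on this node
        trial = dict(nodes)
        trial[name] = placed
        score = cluster_load_stddev(trial, weights)
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name
```

Here `weights` would be chosen per Pod type (for example, a larger CPU weight for a Pod that mainly consumes compressible resources); the specific values are left to the caller.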

19.
Mining data streams is the process of extracting information from non-stopping, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one-pass and operate under strict limitations of memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in the literature assume a 100% labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass, incremental and work with limited memory to perform any-time classification on demand. Experimental comparison with state-of-the-art concept drift handling systems demonstrates the GC3 framework's ability to provide high classification performance, using fewer models in the ensemble and with only 4-6% of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real-world data stream classification applications.

20.
A condition-dependent training strategy divides a training database into a number of clusters, each corresponding to a noise condition, and subsequently trains a hidden Markov model (HMM) set for each cluster. This paper investigates and compares a number of condition-dependent training strategies in order to achieve a better understanding of the effects on automatic speech recognition (ASR) performance caused by splitting the training databases. Also, the relationship between mismatches in signal-to-noise ratio (SNR) is analyzed. The results show that splitting the training material in terms of both noise type and SNR value is advantageous compared to previously used methods, and that training only a limited number of HMM sets is sufficient for each noise type for robust handling of SNR mismatches. This leads to the introduction of an SNR and noise classification-based training strategy (SNT-SNC). Better ASR performance is obtained on test material containing data from known noise types as compared to either multicondition training or noise-type dependent training strategies. The computational complexity of the SNT-SNC framework is kept low by choosing only one HMM set for recognition. The HMM set is chosen on the basis of results from noise classification and SNR value estimations. However, compared to other strategies, the SNT-SNC framework shows lower performance for unknown noise types. This problem is partly overcome by introducing a number of model and feature domain techniques. Experiments using both artificially corrupted and real-world noisy speech databases are conducted and demonstrate the effectiveness of these methods.
