20 similar documents found (search time: 15 ms)
Over the last few years, the dimensionality of datasets involved in data mining applications has increased dramatically. In this situation, feature selection becomes indispensable as it allows for dimensionality reduction and relevance detection. The research proposed in this paper broadens the scope of feature selection by taking into consideration not only the relevance of the features but also their associated costs. A new general framework is proposed, which consists of adding a new term to the evaluation function of a filter feature selection method so that the cost is taken into account. Although the proposed methodology could be applied to any feature selection filter, in this paper the approach is applied to two representative filter methods: Correlation-based Feature Selection (CFS) and Minimal-Redundancy-Maximal-Relevance (mRMR), as an example of use. The behavior of the proposed framework is tested on 17 heterogeneous classification datasets, employing a Support Vector Machine (SVM) as a classifier. The results of the experimental study show that the approach is sound and that it allows the user to reduce the cost without compromising the classification error.
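
The idea of adding a cost term to a filter's evaluation function can be illustrated with a small sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: a greedy, mRMR-style score (relevance minus mean redundancy) with a penalty `lam * cost[j]` subtracted. The function name, the parameter `lam`, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def cost_aware_greedy_selection(relevance, redundancy, cost, k, lam=1.0):
    """Greedily pick k features by: relevance - mean redundancy - lam * cost.

    relevance:  shape (n,), relevance of each feature to the class.
    redundancy: shape (n, n), pairwise redundancy between features.
    cost:       shape (n,), acquisition cost of each feature.
    lam:        hypothetical trade-off weight; lam = 0 ignores cost.
    """
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    cost = np.asarray(cost, dtype=float)

    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            # Mean redundancy with the features chosen so far (0 if none yet).
            red = redundancy[j, selected].mean() if selected else 0.0
            score = relevance[j] - red - lam * cost[j]
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 is the most relevant but also the most expensive.
relevance = [0.9, 0.85, 0.3]
redundancy = [[0.0, 0.1, 0.1],
              [0.1, 0.0, 0.1],
              [0.1, 0.1, 0.0]]
cost = [1.0, 0.1, 0.2]

without_cost = cost_aware_greedy_selection(relevance, redundancy, cost, k=2, lam=0.0)
with_cost = cost_aware_greedy_selection(relevance, redundancy, cost, k=2, lam=1.0)
```

With the cost term switched off the expensive feature is selected first; with `lam = 1.0` it is skipped in favor of cheaper features. How much accuracy is traded for cost depends entirely on the chosen weight.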

Developers apply object-oriented (OO) design principles to produce modular, reusable software. Therefore, service-specific groups of related software classes called modules arise in OO systems. Extracting the modules is critical for better software comprehension, efficient architecture recovery, determination of service candidates to migrate legacy software to a service-oriented architecture, and transportation of such services to cloud-based distributed systems. In this study, we propose a novel approach to automatic module extraction to identify services in OO software systems. In our approach, first we create a weighted and directed graph of the software system in which vertices and edges represent the classes and their relations, respectively. Then, we apply a clustering algorithm over the graph to extract the modules. We calculate the weight of an edge by considering its probability of being within a module or between modules. To estimate these positional probabilities, we propose a machine-learning-based classification system that we train with data gathered from a real-world OO reference system. We have implemented an automatic module extraction tool and evaluated the proposed approach on several open-source and industrial projects. The experimental results show that the proposed approach generates highly accurate decompositions that are close to authoritative module structures and outperforms existing methods.

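Joining a mining pool is currently the most common way for miners to mine Bitcoin. However, mining pools in the Bitcoin system infiltrate and attack one another, which reduces the revenue of miners in the attacked pool and lowers the computing power of the attacking pool, thereby reducing the overall computing power of the Bitcoin system. To address the problem of pools attacking each other and refusing to mine cooperatively, an adaptive zero-determinant strategy (AZD) is proposed. It promotes cooperation among pools by "comparing the expected payoff of cooperation with that of defection and choosing the strategy that yields the higher payoff". First, the cooperation and defection payoffs of the next round are predicted by combining a temporal-difference reinforcement learning algorithm with the zero-determinant strategy. Second, a strategy is selected through a decision-making process (DMP), which in turn changes the cooperation and defection probabilities for the next round. Finally, by executing the adaptive zero-determinant strategy iteratively, all pools in the network come to cooperate with one another and mine actively. Simulations show that, compared with the adaptive strategy, the AZD strategy increases the speed at which the cooperation probability converges to 1 by 36.54%; compared with the zero-determinant strategy, it improves stability by 50%. These results indicate that the AZD strategy effectively promotes cooperation among miners, accelerates convergence to cooperation, and secures stable revenue for mining pools.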

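A combined calibration and correction method for dual-camera modules is proposed, which merges the traditionally separate calibration and correction steps into a single step. It requires no external measuring equipment: a single image of the target template, captured simultaneously by both cameras, suffices to calibrate and correct the module. First, the radial distortion coefficients of each camera are computed based on cross-ratio invariance, which converts the distorted imaging model into a linear model; the linear model is then used to calibrate the two cameras separately. Next, the pose offset parameters between the two cameras are computed and the pose of the right camera is adjusted to correct the relative pose of the two cameras. Finally, the pose parameters between the two cameras are calibrated. Practical application shows that the proposed method achieves high correction and calibration accuracy, shortens process time, improves process efficiency, and meets the requirements of the packaging production process for dual-camera modules.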

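Research on a Design Pattern Mining Strategy Using Fuzzy Methods (cited 1 time: 0 self-citations, 1 by others)
Mining design patterns from system source code is important for software comprehensibility and maintainability. Based on fuzzy theory, a pattern matching method is proposed for mining design patterns. A prime-number matrix model based on class relationships describes both the design pattern structure and the source code information and serves as the model basis for matching. A clustering method optimizes the source code model to improve matching efficiency. The fuzzy method is combined with the design pattern matching strategy, and static and dynamic information is introduced to improve matching correctness. Experimental results show that the method substantially improves precision and completeness and avoids failure on special patterns.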

Increased emphasis on control of work-in-process costs in assembly scheduling of large, complex items leads to increased needs for aids to foremen in dealing with schedule changes. The task is complicated by constraints on resources that often require that activities are begun earlier than a just-in-time schedule would otherwise dictate. A criterion used in prototype tandem knowledge-based decision-aiding systems in the past was based on the assumption that investment costs do not compound. This can provide misleading choices in some cases. The present work refines the criterion previously used by including the compounding costs of holding subassemblies in inventory. A simplified version of the new formula is developed which provides simple rules for deciding which activities to start early if necessary. Numerical comparisons are made between the criteria.
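
The abstract does not state either criterion, so the following is only an illustration of why the no-compounding assumption can mislead. It assumes the textbook forms of simple and compound carrying cost for a subassembly of a given value held for a number of periods at a per-period rate. The function names and the numbers are hypothetical and are not the paper's formulas.

```python
def simple_holding_cost(value, rate, periods):
    """Carrying cost when investment cost does not compound: V * r * t."""
    return value * rate * periods


def compound_holding_cost(value, rate, periods):
    """Carrying cost when investment cost compounds: V * ((1 + r)^t - 1)."""
    return value * ((1.0 + rate) ** periods - 1.0)


# A subassembly worth 1000 held 10 periods early at 2% per period.
simple = simple_holding_cost(1000.0, 0.02, 10)
compound = compound_holding_cost(1000.0, 0.02, 10)
gap = compound - simple  # grows with both the rate and the holding time
```

Under these assumptions the two criteria agree for a single period and diverge as the holding time grows, so the ranking of which activity to start early can change when long early starts are compared with short ones.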


The minimum independent dominating set problem (MIDS) is an extension of the classical dominating set problem with wide applications. In this paper, we describe a greedy randomized adaptive search procedure (GRASP) for MIDS that combines a path cost heuristic with the classical tabu mechanism. Our novel GRASP algorithm makes better use of the vertex neighborhood information provided by path cost, and is thus able to discover more and better solutions and to escape from local optima when the original GRASP fails to find improved solutions. Moreover, to overcome the serious cycling problem, the tabu mechanism is employed to forbid recently removed vertices from re-entering the candidate solution. Computational experiments carried out on standard benchmarks, namely DIMACS instances, show that our algorithm consistently outperforms two MIDS solvers as well as the original GRASP.
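
For readers unfamiliar with GRASP, the construction phase can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of the paper: it uses a plain coverage-based greedy score instead of the path cost heuristic, it has no local search and no tabu list, and the names `grasp_construct_ids`, `grasp_mids`, and `alpha` are hypothetical. The graph is assumed to be undirected, without self-loops, and given as a dict mapping each integer vertex to the set of its neighbors.

```python
import random


def grasp_construct_ids(adj, alpha=0.3, rng=None):
    """Greedy randomized construction of an independent dominating set."""
    rng = rng or random.Random()
    undominated = set(adj)  # vertices neither chosen nor adjacent to a chosen one
    solution = set()
    while undominated:
        # Gain of v: undominated vertices it would cover (itself plus neighbors).
        gain = {v: 1 + len(adj[v] & undominated) for v in undominated}
        best, worst = max(gain.values()), min(gain.values())
        threshold = best - alpha * (best - worst)
        # Restricted candidate list: vertices whose gain is close to the best.
        rcl = sorted(v for v, g in gain.items() if g >= threshold)
        v = rng.choice(rcl)
        # v is undominated, so it has no neighbor in the solution:
        # adding it keeps the set independent.
        solution.add(v)
        undominated -= adj[v] | {v}
    return solution


def is_independent_dominating(adj, s):
    """Check that no two vertices of s are adjacent and every vertex is covered."""
    independent = all(adj[u].isdisjoint(s) for u in s)
    dominating = all(v in s or bool(adj[v] & s) for v in adj)
    return independent and dominating


def grasp_mids(adj, iterations=50, alpha=0.3, seed=0):
    """Repeat the randomized construction and keep the smallest set found."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        candidate = grasp_construct_ids(adj, alpha, rng)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


# Star graph: vertex 0 is adjacent to vertices 1 to 4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

star_result = grasp_mids(star)
path_result = grasp_mids(path)
```

Because each vertex is drawn only from the currently undominated set, every constructed set is independent by construction and dominating on termination. Only its size is left to the randomized search.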


This paper proposes a cost-based fuzzy classification system for pattern classification problems with an order of class importance. The task here is to minimize the misclassification of patterns from an important class. It is assumed that the classification importance is given for each class, not for each pattern. Another assumption is that only the order of importance is given for the classes, without any numerical measures of importance. We show the performance of the proposed cost-based fuzzy classification system for a real-world pattern classification problem. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January 25–27, 2007.

In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining), that employs aspects of the methodological framework of the ELECTRE I outranking method, and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then it assigns the classes that correspond to these rules, to the objects that must be classified. Three well known data mining classification algorithms are tested in five different widely used databases to verify the robustness of the proposed method.

This paper proposes an efficient method, the frequent items ultrametric trees (FIUT), for mining frequent itemsets in a database. FIUT uses a special frequent items ultrametric tree (FIU-tree) structure to enhance its efficiency in obtaining frequent itemsets. Compared to related work, FIUT has four major advantages. First, it minimizes I/O overhead by scanning the database only twice. Second, the FIU-tree is an improved way to partition a database, which results from clustering transactions, and significantly reduces the search space. Third, only frequent items in each transaction are inserted as nodes into the FIU-tree for compressed storage. Finally, all frequent itemsets are generated by checking the leaves of each FIU-tree, without traversing the tree recursively, which significantly reduces computing time. FIUT was compared with FP-growth, a well-known and widely used algorithm, and the simulation results showed that the FIUT outperforms the FP-growth. In addition, further extensions of this approach and their implications are discussed.

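The rate at which stream data is generated is unpredictable; when it exceeds the system's processing capacity, some data elements cannot be processed in real time. Load shedding is one of the key techniques for handling this problem. After analyzing the shortcomings of current load shedding techniques, this paper proposes a load shedding strategy for mining frequent itemsets in data streams. The strategy uses semantic dropping based on tuple frequency, preferentially dropping tuples that occur relatively infrequently, which effectively resolves the problems that arise when the system is overloaded while mining frequent items in data streams. It also decides automatically, according to the stream's generation rate, whether to activate load shedding, which effectively addresses the adaptability of shedding. Finally, experiments and analysis demonstrate the effectiveness of the strategy for frequent item mining in data streams.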
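
Frequency-based semantic load shedding of this kind can be sketched as follows. This is a simplified illustration under assumptions the abstract does not confirm: the stream arrives in windows, each tuple is a single hashable value, overload means a window holds more tuples than a fixed `capacity`, and shedding is triggered only in that case. The function name `shed_window` and its parameters are hypothetical.

```python
from collections import Counter


def shed_window(window, counts, capacity):
    """Return the tuples of `window` to process, dropping rare ones on overload.

    counts:   running Counter of how often each tuple has been seen so far.
    capacity: maximum number of tuples the system can process per window.
    """
    counts.update(window)  # frequency statistics are kept even for shed tuples
    if len(window) <= capacity:
        return list(window)  # no overload: shedding is not activated

    # Rank positions by descending historical frequency; sorted() is stable,
    # so ties are resolved in favor of earlier arrivals.
    ranked = sorted(range(len(window)), key=lambda i: -counts[window[i]])
    keep = sorted(ranked[:capacity])  # restore arrival order
    return [window[i] for i in keep]


counts = Counter()
first = shed_window(["a", "a", "b", "a", "c"], counts, capacity=10)
second = shed_window(["c", "a", "b", "a", "d", "b"], counts, capacity=4)
```

In the second window the system is overloaded by two tuples, and the two least frequent ones (`"c"` and `"d"`) are dropped. Dropping rare tuples first is what keeps the error on frequent items small, since those tuples are the least likely to change which items are frequent.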


While the Internet and World Wide Web have put a huge volume of low-quality information within easy reach of information gathering systems, filtering out irrelevant information has become a big challenge. In this paper, a Web data mining and cleaning strategy for information gathering is proposed. A data-mining model is presented for the data that come from multiple agents. Using the model, a data-cleaning algorithm is then presented to eliminate irrelevant data. To evaluate the data-cleaning strategy, an interpretation is given for the mining model according to evidence theory. An experiment is also conducted to evaluate the strategy using Web data. The experimental results show that the proposed strategy is efficient and promising.

A hybrid push/pull system of an assemble-to-order manufacturing environment is investigated in this paper. In this environment, raw material can be transformed into common semi-finished products at a point where the next downstream operations are triggered by customer orders. The production of the earlier upstream stations is controlled by push-type production, while the production of the later downstream stations is controlled by pull-type production. The hybrid system often strikes a compromise between the conflicting performance characteristics of the push and the pull environments. In the push type, high inventory cost is anticipated in return for low delivery lead time. Conversely, in the pull type, high delivery lead time is expected in return for low inventory cost. The objective function for the presented hybrid model is to minimize the sum of inventory holding cost and delivery lead time cost, where the latter is the cost of the time between a customer placing an order and its fulfillment. The model is applied to solve the inventory and late delivery problems in an assemble-to-order manufacturer. A genetic algorithm (GA) is used. A discrete event simulation model is used to evaluate the objective function for each chromosome in the GA. The pure push and pull systems are also simulated in order to compare their performance with the hybrid system. Sensitivity analyses on the coefficient of variation (CV) of time between actual customer order arrivals and on various cost ratios of delivery lead time and inventory are carried out. In most cases, the hybrid performs the best. Results show that the hybrid production system would yield significant savings for the company compared to the pure push or pure pull production systems.

In this work, an ordinal optimization-based evolution algorithm (OOEA) is proposed to find a good enough target inventory level for the assemble-to-order (ATO) system. First, the ATO system is formulated as a combinatorial optimization problem with integer variables that possesses a huge solution space. Next, the genetic algorithm is used to select N excellent solutions from the solution space, where the fitness is evaluated with the radial basis function network. Finally, we proceed with the optimal computing budget allocation technique to search for a good enough solution. The proposed OOEA is applied to an ATO system comprising 10 items on 6 products. The solution quality is demonstrated by comparison with the solutions obtained by two competing methods. The good enough target inventory level obtained by the OOEA is promising in terms of solution quality and computational efficiency.

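Effectively mining trajectory patterns and regularities from trajectory data is of great importance. This paper studies trajectory prediction for moving objects on a road network and, combining sequence analysis with the Markov statistical model, proposes a variable-order Markov model mining method based on a suffix automaton. The method learns from the historical trajectory data of moving objects, computes the probabilistic features of trajectory sequence contexts, builds a suffix automaton model of the sequences, and, together with the current actual trajectory data, predicts future locations dynamically and adaptively. Experimental results show that as the order increases (L>=2), the prediction accuracy of fixed-order Markov models gradually declines, whereas the proposed method adapts dynamically and maintains an accuracy of about 81.3%, achieving good prediction performance. The method also requires only linear time and space, which greatly reduces storage and time overhead, and can support online learning on large-scale data.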
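
The variable-order idea, predicting from the longest context that has actually been observed and falling back to shorter ones otherwise, can be sketched as below. This is a minimal sketch under stated assumptions, not the method of the paper: it stores contexts in a plain dictionary rather than a suffix automaton, so it does not have the linear space bound reported in the abstract, and the class name `VariableOrderMarkov` and the parameter `max_order` are hypothetical.

```python
from collections import Counter, defaultdict


class VariableOrderMarkov:
    """Next-location predictor that backs off from long to short contexts."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # Maps a context (tuple of recent locations) to next-location counts.
        self.table = defaultdict(Counter)

    def fit(self, trajectories):
        for traj in trajectories:
            for i in range(1, len(traj)):
                # Record traj[i] after every context of length 1..max_order.
                for k in range(1, self.max_order + 1):
                    if i - k < 0:
                        break
                    self.table[tuple(traj[i - k:i])][traj[i]] += 1
        return self

    def predict(self, recent):
        # Try the longest usable suffix of `recent` first, then shorter ones.
        for k in range(min(self.max_order, len(recent)), 0, -1):
            context = tuple(recent[-k:])
            if context in self.table:
                return self.table[context].most_common(1)[0][0]
        return None  # no suffix of the recent trajectory was seen in training


history = [
    ["A", "B", "C", "D"],
    ["X", "B", "C", "E"],
    ["A", "B", "C", "D"],
]
model = VariableOrderMarkov(max_order=3).fit(history)

seen_long = model.predict(["A", "B", "C"])    # full order-3 context is known
seen_other = model.predict(["X", "B", "C"])   # a different order-3 context
backed_off = model.predict(["Z", "B", "C"])   # unseen at order 3, falls back to ("B", "C")
unknown = model.predict(["Q"])                # nothing to fall back to
```

A fixed-order model of order 3 would have no answer for the third query, while a fixed-order model of order 1 could not tell the first two apart. Adapting the order per query avoids both failure modes.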

In this paper the authors examine the effectiveness of the Powell-Toint strategy for evaluating the Hessian of the potential energy surface of a finite element model that can be used for linear stress analysis and transient response predictions of structures. Cases for which the Powell-Toint strategy may be cost-effective relative to the conventional method of stress analysis are identified.

Cost-based abduction (CBA) is an important problem in reasoning under uncertainty. The CBA problem is NP-hard, and existing techniques have exponential worst-case complexity. This paper presents an admissible heuristic for CBA based on the use of linear programming to obtain an optimistic estimate of the cost-to-goal. The article then presents empirical results that indicate that the authors' method is efficient in comparison to Santos' integer linear programming method.

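For expensive interval multi-objective optimization problems whose objective functions are unknown, an NSGA-II algorithm based on principal curve modeling is proposed. The algorithm first constructs a K-principal curve from population data distributed on a manifold in the decision space; it then uses the constructed K-principal curve model to generate offspring by interpolation and extension. Compared with the random offspring generation of genetic algorithms, the proposed method generates effective offspring more efficiently. Because crowding distance in the objective space cannot be computed, the K-principal curve is used to find the nearest preceding and following solutions of a candidate solution, and solutions of the same rank are screened according to crowding distance in the decision space, thereby improving the NSGA-II algorithm.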

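A New Method for Dynamic Frequent Itemset Mining (cited 1 time: 0 self-citations, 1 by others)
Frequent itemset mining is an important step in association rule mining, and mining association rules in environments where data changes dynamically is of great practical significance. A dynamic frequent itemset mining algorithm is proposed. The algorithm builds on the results of the previous mining stage and avoids the excessive database scans that degrade mining performance: when generating the global frequent itemsets at the end, it does not scan the whole database but selectively scans the relevant transaction subsets according to earlier mining results. Experiments show that the algorithm performs far better than the Apriori algorithm and can effectively mine frequent itemsets in dynamically changing data.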
