Similar Documents
20 similar documents found; search took 31 ms.
1.
Researchers have realized the importance of integrating fuzziness into the mining of association rules in databases with binary and quantitative attributes. However, most earlier algorithms proposed for fuzzy association rule mining either assume that the fuzzy sets are given or employ a clustering algorithm, like CURE, to decide on the fuzzy sets; in both cases the number of fuzzy sets is pre-specified. In this paper, we propose an automated method to decide on the number of fuzzy sets and to autonomously mine both fuzzy sets and fuzzy association rules. We achieve this by developing an automated clustering method based on multi-objective Genetic Algorithms (GA); the aim of the proposed approach is to automatically cluster the values of a quantitative attribute in order to obtain a large number of large itemsets in less time. We compare the proposed multi-objective GA based approach with two other approaches: 1) a CURE-based approach, CURE being known as one of the most efficient clustering algorithms; 2) the clustering approach of Chien et al., an automatic interval partition method based on variation of density. Experimental results on 100 K transactions extracted from the adult data of the 2000 USA census show that the proposed automated clustering method outperforms both the CURE-based approach and the approach of Chien et al. in terms of runtime, number of large itemsets, and number of association rules.
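As a rough illustration of the general idea (not the authors' multi-objective GA itself), the sketch below shows how cluster centroids over a quantitative attribute can define triangular fuzzy sets whose fuzzy support then drives itemset mining; the attribute values and set boundaries are invented for the example.

```python
# Minimal sketch (not the authors' GA): given cluster boundaries for a
# quantitative attribute, build a triangular fuzzy set and compute the
# fuzzy support of a single-attribute itemset. All numbers are made up.

def triangular_membership(x, left, center, right):
    """Degree to which value x belongs to the fuzzy set centered at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

def fuzzy_support(values, left, center, right):
    """Fuzzy support = average membership degree over all transactions."""
    return sum(triangular_membership(v, left, center, right) for v in values) / len(values)

ages = [23, 31, 38, 45, 52, 60, 67]      # hypothetical quantitative attribute
print(fuzzy_support(ages, 25, 40, 55))   # support of a "middle-aged" fuzzy set
```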

2.
Extracting knowledge from big network traffic data is of foremost importance for multiple purposes, including trend analysis, network troubleshooting, capacity planning, network forensics, and traffic classification. An extremely useful approach to profiling traffic is to extract and display to a network administrator the multi-dimensional hierarchical heavy hitters (HHHs) of a dataset. However, existing schemes for computing HHHs have several limitations: (1) they require significant computational resources; (2) they do not scale to high-dimensional data; and (3) they are not easily extensible. In this paper, we introduce a fundamentally new approach to extracting HHHs based on generalized frequent itemset mining (FIM), which allows traffic data to be processed much more efficiently and scales to much higher-dimensional data than existing schemes. Based on generalized FIM, we build and thoroughly evaluate a traffic profiling system we call FaRNet. Our comparison with AutoFocus, the most closely related tool of a similar nature, shows that FaRNet is up to three orders of magnitude faster. Finally, we describe our experience of how generalized FIM is useful in practice, after using FaRNet operationally for several months in the NOC of GÉANT, the European backbone network.
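The core idea, sketched here under simplified assumptions (this is not FaRNet's code), is that each flow key is generalized along per-dimension hierarchies and the generalized items are counted like ordinary itemsets; the IP-prefix and port hierarchies below are deliberately crude.

```python
# Rough sketch of HHH-via-generalized-FIM: every flow contributes its bytes
# to all of its generalizations across two dimensions (IP truncated by
# octet, port either exact or wildcarded). Heavy generalized items emerge
# from ordinary frequency counting.

from collections import Counter
from itertools import product

def ip_generalizations(ip):
    octets = ip.split(".")
    # e.g. 10.1.2.3 -> 10.1.2.3, 10.1.2.*, 10.1.*, 10.*, *
    return [".".join(octets[:k]) + (".*" if k < 4 else "") for k in range(4, 0, -1)] + ["*"]

def port_generalizations(port):
    return [str(port), "*"]

def hhh_counts(flows):
    counts = Counter()
    for ip, port, nbytes in flows:
        for gen in product(ip_generalizations(ip), port_generalizations(port)):
            counts[gen] += nbytes
    return counts

flows = [("10.1.2.3", 80, 1200), ("10.1.2.7", 80, 800), ("10.9.0.1", 443, 500)]
for key, total in hhh_counts(flows).most_common(5):
    print(key, total)
```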

3.
As the total amount of traffic data in networks has been growing at an alarming rate, a substantial body of research currently attempts to mine traffic data in order to obtain useful information. For instance, some investigations detect Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables at each site so as to quickly determine whether candidate patterns have ever occurred at the site. Extensive experiments with real-world network traffic data demonstrate the correctness and efficiency of the proposed method.
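Retention replacement is a randomized-response-style perturbation. The minimal sketch below is an assumption-laden simplification for binary answers, not the paper's multi-site protocol: a true answer is kept with probability p, and the server can still unbias the aggregate.

```python
# Sketch of retention replacement for binary answers: with probability p
# the true answer is retained, otherwise it is replaced by a uniformly
# random one. The server inverts the perturbation to estimate the true
# proportion of "yes" answers without seeing any individual's truth.

import random

def perturb(answer, p=0.7):
    """Keep the true 0/1 answer with probability p, else replace at random."""
    return answer if random.random() < p else random.choice([0, 1])

def estimate_true_fraction(perturbed, p=0.7):
    """Invert the perturbation: observed = p*true + (1-p)*0.5."""
    observed = sum(perturbed) / len(perturbed)
    return (observed - (1 - p) * 0.5) / p

truth = [1] * 300 + [0] * 700               # hypothetical private answers
noisy = [perturb(a) for a in truth]
print(estimate_true_fraction(noisy))        # should be close to 0.3
```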

4.
A modification of the maximum likelihood algorithm was developed for the classification of forest types in Sweden's part of the CORINE land cover mapping project. The new method, called “calibrated maximum likelihood classification”, involves an automated and iterative adjustment of prior weights until the class frequencies in the output correspond to the class frequencies calculated from objective (field-inventoried) estimates. This modification compensates for the maximum likelihood algorithm's tendency to over-represent dominant classes and under-represent less frequent ones. National forest inventory plot data measured over a five-year period are used to estimate the relative frequency of class occurrence and to derive spectral signatures for each forest class. The classification method was implemented operationally within an automated production system that allowed rapid production of a country-wide forest type map from Landsat TM/ETM+ satellite data. The production system automated the retrieval and updating of forest inventory plots, a plot-to-image matching routine, illumination and haze correction of the satellite imagery, and classification into forest classes using the calibrated maximum likelihood classification. This paper describes the details of the method and compares the results of using iteratively adjusted prior weights versus unadjusted prior weights. It shows that the calibrated maximum likelihood algorithm corrects the over-classification of classes that are well represented in the training data, as well as of other classes, resulting in an output whose class proportions are close to those expected from the forest inventory data.
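A minimal sketch of the calibration loop described above (a simplification with invented toy data, not the operational production system): priors are repeatedly rescaled until the classifier's output class frequencies match the inventory-derived target frequencies.

```python
# Sketch of calibrated maximum likelihood: iterate prior adjustment until
# output class frequencies match target frequencies from inventory plots.

import numpy as np

def calibrated_priors(likelihoods, target_freq, n_iter=50):
    """likelihoods: (n_pixels, n_classes) class-conditional densities p(x|c).
    target_freq: desired class proportions from field inventory."""
    priors = np.full(likelihoods.shape[1], 1.0 / likelihoods.shape[1])
    for _ in range(n_iter):
        posterior = likelihoods * priors
        labels = posterior.argmax(axis=1)
        output_freq = np.bincount(labels, minlength=len(priors)) / len(labels)
        # boost priors of under-represented classes, shrink over-represented ones
        priors *= target_freq / np.maximum(output_freq, 1e-9)
        priors /= priors.sum()
    return priors

rng = np.random.default_rng(0)
lik = rng.random((1000, 3))                  # toy likelihoods for 3 forest classes
print(calibrated_priors(lik, np.array([0.6, 0.3, 0.1])))
```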

5.
This paper presents the design and development of an automated system to assist with Design for Assembly (DFA) analysis. The system is designed to accept information on alternative assemblies using DFA metaphors. Statistics are calculated for these assemblies in order to evaluate their assemblability; these statistics are then used to compare the alternative assemblies and to evaluate improvements to any assembly design.

A binary tree data structure is used in the DFA system to represent the design data. The structure is implemented as linked nodes with three links per tree node, which allows any arbitrary tree to be represented efficiently while accommodating unpredictable tree growth and easy tree manipulation. The user interface of the DFA system is managed by the “User Interface Management System”, which achieves direct and fast control of the screen by accessing the video memory directly.
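The paper does not spell out which three links are used; the sketch below assumes the common first-child / next-sibling / parent scheme, which stores any arbitrary tree in fixed-size nodes and supports unpredictable growth.

```python
# Sketch of a three-link tree node (first-child / next-sibling / parent is
# an assumption, not taken from the paper). Adding a child is O(1), and any
# arbitrary assembly tree fits into uniform binary-tree-like nodes.

class AssemblyNode:
    def __init__(self, name):
        self.name = name
        self.parent = None        # link 1: owning (sub)assembly
        self.first_child = None   # link 2: first component of this assembly
        self.next_sibling = None  # link 3: next component at the same level

    def add_child(self, child):
        child.parent = self
        child.next_sibling = self.first_child   # prepend, O(1)
        self.first_child = child

    def walk(self, depth=0):
        print("  " * depth + self.name)
        node = self.first_child
        while node:
            node.walk(depth + 1)
            node = node.next_sibling

root = AssemblyNode("gearbox")
for part in ("housing", "shaft", "bearing"):
    root.add_child(AssemblyNode(part))
root.walk()
```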


6.
Because information diffusion models underpin research on community mining and community influence, this paper proposes an information diffusion model that incorporates user interests and designs a method for mining micro-level information diffusion patterns based on frequent subtrees. First, building on a graph representation of the microblog social network and multi-label modelling of users, the mining of micro-level diffusion patterns is converted into a frequent subtree mining problem. Then, to handle the single-node multi-label property of microblog social network graphs, a frequent subtree mining algorithm for multi-label node trees (MLTreeMiner) is designed. Finally, combined with a topic extraction method, MLTreeMiner is used to mine information diffusion patterns. Experiments on synthetic datasets show that MLTreeMiner mines frequent subtrees from multi-label node trees efficiently, and experiments on real data from Sina Weibo also verify the method's effectiveness.

7.
We present an architecture and algorithms for performing automated software problem determination using call-stack matching. In an environment where software is used by a large user community, the same problem may recur many times. We show that this can be detected by matching the program call-stack against a historical database of call-stacks, so that as soon as the problem has been resolved once, future cases of the same or similar problems can be resolved automatically. This would greatly reduce the number of cases that need to be dealt with by human support analysts. We also show how a call-stack matching algorithm can be automatically learned from a small sample of call-stacks labeled by human analysts, and examine the performance of this learning algorithm on two different data sets.

Mark Brodie is a research staff member in the “Machine Learning for Systems” group at the IBM T.J. Watson Research Center in Hawthorne, NY. He did his undergraduate work in Mathematics at the University of the Witwatersrand in South Africa and received his PhD in Computer Science from the University of Illinois in 2000. His research interests include machine learning, data mining, and problem determination.

Sheng Ma received his BS degree in Electrical Engineering from Tsinghua University, Beijing, China, in 1992, and his MS and PhD with honors in Electrical Engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1995 and 1998, respectively. He joined the IBM T.J. Watson Research Center as a research staff member in 1998 and became manager of the “Machine Learning for Systems” group in 2001. His current research interests include machine learning, data mining, network traffic modeling and control, and network and computer systems management.

Leonid Rachevsky is a software systems analyst in the “Machine Learning for Systems” group at the IBM T.J. Watson Research Center in Hawthorne, NY. He obtained his MSc in Mathematics from Kazan State University and his PhD in Technical Science (Applied Mathematics) from the Kazan Institute of Chemical Engineering in Kazan, USSR (now Russia). He has worked extensively as a software engineer and senior software analyst in Israel, Canada, and the United States.

Jon Champlin is an advisory software engineer with the Lotus division of IBM's Software Group. He received a Bachelor of Science in Computer Science from Siena College in 1993. He is part of the external support group and has developed several serviceability features for Lotus Notes/Domino.
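As a hedged illustration of stack matching in general (not IBM's learned algorithm), the sketch below scores a new crash stack against resolved historical stacks using sequence similarity over frame names; the frame names and resolutions are invented.

```python
# Sketch of call-stack matching: score how similar a new crash stack is to
# each historical stack using matching frame subsequences, then return the
# best match along with its recorded resolution.

from difflib import SequenceMatcher

def stack_similarity(stack_a, stack_b):
    """Ratio in [0, 1] based on matching frame sequences."""
    return SequenceMatcher(None, stack_a, stack_b).ratio()

def best_match(new_stack, history):
    """history: list of (stack, resolution) pairs from resolved incidents."""
    return max(history, key=lambda item: stack_similarity(new_stack, item[0]))

history = [
    (["main", "parse_config", "read_file", "strcpy"], "fixed in 4.2.1"),
    (["main", "render", "draw_glyph", "malloc"], "known font bug"),
]
crash = ["main", "parse_config", "read_file", "memcpy"]
stack, fix = best_match(crash, history)
print(stack, "->", fix)
```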

8.
Discovery of unapparent association rules based on extracted probability
Association rule mining is an important task in data mining. However, not all of the generated rules are interesting, and some unapparent rules may be ignored. We introduce an “extracted probability” measure in this article. Using this measure, three models are presented to modify the confidence of rules. An efficient method based on the support-confidence framework is then developed to generate rules of interest. The adult dataset from the UCI machine learning repository and a database of occupational accidents are analyzed in this article. The analysis reveals that the proposed methods can effectively generate interesting rules from a variety of association rules.

9.
C.A., J., Q.P., T.F. Annual Reviews in Control, 2007, 31(2): 241-253
In the modeling and control of semiconductor manufacturing, the control engineer must be aware of all influences on the performance of each process. Upstream processes may affect the wafer substrate in a manner that alters performance in downstream operations, and the context within which a process is run may fundamentally change the way the process behaves. Incorporating these influences into a control method ultimately leads to better predictability and improved control performance. Control threads are a way of incorporating these effects into the control of a process by partitioning historical data into groups within which the deterministic sources of variation are uniform. However, if there are many products, and hence many threads to be defined, there may be insufficient data to model each thread. This multi-product–multi-tool manufacturing environment (“high-mix”) requires advanced methodologies based on state estimation and recursive least squares. Several such approaches are compared in this paper based on simulation models for a high-mix fab.
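The sketch below illustrates only the thread-based baseline (a per-context EWMA disturbance estimate with an invented gain, target, and measurement); the paper's point is that the state-estimation and recursive-least-squares methods it compares share information across threads precisely because per-thread data can be scarce.

```python
# Sketch of threaded run-to-run control: each (product, tool) context keeps
# its own EWMA estimate of the process disturbance, which feeds the next
# recipe setting. All numbers here are hypothetical.

class ThreadedEwmaController:
    def __init__(self, gain, target, lam=0.3):
        self.gain, self.target, self.lam = gain, target, lam
        self.offset = {}                      # one disturbance state per thread

    def recipe(self, thread):
        """Recipe for the next run: invert y = gain*u + offset at the target."""
        return (self.target - self.offset.get(thread, 0.0)) / self.gain

    def update(self, thread, u, y):
        """EWMA update of the thread's offset from the measured output y."""
        est = y - self.gain * u
        prev = self.offset.get(thread, 0.0)
        self.offset[thread] = self.lam * est + (1 - self.lam) * prev

ctrl = ThreadedEwmaController(gain=2.0, target=100.0)
u = ctrl.recipe(("productA", "tool3"))
ctrl.update(("productA", "tool3"), u, y=97.5)   # hypothetical measurement
print(ctrl.recipe(("productA", "tool3")))        # adjusted next-run recipe
```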

10.
11.
This paper addresses the problem of vehicle location (positioning) for automated transport in fully automated manufacturing systems. The study is motivated by the complexity of vehicle control found in modern integrated-circuit semiconductor fabrication facilities, where the material handling network is composed of multiple interconnected loops. To help decrease the manufacturing lead time of semiconductor products, we propose an integer linear program that minimizes the maximum time needed to serve a transport request. Computational experiments are based on real-life data. We discuss the practical usefulness of the mathematical model by means of a simulation experiment used to analyze the factory's operational behavior.

12.
Aihua Li, Yong Shi, Jing He. Applied Soft Computing, 2008, 8(3): 1259-1265
Cardholders’ behavior prediction is an important issue in credit card portfolio management. As a promising data mining approach, multiple criteria linear programming (MCLP) has been successfully applied to classify credit cardholders’ behavior into two groups. In order to better control credit risk for financial institutions, this paper proposes three methods based on MCLP to improve the accuracy rate of catching “Bad” accounts. The first is MCLP with unbalanced training set selection, the second is a fuzzy linear programming (FLP) method with a moving boundary, and the third is penalized multi-criteria linear programming (PMCLP). Experimental examples demonstrate the promising performance of these methods.

13.
Due to the rapid development of information technologies, abundant data have become readily available. Data mining techniques have been used for process optimization in many manufacturing processes in automotive, LCD, semiconductor, and steel production, among others. However, data sets often contain a large number of missing values arising from several causes (e.g., data discarded due to gross measurement errors, measurement machine breakdown, routine maintenance, sampling inspection, and sensor failure), which frequently complicates the application of data mining. This study proposes a new procedure for optimizing processes, called missing-values Patient Rule Induction Method (m-PRIM), which handles the missing-values problem systematically and yields considerable process improvement even when a significant portion of the data set has missing values. A case study of a semiconductor manufacturing process is conducted to illustrate the proposed procedure.

14.
The complexity of semiconductor manufacturing is increasing due to smaller feature sizes, a greater number of layers, and process re-entry characteristics. As a result, it is difficult to manage and clarify responsibility for low yields in specific products. This paper presents a comprehensive data mining method for predicting and classifying product yields in semiconductor manufacturing processes. A genetic programming (GP) approach is presented that is capable of constructing a yield prediction system and automatically discovering the significant factors that might cause low yield. The results are then compared with those of a decision tree induction algorithm. Moreover, this research illustrates the robustness and effectiveness of the method using a well-known DRAM fab's real data set, and discusses the results. (Received: November 2004 / Accepted: September 2005)

15.
The problem of detecting an anomaly (or abnormal event) is one in which the distribution of observations differs before and after an unknown onset time, and the objective is to detect the change by statistically matching the observed pattern with that predicted by a model. In the context of asymmetric threats (tactics employed by countries, terrorist groups, or individuals to carry out attacks on a superior opponent while trying to avoid direct confrontation), the detection of an abnormal situation refers to the discovery of suspicious activities of a hostile nation or group from noisy, scattered, and partial intelligence data. The problem becomes complex in a low signal-to-noise ratio environment, such as asymmetric threats, because the “signal” observations are far fewer than the “noise” observations, and the signal observations are “hidden” in the noise. In this paper, we illustrate the capabilities of hidden Markov models (HMMs), combined with feature-aided tracking, for the detection of asymmetric threats. A transaction-based probabilistic model is proposed to combine HMMs and feature-aided tracking. A procedure analogous to Page's test is used for the quickest detection of abnormal events. Simulation results show that our method detects the modeled pattern of an asymmetric threat with high performance compared to a maximum-likelihood-based data mining technique. Performance analysis shows that the detection performance of the HMMs improves as their complexity (i.e., the number of states in an HMM) increases.
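A minimal sketch of a Page's-test-style detector follows, with an invented scalar noise/threat model standing in for the paper's HMM likelihoods: log-likelihood ratios are accumulated with a floor at zero, and an alarm is raised once the running sum crosses a threshold.

```python
# Sketch of Page's test (CUSUM) for quickest change detection: accumulate
# log-likelihood ratios of "threat model" vs "noise model" per observation
# and alarm when the running sum exceeds a threshold.

def pages_test(observations, llr, threshold):
    """llr(obs) -> log p(obs | threat) - log p(obs | noise).
    Returns the index of the first alarm, or None."""
    s = 0.0
    for t, obs in enumerate(observations):
        s = max(0.0, s + llr(obs))     # resetting at 0 keeps the test one-sided
        if s > threshold:
            return t
    return None

# Hypothetical scalar example: noise ~ N(0,1), threat ~ N(1,1).
llr = lambda x: 0.5 * ((x - 0.0) ** 2 - (x - 1.0) ** 2)
data = [0.1, -0.3, 0.2, 1.1, 0.9, 1.4, 1.2]    # shift begins at index 3
print(pages_test(data, llr, threshold=2.0))     # alarms shortly after the shift
```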

16.
Clusters and grids of workstations provide readily available resources for data mining processes. To exploit these resources, new distributed algorithms are necessary, particularly concerning how data are distributed and how this partition is used. We present a clustering algorithm, dubbed Progressive Clustering, that provides an “intelligent” distribution of data on grids. The usefulness of this algorithm is shown for several distributed data mining tasks.

17.
Mining large amounts of unstructured data to extract meaningful, accurate, and actionable information is at the core of a variety of research disciplines, including computer science, mathematical and statistical modelling, and knowledge engineering. In particular, the ability to model complex scenarios based on unstructured datasets is an important step towards an integrated and accurate knowledge extraction approach, and would provide significant insight into any decision-making process driven by Big Data analysis. However, multiple challenges need to be fully addressed in order to achieve this, especially when large, unstructured data sets are considered.

In this article we propose and analyse a novel method to extract and build fragments of Bayesian networks (BNs) from unstructured large data sources. The results of our analysis show the potential of our approach and highlight its accuracy and efficiency. More specifically, when compared with existing approaches, our method addresses specific challenges posed by the automated extraction of BNs, with extensive applications to unstructured and highly dynamic data sources.

The aim of this work is to advance the current state-of-the-art approaches to the automated extraction of BNs from unstructured datasets, which provide a versatile and powerful modelling framework that facilitates knowledge discovery in complex decision scenarios.

18.
Natural intelligence in design and manufacturing
This paper describes a hybrid intelligent system to implement and experiment with the “automated factory”. The objective of the project is to develop and test a new method of automating design and manufacturing by utilizing natural intelligence, or more specifically, techniques such as fuzzy logic, Fuzzy Associative Memory (FAM), backpropagation neural networks (BP), and Adaptive Resonance Theory (ART1).

19.
The paper presents a method for the interactive construction of global Hidden Markov Models (HMMs) based on local sequence patterns discovered in data. The method is based on finding interesting sequences whose frequency in the database differs from that predicted by the model. The patterns are then presented to the user, who updates the model using their intelligence and their understanding of the modelled domain. It is demonstrated that such an approach leads to more understandable models than automated approaches. Two variants of the problem are considered, both practically meaningful: mining patterns occurring only at the beginning of sequences, and mining patterns occurring at any position. For each variant, algorithms have been developed that allow for the efficient discovery of all sequences with a given minimum interestingness. Applications to modelling the behavior of webpage visitors and to modelling protein secondary structure are presented, validating the proposed approach.
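A minimal sketch of the interestingness idea for the prefix variant is shown below; the model probability is a toy stub, and the paper's actual estimates differ: a pattern is interesting in proportion to the gap between its empirical frequency and the frequency the current HMM predicts.

```python
# Sketch of pattern interestingness: a sequence pattern is interesting when
# its observed frequency deviates from the model-predicted frequency. The
# "model" here is a toy uniform distribution, not a real HMM.

def empirical_frequency(pattern, sequences):
    """Fraction of sequences that start with `pattern` (the prefix variant)."""
    hits = sum(1 for s in sequences if s[:len(pattern)] == pattern)
    return hits / len(sequences)

def interestingness(pattern, sequences, model_prob):
    """Absolute gap between observed and model-predicted frequency."""
    return abs(empirical_frequency(pattern, sequences) - model_prob(pattern))

sequences = [list("abca"), list("abcd"), list("bcaa"), list("abab")]
model_prob = lambda p: 0.25 ** len(p)          # toy uniform model over {a,b,c,d}
print(interestingness(list("ab"), sequences, model_prob))
```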

20.
Many artificial intelligence tasks, such as automated question answering, reasoning, or heterogeneous database integration, involve the verification of a semantic category (e.g. “coffee” is a drink and “red” is a color, while “steak” is not a drink and “big” is not a color). In this research, we explore completely automated, on-the-fly verification of membership in any arbitrary category that has not been anticipated a priori. Our approach does not rely on any manually codified knowledge (such as WordNet or Wikipedia) but instead capitalizes on the diversity of topics and word usage on the World Wide Web, and thus can be considered “knowledge-light” and complementary to “knowledge-intensive” approaches. We have created a quantitative verification model and established (1) which specific variables are important and (2) what ranges and upper limits of accuracy are attainable. While our semantic verification algorithm is entirely self-contained (not involving any previously reported components beyond the scope of this paper), we have tested it empirically within our fact-seeking engine on the well-known TREC conference test questions. Owing to our implementation of semantic verification, answer accuracy improved by up to 16%, depending on the specific models and metrics used.
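As a hedged sketch of the knowledge-light idea (the paper's actual variables and model are not reproduced here, and `web_count` is a hypothetical stand-in for a web-scale hit-count source): category membership can be scored by pointwise mutual information between instance and category co-occurrence counts.

```python
# Sketch of corpus-statistics category verification. The counts below are
# invented; a real system would query a web-scale index instead.

import math

def web_count(query):
    fake_index = {
        "coffee": 5_000_000, "drink": 9_000_000, '"coffee" "drink"': 800_000,
        "steak": 3_000_000, '"steak" "drink"': 30_000,
    }
    return fake_index.get(query, 1)

def membership_score(instance, category, total=1e10):
    """PMI between instance and category: > 0 suggests association."""
    joint = web_count(f'"{instance}" "{category}"') / total
    marg = (web_count(instance) / total) * (web_count(category) / total)
    return math.log(joint / marg)

# "coffee is a drink" should score higher than "steak is a drink".
print(membership_score("coffee", "drink") > membership_score("steak", "drink"))
```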

