Similar Documents
Found 20 similar documents (search time: 562 ms)
1.
Sequential pattern mining is a prominent method for extracting knowledge from large databases. Most sequential pattern mining algorithms handle static databases, but in practice databases grow continually, and once a database is updated the previous mining result becomes invalid, forcing the entire mining process to be rerun on the updated sequence database. Incremental mining of sequential patterns avoids this rescanning of the entire database. Whereas most previous approaches are Apriori-based frameworks, we propose an algorithm called STISPM for incremental mining of sequential patterns using a sequence-tree storage structure. STISPM uses a depth-first traversal with backward tracking and a dynamic lookahead pruning strategy that removes infrequent patterns. The path from the root node to any leaf node represents a sequential pattern in the database, and this structural property makes the sequence tree well suited to incremental sequential pattern mining. Because the tree stores every sequential pattern together with its count, whenever the support threshold changes our algorithm can retrieve all sequential patterns from the frequent sequence tree without mining the database again.
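The tree-growth idea behind such miners can be illustrated with a minimal PrefixSpan-style depth-first sketch over single-item sequences. STISPM's own sequence-tree bookkeeping, backward tracking and incremental updating are not published here, so the function names and toy database below are illustrative assumptions, not the paper's code:

```python
# Minimal depth-first sequential pattern miner (PrefixSpan-style sketch):
# grow a pattern prefix, count only items that can extend it, and prune
# extensions whose support falls below `minsup` (lookahead-style pruning).

def mine_sequences(db, minsup):
    """Return {pattern (tuple): support} for sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # Count, per sequence, items occurring after the current prefix.
        counts = {}
        for seq, start in projected:
            seen = set()
            for i in range(start, len(seq)):
                if seq[i] not in seen:
                    seen.add(seq[i])
                    counts[seq[i]] = counts.get(seq[i], 0) + 1
        for item, sup in counts.items():
            if sup < minsup:            # prune infrequent extensions early
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project each sequence past the first occurrence of `item`.
            newproj = []
            for seq, start in projected:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        newproj.append((seq, i + 1))
                        break
            grow(pattern, newproj)

    grow((), [(s, 0) for s in db])
    return results

db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
patterns = mine_sequences(db, minsup=2)   # e.g. ("a", "b", "c") has support 2
```

Every root-to-node path in the implicit recursion tree corresponds to one mined pattern, which is the structural property the abstract exploits for incremental updates.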

2.
As a result of the growing competition in recent years, new trends such as increased product complexity, changing customer requirements and shortening development time have emerged within the product development process (PDP). These trends have added more challenges to the already-difficult task of quality and reliability prediction and improvement. They have given rise to an increase in the number of unexpected events in the PDP. Traditional tools are only partially adequate to cover these unexpected events. As such, new tools are being sought to complement traditional ones. This paper investigates the use of one such tool, textual data mining, for the purpose of quality and reliability improvement. The motivation for this paper stems from the need to handle 'loosely structured textual data' within the product development process. Thus far, most of the studies on data mining within the PDP have focused on numerical databases. In this paper, the need for the study of textual databases is established. Possible areas within a generic PDP for consumer and professional products where textual data mining could be employed are highlighted. In addition, successful implementations of textual data mining within two large multi-national companies are presented. Copyright © 2003 John Wiley & Sons, Ltd.

3.
《中国工程学刊》 (Journal of the Chinese Institute of Engineers), 2012, 35(5): 547-554
The development of algorithms for mining least (infrequent) association rules (ARs) is one of the more challenging areas in data mining: compared with frequent pattern mining, it faces specialised measurements, greater complexity, and excessive computational cost. Indeed, most previous studies still use Apriori-like algorithms. To address this issue, this article proposes a new correlation measure called the definite factor (DF) and a scalable trie-based algorithm named significant least pattern growth (SLP-Growth). The algorithm generates least patterns based on an interval support and then determines their significance using the DF. Experiments on real datasets show that SLP-Growth can discover highly positively correlated and significant least ARs, and that it outperforms the fast FP-Growth algorithm by up to a factor of two, verifying its efficiency.
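The interval-support idea can be sketched in a few lines: keep only itemsets whose support falls inside [min_is, max_is], then score each survivor with a correlation measure. The abstract does not define the definite factor, so the sketch below substitutes lift as a placeholder correlation measure (an assumption, not the paper's DF), and the transactions are a toy example:

```python
# Least-pattern filtering by interval support, with lift standing in for
# the paper's definite factor (DF).
from itertools import combinations

def least_rules(transactions, min_is, max_is):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    sup = lambda s: sum(1 for t in transactions if set(s) <= t) / n
    rules = []
    for a, b in combinations(items, 2):
        s = sup((a, b))
        if min_is <= s <= max_is:               # interval support filter
            lift = s / (sup((a,)) * sup((b,)))  # placeholder for DF
            rules.append(((a, b), round(s, 2), round(lift, 2)))
    return rules

tx = [{"x", "y"}, {"x", "y"}, {"x", "z"}, {"y", "z"}, {"w", "z"}]
out = least_rules(tx, 0.2, 0.4)   # rare pairs only; lift > 1 => positive correlation
```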

4.
To mine valuable sequence data from large transaction databases, an efficient bitmap-based sequential pattern mining algorithm (SMBR) is proposed. SMBR represents the database as bitmaps, using a simplified bitmap representation structure. The algorithm first generates candidate sequences by sequence extension and item extension, and then produces frequent sequences quickly through positional operations on the bitmap of the original sequence and the bitmap of the extended item. Experiments show that, applied to large transaction databases, the method not only improves mining efficiency but also greatly reduces the memory needed for the intermediate data generated during mining, so that sequential patterns can be mined efficiently.
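The bitmap trick the abstract relies on can be shown at its smallest scale: store each item's positions in a sequence as a bit mask, and reduce the "does b occur after a?" check to a shift-and-AND. This is a generic sketch in the spirit of bitmap miners such as SPAM, not SMBR's exact representation:

```python
# Bitmap-based support counting for the 2-sequence <a, b>.

def item_bitmap(seq, item):
    m = 0
    for pos, it in enumerate(seq):
        if it == item:
            m |= 1 << pos
    return m

def after_first(mask):
    """Bits strictly after the first set bit of `mask` (0 if mask is 0)."""
    if mask == 0:
        return 0
    first = mask & -mask                # isolate the lowest set bit
    return ~((first << 1) - 1)          # all positions above it

def support_pair(db, a, b):
    """Number of sequences containing some b strictly after the first a."""
    count = 0
    for seq in db:
        ma, mb = item_bitmap(seq, a), item_bitmap(seq, b)
        if after_first(ma) & mb:        # bitwise test replaces a scan
            count += 1
    return count

db = [("a", "b", "a"), ("b", "a"), ("a", "a", "b"), ("c", "b")]
sup_ab = support_pair(db, "a", "b")
sup_ba = support_pair(db, "b", "a")
```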

5.
A fundamental problem with all process monitoring techniques is the requirement of a large Phase-I data set to establish control limits and overcome estimation error. This assumption of having a large Phase-I data set is very restrictive and often problematic, especially when sampling is expensive or not available, e.g., in time-between-events (TBE) settings. Moreover, with the advancement in technology, quality practitioners are now more interested in online process monitoring. The Bayesian methodology not only provides a natural solution for sequential and adaptive learning but also addresses the problem of requiring a large Phase-I data set for setting up a monitoring structure. In this study, we propose Bayesian control charts for TBE data, assuming a homogeneous Poisson process. In particular, a predictive approach is adopted to introduce predictive-limit control charts. Besides the Bayesian predictive Shewhart charts with dynamic control limits, a comparison of frequentist sequential charts, designed using unbiased and biased estimators of the process parameter, is also part of the present study. To assess the performance of the predictive TBE chart in the presence of practitioner-to-practitioner variability, we use the average of the average run length (AARL) and the standard deviation of the average run length (SDARL).
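The predictive-limit construction can be sketched from standard conjugate theory: for a homogeneous Poisson process the TBE is exponential, a Gamma(a0, b0) prior on the rate gives a Gamma posterior, and the posterior predictive for the next TBE is Lomax, whose quantiles give probability limits that update as events arrive. The prior values and false-alarm rate below are illustrative assumptions, not the paper's design:

```python
# Bayesian predictive probability limits for exponential time-between-events.

def predictive_limits(tbe_data, a0=1.0, b0=1.0, alpha=0.0027):
    n, total = len(tbe_data), sum(tbe_data)
    a, b = a0 + n, b0 + total                 # Gamma(a, b) posterior for the rate
    # Posterior predictive CDF is Lomax: F(t) = 1 - (b / (b + t)) ** a.
    # Inverting gives the quantile function used for the control limits.
    q = lambda p: b * ((1.0 - p) ** (-1.0 / a) - 1.0)
    return q(alpha / 2.0), q(1.0 - alpha / 2.0)   # (LCL, UCL)

# Limits tighten dynamically as each new observed TBE updates the posterior.
lcl, ucl = predictive_limits([1.2, 0.7, 2.5, 1.9, 0.4])
```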

6.
With the increasing concern about product quality, attention has shifted to the monitoring of production processes to assure good quality. Achieving good quality is a challenging task in the garment industry due to the great complexity of garment products. This paper presents an intelligent system, using fuzzy association rule mining with a recursive process mining algorithm, to find the relationships between production process parameters and product quality. The goal is to derive a set of decision rules for fuzzy logic that will determine the quantitative values of the process parameters. Learnt process parameters used in production form new inputs to the initial step of the mining algorithm so that new sets of rules can be obtained recursively. Radio frequency identification technology is deployed to increase the efficiency of the system. With the recursive characteristics of the system, process parameters can be continually refined for the purpose of achieving quality assurance. A case study is described in which the system is applied in a garment manufacturing company. After a six-month pilot run of the system, the numbers of critical defects, major defects and minor defects were reduced by 7, 20 and 24%, respectively, while production time and rework cost improved by 26 and 30%, respectively. The results demonstrate the practical viability of the system in providing decision support for garment manufacturers who may not be able to determine the appropriate process settings for achieving the desired product quality.

7.
Development of a new type of carbonation tower for soda ash production   Total citations: 1 (self-citations: 0, citations by others: 1)
The new process of producing soda ash from converted gas, developed independently in China, is highly competitive. However, because its key piece of equipment, the carbonation tower, has been difficult to scale up, the process has for many years been adopted only in small plants and has spread slowly. A newly developed large-scale carbonation tower has recently been put into operation; it overcomes the shortcomings of the traditional Solvay carbonation tower. The success of this development will promote converted-gas soda ash technology and strengthen the competitiveness of China's soda ash industry in the international market.

8.
In recent years, wireless sensing technologies have provided a much sought-after alternative to expensive cabled monitoring systems. Wireless sensing networks forego the high data transfer rates associated with cabled sensors in exchange for low-cost and low-power communication between a large number of sensing devices, each of which features embedded data processing capabilities. As such, a new paradigm in large-scale data processing has emerged; one where communication bandwidth is somewhat limited but distributed data processing centers are abundant. By taking advantage of this grid of computational resources, data processing tasks once performed independently by a central processing unit can now be parallelized, automated, and carried out within a wireless sensor network. By utilizing the intelligent organization and self-healing properties of many wireless networks, an extremely scalable multiprocessor computational framework can be developed to perform advanced engineering analyses. In this study, a novel parallelization of the simulated annealing stochastic search algorithm is presented and used to update structural models by comparing model predictions to experimental results. The resulting distributed model updating algorithm is validated within a network of wireless sensors by identifying the mass, stiffness, and damping properties of a three-story steel structure subjected to seismic base motion.

9.
This paper deals with the conceptual design and development of an enterprise modeling and integration framework using knowledge discovery and data mining. First, the paper briefly presents the background and current state-of-the-art of knowledge discovery in databases and data mining systems and projects. Next, enterprise knowledge engineering is dealt with. The paper suggests a novel approach of utilizing existing enterprise reference architectures, integration and modeling frameworks by the introduction of new enterprise views such as mining and knowledge views. An extension and a generic exploration of the information view that already exists within some enterprise models are also proposed. The Zachman Framework for Enterprise Architecture is also outlined versus the existing architectures and the proposed enterprise framework. The main contribution of this paper is the identification and definition of a common knowledge enterprise model which represents an original combination between the previous projects on enterprise architectures and the Object Management Group (OMG) models and standards. The identified common knowledge enterprise model has therefore been designed using the OMG's Model-Driven Architecture (MDA) and Common Warehouse MetaModel (CWM), and it also follows the RM-ODP (ISO/OSI). It has been partially implemented in Java™, Enterprise JavaBeans (EJB) and CORBA/IDL. Finally, the advantages and limitations of the proposed enterprise model are outlined.

10.
In process optimization, the setting of the process variables is usually determined by estimating a function that relates the quality to the process variables and then optimizing this estimated function. However, it is difficult to build an accurate function from process data in industrial settings because the process variables are correlated, outliers are included in the data, and the form of the functional relation between the quality and process variables may be unknown. A solution derived from an inaccurate function is normally far from being optimal. To overcome this problem, we use a data mining approach. First, a partial least squares model is used to reduce the dimensionality of the process and quality variables. Then the process settings that yield the best output are identified by sequentially partitioning the reduced process variable space using a rule induction method. The proposed method finds an optimal setting from historical data without constructing an explicit quality function. The proposed method is illustrated with two examples obtained from steel making processes. We also show, through simulation, that the proposed method gives more stable results than estimating an explicit function even when the form of the function is known in advance.
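The two-stage idea can be sketched as follows: reduce the correlated process variables to one partial-least-squares component, then partition the reduced score axis and pick the region whose observed quality is best. This is only an illustration with synthetic data; the paper's actual rule-induction procedure is more elaborate:

```python
# Stage 1: one PLS component (NIPALS weight vector for a single response).
# Stage 2: partition the score axis into bins and select the best bin.
import numpy as np

def pls1_scores(X, y):
    Xc, yc = X - X.mean(0), y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)          # first-component weight vector
    return Xc @ w                   # latent scores t = Xc w

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])  # correlated vars
y = 2.0 * x1 + 0.1 * rng.normal(size=200)                   # quality output

t = pls1_scores(X, y)
# Partition the score axis into quartile bins; the bin with the highest
# mean quality proposes the operating region.
edges = np.quantile(t, [0.25, 0.5, 0.75])
bins = np.digitize(t, edges)
best = max(range(4), key=lambda b: y[bins == b].mean())
```

No explicit quality function is ever fitted; the "setting" is read off from the data falling in the winning partition, which mirrors the abstract's motivation.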

11.
B Sathiya, T V Geetha, K Saruladha. Sadhana, 2017, 42(12): 2009-2024
The growth and use of the semantic web has led to a drastic increase in the size, heterogeneity and number of ontologies that are available on the web. Correspondingly, scalable ontology matching algorithms that can eliminate the heterogeneity among large ontologies have become a necessity. Ontology matching algorithms generally do not scale well due to the massive number of complex computations required to achieve matching. One of the methods used to address this problem is the use of partition-based systems to reduce the matching space. In this paper, we propose a new partitioning-based scalable ontology matching system called PSOM2. We have designed a new neighbour-based intra-similarity measure to increase the quality of the cluster set formation for the partition-based ontology matching process. These sets of clusters or sub-ontologies are matched across the input ontologies to identify matchable cluster pairs, based on anchors that are efficiently discovered through a new light-weight linguistic matcher (EI-sub). To further increase the efficiency of the time-consuming anchor discovery process, we have designed a MapReduce-based EI-sub process in which anchors are discovered in a distributed and parallel fashion. Experiments on benchmark OAEI (Ontology Alignment Evaluation Initiative) large-scale ontologies demonstrate that the new PSOM2 system achieves, on average, a 31% decrease in the entropy of the clusters and a 54.5% reduction in overall run time. Based on the experimental results, it is evident that PSOM2 achieves better-quality clusters and a major reduction in execution time, leading to an effective and scalable ontology matching system.
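Anchor discovery with a light-weight linguistic matcher can be sketched as label normalisation plus a cheap token-overlap score; pairs above a threshold become anchors. The actual EI-sub measure is not reproduced in the abstract, so the Jaccard similarity, threshold, and concept labels below are assumed stand-ins:

```python
# Light-weight anchor discovery sketch: normalise labels, tokenise, and keep
# cross-ontology pairs whose Jaccard token overlap clears a threshold.
import re

def tokens(label):
    return set(re.findall(r"[a-z0-9]+", label.lower().replace("_", " ")))

def anchors(concepts_a, concepts_b, threshold=0.5):
    out = []
    for a in concepts_a:
        for b in concepts_b:
            ta, tb = tokens(a), tokens(b)
            sim = len(ta & tb) / len(ta | tb)   # Jaccard similarity
            if sim >= threshold:
                out.append((a, b, round(sim, 2)))
    return out

A = ["Conference_Paper", "Author", "ProgramCommittee"]
B = ["paper", "conference paper", "author of paper"]
pairs = anchors(A, B)
```

Because each label pair is scored independently, this inner loop is exactly the kind of work that maps naturally onto a MapReduce-style parallelisation.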

12.
A mathematical model of mass transfer during gas nitriding is derived by numerical calculation in the present study. The diffusion coefficient of nitrogen in 38CrMoAl steel and the mass-transfer coefficient of the interfacial reaction have been measured. Owing to the large difference between the nitrogen activity at the surface of the workpiece and that of the gas phase during nitriding, it is very difficult to control the nitrogen potential and balance the nitrogen activity. To solve this problem, it is proposed that the nitrogen potential be dynamically controlled by computer. Under conditions of high nitriding speed, the computer-controlled technique, applied in actual manufacture, shows good reproducibility and controls the nitrogen potential accurately, thereby reducing the brittleness of the nitrided layer.
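The transport model behind such control schemes combines Fick diffusion in the steel with an interfacial mass-transfer boundary condition, J = beta * (c_gas - c_surface), which couples the gas-phase nitrogen potential to the workpiece surface. The explicit finite-difference sketch below uses illustrative values for D, beta and the grid, not the measured coefficients for 38CrMoAl:

```python
# 1-D nitrogen diffusion into steel with an interfacial mass-transfer flux
# at the surface (explicit finite differences; all parameters illustrative).

def nitride_profile(D=1e-11, beta=1e-7, c_gas=1.0, depth=2e-4, nx=40,
                    dt=0.05, steps=20000):
    dx = depth / nx
    c = [0.0] * (nx + 1)            # initial nitrogen concentration profile
    r = D * dt / dx**2              # must stay below 0.5 for stability
    for _ in range(steps):
        new = c[:]
        # Surface node: diffusion balanced by interfacial mass transfer
        # J = beta * (c_gas - c[0]), via a ghost-node formulation.
        new[0] = c[0] + 2*r*(c[1] - c[0]) + 2*dt/dx * beta * (c_gas - c[0])
        for i in range(1, nx):
            new[i] = c[i] + r*(c[i-1] - 2*c[i] + c[i+1])
        c = new                     # far boundary held at the core value
    return c

profile = nitride_profile()         # monotonically decaying depth profile
```

A dynamic nitrogen-potential controller would adjust c_gas step by step so that the computed surface activity tracks a setpoint, which is the loop the abstract proposes to run on a computer.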

13.
Land cover change detection has been a topic of active research in the remote sensing community. Due to the enormous amount of data available from satellites, it has also attracted the attention of data mining researchers looking for new directions of solution. The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation index (EVI/NDVI) data products are used for land cover change detection. These data products are associated with various challenges such as seasonality of the data, spatio-temporal correlation, missing values, poor-quality measurements, high resolution and high dimensionality. Land cover change detection has often been performed by comparing two or more satellite snapshot images acquired on different dates. Such image comparison techniques have a number of limitations. Data mining techniques address many of these challenges, such as missing values and poor-quality measurements, by pre-processing the data. Furthermore, data mining approaches are capable of handling large data sets and also exploit some of the inherent characteristics of spatio-temporal data; hence, they can be applied to increasingly immense data sets. This paper describes in detail various data mining algorithms for land cover change detection, along with each algorithm's advantages and limitations. An empirical study of some existing land cover change detection algorithms and its results are also presented.

14.
Discrete manufacturing process designs can be modelled using computer simulation. Determining optimal designs using such models is very difficult, due to the large number of manufacturing process sequences and associated parameter settings that exist. This has forced researchers to develop heuristic strategies to address such design problems. This paper introduces a new general heuristic strategy for discrete manufacturing process design optimization, called generalised hill climbing (GHC) algorithms. GHC algorithms provide a unifying approach for addressing such problems in particular, and intractable discrete optimization problems in general. Heuristic strategies such as simulated annealing, threshold accepting, Monte Carlo search, local search, and tabu search (among others) can all be formulated as GHC algorithms. Computational results are reported with various GHC algorithms applied to computer simulation models of discrete manufacturing process designs under study at the Materials Process Design Branch of Wright Laboratory, Wright Patterson Air Force Base (Dayton, Ohio, USA).
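The unifying claim can be made concrete with a single local-search loop whose acceptance test is a random variable R_k: a worse neighbour is accepted when the cost increase is at most R_k. Choosing R_k recovers the named heuristics (R_k = 0 gives local search, a constant R_k gives threshold accepting, an exponentially distributed R_k with decaying scale behaves like simulated annealing). The 1-D quadratic instance below is an illustrative toy, not a manufacturing simulation:

```python
# Generalised hill climbing (GHC) template: the acceptance random variable
# R_k is the only thing that changes between heuristics.
import math, random

def ghc(f, x0, neighbour, accept_r, iters=500, seed=1):
    rng = random.Random(seed)
    x, best = x0, x0
    for k in range(iters):
        y = neighbour(x, rng)
        if f(y) - f(x) <= accept_r(k, rng):    # GHC acceptance: delta <= R_k
            x = y
        if f(x) < f(best):
            best = x
    return best

f = lambda x: (x - 7) ** 2                     # toy cost with minimum at 7
step = lambda x, rng: x + rng.choice([-1, 1])  # unit-step neighbourhood

local = ghc(f, 0, step, lambda k, rng: 0)      # R_k = 0: pure descent
thresh = ghc(f, 0, step, lambda k, rng: 5)     # R_k = 5: threshold accepting
sa = ghc(f, 0, step,                           # R_k ~ Exp with decaying scale
         lambda k, rng: -math.log(1.0 - rng.random()) * 50.0 / (k + 1))
```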

15.
Distributed data mining has played a vital role in numerous application domains. However, it is widely observed that data mining may pose a privacy threat to individuals' sensitive information. To address the privacy problem in distributed association rule mining (a data mining technique), we propose two protocols for securely generating global association rules in horizontally distributed databases. The first protocol uses an elliptic-curve-based Paillier cryptosystem, which helps achieve the integrity and authenticity of the messages exchanged among the participating sites over an insecure communication channel. It protects the privacy of each site's information against the other participating sites and an external adversary. However, collusion between two sites may compromise the privacy of individuals. To address this problem, we incorporate Shamir's secret sharing scheme in the second protocol, which provides privacy against both colluding sites and external adversaries. We analyse both protocols in terms of fulfilling the requirements of privacy-preserving distributed association rule mining.
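The secret-sharing step can be sketched directly: each site hides its local support count inside Shamir shares, the sites sum the shares they hold, and Lagrange interpolation of the summed shares reveals only the global count, never the per-site values. The field prime, threshold and counts below are illustrative:

```python
# Shamir secret sharing used to sum local support counts privately.
import random

P = 2_147_483_647                      # prime modulus for the share field

def make_shares(secret, t, xs, rng):
    """Shares of `secret` under a random degree-(t-1) polynomial."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    return {x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            for x in xs}

def reconstruct(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

rng = random.Random(42)
xs = [1, 2, 3]                         # evaluation point per site
local_supports = [120, 45, 210]        # each site's private count
shares = [make_shares(s, 3, xs, rng) for s in local_supports]
# Sites exchange and add shares pointwise; sums still lie on a polynomial
# whose constant term is the sum of the secrets.
summed = [(x, sum(sh[x] for sh in shares) % P) for x in xs]
global_support = reconstruct(summed)   # = 120 + 45 + 210, nothing else leaks
```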

16.
Correct identification of a peptide sequence from MS/MS data is still a challenging research problem, particularly in proteomic analyses of higher eukaryotes where protein databases are large. The scoring methods of search programs often generate cases where incorrect peptide sequences score higher than correct peptide sequences (referred to as distraction). Because smaller databases yield less distraction and better discrimination between correct and incorrect assignments, we developed a method for editing a peptide-centric database (PC-DB) to remove unlikely sequences and strategies for enabling search programs to utilize this peptide database. Rules for unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11 849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion to generate subset databases. When used to search a well-annotated test data set of MS/MS spectra, we found no loss of critical information using PC-DBs, validating the methods for generating and searching against the databases. On the other hand, improved confidence in peptide assignments was achieved for tryptic peptides, measured by changes in DeltaCN and RSP. Decreased distraction was also achieved, consistent with the 3-9-fold decrease in database size. Data mining identified a major class of common nonspecific proteolytic products corresponding to leucine aminopeptidase (LAP) cleavages. Large improvements in identifying LAP products were achieved using the PC-DB approach when compared with conventional searches against protein databases. These results demonstrate that peptide properties can be used to reduce database size, yielding improved accuracy and information capture due to reduced distraction, but with little loss of information compared to conventional protein database searches.
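The starting point for a peptide-centric database is an in-silico tryptic digest: cleave after K or R except when the next residue is P, optionally allowing missed cleavages. The paper's editing rules would then prune this list; the sketch below uses a toy sequence and the standard trypsin rule only:

```python
# In-silico tryptic digestion: cleave after K/R not followed by P,
# enumerating peptides with up to `max_missed` missed cleavages.

def tryptic_peptides(protein, max_missed=1):
    cuts = [0] + [i + 1 for i, aa in enumerate(protein)
                  if aa in "KR" and protein[i + 1:i + 2] != "P"]
    if cuts[-1] != len(protein):
        cuts.append(len(protein))          # keep a non-tryptic C-terminal piece
    peptides = set()
    for m in range(max_missed + 1):
        for i in range(len(cuts) - 1 - m):
            peptides.add(protein[cuts[i]:cuts[i + 1 + m]])
    return peptides

peps = tryptic_peptides("MKTAYRPLESRVVK")  # note: R before P is not cleaved
```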

17.
A survey of temporal data mining   Total citations: 2 (self-citations: 0, citations by others: 2)
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
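One of the pattern-discovery primitives such surveys cover is frequent-episode counting: the support of a serial episode ("A before B") is the number of sliding windows of a stream in which it occurs. The brute-force sketch below, with an illustrative window width, shows the definition rather than an efficient counting algorithm:

```python
# Windowed support of a serial episode (A followed later by B) in an
# event stream of (time, label) pairs.

def serial_episode_support(events, episode, width):
    a, b = episode
    count = 0
    times = [t for t, _ in events]
    t_min, t_max = min(times), max(times)
    # Slide a window of the given width one time unit at a time.
    for start in range(t_min - width + 1, t_max + 1):
        window = [(t, e) for t, e in events if start <= t < start + width]
        first_a = min((t for t, e in window if e == a), default=None)
        if first_a is not None and any(e == b and t > first_a
                                       for t, e in window):
            count += 1
    return count

stream = [(1, "A"), (2, "B"), (5, "A"), (6, "C"), (8, "B")]
support = serial_episode_support(stream, ("A", "B"), width=4)
```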

18.
Scalable, low-cost deposition of high-quality charge transport layers and photoactive perovskite layers is a grand challenge for large-area, efficient perovskite solar modules and tandem cells. An inverted structure with an inorganic hole transport layer is expected to favour long-term stability. Among various hole transport materials, nickel oxide has been investigated for highly efficient and stable perovskite solar cells. However, the reported deposition methods are either difficult to apply for large-scale conformal deposition or require a high-vacuum process. Chemical bath deposition can in principle realize a uniform, conformal, and scalable coating by a solution process, but conventional chemical bath deposition requires a high annealing temperature of over 400 °C. In this work, an amino-alcohol-ligand-based controllable release and deposition of NiOx using chemical bath deposition with a low calcining temperature of 270 °C is developed. The uniform and conformal in-situ-grown precursor films can be adjusted by tuning the ligand structure. The inverted-structure perovskite solar cells and large-area solar modules reached champion PCEs of 22.03% and 19.03%, respectively. This study paves an efficient, low-temperature, and scalable chemical bath deposition route to large-area NiOx thin films for the scalable fabrication of highly efficient perovskite solar modules.

19.
In the statistical process control environment, a primary method to deal with autocorrelated data is the use of a residual chart. Although this methodology has the advantage that it can be applied to any autocorrelated data, it needs some modeling effort in practice. In addition, the detection capability of the residual chart is not always great. This article proposes a statistical control chart for stationary process data. It is simple to implement, and no modeling effort is required. Comparisons are made among the proposed chart, the residual chart, and other charts. When the process autocorrelation is not very strong and the mean changes are not large, the new chart performs better than the residual chart and the other charts.
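The residual-chart baseline the article compares against can be sketched briefly: fit an AR(1) model to the autocorrelated data, then apply Shewhart 3-sigma limits to the one-step-ahead residuals, which are approximately independent if the model fits. The data generation and injected shift below are illustrative:

```python
# Residual chart for autocorrelated data: AR(1) fit + 3-sigma limits on
# the one-step-ahead residuals.
import random, statistics

def ar1_residual_chart(x):
    m = statistics.fmean(x)
    z = [v - m for v in x]
    # Least-squares estimate of phi in z_t = phi * z_{t-1} + e_t.
    phi = (sum(z[i] * z[i - 1] for i in range(1, len(z)))
           / sum(v * v for v in z[:-1]))
    resid = [z[i] - phi * z[i - 1] for i in range(1, len(z))]
    s = statistics.stdev(resid)
    # Shewhart limits on the (approximately independent) residuals;
    # a signal index is the time of the out-of-limits residual.
    signals = [i + 1 for i, r in enumerate(resid) if abs(r) > 3 * s]
    return phi, 3 * s, signals

rng = random.Random(3)
x, prev = [], 0.0
for _ in range(300):
    prev = 0.8 * prev + rng.gauss(0.0, 1.0)   # AR(1) process, phi = 0.8
    x.append(prev)
x[250] += 8.0                                 # inject a shift to detect
phi, limit, signals = ar1_residual_chart(x)
```

The "modeling effort" the abstract criticises is visible here: the chart's performance hinges on how well phi is estimated before any monitoring begins.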

20.
Protein–protein interactions (PPIs) have been widely used to understand the biological processes and cellular functions associated with diseases such as cancer. Although some cancer-related protein interaction databases are available, the lack of experimental data and conflicting PPI data among the available databases have slowed cancer research. In this study the authors therefore focus on proteins that are directly related to different types of cancer. They prepare a PPI database between cancer-associated proteins and the rest of the human proteins, incorporating the annotation type and direction of each interaction. A biclustering-based association rule mining algorithm is then applied to predict new interactions, with type and direction. The study demonstrates the predictive power of association rule mining over a traditional classifier model without requiring a negative data set. The time complexity of biclustering-based association rule mining is also analysed and compared with that of traditional association rule mining. The authors discover 38 new PPIs that are not present in the cancer database, and the biological relevance of these newly predicted interactions is supported by published literature. Recognition of such interactions may accelerate the development of new drugs to prevent cancer-related diseases.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号