Similar Documents
20 similar documents found (search time: 0 ms)
1.
This research focuses on the prediction of ICU readmissions using fuzzy modeling and feature selection approaches. A number of scores for assessing the risk of readmission have been published, but their poor predictive performance renders them unsuitable for implementation in the clinical setting. In this work, we propose the use of feature engineering and advanced computational intelligence techniques to improve the performance of current models. In particular, we propose an approach that relies on transforming raw vital signs, laboratory results and demographic information into more informative pieces of data, selecting a subset of relevant and non-redundant variables, and applying fuzzy ensemble modeling to the feature-engineered data to derive important nonlinear relations between variables. Different criteria for selecting the best predictor from the ensemble and novel evaluation measures are explored; in particular, the area under the sensitivity curve and the area under the specificity curve are investigated. The ensemble approach combined with feature transformation and feature selection showed increased performance, predicting early readmissions with an AUC of 0.77 ± 0.02. To the best of our knowledge, this is the first computational intelligence technique that allows prediction of readmissions on a daily basis. The good balance between sensitivity and specificity shows its strength and its suitability for managing the patient discharge decision-making process.

2.
Data preprocessing for clinical behavior pattern mining
Even after cleaning, clinical behavior data still contain noise in the temporal relations between actions, and applying sequence mining algorithms to such data directly makes it hard to discover high-quality patterns. This paper proposes a temporal normalization model that defines sequential and parallel relations between timed actions, computes an intersection coefficient for each given relation, and uses the result to identify noise in the temporal relations between actions. The noise is then removed under the rule that, after normalization, all actions are mutually noise-free while their original correct relations remain unchanged. An algorithm implementing the model was developed, and tests on sample data show that the processed data meet the requirements of subsequent pattern mining.
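One way an "intersection coefficient" between two timed actions could look is the overlap of their intervals relative to the shorter one; the function, the threshold, and the clinical examples below are assumptions for illustration, not the paper's exact definition.

```python
def intersection_coeff(a, b):
    """Overlap length of intervals a and b, relative to the shorter interval.
    Returns 0.0 when the intervals do not overlap."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    shorter = min(a_end - a_start, b_end - b_start)
    return max(0.0, overlap / shorter)

# Two charted actions: a long infusion and a short lab draw inside it.
infusion = (0, 100)
lab_draw = (40, 50)
print(intersection_coeff(infusion, lab_draw))   # the draw lies fully inside

# A tiny coefficient below some cut-off could flag temporal-relation noise.
THRESHOLD = 0.1   # assumed cut-off, not from the paper
print(intersection_coeff((0, 10), (9.5, 30)) < THRESHOLD)
```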

3.
The use of social networks has grown noticeably in recent years, leading to the production of enormous volumes of data. The data users generate on social media sites are very large, noisy, unstructured and dynamic, so a flexible framework and method applicable to all of these networks is an attractive solution. The uncertainty arising from the complexity of judging tie strength between people has led researchers to seek variables that effectively capture intimacy. Since there are several influential variables whose effect sizes are not precisely determined, and whose relations are nonlinear and complex, data mining techniques are a practical solution to this problem. Some unsupervised mining methods have been applied to detecting the type of tie, and data mining can be considered one of the applicable tools for exploring the relationships among users. In this paper, the problem of tie strength prediction is modeled as a data mining problem to which different supervised and unsupervised mining methods are applicable. We propose a comprehensive study of the effects of using different classification techniques, such as decision trees and naive Bayes, in addition to ensemble classification methods such as bagging and boosting, for predicting the tie strength of users of a social network. The LinkedIn social network is used as a real case study, and our experimental results are reported on data extracted from it. Several models, based on basic techniques and ensemble methods, are created, and their efficiency is compared based on F-measure, accuracy, and average execution time. Our experimental results show that our profile-behavioral model has much better accuracy than profile-data based models.

4.
Data mining can be defined as a process for finding trends and patterns in large data sets. An important technique for extracting useful information, such as regularities, from usually historical data is called association rule mining. Most research on data mining concentrates on the traditional relational data model. On the other hand, the query flocks technique, which extends the concept of association rule mining with a 'generate-and-test' model for different kinds of patterns, can also be applied to deductive databases. In this paper, the query flocks technique is extended with view definitions, including recursive views. Although in our system the query flock technique can be applied to a database schema including both the intensional database (IDB), or rules, and the extensional database (EDB), or tabled relations, we have designed an architecture to compile query flocks from datalog into SQL in order to use commercially available database management systems (DBMS) as the underlying engine of our system. However, since recursive datalog views (IDBs) cannot be converted directly into SQL statements, they are materialized before the final compilation step. On this architecture, optimizations suitable for the extended query flocks are also introduced. Using the prototype system, which was developed in a commercial database environment, the advantages of the new architecture together with the optimizations are presented.
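The core translation problem, a recursive datalog view evaluated on a relational engine, can be sketched with a recursive SQL CTE (one possible stand-in for materialization). The table, column names and data below are invented for illustration; SQLite is used only as a convenient stand-in for a commercial DBMS.

```python
# Hedged sketch: the classic datalog ancestor view evaluated in SQL.
#   ancestor(X,Y) :- parent(X,Y).
#   ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (child TEXT, par TEXT)")
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [("c", "b"), ("b", "a"), ("d", "c")])

rows = conn.execute("""
    WITH RECURSIVE ancestor(x, y) AS (
        SELECT child, par FROM parent
        UNION
        SELECT p.child, a.y FROM parent p JOIN ancestor a ON p.par = a.x
    )
    SELECT x, y FROM ancestor ORDER BY x, y
""").fetchall()
print(rows)   # all (descendant, ancestor) pairs derivable from the facts
```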

5.
Data mining for features using scale-sensitive gated experts
Introduces a tool for exploratory data analysis and data mining called scale-sensitive gated experts (SSGE), which can partition a complex nonlinear regression surface into a set of simpler surfaces (which we call features). The set of simpler surfaces has the property that each element of the set can be efficiently modeled by a single feedforward neural network. The degree to which the regression surface is partitioned is controlled by an external scale parameter. The SSGE consists of a nonlinear gating network and several competing nonlinear experts. Although the SSGE is similar to the mixture-of-experts model of Jacobs et al. (1991), the mixture-of-experts model gives only one partitioning of the input-output space, and thus a single set of features, whereas the SSGE gives the user the capability to discover families of features: one obtains a new member of the family for each setting of the scale parameter. We derive the scale-sensitive gated experts and demonstrate their performance on a time series segmentation problem. The main results are: (1) the scale parameter controls the granularity of the features of the regression surface; (2) similar features are modeled by the same expert and different kinds of features are modeled by different experts; and (3) for the time series problem, the SSGE finds different regimes of behavior, each with a specific and interesting interpretation.
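The gating idea can be sketched as a softmax gate blending competing experts, with a temperature-like parameter standing in for the scale parameter. This is a generic mixture-of-experts forward pass under assumed toy experts and gate weights, not the SSGE itself.

```python
import math

def softmax(zs):
    m = max(zs)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_output(x, experts, gate_weights, scale=1.0):
    """Blend expert predictions with a softmax gate. A larger `scale`
    sharpens the gate so fewer experts claim each input (an illustrative
    stand-in for the paper's external scale parameter)."""
    logits = [scale * (w0 + w1 * x) for w0, w1 in gate_weights]
    g = softmax(logits)
    return sum(gi * f(x) for gi, f in zip(g, experts))

experts = [lambda x: 2 * x,           # expert for the "left" regime
           lambda x: 10 - x]          # expert for the "right" regime
gate = [(4.0, -1.0), (-4.0, 1.0)]     # gate prefers expert 0 for x < 4

print(round(gated_output(0.0, experts, gate, scale=5.0), 3))
print(round(gated_output(8.0, experts, gate, scale=5.0), 3))
```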

6.
7.
Ramakrishnan  N. Grama  A.Y. 《Computer》1999,32(8):34-37
The idea of unsupervised learning from basic facts (axioms) or from data has fascinated researchers for decades. Knowledge discovery engines try to extract general inferences from facts or training data. Statistical methods take a more structured approach, attempting to quantify data by known and intuitively understood models. The problem of gleaning knowledge from existing data sources poses a significant paradigm shift from these traditional approaches. The size, noise, diversity, dimensionality, and distributed nature of typical data sets make even formal problem specification difficult. Moreover, you typically do not have control over data generation. This lack of control opens up a Pandora's box filled with issues such as overfitting, limited coverage, and missing/incorrect data with high dimensionality. Once specified, solution techniques must deal with complexity, scalability (to meaningful data sizes), and presentation. This entire process is where data mining makes its transition from serendipity to science.

8.
The stable and efficient operation of anaerobic wastewater treatment plants (WWTPs) is a major challenge for monitoring and control systems. Support for distributed anaerobic WWTPs through remote monitoring of their data was investigated in the TELEMAC framework. This paper describes how the accumulating filtered sensor data was mined to contribute to the refining of expert experience for insights into digester states. Visualisation techniques were used to present cluster analyses of digester states. A procedure for determining prediction intervals is described together with its application to volatile fatty acid concentrations; this procedure enables prediction risk assessment.

9.
The quantity and complexity of data acquired, time-stamped and stored in clinical databases by automated medical devices is rapidly and continuously increasing. As a result, it becomes more and more important to provide clinicians with easy-to-use interactive tools to analyze huge amounts of this data. This paper proposes an approach for visual data mining on temporal data and applies it to a real medical problem, i.e., the management of hemodialysis. The approach is based on the integration of 3D and 2D information visualization techniques and offers a set of interactive functionalities that will be described in detail in the paper. We will also discuss how the system has been evaluated with end users and how the evaluation led to changes in system design.

10.
The representation of multiple continuous attributes as dimensions in a vector space has been among the most influential concepts in machine learning and data mining. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. We test for fluctuations by studying density histograms. A vector-item pattern is considered significant if its density histogram significantly differs from what is expected for a random subset of transactions. Using two different density measures, we evaluate the algorithm on two real data sets and one that was artificially constructed from time series data.

11.
Unemployment rate prediction has become critically important because it can help governments make decisions and design policies. In previous studies, traditional univariate time series models and econometric methods for unemployment rate prediction attracted much attention from governments, organizations, research institutes, and scholars. Recently, novel methods using search engine query data have been proposed to forecast the unemployment rate. In this paper, a data mining framework using search engine query data for unemployment rate prediction is presented. Under the framework, a set of data mining tools, including neural networks (NNs) and support vector regressions (SVRs), is developed to forecast the unemployment trend. In the proposed method, search engine query data related to employment activities are first extracted. Second, a feature selection model is used to reduce the dimension of the query data. Third, various NNs and SVRs are employed to model the relationship between unemployment rate data and query data, and a genetic algorithm is used to optimize the parameters and refine the features simultaneously. Fourth, an appropriate data mining method is selected as the predictor by cross-validation. Finally, the selected predictor, with the best feature subset and proper parameters, is used to forecast the unemployment trend. The empirical results show that the proposed framework clearly outperforms traditional forecasting approaches, and that support vector regression with a radial basis function (RBF) kernel is dominant for unemployment rate prediction. These findings imply that the data mining framework is efficient for unemployment rate prediction and can strengthen the government's rapid response and service capability.

12.
Context: Software development projects involve the use of a wide range of tools to produce a software artifact. Software repositories such as source control systems have become a focus of emergent research because they are a source of rich information about software development projects. Mining such repositories is becoming increasingly common, with a view to gaining a deeper understanding of the development process.
Objective: This paper explores the concept of representing a software development project as a process that results in the creation of a data stream. It also describes the extraction of metrics from the Jazz repository and the application of data stream mining techniques to identify useful metrics for predicting build success or failure.
Method: This research is a systematic study using the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift, applied with the Massive Online Analysis (MOA) tool.
Results: The results indicate that only a relatively small number of the available measures considered have any significance for predicting the outcome of a build over time. These significant measures are identified and the implications of the results discussed, particularly the relative difficulty of predicting failed builds. The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches.
Conclusion: Overall prediction accuracies of 75% have been achieved using the Hoeffding Tree classification method. Despite this high overall accuracy, failure is harder to predict than success. The emergence of a stable classification tree is limited by the lack of data, but overall the approach shows promise in terms of informing software development activities in order to minimize the chance of failure.
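The drift-detection idea can be sketched with a much simpler device than ADWIN: split a sliding window of recent build outcomes in two and compare sub-window means. ADWIN examines all split points with an adaptive statistical bound, so this fixed midpoint split and threshold are only an illustrative stand-in.

```python
from collections import deque

def drift_detected(window, threshold=0.3):
    """Flag concept drift when the means of the older and newer halves of
    the window differ by more than `threshold` (assumed value).
    Simplified stand-in for ADWIN's adaptive cut search."""
    half = len(window) // 2
    old, new = list(window)[:half], list(window)[half:]
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(old) - mean(new)) > threshold

# Stream of build outcomes (1 = success, 0 = failure): drift midway.
stream = [1, 1, 1, 1, 0, 0, 0, 0]
window = deque(stream, maxlen=8)
print(drift_detected(window))
```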

13.
We describe how statistical association models and, specifically, log-linear and graphical models can be usefully employed to study consumer behaviour. We describe some methodological problems related to the implementation of discrete graphical models for market basket analysis data. In particular, we discuss model selection procedures.

14.
Few studies attempt to model the economic feasibility of mining undiscovered mineral resources, given the sparseness of data and the coupled, nonlinear, spatial, and temporal relationships among variables. In this study, a type of unsupervised artificial neural network called a self-organizing map (SOM) is trained using data from 203 porphyry copper deposit sites across the world. The sparse data set includes one dependent variable indicating economic feasibility and seventy-two independent variables from categories describing characteristics of mining method, metallurgy, dimensions, economics, and amount. Analysis of component planes reveals relations and strengths in the underlying SOM multivariate density function, which are used to impute missing values. Application of the Davies-Bouldin criterion to k-means clusters of SOM neurons identified 14 regional economic resource units (conceptual models). A best-subsets approach applied to median values from these models identified 20 statistically significant combinations of variables. During model fitting by multiple linear regression, only four of the empirical models had variables that were all significant at the 95% confidence level. The best model explained 98% of the variability in economic feasibility and incorporated variables describing distance to natural gas, road, and water, and the total amount of resources. This model was independently validated by comparing predictions of economic feasibility at 68 mine sites not included in the training data: eighty-four percent of the reported economic feasibility is correctly predicted, with 8 false positives and 2 false negatives. We demonstrate the application of this model to a permissive copper porphyry tract that crosses a portion of the British Columbia and Yukon territories of Canada. The proposed hybrid approach provides an alternative modeling paradigm for translating estimates of contained metal into meaningful societal measures.
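The Davies-Bouldin criterion used above to choose the number of clusters can be computed directly from the standard definition: for each cluster, take the worst ratio of summed within-cluster scatters to between-centroid distance, then average. The toy 2-D points below are invented; lower values indicate better-separated clusters.

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lists of point tuples).
    Lower is better: compact, well-separated clusters score low."""
    def centroid(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))
    cents = [centroid(c) for c in clusters]
    # Mean distance of each cluster's points to its centroid (scatter).
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

separated = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
overlapping = [[(0, 0), (0, 2)], [(1, 0), (1, 2)]]
print(round(davies_bouldin(separated), 3))
print(round(davies_bouldin(overlapping), 3))
```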

15.
A data mining based approach to discovering previously unknown priority dispatching rules for the job shop scheduling problem is presented. This approach seeks the knowledge assumed to be embedded in the efficient solutions provided by an optimization module built using tabu search. The objective is to discover scheduling concepts using data mining and hence to obtain a set of rules capable of approximating the efficient solutions for a job shop scheduling problem (JSSP). A data mining based scheduling framework is presented and implemented for a job shop problem with maximum lateness as the scheduling objective.
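For the maximum-lateness objective, the kind of priority dispatching rule such a framework might recover is illustrated by the classic earliest-due-date (EDD) rule, which is known to minimize maximum lateness on a single machine. The job data and field names below are invented; this is a textbook rule, not the paper's mined rule set.

```python
def edd_schedule(jobs):
    """Earliest-due-date dispatching: sequence jobs by due date and report
    the resulting maximum lateness (completion time minus due date)."""
    order = sorted(jobs, key=lambda j: j["due"])
    t, max_late = 0, float("-inf")
    for j in order:
        t += j["proc"]                     # job completes at time t
        max_late = max(max_late, t - j["due"])
    return [j["id"] for j in order], max_late

jobs = [
    {"id": "A", "proc": 3, "due": 9},
    {"id": "B", "proc": 2, "due": 4},
    {"id": "C", "proc": 4, "due": 6},
]
print(edd_schedule(jobs))
```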

16.
17.
Traces are everywhere, from information systems that store their continuous executions to health care applications that record each patient's history. The transformation of a set of traces into a mathematical model that can be used for formal reasoning is therefore of great value. The discovery of process models from traces is an interesting problem that has received significant attention in recent years. It is a central problem in process mining, a novel area that tries to close the cycle between system design and validation by resorting to methods for the automated discovery, analysis and extension of process models. In this work, algorithms for the derivation of a Petri net from a set of traces are presented. The methods are grounded in the theory of regions, which maps a model in the state-based domain (e.g., an automaton) into a model in the event-based domain (e.g., a Petri net). When dealing with large examples, a direct application of the theory of regions suffers from two problems. The first is the state-explosion problem: the resources required by algorithms that work at the state level are sometimes prohibitive. This paper introduces decomposition and projection techniques to alleviate the complexity of the region-based algorithms for Petri net discovery, thus extending their applicability to large inputs. The second problem is known as the overfitting problem for region-based approaches, which informally means that, in order to represent the trace set with high accuracy, the models obtained are often spaghetti-like. By focusing on a special type of processes, called conservative, for which an elegant theory and efficient algorithms can be devised, the techniques presented in this paper alleviate the overfitting problem and, moreover, incorporate structure into the generated models.

18.
Neural-network-based data processing for multi-sensor fire prediction
To reduce the missed-detection and false-alarm rates of fire alarm systems, the good nonlinear mapping capability of neural networks is exploited to intelligently process data detected simultaneously by multiple sensors (a temperature sensor, a smoke sensor and a CO sensor). Simulation results show that the neural-network-based multi-sensor fire alarm system can accurately recognize various fire signals, reduces false alarms, and improves the system's immunity to interference and its adaptability to the environment.

19.
Protein thermostability information is closely linked to the commercial production of many biomaterials. Recent work has shown that amino acid composition, special sequence patterns, hydrogen bonds, disulfide bonds, salt bridges and so on are of considerable importance to thermostability. In this study, we present a system that integrates these various factors to predict protein thermostability. The features of proteins in the PGTdb are analyzed; we consider both structural and sequence features, and correlation coefficients are incorporated into the feature selection algorithm. Machine learning algorithms are then used to develop identification systems, and the performance of the different algorithms is compared. Two features, (E + F + M + R)/residue and charged/non-charged, are found to be critical to the thermostability of proteins. Although models using both sequence and structural features achieve higher accuracy, sequence-only models provide sufficient accuracy for sequence-only thermostability prediction.
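The two critical features named above can be computed directly from a sequence. Which residues count as "charged" is an assumption here (D, E, K, R, plus H, whose charge is pH-dependent); the toy sequence is invented.

```python
def thermo_features(seq):
    """Return the (E+F+M+R)/residue fraction and the charged/non-charged
    ratio for an amino acid sequence (one-letter codes).
    Charged set D,E,K,R,H is an assumption; H is only partially charged
    at physiological pH."""
    seq = seq.upper()
    efmr = sum(seq.count(a) for a in "EFMR") / len(seq)
    charged = sum(seq.count(a) for a in "DEKRH")
    return efmr, charged / (len(seq) - charged)

efmr, ratio = thermo_features("MEERFKHADE")   # invented 10-residue peptide
print(round(efmr, 2), round(ratio, 2))
```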

20.
This paper presents a hybrid approach of case-based reasoning and rule-based reasoning, as an alternative to the purely rule-based method, for building a clinical decision support system for the ICU. This enables the system to tackle problems such as high complexity, inexperienced new staff and changing medical conditions. The purely rule-based method has its limitations, since it requires explicit knowledge of the details of each ICU domain, such as the cardiac domain, and hence takes years to build a knowledge base. Case-based reasoning uses knowledge in the form of specific cases to solve a new problem, and the solution is based on the similarities between the new problem and the available cases. This paper presents a model combining case-based and rule-based reasoning that can provide clinical decision support for all ICU domains, unlike rule-based inference models, which are highly domain specific. Experiments with real ICU data as well as simulated data clearly demonstrate the efficacy of the proposed method.
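The retrieval step of case-based reasoning, finding the stored case most similar to a new problem, can be sketched as a weighted nearest-neighbour lookup. The vital-sign attributes, weights and actions below are invented for illustration and are not the paper's case representation.

```python
def retrieve(new_case, case_base, weights):
    """Return the stored case closest to the new problem under a weighted
    L1 distance over shared numeric attributes (a common CBR retrieval
    step; attribute names and weights are illustrative)."""
    def dist(a, b):
        return sum(w * abs(a[k] - b[k]) for k, w in weights.items())
    return min(case_base, key=lambda c: dist(new_case, c["features"]))

case_base = [
    {"features": {"hr": 120, "map": 60}, "action": "fluids"},
    {"features": {"hr": 80,  "map": 90}, "action": "observe"},
]
weights = {"hr": 1.0, "map": 2.0}     # assumed attribute importances

best = retrieve({"hr": 115, "map": 65}, case_base, weights)
print(best["action"])                 # the retrieved case's past solution
```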


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号