首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
从医疗社交网站的用户评论中挖掘药物副作用时,由于人们可能采用不同的表述方式来描述副作用,而新药的上市与用药者的差异性也会造成新的副作用出现,因此从评论中识别新的副作用名称并进行标准化十分重要。该文利用条件随机场模型识别评论中的副作用,对识别出的副作用名称进行标准化,最后得到药物的副作用。通过将挖掘出的药物已知的副作用与数据库记录进行对比验证了本文方法的有效性,同时得到一个按评论中的发生频率排序的药物潜在副作用列表。实验结果显示,条件随机场模型可以识别出已知的与新的副作用名称,而标准化技术将副作用名称进行聚合与归并,有利于药物副作用的发现。
  相似文献   

2.
Temporal data mining is still one of important research topic since there are application areas that need knowledge from temporal data such as sequential patterns, similar time sequences, cyclic and temporal association rules, and so on. Although there are many studies for temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs etc. We propose a new temporal data mining technique that can extract temporal interval relation rules from temporal interval data by using Allen’s theory: a preprocessing algorithm designed for the generalization of temporal interval data and a temporal relation algorithm for mining temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge in comparison with conventional data mining techniques.  相似文献   

3.
Utility of an itemset is considered as the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets such that the execution time can be reduced substantially in mining all high utility itemsets in data streams. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods like Two-Phase algorithm under various experimental conditions.  相似文献   

4.
The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was a 43% lower than in previous ECG clustering schemes.  相似文献   

5.
Discovering unknown adverse drug reactions (ADRs) in postmarketing surveillance as early as possible is highly desirable. Nevertheless, current postmarketing surveillance methods largely rely on spontaneous reports that suffer from serious underreporting, latency, and inconsistent reporting. Thus these methods are not ideal for rapidly identifying rare ADRs. The multiagent systems paradigm is an emerging and effective approach to tackling distributed problems, especially when data sources and knowledge are geographically located in different places and coordination and collaboration are necessary for decision making. In this article, we propose an active, multiagent framework for early detection of ADRs by utilizing electronic patient data distributed across many different sources and locations. In this framework, intelligent agents assist a team of experts based on the well‐known human decision‐making model called Recognition‐Primed Decision (RPD). We generalize the RPD model to a fuzzy RPD model and utilize fuzzy logic technology to not only represent, interpret, and compute imprecise and subjective cues that are commonly encountered in the ADR problem but also to retrieve prior experiences by evaluating the extent of matching between the current situation and a past experience. We describe our preliminary multiagent system design and illustrate its potential benefits for assisting expert teams in early detection of previously unknown ADRs. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 827–845, 2007.  相似文献   

6.
Many data sets contain temporal records which span a long period of time; each record is associated with a time stamp and describes some aspects of a real-world entity at a particular time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time and so be able to perform interesting longitudinal data analysis. However, existing record linkage techniques ignore temporal information and fall short for temporal data. This article studies linking temporal records. First, we apply time decay to capture the effect of elapsed time on entity value evolution. Second, instead of comparing each pair of records locally, we propose clustering methods that consider the time order of the records and make global decisions. Experimental results show that our algorithms significantly outperform traditional linkage methods on various temporal data sets.  相似文献   

7.
电子病历文本挖掘研究综述   总被引:1,自引:0,他引:1  
电子病历是医院信息化发展的产物,其中包含了丰富的医疗信息和临床知识,是辅助临床决策和药物挖掘等的重要资源.因此,如何高效地挖掘大量电子病历数据中的信息是一个重要的研究课题.近些年来,随着计算机技术尤其是机器学习以及深度学习的蓬勃发展,对电子病历这一特殊领域数据的挖掘有了更高的要求.电子病历综述旨在通过对电子病历研究现状...  相似文献   

8.
9.
数据挖掘技术为商业银行信用风险管理问题提供了新的思路和方法。本文运用三种常用的数据挖掘方法——多元判别分析、聚类分析及贝叶斯网络模型,以商业银行的客户信用风险评级指标数据为样本,对信用风险评估方法进行实证分析,对三种方法的验证结果进行比较。结论表明,在信用风险各项属性指标之间条件相互依赖的情况下,贝叶斯网络模型优于其它两种方法。  相似文献   

10.
A large volume of research in temporal data mining is focusing on discovering temporal rules from time-stamped data. The majority of the methods proposed so far have been mainly devoted to the mining of temporal rules which describe relationships between data sequences or instantaneous events and do not consider the presence of complex temporal patterns into the dataset. Such complex patterns, such as trends or up and down behaviors, are often very interesting for the users. In this paper we propose a new kind of temporal association rule and the related extraction algorithm; the learned rules involve complex temporal patterns in both their antecedent and consequent. Within our proposed approach, the user defines a set of complex patterns of interest that constitute the basis for the construction of the temporal rule; such complex patterns are represented and retrieved in the data through the formalism of knowledge-based Temporal Abstractions. An Apriori-like algorithm looks then for meaningful temporal relationships (in particular, precedence temporal relationships) among the complex patterns of interest. The paper presents the results obtained by the rule extraction algorithm on a simulated dataset and on two different datasets related to biomedical applications: the first one concerns the analysis of time series coming from the monitoring of different clinical variables during hemodialysis sessions, while the other one deals with the biological problem of inferring relationships between genes from DNA microarray data.  相似文献   

11.
时序不变式反映了事件间的时序逻辑关系,被广泛应用于异常检测、系统行为理解、模型推理等技术.在实际使用中,一般通过分析软件系统的日志数据挖掘时序不变式.相比全序日志,偏序日志可为挖掘算法提供更为准确的数据来源.但是,现有的基于偏序日志的时序不变式挖掘方法存在效率较低等问题.为此,以系统执行路径为数据来源,提出了一种基于集...  相似文献   

12.
高效隐私保护频繁模式挖掘算法研究   总被引:1,自引:0,他引:1  
阐述了隐私保护数据挖掘的目标,即在获取有效的数据挖掘结果的同时,满足用户对隐私保护的要求.针对个体用户及组织用户的隐私保护,论述了不同的方法,并归纳出隐私保护数据挖掘中所采用的两种主流算法.改进了高效隐私保护关联规则挖掘算法(EMASK)中需要完全的数据库扫描并且进行多次比较操作的弊端,提出了基于粒度计算的高效隐私保护频繁模式挖掘算法(BEMASK).该算法将关系数据表转换成面向机器的关系模型,数据处理被转换成粒度计算的方式,计算频繁项集变成了计算基本颗粒的交集.特别是数据的垂直Bitmap表示,在保证准确性不降低的情况下,一方面减少了I/O操作的次数,另一方面较大地提高了效率.  相似文献   

13.
Mining frequent arrangements of temporal intervals   总被引:3,自引:3,他引:0  
The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time-interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Three efficient methods to find frequent arrangements of temporal intervals are described; the first two are tree-based and use breadth and depth first search to mine the set of frequent arrangements, whereas the third one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.  相似文献   

14.
Various data mining methods have been developed last few years for hepatitis study using a large temporal and relational database given to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and relations between events in viral hepatitis such as “event A slightly happened before event B and B simultaneously ended with event C”. We developed algorithms to first detect significant temporal patterns in temporal sequences and then to identify temporal relations between these temporal patterns. Many findings by data mining methods applied to transactions/graphs of temporal relations shown to be significant by physician evaluation and matching with published in Medline.  相似文献   

15.
In medical information system, the data that describe patient health records are often time stamped. These data are liable to complexities such as missing data, observations at irregular time intervals and large attribute set. Due to these complexities, mining in clinical time-series data, remains a challenging area of research. This paper proposes a bio-statistical mining framework, named statistical tolerance rough set induced decision tree (STRiD), which handles these complexities and builds an effective classification model. The constructed model is used in developing a clinical decision support system (CDSS) to assist the physician in clinical diagnosis. The STRiD framework provides the following functionalities namely temporal pre-processing, attribute selection and classification. In temporal pre-processing, an enhanced fuzzy-inference based double exponential smoothing method is presented to impute the missing values and to derive the temporal patterns for each attribute. In attribute selection, relevant attributes are selected using the tolerance rough set. A classification model is constructed with the selected attributes using temporal pattern induced decision tree classifier. For experimentation, this work uses clinical time series datasets of hepatitis and thrombosis patients. The constructed classification model has proven the effectiveness of the proposed framework with a classification accuracy of 91.5% for hepatitis and 90.65% for thrombosis.  相似文献   

16.
本文主要研究深度学习在抗菌药物使用方法分类及数据挖掘应用,在现有的疾病和电子病历抗菌药物使用方法的文本数据挖掘过程中,利用基于注意力机制的长短期记忆网络模型训练抗菌药物语料数据,通过自我学习特征的方式表示和理解问题,避免人工特征的提取误差,使分类的准确率最大值较传统数据挖掘方法提高至89.97%,从而更好地为不同疾病患者提供相应的抗菌药物治疗方案.根据实验结果,该方法在不需要人工制定特征规则的条件下,可以自主学习生成治疗方案知识库,从而为医生治疗患者提供最佳的辅助决策支持.  相似文献   

17.
In data mining and knowledge discovery, there are two conflicting goals: privacy protection and knowledge preservation. On the one hand, we anonymize data to protect privacy; on the other hand, we allow miners to discover useful knowledge from anonymized data. In this paper, we present an anonymization method which provides both privacy protection and knowledge preservation. Unlike most anonymization methods, where data are generalized or permuted, our method anonymizes data by randomly breaking links among attribute values in records. By data randomization, our method maintains statistical relations among data to preserve knowledge, whereas in most anonymization methods, knowledge is lost. Thus the data anonymized by our method maintains useful knowledge for statistical study. Furthermore, we propose an enhanced algorithm for extra privacy protection to tackle the situation where the user’s prior knowledge of original data may cause privacy leakage. The privacy levels and the accuracy of knowledge preservation of our method, along with their relations to the parameters in the method are analyzed. Experiment results demonstrate that our method is effective on both privacy protection and knowledge preservation comparing with existing methods.  相似文献   

18.
In this paper we describe the application of data mining methods for predicting the evolution of patients in an intensive care unit. We discuss the importance of such methods for health care and other application domains of engineering. We argue that this problem is an important but challenging one for the current state of the art data mining methods and explain what improvements on current methods would be useful. We present a promising study on a preliminary data set that demonstrates some of the possibilities in this area.  相似文献   

19.
20.
Association rule mining is an important topic in data mining. The problem is to discover all (or almost all) associations among items in the transaction database that satisfy some user-specified constraints. Usually, the constraints are related to minimal support and minimal confidence. Class association rules (CARs) are a special type of association rules that can be applied for classification problem. Previous research showed that classification based on association rules has higher accuracy than can be achieved with an inductive learning algorithm or C4.5. As such, many methods have been proposed for mining CARs, although these use batch processing. However, datasets are often changed, with records added or/and deleted, and consequently updating CARs is a challenging problem. This paper proposes an efficient method for updating CARs when records are deleted. First, we use an MECR-tree to store nodes for the original dataset. The information in the nodes of this tree are updated based on the deleted records. Second, the concept of pre-large itemsets is used to avoid rescanning the original dataset. Finally, we propose an algorithm to efficiently update and generate CARs. We also analyze the time complexity to show the efficiency of our proposed algorithm. The experimental results show that the proposed method outperforms mining CARs from the dataset after record deletion.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号