Similar literature
20 similar articles found.
1.
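Hospital clinical data are voluminous and highly interrelated, which makes data extraction prone to inaccuracy. To address this, an intelligent classification method for hospital clinical data based on fuzzy classification is proposed. The indicators of clinical operations are described and the characteristics of the data are analyzed according to these indicators; the clinical data are then retrieved and extracted and, according to their characteristics, processed with fuzzy classification to complete the intelligent classification. Experimental results show that the proposed method classifies clinical data far better than traditional methods, meets the hospital's data-processing requirements, and lays a solid foundation for future large-scale classification of hospital data.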

2.
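Over-sampling algorithm based on preliminary classification in imbalanced data set learning   Cited by: 2 (self-citations: 0; citations by others: 2)
Han Hui (韩慧), Wang Lu (王路), Wen Ming (温明), Wang Wenyuan (王文渊). 《计算机应用》 (Journal of Computer Applications), 2006, 26(8): 1894-1897
To effectively improve the classification performance of the minority class in imbalanced data sets, an over-sampling algorithm based on preliminary classification is proposed. First, the test set is preliminarily classified so as to retain as much useful information about the majority class as possible; second, the samples that the preliminary classification predicts as minority class are classified again to improve minority-class performance. Using data sets from the University of California, Irvine (UCI), the algorithm was compared experimentally with the synthetic minority over-sampling technique (SMOTE) and with under-sampling. The results show that the proposed algorithm outperforms the other two on both the minority and the majority class.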

3.
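Research status of imbalanced data classification   Cited by: 9 (self-citations: 3; citations by others: 6)
Imbalanced data are widespread in real applications and pose a challenge to the machine learning community; how to handle them effectively has become a new research hotspot. This paper surveys the state of research in this new field, including its latest research topics, methods, and results.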

4.
In practice, there are many binary classification problems, such as credit risk assessment and medical testing to determine whether a patient has a certain disease. However, different problems have different characteristics, which may lead to different difficulties. One important characteristic is the degree of imbalance between the two classes in a data set. For data sets with different degrees of imbalance, are the commonly used binary classification methods still feasible? In this study, various binary classification models, including traditional statistical methods and newer methods from artificial intelligence, such as linear regression, discriminant analysis, decision trees, neural networks, and support vector machines, are reviewed, and their performance in terms of classification accuracy and the area under the receiver operating characteristic (ROC) curve is tested and compared on fourteen data sets with different degrees of imbalance. The results help in selecting appropriate methods for problems with different degrees of imbalance.
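The two measures named in this abstract can disagree sharply on imbalanced data. The following is a minimal sketch, not taken from the paper: the toy labels, the constant-score "classifier", and the function names `accuracy` and `roc_auc` are all assumptions made for illustration. It computes the area under the ROC curve as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count one half).

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that gives every sample the same low score.
constant_scores = [0.1] * 100
y_pred = [int(s >= 0.5) for s in constant_scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, constant_scores))
```

On this toy set the degenerate model reaches high accuracy by always predicting the majority class, while its AUC stays at chance level, which is one reason to report both measures when classes are imbalanced.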

5.
Relatively few institutions have developed clinical data warehouses containing patient data from the point of care. Because of the variety of care practices, data types and definitions, and the perceived incompleteness of clinical information systems, developing a clinical data warehouse is a challenge. To meet managerial and clinical information needs, as well as the educational and research aims that are important in a university hospital, Erasmus Medical Center Rotterdam, The Netherlands, developed a data warehouse incrementally. In this paper we report on the in-house development of an integral part of the data warehouse specifically for the intensive care units (ICU-DWH). It was modeled using the Atos Origin Metadata Frame method. The paper describes the methodology, the development process and the content of the ICU-DWH, and discusses the need for (clinical) data warehouses in intensive care.

6.
Since the introduction of DNA microarray technology, there has been increasing interest in its clinical application to cancer diagnosis. However, effectively translating advances in microarray-based classification into clinical practice still faces problems related to both model performance and the biological interpretability of the results. In this paper, a novel ensemble model is proposed that integrates prior knowledge, in the form of gene sets, into the whole microarray classification process. Each gene set is used as an informed feature selection subset to train several base classifiers and estimate their accuracy. This information is later used to select the classifiers that make up the final ensemble model. The internal architecture of the proposed ensemble allows both the base classifiers and the heuristics used for classifier fusion to be replaced, which gives a high level of flexibility and makes it possible to configure and adapt the model to different contexts. Experimental results using different datasets and several gene sets show that the proposal is able to outperform classical alternatives by using existing prior knowledge adapted from publicly available databases.

7.
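Based on rough set theory, this paper analyzes how changes in the classification of the condition attributes in a decision table alter the decision subsets and the support of the decision rules. By inductive reasoning, a criterion and algorithms for selecting the best attribute classification are derived. Finally, simulation results verify the effectiveness of Algorithm 2.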

8.
A rapid hierarchical classification program enables the clustering of 5000 elements in only a few minutes of central processor time on an IBM 370/168 computer. The program algorithm, based on the reducibility axiom in graph theory, is related to the criterion of correspondence analysis. Its application to a set of hydrogeological data is described briefly.

9.
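Research on the classification of computer security vulnerabilities   Cited by: 2 (self-citations: 0; citations by others: 2)
The root cause of computer and network security problems lies in the existence of vulnerabilities. Vulnerabilities are a key factor both in carrying out network attacks and in strengthening network defenses, and their classification is the foundation of vulnerability research. This paper first defines computer security vulnerabilities and analyzes the significance of classifying them, then introduces typical classification methods and the taxonomies in common use today. On this basis, it proposes further research approaches to vulnerability classification in terms of multidimensional classification and dynamic change.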

10.
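Under-sampling-based classification algorithms for imbalanced data are a form of randomized data optimization, but they neither best reflect the distribution of the original traditional Chinese medicine (TCM) clinical data nor resolve feature redundancy in the data. A prediction-risk-based farthest-case imbalanced bagging algorithm (PRFS-FPUSAB) is proposed. The algorithm first introduces an improved sampling scheme, built on under-sampling, that reflects the original data distribution as closely as possible, and then combines ensemble learning with a prediction-risk criterion to improve classification performance on imbalanced data and to perform feature selection. Experiments on meridian electrical resistance data collected in TCM clinical practice show that the algorithm improves the area under the curve and that the selected features are consistent with the relevant TCM theory.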
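For orientation, the plain random under-sampling bagging baseline that this abstract starts from can be sketched as follows. This is not PRFS-FPUSAB: the abstract does not give the details of the farthest-case sampling or of the prediction-risk feature selection, so neither is attempted here. The nearest-centroid base learner, the label convention (1 = minority, 0 = majority), and all function names are assumptions chosen only to keep the sketch self-contained.

```python
import random


def balanced_bags(X, y, n_bags, rng):
    """Build bags holding every minority sample plus an equally sized
    random draw (without replacement) from the majority class.
    Assumes the minority class (label 1) is the smaller one."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    bags = []
    for _ in range(n_bags):
        idx = minority + rng.sample(majority, len(minority))
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


def fit_centroids(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def bagging_predict(models, x):
    """Majority vote over the base learners (use an odd number of bags
    with two classes to avoid ties)."""
    votes = [predict_centroid(m, x) for m in models]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: 20 majority points near the origin, 4 minority points near (5, 5).
X = [(i % 5 * 0.1, i // 5 * 0.1) for i in range(20)]
X += [(5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.0, 5.2)]
y = [0] * 20 + [1] * 4

bags = balanced_bags(X, y, n_bags=5, rng=random.Random(0))
models = [fit_centroids(bx, by) for bx, by in bags]
print(bagging_predict(models, (4.8, 5.0)))
```

Because each bag draws its majority samples at random, the bags may miss parts of the majority distribution, which is the weakness the abstract attributes to this kind of sampling.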

11.
Classification of medical data raises several problems, such as class imbalance, the double meaning of missing data, data volume, and the need for highly interpretable results. In this paper a new algorithm is proposed: MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), a multi-objective local search algorithm designed to deal with all of these issues together. It is based on a new modeling of the task as a Pittsburgh multi-objective partial classification rule mining problem, which is described in the first part of this paper. An existing dominance-based multi-objective local search (DMLS) is modified to handle this model. After experimentally tuning the parameters of MOCA-I and determining which version of the DMLS algorithm is the most effective, the resulting MOCA-I version is compared to several state-of-the-art classification algorithms. This comparison is carried out on 10 small and medium-sized data sets from the literature and 2 real data sets; MOCA-I obtains the best results on the 10 data sets and is statistically better than other approaches on the real data sets.

12.
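Data in a data set are often interrelated, and the data as a whole exhibit a particular pattern structure. Traditional classification methods (such as support vector machines) ignore this relational information: they build the classification model solely from physical features of the data (such as distance and similarity) and, at classification time, predict the label of a test sample by computing its similarity to the learned model. To overcome this reliance on a single kind of information, a hybrid data classification method that mines pattern-structure information is proposed. The method fuses two different types of classification techniques: a traditional method that uses only the physical features of the data serves as the ordinary classifier, and a method that exploits the pattern structure of the data serves as the high-level classifier. In particular, the method not only identifies pattern-structure information effectively to improve classification performance but also improves the generalization ability of traditional classification methods. Extensive experiments on synthetic data sets and real UCI data sets demonstrate the effectiveness of the hybrid method, whose classification performance exceeds that of traditional methods.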

13.
Automatic intensity-based tissue classification places requirements on the quality of multispectral magnetic resonance (MR) images. This study presents tests for evaluating the separability of tissue classes and the class distances required to obtain reliable classification. Intraslice, interslice and interpatient training schemes for 5-NN classification were considered. Interslice training was used to classify images from 10 patients with ischemic stroke, giving results of satisfactory but highly variable quality. Based on the experience with these data sets, similar tests are recommended before imaging a large patient series in order to avoid extra manual work and to obtain reliable classification results.

14.
The significance of the preprocessing stage in any data mining task is well known. Before attempting medical data classification, characteristics of medical datasets, including noise, incompleteness, and the existence of multiple and possibly irrelevant features, need to be addressed. In this paper, we show that selecting the right combination of preprocessing methods has a considerable impact on the classification potential of a dataset. The preprocessing operations considered include the discretization of numeric attributes, the selection of attribute subset(s), and the handling of missing values. The classification is performed by an ant colony optimization algorithm as a case study. Experimental results on 25 real-world medical datasets show that a significant relative improvement in predictive accuracy, exceeding 60% in some cases, is obtained.

15.
Support for capturing architectural knowledge has been identified as an important research challenge. As the basis for an approach to recovering design decisions and capturing their rationale, we performed an expert survey in practice to gain insights into the different kinds, influence factors, and sources for design decisions and also into how they are currently captured in practice. The survey was conducted with 25 software architects, software team leads, and senior developers from 22 different companies in 10 different countries with more than 13 years of experience in software development on average. The survey confirms earlier work by other authors on design decision classification and influence factors, and also identifies additional kinds of decisions and influence factors not mentioned in previous work. In addition, we gained insight into the practice of capturing, the relative importance of different decisions and influence factors, and into potential sources for recovering decisions.

16.
This study evaluated the synergistic use of high spatial resolution multispectral imagery (i.e., QuickBird, 2.4 m) and low-posting-density LIDAR data (3 m) for forest species classification using an object-based approach. The integration of QuickBird multispectral imagery and LIDAR data was considered during image segmentation and the subsequent object-based classification. Three segmentation schemes were examined: (1) segmentation based solely on the spectral image layers; (2) segmentation based solely on LIDAR-derived layers; and (3) segmentation based on both the spectral and LIDAR-derived layers. For each segmentation scheme, objects were generated at twelve different scales in order to determine optimal scale parameters. Six categories of classification metrics were generated for each object based on spectral data alone, LIDAR data alone and the combination of both data sources. Machine learning decision trees were used to build classification rule sets. Quantitative segmentation quality assessment and classification accuracy results showed the integration of spectral and LIDAR data, in both image segmentation and object-based classification, improved the forest classification compared to using either data source independently. Better segmentation quality led to higher classification accuracy. The highest classification accuracy (Kappa = 91.6%) was acquired when using both spectral- and LIDAR-derived metrics based on objects segmented from both spectral and LIDAR layers at scale parameter 250, where best segmentation quality was achieved. Optimal scales were analyzed for each segmentation-classification scheme. Statistical analysis of classification accuracies at different scales revealed that there was a range of optimal scales that provided statistically similar accuracy.
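The Kappa figure quoted in this abstract is presumably Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals of the confusion matrix. A minimal sketch of that computation follows; the function name `cohen_kappa` and the 2x2 example matrix are hypothetical and are not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement implied by the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Undefined (division by zero) when expected == 1, e.g. a single class.
    return (observed - expected) / (1 - expected)


# Hypothetical two-class example.
example = [[45, 5],
           [10, 40]]
print(round(cohen_kappa(example), 3))
```

Unlike raw accuracy, kappa discounts the agreement that the class proportions alone would produce, so it is a stricter summary when some classes dominate the map.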

17.
A novel, language-based interface to the specification of multivariate volume classification and shading algorithms has been implemented. The system facilitates experimentation by providing access to data relevant to volume classification and shading (scalars, gradients, and gradient magnitudes) in a C-like language environment. The user writes code to calculate opacity and colour on a per voxel basis. The code is interpreted and compiled in a transparent fashion and then executed on a volume data set. The output is a volume primitive suitable for input to standard volume rendering algorithms.

18.
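Cost-sensitive learning is an important strategy for classifying imbalanced data, and nonlinearity in the data features adds further difficulty. To address this, a cost-sensitive Stacking ensemble algorithm, KPCA-Stacking, is proposed by combining cost-sensitive learning with kernel principal component analysis (KPCA). First, the original data set is over-sampled with the adaptive synthetic sampling method (ADASYN) and reduced in dimension with KPCA; second, KNN, LD...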

19.
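KNN algorithm for classification of imbalanced data sets   Cited by: 1 (self-citations: 0; citations by others: 1)
To address the low minority-class accuracy of KNN on imbalanced data sets, an improved algorithm, G-KNN, is proposed. The algorithm applies crossover and mutation operators to minority-class samples to generate new minority-class samples. A newly generated sample is considered valid if its Euclidean distance to the parent samples is smaller than the maximum distance between the parent minority-class samples, and valid samples are added to the next round of minority-sample generation. Tests on UCI data sets show that, compared with applying random sampling in KNN, the method achieves better classification accuracy on the minority class.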
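The generation step described in this abstract can be sketched roughly as below. The abstract does not specify the operators, so several choices here are assumptions and not the authors' method: arithmetic crossover (a random convex combination of two parents), Gaussian mutation of one coordinate, and a validity test that requires the child to be closer to both of its parents than the largest pairwise distance in the current minority pool. The function name `generate_minority` and its parameters are hypothetical.

```python
import math
import random
from itertools import combinations


def generate_minority(minority, n_rounds, per_round, rng, mutation_scale=0.1):
    """Return synthetic minority samples produced over several rounds.
    Valid children join the pool and can become parents in later rounds."""
    pool = [tuple(p) for p in minority]
    for _ in range(n_rounds):
        # Largest distance between any two samples in the current pool.
        max_dist = max(math.dist(a, b) for a, b in combinations(pool, 2))
        accepted = []
        for _ in range(per_round):
            p1, p2 = rng.sample(pool, 2)
            # Crossover (assumed): random convex combination of the parents.
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            # Mutation (assumed): Gaussian noise on one random coordinate.
            j = rng.randrange(len(child))
            child[j] += rng.gauss(0.0, mutation_scale)
            child = tuple(child)
            # Validity test: child must stay close to both parents.
            if max(math.dist(child, p1), math.dist(child, p2)) < max_dist:
                accepted.append(child)
        pool.extend(accepted)
    return pool[len(minority):]


# Hypothetical minority samples in two dimensions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = generate_minority(minority, n_rounds=2, per_round=10,
                              rng=random.Random(0))
print(len(synthetic), "synthetic samples accepted")
```

The synthetic samples would then be added to the training set before running an ordinary KNN classifier; that final step is omitted here.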

20.
US hospitals now fully embrace electronic documentation systems as a way to reduce medical errors and improve patient safety outcomes. Whether spending time on electronic documentation detracts from the time available for direct patient care, however, is still unresolved. There is no knowledge of the permanent effects of documenting electronically, or of whether it takes significant time away from patient care once the healthcare information system is mature. To understand the time spent on documentation, direct patient care tasks, and other clinical tasks in a mature information system, we conducted an observational and interview study in a midwestern academic hospital. The hospital implemented an electronic medical record system 11 years ago. We observed 22 health care workers across intensive care units, inpatient floors, and an outpatient clinic in the hospital. Results show that healthcare workers spend more time on documentation activities than on patient care activities. Clinical roles have no influence on the time spent on documentation. This paper describes results on how time is divided between documentation and patient care tasks, and discusses implications for future practice. Relevance to industry: The study applies to the healthcare industry, which faces immense challenges in balancing documentation activities and patient care activities.
