Similar Documents
20 similar documents found.
1.

Similar to many other professions, the medical field has undergone immense automation during the past decade. The complexity and growth of healthcare data have led to a surge in artificial intelligence applications, yet such applications still lack the accuracy and efficiency that healthcare problems demand. To address this issue, this study presents an automatic healthcare system that can effectively substitute for a doctor at the initial stage of diagnosis and save time by recommending the necessary precautions. The proposed approach comprises two modules. Module-1 trains machine learning models on a disease dataset containing symptoms and their corresponding precautions, with preprocessing and feature extraction performed as prerequisite steps; the algorithms applied include support vector machine, random forest, extra trees classifier, logistic regression, multinomial naive Bayes, and decision tree. Module-2 interacts with the user (patient), who describes the illness symptoms through a microphone; the voice data are transcribed to text using the Google speech recognizer, and the transcribed text is passed to the trained model to predict the disease and recommend precautions. The proposed approach achieves an accuracy of 99.9% in real-time evaluation.
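A minimal Python sketch of the two-module pipeline described in this abstract, assuming a hypothetical `disease_symptoms.csv` with free-text `symptoms` and `disease` columns; the speech step uses the third-party `speech_recognition` package's Google recognizer (requires a microphone and the PyAudio backend), and the classifier choice is illustrative:

```python
import pandas as pd
import speech_recognition as sr
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Module 1: train a classifier on a symptoms -> disease table (layout assumed).
df = pd.read_csv("disease_symptoms.csv")                 # hypothetical file
model = make_pipeline(CountVectorizer(), RandomForestClassifier())
model.fit(df["symptoms"], df["disease"])

# Module 2: capture spoken symptoms, transcribe, and predict.
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
text = recognizer.recognize_google(audio)                # speech -> text
print("Predicted disease:", model.predict([text])[0])
```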


2.
Heterogeneous performance prediction models are valuable tools to accurately predict application runtime, allowing for efficient design space exploration and application mapping. Existing performance models require intricate knowledge of the system architecture, making the modeling task difficult. In this research, we propose a regression-based performance prediction framework for general-purpose graphics processing unit (GPGPU) clusters that statistically abstracts the system architecture characteristics, enabling performance prediction without detailed system architecture knowledge. The regression-based framework targets deterministic synchronous iterative algorithms using our synchronous iterative GPGPU execution model and is broken into two components: the computation component, which models the GPGPU device and host computations, and the communication component, which models the network-level communications. The computation-component regression models use algorithm characteristics such as the number of floating-point operations and total bytes as predictor variables and are trained using several small, instrumented executions of synchronous iterative algorithms that span a range of floating-point-operations-to-byte requirements. The regression models for network-level communications are developed using micro-benchmarks and employ data transfer size and processor count as predictor variables. Our performance prediction framework achieves prediction accuracy over 90% compared with the actual implementations for several tested GPGPU cluster configurations. The end goal of this research is to offer the scientific computing community an accurate and easy-to-use performance prediction framework that empowers users to make optimal use of heterogeneous resources. Copyright © 2013 John Wiley & Sons, Ltd.
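A minimal sketch of the computation-component regression idea, assuming toy numbers: kernel runtime is regressed on floating-point operation count and total bytes, the two predictor variables named above (the data and any real feature engineering are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [floating-point operations, total bytes transferred] (toy values)
X = np.array([[1e9, 4e8], [2e9, 6e8], [4e9, 1.2e9], [8e9, 2.0e9]])
y = np.array([0.011, 0.019, 0.037, 0.071])   # measured runtimes in seconds

model = LinearRegression().fit(X, y)
print(model.predict([[5e9, 1.5e9]]))         # runtime for an unseen kernel
```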

3.
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aimed at preventing delayed diagnosis and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy, and other measurable parameters. The robust methods applied to PD diagnosis include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at the 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally demonstrates the best results with supportive statistical inference.

4.
Some medical and epidemiological surveys are designed to predict a nominal response variable with several levels. With regard to the type of pregnancy, there are four possible states: wanted, unwanted by the wife, unwanted by the husband, and unwanted by the couple. In this paper, we predict the type of pregnancy, as well as the factors influencing it, using two different models and comparing them. Given the multi-level type-of-pregnancy response, we developed a multinomial logistic regression model and a neural network on the data and compared their results using three statistical indices: sensitivity, specificity, and the kappa coefficient. On all three indices, the neural network proved a better fit for prediction on these data than multinomial logistic regression. When the relations among variables are complex, neural networks can be used instead of multinomial logistic regression to predict nominal response variables with several levels and obtain more accurate predictions.
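A hedged sketch of the comparison described here, using scikit-learn's multinomial logistic regression and a small neural network scored by the kappa coefficient; the iris data merely stands in for the four-level pregnancy outcome:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris   # stand-in multi-class data

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit both models and compare agreement with the truth via kappa.
for model in (LogisticRegression(max_iter=1000),
              MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, cohen_kappa_score(y_te, model.predict(X_te)))
```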

5.
Clinical decision support systems (CDSSs) have the potential to save lives and reduce unnecessary costs through early detection and frequent monitoring of both traditional risk factors and novel biomarkers for cardiovascular disease (CVD). However, the widespread adoption of CDSSs for the identification of heart diseases has been limited, likely due to the poor interpretability of clinically relevant results and the lack of seamless integration between measurements and disease predictions. In this paper, we present the Cardiac ScoreCard, a multivariate index assay system with the potential to assist in the diagnosis and prognosis of a spectrum of CVD. The Cardiac ScoreCard system is based on lasso logistic regression techniques that utilize both patient demographics and novel biomarker data for the prediction of heart failure (HF) and cardiac wellness. Lasso logistic regression models were trained on a merged clinical dataset comprising 579 patients with 6 traditional risk factors and 14 biomarker measurements. The prediction performance of the Cardiac ScoreCard was assessed with 5-fold cross-validation and compared with reference methods. The experimental results reveal that the ScoreCard models improved performance in discriminating disease versus non-disease cases (AUC = 0.8403 and 0.9412 for cardiac wellness and HF, respectively), and the models exhibit good calibration. Clinical insights into the prediction of HF and cardiac wellness are provided in the form of logistic regression coefficients, which suggest that augmenting the traditional risk factors with a multimarker panel spanning a diverse cardiovascular pathophysiology improves performance over reference methods. Additionally, a framework is provided for seamless integration with biomarker measurements from point-of-care medical microdevices, and a lasso-based feature selection process is described for the down-selection of biomarkers in multimarker panels.
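A minimal sketch of the lasso-logistic core, assuming synthetic data in place of the 579-patient clinical dataset: an L1-penalized logistic regression scored with 5-fold cross-validated AUC, whose nonzero coefficients play the role of the lasso-based biomarker down-selection:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# 20 synthetic features stand in for the 6 risk factors + 14 biomarkers.
X, y = make_classification(n_samples=579, n_features=20, random_state=0)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
print("5-fold CV AUC:",
      cross_val_score(lasso, X, y, cv=5, scoring="roc_auc").mean())

lasso.fit(X, y)
print("selected features:", np.flatnonzero(lasso.coef_))  # nonzero = retained
```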

6.
The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation to the Fisher scoring algorithm that is more robust to the choice of initial parameter values. We then use a novel approach, based on the minimum message length criterion, to estimate the optimal number of components. Moreover, we consider a generalization of the multinomial model obtained by introducing the Dirichlet distribution as a prior, yielding the Dirichlet Compound Multinomial (DCM). Even though the DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density function usually leads to undesired complications. In this article, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using the minorization-maximization framework. To evaluate and compare the performance of our proposed models, we consider three challenging real-world applications involving high-dimensional count vectors: sentiment analysis, facial expression recognition, and human action recognition. The results show that the proposed algorithms remarkably increase the clustering efficiency of their respective models, and the best results are achieved by the second parametrization of the DCM, which can accommodate over-dispersed count data.
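For illustration, a compact EM routine for a plain finite mixture of multinomials, the baseline model this abstract generalizes; it is not the DCM minorization-maximization update, and the count data are synthetic:

```python
import numpy as np

def multinomial_mixture_em(X, K, iters=100, seed=0):
    """EM for a finite mixture of multinomials (toy stand-in for the DCM
    minorization-maximization updates described above)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(d), size=K)      # K x d event probabilities
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        log_r = np.log(pi) + X @ np.log(theta).T   # n x K
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for weights and event probabilities.
        pi = r.mean(axis=0)
        theta = r.T @ X + 1e-9
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, r

rng = np.random.default_rng(1)
X = rng.poisson(3.0, size=(300, 20))               # toy count vectors
pi, theta, r = multinomial_mixture_em(X, K=3)
print("cluster sizes:", np.bincount(r.argmax(axis=1), minlength=3))
```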

7.
Several new estimators of the marginal likelihood for complex non-Gaussian models are developed. These estimators make use of the output of auxiliary mixture sampling for count data and for binary and multinomial data. One of these estimators is based on combining Chib's estimator with data augmentation as in auxiliary mixture sampling, while the other estimators are importance sampling and bridge sampling based on constructing an unsupervised importance density from the output of auxiliary mixture sampling. These estimators are applied to a logit regression model, to a Poisson regression model, to a binomial model with random intercept, as well as to state space modeling of count data.
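A toy illustration of the importance-sampling idea behind these estimators, on a conjugate Gaussian model rather than the auxiliary-mixture-sampling models of the paper (the posterior, known exactly here, serves as the importance density):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=50)                  # toy data

# Model: y_i ~ N(mu, 1), prior mu ~ N(0, 1); the exact Gaussian posterior
# is used as the importance density q.
n = len(y)
post_var = 1.0 / (n + 1.0)
post_mean = post_var * y.sum()
q = stats.norm(post_mean, np.sqrt(post_var))

mu = q.rvs(size=20000, random_state=1)
log_w = (stats.norm(mu[:, None], 1.0).logpdf(y).sum(axis=1)   # likelihood
         + stats.norm(0, 1).logpdf(mu)                        # prior
         - q.logpdf(mu))                                      # proposal
# Log-sum-exp average of the weights estimates the marginal likelihood.
log_ml = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print("log marginal likelihood ~", log_ml)
```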

8.
Monitoring gene expression profiles is a novel approach to cancer diagnosis. Several studies have shown that sparse logistic regression is a useful classification method for gene expression data. Not only does it give a sparse solution with high accuracy, it also provides the user with explicit classification probabilities in addition to the class information. However, its optimal extension to more than two classes is not obvious. In this paper, we propose a multiclass extension of sparse logistic regression. Analysis of five publicly available gene expression data sets shows that the proposed method outperforms the standard multinomial logistic model in both prediction accuracy and gene selectivity.

9.
Cancer class prediction and discovery can improve on imperfect, non-automated cancer diagnoses, which affect patients' cancer treatment. Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling automatic, precise, and early diagnosis. A promising application of SAGE gene expression data is the classification of cancers. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model, and the normalized multinomial model) for SAGE gene expression profiles. The event-model-based methods are compared with the standard Naïve Bayes method, and both binary and multicategory classification are investigated. Experimental results on several SAGE datasets show that the event models are generally better than standard Naïve Bayes. Normalized Information Gain (NIG), an extension of Information Gain (IG), is proposed for gene selection, and the impact of gene correlation on classification performance is investigated.
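A hedged sketch of the two main event models as available in scikit-learn, with random counts standing in for SAGE tag profiles (BernoulliNB binarizes the counts to presence/absence by default, matching the multivariate Bernoulli event model):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(200, 50))      # toy tag counts per gene
y = rng.integers(0, 2, size=200)          # toy cancer vs. normal labels

# Bernoulli event model (presence/absence) vs. multinomial event model (counts)
for nb in (BernoulliNB(), MultinomialNB()):
    print(type(nb).__name__, cross_val_score(nb, X, y, cv=5).mean())
```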

10.
The availability of large amounts of medical data creates a need for intelligent disease prediction and analysis tools to extract hidden information. A large number of data mining and statistical analysis tools are used for disease prediction, and single data-mining techniques show an acceptable level of accuracy for heart disease diagnosis. This article focuses on the prediction and analysis of heart disease using a weighted vote-based classifier ensemble technique. The proposed ensemble model overcomes the limitations of conventional data-mining techniques by employing an ensemble of five heterogeneous classifiers: naive Bayes, a decision tree based on the Gini index, a decision tree based on information gain, an instance-based learner, and support vector machines. We used five benchmark heart disease data sets taken from the UCI repository, each containing a different feature space for the prediction of heart disease. The effectiveness of the proposed ensemble classifier is investigated by comparing its performance with the techniques of other researchers. Tenfold cross-validation is used to handle the class imbalance problem, and confusion matrices and analysis-of-variance statistics are used to report the prediction results of all classifiers. The experimental results verify that the proposed ensemble classifier can deal with all types of attributes, achieving a high diagnosis accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F-measure of 82.17%. An F-ratio higher than the critical value and a p-value below 0.01 at the 95% confidence level indicate that the results are statistically significant for all the data sets.
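A minimal sketch of a weighted-vote heterogeneous ensemble in the spirit of this abstract, using scikit-learn's VotingClassifier; the five base learners mirror those listed above (k-NN standing in for the instance-based learner), while the weights and dataset are illustrative assumptions:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer   # stand-in medical data

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("gini", DecisionTreeClassifier(criterion="gini")),
                ("ig", DecisionTreeClassifier(criterion="entropy")),
                ("ibl", KNeighborsClassifier()),
                ("svm", SVC(probability=True))],   # needed for soft voting
    voting="soft", weights=[1, 1, 1, 1, 2])        # illustrative weights
print(cross_val_score(ensemble, X, y, cv=10).mean())
```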

11.
To improve the prediction accuracy of civil aircraft engine performance parameters, this paper proposes a prediction method based on fuzzy inference and the XGBoost algorithm. An overall performance analysis of the engine identifies throttle lever position, pressure altitude, total air temperature, gross weight, Mach number, and flight phase as the main factors affecting the engine performance parameters. Fuzzy inference is then applied to Quick Access Recorder (QAR) data to partition the vertical flight phases, eliminating the subjective influence that manual partitioning of the training data has on prediction accuracy. Finally, an XGBoost prediction model is built for each engine performance parameter and compared experimentally with several other prediction models. The results show that, for predicting engine N1 and fuel flow, the XGBoost model achieves higher accuracy than support vector regression (SVR), a linear regression model, and a BP neural network, and requires no scaling of the training data.
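A hedged sketch of the parameter-prediction step, assuming synthetic stand-ins for the QAR features named above; `xgboost`'s scikit-learn wrapper is used, and, as the abstract notes, tree models need no feature scaling:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# The six influencing factors named in the abstract (toy values).
features = ["throttle", "pressure_alt", "total_temp",
            "gross_weight", "mach", "flight_phase"]
X = rng.random((500, len(features)))
y = X @ rng.random(len(features)) + rng.normal(0, 0.05, 500)  # toy fuel flow

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:3]))   # predicted engine parameter for new records
```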

12.
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these “less thans” is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards.
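An illustrative, single-detection-limit reduction of ROS, assuming toy concentrations: detected values are regressed on normal quantiles of their plotting positions, and censored values are imputed from the fitted line (the tools described above additionally handle multiple detection limits):

```python
import numpy as np
from scipy import stats

# Sorted concentrations; 0.5 is the detection limit (DL) for censored values.
conc = np.array([0.5, 0.5, 0.5, 0.8, 1.2, 1.9, 2.6, 4.1])
censored = np.array([True, True, True, False, False, False, False, False])

n = len(conc)
pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)   # Blom plotting positions
z = stats.norm.ppf(pp)                            # normal quantiles

# Fit log-concentration vs. quantile on detected values only.
slope, intercept, *_ = stats.linregress(z[~censored], np.log(conc[~censored]))
imputed = np.exp(intercept + slope * z[censored])  # values below the DL

full = np.concatenate([imputed, conc[~censored]])
print("ROS mean estimate:", full.mean())
```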

13.
Context: Effort-aware models, e.g., effort-aware bug prediction models, aim to help practitioners identify and prioritize buggy software locations according to the effort involved with fixing the bugs. Since the effort of current bugs is not yet known and the effort of past bugs is typically not explicitly recorded, effort-aware bug prediction models are forced to use approximations, such as the number of lines of code (LOC) of the predicted files. Objective: Although the choice of these approximations is critical for the performance of the prediction models, there is no empirical evidence on whether LOC is actually a good approximation. Therefore, in this paper, we investigate the question: is LOC a good measure of effort for use in effort-aware models? Method: We perform an empirical study on four open source projects, for which we obtain explicitly-recorded effort data, and compare the use of LOC to various complexity, size and churn metrics as measures of effort. Results: We find that using a combination of complexity, size and churn metrics is a better measure of effort than using LOC alone. Furthermore, we examine the impact of our findings on previous effort-aware bug prediction work and find that using LOC as a measure of effort does not significantly affect the list of files being flagged; however, using LOC under-estimates the amount of effort required compared to our best effort predictor by approximately 66%. Conclusion: Studies using effort-aware models should not assume that LOC is a good measure of effort. For the case of effort-aware bug prediction, using LOC provides results that are similar to combining complexity, churn, size and LOC as a proxy for effort when prioritizing the most risky files. However, we find that for the purpose of effort estimation, using LOC may under-estimate the amount of effort required.

14.
Most methods of selecting an appropriate log-linear model for categorical data are sensitive to the underlying distributional assumptions. However, there are many situations in which the assumption that the data are randomly chosen from an underlying Poisson, multinomial or product-multinomial distribution cannot be sustained. In these cases we propose a criterion to select among log-linear models that is an analogue of the Cp statistic for regression models and describe a method to estimate the denominator of this statistic.
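For reference, the regression-model statistic that the proposed criterion parallels is Mallows' Cp, where RSS_p is the residual sum of squares of the p-parameter model, σ̂² estimates the error variance (the "denominator" the abstract refers to), and n is the sample size:

```latex
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p
```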

15.
Healthcare data analysis is currently a challenging and crucial research issue for the development of robust disease diagnosis and prediction systems. Many specific and a few common methods have been discussed in the literature for healthcare data classification. The present study implements 32 classification methods from six categories (Bayes, function-based, lazy, meta, rule-based, and tree-based) with the objective of identifying the best and most broadly applicable categories and methods in healthcare data mining. The performance of each classification method has been evaluated based on analysis time, classification accuracy, precision, recall, F-measure, area under the receiver operating characteristic curve, root mean square error, kappa coefficient, Kulczynski's measure, and the Fowlkes-Mallows index, and compared with more than 90 classification methods used in past studies. Seventeen healthcare datasets related to thyroid, cancer, skin disease, heart disease, hepatitis, lymphography, audiology, diabetes, surgery, arrhythmia, postsurvival, liver, and tumour have been used in the performance assessment of the classification methods. The tree-based classification methods perform better than the other methods (with an average classification accuracy of 79.92% and a maximum accuracy of 99.50%, and an analysis time of 3.91 s for the logistic model tree classifier). Furthermore, the association between datasets and classification methods is discussed.

16.
Mortality risk prediction estimates a patient's risk of death over a future period from clinical monitoring data. For ICU patients, mortality risk prediction enables targeted clinical diagnosis and the rational allocation of limited medical resources. Building on the clinically used MEWS and Glasgow Coma Scale and targeting 17 physiological parameters monitored in ICU patients, this paper proposes a multichannel mortality risk prediction model for ICU cerebrovascular disease patients. The multichannel concept is applied to a BiLSTM model to highlight the contribution of each physiological parameter to mortality risk prediction, and an attention mechanism is employed to improve prediction accuracy. The experimental data come from the MIMIC-III database, from which 16,260 records of 3,080 cerebrovascular disease patients were extracted for this study. In addition to six sets of hyperparameter experiments, the proposed model was compared with four models: LSTM, Multichannel-BiLSTM, logistic regression, and support vector machine (SVM), using accuracy, sensitivity, specificity, AUC-ROC, and AUC-PRC as evaluation metrics. The experimental results show that the proposed model outperforms the other models, achieving an AUC of 94.3%.
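A hedged Keras sketch of one plausible reading of the multichannel BiLSTM-with-attention architecture: one input channel per physiological parameter, each through its own BiLSTM, merged and passed through self-attention (the 17 channels come from the abstract; the 48 time steps and all layer sizes are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

n_params, timesteps = 17, 48                       # 17 channels per abstract
inputs, channels = [], []
for _ in range(n_params):
    inp = layers.Input(shape=(timesteps, 1))       # one physiological signal
    channels.append(
        layers.Bidirectional(layers.LSTM(16, return_sequences=True))(inp))
    inputs.append(inp)

merged = layers.Concatenate()(channels)            # (timesteps, 17 * 32)
attended = layers.Attention()([merged, merged])    # self-attention over time
pooled = layers.GlobalAveragePooling1D()(attended)
out = layers.Dense(1, activation="sigmoid")(pooled)  # mortality risk

model = tf.keras.Model(inputs, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```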

17.
Classification is an important task in data mining. Class imbalance has been reported to hinder the performance of standard classification models. However, our study shows that class imbalance may not be the only cause of poor performance; rather, the underlying complexity of the problem may play a more fundamental role. In this paper, a decision tree method based on the Kolmogorov-Smirnov statistic (the K-S tree) is proposed to segment the training data so that a complex problem can be divided into several easier sub-problems in which class imbalance becomes less challenging. The K-S tree is also used to perform feature selection, which not only selects relevant variables but also removes redundant ones. After segmentation, a two-way re-sampling method is used at the segment level to empirically determine the optimal sampling percentage, and the rebalanced data are used to fit logistic regression models, also at the segment level. The effectiveness of the proposed method is demonstrated through its application to property refinance prediction.
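A minimal sketch of the split criterion behind a K-S tree: each candidate feature is scored by the two-sample Kolmogorov-Smirnov statistic between its class-conditional distributions, and the best-separating feature is chosen (toy data; the full method adds recursive segmentation, feature selection, and segment-level re-sampling):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Toy labels that depend mainly on feature 2.
y = (X[:, 2] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Score each feature by the two-sample K-S statistic between classes.
scores = [stats.ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
          for j in range(X.shape[1])]
best = int(np.argmax(scores))
print("split on feature", best, "K-S =", round(scores[best], 3))
```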

18.
Exploiting the intrinsic algorithmic connection and complementary strengths of neural networks and decision trees, a knowledge-based neural network (KBNN) initialized with rules extracted by a C4.5 decision tree is applied to travel mode prediction. Analysis of residents' commuting mode choice data shows that the KBNN achieves higher prediction accuracy than the decision tree method, an ordinary feed-forward neural network, and the multinomial logit (MNL) model. The method not only improves the interpretability of the network but is also easy to construct, converges faster, and is highly practical, offering a new approach to travel mode choice prediction.
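A small sketch of the rule-extraction step that seeds a KBNN, with scikit-learn's CART standing in for C4.5; mapping the printed IF-THEN rules onto network topology and initial weights is omitted:

```python
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris   # stand-in for mode-choice data

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree))   # IF-THEN rules, one root-to-leaf path each
```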

19.
Current approaches to obtaining lumbar morphometry data usually require expensive medical imaging technology and long processing times, and are often limited by small sample sizes. This study develops regression models for the cross-sectional areas (CSAs) of the lower lumbar (i.e., L3/L4 to L5/S1 level) intervertebral discs (IVDs) and vertebral endplates (EPs) using both simple and complex anthropometric variables. CSAs were measured using OsiriX© software based on 3T magnetic resonance imaging (MRI) scans from a sample of 13 females and 22 males, aged between 20 and 40 and asymptomatic of low back disorders. Comprehensive body anthropometry data were collected and included in the regression analyses. Several multiple regression models were developed with varying levels of complexity. Subject stature, elbow dimensions, and ankle dimensions were statistically significant predictors of the CSAs of IVDs and EPs. Gender exhibited a more predictive relationship with the CSAs than body weight or age. In general, regression models using the newly proposed best-subset procedure yielded smaller prediction errors than models using easy-to-measure variables (i.e., gender, age, height, and weight). However, simple regression models are still worth investigating given their low cost, ease of data collection, and satisfactory performance.
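A minimal sketch of one such anthropometric regression, assuming toy stature and elbow measurements in place of the MRI sample: ordinary least squares predicting a disc CSA from two predictors, via statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
stature = rng.normal(170, 8, 35)         # cm (toy, n = 35 as in the study)
elbow = rng.normal(7, 0.5, 35)           # cm (toy elbow dimension)
csa = 0.08 * stature + 2.0 * elbow + rng.normal(0, 1.5, 35)  # toy CSA (cm^2)

# OLS with intercept: CSA ~ stature + elbow dimension.
X = sm.add_constant(np.column_stack([stature, elbow]))
print(sm.OLS(csa, X).fit().summary())
```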

20.
Context: Software Process Engineering promotes the systematic production of software by following a set of well-defined technical and management processes. Comprehensive management of these processes involves a number of activities such as model design, verification, validation, deployment and evaluation. However, the deployment and evaluation activities need more research effort in order to achieve greater automation. Objective: With the aim of minimizing the time required to adapt the tools at the beginning of each new project and reducing the complexity of constructing mechanisms for automated evaluation, the Software Process Deployment & Evaluation Framework (SPDEF) has been elaborated and is described in this paper. Method: The proposed framework is based on the application of well-known techniques in Software Engineering, such as Model Driven Engineering and information integration through Linked Open Data. It comprises a systematic method for deployment and evaluation, a number of models and relationships between models, and some software tools. Results: Automated deployment of the OpenUP methodology is tested through the application of the SPDEF framework and support tools to enable the automated quality assessment of software development or maintenance projects. Conclusions: Making use of the method and the software components developed in the context of the proposed framework, the alignment between the definition of the processes and the supporting tools is improved, while the existing complexity is reduced when it comes to automating the quality evaluation of software processes.
