Similar literature
 Found 20 similar documents; search took 140 ms
1.
The quantitative structure–property relationship (QSPR) is a fundamental technique for evaluating and screening potentially valuable molecules in drug discovery. The urgent need to speed up pharmaceutical research and development, together with the huge chemical space to be explored, calls for effective and precise computer-aided QSPR modeling methods. Previous studies with various deep learning models are limited because each model is trained on a separate small dataset, which is known as the small-sample problem. Using transfer learning, this article describes a sparse sharing method that uses advanced graph-based models to construct an efficient and reasonable multitask learning workflow for QSPR prediction. The proposed workflow is systematically tested with four benchmark datasets containing different targets, and several precisely predicted molecular examples are illustrated. The results demonstrate a clear improvement in the prediction of molecular properties, along with the ability to predict multiple properties simultaneously.

2.
3.
The prediction of drug–target affinity (DTA) is a crucial step for drug screening and discovery. In this study, a new graph-based prediction model named SAG-DTA (self-attention graph drug–target affinity) was implemented. Unlike previous graph-based methods, the proposed model applies self-attention mechanisms to the drug molecular graph to obtain effective representations of drugs for DTA prediction. The features of each atom node in the molecular graph are weighted by an attention score before being aggregated into the molecule representation. Various self-attention scoring methods were compared in this study. In addition, two pooling architectures, namely global and hierarchical, were presented and evaluated on benchmark datasets. Comparative experiments on both regression and binary classification tasks showed that SAG-DTA was superior to previous sequence-based and other graph-based methods and exhibited good generalization ability.
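The attention-weighted aggregation described in this abstract can be illustrated with a minimal sketch. It is not the SAG-DTA implementation: the function name `attention_pool`, the dot-product scoring against a `score_vector`, and the softmax normalization are assumptions chosen for illustration, whereas the abstract states that several scoring methods were compared.

```python
import math

def attention_pool(node_features, score_vector):
    """Pool atom feature vectors into one molecule vector using attention weights.

    node_features: list of per-atom feature lists (all the same length).
    score_vector: hypothetical learnable vector used to score each atom.
    """
    # Raw attention score per atom: dot product with the scoring vector (assumed form).
    raw = [sum(f * w for f, w in zip(feat, score_vector)) for feat in node_features]
    # Numerically stable softmax so the weights are positive and sum to 1.
    top = max(raw)
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of atom features gives the molecule-level representation.
    dim = len(node_features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, node_features))
              for d in range(dim)]
    return pooled, weights
```

With a zero scoring vector every atom receives the same weight, so the result reduces to mean pooling; a non-zero vector shifts weight toward atoms whose features align with it.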

4.
The introduction of non-fullerene acceptors (NFAs) in organic solar cells has brought great progress in raising power conversion efficiency, and it has also broadened the routes for searching for and designing new acceptor molecules. In this work, novel NFAs with required properties are designed with a conditional generative model constructed from a convolutional neural network (CNN). The temporal CNN is first trained as a string-based molecular conditional generative model that directly generates the desired molecules. The reliability of the generated molecular properties is then demonstrated by a graph-based prediction model and evaluated with quantum chemical calculations. Specifically, a global attention mechanism is incorporated in the prediction model to pool the extracted information on molecular structures and provide interpretability. By combining the generative and prediction models, thousands of NFAs with required frontier molecular orbital energies are generated. The generated molecules explore the chemical space and enrich the database of transformation rules for molecular design. The conditional generation model can also be trained to generate molecules from molecular fragments, and the contribution of the fragments to the properties is subsequently predicted by the prediction model.

5.
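Molecular property prediction models are powerful tools for screening and designing chemicals for specific application needs. However, key steps in many such modeling efforts, including test-set splitting, cross-validation, and algorithm selection, commonly lack rigor, so the true predictive performance of the models is hard to guarantee. Taking the prediction of ionic liquid density by the group contribution method as an example, this work discusses the importance of dataset splitting and cross-validation in building molecular property prediction models, proposes an automatic group partitioning method, and studies how the number of molecules in the dataset that contain a given group affects prediction accuracy. Among the five regression algorithms compared (multiple linear regression, ridge regression, random forest, support vector machine, and neural network), the group contribution model based on ridge regression performed best, with a mean relative error of 1.88% on a dataset of 1078 ionic liquids comprising 23034 data points.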

6.
Machine learning (ML) models are valuable research tools for making accurate predictions. However, ML models often extrapolate unreliably outside their training data. The multiparameter delta method quantifies uncertainty for ML models (and generally for other nonlinear models) with parameters trained by least squares regression. The uncertainty measure requires the gradient of the model prediction and the Hessian of the loss function, both with respect to the model parameters. Both the gradient and the Hessian can be readily obtained from most ML software frameworks by automatic differentiation. We show examples of the uncertainty method in applications of molecular simulations and neural networks. We further show that the uncertainty measure is larger for input space regions that are not part of the training data. Therefore, this method can be used to identify extrapolation and to aid in selecting training data or assessing model reliability.
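As a rough illustration of how a gradient and a Hessian combine into a prediction uncertainty, the sketch below applies one common form of the delta method to a straight-line fit. It assumes a sum-of-squared-errors loss, for which the parameter covariance is approximated by 2·σ²·H⁻¹ and the prediction variance by gᵀ·Cov·g. The function name `delta_method_prediction` and the linear model are illustrative choices, not taken from the abstract, and the derivatives are written by hand rather than obtained by automatic differentiation.

```python
import math

def delta_method_prediction(xs, ys, x_new):
    """Fit y = a + b*x by least squares; return (prediction, standard error) at x_new."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    # Closed-form least-squares estimates of slope and intercept.
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    # Residual variance estimate with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = sse / (n - 2)
    # Hessian of the SSE loss w.r.t. (a, b) is 2 * [[n, sx], [sx, sxx]],
    # so 2 * sigma2 * H^-1 equals sigma2 * inverse([[n, sx], [sx, sxx]]).
    cov = [[sigma2 * sxx / det, -sigma2 * sx / det],
           [-sigma2 * sx / det, sigma2 * n / det]]
    # Gradient of the prediction a + b*x_new with respect to (a, b).
    g = [1.0, x_new]
    var = sum(g[i] * cov[i][j] * g[j] for i in range(2) for j in range(2))
    return a + b * x_new, math.sqrt(var)
```

Evaluating the function at a point far from the training inputs returns a larger standard error than at a point inside their range, which is the behavior the abstract describes for extrapolation.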

7.
任嘉辉  刘豫  刘朝  刘浪  李莹 《化工学报》2022,73(4):1493-1500
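Critical temperature is a key thermophysical property, and its theoretical prediction has long been a focus of thermophysical property research. However, early prediction models often cannot effectively distinguish between isomers of working fluids. Using machine learning algorithms, this work adopts a new "molecular fingerprint + topological index" description of molecular structure to represent working-fluid molecules and builds a critical temperature model. The absolute average deviation on the test set is 3.99%, indicating that the model has good predictive ability. Comparison with the literature shows that the new model not only distinguishes isomers of working fluids effectively but also surpasses other existing models in accuracy.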

8.
In internal rubber‐mixing processes, data‐driven soft sensors have become increasingly important for providing online measurements of the Mooney viscosity. Nevertheless, the prediction uncertainty of the model has rarely been explored. Additionally, traditional viscosity prediction models are based on single models and, thus, may not be appropriate for complex processes with multiple recipes and shifting operating conditions. To address both problems simultaneously, we propose a new ensemble Gaussian process regression (EGPR)‐based modeling method. First, several local Gaussian process regression (GPR) models were built with the training samples in each subclass. Then, the prediction uncertainty was adopted to evaluate the probabilistic relationship between the new test sample and the local GPR models. Moreover, the prediction value and the prediction variance were generated automatically with Bayesian inference. The prediction results in an industrial rubber‐mixing process show the superiority of EGPR in terms of prediction accuracy and reliability. © 2014 Wiley Periodicals, Inc. J. Appl. Polym. Sci. 2015, 132, 41432.

9.
10.
This article outlines advances in molecular modeling and simulation using massively parallel high‐performance computers (HPC). In the SkaSim project, partners from the HPC community collaborated with users from science and industry. The aim was to optimize the prediction of thermodynamic property data in terms of efficiency, quality and reliability using HPC methods. In this context, various topics were dealt with: atomistic simulation of homogeneous gas bubble formation, surface tension of classical fluids and ionic liquids, multicriteria optimization of molecular models, the development of the molecular simulation codes ls1 mardyn and ms2, atomistic simulation of gas separation processes, molecular membrane structure generators, transport resistors and the evaluation of predictive property data models based on specific mixture types.

11.
Uncertainty quantification plays a significant role in establishing the reliability of mathematical models that are applied to process optimization or technology feasibility studies. In general, uncertainties can arise either in the mathematical model or in the model parameters. In this work, CO2 adsorption on amine sorbents loaded in hollow fibers is studied to quantify the impact of uncertainties in the adsorption isotherm parameters on the model prediction. The process design variable most closely related to the process economics is the CO2 sorption capacity, whose uncertainty is investigated. We apply Bayesian analysis and determine a utility function surface corresponding to the value of information gained at each experimental design point. It is demonstrated that performing an experiment at a condition with a higher utility yields a greater reduction in the prediction uncertainty of the design variable than choosing a design point with a lower utility.

12.
13.
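Quantitative structure-property relationship (QSPR) research combines methods for characterizing the structural features of organic compounds with various statistical modeling tools to study the intrinsic relationship between the structure of organic compounds and their properties. It not only reveals the quantitative functional relationship between substance properties and molecular structure but also provides engineering practice with effective methods for predicting the properties of organic compounds, and it has therefore been widely applied in many fields. This paper describes the basic principles of QSPR research, reviews its application and progress in predicting the combustion characteristics of chemical substances such as flash point, autoignition point, and explosion limits, compares the different prediction models for each property, and analyzes their advantages, disadvantages, and ranges of applicability. The current status and development trends of experimental sample design, molecular structure characterization, and modeling method selection are discussed, and the application prospects and development directions of QSPR in safety science research are proposed.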

14.
蔡涛  杨博  李宏光 《化工学报》2020,71(3):1095-1102
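Fuzzy cognitive maps (FCM), as a modeling tool for complex systems, can handle the nonlinearity and uncertainty of a system. Because time delays often exist between industrial process variables, traditional FCM models struggle with such multivariate time-series data, and the resulting prediction models often fail to reflect the true causal relationships among the system variables, which leads to poor interpretability and low accuracy of the predictions. To address this, a time-delay-mining fuzzy time cognitive map (TM-FTCM) is proposed. It uses the cross-correlation function (CCF) to mine time-delay information from the data and, by adding parameters such as a self-influence factor, a bias, and an optimized transfer function to the inference mechanism, effectively resolves the inaccuracy of prediction models caused by time delays between industrial process variables. The effectiveness of the proposed method is verified with a numerical simulation example and data from an actual chemical process.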
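The use of a cross-correlation function to mine a time delay between two process variables can be sketched as follows. This is only an illustration of the general idea and not the TM-FTCM procedure: the function name `estimate_delay`, the choice of the Pearson correlation at each candidate lag, and the rule of picking the lag with the largest absolute correlation are assumptions.

```python
import math

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which y best matches a delayed copy of x."""
    def pearson(a, b):
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
        var_a = sum((p - mean_a) ** 2 for p in a)
        var_b = sum((q - mean_b) ** 2 for q in b)
        if var_a == 0 or var_b == 0:
            return 0.0  # a constant segment carries no correlation information
        return cov / math.sqrt(var_a * var_b)

    best_lag, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        # Compare x[t] with y[t + lag] over the overlapping part of the series.
        score = abs(pearson(x[:len(x) - lag], y[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a series `y` that is simply `x` shifted by three samples, the correlation peaks at a lag of three, and that lag could then be used to align the variables before building a prediction model.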

15.
16.
Prediction of Timber Kiln Drying Rates by Neural Networks   Total citations: 1 (self-citations: 0; citations by others: 1)
The purpose of this exploratory work was to apply artificial neural network (ANN) modeling to the prediction of timber kiln drying rates based on species and basic density information for the hem-fir mix that grows along the local coastal areas. ANN models with three inputs (initial moisture content, basic density, and drying time) were developed to predict one output, namely, average final moisture content. The back-propagation algorithm, the most common neural network learning method, was implemented for testing, training, and validation. The optimal configuration of the network model was obtained by varying its main parameters, such as transfer function, learning rule, number of neurons and layers, and learning runs. Accurate prediction of the experimental drying rate data by the ANN model was achieved with a mean absolute relative error of less than 2%, thus supporting the powerful predictive capacity of this modeling method.

17.
18.

19.
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. We investigated the predictive performance of three machine learning algorithms for six different glass properties. To this end, we used an extensive dataset of about 150,000 oxide glasses, which was segmented into smaller datasets for each property investigated. Using the decision tree induction, k-nearest neighbors, and random forest algorithms, selected from a previous study of six algorithms, we induced predictive models for glass transition temperature, liquidus temperature, elastic modulus, thermal expansion coefficient, refractive index, and Abbe number. Moreover, each model was induced with default and tuned hyperparameter values. We demonstrate that, apart from the elastic modulus (which had the smallest training dataset), the induced predictive models for the other five properties yield an uncertainty comparable to the usual data spread. However, for glasses with extremely low or high values of these properties, the prediction uncertainty is significantly higher. Finally, as expected, glasses containing chemical elements that are poorly represented in the training set yielded higher prediction errors. The method developed here calls attention to the success and possible pitfalls of machine learning algorithms. The analysis of the SHAP values indicated the key elements that increase or decrease the value of the modeled properties. It also estimated the maximum possible increase or decrease. Insights gained by this analysis can help empirical compositional tuning and computer-aided inverse design of glass formulations.

20.
For online melt index prediction in multiple‐grade polyethylene polymerization processes, using only a fixed model is insufficient. Additionally, without enough process knowledge, it is difficult to select suitable input variables to accurately construct prediction models. A novel manifold learning based local probabilistic modeling method named ensemble just‐in‐time Gaussian process regression (EJGPR) is developed. By utilizing output variables, an optimization framework is proposed to preserve the local structure of both input and output variables. The output information is then integrated into the construction of a JGPR‐based local model. Additionally, some newly extracted variables in the projection space can be obtained. Moreover, using the probabilistic prediction information, the uncertainty of each JGPR‐based local candidate model can be simply described. Consequently, using an efficient ensemble strategy, a more accurate EJGPR prediction model can be constructed online. The melt index prediction results in an industrial polyethylene process show that it performs better than conventional methods. © 2017 Wiley Periodicals, Inc. J. Appl. Polym. Sci. 2017, 134, 45094.
