Similar Literature
 19 similar documents found (search time: 625 ms)
1.
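Cytochrome P450 2C9 (CYP2C9) is one of the most important xenobiotic-metabolizing enzymes in the liver, and many drugs and chemicals can inhibit or interfere with its activity. Predicting drug-drug interactions caused by CYP2C9 inhibition early in drug discovery is therefore valuable for screening and discovering new drugs. This study aims to build a predictive model for CYP2C9 inhibitors and to identify the parameters that differ significantly between inhibitors and non-inhibitors. A data set of 81 compounds was assembled; 64 were randomly chosen as the training set and the remaining 17 served as the validation set. Each compound was encoded with 250 molecular descriptors. Models were built with stepwise discriminant analysis and K-means cluster analysis, and their predictive ability was tested on the validation set. The results showed correct classification rates of 96.4% for inhibitors and 97.2% for non-inhibitors in the training set, and 85.7% for inhibitors and 90.0% for non-inhibitors in the validation set. With K-means clustering, the correct classification rates for inhibitors and non-inhibitors reached 82.9% and 86.9%, respectively. Further analysis showed that the descriptors contributing most to the model are the electrotopological state indices of amino and alkenyl groups, the number of carbocyclic rings, and hydrophobicity. These descriptors capture the structural differences between inhibitors and non-inhibitors and can help guide the screening and discovery of CYP2C9 inhibitors.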
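The K-means step can be sketched as follows, assuming the 250 descriptors have already been standardized into a NumPy matrix `X` and that inhibitor/non-inhibitor labels are coded as integers 0/1 in `y`. Because clustering ignores the labels, scoring it as a classifier needs an extra rule; here each cluster is simply mapped to its majority class (an assumption, not necessarily the authors' procedure). `kmeans` and `cluster_accuracy` are hypothetical helper names.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each compound to its nearest centroid (squared Euclidean distance).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def cluster_accuracy(labels, y):
    """Map each cluster to its majority class, then report per-class accuracy."""
    mapping = {c: np.bincount(y[labels == c]).argmax() for c in np.unique(labels)}
    pred = np.array([mapping[c] for c in labels])
    return {cls: float((pred[y == cls] == cls).mean()) for cls in np.unique(y)}

# Toy usage: two well-separated groups standing in for inhibitors / non-inhibitors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(5, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
_, labels = kmeans(X, 2)
print(cluster_accuracy(labels, y))
```

With k = 2 on well-separated data each cluster lines up with one class; on real descriptor data, the majority-class mapping is what turns cluster membership into per-class accuracies.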

2.
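This study aims to quantitatively predict the activity of androgen receptor disruptors and to identify the best modeling method. A data set of 150 molecules was used; 38 were randomly selected as the test set and the remaining molecules formed the training set. For each compound, 193 molecular descriptors were calculated. Models were built with multiple linear regression, principal component regression, and other methods, and their predictive ability was evaluated on the test set. Both the stepwise selection model and the principal component analysis model showed relatively strong predictive ability (correlation coefficients on the test set of R = 0.61 and R = 0.52, respectively). This work should aid the screening and development of new androgen receptor inhibitor drugs.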
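Principal component regression, one of the methods named above, can be sketched in NumPy as below. This is a minimal illustration, assuming a descriptor matrix `X` (molecules × descriptors) and an activity vector `y`; the number of components and any descriptor scaling are choices the abstract does not specify, and the function names are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCA scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Right singular vectors of the centered descriptor matrix are the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                 # loadings, shape (p, k)
    T = Xc @ P                              # scores, shape (n, k)
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = P @ gamma                        # map coefficients back to descriptor space
    return beta, y_mean - x_mean @ beta     # (coefficients, intercept)

def pcr_predict(X, beta, intercept):
    return X @ beta + intercept

def pearson_r(y_true, y_pred):
    """Correlation coefficient R between observed and predicted activity."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

Keeping all components reproduces ordinary least squares; keeping fewer trades some fit for stability when descriptors are strongly collinear.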

3.
QSAR study of the toxicity of environmental compounds to fish   Times cited: 2, self-cited: 2, cited by others: 0
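This study uses quantitative structure-activity relationship (QSAR) methods to predict the toxicity of environmental compounds to fish (50% lethal concentration, LC50), to identify the molecular structural features that most affect toxicity, and to compare several modeling methods. A data set of 114 molecules was built; 85 molecules (about 75%) were randomly selected as the training set and the remaining 29 as the test set, and 194 molecular descriptors were calculated for each molecule. Quantitative structure-toxicity relationship (QSTR) models were built with stepwise multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS). For stepwise MLR, the squared correlation coefficients between experimental and predicted -logLC50 were R²tr = 0.86 for the training set and R²te = 0.83 for the test set, indicating a reliable and robust model. PCR with 8 principal components gave R²tr = 0.81 and R²te = 0.77, and PLS with 5 latent variables gave R²tr = 0.88 and R²te = 0.85. The MLR model showed that the descriptors with the largest influence on fish toxicity are mainly electrotopological state indices (SssO, SsCl, SdCH2, SsNH2), a molecular connectivity index (Xv0), and a modified Kappa index (Ka2). These results are valuable for predicting the fish toxicity (LC50) of environmental compounds and for deepening mechanistic understanding of the toxic action of organic compounds.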
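Of the three methods compared, PLS is the least obvious to implement. A minimal PLS1 sketch using the NIPALS algorithm is shown below, assuming a single response such as -logLC50 and a descriptor matrix `X`; the abstract does not say which PLS algorithm or software was used, and the helper names are hypothetical. A 5-latent-variable model like the one reported above would correspond to `pls1_fit(X_train, y_train, 5)`.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS for a single response (e.g. -logLC50)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weight vector
        t = E @ w                           # latent-variable scores
        tt = t @ t
        p_vec = E.T @ t / tt                # X loadings
        q_a = f @ t / tt                    # y loading
        E = E - np.outer(t, p_vec)          # deflate X
        f = f - q_a * t                     # deflate y
        W.append(w)
        P.append(p_vec)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in descriptor space
    return beta, y_mean - x_mean @ beta

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Fitting on the training split and calling `r_squared` separately on training and test predictions gives the R²tr / R²te pair used to compare models.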

4.
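This study aims to build a model that quantitatively predicts the inhibitory activity of flavonoids against Plasmodium falciparum strains and to identify the main factors affecting that activity. A data set of 38 structurally diverse flavonoids was used, and 220 molecular descriptors per compound were analyzed with multiple linear regression and principal component analysis to build the best predictive model. Comparing the models built with different methods showed that backward selection including the logP descriptor was the best method. The resulting model had good statistics (training set R² = 0.81, standard error of estimate SEE = 0.27) and also performed satisfactorily on the test set (R² = 0.83, standard error of prediction SEP = 0.39), indicating good reliability and predictivity. The logarithm of the lipid-water partition coefficient, logP, is an important parameter in the model. Building the model and identifying the influencing factors should help screen and develop new flavonoid antimalarial drugs.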
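Backward selection starts from all candidate descriptors and removes them one at a time. The sketch below is one simple variant, assuming the stopping rule is adjusted R² (the abstract does not state the criterion; p-value or F-test thresholds are equally common), together with the SEE formula for a model with k descriptors plus an intercept. All names, including the `names` list that would contain `logP`, are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with intercept; returns coefficients (intercept first) and SSE."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def adjusted_r2(sse, y, k):
    n = len(y)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

def backward_elimination(X, y, names):
    """Drop one descriptor at a time while doing so raises adjusted R^2."""
    keep = list(range(X.shape[1]))
    best = adjusted_r2(fit_ols(X[:, keep], y)[1], y, len(keep))
    while len(keep) > 1:
        trials = []
        for j in keep:
            cols = [c for c in keep if c != j]
            sse = fit_ols(X[:, cols], y)[1]
            trials.append((adjusted_r2(sse, y, len(cols)), j))
        score, drop = max(trials)           # the removal that helps most (or hurts least)
        if score <= best:
            break
        best = score
        keep.remove(drop)
    return [names[c] for c in keep], best

def see(sse, n, k):
    """Standard error of estimate for a model with k descriptors plus intercept."""
    return (sse / (n - k - 1)) ** 0.5
```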

5.
Predicting blood-brain barrier penetration of drugs with support vector machines   Times cited: 1, self-cited: 1, cited by others: 0
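To predict whether drugs penetrate the blood-brain barrier, 87 molecular descriptors characterizing molecular composition, topology, and related features were calculated and screened with a genetic algorithm; the selected descriptors were used to build a support vector machine (SVM) classification model for blood-brain barrier penetration. During training, the two key kernel parameters C and γ were determined by grid search, and the model was evaluated with 5-fold cross-validation. The model showed high predictive ability, with a cross-validated prediction accuracy of 85.6%.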
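Grid search over C and γ with 5-fold cross-validation is a standard recipe for RBF-kernel SVMs. The sketch below expresses it with scikit-learn's `GridSearchCV` and `SVC`, assuming an RBF kernel (the abstract names C and γ but not the kernel) and a descriptor matrix `X` already reduced by the genetic-algorithm step, which is not shown. The exponential grid ranges are a common convention, not values from the paper, and `tune_rbf_svm` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def tune_rbf_svm(X, y, seed=0):
    """Grid-search C and gamma of an RBF-kernel SVM with 5-fold cross-validation."""
    # Scaling inside the pipeline is refit on each training fold, avoiding leakage.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {
        "svc__C": 2.0 ** np.arange(-5, 16, 2),      # exponential grid (assumed ranges)
        "svc__gamma": 2.0 ** np.arange(-15, 4, 2),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    # best_score_ is the mean cross-validated accuracy of the best (C, gamma) pair.
    return search.best_params_, search.best_score_
```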

6.
Quantitative structure-activity relationship (QSAR) study of paclitaxel analogues   Times cited: 1, self-cited: 0, cited by others: 1
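Paclitaxel is a natural anticancer compound extracted from yew (Taxus) trees and has a unique anticancer mechanism. Because paclitaxel itself has various limitations, developing paclitaxel analogues with higher anticancer activity is a promising direction. In this work, 36 structurally diverse paclitaxel analogues were used as the data set; 28 were randomly selected as the training set and the others as the test set. Multiple linear regression (MLR) and principal component regression (PCR) were applied to 197 molecular descriptors per compound to build optimal QSAR prediction models, whose predictive ability was then checked on the test set. Comparing the MLR and PCR models showed that stepwise selection was the best modeling method. Its model had good statistics (R² = 0.846, SEE = 1.060) and also performed satisfactorily on the test set (R² = 0.841, SEP = 1.071), indicating good reliability and predictivity. The model and the identified main factors can help guide the screening and development of new paclitaxel-like drugs.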
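The statistics quoted above (R², SEE on the training set, SEP on the test set) can be computed as follows. This is a sketch under stated assumptions: SEE uses n - k - 1 degrees of freedom for k descriptors plus an intercept, and SEP is taken as the root mean squared test-set error, although some papers divide by n - 1 instead. `random_split` mimics the random 28/8 partition; its seed is arbitrary and all names are hypothetical.

```python
import numpy as np

def random_split(n, n_train, seed=0):
    """Randomly partition indices 0..n-1 into training and test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.sort(idx[:n_train]), np.sort(idx[n_train:])

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def see(y, y_hat, k):
    """Standard error of estimate on the training set (k descriptors + intercept)."""
    return float(np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - k - 1)))

def sep(y, y_hat):
    """Standard error of prediction on the test set (one common definition)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

train_idx, test_idx = random_split(36, 28)   # 28 training, 8 test compounds
```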

7.
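[Objective] To manage the use of chemicals, the toxicity of unknown compounds needs to be predicted from that of known compounds. [Methods] Quantitative structure-activity relationship (QSAR) methods were used to predict the toxicity (50% lethal concentration, LC50) of a series of environmental compounds to Daphnia magna, to identify the key molecular structural features affecting toxicity, and to compare the strengths and weaknesses of several modeling methods. A data set of 323 organic molecules was used; 81 were randomly selected as the test set and the rest as the training set, and 196 descriptors were calculated for each molecule. [Results] QSAR models were built with stepwise multiple linear regression (R²tr = 0.661, R²te = 0.612), principal component regression (R²tr = 0.590, R²te = 0.577), and partial least squares (R²tr = 0.788, R²te = 0.607). All three models indicate that molecular weight (MW) has a strong influence on toxicity. [Conclusions] Predicting and comparing the toxicity of these compounds with high-quality QSAR models is of great significance for aquatic environmental monitoring.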
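Stepwise MLR, the first of the three methods, can be approximated by greedy forward selection. The sketch below adds, at each step, the descriptor that most increases training R² and stops when the gain falls below a threshold; the threshold and the omission of the removal step found in full stepwise regression are simplifying assumptions, and the helper names are hypothetical.

```python
import numpy as np

def sse_ols(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def forward_stepwise(X, y, max_terms, min_gain=0.01):
    """Greedy forward selection: add the descriptor that most increases training R^2."""
    sst = float(((y - y.mean()) ** 2).sum())
    chosen, r2_cur = [], 0.0
    while len(chosen) < max_terms:
        best = None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            r2_j = 1.0 - sse_ols(X[:, chosen + [j]], y) / sst
            if best is None or r2_j > best[0]:
                best = (r2_j, j)
        # Stop when no candidate is left or the best one adds too little.
        if best is None or best[0] - r2_cur < min_gain:
            break
        r2_cur = best[0]
        chosen.append(best[1])
    return chosen, r2_cur
```

The order in which descriptors enter gives a rough ranking of influence, which is one way a descriptor such as MW can be singled out.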

8.
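Fourier transform infrared (FTIR) spectra were measured for 45 goji berry (Lycium) samples from different producing areas in Qinghai Province. The raw spectra were preprocessed with the wavelet transform, which compressed the spectral data to 1/8 of their original size while giving essentially the same analytical accuracy as the raw spectra. The 45 samples were divided into a training set of 30 and a test set of 15, and a random forest (RF) model was built to predict the geographic origin of the goji berries, validated both by internal cross-validation and on external data. The random forest algorithm was implemented in R, and the model parameters were optimized. The resulting discriminant model classified 100% of the training samples and 100% of the test samples correctly. These results show that the model can identify the origin of goji berry samples correctly and rapidly, and that infrared spectroscopy combined with random forests can serve as a new, modern method for classifying the geographic origin of Chinese medicinal materials.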
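The 1/8 compression is consistent with keeping only the approximation coefficients after three levels of a discrete wavelet transform, since each level halves the length (2³ = 8). The abstract does not name the wavelet or the number of levels, so the sketch below assumes a Haar wavelet and 3 levels; the compressed vectors would then be the inputs to the random forest, which is not shown. `haar_approx` is a hypothetical name.

```python
import numpy as np

def haar_approx(signal, levels=3):
    """Keep only the Haar approximation coefficients after `levels` decompositions.

    Each level halves the length, so 3 levels compress a spectrum to 1/8.
    """
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(a) % 2:                      # pad odd lengths by repeating the last point
            a = np.append(a, a[-1])
        # Orthonormal Haar low-pass step: scaled sum of neighbouring pairs.
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a
```

The detail coefficients discarded at each level carry the high-frequency part of the spectrum, which is why a smooth FTIR spectrum can lose little discriminative information.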

9.
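Method for identifying the harvest period of juicy peaches based on near-infrared spectroscopy   Times cited: 1, self-cited: 0, cited by others: 1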
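A method for identifying the harvest period of juicy peaches using near-infrared diffuse reflectance spectroscopy combined with fiber-optic sensing is proposed. From a greenhouse in Yangshan Town, Wuxi, 48 peaches were picked at each of four stages: 3, 2, and 1 days before the optimal harvest date, and at the optimal harvest date. Spectra were collected with a near-infrared spectrometer and preprocessed with smoothing, first derivative, and multiplicative scatter correction (MSC), and a harvest-period discrimination model was built with principal component analysis (PCA) combined with partial least squares (PLS). The model built after combined first-derivative and smoothing preprocessing performed best: the coefficients of determination for the calibration and prediction sets were 0.9279 and 0.9138, the root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) were 0.3003 and 0.3349, and the classification accuracies for the calibration and prediction sets were 95.13% and 93.75%, respectively. The results show that near-infrared diffuse reflectance spectroscopy has good application prospects for identifying the harvest period of juicy peaches.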
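The three preprocessing steps can be sketched in NumPy as below, with spectra stored one per row. Assumptions: smoothing is a simple moving average (Savitzky-Golay filtering is another common choice, and the abstract does not say which was used), the derivative is a finite difference from `np.gradient`, and MSC uses the mean spectrum as reference. Function names are hypothetical.

```python
import numpy as np

def moving_average(spectra, window=5):
    """Smooth each spectrum (row) with a centered moving average of odd width."""
    kernel = np.ones(window) / window
    return np.array([np.convolve(s, kernel, mode="same") for s in spectra])

def first_derivative(spectra):
    """Numerical first derivative along the wavelength axis."""
    return np.gradient(spectra, axis=1)

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # fit s ≈ intercept + slope * ref
        corrected[i] = (s - intercept) / slope
    return corrected
```

The best-performing combination reported above would correspond to `first_derivative(moving_average(spectra))`; note that the `mode="same"` convolution distorts a few points at each end of the spectrum.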

10.
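Using topological and connectivity indices calculated with the E-Dragon software as variables, the data for 209 polychlorinated biphenyls (PCBs) were randomly divided into training, validation, and prediction sets, and a QSPR model of their chromatographic retention indices was built with a particle swarm optimization ν-support vector machine (PSO-v-SVM). The selected best model uses only 5 variables, and the R² values for the training, validation, and prediction sets are 0.999, 0.998, and 0.999, respectively, indicating very high prediction accuracy. The model selected in this paper gives better results than those reported in references [16-19], and its predictions are more reliable.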
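A global-best particle swarm optimizer can be sketched in a few lines of NumPy. In PSO-v-SVM the swarm presumably searches the ν-SVM hyperparameters, with a validation error as the objective; the abstract does not give the objective, bounds, or PSO coefficients, so the inertia `w = 0.7` and acceleration constants `c1 = c2 = 1.5` below are assumptions, and the example minimizes a simple quadratic instead of an SVM error. `pso_minimize` is a hypothetical name.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, (n_particles, dim))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_val = np.array([f(p) for p in x])
    g_best = p_best[p_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward each particle's own best + pull toward the swarm's best.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, float(p_val.min())

# Toy objective with its minimum at (1, -2); an SVM error function would replace it.
best, val = pso_minimize(lambda p: float(np.sum((p - np.array([1.0, -2.0])) ** 2)),
                         [-5, -5], [5, 5])
```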

11.
Dong Lin, Shu Hong, Li Sha. Application Research of Computers, 2013, 30(8): 2330-2333
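To simplify the preprocessing steps of spatial frequent-pattern mining and improve mining efficiency, a mining algorithm called FISA (fast intersect spatial Apriori) is proposed that takes spatial vector and raster layers directly as input. The algorithm uses layer intersection and area calculation to count the support of predicate sets, and on that basis mines frequent predicate sets and association rules. Compared with transaction-based spatial association rule mining algorithms, FISA does not require the spatial data to be converted into transactions beforehand, and every result has a corresponding layer, which makes the results easy to visualize. Compared with other mining algorithms based on spatial analysis, FISA supports both vector and raster spatial data and introduces a fast intersection method to ensure scalability. Experimental results show that the algorithm can mine frequent patterns directly from spatial data efficiently and correctly.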
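The core idea of counting support by intersecting layers can be sketched for raster data only. Assumptions: every predicate layer is a boolean grid of the same shape, and intersection area is measured as the number of cells where all predicates hold; FISA's vector-layer handling and its fast intersection method are not reproduced, and `frequent_predicate_sets` is a hypothetical name. Candidate generation and pruning follow the classic Apriori scheme.

```python
import numpy as np
from itertools import combinations

def frequent_predicate_sets(layers, min_support):
    """Apriori over boolean raster layers of identical shape.

    Support of a predicate set = fraction of cells where all its predicates hold,
    i.e. the area of the layers' intersection divided by the total area.
    """
    total = next(iter(layers.values())).size
    frequent = {}
    current = [frozenset([name]) for name in layers]
    while current:
        level = {}
        for items in current:
            # Cell-wise AND of the layers plays the role of the layer intersection.
            mask = np.logical_and.reduce([layers[n] for n in items])
            support = mask.sum() / total
            if support >= min_support:
                level[items] = float(support)
        frequent.update(level)
        # Candidate generation: join frequent k-sets that differ by one predicate.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, len(c) - 1))]
    return frequent

layers = {
    "A": np.array([[1, 1, 1, 1], [0, 0, 0, 0]], dtype=bool),
    "B": np.array([[1, 1, 0, 0], [1, 1, 0, 0]], dtype=bool),
    "C": np.array([[0, 0, 0, 0], [0, 0, 0, 1]], dtype=bool),
}
result = frequent_predicate_sets(layers, min_support=0.2)
```

Association rules would then be read off the frequent sets, for example confidence(A ⇒ B) = support({A, B}) / support({A}).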

12.
13.
We perform a systematic analysis of the effectiveness of features for the problem of predicting the quality of machine translation (MT) at the sentence level. Starting from a comprehensive feature set, we apply a technique based on Gaussian processes, a Bayesian non-linear learning method, to automatically identify features leading to accurate model performance. We consider application to several datasets across different language pairs and text domains, with translations produced by various MT systems and scored for quality according to different evaluation criteria. We show that selecting features with this technique leads to significantly better performance in most datasets, as compared to using the complete feature sets or a state-of-the-art feature selection approach. In addition, we identify a small set of features which seem to perform well across most datasets.

14.
Classification of very-high-dimensional images is of the utmost interest in remote sensing applications. Storage space and, above all, the computational effort required to classify such images are the main practical drawbacks. Moreover, it is well known that a number of spectral classifiers may not be useful (or even valid) in practice for classifying very-high-dimensional images. Even when they are valid, they do not provide high-accuracy classifications when the training sets overlap heavily in the representation space, because of the shape of the decision boundaries they impose. In these cases, it is preferable to adopt a classifier that can adjust its decision boundaries better. To this end, classification based on regularized discriminant analysis (RDA) was compared with a number of non-parametric classifiers. Two synthetic databases of high-dimensional images, created with a procedure proposed by the authors, were used to test the performance of the classifiers. The main conclusion of this paper is that RDA can be used successfully to classify very-high-dimensional images with heavily overlapping training sets. RDA also provides excellent classification accuracy on real datasets in which the training sets overlap heavily in the representation space.
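Regularized discriminant analysis interpolates between quadratic and linear discriminant analysis. The sketch below uses a simplified form of Friedman's shrinkage, assuming (1) each class covariance is blended with a prior-weighted pooled covariance by `lam` and (2) the result is shrunk toward a scaled identity by `gamma`; Friedman's original formulation weights the blend by class sample counts, and in practice both parameters are chosen by cross-validation. The class name `RDA` is illustrative, not the paper's code.

```python
import numpy as np

class RDA:
    """Regularized discriminant analysis (simplified Friedman-style shrinkage).

    lam blends each class covariance toward the pooled covariance (lam=0: QDA-like,
    lam=1: LDA-like); gamma further shrinks toward a scaled identity matrix.
    """
    def __init__(self, lam=0.5, gamma=0.1):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        p = X.shape[1]
        means, covs, priors = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            means.append(Xc.mean(axis=0))
            covs.append(np.cov(Xc, rowvar=False))
            priors.append(len(Xc) / len(X))
        pooled = sum(pr * S for pr, S in zip(priors, covs))
        self.params_ = []
        for m, S, pr in zip(means, covs, priors):
            S_lam = (1 - self.lam) * S + self.lam * pooled
            S_reg = (1 - self.gamma) * S_lam + self.gamma * np.trace(S_lam) / p * np.eye(p)
            _, logdet = np.linalg.slogdet(S_reg)
            self.params_.append((m, np.linalg.inv(S_reg), logdet, np.log(pr)))
        return self

    def predict(self, X):
        scores = []
        for m, S_inv, logdet, log_prior in self.params_:
            d = X - m
            # Gaussian discriminant: -0.5 * Mahalanobis^2 - 0.5 * log|Sigma| + log prior.
            maha = np.einsum("ij,jk,ik->i", d, S_inv, d)
            scores.append(-0.5 * maha - 0.5 * logdet + log_prior)
        return self.classes_[np.argmax(scores, axis=0)]
```

The identity shrinkage keeps each covariance invertible even when a class has fewer training pixels than spectral bands, which is the situation that breaks plain quadratic classifiers on very-high-dimensional images.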

15.
One of the major challenges in data mining is the extraction of comprehensible knowledge from recorded data. In this paper, a coevolutionary classification technique, the COevolutionary Rule Extractor (CORE), is proposed to discover classification rules in data mining. Unlike existing approaches, in which candidate rules and rule sets are evolved at different stages of the classification process, CORE coevolves rules and rule sets concurrently in two cooperative populations to confine the search space and to produce good, comprehensive rule sets. The proposed technique is extensively validated on seven datasets from the University of California, Irvine (UCI) machine learning repository, which are representative artificial and real-world data from various domains. Comparison results show that CORE produces comprehensive and good classification rules for most datasets, which are competitive with existing classifiers in the literature. Box plots of the simulation results also reveal that CORE is relatively robust and invariant to random partitioning of the datasets.

16.
17.
18.
Anthrax is a highly lethal, acute infectious disease caused by the rod-shaped, Gram-positive bacterium Bacillus anthracis. The anthrax toxin lethal factor (LF), a zinc metalloprotease secreted by the bacilli, plays a key role in anthrax pathogenesis and is chiefly responsible for anthrax-related toxemia and host death, partly via inactivation of mitogen-activated protein kinase kinase (MAPKK) enzymes and consequent disruption of key cellular signaling pathways. Antibiotics such as fluoroquinolones are capable of clearing the bacilli but have no effect on LF-mediated toxemia; LF itself therefore remains the preferred target for toxin inactivation. However, no LF inhibitor is currently available on the market as a therapeutic, partly because existing LF inhibitor scaffolds fall short in efficacy, selectivity, and toxicity. In the current work, we present novel support vector machine (SVM) models with high prediction accuracy that are designed to rapidly identify potential novel, structurally diverse LF inhibitor chemical matter from compound libraries. These SVM models were trained and validated using 508 compounds with published LF biological activity data and 847 inactive compounds deposited in the PubChem BioAssay database. One model, M1, demonstrated particularly favorable selectivity toward highly active compounds by correctly predicting 39 (95.12%) out of 41 nanomolar-level LF inhibitors, 46 (93.88%) out of 49 inactives, and 844 (99.65%) out of 847 PubChem inactives in external, unbiased test sets. These models are expected to facilitate the prediction of LF inhibitory activity for existing molecules, as well as the identification of novel potential LF inhibitors from large datasets.
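The percentages reported for model M1 are per-set recall values (correct predictions divided by set size, as a percentage), which can be checked directly. `recall_percent` is an illustrative helper name.

```python
def recall_percent(correct, total):
    """Share of a test set predicted correctly, as a percentage rounded to 2 decimals."""
    return round(100.0 * correct / total, 2)

# Counts reported for model M1 on its external test sets: (correct, total, reported %).
reported = {
    "nanomolar LF inhibitors": (39, 41, 95.12),
    "inactives": (46, 49, 93.88),
    "PubChem inactives": (844, 847, 99.65),
}
for name, (correct, total, pct) in reported.items():
    print(f"{name}: {correct}/{total} = {recall_percent(correct, total)}% (reported {pct}%)")
```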

19.
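This study examines the prediction of three kinds of temporal centrality of nodes in complex networks. Analysis of nodes' temporal centrality values at different times in real datasets shows that a node's temporal centrality values at different times are strongly correlated. On this basis, several methods are proposed to predict the future temporal centrality values of nodes in real datasets. By analyzing the error between true and predicted values, the prediction performance of the different methods on different real datasets is compared. The results show that the recent-time-window weighted average method performs best on the MIT dataset, while the recent-time-window average method performs best on the Infocom 06 dataset.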
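The two best-performing predictors can be sketched as follows, where `history` holds a node's centrality values in successive time windows. The abstract does not define the weights of the weighted average, so linearly increasing weights (the most recent window weighted most) are an assumption, as are the function names and the use of mean absolute error to compare methods.

```python
import numpy as np

def predict_recent_mean(history, window):
    """Predict the next value as the plain mean of the last `window` observations."""
    return float(np.mean(history[-window:]))

def predict_recent_weighted(history, window):
    """Weighted mean of the last `window` values; newer windows get larger weights."""
    recent = np.asarray(history[-window:], dtype=float)
    weights = np.arange(1, len(recent) + 1)    # linear weights 1..w (an assumption)
    return float(weights @ recent / weights.sum())

def mean_abs_error(true, pred):
    """Average absolute gap between true and predicted centrality values."""
    return float(np.mean(np.abs(np.asarray(true) - np.asarray(pred))))
```

Which predictor wins depends on how quickly centrality drifts: weighting recent windows more helps when values trend, while the plain average is more robust when they fluctuate around a stable level.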

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417