Similar Literature
A total of 20 similar documents were found.
1.
Security bug reports (SBRs) describe security-critical vulnerabilities in software products. To reduce the risk of security attacks on software products, security bug report prediction has attracted increasing attention from researchers. In real software development, however, the project that needs vulnerability prediction may come from a new company or be a newly launched project, without enough labeled security bug reports to build a prediction model in practice. A simple solution is a transfer model, that is, building the prediction model with data already labeled in other projects. Inspired by two recent studies in this field, a cross-project security bug report prediction method fusing a knowledge graph, KG-SBRP (knowledge graph of security bug report prediction), is proposed along the lines of security-keyword filtering. The text information fields in security bug reports are combined with CWE (Common Weakness Enumeration) and CVE Details (Common Vulnerabilities and Exposures) to build triple-rule entities; a security vulnerability knowledge graph is constructed from these entities, and security bug reports are identified in the graph by combining the entities and their relations. The data are divided into a training set and a test set for model fitting and performance evaluation. The constructed model...
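A minimal sketch of the triple-building idea behind this kind of method: entities derived from CWE records link to security keywords and related CVE entries, and a report is flagged when its text matches a keyword entity in the graph. Field names such as `keywords` and `related_cves` are assumptions for illustration, not KG-SBRP's actual schema.

```python
def build_triples(cwe_records):
    """Build (entity, relation, entity) triples from CWE/CVE-style records.
    The record fields used here are hypothetical stand-ins."""
    triples = set()
    for rec in cwe_records:
        for kw in rec["keywords"]:            # e.g. "buffer overflow"
            triples.add((rec["cwe_id"], "has_keyword", kw))
        for cve in rec["related_cves"]:
            triples.add((rec["cwe_id"], "maps_to", cve))
    return triples

def looks_like_sbr(report_text, triples):
    """Flag a bug report whose text hits any keyword entity in the graph."""
    text = report_text.lower()
    return any(rel == "has_keyword" and kw in text
               for _, rel, kw in triples)
```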

2.
We propose a method for automatically identifying individual instances of English verb-particle constructions (VPCs) in raw text. Our method employs the RASP parser and analysis of the sentential context of each VPC candidate to differentiate VPCs from simple combinations of a verb and prepositional phrase. We show that our proposed method has an F-score of 0.974 at VPC identification over the Brown Corpus and Wall Street Journal.

3.
朱敏  毛莺池  程永  陈程军  王龙宝 《软件学报》2023,34(7):3226-3240
To address the underuse of syntactic relations and missing argument roles in event extraction, an event extraction method based on a dual attention mechanism (EEDAM) is proposed to improve the precision and recall of event extraction. First, sentences are encoded with four kinds of embedding vectors, and dependency relations are introduced to build a dependency graph, so that the deep neural network can fully exploit syntactic relations. Then, a graph-transformer attention network generates new dependency arcs and aggregates node information to capture long-range dependencies and potential interactions, while a weighted fusion attention network captures the key semantic information in the sentence and extracts sentence-level event arguments, improving the model's predictive ability. Finally, key-sentence detection and similarity ranking are used for document-level argument filling. Experimental results show that on the ACE2005 dataset, the dual-attention method improves precision, recall, and F1-score by 17.82%, 4.61%, and 9.80%, respectively, over the best baseline, the joint multiple Chinese event extractor (JMCEE); on a dam-safety operation log dataset, it improves precision, recall, and F1-score by 18.08%, 4.41%, and 9.93%, respectively, over JMCEE.

4.
Shadows become more pronounced in high-spatial-resolution remote-sensing images. Shadow detection is an essential requirement both for detailed high-spatial-resolution land-cover classification and for applications such as three-dimensional (3D) reconstruction of buildings and cloud removal. This article presents a method that integrates the photochemical reflectance index (PRI) and the red-edge normalized difference vegetation index (RENDVI) for shadow identification (IPRSI) using high-spatial-resolution airborne hyperspectral data. The method detects shadows by setting thresholds on the PRI and RENDVI to separate shadows from vegetated and non-vegetated areas. The proposed method outperformed the invariant colour spaces model and the object-based method in shadow extraction accuracy: the overall shadow identification accuracy of the IPRSI was 88.97% with an F-score of 90.96, versus 81.32% with an F-score of 81.97 for the invariant colour spaces model and 78.02% with an F-score of 82.07 for the object-based method. The IPRSI is a promising method as high-spatial-resolution hyperspectral data become increasingly easy to obtain with the development of remote-sensing platforms such as unmanned aerial vehicles (UAVs), small satellites, and airships.
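The two indices are standard: PRI = (R531 − R570)/(R531 + R570) and RENDVI = (R750 − R705)/(R750 + R705). The sketch below shows the thresholding step on a reflectance cube; the threshold values are placeholders, since the paper derives its own from the imagery.

```python
import numpy as np

def pri(r531, r570):
    """Photochemical reflectance index: (R531 - R570) / (R531 + R570)."""
    return (r531 - r570) / (r531 + r570 + 1e-12)

def rendvi(r750, r705):
    """Red-edge NDVI: (R750 - R705) / (R750 + R705)."""
    return (r750 - r705) / (r750 + r705 + 1e-12)

def shadow_mask(cube, bands, pri_thresh=0.0, rendvi_thresh=0.2):
    """Label each pixel as shadow by thresholding PRI and RENDVI.

    cube  : (rows, cols, bands) reflectance array
    bands : dict mapping wavelength (nm) to band index
    The two threshold values are illustrative placeholders; the paper
    derives its own thresholds from the imagery.
    """
    p = pri(cube[..., bands[531]], cube[..., bands[570]])
    r = rendvi(cube[..., bands[750]], cube[..., bands[705]])
    # Flag pixels whose indices fall below both thresholds as shadow;
    # the paper separates vegetated and non-vegetated areas with the
    # two indices playing complementary roles.
    return (p < pri_thresh) & (r < rendvi_thresh)
```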

5.
A common practice in operational Machine Translation (MT) and Natural Language Processing (NLP) systems is to assume that a verb has a fixed number of senses and to rely on a precompiled lexicon to achieve large coverage. This paper demonstrates that this assumption is too weak to cope with the related problems of lexical divergence between languages and unexpected uses of words that fall outside the precompiled lexicon's coverage. We first examine the lexical divergences between English verbs and Chinese verbs. We then focus on a specific lexical selection problem: translating English change-of-state verbs into Chinese verb compounds. We show that an accurate translation depends not only on information about the participants, but also on contextual information. Therefore, selectional restrictions on verb arguments lack the necessary power for accurate lexical selection. Second, we examine verb representation theories and practices in MT systems and show that under the fixed-sense assumption, the existing representation schemes are not adequate for handling these lexical divergences or for extending existing verb senses to unexpected usages. We then propose a method of verb representation based on conceptual lattices which allows the similarities among different verbs in different languages to be quantitatively measured. A prototype system, UNICON, implements this theory and performs more accurate MT lexical selection for our chosen set of verbs. An additional lexical module for UNICON is also provided that handles sense extension.

6.
As new high-throughput technologies have created an explosion of biomedical literature, there is a pressing need for automatic information extraction from the literature bank. To this end, biomedical named entity recognition (NER) from natural language text is indispensable. Current NER approaches are dictionary-based, rule-based, or machine-learning-based. Since there is no consolidated nomenclature for most biomedical NEs, any NER system relying on limited dictionaries or rules does not perform satisfactorily. In this paper, we adopt a machine learning model, the conditional random field (CRF), a well-known model for sequence tagging problems, to construct our NER framework. In our framework, we utilize the available resources, including dictionaries, web corpora, and lexical analyzers, and represent them as linguistic features in the CRF model. In the experiment on the JNLPBA 2004 data, with minimal post-processing, our system achieves an F-score of 70.2%, which is better than most state-of-the-art systems. On the GENIA 3.02 corpus, our system achieves an F-score of 78.4% for protein names, 2.8% higher than the next-best system. In addition, we examine the usefulness of each feature in our CRF model. Our experience could be valuable to other researchers working on machine-learning-based NER.
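As an illustration of the feature-dictionary design that CRF taggers typically use, here is a hedged sketch with sklearn-crfsuite; `PROTEIN_DICT` and the specific features are invented stand-ins for the paper's dictionaries, web corpora, and lexical analyzers, not its actual feature set.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

PROTEIN_DICT = {"kinase", "p53", "interleukin"}  # toy stand-in lexicon

def token_features(sent, i):
    """Linguistic features for one token; the dictionary lookup stands in
    for the external resources the paper encodes as CRF features."""
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_upper": w.isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "suffix3": w[-3:],
        "in_protein_dict": w.lower() in PROTEIN_DICT,
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

def sent2features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# X: list of sentences (each a list of feature dicts); y: BIO tag sequences
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
# crf.fit([sent2features(s) for s in train_sents], train_tags)
```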

7.
This paper focuses on H∞ filtering for linear time-delay systems. A new Lyapunov–Krasovskii functional (LKF) is constructed by uniformly dividing the delay interval into two subintervals and choosing different Lyapunov matrices on each subinterval. Based on this new LKF, a less conservative delay-dependent bounded real lemma (BRL) is established to ensure that the resulting filtering error system is asymptotically stable with a prescribed H∞ performance. This new BRL is then equivalently converted into a set of linear matrix inequalities that guarantee the existence of a suitable H∞ filter. Compared with some existing filtering results, some constraints imposed on the Lyapunov matrices are removed in deriving the sufficient condition for the existence of the filter. Numerical examples show that the results obtained in this paper significantly improve the H∞ performance of the filtering error system over some existing results in the literature.
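The delay-partitioning idea can be written out as a candidate functional. A sketch, assuming a constant delay h split at h/2 with distinct matrices Q1, Q2 (and R1, R2) on the two subintervals; the paper's actual functional may carry additional terms:

```latex
% Delay interval [t-h, t] split uniformly into two subintervals,
% with different Lyapunov matrices on each subinterval:
V(x_t) = x^{\top}(t) P x(t)
       + \int_{t-\frac{h}{2}}^{t} x^{\top}(s) Q_1 x(s)\,ds
       + \int_{t-h}^{t-\frac{h}{2}} x^{\top}(s) Q_2 x(s)\,ds
       + \frac{h}{2}\int_{-\frac{h}{2}}^{0}\int_{t+\theta}^{t}
           \dot{x}^{\top}(s) R_1 \dot{x}(s)\,ds\,d\theta
       + \frac{h}{2}\int_{-h}^{-\frac{h}{2}}\int_{t+\theta}^{t}
           \dot{x}^{\top}(s) R_2 \dot{x}(s)\,ds\,d\theta,
\qquad P, Q_i, R_i \succ 0 .
```

Allowing Q1 ≠ Q2 and R1 ≠ R2 on the two halves of the interval is what yields the extra freedom, and hence the reduced conservatism, relative to a single-matrix functional over [t−h, t].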

8.
Current trademark card-splitting pipelines first perform text detection, then classify the regions, and finally split and recombine the different regions to form the trademark cards. This step-by-step processing is time-consuming, and accumulated errors reduce the accuracy of the final result. To address this problem, this paper proposes a multi-task network model, TextCls, which uses a multi-task learning design to improve the inference speed and precision of the detection and classification modules for trademark card splitting. The model consists of a feature extraction backbone and two task branches: text detection and region classification. The text detection branch uses a segmentation network to learn a pixel classification map, capturing text-pixel versus background-pixel information, and obtains text boxes via pixel aggregation. The region classification branch subdivides region features into Chinese text, English text, and graphics, focusing on the characteristics of each region type. Because the two branches share the feature extraction backbone, the pixel-level information and region features reinforce each other, and the precision of both tasks improves. To make up for the lack of a text detection dataset for trademark images and to validate TextCls, a text detection dataset of 2,000 trademark images, trademark_text (https://github.com/kongbailongtian/trademark_text), was also collected and annotated. The results show that, compared with the best text detection algorithm, the text detection branch improves precision from 94.44% to 95.16%, with an F1-score of 92.12%; the F1-score of the region classification branch also improves from 97.09% to 98.18%.

9.
肖升  何炎祥 《计算机科学》2012,39(5):161-164,176
To apply the constraint rules between verbs and their arguments to event extraction, verb argument structure is introduced into the event model to form a model variant, and a Chinese event extraction method based on verb argument structure is proposed around this variant. The method first preprocesses and parses the text to be processed, producing its grammatical structure; it then compares the resulting structure with verb argument structure attributes to find the arguments governed by each verb; finally, the semantic attributes of the arguments determine the corresponding event features, completing the event extraction. Experimental results show that the method effectively improves the performance and efficiency of the extraction system.

10.
Cataract is an eye disease that is a leading cause of visual impairment. Early intervention and cataract surgery are the main means of improving patients' vision and quality of life. Anterior segment optical coherence tomography (AS-OCT) is a new type of ophthalmic image featuring non-contact acquisition, high resolution, and fast examination. Clinically, ophthalmologists have gradually adopted AS-OCT images for diagnosing eye diseases such as glaucoma, but no work has yet used them for automatic cortical cataract (CC) classification. This paper therefore proposes an automatic cortical cataract classification framework based on AS-OCT images, consisting of four parts: image preprocessing, feature extraction, feature selection, and classification. First, reflective-region removal and contrast enhancement are applied for preprocessing. Next, 22 features are extracted from the cortical region using the grey level co-occurrence matrix (GLCM), grey level size zone matrix (GLSZM), and neighborhood grey tone difference matrix (NGTDM) methods. Then, the Spearman correlation coefficient is used to analyze feature importance and filter out redundant features. Finally, a linear support vector machine performs the classification. Experimental results on a clinical AS-OCT image dataset show that the proposed framework achieves accuracy, recall, precision, and F1-score of 86.04%, 86.18%, 88.27%, and 86.35%, respectively, performance close to that of advanced deep learning algorithms, indicating its potential as a tool to assist ophthalmologists in the clinical diagnosis of cortical cataract.
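A condensed sketch of the framework's last three stages (texture features, Spearman-based redundancy removal, linear SVM). Only the GLCM portion of the 22 features is shown, and the redundancy rule is one plausible reading of "filter out redundant features", not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import LinearSVC
from skimage.feature import graycomatrix, graycoprops

def glcm_features(region):
    """A few GLCM texture features from the preprocessed cortical region
    (an 8-bit grayscale array); GLSZM and NGTDM features are omitted."""
    g = graycomatrix(region, distances=[1], angles=[0], levels=256,
                     symmetric=True, normed=True)
    return [graycoprops(g, p)[0, 0]
            for p in ("contrast", "homogeneity", "energy", "correlation")]

def select_features(X, thresh=0.9):
    """Drop one of each pair of features whose |Spearman rho| exceeds
    thresh -- one plausible reading of the redundancy screening."""
    rho = np.abs(spearmanr(X).correlation)
    keep = []
    for j in range(X.shape[1]):
        if all(rho[j, k] < thresh for k in keep):
            keep.append(j)
    return X[:, keep], keep

# X = np.array([glcm_features(img) for img in images]); y = labels
# Xs, kept = select_features(X)
# clf = LinearSVC().fit(Xs, y)
```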

11.
王青松  魏如玉 《计算机科学》2016,43(4):256-259, 269
The naive Bayes algorithm is widely used in spam filtering, and feature extraction is an indispensable step in it. Previous Chinese spam filtering methods extract text features at the word level; facing large-scale email training corpora, the time efficiency of such algorithms becomes a bottleneck in mail filtering. This paper proposes a phrase-based naive Bayes method for Chinese spam filtering that, in the feature extraction stage, incorporates a new phrase analysis approach from text classification, extracting features at the phrase level according to basic noun phrases, basic verb phrases, and basic semantic analysis rules. Comparative spam filtering experiments using words versus phrases as the extraction unit confirm the effectiveness of the proposed method.
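To make the phrase-level idea concrete, here is a minimal multinomial naive Bayes over phrase features with add-one smoothing; the phrase chunking itself (basic noun/verb phrases and semantic rules) is assumed to happen upstream, and this is a generic sketch rather than the paper's implementation.

```python
import math
from collections import Counter, defaultdict

class PhraseNB:
    """Multinomial naive Bayes over phrase features rather than words."""

    def fit(self, docs, labels):
        # docs: list of phrase lists, e.g. [["垃圾 邮件", "免费 领取"], ...]
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
            self.vocab.update(d)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        for y, py in self.prior.items():
            total = sum(self.counts[y].values())
            lp = math.log(py / sum(self.prior.values()))
            for ph in doc:  # add-one smoothed phrase likelihoods
                lp += math.log((self.counts[y][ph] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```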

12.
A high-performance two-class Chinese text classification method
A high-performance two-class Chinese text classification method is proposed. The method adopts a two-step classification strategy. In the first step, words whose part of speech is verb, noun, adjective, or adverb are taken as features, features are selected with an improved mutual information formula, and a naive Bayes classifier performs the classification. The text features are used to estimate the measures X and Y of a text belonging to the two classes, constructing a two-dimensional text space in which each text maps to a point and the classifier is viewed as seeking a separating line. Based on the distance from a text point to the separating line, the space is divided into a reliable region and an unreliable region, which is used to assess the first-step result: if the first-step classification is reliable, the decision is made; otherwise the second step is performed. The second step treats a text as a sequence of verbs and nouns, takes bigrams of adjacent words in this sequence as features, again selects features with the improved mutual information formula, and classifies with a naive Bayes classifier. Experiments on a dataset of 12,600 texts show that the two-step method achieves high classification performance, with precision, recall, and F1 of 97.19%, 93.94%, and 95.54%, respectively.
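For reference, the standard mutual information criterion that such improved formulas start from; the paper's improved variant is not reproduced here.

```latex
% With A = documents of class c_i containing term t,
%      B = documents of other classes containing t,
%      C = documents of class c_i not containing t,
%      N = total number of documents:
MI(t, c_i) = \log \frac{P(t, c_i)}{P(t)\,P(c_i)}
           \approx \log \frac{A \cdot N}{(A + B)(A + C)}
```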

13.
Given a non-empty set S and a system F of functions defined on S, the object of search is to detect an unknown element x in S by observing the values of a sequence of test functions f in F at x until there is enough information to identify x. The basic concepts and definitions of search theory were developed by Rényi [16, 17], who defined various types of homogeneity for the system F and discussed the duration of the process of random search.

Strongly homogeneous systems of order two in two symbols are found to be closely related to the incidence matrix of equireplicated pairwise balanced designs. Some new systems of this type have been constructed here by using the incidence of subsets of points and m-flats of finite geometries. Properties of weakly homogeneous binary systems of order two have been studied, and some new systems constructed using association matrices and Hadamard matrices. Two series of search systems possessing an optimality property have also been constructed.

14.
Precise and deterministic feature extraction is a current research topic for bearing fault diagnosis. To this end, an experimental bearing test setup was created in this study, in which vibration signals were obtained from bearings on which artificial faults of specific sizes had been generated. A new feature extraction method based on co-occurrence matrices of bearing vibration signals is proposed in place of the conventional feature extraction methods in the literature. The one-dimensional local binary patterns (1D-LBP) method is first applied to the bearing vibration signals, producing a new signal whose values range between 0 and 255. Co-occurrence matrices are then obtained from these signals, and correlation, energy, homogeneity, and contrast features are extracted from the matrices. Different machine learning methods were employed with these features to carry out the classification process. Three datasets were used to test the proposed approach; analysing the signals with the proposed model yielded success rates of 87.50% for dataset 1 (different speeds), 96.5% for dataset 2 (fault size, mm), and 99.30% for dataset 3 (fault type: inner ring, outer ring, ball).
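A sketch of the two signal-level steps, under the common definition of 1D-LBP (each sample compared with four neighbours on either side and the comparisons packed into an 8-bit code); the paper may order the bits or choose the neighbourhood differently.

```python
import numpy as np

def lbp_1d(signal, radius=4):
    """1D-LBP: compare each sample with its `radius` neighbours on both
    sides and pack the comparisons into an 8-bit code (values 0-255)."""
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = np.concatenate(
            [signal[i - radius:i], signal[i + 1:i + 1 + radius]])
        bits = (neighbours >= signal[i]).astype(int)
        codes.append(int("".join(map(str, bits)), 2))
    return np.array(codes, dtype=np.uint8)

def cooccurrence(codes, offset=1, levels=256):
    """Normalized co-occurrence matrix of the LBP-coded signal at a given
    offset; correlation/energy/homogeneity/contrast are then computed
    from this matrix."""
    m = np.zeros((levels, levels))
    for a, b in zip(codes[:-offset], codes[offset:]):
        m[a, b] += 1
    return m / m.sum()
```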

15.
16.
As a conceptual modeling tool, ontologies are applied across many areas of computing for information organization and knowledge management. Ontology extension is a method that adds new concepts and the relations between them at appropriate positions in an existing ontology in order to enlarge it. This paper proposes a method for extending an ontology from text based on inter-word semantic relatedness. The method mainly uses co-occurrence analysis, word filtering, and inter-word semantic relatedness to discover potential concepts in text as candidates for extension, and then adds the concepts to the existing ontology using relation-recognition techniques such as extension rules and subsumption analysis. Taking the education subdomain of e-government as an example, a domain ontology for education was extended with this method; the results show that the extended ontology is reasonable and has strong practical applicability.

17.
王庆  陈泽亚  郭静  陈晰  王晶华 《计算机应用》2015,35(6):1649-1653
To address keyword extraction for scientific research projects in specialized domains and the construction of a project keyword lexicon, a method for building the lexicon from co-occurrence matrices based on semantic relations is proposed. Building on conventional co-occurrence-matrix keyword extraction, the method improves the traditional algorithm by jointly considering keyword position within the document, part of speech, and inverse document frequency (IDF). In addition, a method is given for building a keyword association network from the co-occurrence matrix and identifying hot keywords by computing similarity to semantic basis vectors. Simulation experiments on 882 electric power project documents show that the improved method effectively extracts project keywords and builds the keyword association network, clearly outperforming a multi-feature-fusion Chinese keyword extraction method on precision, recall, and F1-score.
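A hedged sketch of the two ingredients: scoring candidate keywords with position, part-of-speech, and IDF weights (the weight values below are invented for illustration), and building the association network from document-level co-occurrence counts.

```python
import math
from collections import Counter
from itertools import combinations

def keyword_weights(docs, positions, pos_tags):
    """Score candidates by frequency weighted by position, part of
    speech, and IDF -- the three factors the paper adds to the plain
    co-occurrence approach. The 1.5/1.2 weights are assumptions."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    scores = Counter()
    for d in docs:
        for w in d:
            idf = math.log(n / (1 + df[w]))
            pos_w = 1.5 if positions.get(w) == "title" else 1.0
            tag_w = 1.2 if pos_tags.get(w) in ("n", "v") else 1.0
            scores[w] += pos_w * tag_w * idf
    return scores

def cooccurrence_network(docs, vocab):
    """Edge weight = number of documents in which two keywords co-occur;
    the resulting weighted graph is the keyword association network."""
    edges = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d) & vocab), 2):
            edges[(a, b)] += 1
    return edges
```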

18.
This paper is concerned with an approach to exploiting the information available from co-occurrence matrices computed for different distance parameter values. A polynomial of degree n is fitted to each of Haralick's 14 coefficients computed from the average co-occurrence matrices evaluated for several distance parameter values, and the parameters of the polynomials constitute a set of new features. The experimental investigations performed substantiate the usefulness of the approach.
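A condensed sketch of the scheme: evaluate each texture coefficient at several distances, fit a degree-n polynomial over distance, and keep the polynomial coefficients as features. `graycoprops` exposes only six of Haralick's fourteen coefficients, so this is a reduced illustration of the idea rather than the paper's full feature set.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def polynomial_texture_features(img, distances=(1, 2, 3, 4, 5), degree=2,
                                props=("contrast", "homogeneity",
                                       "energy", "correlation")):
    """For each coefficient: evaluate it on the co-occurrence matrix at
    several distances, fit a polynomial of the given degree, and return
    the stacked polynomial coefficients as the new feature vector."""
    feats = []
    for p in props:
        vals = []
        for d in distances:
            g = graycomatrix(img, distances=[d], angles=[0, np.pi / 2],
                             levels=256, symmetric=True, normed=True)
            vals.append(graycoprops(g, p).mean())  # average over angles
        feats.extend(np.polyfit(distances, vals, degree))
    return np.asarray(feats)
```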

19.
A new design of robust filters for uncertain systems
In this paper, a structured polynomial parameter-dependent approach is proposed for robust H2 filtering of linear uncertain systems. Given a stable system with parameter uncertainties residing in a polytope with s vertices, the focus is on designing a robust filter such that the filtering error system is robustly asymptotically stable and has a guaranteed estimation error variance for the entire uncertainty domain. A new polynomial parameter-dependent idea is introduced to solve the robust H2 filtering problem, which is different from the quadratic framework that entails fixed matrices for the entire uncertainty domain, or the linearly parameter-dependent framework that uses linear convex combinations of s matrices. This idea is realized by carefully selecting the structure of the matrices involved in the products with system matrices. Linear matrix inequality (LMI) conditions are obtained for the existence of admissible filters and based on these, the filter design is cast into a convex optimization problem, which can be readily solved via standard numerical software. Both continuous and discrete-time cases are considered. The merit of the methods presented in this paper lies in their less conservatism than the existing robust filter design methods, as shown both theoretically and through extensive numerical examples.

20.
Filter template generation is a crucial problem in network information filtering. Given the nonlinear characteristics of template generation, and drawing on the ability of genetic algorithms to search for globally optimal solutions, a genetic algorithm is introduced to solve the text information filtering problem, and its theoretical feasibility is proved with a set-theoretic approach. In practical application, templates generated by the genetic algorithm were used in text classification and text filtering experiments, and an adaptive strategy for the genetic operators is proposed to address problems encountered in practice. Both the theoretical proof and the experimental results show that the method is feasible and achieves good results in information filtering.
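A plain GA sketch of the template-generation loop (roulette selection, one-point crossover, bit-flip mutation) over a 0/1 term mask, with a toy fitness standing in for the paper's filtering objective; the paper's self-adaptive operator strategy, which varies pc and pm during the run, is omitted here.

```python
import random

def fitness(template, docs, labels):
    """Toy objective: accuracy of 'keep a document iff at least one
    template term is present'. docs are 0/1 term-presence vectors."""
    hits = 0
    for d, y in zip(docs, labels):
        score = sum(t for t, present in zip(template, d) if present)
        hits += int((score >= 1) == y)
    return hits / len(docs)

def evolve(docs, labels, vocab_size, pop=40, gens=50, pc=0.8, pm=0.02):
    """Plain GA: roulette selection, one-point crossover, bit-flip
    mutation, fixed pc/pm."""
    popn = [[random.randint(0, 1) for _ in range(vocab_size)]
            for _ in range(pop)]
    for _ in range(gens):
        fits = [fitness(ind, docs, labels) + 1e-6 for ind in popn]
        new = []
        while len(new) < pop:
            a, b = random.choices(popn, weights=fits, k=2)  # selection
            if random.random() < pc:                        # crossover
                cut = random.randrange(1, vocab_size)
                a = a[:cut] + b[cut:]
            a = [1 - g if random.random() < pm else g for g in a]  # mutate
            new.append(a)
        popn = new
    return max(popn, key=lambda ind: fitness(ind, docs, labels))
```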
