Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
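Su Shan, Zhang Yang, Zhang Dongwen 《计算机应用》 (Journal of Computer Applications) 2022, 42(6): 1702-1707
Code smell detection methods based on heuristics and machine learning have been shown to have certain limitations, and most existing detection methods focus on the more common code smells. To address these problems, a deep learning method is proposed to detect three coupling-related code smells that are rarely targeted by detection work: intensive coupling, dispersed coupling, and shotgun surgery. First, the metrics required for the three code smells are extracted and the resulting data are processed. Then, a deep learning model combining a convolutional neural network (CNN) with an attention mechanism is constructed; the attention mechanism assigns weights to the input metric features. Datasets were extracted from 21 open-source projects, the detection method was validated on 10 open-source projects, and the results were compared with a CNN model. The experimental results show that intensive coupling and dispersed coupling achieved better results with the proposed model, with precision of 93.61% and 99.76%, respectively, whereas shotgun surgery achieved better results with the CNN model, with precision of 98.59%.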

2.
3.
Attributing authorship of documents with unknown creators has been studied extensively for natural language text such as essays and literature, but less so for non‐natural languages such as computer source code. Previous attempts at attributing authorship of source code can be categorised by two attributes: the software features used for the classification, either strings of n tokens/bytes (n‐grams) or software metrics; and the classification technique that exploits those features, either information retrieval ranking or machine learning. The results of existing studies, however, are not directly comparable as all use different test beds and evaluation methodologies, making it difficult to assess which approach is superior. This paper summarises all previous techniques for source code authorship attribution, implements feature sets that are motivated by the literature, and applies information retrieval ranking methods or machine classifiers for each approach. Importantly, all approaches are tested on identical collections from varying programming languages and author types. Our conclusions are as follows: (i) ranking and machine classifier approaches are around 90% and 85% accurate, respectively, for a one‐in‐10 classification problem; (ii) the byte‐level n‐gram approach is best used with different parameters to those previously published; (iii) neural networks and support vector machines were found to be the most accurate machine classifiers of the eight evaluated; (iv) use of n‐gram features in combination with machine classifiers shows promise, but there are scalability problems that still must be overcome; and (v) approaches based on information retrieval techniques are currently more accurate than approaches based on machine learning. Copyright © 2012 John Wiley & Sons, Ltd.
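
The byte-level n-gram features mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `byte_ngrams` and `profile_overlap`, the default n = 3, and the overlap score are all assumptions chosen for illustration, and the two code snippets are made up.

```python
from collections import Counter


def byte_ngrams(source: bytes, n: int = 3) -> Counter:
    """Count every contiguous run of n bytes in a source file's contents."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def profile_overlap(a: Counter, b: Counter) -> int:
    """Illustrative similarity: number of n-gram occurrences shared by two profiles."""
    # Counter intersection keeps the minimum count of each shared n-gram.
    return sum((a & b).values())


# Hypothetical snippets standing in for a known author's code and a query file.
known = byte_ngrams(b"for (int i = 0; i < n; i++) { sum += a[i]; }")
query = byte_ngrams(b"for (int j = 0; j < m; j++) { total += b[j]; }")
print(profile_overlap(known, query))
```

In a ranking setup, a query file would be scored against one profile per candidate author and the highest-scoring author reported; a machine classifier would instead take the n-gram counts as input features.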

4.
Neural Computing and Applications - Stance detection is an evolving opinion mining research area motivated by the vast increase in the variety and volume of user-generated content. In this regard,...

5.
Nowadays, smartphone devices are an integral part of our lives since they enable us to access a large variety of services from personal to banking. The worldwide popularity and adoption of smartphone devices continue to approach the capabilities of traditional computing environments. Malware such as botnets is becoming an emerging threat to users and network operators, especially on popular platforms such as Android. Due to the rapid growth of botnet applications, there is a pressing need to develop an effective solution to detect them. Most of the existing detection techniques can detect only malicious Android applications, but they cannot detect Android botnet applications. In this paper, we propose a structural analysis-based learning framework, which adopts machine learning techniques to classify botnets and benign applications using the botnet characteristics-related unique patterns of requested permissions and used features. The experimental evaluation based on real-world benchmark datasets shows that the selected patterns can achieve high detection accuracy with a low false positive rate. The experimental and statistical tests show that the support vector machine classifier performs well compared to other classification algorithms.

6.
7.
The growing complexity of new features in multicore processors imposes significant pressure on functional verification. Although a large amount of time and effort is spent on it, functional design bugs escape into the products and cause catastrophic effects. Hence, online design bug detection is needed to detect the functional bugs in the field. In this work, we propose a novel approach by leveraging Performance Monitoring Counters (PMC) and machine learning to detect and locate pipeline bugs in a processor. We establish the correlation between PMC events and pipeline bugs in order to extract the features to build and train machine learning models. We design and implement a synthetic bug injection framework to obtain datasets for our simulation. To evaluate the proposal, the Multi2Sim simulator is used to simulate the x86 architecture model. An x86 fault model is developed to synthetically inject bugs in x86 pipeline stages. PMC event values are collected by executing the SPEC CPU2006 and MiBench benchmarks for both bug and no-bug scenarios in the x86 simulator. This training data obtained through simulation is used to build a Bug Detection Model (BDM) that detects a pipeline bug and a Bug Location Model (BLM) that locates the pipeline unit where the bug occurred. Simulation results show that the BDM and BLM provide accuracies of 97.3% and 91.6% using Decision tree and Random forest, respectively. When compared against other state-of-the-art approaches, our solution can locate the pipeline unit where the bug occurred with high accuracy and without using additional hardware.

8.
Even though advanced Machine Learning (ML) techniques have been adopted for DDoS detection, the attack remains a major threat to the Internet. Most of the existing ML-based DDoS detection approaches fall under two categories: supervised and unsupervised. Supervised ML approaches for DDoS detection rely on the availability of labeled network traffic datasets, whereas unsupervised ML approaches detect attacks by analyzing the incoming network traffic. Both approaches are challenged by large amounts of network traffic data, low detection accuracy and high false positive rates. In this paper we present an online sequential semi-supervised ML approach for DDoS detection based on network Entropy estimation, Co-clustering, Information Gain Ratio and the Extra-Trees algorithm. The unsupervised part of the approach removes normal traffic data that is irrelevant for DDoS detection, which reduces false positive rates and increases accuracy. The supervised part, in turn, reduces the false positive rates of the unsupervised part and accurately classifies the DDoS traffic. Various experiments were performed to evaluate the proposed approach using three public datasets, namely NSL-KDD, UNB ISCX 12 and UNSW-NB15. Accuracies of 98.23%, 99.88% and 93.71% are achieved for the NSL-KDD, UNB ISCX 12 and UNSW-NB15 datasets, respectively, with false positive rates of 0.33%, 0.35% and 0.46%, respectively.
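
The network entropy estimation named in the abstract above can be illustrated with a minimal sketch of Shannon entropy over a window of observed values. Which traffic fields the paper uses and how it windows them are not stated here, so the choice of source addresses, the function name `shannon_entropy`, and the sample window are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical window of packet source addresses (documentation-range IPs).
window = ["192.0.2.1", "192.0.2.1", "192.0.2.7", "192.0.2.9"]
print(shannon_entropy(window))
```

A sudden change in such an entropy value between consecutive windows is one commonly used signal that the traffic mix has shifted; the thresholds and the downstream co-clustering step are outside this sketch.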

9.

Obstructive sleep apnea is a syndrome which is characterized by the decrease in air flow or respiratory arrest depending on upper respiratory tract obstructions recurring during sleep and often observed with the decrease in the oxygen saturation. The aim of this study was to determine the connection between the respiratory arrests and the photoplethysmography (PPG) signal in obstructive sleep apnea patients. Determination of this connection is important for the suggestion of using a new signal in diagnosis of the disease. Thirty-four time-domain features were extracted from the PPG signal in the study. The relation between these features and respiratory arrests was statistically investigated. The Mann–Whitney U test was applied to reveal whether this relation was incidental or statistically significant, and 32 out of 34 features were found statistically significant. After this stage, the features of the PPG signal were classified with k-nearest neighbors classification algorithm, radial basis function neural network, probabilistic neural network, multilayer feedforward neural network (MLFFNN) and ensemble classification method. The output of the classifiers was considered as apnea and control (normal). When the classifier results were compared, the best performance was obtained with MLFFNN. Test accuracy rate is 97.07 % and kappa value is 0.93 for MLFFNN. It has been concluded with the results obtained that respiratory arrests can be recognized through the PPG signal and the PPG signal can be used for the diagnosis of OSA.
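
The kappa value reported alongside accuracy in the abstract above can be illustrated with a minimal sketch of Cohen's kappa for a two-class (apnea vs. control) confusion matrix. The function name `cohens_kappa` and the counts in the usage lines are made up for illustration; they are not the study's data.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix (agreement corrected for chance)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal totals for each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 48 apnea segments and 47 control segments classified correctly.
print(cohens_kappa(tp=48, fp=3, fn=2, tn=47))
```

Unlike raw accuracy, kappa is 0 when the classifier agrees with the labels no more often than chance would, which is why the two figures are usually reported together.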


10.
Species’ potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species’ potential distribution.

11.
Computational Visual Media - Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are...

12.
It is well known that microarray printing, hybridization, and washing oftentimes create erroneous measurements, and these errors detrimentally impact machine microarray spot quality classification. Thus, it is crucial to identify and remove these errors if automation is to replace the still common practice of visually assessing spot quality, an extremely expensive and time-consuming procedure. A major problem in microarray spot quality classification methods proposed in the literature is the correlation among the features extracted from the spots. In this paper, we propose using a random subspace ensemble of neural networks and a feature selection algorithm to improve the performance of our microarray spot quality classification method. Our best method obtains an error under the receiver operating characteristic curve (EAUR) of 0.3 outperforming the stand-alone support vector machine EAUR of 1.7. The consistency of our proposed approach makes it a viable alternative to the labour-intensive manual method of spot quality assessment.

13.
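Self-modifying code is the technique most commonly used by malicious programs to defeat disassembly-based static analysis. The malicious-code defenses of traditional operating systems cannot effectively detect or prevent the execution and spread of self-modifying malicious code. This paper introduces CASMonitor, a virtual-machine-based method for detecting and monitoring self-modifying code. From outside the virtual machine, it can dynamically and transparently monitor the execution of a specified program inside the virtual machine, detect self-modifying behavior, and resolve the entry points of newly generated code, thereby supporting functions such as virus scanning. Experiments on an x86/Win32 virtual machine architecture show that the technique can handle a variety of self-modifying code behaviors as well as common packers.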

14.
Hsu Chih-Yu, Wang Shuai, Qiao Yu 《Multimedia Tools and Applications》 2021, 80(19): 29643-29656
Multimedia Tools and Applications - The multimedia service company, Netflix, increased the number of new subscribers during the Coronavirus pandemic age. Intrusion detection systems for multimedia...

15.
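Code clone detection is an important means of improving software development efficiency, software quality, and reliability. Single-language clone detection based on abstract syntax trees (AST) has achieved notable results, but cross-language detection faces two problems: AST nodes of different languages contain synonyms and near-synonyms, and manually labeling datasets is costly. These limit the effectiveness and practicality of existing clone detection methods. To address these problems, a cross-language code clone detection method based on a contrastive tree convolutional neural network (CTCNN) is proposed. The method first parses code in different programming languages into ASTs and applies synonym conversion to the AST node types and node values to reduce the differences between the ASTs of different languages. It also uses contrastive learning to augment the negative samples and train the model, so that on small-sample datasets the distance between clone pairs is minimized and the distance between non-clone pairs is maximized. Finally, the method was evaluated on a public dataset, reaching a precision of 95.26%, a recall of 99.98%, and an F1 of 97.56%. The results show that, compared with the best existing methods, CLCDSA and C4, the model improves detection precision by 43.92% and 3.73%, respectively, and F1 by 29.84% and 6.29%, respectively, demonstrating that the proposed model is an effective cross-language code clone detection method.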
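
The three figures reported for the clone detector above are related by the standard definition of F1 as the harmonic mean of precision and recall. A minimal sketch of that arithmetic follows; the function name `f1_score` is an assumption, and only the two input percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 95.26% and recall 99.98%, as stated in the abstract.
f1 = f1_score(0.9526, 0.9998)
print(f"{f1 * 100:.2f}%")  # prints 97.56%
```

The result matches the F1 stated in the abstract, which confirms that the three reported numbers are mutually consistent.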

16.
Multimedia Tools and Applications - Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks. Most existing spoofing detection systems rely on two main points; the feature...

17.
It is very important for financial institutions to develop credit rating systems to help them decide whether to grant credit to consumers before issuing loans. In the literature, statistical and machine learning techniques for credit rating have been extensively studied. Recent studies focusing on hybrid models that combine different machine learning techniques have shown promising results. However, there are various types of combination methods to develop hybrid models, and it is unknown which hybrid machine learning model performs best in credit rating. In this paper, four different types of hybrid models are compared, built with the ‘Classification + Classification’, ‘Classification + Clustering’, ‘Clustering + Classification’, and ‘Clustering + Clustering’ techniques, respectively. A real world dataset from a bank in Taiwan is considered for the experiment. The experimental results show that the ‘Classification + Classification’ hybrid model based on the combination of logistic regression and neural networks can provide the highest prediction accuracy and maximize the profit.

18.
This paper presents an application of a classification method to adaptively and dynamically modify the therapy and real-time displays of a virtual reality system in accordance with the specific state of each patient using his/her physiological reactions. First, a theoretical background on several machine learning techniques for classification is presented. Then, nine machine learning techniques are compared in order to select the best candidate in terms of accuracy. Finally, first experimental results are presented to show that the therapy can be modulated as a function of the patient's state using machine learning classification techniques.

19.
We present a comparative study on the most popular machine learning methods applied to the challenging problem of customer churn prediction in the telecommunications industry. In the first phase of our experiments, all models were applied and evaluated using cross-validation on a popular, public domain dataset. In the second phase, the performance improvement offered by boosting was studied. In order to determine the most efficient parameter combinations we performed a series of Monte Carlo simulations for each method and for a wide range of parameters. Our results demonstrate clear superiority of the boosted versions of the models over the plain (non-boosted) versions. The best overall classifier was the SVM-POLY using AdaBoost, with an accuracy of almost 97% and an F-measure over 84%.

20.
This paper presents a method for combining domain knowledge and machine learning (CDKML) for classifier generation and online adaptation. The method exploits advantages in domain knowledge and machine learning as complementary information sources. Whereas machine learning may discover patterns in interest domains that are too subtle for humans to detect, domain knowledge may contain information on a domain not present in the available domain dataset. CDKML has three steps. First, prior domain knowledge is enriched with relevant patterns obtained by machine learning to create an initial classifier. Second, genetic algorithms refine the classifier. Third, the classifier is adapted online on the basis of user feedback using the Markov decision process. CDKML was applied in fall detection. Tests showed that the classifiers developed by CDKML have better performance than machine‐learning classifiers generated on a training dataset that does not adequately represent all real‐life cases of the learned concept. The accuracy of the initial classifier was 10 percentage points higher than the best machine‐learning classifier and the refinement added 3 percentage points. The online adaptation improved the accuracy of the refined classifier by an additional 15 percentage points.
