1.
Malicious web content detection by machine learning (total citations: 1; self-citations: 0; citations by others: 1)
Yung-Tsung Hou, Yimeng Chang, Tsuhan Chen, Chi-Sung Laih, Chia-Mei Chen. Expert systems with applications, 2010, 37(1): 55-60
The recent development of dynamic HTML (DHTML) gives attackers a new and powerful technique for compromising computer systems. Malicious DHTML code is usually embedded in a normal webpage, and the malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can easily disguise itself through obfuscation or transformation, which makes detection even harder. Anti-virus software packages commonly use signature-based approaches, which might not be able to efficiently identify camouflaged malicious HTML code. Therefore, our paper proposes a method for detecting malicious web pages using machine learning. Our study systematically analyzes the characteristics of malicious webpages and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscation and can correctly determine whether a webpage is malicious.
2.
Sanjoy Dasgupta. Theoretical computer science, 2011, 412(19): 1767-1781
An active learner has a collection of data points, each with a label that is initially hidden but can be obtained at some cost. Without spending too much, it wishes to find a classifier that will accurately map points to labels. There are two common intuitions about how this learning process should be organized: (i) by choosing query points that shrink the space of candidate classifiers as rapidly as possible; and (ii) by exploiting natural clusters in the (unlabeled) data set. Recent research has yielded learning algorithms for both paradigms that are efficient, work with generic hypothesis classes, and have rigorously characterized labeling requirements. Here we survey these advances by focusing on two representative algorithms and discussing their mathematical properties and empirical performance.
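Intuition (i) above can be illustrated with a toy case that is not taken from the surveyed paper: learning a threshold on sorted one-dimensional points. The sketch below assumes noise-free labels (0 below an unknown threshold, 1 at or above it); the names `learn_threshold` and `oracle` are hypothetical. Each label query halves the set of thresholds still consistent with the answers, so roughly log2(n) labels suffice instead of n.

```python
def learn_threshold(points, oracle):
    """Find the first positive point with few label queries.

    Assumes (for illustration only) that oracle(x) returns 0 for all
    points below some unknown threshold and 1 for all points at or
    above it. Returns (index_of_first_positive, number_of_queries),
    where the index refers to the sorted list of points.
    """
    xs = sorted(points)
    lo, hi = 0, len(xs)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid        # first positive is at mid or earlier
        else:
            lo = mid + 1    # first positive is after mid
    return lo, queries


# Usage: 100 points, hidden threshold at 37.
first_positive, n_queries = learn_threshold(range(100), lambda x: int(x >= 37))
```

A passive learner would need labels for nearly all 100 points to pin down the same boundary; here the number of queries grows only logarithmically with the number of points.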
3.
An active learning based TCM-KNN algorithm for supervised network intrusion detection (total citations: 1; self-citations: 0; citations by others: 1)
As network attacks have increased in number and severity over the past few years, intrusion detection has become an increasingly critical component of secure information systems, and supervised network intrusion detection has been an active and difficult research topic in the field for many years. However, it has not been widely applied in practice because of some inherent issues. The most important is the difficulty of obtaining adequate attack data for the supervised classifiers to model the attack patterns; the data acquisition task is always time-consuming and relies heavily on domain experts. In this paper, we propose a novel supervised network intrusion detection method based on the TCM-KNN (Transductive Confidence Machines for K-Nearest Neighbors) machine learning algorithm and an active learning based method for selecting training data. It can effectively detect anomalies with a high detection rate and few false positives while using far fewer selected data and selected features for training than traditional supervised intrusion detection methods. A series of experimental results on the well-known KDD Cup 1999 data set demonstrate that the proposed method is more robust and effective than state-of-the-art intrusion detection methods, and that it can be further optimized, as discussed in this paper, for real applications.
4.
Expert systems with applications, 2014, 41(14): 6086-6097
In the past few years, active learning has been reasonably successful and has drawn a lot of attention. However, recent active learning methods have focused on strategies in which a large unlabeled dataset has to be reprocessed at each learning iteration. As datasets grow, these strategies become inefficient or even a tremendous computational challenge. To address these issues, we propose an effective and efficient active learning paradigm that significantly reduces the size of the learning set by applying an a priori process that identifies and organizes a small relevant subset. Furthermore, the concomitant classification and selection processes enable the classification of a very small number of samples while selecting the informative ones. Experimental results showed that the proposed paradigm achieves high accuracy quickly with minimal user interaction, further improving its efficiency.
5.
Today’s college students have grown up with technology. These digital natives typically gravitate toward group activities in technology-embedded social contexts. However, despite this multidimensional evolution, little has changed in the conventional classrooms where they build their education experience. We investigate learning models in the classroom environment, which remains the main driver of education today. We describe a conversational learning model based on group activities that involve multi-party conversations. We implement this model in a technology-enhanced studio-classroom to “visualize” conversations that would otherwise remain abstract to learners. Teachers are empowered with instructional patterns to guide their changing role in this novel classroom environment. Based on standard assessment indicators, we conduct an experimental analysis whose results show interesting tradeoffs in learning performance that favor the proposed conversational learning approach over conventional instruction.
6.
A stopping criterion for active learning (total citations: 1; self-citations: 0; citations by others: 1)
Active learning (AL) is a framework that attempts to reduce the cost of annotating training material for statistical learning methods. While a lot of papers have been presented on applying AL to natural language processing tasks reporting impressive savings, little work has been done on defining a stopping criterion. In this work, we present a stopping criterion for active learning based on the way instances are selected during uncertainty-based sampling and verify its applicability in a variety of settings. The statistical learning models used in our study are support vector machines (SVMs), maximum entropy models and Bayesian logistic regression and the tasks performed are text classification, named entity recognition and shallow parsing. In addition, we present a method for multiclass mutually exclusive SVM active learning.
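The abstract above does not spell out its stopping criterion, so the sketch below covers only the selection step it builds on: uncertainty-based sampling for a binary classifier. It assumes the model exposes a probability of the positive class for each unlabeled instance; the function name `most_uncertain` is hypothetical and this is not the paper's method.

```python
def most_uncertain(probabilities):
    """Return the index of the instance the model is least sure about.

    probabilities: list of P(positive) values, one per unlabeled
    instance (an assumed input format). The instance whose
    probability is closest to 0.5 is chosen for labeling.
    """
    return min(range(len(probabilities)),
               key=lambda i: abs(probabilities[i] - 0.5))


# Usage: the fourth instance (index 3) is closest to the decision boundary.
query_index = most_uncertain([0.9, 0.55, 0.1, 0.48])
```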
7.
Support vector machine active learning for music retrieval (total citations: 7; self-citations: 0; citations by others: 7)
8.
Jianfeng Shen, Bin Ju, Tao Jiang, Jingjing Ren, Miao Zheng, Chengwei Yao, Lanjuan Li. Neurocomputing, 2011, 74(18): 3785-3792
Image classification is an important task in computer vision and machine learning. However, manually labeling images is known to be time-consuming and expensive, whereas unlabeled images are easily available. Active learning is a mechanism that tries to determine which unlabeled data points would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In this paper, we introduce the idea of column subset selection, which aims to select the most representative columns from a data matrix, into active learning and propose a novel active learning algorithm, column subset selection for active learning (CSSactive). CSSactive selects the most representative images to label; the other images are then reconstructed from these labeled images. The goal of CSSactive is to minimize the reconstruction error. In addition, most previous active learning approaches are based on linear models and hence consider only linear functions, so they fail to discover the intrinsic geometry of images when the image space is highly nonlinear. To address this problem, we provide a kernel-based column subset selection for active learning (KCSSactive) algorithm, which performs active learning in a Reproducing Kernel Hilbert Space (RKHS) instead of the original image space. Experimental results on the Yale, AT&T and COIL20 data sets demonstrate the effectiveness of our proposed approaches.
9.
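Anomaly detection in network data streams must both maintain classification accuracy and increase detection speed. To address this problem, an improved incremental learning algorithm is proposed on the basis of existing data stream mining techniques. The algorithm builds a multi-model rotation structure. In each training round it derives the support vectors of the current training sample set from a geometric perspective and selects the support vectors that lie within the hyperplane margin for incremental SVM training. Experiments using data from the UCI standard repository, with comparison against two other classical classification models, demonstrate the effectiveness of the method.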
10.
In this paper, we describe a new error-driven active learning approach to self-growing radial basis function networks for early robot learning. There are several mappings that need to be set up for an autonomous robot system for sensorimotor coordination and transformation of sensory information from one modality to another, and these mappings are usually highly nonlinear. Traditional passive learning approaches usually cause both large mapping errors and nonuniform mapping error distribution compared to active learning. A hierarchical clustering technique is introduced to group large mapping errors and these error clusters drive the system to actively explore details of these clusters. Higher level local growing radial basis function subnetworks are used to approximate the residual errors from previous mapping levels. Plastic radial basis function networks construct the substrate of the learning system and a simplified node-decoupled extended Kalman filter algorithm is presented to train these radial basis function networks. Experimental results are given to compare the performance among active learning with hierarchical adaptive RBF networks, passive learning with adaptive RBF networks and hierarchical mixtures of experts, as well as their robustness under noise conditions.
11.
Expert systems with applications, 2014, 41(11): 5201-5211
Learning non-taxonomic relationships is a sub-field of Ontology Learning that aims at automating the extraction of these relationships from text. Several techniques have been proposed based on Natural Language Processing and Machine Learning. However, as with other techniques for Ontology Learning, evaluating techniques for learning non-taxonomic relationships is an open problem. Three general proposals suggest that learned ontologies can be evaluated in an executable application, by domain experts, or by comparison with a predefined reference ontology. This article proposes two procedures for evaluating techniques for learning non-taxonomic relationships, based on comparing the relationships obtained with those of a reference ontology. These procedures are also used to evaluate two state-of-the-art techniques that extract relationships from two corpora in the domains of biology and Family Law.
12.
The performance of eight machine learning classifiers was compared on three aphasia-related classification problems. The first problem contained naming data of aphasic and non-aphasic speakers tested with the Philadelphia Naming Test. The second problem included the naming data of Alzheimer and vascular disease patients tested with the Finnish version of the Boston Naming Test. The third problem included aphasia test data of patients suffering from four different aphasic syndromes tested with the Aachen Aphasia Test. The first two data sets were small. Therefore, the data used in the tests were artificially generated from the original confrontation naming data of 23 and 22 subjects, respectively. The third set contained aphasia test data of 146 aphasic speakers and was used as such in the experiments. With the first and the third data sets the classifiers could successfully be used for the task, while the results with the second data set were less encouraging. However, no single classifier performed exceptionally well with all data sets, suggesting that the classifier used for classification of aphasic data should be selected on the basis of experiments performed with the data set at hand.
13.
This paper presents a novel active learning method developed in the framework of ε-insensitive support vector regression (SVR) for the solution of regression problems with small-size initial training data. The proposed active learning method iteratively selects the most informative as well as representative unlabeled samples to be included in the training set by jointly evaluating three criteria: (i) relevancy, (ii) diversity, and (iii) density of samples. All three criteria are implemented according to the SVR properties and are applied in two clustering-based consecutive steps. In the first step, a novel measure is defined to select the most relevant samples, which have a high probability of being located either outside or on the boundary of the ε-tube of SVR. To this end, a clustering method is first applied to all unlabeled samples together with the training samples that are inside the ε-tube (those that are not support vectors, i.e., non-SVs); then the clusters with non-SVs are eliminated. The unlabeled samples in the remaining clusters are considered the most relevant patterns. In the second step, a novel measure is defined to select diverse samples among the relevant patterns from the high density regions in the feature space, to better model the SVR learning function. To this end, the clusters with the highest density of samples are first chosen to identify the highest density regions in the feature space. Then, the sample from each selected cluster that is associated with the portion of feature space having the highest density (i.e., the most representative of the underlying distribution of samples contained in the related cluster) is selected to be included in the training set. In this way diverse samples taken from high density regions are efficiently identified. Experimental results obtained on four different data sets show the robustness of the proposed technique, particularly when only a small-size initial training set is available.
14.
Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns (total citations: 1; self-citations: 0; citations by others: 1)
In business applications such as direct marketing, decision-makers are required to choose the action which best maximizes a utility function. Cost-sensitive learning methods can help them achieve this goal. In this paper, we introduce Pessimistic Active Learning (PAL). PAL employs a novel pessimistic measure, which relies on confidence intervals and is used to balance the exploration/exploitation trade-off. In order to acquire an initial sample of labeled data, PAL applies orthogonal arrays of fractional factorial design. PAL was tested on ten datasets using a decision tree inducer. A comparison of these results to those of other methods indicates PAL’s superiority.
15.
Rule-based intrusion detection systems generally rely on hand-crafted signatures developed by domain experts. This can delay updates to the signature bases and potentially compromise the security of protected systems. In this paper, we present a biologically-inspired computational approach to dynamically and adaptively learn signatures for network intrusion detection using a supervised learning classifier system. The classifier is an online and incremental parallel production rule-based system. A signature extraction system is developed that adaptively extracts signatures to the knowledge base as they are discovered by the classifier. The signature extraction algorithm is augmented by introducing new generalisation operators that minimise overlap and conflict between signatures. Mechanisms are provided to adapt the main algorithm parameters to deal with online noisy and imbalanced class data. Our approach is hybrid in that signatures for both intrusive and normal behaviours are learnt. The performance of the developed systems is evaluated with a publicly available intrusion detection dataset, and results are presented that show the effectiveness of the proposed system.
16.
In traditional approaches, process planning and scheduling are carried out sequentially: scheduling is done separately after the process plan has been generated. However, the functions of these two systems are usually complementary, and the traditional approach has become an obstacle to improving the productivity and responsiveness of the manufacturing system. If the two systems can be integrated more tightly, greater performance and higher productivity can be achieved. Research on the integrated process planning and scheduling (IPPS) problem is therefore necessary. In this paper, a new method based on an active learning genetic algorithm is developed to facilitate the integration and optimization of these two systems. Experimental studies were used to test the approach, and comparisons with some previous approaches were made to indicate its adaptability and superiority. The experimental results show that the proposed approach is a promising and very effective method for the IPPS problem.
17.
In this paper, active noise control using recurrent neural networks is addressed. A new learning algorithm for recurrent neural networks based on the Adjoint Extended Kalman Filter is developed for active noise control. The overall control structure is constructed using two recurrent neural networks: the first models the secondary path of active noise control, while the second generates the control signal. A real-time experiment with the proposed algorithm on a digital signal processor is carried out to show the effectiveness of the method.
18.
SVM based adaptive learning method for text classification from positive and unlabeled documents (total citations: 1; self-citations: 6; citations by others: 1)
Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier using positive and unlabeled examples. The primary challenge of this problem as compared with the classical text classification problem is that no labeled negative documents are available in the training example set. Firstly, we identify many more reliable negative documents by an improved 1-DNF algorithm with a very low error rate. Secondly, we build a set of classifiers by iteratively applying the SVM algorithm on a training data set, which is augmented during iteration. Thirdly, different from previous PU-oriented text classification works, we adopt the weighted vote of all classifiers generated in the iteration steps to construct the final classifier instead of choosing one of the classifiers as the final classifier. Finally, we discuss an approach to evaluate the weighted vote of all classifiers generated in the iteration steps to construct the final classifier based on PSO (Particle Swarm Optimization), which can discover the best combination of the weights. In addition, we built a focused crawler based on link-contexts guided by different classifiers to evaluate our method. Several comprehensive experiments have been conducted using the Reuters data set and thousands of web pages. Experimental results show that our method increases the performance (F1-measure) compared with PEBL, and a focused web crawler guided by our PSO-based classifier outperforms several other classifiers both in harvest rate and target recall.
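The weighted vote described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: each classifier is assumed to output +1 (positive) or -1 (negative) for a document, the weights are taken as given (the abstract says they are tuned with PSO, which is not shown here), and the name `weighted_vote` is hypothetical.

```python
def weighted_vote(predictions, weights):
    """Combine per-classifier labels into one final label.

    predictions: list of +1/-1 labels, one per classifier (assumed format).
    weights: list of non-negative weights, one per classifier.
    Returns +1 if the weighted sum is non-negative, otherwise -1.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    score = sum(w * p for p, w in zip(predictions, weights))
    return 1 if score >= 0 else -1


# Usage: one heavily weighted classifier outvotes two lightly weighted ones.
label = weighted_vote([1, -1, -1], [0.6, 0.2, 0.1])
```

Breaking ties in favor of the positive class is an arbitrary choice made for this sketch.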
19.
The purpose of this study was to examine the effects of an Online Learning Community (OLC) on active and reflective learners’ learning performance and attitude in a face-to-face undergraduate digital design course. 814 freshmen in an introductory digital design course were randomly assigned to one of two treatments: one offered students an OLC, which required students to discuss their assignments and readings online and participate in certain online learning activities; the other one did not offer the OLC (NC: no online learning community), but required involving students in face-to-face discussion. Individual students’ learning styles were measured using Felder and Solomon’s Index of Learning Styles Questionnaire. Results indicated that both active and reflective learners in the OLC intervention performed significantly better than those who were in the NC intervention. Results also indicated that active learners performed significantly better than reflective learners in the NC intervention; however, reflective learners performed significantly better than active learners in the OLC intervention. No significant difference between active and reflective learners’ attitudes was found. These findings indicated that OLC might be an effective means for improving both active and reflective learners’ learning performance and attitudes; however, its effects on active learners might not be as great as on reflective learners.
20.
A cortex-like learning machine for temporal hierarchical pattern clustering, detection, and recognition (total citations: 1; self-citations: 0; citations by others: 1)
James Ting-Ho Lo. Neurocomputing, 2012, 78(1): 89-103
A learning machine, called a clustering interpreting probabilistic associative memory (CIPAM), is proposed. CIPAM consists of a clusterer and an interpreter. The clusterer is a recurrent hierarchical neural network of unsupervised processing units (UPUs). The interpreter is a number of supervised processing units (SPUs) that branch out from the clusterer. Each processing unit (PU), UPU or SPU, comprises “dendritic encoders” for encoding inputs to the PU, “synapses” for storing resultant codes, a “nonspiking neuron” for generating inhibitory graded signals to modulate neighboring spiking neurons, “spiking neurons” for computing the subjective probability distribution (SPD) or the membership function, in the sense of fuzzy logic, of the label of said inputs to the PU and generating spike trains with the SPD or membership function as the firing rates, and a masking matrix for maximizing generalization. While UPUs employ unsupervised covariance learning mechanisms, SPUs employ supervised ones. They both also have unsupervised accumulation learning mechanisms. The clusterer of CIPAM clusters temporal and spatial data. The interpreter interprets the resultant clusters, effecting detection and recognition of temporal and hierarchical causes.