Found 20 similar documents. Search time: 15 ms.
1.
Interest in the analysis of user behaviour on the Internet has been increasing rapidly, especially since the advent of electronic commerce. In this context, we argue for the usefulness of constructing communities of users with common behaviour, making use of machine learning techniques. In particular, we assume that the users of any service on the Internet constitute a large community, and we aim to construct smaller communities of users with common characteristics. The paper presents the results of three case studies for three different types of Internet service: a digital library, an information broker and a Web site. Particular attention is paid to the different types of information access involved in the three case studies: query-based information retrieval, profile-based information filtering and Web-site navigation. Each type of access imposes different constraints on the representation of the learning task. Two different unsupervised learning methods are evaluated: conceptual clustering and cluster mining. One of our main concerns is the construction of meaningful communities that can be used to improve information access on the Internet. Analysis of the results in the three case studies brings to the surface some of the important properties of the task, suggesting the feasibility of a common methodology for the three different types of information access on the Internet.
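As an illustrative sketch of the community-construction step this abstract describes, the toy example below groups users by two-means clustering of binary access profiles. The data, the feature encoding and the two-community setup are invented for illustration and are not taken from the paper.

```python
# Sketch: cluster users into communities from binary access profiles.
# Rows are users; columns mark which resources each user accessed.
import numpy as np

def two_means(X, iters=10):
    # Two-cluster k-means with farthest-point initialization:
    # the second centre starts at the point farthest from the first.
    c0 = X[0]
    c1 = X[np.argmax(((X - c0) ** 2).sum(1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = ((X[:, None] - centers) ** 2).sum(-1)   # squared distances
        labels = d.argmin(1)                         # nearest centre
        centers = np.stack([X[labels == j].mean(0) for j in (0, 1)])
    return labels

# Invented access profiles: users 0-1 share one behaviour, users 2-3 another.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
communities = two_means(X)
```

In this sketch the two behaviourally distinct pairs of users end up in separate communities.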
2.
Tor is a widely used network for anonymity over the Internet. Network owners try to identify and block Tor flows; on the other side, Tor developers enhance flow anonymity with various plugins. Tor and its plugins can be detected by deep packet inspection (DPI) methods. However, DPI-based solutions are computation intensive, need considerable human effort, and are usually hard to maintain and update. These issues limit the application of DPI methods in practical scenarios. As an alternative, we propose to use machine learning-based techniques that automatically learn from examples and adapt to new data whenever required. We report an empirical study on the detection of three widely used Tor pluggable transports, namely Obfs3, Obfs4 and ScrambleSuit, using four learning algorithms. We investigate the performance of AdaBoost and random forests as two ensemble methods. In addition, we study the effectiveness of SVM and C4.5 as well-known parametric and nonparametric classifiers. These algorithms use general statistics of the first few packets of the inspected flows. Experimental results on real traffic show that all the adopted algorithms can perfectly detect the desired traffic by inspecting only the first 10-50 packets. The trained classifiers can readily be employed in modern network switches and intelligent traffic-monitoring systems.
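A minimal sketch of the "statistics of the first few packets" idea: each flow is reduced to summary statistics of its first N packet sizes and classified by nearest centroid. The synthetic flows, the feature choice and the nearest-centroid stand-in for the paper's classifiers (SVM, C4.5, ensembles) are all assumptions for illustration.

```python
# Sketch: classify a flow from statistics of its first N packet sizes.
import numpy as np

def flow_features(packet_sizes, n=10):
    # Summary statistics of the first n packet sizes of a flow.
    p = np.asarray(packet_sizes[:n], dtype=float)
    return np.array([p.mean(), p.std(), p.max(), p.min()])

# Invented training flows: label 0 = plain traffic, label 1 = padded/obfuscated.
train = [([40, 60, 52, 48, 44, 60, 52, 40, 44, 52], 0),
         ([580, 600, 590, 600, 580, 600, 590, 600, 580, 600], 1)]
X = np.stack([flow_features(f) for f, _ in train])
y = np.array([lbl for _, lbl in train])
centroids = np.stack([X[y == c].mean(0) for c in (0, 1)])

def predict(packet_sizes):
    # Assign the flow to the class with the nearest feature centroid.
    d = ((centroids - flow_features(packet_sizes)) ** 2).sum(1)
    return int(d.argmin())
```

A real deployment would replace the centroids with one of the trained classifiers the abstract names, but the per-flow feature extraction stays this cheap, which is what makes in-switch deployment plausible.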
3.
Nowadays, smartphone devices are an integral part of our lives, since they enable us to access a large variety of services, from personal to banking. As their worldwide popularity and adoption continue to grow, smartphones approach the capabilities of traditional computing environments, and malware such as botnets is becoming an emerging threat to users and network operators, especially on popular platforms such as Android. Due to the rapid growth of botnet applications, there is a pressing need for an effective solution to detect them. Most existing detection techniques can detect only malicious Android applications in general, not Android botnet applications specifically. In this paper, we propose a structural-analysis-based learning framework that adopts machine learning techniques to distinguish botnets from benign applications using unique, botnet-characteristic patterns of requested permissions and used features. Experimental evaluation on real-world benchmark datasets shows that the selected patterns achieve high detection accuracy with a low false-positive rate. Experimental and statistical tests show that the support vector machine classifier performs well compared to other classification algorithms.
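To make the "patterns of requested permissions" idea concrete, here is a hedged toy version: an app's requested permissions are matched against a permission pattern associated with botnets. The permission names, the pattern and the threshold are invented examples, not the paper's mined patterns.

```python
# Sketch: flag an app when its requested permissions cover enough of a
# botnet-associated permission pattern. Pattern and threshold are invented.
BOTNET_PATTERN = {"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "READ_CONTACTS"}

def matches_pattern(requested_permissions, pattern=BOTNET_PATTERN, threshold=2 / 3):
    # Fraction of the pattern covered by the app's requested permissions.
    overlap = len(pattern & set(requested_permissions)) / len(pattern)
    return overlap >= threshold
```

In the paper's framework such pattern features feed a classifier (e.g. an SVM) rather than a fixed threshold; the threshold rule here only illustrates the feature.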
4.
This paper presents an application of classification methods to adaptively and dynamically modify the therapy and real-time displays of a virtual reality system according to the specific state of each patient, using his/her physiological reactions. First, theoretical background on several machine learning techniques for classification is presented. Then, nine machine learning techniques are compared in order to select the best candidate in terms of accuracy. Finally, initial experimental results show that the therapy can be modulated as a function of the patient's state using machine learning classification techniques.
5.
Skin disease is one of the most common human diseases, and in computer vision applications skin colour is a powerful indicator of it. This system identifies skin cancer from images of the skin. Initially, the skin image is filtered using a median filter and segmented using mean-shift segmentation. The segmented images are fed as input to feature extraction; GLCM, moment-invariant and GLRLM features are extracted in this work. The extracted features are classified using support vector machine (SVM), probabilistic neural network and random forest (RF) classifiers, as well as a combined SVM+RF classifier. The combined SVM+RF classifier provided better results than the other classifiers.
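As a sketch of one of the texture features named above, the snippet below computes a gray-level co-occurrence matrix (GLCM) for horizontal neighbours and derives the classic "contrast" statistic from it. The tiny images and the restriction to one offset and one statistic are simplifying assumptions.

```python
# Sketch: GLCM over horizontal neighbour pairs, then the "contrast" feature.
import numpy as np

def glcm_contrast(img, levels=4):
    # Count co-occurrences of gray levels in horizontally adjacent pixels,
    # normalize to probabilities, then weight by squared level difference.
    glcm = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()
    idx = np.arange(levels)
    return float((glcm * (idx[:, None] - idx[None, :]) ** 2).sum())
```

A perfectly uniform patch has zero contrast, while a patch that alternates between distant gray levels scores high, which is what makes this feature useful for lesion texture.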
7.
In recent years, computer vision has been widely used in industrial environments, allowing robots to perform important tasks like quality control, inspection and recognition. Vision systems are typically used to determine the position and orientation of objects in the workstation, enabling them to be transported and assembled by a robotic cell (e.g. an industrial manipulator). These systems commonly resort to CCD (charge-coupled device) cameras either fixed in a particular work area or attached directly to the robotic arm (an eye-in-hand vision system). Although this is a valid approach, the performance of these vision systems is directly influenced by the lighting of the industrial environment. Taking all of this into consideration, a new approach is proposed for eye-in-hand systems in which the camera is replaced by a 2D laser range finder (LRF). The LRF is attached to a robotic manipulator, which executes a pre-defined path to produce grayscale images of the workstation. With this technique, interference from environment lighting is minimized, resulting in a more reliable and robust computer vision system. After the grayscale image is created, this work focuses on the recognition and classification of different objects using inherent features (based on the invariant moments of Hu) with the most well-known machine learning models: k-nearest neighbour (kNN), neural networks (NNs) and support vector machines (SVMs). In order to achieve good performance for each classification model, a wrapper method is used to select a good subset of features, and a model-assessment technique, k-fold cross-validation, is used to adjust the parameters of the classifiers. The performance of the models is also compared, achieving generalized accuracies of 83.5% for kNN, 95.5% for the NN and 98.9% for the SVM.
These high performances are related to the feature selection algorithm, based on the simulated annealing heuristic, and to the model assessment (k-fold cross-validation). Together they make it possible to identify the most important features in the recognition process and to adjust the best parameters for the machine learning models, increasing the classification rate for the work objects present in the robot's environment.
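The evaluation loop described above can be sketched as k-fold cross-validation around a kNN classifier. The toy 2-D data below stand in for the Hu-moment feature vectors; the fold count, k and data are illustrative assumptions, not the paper's setup.

```python
# Sketch: k-fold cross-validation of a kNN classifier on toy feature vectors.
import numpy as np

def knn_predict(Xtr, ytr, x, k=3):
    # Majority vote among the k training points nearest to x.
    d = ((Xtr - x) ** 2).sum(1)
    nearest = ytr[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

def kfold_accuracy(X, y, k_folds=5, k=3, seed=0):
    # Shuffle, split into folds, hold each fold out once, and average.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k_folds)
    correct = 0
    for f in range(k_folds):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k_folds) if g != f])
        correct += sum(knn_predict(X[train], y[train], X[t], k) == y[t]
                       for t in test)
    return correct / len(X)

# Two well-separated toy classes standing in for object feature vectors.
cls0 = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0]])
cls1 = cls0 + 5.0
X = np.vstack([cls0, cls1])
y = np.array([0] * 5 + [1] * 5)
acc = kfold_accuracy(X, y, k_folds=5, k=3)
```

Running the same loop over a grid of k values is exactly the parameter-adjustment role cross-validation plays in the paper.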
8.
Neural Computing and Applications - The financial time series is inherently nonlinear and hence cannot be efficiently predicted by using linear statistical methods such as regression. Hence,...
9.
Mechanical excavators are widely used in mining, tunnelling and civil engineering projects. There are several types of mechanical excavator, such as roadheaders, tunnel boring machines and impact hammers, because these tools can bring productivity to a project quickly, accurately and safely. Among these, roadheaders have advantages such as selective mining, mobility, less over-excavation, minimal ground disturbance, elimination of blast vibration, reduced ventilation requirements and a lower initial investment cost. A critical issue in successful roadheader application is the ability to evaluate and predict the machine performance, termed the instantaneous (net) cutting rate. Although there are several methods in the literature for predicting roadheader performance, only a few of them have been developed via artificial neural network techniques. For this study, 333 data sets including uniaxial compressive strength and power on the cutting boom, 103 data sets including RQD, and 125 data sets including machine weight were accumulated from the literature. This paper focuses on roadheader performance prediction using six different machine learning algorithms and combinations of various machine learning algorithms via ensemble techniques. The algorithms are ZeroR, random forest (RF), Gaussian process, linear regression, logistic regression and multi-layer perceptron (MLP). As a result, MLP and RF give better results than the other algorithms, and the best solution achieved was the bagging technique on RF with principal component analysis (PCA). The best success rate obtained in this study is 90.2%, which is relatively better than contemporary research.
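A compact sketch of the bagging idea used in the best solution above: several base learners are fit on bootstrap resamples of the data and their predictions averaged. A simple linear fit stands in for the paper's random forest base learner, and the single-feature data are invented.

```python
# Sketch: bagging with a linear base learner on bootstrap resamples.
import numpy as np

def bagged_predict(x, y, x_new, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))   # bootstrap resample
        a, b = np.polyfit(x[idx], y[idx], 1)    # base learner: linear fit
        preds.append(a * x_new + b)
    return float(np.mean(preds))                 # average the ensemble

# Invented data on an exact line, standing in for (rock property, cutting rate).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * x + 1.0
pred = bagged_predict(x, y, 10.0)
```

The averaging step is what reduces the variance of the individual resampled fits, which is the usual motivation for bagging RF-style learners.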
10.
Bayesian networks (BNs) provide a means of representing, displaying, and making available in a usable form the knowledge of experts in a given field. In this paper, we look at the performance of an expert-constructed BN compared with other machine learning (ML) techniques for predicting the outcome (win, lose, or draw) of matches played by Tottenham Hotspur Football Club. The period under study was 1995-1997; the expert BN was constructed at the start of that period, based almost exclusively on subjective judgement. Our objective was to determine retrospectively the comparative accuracy of the expert BN against alternative ML models built using data from the two-year period. The additional ML techniques considered were: MC4, a decision tree learner; a naive Bayesian learner; a data-driven BN (whose structure and node probability tables are learnt entirely from data); and a K-nearest-neighbour learner. The results show that the expert BN is generally superior to the other techniques for this domain in predictive accuracy. The results are even more impressive for BNs given that, in a number of key respects, the study assumptions place them at a disadvantage. For example, we have assumed that a BN prediction is 'incorrect' if the BN predicts more than one outcome as equally most likely (whereas, in fact, such a prediction would prove valuable to somebody who could place an 'each way' bet on the outcome). Although the expert BN has long been obsolete (it contains variables relating to key players who have retired or left the club), the results here tend to confirm the excellent potential of BNs when they are built by a reliable domain expert. The ability to provide accurate predictions without requiring much learning data is an obvious bonus in any domain where data are scarce. Moreover, the BN was relatively simple for the expert to build, and its structure could be used again in this and similar types of problem.
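To show the flavour of querying a hand-built discrete BN for a win/draw/lose outcome, here is a two-node toy network. The structure ("key player fit" influencing the outcome) and every probability in it are invented for illustration; they are not the paper's expert model.

```python
# Sketch: marginalizing a two-node discrete Bayesian network.
# Node 1: is the key player fit? Node 2: match outcome, conditioned on node 1.
P_FIT = {True: 0.8, False: 0.2}                      # invented prior
P_OUTCOME = {                                         # invented CPT
    True:  {"win": 0.55, "draw": 0.25, "lose": 0.20},
    False: {"win": 0.30, "draw": 0.30, "lose": 0.40},
}

def marginal_outcome():
    # P(outcome) = sum over fitness states of P(fit) * P(outcome | fit).
    return {o: sum(P_FIT[f] * P_OUTCOME[f][o] for f in P_FIT)
            for o in ("win", "draw", "lose")}

m = marginal_outcome()
```

Under the study's scoring rule, the prediction would be the single most likely outcome of this marginal (here "win"), and counted incorrect if two outcomes tied.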
11.
Content-based image retrieval (CBIR) systems traditionally find images within a database that are similar to a query image using low-level features, such as colour histograms. However, this requires a user to provide an image to the system. It is easier for a user to query the CBIR system using search terms, which requires the image content to be described by semantic labels. However, finding a relationship between image features and semantic labels is a challenging problem. This paper aims to discover semantic labels for facial features for use in a face image retrieval system. Face image retrieval traditionally uses global face-image information to determine similarity between images, and little has been done in the field to use local face features and semantic labelling. Our work aims to develop a clustering method for the discovery of semantic labels of face features. We also present a machine learning based face-feature localization mechanism which we show has promise in providing accurate localization.
12.
Cardiac amyloidosis (CA) is an uncommon disease that has been known for a long time. Modern advances in noninvasive imaging of the heart via ultrasound and magnetic resonance imaging have enhanced the detection of hidden cardiac amyloidosis in patients already identified with heart disease. This article focuses on detecting heart disease, especially cardiac amyloidosis, on electrocardiogram (ECG) images using both machine learning and deep learning approaches. In addition to detecting the disease, the heart images are categorized as normal or cardiac amyloidosis when deviations occur. For CA identification and classification, 300 cardiac images were taken and analyzed using three machine learning algorithms: nearest centroid, gradient boosting and random forest. Several metrics, such as precision, recall, F-score, sensitivity, accuracy and the confusion matrix, are estimated for binary classification of the images into positive (CA) and negative (non-CA). Among these approaches, the gradient boosting method achieves the best outcome, 95% accuracy, in detecting cardiac amyloidosis and categorizing ECG images as normal or abnormal. Furthermore, a deep learning based neural network model, "DeepNet", applied to augmented data along with a CNN, attains 93% accuracy in CA identification.
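The binary-classification metrics this abstract reports can all be derived from the four cells of the confusion matrix. The sketch below computes them from assumed counts; the counts are invented and do not reproduce the paper's results.

```python
# Sketch: precision, recall (sensitivity), F-score and accuracy from a
# binary confusion matrix (tp, fp, fn, tn). Counts below are invented.
def binary_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_score, accuracy

metrics = binary_metrics(90, 10, 10, 190)   # invented CA/non-CA counts
```

With such a routine, every classifier in the comparison (nearest centroid, gradient boosting, random forest) is scored on an identical footing.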
13.
This study uses machine learning (ML) techniques to classify and cluster different Western music genres. Three artificial neural network models (a multi-layer perceptron [MLP], a probabilistic neural network [PNN] and a self-organizing map [SOM]), along with support vector machines (SVM), are compared to two standard statistical methods (linear discriminant analysis [LDA] and cluster analysis [CA]). The variable sets considered are the average, variance and maximum of the frequencies, the amplitude (loudness) of the sound, and the median location of the 15 highest peaks in the periodogram. The results show that the machine learning models outperform the traditional statistical techniques in classifying and clustering the different music genres, owing to the robustness and flexibility of their modeling algorithms. The study also shows how various dimensions of music genres can be identified by uncovering complex patterns in the multidimensional data.
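The periodogram-peak variables mentioned above can be sketched with an FFT: compute the periodogram of a signal and return the frequencies of its strongest bins. The synthetic two-tone signal and the choice of two peaks are illustrative assumptions.

```python
# Sketch: locate the strongest periodogram peaks of a sampled signal.
import numpy as np

def periodogram_peak_freqs(signal, fs, n_peaks=3):
    # Periodogram via the real FFT, then the frequencies of the top bins.
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    top = np.argsort(spec)[::-1][:n_peaks]
    return sorted(freqs[top])

# Synthetic test tone: 50 Hz plus a weaker 120 Hz component, sampled at 1 kHz.
fs = 1000
t = np.arange(1000) / fs
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
peaks = periodogram_peak_freqs(sig, fs, n_peaks=2)
```

Features of this kind (peak locations, plus the frequency averages and variances the study lists) form the input vectors for the compared classifiers.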
14.
Neural Computing and Applications - The application of artificial neural networks in mapping the mechanical characteristics of the cement-based materials is underlined in previous investigations....
15.
The amount of information contained in databases available on the Web has grown explosively in recent years. This information, known as the Deep Web, is heterogeneous and dynamically generated by querying back-end (relational) databases through Web Query Interfaces (WQIs), a special type of HTML form. Accessing the information of the Deep Web is a great challenge because it is usually not indexed by general-purpose search engines; efficient mechanisms are therefore needed to access, extract and integrate it. Since WQIs are the only means of access to the Deep Web, their automatic identification plays an important role: it allows traditional search engines to increase their coverage and to reach interesting information not available on the indexable Web. The accurate identification of Deep Web data sources is a key issue in the information retrieval process. In this paper we propose a new strategy for the automatic discovery of WQIs. The proposal makes an adequate selection of HTML elements extracted from HTML forms, which are used in a set of heuristic rules that help to identify WQIs. The strategy uses machine learning algorithms to classify HTML forms as searchable (WQIs) or non-searchable (non-WQIs), with a prototype-selection algorithm that removes irrelevant or redundant data from the training set. The internal content of WQIs was analyzed with the objective of identifying only those HTML elements that appear frequently and provide relevant information for WQI identification. For testing, we used three groups of datasets: two available in the UIUC repository and a new dataset that we created using a generic crawler supported by human experts, which includes both advanced and simple query interfaces. The experimental results show that the proposed strategy outperforms previously reported work.
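A hedged sketch of the heuristic side of WQI discovery: parse an HTML form with the standard-library parser, count the element types it contains, and apply simple rules over those counts. The two rules and the example forms are invented; the paper's rule set and learned classifier are richer.

```python
# Sketch: heuristic searchable/non-searchable classification of HTML forms.
from html.parser import HTMLParser

class FormFeatures(HTMLParser):
    # Counts the form elements our invented heuristic rules look at.
    def __init__(self):
        super().__init__()
        self.text_inputs = 0
        self.password_inputs = 0
        self.selects = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type", "text") == "text":
            self.text_inputs += 1
        elif tag == "input" and a.get("type") == "password":
            self.password_inputs += 1
        elif tag == "select":
            self.selects += 1

def is_searchable(form_html):
    f = FormFeatures()
    f.feed(form_html)
    # Invented rules: query interfaces offer text boxes or selects,
    # while login forms are betrayed by password fields.
    return f.password_inputs == 0 and (f.text_inputs + f.selects) >= 1

search_form = '<form><input type="text" name="q"/><select name="cat"></select></form>'
login_form = '<form><input type="text" name="user"/><input type="password" name="pw"/></form>'
```

In the full strategy, counts like these become the feature vector that the machine learning classifier consumes.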
16.
Appendicitis is a common disease that occurs particularly often in childhood and adolescence. An accurate diagnosis of acute appendicitis is the most significant precaution against severe, unnecessary surgery. In this paper, the author presents machine learning (ML) techniques to predict whether an appendix illness is acute or subacute, especially in patients between 10 and 30 years of age, and whether it requires an operation or just medication for treatment. The dataset was collected from public-hospital patients between 2016 and 2019. The predictive results of models built with different ML techniques (logistic regression, naïve Bayes, generalized linear model, decision tree, support vector machine, gradient boosted tree, random forest) are compared. The dataset covers 625 specimens; the medical records applied in this paper include 371 males (60.22%) and 254 females (40.12%), and consist of 318 (50.88%) operated and 307 (49.12%) unoperated patients. The random forest algorithm obtains the optimal result, with an accuracy of 83.75%, a precision of 84.11%, a sensitivity of 81.08% and a specificity of 81.01%. Moreover, an estimation method based on ML techniques is developed to detect individuals with acute appendicitis.
17.
The growing complexity of new features in multicore processors puts significant pressure on functional verification. Although a large amount of time and effort is spent on verification, functional design bugs escape into products and can cause catastrophic effects; hence, online design bug detection is needed to catch functional bugs in the field. In this work, we propose a novel approach that leverages Performance Monitoring Counters (PMCs) and machine learning to detect and locate pipeline bugs in a processor. We establish the correlation between PMC events and pipeline bugs in order to extract the features used to build and train machine learning models. We design and implement a synthetic bug-injection framework to obtain datasets for our simulation. To evaluate the proposal, the Multi2Sim simulator is used to simulate an x86 architecture model, and an x86 fault model is developed to synthetically inject bugs into the x86 pipeline stages. PMC event values are collected by executing the SPEC CPU2006 and MiBench benchmarks for both bug and no-bug scenarios in the x86 simulator. The training data obtained through simulation is used to build a Bug Detection Model (BDM) that detects a pipeline bug and a Bug Location Model (BLM) that locates the pipeline unit where the bug occurred. Simulation results show that the BDM and BLM provide accuracies of 97.3% and 91.6% using a decision tree and a random forest, respectively. Compared with other state-of-the-art approaches, our solution can locate the pipeline unit where a bug occurred with high accuracy and without additional hardware.
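A minimal sketch of learning a bug detector from PMC-event features: a decision stump (the one-split core of a decision tree) exhaustively searches for the (event, threshold) pair that best separates bug-injected from clean runs. The PMC event names, values and labels below are invented stand-ins for the paper's simulated data.

```python
# Sketch: train a decision stump on PMC-event features for bug detection.
import numpy as np

def best_stump(X, y):
    # Exhaustively pick the (feature, threshold) split with fewest errors.
    best = (None, None, len(y) + 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            pred = (X[:, f] > t).astype(int)
            err = int((pred != y).sum())
            if err < best[2]:
                best = (f, t, err)
    return best

# Rows: benchmark runs; columns: assumed PMC event rates (e.g. branch
# mispredicts, pipeline flushes) per kilo-instruction. Label 1 = bug injected.
X = np.array([[1.0, 2.0],
              [1.2, 2.1],
              [1.1, 9.0],
              [0.9, 8.5]])
y = np.array([0, 0, 1, 1])
feat, thresh, errors = best_stump(X, y)
```

A full decision tree or random forest, as in the paper, stacks many such splits; the stump shows why a PMC event whose distribution shifts under a bug is a usable feature.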
18.
According to most researchers and health workers, the body's temperature is an important but often overlooked indicator for the diagnosis of diseases. In ancient medicine, doctors would treat patients by spreading wet mud or slurry clay over the body: the part that dried first was considered the diseased part. Today the same principle can be applied with thermal cameras that generate images from electromagnetic radiation. Thermography may detect areas of inflammation and blockage that predict cancer, without radiation exposure or physical contact, and it can be used before any visible symptoms occur, a great advantage in medical testing. Machine learning (ML) is used in this paper as a set of statistical techniques that give software the capacity to learn from data without being explicitly programmed. ML can learn from these thermal scans and identify suspected areas where a doctor needs to investigate further. Thermal photography is a comparatively inexpensive alternative to methods that need sophisticated equipment, enabling machines to provide an easier and more effective approach for clinics and hospitals.
19.
Advances in the technology of astronomical spectra acquisition have resulted in an enormous amount of data available in worldwide telescope archives. It is no longer feasible to analyze these data using classical approaches, so a new astronomical discipline, astroinformatics, has emerged. We describe initial experiments in the investigation of spectral line profiles of emission-line stars using machine learning, in an attempt to automatically identify Be and B[e] star spectra in large archives and classify their types in an automatic manner. Due to the size of the spectra collections, dimension-reduction techniques based on the wavelet transformation are studied as well. The results clearly show that machine learning is able to distinguish different shapes of line profiles even after drastic dimension reduction.
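The wavelet-based dimension reduction can be sketched with the simplest wavelet of all: each level of the Haar transform replaces a spectrum by pairwise averages (approximation) and differences (detail), and keeping only the averages halves the dimension. The two-level reduction and the flat test spectrum are illustrative assumptions; the paper does not specify this particular wavelet.

```python
# Sketch: Haar wavelet decomposition used for dimension reduction.
import numpy as np

def haar_step(signal):
    # One Haar DWT level: scaled pairwise sums (approximation) and
    # differences (detail) of adjacent samples.
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def reduce_dim(spectrum, levels=2):
    # Keep only the approximation coefficients at each level,
    # halving the dimension per level.
    a = np.asarray(spectrum, dtype=float)
    for _ in range(levels):
        a, _ = haar_step(a)
    return a
```

Applied to a full spectrum, a few such levels shrink thousands of flux samples to a coarse profile that, per the abstract, still preserves enough line-shape information for classification.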
20.
The coefficient of consolidation of a soil is a significant engineering property and an important parameter for the design and auditing of geotechnical structures. In this study, the authors propose an efficient methodology to predict the coefficient of consolidation using machine learning models, namely multiple linear regression (MLR), artificial neural network (ANN), support vector regression (SVR) and adaptive network-based fuzzy inference system (ANFIS). Further, various feature selection techniques, such as the least absolute shrinkage and selection operator (LASSO), random forests with recursive feature elimination (RF-RFE) and mutual information, have also been applied. It is observed that the feature selection methods enhance the quality of the prediction models by eliminating irrelevant features and retaining only the important ones while building the models. Experiments are performed on a dataset of 534 soil samples collected from the Ha Noi - Hai Phong highway project, Vietnam. The experimental results show the adequacy of the proposed model: the hybrid ANFIS approach, a fusion of an ANN and a fuzzy inference system, incorporates complementary information on uncertainty and adaptability. ANFIS with the LASSO feature selection method produces a coefficient of determination of 0.831 and thus provides the best prediction of the coefficient of consolidation of a soil compared to the other approaches.
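The MLR baseline and the coefficient-of-determination score reported above can be sketched in a few lines of ordinary least squares. The two-feature data below are invented stand-ins for the soil properties; this is the simplest model in the comparison, not the winning ANFIS.

```python
# Sketch: multiple linear regression baseline plus the R^2 score used
# to compare the models. Data are invented, not the 534 soil samples.
import numpy as np

def fit_mlr(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2_score(y, y_hat):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = ((y - y_hat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Invented features (e.g. moisture content, plasticity index) and a target
# generated from an exact linear rule, so MLR should recover it.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
coef = fit_mlr(X, y)
y_hat = np.column_stack([np.ones(len(X)), X]) @ coef
r2 = r2_score(y, y_hat)
```

The study scores every model (MLR, ANN, SVR, ANFIS) with this same R^2 metric, which is how the 0.831 figure for ANFIS+LASSO is obtained.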