Similar literature
 20 similar documents found (search time: 15 ms)
1.
The amount of information contained in databases available on the Web has grown explosively in recent years. This information, known as the Deep Web, is heterogeneous and dynamically generated by querying back-end (relational) databases through Web Query Interfaces (WQIs), which are a special type of HTML form. Accessing Deep Web information is a great challenge because it is usually not indexed by general-purpose search engines. Therefore, efficient mechanisms are needed to access, extract and integrate the information contained in the Deep Web. Since WQIs are the only means of accessing the Deep Web, their automatic identification plays an important role: it helps traditional search engines increase their coverage and reach interesting information that is not available on the indexable Web. The accurate identification of Deep Web data sources is a key issue in the information retrieval process. In this paper we propose a new strategy for the automatic discovery of WQIs. The proposal selects HTML elements extracted from HTML forms and uses them in a set of heuristic rules that help to identify WQIs. The strategy uses machine learning algorithms to classify HTML forms as searchable (WQI) or non-searchable (non-WQI), together with a prototype selection algorithm that removes irrelevant or redundant data from the training set. The internal content of WQIs was analyzed in order to identify only those HTML elements that appear frequently and provide relevant information for WQI identification. For testing, we use three groups of datasets: two available at the UIUC repository and a new dataset, covering advanced and simple query interfaces, that we created using a generic crawler supported by human experts. The experimental results show that the proposed strategy outperforms other previously reported works.

2.
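Zhang Huanhuan  Hong Min  Yuan Yubo 《Journal of Computer Applications》2018,38(11):3193-3198
To address the low recognition rate caused by inaccurate input face features, an effective deep sparse autoencoder (DSAE) method for face features based on the extreme learning machine (ELM) is proposed. First, a loss function is constructed using the truncated nuclear norm, and sparse features of face images are extracted by minimizing this loss function. Second, the extreme learning machine autoencoder (ELM-AE) model is used to encode the face features, which reduces the data dimensionality and filters out noise. Finally, the optimal deep structure is obtained through empirical risk minimization. Experimental results on the ORL, IMM, Yale and UMIST face datasets show that the DSAE method achieves a markedly higher recognition rate on high-dimensional face images than ELM, random forest (RF) and other algorithms, and that it generalizes well.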

3.
The aim of Information Lifecycle Management (ILM) is to govern data throughout its lifecycle as efficiently and effectively as possible from a technical point of view. A core question is where the data should be stored, since different storage options entail different costs and access times. For this purpose data have to be classified, which at present is done either manually in an elaborate way or with recourse to only a few data attributes, in particular access frequency. In the context of data warehouse systems, this article introduces an automated, and therefore fast and cost-effective, data classification for ILM. Machine learning techniques, in particular an artificial neural network (multilayer perceptron), a support vector machine and a decision tree approach, are compared on an SAP-based real-world data set from the automotive industry. This data classification considers a large number of data attributes and thus attains results similar to those of human experts. Besides classification accuracy, the comparison also covers the types of misclassification that occur, since these are important in ILM.

4.

Obstructive sleep apnea (OSA) is a syndrome characterized by a decrease in air flow or by respiratory arrest caused by upper respiratory tract obstructions that recur during sleep, and it is often accompanied by a decrease in oxygen saturation. The aim of this study was to determine the connection between respiratory arrests and the photoplethysmography (PPG) signal in obstructive sleep apnea patients. Determining this connection is important because it would support the use of a new signal in diagnosing the disease. Thirty-four time-domain features were extracted from the PPG signal, and the relation between these features and respiratory arrests was investigated statistically. The Mann–Whitney U test was applied to reveal whether this relation was incidental or statistically significant, and 32 of the 34 features were found to be statistically significant. The features of the PPG signal were then classified with the k-nearest neighbors classification algorithm, a radial basis function neural network, a probabilistic neural network, a multilayer feedforward neural network (MLFFNN) and an ensemble classification method. The classifier outputs were the classes apnea and control (normal). When the classifier results were compared, the best performance was obtained with the MLFFNN, with a test accuracy of 97.07% and a kappa value of 0.93. The results lead to the conclusion that respiratory arrests can be recognized through the PPG signal and that the PPG signal can be used for the diagnosis of OSA.
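The abstract reports both an accuracy and a kappa value. As a minimal sketch of how Cohen's kappa relates to a confusion matrix, the following assumes a two-class (apnea vs. control) setting; the function name `cohen_kappa` and the counts are hypothetical illustrations, not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    size = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts (NOT from the paper): rows = true apnea/control.
example = [[45, 5],
           [5, 45]]
kappa = cohen_kappa(example)
```

Kappa corrects raw accuracy for the agreement expected by chance, which is why a study may report it alongside the accuracy rate.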


5.
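To address the information loss caused by manually selected features in automatic image annotation, the use of a convolutional neural network for autonomous feature learning on the samples is proposed. To suit the multi-label nature of automatic image annotation and to improve recall for low-frequency words, the loss function of the convolutional neural network is first modified to build a convolutional neural network for multi-label learning (CNN-MLL); the correlation between annotation words is then used to refine the output of the network. In a comparison with traditional methods on the IAPR TC-12 standard image annotation dataset, the method based on a convolutional neural network with a mean squared error loss (CNN-MSE) improved average recall by 12.9% over the support vector machine (SVM) method and improved average precision by 37.9% over the back-propagation neural network (BPNN) method. The CNN-MLL method with refined annotation results improved average precision and average recall by 23% and 20%, respectively, over an ordinary convolutional neural network. The experimental results show that the CNN-MLL method with refined annotation results effectively avoids the information loss caused by manual feature selection while also increasing recall for low-frequency words.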

6.
This paper presents an application of a classification method that adaptively and dynamically modifies the therapy and real-time displays of a virtual reality system according to the specific state of each patient, as inferred from his/her physiological reactions. First, a theoretical background on several machine learning techniques for classification is presented. Then, nine machine learning techniques are compared in order to select the best candidate in terms of accuracy. Finally, initial experimental results are presented to show that the therapy can be modulated as a function of the patient state using machine learning classification techniques.

7.
Multimedia Tools and Applications - Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks. Most existing spoofing detection systems rely on two main points; the feature...

8.
Artificial Life and Robotics - While e-learning lectures allow students to learn at their own pace, it is difficult to manage students’ concentration, which prevents them from receiving...

9.
Skin disease is common among human diseases. In computer vision applications, skin color is a powerful indicator of such disease. The system described here identifies skin cancer from skin images. First, the skin image is filtered using a median filter and segmented using mean shift segmentation. The segmented images are fed as input to feature extraction, where GLCM, moment invariant and GLRLM features are extracted. The extracted features are classified using support vector machine (SVM), probabilistic neural network and random forest (RF) classifiers, as well as a combined SVM+RF classifier. The combined SVM+RF classifier provided better results than the other classifiers.

10.
The analysis of logs related to social communities has recently received considerable attention because it sheds light on social concerns by identifying different groups, and hence helps with problems such as predicting terrorist groups. In the customer analysis domain, identifying calling communities can be used to determine a particular customer’s value according to the general behavior pattern of the community to which the customer belongs; this supports effective targeted marketing design, which is important for increasing profitability. In the telecommunication industry, machine learning techniques have been applied to the Call Detail Record (CDR) to predict customer behavior, for example in churn prediction. In this paper, we identify calling communities and demonstrate how cluster analysis can be used to identify them effectively using information derived from the CDR data. We use the information extracted from the cluster analysis to identify customer calling patterns. These calling patterns are then given to a classification algorithm to generate a classifier model that predicts the calling community of a customer. We apply different machine learning techniques to build classifier models and compare them in terms of classification accuracy and computational performance. The reported test results demonstrate the applicability and effectiveness of the proposed approach.

11.
Multimedia Tools and Applications - Thanks to the evolution of technology, we find a very large number of internet users who use social networks to react and share things with each other. These...

12.

Mechanical excavators are widely used in mining, tunneling and civil engineering projects because these tools can bring productivity to a project quickly, accurately and safely. There are several types of mechanical excavators, such as roadheaders, tunnel boring machines and impact hammers. Among these, roadheaders have some advantages, such as selective mining, mobility, less over-excavation, minimal ground disturbance, elimination of blast vibration, and reduced ventilation requirements and initial investment cost. A critical issue in successful roadheader application is the ability to evaluate and predict the machine performance, known as the instantaneous (net) cutting rate. Although there are several methods in the literature for predicting roadheader performance, only a few of them have been developed with artificial neural network techniques. In this study, 333 data sets including uniaxial compressive strength and power on the cutting boom, 103 data sets including RQD, and 125 data sets including machine weight are gathered from the literature for this purpose. This paper focuses on roadheader performance prediction using six different machine learning algorithms and combinations of various machine learning algorithms via ensemble techniques. The algorithms are ZeroR, random forest (RF), Gaussian process, linear regression, logistic regression and multi-layer perceptron (MLP). MLP and RF give better results than the other algorithms, and the best solution was achieved with the bagging technique on RF together with principal component analysis (PCA). The best success rate obtained in this study is 90.2% successful prediction, which is relatively better than contemporary research.


13.
The explosive growth of malware variants poses a major threat to information security. Traditional signature-based anti-virus systems fail to classify unknown malware into the corresponding families and to detect new kinds of malware programs. Therefore, we propose a machine learning based malware analysis system composed of three modules: data processing, decision making, and new malware detection. The data processing module deals with gray-scale images, Opcode n-grams, and import functions, which are employed to extract the features of the malware. The decision-making module uses the features to classify the malware and to identify suspicious malware. Finally, the detection module uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware families. Our approach is evaluated on more than 20 000 malware instances, which were collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify unknown malware with a best accuracy of 98.9%, and that it successfully detects 86.7% of the new malware.
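Of the three feature sources named above, Opcode n-grams are the easiest to illustrate. The following is a minimal sketch of counting n-grams over an opcode sequence; the function name `opcode_ngrams` and the sample sequence are hypothetical, and the abstract does not say how the system itself builds or weights these features.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count contiguous n-grams in a sequence of opcode mnemonics."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of length n over the sequence; a sequence shorter
    # than n yields an empty Counter.
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Hypothetical disassembly fragment, for illustration only.
sequence = ["push", "mov", "call", "push", "mov"]
bigrams = opcode_ngrams(sequence, 2)
```

The resulting counts can serve as a sparse feature vector for a classifier, with one dimension per distinct n-gram.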

14.
Object recognition using laser range finder and machine learning techniques   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, computer vision has been widely used in industrial environments, allowing robots to perform important tasks such as quality control, inspection and recognition. Vision systems are typically used to determine the position and orientation of objects in the workstation, enabling them to be transported and assembled by a robotic cell (e.g. an industrial manipulator). These systems commonly rely on CCD (Charge-Coupled Device) cameras that are fixed in a particular work area or attached directly to the robotic arm (eye-in-hand vision system). Although this is a valid approach, the performance of these vision systems is directly influenced by the lighting of the industrial environment. Taking all this into consideration, a new approach is proposed for eye-on-hand systems, in which the cameras are replaced by a 2D Laser Range Finder (LRF). The LRF is attached to a robotic manipulator, which executes a pre-defined path to produce grayscale images of the workstation. With this technique the interference of environment lighting is minimized, resulting in a more reliable and robust computer vision system. Once the grayscale image has been created, this work focuses on the recognition and classification of different objects using inherent features (based on the invariant moments of Hu) with the most well-known machine learning models: k-Nearest Neighbor (kNN), Neural Networks (NNs) and Support Vector Machines (SVMs). To achieve good performance for each classification model, a wrapper method is used to select a good subset of features, and a model assessment technique called K-fold cross-validation is used to adjust the parameters of the classifiers. The performance of the models is also compared, with generalized accuracies of 83.5% for kNN, 95.5% for the NN and 98.9% for the SVM.
These high performances are related to the feature selection algorithm, which is based on the simulated annealing heuristic, and to the model assessment (k-fold cross-validation). Together they make it possible to identify the most important features in the recognition process and to adjust the parameters of the machine learning models, increasing the classification ratio of the work objects present in the robot's environment.
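The parameter adjustment by K-fold cross-validation described above can be sketched for the kNN case as follows. This is an illustration under stated assumptions rather than the authors' procedure: the helper names (`knn_predict`, `k_fold_accuracy`, `select_k`), the interleaved fold assignment and the toy 2D points are all hypothetical, and the wrapper feature selection with simulated annealing is not reproduced.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs, ys, k_neighbors, n_folds):
    """Mean accuracy of kNN over n_folds interleaved folds."""
    n = len(xs)
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out.
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k_neighbors) == ys[i]:
                correct += 1
    return correct / n


def select_k(xs, ys, candidates, n_folds=5):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: k_fold_accuracy(xs, ys, k, n_folds))


# Toy feature vectors (hypothetical, standing in for Hu moments).
xs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.1),
      (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.3, 5.3), (5.1, 5.1)]
ys = ["a"] * 5 + ["b"] * 5
best_k = select_k(xs, ys, [1, 3])
```

Because each sample is scored only by a model that never saw it, the cross-validated accuracy gives a fairer basis for choosing parameters than the training accuracy.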

15.
Neural Computing and Applications - The financial time series is inherently nonlinear and hence cannot be efficiently predicted by using linear statistical methods such as regression. Hence,...

16.
Nowadays, smartphone devices are an integral part of our lives since they give us access to a large variety of services, from personal to banking. With their worldwide popularity and adoption, smartphone devices continue to approach the capabilities of traditional computing environments. Computer malware such as botnets is becoming an emerging threat to users and network operators, especially on popular platforms such as Android. Due to the rapid growth of botnet applications, there is a pressing need for an effective way to detect them. Most existing detection techniques can detect only malicious Android applications in general; they cannot detect Android botnet applications. In this paper, we propose a structural analysis-based learning framework, which adopts machine learning techniques to classify botnet and benign applications using unique patterns of requested permissions and used features that are related to botnet characteristics. The experimental evaluation on real-world benchmark datasets shows that the selected patterns achieve high detection accuracy with a low false positive rate. The experimental and statistical tests show that the support vector machine classifier performs well compared with other classification algorithms.

17.
Language Resources and Evaluation - Gesture and multimodal communication researchers typically annotate video data manually, even though this can be a very time-consuming task. In the present work,...

18.
Chen  Yuantao  Liu  Linwu  Tao  Jiajun  Chen  Xi  Xia  Runlong  Zhang  Qian  Xiong  Jie  Yang  Kai  Xie  Jingbo 《Multimedia Tools and Applications》2021,80(3):4237-4261

Automatic image annotation is an effective computer operation that predicts the annotation of an unknown image by automatically learning potential relationships between the semantic concept space and the visual feature space in an annotated image dataset. Automatic image labeling usually involves two stages: learning and labeling. Existing image annotation methods that employ the convolutional features of deep learning methods have a number of limitations, including complex training and high space and time costs in the annotation procedure. Accordingly, this paper proposes an innovative method in which the visual features of the image are represented by the intermediate-layer features of a deep learning model, while semantic concepts are represented by the mean vectors of positive samples. Firstly, the convolutional result is output directly as low-level visual features from a middle layer of the pre-trained deep learning model, and the image is represented by sparse coding. Secondly, the positive mean vector method is used to construct a visual feature vector for each text vocabulary item, so that a visual feature vector database is created. Finally, the visual feature vector similarity between the testing image and every text vocabulary item is calculated, and the vocabulary items with the largest similarity are used for annotation. Experiments on the datasets demonstrate the effectiveness of the proposed method; in terms of F1 score, the proposed method’s performance on the Corel5k and IAPR TC-12 datasets is superior to that of MBRM, JEC-AF, JEC-DF, and 2PKNN with end-to-end deep features.
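The second and third steps above (one mean vector per vocabulary item, then annotation by similarity) can be sketched as follows. This is an illustration under assumptions: the abstract does not name the similarity measure, so cosine similarity is used here as a stand-in; the function names and the three-dimensional toy vectors are hypothetical; and the deep feature extraction and sparse coding of the first step are not reproduced.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_vocabulary(features, labels):
    """Map each word to the mean feature vector of its positive samples."""
    positives = {}
    for feature, words in zip(features, labels):
        for word in words:
            positives.setdefault(word, []).append(feature)
    return {word: mean_vector(vs) for word, vs in positives.items()}


def annotate(feature, vocabulary, top_n=2):
    """Return the top_n words whose mean vectors are most similar."""
    ranked = sorted(
        vocabulary,
        key=lambda word: cosine(feature, vocabulary[word]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy training set (hypothetical features and annotation words).
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0], [0.0, 0.9, 0.2]]
labels = [["sky"], ["sky", "sea"], ["grass"], ["grass"]]
vocabulary = build_vocabulary(features, labels)
words = annotate([1.0, 0.05, 0.0], vocabulary, top_n=2)
```

Since each word is reduced to a single vector, annotating a new image costs one similarity computation per vocabulary item, which is consistent with the low space and time cost that the abstract emphasizes.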


19.
This study uses machine learning (ML) techniques to classify and cluster different Western music genres. Three artificial neural network models (multi-layer perceptron neural network [MLP], probabilistic neural network [PNN] and self-organizing maps neural network [SOM]), along with support vector machines (SVM), are compared with two standard statistical methods (linear discriminant analysis [LDA] and cluster analysis [CA]). The variable sets considered are average frequencies, variance frequencies, maximum frequencies, amplitude or loudness of the sound, and the median of the location of the 15 highest peaks in the periodogram. The results show that machine learning models outperform traditional statistical techniques in classifying and clustering different music genres, owing to the robustness and flexibility of their modeling algorithms. The study also shows how various dimensions of music genres can be identified by uncovering complex patterns in the multidimensional data.

20.
Neural Computing and Applications - The application of artificial neural networks in mapping the mechanical characteristics of the cement-based materials is underlined in previous investigations....
