Similar Documents
20 similar documents found (search time: 15 ms)
1.
The application of remote sensing images in crop monitoring has increased in recent years owing to its high classification accuracy. In this paper, a novel parallel classification methodology is proposed using a new clustering and classification concept. A novel neural network model with the BS-Lion training algorithm is developed by integrating Bayesian regularization training with the Lion Algorithm. Two levels of parallel processing are performed: parallel WLI-Fuzzy clustering and parallel BS-Lion neural network classification. The proposed parallel methodology is evaluated on satellite images obtained from the Indian remote sensing satellite IRS-P6, and its performance is compared with that of existing techniques using the validation measures accuracy, sensitivity and specificity. The experiments yielded promising results, with an accuracy of 0.8994, sensitivity of 0.8682 and specificity of 0.8739, which favour the proposed parallel architecture for classification.
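Below is a minimal sketch, not the authors' code, of how the reported accuracy, sensitivity and specificity are conventionally derived from binary confusion counts; the 0/1 label convention is an assumption.

```python
# Hedged sketch: standard confusion-matrix metrics; the positive/negative
# label convention (1/0) is assumed, not taken from the paper.
import numpy as np

def binary_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```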

2.
《中国工程学刊》 (Journal of the Chinese Institute of Engineers), 2012, 35(5): 509-514
Current classification methods are mostly based on the vector space model, which accounts only for term frequency in the documents and ignores important semantic relationships between key terms. We propose a system that uses integrated ontologies and natural language processing techniques to index texts: the traditional word matrix is replaced by a concepts-based matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A support vector machine, a successful machine learning technique, is used for classification. Experimental results show that the proposed method improves text classification performance significantly.
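A minimal sketch of the concepts-based matrix idea follows, assuming a toy keyword-to-concept mapping; the paper's mapping is derived automatically from ontologies and is not reproduced here.

```python
# Hedged sketch: replace words with ontology concepts before vectorizing,
# then train a linear SVM. The concept_map below is a hypothetical stand-in
# for the paper's automated keyword-to-concept mapping.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

concept_map = {"loan": "finance", "credit": "finance",
               "tumor": "medicine", "therapy": "medicine"}

def to_concepts(doc):
    return " ".join(concept_map.get(w, w) for w in doc.lower().split())

docs = ["loan and credit risk", "tumor therapy outcomes"]
labels = [0, 1]
vec = CountVectorizer()
X = vec.fit_transform(to_concepts(d) for d in docs)
clf = LinearSVC().fit(X, labels)
```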

3.
4.
Automated image segmentation systems are becoming an important tool in the medical field for disease diagnosis. White blood cell (WBC) segmentation is crucial because it plays an important role in determining diseases and helps experts diagnose blood disorders. Precise segmentation of WBCs is quite challenging because of the complex contents of bone marrow smears. In this paper, a novel neural network (NN) classifier is proposed for the classification of bone marrow WBCs. The proposed NN classifier integrates the fractional gravitation search (FGS) algorithm to update the weights of the radial basis function mapping, classifying WBCs based on cell nucleus features. The proposed FGS-RBNN classifier is evaluated on images collected from a publicly available dataset, and its performance is compared with that of existing classifier approaches using the measures accuracy, sensitivity, and specificity. The results show that classification using the nucleus features alone achieves good accuracy. Moreover, the classification performance of the proposed FGS-RBNN is better than that of the existing classifiers, proving it an efficacious classifier with a classification accuracy of 95%.
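An illustrative radial basis function network in the spirit of the RBNN above, with stand-in features; the FGS weight update is the paper's contribution, so this sketch substitutes an ordinary least-squares fit for it.

```python
# Hedged sketch: Gaussian RBF feature mapping plus a least-squares weight
# fit. The fractional gravitation search (FGS) update the paper proposes is
# NOT implemented here; features and labels are random stand-ins.
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    # Gaussian activation of each sample against each RBF center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                 # stand-in nucleus features
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # stand-in labels
centers = X[rng.choice(len(X), 5, replace=False)]
Phi = rbf_features(X, centers)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # weights the paper tunes via FGS
pred = (Phi @ w > 0.5).astype(float)
```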

5.
Spam mail classification is considered a complex and error-prone task in distributed computing environments. Various spam mail classification approaches are available, such as the naive Bayesian classifier, logistic regression, support vector machines, decision trees, recursive neural networks, and long short-term memory algorithms. However, they do not consider the document as a whole when analyzing spam mail content. These approaches use the bag-of-words method, which analyzes a large amount of text data and classifies features with the help of term frequency-inverse document frequency. Because there are many words in a document, these approaches consume a massive amount of resources and become infeasible when classifying multiple associated mail documents together; spam mail is thus not fully classified, and these approaches leave loopholes. We therefore propose a term frequency-topic inverse document frequency model that considers the meaning of text data in a larger semantic unit by applying weights based on the document's topic. Moreover, the proposed approach reduces the sparsity problem by combining the term frequency-topic inverse document frequency with a singular value decomposition model, and it reduces dimensionality, which ultimately strengthens document classification. Experimental evaluations show that the proposed approach classifies spam mail documents with higher accuracy using individual document-independent processing computation. Comparative evaluations show that the proposed approach performs better than the logistic regression model in the distributed computing environment, with higher document word frequencies of 97.05%, 99.17% and 96.59%.
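A minimal sketch of the TF-IDF plus singular value decomposition stage that the model builds on; the topic-based re-weighting itself is the paper's contribution and is not reproduced here.

```python
# Hedged sketch: standard TF-IDF followed by truncated SVD for
# dimensionality reduction, the base pipeline the proposed topic-weighted
# model extends. The four mails are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

mails = ["win a free prize now", "meeting agenda attached",
         "free lottery prize", "project status report"]
X = TfidfVectorizer().fit_transform(mails)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
```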

6.
In order to stimulate innovation during the collaborative development of new products and production processes, and especially to avoid duplicating existing techniques or infringing on others' patents and intellectual property rights, collaborative teams of research-and-development and patent engineers must accurately identify relevant patent knowledge in a timely manner. This research develops a novel knowledge management approach that uses an ontology-based artificial neural network (ANN) algorithm to automatically classify and search knowledge documents stored in huge online patent corpuses. It focuses on smart, semantics-oriented classification and search over the most critical and well-structured knowledge publications, i.e. patents, to provide valuable and practical references for collaborative networks of technology-centric product and production development teams. The domain ontology schema is created using Protégé, from which the semantic concept probabilities of key phrases that frequently occur in domain-relevant patent documents are derived. By combining the term frequencies and the concept probabilities of key phrases as the ANN inputs, the method shows a significant improvement in classification accuracy. In addition, this research provides an advanced semantics-oriented search algorithm to accurately identify related patent documents in the patent knowledge base. The case demonstration analyses sample sets of 343 chemical mechanical polishing and 150 radio-frequency identification patents to verify and measure the performance of the proposed approach; the results show much improved outcomes compared with previous automatic classification methods.
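A minimal sketch of the input construction described above, where term frequencies and concept probabilities are concatenated and fed to a feed-forward ANN; the concept probabilities here are random stand-ins, not Protégé-derived values.

```python
# Hedged sketch: concatenate key-phrase term frequencies with ontology
# concept probabilities as ANN inputs. All data are simulated stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
tf = rng.integers(0, 5, size=(30, 20)).astype(float)  # term frequencies
concept_p = rng.random(size=(30, 6))                  # assumed concept probs
X = np.hstack([tf, concept_p])
y = rng.integers(0, 2, size=30)                       # patent class labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(X, y)
```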

7.
Plagiarism refers to the use of other people's ideas and information without acknowledging the source. In this research, anti-plagiarism software was designed, especially for the university and its campuses, to identify plagiarized text in students' written assignments and laboratory reports. The proposed framework collects original documents and identifies plagiarized text using natural language processing. Our method detects plagiarism by applying the core concept of a text, i.e., the semantic associations of words and their syntactic composition. Browser information was obtained through the Request application programming interface (Url.AbsoluteUri) and stored in a centralized Microsoft database server. A total of 55,001 data samples were collected from 2015 to 2019. Furthermore, we assimilated data from a university website, specifically from the psau.edu.sa network, and arranged the data into student categories. Words were extracted from source documents and student documents using the WordNet library. On a benchmark dataset consisting of 785 plagiarized and 4,716 original texts, a significant accuracy of 90.2% was achieved; the proposed framework thus demonstrated better performance than other available tools. Many students mentioned that working on assignments with the framework was convenient because they could work on their own schedules and from different network locations. The framework also recommends procedures that can be used to avoid plagiarism.
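A minimal sketch of the WordNet-based similarity step, assuming the NLTK WordNet corpus is installed (nltk.download('wordnet')); the framework's full semantic-and-syntactic comparison is not reproduced.

```python
# Hedged sketch: maximum path similarity between two words over all of
# their WordNet synsets, one plausible building block for semantic matching.
from nltk.corpus import wordnet as wn

def max_similarity(word_a, word_b):
    best = 0.0
    for sa in wn.synsets(word_a):
        for sb in wn.synsets(word_b):
            sim = sa.path_similarity(sb)
            if sim is not None and sim > best:
                best = sim
    return best

print(max_similarity("copy", "duplicate"))
```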

8.
In recent years, the volume of information in digital form has increased tremendously owing to the increased popularity of the World Wide Web. As a result, techniques for extracting useful information from large collections of data, particularly documents, have become more necessary and challenging. Text clustering is such a technique: it divides a set of text documents into clusters (groups) so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the relevance of a document's content (i.e., its words). Nevertheless, as documents usually contain a large number of words, some may be irrelevant to the topic under consideration or redundant; this can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach combines two metaheuristic algorithms: a genetic algorithm (GA), which performs feature selection, and a shuffled frog-leaping algorithm (SFLA), which performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset, the "20Newsgroup" dataset from the University of California Irvine Machine Learning Repository. Multiple experiments demonstrated that the proposed algorithm greatly improves text document clustering on the 20Newsgroup dataset compared with classical K-means clustering, although this improvement requires longer computational time.
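A toy sketch of the two-stage idea under stated simplifications: a mutation-based search over binary feature masks (a heavy simplification of the GA) scored by K-means clustering quality; the SFLA clustering stage is not reproduced.

```python
# Hedged sketch: hill-climbing over binary feature masks with K-means
# silhouette as fitness. Data and sizes are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))             # stand-in document features

def fitness(mask):
    if mask.sum() < 2:
        return -1.0
    Xs = X[:, mask.astype(bool)]
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
    return silhouette_score(Xs, labels)

mask = rng.integers(0, 2, size=10)
best = fitness(mask)
for _ in range(20):                       # flip one feature bit per step
    cand = mask.copy()
    cand[rng.integers(0, 10)] ^= 1
    f = fitness(cand)
    if f > best:
        mask, best = cand, f
```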

9.
Sentiment classification is a text classification task that extracts sentiment polarity from text. By combining several classifiers, ensemble learning can achieve better sentiment classification performance than any individual classifier. However, because individual classifiers perform differently on a given dataset, their weights in the ensemble are difficult to determine. To address this weight optimization problem, this paper proposes an ensemble classification method that optimizes the weights of the individual classifiers via differential evolution, and applies it to Chinese sentiment classification. Using classification accuracy as the fitness value, the differential evolution algorithm optimizes the weight combination of five individual classifiers, and experiments are conducted on review corpora from three domains. The results show that, compared with common ensemble methods, the proposed method achieves better performance on Chinese sentiment classification.
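A minimal sketch of the weight-optimization idea from this abstract, using SciPy's differential evolution to search classifier weights that maximize weighted-vote accuracy; the five base classifiers' outputs are simulated stand-ins.

```python
# Hedged sketch: differential evolution over ensemble weights; the base
# classifier probabilities below are simulated, not real model outputs.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=100)                 # validation labels
probs = np.clip(y_val + rng.normal(0, 0.6, size=(5, 100)), 0, 1)

def neg_accuracy(w):
    w = w / w.sum()                      # normalize the weight vector
    vote = (w @ probs) > 0.5             # weighted soft vote
    return -np.mean(vote == y_val)       # DE minimizes, so negate accuracy

res = differential_evolution(neg_accuracy, bounds=[(0.01, 1)] * 5, seed=0)
print(res.x / res.x.sum(), -res.fun)
```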

10.
11.
Feature selection and sentiment analysis are two common research areas, consistent with advances in computing and the growing use of social media. High-dimensional or large feature sets are a key issue in sentiment analysis, as they can decrease the accuracy of sentiment classification and make it difficult to obtain the optimal subset of features. Furthermore, most reviews from social media carry a lot of noise and irrelevant information. Therefore, this study proposes a new text-feature selection method that combines rough set theory (RST) and teaching-learning based optimization (TLBO), known as RSTLBO. The framework for developing the proposed RSTLBO comprises several stages: (1) acquiring the standard datasets (user reviews of six major U.S. airlines) used to validate the feature selection methods; (2) pre-processing the dataset with text processing methods, applying natural language processing techniques combined with linguistic processing techniques to produce high classification results; (3) employing the RSTLBO method; and (4) using the selected features for sentiment classification with the support vector machine (SVM) technique. Results show an improvement in sentiment analysis when natural language processing is combined with linguistic processing for text processing. More importantly, the proposed RSTLBO feature selection algorithm produces improved sentiment analysis.

12.
This article proposes a novel computer-aided diagnosis (CAD) technique for the classification of magnetic resonance brain images. The method adopts a color-converted hybrid clustering segmentation algorithm with a hybrid feature selection approach based on IGSFFS (information gain and sequential forward floating search) and a multi-class support vector machine (MC-SVM) classifier to segregate magnetic resonance brain images into three categories: normal, benign and malignant. The proposed hybrid evolutionary segmentation algorithms combine the weighted firefly (WFF) algorithm with K-means (WFF-K-means) and modified cuckoo search (MCS) with K-means (MCS-K-means); they find better cluster partitions in brain tumor datasets and overcome the local optima problem of the K-means clustering algorithm. The experimental results show that the proposed algorithm performs better than PSO-K-means, color-converted K-means, FCM and other traditional approaches. The feature set comprises color, texture and shape features derived from the segmented image. These features are fed into the MC-SVM classifier with the hybrid feature selection algorithm, trained with data labeled by experts, enabling the detection of brain images at high accuracy levels. The performance of the method is evaluated using classification accuracy, sensitivity, specificity, and receiver operating characteristic (ROC) curves. The proposed method provides a classification accuracy greater than 98% with sensitivity and specificity rates greater than 95%, which shows the promise of the approach. © 2015 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 25, 226-244, 2015
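A minimal sketch of the hybrid clustering idea, assuming random candidate centers stand in for the WFF/MCS search: the metaheuristic proposes initial centers and K-means refines them.

```python
# Hedged sketch: metaheuristic-seeded K-means. Random center candidates
# here stand in for the weighted firefly / modified cuckoo search proposals.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # stand-in image feature vectors

best_inertia, best_model = np.inf, None
for _ in range(10):                       # candidate initial center sets
    centers = X[rng.choice(len(X), 3, replace=False)]
    km = KMeans(n_clusters=3, init=centers, n_init=1).fit(X)
    if km.inertia_ < best_inertia:
        best_inertia, best_model = km.inertia_, km
```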

13.
In bibliometric research, keyword analysis of publications provides an effective way not only to investigate the knowledge structure of research domains but also to explore developing trends within domains. Many approaches have been proposed to identify the most representative keywords, most of which use statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for domain analysis. In this paper, we argue that domain knowledge is reflected by the semantic meanings behind keywords rather than by the keywords themselves. We apply the Google Word2Vec model, a deep-learning model of word distributions, to represent the semantic meanings of keywords. On this basis, we propose a new domain-knowledge approach, the Semantic Frequency-Semantic Active Index, analogous to Term Frequency-Inverse Document Frequency, to link domain and background information and to identify infrequent but important keywords. We adopt a semantic similarity measure before the statistical computation, counting the frequencies of "semantic units" rather than raw keyword frequencies. Semantic units are generated by word-vector clustering, while the Inverse Document Frequency is extended to a semantic inverse document frequency, so that only words in the inverse documents with a certain similarity are counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare and discuss the advantages and disadvantages of the proposed method in relation to existing methods, finding that by introducing the semantic meanings of keywords, our method supports more effective domain knowledge analysis.
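A minimal sketch of the "semantic unit" construction under toy assumptions: Word2Vec vectors are clustered, and unit membership replaces raw keyword identity before frequency counting.

```python
# Hedged sketch: cluster Word2Vec word vectors into "semantic units" and
# map each keyword to its unit id. Corpus and sizes are toy stand-ins.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

sentences = [["flood", "landslide", "rainfall"],
             ["earthquake", "fault", "seismic"],
             ["flood", "rainfall", "river"]]
w2v = Word2Vec(sentences, vector_size=16, min_count=1, seed=0)
words = list(w2v.wv.index_to_key)
units = KMeans(n_clusters=2, n_init=10,
               random_state=0).fit_predict(w2v.wv[words])
unit_of = dict(zip(words, units))         # keyword -> semantic unit id
```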

14.
As an increasing number of scientific literature datasets become open access, more attention has gravitated to keyword analysis in many scientific fields. Traditional keyword analyses include frequency-based and network-based methods, both of which provide efficient mining techniques for identifying representative keywords. The semantic meanings behind the keywords are important for understanding research content; however, traditional keyword analysis methods pay scant attention to them, and network-based or frequency-based methods, as traditionally used, present limited semantic associations among keywords. Moreover, how the semantic meanings behind the keywords relate to citations is not clear. Thus, we use the Google Word2Vec model to build word vectors and reduce them to a two-dimensional plane with the t-SNE algorithm, rendered as a Voronoi diagram, to link meanings with citations. Distances between the semantic meanings of keywords in the two-dimensional plane resemble distances in geographical space, so we introduce a geographic metaphor, "Ghost City", to describe the relationship between semantics and citations for hot topics that have recently cooled. Along with "Ghost City" zones, "Always Hot", "Newly Emerging Hot", and "Always Silent" areas are classified and mapped, describing the spatial heterogeneity and homogeneity of the semantic distribution of keywords cited in a domain database. Using a collection of "geographical natural hazard" literature datasets, we demonstrate that the proposed method and classification scheme efficiently provide a unique viewpoint for interpreting the interaction between semantics and citations through the "Ghost City", "Always Hot", "Newly Emerging Hot", and "Always Silent" areas.
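A minimal sketch of the mapping step under stated assumptions: keyword vectors (random stand-ins here, rather than trained Word2Vec vectors) are reduced to 2-D with t-SNE and tessellated with a Voronoi diagram.

```python
# Hedged sketch: t-SNE projection of keyword vectors plus a Voronoi
# tessellation, one cell per keyword. Vectors are random stand-ins.
import numpy as np
from sklearn.manifold import TSNE
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
vecs = rng.normal(size=(50, 16))          # stand-in keyword vectors
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vecs)
vor = Voronoi(xy)
```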

15.
This paper presents the development of a scheduling methodology for module processing in thin film transistor liquid crystal display (TFT-LCD) manufacturing. The problem is a parallel machine scheduling problem with rework probabilities, sequence-dependent setup times and due dates; the rework probability of each job on a machine is assumed to be available from historical data. A dispatching algorithm named GRPD (greedy rework probability with due-dates) is proposed, focusing on the rework processes, and its performance is measured by six diagnostic indicators. A large number of randomly generated test problems are used to evaluate the algorithm. Computational results show that the proposed algorithm is significantly superior to existing dispatching algorithms on the test problems.
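The abstract does not spell out the GRPD rule, so the following is only an illustrative greedy dispatcher combining rework probability and due-date slack, with hypothetical job fields.

```python
# Hedged sketch: a greedy rule in the spirit of GRPD (NOT the published
# algorithm). Jobs with low expected rework time and tight slack go first;
# the 'rework_p', 'due' and 'proc_time' fields are hypothetical.
def pick_next(jobs, now):
    def score(j):
        slack = j["due"] - now - j["proc_time"]
        return j["rework_p"] * j["proc_time"] + slack
    return min(jobs, key=score)

jobs = [{"rework_p": 0.1, "due": 20, "proc_time": 5},
        {"rework_p": 0.4, "due": 12, "proc_time": 6}]
print(pick_next(jobs, now=0))
```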

16.
Because text topic clustering is slow on stand-alone architectures in the big data era, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm in Spark maps words irreversibly, the mapped word indexes cannot be traced back to the original words. This paper proposes an optimized TF-IDF method under Spark that ensures the original words can be restored. First, text features are extracted by the proposed TF-IDF algorithm combined with CountVectorizer; the features are then input to the LDA (Latent Dirichlet Allocation) topic model for training; finally, the text topic clustering is obtained. Experimental results show that, for large data samples, the processing speed of LDA topic-model clustering is improved on Spark. Moreover, compared with the LDA topic model with word-frequency input, the proposed model achieves lower perplexity.
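A minimal PySpark sketch of the pipeline described above: CountVectorizer keeps an explicit vocabulary, so LDA topic-term indexes can be mapped back to words, which is the reversibility the paper seeks; the two-document corpus and parameters are toy stand-ins.

```python
# Hedged sketch: Spark CountVectorizer -> IDF -> LDA; cv_model.vocabulary
# restores words from feature indexes. Corpus and parameters are stand-ins.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, CountVectorizer, IDF
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("stock markets fall",),
                            ("team wins the final",)], ["text"])
tokens = Tokenizer(inputCol="text", outputCol="words").transform(df)
cv_model = CountVectorizer(inputCol="words", outputCol="tf").fit(tokens)
tf = cv_model.transform(tokens)
tfidf = IDF(inputCol="tf", outputCol="features").fit(tf).transform(tf)
lda_model = LDA(k=2, maxIter=10).fit(tfidf)
vocab = cv_model.vocabulary               # index -> original word
```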

17.
Since ancient times, humans have relied on plants for food, energy, and medicine. Plants are recognized by leaf, flower, or fruit and assigned to a suitable cluster; classification methods are used to extract and select traits that are helpful in identifying a plant. In plant leaf image categorization, each plant is assigned a label according to its class. The purpose of classifying plant leaf images is to enable farmers to recognize plants, leading to better plant management in several respects. This study presents a modified whale optimization algorithm that categorizes plant leaf images into classes. The modified algorithm works on different sets of plant leaves and performs adequately on several benchmark functions. The classification method was validated on ten plant leaf image datasets: the proposed model calculates precision, recall, F-measure, and accuracy for each dataset and compares these metrics with those of other existing algorithms. Based on the experimental data, the accuracy of the proposed method outperforms that of the algorithms under consideration, improving accuracy by 5%.

18.
Text classification is an increasingly crucial topic in natural language processing. Traditional text classification methods based on machine learning have many disadvantages, such as dimension explosion, data sparsity, and limited generalization ability. This paper presents an extensive study of deep learning text classification models, including Convolutional Neural Network-based (CNN-based), Recurrent Neural Network-based (RNN-based), and attention mechanism-based models. Many studies have shown that text classification methods based on deep learning outperform traditional methods when processing large-scale and complex datasets, mainly because they avoid the cumbersome feature extraction process and achieve higher prediction accuracy on large sets of unstructured data. In this paper, we also summarize the shortcomings of traditional text classification methods and introduce the deep learning text classification process, including text preprocessing, distributed representation of text, construction of deep learning classification models, and performance evaluation.

19.
The electrocardiogram (ECG) signal is a measure of the heart's electrical activity. Recently, ECG detection and classification have benefited from computer-aided systems used by cardiologists. The goal of this paper is to improve the accuracy of ECG classification by combining Dipper Throated Optimization (DTO) and the Differential Evolution Algorithm (DEA) into a unified algorithm that optimizes the hyperparameters of a neural network (NN). In addition, we propose a new feature selection method for selecting the significant features that improve overall performance. To prove the superiority of the proposed approach, several experiments compare its results with those of competing approaches, and statistical analysis using the Wilcoxon and ANOVA tests studies its significance and stability. Experimental results confirm the superiority and effectiveness of the proposed approach, which achieves a classification accuracy of 99.98%.
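A minimal sketch of the statistical validation step named in the abstract, using SciPy's Wilcoxon signed-rank and one-way ANOVA tests over simulated repeated accuracy scores.

```python
# Hedged sketch: significance testing between two methods' accuracy runs;
# the score arrays are simulated, not the paper's results.
import numpy as np
from scipy.stats import wilcoxon, f_oneway

rng = np.random.default_rng(0)
acc_proposed = 0.99 + rng.normal(0, 0.002, size=10)
acc_baseline = 0.97 + rng.normal(0, 0.002, size=10)
print(wilcoxon(acc_proposed, acc_baseline))
print(f_oneway(acc_proposed, acc_baseline))
```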

20.
An image semantic classification algorithm based on multi-hyperplane support vector machines   Cited: 1 (self-citations: 0; external citations: 1)
黄启宏, 刘钊. 《光电工程》 (Opto-Electronic Engineering), 2007, 34(8): 99-104
Because there is a huge semantic gap between the low-level visual features of images and their high-level semantic content, and the accuracy of content-based image classification and retrieval depends heavily on the description of low-level visual features, this paper proposes an image semantic classification method based on multi-hyperplane support vector machines. The multi-hyperplane classifier, studied in terms of both optimization complexity and generalization ability, is a natural extension of the optimal separating hyperplane classifier. Experimental results show that the proposed method outperforms other methods, such as support vector machine classifiers using color and texture features, in the accuracy of image semantic classification.

