Similar Documents
1.
The internet, and online social networking platforms in particular, have transformed the way extremist groups influence and radicalize individuals. Recent research reveals that the process begins by exposing vast audiences to extremist content and then migrating potential victims to more confined platforms for intensive radicalization. Social networks have consequently evolved into a persuasive tool for extremism, serving both as a recruitment platform and as a vehicle for psychological warfare. Recognizing potentially radical text or material is therefore vital to restricting the circulation of extremist narratives. The aim of this research is to identify radical text in social media. Our contributions are as follows: (i) a new dataset for radicalization detection; (ii) an in-depth analysis of the new and previous datasets so that variation in extremist group narratives can be identified; (iii) an approach that trains classifiers on religious features alongside radical features to detect radicalization; (iv) an analysis of the use of violent and offensive words in radical, neutral, and random groups using violence, terrorism, and bad-word dictionaries. Our results clearly indicate that incorporating religious text in model training improves the accuracy, precision, recall, and F1-score of the classifiers. Secondly, a variation in extremist narratives has been observed, implying that using the new dataset can have a substantial effect on classifier performance. In addition, violent and offensive words create a differentiating factor between radical and random users, but for the neutral (anti-ISIS) group this needs further investigation.
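
As an illustration of point (iv), the minimal sketch below counts dictionary hits per post; the word lists and the regex tokenizer are invented placeholders, not the violence, terrorism, and bad-word dictionaries used in the study.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries; the study uses full violence, terrorism and bad-word lexicons.
VIOLENT_WORDS = {"attack", "kill", "bomb", "fight"}
BAD_WORDS = {"idiot", "stupid"}

def lexicon_features(post: str) -> dict:
    """Count how many tokens of a post fall into each dictionary."""
    tokens = re.findall(r"[a-z']+", post.lower())
    counts = Counter(tokens)
    return {
        "violent_hits": sum(counts[w] for w in VIOLENT_WORDS),
        "bad_word_hits": sum(counts[w] for w in BAD_WORDS),
        "tokens": len(tokens),
    }

print(lexicon_features("They plan to attack and kill, what an idiot plan"))
# {'violent_hits': 2, 'bad_word_hits': 1, 'tokens': 10}
```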

2.
In the financial sector, data are highly confidential and sensitive, and ensuring data privacy is critical. Sample fusion is the basis of horizontal federated learning, but it is suitable only for scenarios in which customers share the same data format but have different targets, that is, scenarios with strong feature overlap and weak user overlap. To overcome this limitation, this paper proposes a federated learning-based model with local data sharing and differential privacy. The indexing mechanism of differential privacy is used to obtain different privacy budgets, which are applied to the gradients according to their contribution, ensuring privacy without affecting accuracy. In addition, data sharing is performed to improve the utility of the global model. Further, a distributed prediction model is used to predict customers' loan propensity while protecting user privacy. An aggregation mechanism based on federated learning allows the model to be trained on distributed data without exposing local data. The proposed method is verified experimentally, and the results show that for non-iid data it effectively improves accuracy and reduces the impact of sample skew. The method can be extended to edge computing, blockchain, and Industrial Internet of Things (IIoT) settings. The theoretical analysis and experimental results show that the proposed method ensures the privacy and accuracy of the federated learning process and improves model utility on non-iid data by 7% compared with the federated averaging method (FedAvg).
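
A minimal numpy sketch of the general idea, adding differentially private noise to client gradients before federated averaging, is shown below; the clipping bound, the per-client privacy budgets, and the weighting are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def dp_noisy_gradient(grad, epsilon, clip=1.0, rng=np.random.default_rng(0)):
    """Clip a client gradient and add Laplace noise scaled by its privacy budget."""
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip / (norm + 1e-12))      # L2 clipping
    noise = rng.laplace(0.0, clip / epsilon, size=grad.shape)
    return grad + noise

def federated_average(client_grads, client_sizes):
    """Weight client updates by local sample counts (FedAvg-style aggregation)."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * g for w, g in zip(weights, client_grads))

# Three hypothetical clients with different privacy budgets (larger epsilon = less noise).
grads = [np.array([0.2, -0.5]), np.array([0.1, 0.4]), np.array([-0.3, 0.2])]
budgets, sizes = [0.5, 1.0, 2.0], [100, 250, 150]
noisy = [dp_noisy_gradient(g, e) for g, e in zip(grads, budgets)]
print(federated_average(noisy, sizes))
```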

3.
The text classification process has been extensively investigated in various languages, especially English, and text classification models are vital to several Natural Language Processing (NLP) applications. The Arabic language is highly significant: it is the fourth most-used language on the internet and the sixth official language of the United Nations. However, few studies have addressed text classification in Arabic, and in general researchers face two challenges in Arabic text classification: low accuracy and high feature dimensionality. In this study, an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning (AATC-HTHDL) model is proposed. The major goal of the proposed AATC-HTHDL method is to assign class labels to Arabic text. The first step of the proposed model is to pre-process the input data into a useful format. The Term Frequency-Inverse Document Frequency (TF-IDF) model is applied to extract feature vectors. Next, the Convolutional Neural Network with Recurrent Neural Network (CRNN) model is used to classify the Arabic text. In the final stage, the Crow Search Algorithm (CSA) is applied to fine-tune the CRNN model's hyperparameters, which constitutes the novelty of the work. The proposed AATC-HTHDL model was experimentally validated under different parameters, and the outcomes established its superiority over other approaches.
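
The sketch below illustrates what a convolutional-recurrent (CRNN) text classifier of this general kind can look like in Keras; the vocabulary size, sequence length, layer widths, and class count are assumed values, and neither the paper's TF-IDF features nor its Crow Search hyperparameter tuning are reproduced here.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 20000, 200, 5   # assumed values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),   # local n-gram features
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),                                        # sequence summary
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Dummy padded token-id sequences standing in for pre-processed Arabic documents.
X = np.random.randint(0, VOCAB_SIZE, size=(32, SEQ_LEN))
y = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:2], verbose=0).shape)   # (2, NUM_CLASSES) class probabilities
```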

4.
With the rapid development of the mobile wireless internet and high-precision localization devices, location-based services (LBS) have brought people considerable convenience in recent years. In LBS, however, if the original location data are provided directly, serious privacy problems arise. In response, a large number of location-privacy protection mechanisms (LPPMs), including formal LPPMs (FLPPMs), and their evaluation metrics have been proposed to prevent personal location information from being leaked and to quantify privacy leakage. However, existing schemes consider FLPPMs and evaluation metrics independently, without synergizing them into a unifying framework. In this paper, a unified model is proposed to combine FLPPMs and evaluation metrics. In detail, a probabilistic process calculus (called δ-calculus) is proposed to characterize obfuscation schemes (a type of LPPM), and α-entropy is integrated into δ-calculus to evaluate their privacy leakage. Further, two calculus operators, moving and probabilistic choice, are used to model nodes' mobility and compute the probability distribution of node locations, and a renaming function is used to model privacy leakage. By formally defining the attacker's ability and extending relative entropy, an evaluation algorithm is proposed to quantify the leakage of location privacy. Finally, a series of examples is designed to demonstrate the efficiency of the proposed approach.
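
As a worked illustration of quantifying leakage with relative entropy, the snippet below compares an attacker's prior over a node's candidate locations with the posterior after observing an obfuscated report; the distributions are invented for the example and are not taken from the paper.

```python
import numpy as np

def relative_entropy(p, q):
    """KL divergence D(p || q) in bits over the same discrete location set."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Hypothetical attacker belief over four candidate locations.
prior     = [0.25, 0.25, 0.25, 0.25]   # before observing the obfuscated location
posterior = [0.55, 0.25, 0.15, 0.05]   # after observing it

print(f"privacy leakage ≈ {relative_entropy(posterior, prior):.3f} bits")
```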

5.
People often communicate with auto-answering tools such as conversational agents because of their 24/7 availability and unbiased responses. However, chatbots are normally designed for specific purposes and areas of expertise and cannot answer questions outside their scope. Chatbots employ Natural Language Understanding (NLU) to infer their responses. There is a need for a chatbot that can learn from inquiries and expand its area of expertise over time. Such a chatbot must be able to build profiles representing intended topics, in a manner analogous to the human brain, for fast retrieval. This study proposes a methodology to enhance a chatbot's brain functionality by clustering available knowledge bases into sets of related themes and building representative profiles. We used a COVID-19 information dataset to evaluate the proposed methodology, as the pandemic has been accompanied by an "infodemic" of fake news. The chatbot was evaluated by a medical doctor and in a public trial with 308 real users. The evaluations were statistically analyzed to measure effectiveness, efficiency, and satisfaction as described by the ISO 9241 standard. The proposed COVID-19 chatbot system relieves doctors from answering questions, and chatbots provide an example of the use of technology to handle an infodemic.
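
A minimal sketch of the clustering step, grouping knowledge-base questions into topic profiles with TF-IDF and k-means, is shown below; the toy questions and the number of clusters are assumptions, not the study's actual knowledge base.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy knowledge-base entries standing in for the real COVID-19 FAQ corpus.
kb_questions = [
    "What are the main symptoms of COVID-19?",
    "Is fever a common symptom?",
    "How long should I stay in quarantine?",
    "When does isolation end after a positive test?",
    "Are vaccines effective against new variants?",
    "How many vaccine doses do I need?",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(kb_questions)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

# Each cluster becomes a topic profile that the chatbot can search first.
for cluster, question in sorted(zip(labels, kb_questions)):
    print(cluster, question)
```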

6.
Social networking services (SNSs) provide massive data that can be a very influential source of information during pandemic outbreaks. This study shows that social media analysis can be used as a crisis detector, for example to understand the sentiment of social media users regarding various pandemic outbreaks. The novel Coronavirus Disease 2019 (COVID-19), commonly known as coronavirus, affected everyone worldwide in 2020, and streaming Twitter data have revealed the status of the outbreak in the most affected regions. This study focuses on identifying COVID-19 patients from tweets without requiring medical records. For this purpose, we propose an intelligent model using traditional machine learning approaches, namely support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and decision tree (DT), with term frequency-inverse document frequency (TF-IDF) features, to detect the COVID-19 pandemic in Twitter messages. The proposed model classifies Twitter messages into four categories, namely confirmed, deaths, recovered, and suspected. For the experimental analysis, tweets on the COVID-19 pandemic are analyzed to evaluate the traditional machine learning approaches, and a benchmark Twitter dataset for COVID-19 is developed that can be used in future research. The experiments show that the proposed approach is promising for detecting the COVID-19 pandemic in Twitter messages, with overall accuracy, precision, recall, and F1-score between 70% and 80% across the machine learning approaches (SVM, NB, LR, RF, and DT) with the TF-IDF feature extraction technique.
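
A compact sketch of this kind of pipeline, TF-IDF features fed to several classic classifiers, is given below; the tiny example tweets and labels are placeholders rather than the benchmark dataset built in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder tweets labelled with the four target categories.
tweets = [
    "tested positive today, confirmed case in our district",
    "sad news, two more deaths reported at the hospital",
    "my father has fully recovered after two weeks",
    "dry cough and fever, suspected infection, waiting for the test",
]
labels = ["confirmed", "deaths", "recovered", "suspected"]

for clf in (LinearSVC(), MultinomialNB(), LogisticRegression(max_iter=1000)):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(tweets, labels)
    print(type(clf).__name__, model.predict(["another suspected case with high fever"]))
```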

7.
Social media platforms have proven to be effective for information gathering during emergency events caused by natural or human-made disasters. Emergency response authorities, law enforcement agencies, and the public can use this information to gain situational awareness and improve disaster response, and in emergencies rapid responses are needed to address victims' requests for help. The research community has developed many social media platforms and used them effectively for emergency response and coordination in the past. However, most present deployments of such platforms in crisis management are not automated, and their operational success largely depends on experts who analyze the information manually and coordinate with relevant humanitarian agencies or law enforcement authorities to initiate emergency response operations. Seamlessly integrating the automatic identification of urgent needs from millions of posts with the delivery of relevant information to the appropriate agency for a timely response has therefore become essential. This research project aims to develop a generalized Information Technology (IT) solution for emergency response and disaster management that integrates social media data as its core component. In this paper, we focus on text analysis techniques that can help emergency response authorities filter the sheer amount of automatically gathered information to support their relief efforts. More specifically, we apply state-of-the-art Natural Language Processing (NLP), Machine Learning (ML), and Deep Learning (DL) techniques, ranging from unsupervised to supervised learning, for an in-depth analysis of social media data aimed at extracting real-time information on a critical event to facilitate emergency response in a crisis. As a proof of concept, a case study on the COVID-19 pandemic using data collected from Twitter is presented, providing evidence that the scientific and operational goals have been achieved.

8.
In machine learning, sentiment analysis is a technique for finding and analyzing the sentiments hidden in text, and annotated data is a basic requirement for it. Generally, this data is annotated manually, which is a time-consuming, costly, and laborious process. To overcome these resource constraints, this research proposes a fully automated annotation technique for aspect-level sentiment analysis. A dataset is created from the reviews of the ten most popular songs on YouTube, and reviews for five aspects, voice, video, music, lyrics, and song, are extracted. An N-gram-based technique is proposed. The complete dataset consists of 369,436 reviews that took 173.53 s to annotate using the proposed technique, whereas manual annotation would have taken approximately 2.07 million seconds (575 h). To validate the proposed technique, one sub-dataset (Voice) is annotated both manually and with the proposed technique, and Cohen's Kappa statistic is used to evaluate the degree of agreement between the two annotations. The high Kappa value (0.9571) shows a high level of agreement, validating that the quality of annotation from the proposed technique is as good as manual annotation at far lower computational cost. This research also contributes consolidated guidelines for the manual annotation process.
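
The agreement check can be reproduced with scikit-learn's Cohen's kappa, as in the hedged sketch below; the two label lists are made-up stand-ins for the manual and automatic annotations of the Voice sub-dataset.

```python
from sklearn.metrics import cohen_kappa_score

# Invented aspect-level sentiment labels for the same ten reviews.
manual_labels    = ["pos", "pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos"]
automatic_labels = ["pos", "pos", "neg", "neu", "pos", "neg", "pos", "pos", "neg", "pos"]

kappa = cohen_kappa_score(manual_labels, automatic_labels)
print(f"Cohen's kappa = {kappa:.4f}")  # values close to 1 indicate near-perfect agreement
```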

9.
Cardiovascular disease (CVD), involving the heart and blood vessels, is one of the leading causes of death throughout the world. There are several risk factors for heart disease, such as a sedentary lifestyle, an unhealthy diet, obesity, diabetes, hypertension, smoking, alcohol consumption, stress, and hereditary factors. Predicting cardiovascular disease and treating the risk factors at an early stage are of paramount importance for saving lives, as today's highly stressful lifestyles and poor habits cause heart disease at a very young age. The main aim of this research is to predict premature heart disease using machine learning algorithms. This paper presents a novel approach that applies machine learning to predict cardiovascular disease at the premature stage itself: a Support Vector Machine (SVM) is used to segregate CVD patients based on their symptoms and medical observations. The experimental results of the proposed method will help medical practitioners provide suitable treatment to patients on time. A model has been developed with the current approach to examine the various stages of CVD, and the performance metrics used have given effective results compared with other machine learning techniques.
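
A minimal sketch of such an SVM-based screening step on tabular risk-factor data is shown below; the feature names, the synthetic records, and the scaling choice are illustrative assumptions, not the paper's clinical dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic records: [age, resting blood pressure, cholesterol, is_smoker]; 1 = CVD, 0 = healthy.
X = np.array([
    [34, 118, 180, 0], [62, 160, 280, 1], [45, 130, 210, 0],
    [58, 150, 260, 1], [29, 110, 170, 0], [66, 165, 300, 1],
])
y = np.array([0, 1, 0, 1, 0, 1])

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict([[55, 148, 255, 1]]))  # predicted class for a new patient profile
```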

10.
According to BBC News, online hate speech increased by 20% during the COVID-19 pandemic. Hate speech from anonymous users can result in psychological harm, including depression and trauma, and can even lead to suicide, and malicious online comments are increasingly becoming a social and cultural problem. It is therefore critical to detect such comments at the national level and to detect malicious users at the corporate level; achieving a healthy and safe internet environment requires both institutional and technical efforts, and detecting toxic comments helps create a safe online environment. In this study, to detect malicious comments, we used approximately 9,400 examples of hate speech from a Korean corpus of entertainment news comments. We developed toxic comment classification models using supervised learning algorithms, including decision trees, random forest, a support vector machine, and K-nearest neighbors. The proposed model uses random forests to classify toxic words, achieving an F1-score of 0.94. We analyzed the trained model using permutation feature importance, an explanatory machine learning method, and our experimental results confirmed that the toxic comment classifier properly identifies hate words used in Korea. Using this methodology, the proposed method can help create a healthy internet environment by detecting malicious comments written in Korean.
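
The explanation step can be sketched with scikit-learn's permutation importance, as below; the tiny bag-of-words comment matrix is an invented stand-in for the Korean entertainment-news corpus.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.inspection import permutation_importance

# Invented comments (English placeholders for the Korean data); 1 = toxic, 0 = clean.
comments = ["you are an idiot", "great performance tonight", "what a stupid idiot",
            "loved the new episode", "stupid and useless show", "wonderful acting"]
y = np.array([1, 0, 1, 0, 1, 0])

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(comments).toarray()

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

# Rank the words by how much shuffling them hurts the classifier.
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(vectorizer.get_feature_names_out()[idx], round(result.importances_mean[idx], 3))
```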

11.
Communication is a basic human need for exchanging thoughts and interacting with society. Hearing people usually converse in spoken languages, whereas deaf people cannot; Sign Language (SL) is therefore their medium for conversation and interaction with society. In SL, every word is expressed through a specific gesture, and a gesture consists of a sequence of performed signs. Hearing people normally observe these signs to distinguish single gestures (for singular words) from multiple gestures (for plural words); the signs for singular words such as I, eat, drink, and home differ from those for plural words such as schools, cars, and players. Special training is required to gain sufficient knowledge and practice to differentiate and understand every gesture or sign appropriately. Numerous studies have worked toward computer-based solutions for understanding single gestures performed with a single hand, but a complete understanding of such communication is possible only if a computer-based SL solution can also differentiate these gestures in real-world environments. Hence, there is still a need for a system that automates this kind of communication with such people. This research focuses on facilitating the deaf community by capturing gestures in video format, mapping and differentiating them as single or multiple gestures, and finally converting them into the corresponding words or sentences within a reasonable time, providing a real-time solution for deaf people to communicate and interact with society.

12.
Supply chains are increasingly global, complex and multi-tiered. Consequently, companies often struggle to maintain complete visibility of their supply network. This poses a problem as visibility of the network structure is required for tasks like effectively managing supply chain risk. In this paper, we discuss automated supply chain mapping as a means of maintaining structural visibility of a company's supply chain, and we use Deep Learning to automatically extract buyer–supplier relations from natural language text. Early results show that supply chain mapping solutions using Natural Language Processing and Deep Learning could enable companies to (a) automatically generate rudimentary supply chain maps, (b) verify existing supply chain maps, or (c) augment existing maps with additional supplier information.

13.
Due to the widespread use of the internet and smart devices, attacks such as intrusions, zero-day exploits, malware, and security breaches are a constant threat to any organization's network infrastructure, so a Network Intrusion Detection System (NIDS) is required to detect attacks in network traffic. This paper proposes a new hybrid method for intrusion detection and attack categorization. The proposed approach comprises three steps designed to reduce false-positive and false-negative rates in intrusion detection and attack categorization. In the first step, the dataset is preprocessed through a data transformation technique and min-max normalization. Secondly, random forest recursive feature elimination is applied to identify the optimal features that positively impact the model's performance. Next, we use various Support Vector Machine (SVM) types to detect intrusions and an Adaptive Neuro-Fuzzy Inference System (ANFIS) to categorize Probe, U2R, R2U, and DDoS attacks. The proposed method is validated with a Fine Gaussian SVM (FGSVM), which achieves 99.3% accuracy for the binary class. The Mean Square Error (MSE) is 0.084964 for training data, 0.0855203 for testing, and 0.084964 for validation of the multiclass categorization.
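
The feature-selection and detection steps can be sketched with scikit-learn as below; the synthetic traffic features and the number of features to keep are assumptions, and the ANFIS categorization stage is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for preprocessed network-traffic records (1 = attack, 0 = normal).
X, y = make_classification(n_samples=400, n_features=20, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipeline = make_pipeline(
    MinMaxScaler(),                                              # min-max normalization
    RFE(RandomForestClassifier(n_estimators=100, random_state=0), n_features_to_select=8),
    SVC(kernel="rbf", gamma="scale"),                            # Gaussian-kernel SVM detector
)
pipeline.fit(X_train, y_train)
print("binary detection accuracy:", round(pipeline.score(X_test, y_test), 3))
```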

14.
With advances in internet facilities, people are increasingly inclined to use online services. Service providers shelve their items for e-users, who post feedback, reviews, ratings, and so on after using an item. The enormous increase in these reviews has raised the need for an automated system that analyzes them to rate the items. Sentiment Analysis (SA) is a technique that supports such decision analysis. This research targets the ranking and rating of items through sentiment analysis of these reviews, on different aspects. As a case study, songs are chosen to design and test the decision model, considering five aspects: music, lyrics, song, voice, and video. For this purpose, reviews of 20 songs are scraped from YouTube, pre-processed, and formed into a dataset. Several machine learning algorithms, Naïve Bayes (NB), Gradient Boosted Tree, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN), are applied, and the ANN performed best with 74.99% accuracy. The results are validated using K-fold cross-validation.
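
A hedged sketch of this comparison, evaluating several classifiers on the same TF-IDF features with k-fold cross-validation, is shown below; the sample reviews and labels are placeholders for the scraped YouTube data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Placeholder aspect-level reviews with positive (1) / negative (0) sentiment labels.
reviews = ["the voice is amazing", "terrible lyrics", "music is so soothing",
           "awful video quality", "love this song", "worst vocals ever",
           "beautiful melody", "boring and flat singing"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=3)):
    model = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(model, reviews, labels, cv=4)   # 4-fold cross-validation
    print(type(clf).__name__, round(scores.mean(), 3))
```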

15.
The fast-paced growth of artificial intelligence applications provides unparalleled opportunities to improve the efficiency of various systems. The transportation sector, for example, faces many obstacles as different vehicular and environmental factors are implemented and integrated worldwide; traffic congestion is among the major issues in this regard and demands serious attention due to the rapid growth in the number of vehicles on the road. To address this problem, this article proposes a cloud-based intelligent road traffic congestion prediction model empowered with a hybrid neuro-fuzzy approach. The aim of the study is to reduce the delays that vehicles experience in queues at different road junctions across a city. The proposed model is also intended to help automated traffic control systems by minimizing congestion, particularly in a smart-city environment where observational data are obtained from Internet of Things (IoT) sensors implanted across the roads. After preprocessing on the cloud server, the proposed approach feeds these data to the neuro-fuzzy engine, achieving a high level of accuracy in intelligent decision making with a minimal error rate. Simulation results show an accuracy of 98.72% during the validation phase, in contrast to the highest accuracies achieved by state-of-the-art techniques in the literature, namely 90.6%, 95.84%, 97.56%, and 98.03%. In the training phase, the proposed scheme exhibits 99.214% accuracy. The proposed prediction model is a potential contribution towards smart-city environments.
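
A toy fuzzy-inference step in the spirit of such a congestion predictor is sketched below; the membership shapes, rule set, and thresholds are invented for illustration and do not reproduce the authors' trained hybrid neuro-fuzzy engine.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function centered at b with support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def congestion_score(vehicle_count, avg_speed_kmh):
    """Fire two illustrative rules and combine them by a weighted average (0 = free, 1 = jammed)."""
    heavy_traffic = tri(vehicle_count, 40, 80, 120)     # membership in "heavy traffic"
    slow_speed    = tri(avg_speed_kmh, 0, 10, 30)       # membership in "slow speed"
    light_traffic = tri(vehicle_count, 0, 20, 60)
    # Rule 1: heavy traffic AND slow speed -> congested; Rule 2: light traffic -> free flow.
    w1, w2 = min(heavy_traffic, slow_speed), light_traffic
    return (w1 * 1.0 + w2 * 0.0) / (w1 + w2 + 1e-9)

print(round(congestion_score(vehicle_count=95, avg_speed_kmh=8), 2))   # close to 1 -> congested
print(round(congestion_score(vehicle_count=15, avg_speed_kmh=60), 2))  # close to 0 -> free flow
```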

16.
In this paper, we provide a new approach to data encryption using generalized inverses. Encryption is based on computing the weighted Moore–Penrose inverse A†_MN of an n×m matrix, applied here to an n×8 constant matrix; the square 8×8 Hermitian positive-definite matrix N is the key. The proposed solution represents a very strong key, since the number of different positive-definite matrices of order 8 is huge. We performed NIST (National Institute of Standards and Technology) quality-assurance tests for a randomly generated Hermitian matrix (a total of 10 different tests plus additional analysis with approximate entropy and random excursions), and from this additional testing we conclude that the results satisfy the defined strict requirements. The proposed MP encryption method can be applied effectively to the encryption and decryption of images in multi-party communications. In the experimental part of this paper, we compare encryption methods using machine learning: the algorithms are compared by their class-level classification results, and a comparative analysis of classification results is given for the Advanced Encryption Standard (AES) algorithm and the proposed encryption method based on the Moore–Penrose inverse.
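
A small numerical sketch of the weighted Moore–Penrose inverse itself is shown below, using the standard identity A†_{M,N} = N^{-1/2} (M^{1/2} A N^{-1/2})† M^{1/2}; the matrix sizes and the way the key matrices are generated are illustrative assumptions, and the paper's full image encryption/decryption protocol is not reproduced.

```python
import numpy as np
from scipy.linalg import sqrtm

def weighted_pinv(A, M, N):
    """Weighted Moore–Penrose inverse of A with HPD weight matrices M (rows) and N (columns)."""
    M_half, N_half = sqrtm(M), sqrtm(N)
    core = np.linalg.pinv(M_half @ A @ np.linalg.inv(N_half))
    return np.linalg.inv(N_half) @ core @ M_half

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(5, 8)).astype(float)   # stand-in for an n x 8 data block

# Hypothetical HPD key matrices built as B @ B.T + I to guarantee positive definiteness.
B1, B2 = rng.standard_normal((5, 5)), rng.standard_normal((8, 8))
M, N = B1 @ B1.T + np.eye(5), B2 @ B2.T + np.eye(8)

A_dag = weighted_pinv(A, M, N)
# Check one of the defining weighted Penrose conditions: A @ A_dag @ A == A.
print(np.allclose(A @ A_dag @ A, A))
```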

17.
A Genetic Algorithm-Based Method for Optimizing Natural-Language Parameter Thresholds
This paper proposes a genetic algorithm-based method for automatically optimizing the parameter thresholds of verb-verb collocation extraction. The method has three main advantages: (i) it is a data-driven machine learning method and thus, to a certain extent, avoids the human error inherent in setting parameter thresholds empirically; (ii) unlike empirical methods, which determine one parameter threshold at a time, it optimizes all parameter thresholds jointly; (iii) rather than supplying the same set of thresholds for all data, as empirical methods do, it can dynamically obtain thresholds suited to data of different sizes or from different domains. Comparative experiments show that the four thresholds obtained with this method clearly improve the F-score of verb-verb collocation extraction. The method is applicable not only to selecting thresholds for verb-verb collocation but also to other multi-parameter threshold-selection problems, such as rule-boundary optimization and parameter-threshold optimization for classification and clustering.
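
A minimal genetic-algorithm sketch of this kind of joint threshold search is given below; the four-dimensional threshold vector and the dummy fitness function (standing in for the collocation F-score) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
N_THRESHOLDS, POP_SIZE, GENERATIONS = 4, 30, 50

def fitness(thresholds):
    """Dummy stand-in for the collocation F-score obtained with a given threshold vector."""
    target = np.array([0.3, 0.6, 0.1, 0.8])            # pretend optimum, unknown in practice
    return 1.0 - np.mean(np.abs(thresholds - target))

population = rng.random((POP_SIZE, N_THRESHOLDS))       # thresholds in [0, 1]
for _ in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[-POP_SIZE // 2:]]       # keep the fitter half
    children = []
    while len(children) < POP_SIZE - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, N_THRESHOLDS)                          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(0, 0.05, N_THRESHOLDS) * (rng.random(N_THRESHOLDS) < 0.2)  # mutation
        children.append(np.clip(child, 0, 1))
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("best thresholds:", np.round(best, 3))
```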

18.
With the rapid development of the mobile internet and financial technology, online e-commerce transactions have been increasing and expanding very quickly, bringing a great deal of convenience and availability to daily life worldwide; at the same time, opportunities for fraud come in all shapes and sizes. Moreover, fraud detection in online e-commerce transactions is not quite the same as in traditional areas, because the massive amounts of data generated in e-commerce allow fraudulent transactions to be scattered among genuine ones more covertly than before. In this article, a novel, scalable, and comprehensive approach for fraud detection in online e-commerce transactions is proposed, consisting of four main logical modules, which uses big data analytics and machine learning algorithms to parallelize the processing of data from a Chinese e-commerce company. Groups of experimental results show that the approach detects fraud in online e-commerce transactions more accurately and efficiently, scales for big data processing, and achieves real-time performance.

19.
The metaverse has become one of the main technologies in many people's daily lives, with applications in areas such as education, tour systems, and mobile application services. In particular, the number of users of mobile metaverse applications is increasing owing to their accessibility everywhere. To provide an improved service, it is important to analyze the online reviews that capture user satisfaction. Several previous studies have explored user satisfaction with traditional methods, such as the structural equation model (SEM) and the technology acceptance model (TAM), using limited survey data; these methods may not be appropriate for analyzing the users of mobile applications. To overcome this limitation, several researchers perform user-experience analysis through online reviews and star ratings. However, some online reviews have inconsistencies between the star rating and the sentiment of the text, and this variation degrades the performance of machine learning. To alleviate these inconsistencies, the Valence Aware Dictionary and sEntiment Reasoner (VADER), a lexicon-based sentiment classifier, is introduced. The current study aims to build a more accurate sentiment classifier based on machine learning with VADER. Five classifiers are used, Naïve Bayes, K-Nearest Neighbors (KNN), Logistic Regression, Light Gradient Boosting Machine (LightGBM), and the Categorical Boosting algorithm (CatBoost), with three embedding methods (Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Word2Vec). The results show that classifiers that apply VADER outperform those that do not, with the exception of one classifier (Logistic Regression with Word2Vec), and LightGBM with TF-IDF achieves the highest accuracy of 88.68% among the models.
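
A small sketch of the VADER consistency check, flagging reviews whose star-rating-derived label disagrees with the lexicon sentiment of the text, is shown below; it uses the vaderSentiment package, and the sample reviews, star ratings, and compound-score threshold are illustrative assumptions.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Invented reviews paired with their star ratings (1-5); 4+ stars treated as positive.
reviews = [
    ("The avatars glitch constantly, very frustrating app", 5),   # star/text mismatch
    ("Smooth experience and lovely virtual campus", 5),
    ("Waste of storage, keeps crashing", 1),
]

for text, stars in reviews:
    star_label = "positive" if stars >= 4 else "negative"
    compound = analyzer.polarity_scores(text)["compound"]          # in [-1, 1]
    vader_label = "positive" if compound >= 0.05 else "negative"
    status = "consistent" if vader_label == star_label else "relabelled by VADER"
    print(f"{stars} stars -> {vader_label:>8} ({status}), compound={compound:+.2f}")
```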
