Similar Documents
5 similar documents found.
1.
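A Genetic-Algorithm-Based Method for Optimizing Natural Language Parameter Thresholds   Total citations: 1, self-citations: 0, citations by others: 1
A genetic-algorithm-based method is proposed for automatically optimizing the parameter thresholds used in verb-verb collocation extraction. The method has three main advantages: (1) it is a data-driven machine learning method, which to some extent avoids the human error inherent in setting parameter thresholds empirically; (2) whereas empirical methods determine one parameter threshold at a time, this method optimizes multiple parameter thresholds jointly; (3) unlike empirical methods, which supply the same set of thresholds for all data, this method dynamically obtains thresholds suited to data of different sizes or from different domains. Comparative experiments show that the four thresholds obtained with this method markedly improve the F-value of verb-verb collocation extraction. The method applies not only to selecting thresholds for verb-verb collocation parameters but also to other multi-parameter threshold selection problems, such as rule-boundary optimization and threshold optimization for classification and clustering parameters.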
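
The abstract does not describe how the genetic algorithm encodes or evolves the thresholds. As a rough illustration only, the sketch below jointly searches four thresholds to maximize F1, assuming that a candidate verb-verb pair is accepted only when each of four hypothetical association statistics meets its threshold. The selection, crossover, and mutation operators, the population settings, and the synthetic data are all assumptions, not the authors' method.

```python
import random

def f1_score(tp, fp, fn):
    """F1 from raw counts; 0.0 when no correct pair is accepted."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate(thresholds, candidates):
    """Assumed rule: accept a pair only if every statistic meets its threshold."""
    tp = fp = fn = 0
    for stats, is_gold in candidates:
        accepted = all(s >= t for s, t in zip(stats, thresholds))
        if accepted and is_gold:
            tp += 1
        elif accepted:
            fp += 1
        elif is_gold:
            fn += 1
    return f1_score(tp, fp, fn)

def genetic_search(candidates, n_params=4, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Jointly optimize all thresholds in [0, 1] with a simple elitist GA (hypothetical settings)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        ranked = sorted(population, key=lambda ind: evaluate(ind, candidates), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda ind: evaluate(ind, candidates))
    return best, evaluate(best, candidates)

# Synthetic data: four hypothetical association scores per verb-verb pair.
data_rng = random.Random(42)
candidates = []
for _ in range(200):
    is_gold = data_rng.random() < 0.3
    centre = 0.6 if is_gold else 0.3
    stats = tuple(min(1.0, max(0.0, centre + data_rng.gauss(0.0, 0.15))) for _ in range(4))
    candidates.append((stats, is_gold))

best, best_f1 = genetic_search(candidates)
print([round(t, 3) for t in best], round(best_f1, 3))
```

Because the better half of each generation is carried over unchanged, the best F1 found never decreases from one generation to the next, and all four thresholds are tuned together rather than one at a time.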

2.
Natural language semantic construction improves a machine's ability to comprehend and analyze natural language. It is the basis for information exchange in intelligent cloud-computing environments. This paper proposes a natural language semantic construction method based on a cloud database, consisting of two main parts: natural language cloud database construction and natural language semantic construction. The natural language cloud database is built on the CloudStack cloud-computing environment and is composed of a corpus, a thesaurus, a word vector library, and an ontology knowledge base. In this part, we concentrate on corpus preprocessing and the representation of the background knowledge ontology, and then propose an algorithm for detecting duplicated web pages based on TF-IDF and word vector distance (TWDW), which improves the efficiency of recognizing repeated web pages. The natural language semantic construction part mainly introduces the dynamic process of semantic construction and proposes a mapping algorithm based on semantic similarity (MBSS), which bridges Predicate-Argument (PA) structures and the background knowledge ontology. Experiments show that both proposed algorithms achieve significantly higher precision and recall than related algorithms. This work improves the understanding of natural language semantics and provides effective data support for the natural language interaction functions of cloud services.
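
The abstract names TWDW but does not say how TF-IDF and word vector distance are combined. The sketch below assumes a simple linear blend: cosine similarity of TF-IDF vectors weighted by a hypothetical `alpha`, plus cosine similarity of averaged word vectors, with a page pair flagged as duplicated when the blend reaches a hypothetical `threshold`. Tokenization is plain whitespace splitting, and the toy embedding table stands in for a real word vector library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term frequency x smoothed IDF) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors, tokenized

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_embedding(tokens, embeddings):
    """Average word vector of the tokens that have an embedding, as an index->value dict."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return {}
    return {i: sum(vec[i] for vec in known) / len(known) for i in range(len(known[0]))}

def find_duplicates(docs, embeddings, alpha=0.5, threshold=0.8):
    """Hypothetical TWDW-style check: blend TF-IDF and word-vector cosine similarity."""
    vectors, tokenized = tfidf_vectors(docs)
    means = [mean_embedding(toks, embeddings) for toks in tokenized]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = (alpha * cosine(vectors[i], vectors[j])
                     + (1 - alpha) * cosine(means[i], means[j]))
            if score >= threshold:
                pairs.append((i, j))
    return pairs

pages = [
    "the cloud database stores the corpus",
    "the cloud database keeps the corpus",
    "ontology maps predicate argument structures",
]
toy_embeddings = {  # hypothetical 3-d word vectors
    "cloud": [1.0, 0.0, 0.0], "database": [0.9, 0.1, 0.0], "corpus": [0.8, 0.2, 0.0],
    "stores": [0.5, 0.5, 0.0], "keeps": [0.45, 0.55, 0.0],
    "ontology": [0.0, 0.0, 1.0], "predicate": [0.0, 0.1, 0.9],
}
print(find_duplicates(pages, toy_embeddings))  # → [(0, 1)]
```

The word-vector term lets near-synonyms such as "stores" and "keeps" count toward similarity even though TF-IDF treats them as unrelated terms.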

3.
The internet, particularly online social networking platforms, has revolutionized the way extremist groups influence and radicalize individuals. Recent research reveals that the process begins by exposing vast audiences to extremist content and then migrating potential victims to closed platforms for intensive radicalization. Consequently, social networks have evolved into a persuasive tool for extremism, serving as a recruitment platform and an instrument of psychological warfare. Recognizing potentially radical text or material is therefore vital to restricting the circulation of extremist narratives. The aim of this research is to identify radical text in social media. Our contributions are as follows: (i) a new dataset for radicalization detection; (ii) an in-depth analysis of the new and previous datasets to identify variation in extremist groups' narratives; (iii) an approach to training classifiers that uses religious features along with radical features to detect radicalization; and (iv) an analysis of the use of violent and bad words in radical, neutral, and random groups using violence, terrorism, and bad-word dictionaries. Our results clearly indicate that incorporating religious text in model training improves the accuracy, precision, recall, and F1-score of the classifiers. Second, we observed variation in the extremist narrative, implying that using the new dataset can substantially affect classifier performance. In addition, violent and bad words differentiate radical users from random users, but the neutral (anti-ISIS) group needs further investigation.
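
The dictionary-based part of the analysis can be pictured as measuring how often each lexicon's words occur in a post and averaging those rates per user group. The sketch below is an illustration under assumptions: the lexicons are tiny hypothetical placeholders rather than the violence, terrorism, and bad-word dictionaries the authors used, and `feature_vector` is a hypothetical helper that concatenates per-lexicon rates (for example, radical followed by religious) as one might feed them to a classifier.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")

def lexicon_rates(text, lexicons):
    """Share of tokens in `text` that belong to each named word list."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty posts
    return {name: sum(counts[w] for w in words) / total for name, words in lexicons.items()}

def group_profile(posts_by_group, lexicons):
    """Mean per-lexicon rate for each user group (e.g. radical, neutral, random)."""
    profile = {}
    for group, posts in posts_by_group.items():
        rows = [lexicon_rates(p, lexicons) for p in posts]
        profile[group] = {name: sum(r[name] for r in rows) / len(rows) for name in lexicons}
    return profile

def feature_vector(text, lexicons, order):
    """Hypothetical helper: concatenate lexicon rates in a fixed order."""
    rates = lexicon_rates(text, lexicons)
    return [rates[name] for name in order]

lexicons = {  # hypothetical placeholder dictionaries
    "violent": {"attack", "destroy", "kill"},
    "bad": {"damn"},
    "religious": {"faith", "prayer"},
}
posts = {
    "radical": ["they will attack and destroy", "we will destroy them"],
    "random": ["nice weather for a walk", "damn this traffic"],
}
print(group_profile(posts, lexicons))
```

Normalizing by post length keeps long and short posts comparable when the per-group averages are compared.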

4.
Data privacy laws require service providers to inform their customers of how user data is gathered, used, protected, and shared. The General Data Protection Regulation (GDPR) is a legal framework that provides guidelines for collecting and processing personal information from individuals. Service providers use privacy policies to outline the ways an organization captures, retains, analyzes, and shares customers' data with other parties. These policies are complex and written in legal jargon; therefore, users rarely read them before accepting them. A number of approaches exist for automating the task of summarizing privacy policies and assigning risk levels. Most existing approaches are not GDPR-compliant and rely on manual annotation/labeling of the privacy text to assign risk levels, which is time-consuming and costly. We present a framework that helps users see not only whether a policy's data practices comply with the GDPR but also the privacy risk levels associated with accepting that policy. The main contribution of our approach is eliminating the overhead cost of manual annotation: the most frequent words in each category are used to create word-bags, which are combined with regular expressions and Pointwise Mutual Information (PMI) scores to assign risk levels that comply with the GDPR guidelines for data protection. We have also developed a web-based application that graphically displays risk level reports for any given online privacy policy. Results show that our approach is not only consistent with the GDPR but also outperforms existing approaches, assigning data practice categories with 79% accuracy and then risk levels with 95.1% accuracy.
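
The abstract describes word-bags built from each category's most frequent words, matched with regular expressions and weighted by PMI, without giving exact formulas. The sketch below assumes segment-level PMI, PMI(w, c) = log2(N * n(w, c) / (n(w) * n(c))), and scores a policy sentence by summing the PMI of the word-bag terms it matches in each category. The category names, stopword list, and toy segments are hypothetical, and the mapping from categories to GDPR risk levels is omitted because the abstract does not specify it.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z]+")

def build_word_bags(segments_by_category, k=3, stopwords=frozenset()):
    """Word-bag per category: its k most frequent non-stopword words."""
    bags = {}
    for category, segments in segments_by_category.items():
        counts = Counter(w for s in segments for w in WORD_RE.findall(s.lower())
                         if w not in stopwords)
        bags[category] = [w for w, _ in counts.most_common(k)]
    return bags

def pmi_table(segments_by_category):
    """PMI(w, c) = log2(N * n(w, c) / (n(w) * n(c))), counted over segments."""
    n_total = 0
    n_word, n_cat, n_joint = Counter(), Counter(), Counter()
    for category, segments in segments_by_category.items():
        for s in segments:
            n_total += 1
            n_cat[category] += 1
            for w in set(WORD_RE.findall(s.lower())):
                n_word[w] += 1
                n_joint[(w, category)] += 1
    return {(w, c): math.log2(n_total * j / (n_word[w] * n_cat[c]))
            for (w, c), j in n_joint.items()}

def classify_segment(text, bags, pmi):
    """Regex-match each word-bag against the text and sum the PMI of matched words."""
    scores = {}
    for category, words in bags.items():
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
                             re.IGNORECASE)
        matched = {m.lower() for m in pattern.findall(text)}
        scores[category] = sum(pmi.get((w, category), 0.0) for w in matched)
    return max(scores, key=scores.get), scores

segments = {  # hypothetical toy policy segments per data practice category
    "sharing": ["we share your data with third parties",
                "partners may receive your data",
                "we share information with advertisers"],
    "retention": ["we retain your data for two years",
                  "data is stored until you delete your account",
                  "we keep logs for ninety days"],
}
stop = {"we", "your", "with", "for", "is", "you", "may", "until", "the"}
bags = build_word_bags(segments, k=3, stopwords=stop)
pmi = pmi_table(segments)
label, scores = classify_segment("We may share your data with a third-party advertiser", bags, pmi)
print(label)  # → sharing
```

A word such as "data" that appears equally often in every category gets a PMI of zero, so it can sit in a word-bag without biasing the decision.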

5.
Social networking services (SNSs) provide massive amounts of data that can be a very influential source of information during pandemic outbreaks. This study shows that social media analysis can serve as a crisis detector (e.g., by capturing the sentiment of social media users regarding various pandemic outbreaks). The novel Coronavirus Disease-19 (COVID-19), commonly known as coronavirus, affected everyone worldwide in 2020. Streaming Twitter data have revealed the status of the COVID-19 outbreak in the most affected regions. This study focuses on identifying COVID-19 patients from Twitter messages (tweets), without requiring medical records, in order to detect the COVID-19 pandemic in tweets. For this purpose, we propose an intelligent model that combines term frequency-inverse document frequency (TF-IDF) features with traditional machine learning approaches, such as support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and decision tree (DT), to detect the COVID-19 pandemic in Twitter messages. The proposed model classifies Twitter messages into four categories, namely confirmed, deaths, recovered, and suspected. For the experimental analysis, tweet data on the COVID-19 pandemic are used to evaluate the traditional machine learning approaches. A benchmark dataset of COVID-19 Twitter messages is developed and can be used in future research. Experiments with SVM, NB, LR, RF, and DT using TF-IDF feature extraction show promising results for detecting the COVID-19 pandemic in Twitter messages, with overall accuracy, precision, recall, and F1 scores between 70% and 80%; confusion matrices for these approaches are also reported.
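
The abstract pairs TF-IDF features with SVM, LR, NB, RF, and DT. To stay dependency-free, the sketch below swaps in a simpler classifier: TF-IDF vectors with a nearest-centroid (Rocchio) decision rule, using the smoothed IDF, ln((1 + n) / (1 + df)) + 1, that scikit-learn's `TfidfVectorizer` applies by default. The four labels come from the abstract; the training tweets are invented toy examples, and `TfidfCentroidClassifier` is a hypothetical name.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfCentroidClassifier:
    """TF-IDF features + nearest-centroid (Rocchio) rule; a stand-in for SVM/LR/NB/RF/DT."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        n = len(docs)
        df = Counter(w for d in docs for w in set(d))
        self.idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
        sums, sizes = defaultdict(Counter), Counter(labels)
        for d, y in zip(docs, labels):
            sums[y].update(self._vector(d))  # accumulate normalized TF-IDF vectors per class
        self.centroids = {y: {w: v / sizes[y] for w, v in s.items()} for y, s in sums.items()}
        return self

    def _vector(self, tokens):
        """L2-normalized TF-IDF vector; words unseen in training are ignored."""
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {w: c * self.idf[w] for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {w: v / norm for w, v in vec.items()} if norm else {}

    def predict(self, text):
        """Return the label whose centroid has the highest cosine similarity to the text."""
        vec = self._vector(tokenize(text))

        def similarity(centroid):
            dot = sum(v * centroid.get(w, 0.0) for w, v in vec.items())
            norm = math.sqrt(sum(x * x for x in centroid.values()))
            return dot / norm if norm else 0.0

        return max(self.centroids, key=lambda y: similarity(self.centroids[y]))

train = [  # invented toy tweets
    ("tested positive for covid today", "confirmed"),
    ("new positive cases confirmed in the city", "confirmed"),
    ("sadly he died of covid", "deaths"),
    ("death toll rises as patients died", "deaths"),
    ("she recovered and was discharged", "recovered"),
    ("patients recovered after treatment", "recovered"),
    ("i have fever and cough maybe covid", "suspected"),
    ("awaiting test results with symptoms", "suspected"),
]
clf = TfidfCentroidClassifier().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("my uncle tested positive"))  # → confirmed
```

With scikit-learn installed, `TfidfVectorizer` followed by `LinearSVC`, `LogisticRegression`, `MultinomialNB`, `RandomForestClassifier`, or `DecisionTreeClassifier` covers the five model families named in the abstract.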


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417