Found 20 similar documents (search time: 109 ms)
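Recognizing hot new words in massive corpora is a fundamental task in automatic Chinese language processing. It is difficult because large-scale corpora must be processed quickly and because new word detection calls for more intelligent judgment. This work builds a framework for recognizing hot new words on the Web from massive corpora. The framework integrates three proposed algorithms: repeated-pattern extraction based on a layer-by-layer pruning algorithm, new word detection based on a statistical learning model, and part-of-speech guessing for new words based on combined features. Together they aim to improve both processing capacity and recognition quality. Experiments and data analysis show that the framework extracts repeated patterns from large-scale corpora efficiently and reliably, builds a candidate new word set, and performs new word detection and new word attribute recognition effectively, with results on par with the better current systems.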

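Chinese word segmentation systems trained on annotated news corpora degrade noticeably when applied to other domains. Because large annotated corpora for a target domain are hard to obtain, this paper proposes a domain adaptation method that combines an active learning algorithm with n-gram statistical features. The method statistically analyzes the differences between target-domain text and the existing annotated corpus, selects for priority manual annotation a small corpus containing the most linguistic phenomena not yet annotated, and then trains a target-domain segmenter with the help of n-gram statistical features drawn from large-scale text. The paper uses a CRF model for training and validates the method on 1 million sentences from the scientific literature domain; the evaluation data are 300 manually annotated sentences of scientific literature. The results show that, on the scientific literature test corpus, the segmenter trained with active learning improves on every evaluation metric.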

Chen Fei, Liu Yiqun, Wei Chao, Zhang Yunliang, Zhang Min, Ma Shaoping. Journal of Software (《软件学报》), 2013, 24(5): 1051-1060
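Open-domain new word discovery is important for improving the performance of Chinese natural language processing. Exploiting the ability of conditional random fields (CRF) to label sequence input, the paper recasts new word discovery as predicting whether the boundary of an already segmented word is a new word boundary. Based on analysis and mining of a massive Chinese Internet corpus, it proposes a series of statistical features that distinguish new word boundaries and combines them with a CRF to build an open-domain new word discovery algorithm. It also compares how three discretization methods (K-Means clustering, equal frequency, and information gain) affect the discovery results. New word discovery experiments on the large-scale SogouT Chinese corpus confirm that the proposed method performs well.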
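Of the three discretization methods compared above, equal-frequency binning is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function names, the sample scores, and the `BOUNDARY_SCORE_BIN` feature label are all hypothetical, and the abstract does not say how many bins were used.

```python
import bisect

def equal_frequency_edges(values, n_bins):
    """Pick cut points so each bin receives roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[k * len(ordered) // n_bins] for k in range(1, n_bins)]

def to_bin(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect.bisect_right(edges, value)

# Hypothetical continuous boundary scores for eight word boundaries.
scores = [0.1, 0.8, 0.3, 0.5, 0.2, 0.7, 0.4, 0.6]
edges = equal_frequency_edges(scores, 4)

# A CRF toolkit typically expects categorical features, so each score
# is turned into a string label naming its bin.
labels = [f"BOUNDARY_SCORE_BIN={to_bin(s, edges)}" for s in scores]
```

Each continuous statistic would be binned this way before being passed to the CRF as a discrete feature; K-Means or information-gain binning would replace only `equal_frequency_edges`.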

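New word discovery, a basic task in natural language processing, is an indispensable step in the computational study of classical Chinese literature. This paper proposes a new word recognition method for classical Chinese corpora, called the AP-LSTM-CRF algorithm, which has three steps. First, an improved, parallelized Apriori algorithm implemented on the Apache Spark distributed computing framework efficiently generates a candidate word set from a large raw corpus. Second, a segmentation probability model combining a recurrent neural network with a conditional random field segments the sentences of the test documents and produces a sequence of segmentation probabilities. Third, filtering rules that incorporate the segmentation probabilities remove noise words from the candidate set, leaving the genuine new words. Experiments show that the method discovers new words effectively in large classical Chinese corpora: on the Song Ci (Song lyric poetry) and History of Song datasets the F1 scores reach 89.68% and 81.13%, improvements of 8.66% and 2.21% over existing methods.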

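With the rapid growth of the Internet, the volume of data to be processed keeps increasing, and traditional single-machine text clustering algorithms can no longer meet the demands of massive data processing in Internet data mining. To address the inability of the traditional LDA algorithm to analyze large corpora on a single machine, the paper proposes a method for building a parallelized LDA topic model based on the MapReduce computing framework and Gibbs sampling. It studies a parallel implementation of the LDA topic model on the distributed MapReduce framework and examines the computational performance of the resulting program. Experiments comparing Hadoop parallel computation with single-machine computation show that the method considerably speeds up the algorithm on large corpora and that the speedup ratio also behaves well as the number of cluster nodes grows. A parallel implementation of LDA on the Hadoop platform is therefore feasible and solves the problem that a single machine cannot analyze the latent topic information in a large corpus.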

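Extracting new words from Web text is an important problem in Web information processing, with applications in information retrieval, text mining, lexicography, and Chinese word segmentation. This paper proposes a fast, language-independent new word extraction algorithm. It first encodes multilingual text uniformly to suit the suffix tree data structure, then uses an improved statistical method on a pair of suffix trees to count repeated strings and their adjacent categories in linear time and to compute an integrity score for each string, while pruning sharply reduces the amount of computation. The algorithm extracts and ranks new words well on Chinese and English corpora.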

A new word recognition method based on large-scale corpora   Total citations: 1, self-citations: 0, citations by others: 1
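The paper proposes a new word recognition method based on large-scale corpora. Starting from repeated-string statistics, it analyzes both the external context and the internal composition of each string, checking in turn the variety of adjacent contexts, the positional word-formation probability of the first and last characters, and the coupling degree of two-character pairs, and filters on each of these linguistic features to obtain new words. Experiments on corpora of different sizes show that the method is feasible and effective and can be applied to dictionary compilation, term extraction, and similar tasks.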

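Word segmentation is one of the most important steps in Chinese natural language processing. It underlies keyword extraction, automatic text summarization, and text clustering, and the quality of segmentation directly affects the accuracy of later text processing. With the recent rise of open public-opinion platforms such as microblogs, live-streaming platforms, and WeChat Moments, large volumes of non-standard text, and especially the steady stream of new words, pose a major challenge to segmentation accuracy, so new word discovery has become a problem that segmentation algorithms must solve. New word discovery faces several difficulties: the total amount of data for new words is small, new words are used flexibly, and over-merging words tends to produce phrase chunks. To address them, this paper proposes a new word discovery algorithm that combines association confidence with Jieba segmentation. Starting from Jieba's initial segmentation, the algorithm computes the association confidence between each word and every word in its left and right adjacent word sets, merges wrongly split words into candidate new words, and splits at connective words to prevent several words from being joined into a phrase. Experiments on microblog posts show that, compared with other confidence-based segmentation methods, the proposed algorithm greatly improves the accuracy of new word discovery, especially for named entities and Internet slang, shortens new words while keeping them semantically complete, and can still recognize low-frequency new words when only a small test corpus is available.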
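The merging step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define association confidence, so the sketch assumes the association-rule form count(A B) / count(A), checked in both directions. The function names, the threshold, the minimum count, and the toy sentences are hypothetical, and the input is assumed to be already tokenized (for example by a segmenter such as Jieba); the connective-word splitting step is not shown.

```python
from collections import Counter

def adjacent_confidence(sentences):
    """Return {(a, b): (conf_forward, conf_backward, count)} for adjacent token pairs.

    conf_forward  = count(a b) / count(a)   (assumed definition)
    conf_backward = count(a b) / count(b)   (assumed definition)
    """
    unigram, bigram = Counter(), Counter()
    for tokens in sentences:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))
    return {
        (a, b): (n / unigram[a], n / unigram[b], n)
        for (a, b), n in bigram.items()
    }

def merge_candidates(sentences, threshold=0.8, min_count=2):
    """Propose merged strings for pairs that co-occur reliably in both directions."""
    merged = []
    for (a, b), (fwd, bwd, n) in adjacent_confidence(sentences).items():
        if n >= min_count and min(fwd, bwd) >= threshold:
            merged.append(a + b)
    return sorted(merged)

# Toy pre-tokenized input in which a segmenter has split "给力" into two tokens.
sentences = [
    ["给", "力", "的", "比赛"],
    ["真", "给", "力"],
    ["比赛", "的", "结果"],
]
candidates = merge_candidates(sentences)
```

Requiring high confidence in both directions is one simple way to avoid gluing a frequent function word onto its neighbors, which is the over-merging problem the abstract mentions.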

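New word discovery is important in natural language processing, and it is harder on microblog content than on general corpora. The paper proposes an iterative context entropy algorithm that incorporates word association information and obtains a list of new word candidates from context relations for filtering. To improve precision further, it introduces lexical features from natural language processing and proposes a filtering method that combines them with statistical features. Compared with existing methods, precision and recall both improve substantially, and the F-score rises to 89.6%.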
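Context entropy, on which the method above builds, measures how varied the characters immediately to the left and right of a candidate string are. The sketch below computes only the plain left and right entropy; the iterative procedure and the word association information described in the abstract are not reproduced, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())

def context_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of the characters adjacent to `candidate`."""
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(candidate)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(candidate, start + 1)
    return entropy(left), entropy(right)

# Toy corpus: "XY" appears after {a, c, a} and before {b, d, d}.
corpus = "aXYb cXYd aXYd"
left_h, right_h = context_entropy(corpus, "XY")
```

A string whose neighbors vary widely on both sides (high entropy) is more likely to be a free-standing word, whereas a near-zero entropy on one side suggests the string is a fragment of a longer word.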

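To extract high-frequency repeated patterns from large corpora quickly, the paper builds on an incremental n-gram model and uses a hash data structure to extract repeated strings. It proposes a layer-by-layer pruning algorithm, based on low-frequency characters and hierarchical pruning, that filters out low-frequency junk strings and reduces the number of I/O reads and writes. On top of this it applies an improved string sorting algorithm that sorts strings in O(n) time, which further improves extraction efficiency. Experiments show that the algorithm is an effective repeated-pattern extraction algorithm. Its number of I/O operations grows linearly with corpus size and is far smaller than that of methods that partition the corpus by first character. It extracts repeated patterns quickly and effectively from text corpora far larger than memory, which makes it particularly suitable for high-frequency repeated-pattern extraction on large corpora and a useful foundation for tasks built on repeated patterns, such as new word recognition and term extraction.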
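The idea of growing n-grams one layer at a time while discarding anything built on a low-frequency shorter string can be sketched as below. This is an in-memory, Apriori-style illustration and not the paper's algorithm: the disk-based hashing, the I/O reduction, and the O(n) string sort are not shown, and the function name, parameters, and sample text are hypothetical.

```python
from collections import Counter

def repeated_patterns(text, max_n, min_freq):
    """Return {pattern: count} for all n-grams (n <= max_n) seen at least min_freq times.

    An n-gram is counted at layer n only if both its (n-1)-gram prefix and
    suffix survived layer n-1, since a string cannot be more frequent than
    any of its substrings.
    """
    frequent = {}
    survivors = None  # frequent (n-1)-grams from the previous layer
    for n in range(1, max_n + 1):
        counts = Counter()  # hash table of n-gram counts for this layer
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            if survivors is not None and (
                gram[:-1] not in survivors or gram[1:] not in survivors
            ):
                continue  # pruned: contains a low-frequency substring
            counts[gram] += 1
        survivors = {g for g, c in counts.items() if c >= min_freq}
        if not survivors:
            break  # no longer pattern can be frequent
        for g in survivors:
            frequent[g] = counts[g]
    return frequent

patterns = repeated_patterns("abcabcabd", max_n=3, min_freq=2)
```

At layer 1 the rare character `d` is dropped, so `bd` and `abd` are never counted at later layers; this is the sense in which low-frequency characters prune the search.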

This study draws upon the theory of habit formation in consumption from macroeconomics to support the evidence on the existence of habit formation in social media consumption. Treating social media consumption as a form of digital good consumption and using aggregated weekly posts from the Facebook pages of a group of 12 politicians in the cabinet of Singapore, we verified through a non-separable recursive time model that social media consumption habits were developed among this group of politicians. This study further confirms the existence of reciprocity by validating habit formation in the social media consumption of citizens and followers of these politicians’ posts using time-aggregated data of ‘likes’, ‘shares’ and ‘comments’. Further, this study shows that the strength of habit formation in the social media consumption of politicians and that of citizens are positively correlated: the stronger the habit formation, the stronger the social capital reciprocity. Through these measurements, our analysis shows that political engagement in social media is a bi-directional habitual process, and that using a habit formation coefficient as a new parameter to measure ‘reciprocal engagement’ in social media provides a better understanding of the dynamic exchange between users of social media.

This study extends the uses and gratifications (U&G) theoretical perspective to account for the situated, adaptive, and dynamic nature of mediated cognition and behavior. It specifies dynamic uses and gratifications of social media (compared to other media) in the everyday lives of college students using experience sampling data across 4 weeks. The study tests and quantifies reciprocal causal relationships between needs, social media use, and gratifications, as well as their self-sustaining endogenous (i.e., feedback) effects. Social media use is significantly driven by all four categories of needs examined (emotional, cognitive, social, and habitual), but only gratifies some of them. Ungratified needs accumulate over time and drive subsequent social media use. Interpersonal social environments also affect social media use. In particular, solitude and interpersonal support increase social media use, and moderate the effects of needs on social media use.

Examining the particular value of each platform for big data is difficult because of the variety of social media forms and sizes. Social media can be used to analyze large groups of individuals both objectively and subjectively, which makes it the most effective tool for this task. There are numerous sources of big data within an organization. Social media can be identified by the interaction and communication it facilitates, and using it has become a daily occurrence in modern society. This frequent use generates data, which demonstrates the importance of researching the relationship between big data and social media, because so many internet users are also active on social media. We conducted a systematic literature review (SLR) to identify 42 articles published between 2018 and 2022 that examined the significance of big data in social media and upcoming issues in this field. We also discuss the potential benefits of utilizing big data in social media. Our analysis identified open problems and future challenges, such as high-quality data, information accessibility, speed, natural language processing (NLP), and enhancing prediction approaches. Our investigation of evaluation metrics for big data in social media shows the following distribution: 24% relate to data-trace, 12% to execution time, 21% to accuracy, 6% to cost, 10% to recall, 11% to precision, 11% to F1-score, and 5% to run-time complexity.

Media technologies, such as telephones, often challenge people who stammer. Other media, especially applications such as SMS and social networks, enable people who stammer to express themselves fluently. This study looks into the multifaceted meanings of the encounter between people who stammer and new media, focusing on applications which enable speech through writing, and on a Stammers forum website as a site for reflexive debate on the meaning of new media opportunities. The study focuses on questions such as anonymity, the “noise” of various media, and the ways in which new media help to improve the users' quality of life but at the same time might lead them to reduce their social life to an alternative “verbal ghetto”, confined to the borders of the new media platform.

Social media has emerged as a significant and effective means of assisting and endorsing activities and communications among peers, consumers and organizations that transcend the restrictions of time and space. While previous studies acknowledge the role of agents of culture change, they largely remain silent on the role of social media in influencing acculturation outcomes and consumption choices. This study uses a self-administered questionnaire to collect data from 514 Turkish-Dutch respondents and examines how their use of social media affects their acculturation and consumption choices. This research makes a significant contribution to consumer acculturation research by showing that social media is a vital means of culture change and a driver of acculturation strategies and consumption choices. This study is the first to investigate the role of social media as an agent of culture change in terms of how it impacts acculturation and consumption. The paper discusses implications for theory development and for practice.

Social media is increasingly being used as a communication bridge between government, emergency responders and managers, and the general public in extreme events. Passing information through social media channels enables individuals to send and receive content in real time and without limitation of location and geography. While the use of social media in extreme event situations has become prevalent, there is often little strategy involved in message dissemination and too little understanding of the effects that underlying online social networks have on message distribution. In this study, we introduce a formal model for social media message dissemination in social networks through time. Our proposed model places emphasis on single- and multiple-message scenarios and examines key communication characteristics in the development of more intentional and targeted social messaging strategies. We present a detailed experimental design on randomly generated networks and real-world sub-networks of the Twitter social graph and discuss our findings. We also include a Tabu Search procedure for solving the single-message problem and discuss its potential value for large-scale problems in real-world applications.

We investigated the use of social media networking among Pharmacy students of Kenyatta University, Nairobi, Kenya, to understand their use of social media platforms, the type of platform and purpose of use, and the time spent daily on networking. A questionnaire was used to collect the information. It was found that Pharmacy students used social media very well to communicate with real and virtual friends but not so much for academic improvement. The majority of the students used Facebook and Twitter for less than 30 min daily but spent more time on the WhatsApp and YouTube applications. In this study WhatsApp was the most popular platform among the students and was used mainly to communicate with real friends, unlike Facebook, which was used mainly to communicate with both real and virtual friends. The study showed a rational approach to the use of social networking by Pharmacy students, as most students carried out social networking during the weekend rather than on weekdays, which are laden with school activities.

The majority of parents use social media platforms, with young mothers being the most active users. Academic research has only recently started addressing the impact of social media on mothers, although they are one of the most engaged online audiences. Instagram and Facebook are perceived as positive types of social media, where users post positive content to elicit encouraging responses from their subscribers and thus enhance their self-esteem. This also applies to mothers who present themselves positively online and thereby enhance their parental self-esteem. This study provides an in-depth analysis of 23 popular online profiles of mothers with more than thirty thousand followers on Instagram and 12 interviews with socially active mothers. This work focuses on mothers in Russia. Research findings show that mothers with children of pre-school age are the most regular users of social media. This is due to time availability, as the majority of these mothers are on maternity leave, and to limited knowledge of child-related matters, which leads to lower self-esteem. They often look for reassurance in the online community. Mothers who are more confident have a positive attitude towards social media communication. Mothers with initially lower self-esteem feel under pressure to maintain a positive image in line with other mothers' presentation on social media. Mothers find Facebook a more informative and supportive vehicle of communication than Instagram.

Tags are very popular in social media (such as YouTube and Flickr) and provide valuable and crucial information for social media. At the same time, there exist a great number of noisy tags, which has led to many studies on tag suggestion and recommendation for items including websites, photos, books, movies, and so on. Textual features of tags, such as tag frequency, have mostly been used in extracting tags that are related to items. In this paper, we address the problem of tag recommendation for social media users. This issue is as important as tag recommendation for items, because the tags representing users are strongly related to the users’ favorite topics. In addition to textual features, we propose several novel tag features for machine learning that we call social features. Experimental results on Flickr show that our proposed scheme achieves viable performance on tag recommendation for users.

In an age where social media is seen as a new driving force and a vehicle with a significant impact on political transformation and change, this paper highlights some of the paradoxes and challenges it poses. Social media has become an important platform for the mobilization, organization and implementation of social movements around the world. However, Egypt's uprising was a function of people and passion, and not of any particular communication technology, social media tool or application. It was definitely not a Facebook, Twitter or social media revolution; it was a people's uprising that capitalized on state-of-the-art technology to realize a nation's dream of “bread, freedom and social justice.” Having said that, there is no doubt that social media boosted the people's desire for a better future, democracy and socioeconomic development, which had been put on hold for many decades by the consecutive regimes that have ruled Egypt since 1952. The role of social media was more that of a catalyst, a driver, and a communication tool that served as a platform for societal change. Yet the country is still in a state of flux, driven by the force unleashed through social media, which has sped up the process and the dissemination of information across different segments of society irrespective of social or economic background, location or age. Expectations are high, and aspirations reflect the desire of a nation to reach its full potential; it is going to take some time, but undoubtedly Egypt is on the right track. This paper demonstrates the clash of generations between older state power and younger citizens and the role social media played in the political transformation in the build-up to Egypt's uprising in January 2011 and beyond.
