Found 18 similar documents (search time: 390 ms)
1.
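Pan Xiafu (潘夏福), Computer Programming Skills & Maintenance (《电脑编程技巧与维护》), 2017, (4)
As the number of mobile phone users grows, spam SMS messages are increasingly rampant. Because traditional spam SMS filtering systems have a high misclassification rate, a cloud-computing-based classification algorithm is used to implement a semantic recognition system for spam SMS. The system uses a probabilistic classification algorithm for semantic recognition and takes a cloud-based corpus as the training set for the algorithm. Experiments show that the system achieves high recall and accuracy in spam SMS recognition, and its design offers a new approach to spam filtering.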
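The abstract does not say which probabilistic classifier the system uses. As a hedged illustration only, the sketch below assumes a multinomial naive Bayes model with Laplace smoothing; the function names `train_nb` and `classify_nb` and the toy corpus are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def train_nb(samples):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()   # number of messages per label
    word_counts = {}           # label -> Counter of token frequencies
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(model, tokens):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (assumption: whitespace-tokenized English stands in for a real corpus)
TOY = [
    ("win free prize now".split(), "spam"),
    ("free cash click now".split(), "spam"),
    ("see you at lunch".split(), "ham"),
    ("meeting moved to noon".split(), "ham"),
]
model = train_nb(TOY)
print(classify_nb(model, "free prize".split()))
```

A real system would train on a far larger corpus and, for Chinese text, would need a segmentation or n-gram step before counting tokens.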
2.
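A Spam SMS Filtering Algorithm Based on Complex Networks (total citations: 1, self-citations: 0, citations by others: 1)
Identifying and filtering users who send spam SMS has significant research value and social importance. As spam SMS with new forms and content emerges, traditional keyword matching and sending-rate/frequency filtering cannot handle the problem effectively. Based on a formal representation of the SMS sending/receiving network, and using real SMS sending and receiving data together with call-relationship data, the network properties of the SMS sending network are measured and analyzed. The abnormal sending and receiving patterns and behaviors of spam SMS users on the network are then analyzed and mined, and on this basis a filtering algorithm based on the degree of voice association and the SMS reply ratio (the NASFA algorithm) is proposed. Experiments and analysis show that the algorithm identifies spam SMS senders efficiently while keeping the rate at which normal users are misidentified as spam SMS users under control.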
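The abstract names the two signals (voice association and SMS reply ratio) but not how NASFA computes or combines them. The sketch below is therefore not NASFA itself: it assumes each signal is a simple fraction over a sender's recipients and that a sender is flagged when both fall below a threshold. The function names, the thresholds `max_reply` and `max_voice`, and the toy graph are all hypothetical.

```python
def sender_features(sender, sms_edges, call_pairs):
    """Return (reply_ratio, voice_association) for a sender, or None if it sent nothing.

    sms_edges:  set of directed (from, to) SMS pairs
    call_pairs: set of frozensets {a, b} that have at least one voice call
    """
    recipients = {dst for src, dst in sms_edges if src == sender}
    if not recipients:
        return None
    replied = sum((r, sender) in sms_edges for r in recipients)
    called = sum(frozenset((sender, r)) in call_pairs for r in recipients)
    return replied / len(recipients), called / len(recipients)

def is_suspect(sender, sms_edges, call_pairs, max_reply=0.1, max_voice=0.1):
    """Flag a sender whose recipients almost never reply and almost never call."""
    feats = sender_features(sender, sms_edges, call_pairs)
    if feats is None:
        return False
    reply_ratio, voice_assoc = feats
    return reply_ratio <= max_reply and voice_assoc <= max_voice

# Toy network: "S" broadcasts to 20 strangers; "A" chats with "B" and "C"
sms_edges = {("S", f"u{i}") for i in range(20)} | {("A", "B"), ("B", "A"), ("A", "C")}
call_pairs = {frozenset(("A", "B")), frozenset(("A", "C"))}
print(is_suspect("S", sms_edges, call_pairs), is_suspect("A", sms_edges, call_pairs))
```

Requiring both signals to be low is one way to limit false positives on normal users, which is the property the abstract emphasizes.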
3.
4.
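From a design-science perspective, the SmsRank algorithm, which measures the importance of SMS messages using reply-rate rules, is proposed and introduced into spam SMS filtering. Experiments in the R language verify the algorithm's effectiveness at filtering spam SMS, and a comparison with the classification results of the SVM algorithm shows that its precision is markedly better. Finally, an application model based on the short message service center is proposed using the algorithm.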
5.
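This work studies an improved naive Bayes text classification algorithm based on SVM and its application to spam SMS filtering. The naive Bayes algorithm suffers from its conditional independence assumption, its heavy dependence on the distribution of the sample space, and its inherent instability, which increase the algorithm's time complexity. To address these defects, an improved SVM-based naive Bayes solution for spam SMS filtering is proposed that combines the efficient classification of naive Bayes with the incremental learning of SVM and its independence from the sample space. The classification problem is first converted into a quadratic optimization problem using the structural risk minimization principle and a nonlinear transformation, and naive Bayes is then used to filter messages, improving classification accuracy and stability. Simulation results show that the algorithm quickly obtains the optimal subset of classification features and effectively improves both the accuracy and the speed of spam SMS filtering.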
6.
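Wang Jiangbo (王江波), Digital Community & Smart Home (《数字社区&智能家居》), 2011, (6)
Existing spam SMS detection and filtering systems can detect and filter out conventional spam SMS. However, some spam messages always go unrecognized by these monitoring and filtering systems; the numbers that send them are called black hole numbers. This paper examines the black-hole nature of some numbers in spam SMS and analyzes its causes and sending characteristics. Drawing on the sociological perspective of complex networks together with the properties and characteristics of black hole numbers, it proposes a method for identifying them.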
7.
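Research on Spam SMS Filtering Based on CAPTCHA and the Winnow Algorithm (total citations: 1, self-citations: 1, citations by others: 0)
To identify and filter the growing volume of spam SMS, a filtering method based on CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) and the Winnow algorithm is proposed. The CAPTCHA method distinguishes humans from computers according to whether the user can correctly recognize an image, and effectively filters bulk spam SMS sent by computers. The improved Winnow filter processes raw text directly, which saves Chinese word segmentation time, and applies the idea of composite classification to improve classification accuracy. Experimental results show that combining CAPTCHA with the improved Winnow algorithm filters spam SMS fairly accurately.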
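The abstract does not describe the paper's improvements to Winnow. As a rough sketch of the underlying idea only, the code below applies a Winnow-style multiplicative update to character bigrams, so no word segmentation is needed. Using bigrams, normalizing the score by the number of active features, and the constants `alpha` and `threshold` are assumptions made here for illustration; the class name `BigramWinnow` is hypothetical.

```python
def char_bigrams(text):
    """Distinct overlapping character bigrams of the raw text (no segmentation)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

class BigramWinnow:
    def __init__(self, alpha=2.0, threshold=1.0):
        self.alpha = alpha          # promotion/demotion factor
        self.threshold = threshold  # decision threshold on the mean weight
        self.w = {}                 # feature -> weight, implicitly 1.0

    def predict(self, text):
        feats = char_bigrams(text)
        if not feats:
            return False
        mean_w = sum(self.w.get(f, 1.0) for f in feats) / len(feats)
        return mean_w > self.threshold  # True means "spam"

    def update(self, text, is_spam):
        """Mistake-driven update: promote on a missed spam, demote on a false alarm."""
        if self.predict(text) == is_spam:
            return False  # no mistake, weights unchanged
        factor = self.alpha if is_spam else 1.0 / self.alpha
        for f in char_bigrams(text):
            self.w[f] = self.w.get(f, 1.0) * factor
        return True

TRAIN = [
    ("免费领取大奖", True),
    ("今晚一起吃饭", False),
    ("点击领取免费话费", True),
    ("明天开会别迟到", False),
]
clf = BigramWinnow()
for _ in range(3):  # a few passes over the toy data
    for text, label in TRAIN:
        clf.update(text, label)
print(clf.predict("免费大奖"), clf.predict("一起开会"))
```

Because updates happen only on mistakes and touch only the features present in the message, training cost stays low, which is part of why Winnow suits high-dimensional sparse text features.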
8.
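Research on a Spam SMS Filtering System for Mobile Environments (total citations: 6, self-citations: 0, citations by others: 6)
A distributed spam SMS filtering system is proposed. It is suited to mobile networks, is self-learning, detects spam sources promptly, and filters spam SMS effectively. Building on the traditional Bayesian filtering algorithm that uses words as attributes, it adds rule and length information and uses mutual information to reduce the number of word attributes. Experiments show that it has a small memory footprint and better performance in SMS filtering, making it suitable for use on mobile phones. A method for ranking likely spam SMS senders is also proposed.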
9.
10.
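In recent years, as spam SMS filtering technology has advanced, the characteristics of spam SMS have also changed; messages disguised with homophones can easily evade many filtering systems. To address this, and exploiting the fact that homophone disguise leaves the pinyin unchanged, a method is proposed that uses pinyin strings as the keywords for extracting spam SMS features. An ordinary vector and a disguised vector are extracted from each message and fed separately into two independent Bayesian filters, and the two results are then combined to decide whether the message is spam. Experimental results show that the method effectively identifies spam SMS disguised with homophones.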
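To make the pinyin idea concrete, the sketch below covers only the feature-extraction step: it maps characters to toneless pinyin and checks for pinyin keywords, so a homophone-disguised message yields the same pinyin feature as the undisguised one. The tiny lookup table `TOY_PINYIN`, the function names, and the keyword are hypothetical; a real system would need a full character-to-pinyin dictionary, and the two Bayesian filters described in the abstract are not implemented here.

```python
# Hypothetical, deliberately tiny character-to-pinyin table (tones dropped)
TOY_PINYIN = {
    "发": "fa", "法": "fa",
    "票": "piao", "漂": "piao",
    "代": "dai", "开": "kai",
}

def pinyin_string(text, table=TOY_PINYIN):
    """Space-separated pinyin; characters missing from the table are kept as-is."""
    return " ".join(table.get(ch, ch) for ch in text)

def extract_vectors(text, pinyin_keywords, table=TOY_PINYIN):
    """Return (ordinary features, disguised features) for one message.

    ordinary:  the set of characters in the message
    disguised: the pinyin keywords whose syllable sequence occurs in the message
    """
    ordinary = set(text)
    padded = " " + pinyin_string(text, table) + " "  # pad so matches align on syllables
    disguised = {kw for kw in pinyin_keywords if f" {kw} " in padded}
    return ordinary, disguised

keywords = {"fa piao"}  # pinyin of 发票 (invoice), a common spam topic
print(extract_vectors("代开发票", keywords))
print(extract_vectors("代开法漂", keywords))  # homophone disguise, same pinyin
```

The two feature sets can then be scored by separate classifiers and the results combined, which is the overall structure the abstract describes.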
11.
12.
The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution to the problem can not only improve people's mobile experience, but also help advance the analysis of the short texts that occur in current mobile applications, such as Webchat and microblogs. Because spam SMS messages are sparse, frequently transformed, and must be handled in real time, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method, which utilizes quasi-pattern matching results in the pattern matching process. The method can learn many interesting and new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision rate as high as 95.18% and a recall rate of 95.51%.
13.
The fast growth of mobile networks has greatly enriched our lives by disseminating information and providing communications at any time and anywhere. However, while people gather and exchange useful information, they also receive unwanted data and content, such as spam and unsolicited commercial advertisements. SMS (Short Message Service) spam is one typical example of unwanted content; it has caused a serious problem for mobile users by intruding on their devices, occupying device memory, and irritating the users. More critically, some of these fraudulent messages deceive users and cause them incalculable loss. SMS spam control has become a crucial issue that impacts the further success of mobile networks. A number of studies have been conducted on controlling unwanted content or traffic, some of them based on trust and reputation mechanisms, but the literature still lacks an effective solution for SMS spam control. In this paper, we present the design and implementation of an SMS spam control system named TruSMS based on trust management. It can control SMS spam from its source to its destinations according to trust evaluation by analyzing spam detection behaviors and SMS traffic data. We evaluate TruSMS performance under a variety of intrusions and attacks with a prototype system implementation. The results show that TruSMS is effective with regard to accuracy, efficiency, and robustness, which implies its trustworthiness.
14.
15.
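Through research on online text classification for filtering spam text streams, a new conditional-probability ensemble method is proposed. Texts are represented as token sequences, classification knowledge is stored in an index structure, and online training and online classification algorithms are designed and implemented for the classification model. Multiple text features are extracted from emails and mobile SMS messages, and spam filtering experiments are run on the TREC07P email corpus and on a real Chinese mobile SMS corpus. The results show that the proposed method achieves very good spam filtering performance.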
16.
Electronic mail has revolutionized traditional communication systems because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as spam. A major concern is developing suitable filters that can adequately capture those emails and achieve a high performance rate. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within the context of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernels, as they directly affect the separation of emails in the feature space. This paper presents a thorough investigation of several distance-based kernels and specifies spam filtering behaviors using SVM. The majority of kernels used in recent studies concern continuous data and neglect the structure of the text. In contrast to classical kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit the spam filtering problem. In addition, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail feature mapping variants in TC that yield improved performance for the standard SVM in the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that the active online method using string kernels achieves higher precision and recall rates.
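The abstract refers to "various string kernels" without naming them. As one well-known example, and not necessarily one the paper evaluates, the sketch below computes a p-spectrum kernel, which counts the contiguous length-p substrings two strings share. The function names and the default `p=3` are choices made here for illustration.

```python
import math
from collections import Counter

def spectrum(s, p):
    """Count every contiguous substring of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=3):
    """k(s, t) = sum over substrings u of count_s(u) * count_t(u)."""
    a, b = spectrum(s, p), spectrum(t, p)
    return sum(c * b[u] for u, c in a.items())

def normalized_kernel(s, t, p=3):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either string is shorter than p."""
    denom = math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0

print(spectrum_kernel("abcabc", "abc"))
print(normalized_kernel("cheap meds online", "cheap m3ds online"))
```

Because the kernel is an inner product of substring-count vectors, a Gram matrix built from it can be passed to any SVM implementation that accepts precomputed kernels. Working on raw character sequences also makes it tolerant of small obfuscations such as a single substituted character.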
17.
18.
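To address the current lack of sample libraries in Chinese SMS filtering research, a method for generating a client-side sample feature library is proposed. A client-side sample feature database for SMS filtering is designed; messages received by the client are preprocessed and segmented into Chinese words; and, taking into account high-information low-frequency words and feature words with strong class characteristics, an improved mutual information evaluation function is used to extract sample features and form the feature data. The Naive Bayes algorithm is used to test how the number of features affects filter performance. Experimental results show that test accuracy peaks when the number of features is 10, and that when the sample feature library contains 2,000 messages the database file is about 714.28 KB, which can run on an ordinary mobile phone platform, verifying the feasibility of the feature library generation method.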
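The abstract does not give the improved evaluation function. For reference, the sketch below implements only the standard (unimproved) mutual information score used in text feature selection, computed from document frequencies, followed by top-k selection. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, target="spam"):
    """Standard MI(t, c) = log( P(t, c) / (P(t) * P(c)) ) from document frequencies.

    docs: list of (set_of_terms, label). Terms that never occur in the target
    class are skipped, because their score would be log(0).
    """
    n = len(docs)
    n_c = sum(1 for _, y in docs if y == target)
    df = Counter()    # documents containing the term
    df_c = Counter()  # target-class documents containing the term
    for terms, y in docs:
        df.update(set(terms))
        if y == target:
            df_c.update(set(terms))
    return {
        t: math.log(df_c[t] * n / (df[t] * n_c))
        for t in df if df_c[t] > 0
    }

def top_k(scores, k):
    """The k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

DOCS = [
    ({"free", "prize", "now"}, "spam"),
    ({"free", "cash", "now"}, "spam"),
    ({"lunch", "now"}, "ham"),
    ({"meeting", "noon"}, "ham"),
]
scores = mutual_information(DOCS)
print(top_k(scores, 3))
```

In this toy data, "prize" appears in one document and "free" in two, yet both receive the same score. That insensitivity to term frequency is a known property of the standard function, and it gives a sense of why an evaluation function might be adjusted when low-frequency and class-specific words are to be weighed differently.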