Similar Documents
19 similar documents found (search time: 140 ms)
1.
Targeting unhealthy content on the Internet, this paper analyzes the form and frequency of characteristic word usage in such texts and proposes a special automatic recognition algorithm based on symbol-density computation. First, an initial special-word list is obtained from statistics over the training texts and serves as the basis for recognition. During text classification, an automatic special-word recognition algorithm with two rounds of filtering dynamically updates the special-word list and its weights, so that special-word information is combined with a binary text classifier to improve recognition accuracy for unhealthy texts. Results show that adding automatic special-word recognition and judgment effectively improves the recognition accuracy for illicit texts.
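A minimal sketch of the symbol-density idea described above: score a text by the frequency-weighted density of known special words and fuse that score with a base classifier's probability. The word table, weights, and the linear fusion rule are illustrative assumptions, not the paper's exact design.

```python
from collections import Counter

def symbol_density(tokens, special_words):
    """Weighted count of special words divided by text length."""
    counts = Counter(tokens)
    hits = sum(counts[w] * weight for w, weight in special_words.items())
    return hits / max(len(tokens), 1)

def classify(tokens, special_words, base_prob, alpha=0.5, threshold=0.5):
    """Fuse a binary classifier's probability with the special-word density."""
    score = alpha * base_prob + (1 - alpha) * min(symbol_density(tokens, special_words), 1.0)
    return score >= threshold  # True = flagged as unhealthy text

# Hypothetical initial special-word table learned from training texts.
table = {"badword1": 1.0, "badword2": 0.8}
print(classify(["badword1", "ok", "ok"], table, base_prob=0.4))
```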

2.
顾苏杭  王士同 《控制与决策》2020,35(11):2653-2664
Since each class in a real-world dataset latently or markedly carries its own style information, a dual-knowledge-representation classification method that mines data style information is proposed. In the training stage, the K-nearest-neighbor (KNN) algorithm builds a social network expressing the organizational structure among data points, and the network's attributes are used to mine the style information of individual data points and of each class as a whole. In the classification stage, the dual knowledge representation constrains the classifier's behavior: a test sample is assigned the label of the class it most resembles both physically and in style under the learned model. Compared with other classifiers, the proposed method achieves at least competitive performance on datasets with no or insignificant style information, and superior performance on datasets with pronounced style.
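A sketch of the training-stage idea, under the assumption that the "social network" is a KNN connectivity graph and that a per-class style can be summarized by simple graph statistics (mean node degree here is my illustrative choice; the paper's network attributes are richer).

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def class_style(X, y, k=5):
    styles = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # KNN "social network" over one class's data points.
        G = kneighbors_graph(Xc, n_neighbors=min(k, len(Xc) - 1), mode="connectivity")
        degrees = np.asarray(G.sum(axis=1)).ravel()
        styles[label] = degrees.mean()  # one scalar style summary per class
    return styles

X = np.random.rand(60, 4)
y = np.repeat([0, 1, 2], 20)
print(class_style(X, y))
```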

3.
刘扬  刘杨  胡仕成  朱东杰 《计算机工程与设计》2007,28(23):5604-5606,5609
Detecting and controlling network traffic is a basic duty of network management, and restricting client access to illicit websites is an important task. By analyzing the principle of ARP spoofing, this paper uses ARP spoofing to monitor traffic on a local area network and analyze the contents of DNS query packets. Building on ARP spoofing combined with DNS spoofing, it studies redirecting illegal accesses, proposes the corresponding algorithm, reduces client access to illicit websites, and implements automatic network monitoring.
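A sketch of the DNS-inspection half of this approach: once LAN traffic is visible (the paper obtains it via ARP spoofing), parse DNS queries and flag lookups of blocked sites. Requires scapy and root privileges; the blocklist is an illustrative assumption, and the paper's redirection step (answering with a spoofed DNS reply) is only noted in a comment.

```python
from scapy.all import sniff, DNS, DNSQR, IP  # pip install scapy

BLOCKED = {"bad-site.example.com."}

def inspect(pkt):
    if pkt.haslayer(DNSQR) and pkt.haslayer(IP) and pkt[DNS].qr == 0:  # qr == 0: a query
        qname = pkt[DNSQR].qname.decode()
        if qname in BLOCKED:
            # The paper would answer with a forged DNS reply to redirect the
            # client; this sketch only logs the illicit lookup.
            print(f"blocked lookup from {pkt[IP].src}: {qname}")

sniff(filter="udp port 53", prn=inspect, store=0)
```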

4.
Discovering topical information sources is a prerequisite for topic-oriented Web information integration. This paper proposes a topical information-source discovery method that casts the problem as website topic classification and uses outbound links to discover new sources. Content feature words and structural feature words reflecting a site's topic are extracted to build an improved vector space model describing the site topic. Based on this model, the class-centroid vector method is combined with an SVM to classify website topics. A crawling strategy that fetches as few pages as possible is proposed, crawling the pages most representative of the site topic while discovering outbound links. The method was applied to forestry business information sources, and experiments verified its effectiveness.
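A sketch of combining the class-centroid method with an SVM for site topic classification. The fusion rule (fall back to the centroid label when the SVM decision value is weak) is an illustrative assumption; the feature vectors would come from the paper's content and structure feature words.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestCentroid

def fit_predict(X_train, y_train, X_test, margin=0.2):
    svm = SVC(kernel="linear").fit(X_train, y_train)
    centroid = NearestCentroid().fit(X_train, y_train)
    scores = svm.decision_function(X_test)
    svm_pred, cen_pred = svm.predict(X_test), centroid.predict(X_test)
    # Trust the SVM where its decision value is clear; otherwise use centroids.
    if scores.ndim == 1:  # binary case: decision_function is 1-D
        confident = np.abs(scores) > margin
    else:
        confident = np.max(scores, axis=1) > margin
    return np.where(confident, svm_pred, cen_pred)
```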

5.
刘丹  崔阳 《微机发展》2013,(2):153-156,161
To solve the key B2B vertical search engine problem of accurately extracting product information from web pages, this paper first analyzes the structural characteristics of enterprise websites using a site tree as the model, and on this basis builds a web information extraction system for B2B vertical search engines. The system uses the site tree to identify product pages among an enterprise site's many pages and removes noise, then uses rule-based methods to extract the product description and parameter information contained in product pages. The product information extracted by the system is accurate and extraction efficiency is markedly improved, making the system suitable for product description, classification, and search in B2B vertical search engines.

6.
朱笃涛  葛元  王林泉 《计算机工程》2004,30(22):147-148,186
A gesture recognition method based on structural classification is proposed: structural feature information contained in the contours of letter-gesture images is extracted and used for classification. These structural features carry a large amount of information, such as the number of fingers and the distances between them, with which traditional structural classification methods can recognize gestures.

7.
Since mid- and low-level features cannot effectively express the semantic information of high-resolution remote sensing images in scene classification, resulting in low accuracy, a scene classification method combining Fisher kernel coding and convolutional neural networks is proposed. First, the Fisher kernel coding framework extracts mid-level semantic features of the image; then a deep convolutional neural network extracts high-level semantic features; finally, the mid- and high-level features are fused and classified with a support vector machine. Transfer learning is used to overcome the deep network's demand for large amounts of training data. Experiments use two high-resolution image datasets, UC-Merced (21 classes) and WHU-RS (19 classes). The results show that the fused mid- and high-level features contain richer scene information and make targets more separable; compared with existing methods, the proposed method effectively improves classification accuracy, and transfer learning overcomes the deep network's dependence on training data volume.
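A sketch of the fusion step only: L2-normalize each feature view, concatenate, and train an SVM. The Fisher-vector and CNN extractors are stubbed as input arrays; in the paper they are a Fisher kernel coding framework and a transfer-learned deep CNN.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import normalize

def fuse_and_train(fisher_feats, cnn_feats, labels):
    # L2-normalize each view before concatenation so neither view dominates.
    fused = np.hstack([normalize(fisher_feats), normalize(cnn_feats)])
    return SVC(kernel="linear").fit(fused, labels)
```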

8.
Research on phishing-website identification increasingly demands faster detection, so a fast identification method based on a hybrid feature selection model is proposed. The model consists of three main parts: initial feature selection, secondary feature selection, and classification. It combines information gain and the chi-square test with recursive feature elimination based on random forests, and uses a distribution function and its gradient to obtain the optimal cutoff threshold and thus the optimal dataset, improving identification efficiency. Experiments show that after feature screening with this model, dataset dimensionality drops by 79.2% and classification time complexity drops by 32% with almost no loss of precision, effectively improving classification efficiency. Evaluated on a large phishing dataset from the UCI Machine Learning Repository, the model loses 1.7% precision but reduces dataset dimensionality by 70% and classification time complexity by 41.1%.
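A sketch of the hybrid pipeline under stated assumptions: information gain is approximated by mutual information, the two filters are combined by union, and the kept-feature counts are fixed rather than derived from the paper's distribution-function cutoff.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif, RFE

X, y = make_classification(n_samples=500, n_features=100, random_state=0)
X = np.abs(X)  # chi2 requires non-negative features

# Stage 1: keep features scoring well under either filter.
mi_idx = set(SelectKBest(mutual_info_classif, k=50).fit(X, y).get_support(indices=True))
chi_idx = set(SelectKBest(chi2, k=50).fit(X, y).get_support(indices=True))
keep = sorted(mi_idx | chi_idx)

# Stage 2: random-forest-based recursive feature elimination on the survivors.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=20).fit(X[:, keep], y)
final = [keep[i] for i in rfe.get_support(indices=True)]
print(f"{X.shape[1]} features reduced to {len(final)}")
```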

9.
The rapid growth in the number of websites makes it hard for existing methods to classify website topics accurately: URL-based methods cannot handle topic information not reflected in the URL, while content-based methods are limited by data sparsity and difficulty capturing semantic relations. To address this, HGNN-SWT, a semi-supervised website topic classification method based on a heterogeneous graph neural network, is proposed. It uses website text features to compensate for the limitations of URL features alone, models the sparse relations between website texts and words with a heterogeneous graph, and improves classification by processing node and edge relations in the graph. A random-walk-based neighbor sampling method is introduced that considers both local node features and the global graph structure, and a feature fusion strategy captures contextual relations and feature interactions in website text data. Experiments on a self-built Chinaz Website dataset show that HGNN-SWT achieves higher accuracy than existing methods on website topic classification.
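A sketch of random-walk neighbor sampling on a heterogeneous website-word graph, in the spirit of the sampler described above. The adjacency dict, walk length, and walk count are illustrative assumptions, not HGNN-SWT's actual hyperparameters.

```python
import random

def random_walk_neighbors(adj, start, walk_len=4, n_walks=10):
    """Sample a neighborhood by short random walks, mixing local and global context."""
    visited = {}
    for _ in range(n_walks):
        node = start
        for _ in range(walk_len):
            nbrs = adj.get(node)
            if not nbrs:
                break
            node = random.choice(nbrs)
            visited[node] = visited.get(node, 0) + 1
    # The most frequently visited nodes become the sampled neighborhood.
    return sorted(visited, key=visited.get, reverse=True)

# Heterogeneous toy graph: site nodes link to word nodes and vice versa.
adj = {"site:a": ["word:news", "word:sport"],
       "word:news": ["site:a", "site:b"],
       "word:sport": ["site:a"],
       "site:b": ["word:news"]}
print(random_walk_neighbors(adj, "site:a"))
```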

10.
EEG signals contain rich information about cortical activity but also substantial noise; effectively extracting useful features from this information has long been a hot topic in the field. This paper proposes, with some novelty, using grey modeling for EEG feature extraction. It introduces the grey modeling mechanism and its application to EEG feature extraction, builds a GM(1,1) model from measured EEG signals, estimates the model parameters as features, and classifies the extracted feature parameters with a K-nearest-neighbor algorithm. The classification results show that grey-model-based EEG feature extraction and classification is feasible and effective, providing a new approach to EEG feature extraction.
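A sketch of GM(1,1) parameter estimation as used here for features: accumulate the series, fit the grey differential equation x0(k) + a*z1(k) = b by least squares, and take (a, b) as the feature pair fed to a KNN classifier. Assumes a positive-valued (e.g., shifted) EEG segment.

```python
import numpy as np

def gm11_features(x0):
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                      # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])           # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return a, b                             # development coefficient, grey input

segment = np.abs(np.random.randn(128)) + 1.0  # stand-in for an EEG segment
print(gm11_features(segment))
```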

11.
The increasing growth of malicious websites and of systems for distributing malware through websites makes it urgent to adopt effective techniques for timely detection of web security threats. Current mechanisms may exhibit some limitations, mainly concerning the amount of resources required and a low true-positive rate for zero-day attacks. In this paper, we propose and validate a set of features extracted from the content and structure of webpages that can be used as indicators of web security threats. The features are used to build a predictor, based on five machine learning algorithms, which is applied to classify unknown web applications. The experiments demonstrated that the proposed feature set can correctly classify malicious websites with a high level of precision, up to 0.84 in the best case, and recall up to 0.89 in the best case. The classifiers also prove successful against zero-day attacks.
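A sketch of the evaluation protocol this abstract implies: cross-validate several classifiers on page-feature vectors and report the precision and recall metrics cited above. The five algorithms shown are my stand-ins; the abstract does not name them.

```python
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def compare(X, y):
    """5-fold precision/recall for five candidate classifiers (binary labels)."""
    models = {"tree": DecisionTreeClassifier(), "rf": RandomForestClassifier(),
              "nb": GaussianNB(), "svm": SVC(), "knn": KNeighborsClassifier()}
    for name, model in models.items():
        cv = cross_validate(model, X, y, cv=5, scoring=("precision", "recall"))
        print(name, cv["test_precision"].mean(), cv["test_recall"].mean())
```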

12.
Terrorist groups have found smarter ways to use online discussion forums for their violent plans, using their privately owned discussion forums for various illegal purposes. This paper presents a comparative study of work done on various dark web forums of terrorist organizations and proposes a novel approach to identifying procurement of modern weapons over social media forums by terrorist groups. We used data from four dark web forum websites named "Ansar Aljihad Network", "IslamicAwakening", "Gawaher", and "IslamicNetwork". Multiple experts independently annotated 313 randomly selected posts as procurement (YES) or non-procurement (NO) to label the forum threads. Mutual agreement between experts was computed to find the level of significance. Furthermore, we used machine learning classification techniques (MLCT) to classify the labeled posts. To our knowledge, our procurement model is the first of its kind to automatically detect procurement of modern weapons over the dark web. The work presents an application of social media analytics and text mining to counterterrorism.
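A sketch of checking inter-annotator agreement on the YES/NO procurement labels. Cohen's kappa is one common choice for "mutual agreement between experts"; the abstract does not name the statistic, so this is an assumption, and the label vectors are toy data.

```python
from sklearn.metrics import cohen_kappa_score

expert_a = ["YES", "NO", "NO", "YES", "NO", "YES"]
expert_b = ["YES", "NO", "YES", "YES", "NO", "YES"]
print(f"kappa = {cohen_kappa_score(expert_a, expert_b):.2f}")
```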

13.
Users have clear expectations of where web objects are located on a web page. Studies conducted with manipulated, fictitious websites showed that web objects placed according to user expectations are found faster and remembered more easily. Whether this is also true for existing websites had not yet been examined. The present study investigates the relation between location typicality and efficiency in finding target web objects in online shops, online newspapers, and company web pages. Forty participants took part in a within-subject eye-tracking experiment. Typical web object placement led to fewer fixations, and participants found target web objects faster. However, some web objects were less sensitive to location typicality if they were more visually salient and conformed to user expectations in appearance. Placing web objects at expected locations and designing their appearance according to user expectations facilitates orientation, which is beneficial for first impressions and the overall user experience of websites.

14.
To counter the forged HTTPS websites and other obfuscation techniques commonly used by phishing attackers, and drawing on RMLR and PhishDef, the current mainstream phishing detection methods based on machine learning and rule matching, this paper adds feature extraction of web page text keywords and page sub-links and proposes the Nmap-RF classification method. Nmap-RF is an integrated phishing-website detection method based on rule matching and random forests. Websites are prefiltered according to the page protocol; if a site is judged to be phishing at this stage, subsequent feature extraction is skipped. Otherwise, text-keyword confidence, sub-link confidence, phishing-vocabulary similarity, and page PageRank serve as key features, and common URL, Whois, DNS, and page tag information serve as auxiliary features; the random-forest classification model then gives the final result. Experiments show that Nmap-RF can detect a phishing page in 9–10 μs on average, filters out 98.4% of illegitimate pages, and reaches an average overall precision of 99.6%.
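A sketch of the two-stage Nmap-RF flow: a rule-based protocol prefilter followed by a random forest over the key and auxiliary features. The prefilter rule and feature handling are illustrative reconstructions from the abstract, not the paper's actual rules.

```python
from sklearn.ensemble import RandomForestClassifier

def prefilter_is_phishing(url):
    # Illustrative rule only: reject plain-HTTP pages imitating login flows early.
    return url.startswith("http://") and "login" in url

def detect(url, features, model: RandomForestClassifier):
    if prefilter_is_phishing(url):
        return True                      # skip feature extraction entirely
    # features: key features (keyword/sub-link confidence, vocabulary
    # similarity, PageRank) plus auxiliary URL/Whois/DNS/tag features.
    return bool(model.predict([features])[0])
```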

15.
A Preliminary Study of WWW Web Page Layout Rules   (cited 1 time: 0 self-citations, 1 external)
The number of users and sites connected to the WWW (World Wide Web) doubles every year. For a site to leave a deep impression among millions of others, a web page that is information-rich, easy to browse, and visually pleasing is indispensable. Drawing on Gestalt psychology, traditional typography, hypertext authoring, and human-computer interaction, this paper attempts to propose design rules for the interface layout of web documents, covering text, graphics, and static and dynamic documents. The paper divides web documents into five basic types and then gives a set of layout rules for them.

16.
In this paper, we examine external failures and internal faults traceable to web software and source contents. We develop related defect and quality measurements based on the different perspectives of customers, users, information or service hosts, maintainers, developers, integrators, and managers. These measurements can help web information and service providers with their quality assessment and improvement activities to meet the quality expectations of their customers and users. The different usages of our measurement framework by different stakeholders of websites and web applications are also outlined and discussed. The data sources include existing web server logs and statistics reports, defect repositories from web application development and maintenance activities, and source files. We applied our approach to four diverse websites: one educational website, one open source software project website, one online catalog showroom for a small company, and one e-Commerce website for a large company. The results demonstrated the viability and effectiveness of our approach.

17.
This paper analyzes the problems universities commonly encounter when building websites, proposes a design of distributed file storage with centralized data storage, and designs and implements a distributed site-cluster management system on the Java EE platform. Its distributed site deployment, high degree of code sharing, embedded web page editor, and multiple content-management types allow universities to build their departmental site clusters quickly, flexibly, and conveniently. After more than a year of operation and refinement, the system has shown considerable practical value.

18.
This study investigates how webmasters of sites affiliated with bounded communities manage tensions created by the open social affordances of the internet. We examine how webmasters strategically manage their respective websites to accommodate their assumed target audiences. Through in-depth interviews with Orthodox webmasters in Israel, we uncover how they cultivate three unique strategies (control, layering, and guiding) to contain information flows. We thereby elucidate how web strategies reflect the relationships between community, religion, and CMC.

19.
The ability to automatically detect fraudulent escrow websites is important for alleviating online auction fraud. Despite research on related topics, such as web spam and spoof site detection, fake escrow website categorization has received little attention. The authentic appearance of fake escrow websites makes it difficult for Internet users to differentiate legitimate sites from phonies, making systems for detecting such websites an important endeavor. In this study we evaluated the effectiveness of various features and techniques for detecting fake escrow websites. Our analysis included a rich set of fraud cues extracted from web page text, image, and link information. We also compared several machine learning algorithms, including support vector machines, neural networks, decision trees, naïve Bayes, and principal component analysis. Experiments were conducted to assess the proposed fraud cues and techniques on a test bed encompassing nearly 90,000 web pages derived from 410 legitimate and fake escrow websites. The combination of an extended feature set and a support vector machine ensemble classifier enabled accuracies over 90% and 96% for page- and site-level classification, respectively, when differentiating fake pages from real ones. Deeper analysis revealed that an extended set of fraud cues is necessary due to the broad spectrum of tactics employed by fraudsters. The study confirms the feasibility of using automated methods for detecting fake escrow websites. The results may also be useful for informing existing online escrow fraud resources and communities of practice about the plethora of fraud cues pervasive in fake websites.
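A sketch of one way to build an SVM ensemble like the one the study found most accurate, here via scikit-learn bagging over linear SVMs; the study's exact ensemble construction and fraud-cue features are not specified in this abstract.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Bagged ensemble of linear SVMs over page- or site-level fraud-cue vectors.
# Note: the parameter is `base_estimator` in scikit-learn < 1.2.
svm_ensemble = BaggingClassifier(estimator=SVC(kernel="linear"),
                                 n_estimators=25, random_state=0)
# svm_ensemble.fit(page_features, labels)
```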
