Similar literature
 20 similar documents found (search time: 15 ms)
1.
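Web information extraction has long been an active research topic in information technology, and in recent years DIV+CSS layout has become widespread in web page design. Building on this, a simple and practical method is proposed for extracting the main text of news web pages, based on main-text features and page structure. The method first identifies and extracts the content block that holds the main text, then uses regular expressions to filter out the HTML tags in that block and extract the text. Experimental results show that the method is general and accurate for main-text extraction.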
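The two steps named in this abstract (locate the content block, then strip its HTML tags with regular expressions) can be sketched as follows. This is a minimal illustration, not the authors' method: the scoring rule (longest text among innermost `<div>` blocks) and the name `extract_body` are assumptions made for the example.

```python
import html
import re

# Innermost <div> blocks only: the body may not contain another <div> tag.
INNER_DIV = re.compile(r"<div\b[^>]*>((?:(?!</?div\b).)*)</div>", re.S | re.I)
SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.S | re.I)
TAG = re.compile(r"<[^>]+>")

def extract_body(page: str) -> str:
    """Return the text of the innermost <div> that holds the most text."""
    page = SCRIPT_STYLE.sub("", page)          # drop scripts and styles first
    best = ""
    for match in INNER_DIV.finditer(page):
        text = TAG.sub(" ", match.group(1))    # filter out HTML tags
        text = html.unescape(text)             # decode entities such as &amp;
        text = re.sub(r"\s+", " ", text).strip()
        if len(text) > len(best):              # assumed score: text length
            best = text
    return best

page = ('<div id="nav"><a href="/">Home</a></div>'
        '<div class="content"><p>First paragraph of the story.</p>'
        '<p>Second &amp; last.</p></div>')
print(extract_body(page))
```

Text length is only one possible score; punctuation density or link-to-text ratio would fit the same loop.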

2.
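Feature vector extraction for Chinese web pages based on an improved HTML-Tree (total citations: 1; self-citations: 0; citations by others: 1)
Extracting feature vectors from Chinese web pages is key to improving the accuracy and recall of Chinese web page classification. After studying the structural characteristics of HTML pages, the authors propose a text preprocessing method for Chinese web pages based on an improved HTML-Tree and on web page element weights, and extract the text feature vectors on that basis. The method exploits the characteristics of different categories of pages and accounts for the weight contributed by each kind of page element. Experiments confirm that the method makes feature vector extraction more efficient and effectively improves the accuracy and recall of Chinese web page classification.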
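The idea of weighting terms by the page element they appear in can be sketched as below. The weights in `ELEMENT_WEIGHTS`, the whitespace tokenizer (a real Chinese pipeline would need a word segmenter), and the class name are all assumptions for illustration; the abstract does not give the actual weights or the improved HTML-Tree construction.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical element weights; any element not listed counts as 1.0.
ELEMENT_WEIGHTS = {"title": 5.0, "h1": 3.0, "b": 2.0}

class WeightedTermCounter(HTMLParser):
    """Accumulates a term -> weight map, weighting by the enclosing elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []        # stack of currently open elements
        self.weights = Counter()   # weighted term frequencies

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # A term takes the largest weight among its enclosing elements.
        weight = max((ELEMENT_WEIGHTS.get(t, 1.0) for t in self.open_tags),
                     default=1.0)
        for term in data.lower().split():
            self.weights[term] += weight

counter = WeightedTermCounter()
counter.feed("<html><head><title>Phishing news</title></head><body>"
             "<h1>News</h1><p>daily news about <b>phishing</b></p>"
             "</body></html>")
print(counter.weights.most_common())
```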

3.
Many HTML pages are generated by programs that query an underlying database and fill a template with the results. In these situations the meta-information about the data structure is lost, so automated programs cannot process the data as powerfully as they could process information taken from a database. We propose a set of novel techniques for detecting structured records in a web page and extracting the data values that constitute them. Our method needs only a single input page. It starts by identifying the data region of interest in the page. The region is then partitioned into records using a clustering method that groups similar subtrees in the DOM tree of the page. Finally, the attributes of the data records are extracted using a method based on multiple string alignment. We have tested our techniques on a large number of real web sources, obtaining high precision and recall values.

4.
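Phishing websites cause heavy losses every year to users in e-commerce, communications, banking and other fields, and defending against them effectively has become a difficult task. Based on an analysis of real data, this paper describes a web page by two groups of features, URL-related characteristics and page text content, builds a separate classifier for each group, and optimizes each classifier following the idea of incremental learning to improve the algorithm's online learning ability. Finally, a classifier ensemble combines the predictions of the individual classifiers to achieve online intelligent detection of phishing websites. Experiments show that the ensemble has good online learning and generalization ability.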

5.
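To address the problems in scene text recognition, such as difficult character segmentation and recognition accuracy that depends on a dictionary, this paper proposes a text recognition algorithm that combines an attention mechanism with the connectionist temporal classification (CTC) loss. A convolutional neural network and a bidirectional long short-term memory network encode the image features, and an Attention-CTC structure then decodes the feature sequence, which effectively solves the problem of unconstrained attention decoding. The algorithm avoids extra alignment preprocessing of the labels and subsequent grammar processing, and it speeds up training convergence while significantly improving the text recognition rate. Experimental results show that the algorithm is robust to text images with blurred fonts and complex backgrounds.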
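For readers unfamiliar with CTC, the sketch below shows the standard best-path (greedy) CTC decoding rule: collapse consecutive repeated labels, then drop the blank symbol. It illustrates only why CTC needs no per-character alignment of the labels; it is not the paper's Attention-CTC decoder, and the function name is an assumption.

```python
from itertools import groupby

def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated per-frame labels, then remove the blank symbol."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]

# Per-frame argmax labels, with "-" standing for the blank symbol.
frames = "--hh-e-ll-lo-"
print("".join(ctc_greedy_decode(frames, blank="-")))
```

The blank between the two runs of `l` is what lets the decoder keep a doubled letter instead of merging it.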

6.
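Now that online trading has become the main means by which the securities industry conducts business, its security is a topic of growing concern. Phishing is an important form of attack during online trading, and because of its particular nature the victims suffer serious losses. Preventing phishing is therefore of great practical significance to the securities industry. This paper gives a comprehensive account of phishing from several angles: how it is carried out, the harm it does to society, and how to guard against it. It describes in detail how attackers lure users to malicious websites and the main anti-phishing techniques, such as blacklist/whitelist detection and page similarity checking, and it proposes measures from both the technical side and the side of users' online behavior to limit the security risks that phishing poses to the securities industry.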

7.
Existing anti-phishing approaches use blacklist methods or feature-based machine learning techniques. Blacklist methods fail to detect new phishing attacks and produce a high false positive rate. Moreover, existing machine learning based methods extract features from third parties, search engines, etc. They are therefore complicated, slow, and unfit for real-time environments. To solve this problem, this paper presents a novel machine learning based anti-phishing approach that extracts features from the client side only. We have examined the various attributes of phishing and legitimate websites in depth and identified nineteen outstanding features that distinguish phishing websites from legitimate ones. These nineteen features are extracted from the URL and source code of the website and do not depend on any third party, which makes the proposed approach fast, reliable, and intelligent. Compared to other methods, the proposed approach has relatively high accuracy in detecting phishing websites: it achieved a 99.39% true positive rate and 99.09% overall detection accuracy.
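Client-side URL features of the kind this abstract describes can be computed without any third-party lookup. The sketch below is illustrative only: the six features and the name `url_features` are assumptions chosen for the example and are not the paper's nineteen features, which the abstract does not list.

```python
import re
from urllib.parse import urlparse

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def url_features(url: str) -> dict:
    """Compute a few URL-only features (hypothetical, for illustration)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,           # "@" can disguise the real host
        "host_is_ip": bool(IPV4.match(host)),  # raw IP address instead of a name
        "uses_https": parsed.scheme == "https",
    }

print(url_features("http://192.168.0.1/login@secure"))
print(url_features("https://www.example.com/"))
```

A feature dictionary like this would then be turned into a vector and passed to whichever classifier is being trained.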

8.
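李剑. 《电子科技》, 2012, 25(1): 105-107
To filter noise out of web pages efficiently, a web page cleaning method based on an improved DOM tree and a BP neural network is adopted. Based on the features of the DOM tree and the page content, HTMLParser is used to build a content block tree that divides the page content into sub-blocks according to their relatedness, so that processing the whole content block reduces to processing the individual sub-blocks. Statistics show that the content of the sub-blocks has distinct numerical features, which can serve as the training input for the BP neural network. Page cleaning thus becomes a problem of learning a filtering model. Experimental results show that the method achieves good results on Chinese web pages that have a topic.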

9.
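The multi-tag collision problem is regarded as a key problem in radio frequency identification (RFID) systems. Many query tree algorithms based on bit tracking have recently been proposed to resolve tag collisions effectively, but useless collision-bit information and idle slots leave room to improve their performance. This paper proposes a bit-query method in which a tag returns a mapped bit string in place of its original ID sequence. Compared with conventional ID queries, bit queries not only eliminate idle queries but also split colliding tags into more subsets and make full use of the collision-bit information. Building on this method, we propose a new query tree algorithm, the bit query based M-ary tree (BQBMT), which separates collisions iteratively through an M-ary tree and identifies tags quickly by switching optimally between the bit-query mode and the ID-query mode. Theoretical analysis and simulation results show that the system efficiency of BQBMT approaches 0.89, exceeding that of the existing QT algorithm and hybrid anti-collision algorithms.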
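As background for the comparison with QT, the sketch below simulates the basic binary query tree (QT) algorithm, in which the reader broadcasts a prefix and splits it on every collision. It is not the BQBMT algorithm from the abstract; the function name and the assumption of distinct, equal-length binary IDs are choices made for the example.

```python
from collections import deque

def query_tree(tag_ids):
    """Simulate basic QT identification over distinct binary tag IDs."""
    queue = deque([""])                      # prefixes the reader will query
    identified = []
    slots = {"idle": 0, "success": 0, "collision": 0}
    while queue:
        prefix = queue.popleft()
        responders = [t for t in tag_ids if t.startswith(prefix)]
        if not responders:
            slots["idle"] += 1               # nobody answered: wasted slot
        elif len(responders) == 1:
            slots["success"] += 1            # exactly one tag: identified
            identified.append(responders[0])
        else:
            slots["collision"] += 1          # several tags: split the prefix
            queue.append(prefix + "0")
            queue.append(prefix + "1")
    return identified, slots

tags = ["000", "001", "101", "110"]
identified, slots = query_tree(tags)
efficiency = len(identified) / sum(slots.values())
print(identified, slots, round(efficiency, 2))
```

System efficiency here is the number of identified tags divided by the total number of slots, so idle and collision slots are exactly the overhead that bit-query schemes try to remove.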

10.
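Phishing attacks currently pose a serious threat to Internet users. To counter this threat, many software vendors and organizations have proposed various anti-phishing strategies. This paper analyzes and studies browser-based techniques for detecting phishing websites.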

11.
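崔凯, 才华, 刘广文, 刘智. 《液晶与显示》, 2018, 33(3): 254-260
Face alignment is a core part of a face recognition system; the accuracy and speed of landmark localization directly affect recognition performance. Because face images vary in pose, expression, and illumination, face alignment in real scenes is a difficult problem. This paper proposes a face alignment method using a stacked auto-encoder network based on SURF features. A coarse localization network first finds approximate facial landmarks; local SURF features are then extracted and fed into local refinement networks, and a cascade structure further refines the landmark positions. Finally, the method is compared with alignment methods of recent years on the AFLW and HELEN face datasets: the mean error rate is 8.80%, and the computation time is 7.6 ms on a quad-core i5 CPU clocked at 2.3 GHz. The method is robust in real scenes (with one or several people) and localizes landmarks accurately.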

12.
In this paper, we present a probabilistic approach to determining whether facial features extracted from a video sequence are appropriate for creating a 3D face model. In our approach, the distance between two feature points selected from the MPEG-4 facial object is defined as a random variable for each node of a probability network. To avoid generating an unnatural or unrealistic 3D face model, 2D facial features automatically extracted from a video sequence are fed into the proposed probabilistic network before a corresponding 3D face model is built. Simulation results show that the proposed probabilistic network can be used as a quality control agent to verify the correctness of extracted facial features.

13.
14.
In this paper, we present a robust approach to the registration of white matter tractographies extracted from diffusion tensor magnetic resonance imaging scans. The fibers are projected into a high-dimensional feature space based on the sequence of their 3-D coordinates. Adaptive mean-shift clustering is applied to extract a compact set of representative fiber-modes (FM). Each FM is assigned a multivariate Gaussian distribution according to its population, leading to a Gaussian mixture model (GMM) representation for the entire set of fibers. The registration between two fiber sets is treated as the alignment of two GMMs and is performed by maximizing their correlation ratio. A nine-parameter affine transform is recovered and then refined to a twelve-parameter affine transform using an innovative mean-shift based registration refinement scheme presented in this paper. Validation of the algorithm on synthetic intrasubject data demonstrates its robustness to interrupted and deviating fiber artifacts as well as outliers. Using real intrasubject data, the algorithm is compared with other intensity-based and fiber-based registration algorithms, demonstrating competitive results. An option for tracking-in-time, on specific white matter fiber tracts, is also demonstrated on the real data.

15.
Cyber security training programs encourage users to report suspicious spear phishing emails, and most antiphishing software provides interfaces to assist in the reporting. Evidence, however, suggests that reporting is scarce. This research examined why this is the case. To this end, Social Cognitive Theory (SCT) was used to examine the influence of the triadic factors of perceived self-efficacy toward antiphishing behaviors, expected negative outcomes from reporting spear phishing emails, and cyber security self-monitoring, on individuals' likelihood of reporting spear phishing emails. Based on recent research on phishing victims, the present study also incorporated cyber risk beliefs (CRBs) into the SCT framework. The model, tested using survey data (N = 386), revealed that the likelihood of reporting spear phishing emails is increased by perceived self-efficacy, expected negative outcomes, and cyber security self-monitoring. Furthermore, the CRBs directly influenced the three SCT factors and indirectly influenced the individuals' likelihood of reporting spear phishing emails. The findings add to our understanding of SCT and the science of cyber security.

16.
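王明军, 易芳, 李乐, 黄朝军. 《红外与激光工程》, 2022, 51(5): 20210342-1-20210342-10
Point cloud registration is one of the key technologies of 3D reconstruction. To address the low speed of the iterative closest point (ICP) algorithm and its strong dependence on the initial position, a point cloud registration method based on adaptive local-neighborhood feature point extraction and matching is proposed. First, feature points are extracted adaptively according to the relation between the local surface variation factor and the mean variation factor. Next, fast point feature histograms (FPFH) describe the local information of each feature point, and coarse registration is achieved in combination with the random sample consensus (RANSAC) algorithm. Finally, fine registration is performed with the resulting initial transformation matrix and a feature-point-based ICP algorithm. Registration experiments were carried out on the Stanford dataset, on noisy point clouds, and on scene point clouds. The results show that the proposed feature point extraction algorithm extracts point cloud features efficiently; that, compared with other feature point detection methods, the proposed method achieves higher accuracy and speed in coarse registration and better noise resistance; and that, compared with the ICP algorithm, ICP based on the proposed feature points is about 10 times faster on the Stanford dataset and the scene point clouds and registers noisy point clouds efficiently from the extracted feature points. The study provides an efficient way to improve matching efficiency in 3D reconstruction and target recognition.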
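The fine-registration stage relies on ICP, whose loop alternates nearest-neighbor matching with a closed-form rigid fit. The sketch below shows that loop in 2D with plain Python so it stays self-contained; the paper works on 3D point clouds with FPFH and RANSAC for the coarse stage, none of which is reproduced here, and all function names are assumptions.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] to dst[i]."""
    n = len(src)
    sx = sum(x for x, _ in src) / n
    sy = sum(y for _, y in src) / n
    dx = sum(x for x, _ in dst) / n
    dy = sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy   # center both sets
        dot += x * u + y * v
        cross += x * v - y * u
    theta = math.atan2(cross, dot)                    # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(src, dst, iterations=10):
    """Basic ICP: match each point to its nearest neighbor, refit, repeat."""
    current = list(src)
    for _ in range(iterations):
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        current = transform(current, *best_rigid_2d(current, matched))
    return current

src = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
near = transform(src, 0.05, 0.2, -0.1)    # small, known misalignment
aligned = icp_2d(src, near)
print(max(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(aligned, near)))
```

The example starts from a small misalignment on purpose: nearest-neighbor matching is only reliable near the solution, which is why a coarse registration step is needed before ICP.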

17.
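邹瑛. 《通信技术》, 2011, 44(7): 135-137
Under embedded Linux, the stock property sheet control handles keyboard-focus messages for the child controls within a page inflexibly. To address this, the paper analyzes how the property sheet control in MiniGUI responds to the keyboard and mouse, chains the property page nodes into a first-level linked list, and hangs the controls displayed on each page on a second-level linked list, forming a star-shaped multi-level linked list; the data structure and the message interface of the property sheet control are defined separately. Finally, it presents the message-handling flow of the control's implementation and discusses a drawing scheme that improves the control's appearance, yielding a flexible property sheet control design.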
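The two-level linked-list layout described in this abstract can be sketched as follows. MiniGUI itself is a C library and none of its API is used here; the class and method names are hypothetical and only illustrate how a first-level list of pages, each owning a second-level list of controls, supports keyboard focus cycling.

```python
class ControlNode:
    """Second-level list node: one child control on a property page."""
    def __init__(self, name):
        self.name = name
        self.next = None

class PageNode:
    """First-level list node: one property page with its own control list."""
    def __init__(self, title):
        self.title = title
        self.next = None        # next page in the first-level list
        self.controls = None    # head of this page's second-level list
        self.focus = None       # control that currently has keyboard focus

    def add_control(self, name):
        node = ControlNode(name)
        if self.controls is None:
            self.controls = self.focus = node
        else:
            tail = self.controls
            while tail.next:
                tail = tail.next
            tail.next = node
        return node

    def focus_next(self):
        """Move focus to the next control, wrapping around (e.g. on Tab)."""
        if self.focus is None:
            return None
        self.focus = self.focus.next or self.controls
        return self.focus.name

class PropertySheet:
    def __init__(self):
        self.pages = None       # head of the first-level list
        self.active = None      # page currently shown

    def add_page(self, title):
        page = PageNode(title)
        if self.pages is None:
            self.pages = self.active = page
        else:
            tail = self.pages
            while tail.next:
                tail = tail.next
            tail.next = page
        return page

    def next_page(self):
        """Switch to the next page, wrapping around to the first."""
        self.active = self.active.next or self.pages
        return self.active.title

sheet = PropertySheet()
general = sheet.add_page("General")
general.add_control("name_edit")
general.add_control("ok_button")
sheet.add_page("Advanced").add_control("checkbox")
print(sheet.active.title, sheet.active.focus_next(), sheet.next_page())
```

Because each page keeps its own focus pointer, switching pages and coming back restores the control that last had focus.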

18.
Non-rigid image registration is a prerequisite for many medical image analysis applications, such as image fusion of multimodality images and quantitative change analysis of a temporal sequence in computer-aided diagnosis. By establishing the point correspondence of the extracted feature points, it is possible to recover the deformation using nonlinear interpolation methods. However, it may be very difficult to establish such correspondence at an initial stage when confronted with large and complex deformation. In this paper, a mixture of principal axes registration (mPAR) method is proposed to resolve the correspondence problem through a neural computational approach. The novel feature of mPAR is the alignment of two point sets without the need to establish explicit point correspondence. Instead, it aligns the two point sets by minimizing the relative entropy between their probability distributions, resulting in a maximum likelihood estimate of the transformation matrix. The registration process consists of two steps: (1) a finite mixture scheme to establish an improved point correspondence and (2) a multilayer perceptron (MLP) neural network to recover the nonlinear deformation. The neural computation for registration uses a committee machine to obtain a mixture of piece-wise rigid registrations, which gives a reliable point correspondence using multiple extracted objects in a finite mixture scheme. The MLP is then used to determine the coefficients of a polynomial transform using extracted control points. We have applied our mPAR method to register synthetic data sets, surgical prostate models, and a temporal sequence of mammograms of a single patient. The experimental results show that mPAR not only improves the accuracy of the point correspondence but also results in a desirable error-resilience property for control point selection errors.

19.
Self-assembly is not widely used in industrial micro-fabrication, although it can potentially involve assembly processes that are considerably less complex. A variety of procedures for self-alignment of parts have been introduced and investigated lately. These procedures mainly utilise capillary, gravitational or electrostatic forces at the micro-scale. This paper investigates two different concepts for accurate self-assembly of parts. One is well described in the literature by third parties and involves the alignment of parts by utilising the surface tension of micro-scaled adhesive films, which are selectively coated on hydrophobic alignment structures. In the present publication the influence of the dimensions of such structured alignment sites on the process flow is discussed. The second concept is a novel approach to accomplish self-alignment of micro-structures with electrostatic attraction. Several complementary and electrically conductive micro-structured patterns serve as binding sites for the alignment of parts in this approach. To understand how these two approaches operate, they have been modelled and simulated. Additionally, to analyse the feasibility of these procedures and to verify the simulation results, experiments have been performed on micro-structured parts and substrates. In particular, the layout of the alignment structures and the size of the parts were identical for both concepts in the experimental work, so the two methods could be compared. With the self-assembly procedure that utilises electrostatic attraction, high alignment accuracies and forces that affect the part over large distances were observed. Finally, parts with micro-structured binding sites as small as 10 × 10 μm² could be accurately self-aligned with electrostatic attraction.

20.
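To address the scarcity of Chinese-Korean bilingual parallel corpora and the low accuracy of traditional sentence alignment algorithms on languages from different families, a Chinese-Korean sentence alignment method with fused features is proposed. First, Bi-LSTM is integrated into a Siamese neural network to build a sentence alignment model that extracts features from the Chinese and Korean sentences separately and aligns them. Sentence alignment features are then extracted according to the characteristics of the corpus and fused into the input layer. By comparing against a conventional Bi-LSTM and Siamese Bi-LSTM models with different feature combinations...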
