Similar Literature
 Found 20 similar articles; search took 78 ms
1.
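A Chinese Web Page Feature Vector Extraction Method Based on an Improved HTML-Tree   Total citations: 1, self-citations: 0, citations by others: 1
Extracting feature vectors from Chinese web pages is key to improving the accuracy and recall of Chinese web page classification. Based on a study of the structural characteristics of HTML pages, this paper proposes a text preprocessing method for Chinese web pages that uses an improved HTML-Tree together with web page element weights, and extracts web page text feature vectors on that basis. The method makes full use of the characteristics of different categories of web pages and takes into account the contribution of the weights of the various elements within a page. Experiments show that the method improves the efficiency of feature vector extraction and effectively improves the accuracy and recall of Chinese web page classification.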

2.
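To address the low efficiency of deduplicating massive volumes of web page text, an efficient parallel web page deduplication algorithm is proposed. The algorithm uses the Map/Reduce mechanism of the Hadoop framework: it extracts feature strings from web page text, maps them to hash codes with Google's Simhash algorithm, and then compares the resulting hash codes by Hamming distance to identify duplicate web page data. Experiments show that, compared with related deduplication algorithms, the proposed algorithm effectively improves the computational efficiency of text deduplication.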
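
The Simhash-plus-Hamming-distance step described in this abstract can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the paper's Hadoop implementation: the use of MD5 as the per-feature hash, the 64-bit fingerprint width, and the threshold of 3 bits are all assumptions made for the example.

```python
import hashlib

def simhash(features, bits=64):
    """Compute a SimHash fingerprint from (feature, weight) pairs.

    MD5 is used as the per-feature hash purely for illustration
    (an assumption; the abstract does not name a hash function).
    """
    counts = [0] * bits
    for feature, weight in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Add the weight where the hash bit is 1, subtract it where it is 0.
            if (h >> i) & 1:
                counts[i] += weight
            else:
                counts[i] -= weight
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(a, b, threshold=3):
    """Treat two pages as duplicates if their fingerprints differ in at
    most `threshold` bits (the threshold value is a hypothetical choice)."""
    return hamming_distance(a, b) <= threshold
```

In a Map/Reduce setting, one plausible arrangement is to compute fingerprints in the map phase and compare them in the reduce phase, but the abstract does not give that level of detail.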

3.
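Multimedia web resources contain many duplicate web pages. Web page deduplication removes these duplicates, lowers storage costs, and improves search engine performance.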

4.
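A system for collecting, filtering, and classifying Chinese web pages is implemented. The paper describes the system's concrete solutions in terms of web page preprocessing, feature selection, and the classifier model. Experimental results show that the classification system achieves satisfactory classification performance.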

5.
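This paper describes in detail the design and implementation of a fast Chinese web page classification system. The system parses out the main content of a page, its Title, the content of its Meta tags, and the anchor text on parent pages that link to it, and uses this information to classify the page with the vector space model (VSM). Experimental results show that the method greatly improves the speed of Chinese web page classification while maintaining high accuracy.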
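
A rough sketch of VSM classification over several page fields follows. The field names, the per-field weights, and the use of one centroid vector per category are hypothetical choices for illustration; the abstract does not specify how the system weights or compares its vectors.

```python
import math
from collections import Counter

def build_vector(fields, field_weights):
    """Merge tokens from several page fields into one weighted term vector.

    `fields` maps a field name (e.g. "title", "body") to a token list;
    `field_weights` maps a field name to its weight (default 1.0).
    """
    vec = Counter()
    for name, tokens in fields.items():
        for token in tokens:
            vec[token] += field_weights.get(name, 1.0)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(vec, centroids):
    """Return the label of the category centroid most similar to `vec`."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

For example, with `field_weights = {"title": 3.0, "anchor": 2.0}`, a term in the title counts three times as much as the same term in the body.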

6.
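Using static text feature analysis of web page code, this paper discusses and analyzes techniques for assessing the security relevance of static web page links. First, based on the current state of research on website Trojan analysis and its open problems, it proposes an analysis model for static link relationships between web pages and describes an algorithm for analyzing the security relevance of link text. It then gives a mathematical formulation of the model and algorithm, along with the data structures used to implement the algorithm. Finally, an analysis of the experimental results verifies the reliability and soundness of the algorithm.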

7.
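To improve the accuracy of web page text classification and overcome the susceptibility of traditional text classification algorithms to false or erroneous information in web pages, a web page classification algorithm based on link information is proposed. The algorithm improves on the K-nearest-neighbor method and classifies a page using the link information between the page and its parent pages. The parent link information of the page to be classified is represented as a vector in a vector space; the K pages in the training set whose link information vectors are most similar to it are found, and the category of the page is computed from them. Experimental comparison with traditional text classification algorithms verifies the effectiveness of the method.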
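
The nearest-neighbor step can be illustrated with a plain K-nearest-neighbor majority vote over sparse link-information vectors. This is only a baseline sketch: the abstract does not describe its modification to KNN, and the dictionary representation, cosine similarity, and unweighted vote used here are assumptions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_classify(query, training, k=3):
    """Classify `query` by majority vote among its k most similar neighbors.

    `training` is a list of (vector, label) pairs, where each vector holds
    terms drawn from the parent pages' link information (hypothetical layout).
    """
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```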

8.
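This paper first describes the idea behind a web page deduplication algorithm based on keywords and feature codes, and then analyzes and studies three problems in the algorithm: keyword extraction, feature string extraction, and feature string similarity computation.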

9.
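曹为梅 (Cao Weimei), 《山东电子》, 1998, (3): 11-12, 14
This paper discusses the differing characteristics of character encoding and web page display on Chinese and English systems, and presents several techniques and approaches for producing Chinese web pages.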

10.
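Web page content alone is not fully sufficient for improving clustering quality. Exploiting the natural link relationships among pages in online communities, this paper proposes an enhanced community web page clustering algorithm that mines user tags. It applies several distance measures, mines the link relationships between pages, and then combines content similarity with link relationships to perform clustering. Experiments show that the proposed algorithm is effective.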

11.
DUV lithography, using the 248 nm wavelength, is a viable manufacturing option for devices with features at 130 nm and less. Given the low k1 value of the lithography, integrated process development is a necessary method for achieving acceptable process latitude. The application of assist features for rule-based OPC requires the simultaneous optimization of the mask, the illumination optics, and the resist. This paper describes the details involved in optimizing each of these aspects for line and space imaging. A reference pitch is first chosen to determine how the optics will be set. The ideal sigma setting is determined by a simple geometrically derived expression. The inner and outer machine settings are determined, in turn, by simulating a figure of merit (FOM). The maximum value of the response surface of this FOM occurs at the optimal sigma settings. Experimental confirmation of this is shown in the paper. Assist features are used to modify the aerial image of the more isolated images on the mask. The effect that the diffraction of the scattering bars (SBs) has on the image intensity distribution is explained. Rules for determining the size and placement of SBs are also given. The resist is optimized for use with off-axis illumination and assist features. A general explanation of the material's effect is given, along with its effect on the through-pitch bias. The paper culminates by showing the lithographic results from the fully optimized system.

12.
From its emergence in the late 1980s as a lower cost alternative to early EEPROM technologies, flash memory has evolved to higher densities and speeds and rapidly growing acceptance in mobile applications. In the process, flash memory devices have placed increased test requirements on manufacturers. Today, as flash device test grows in importance in China, manufacturers face growing pressure for reduced cost-of-test, increased throughput, and greater return on investment for test equipment. At the same time, the move to integrated flash packages for contactless smart card applications adds a significant further challenge to manufacturers seeking rapid, low-cost test.

13.
The relation between the power of the Brillouin signal and the strain is one of the bases of distributed fiber sensors for temperature and strain. The Brillouin gain coefficient can be changed by temperature and strain, which in turn affects the power of the Brillouin scattering. Many researchers assume that the relation between the change in the Brillouin gain coefficient and the strain is linear. However, theoretical analysis and numerical simulation show that it is not always linear. Errors will therefore arise if this relation is treated as approximately linear when measuring temperature and strain. For this reason, the influence of the parameters on the Brillouin gain coefficient is examined through theoretical analysis and numerical simulation.

14.
The parallel thinning algorithm with two subiterations is improved in this paper. By analyzing the notions of connected components and passes, it is concluded that the number of passes and the number of eight-connected components are equal. An expression for the number of eight-connected components is then obtained, which replaces the old one in the algorithm. In addition, a reserving condition is proposed on the basis of experiments, which alleviates the excess deletion where a diagonal line and a straight line intersect. The experimental results demonstrate that the thinned curve lies almost in the middle of the original curve, remains connected with single-pixel width, and that the processing speed is high.

15.
Today, micro-system technology and the development of new MEMS (Micro-Electro-Mechanical Systems) are emerging rapidly. In order for this development to become a success in the long run, measurement systems have to ensure product quality. Most often, MEMS have to be tested by means of functionality or destructive tests. One reason for this is that there are no suitable systems or sensing probes available which can be used for the measurement of quasi-inaccessible features like small holes or cavities. We present a measurement system that could be used for these kinds of measurements. The system combines a fiber optical, miniaturized sensing probe with low-coherence interferometry, so that absolute distance measurements with nanometer accuracy are possible.

16.
This paper presents a new method to increase the waveguide coupling efficiency in hybrid silicon lasers. We find that the propagation constant of the InGaAsP emitting layer can be equal to that of the Si resonant layer through improving the design size of the InP waveguide. The coupling power achieves 42% of the total power in the hybrid lasers when the thickness of the bonding layer is 100 nm. Our result is very close to 50% of the total power reported by Intel when the thickness of the thin bonding layer is less than 5 nm. Therefore, our invariable coupling power technique is simpler than Intel's.

17.
A new quantum protocol is proposed to teleport an arbitrary unknown N-qubit entangled state from a sender to a fixed receiver under M controllers (M < N). The quantum resources required are M non-maximally entangled Greenberger-Horne-Zeilinger (GHZ) states and N-M non-maximally entangled Einstein-Podolsky-Rosen (EPR) pairs. The sender performs N generalized Bell-state measurements on the 2N particles. The controllers perform M single-particle measurements along the x-axis, and the receiver needs to introduce one auxiliary two-level particle to extract the quantum information probabilistically, with unit fidelity, if the controllers cooperate with it.

18.
A continuous-wave (CW) 457 nm blue laser operating at a power of 4.2 W is demonstrated using a fiber-coupled laser diode module to pump Nd:YVO4, with LBO as the intra-cavity SHG crystal. With optimization of the laser cavity and crystal parameters, the laser operates at very high efficiency. When the pumping power is about 31 W, the output at 457 nm reaches 4.2 W, corresponding to an optical-to-optical conversion efficiency of about 13.5%. The stability of the output power is better than 1.2% over 8 h of continuous operation.

19.
It is well known that adding more antennas at the transmitter or at the receiver may offer larger channel capacity in multiple-input multiple-output (MIMO) communication systems. In this letter, a simple proof is presented for the fact that the channel capacity increases with an increase in the number of receiving antennas. The proof is based on the famous capacity formula of Foschini and Gans together with matrix theory.
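
For orientation, the Foschini-Gans capacity formula and one possible line of argument can be written out as follows. The notation ($n_T$, $n_R$, $\rho$, $\mathbf{H}$, $\mathbf{h}$) is assumed here, and the sketch is not necessarily the proof given in the letter.

```latex
% Capacity with n_T transmit antennas, n_R receive antennas, average SNR \rho
% per receive antenna, and channel matrix H of size n_R x n_T (assumed notation):
C = \log_2 \det\!\left( \mathbf{I}_{n_R} + \frac{\rho}{n_T}\,\mathbf{H}\mathbf{H}^{\dagger} \right)
  = \log_2 \det\!\left( \mathbf{I}_{n_T} + \frac{\rho}{n_T}\,\mathbf{H}^{\dagger}\mathbf{H} \right)
  \quad \text{bits/s/Hz},
% where the second equality uses \det(I + AB) = \det(I + BA).
%
% Adding one receive antenna appends a row vector h (1 x n_T) to H, giving H':
\mathbf{H}'^{\dagger}\mathbf{H}' = \mathbf{H}^{\dagger}\mathbf{H} + \mathbf{h}^{\dagger}\mathbf{h},
% and since h^\dagger h is positive semidefinite,
\det\!\left( \mathbf{I}_{n_T} + \frac{\rho}{n_T}\,\mathbf{H}'^{\dagger}\mathbf{H}' \right)
  \ge \det\!\left( \mathbf{I}_{n_T} + \frac{\rho}{n_T}\,\mathbf{H}^{\dagger}\mathbf{H} \right),
% so the capacity with the extra receive antenna is at least as large.
```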

20.
Call for Papers     
Wireless Body-area Networks. The last decade has witnessed the convergence of three giant worlds: electronics, computer science and telecommunications. The next decade should follow this convergence in most of our activities with the generalization of sensor networks. In particular with the progress in medicine, people live longer and the aging of population will push the development of wireless personal networks
