
Tibetan Web Information Extraction Based on DOM Pruning

Cite this article:	Zhu Jie, Ngodrup, GeSang Dorje. Tibetan Web Information Extraction Based on DOM Pruning[J]. Computer Engineering, 2008, 34(24): 58-60.

Authors:	Zhu Jie, Ngodrup, GeSang Dorje

Affiliation:	Department of Computer Science and Technology, Tibet University, Lhasa 850000

Funding:	Supported by the National Natural Science Foundation of China

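Abstract:	With the spread of the Internet and the continued development of Tibetan information technology, a large number of Tibetan websites have appeared. This paper identifies Tibetan web pages by the characteristics of the Tibetan "syllable dot" and crawls them. After building a DOM tree, it analyzes the relevance between a page's link text and non-link text and the topical information blocks. A semantic pruning algorithm then extracts the Tibetan topical information. Tests confirm that the algorithm adapts well to both Tibetan web page identification and Tibetan topical information extraction.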
Keywords:	syllable dot; DOM tree; Tibetan; Web information extraction
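
The syllable-dot test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the "syllable dot" is the Tibetan intersyllabic mark tsheg (U+0F0B) and that a page counts as Tibetan when the mark makes up at least a given share of its non-whitespace characters. The function names and the 0.1 threshold are hypothetical and are not taken from the paper.

```python
# Assumption: the "syllable dot" is TIBETAN MARK INTERSYLLABIC TSHEG (U+0F0B).
TSHEG = "\u0f0b"


def tsheg_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are the tsheg."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == TSHEG for c in chars) / len(chars)


def looks_tibetan(text: str, threshold: float = 0.1) -> bool:
    """Heuristic check; the threshold is an illustrative guess, not a tuned value."""
    return tsheg_ratio(text) >= threshold


# Two Tibetan syllables, each followed by a tsheg (8 code points, 2 of them tsheg).
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b"
print(looks_tibetan(sample))         # Tibetan text
print(looks_tibetan("hello world"))  # Latin text, no tsheg
```

A crawler could apply such a check to the decoded text of each fetched page before deciding whether to follow its links. A practical threshold would need to be measured on real pages.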
Tibetan Web Information Extraction Based on DOM Pruning

Zhu Jie, Ngodrup, GeSang Dorje. Tibetan Web Information Extraction Based on DOM Pruning[J]. Computer Engineering, 2008, 34(24): 58-60.

Authors:	Zhu Jie, Ngodrup, GeSang Dorje

Affiliation:	Department of Computer Science and Technology, Tibet University, Lhasa 850000

Abstract:	With the spread of the Internet and the development of Tibetan information technology, a large number of Tibetan websites have appeared. This paper identifies Tibetan Web pages by the features of the Tibetan syllable dot and crawls them. After building a DOM tree, it analyzes the relevance between a page's linked and non-linked text and the topical information blocks, and extracts Tibetan topical information with a semantic pruning algorithm. Test results show that the algorithm adapts well to both Tibetan Web page identification and Tibetan topical information extraction.

Keywords:	syllable dot; DOM tree; Tibetan; Web information extraction
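
The abstract does not give the details of the semantic pruning algorithm, so the following is only a rough sketch of the general idea of pruning a DOM tree to isolate topical text. It swaps in a simple link-density heuristic (the share of a subtree's text that sits inside `<a>` elements) in place of the paper's relevance measure. The class and function names and the 0.5 cutoff are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, and elements whose text is not content.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects
        self.text = ""      # text found directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (tags and text only, attributes ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def text_lengths(node, in_link=False):
    """Return (total_chars, link_chars) for the subtree rooted at node."""
    if node.tag in SKIP:
        return 0, 0
    in_link = in_link or node.tag == "a"
    total = len(node.text)
    link = total if in_link else 0
    for child in node.children:
        child_total, child_link = text_lengths(child, in_link)
        total += child_total
        link += child_link
    return total, link


def prune(node, max_link_density=0.5):
    """Drop subtrees that have no text or consist mostly of link text."""
    kept = []
    for child in node.children:
        total, link = text_lengths(child)
        if total == 0:
            continue
        # Anchors themselves are kept so inline links in content survive.
        if child.tag != "a" and link / total > max_link_density:
            continue
        prune(child, max_link_density)
        kept.append(child)
    node.children = kept


def extract_text(node):
    """Collect the remaining text: a node's own text first, then its children's."""
    parts = [node.text] if node.text else []
    for child in node.children:
        parts.extend(extract_text(child))
    return parts


def main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return " ".join(extract_text(builder.root))


page = """<html><body>
<div id="nav"><a href="/a">Home</a><a href="/b">News</a></div>
<div id="main"><p>Main article text about the topic.</p></div>
</body></html>"""
print(main_text(page))
```

In this toy page the navigation block is pure link text and is removed, while the article block is kept. A faithful implementation of the paper's method would instead score each block by its relevance to the page topic.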
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
