Application of Information Extraction Technique in LBS
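Cite this article: ZHANG Qing-jun, ZHU Cai-lian, HOU Lin-shan. Application of Information Extraction Technique in LBS[J]. Journal of Sichuan University (Engineering Science Edition), 2005, 37(1): 116-120.
Authors: ZHANG Qing-jun  ZHU Cai-lian  HOU Lin-shan
Affiliation: Institute of Geodesy and Geophysics, Chinese Academy of Sciences, Wuhan, Hubei 430077
Fund: Supported by the National Natural Science Foundation of China (40274058)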
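Abstract: The terminal devices of location-based service (LBS) systems have limited processing power and small screens, and wireless data networks lack bandwidth, so whole Web pages cannot be browsed on them. Information extraction solves this problem by pulling out only the information the user is interested in and sending it to the terminal, which makes information extraction an important application in LBS systems. This paper proposes a method, based on information extraction, for transforming HTML pages into WML pages. First, a small number of Web pages are labeled to form a set of sample instances, and an induction algorithm generates information extraction rules from them. Second, the extraction rules and pattern matching are applied to Web pages with a similar structure and style. Finally, the extraction results are converted into WML pages. A prototype system was developed, and extraction from real data sources verified the effectiveness of the method.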
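The abstract does not say which induction algorithm the authors use. As a minimal sketch, assuming the simplest left/right delimiter (LR) wrapper from the wrapper-induction literature, the first two steps (inducing a rule from labeled pages, then applying it by pattern matching to a similarly structured page) could look like this. The functions `induce_lr_rule` and `apply_rule` and the hotel-listing HTML are hypothetical illustrations, not the authors' system.

```python
import os.path


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_lr_rule(labeled_pages):
    """Induce a hypothetical LR rule (left, right delimiters) from labeled pages.

    labeled_pages: list of (html, values) pairs, where `values` are the
    strings a user marked for extraction, in page order.
    """
    befores, afters = [], []
    for html, values in labeled_pages:
        pos = 0
        for value in values:
            start = html.index(value, pos)  # raises ValueError if not found
            end = start + len(value)
            befores.append(html[:start])
            afters.append(html[end:])
            pos = end
    # Left delimiter: longest text that immediately precedes every labeled value.
    left = common_suffix(befores)
    # Right delimiter: longest text that immediately follows every labeled value.
    right = os.path.commonprefix(afters)
    if not left or not right:
        raise ValueError("no consistent delimiters; label more examples")
    return left, right


def apply_rule(html, rule):
    """Pattern-match a new page: return every string between the delimiters."""
    left, right = rule
    results, pos = [], 0
    while True:
        start = html.find(left, pos)
        if start < 0:
            break
        start += len(left)
        end = html.find(right, start)
        if end < 0:
            break
        results.append(html[start:end])
        pos = end + len(right)
    return results


if __name__ == "__main__":
    sample = "<ul><li><b>Hotel A</b> 12 Main St</li><li><b>Hotel B</b> 3 Park Rd</li></ul>"
    rule = induce_lr_rule([(sample, ["Hotel A", "Hotel B"])])
    new_page = "<ul><li><b>Hotel C</b> 7 Lake Ave</li><li><b>Hotel D</b> 9 Hill St</li></ul>"
    print(apply_rule(new_page, rule))  # ['Hotel C', 'Hotel D']
```

An LR rule only works when every target value sits between the same fixed strings; richer page layouts would need more expressive rules, which is why more labeled samples or a different rule class may be required in practice.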

Keywords: LBS  information extraction  pattern matching  page transformation
Article ID: 1009-3087(2005)01-0116-05

Application of Information Extraction Technique in LBS
ZHANG Qing-jun,ZHU Cai-lian,HOU Lin-shan.Application of Information Extraction Technique in LBS[J].Journal of Sichuan University (Engineering Science Edition),2005,37(1):116-120.
Authors:ZHANG Qing-jun  ZHU Cai-lian  HOU Lin-shan
Abstract: LBS terminal devices have hardware constraints such as limited processing power and small screens, and wireless networks offer low bandwidth, so these devices cannot display entire Web pages. Extracting the contents that interest the user from Web pages and sending only those contents to the terminal effectively solves this problem; information extraction is therefore an important application in LBS systems. A new approach to page transformation from HTML to WML based on information extraction is proposed. First, a set of training examples is generated from Web pages labeled with examples of the data to be extracted, and extraction rules are induced from these user-labeled training instances. Second, the extraction rules and pattern matching are used to extract information from other Web pages whose structure and style are similar to the training examples. Finally, the extracted contents are transformed into a WML page. A prototype system was implemented and tested on a set of Web pages, and the experimental results show that the method is effective.
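The final step, transforming extracted contents into WML, can be sketched as follows, assuming the extracted values are plain strings and targeting WML 1.1. The helpers `to_wml` and `wml_text` and the `per_card` page size are hypothetical; splitting the output into several linked cards is one way to respect the small screens described above, not necessarily what the authors' prototype does.

```python
from xml.sax.saxutils import escape

# WML 1.1 prologue; the deck itself is plain XML that a WAP browser renders.
WML_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
)


def wml_text(s):
    """Escape text for WML: XML entities, plus '$' doubled (WML variable syntax)."""
    return escape(s, {'"': "&quot;"}).replace("$", "$$")


def to_wml(title, items, per_card=5):
    """Render extracted strings as a WML deck with `per_card` items per card.

    Hypothetical helper: `per_card` is an illustrative default chosen for a
    small screen, not a value taken from the paper.
    """
    chunks = [items[i:i + per_card] for i in range(0, len(items), per_card)] or [[]]
    cards = []
    for n, chunk in enumerate(chunks):
        lines = [f'<card id="c{n}" title="{wml_text(title)}">', "<p>"]
        lines += [wml_text(item) + "<br/>" for item in chunk]
        if n + 1 < len(chunks):
            # Link to the next card so the user can page through the results.
            lines.append(f'<a href="#c{n + 1}">Next</a>')
        lines += ["</p>", "</card>"]
        cards.append("\n".join(lines))
    return WML_HEADER + "<wml>\n" + "\n".join(cards) + "\n</wml>\n"


if __name__ == "__main__":
    print(to_wml("Hotels", ["Hotel C, $80", "Hotel D, $95", "Hotel E & Spa"], per_card=2))
```

Escaping matters here because extracted Web text often contains `&`, `<`, or prices with `$`, any of which would otherwise break the WML deck on the device.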
Keywords: LBS  information extraction  pattern matching  page transformation
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.