1.

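李国静, 尹天阳, 张兴睿 《计算机应用与软件》 2021, 38(9): 167-172

Because traditional methods for detecting gambling websites suffer from poor timeliness and low accuracy, a gambling-website detection method based on the PAM probabilistic topic model is proposed. The text of a website and its associated pages is extracted, and different text content is assigned different weights with reference to the site's structural information. The PAM model is then used to mine topics from the page text and to analyze whether the text is highly likely to lean toward the "gambling" topic. Finally, the topic information of all extracted pages is combined to decide whether the site is a gambling website, thereby achieving effective detection. Experiments show that the method reaches an accuracy of 72.3% in gambling-website detection.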
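The final aggregation step described in this abstract can be sketched as a weighted average of per-page topic probabilities. This is only an illustration under stated assumptions: the abstract does not give the combination rule, so the weighted mean, the function names `site_topic_score` and `is_flagged`, and the 0.5 threshold are all hypothetical, and the per-page probabilities are assumed to come from an already trained topic model.

```python
def site_topic_score(pages):
    """Combine per-page topic probabilities into one site-level score.

    `pages` is a sequence of (weight, probability) pairs, where `weight`
    reflects how much a page's text counts (hypothetical weighting) and
    `probability` is the page's probability for the target topic.
    """
    pages = list(pages)
    total_weight = sum(weight for weight, _ in pages)
    if total_weight <= 0:
        raise ValueError("at least one page must have a positive weight")
    # Weighted mean: an assumed combination rule, not the paper's formula.
    return sum(weight * prob for weight, prob in pages) / total_weight


def is_flagged(pages, threshold=0.5):
    """Flag the site when the combined score reaches the assumed threshold."""
    return site_topic_score(pages) >= threshold
```

For example, `site_topic_score([(3.0, 0.9), (1.0, 0.2), (1.0, 0.1)])` gives the home page three times the influence of each linked page.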

2.

Template-Based Blog Information Extraction

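时达明, 林鸿飞, 赵晶 《计算机工程与应用》 2008, 44(9): 156-158

A blog can be described as an online personal journal. As an emerging medium, the blog has become a very popular way to express personal opinions and feelings on the Web. Extracting useful information from blogs quickly and accurately (topic posting time, topic title, topic content, comments, and so on) has therefore become an important step in blog applications. A template-based blog information extraction method is proposed: it analyzes the HTML source of a blog site, derives the site's template, and then extracts information from the site's blog pages according to that template. Templates were derived for 10 well-known Chinese blog sites, and experiments were run on 7,374 blog pages from these sites. The results show that the method can extract information from blog pages quickly and accurately using the derived templates.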

3.

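Research on Body-Text Extraction for Topic-Type Pages

万文兵 《计算机光盘软件与应用》 2015, (1): 15-16

Web pages usually contain a large amount of irrelevant structure and HTML markup in which the page's topical information is buried, which raises the question of how to obtain that information quickly. This paper proposes an extraction strategy: first determine whether the page is a topic-type page, then extract the page's body text, and finally use regular expressions to filter HTML tags and irrelevant text out of the content blocks. Experimental results show that the method accurately completes the body-text extraction task for topic-type pages.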
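The last step in this abstract, filtering HTML tags out of a content block with regular expressions, can be illustrated with a minimal sketch. The patterns and the helper name `strip_markup` are assumptions made for illustration; the abstract does not list the expressions the paper actually uses, and regex-based stripping is only adequate for reasonably well-formed blocks.

```python
import re
from html import unescape

# Hypothetical patterns; the paper's own expressions are not given.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def strip_markup(block: str) -> str:
    """Return the visible text of an HTML content block."""
    text = _SCRIPT_STYLE.sub(" ", block)   # drop scripts and styles with their bodies
    text = _COMMENT.sub(" ", text)         # drop HTML comments
    text = _TAG.sub(" ", text)             # drop any remaining tags
    text = unescape(text)                  # decode entities such as &amp;
    return _WHITESPACE.sub(" ", text).strip()
```

Scripts, styles, and comments are removed before the generic tag pattern so that their contents do not leak into the extracted text.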

4.

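Web Review Information Extraction Based on Weighted Frequent Subtree Similarity

郝志峰, 袁琴, 蔡瑞初, 温雯, 骆魁永 《计算机应用研究》 2017, 34(6)

The rapidly accumulating mass of product reviews is an important basis for merchants and consumers when researching demand or making purchasing decisions. Existing web information extraction methods commonly require substantial manual effort and have low extraction accuracy. To address these problems, WTS, a web review extraction method based on weighted frequent subtree similarity, is proposed. First, the web page is pruned using visual features. Next, the best frequent subtree is extracted using a depth-weighted similarity measure. Finally, review paths are extracted by subtree alignment and the review content is parsed. Review extraction experiments on sites such as JD.com (京东) and Suning (苏宁) show that WTS has some advantage over methods such as D-EEM and POL in extracting product review information.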

5.

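Online Web News Content Extraction Based on Tag Path Feature Fusion

吴共庆, 胡骏, 李莉, 徐喆昊, 刘鹏程, 胡学钢, 吴信东 《软件学报》 2016, 27(3): 714-735

Accurately extracting the content of news web pages is one of the key technologies for improving the quality of applications such as Web news analysis. Because there is no standard for Web news publishing, a large number of different publishing formats exist, and the Web itself is a highly heterogeneous big-data carrier, Web news content extraction remains an open problem. Analysis of a large number of examples shows a latent association between the content of a news page and the tag paths on that page. A family of tag path features is therefore designed to distinguish page content from noise from different perspectives. Building on feature similarity analysis, a feature fusion strategy based on combined feature selection is proposed, and CEPF, a Web news content extraction method based on the fused features, is designed. CEPF is a fast, general, training-free online algorithm for Web news content extraction that can handle news pages from many sources, in many styles, and in many languages. Experiments on test data sets such as CleanEval show that CEPF outperforms extraction methods such as CETR.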

6.

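Body-Text Extraction Based on the Path Similarity of Web Page DOM Tree Nodes

《微型机与应用》 2016, (19): 74-77

Because manual extraction of web page information is inefficient and costly, a body-text extraction method based on the path similarity of nodes in the web page's Document Object Model (DOM) tree is proposed, drawing on observations of the structure of a large number of web pages. The method first removes page noise, exploiting the fact that pages on the same website share the same structure, to obtain the page's topical content; it then extracts the body text using the similarity of the paths of body-text nodes in the DOM tree. In experiments on 1,000 pages from different types of Chinese news websites, the method removed most of the noise while keeping the body text intact for 97.6% of the pages, and the extraction results reached a precision of 93.30% and a recall of 95.59%. The proposed algorithm adapts well to different types of web pages.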
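The idea of grouping text nodes by the similarity of their DOM paths can be sketched with the Python standard library. This is a simplified illustration, not the paper's algorithm: the similarity measure (shared prefix length divided by the longer path length), the choice of the reference path (the path carrying the most text), the 0.8 threshold, and all names are assumptions, and the site-level noise removal step from the abstract is omitted.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TextPathCollector(HTMLParser):
    """Record each non-empty text node with the tag path leading to it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, root first
        self.items = []   # (path tuple, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def path_similarity(a, b):
    """Shared-prefix length over the longer path length (assumed measure)."""
    if not a or not b:
        return 0.0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def extract_body(html, threshold=0.8):
    """Keep text nodes whose path is similar to the most text-heavy path."""
    collector = TextPathCollector()
    collector.feed(html)
    collector.close()
    if not collector.items:
        return []
    totals = {}
    for path, text in collector.items:
        totals[path] = totals.get(path, 0) + len(text)
    reference = max(totals, key=totals.get)
    return [text for path, text in collector.items
            if path_similarity(path, reference) >= threshold]
```

Paths here ignore attributes such as `id` and `class`, so two `div` elements are indistinguishable; a closer approximation of a real page would include those in the path components.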

7.

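Adaptive Multi-Block Web Information Extraction Based on the DOM Tree

杨文超, 乔鸿 《网络安全技术与应用》 2012, (11): 62-64

Web information extraction usually relies on inductive learning, in which extraction rules are induced from designated template pages. Although this approach extracts information accurately, the rules must be re-derived whenever the site's template changes, so such extractors are costly to maintain and adapt poorly. To address this problem, this paper proposes an adaptive, DOM-tree-based method for extracting information from pages with multiple information blocks. The method first parses the web page into a DOM tree with NekoHtml and then identifies the information blocks that contain the key phrases, thereby accomplishing the extraction. Experiments on a large number of websites show that the method is applicable to different sites and can extract information from web pages containing multiple information blocks.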

8.

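An Automatic Data Extraction Method for Template-Generated Web Pages (Cited by: 5 total; self-citations: 0; citations by others: 5)

杨少华, 林海略, 韩燕波 《软件学报》 2008, 19(2): 209-223

Many pages on the Web today are generated dynamically: in response to a request, the site selects data from a back-end database and embeds it in a common template, as with product description pages on e-commerce sites. This paper studies how to detect the underlying template from such template-generated pages and automatically extract the embedded data (for example, product names and prices). A formal description of the template detection problem is given, and the structural characteristics of template-generated pages are analyzed in depth. A novel template detection method is proposed, and the detected template is used to extract data automatically from instance pages. Unlike other existing methods, it applies to both "list pages" and "detail pages". Experiments on two third-party test sets show that the method achieves high extraction accuracy.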

9.

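Automatic Topic Classification of Shopping Websites Based on Topic Concepts

陈洪平, 方巍, 黄黎, 崔志明 《计算机应用与软件》 2010, 27(9)

Classification methods based on traditional keyword statistics have difficulty identifying a page's topic correctly, which makes topic-based classification hard to achieve. To classify structured data sources on the Web by topic effectively, a concept-based topic classification method that incorporates semantic knowledge is applied to the automatic topic classification of online shopping website data sources. Experiments show that the method improves the accuracy of topic classification.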

10.

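Research on Methods for Extracting the Topical Content of News Web Pages

罗永莲, 秦振吉 《微计算机应用》 2007, 28(5): 556-560

The semi-structured nature of web pages and the inherent characteristics of news make selective extraction of page content possible. Building on earlier research, we mine the structural features of Web pages, make full use of HTML tags and news characteristics, and, focusing on how page editors style text, propose a topical content extraction method based on web page content segmentation. Experimental results show that the method effectively extracts the individual elements of a news item, with a measured extraction accuracy above 96%.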

11.

Retracted: A novel viewpoint on information and interface design for auction web sites

Hui-Ming Kuo, Cheng-Wu Chen 《Human Factors and Ergonomics in Manufacturing & Service Industries》 2012, 22(4): 287-295

For the reason of convenience, auction sites are one of the most popular and quickly developing of Internet shopping sites. There have been many problems, however, arising in relation to online auction commerce; these problems have been dealt with one by one. In addition to the important issue of security, some customers have faced problems such as complicated processes, insufficient product information, and bad interface design. Based on the consumer behavior model that we have built on auction sites, the purpose of this study is to evaluate the information and interface design for current domestic and foreign auction sites. The results indicated that some auction Web sites provided insufficient information and inconvenient interface design during some shopping steps. Foreign auction Web sites provided sufficient information and a more convenient interface design than did the domestic auction Web sites. The results, hopefully, can be used as the reference for online auction Web sites to provide customers more convenient shopping environments. © 2011 Wiley Periodicals, Inc.

12.

Effects of organizational scheme and labeling on task performance in product-centered and user-centered retail Web sites

Resnick ML, Sanchez J 《Human Factors》 2004, 46(1): 104-117

As companies increase the quantity of information they provide through their Web sites, it is critical that content is structured with an appropriate architecture. However, resource constraints often limit the ability of companies to apply all Web design principles completely. This study quantifies the effects of two major information architecture principles in a controlled study that isolates the incremental effects of organizational scheme and labeling on user performance and satisfaction. Sixty participants with a wide range of Internet and on-line shopping experience were recruited to complete a series of shopping tasks on a prototype retail shopping Web site. User-centered labels provided a significant benefit in performance and satisfaction over labels obtained through company-centered methods. User-centered organization did not result in improved performance except when the label quality was poor. Significant interactions suggest specific guidelines for allocating resources in Web site design. Applications of this research include the design of Web sites for any commercial application, particularly E-commerce.

13.

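A Shopping Information Extraction Method Based on Rapid Template Construction

李萍, 朱建波, 周立新, 廖彬 《计算机应用》 2014, 34(3): 733-737

For template-generated shopping information pages, which carry a large amount of information and have complex structure, a method is proposed that extracts shopping information from template pages without using complex learning rules. The work defines web page templates and information extraction templates for web pages, designs a template language for building templates quickly, and proposes a model for extracting content based on that template language. Experimental results show that on a standard test set of 450 web pages, the recall of the proposed method is 12% higher than that of the extraction algorithm EXALG. On a test set of 250 web pages, recall is 7.4% higher than that of ViNTs (a wrapper generator based on visual information and tag structure) and 0.2% higher than that of ViPER (which augments automatic information extraction with visual perception), and precision is 5.2% and 0.2% higher than that of ViNTs and ViPER, respectively. According to the authors, the method substantially improves both recall and precision, which in turn considerably improves the accuracy and information recall of web page analysis in shopping information retrieval and price comparison systems.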

14.

Agent based intelligent search framework for product information using ontology mapping (Cited by: 2 total; self-citations: 0; citations by others: 2)

Wooju Kim, Dae Woo Choi, Sangun Park 《Journal of Intelligent Information Systems》 2008, 30(3): 227-247

The Semantic Web and Web services provide many opportunities in various applications such as product search and comparison in electronic commerce. We implemented an intelligent meta-search and recommendation system for products through consideration of multiple attributes by using ontology mapping and Web services. Under the assumption that each shopping site offers product ontology and product search service with Web services, we proposed a meta-search framework to configure a customer’s search intent, make and dispatch proper queries to each shopping site, evaluate search results from shopping sites, and show the customer the relevant product list with associated rankings. Ontology mapping is used for generating proper queries for shopping sites that have different product categories. We also implemented our framework and performed empirical evaluation of our approach with two leading shopping sites in the world.

15.

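Information Extraction Techniques for Online Forums (Cited by: 5 total; self-citations: 0; citations by others: 5)

奚伟鹏, 李昕, 蒋凯, 武港山 《计算机工程》 2005, 31(4): 66-68

Based on an analysis of the information organization patterns and link structure inside online forums, a framework for extracting semantic topic threads from online forums is proposed and its implementation is described. A complete specification of extraction rules is defined for information extraction, and the framework provides a visual tool for user-customized rules and a back-end engine that automatically downloads and extracts semantic information units from forum sites.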

16.

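A Topic-Based Web Text Clustering Method

张万山, 肖瑶, 梁俊杰, 余敦辉 《计算机应用》 2014, 34(11): 3144-3146

Traditional Web text clustering algorithms do not consider the topic information of Web texts, which leads to low accuracy when clustering multi-topic Web texts. To address this problem, a topic-based Web text clustering method is proposed. The method clusters multi-topic Web texts in three steps: topic extraction, feature extraction, and text clustering. Compared with traditional Web text clustering algorithms, the proposed method takes full account of the topic information of Web texts. Experimental results show that, for multi-topic Web text clustering, the proposed method achieves higher accuracy than the K-means-based text clustering method and the HowNet (《知网》)-based text clustering method.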

17.

What users want in e-commerce design: effects of age, education and income

《Ergonomics》 2012, 55(1-3): 153-168

Preferences for certain characteristics of an online shopping experience may be related to demographic data. This paper discusses the characteristics of that experience, demographic data and preferences by demographic group. The results of an online survey of 488 individuals in the United States indicate that respondents are generally satisfied with their online shopping experiences, with security, information quality and information quantity ranking first in importance overall. The sensory impact of a site ranked last overall of the seven characteristics measured. Preferences for these characteristics in e-commerce sites were differentiated by age, education and income. The sensory impact of sites became less important as respondents increased in age, income or education. As the income of respondents increased, the importance of the reputation of the vendor rose. Web site designers may incorporate these findings into the design of e-commerce sites in an attempt to increase the shopping satisfaction of their users. Results from the customer relationship management portion of the survey suggest that current push technologies and site personalization are not an effective means of achieving user satisfaction.

18.

What users want in e-commerce design: effects of age, education and income

Lightner NJ 《Ergonomics》 2003, 46(1-3): 153-168

Preferences for certain characteristics of an online shopping experience may be related to demographic data. This paper discusses the characteristics of that experience, demographic data and preferences by demographic group. The results of an online survey of 488 individuals in the United States indicate that respondents are generally satisfied with their online shopping experiences, with security, information quality and information quantity ranking first in importance overall. The sensory impact of a site ranked last overall of the seven characteristics measured. Preferences for these characteristics in e-commerce sites were differentiated by age, education and income. The sensory impact of sites became less important as respondents increased in age, income or education. As the income of respondents increased, the importance of the reputation of the vendor rose. Web site designers may incorporate these findings into the design of e-commerce sites in an attempt to increase the shopping satisfaction of their users. Results from the customer relationship management portion of the survey suggest that current push technologies and site personalization are not an effective means of achieving user satisfaction.

19.

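A Method for Computationally Refining Web Page Topic Extraction Results

李剑, 金蓓弘 《小型微型计算机系统》 2004, 25(3): 347-351

Traditional topic extraction methods obtain a page's topic automatically by analyzing only the page's content, and their results are not very precise. On the WWW, pages are connected to one another by hyperlinks, and closely linked pages tend to belong to the same topic. Based on this idea, this paper proposes a method that uses Web link structure information to refine topic extraction results, correcting a page's topic weights according to the influence of the pages it links to. The method's characteristics are also analyzed through a practical application example.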
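The refinement idea in this abstract, adjusting a page's topic weights using the pages it links to, can be sketched as a simple blend of a page's own weights with the average weights of its linked pages. The abstract does not state the update rule, so the linear blend, the mixing factor `alpha`, and the name `refine_topic_weights` are assumptions chosen only to illustrate the principle.

```python
def refine_topic_weights(weights, links, alpha=0.3):
    """Blend each page's topic weights with those of the pages it links to.

    `weights` maps page -> {topic: weight} from content-only extraction.
    `links` maps page -> iterable of linked pages.
    `alpha` is the assumed share of influence given to linked pages.
    """
    refined = {}
    for page, own in weights.items():
        neighbours = [weights[n] for n in links.get(page, ()) if n in weights]
        if not neighbours:
            refined[page] = dict(own)   # no link evidence: keep weights as-is
            continue
        topics = set(own)
        for nb in neighbours:
            topics.update(nb)
        refined[page] = {
            topic: (1 - alpha) * own.get(topic, 0.0)
            + alpha * sum(nb.get(topic, 0.0) for nb in neighbours) / len(neighbours)
            for topic in topics
        }
    return refined
```

A single pass is shown; whether the original method iterates or weights links differently is not stated in the abstract.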