3 results found; search took 234 ms
1.
Karane M. M. S., Panteleev A. V. Journal of Computer and Systems Sciences International, 2022, 61(5): 751-775
Journal of Computer and Systems Sciences International - The problem of finding the optimal control of bundles of trajectories of continuous deterministic systems with incomplete feedback is...
2.
Karane Vieira, André Luiz da Costa Carvalho, Klessius Berlt, Edleno S. de Moura, Altigran S. da Silva, Juliana Freire. World Wide Web, 2009, 12(2): 171-211
Templates are pieces of HTML code common to a set of web pages, usually adopted by content providers to enhance the uniformity of layout and navigation of their Web sites. They are usually generated using authoring/publishing tools or by programs that build HTML pages to publish content from a database. In spite of their usefulness, the content of templates can negatively affect the quality of results produced by systems that automatically process information available in web sites, such as search engines, clustering and automatic categorization programs. Further, the information available in templates is redundant, and thus processing and storing such information just once for a set of pages may save computational resources. In this paper, we present and evaluate methods for detecting templates considering a scenario where multiple templates can be found in a collection of Web pages. Most previous work has studied template detection algorithms in a scenario where the collection has just a single template. The scenario with multiple templates is more realistic and, as discussed here, it raises important questions that may require extensions and adjustments in previously proposed template detection algorithms. We show how to apply and evaluate two template detection algorithms in this scenario, creating solutions for detecting multiple templates. The methods studied partition the input collection into clusters that contain common HTML paths and share a high number of HTML nodes, and then apply a single-template detection procedure over each cluster. We also propose a new algorithm for single-template detection based on a restricted form of bottom-up tree mapping that requires only a small set of pages to correctly identify a template and which has worst-case linear complexity. Our experimental results over a representative set of Web pages show that our approach is efficient and scalable while obtaining accurate results.
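The two-stage idea in this abstract (cluster pages by shared HTML paths, then run single-template detection per cluster) can be illustrated with a minimal Python sketch. This is not the authors' method: the greedy Jaccard clustering, the `threshold` value, the plain set intersection used in place of their bottom-up tree mapping, and all names (`PathCollector`, `cluster_pages`, `template_of`) are assumptions made for illustration.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects root-to-node tag paths and the text found under each path."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()   # e.g. "html/body/div"
        self.texts = set()   # e.g. ("html/body/div/h1", "News Site")

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.add(("/".join(self.stack), text))


def parse(html):
    collector = PathCollector()
    collector.feed(html)
    return frozenset(collector.paths), frozenset(collector.texts)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_pages(pages, threshold=0.6):
    """Greedy clustering (an assumption): a page joins the first cluster
    whose first member has a similar enough set of HTML paths."""
    parsed = {name: parse(html) for name, html in pages.items()}
    clusters = []
    for name in pages:
        for cluster in clusters:
            if jaccard(parsed[name][0], parsed[cluster[0]][0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters, parsed


def template_of(cluster, parsed):
    """Naive single-template step: (path, text) pairs shared by every page."""
    return frozenset.intersection(*(parsed[name][1] for name in cluster))


pages = {
    "a1": "<html><body><div><h1>News Site</h1><p>Story one</p></div></body></html>",
    "a2": "<html><body><div><h1>News Site</h1><p>Story two</p></div></body></html>",
    "b1": "<html><body><table><tr><td>Shop</td><td>Item A</td></tr></table></body></html>",
    "b2": "<html><body><table><tr><td>Shop</td><td>Item B</td></tr></table></body></html>",
}
clusters, parsed = cluster_pages(pages)
templates = [template_of(c, parsed) for c in clusters]
```

On these four toy pages, the two layouts fall into separate clusters, and each cluster's "template" is the text that repeats on every page in it (the site heading or the shop label), while the per-page story and item text is left out.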
3.
Karane Vieira, Luciano Barbosa, Altigran Soares da Silva, Juliana Freire, Edleno Moura. World Wide Web, 2016, 19(3): 449-474
Focused crawlers are effective tools for applications requiring a high number of pages belonging to a specific topic. Several strategies for implementing these crawlers have been proposed in the literature, which aim to improve crawling efficiency by increasing the number of relevant pages retrieved while avoiding non-relevant pages. However, an important aspect of these crawlers has been largely overlooked: the selection of the seed pages that serve as the starting points for a crawl. In this paper, we show that the seeds can greatly influence the performance of crawlers, and propose a new framework for automatically finding seeds. We describe a system that implements this framework and show, through a detailed experimental evaluation, that crawlers provided with a large and varied seed set obtain not only higher harvest rates but also improved topic coverage.