5 results found (search time: 15 ms)
1.
In this paper, we report our success in building efficient, scalable classifiers by exploring the capabilities of modern relational database management systems (RDBMS). In addition to high classification accuracy, the unique features of the approach include its high training speed, linear scalability, and simplicity of implementation. More importantly, the major computation required in the approach can be implemented using standard functions provided by modern relational DBMSs. Moreover, with an effective rule pruning strategy, the algorithm proposed in this paper can produce a compact set of classification rules. The results of experiments conducted for performance evaluation and analysis are presented.
2.
3.
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available on the Web are in HTML, which was originally designed for document formatting with little consideration of content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under the approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes the HTML page as input and produces the required data as XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts the required data from input pages with high accuracy.
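As a rough illustration of what a generated wrapper might do, the sketch below applies hand-written mapping rules to an HTML page and emits XML. It is not the system described in the abstract: the rule format (schema element name mapped to a regular expression), the `RULES` table, the `wrap` function, and the sample page are all hypothetical, and the rule-induction step is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: schema element name -> regex with one capture group.
# In the approach described above these would be induced from sample mappings;
# here they are written by hand for illustration.
RULES = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "price": r'<span class="price">(.*?)</span>',
}

def wrap(html, rules, root_name="record"):
    """Apply each mapping rule to the page and build an XML record."""
    root = ET.Element(root_name)
    for name, pattern in rules.items():
        match = re.search(pattern, html, flags=re.S)
        if match:  # elements with no match are simply omitted
            ET.SubElement(root, name).text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")

page = '<html><body><h1>Blue Mug</h1><span class="price">4.50</span></body></html>'
xml_out = wrap(page, RULES)
print(xml_out)
```

Regular expressions are used only to keep the sketch short; a real wrapper would more likely address elements by their position in the parsed HTML tree.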
4.
5.
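A Scalable Classification Algorithm Implemented Using Database Technology   Total citations: 9, self-citations: 0, citations by others: 9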
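This work focuses on efficient, scalable classification algorithms that tightly integrate data mining classification techniques with database technology. It proposes a method for constructing classifiers based on a group-counting technique, using the database system's Structured Query Language (SQL) to carry out the main computation. To improve execution efficiency, it also proposes optimization strategies and a pruning strategy for redundant rules, and it combines the discovery of classification rules with the selection of relevant attributes. With these methods and strategies, the classification algorithm can quickly discover a compact set of rules from large data sets. Besides accuracy comparable to existing classification algorithms and high execution efficiency, the algorithm scales well with both the number of training tuples and the number of attributes, and it is easy to implement.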
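The group-counting idea can be sketched with an ordinary SQL `GROUP BY` query. The example below is a minimal illustration, not the algorithm from the paper: the `train` table, its columns, the helper names `class_counts` and `rules_from_counts`, and the confidence threshold are all assumptions, and SQLite stands in for the RDBMS.

```python
import sqlite3

def class_counts(conn, table, attribute, label):
    """Count training tuples per (attribute value, class label) using GROUP BY."""
    # Identifiers are interpolated directly; this sketch assumes trusted names.
    query = (
        f"SELECT {attribute}, {label}, COUNT(*) "
        f"FROM {table} GROUP BY {attribute}, {label}"
    )
    counts = {}
    for value, cls, n in conn.execute(query):
        counts.setdefault(value, {})[cls] = n
    return counts

def rules_from_counts(counts, min_confidence=0.8):
    """Keep 'value -> majority class' rules whose confidence meets the threshold."""
    rules = []
    for value, per_class in counts.items():
        total = sum(per_class.values())
        cls, n = max(per_class.items(), key=lambda kv: kv[1])
        if n / total >= min_confidence:
            rules.append((value, cls, n / total))
    return rules

# Hypothetical toy training set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (outlook TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?)",
    [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
     ("rain", "yes"), ("rain", "no"), ("overcast", "yes")],
)
counts = class_counts(conn, "train", "outlook", "play")
rules = rules_from_counts(counts)
```

The counting is left to the database, so only the aggregated counts, not the training tuples themselves, reach the client code. The threshold filter here is a simple stand-in for rule selection and does not reproduce the pruning strategy mentioned in the abstract.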