1.

A Fast Scalable Classifier Tightly Integrated with RDBMS

Liu Hongyan, Lu Hongjun, Chen Jian. Journal of Computer Science and Technology (计算机科学技术学报), 2002, 17(2): 152-159

In this paper, we report our success in building efficient, scalable classifiers by exploring the capabilities of modern relational database management systems (RDBMS). In addition to high classification accuracy, the distinguishing features of the approach include high training speed, linear scalability, and simplicity of implementation. More importantly, the major computation required by the approach can be implemented using standard functions provided by modern relational DBMSs. Moreover, with an effective rule pruning strategy, the proposed algorithm can produce a compact set of classification rules. The results of experiments conducted for performance evaluation and analysis are presented.

2.

Managing very large document collections using semantics (total citations: 2; self-citations: 1; citations by others: 1)

Wang Guoren, Lu Hongjun, Yu Ge, Bao Yubin. Journal of Computer Science and Technology (计算机科学技术学报), 2003, 18(3): 0-0

3.

Data extraction from the web based on pre-defined schema (total citations: 8; self-citations: 1; citations by others: 7)

Meng Xiaofeng, Lu Hongjun, Wang Haiyan, Gu Mingzhe. Journal of Computer Science and Technology (计算机科学技术学报), 2002, 17(4): 0-0

With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available on the Web are in HTML, which was originally designed for document formatting with little consideration of content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under this approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes the HTML page as input and produces the required data as XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts the required data from input pages with high accuracy.

4.

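GFC: an integrated fuzzy clustering algorithm for switching regressions (total citations: 1; self-citations: 0; citations by others: 1)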

Wang Shitong, Jiang Haifeng, Lu Hongjun. Journal of Software (软件学报), 2002, 13(10): 1905-1914

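Several methods already exist for solving the switching regression problem. Building on the proposed gravitational clustering algorithm GC, which is based on Newton's law of gravitation, and combining it with fuzzy clustering, this paper further proposes a new integrated fuzzy clustering algorithm, GFC. Theoretical analysis shows that GFC converges to a local minimum. Experimental results show that GFC is more effective than the standard fuzzy clustering algorithm for switching regression problems, particularly in convergence speed.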

5.

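A scalable classification algorithm implemented with database techniques (total citations: 9; self-citations: 0; citations by others: 9)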

Liu Hongyan, Lu Hongjun, Chen Jian. Journal of Software (软件学报), 2002, 13(6): 1075-1081

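This paper focuses on efficient, scalable classification algorithms that tightly integrate classification techniques from data mining with database technology. It proposes a method for constructing a classifier based on grouping and counting, and uses the structured query language (SQL) of the database system to carry out the main computation. To improve execution efficiency, it also proposes optimization strategies and a strategy for pruning redundant rules, and it combines the discovery of classification rules with the selection of relevant attributes. With these methods and strategies, the classification algorithm can quickly discover a concise set of rules from large data sets. Besides accuracy comparable to that of existing classification algorithms and high execution efficiency, the algorithm scales well in both the number of training tuples and the number of attributes, and it is easy to implement.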
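
The abstract above does not give the algorithm itself, so the following is only a minimal sketch of the general idea of group-and-count rule discovery pushed into SQL. It is not the authors' method. The table name `train`, the attributes `outlook` and `windy`, the sample rows, and the `min_confidence` threshold are all hypothetical, and Python's built-in `sqlite3` module stands in for a full RDBMS.

```python
import sqlite3

# Hypothetical training tuples: (outlook, windy, label).
SAMPLE_ROWS = [
    ("sunny", "no", "play"),
    ("sunny", "no", "play"),
    ("sunny", "yes", "play"),
    ("rainy", "yes", "stay"),
    ("rainy", "yes", "stay"),
    ("rainy", "no", "play"),
]

# Fixed attribute list (an assumption of this sketch, not from the paper).
ATTRIBUTES = ("outlook", "windy")


def count_based_rules(rows, min_confidence=0.8):
    """Return single-attribute rules (attr, value, label, count, confidence).

    All counting is done by the database with GROUP BY / COUNT(*);
    Python only filters the aggregated rows by confidence.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (outlook TEXT, windy TEXT, label TEXT)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?)", rows)

    rules = []
    for attr in ATTRIBUTES:
        # attr comes from the fixed tuple above, never from user input,
        # so interpolating it into the SQL text is safe here.
        query = f"""
            SELECT g.{attr}, g.label, g.cnt, t.total
            FROM (SELECT {attr}, label, COUNT(*) AS cnt
                  FROM train GROUP BY {attr}, label) AS g
            JOIN (SELECT {attr}, COUNT(*) AS total
                  FROM train GROUP BY {attr}) AS t
              ON g.{attr} = t.{attr}
            ORDER BY g.{attr}, g.label
        """
        for value, label, cnt, total in conn.execute(query):
            confidence = cnt / total  # share of this value's tuples with this label
            if confidence >= min_confidence:
                rules.append((attr, value, label, cnt, confidence))
    conn.close()
    return rules


if __name__ == "__main__":
    for rule in count_based_rules(SAMPLE_ROWS):
        print(rule)
```

Because the aggregation runs inside the database engine, the Python side only ever sees one row per (attribute value, label) pair rather than the full training set, which is the property that makes this style of computation attractive for large tables. Multi-attribute rules, redundant-rule pruning, and attribute selection, all mentioned in the abstract, are left out of this sketch.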