Similar articles
 19 similar articles found (search time: 93 ms)
1.
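Principle and Design of a Yahoo-Based Automatic Information Classifier   Citations: 2 total, 0 self, 2 by others
This paper introduces an automatic classifier based on the Yahoo hierarchy, which applies machine learning techniques for text data to Yahoo's hierarchical structure. It discusses document representation, feature selection, and learning methods in the document classification process, together with the related algorithms.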

2.
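A Vector Projection Method for Designing Decision-Tree Support Vector Machine Multiclass Classifiers   Citations: 2 total, 1 self, 1 by others
To address the key problem of how to design the hierarchical structure of a decision-tree support vector machine (SVM) multiclass classifier effectively, this paper proposes an inter-class separability measure based on vector projection and gives a method for designing the hierarchy of a decision-tree SVM multiclass classifier based on that measure. To speed up the training of each SVM sub-classifier while preserving its high generalization ability, a vector-projection-based method for pre-selecting support vectors is applied when training each sub-classifier. Simulation experiments on three large-scale data sets and on handwritten digit recognition show that the new method effectively improves the classification accuracy and speed of decision-tree SVM multiclass classifiers.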

3.
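A New Method for Designing Hierarchical Support Vector Machine Multiclass Classifiers*   Citations: 15 total, 2 self, 13 by others
Designing the hierarchical structure is the key problem in applying hierarchical support vector machine multiclass classification, and inter-class separability is an important basis for that design. This paper proposes a method for measuring the similarity between classes based on linear support vector machines, and gives a new method for designing hierarchical support vector machine multiclass classifiers based on inter-class separability. Experiments show that the new method effectively improves the classification accuracy and speed of hierarchical support vector machine multiclass classifiers.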

4.

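To address the key problem of how to design the hierarchical structure of a decision-tree support vector machine (SVM) multiclass classifier effectively, this paper proposes an inter-class separability measure based on vector projection and gives a method for designing the hierarchy of a decision-tree SVM multiclass classifier based on that measure. To speed up the training of each SVM sub-classifier while preserving its high generalization ability, a vector-projection-based method for pre-selecting support vectors is applied when training each sub-classifier. Simulation experiments on three large-scale data sets and on handwritten digit recognition show that the new method effectively improves the classification accuracy and speed of decision-tree SVM multiclass classifiers.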

5.
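The lattice closeness degree is used to measure how close fuzzy sets are to one another, and a method for designing the hierarchical structure of an SVM decision tree based on the lattice closeness degree is presented, addressing the classification of multiclass fuzzy samples. Experimental results show that the multiclass classifier obtained with this hierarchy design method classifies multiclass fuzzy samples well.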

6.
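Language style is an important topic assessed in reading comprehension in the Gaokao (China's national college entrance examination), but different question types require different levels of classification. This paper casts language style appreciation as a hierarchical classification problem. Guided by class labels, a graph partitioning algorithm is used to obtain the original clusters corresponding to specific classes. Based on the original clusters, hierarchical clustering is used to obtain the hierarchy of language style classes, and an SVM hierarchical classifier is then trained using that hierarchy. When answering a language style appreciation question, the required level of classification is determined from the question stem, the SVM hierarchical classifier determines the language style of the reading material, and the answer is finally generated with the help of a knowledge base. Experimental results show that the hierarchy-based language style discrimination method can provide technical support for answering appreciation questions in the Gaokao.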

7.
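Unlike previous hierarchical classification, this paper uses a hierarchy that is essentially a graph and applies it to the flat classification problem, thereby improving the precision and recall of flat classification. In an ordinary class hierarchy, the confusion relationship between sibling classes under the same parent is symmetric, but in fact confusion between classes is not symmetric. Starting from the classifier's confusion matrix, the paper introduces the concept of confusion classes. The class hierarchy built from confusion classes considers the relationships between classes in terms of precision and recall, and expresses the asymmetry of the confusion relationship. Experimental results show that building the class hierarchy from each class's confusion classes improves classification accuracy at both the macro and the micro level.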
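
The asymmetry described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: the function name `confusion_classes`, the cutoff `min_rate`, and the toy matrix are all assumptions made for illustration. It only shows how reading a confusion matrix row by row yields a confusion relation that need not be symmetric.

```python
from typing import Dict, List, Sequence


def confusion_classes(
    matrix: Sequence[Sequence[int]],
    labels: Sequence[str],
    min_rate: float = 0.1,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, List[str]]:
    """For each true class, list the classes it is mistaken for.

    matrix[i][j] counts documents of true class i predicted as class j.
    Class j is reported as a confusion class of i when the share of
    class-i documents predicted as j is at least min_rate.
    """
    result: Dict[str, List[str]] = {}
    for i, row in enumerate(matrix):
        total = sum(row)
        confused = []
        for j, count in enumerate(row):
            if i != j and total > 0 and count / total >= min_rate:
                confused.append(labels[j])
        result[labels[i]] = confused
    return result


# Toy confusion matrix with made-up numbers (rows: true class, columns: predicted).
LABELS = ["sports", "health", "finance"]
MATRIX = [
    [80, 18, 2],  # sports is often predicted as health
    [3, 95, 2],   # health is rarely predicted as sports
    [1, 4, 95],
]

CONFUSION = confusion_classes(MATRIX, LABELS)
```

With these numbers, "health" is a confusion class of "sports" but not the other way round, which is the kind of asymmetry a tree of sibling classes cannot express.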

8.
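A Subspace-Ensemble Classification Algorithm for Data Streams with Concept Drift   Citations: 4 total, 2 self, 2 by others
Classifying data streams that have complex structure and concept drift has become one of the active research topics in data mining. This paper proposes a novel subspace classification algorithm and uses a hierarchical structure to combine it into an ensemble classifier for data streams with concept drift. After the data stream is divided into blocks, several low-level classifiers are built on each block with the subspace classification algorithm, and these low-level classifiers together form a base classifier of the ensemble classification model. In addition, parameter estimation from mathematical statistics is used to detect concept drift and adjust the model dynamically. Experimental results show that the subspace ensemble algorithm not only improves classification accuracy on data streams with complex class structure but also adapts quickly to concept drift.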

9.
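To improve the performance of semantic image classifiers, this paper proposes a hierarchical association rule classifier for semantic images based on axiomatic fuzzy sets. First, to improve accuracy, after features are extracted from the image data set, axiomatic fuzzy set (AFS) theory is used to build an AFS attribute representation of the fuzzy concepts of the image set, which makes its attributes easier to distinguish. Second, to improve computational efficiency, hierarchical association rules are used to build the semantic image classifier, exploiting the ontology information between concepts to improve parallel classification. Finally, experiments on the algorithm's parameters and comparisons with other methods show that the proposed algorithm achieves high accuracy and computational efficiency.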

10.
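This paper discusses the classifier selection problem in multiple classifier combination and proposes a classifier selection algorithm based on a genetic algorithm, which can quickly select effective classifiers to take part in the combination. Selection algorithms are given for two cases: a specified number of classifiers and an arbitrary number of classifiers. Finally, the algorithm and conclusions are verified on the CENPARMI handwritten digit database. Experimental results show that this classifier selection algorithm performs well.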

11.
Most of the research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). However, in many information repositories documents are organized in a hierarchy of categories to support a thematic search by browsing topics of interest. Taking the hierarchical relationship among categories into account raises several additional issues in the development of methods for automated document classification. These concern the representation of documents, the learning process, the classification process and the evaluation criteria for experimental results. They are systematically investigated in this paper, whose main contribution is a general hierarchical text categorization framework in which the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework. It can be applied to any classifier that returns a degree of membership of a document in a category. In this work three learning methods are considered for the construction of document classifiers, namely centroid-based, naïve Bayes and SVM. The proposed framework has been implemented in the system WebClassIII and has been tested on three datasets (Yahoo, DMOZ, RCV1) which present a variety of situations in terms of hierarchical structure. Experimental results are reported and several conclusions are drawn on the comparison of the flat vs. the hierarchical approach as well as on the comparison of different hierarchical classifiers. The paper concludes with a review of related work and a discussion of previous findings vs. our findings.
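
To make the role of per-category thresholds in hierarchical classification concrete, here is a minimal top-down sketch. It is not WebClassIII and does not reproduce the paper's threshold determination method: the category tree, the keyword-overlap scorer `overlap_score`, and the fixed thresholds of 0.3 are all hypothetical. Any scorer that returns a degree of membership could be passed in instead.

```python
from typing import Callable, Dict, List

# Hypothetical category tree (parent -> children); leaves map to empty lists.
TREE: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": [],
    "physics": [],
    "biology": [],
}

# Hypothetical keyword sets standing in for trained per-category models.
KEYWORDS = {
    "science": {"cell", "quantum", "energy", "gene"},
    "sports": {"match", "goal", "team"},
    "physics": {"quantum", "energy"},
    "biology": {"cell", "gene"},
}


def overlap_score(doc: str, category: str) -> float:
    """Toy membership score: share of document tokens that are category keywords."""
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in KEYWORDS[category])
    return hits / len(tokens)


def classify_top_down(
    doc: str,
    tree: Dict[str, List[str]],
    score: Callable[[str, str], float],
    thresholds: Dict[str, float],
    root: str = "root",
) -> List[str]:
    """Descend from the root, following the best child while it clears its threshold."""
    path: List[str] = []
    node = root
    while tree[node]:
        best = max(tree[node], key=lambda child: score(doc, child))
        if score(doc, best) < thresholds[best]:
            break  # the document stays at the current (internal) category
        path.append(best)
        node = best
    return path


# Fixed thresholds are an assumption; the paper determines them automatically.
THRESHOLDS = {category: 0.3 for category in KEYWORDS}

PATH_LEAF = classify_top_down("quantum energy of a cell", TREE, overlap_score, THRESHOLDS)
PATH_INTERNAL = classify_top_down("energy gene policy report review", TREE, overlap_score, THRESHOLDS)
PATH_NONE = classify_top_down("stock prices fell", TREE, overlap_score, THRESHOLDS)
```

The three calls show the possible outcomes: a document that reaches a leaf, one that stops at an internal category because no child clears its threshold, and one that is rejected at the root.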

12.
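Research on a Dynamic Hierarchy Method in Hierarchical Reinforcement Learning   Citations: 1 total, 0 self, 1 by others
Existing automatic hierarchy methods in hierarchical reinforcement learning all generate the hierarchy in one pass after exploring the state space to some extent. Insufficient exploration cannot guarantee solution quality, while excessive exploration slows learning. To overcome the strong dependence of learning performance on how thoroughly the state space has been explored, this paper proposes a dynamic hierarchy method. The method incorporates immune clustering and a secondary response mechanism into the Option hierarchical reinforcement learning framework proposed by Sutton, adjusts the Option state space dynamically, and generates the internal policies of Options dynamically along the learning trajectory. Simulation experiments were carried out with shortest-path planning between two points in a two-dimensional grid world with obstacles as the learning task. The results show that the dynamic hierarchy method depends very little on the extent of state space exploration and is better suited to large-scale reinforcement learning problems.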

13.
探讨了Elearning及本体,提出了基于本体的Elearning系统层次结构模型,并重点研究了本体在其中的应用:用于描述学习材料语义的内容本体,用于定义学习材料上下文的上下文本体以及用于在学习课程中组织学习材料的结构本体。  相似文献   

14.
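Research on the Application of Ontology in E-learning Systems   Citations: 5 total, 0 self, 5 by others
This paper discusses E-learning and ontologies, proposes an ontology-based hierarchical model of an E-learning system, and focuses on how ontologies are applied in it: a content ontology for describing the semantics of learning materials, a context ontology for defining the context of learning materials, and a structure ontology for organizing learning materials within a course.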

15.
Fang X, Rau PL. Ergonomics, 2003, 46(1-3): 242-254
Two experiments were carried out to examine the effects of cultural differences between Chinese and US users on the perceived usability and search performance of World Wide Web (WWW) portal sites. Chinese users in Taiwan and US users in Chicago were recruited to perform search tasks on two versions of the Yahoo! portal site: the standard Yahoo! and Yahoo! Chinese. The layout of Yahoo! Chinese is the same as that of Yahoo!, and the categories on Yahoo! Chinese have been translated from its US counterpart. A special browser was programmed to record all keystroke data, and participants were asked to fill out a satisfaction questionnaire after finishing the tasks. Significant differences in satisfaction and in the number of steps needed to perform some tasks were found between the two groups. The experimental results also provided more detailed insights into the cultural differences between Chinese and US users.

16.
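Multi-label data contain many redundant features and much redundant data. To deal with redundant and irrelevant features in multi-label data and improve the generalization ability of multi-label learning algorithms, this paper proposes a convolution-style feature selection method based on simulated annealing, SAML (simulated annealing based feature selection for multi-label data). Existing algorithms only use genetic algorithms for optimization, whereas the new algorithm uses simulated annealing to search for the optimal subset, an approach that earlier work has shown to perform better than genetic algorithms. Experimental results on the publicly used Yahoo web page classification data sets show that SAML outperforms several recently proposed, popular multi-label feature selection methods.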
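
The general idea of searching for a feature subset with simulated annealing can be sketched as follows. This is not the SAML algorithm: the function `anneal_feature_subset`, the cooling schedule, and the toy objective (made-up relevance values minus a per-feature cost) are assumptions for illustration. In practice the objective would be some evaluation of a multi-label learner on the selected features.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple


def anneal_feature_subset(
    n_features: int,
    objective: Callable[[FrozenSet[int]], float],
    steps: int = 500,
    start_temp: float = 1.0,
    cooling: float = 0.99,  # hypothetical geometric cooling rate
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Maximize objective over feature subsets with simulated annealing."""
    rng = random.Random(seed)
    current: FrozenSet[int] = frozenset()
    current_score = objective(current)
    best, best_score = current, current_score
    temp = start_temp
    for _ in range(steps):
        # Neighbor: flip the membership of one randomly chosen feature.
        flip = rng.randrange(n_features)
        candidate = current.symmetric_difference({flip})
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp = max(temp * cooling, 1e-9)
    return best, best_score


# Made-up relevance values; each selected feature also costs 0.2.
RELEVANCE = [0.9, 0.1, 0.8, 0.05]


def toy_objective(subset: FrozenSet[int]) -> float:
    """Stand-in for a real evaluation of a learner on the selected features."""
    return sum(RELEVANCE[i] for i in sorted(subset)) - 0.2 * len(subset)


BEST_SUBSET, BEST_SCORE = anneal_feature_subset(len(RELEVANCE), toy_objective)
```

Accepting occasional worse moves while the temperature is high is what lets the search leave local optima. The function keeps the best subset seen so far, so the returned score is never below that of the empty starting subset.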

17.
Ergonomics, 2012, 55(1-3): 242-254
Two experiments were carried out to examine the effects of cultural differences between Chinese and US users on the perceived usability and search performance of World Wide Web (WWW) portal sites. Chinese users in Taiwan and US users in Chicago were recruited to perform search tasks on two versions of the Yahoo! portal site: the standard Yahoo! and Yahoo! Chinese. The layout of Yahoo! Chinese is the same as that of Yahoo!, and the categories on Yahoo! Chinese have been translated from its US counterpart. A special browser was programmed to record all keystroke data, and participants were asked to fill out a satisfaction questionnaire after finishing the tasks. Significant differences in satisfaction and in the number of steps needed to perform some tasks were found between the two groups. The experimental results also provided more detailed insights into the cultural differences between Chinese and US users.

18.
Among the developments in information technology, the most popular tools for seeking knowledge today are keyword-based Internet search engines such as Google and Yahoo. Users can easily obtain the information they need, but they still have to read and organize the retrieved documents themselves. As a result, users spend most of their time browsing and skipping through the documents they have found. To ease this process, this paper proposes a query-based ontology knowledge acquisition system that dynamically constructs a query-based partial ontology to provide proficient answers to users' queries. To construct the relationships and hierarchy of concepts in such an ontology, the formal concept analysis approach is adopted. After the ontology is built, the system can deduce the specific answer from the relationships and hierarchy of the ontology without asking users to read the whole document set. Three kinds of sports news pages (NBA, CPBL and MLB) were collected as source documents to evaluate the precision of the system in the experiment, and the results show that the proposed approach works effectively.

19.
Multiclass classification has been investigated for many years in the literature. Recently, the scale of real-world multiclass classification applications has grown larger and larger. For example, hundreds of thousands of categories are used in the Open Directory Project (ODP) and the Yahoo! directory. In such cases, the scalability of classification methods becomes a major concern. To tackle this problem, hierarchical classification has been proposed and widely adopted to achieve a better trade-off between effectiveness and efficiency. Unfortunately, many data sets are not explicitly organized in hierarchical form and, therefore, hierarchical classification cannot be used directly. In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as preparation for hierarchical classification. In particular, we first compute matrices to represent the relations among categories, documents, and terms. Then, we cocluster the three substances at different scales through consistent bipartite spectral graph copartitioning, which is formulated as a generalized singular value decomposition problem. Finally, a hierarchical taxonomy is constructed from the category clusters. Our experiments showed that the proposed algorithm could discover a very reasonable taxonomy hierarchy and help improve classification accuracy.
