Similar literature
20 similar documents found.
1.
A data warehouse is an important decision support system that provides cleaned and integrated data for knowledge discovery and data mining systems. In practice, data warehouse mining systems have provided many applicable solutions in industry, yet many problems remain that cause users extra difficulty in discovering knowledge, or even prevent them from obtaining the real and useful knowledge they need. To improve the overall data warehouse mining process, we present an intelligent data warehouse mining approach that incorporates a schema ontology, a schema constraint ontology, a domain ontology and a user preference ontology. The structures of these ontologies are illustrated, and how they benefit the mining process is demonstrated by examples using rule mining. Finally, we present a prototype multidimensional association mining system which, with intelligent assistance from the ontologies, can help users build useful data mining models, prevent ineffective pattern generation, discover concept-extended rules, and provide an active knowledge re-discovery mechanism.

2.
Twitter has become one of the most popular social media platforms, widely used for discussion and information dissemination on all kinds of topics. As a result, both businesses and academics have researched methods to identify the topics being discussed on Twitter. Those methods can be employed in a number of applications, including emergency management, advertising, and corporate/government communication. However, deriving topics from this short-text-based and highly dynamic environment remains a huge challenge. Most current methods use the content of tweets as the only source for topic derivation. Recently, tweet interactions have been considered for improving the quality of topic derivation. In this paper, we propose a method that considers both content and interactions, together with a temporal aspect, to further improve the quality of topic derivation. The impact of the temporal aspect in user/tweet interactions is analyzed on several Twitter datasets. The proposed method incorporates time when it clusters tweets and identifies representative terms for each topic. Experimental results show that including the temporal aspect in the interactions significantly improves the quality of topic derivation compared with existing baseline methods.

3.
An intelligent condition-based maintenance platform for rotating machinery   Total citations: 1 (self-citations: 0, citations by others: 1)
Maintenance is necessary for sustaining machinery availability and reliability in order to ensure productivity, product quality, on-time delivery, and a safe working environment. Costly maintenance strategies such as corrective maintenance and scheduled maintenance have been progressively replaced by superior strategies, of which condition-based maintenance (CBM) is one representative. This strategy commonly consists of sequential modules such as data acquisition, signal processing, feature extraction and feature selection, condition monitoring, etc. However, the approaches in the literature, developed for each module and implemented for different applications, are standalone rather than parts of a comprehensive system. Furthermore, these approaches have been demonstrated in a laboratory environment without any industrial validation. For these reasons, this paper proposes a CBM platform based on intelligent algorithms that can be applied to rotating machinery easily and effectively. Two case studies are then presented to evaluate the effectiveness of the platform in industrial applications.

4.
In this paper, an intelligent agent-based communication support platform for multimodal transport is developed. The rationale for doing so is the potential of such a system to increase cost efficiency, service and safety for different transport-related actors. Although several comparable systems exist at present, their current implementation is far from successful because of technological and economic obstacles. The new expert communication platform put forward here (called MamMoeT) addresses these two issues by using a software agent-based approach. Software agents are pieces of software that each represent a single user. They are autonomous, communicative and intelligent. The MamMoeT system can be described as a real-time decision support system in which intelligent software agents handle communicative tasks and exchange the desired amounts of information among different users, using common exchange protocols that act as translators between different systems.

5.
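BTopicMiner: a domain-specific hot topic mining system for Chinese microblogs   Total citations: 1 (self-citations: 0, citations by others: 1)
李劲, 张华, 吴浩雄, 向军. 《计算机应用》 (Journal of Computer Applications), 2012, 32(8): 2346-2349
With the rapid growth of microblogging, automatically extracting the hot topics that interest users from massive volumes of microblog messages has become a challenging research problem. To this end, this paper studies and proposes a Chinese microblog hot topic extraction algorithm based on an extended topic model. To address the data sparsity inherent in microblog messages, the algorithm first uses text clustering to merge content-related microblog messages into microblog documents. Based on the assumption that reply relationships between microblog posts imply topical relatedness, the algorithm then extends the traditional Latent Dirichlet Allocation (LDA) topic model to model the reply relationships between posts. Finally, mutual information (MI) is used to compute the topic vocabulary of each extracted topic for hot topic recommendation. To verify the effectiveness of the extended topic extraction model, a prototype domain-specific Chinese microblog hot topic mining system, BTopicMiner, was implemented. Experimental results show that the extended topic model based on microblog reply relationships extracts hot topics from microblogs more accurately, and that the semantic similarity between the topic vocabulary computed automatically with the MI measure and manually selected hot words exceeds 75%.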

6.
This work proposes an intelligent learning diagnosis system that supports a Web-based thematic learning model. The model aims to cultivate learners’ ability to integrate knowledge by giving them the opportunity to select the learning topics they are interested in, gain knowledge on those topics by searching the Internet for related learning courseware, and discuss what they have learned with their colleagues. Based on log files that record the learners’ past online learning behavior, the intelligent diagnosis system gives appropriate learning guidance to help learners improve their study behavior, and grades online class participation for the instructor. The diagnosis system can also accurately predict the achievement of the learners’ final reports. Our experimental results reveal that the proposed learning diagnosis system can efficiently help learners expand their knowledge while surfing in cyberspace under the Web-based “theme-based learning” model.

7.
8.
TopCat: data mining for topic identification in a text corpus   Total citations: 3 (self-citations: 0, citations by others: 3)
TopCat (topic categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: identifying related groups of items. We present a novel method for identifying related items based on traditional data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized ground-truth news corpus; it shows that this technique is effective in identifying topics in collections of news articles.
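
The frequent-itemset step described above can be illustrated with a minimal sketch. It is not TopCat's implementation: it assumes each article has already been reduced to a set of entity strings, it counts itemsets by brute-force enumeration rather than with a pruning algorithm such as Apriori, and it omits the hypergraph partitioning stage entirely. The function name `frequent_itemsets`, its parameters and the sample articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(articles, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support articles.

    Brute-force enumeration up to max_size items; fine for a toy corpus,
    not for a real one.
    """
    counts = Counter()
    for entities in articles:
        items = sorted(set(entities))  # sort so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical articles, each represented as the set of entities it mentions.
articles = [
    {"NATO", "Kosovo", "Belgrade"},
    {"NATO", "Kosovo"},
    {"NATO", "Kosovo", "Clinton"},
    {"Microsoft", "antitrust"},
]
result = frequent_itemsets(articles, min_support=3)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

In this toy corpus only "Kosovo", "NATO" and the pair of the two reach the support threshold, which is the kind of co-occurring entity group a later clustering stage would treat as a topic candidate.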

9.
INES (INtelligent Educational System) is an operational prototype of an e-learning platform. This platform includes several tools and technologies, such as: (i) semantic management of users and contents; (ii) conversational agents to communicate with students in natural language; (iii) BDI-based (Beliefs, Desires, Intentions) agents, which shape the tutoring module of the system; (iv) an inference engine; and (v) ontologies, to semantically model the users, their activities, and the learning contents. The main contribution of this paper is the intelligent tutoring module of the system. Briefly, the tasks of this module are to recognize each student (checking his/her system credentials) and to obtain information about his/her learning progress. It can then suggest to each student specific tasks for achieving his/her particular learning objectives, based on several parameters related to the existing learning paths and the student’s profile.

10.
The proliferation of malware presents a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, building on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: a PE parser, an OOA rule generator, and a rule-based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems that employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software. A short version of this paper appeared in [33]. The work is partially supported by NSF IIS-0546280 and an IBM Faculty Research Award. The authors would also like to thank the members of the anti-virus laboratory at KingSoft Corporation for their helpful discussions and suggestions.
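
To make the idea of a class-targeted association rule concrete, here is a minimal sketch that scores one candidate rule of the form "set of API calls implies malware" by its support and confidence. It is an illustration only, not the OOA_Fast_FP-Growth algorithm: the abstract does not define the exact measures IMDS uses, so the standard support and confidence definitions are assumed here, and `rule_stats` and the labelled samples are hypothetical.

```python
def rule_stats(samples, api_set, target="malware"):
    """Support and confidence of the rule api_set -> target.

    samples: list of (set_of_api_calls, label) pairs.
    support    = fraction of all samples that contain api_set AND carry target
    confidence = fraction of samples containing api_set that carry target
    """
    api_set = set(api_set)
    covered = [label for calls, label in samples if api_set <= calls]
    hits = sum(1 for label in covered if label == target)
    support = hits / len(samples) if samples else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical parsed PE files: the API calls each imports, plus a label.
samples = [
    ({"CreateFileA", "WriteFile", "RegSetValueExA"}, "malware"),
    ({"CreateFileA", "WriteFile"}, "benign"),
    ({"WriteFile", "RegSetValueExA"}, "malware"),
    ({"CreateFileA", "ReadFile"}, "benign"),
]
support, confidence = rule_stats(samples, {"WriteFile", "RegSetValueExA"})
print(support, confidence)
```

A rule generator would enumerate many such candidate sets and keep those whose support and confidence exceed chosen thresholds; the classifier then applies the surviving rules to unseen files.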

11.
12.
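江浩, 陈兴蜀, 杜敏. 《计算机应用》 (Journal of Computer Applications), 2013, 33(11): 3071-3075
Hot topic mining is an important technical foundation for public opinion monitoring. Existing forum hot topic mining methods do not address the heavy lexical noise in the data and rely on a single measure of topic heat; to address this, a hot topic mining method based on topic cluster evaluation is proposed. Forum text data are modeled with the Latent Dirichlet Allocation (LDA) topic model; the document set mapped into topic space is stripped of topic noise and then clustered with the K-means++ algorithm, which optimizes the selection of cluster centers; finally, the clusters are evaluated on three aspects: topic burstiness, topic purity and cluster attention. Experimental analysis shows that clustering quality and clustering speed are best when the topic noise threshold is set to 0.75 and the number of cluster centers is set to 50. Tests on real datasets show that the method can effectively rank clusters by the likelihood that they contain hot topics. Finally, a presentation method for hot topics is designed.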
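
The center-selection step that distinguishes K-means++ from plain K-means can be sketched as follows. This is a generic illustration of standard K-means++ seeding, assuming documents are already represented as topic-proportion vectors; it does not reproduce the paper's noise-removal step or its cluster evaluation, and the function names and sample vectors are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick up to k initial centers with K-means++ seeding.

    The first center is uniform at random; each later center is drawn with
    probability proportional to its squared distance from the nearest
    center chosen so far, which spreads the seeds apart.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point coincides with an existing center
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical documents as topic-proportion vectors over three topics.
points = [
    (0.90, 0.05, 0.05),
    (0.80, 0.10, 0.10),
    (0.10, 0.85, 0.05),
    (0.05, 0.90, 0.05),
    (0.05, 0.05, 0.90),
    (0.10, 0.10, 0.80),
]
centers = kmeans_pp_seeds(points, k=3, rng=random.Random(0))
print(centers)
```

Passing an explicit `random.Random` instance keeps the seeding reproducible; the usual K-means assignment and update iterations would then start from `centers`.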

13.
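This study examines the hot topics and topic evolution trends of enterprise competitive intelligence in China, using topic mining and topic evolution methods to systematically review the research results in this field. Literature data are automatically extracted and preprocessed with Python; co-word analysis, the LDA model and knowledge graphs are then used to mine the core research groups and hot topics of the field; finally, topic evolution methods are used to trace the development of enterprise competitive intelligence. For future exploration in the field of enterprise competitive intelligence, this study can ...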

14.
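A topic mining method for scientific and technical literature based on latent semantics is proposed, and the topic mining model for scientific and technical literature is described. The literature collection is preprocessed, feature term weights are computed, and a term-document matrix is constructed. An improved LSI algorithm reduces the dimensionality of the sparse matrix to obtain a fixed topic-document matrix. The topic with the highest weight is taken as the topic of the document. The method uses the Frobenius norm to normalize the matrix and reduces the dimensionality of the sparse matrix, allowing the topics of scientific and technical literature to be mined quickly and accurately.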
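
The matrix-construction and normalization steps can be sketched in a few lines. Two assumptions are made that the abstract does not confirm: TF-IDF is used as the term weight, and "normalizing with the Frobenius norm" is read as dividing every entry by the matrix's Frobenius norm. The dimensionality reduction itself (the SVD at the core of LSI) is left out, and all names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a term-document matrix with TF-IDF weights (assumed weighting).

    Rows follow the sorted vocabulary, columns follow the input documents.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for term in vocab:
        idf = math.log(n_docs / doc_freq[term])
        matrix.append([doc.count(term) * idf for doc in docs])
    return vocab, matrix


def frobenius_normalize(matrix):
    """Scale the matrix so that its Frobenius norm equals 1."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row))
    if norm == 0:
        return matrix
    return [[v / norm for v in row] for row in matrix]


# Hypothetical preprocessed documents (already tokenized).
docs = [
    ["topic", "mining", "lsi"],
    ["topic", "model"],
    ["matrix", "lsi", "lsi"],
]
vocab, matrix = term_document_matrix(docs)
normalized = frobenius_normalize(matrix)
total = sum(v * v for row in normalized for v in row)
print(vocab)
print(round(total, 6))
```

After this scaling the squared entries sum to one, so matrices built from collections of different sizes are on a comparable scale before any decomposition is applied.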

15.
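With the arrival of the Internet big data era, online forum data are growing explosively. Such data are social, informal and scattered, which makes them difficult to use directly. Forum topic mining techniques can identify the text content that users discuss intensively within complex forum data and extract topics from it, thereby distilling the main points of a forum. This paper gives a problem description and task framework for forum topic mining and classifies existing techniques according to that framework into three basic types: forum text preprocessing, topic mining algorithms and topic modeling. It describes in detail the basic characteristics and typical methods of these three types of forum topic mining techniques, compares and summarizes them, and analyzes and discusses the current problems and development trends of forum topic mining.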

16.
The task-oriented nature of data mining (DM) has already been dealt with successfully through the employment of intelligent agent systems that distribute tasks, collaborate and synchronize in order to reach their ultimate goal, the extraction of knowledge. A number of sophisticated multi-agent systems (MAS) that perform DM have been developed, proving that agent technology can indeed be used to solve DM problems. Looking in the opposite direction, though, knowledge extracted through DM has not yet been exploited in MASs. The inductive nature of DM imposes logical limitations and hinders the application of the extracted knowledge to this kind of deductive system. This problem can be overcome, however, when certain conditions are satisfied a priori. In this paper, we present an approach that takes the relevant limitations and considerations into account and provides a gateway through which DM techniques can be employed to augment agent intelligence. This work demonstrates how the extracted knowledge can be used initially to formulate, and in the long run to improve, agent reasoning.

17.
A virtual enterprise (VE) is a dynamic alliance of companies collaborating to accomplish a specific business goal. To establish a VE, it is very important for the VE initiator to select appropriate partners. General criteria such as price, lead time and quality are the major concerns for most VE initiators. However, in today’s environmentally conscious society, environmental issues such as enterprise green image and product eco-design are receiving increasing attention. It is therefore worthwhile to research how to select appropriate collaborative partners to establish an ecological VE. The objective of this paper is to establish a multi-agent system platform, based on ontology theory and intelligent agents, for individual companies to form an ecological VE. The ontological approaches include shared ontology construction, ontology matching, ontology integration, ontology storage and ontology reasoning. In the generalized case where the VE initiator is a manufacturer and the collaborating partners are suppliers, the multi-agent system comprises three types of intelligent agents: the knowledge manager agent (KMrA), the manufacturer agent (MA) and the supplier agent (SA). MA and SA represent the capabilities and interests of the VE initiator and the VE partners, respectively. KMrA is in charge of carrying out the sub-tasks of the ontological approach. To select partners for the ecological VE, the VE initiator also considers environmental criteria in addition to general supplier selection criteria such as price, quantity, quality and lead time. The environmental criteria may include factors such as environmental management, green image, green product and pollution control. The complete set of selection criteria, including the environmental criteria, is categorised into quantitative and qualitative criteria. The formation of an ecological VE is then divided into two stages: candidate supplier selection based on qualitative criteria, and ultimate supplier selection based on quantitative criteria. A simplified example is introduced to illustrate and justify the proposed ontological approaches and intelligent agent platform.

18.
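To address problems in current digital mine construction, such as the limited collection of basic mine parameters, the low value obtained from integrated data and the rather simplistic analysis applied to data, this paper discusses the meaning of an intelligent mine platform, proposes a design architecture for it, and describes in detail the key technologies for its implementation. Using sensing technology and highly reliable field control technology, the platform provides comprehensive sensing and Internet-of-Things control of the on-site working environment, personnel and equipment, and enables intelligent monitoring of the entire mine production process. Using big data processing technology, it builds a unified data operation and maintenance layer that manages and exploits mine master data, real-time monitoring data, geographic and geological survey data, operations management data and other data in a unified way, raising data utilization. Using big data and deep learning technology, it aggregates and classifies the collected, entered and extracted data, achieving comprehensive use of the data and improving mine management and control capability.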

19.
Endoscopy is an important tool for gastric cancer screening. Because an effective decision support system for endoscopy is lacking, gastric cancer detection in the clinic usually has low sensitivity. In this paper, we propose a Genetic Algorithm optimized Neural Network (GAoNN) approach for gastric cancer detection based on mining endoscopy reports. Because higher sensitivity in gastric cancer detection can significantly improve the 5-year survival rate of patients, both prediction accuracy and sensitivity are used to construct a multiobjective optimization model for enhancing the classification performance of GAoNN. In particular, we extend an effective genetic algorithm, Nondominated Sorting Genetic Algorithm II (NSGA-II), to train a neural network; substituting it for the computationally intensive stochastic gradient descent (SGD) algorithm reduces the complexity of tuning training hyperparameters and improves efficiency. Specifically, we design novel crossover and mutation operators and modify the nondominated ranking and crowding distance sorting procedures of NSGA-II for GAoNN. Through testing on 8,546 real-world endoscopy reports, we show that GAoNN achieves a prediction accuracy of up to 83.74%, which is better than several competitors, while significantly increasing sensitivity to 83.14%. GAoNN also reduces training time by 30.94% compared with conventional SGD-based training, which indicates the feasibility of GAoNN in clinical practice.
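
The two-objective view taken above (maximize both accuracy and sensitivity) rests on Pareto dominance, which the following minimal sketch illustrates. It shows only the generic first nondominated front, not the modified ranking or crowding-distance procedures of GAoNN; the function names and the candidate scores are hypothetical and are not results from the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (accuracy, sensitivity) scores for five candidate networks.
candidates = [
    (0.90, 0.60),
    (0.85, 0.80),
    (0.80, 0.75),
    (0.70, 0.90),
    (0.85, 0.70),
]
front = pareto_front(candidates)
print(front)
```

In a full NSGA-II loop this filter is applied repeatedly to peel off successive fronts, and candidates within a front are then ordered by crowding distance to preserve diversity.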

20.
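To address the low accuracy of traditional text topic models in microblog topic mining and their failure to consider associations between topics, and in light of the characteristics of Chinese microblog corpora, the strengths and weaknesses of the LDA and HMM models are analyzed and a microblog topic mining model, MB-HL (Microblog-Hidden Markov Model Latent Dirichlet Allocation), is proposed. The model treats each individual microblog post as a processing unit, builds and optimizes a topic-word distribution matrix, uses the LDA model to model the different behaviors of microblog users and extract features, uses the strong temporal state modeling ability of the HMM to compensate for LDA's weakness in topic correlation, and uses Gibbs sampling for inference. Comparative experiments on real Sina Weibo data show that the MB-HL model improves the accuracy of topic keywords by nearly 9% and can effectively discover associations between topics.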
