Similar literature
20 similar documents found (search time: 15 ms)
1.
This study analyzes the online questions and chat messages automatically recorded by a live video streaming (LVS) system. We apply data mining and text mining techniques to two different datasets and then conduct an in-depth correlation analysis for the two educational courses with the most online questions and the most chat messages, respectively. The study finds both discrepancies and similarities in students’ patterns and themes of participation between online questions (student–instructor interaction) and online chat messages (student–student, or peer, interaction). The results also identify disciplinary differences in students’ online participation. A correlation is found between the number of online questions students asked and their final grades. The data suggest that combining data mining and text mining techniques on a large amount of online learning data can yield considerable insights and reveal valuable patterns in students’ learning behaviors. Limitations of data and text mining are also identified and discussed.

2.
This paper describes data mining and data warehousing techniques that can improve the performance and usability of Intrusion Detection Systems (IDS). Current IDS do not support historical data analysis or data summarization. This paper presents techniques to model network traffic and alerts using a multi-dimensional data model and star schemas. This data model was used to perform network security analysis and detect denial of service attacks. Our data model can also handle heterogeneous data sources (e.g. firewall logs, system calls, net-flow data) and enables query response times for analysts up to two orders of magnitude faster than the current state of the art. We have used our techniques to implement a prototype system that is being successfully used at Army Research Labs. Our system has helped security analysts detect intrusions and perform historical data analysis for trend reports. Recommended by: Ashfaq Khokhar
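A star schema of the kind this abstract describes can be sketched with Python's built-in sqlite3 module. The table and column names below (fact_alert, dim_host, dim_time) and the sample rows are hypothetical and are not taken from the paper; this is an illustration under those assumptions, not the authors' data model.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table plus two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_host (host_id INTEGER PRIMARY KEY, ip TEXT);
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, hour INTEGER);
        CREATE TABLE fact_alert (
            host_id INTEGER REFERENCES dim_host(host_id),
            time_id INTEGER REFERENCES dim_time(time_id),
            packets INTEGER
        );
    """)

def packets_per_target_hour(conn):
    """Roll the fact table up along the host and time dimensions."""
    return conn.execute("""
        SELECT h.ip, t.hour, SUM(f.packets)
        FROM fact_alert f
        JOIN dim_host h ON h.host_id = f.host_id
        JOIN dim_time t ON t.time_id = f.time_id
        GROUP BY h.ip, t.hour
        ORDER BY SUM(f.packets) DESC
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.executemany("INSERT INTO dim_host VALUES (?, ?)",
                 [(1, "10.0.0.5"), (2, "10.0.0.9")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 13), (2, 14)])
conn.executemany("INSERT INTO fact_alert VALUES (?, ?, ?)",
                 [(1, 1, 120), (1, 2, 9500), (1, 2, 8700), (2, 2, 80)])
summary = packets_per_target_hour(conn)
```

The first row of `summary` is the host and hour with the largest packet volume, which is the kind of roll-up an analyst might inspect when looking for a denial of service pattern.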

3.
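This paper applies data mining techniques to enterprise-level customer information analysis. Enterprise information is collected across industries and loaded into a data warehouse built with the datastage tool, in order to estimate customer distribution and identify sources of potential customers. Customers are classified by region and industry, and information is mined on each customer's same-tier competitors and on the customer's position among its peers. The paper concludes by summarizing applications of enterprise-level customer information mining and analysis.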

4.
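Opinion leaders are important nodes in social networks and social media and a key factor in information dissemination. In QQ group chats, the large number of participating users and the variety of intermingled topics make opinion leaders difficult to identify. This paper therefore proposes a method for mining opinion leaders in QQ groups based on reply relationships. The method first builds a lexicon of response words, then uses the Aho-Corasick algorithm to match response words in the chat text and construct a network of user reply relationships, and finally applies important-node identification methods from social network analysis to find opinion leaders. The method identifies opinion leaders in QQ groups with high accuracy, and after incorporating node-importance features from the QQ group user interaction network it achieves still better results.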
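The pipeline in this abstract (response-word lexicon, Aho-Corasick matching, reply network, node importance) can be sketched as follows. The abstract does not say how a reply is attributed to a user, so this sketch assumes, as a simplification, that a message containing a response word replies to the sender of the immediately preceding message, and it uses the number of replies received as a stand-in for the node-importance measure. The function names and the sample chat are hypothetical.

```python
from collections import Counter, deque

def build_automaton(words):
    """Build Aho-Corasick goto, failure and output tables for the given words."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    queue = deque(goto[0].values())  # depth-1 states keep failure link 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_words(text, automaton):
    """Return every lexicon word occurring in text, in one pass over the text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend(out[state])
    return hits

def replies_received(messages, response_words):
    """Count replies per user, assuming a reply targets the previous sender."""
    automaton = build_automaton(response_words)
    received = Counter()
    for (prev_user, _), (user, text) in zip(messages, messages[1:]):
        if user != prev_user and find_words(text, automaton):
            received[prev_user] += 1
    return received

chat = [("alice", "the meeting moves to friday"),
        ("bob", "agree, friday works"),
        ("alice", "also bring the report"),
        ("carol", "ok thanks"),
        ("bob", "lunch anyone")]
ranking = replies_received(chat, ["agree", "ok", "thanks"]).most_common()
```

Because the automaton works character by character, the same code applies to Chinese chat text and a Chinese response-word lexicon.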

5.
An effective incident information management system needs to deal with several challenges. It must support heterogeneous distributed incident data, allow decision makers (DMs) to detect anomalies and extract useful knowledge, assist DMs in evaluating the risks and selecting an appropriate alternative during an incident, and provide differentiated services to satisfy the requirements of different incident management phases. To address these challenges, this paper proposes an incident information management framework that consists of three major components. The first component is a high-level data integration module in which heterogeneous data sources are integrated and presented in a uniform format. The second component is a data mining module that uses data mining methods to identify useful patterns and presents a process to provide differentiated services for pre-incident and post-incident information management. The third component is a multi-criteria decision-making (MCDM) module that utilizes MCDM methods to assess the current situation, find satisfactory solutions, and take appropriate responses in a timely manner. To validate the proposed framework, this paper conducts a case study on agrometeorological disasters that occurred in China between 1997 and 2001. The case study demonstrates that the combination of data mining and MCDM methods can provide objective and comprehensive assessments of incident risks.

6.
Chronic asthma sufferers need to be observed constantly to prevent sudden attacks. To improve the efficiency and effectiveness of patient monitoring, we propose in this paper a novel data mining mechanism for predicting attacks of chronic diseases by considering both the bio-signals of patients and environmental factors. We propose two data mining methods, namely Pattern Based Decision Tree (PBDT) and Pattern Based Class-Association Rule (PBCAR). Both methods use sequential pattern mining to extract features of asthma attacks and then build classifiers using decision tree mining and a rule-based method, respectively. Besides the general clinical data of patients, we consider environmental factors, which are related to many chronic diseases. For experimental evaluation, we adopted a children's asthma allergy dataset collected from a hospital in Taiwan together with environmental data such as weather and air pollutant measurements. The experimental results show that PBCAR delivers 86.89% accuracy and 84.12% recall, and PBDT delivers 87.52% accuracy and 85.59% recall. These results indicate that our methods can achieve high accuracy and recall in predicting chronic disease attacks. The readable rules of both classifiers can give patients and healthcare workers insight into essential illness-related information. The additional environmental input data also proved valuable in predicting attacks.

7.
Combining data mining and Game Theory in manufacturing strategy analysis
The work presented in this paper is the result of a rapid increase of interest in game theoretical analysis and a huge growth of game-related databases. It is likely that useful knowledge can be extracted from these databases. This paper argues that applying data mining algorithms together with Game Theory holds significant potential as a new way to analyze complex engineering systems, such as strategy selection in manufacturing analysis. Recent research shows that combining data mining and Game Theory has not yet produced reasonable solutions for representing and structuring the knowledge in a game. To examine the idea, a novel approach that fuses these two techniques has been developed in this paper and tested on real-world manufacturing datasets. The results indicate the superiority of the proposed approach. Some fruitful directions for future research are outlined as well.

8.
A new stable version (“production version”) v5.28.00 of ROOT [1] has been published [2]. It features several major improvements in many areas, most notably data storage performance as well as statistics and graphics features. Some of these improvements were already anticipated in the original publication Antcheva et al. (2009) [3]. This version will be maintained for at least 6 months; new minor revisions (“patch releases”) will be published [4] to solve problems reported with this version.

New version program summary

Program title: ROOT
Catalogue identifier: AEFA_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFA_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser Public License v.2.1
No. of lines in distributed program, including test data, etc.: 2 934 693
No. of bytes in distributed program, including test data, etc.: 1009
Distribution format: tar.gz
Programming language: C++
Computer: Intel i386, Intel x86-64, Motorola PPC, Sun Sparc, HP PA-RISC
Operating system: GNU/Linux, Windows XP/Vista/7, Mac OS X, FreeBSD, OpenBSD, Solaris, HP-UX, AIX
Has the code been vectorized or parallelized?: Yes
RAM: > 55 Mbytes
Classification: 4, 9, 11.9, 14
Catalogue identifier of previous version: AEFA_v1_0
Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 2499
Does the new version supersede the previous version?: Yes
Nature of problem: Storage, analysis and visualization of scientific data
Solution method: Object store, wide range of analysis algorithms and visualization methods
Reasons for new version: Added features and corrections of deficiencies
Summary of revisions: The release notes at http://root.cern.ch/root/v528/Version528.news.html give a module-oriented overview of the changes in v5.28.00. Highlights include:
  • File format: Reading of TTrees has been improved dramatically with respect to CPU time (30%) and notably with respect to disk space.
  • Histograms: New classes have been provided: TEfficiency to handle the calculation of efficiencies and their uncertainties, TH2Poly for polygon-shaped bins (e.g. maps), TKDE for kernel density estimation, and TSVDUnfold for singular value decomposition.
  • Graphics: Kerning is now supported in TLatex, PostScript and PDF; a table of contents can be added to PDF files. A new font provides italic symbols. A TPad containing GL can be stored in a binary (i.e. non-vector) image file; support for full-scene anti-aliasing has been added. Usability enhancements to EVE.
  • Math: New interfaces for generating random numbers according to a given distribution, goodness of fit tests of unbinned data, binning multidimensional data, and several advanced statistical functions were added.
  • RooFit: Introduction of HistFactory; major additions to RooStats.
  • TMVA: Updated to version 4.1.0, adding e.g. support for simultaneous classification of multiple output classes for several multivariate methods.
  • PROOF: Many new features, adding to PROOF's usability, plus improvements and fixes.
  • PyROOT: Support of Python 3 has been added.
  • Tutorials: Several new tutorials were provided for the new features above (notably RooStats).
A detailed list of all the changes is available at http://root.cern.ch/root/htmldoc/examples/V5.
Additional comments: For an up-to-date author list see: http://root.cern.ch/drupal/content/root-development-team and http://root.cern.ch/drupal/content/former-root-developers. The distribution file for this program is over 30 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead a html file giving details of how the program can be obtained is sent.
Running time: Depending on the data size and complexity of analysis algorithms.
References:
  • [1] 
    http://root.cern.ch.
  • [2] 
    http://root.cern.ch/drupal/content/production-version-528.
  • [3] 
    I. Antcheva, M. Ballintijn, B. Bellenot, M. Biskup, R. Brun, N. Buncic, Ph. Canal, D. Casadei, O. Couet, V. Fine, L. Franco, G. Ganis, A. Gheata, D. Gonzalez Maline, M. Goto, J. Iwaszkiewicz, A. Kreshuk, D. Marcos Segura, R. Maunder, L. Moneta, A. Naumann, E. Offermann, V. Onuchin, S. Panacek, F. Rademakers, P. Russo, M. Tadel, ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization, Comput. Phys. Commun. 180 (2009) 2499.
  • [4] 
    http://root.cern.ch/drupal/content/root-version-v5-28-00-patch-release-notes.

9.
In physics, a spectrum is the series of colored bands diffracted and arranged in the order of their respective wavelengths by the passage of white light through a prism or other diffracting medium. Outside of physics, a spectrum is a condition that is not limited to a specific set of values but can vary infinitely within a continuum. In commerce, an effective visualization tool, especially for stakeholders or managers, is a brand spectrum diagram highlighting where the company’s brands and products are situated relative to competitors. This paper investigates research issues on the product and brand spectrum in the beverage market of Taiwan. It proposes using the Apriori algorithm for association rules, together with clustering analysis, in an ontology-based data mining approach to mine customer and product knowledge from the database. Knowledge extracted from the data mining results is illustrated as knowledge patterns, rules, and maps in order to offer suggestions and solutions to beverage firms for possible product development, promotion, and marketing.
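For readers unfamiliar with the Apriori algorithm named in this abstract, the following is a minimal sketch of its level-wise search for frequent itemsets, followed by a rule confidence calculation. The beverage baskets and the function names are hypothetical illustrations, not data or code from the paper, and the ontology and clustering parts of the approach are not shown.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    level = {s for s in level if count(s) >= min_count}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates if count(c) >= min_count}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from frequent-itemset counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical purchase baskets.
baskets = [{"tea", "juice"}, {"tea", "juice", "soda"}, {"tea", "soda"},
           {"juice", "coffee"}, {"tea", "juice"}]
frequent = apriori(baskets, min_count=2)
tea_to_juice = confidence(frequent, {"tea"}, {"juice"})
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its smaller subsets are already known to be frequent.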

10.
This paper presents Simulation-based Innovization (SBI), a method for analyzing production systems by applying multi-objective optimization and data mining techniques to discrete-event simulation models. The aim of the SBI analysis is to reveal insight into the parameters that affect the performance measures and to gain a deeper understanding of the problem through post-optimality analysis of the solutions acquired from multi-objective optimization. This paper provides empirical results from an industrial case study, carried out on an automotive machining line, in order to explain the SBI procedure. The SBI method was found to be particularly suitable in this case study because the three objectives under study, namely total tardiness, makespan and average work-in-process, conflict with each other. Different decision variables were found to be influential depending on the system load of the line. The paper addresses how the SBI method is used to find important patterns in the explored solution set and how it can support decision making to improve scheduling under different system loadings in the machining line.
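The post-optimality analysis described in this abstract starts from the set of non-dominated solutions of the multi-objective problem. As a minimal sketch, assuming all three objectives are minimized, that set can be filtered out of simulated schedules as follows. The tuples are hypothetical (total tardiness, makespan, average work-in-process) values, and the data mining step that SBI applies afterwards is not shown.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates (minimization)."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical (total tardiness, makespan, average WIP) results.
schedules = [(10, 50, 5), (12, 45, 6), (11, 55, 7), (10, 50, 4)]
front = pareto_front(schedules)
```

Two schedules survive here because each is better than the other in at least one objective, which is the conflict between objectives that the abstract refers to.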

11.
In recent years, data mining has become one of the most popular techniques data owners use to determine their strategies. Association rule mining is a data mining approach that is widely used in traditional databases, usually to find positive association rules. However, there are other challenging rule mining topics, such as data stream mining and negative association rule mining. In addition, organizations want to concentrate on their own business and outsource the rest of their work. This approach, known as “database as a service”, provides many benefits to the data owner but also introduces security problems. This paper proposes a rule mining system that provides an efficient and secure solution for computing positive and negative association rules on XML data streams under the database as a service concept. The system has been implemented, and several experiments on different synthetic data sets show its performance and efficiency.

12.
For product design and development, crowdsourcing shows huge potential for fostering creativity and has been regarded as an important approach to acquiring innovative concepts. Nevertheless, before the approach can be implemented effectively, the following challenges should be addressed: (1) a burdensome concept review process for dealing with a large number of crowd-sourced design concepts; (2) insufficient integration of design knowledge and principles into existing data processing methods and algorithms for crowdsourcing; and (3) the lack of a quantitative decision support process for identifying better concepts. To tackle these problems, a product concept evaluation and selection approach comprising three modules is proposed: (1) a data mining module to extract meaningful information from online crowd-sourced concepts; (2) a concept re-construction module to organize word tokens into a unified frame using domain ontology and extended design knowledge; and (3) a decision support module to select better concepts in a simplified manner. A pilot study on future PC (personal computer) design was conducted to demonstrate the proposed approach. The results show that the approach is promising and may help to improve the efficiency of concept review and evaluation, facilitate data processing using design knowledge, and enhance the reliability of concept selection decisions.

13.
The Iowa Flood Information System (IFIS) is a web-based platform developed at the Iowa Flood Center (IFC) in order to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information, applications, and interactive visualizations for communities in Iowa. The IFIS provides community-centric watershed and river characteristics, rainfall conditions, and stream-flow data and visualization tools. Interactive interfaces allow access to inundation maps for different stage and return period values as well as to flooding scenarios with contributions from multiple rivers. Real-time and historical data of water levels, gauge heights, hourly and seasonal flood forecasts, and rainfall conditions are made available by integrating data from NEXRAD radars, IFC stream sensors, and USGS and National Weather Service (NWS) stream gauges. The IFIS provides customized flood-related data, information, and visualization for over 1000 communities in Iowa. To help reduce the damage from floods, the IFIS helps communities make better-informed decisions about the occurrence of floods and alerts communities in advance using NWS and IFC forecasts. The integrated and modular design and structure of the IFIS allow easy adaptation of the system in other regional and scientific domains. This paper provides an overview of the design and capabilities of the IFIS that was developed as a platform to provide one-stop access to flood-related information.

14.
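Application of outlier mining to the analysis of science and technology statistics of higher education institutions
Outlier mining is a valuable and important form of knowledge discovery; studying the anomalous behavior of outliers can uncover valuable information hidden in data. After introducing outliers and outlier mining algorithms, this paper discusses an outlier mining algorithm based on the sum of distances and applies it, in a novel way, to the analysis of science and technology statistics from higher education institutions. The results show that the algorithm can effectively detect anomalies in these statistics and plays a very important role in verifying the authenticity of the data.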
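The abstract does not spell out the distance-sum algorithm, so the following is only one plausible reading, offered as a sketch: score each record by the sum of its Euclidean distances to all other records and report the records with the largest scores. The function name and the sample points are hypothetical.

```python
import math

def distance_sum_outliers(points, k):
    """Return the indices of the k points with the largest total distance to all others."""
    scores = []
    for i, p in enumerate(points):
        total = sum(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append((total, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Hypothetical two-feature records; the last one lies far from the rest.
records = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 9.0)]
suspects = distance_sum_outliers(records, k=1)
```

In practice the features would usually be standardized first, so that a feature with a large numeric range does not dominate the distances. Computing all pairwise distances costs quadratic time in the number of records.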

15.
In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high concentration area in a point process. A comprehensive search algorithm always ends up with a high order of computational complexity. For example, if a length-n sequence is considered, the total number of all possible consecutive subsequences is of order n^2, so a comprehensive search algorithm requires at least O(n^2) numerical operations.

We present a multiscale-approximation-based approach. It is shown that most of the time, this method finds the exact same solution as a comprehensive search algorithm does. The derived multiscale approximation methods (MAMEs) have low complexity: for a length-n sequence, the computational complexity of an MAME can be as low as O(n). Numerical simulations verify these improvements.

The MAME approach is particularly suitable for problems with large data sets. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, an MAME finds the optimal solution with probability almost equal to one.
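To make the cost of the comprehensive search concrete, the sketch below scans every consecutive subsequence of a 0/1 sequence for the one with the highest activity. It is the exhaustive baseline only, not the MAME method, and the scoring rule (sum of values minus a baseline rate per element) is an assumption chosen for illustration rather than the paper's objective function.

```python
from itertools import accumulate

def best_interval(xs, baseline):
    """Return (score, start, end) for the slice xs[start:end] that maximizes
    sum(x - baseline), found by checking all O(n^2) consecutive subsequences."""
    prefix = [0.0] + list(accumulate(x - baseline for x in xs))
    best = (float("-inf"), 0, 0)
    for start in range(len(xs)):
        for end in range(start + 1, len(xs) + 1):
            score = prefix[end] - prefix[start]  # O(1) per interval via prefix sums
            if score > best[0]:
                best = (score, start, end)
    return best

# Hypothetical Bernoulli sequence with a high-activity region in the middle.
sequence = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
score, start, end = best_interval(sequence, baseline=0.5)
```

Even with constant-time scoring per interval, the two nested loops visit every (start, end) pair, which is the quadratic cost that an approximation method aims to avoid.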


16.
Enterprise Resource Planning (ERP) systems tend to deploy Supply Chain Management and/or Customer Relationship Management techniques in order to fuse information about customers, suppliers, manufacturers and warehouses, and thereby minimize system-wide costs while satisfying service level requirements. Although efficient, these systems are neither versatile nor adaptive, since newly discovered customer trends cannot easily be integrated with existing knowledge. Building on the way these techniques are applied to ERP systems, we have developed a multi-agent system that introduces adaptive intelligence as a powerful add-on for ERP software customization. The system can be thought of as a recommendation engine that takes advantage of knowledge gained through data mining techniques and incorporates it into the resulting company selling policy. The intelligent agents of the system can be retrained periodically as new information is added to the ERP. In this paper, we present the architecture and development details of the system and demonstrate its application on a real test case.

17.
Data Mining and Information Retrieval is an emerging interdisciplinary field dealing with Information Retrieval and Data Mining techniques. It has developed rapidly with advances in mathematics, statistics, information science, and computer science. In this paper, we present an empirical analysis of publication metadata obtained from 6 top-tier journals and 9 conferences for the first 16 years of the 21st Century, and evaluate the dynamic characteristics of Data Mining and Information Retrieval. We find steady growth in both productivity and impact, evidenced by the unabated number of publications and citations over the period of study. We note that the mode of co-operation in this field is changing from independent to collaborative. Furthermore, according to the citation pattern, the field is becoming more open, as illustrated by a gradual decline in self-citation rates, which had dropped to 10% in 2015, nearly three times lower than in 2000. Finally, we explore the inner structure of the field through the identification and evolution of popular keywords and topics. Overall, this study provides insights into the recent growth of Data Mining and Information Retrieval, with the ultimate goal of revealing its potential to drive scientific innovation in the future.

18.
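Research on the application of data mining techniques in university human resource management
Using a real university human resources data set, this paper analyzes the data mining techniques and processes suited to university human resource management. Exploratory data analysis is used to examine feature value discretization and reduction, feature selection, and feature construction, and a class labeling scheme is suggested for measuring the research ability of teaching and research staff. A decision tree model is used to analyze several key factors affecting teachers' research ability, cluster analysis gives an objective and effective description of the teachers' current situation, and association rules describe the relationships among teaching, research, and social service work. The results of the analysis are readily interpretable.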

19.
This paper reports work intended to reveal the connection between the topics investigated by conference papers and by journal papers. Hundreds of papers in data mining and information retrieval were selected from well-known databases, and the analysis showed that the topics covered by conference papers in one year often lead to similar topics covered by journal papers in the subsequent year, and vice versa. The study combines existing algorithms into a new detection procedure that researchers can use to detect new trends and gather academic intelligence from conferences and journals.

The goal of this research is fourfold. First, it investigates whether the themes of conference papers lead those of journal papers. Second, it examines how new research themes can be identified from conference papers. Third, it looks at a specific area, information retrieval and data mining, as an illustration. Fourth, it studies any inconsistencies in the correlation between conference papers and journal papers.

This study explores the connections between academic publications. The methodologies of information retrieval and data mining can be exploited to discover the relationships between published papers across all topics. By discovering the connections between conference papers and journal papers, researchers can improve the effectiveness of their research by identifying academic intelligence.

This study discusses how conference papers and journal papers are related. The topics of conference papers are identified to determine whether they represent new trends discussed in journal papers. An automatic examination procedure based on information retrieval and data mining is also proposed to minimize the time and human resources required to predict further research developments. This study develops a new procedure and collects a dataset to verify these questions.

Analytical results demonstrate that the topics of conference papers and journal papers are similar each year. Conference papers clearly affect the journal papers published over three years. About 87.23% of data points from papers published in 1991–2007 support our assumption. The research is intended to help researchers identify new trends in their research fields and focus on urgent topics. This is particularly valuable for researchers who are new to a field or who wish to perform cross-domain studies.

20.
Experimental analysis of the performance of a proposed method is a crucial and necessary task in an investigation. In this paper, we focus on the use of nonparametric statistical inference for analyzing the results obtained in an experimental design in the field of computational intelligence. We present a case study involving a set of techniques in classification tasks, and we study a set of nonparametric procedures useful for analyzing the behavior of a method with respect to a set of algorithms, such as the framework in which a new proposal is developed.

In particular, we discuss some basic and advanced nonparametric approaches that improve on the results offered by the Friedman test in some circumstances. A set of post hoc procedures for multiple comparisons is presented together with the computation of adjusted p-values. We also perform an experimental analysis comparing their power, with the objective of detecting the advantages and disadvantages of the statistical tests described. We found that aspects such as the number of algorithms, the number of data sets and the differences in performance offered by the control method are very influential in the statistical tests studied. Our final goal is to offer a complete guideline for the use of nonparametric statistical procedures for performing multiple comparisons in experimental studies.
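As a small illustration of the two ingredients this abstract names, the sketch below computes the standard Friedman chi-square statistic from average ranks and applies Holm's step-down adjustment to a list of p-values. Holm's procedure is used here as one common post hoc adjustment and is not necessarily among those the paper recommends. The accuracy table and the raw p-values are hypothetical, and converting the statistic to a p-value (chi-square with k - 1 degrees of freedom) is left out.

```python
def average_ranks(scores):
    """scores[i][j] is the result of algorithm j on data set i (higher is better).
    Returns each algorithm's mean rank (1 = best); tied results share a rank."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += shared_rank
            pos = end + 1
    return [t / n for t in totals]

def friedman_statistic(scores):
    """Friedman chi-square: 12n / (k(k+1)) * (sum of R_j^2 - k(k+1)^2 / 4)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running = [0.0] * m, 0.0
    for step, i in enumerate(order):
        running = max(running, min(1.0, (m - step) * p_values[i]))
        adjusted[i] = running
    return adjusted

# Hypothetical accuracies: 4 data sets (rows) x 3 algorithms (columns).
accuracy = [[0.90, 0.85, 0.80], [0.88, 0.86, 0.79],
            [0.92, 0.89, 0.85], [0.75, 0.78, 0.70]]
ranks = average_ranks(accuracy)
statistic = friedman_statistic(accuracy)
adjusted = holm_adjust([0.01, 0.04, 0.03])  # hypothetical raw p-values
```

The running maximum in `holm_adjust` keeps the adjusted p-values monotone, so a comparison with a larger raw p-value never ends up with a smaller adjusted one.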
