20 similar documents found (search time: 0 ms)
1.
This study analyses the online questions and chat messages automatically recorded by a live video streaming (LVS) system using data mining and text mining techniques. We applied these techniques to two different datasets and then conducted an in-depth correlation analysis for the two educational courses with the most online questions and chat messages, respectively. The study found both discrepancies and similarities in students’ patterns and themes of participation between online questions (student–instructor interaction) and online chat messages (student–student, or peer, interaction). The results also identify disciplinary differences in students’ online participation. A correlation was found between the number of online questions students asked and their final grades. The data suggest that combining data mining and text mining techniques on a large amount of online learning data can yield considerable insights and reveal valuable patterns in students’ learning behaviors. Limitations of data and text mining are also discussed in the paper.
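The correlation finding above can be reproduced in miniature. The sketch below computes a Pearson correlation coefficient in pure Python on hypothetical per-student counts of online questions versus final grades (the data are illustrative, not the study's):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-student data: online questions asked vs. final grade.
questions = [0, 2, 3, 5, 8, 10]
grades = [62, 70, 68, 75, 83, 90]
r = pearson_r(questions, grades)
```

A value of r close to 1 would correspond to the positive association the study reports between question-asking and final grades.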
2.
This paper describes data mining and data warehousing techniques that can improve the performance and usability of Intrusion Detection Systems (IDS). Current IDS do not provide support for historical data analysis and data summarization. This paper presents techniques to model network traffic and alerts using a multi-dimensional data model and star schemas. This data model was used to perform network security analysis and detect denial-of-service attacks. Our data model can also be used to handle heterogeneous data sources (e.g. firewall logs, system calls, net-flow data) and enables up to two orders of magnitude faster query response times for analysts as compared to the current state of the art. We have used our techniques to implement a prototype system that is being successfully used at Army Research Labs. Our system has helped the security analyst in detecting intrusions and in historical data analysis for generating reports on trend analysis.
Recommended by: Ashfaq Khokhar
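A star schema of the kind the paper describes can be sketched with a fact table of alert counts joined to dimension tables. The table and column names below are illustrative only, not those of the paper's system; the roll-up query shows the kind of summarization such a schema enables:

```python
import sqlite3

# Minimal star-schema sketch: one fact table keyed to two dimension tables
# (time and source host). Names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
CREATE TABLE dim_host (host_id INTEGER PRIMARY KEY, ip TEXT);
CREATE TABLE fact_alert (
    time_id INTEGER REFERENCES dim_time,
    host_id INTEGER REFERENCES dim_host,
    alert_count INTEGER
);
""")
con.executemany("INSERT INTO dim_time VALUES (?,?,?)",
                [(1, "2024-01-01", 9), (2, "2024-01-01", 10)])
con.executemany("INSERT INTO dim_host VALUES (?,?)",
                [(1, "10.0.0.5"), (2, "10.0.0.7")])
con.executemany("INSERT INTO fact_alert VALUES (?,?,?)",
                [(1, 1, 40), (1, 2, 3), (2, 1, 55)])

# Roll-up along the host dimension: total alerts per source IP.
rows = con.execute("""
    SELECT h.ip, SUM(f.alert_count)
    FROM fact_alert f JOIN dim_host h ON f.host_id = h.host_id
    GROUP BY h.ip ORDER BY 2 DESC
""").fetchall()
```

Because the fact table stores pre-aggregated counts keyed by surrogate ids, such roll-ups touch far less data than scanning raw logs, which is the intuition behind the reported query speedups.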
3.
This paper applies data mining techniques to the analysis of enterprise-level customer information. Enterprise data are collected across industries, and a data warehouse is built with the DataStage tool to estimate customer distribution and identify potential customer sources. Customers are classified by region and industry to mine information on a customer's same-tier competitors and the enterprise's position among its peers. Finally, the paper summarizes applications of enterprise-level customer information mining and analysis.
4.
An effective incident information management system needs to deal with several challenges. It must support heterogeneous distributed incident data, allow decision makers (DMs) to detect anomalies and extract useful knowledge, assist DMs in evaluating the risks and selecting an appropriate alternative during an incident, and provide differentiated services to satisfy the requirements of different incident management phases. To address these challenges, this paper proposes an incident information management framework that consists of three major components. The first component is a high-level data integration module in which heterogeneous data sources are integrated and presented in a uniform format. The second component is a data mining module that uses data mining methods to identify useful patterns and presents a process to provide differentiated services for pre-incident and post-incident information management. The third component is a multi-criteria decision-making (MCDM) module that utilizes MCDM methods to assess the current situation, find satisfactory solutions, and take appropriate responses in a timely manner. To validate the proposed framework, this paper conducts a case study on agrometeorological disasters that occurred in China between 1997 and 2001. The case study demonstrates that the combination of data mining and MCDM methods can provide objective and comprehensive assessments of incident risks.
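The paper does not specify which MCDM method its module uses; as one common representative, the sketch below implements simple additive weighting (SAW) on hypothetical incident-response alternatives. All alternative names, criteria, and weights are invented for illustration:

```python
def saw_rank(alternatives, weights, benefit):
    """Simple additive weighting: min-max normalize each criterion column,
    then score alternatives by the weighted sum. benefit[j] is True when a
    larger value on criterion j is better."""
    cols = list(zip(*alternatives.values()))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    scores = {}
    for name, vals in alternatives.items():
        s = 0.0
        for j, v in enumerate(vals):
            if hi[j] == lo[j]:
                norm = 1.0
            elif benefit[j]:
                norm = (v - lo[j]) / (hi[j] - lo[j])
            else:
                norm = (hi[j] - v) / (hi[j] - lo[j])
            s += weights[j] * norm
        scores[name] = s
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical responses scored on (cost, response time, coverage);
# cost and time are "cost" criteria, coverage is a benefit criterion.
alts = {"evacuate": (9, 2, 8), "shelter": (3, 5, 6), "monitor": (1, 8, 2)}
ranking = saw_rank(alts, weights=(0.3, 0.4, 0.3), benefit=(False, False, True))
```

The ranking that comes out is the "satisfactory solution" step of the MCDM module in its simplest form; real frameworks substitute richer methods (e.g. TOPSIS or AHP) for the weighted sum.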
5.
Chronic asthma sufferers need to be constantly observed to prevent sudden attacks. In order to improve the efficiency and effectiveness of patient monitoring, we propose in this paper a novel data mining mechanism for predicting attacks of chronic diseases by considering both the bio-signals of patients and environmental factors. We propose two data mining methods, namely Pattern Based Decision Tree (PBDT) and Pattern Based Class-Association Rule (PBCAR). Both methods integrate the concepts of sequential pattern mining to extract features of asthma attacks, and then build classifiers with the concepts of decision tree mining and rule-based methods, respectively. Besides the general clinical data of patients, we considered environmental factors, which are related to many chronic diseases. For the experimental evaluation, we adopted a children's asthma allergy dataset collected from a hospital in Taiwan, together with environmental factors such as weather and air pollutant data. The experimental results show that PBCAR delivers 86.89% accuracy and 84.12% recall, and PBDT shows 87.52% accuracy and 85.59% recall. These results indicate that our methods can achieve high accuracy and recall in predicting attacks of chronic diseases. The readable rules of both classifiers can provide patients and healthcare workers with insights into essential illness-related information. At the same time, the additional environmental factors in the input data are also proven valuable in predicting attacks.
6.
The work presented in this paper results from a rapid increase of interest in game-theoretical analysis and the huge growth of game-related databases, from which useful knowledge can likely be extracted. This paper argues that applying data mining algorithms together with Game Theory offers significant potential as a new way to analyze complex engineering systems, such as strategy selection in manufacturing analysis. Recent research shows that work combining data mining and Game Theory has not yet produced satisfactory solutions for representing and structuring the knowledge in a game. To examine the idea, a novel approach fusing these two techniques is developed in this paper and tested on real-world manufacturing datasets. The obtained results indicate the superiority of the proposed approach. Some fruitful directions for future research are outlined as well.
7.
In physics, a spectrum is the series of colored bands diffracted and arranged in the order of their respective wavelengths by the passage of white light through a prism or other diffracting medium. Outside of physics, a spectrum is a condition that is not limited to a specific set of values but can vary infinitely within a continuum. In commerce, an effective visualization tool, especially for stakeholders or managers, is a brand-spectrum diagram highlighting where the company's brands and products are situated relative to competitors. This paper investigates research issues on the product and brand spectrum in the beverage market of Taiwan, proposing the Apriori algorithm for association rules, together with clustering analysis, within an ontology-based data mining approach for mining customer and product knowledge from the database. Knowledge extracted from the data mining results is illustrated as knowledge patterns, rules, and maps in order to offer beverage firms suggestions and solutions for product development, promotion, and marketing.
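The Apriori algorithm named above is well defined, so its level-wise candidate generation can be sketched compactly. The toy beverage baskets are hypothetical, not the paper's data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent-itemset mining by level-wise candidate generation (Apriori).
    Returns {frozenset: support_count} for all itemsets meeting min_support."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    # Level 1: frequent single items.
    current = []
    for i in items:
        c = sum(1 for t in transactions if i in t)
        if c >= min_support:
            s = frozenset([i])
            freq[s] = c
            current.append(s)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = []
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if any(frozenset(sub) not in freq for sub in combinations(cand, k - 1)):
                continue
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_support:
                freq[cand] = c
                current.append(cand)
        k += 1
    return freq

# Toy beverage baskets (hypothetical).
baskets = [{"tea", "milk"}, {"tea", "milk", "sugar"}, {"tea", "sugar"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
```

Association rules are then read off the frequent itemsets (e.g. "tea ⇒ milk" with confidence supp(tea, milk)/supp(tea)), which is the raw material for the knowledge patterns and maps the paper describes.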
8.
A new stable version (“production version”) v5.28.00 of ROOT [1] has been published [2]. It features several major improvements in many areas, most noteworthy data storage performance as well as statistics and graphics features. Some of these improvements were already anticipated in the original publication, Antcheva et al. (2009) [3]. This version will be maintained for at least 6 months; new minor revisions (“patch releases”) will be published [4] to solve problems reported with this version.
New version program summary
Program title: ROOT
Catalogue identifier: AEFA_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFA_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser Public License v.2.1
No. of lines in distributed program, including test data, etc.: 2 934 693
No. of bytes in distributed program, including test data, etc.: 1009
Distribution format: tar.gz
Programming language: C++
Computer: Intel i386, Intel x86-64, Motorola PPC, Sun Sparc, HP PA-RISC
Operating system: GNU/Linux, Windows XP/Vista/7, Mac OS X, FreeBSD, OpenBSD, Solaris, HP-UX, AIX
Has the code been vectorized or parallelized?: Yes
RAM: > 55 Mbytes
Classification: 4, 9, 11.9, 14
Catalogue identifier of previous version: AEFA_v1_0
Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 2499
Does the new version supersede the previous version?: Yes
Nature of problem: Storage, analysis and visualization of scientific data
Solution method: Object store, wide range of analysis algorithms and visualization methods
Reasons for new version: Added features and corrections of deficiencies
Summary of revisions: The release notes at http://root.cern.ch/root/v528/Version528.news.html give a module-oriented overview of the changes in v5.28.00. Highlights include:
• File format: Reading of TTrees has been improved dramatically with respect to CPU time (30%) and notably with respect to disk space.
• Histograms: A new TEfficiency class has been provided to handle the calculation of efficiencies and their uncertainties, TH2Poly for polygon-shaped bins (e.g. maps), TKDE for kernel density estimation, and TSVDUnfold for singular value decomposition.
• Graphics: Kerning is now supported in TLatex, PostScript and PDF; a table of contents can be added to PDF files. A new font provides italic symbols. A TPad containing GL can be stored in a binary (i.e. non-vector) image file; support for full-scene anti-aliasing has been added. Usability enhancements to EVE.
• Math: New interfaces for generating random numbers according to a given distribution, goodness-of-fit tests of unbinned data, binning multidimensional data, and several advanced statistical functions were added.
• RooFit: Introduction of HistFactory; major additions to RooStats.
• TMVA: Updated to version 4.1.0, adding e.g. support for simultaneous classification of multiple output classes for several multivariate methods.
• PROOF: Many new features adding to PROOF's usability, plus improvements and fixes.
• PyROOT: Support for Python 3 has been added.
• Tutorials: Several new tutorials were provided for the above new features (notably RooStats).
A detailed list of all the changes is available at http://root.cern.ch/root/htmldoc/examples/V5.
Additional comments: For an up-to-date author list see http://root.cern.ch/drupal/content/root-development-team and http://root.cern.ch/drupal/content/former-root-developers. The distribution file for this program is over 30 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead an html file giving details of how the program can be obtained is sent.
Running time: Depending on the data size and complexity of analysis algorithms.
References:
[2] http://root.cern.ch/drupal/content/production-version-528.
[3] I. Antcheva, M. Ballintijn, B. Bellenot, M. Biskup, R. Brun, N. Buncic, Ph. Canal, D. Casadei, O. Couet, V. Fine, L. Franco, G. Ganis, A. Gheata, D. Gonzalez Maline, M. Goto, J. Iwaszkiewicz, A. Kreshuk, D. Marcos Segura, R. Maunder, L. Moneta, A. Naumann, E. Offermann, V. Onuchin, S. Panacek, F. Rademakers, P. Russo, M. Tadel, ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization, Comput. Phys. Commun. 180 (2009) 2499.
[4] http://root.cern.ch/drupal/content/root-version-v5-28-00-patch-release-notes.
9.
A method for analyzing production systems by applying multi-objective optimization and data mining techniques to discrete-event simulation models, so-called Simulation-based Innovization (SBI), is presented in this paper. The aim of the SBI analysis is to reveal insight into the parameters that affect the performance measures, and to gain deeper understanding of the problem, through post-optimality analysis of the solutions acquired from multi-objective optimization. This paper provides empirical results from an industrial case study, carried out on an automotive machining line, in order to explain the SBI procedure. The SBI method has been found particularly suitable in this case study because the three objectives under study, namely total tardiness, makespan and average work-in-process, conflict with one another. Depending on the system load of the line, different decision variables have been found to be influential. The paper addresses how the SBI method is used to find important patterns in the explored solution set, and how it can support decision making to improve scheduling under different system loadings in the machining line.
10.
In recent years, data mining has become one of the most popular techniques for data owners to determine their strategies. Association rule mining is a data mining approach widely used in traditional databases, usually to find positive association rules. However, there are other challenging rule-mining topics, such as data stream mining and negative association rule mining. Moreover, organizations want to concentrate on their own business and outsource the rest of their work. This approach, named the “database as a service” concept, provides many benefits to the data owner but at the same time raises some security problems. In this paper, a rule mining system is proposed that provides an efficient and secure solution to positive and negative association rule computation on XML data streams in the database-as-a-service setting. The system is implemented, and several experiments with different synthetic data sets demonstrate the performance and efficiency of the proposed system.
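Negative association rules, mentioned above, can be derived from positive supports without rescanning the data: for a rule A ⇒ ¬B, supp(A, ¬B) = supp(A) − supp(A ∪ B). A minimal sketch on invented transactions:

```python
def negative_rule(transactions, a, b):
    """Support and confidence of the negative rule a => NOT b, derived
    from positive supports: supp(a, not-b) = supp(a) - supp(a | b)."""
    n = len(transactions)
    supp_a = sum(1 for t in transactions if a <= t) / n
    supp_ab = sum(1 for t in transactions if (a | b) <= t) / n
    supp = supp_a - supp_ab
    conf = supp / supp_a if supp_a else 0.0
    return supp, conf

# Toy transaction windows from a stream (hypothetical).
windows = [{"x", "y"}, {"x"}, {"x", "z"}, {"y", "z"}]
supp, conf = negative_rule(windows, frozenset({"x"}), frozenset({"y"}))
```

Here "x" appears in 3 of 4 windows and co-occurs with "y" in 1, so x ⇒ ¬y holds with support 0.5 and confidence 2/3; the paper's contribution is computing such rules securely and incrementally over XML streams rather than on a static table.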
11.
For product design and development, crowdsourcing shows huge potential for fostering creativity and has been regarded as one important approach to acquiring innovative concepts. Nevertheless, before the approach can be effectively implemented, the following challenges concerning crowdsourcing should be properly addressed: (1) the burdensome concept review process needed to deal with a large amount of crowd-sourced design concepts; (2) insufficient consideration of integrating design knowledge and principles into existing data processing methods/algorithms for crowdsourcing; and (3) the lack of a quantitative decision support process to identify better concepts. To tackle these problems, a product concept evaluation and selection approach comprising three modules is proposed: (1) a data mining module to extract meaningful information from online crowd-sourced concepts; (2) a concept re-construction module to organize word tokens into a unified frame using domain ontology and extended design knowledge; and (3) a decision support module to select better concepts in a simplified manner. A pilot study on future PC (personal computer) design was conducted to demonstrate the proposed approach. The results show that the proposed approach is promising and may help to improve concept review and evaluation efficiency, facilitate data processing using design knowledge, and enhance the reliability of concept selection decisions.
12.
In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high-activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high-concentration area in a point process. A comprehensive search algorithm always ends up with a high order of computational complexity: if a length-n sequence is considered, the total number of all possible consecutive subsequences is n(n+1)/2, so a comprehensive search algorithm requires at least O(n²) numerical operations. We present a multiscale-approximation-based approach. It is shown that most of the time, this method finds exactly the same solution as a comprehensive search algorithm does. The derived multiscale approximation methods (MAMEs) have low complexity: for a length-n sequence, the computational complexity of an MAME can be as low as O(n). Numerical simulations verify these improvements. The MAME approach is particularly suitable for problems with large data sizes. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, an MAME finds the optimal solution with probability almost equal to one.
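The O(n²) comprehensive search that the MAMEs approximate is easy to state concretely. The sketch below is the exhaustive baseline (not the paper's multiscale method): scan every consecutive subsequence of at least a minimum length in a Bernoulli sequence and return the one with the highest average activity.

```python
def densest_interval(seq, min_len):
    """Exhaustive O(n^2) scan for the consecutive subsequence of length
    >= min_len with the highest average -- the baseline that a multiscale
    approximation method (MAME) is designed to speed up."""
    n = len(seq)
    prefix = [0]
    for v in seq:
        prefix.append(prefix[-1] + v)  # prefix sums give O(1) interval means
    best, best_ij = float("-inf"), None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            mean = (prefix[j] - prefix[i]) / (j - i)
            if mean > best:
                best, best_ij = mean, (i, j)
    return best_ij, best

# Bernoulli sequence with a burst of activity in the middle.
seq = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
(i, j), density = densest_interval(seq, min_len=3)
```

There are n(n+1)/2 candidate intervals, hence the quadratic cost; a multiscale approach instead examines intervals at a logarithmic set of scales, reducing the work toward O(n) at the price of occasionally missing the exact optimum.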
13.
Enterprise Resource Planning (ERP) systems tend to deploy Supply Chain Management and/or Customer Relationship Management techniques in order to successfully channel information among customers, suppliers, manufacturers and warehouses, and thereby minimize system-wide costs while satisfying service-level requirements. Although efficient, these systems are neither versatile nor adaptive, since newly discovered customer trends cannot be easily integrated with existing knowledge. Advancing the way the above-mentioned techniques are applied to ERP systems, we have developed a multi-agent system that introduces adaptive intelligence as a powerful add-on for ERP software customization. The system can be thought of as a recommendation engine that takes advantage of knowledge gained through data mining techniques and incorporates it into the resulting company selling policy. The intelligent agents of the system can be periodically retrained as new information is added to the ERP. In this paper, we present the architecture and development details of the system and demonstrate its application on a real test case.
14.
This paper reports work intended to reveal the connection between the topics investigated by conference papers and journal papers. The work selected hundreds of papers in data mining and information retrieval from well-known databases and showed that the topics covered by conference papers in a year often lead to similar topics covered by journal papers in the subsequent year, and vice versa. The study used existing algorithms, and combinations of them, to propose a new detection procedure by which researchers can detect new trends and obtain academic intelligence from conferences and journals. The goal of this research is fourfold. First, the research investigates whether the conference papers' themes lead the journal papers'. Second, it examines how new research themes can be identified from conference papers. Third, it looks at a specific area, information retrieval and data mining, as an illustration. Fourth, it studies any inconsistencies in the correlation between the conference papers and the journal papers. This study explores the connections between academic publications. The methodologies of information retrieval and data mining can be exploited to discover the relationships between published papers across all topics. By discovering the connections between conference papers and journal papers, researchers can improve the effectiveness of their research by identifying academic intelligence. This study discusses how conference papers and journal papers are related. The topics of conference papers are identified to determine whether they represent new trends discussed in journal papers. An automatic examination procedure based on information retrieval and data mining is also proposed to minimize the time and human resources required to predict further research developments. The study develops a new procedure and collects a dataset to verify these problems.
Analytical results demonstrate that the topics of conference papers and journal papers are similar each year, and that conference papers affect the journal papers published over the following three years. About 87.23% of data points from papers published in 1991–2007 support this assumption. The research is intended to help researchers identify new trends in their research fields and focus on urgent topics. This is particularly valuable for researchers new to their field, or those who wish to perform cross-domain studies.
15.
Experimental analysis of the performance of a proposed method is a crucial and necessary task in an investigation. In this paper, we focus on the use of nonparametric statistical inference for analyzing the results obtained in an experimental design in the field of computational intelligence. We present a case study involving a set of techniques for classification tasks, and we study a set of nonparametric procedures useful for analyzing the behavior of a method with respect to a set of algorithms, as in the framework in which a new proposal is developed. In particular, we discuss some basic and advanced nonparametric approaches that improve upon the results offered by the Friedman test in some circumstances. A set of post hoc procedures for multiple comparisons is presented, together with the computation of adjusted p-values. We also perform an experimental analysis comparing their power, with the objective of detecting the advantages and disadvantages of the statistical tests described. We found that aspects such as the number of algorithms, the number of data sets, and the differences in performance offered by the control method are very influential on the statistical tests studied. Our final goal is to offer a complete guideline for the use of nonparametric statistical procedures for performing multiple comparisons in experimental studies.
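The Friedman test that these procedures build on is straightforward to compute by hand: rank the k algorithms on each of the N data sets, average the ranks per algorithm, and form the chi-square statistic. The sketch below assumes no ties and uses invented accuracies:

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic for k algorithms on N data sets.
    scores[i][j] is the accuracy of algorithm j on data set i; higher is
    better, so rank 1 goes to the best algorithm (no ties assumed)."""
    n, k = len(scores), len(scores[0])
    avg_rank = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            avg_rank[j] += rank / n
    # chi2_F = 12N/(k(k+1)) * (sum of squared average ranks - k(k+1)^2/4)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_rank) - k * (k + 1) ** 2 / 4)

# Hypothetical accuracies of 3 classifiers on 4 data sets.
scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.81],
    [0.92, 0.83, 0.84],
    [0.89, 0.87, 0.82],
]
chi2 = friedman_statistic(scores)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom; when it rejects the null of equal performance, the post hoc procedures with adjusted p-values that the paper surveys take over to decide which pairwise differences are significant.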
16.
The widespread application of distributed technology raises security problems for network information. Network information security covers both the security of information transmission and the security of storage and access. Sharing information, and storing and retrieving it dynamically, via shared memory and memory-mapped files guarantees, to a certain extent, the security and integrity of the data.
17.
This is the first part of a large survey paper in which we analyze recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and keywords. We developed a knowledge browsing environment to support our literature analysis process. We use the visualization capabilities of FCA to explore the literature, to discover and conceptually represent the main research topics in the FCA community. In this first part, we zoom in on and give an extensive overview of the papers published between 2003 and 2011 on developing FCA-based methods for knowledge processing. We also give an overview of the literature on FCA extensions such as pattern structures, logical concept analysis, relational concept analysis, power context families, fuzzy FCA, rough FCA, temporal and triadic concept analysis and discuss scalability issues.
18.
Informatization is now entering a new stage: a "smart" stage centered on the deep mining and integrated application of data, and the smart campus has become an important part of current university IT construction. This paper analyzes the current state of university informatization and the principles and applications of common data mining models, and surveys data mining for smart-campus comprehensive student evaluation systems based on models such as the FP-growth algorithm, K-means clustering, true entropy, and classification and regression, with the aim of providing a blueprint for the deep integration of data mining technology with the smart campus.
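Of the models named above, K-means clustering is the simplest to illustrate. The sketch below is plain Lloyd's algorithm in one dimension, run on hypothetical composite-evaluation scores (the data and cluster count are invented, not from any surveyed system):

```python
def kmeans_1d(points, centers, iters=20):
    """Plain Lloyd's algorithm in one dimension: assign each point to its
    nearest center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        # Empty clusters keep their previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical composite-evaluation scores with two obvious groups.
scores = [55, 58, 60, 85, 88, 90]
centers = kmeans_1d(scores, centers=[50.0, 95.0])
```

In a student-evaluation setting the resulting centers act as group profiles (e.g. "needs support" vs. "excelling"); real systems cluster multi-dimensional feature vectors rather than a single score.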
19.
The ‘will, skill, tool’ model is a well-established theoretical framework that elucidates the conditions under which teachers are most likely to employ information and communication technologies (ICT) in the classroom. Past studies have shown that these three factors explain a very high degree of variance in the frequency of classroom ICT use. The present study replicates past findings using a different set of measures and homes in on possible subfactors. Furthermore, the study examines teacher affiliation for constructivist-style teaching, which is often considered to facilitate the pedagogical use of digital media. The study’s survey of 357 Swiss secondary school teachers reveals significant positive correlations between will, skill, and tool variables and the combined frequency and diversity of technology use in teaching. A multiple linear regression model was used to identify relevant subfactors. Five factors account for a total of 60% of the explained variance in the intensity of classroom ICT use. Computer and Internet applications are more often used by teachers in the classroom when: (1) teachers consider themselves to be more competent in using ICT for teaching; (2) more computers are readily available; (3) the teacher is a form teacher and responsible for the class; (4) the teacher is more convinced that computers improve student learning; and (5) the teacher more often employs constructivist forms of teaching and learning. The impact of constructivist teaching was small, however.
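The multiple linear regression step can be sketched from first principles via the normal equations. The predictors and the exact linear data below are invented purely to check the solver, and merely echo two of the study's factor names:

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination. A column of ones is prepended for
    the intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, k))) / xtx[i][i]
    return beta

# Hypothetical data: ICT-use intensity as an exact linear function of
# perceived competence and number of available computers (for checking).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [0.5 + 2 * a + 3 * b for a, b in X]
beta = fit_ols(X, y)  # [intercept, competence slope, computers slope]
```

On real survey data the recovered coefficients would be the subfactor weights the study reports, and their joint fit quality corresponds to the 60% explained variance.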
20.
The ship outfitting process is a difficult point of shipbuilding technology. Ship outfitting is a typical labor- and technology-intensive operation, with complex processes, high precision requirements, and high demands on the coordination of outfitting-part design and manufacturing; the level of outfitting technology constrains the overall level of shipbuilding technology. Research on the dynamic acquisition, analysis and early warning of ship outfitting production information applies information technology to ship outfitting; it can raise outfitting production efficiency, reduce costs, shorten the production cycle and improve product quality, and is of great significance to the development of China's shipbuilding industry.