Data mining: an overview from a database perspective   Total citations: 15 (self-citations: 0, citations by others: 15)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area offering an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a survey, from a database researcher's point of view, of recently developed data mining techniques. A classification of the available data mining techniques is provided, and a comparative study of these techniques is presented.

Many organizations struggle with the massive amount of data they collect. Today, data do more than serve as the ingredients for churning out statistical reports. They help support efficient operations in many organizations and, to some extent, provide the competitive intelligence organizations need to survive in today's economy. Data mining can't always deliver timely and relevant results because data are constantly changing. However, stream-data processing might be more effective, judging by the Matrix project.

Firewalls are a well-established security mechanism to restrict the traffic exchanged between networks to a certain subset of users and applications. In order to cope with new application types like multimedia, new firewall architectures are necessary. The performance of these new architectures is a critical factor because Quality of Service (QoS) demands of multimedia applications have to be taken into account. We show how the performance of firewall architectures for multimedia applications can be determined. We present a model to describe the performance of multimedia firewall architectures. This model can be used to dimension firewalls for usage with multimedia applications. In addition, we present the results of a lab experiment, used to evaluate the performance of a distributed firewall architecture and to validate the model.

The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques have been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations that are tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages and general solvers that specify what the problem is, rather than outlining how a solution should be computed, yet are powerful enough to be used across a wide variety of applications and application domains. This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraints, or novel combinations thereof.
Even though state-of-the-art data mining systems outperform the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining, as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.
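To make the declarative idea concrete, the following is a minimal sketch, not the paper's model or solver. It assumes a tiny binary transaction matrix and replaces a real constraint solver with exhaustive enumeration; the names `coverage`, `is_closed`, and `mine` are hypothetical. The point it illustrates is that closed itemset mining is obtained by adding one constraint to the frequent itemset model, without changing the search procedure.

```python
from itertools import product


def coverage(items, database):
    """Coverage constraint: a transaction is covered iff it contains every selected item."""
    return [all(row[i] for i, chosen in enumerate(items) if chosen) for row in database]


def is_closed(items, database):
    """Closedness constraint: an item is selected iff every covered transaction contains it."""
    covered = [row for row, hit in zip(database, coverage(items, database)) if hit]
    return all(
        bool(chosen) == all(row[i] for row in covered)
        for i, chosen in enumerate(items)
    )


def mine(database, min_support, extra_constraints=()):
    """Enumerate non-empty itemsets meeting the frequency constraint and any extras.

    Exhaustive enumeration stands in for a constraint solver (an assumption made
    to keep the sketch self-contained); it is exponential in the number of items.
    """
    n_items = len(database[0])
    solutions = {}
    for items in product((0, 1), repeat=n_items):
        support = sum(coverage(items, database))
        if not any(items) or support < min_support:
            continue
        if all(constraint(items, database) for constraint in extra_constraints):
            itemset = frozenset(i for i, chosen in enumerate(items) if chosen)
            solutions[itemset] = support
    return solutions


# Illustrative data only: rows are transactions, columns are items 0, 1, 2.
DATABASE = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

frequent = mine(DATABASE, min_support=2)
closed = mine(DATABASE, min_support=2, extra_constraints=(is_closed,))
```

With this data, `frequent` holds five itemsets, while `closed` keeps only `{1}`, `{0, 1}`, and `{1, 2}`, because `{0}` and `{2}` always co-occur with item 1.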

Information Systems, 2005, 30(1): 71-88
Many large organizations have multiple databases distributed across different branches, and therefore multi-database mining is an important task for data mining. To reduce the cost of searching the data from all databases, we need to identify which databases are most likely relevant to a data mining application. This is referred to as database selection. For real-world applications, database selection has to be carried out multiple times to identify relevant databases that suit different applications. In particular, a mining task may be without reference to any specific application. In this paper, we present an efficient approach for classifying multiple databases based on their similarity to one another. Our approach is application-independent.
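The abstract does not state which similarity measure or grouping rule is used, so the following is only an illustrative sketch under stated assumptions: each database is summarized by the set of items it contains, similarity is the Jaccard coefficient of those sets, and databases are grouped whenever they are linked by a chain of pairs whose similarity reaches a threshold. The function names and the branch data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed measure, not taken from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_databases(item_sets, threshold):
    """Group databases into classes linked by pairwise similarity >= threshold."""
    unassigned = set(item_sets)
    groups = []
    for name in item_sets:
        if name not in unassigned:
            continue
        unassigned.remove(name)
        group, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            # sorted() copies the set, so removing from it inside the loop is safe.
            for other in sorted(unassigned):
                if jaccard(item_sets[current], item_sets[other]) >= threshold:
                    unassigned.remove(other)
                    group.append(other)
                    frontier.append(other)
        groups.append(sorted(group))
    return groups


# Illustrative data only: the items present in each branch database.
branches = {
    "D1": {"a", "b", "c"},
    "D2": {"a", "b", "d"},
    "D3": {"x", "y"},
    "D4": {"x", "y", "z"},
}

classes = classify_databases(branches, threshold=0.5)
```

Because the grouping depends only on the databases themselves and not on any query, it is application-independent in the sense the abstract describes; the threshold is a tuning parameter of this sketch.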

This paper examines the psychophysiological effects of mental workload in single-task and dual-task human-computer interaction. A mental arithmetic task and a manual error correction task were performed both separately and concurrently on a computer using verbal and haptic input devices. Heart rate, skin conductance, respiration and peripheral skin temperature were recorded in addition to objective performance measures and self-report questionnaires. Analysis of psychophysiological responses found significant changes from baseline for both single-task and dual-task conditions. There were also significant psychophysiological differences between the mental arithmetic task and the manual error correction task, but no differences in questionnaire results. Additionally, there was no significant psychophysiological difference between performing only the mental arithmetic task and performing both tasks at once. These findings suggest that psychophysiological measures respond differently to different types of tasks and that they do not always agree with performance or with participants’ subjective feelings.

We report results from a longitudinal study of information systems development (ISD) teams. We use data drawn from 60 ISD teams at 22 sites of 15 Fortune 500 organizations to explore variations in performance relative to these teams' social interactions. To do this, we characterize ISD as a form of new product development and focus on team-level social interactions with external stakeholders. Drawing on cluster analysis, we identify five patterns of team-level social interactions and the relationships of these patterns to a suite of objective and subjective measures of ISD performance. Analysis leads us to report three findings. First, data indicate that no one of the five identified patterns maximizes all performance measures. Second, data make clear that the most common approach to ISD is the least effective relative to our suite of performance measures. Third, data from this study show that early indications of ISD project success do not predict actual outcomes. These findings suggest two issues for research and practice. First, these findings indicate that varying patterns of social interactions lead to differences in ISD team performance. Second, the findings illustrate that singular measures of ISD performance are an oversimplification and that multiple measures of ISD performance are unlikely to agree.

Electronic Markets - The topic of customer engagement via social media is receiving increasing consideration in the literature. Previously, scholars’ use of the notion and dimensionality of...

The formulation and solution of multi‐criteria healthcare decision problems is of critical importance to the health and socio‐economic betterment of developing countries. The study shows how the multi‐criteria decision‐making method could facilitate implementation of healthcare performance analysis, especially for the public healthcare system of Bangladesh, which operates mainly through thana health complexes (THCs). We include outreach services and rural facilities together with ongoing THC activities, and analyze their relative performance. The methodology uses a phase of the Delphi method and of the analytic hierarchy process (AHP) approach. The outcome of Delphi is used as input for the hierarchical processing procedure in AHP and determines performance order of the healthcare activities. Results from AHP are discussed for implementation in decision‐making and the managerial policymaking process, towards improvement of overall healthcare performance.
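As a rough illustration of the AHP step, the sketch below derives priority weights from a pairwise comparison matrix using the row geometric mean approximation, a common stand-in for the principal eigenvector. The judgment values are made up for illustration and are not taken from the study, and the function name `ahp_priorities` is hypothetical. In a Delphi-then-AHP pipeline such as the one described, the pairwise judgments would come from the expert panel.

```python
import math


def ahp_priorities(pairwise):
    """Approximate AHP priority weights by normalizing row geometric means.

    pairwise[i][j] states how strongly activity i is preferred over activity j;
    a reciprocal matrix has pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Illustrative, perfectly consistent judgments for three activities:
# activity 0 is twice as important as activity 1 and four times activity 2.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_priorities(judgments)
# Performance order: indices of the activities from highest to lowest weight.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

For a perfectly consistent matrix like this one, the geometric mean method and the eigenvector method give the same weights; for inconsistent judgments they can differ slightly, and a consistency check would normally be added.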

The widespread growth of business blogs has created opportunities for companies to use them as channels of marketing, communication, customer feedback, and mass opinion measurement. However, many blogs contain similar information, and the sheer volume of available information challenges the ability of organizations to act quickly in today’s business environment. Novelty mining can help to single out novel information from a massive set of text documents. This paper explores the feasibility and performance of novelty mining and database optimization of business blogs, which have not been studied before. The results show that our novelty mining system can detect novelty in our dataset of business blogs with very high accuracy, and that database optimization can significantly improve the performance.

In this paper, we address the problem of visual instance mining, which is to automatically discover frequently appearing visual instances from a large collection of images. We propose a scalable mining method by leveraging the graph structure with images as vertices. Different from most existing approaches that focus on either instance-level similarities or image-level context properties, our method captures both kinds of information. In the proposed framework, the instance-level information is integrated during the construction of a sparse instance graph based on the similarity between augmented local features, while the image-level context is explored with a greedy breadth-first search algorithm to discover clusters of visual instances from the graph. This framework can tackle the challenges brought by small visual instances, diverse intra-class variations, as well as noise in large-scale image databases. To further improve the robustness, we integrate two techniques into the basic framework. First, to better cope with the increasing noise of large databases, weak geometric consistency is adopted to efficiently combine the geometric information of local matches into the construction of the instance graph. Second, we propose the layout embedding algorithm, which leverages the algorithm originally designed for graph visualization to fully explore the image database structure. The proposed method was evaluated on four annotated data sets with different characteristics, and experimental results showed its superiority over state-of-the-art algorithms on all data sets. We also applied our framework to a one-million-image Flickr data set and demonstrated its scalability.

Two parameters in association rule mining, namely support and confidence, are used to arrange association rules in either increasing or decreasing order. These two parameters are assigned values by counting the number of transactions satisfying the rule, without considering the user's perspective. Hence, an association rule with low values of support and confidence that is nevertheless meaningful to the user does not receive the importance the user perceives it to have. Reflecting the user's perspective is of paramount importance for improving user satisfaction with a given recommendation system. In this paper, we propose a model and an algorithm to extract association rules that are meaningful to a user, with an ad-hoc support and confidence, by allowing the user to specify the importance of each transaction. In addition, we apply the characteristics of a concept lattice, a core data structure of Formal Concept Analysis (FCA), to reflect the subsumption relation of association rules when assigning a priority to each rule. Finally, we describe experimental results to verify the potential and efficiency of the proposed method.
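The abstract does not give the exact definitions of the ad-hoc measures, so the sketch below shows one plausible reading, stated as an assumption: each transaction carries a user-assigned importance weight, and support and confidence are computed from summed weights instead of raw counts. The function names and the basket data are hypothetical.

```python
def weight_of(itemset, transactions):
    """Total importance of the transactions that contain every item in itemset."""
    return sum(weight for items, weight in transactions if itemset <= items)


def weighted_support(antecedent, consequent, transactions):
    """Share of total importance held by transactions satisfying the whole rule."""
    total = sum(weight for _, weight in transactions)
    return weight_of(antecedent | consequent, transactions) / total


def weighted_confidence(antecedent, consequent, transactions):
    """Importance of transactions satisfying the rule, relative to the antecedent."""
    return (weight_of(antecedent | consequent, transactions)
            / weight_of(antecedent, transactions))


# Illustrative data only: (items, user-assigned importance) pairs.
baskets = [
    ({"bread", "milk"}, 1.0),
    ({"bread"}, 1.0),
    ({"bread", "milk", "eggs"}, 3.0),  # the user marks this transaction as important
    ({"milk"}, 1.0),
]

support = weighted_support({"bread"}, {"milk"}, baskets)
confidence = weighted_confidence({"bread"}, {"milk"}, baskets)
```

For the rule bread → milk, plain counting gives a support of 2/4 and a confidence of 2/3; with the weights above, the support rises to 4/6 and the confidence to 4/5, so a rule backed by transactions the user cares about ranks higher.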

The effect of practice on the parallel organization and control of discrete, asymmetrical bimanual movements was investigated. Subjects performed a flexion movement in the left limb together with a flexion-extension-flexion movement in the right limb. Two groups, one of which received kinematic information feedback, were instructed to produce the different patterns simultaneously. A third group performed each movement in isolation at all times, serving as the baseline condition. The degree of success in parallel action organization was assessed at the qualitative (or structural) and quantitative (or metrical) level of movement specification. Findings revealed that the bimanual groups displayed a tendency to synchronize the patterns of motor output, resulting in (mutual) interference. However, the provision of augmented kinematic information feedback resulted in more successful metrical and structural dissociation of the limb actions. The results are discussed in support of a movement dynamics perspective on motoric dual-task performance. The relevance of the approach for human factors is also emphasized.

An encoding method has a direct effect on the quality and the representation of the discovered knowledge in data mining systems. Biological macromolecules are encoded by strings of characters, called primary structures. Because data mining systems usually use relational tables to encode data, these strings have to be re-encoded and transformed into relational tables. In this paper, we present a comparative study of the existing static encoding methods, which are based on biologists' know-how, and our new dynamic encoding method, which is based on the construction of Discriminant and Minimal Substrings (DMS). Different classification methods are used in this study. The experimental results show that our dynamic encoding method is more efficient than the static ones for encoding biological macromolecules from a data mining perspective.

Information & Management, 2004, 41(3): 335-349
Teamwork during IS development (ISD) is an important issue. This paper discusses the relationship between team structure and ISD team performance using a social network approach. Based on empirical evidence collected from 25 teams in a system analysis and design course, we found that:
  • (1) Group cohesion was positively related to overall performance.
  • (2) Group conflict indexes were not significantly correlated with overall performance.
  • (3) Group characteristics, e.g., cohesion and conflict, fluctuated in different phases, but in later stages, much less cohesion occurred and the advice network seemed to be very important.
  • (4) Group structures seemed to be a critical factor for good performance.
Further in-depth studies were conducted on teams exhibiting the highest and lowest performance to determine their differences from a sociogram analysis perspective.

Enterprises turn to their software applications to support their business processes. Over time, it is common for a company to end up with a wide range of applications, which are usually developed in-house by its information technology department or purchased from third-party specialized software companies. The result is a heterogeneous software ecosystem with applications developed in different technologies and frequently using different data models, which brings challenges when two or more applications have to collaborate to support a business process. Integration platforms are specialized software tools that help design, implement, run, and monitor integration solutions that orchestrate a set of applications. The run-time system is the component of integration platforms responsible for running integration solutions, which makes its performance a critically important issue. In this paper, we report our experience in evaluating and comparing four well-known open-source integration platforms in the context of a research project where performance was a central requirement for choosing an integration platform. The evaluation was conducted using a decision-making methodology to build a ranking of candidate platforms by means of subjective and objective criteria. The subjective evaluation takes into account expert preferences and compares integration platforms using the analytic hierarchy process, which has been used in many applications related to decision-making. The objective evaluation is built on top of properties distributed over three dimensions, namely, message processing, hotspot detection, and fairness execution, which compose the research methodology we used. The evaluated platforms were ranked to identify the one with the best performance.

Previous studies carried out customer surveys by questionnaires to collect data for analyzing consumer requirements. In recent years, a large and growing body of literature has investigated the extraction of customer requirements and preferences from online reviews. However, since customer requirements change dynamically over time, traditional studies failed to capture the changes in customer requirements and opinions based on sentiments expressed in reviews. In this paper, a new method for dynamically mining user requirements is proposed, which is used to analyze the changing behavior of product attributes and improve product design. Dynamic mining differs from traditional need acquisition mainly in three aspects: (1) it involves dynamically mining user requirements over time; (2) it adds changes in manufacturers’ opinions to the analysis; and (3) it allows for product improvement strategies based on the changing behavior of product attributes. First, text mining is adopted to collect customer and manufacturer review data for different time periods and extract product attributes. A Natural Language Processing tool is used to measure the importance weight and sentiment score of product attributes. Second, an approach for dynamically mining user requirements is introduced to classify product attributes and analyze the changes of attribute data in three categories over time. Finally, an improvement strategy for next-generation product design is developed based on the changing behavior of attributes. Moreover, a case study on vehicles based on online reviews was conducted to illustrate the proposed methodology. Our research suggests that the proposed approach can accurately mine customer requirements and lead to successful product improvement strategies for next-generation products.

Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naive users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, in this paper, we first present a knowledge-based video indexing and content management framework for domain-specific videos (using basketball video as an example). We provide a solution to explore video knowledge by mining associations from video data. Explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the distinct features of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each of them a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.

Performance analysis of computing systems is an increasingly difficult task due to growing system complexity. Traditional tools rely on ad hoc procedures. With these, determining which of the manifold system and workload parameters to examine is often a lengthy and highly speculative process. The analysis is often incomplete and, therefore, prone to revealing faulty conclusions and not uncovering useful tuning knowledge. We address this problem by introducing a data mining approach called ADMiRe (analyzer for data mining results). In this scheme, regression analysis is first applied to performance data to discover correlations between various system and workload parameters. The results of this analysis are summarized in sets of regression rules. The user can then formulate intuitive algebraic expressions to manipulate these sets of rules to capture critical information. To demonstrate this approach, we use ADMiRe to analyze an Oracle database system running the TPC-C (Transaction Processing Performance Council) benchmark. The results generated by ADMiRe were confirmed by Oracle experts. We also show that by applying ADMiRe to Microsoft Internet Information Server performance data, we can improve system performance by 20 percent.

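Workflow performance analysis based on process mining   Total citations: 4 (self-citations: 0, citations by others: 4)
This paper introduces the foundations and concepts of workflow performance analysis. For complex and non-deterministic business processes, a workflow process mining algorithm based on workflow logs is used to obtain a workflow performance analysis net that reflects the basic performance of the system. The approach is then applied to the performance analysis of workflow systems with dynamic, fuzzy control flows.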
