Similar Documents
20 similar documents found (search time: 31 ms)
1.
Data analysis often involves finding models that can explain patterns in data, and reduce possibly large data sets to more compact model-based representations. In Statistics, many methods are available to compute model information. Among others, regression models are widely used to explain data. However, regression analysis typically searches for the best model based on the global distribution of data. On the other hand, a data set may be partitioned into subsets, each requiring individual models. While automatic data subsetting methods exist, these often require parameters or domain knowledge to work with. We propose a system for visual-interactive regression analysis for scatter plot data, supporting both global and local regression modeling. We introduce a novel regression lens concept, allowing a user to interactively select a portion of data, on which regression analysis is run in interactive time. The lens gives encompassing visual feedback on the quality of candidate models as it is interactively navigated across the input data. While our regression lens can be used for fully interactive modeling, we also provide user guidance suggesting appropriate models and data subsets, by means of regression quality scores. We show, by means of use cases, that our regression lens is an effective tool for user-driven regression modeling and supports model understanding.
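The lens's local-modeling step can be sketched as follows: select the points under a circular lens, fit a least-squares line to that subset only, and report R² as a quality score. This is a minimal sketch assuming a circular lens and simple linear regression; the actual system supports richer model classes and interactive navigation.

```python
def lens_regression(points, cx, cy, r):
    """Fit a least-squares line to the points inside a circular lens
    centered at (cx, cy) with radius r, and report R^2 as a quality
    score. Returns None if the lens covers fewer than two points."""
    sel = [(x, y) for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2]
    n = len(sel)
    if n < 2:
        return None
    mx = sum(x for x, _ in sel) / n
    my = sum(y for _, y in sel) / n
    sxx = sum((x - mx) ** 2 for x, _ in sel)
    sxy = sum((x - mx) * (y - my) for x, y in sel)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in sel)
    ss_tot = sum((y - my) ** 2 for _, y in sel)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2

# A lens over a noiseless linear region recovers the model exactly.
points = [(x, 2.0 * x + 1.0) for x in range(10)]
slope, intercept, r2 = lens_regression(points, cx=4.5, cy=10.0, r=20.0)
```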

2.
3.
Existing tools for scientific modeling offer little support for improving models in response to data, whereas computational methods for scientific knowledge discovery provide few opportunities for user input. In this paper, we present a language for stating process models and background knowledge in terms familiar to scientists, along with an interactive environment for knowledge discovery that lets the user construct, edit, and visualize scientific models, use them to make predictions, and revise them to better fit available data. We report initial studies in three domains that illustrate the operation of this environment and the results of a user study carried out with domain scientists. Finally, we discuss related efforts on model formalisms and revision and suggest priorities for additional research.

4.
Describes an approach for multiparadigmatic visual access, integrating different interaction paradigms. The user is provided with an adaptive interface augmented by a user model, supporting different visual representations of both data and queries. The visual representations are characterized on the basis of the chosen visual formalisms, namely forms, diagrams and icons. To access different databases, a unified data model called the “graph model” is used as a common underlying formalism to which databases, expressed in the most popular data models, can be mapped. Graph model databases are queried through the adaptive interface. The semantics of the query operations is formally defined in terms of graphical primitives. Such a formal approach permits us to define the concept of an “atomic query”, which is the minimal portion of a query that can be transferred from one interaction paradigm to another and processed by the system. Since certain interaction modalities and visual representations are more suitable for certain user classes, the system can suggest to the user the most appropriate interaction modality as well as the visual representation, according to the user model. Some results on user model construction are presented.

5.
Visual representations of time-series are useful for tasks such as identifying trends, patterns and anomalies in the data. Many techniques have been devised to make these visual representations more scalable, enabling the simultaneous display of multiple variables, as well as the multi-scale display of time-series of very high resolution or that span long time periods. There has been comparatively little research on how to support the more elaborate tasks associated with the exploratory visual analysis of time-series, e.g., visualizing derived values, identifying correlations, or discovering anomalies beyond obvious outliers. Such tasks typically require deriving new time-series from the original data, trying different functions and parameters in an iterative manner. We introduce a novel visualization technique called ChronoLenses, aimed at supporting users in such exploratory tasks. ChronoLenses perform on-the-fly transformation of the data points in their focus area, tightly integrating visual analysis with user actions, and enabling the progressive construction of advanced visual analysis pipelines.
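The core ChronoLens idea, transforming only the data points inside the lens's focus area on the fly while everything else passes through unchanged, can be sketched as follows. The `chrono_lens` helper and its list-of-pairs series format are illustrative assumptions, not the paper's API:

```python
def chrono_lens(series, start, end, transform):
    """Apply `transform` only to the values whose timestamp falls in the
    lens focus area [start, end); points outside pass through unchanged.
    `series` is a list of (t, value) pairs; `transform` receives the
    focused values and must return the same number of derived values."""
    inside = [(i, v) for i, (t, v) in enumerate(series) if start <= t < end]
    idx = [i for i, _ in inside]
    derived = transform([v for _, v in inside])
    out = [v for _, v in series]
    for i, dv in zip(idx, derived):
        out[i] = dv
    return list(zip([t for t, _ in series], out))

series = [(t, float(t)) for t in range(6)]
lensed = chrono_lens(series, 2, 5, lambda vs: [v * 10 for v in vs])
```

Chaining several such lenses over each other's output mirrors the paper's progressive construction of analysis pipelines.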

6.
In this paper, we examine user registration patterns in empirical WLAN traces, identify elusive patterns that are abused as user movements in constructing empirical mobility models, and analyze them to build up a realistic user mobility model. The examination shows that about 38–90% of transitions are irrelevant to actual user movements. In order to refine the elusive movements, we investigate the geographical relationships among APs and propose a filtering framework for removing them from the trace data. We then analyze the impact of the false-positive movements on an empirical mobility model. The numerical results indicate that the proposed framework improves the fidelity of the empirical mobility model. Finally, we devise an analytical model for characterizing realistic user movements, based on the analysis of the elusive user registration patterns, which emulates elusive registration patterns and generates true user mobility patterns.
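The abstract does not give the filtering framework's details, but one plausible sketch of geography-based filtering drops transitions between nearby APs, treating registrations that bounce between overlapping coverage areas as elusive rather than as actual movement. The `filter_transitions` helper and the 50 m threshold are assumptions for illustration:

```python
import math

def filter_transitions(trace, ap_coords, min_dist=50.0):
    """Drop AP transitions shorter than `min_dist` metres: registrations
    that bounce between geographically close APs are treated as elusive
    patterns, not as actual user movement."""
    if not trace:
        return []
    kept = [trace[0]]
    for ap in trace[1:]:
        x0, y0 = ap_coords[kept[-1]]
        x1, y1 = ap_coords[ap]
        if math.hypot(x1 - x0, y1 - y0) >= min_dist:
            kept.append(ap)
    return kept

ap_coords = {"A": (0, 0), "B": (10, 0), "C": (200, 0)}
trace = ["A", "B", "A", "B", "C"]   # A<->B bouncing is elusive
moves = filter_transitions(trace, ap_coords)
```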

7.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input.
These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
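The final ranking step can be illustrated by scoring candidate semantic models, here simplified to weighted edge sets, by total cost and returning the k cheapest. This is a sketch only; in the paper the candidates are subgraphs of a weighted graph built from the domain ontology and previously modeled sources:

```python
import heapq

def top_k_models(candidates, k):
    """Rank candidate semantic models by total edge cost and return the
    k cheapest, mirroring the idea of suggesting a ranked list of
    semantic models to the user."""
    return heapq.nsmallest(k, candidates, key=lambda m: sum(w for _, _, w in m))

# Each candidate is a list of (concept, property, weight) edges.
candidates = [
    [("Person", "name", 1.0), ("Person", "bornIn", 4.0)],
    [("Person", "name", 1.0), ("Place", "label", 1.5)],
    [("Event", "date", 6.0)],
]
ranked = top_k_models(candidates, k=2)
```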

8.
The paper describes the conceptual basis, main features and functionality of an interactive software tool developed in support of system identification education and discovery. This Interactive Tool for System Identification Education (ITSIE) has been developed using Sysquake, a Matlab-like language with fast execution and excellent facilities for interactive graphics, and is delivered as a stand-alone executable that is readily accessible to students and engineers. ITSIE provides two distinct functional modes that are very useful from an educational and industrial point of view. The simulation mode enables the user to evaluate the main stages of system identification, from input signal design through model validation, simultaneously and interactively in one screen on a user-specified dynamical system. The real data mode allows the user to load experimental data obtained externally and identify suitable models in an interactive fashion. The interactive tool enables students and engineers in industry to discover a myriad of fundamental system identification concepts with a much lower learning curve than existing methods.
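A typical exercise such a tool supports is fitting a model to input/output data. A minimal sketch of least-squares identification of a first-order ARX model y[t] = a·y[t-1] + b·u[t-1] follows; ITSIE itself covers the full workflow from input-signal design through model validation:

```python
def fit_arx1(u, y):
    """Least-squares fit of a first-order ARX model
    y[t] = a*y[t-1] + b*u[t-1], solved via the 2x2 normal equations."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        yp, up, yt = y[t - 1], u[t - 1], y[t]
        s_yy += yp * yp; s_yu += yp * up; s_uu += up * up
        r_y += yp * yt;  r_u += up * yt
    det = s_yy * s_uu - s_yu * s_yu
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b

# Simulate a step response of a known system, then recover (a, b).
u = [1.0] * 50
y = [0.0] * 50
for t in range(1, 50):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1]
a, b = fit_arx1(u, y)
```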

9.
User profiling by inferring user personality traits, such as age and gender, plays an increasingly important role in many real-world applications. Most existing methods for user profiling either use only one type of data or ignore handling the noisy information of data. Moreover, they usually consider this problem from only one perspective. In this paper, we propose a joint user profiling model with hierarchical attention networks (JUHA) to learn informative user representations for user profiling. Our JUHA method does user profiling based on both inner-user and inter-user features. We explore inner-user features from user behaviors (e.g., purchased items and posted blogs), and inter-user features from a user-user graph (where similar users could be connected to each other). JUHA learns basic sentence and bag representations from multiple separate sources of data (user behaviors) as the first round of data preparation. In this module, convolutional neural networks (CNNs) are introduced to capture word and sentence features of age and gender while the self-attention mechanism is exploited to weaken the noisy data. Following this, we build another bag which contains a user-user graph. Inter-user features are learned from this bag using propagation information between linked users in the graph. To acquire more robust data, inter-user features and other inner-user bag representations are joined into each sentence in the current bag to learn the final bag representation. Subsequently, all of the bag representations are integrated to learn a comprehensive user representation by the self-attention mechanism. Our experimental results demonstrate that our approach outperforms several state-of-the-art methods and improves prediction performance.
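The final integration step can be illustrated with a parameter-free self-attention pooling: each bag representation is scored against the mean bag, scores are softmax-normalised, and the user representation is the weighted sum. This is a stand-in for JUHA's learned hierarchical attention, not the paper's architecture:

```python
import math

def self_attention_pool(bags):
    """Integrate separate bag representations into one user
    representation via parameter-free attention pooling: score each bag
    against the mean bag, softmax the scores, return the weighted sum."""
    dim = len(bags[0])
    query = [sum(b[j] for b in bags) / len(bags) for j in range(dim)]
    scores = [sum(b[j] * query[j] for j in range(dim)) / math.sqrt(dim)
              for b in bags]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * b[j] for w, b in zip(weights, bags)) for j in range(dim)]
    return pooled, weights

bags = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy bag representations
user_repr, weights = self_attention_pool(bags)
```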

10.
The process of knowledge discovery in databases consists of several steps that are iterative and interactive. In each application, to go through this process the user has to exploit different algorithms and their settings that usually yield multiple models. Model selection, that is, the selection of appropriate models or algorithms to achieve such models, requires meta-knowledge of algorithm/model and model performance metrics. Therefore, model selection is usually a difficult task for the user. We believe that simplifying the process of model selection for the user is crucial to the success of real-life knowledge discovery activities. As opposed to most related work that aims to automate model selection, in our view model selection is a semiautomatic process, requiring an effective collaboration between the user and the discovery system. For such a collaboration, our solution is to give the user the ability to try various alternatives and to compare competing models quantitatively by performance metrics, and qualitatively by effective visualization. This paper presents our research on model selection and visualization in the development of a knowledge discovery system called D2MS. The paper addresses the motivation of model selection in knowledge discovery and related work, gives an overview of D2MS, and describes its solution to model selection and visualization. It then presents the usefulness of D2MS model selection in two case studies of discovering medical knowledge in hospital data—on meningitis and stomach cancer—using three data mining methods of decision trees, conceptual clustering, and rule induction.
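The quantitative side of this user-system collaboration, comparing competing models by a performance metric so the user can inspect the alternatives rather than trust a black box, can be sketched as follows. The accuracy metric and the toy models are illustrative; D2MS additionally supports qualitative comparison through visualization:

```python
def select_model(models, data):
    """Compare competing models quantitatively by accuracy on labeled
    examples and return them ranked, best first, so the user can inspect
    all alternatives."""
    def accuracy(model):
        return sum(model(x) == y for x, y in data) / len(data)
    return sorted(((accuracy(m), name) for name, m in models.items()),
                  reverse=True)

# Two toy competing classifiers over (input, label) pairs.
models = {
    "always_true": lambda x: True,
    "threshold": lambda x: x > 0,
}
data = [(-1, False), (1, True), (2, True), (-2, False)]
ranking = select_model(models, data)
```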

11.
We study how to build optimal light-field or plenoptic models. We quantify geometric errors in light-field representations and show how geometric error bounds directly affect rendering artifacts. Artifacts depend on how the light-field is parameterized, stored, reconstructed and rendered. We present the rendering artifacts and relate them to the presence of geometric errors in the four most common light-field implementations. We show how to optimize a light-field model. We take an arbitrary bounded object and construct the best possible representation using each of the four parameterizations. The best representation has the least geometric error bounds. We use two geometric errors, a positional and a directional error. We also quantify pixelation artifacts. Our analysis is useful to decide how to build a light-field model. It helps select a parameterization, build an optimal representation, and choose the samplings of the parameter spaces so that geometric errors and rendering artifacts are minimized.

12.
Language models are crucial for many tasks in NLP (Natural Language Processing) and n-grams are the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external information (morphology, syntax, partitioning into documents, etc.) into the models a significant improvement can be achieved. The models can however be improved with no external information and smoothing is an excellent example of such an improvement. In this article we show another way of improving the models that also requires no external information. We examine patterns that can be found in large corpora by building semantic spaces (HAL, COALS, BEAGLE and others described in this article). These semantic spaces have never been tested in language modeling before. Our method uses semantic spaces and clustering to build classes for a class-based language model. The class-based model is then coupled with a standard n-gram model to create a very effective language model. Our experiments show that our models reduce the perplexity and improve the accuracy of n-gram language models with no external information added. Training of our models is fully unsupervised. Our models are very effective for inflectional languages, which are particularly hard to model. We show results for five different semantic spaces with different settings and different numbers of classes. The perplexity tests are accompanied with machine translation tests that prove the ability of the proposed models to improve the performance of a real-world application.
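Coupling the class-based model with a standard n-gram model is typically done by linear interpolation. A minimal bigram sketch follows; the probability tables and λ = 0.7 are illustrative, and in the paper the word classes come from clustering semantic spaces:

```python
def interpolated_prob(w_prev, w, lam, word_bigram, class_bigram,
                      word_class, class_emit):
    """Interpolate a word bigram model with a class-based model:
    P(w|w_prev) = lam*P_word(w|w_prev)
                + (1-lam)*P(c|c_prev)*P(w|c)."""
    p_word = word_bigram.get((w_prev, w), 0.0)
    c_prev, c = word_class[w_prev], word_class[w]
    p_class = class_bigram.get((c_prev, c), 0.0) * class_emit.get((c, w), 0.0)
    return lam * p_word + (1 - lam) * p_class

word_class = {"dog": "N", "cat": "N", "runs": "V"}
word_bigram = {("dog", "runs"): 0.4}       # "cat runs" unseen at word level
class_bigram = {("N", "V"): 0.9}
class_emit = {("V", "runs"): 1.0}
p_seen = interpolated_prob("dog", "runs", 0.7, word_bigram, class_bigram,
                           word_class, class_emit)
p_unseen = interpolated_prob("cat", "runs", 0.7, word_bigram, class_bigram,
                             word_class, class_emit)
```

The unseen word pair still gets probability mass through its classes, which is exactly how class-based models reduce perplexity for sparse, inflectional languages.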

13.
A data warehouse is an important decision support system with cleaned and integrated data for knowledge discovery and data mining systems. In practice, data warehouse mining systems have provided many applicable solutions in industry, yet many problems still hinder users in discovering knowledge, or even cause them to fail to obtain the real and useful knowledge they need. To improve the overall data warehouse mining process, we present an intelligent data warehouse mining approach incorporating schema ontology, schema constraint ontology, domain ontology and user preference ontology. The structures of these ontologies are illustrated, and how they benefit the mining process is demonstrated by examples utilizing rule mining. Finally, we present a prototype multidimensional association mining system, which, with intelligent assistance from the ontologies, can help users build useful data mining models, prevent ineffective pattern generation, discover concept-extended rules, and provide an active knowledge re-discovering mechanism.

14.
Mainstream personalized recommendation systems usually run their models in the cloud, which requires uploading private data such as user interaction behavior to the cloud and creates a risk of privacy leakage. To protect user privacy, sensitive data can instead be processed on the client, but the client faces communication and computation bottlenecks. To address these challenges, we designed a personalized recommendation system based on cloud-client collaboration. The system splits the traditional cloud-side recommendation model into a user-representation model and a ranking model: the user-representation model is pre-trained in the cloud and then deployed to the client, while the ranking model remains in the cloud. A small recurrent neural network (RNN) extracts temporal information from user interaction logs to train the user representations, which are then compressed with the Lasso algorithm, reducing both the cloud-client communication volume and the client-side computation cost while preventing a drop in recommendation accuracy. Experiments on the RecSys Challenge 2015 dataset show that the system matches the recommendation accuracy of the GRU4REC model, while the compressed user representation is only 34.8% of its original size and the computation cost is low.
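The compression step can be illustrated by sparsifying the client-side user representation before upload and transmitting only the surviving (index, value) pairs. Zeroing small coefficients by threshold is a stand-in for the Lasso-based compression the paper uses:

```python
def compress_repr(user_repr, threshold):
    """Sparsify a client-side user representation before upload: drop
    coefficients with small magnitude (a stand-in for Lasso-style
    compression) and keep only (index, value) pairs of the survivors,
    shrinking the cloud-client communication volume."""
    return [(i, v) for i, v in enumerate(user_repr) if abs(v) >= threshold]

repr_full = [0.01, 0.8, -0.02, 0.5, 0.0, -0.9]
sparse = compress_repr(repr_full, threshold=0.1)
```

The cloud-side ranking model can reconstruct the dense vector from the pairs, so accuracy depends only on how much signal the dropped coefficients carried.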

15.
Most mathematical programming models and knowledge-based systems in optimization exist in various representations; however, the user is frequently not aware of this. For example, a model which is developed with a knowledge-based system such as the PM system of Krishnan (1988) will have several representations in Prolog and then will be translated into another representation in Structured Modeling before it is solved. Also, a model which is developed in the GAMS language will be translated into an MPS input form internally before the problem is passed to a solver such as MINOS. The results from MINOS are then passed back to GAMS and the user sees the results in the style of the GAMS representation of the model. This could be called a vertical set of model representations since the user can modify only one representation and the models are passed down directly to the solver.

This paper argues that in considering knowledge-based systems with optimization we should begin to employ a set of parallel model representations, any one of which the user can see and modify. These can be called horizontal model representations. For example, a given model might be represented in graphical, knowledge base, modeling language, and mathematical forms. The user would be able to modify any of these versions and have the other representations altered automatically to reflect the changes.


16.
Design and Implementation of an Open User Model Service Platform
To build a public user-model platform with shared data that supplies every website connected to it with more complete and accurate user information, the platform provides data and algorithm interfaces for interaction with third-party websites. The work focuses on resolving conflicts among user data from different sources so as to form a unified user model, ultimately achieving sharing of algorithms, models, and data. Experimental results show that the platform builds user models more accurately and comprehensively.
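One simple way to resolve conflicts among user data arriving from different sources is to keep, for each attribute, the value with the highest source confidence. The confidence-based rule below is an illustrative assumption; the platform's actual conflict-resolution algorithms are not described in this abstract:

```python
def merge_user_models(sources):
    """Merge per-site user records into one unified user model.
    Each record maps attribute -> (value, confidence); on conflict the
    value with the higher source confidence wins."""
    merged = {}
    for record in sources:
        for attr, (value, conf) in record.items():
            if attr not in merged or conf > merged[attr][1]:
                merged[attr] = (value, conf)
    return {attr: value for attr, (value, _) in merged.items()}

site_a = {"age": (25, 0.6), "city": ("Beijing", 0.9)}
site_b = {"age": (26, 0.8)}          # conflicting age, higher confidence
profile = merge_user_models([site_a, site_b])
```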

17.
Exploring data using visualization systems has been shown to be an extremely powerful technique. However, one of the challenges with such systems is an inability to completely support the knowledge discovery process. More than simply looking at data, users will make a semipermanent record of their visualizations by printing out a hard copy. Subsequently, users will mark and annotate these static representations, either for dissemination purposes or to augment their personal memory of what was witnessed. In this paper, we present a model for recording the history of user explorations in visualization environments, augmented with the capability for users to annotate their explorations. A prototype system is used to demonstrate how this provenance information can be recalled and shared. The prototype system generates interactive visualizations of the provenance data using a spatio-temporal technique. Beyond the technical details of our model and prototype, results from a controlled experiment that explores how different history mechanisms impact problem solving in visualization environments are presented.
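The history-plus-annotation model can be sketched as a recorded list of visualization states to which the user attaches notes, replacing the printed-and-marked-up hard copy with a recallable, shareable record. The class and method names are illustrative, not the prototype's API:

```python
import time

class ExplorationHistory:
    """Record the history of a user's visual exploration and let the
    user attach annotations to any recorded state, so a session can be
    recalled and shared later."""
    def __init__(self):
        self.states = []

    def record(self, description, params):
        """Append a visualization state; returns its id for annotation."""
        self.states.append({"t": time.time(), "desc": description,
                            "params": params, "notes": []})
        return len(self.states) - 1

    def annotate(self, state_id, note):
        self.states[state_id]["notes"].append(note)

h = ExplorationHistory()
s = h.record("scatter plot, 2010-2015", {"x": "year", "y": "sales"})
h.annotate(s, "spike in 2013 worth a second look")
```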

18.
Drawing on the information-expansion mechanism from the field of knowledge discovery, this paper proposes a new method for deciding whether to keep unexpected rules and for interpreting them: trend analysis of three measures (rule support, confidence, and sufficiency factor). Combined with presenting and analyzing the historical changes in the support of each rule's antecedent and consequent, the method identifies the rules worth retaining, and it helps users fully understand the rules and apply them appropriately. This work plays an important role in the post-processing, feasibility, and practicality of knowledge discovery.
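The three measures can be computed per data snapshot and then compared across snapshots for the trend analysis. A sketch follows, taking the sufficiency factor of A⇒B as P(A|B)/P(A|¬B), one common definition, since the abstract does not give the exact formula:

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support and confidence of the rule A => B over a list of
    transaction item-sets, plus a sufficiency factor taken here as
    P(A|B)/P(A|not B) (an assumed definition for illustration)."""
    n = len(transactions)
    a = sum(antecedent <= t for t in transactions)
    b = sum(consequent <= t for t in transactions)
    ab = sum((antecedent | consequent) <= t for t in transactions)
    support = ab / n
    confidence = ab / a if a else 0.0
    p_a_given_b = ab / b if b else 0.0
    notb = n - b
    p_a_given_notb = (a - ab) / notb if notb else 0.0
    sufficiency = (p_a_given_b / p_a_given_notb
                   if p_a_given_notb else float("inf"))
    return support, confidence, sufficiency

tx = [{"x", "y"}, {"x", "y"}, {"x"}, {"z"}]
sup, conf, ls = rule_measures(tx, {"x"}, {"y"})
```

Tracking (sup, conf, ls) over successive snapshots gives the trend curves the method analyzes to decide which unexpected rules to retain.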

19.
Information visualization (InfoVis), the study of transforming data, information, and knowledge into interactive visual representations, is very important to users because it provides mental models of information. The boom in big data analytics has triggered broad use of InfoVis in a variety of domains, ranging from finance to sports to politics. In this paper, we present a comprehensive survey and key insights into this fast-rising area. The research on InfoVis is organized into a taxonomy that contains four main categories, namely empirical methodologies, user interactions, visualization frameworks, and applications, which are each described in terms of their major goals, fundamental principles, recent trends, and state-of-the-art approaches. At the conclusion of this survey, we identify existing technical challenges and propose directions for future research.

20.
In diverse and self-governed multiple clouds contexts, service management and discovery are greatly challenged by the dynamic and evolving features of services. How to manage the features of cloud services and support accurate and efficient service discovery has become an open problem in the area of cloud computing. This paper proposes a field model of multiple cloud services and a corresponding service discovery method to address the issue. Different from existing research, our approach is inspired by the Bohr atom model. We use the abstractions of energy level and jumping mechanism to describe service status and variations, and thereby to support service demarcation and discovery. The contributions of this paper are threefold. First, we propose the abstraction of service energy level to represent the status of services, and a service jumping mechanism to investigate the dynamic and evolving features as the variations and re-demarcation of cloud services according to their energy levels. Second, we present the user acceptable service region to describe the services satisfying users' requests and a corresponding service discovery method, which can significantly decrease the service search scope and improve the speed and precision of service discovery. Third, a series of algorithms are designed to implement the generation of the field model, user acceptable service regions, the service jumping mechanism, and user-oriented service discovery. We have conducted extensive experiments on the QWS dataset to validate and evaluate our proposed models and algorithms. The results show that the field model can well support the representation of dynamic and evolving aspects of services in multiple clouds contexts, and the algorithms can improve the accuracy and efficiency of service discovery.
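The energy-level abstraction can be sketched by demarcating services into levels from a QoS score and restricting discovery to a user-acceptable region of levels; a service "jumps" levels when its score changes and it is re-demarcated. The floor-based demarcation and the score scale are illustrative assumptions:

```python
def demarcate(services, level_width):
    """Demarcate services into energy levels from their QoS score:
    level = floor(score / level_width). Re-running this after a score
    change realizes the jumping mechanism."""
    return {name: int(score // level_width) for name, score in services.items()}

def acceptable_region(levels, lo, hi):
    """User acceptable service region: only services whose energy level
    lies in [lo, hi] are searched, shrinking the discovery scope."""
    return {s for s, lvl in levels.items() if lo <= lvl <= hi}

services = {"s1": 95, "s2": 72, "s3": 40, "s4": 88}
levels = demarcate(services, level_width=10)
region = acceptable_region(levels, lo=8, hi=9)
```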


Copyright©北京勤云科技发展有限公司  京ICP备09084417号