Similar Documents
20 similar documents found (search time: 15 ms)
1.
In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.
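The paper's model itself is not reproduced in this abstract. As a hedged sketch of the underlying idea of combining multiple reference summaries — score each post by how many raters selected it and keep the top-k — the following Python fragment uses illustrative names throughout:

```python
from collections import Counter

def combine_reference_summaries(rater_selections, k):
    """Combine several raters' post selections into one extractive summary.

    rater_selections: list of sets, each holding the post IDs one rater
    considered most important for the thread.
    k: number of posts to keep in the combined summary.
    """
    votes = Counter()
    for selection in rater_selections:
        votes.update(selection)
    # Posts chosen by more raters are treated as more important.
    ranked = sorted(votes, key=lambda post: votes[post], reverse=True)
    return ranked[:k]

# Three hypothetical raters; posts are identified by integers.
raters = [{1, 3, 5}, {1, 2, 5}, {1, 5, 8}]
print(combine_reference_summaries(raters, 3))  # posts 1 and 5 rank first
```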

2.
As course management systems (CMS) gain popularity in facilitating teaching, forums have become a key component for supporting interactions among students and teachers. Content analysis is the most popular way to study a discussion forum, but it is a labor-intensive process: the coding relies heavily on manual interpretation and is time- and energy-consuming. In an asynchronous virtual learning environment, an instructor needs to monitor the discussion forum continually in order to maintain its quality. However, fulfilling this need is time-consuming and difficult for instructors, especially K-12 teachers. This research proposes a genre classification system, called GCS, to facilitate the automatic coding process. We treat the coding process as a document classification task via modern data mining techniques. The genre of a posting can be perceived as an announcement, a question, a clarification, an interpretation, a conflict, an assertion, etc. This research examines the coding coherence between GCS and experts' judgment in terms of recall and precision, and discusses how we adjust the parameters of GCS to improve the coherence. Based on the empirical results, GCS adopts a cascade classification model to achieve the automatic coding process. An empirical evaluation of the classified genres, drawn from a repository of postings in an online earth science course at a senior high school, shows that GCS can effectively facilitate the coding process and that the proposed cascade model can deal with the imbalanced distribution of discussion postings. These results imply that GCS, based on the cascade model, can serve as an automatic posting coding system.
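The abstract does not spell out the cascade model. One common cascade design for imbalanced multi-class problems — a chain of binary classifiers, each committing to one genre and passing rejected postings onward — could be sketched as below; the class and the scikit-learn-style predict interface are our assumptions, not the paper's API:

```python
class CascadeClassifier:
    """Minimal cascade sketch for posting-genre coding.

    stages: ordered list of (genre, binary_classifier) pairs, e.g. from
    the most to the least frequent genre, so majority genres are
    filtered out early -- one way to cope with imbalanced postings.
    Each binary_classifier is assumed to expose a scikit-learn-style
    predict() returning 1 when the posting belongs to its genre.
    """

    def __init__(self, stages, default_genre):
        self.stages = stages
        self.default_genre = default_genre

    def predict(self, posting_features):
        # A posting is labeled by the first stage that accepts it;
        # postings rejected everywhere fall through to a default genre.
        for genre, classifier in self.stages:
            if classifier.predict([posting_features])[0] == 1:
                return genre
        return self.default_genre
```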

3.
Likelihood-based marginal regression modelling for repeated, or otherwise clustered, categorical responses is computationally demanding. This is because the number of measures needed to describe the associations within a cluster increases geometrically with the cluster size. The proposed estimation methods typically describe the associations using odds ratios, which result in computationally infeasible solutions for large cluster sizes. An alternative method for joint modelling of the regression, association, and dropout mechanism for clustered categorical responses is presented. The joint distribution of a multivariate categorical response is described using the mean parameterization, which facilitates maximum likelihood estimation in two important respects. The models are illustrated by analyses of the presence and absence of schizophrenia symptoms in 86 patients at 12 repeated time-points, and of a survey of the opinions of 607 adults regarding government spending on nine different targets, measured on a common 3-level ordinal scale. Free software is available.
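The mean parameterization is not defined in the abstract. For orientation only, in the simplest bivariate binary case one common form parameterizes the joint distribution by marginal and joint success probabilities rather than by odds ratios (our notation, not the paper's):

```latex
\mu_1 = P(Y_1 = 1), \qquad
\mu_2 = P(Y_2 = 1), \qquad
\mu_{12} = P(Y_1 = 1,\, Y_2 = 1)
```

The four cell probabilities are then recovered linearly, e.g. $P(Y_1=1, Y_2=0) = \mu_1 - \mu_{12}$, which is one reason a mean parameterization can ease maximum likelihood computation compared with odds-ratio parameterizations.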

4.
A framework for virtual disassembly analysis
Product reuse or recyclability is enhanced by designing the product for inexpensive and efficient disassembly. Accomplishing such a design, however, requires design for disassembly (DFD) tools. This paper presents a disassembly framework consisting of design modules, embodied in a geometric DFD tool. The modules cover different tasks, including selecting the appropriate disassembly method, producing an optimized disassembly sequence, evaluating a disassembly sequence for cost, and producing design-change recommendations. These considerations make a product easier to disassemble and therefore have potential benefit to the environment.

5.
This paper addresses in an integrated and systematic fashion the relatively overlooked but increasingly important issue of measuring and characterizing the geometrical properties of nerve cells and structures, an area often called neuromorphology. After discussing the main motivation for such an endeavour, a comprehensive mathematical framework for characterizing neural shapes, capable of expressing variations over time, is presented and used to underline the main issues in neuromorphology. Three particularly powerful and versatile families of neuromorphological approaches, including differential measures, symmetry axes/skeletons, and complexity, are presented and their respective potentials for applications in neuroscience are identified. Examples of applications of such measures are provided based on experimental investigations related to automated dendrogram extraction, mental retardation characterization, and axon growth analysis.

6.
The open-source Java software framework JStatCom is presented, which supports the development of rich desktop clients for data analysis in a rather general way. The concept is to solve all recurring tasks with the help of reusable components and to enable rapid application development by adopting a standards-based approach that is readily supported by existing programming tools. Furthermore, JStatCom allows external procedures written in other languages, for example Gauss, Ox, or Matlab, to be called from within Java. This makes it possible to reuse an existing code base of numerical routines written in domain-specific programming languages and to link it with the Java world. A reference application for JStatCom is the econometric software package JMulTi, which is briefly introduced.

7.
Location analysis decisions are interrelated and should be made within a single decision-making framework. A framework within which a number of location strategies can be placed is presented. Location-allocation models are improved in two ways: 1) the allocation rule is developed to more accurately reflect customer choice processes; and 2) the objective function is developed to incorporate future changes. Computational support for this framework is described.

8.
9.
A framework for fast text analysis, which is developed as a part of the Texterra project, is described. Texterra provides a scalable solution for the fast text processing on the basis of novel methods that exploit knowledge extracted from the Web and text documents. For the developed tools, details of the project, use cases, and evaluation results are presented.

10.
It is shown how to express data flow analysis in a denotational framework by means of abstract interpretation. A continuation-style formulation naturally leads to the MOP (Meet Over all Paths) solution, whereas a direct-style formulation leads to the MFP (Maximal Fixed Point) solution.
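The denotational formulations themselves are not given in the abstract. As a reminder of what the MFP solution computes, here is a minimal sketch of the classic iterative fixed-point algorithm for a forward data flow problem; the dictionary-based CFG encoding and the "entry" node name are our illustrative choices:

```python
from functools import reduce

def mfp(preds, transfer, meet, top, entry_fact):
    """Maximal Fixed Point solution of a forward data flow problem.

    preds: dict mapping each CFG node to the list of its predecessors;
    the entry node is the one with no predecessors.
    transfer: dict of monotone transfer functions, one per node.
    meet: binary meet of the lattice (e.g. set intersection).
    top: top element used to initialise every node's fact.
    entry_fact: fact assumed to hold at the entry node.
    """
    facts = {node: top for node in preds}
    facts["entry"] = entry_fact
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for node, ps in preds.items():
            if not ps:
                continue  # the entry node keeps its given fact
            incoming = reduce(meet, (transfer[p](facts[p]) for p in ps))
            if incoming != facts[node]:
                facts[node] = incoming
                changed = True
    return facts
```

The MOP solution instead meets the effects of all paths from the entry to each node; the two coincide for distributive frameworks, and MFP safely approximates MOP otherwise.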

11.
A structurally motivated framework for discriminant analysis
Over the last few years, many algorithms for discriminant analysis (DA) have been developed. Although they have different motivations, they all inject structure information from the data into their within- and between-class scatters. However, to the best of our knowledge, there has not yet been a systematic examination of (1) which structure granularities lurk in the data; (2) which structure granularities a given DA algorithm's scatters utilize; and (3) whether new DA algorithms can be developed based on existing structure granularities. In this paper, the proposed structurally motivated (SM) framework for DA, together with its unified mathematical formulation of the ratio trace, answers exactly these questions. It categorizes DA algorithms from the viewpoint of constructing scatters based on structures of different granularity in the data, identifies their applicable scenarios for different structure types, and provides insights into developing new DA algorithms. Inspired by these insights, we find that the cluster granularity, lying in the middle of the granularity spectrum in the SM framework, can be exploited further. As a result, three DA algorithms based on the cluster granularity are derived from the SM framework by injecting cluster structure information into the within-class, the between-class, and both scatter matrices of classical MDA; the corresponding algorithms are called SWDA, SBDA, and SWBDA, respectively. The injected cluster structure information enables the three proposed algorithms not only to fit relatively complicated data more effectively, but also, with a regularization technique, to obtain more projections than classical MDA, which is very helpful for effective DA. Moreover, MDA becomes a special case of each of them when the number of clusters in every class is set to 1. Our experiments on benchmarks (face and UCI databases) show that the proposed algorithms yield encouraging results.
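The scatter definitions are not reproduced in the abstract. As an orientation only (our notation, not the paper's), injecting cluster granularity into a within-class scatter can take the following form, where class $c$ is partitioned into $k_c$ clusters $\mathcal{X}_{c1},\dots,\mathcal{X}_{ck_c}$ with cluster means $\boldsymbol{\mu}_{cj}$:

```latex
S_w^{\mathrm{cluster}}
  = \sum_{c=1}^{C} \sum_{j=1}^{k_c} \sum_{\mathbf{x} \in \mathcal{X}_{cj}}
    (\mathbf{x}-\boldsymbol{\mu}_{cj})(\mathbf{x}-\boldsymbol{\mu}_{cj})^{\mathsf{T}}
```

Setting $k_c = 1$ for every class collapses each cluster mean to the class mean and recovers the classical within-class scatter, consistent with MDA appearing as the special case mentioned above.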

12.
A temporal analysis framework for event reconstruction
This paper proposes a new temporal analysis framework for Windows system forensics. The framework improves the traditional method of extracting time information in computer forensics, introduces a two-step analysis at coarse and fine granularity, and adds clustering algorithms and heuristic rules to the traditional manual analysis, ultimately making event reconstruction analysis possible. The paper first introduces the overall structure of the framework, then describes the improved time extraction method, then presents the coarse- and fine-grained clustering analysis modules and the rule analysis module, and finally summarizes the strengths and weaknesses of the framework.
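The abstract gives no algorithmic detail. As an illustration of what a coarse-granularity clustering pass over forensic timestamps might look like, here is a sketch that groups events into bursts of activity separated by idle gaps; the gap threshold and the single-pass scheme are our assumptions, not the paper's method:

```python
def cluster_timestamps(events, gap_seconds=300):
    """Group forensic events into coarse activity bursts.

    events: iterable of (timestamp, description) pairs, timestamps in
    POSIX seconds. Whenever two consecutive events are more than
    gap_seconds apart, a new cluster is started. A finer-grained pass
    or heuristic rules could then be applied inside each cluster.
    """
    clusters, current = [], []
    for ts, desc in sorted(events):
        if current and ts - current[-1][0] > gap_seconds:
            clusters.append(current)
            current = []
        current.append((ts, desc))
    if current:
        clusters.append(current)
    return clusters
```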

13.
Static analysis tools, such as resource analyzers, give useful information on software systems, especially in real-time and safety-critical applications. The question of the reliability of the obtained results is therefore highly important. State-of-the-art static analyzers typically combine a range of complex techniques, make use of external tools, and evolve quickly, so formally verifying such systems is not a realistic option. In this work, we propose a different approach whereby, instead of the tools, we formally verify the results of the tools. The central idea of such a formal verification framework for static analysis is the method-wise translation of the information gathered about a program during its static analysis into specification contracts that contain enough information to be verified automatically. We instantiate this framework with costa, a state-of-the-art static analysis system for sequential Java programs that produces resource guarantees, and KeY, a state-of-the-art verification tool, for formally verifying the correctness of such resource guarantees. Resource guarantees allow one to be certain that programs will run within the indicated amount of resources, which may refer to memory consumption, the number of instructions executed, etc. Our results show that the proposed tool cooperation can be used to automatically produce verified resource guarantees.

14.
15.
Various machine learning techniques have been applied to problems in survival analysis over the last decade. They were usually adapted to learning from censored survival data by using the information on observation time; this included learning from only parts of the data or intervening in the learning algorithms. Efficient models have been established in various fields of clinical medicine and bioinformatics. In this paper, we propose a pre-processing method for adapting censored survival data to be used with ordinary machine learning algorithms. This is done by pre-assigning censored instances a positive or negative outcome according to their features and observation time. The proposed procedure calculates the goodness of fit of each censored instance to both the distribution of positives and the spoiled distribution of negatives in the entire dataset and relabels that instance accordingly. We performed thorough empirical testing of our method in a simulation study and on two real-world medical datasets, using the naive Bayes classifier and decision trees. When compared to one of the popular machine learning methods dealing with survival data, our method provided good results, especially when applied to heavily censored data.
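The relabeling procedure is only outlined above. A hedged sketch of the core step — scoring each censored instance against a model of the positives and a model of the negatives, then assigning the better-fitting label — might look as follows; the density models with a scikit-learn-style score_samples() and the hard argmax decision are our stand-ins for the paper's goodness-of-fit calculation:

```python
import numpy as np

def relabel_censored(X, y, censored_mask, pos_model, neg_model):
    """Pre-assign outcomes to censored survival instances.

    X: feature matrix (n_samples, n_features).
    y: 0/1 outcome labels; entries under censored_mask are placeholders.
    censored_mask: boolean array marking censored instances.
    pos_model, neg_model: density models fitted on uncensored positives
    and negatives respectively, each exposing score_samples() that
    returns per-row log-likelihoods (as in sklearn's KernelDensity).
    """
    y = y.copy()
    idx = np.where(censored_mask)[0]
    log_lik_pos = pos_model.score_samples(X[idx])
    log_lik_neg = neg_model.score_samples(X[idx])
    # Relabel each censored instance with the better-fitting outcome.
    y[idx] = (log_lik_pos > log_lik_neg).astype(int)
    return y
```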

16.
Detection and recognition of textual information in an image or video sequence is important for many applications. The increased resolution and capabilities of digital cameras and faster mobile processing allow for the development of interesting systems. We present an application based on the capture of information presented at a slide-show presentation or at a poster session. We describe the development of a system to process the textual and graphical information in such presentations. The application integrates video and image processing, document layout understanding, optical character recognition (OCR), and pattern recognition. The digital imaging device captures slides/poster images, and the computing module preprocesses and annotates the content. Various problems related to metric rectification, key-frame extraction, text detection, enhancement, and system integration are addressed. The results are promising for applications such as a mobile text reader for the visually impaired. By using powerful text-processing algorithms, we can extend this framework to other applications, e.g., document and conference archiving, camera-based semantics extraction, and ontology creation.

17.
A framework for analysis of data quality research
Organizational databases are pervaded with data of poor quality. However, there has been no analysis of the data quality literature that provides an overall understanding of state-of-the-art research in this area. Using an analogy between product manufacturing and data manufacturing, this paper develops a framework for analyzing data quality research and uses it as the basis for organizing the data quality literature. The framework consists of seven elements: management responsibilities, operation and assurance costs, research and development, production, distribution, personnel management, and legal function. The analysis reveals that most research efforts focus on operation and assurance costs, research and development, and production of data products. Unexplored research topics and unresolved issues are identified, and directions for future research are provided.

18.
19.
In passive RFID Dense Reader Environments, a large number of passive RFID readers coexist in a single facility. Dense environments are particularly susceptible to reader-to-tag and reader-to-reader collisions, both of which may degrade system performance by decreasing the number of tags identified per time unit. Some proposals have been suggested to avoid or handle these collisions, but they require extra hardware or make inefficient use of network resources. This paper proposes MALICO, a distributed protocol that exploits a maximum-likelihood estimator to improve the performance of the well-known Colorwave protocol. Using a derivation of the joint occupancy distribution of urns and balls via a bivariate inclusion-exclusion formula, MALICO permits every reader to estimate the number of neighboring readers (potential colliding readers). This information helps readers schedule their identification time with the aim of decreasing the collision probability among neighboring readers. MALICO provides higher throughput than the distributed state-of-the-art proposals for dense reader environments and can be implemented in real RFID systems without extra hardware.
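MALICO's bivariate formulation is not reproduced in the abstract. To illustrate the general urns-and-balls idea — estimating how many neighbors ("balls") are active from how many time slots ("urns") are observed occupied — here is a univariate sketch built on the classical occupancy distribution written via inclusion-exclusion; the uniform slot-choice model and the search bound are our simplifications, not the paper's estimator:

```python
from math import comb

def occupancy_pmf(k, m, n):
    """P(exactly k of n slots occupied) when m readers each pick one of
    n slots uniformly at random -- the classical occupancy distribution,
    expressed via inclusion-exclusion."""
    return comb(n, k) * sum(
        (-1) ** j * comb(k, j) * ((k - j) / n) ** m for j in range(k + 1)
    )

def ml_neighbors(k_observed, n, m_max=64):
    """Maximum-likelihood estimate of the number of neighboring readers,
    given that k_observed of n slots were seen occupied. At least
    k_observed readers are needed to occupy k_observed slots."""
    return max(range(max(k_observed, 1), m_max + 1),
               key=lambda m: occupancy_pmf(k_observed, m, n))
```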

20.