Found 20 similar documents; search time: 15 ms
1.
Textual data mining for industrial knowledge management and text classification: A business oriented approach Total citations: 1, self-citations: 0, citations by others: 1
N. Ur-Rahman, J.A. Harding 《Expert systems with applications》2012,39(5):4729-4739
Textual databases are useful sources of information and knowledge, and if these are well utilised then issues related to future project management and product or service quality improvement may be resolved. A large part of corporate information, approximately 80%, is available in textual data formats. Text Classification techniques are well known for managing on-line sources of digital documents. The identification of key issues discussed within textual data and their classification into two different classes could help decision makers or knowledge workers to manage their future activities better. This research is relevant for most text based documents and is demonstrated on Post Project Reviews (PPRs), which are a valuable source of information and knowledge. The application of textual data mining techniques for discovering useful knowledge and classifying textual data into different classes is a relatively new area of research. The research work presented in this paper is focused on the use of hybrid applications of text mining or textual data mining techniques to classify textual data into two different classes. The research applies clustering techniques at the first stage and Apriori Association Rule Mining at the second stage. Apriori Association Rule Mining is applied to generate Multiple Key Term Phrasal Knowledge Sequences (MKTPKS), which are later used for classification. Additionally, studies were made to improve the classification accuracies of the classifiers, i.e., C4.5, K-NN, Naïve Bayes and Support Vector Machines (SVMs). The classification accuracies were measured and the results compared with those of a single term based classification model. The methodology proposed could be used to analyse any free formatted textual data and in the current research it has been demonstrated on an industrial dataset consisting of Post Project Reviews (PPRs) collected from the construction industry. 
The data or information available in these reviews is codified in multiple different formats but in the current research scenario only free formatted text documents are examined. Experiments showed that the performance of classifiers improved through adopting the proposed methodology.
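The second-stage Apriori step mentioned in this abstract can be illustrated with a minimal sketch of frequent-itemset mining. This is generic Apriori, not the paper's MKTPKS procedure; the function name `apriori`, the sample key terms and the support threshold are all assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single term.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical key terms extracted from four review documents.
docs = [
    {"cost", "delay", "client"},
    {"cost", "delay"},
    {"cost", "client"},
    {"delay", "client", "cost"},
]
frequent_sets = apriori(docs, min_support=3)
```

With this toy input, term sets that co-occur in at least three documents survive; such sets could then serve as multi-term features for a classifier in place of single terms.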
2.
This paper discusses a novel approach to implementing OpenMP on clusters. Traditional approaches to do so rely on Software Distributed Shared Memory systems to handle shared data. We discuss these and then introduce an alternative approach that translates OpenMP to Global Arrays (GA), explaining the basic strategy. GA requires a data distribution. We do not expect the user to supply this; rather, we show how we perform data distribution and work distribution according to the user-supplied OpenMP static loop schedules. An inspector–executor strategy is employed for irregular applications in order to gather information on accesses to potentially non-local data, group non-local data transfers and overlap communications with local computations. Furthermore, a new directive INVARIANT is proposed to provide information about the dynamic scope of data access patterns. This directive can help us generate efficient codes for irregular applications using the inspector–executor approach. We also illustrate how to deal with some hard cases containing reshaping and strided accesses during the translation. Our experiments show promising results for the corresponding regular and irregular GA codes.
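As a rough illustration of the two ideas in this abstract, the sketch below derives a block data distribution from a static loop schedule and then runs an inspector pass that groups non-local reads by owning process. It does not use the Global Arrays API. The names `static_blocks`, `owner` and `inspect` are hypothetical, and the even block split is only one possible division, since the OpenMP specification leaves the exact chunk sizes of an unchunked static schedule to the implementation.

```python
def static_blocks(n_iters, n_procs):
    """Split iterations 0..n_iters-1 into contiguous blocks, one per process."""
    base, extra = divmod(n_iters, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        # The first `extra` processes take one additional iteration.
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def owner(index, blocks):
    """Return the process whose block contains array element `index`."""
    for p, block in enumerate(blocks):
        if index in block:
            return p
    raise IndexError(index)

def inspect(indirection, blocks, me):
    """Inspector: list the non-local elements process `me` reads, by owner."""
    needed = {}
    for i in blocks[me]:
        j = indirection[i]          # irregular access x[indirection[i]]
        p = owner(j, blocks)
        if p != me:
            needed.setdefault(p, set()).add(j)
    # One grouped transfer per owner instead of one transfer per element.
    return {p: sorted(js) for p, js in needed.items()}

blocks = static_blocks(10, 3)
indirection = [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical access pattern
transfers = inspect(indirection, blocks, me=1)
```

An executor phase would then fetch each grouped list in a single communication call and could overlap that transfer with the iterations that touch only local data.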
3.
Rachel Edita Oñate Roxas Allan Borra Charibeth Ko Cheng Nathalie Rose Lim Ethel Chuajoy Ong Michelle Wendy Tan 《Language Resources and Evaluation》2008,42(2):183-195
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine
Translation (MT) system. Linguistic information on Philippine languages is available, but the focus so far has
been on theoretical linguistics, with little done on the computational aspects of these languages; attempts are therefore reported
here on the manual construction of these language resources such as the grammar, lexicon, morphological information, and the
corpora which were literally built from almost non-existent digital forms. Due to the inherent difficulties of manual construction,
we also discuss our experiments on various technologies for automatic extraction of these resources to handle the intricacies
of the Filipino language, designed with the intention of using them for the MT system. To implement the different MT engines
and to ensure the improvement of translation quality, other language tools (such as the morphological analyzer and generator,
and the part of speech tagger) were developed.
4.
5.
6.
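Integrating information about community assistance personnel with Internet of Things (IoT) and mobile Internet technologies allows management tasks to be handled more promptly and with more effective feedback. This article describes the purpose of studying an IoT-based management system for community assistance personnel and presents the design framework of the system's smart terminal. Taking into account the specific circumstances of the implementation and the development platform, it adopts a client/server (C/S) model to design the mobile office system. Finally, it describes in detail the WebService technology used for communication between the Android mobile client and the server ("_NETC群" [sic]).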
7.
Sabine Bergler 《Machine Translation》1994,9(3-4):155-182
This paper addresses two types of mismatches in the translation of reported speech between German and English. The first mismatch is between the repeated use of the reported speech construction in English and the use of subjunctive in German used to indicate continued attribution. The second mismatch concerns the difference in usage of metonymic extensions in the subject position of reported speech. Examples show the different styles of reporting the utterances of somebody else. A well-structured lexicon is presented as one step to the solution of the problems presented. One key feature of the proposed lexicon is a meta-lexical organization of basic word entries, which is shown to facilitate the translation process. We contrast our notions of lexical structure with different recent proposals in machine translation.
8.
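To meet the buffering requirements of high-speed network processing applications for massive amounts of variable-length data, a high-speed memory pool structure is proposed: the adaptive variable-length-block memory pool (SVBSMP). The structure combines features of the Apache memory pool and of fixed-block memory pools, offers fast memory allocation and reclamation together with good space management, and is particularly suited to high-speed, high-volume IP packet processing. Performance experiments show that the memory pool has good time and space characteristics: allocation is 23% faster than calling malloc/free directly, and it uses about 52% less memory than the traditional fixed-size block allocation scheme.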
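The general idea of a pool that serves variable-length requests from a small set of block sizes can be sketched as follows. This is not the SVBSMP design, whose internals the abstract does not give; the class name `SizeClassPool`, the size classes and the counters are assumptions chosen only to show how freed blocks are recycled without going back to the underlying allocator.

```python
class SizeClassPool:
    """Toy pool: requests are rounded up to a size class; freed blocks are reused."""

    def __init__(self, size_classes=(64, 256, 1024, 2048)):
        self.size_classes = sorted(size_classes)
        # One free list per size class.
        self.free_lists = {size: [] for size in self.size_classes}
        self.allocated_new = 0   # blocks obtained from the underlying allocator
        self.reused = 0          # blocks served from a free list

    def _class_for(self, nbytes):
        # Smallest size class that can hold the request.
        for size in self.size_classes:
            if nbytes <= size:
                return size
        raise ValueError(f"request of {nbytes} bytes exceeds largest class")

    def alloc(self, nbytes):
        size = self._class_for(nbytes)
        free = self.free_lists[size]
        if free:
            self.reused += 1
            return free.pop()
        self.allocated_new += 1
        return bytearray(size)

    def free(self, block):
        # The block length identifies its size class.
        self.free_lists[len(block)].append(block)

pool = SizeClassPool()
packet = pool.alloc(40)      # served from the 64-byte class
pool.free(packet)
again = pool.alloc(60)       # reuses the block that was just freed
```

Choosing several size classes rather than one fixed block size is what limits wasted space for short packets, at the cost of a lookup to find the right class.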
9.
Performability measures are often defined for analyzing the worth of fault-tolerant systems whose performance is gracefully degradable. Accordingly, performability evaluation is inherently well suited for application of reward model solution techniques. On the other hand, the complexity of performability evaluation for solving engineering problems may prevent us from utilizing those techniques directly, suggesting the need for approaches that would enable us to exploit reward model solution techniques through problem transformation. In this paper, we present a performability modeling effort that analyzes the guarded-operation duration for onboard software upgrading. More specifically, we define a “performability index” Y that quantifies the extent to which the guarded operation with a duration φ reduces the expected total performance degradation. In order to solve for Y, we progressively translate its formulation until it becomes an aggregate of constituent measures conducive to efficient reward model solutions. Based on the reward-mapping-enabled intermediate model, we specify reward structures in the composite base model which is built on three stochastic activity network reward models. We describe the model-translation approach and show its feasibility for design-oriented performability modeling.
10.
Jan L.G. Dietz Ruud van der Pol Floris Wiesman 《Journal of Intelligent Information Systems》1997,8(1):77-101
The amount of information available to information workers recently has become overwhelming. This confronts information workers with two major problems: finding the information needed, and accessing it; they are called the search problem and the access problem, respectively. As the main result of our research an architecture is specified of an automated tool that provides integrated support for searching and accessing multimedia documents that may be located at arbitrary places. The architecture contains a database with information about the documents and with thesaurus-like information. The architecture also contains a browse mechanism and a query mechanism for inspecting the database. In the design process of the architecture, several fundamental questions arose, like “What is a document?” and “What is a medium kind?”. The developed answers to some of these questions are considered to have a general character and thus to be useful also outside the scope of the research at hand. The paper concludes with an overview of the current status of the project and a discussion of future work.
11.
The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents, but instead complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present Hermes, an infrastructure for data Web search that addresses a number of challenges involved in realizing search on the data Web. To provide an end-user oriented interface, we support expressive user information needs by translating keywords into structured queries. We integrate heterogeneous Web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local Web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, making use of data-level mappings. In evaluation experiments with real life data sets from the data Web, we show the practicability and scalability of the Hermes infrastructure.
12.
13.
Alan Messer Anugeetha Kunjithapatham Phuong Nguyen Priyang Rathod Mithun Sheshagiri Doreen Cheng Simon Gibbs 《Pervasive and Mobile Computing》2008,4(6):871-888
The Internet has become an extremely popular source of entertainment and information. But, despite its growing popularity, most websites today are accessed by keyword search via web browsers, making it difficult for home consumers to locate Internet content of interest on their TVs or other devices that lack keyboards. In this paper, we present assistive technologies, enabling users to easily locate Internet content related to the TV program they are watching. Access is enabled via an intuitive user interface on the TV screen or by using a secondary personal device, thus avoiding disrupting the viewing experience of the other TV users.
14.
We develop a data structure for maintaining a dynamic multiset that uses bits and O(1) words, in addition to the space required by the n elements stored, supports searches in worst-case time and updates in amortized time. Compared to earlier data structures, we improve the space requirements from O(n) bits to bits, but the running time of updates is amortized, not worst-case.
15.
In this paper, we present a prototype system, an integrated data management system, which is capable of querying, retrieving, and visualizing datasets with heterogeneous formats and large sizes without requiring users to have any knowledge of any other specific software. Our system has three distinguishing characteristics: (1) a modular structure and simple architecture, which make it easy and feasible for users to add new functions and features to the system, (2) a new search concept and method based on the bounding box and on dynamically delineated watershed boundaries from GIS (Geographic Information System), and (3) no requirement for knowledge about or installation of any other complicated software. The architecture of our integrated data management system is based on a metadata approach, which consists of four components including a metadata mechanism and a Java-based application engine. The metadata mechanism in conjunction with the Java-based application engine allows users to access and retrieve diverse data formats and structures from many heterogeneous hydrological data sources. The visualization component of the system makes it possible for users to view their queried data first before spending time retrieving them. The extensible and integrative characteristics of our system are illustrated by an example in which new and unique functions for data merging and GIS-based data querying are added to the system. Although the data sources and applications shown in this prototype system are related to the field of hydrology, the ideas, approaches, and system architecture are not domain-specific, and can be applied to other fields as well.
16.
Plagiarism refers to the act of presenting external words, thoughts, or ideas as one’s own, without providing references to the sources from which they were taken. The exponential growth of different digital document sources available on the Web has facilitated the spread of this practice, making the accurate detection of it a crucial task for educational institutions. In this article, we present DOCODE 3.0, a Web system for educational institutions that performs automatic analysis of large quantities of digital documents in relation to their degree of originality. Since plagiarism is a complex problem, frequently tackled at different levels, our system applies algorithms in order to perform an information fusion process from multiple data sources across all these levels. These algorithms have been successfully tested in the scientific community in solving tasks like the identification of plagiarized passages and the retrieval of source candidates from the Web, among other data sources such as digital libraries, and have proven to be very effective. We integrate these algorithms into a multi-tier, robust and scalable JEE architecture, allowing many different types of clients with different requirements to consume our services. For users, DOCODE produces a number of visualizations and reports from the different outputs to let teachers and professors gain insights on the originality of the documents they review, allowing them to discover, understand and handle possible plagiarism cases and making it easier and much faster to analyze a vast number of documents. Our experience here is so far focused on the Chilean situation and the Spanish language, offering solutions to Chilean educational institutions in any of their preferred Virtual Learning Environments. However, DOCODE can easily be adapted to increase language coverage.
17.
Grid-partition index: a hybrid method for nearest-neighbor queries in wireless location-based services Total citations: 1, self-citations: 0, citations by others: 1
Baihua Zheng Jianliang Xu Wang-Chien Lee Dik Lun Lee 《The VLDB Journal The International Journal on Very Large Data Bases》2006,15(1):21-39
Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based
indexing. The former is constructed based on the locations of data objects: using some distance heuristics on object locations.
The latter is built on a precomputed solution space. Thus, NN queries can be reduced to and processed as simple point queries
in this solution space. Both approaches exhibit some disadvantages, especially when employed for wireless data broadcast in
mobile computing environments.
In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast modes of mobile computing. The grid-partition index
is constructed based on the Voronoi diagram, i.e., the solution space of NN queries. However, it has two distinctive characteristics.
First, it divides the solution space into grid cells such that a query point can be efficiently mapped into a grid cell around
which the nearest object is located. This significantly reduces the search space. Second, the grid-partition index stores
the objects that are potential NNs of any query falling within the cell. The storage of objects, instead of the Voronoi cells, makes
the grid-partition index a hybrid of the solution-based and object-based approaches. As a result, it achieves a much more
compact representation than the pure solution-based approach and avoids backtracked traversals required in the typical object-based
approach, thus realizing the advantages of both approaches.
We develop an incremental construction algorithm to address the issue of object update. In addition, we present a cost model
to approximate the search cost of different grid partitioning schemes. The performances of the grid-partition index and existing
indexes are evaluated using both synthetic and real data. The results show that, overall, the grid-partition index significantly
outperforms object-based indexes and solution-based indexes. Furthermore, we extend the grid-partition index to support continuous-nearest-neighbor
search. Both algorithms and experimental results are presented.
Edited by R. Guting
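The core idea of storing, for each grid cell, the objects that could be the nearest neighbor of some query in that cell can be sketched as below. This is a simplification and not the paper's construction: instead of intersecting Voronoi cells with grid cells, it uses a conservative distance bound that may keep a few extra candidates but never drops a true nearest neighbor. The function names, the square `extent` and the sample objects are assumptions.

```python
import math

def min_dist(p, cell):
    """Smallest distance from point p to the rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = cell
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy)

def max_dist(p, cell):
    """Largest distance from point p to any point of the rectangle."""
    x0, y0, x1, y1 = cell
    dx = max(abs(p[0] - x0), abs(p[0] - x1))
    dy = max(abs(p[1] - y0), abs(p[1] - y1))
    return math.hypot(dx, dy)

def build_index(objects, extent, n):
    """n x n grid over [0, extent] x [0, extent]; each cell lists candidate NNs."""
    step = extent / n
    index = {}
    for gx in range(n):
        for gy in range(n):
            cell = (gx * step, gy * step, (gx + 1) * step, (gy + 1) * step)
            # Some object is within `bound` of every point in the cell, so an
            # object farther than `bound` from the whole cell cannot be an NN.
            bound = min(max_dist(o, cell) for o in objects)
            index[(gx, gy)] = [o for o in objects if min_dist(o, cell) <= bound]
    return index

def nearest(index, extent, n, q):
    """Map q to its grid cell, then scan only that cell's candidates."""
    step = extent / n
    gx = min(int(q[0] // step), n - 1)
    gy = min(int(q[1] // step), n - 1)
    return min(index[(gx, gy)],
               key=lambda o: math.hypot(o[0] - q[0], o[1] - q[1]))

objects = [(1.0, 1.0), (9.0, 1.0), (5.0, 9.0), (2.0, 8.0)]   # hypothetical data
index = build_index(objects, 10.0, 5)
answer = nearest(index, 10.0, 5, (8.2, 2.5))
```

A query touches one cell and a short candidate list, which is the property that makes this kind of index attractive when the index has to be broadcast or kept compact.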
18.
We describe a new multi-phase, color-based image retrieval system (FOCUS) which is capable of identifying multi-colored query objects in an image in the presence of significant, interfering backgrounds. The query object may occur in arbitrary sizes, orientations, and locations in the database images. Scale and rotation invariant color features have been developed to describe an image, such that the matching process is fast even in the case of complex images. The first phase of processing matches the query object color with the color content of an image computed as the peaks in the color histogram of the image. The second phase matches the spatial relationships between color regions in the image with the query using a spatial proximity graph (SPG) structure designed for the purpose. Processing at coarse granularity is preferred over pixel-level processing to produce simpler graphs, which significantly reduces computation time during matching. The speed of the system and the small storage overhead make it suitable for use in large databases with online user interfaces. Test results with multi-colored query objects from man-made and natural domains show that FOCUS is quite effective in handling interfering backgrounds and large variations in scale. The experimental results on a database of diverse images highlight the capabilities of the system.
19.
《International Journal of Parallel, Emergent and Distributed Systems》2013,28(2):159-183
File system metadata management has become a bottleneck for many data-intensive applications that rely on high-performance file systems. Part of the bottleneck is due to the limitations of an almost 50-year-old interface standard with metadata abstractions that were designed at a time when high-end file systems managed less than 100 MB. Today's high-performance file systems store 7–9 orders of magnitude more data, resulting in a number of data items for which these metadata abstractions are inadequate, such as directory hierarchies unable to handle complex relationships among data. Users of file systems have attempted to work around these inadequacies by moving application-specific metadata management to relational databases to make metadata searchable. Splitting file system metadata management into two separate systems introduces inefficiencies and systems management problems. To address this problem, we propose QMDS: a file system metadata management service that integrates all file system metadata and uses a graph data model with attributes on nodes and edges. Our service uses a query language interface for file identification and attribute retrieval. We present our metadata management service design and architecture and study its performance using a text analysis benchmark application. Results from our QMDS prototype show the effectiveness of this approach. Compared to the use of a file system and relational database, the QMDS prototype shows superior performance for both ingest and query workloads.
20.
Georgios D. Styliaras Georgios K. Tsolis Chris M. Papaterpos 《International Journal on Digital Libraries》2007,8(1):61-78
In this paper, AssetCollector is presented, which is a system for managing collections of cultural assets. AssetCollector
covers the needs of collection curators towards defining, populating and searching a collection in a flexible way, while supporting
them in generating reports based on the collection’s assets and reusing them in order to build web sites and CD-ROMs. In order
to support the above functionality, the system provides the content structuring subsystem, the content input subsystem, the
search subsystem and the report subsystem. The use of the subsystems is straightforward and requires no technical skills from
the curators. AssetCollector has been successfully applied for organizing various collections of cultural assets in Greece,
such as archaeological sites, museums and published books. In the future, an evaluation procedure is planned in order to further
refine the use of the system according to the targeted users’ needs. Furthermore, more import and export facilities will be
provided, which will make the system compliant with widely accepted standards.