Similar Documents
20 similar documents found (search time: 0 ms)
1.
The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for the information retrieval and knowledge discovery communities. Tremendous amounts of knowledge are recorded using various types of media, producing an enormous number of web pages on the WWW. Retrieving the required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies text mining techniques to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.
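To make the approach in item 1 concrete, here is a minimal, self-contained sketch of the self-organizing map step on term vectors. It is illustrative only: the map size, the training schedule, and the random stand-in for tf-idf vectors are assumptions, not the paper's configuration.

```python
# Minimal self-organizing map for grouping web pages into directory-like
# clusters. Illustrative sketch; not the paper's actual configuration.
import numpy as np

def train_som(vectors, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0):
    rng = np.random.default_rng(0)
    weights = rng.random((rows * cols, vectors.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighborhood
        for v in vectors:
            bmu = np.argmin(((weights - v) ** 2).sum(axis=1))  # best matching unit
            dist = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist / (2 * sigma ** 2))     # neighborhood function
            weights += lr * h[:, None] * (v - weights)
    return weights, coords

docs = np.random.default_rng(1).random((30, 50))  # stand-in tf-idf vectors
weights, coords = train_som(docs)
labels = [int(np.argmin(((weights - d) ** 2).sum(axis=1))) for d in docs]
```

Each trained unit then acts as a candidate directory: pages are assigned to the unit whose weight vector is closest to their term vector, and neighboring units can be grouped into a hierarchy.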

2.
World cultural heritage is the accumulation and essence of the development of human civilization, as well as a rare and irreplaceable treasure bestowed by history. However, cultural heritage is increasingly exposed to various risks caused by natural and man-made factors. Flood risk is the most common and the most devastating risk for cultural heritage. This study proposes a visual analytics method that supports the visual analysis of flood risk from multiple aspects, including predicted flood peak flow, flood propagation, flood impact, and vulnerability. The proposed method can also provide the required information at multiple scales, including the basin, site, multi-cave, and single-cave levels. The combination of visualization techniques for flood risk analysis enables the proposed method to support users in making decisions with respect to mitigation measures. Lastly, the proposed method is evaluated by water experts and cultural heritage site managers.

3.
A text mining approach for automatic construction of hypertexts (Times cited: 1; self-citations: 0; external citations: 1)
Research on automatic hypertext construction has emerged rapidly in the last decade because there exists an urgent need to translate the gigantic amount of legacy documents into web pages. Unlike traditional ‘flat’ texts, a hypertext contains a number of navigational hyperlinks that point to related hypertexts or to locations within the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the gigantic amount of documents produced each day prevents such manual construction. Thus an automatic hypertext construction method is necessary for content providers to efficiently produce adequate information that can be used by web surfers. Although most web pages contain a number of non-textual data such as images, sounds, and video clips, text data still contribute the major part of information about the pages. Therefore, it is not surprising that most automatic hypertext construction methods inherit from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster the flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of some important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to translate them into hypertext form. Such translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we only used Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms.
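A hedged follow-on to the SOM sketch in item 1: once documents are clustered, hyperlink sources and destinations can be proposed inside each cluster. The hub selection and anchor-term heuristic below are invented for illustration; the paper derives links from two trained maps rather than this shortcut.

```python
# Hypothetical hyperlink-candidate step over clustered document vectors.
import numpy as np

def hyperlink_candidates(doc_vectors, labels, titles, top_terms=3):
    links = []
    for cluster in np.unique(labels):
        members = np.where(labels == cluster)[0]
        centroid = doc_vectors[members].mean(axis=0)
        # the member closest to the centroid serves as the cluster "hub"
        hub = members[np.argmin(((doc_vectors[members] - centroid) ** 2).sum(axis=1))]
        for d in members:
            if d != hub:
                anchors = np.argsort(doc_vectors[d])[-top_terms:]  # strongest index terms
                links.append((titles[d], titles[hub], anchors.tolist()))
    return links

rng = np.random.default_rng(2)
vecs = rng.random((6, 20))             # stand-in document vectors
labels = np.array([0, 0, 0, 1, 1, 1])  # stand-in map assignments
titles = [f"doc{i}" for i in range(6)]
print(hyperlink_candidates(vecs, labels, titles)[:2])
```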

4.
The automatic reconstruction of archeological pieces through the integration of a set of unknown segments is a highly complex problem which is still being researched. When only a few segments of the original piece are available, solutions exclusively based on computational algorithms are inefficient when attempting to create a credible whole restoration. Incomplete 3D puzzles must consequently be tackled by considering hybrid human/computer strategies. This paper presents a reconstruction approach in which the knowledge of human experts and computational solutions coexist. Hypotheses, models, and integration solutions originating from both humans and computers are continuously updated until an agreement is reached. This semi-automatic restoration approach has been tested on a set of ancient fractured pieces belonging to the remains of Roman sculptures at the well-known Mérida site (Spain), and promising results have been obtained. The successful results and applicability of this method have led us to believe that computational solutions should evolve towards hybrid human-computer strategies.

5.
This is a review of methods for the automated analysis of texts. The features of algorithms and programs that are used at the morphological, lexical, syntactical, and discursive levels of a language system are described.

6.
Zhang Haibo, Li Kang, Kou Jiaojiao, Chen Xiaoxue, Hai Linqi, Zhang Junbo, Zhou Mingquan, Geng Guohua, Zhang Shunli 《Multimedia Tools and Applications》2022,81(23):32817-32839
Multimedia Tools and Applications - With the help of a laser scanner, accurate digital information about cultural relics can be obtained. However, how to transfer the enormous and dense data by an...

7.

This paper presents a system developed for the adaptive retrieval and filtering of documents belonging to digital libraries available on the Web. This system, called InfoWeb, is currently in operation on the ENEA (National Entity for Alternative Energy) digital library Web site dedicated to the cultural heritage and environment domain. InfoWeb records the user's information needs in a user model, created through a representation that extends the traditional vector space model and takes the form of a semantic network consisting of co-occurrences between index terms. The initial user model is built on the basis of stereotypes, developed through a clustering of the collection using specific documents as a starting point. The user's query can be expanded adaptively, using the user model that the user has formulated. The system has been tested on the entire collection, comprising about 14,000 documents in HTML/text format. The results of the experiments are satisfactory both in terms of performance and in terms of the system's ability to adapt itself to the user's shifting interests.
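The co-occurrence network at the heart of the InfoWeb user model can be sketched in a few lines. The whole-document co-occurrence window and the expansion size k below are assumptions for illustration, not the system's actual parameters.

```python
# Sketch of query expansion over a term co-occurrence network, in the
# spirit of the InfoWeb user model. Corpus and window choice are assumed.
from collections import Counter
from itertools import combinations

def build_cooccurrence(docs):
    cooc = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1        # count term pairs sharing a document
    return cooc

def expand_query(query_terms, cooc, k=3):
    scores = Counter()
    for (a, b), n in cooc.items():
        if a in query_terms:
            scores[b] += n
        elif b in query_terms:
            scores[a] += n
    return list(query_terms) + [t for t, _ in scores.most_common(k)]

docs = ["roman heritage site restoration", "flood risk at heritage site"]
print(expand_query({"heritage"}, build_cooccurrence(docs)))
```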

8.
9.
Heather Crawford, John Aycock 《Software》2008,38(14):1561-1567
Automatically generating ‘good’ domain names that are random yet pronounceable is a problem harder than it first appears. The problem is related to random word generation, and we survey and categorize existing techniques before presenting our own syllable‐based algorithm that produces higher‐quality results. Our results are also applicable elsewhere, in areas such as password generation, username generation, and even computer‐generated poetry. Copyright © 2008 John Wiley & Sons, Ltd.
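A toy version of syllable-based generation, in the spirit of (but not identical to) the authors' algorithm: names are composed from consonant-vowel(-coda) syllables, so the output is random yet pronounceable. The syllable inventory and length policy are assumptions.

```python
# Toy syllable-based name generator; inventory and policy are illustrative.
import random

ONSETS = ["b", "d", "k", "l", "m", "n", "r", "s", "t", "v"]
VOWELS = ["a", "e", "i", "o", "u"]
CODAS = ["", "", "n", "r", "s"]  # empty strings keep most syllables open

def random_name(syllables=3, seed=None):
    rng = random.Random(seed)
    return "".join(
        rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS)
        for _ in range(syllables)
    )

print([random_name(3, seed=i) for i in range(5)])  # five candidate names
```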

10.
Automatic mesh generation is one of the most important parts of CIMS (Computer Integrated Manufacturing Systems). A method based on mesh grading propagation, which automatically produces a triangular mesh in a multiply connected planar region, is presented in this paper. The method decomposes the planar region into convex subregions, using algorithms which run in linear time. For every subregion, an algorithm is used to generate shrinking polygons according to boundary gradings and to form a Delaunay triangulation between two adjacent shrinking polygons, both in linear time. It automatically propagates boundary gradings into the interior of the region and produces a satisfactory quasi-uniform mesh.
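The paper's linear-time shrinking-polygon algorithm is not reproduced here; as a rough illustration of the vocabulary, the sketch below triangulates a boundary-graded point set with SciPy's Delaunay routine, with denser points near one edge standing in for a boundary grading.

```python
# Not the paper's algorithm: a plain Delaunay triangulation of a graded
# point set, to make the meshing terminology concrete. Requires scipy.
import numpy as np
from scipy.spatial import Delaunay

# boundary nodes of a unit square, denser near x = 0 (a simple grading)
xs = np.concatenate([np.geomspace(0.01, 1.0, 12), [0.0]])
boundary = [(x, y) for x in xs for y in (0.0, 1.0)] + \
           [(x, y) for x in (0.0, 1.0) for y in np.linspace(0, 1, 8)]
interior = np.random.default_rng(1).random((30, 2))
points = np.unique(np.vstack([np.array(boundary), interior]), axis=0)

tri = Delaunay(points)
print(f"{len(tri.simplices)} triangles over {len(points)} points")
```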

11.
Existing evolutionary approaches to automatic composition generate only a few melodies in a certain style that is specified by the setting of parameters or the design of fitness functions. Thus, their composition results cannot cover the varied tastes of music listeners. In addition, they are not able to deal with the multidimensional nature of music. This paper presents a novel multi-objective evolutionary approach to automatic melody composition in order to produce a variety of melodies at once. To this end, two conflicting fitness measures are investigated to evaluate the fitness of a melody: (1) stability and (2) tension. Drawing on music theory, genetic operators (i.e., crossover and mutation) are newly designed to improve search capability in the multi-objective fitness space of music composition. The experimental results demonstrate the validity and effectiveness of the proposed approach. Moreover, the analysis of composition results proves that the proposed approach generates a set of pleasant and diverse melodies.
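A minimal two-objective toy in the spirit of the approach: melodies are pitch sequences, "stability" rewards in-scale notes, and "tension" rewards large melodic leaps. Both fitness definitions and the Pareto filter below are simplifications of the paper's music-theoretic measures.

```python
# Toy two-objective melody evaluation with a Pareto filter; both fitness
# functions are simplified stand-ins, not the paper's measures.
import random

SCALE = {0, 2, 4, 5, 7, 9, 11}  # C major pitch classes

def stability(m):               # fraction of in-scale notes
    return sum(p % 12 in SCALE for p in m) / len(m)

def tension(m):                 # mean melodic leap size
    return sum(abs(a - b) for a, b in zip(m, m[1:])) / (len(m) - 1)

def dominates(a, b):            # a is at least as good in both objectives
    return a[0] >= b[0] and a[1] >= b[1] and a != b

rng = random.Random(7)
pop = [[rng.randint(60, 72) for _ in range(8)] for _ in range(40)]
scored = [((stability(m), tension(m)), m) for m in pop]
front = [m for f, m in scored
         if not any(dominates(g, f) for g, _ in scored)]
print(f"Pareto front holds {len(front)} melodies")
```

The non-dominated front is what gives the user "a variety of melodies at once" instead of a single best individual.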

12.
13.
Sun Xiao, He Jiajin 《Multimedia Tools and Applications》2020,79(9-10):5439-5459
Multimedia Tools and Applications - Owing to the complexity of language structure and semantics, and the relative scarcity of labeled data and context information, sentiment analysis has...

14.
Complex business models in large-scale enterprises deal with voluminous knowledge, based on which most decisive official and technical documents are generated. Nowadays, template processors are available for generating such documents. However, the existing template processors are either labor intensive or too complicated to suit well-established business models and knowledge repositories in a heterogeneous environment. Hence, a novel, generalized, adaptable, and flexible template processor that utilizes the existing resources without modifying the business model is proposed. The tacit business intelligence, defined as rules, knowledge repositories, and document structure, constitutes the nodal agents of this approach. Further, an XML-based Object Query Definition Markup Language for rule definition is newly suggested. The rules are reorganized into hierarchically DAG-structured rules using a transformation algorithm and traversed using a hybrid traversal. The required output document is represented through a template. Object wrappers act as the communicating agents between diversified datasets and the templates. The proposed architecture is modeled and implemented using set theory. It is implemented in a web-based distributed environment using Java and tested on a real-world dataset from a large-scale engineering enterprise. The results demonstrate its adaptability and extensibility to any multi-organizational structure.
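The rule-DAG idea can be sketched with Python's standard topological sorter: rules are evaluated dependencies-first and their results fill a document template. The rule format, the lambda-based evaluator, and the template syntax are all hypothetical; the paper's OQDML is XML-based.

```python
# Hypothetical rule DAG evaluated in topological order to fill a template.
from graphlib import TopologicalSorter  # Python 3.9+

rules = {                      # rule -> rules it depends on
    "total": {"price", "qty"},
    "price": set(),
    "qty": set(),
}
compute = {                    # stand-in rule bodies
    "price": lambda env: 9.5,
    "qty": lambda env: 3,
    "total": lambda env: env["price"] * env["qty"],
}

env = {}
for rule in TopologicalSorter(rules).static_order():
    env[rule] = compute[rule](env)   # dependencies are always ready

template = "Invoice: {qty} units at {price} = {total}"
print(template.format(**env))
```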

15.
This paper presents an application of a frequency domain approach to fault detection for the electro-mechanical test facility. An outline of the frequency domain design method is provided. The frequency domain residual generation is designed based on a linear model, and then tested on the various data sequences as given in the overview paper. Results of simulations, as well as a discussion of the method's capability, are also given.
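A minimal sketch of frequency-domain residual generation under stated assumptions: a nominal linear model predicts the output, and a fault injected after t = 2 s shows up as a spectral peak in the residual. The plant, fault, and noise level are invented for illustration.

```python
# Residual = measured output minus nominal-model prediction; a fault
# appears as energy at its frequency in the residual spectrum.
import numpy as np

t = np.arange(0, 4, 0.01)
u = np.sin(2 * np.pi * 1.0 * t)            # input signal
y_model = 0.8 * u                          # nominal linear model
fault = 0.3 * np.sin(2 * np.pi * 5.0 * t) * (t > 2)   # injected fault
y_meas = y_model + fault + 0.02 * np.random.default_rng(0).standard_normal(t.size)

residual = y_meas - y_model
spectrum = np.abs(np.fft.rfft(residual)) / t.size
freqs = np.fft.rfftfreq(t.size, d=0.01)
peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
print(f"dominant residual frequency = {peak:.1f} Hz")  # ~5 Hz when faulty
```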

16.
The Journal of Supercomputing - Software models at different levels of abstraction and from different perspectives contribute to the creation of compilable code in the implementation phase of the...

17.
Pattern Analysis and Applications - The most significant difficulty when processing natural-scene text images is the presence of fog, smoke, or haze. These intrusive elements decrease the contrast and...

18.
Software testing is one of the most crucial and analytical aspects of assuring that developed software meets prescribed quality standards. The software development process invests at least 50% of the total cost in the testing process. Optimal and efficacious test data design is an important and challenging activity due to the nonlinear structure of software. Moreover, the type and scope of test cases determine the quality of the test data. To address this issue, software testing tools should employ intelligence-based soft computing techniques, such as particle swarm optimization (PSO) and genetic algorithms (GA), to generate smart and efficient test data automatically. This paper presents a hybrid PSO- and GA-based heuristic for the automatic generation of test suites. We describe the design and implementation of the proposed strategy and evaluate our model by performing experiments with ten container classes from the Java standard library. We analyze our algorithm statistically with branch coverage as the test adequacy criterion. The performance criteria are percentage coverage per unit time and the percentage of faults detected by the generated test data. We compare our work with heuristics based on GA, PSO, existing hybrid strategies combining GA and PSO, and a memetic algorithm. The results show that the test case generation in our work is efficient.
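A toy hybrid search in the spirit described above: PSO-style velocity updates plus a GA-style mutation hunt for inputs that cover the branches of a tiny function under test. The update constants and the coverage fitness are illustrative, not the paper's.

```python
# Toy PSO+GA hybrid maximizing branch coverage of a tiny function.
import random

def coverage(x):               # branches of a toy function under test
    hit = {"entry"}
    if x > 100:
        hit.add("gt100")
    if x % 7 == 0:
        hit.add("mod7")
    return hit

rng = random.Random(3)
n = 20
pos = [rng.uniform(-500, 500) for _ in range(n)]
vel = [0.0] * n
pbest = list(pos)
gbest = max(pos, key=lambda x: len(coverage(int(x))))

for _ in range(200):
    for i in range(n):
        # PSO velocity update pulled toward personal and global bests
        vel[i] = (0.7 * vel[i]
                  + 1.4 * rng.random() * (pbest[i] - pos[i])
                  + 1.4 * rng.random() * (gbest - pos[i]))
        pos[i] += vel[i]
        if rng.random() < 0.1:         # GA-style mutation keeps diversity
            pos[i] += rng.uniform(-50, 50)
        if len(coverage(int(pos[i]))) > len(coverage(int(pbest[i]))):
            pbest[i] = pos[i]
    gbest = max(pbest, key=lambda x: len(coverage(int(x))))

print("best input:", int(gbest), "covers", coverage(int(gbest)))
```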

19.
Guizhou Province is extremely rich in intangible cultural heritage, which embodies the unique spiritual values, ways of thinking, imagination, and cultural consciousness of Guizhou's ethnic groups, and reflects their vitality and creativity. To better explore and protect this heritage, this study mines data from the website of the Guizhou Provincial Intangible Cultural Heritage Protection Center, applying word-cloud analysis, cluster analysis, and visualization techniques to process the textual data on Guizhou's intangible cultural heritage and extract valuable key textual information, providing a basis for the heritage's inheritance and development.
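The keyword-extraction step described above can be sketched with the jieba segmenter (a third-party package) feeding term frequencies to a word cloud. The two sample strings below stand in for the corpus scraped from the protection center's website.

```python
# Segment Chinese heritage descriptions and count term frequencies as
# word-cloud input. Sample texts stand in for the scraped corpus.
from collections import Counter
import jieba  # third-party Chinese segmenter (pip install jieba)

texts = ["苗族银饰锻制技艺是贵州省的传统手工技艺",
         "侗族大歌是贵州省的传统音乐"]
counts = Counter(
    w for t in texts for w in jieba.lcut(t) if len(w) > 1
)
print(counts.most_common(5))   # top terms feed the word cloud
```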

20.
In recent years, Twitter has become one of the most important microblogging services of the Web 2.0. Among its possible uses, it can be employed for communicating and broadcasting information in real time. The goal of this research is to analyze the task of automatic tweet generation from a text summarization perspective in the context of the journalism genre. To achieve this, different state-of-the-art summarizers are selected and employed for producing multi-lingual tweets in two languages (English and Spanish). A wide experimental framework is proposed, comprising the creation of a new corpus, the generation of the automatic tweets, and their assessment through a quantitative and a qualitative evaluation, where informativeness, indicativeness, and interest are key criteria that should be ensured in the proposed context. From the results obtained, it was observed that although the original tweets were considered model tweets with respect to their informativeness, they were not among the most interesting ones from a human viewpoint. Therefore, relying only on these tweets may not be the ideal way to communicate news through Twitter, especially if a more personalized and catchy way of reporting news is desired. In contrast, we showed that recent text summarization techniques may be more appropriate, reflecting a balance between indicativeness and interest, even if their content differed from the tweets delivered by the news providers.
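A minimal extractive baseline for tweet generation, for orientation only: pick the article sentence with the highest average word frequency that fits Twitter's length limit. The summarizers evaluated in the study are far more elaborate.

```python
# Extractive baseline: best-scoring sentence that fits the length limit.
from collections import Counter
import re

def tweet_from(article, limit=280):
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    freqs = Counter(re.findall(r"\w+", article.lower()))
    def score(s):                       # mean frequency of the sentence's words
        words = re.findall(r"\w+", s.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)
    fitting = [s for s in sentences if len(s) <= limit]
    return max(fitting, key=score) if fitting else article[:limit]

print(tweet_from("The river rose overnight. Officials said the flood "
                 "damaged two heritage sites. Cleanup begins today."))
```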

