
1.

Chaveevan Pechsiri, Asanee Kawtrakul 《Journal of Computer Science and Technology》2007,22(6):877-889

Mining causality is essential to provide a diagnosis. This research aims at extracting the causality existing within multiple sentences or EDUs (Elementary Discourse Units). The research emphasizes the use of causality verbs because they make explicit, in a certain way, the consequent events of a cause, e.g., "Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry." A verb can also be the causal-verb link between cause and effect within EDU(s), e.g., "Aphids suck the sap from rice leaves causing leaves to be shrunk" ("causing" is equivalent to a causal-verb link in Thai). The research confronts two main problems: identifying the interesting causality events in documents and identifying their boundaries. To address them, we propose mining on verbs using two different machine learning techniques, a Naive Bayes classifier and a Support Vector Machine. The resulting mining rules are used to identify and extract causality spanning multiple EDUs in text. Our multiple-EDU extraction achieves 0.88 precision with 0.75 recall using the Naive Bayes classifier, and 0.89 precision with 0.76 recall using the Support Vector Machine.
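The Naive Bayes half of the verb-mining step can be pictured with a minimal sketch. It assumes, hypothetically, that each training example is the set of verbs observed in a cause/effect EDU pair and that the label says whether the pair is causal; the abstract does not specify the real feature set, the toy English verbs and labels are invented for illustration, and the SVM alternative is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data (not from the paper): each example is the set
# of verbs seen in an EDU pair, labelled "causal" or "non-causal".
TRAIN = [
    ({"suck", "shrink"}, "causal"),
    ({"suck", "yellow"}, "causal"),
    ({"infect", "dry"}, "causal"),
    ({"plant", "grow"}, "non-causal"),
    ({"harvest", "sell"}, "non-causal"),
    ({"plant", "sell"}, "non-causal"),
]

def train_nb(examples):
    """Count class priors and per-class verb frequencies (binarized multinomial NB)."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab

def predict_nb(model, features):
    """Return the label with the highest log posterior, using add-one smoothing."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            score += math.log((feature_counts[label][f] + 1) / denom)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(TRAIN)
print(predict_nb(model, {"suck", "dry"}))  # prints "causal"
```

Add-one smoothing keeps an unseen verb from zeroing out a class, and summing logs instead of multiplying probabilities avoids numeric underflow when an EDU pair contributes many features.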

2.

Automatic building of an ontology on the basis of text corpora in Thai

Aurawan Imsombut, Asanee Kawtrakul 《Language Resources and Evaluation》2008,42(2):137-149

This paper presents a methodology for automatic learning of ontologies from Thai text corpora by extracting terms and relations. A shallow parser is used to chunk the texts, and taxonomic relations are identified in the chunks with the help of cues: lexico-syntactic patterns and item lists. The main advantage of the approach is that it simplifies the task of concept and relation labeling, since the cues help identify ontological concepts and hint at their relations. However, these techniques pose certain problems, namely cue-word ambiguity, item-list identification, and a large number of candidate terms. We also propose a method to solve these problems by using lexicon and co-occurrence features and weighting them with information gain. The precision, recall and F-measure of the system are 0.74, 0.78 and 0.76, respectively.
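Information gain, used in this approach to weight the lexicon and co-occurrence features, measures how much knowing a feature reduces uncertainty about the label. The sketch below computes it for a discrete feature; the toy feature and label vectors are hypothetical, and how the paper turns IG scores into weights is not stated in the abstract. It also includes the balanced F-measure (harmonic mean of precision and recall), an assumption that is consistent with the reported 0.74, 0.78 and 0.76.

```python
import math

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X=v) * H(Y | X=v)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: does a candidate term pair co-occur with a cue word (1/0),
# and is it actually a taxonomic relation (1/0)?
cue_feature = [1, 1, 1, 1, 0, 0, 0, 0]
is_relation = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(information_gain(cue_feature, is_relation), 3))  # prints 0.189
print(round(f_measure(0.74, 0.78), 2))                       # prints 0.76
```

A feature that perfectly predicts a balanced binary label scores 1 bit, while one independent of the label scores 0, so ranking or scaling features by IG favours the cues that actually discriminate relations.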

3.

A multilingual ontology for infectious disease surveillance: rationale, design and challenges

Nigel Collier, Ai Kawazoe, Lihua Jin, Mika Shigematsu, Dinh Dien, Roberto A. Barrero, Koichi Takeuchi, Asanee Kawtrakul 《Language Resources and Evaluation》2006,40(3-4):405-413

A lack of surveillance system infrastructure in the Asia-Pacific region is seen as hindering the global control of rapidly spreading infectious diseases such as the recent avian H5N1 epidemic. As part of improving surveillance in the region, the BioCaster project aims to develop a system based on text mining for automatically monitoring Internet news and other online sources in several regional languages. At the heart of the system is an application ontology which serves the dual purpose of enabling advanced searches on the mined facts and of allowing the system to make intelligent inferences for assessing the priority of events. However, it became clear early on in the project that existing classification schemes did not have the necessary language coverage or semantic specificity for our needs. In this article we present an overview of our needs and explore in detail the rationale and methods for developing a new conceptual structure and multilingual terminological resource that focuses on priority pathogens and the diseases they cause. The ontology is made freely available as an online database and downloadable OWL file.
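How a multilingual terminological resource can support both cross-language search and simple inference may be easier to see in a small sketch. Everything here is hypothetical: the `Concept` class, the concept IDs and the three-level is-a fragment are invented for illustration and are not taken from the BioCaster ontology, and an in-memory dictionary stands in for parsing the published OWL file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Concept:
    concept_id: str
    parent: Optional[str]  # hypothetical is-a link to a broader concept
    labels: Dict[str, List[str]] = field(default_factory=dict)  # language code -> terms

# Hypothetical fragment for illustration only.
ONTOLOGY = {
    c.concept_id: c
    for c in [
        Concept("InfectiousDisease", None,
                {"en": ["infectious disease"], "ja": ["感染症"], "th": ["โรคติดเชื้อ"]}),
        Concept("Influenza", "InfectiousDisease",
                {"en": ["influenza", "flu"], "ja": ["インフルエンザ"], "th": ["ไข้หวัดใหญ่"]}),
        Concept("AvianInfluenza", "Influenza",
                {"en": ["avian influenza", "bird flu"], "ja": ["鳥インフルエンザ"], "th": ["ไข้หวัดนก"]}),
    ]
}

def build_term_index(ontology):
    """Map every lower-cased term, in any language, to its concept ID."""
    index = {}
    for concept in ontology.values():
        for terms in concept.labels.values():
            for term in terms:
                index[term.lower()] = concept.concept_id
    return index

def ancestors(ontology, concept_id):
    """Follow is-a links upward, returning broader concepts nearest first."""
    chain = []
    parent = ontology[concept_id].parent
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent].parent
    return chain

INDEX = build_term_index(ONTOLOGY)

def normalize(term):
    """Resolve a surface term from any language to a concept ID, or None."""
    return INDEX.get(term.lower())

print(normalize("鳥インフルエンザ"))           # prints AvianInfluenza
print(ancestors(ONTOLOGY, "AvianInfluenza"))  # prints ['Influenza', 'InfectiousDisease']
```

Resolving every language's terms to one language-independent concept ID is what lets a Japanese and a Thai news report about the same outbreak be grouped, and walking the is-a links lets a query for a broad class also match reports about its narrower members.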