1.

A novel retrieval approach reflecting variability of syntactic phrase representation

Young-In Song Kyoung-Soo Han Sang-Bum Kim So-Young Park Hae-Chang Rim 《Journal of Intelligent Information Systems》2008,31(3):265-286

In this paper, we introduce the variability of syntactic phrases and propose a new retrieval approach that reflects the variability of syntactic phrase representation. With a variability measure for a phrase, we can estimate how likely a phrase in a given query is to appear in relevant documents and control the impact of syntactic phrases in a retrieval model. Experimental results over different types of queries and document collections show that our retrieval model based on the variability of syntactic phrases is very effective in terms of retrieval performance, especially for long natural language queries.

2.

Multiple categorizations of products: cognitive modeling of customers through social media data mining

Gil-Young Song Youngjoon Cheon Kihwang Lee Heuiseok Lim Kyung-Yong Chung Hae-Chang Rim 《Personal and Ubiquitous Computing》2014,18(6):1387-1403

As various forms of social media spread, we often witness the idea of an individual user driving macroscopic changes. From the perspectives of product development and marketing, the opinions left by potential consumers in online social networks can generate big ripple effects. This study analyzes user opinions in online spaces to grasp preferences toward various products as psychologically categorized by users. We also suggest an aspect of the market as mentally configured by users, using network modeling while following the framework of economic sociology. Existing analyses of online marketplaces mainly deal with structural issues such as inter-actor relationships and status measurement. This study, however, analyzes complex preferences regarding diverse products and brands and derives a new model for inter-market connections. We expect that our study will have important implications for the digital marketing and community design of corporations planning word-of-mouth effects in online spaces.

3.

A segment-based annotation tool for Korean treebanks with minimal human intervention

So-Young Park Young-In Song Hae-Chang Rim 《Language Resources and Evaluation》2006,40(3-4):281-289

In this paper, we propose a segment-based annotation tool providing appropriate interactivity between a human annotator and an automatic parser. The proposed annotation tool provides a preview of the complete sentence structure suggested by the parser, and updates the preview whenever the annotator cancels or selects a segmentation point. Thus, the annotator can select the proper sentence segments, maximizing parsing accuracy and minimizing human intervention. Experimental results show that the proposed tool allows the annotator to reduce human intervention by approximately 39% compared with manual annotation. The Sejong Korean treebank, one of the large-scale treebanks, was constructed with the proposed annotation tool.

4.

Feature-Based Korean Grammar Utilizing Learned Constraint Rules

So-Young Park Yong-Jae Kwak Hae-Chang Rim Heui-Seok Lim 《Computational Intelligence》2005,21(1):69-89

In this paper, we propose a feature-based Korean grammar utilizing learned constraint rules in order to improve parsing efficiency. The proposed grammar consists of feature structures, feature operations, and constraint rules, and it has the following characteristics. First, a feature structure includes several features to express useful linguistic information for Korean parsing. Second, a feature operation generating a new feature structure is restricted to the binary-branching form, which can deal with Korean properties such as variable word order and constituent ellipsis. Third, constraint rules improve efficiency by preventing feature operations from generating spurious feature structures. Moreover, these rules are learned from a Korean treebank by a decision tree learning algorithm. The experimental results show that the feature-based Korean grammar can reduce the number of candidates by up to a third, and that it runs 1.5 to 2 times faster than a CFG on a statistical parser.

5.

Automatic Word Spacing Using Probabilistic Models Based on Character n-grams

Do-Gil Lee Hae-Chang Rim Dongsuk Yook 《Intelligent Systems, IEEE》2007,22(1):28-35

On the Internet, information is largely in text form, which often includes such errors as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise free. Preprocessing is necessary to deal with these errors and meet the growing need for automatic text processing. One such kind of preprocessing is automatic word spacing. This process decides the correct boundaries between words in a sentence containing spacing errors, which are a type of spelling error. Except for some Asian languages such as Chinese and Japanese, most languages have explicit word spacing. In these languages, word spacing is crucial to increase readability and to accurately communicate a text's meaning. Automatic word spacing plays an important role not only as a spell-checker module but also as a preprocessor for a morphological analyzer, which is a fundamental tool for NLP applications. Furthermore, automatic word spacing can serve as a postprocessor for optical-character-recognition systems and speech recognition systems.

6.

Some Effective Techniques for Naive Bayes Text Classification (Cited by: 3; self-citations: 0; citations by others: 3)

Sang-Bum Kim Kyoung-Soo Han Hae-Chang Rim Sung Hyon Myaeng 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(11):1457-1466

While naive Bayes is quite effective in various data mining tasks, it shows disappointing results in the automatic text classification problem. Based on our observation of naive Bayes applied to natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
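
The abstract above does not give the formulas for its two heuristics, so the following is only a minimal sketch of one plausible reading of per-document normalization: each training document's term counts are divided by the document length before they are accumulated, so that long documents do not dominate the class estimates. The function names `train_nb` and `classify_nb`, the smoothing constant `alpha`, and the normalization itself are assumptions made for illustration, not the authors' method; the feature weighting heuristic is not sketched.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes model on tokenized documents.

    Assumption: per-document normalization is approximated by dividing
    each term count by the length of its document before accumulation.
    """
    vocab = {term for tokens in docs for term in tokens}
    class_docs = Counter(labels)
    mass = defaultdict(Counter)  # class -> term -> length-normalized count
    for tokens, label in zip(docs, labels):
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            mass[label][term] += count / len(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Additive smoothing over the whole vocabulary.
        total = sum(mass[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((mass[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like, vocab


def classify_nb(tokens, model):
    """Return the class with the highest posterior log-score."""
    log_prior, log_like, vocab = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

With this normalization every non-empty training document contributes a total mass of 1 to its class, regardless of its length; out-of-vocabulary tokens are simply ignored at classification time.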

7.

Word Sense Disambiguation Using the Classification Information Model (Cited by: 1; self-citations: 0; citations by others: 1)

Ho Lee Hae-Chang Rim Hungyun Seo 《Computers and the Humanities》2000,34(1-2):141-146

A Classification Information Model is a pattern classification model. The model decides the proper class of an input instance by integrating individual decisions, each of which is made with one feature in the pattern. Each individual decision is weighted according to the distributional property of the feature from which the decision is derived. An individual decision and its weight are represented as classification information, which is extracted from the training instances. In word sense disambiguation based on the model, the proper sense of an input instance is determined by the weighted sum of all individual decisions derived from the features contained in the instance.
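
The weighted-sum decision described above can be sketched as follows. This is an illustration under stated assumptions rather than the model as published: the individual decision of a feature is taken to be the distribution P(sense | feature) estimated from training counts, and the weight is taken to be one minus the normalized entropy of that distribution, so that features spread evenly over all senses contribute nothing. The names `train_cim` and `disambiguate` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_cim(instances):
    """Extract per-feature classification information.

    instances: iterable of (features, sense) pairs.
    Returns a dict: feature -> (weight, {sense: P(sense | feature)}).
    Assumption: weight = 1 - normalized entropy of P(sense | feature).
    """
    counts = defaultdict(Counter)
    senses = set()
    for features, sense in instances:
        senses.add(sense)
        for feature in set(features):
            counts[feature][sense] += 1

    max_entropy = math.log(len(senses)) if len(senses) > 1 else 1.0
    model = {}
    for feature, sense_counts in counts.items():
        total = sum(sense_counts.values())
        dist = {s: n / total for s, n in sense_counts.items()}
        entropy = -sum(p * math.log(p) for p in dist.values())
        model[feature] = (1.0 - entropy / max_entropy, dist)
    return model


def disambiguate(features, model):
    """Pick the sense with the largest weighted sum of individual decisions."""
    scores = Counter()
    for feature in set(features):
        if feature in model:
            weight, dist = model[feature]
            for sense, p in dist.items():
                scores[sense] += weight * p
    return scores.most_common(1)[0][0] if scores else None
```

In this sketch a context word seen equally often with every sense gets weight 0, while a word seen with only one sense gets weight 1; an instance with no known features yields `None`.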

8.

Resolving Ambiguous Segmentation of Korean Compound Nouns Using Statistics and Rules

Bo-Hyun Yun Yong-Jae Kwak & Hae-Chang Rim 《Computational Intelligence》1999,15(2):101-113

Korean compound nouns may be written as a sequence of characters without blanks between unit nouns. For Korean processing systems, Korean compound nouns have to be first segmented into a sequence of unit nouns. However, the segmentation task is difficult because a sequence of characters may be ambiguously segmented to several sequences of appropriate unit nouns. Moreover, this task is not trivial because Korean compound nouns may include many unknown unit nouns.
This paper proposes a new method for KCNS (Korean Compound Noun Segmentation) and reports on the application of this segmentation technique to enhance the performance of an information retrieval system. According to our method, compound nouns are first segmented by using a dictionary and structure patterns. If they are ambiguously segmented, we resolve the ambiguities by using statistical information and a preference rule. Moreover, we employ three kinds of heuristics in order to segment compound nouns with unknown unit nouns.
To evaluate KCNS, we use three kinds of data from various domains. Experimental results show that the precision of KCNS's output is approximately 96% on average, regardless of domain. The effectiveness of using the segmented unit nouns provided by KCNS for indexing is demonstrated by the improved retrieval performance of our information retrieval system.
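
The first two steps described above (dictionary-based segmentation, then ambiguity resolution) can be sketched roughly as below. The abstract does not state the actual structure patterns, statistics, or preference rule, so this sketch substitutes a simple stand-in: it enumerates every split into dictionary entries and prefers the candidate with the fewest unit nouns, breaking ties by total unit frequency. The function names, the toy frequency table, and the tie-breaking rule are all assumptions, and the heuristics for unknown unit nouns are not covered.

```python
from functools import lru_cache


def segmentations(compound, dictionary):
    """Enumerate every way to split `compound` into dictionary unit nouns."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(compound):
            return [[]]  # one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(compound) + 1):
            unit = compound[start:end]
            if unit in dictionary:
                for rest in split_from(end):
                    results.append([unit] + rest)
        return results

    return split_from(0)


def best_segmentation(compound, unit_freq):
    """Resolve ambiguity with a hypothetical preference rule.

    Assumption: prefer fewer unit nouns, then higher total frequency.
    Returns None when no full dictionary segmentation exists.
    """
    candidates = segmentations(compound, frozenset(unit_freq))
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda units: (len(units), -sum(unit_freq[u] for u in units)),
    )


# Toy frequency table (hypothetical values, for illustration only).
UNIT_FREQ = {"정보": 50, "검색": 40, "시스템": 30, "정보검색": 5}
```

Calling `segmentations("정보검색시스템", frozenset(UNIT_FREQ))` yields both the three-unit and the two-unit split, which is the kind of ambiguity the statistical step in the abstract is meant to resolve.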

9.

Discovering High-Quality Threaded Discussions in Online Forums

Jung-Tae Lee Min-Chul Yang Hae-Chang Rim 《Journal of Computer Science and Technology》2014,(3):519-531

Archives of threaded discussions generated by users in online forums and discussion boards contain valuable knowledge on various topics. However, not all threads are useful because of deliberate abuses, such as trolling and flaming, that are commonly observed in online conversations. The existence of various users with different levels of expertise also makes it difficult to assume that every discussion thread stored online contains high-quality content. Although finding high-quality threads automatically can help both users and search engines sift through a huge amount of thread archives and make effective use of these potentially useful resources, no previous work to our knowledge has studied this task. In this paper, we propose an automatic method for distinguishing high-quality threads from low-quality ones in online discussion sites. We first suggest four different artificial measures for inducing the overall quality of a thread based on the ratings of its posts. We then propose two tasks involving prediction of thread quality without using post rating information. We adopt a popular machine learning framework to solve the two prediction tasks. Experimental results on a real-world forum archive demonstrate that our method can significantly improve the prediction performance across all four measures of thread quality on both tasks. We also compare how different types of features derived from various aspects of threads contribute to the overall performance and investigate key features that play a crucial role in discovering high-quality threads in online discussion sites.

10.

A new generative opinion retrieval model integrating multiple ranking factors

Seung-Wook Lee Young-In Song Jung-Tae Lee Kyoung-Soo Han Hae-Chang Rim 《Journal of Intelligent Information Systems》2012,38(2):487-505

In this paper, we present clear and formal definitions of the ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model which simultaneously combines these factors from the generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking multiple ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. The topical relevance measures how strongly a document relates to a given topic, and the subjectivity strength indicates the likelihood that the document contains subjective information. The opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also demonstrate the universality of our model by introducing derivations of the model that represent other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate that not only are the individual ranking factors necessary in opinion retrieval, but they also cooperate advantageously to produce a better document ranking when used together. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.