Similar literature (20 results found)
1.
In the current era of the internet, people use online media for conversation, discussion, chatting, and similar purposes. Analysing such material, in which more than one person is involved, poses distinct challenges compared with other text analysis tasks. There are several approaches to identifying users’ emotions from conversational text in English; however, regional and low-resource languages have been neglected. Urdu is one of them: despite being used by millions of users across the globe, to the best of our knowledge there exists no work on dialogue analysis in Urdu. Therefore, in this paper, we propose a model that uses deep learning and machine learning approaches to classify users’ emotions from text. To accomplish this task, we first created a dataset for Urdu with the help of existing English-language datasets for dialogue analysis. We then preprocessed the data and selected dialogues with common emotions. Once the dataset was prepared, we applied different deep learning and machine learning techniques for emotion classification, tuning the algorithms to the Urdu datasets. The experimental evaluation shows encouraging results, with 67% accuracy on the Urdu dialogue datasets; more than 10,000 dialogues are classified into five emotions: joy, fear, anger, sadness, and neutral. We believe that this is the first effort at emotion detection from conversational text in the Urdu language domain.

2.
When speakers of different languages interact, they are likely to influence each other: contact leaves traces in the linguistic record, which in turn can reveal geographical areas of past human interaction and migration. However, other factors may contribute to similarities between languages. Inheritance from a shared ancestral language and universal preference for a linguistic property may both overshadow contact signals. How can we find geographical contact areas in language data, while accounting for the confounding effects of inheritance and universal preference? We present sBayes, an algorithm for Bayesian clustering in the presence of confounding effects. The algorithm learns which similarities are better explained by confounders, and which are due to contact effects. Contact areas are free to take any shape or size, but an explicit geographical prior ensures their spatial coherence. We test sBayes on simulated data and apply it in two case studies to reveal language contact in South America and the Balkans. Our results are supported by findings from previous studies. While we focus on detecting language contact, the method can also be used to uncover other traces of shared history in cultural evolution, and more generally, to reveal latent spatial clusters in the presence of confounders.

3.
The necessity of adding safety information in minority languages to standards is discussed. The requirements of the WTO-TBT agreement and the IEC are analyzed. The relevant laws and the foundational technology for minority languages in China are discussed. Precedents in standards and technical regulations are enumerated. It is concluded that safety information in minority languages should be added to standards as soon as possible.

4.
《工程(英文)》 (Engineering), 2020, 6(3): 275-290
Natural language processing (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand and process human languages. In the last five years, we have witnessed the rapid development of NLP in tasks such as machine translation, question-answering, and machine reading comprehension based on deep learning and an enormous volume of annotated and unannotated data. In this paper, we will review the latest progress in the neural network-based NLP framework (neural NLP) from three perspectives: modeling, learning, and reasoning. In the modeling section, we will describe several fundamental neural network-based modeling paradigms, such as word embedding, sentence embedding, and sequence-to-sequence modeling, which are widely used in modern NLP engines. In the learning section, we will introduce widely used learning methods for NLP models, including supervised, semi-supervised, and unsupervised learning; multitask learning; transfer learning; and active learning. We view reasoning as a new and exciting direction for neural NLP, but it has yet to be well addressed. In the reasoning section, we will review reasoning mechanisms, including the knowledge, existing non-neural inference methods, and new neural inference methods. We emphasize the importance of reasoning in this paper because it is important for building interpretable and knowledge-driven neural NLP models to handle complex tasks. At the end of this paper, we will briefly outline our thoughts on the future directions of neural NLP.

5.
Shreesh Chaudhary. 《Sadhana》, 1994, 19(1): 129-146
The present paper identifies some nonlinguistic and linguistic barriers that will have to be overcome by any system for automatic and simultaneous communication of news, commercial advertisements, and other items of information and entertainment by mass media across some Indian languages. The paper also presents a brief account of some theories for representing knowledge of language in a language-independent manner, because such theories can make simultaneous communication of an item across different languages considerably easier. More research is required in this field, however, before the relevant knowledge can be represented in a language-independent manner. At present, automatic and simultaneous communication of an item from any language to many other languages does not seem easy. It does seem, however, that a beginning can be made in a very limited way by human editors aided by the tools that computer scientists have developed so far. Revised and expanded version of a paper presented at the discussion meeting on “Artificial intelligence and expert system technologies in the Indian context” held at Indian Institute of Science, Bangalore, India, July 22–26, 1991. However, I alone am responsible for mistakes, if any, here.

6.
It has been proposed that a serial founder effect could have caused the present observed pattern of global phonemic diversity. Here we present a model that simulates the human range expansion out of Africa and the subsequent spatial linguistic dynamics until today. It does not assume copying errors, Darwinian competition, reduced contrastive possibilities or any other specific linguistic mechanism. We show that the decrease of linguistic diversity with distance (from the presumed origin of the expansion) arises under three assumptions, previously introduced by other authors: (i) an accumulation rate for phonemes; (ii) small phonemic inventories for the languages spoken before the out-of-Africa dispersal; (iii) an increase in the phonemic accumulation rate with the number of speakers per unit area. Numerical simulations show that the predictions of the model agree with the observed decrease of linguistic diversity with increasing distance from the most likely origin of the out-of-Africa dispersal. Thus, the proposal that a serial founder effect could have caused the present observed pattern of global phonemic diversity is viable, if three strong assumptions are satisfied.

7.
In the field of natural language processing (NLP), the advancement of neural machine translation has paved the way for cross-lingual research. Yet, most studies in NLP have evaluated the proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, particularly, chat messages in Twitch. In order to address it, we collected the dataset, which included 7,066,854 and 3,365,569 chat messages from English and Korean streams, respectively. We employed several machine learning classifiers and neural networks with two different types of embedding: word-sequence embedding and the final layer of a pre-trained language model. The results of the employed models indicate that the accuracy difference between English, and English to Korean was relatively high, ranging from 3% to 12%. For Korean data (Korean, and Korean to English), it ranged from 0% to 2%. Therefore, the results imply that translation from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) shows higher performance, in contrast to vice versa. Several implications and limitations of the presented results are also discussed. For instance, we suggest the feasibility of translation from resource-poor languages for using the tools of resource-rich languages in further analysis.

8.
9.
The text classification process has been extensively investigated in various languages, especially English. Text classification models are vital in several Natural Language Processing (NLP) applications. The Arabic language is highly significant: for instance, it is the fourth most-used language on the internet and the sixth official language of the United Nations. However, only a few studies on text classification in Arabic have been published. In general, researchers face two challenges in the Arabic text classification process: low accuracy and high dimensionality of the features. In this study, an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning (AATC-HTHDL) model is proposed. The major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic text. The first step in the proposed model is to pre-process the input data to transform it into a useful format. The Term Frequency-Inverse Document Frequency (TF-IDF) model is applied to extract the feature vectors. Next, the Convolutional Neural Network with Recurrent Neural Network (CRNN) model is utilized to classify the Arabic text. In the final stage, the Crow Search Algorithm (CSA) is applied to fine-tune the CRNN model’s hyperparameters, which constitutes the novelty of the work. The proposed AATC-HTHDL model was experimentally validated under different parameters, and the outcomes established the superiority of the proposed AATC-HTHDL model over other approaches.
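
As an illustration of the TF-IDF feature-extraction step mentioned in this abstract, here is a minimal sketch in plain Python. It is not the authors' implementation: the weighting variant (relative term frequency times the natural log of the inverse document frequency), the function name `tfidf`, and the toy English tokens are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / number of documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus (already tokenized); real input would be Arabic text.
docs = [
    ["economy", "market", "growth"],
    ["football", "match", "market"],
    ["football", "league", "goal"],
]
vectors = tfidf(docs)
```

In this toy corpus, "economy" occurs in one document and "market" in two, so "economy" receives the larger weight in the first vector. A term that occurs in every document gets weight zero under this variant, which is why production libraries usually apply a smoothed idf.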

10.
11.
Powerful computers are needed for processing tasks related to human languages these days. Human languages, also called natural languages, are highly versatile systems of encoding information and can capture information of various domains. To enable a computer to process information in human languages, the language needs to be appropriately ‘described’ to the computer, i.e. the language needs to be ‘modelled’. In this work, we present an approach for acquiring the morphology of an inflectional language such as Hindi. It is an unsupervised learning approach, suitable for languages with a rich concatenative morphology. Broadly, our work is carried out in three steps: (1) acquire the morphology of Hindi from a raw (unannotated) text corpus from the Central Institute of Indian Languages (CIIL), Mysore; (2) prepare clusters, a stem bag, and a suffix bag; (3) use the morphological knowledge to decompose a given word into stems and suffixes according to their morphological behaviour, and add new words. A prime motivation behind this work is to eventually develop an unsupervised morphological analyser which is language-independent (applied here to Hindi). A second motivation is to develop language-independent morphological segmentation, since the study of morphology has been shown to benefit a range of NLP tasks such as speech recognition, speech synthesis, machine translation and information retrieval. Though Hindi is an important national language in India, little computational work has been done so far in this direction. Our work is one of the first efforts in this regard and can be considered pioneering. There are many such languages for which it is very important to have a suitable but inexpensive computational acquisition process. Such languages receive very little attention from computational linguistics research, both in terms of available funds and number of researchers. We do not, however, claim that our approach is a solution for all such languages. Different languages have characteristics that require individual research attention.

12.
The origin of Malagasy DNA is half African and half Indonesian; nevertheless, the Malagasy language, spoken by the entire population, belongs to the Austronesian family. The language most closely related to Malagasy is Maanyan (Greater Barito East group of the Austronesian family), but related languages are also found in Sulawesi, Malaysia and Sumatra. For this reason, and because Maanyan is spoken by a population which lives along the Barito river in Kalimantan and which does not possess the necessary skill for long maritime navigation, the ethnic composition of the Indonesian colonizers is still unclear. There is a general consensus that Indonesian sailors reached Madagascar by a maritime trek, but the time, the path and the landing area of the first colonization are all disputed. In this research, we try to answer these problems together with other ones, such as the historical configuration of Malagasy dialects, by types of analysis related to lexicostatistics and glottochronology that draw upon the automated method recently proposed by the authors. The data were collected by the first author at the beginning of 2010 with the invaluable help of Joselinà Soafara Néré and consist of Swadesh lists of 200 items for 23 dialects covering all areas of the island.

13.
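Language and culture are inseparable, and language plays a decisive role in both cultural difference and cultural integration. Culture is the result of humanity's long-term development; different cultures have different characteristics, and the existence of cultural differences is inevitable. Language teaching likewise exhibits differences. This paper discusses the issue of cultural difference mainly in terms of geographical and environmental factors.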

14.
Microbial communities display complex population dynamics, both in frequency and absolute density. Evolutionary game theory provides a natural approach to analyse and model this complexity by studying the detailed interactions among players, including competition and conflict, cooperation and coexistence. Classic evolutionary game theory models typically assume constant population size, which often does not hold for microbial populations. Here, we explicitly take into account population growth with frequency-dependent growth parameters, as observed in our experimental system. We study the in vitro population dynamics of the two commensal bacteria (Curvibacter sp. (AEP1.3) and Duganella sp. (C1.2)) that synergistically protect the metazoan host Hydra vulgaris (AEP) from fungal infection. The frequency-dependent, nonlinear growth rates observed in our experiments indicate that the interactions among bacteria in co-culture are beyond the simple case of direct competition or, equivalently, pairwise games. This is in agreement with the synergistic effect of anti-fungal activity observed in vivo. Our analysis provides new insight into the minimal degree of complexity needed to appropriately understand and predict coexistence or extinction events in this kind of microbial community dynamics. Our approach extends the understanding of microbial communities and points to novel experiments.

15.
The Paninian framework proposes karakas as semantico-syntactic relations that play a crucial role in mediating between surface form and meaning. The framework accounts for theta-role assignment, active-passive, and control in a uniform manner. It has been successfully used in building an extremely fast prototype machine-translation system between two Indian languages. The constraint parser and the generator are designed with information-theoretic considerations. The Paninian framework is particularly suited to free word order languages. As most human languages are relatively word-order free, the Paninian framework should be explored as a serious contender for such languages. Based on the Paninian theory, the concept of a language accessor, or anusaraka, has emerged, which has the potential to overcome the language barrier in India. In this paper non-English words occur many times and hence are italicized only at first mention. Several people have worked on the implementation of the core parser: Sivasubramanian, B Srinivas, P V Ravisankar on the earlier version, and Jayvant Anantpur, Vasudev Verma, Amba Kulkarni on the current version. Mr V N Narayana has been working on the Kannada-Hindi anusaraka.

16.
The frequency with which we use different words changes all the time, and every so often, a new lexical item is invented or another one ceases to be used. Beyond a small sample of lexical items whose properties are well studied, little is known about the dynamics of lexical evolution. How do the lexical inventories of languages, viewed as entire systems, evolve? Is the rate of evolution of the lexicon contingent upon historical factors or is it driven by regularities, perhaps to do with universals of cognition and social interaction? We address these questions using the Google Books N-Gram Corpus as a source of data and relative entropy as a measure of changes in the frequency distributions of words. It turns out that there are both universals and historical contingencies at work. Across several languages, we observe similar rates of change, but only at timescales of at least around five decades. At shorter timescales, the rate of change is highly variable and differs between languages. Major societal transformations as well as catastrophic events such as wars lead to increased change in frequency distributions, whereas stability in society has a dampening effect on lexical evolution.
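
To make the measure named in this abstract concrete, the sketch below computes the relative entropy (Kullback-Leibler divergence) between two word-frequency distributions. The abstract does not say how unseen words are handled, so the add-alpha smoothing, the function name `relative_entropy`, and the toy counts are assumptions for illustration only.

```python
import math
from collections import Counter


def relative_entropy(p_counts, q_counts, alpha=1.0):
    """D(P || Q) in bits between two word-count tables.

    Add-alpha smoothing over the joint vocabulary (an assumption of this
    sketch) keeps every probability non-zero so the logarithm is defined.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts.get(word, 0) + alpha) / p_total
        q = (q_counts.get(word, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical word counts for two time slices of a corpus.
period_a = Counter({"telegraph": 8, "letter": 10, "radio": 2})
period_b = Counter({"telegraph": 1, "letter": 6, "radio": 13})
change = relative_entropy(period_b, period_a)
```

The divergence is zero when the two distributions are identical and grows as the later period's word frequencies drift away from the earlier ones. Note that the measure is asymmetric, so the order of the two arguments matters.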

17.
WANG Long, YANG Jun'an, CHEN Lei, LIN Wei. 《声学技术》 (Technical Acoustics), 2015, 34(5): 431-436
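The language model is an important component of a speech recognition system, and the n-gram model is currently the mainstream approach. However, n-gram models have shortcomings: poor description of long-distance information within a sentence and data sparsity are two important factors that limit model performance. To address these shortcomings, researchers proposed recurrent neural network (RNN) modeling, which has achieved good results in English language modeling. Here the RNN modeling method is applied to Chinese language modeling in light of the characteristics of Chinese, and a model fusion method that combines the advantages of the two models is proposed. Experimental results show that, compared with the traditional n-gram language model, the Chinese language model trained with an RNN has lower perplexity (PPL); in speech recognition over the Chinese telephone channel, the system error rate also decreases; and after the two language models are fused, the recognition error rate is lower still.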
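
The abstract above does not specify how the n-gram and RNN language models are fused. One common option is linear interpolation of the per-word probabilities, and the sketch below assumes that choice. The function names, the interpolation weight `lam`, and the probability values are hypothetical.

```python
import math


def perplexity(probs):
    """Perplexity of a word sequence, given the probability a model assigned to each word."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


def interpolate(ngram_probs, rnn_probs, lam=0.5):
    """Word-by-word linear interpolation: lam * P_rnn + (1 - lam) * P_ngram."""
    return [lam * r + (1 - lam) * n for n, r in zip(ngram_probs, rnn_probs)]


# Hypothetical per-word probabilities from the two models on one test sentence.
ngram_probs = [0.20, 0.01, 0.10, 0.30]
rnn_probs = [0.05, 0.15, 0.12, 0.02]

ppl_ngram = perplexity(ngram_probs)
ppl_rnn = perplexity(rnn_probs)
ppl_fused = perplexity(interpolate(ngram_probs, rnn_probs, lam=0.5))
```

In this toy case each model is confident on words where the other is weak, so the interpolated model reaches a lower perplexity than either model alone. In practice `lam` would be tuned on held-out data.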

18.
The formation of sentences is a highly structured and history-dependent process. The probability of using a specific word in a sentence strongly depends on the ‘history’ of word usage earlier in that sentence. We study a simple history-dependent model of text generation assuming that the sample-space of word usage reduces along sentence formation, on average. We first show that the model explains the approximate Zipf law found in word frequencies as a direct consequence of sample-space reduction. We then empirically quantify the amount of sample-space reduction in the sentences of 10 famous English books, by analysis of corresponding word-transition tables that capture which words can follow any given word in a text. We find a highly nested structure in these transition tables and show that this ‘nestedness’ is tightly related to the power law exponents of the observed word frequency distributions. With the proposed model, it is possible to understand that the nestedness of a text can be the origin of the actual scaling exponent and that deviations from the exact Zipf law can be understood by variations of the degree of nestedness on a book-by-book basis. On a theoretical level, we are able to show that in the case of weak nesting, Zipf’s law breaks down in a fast transition. Unlike previous attempts to understand Zipf’s law in language, the sample-space reducing model is not based on assumptions of multiplicative, preferential or self-organized critical mechanisms behind language formation, but simply uses the empirically quantifiable parameter ‘nestedness’ to understand the statistics of word frequencies.
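
The link between sample-space reduction and Zipf's law can be seen in a very small simulation. The sketch below assumes the simplest possible version of such a process, which the abstract does not spell out: states are numbered 1 to N, each run starts at N, and every step jumps uniformly at random to a strictly lower state until state 1 is reached. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def ssr_visits(n_states, n_runs, seed=0):
    """Count state visits in repeated runs of a simple sample-space reducing process."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_runs):
        state = n_states
        while state > 1:
            # The sample space shrinks: only states below the current one remain.
            state = rng.randint(1, state - 1)
            visits[state] += 1
    return visits


visits = ssr_visits(n_states=1000, n_runs=20000)
# Relative visiting frequency of state i, normalized by state 1.
relative = {i: visits[i] / visits[1] for i in (1, 2, 4, 8)}
```

Under these assumptions a run visits state i with probability 1/i, so the relative frequencies for states 1, 2, 4 and 8 come out close to 1, 1/2, 1/4 and 1/8, which is a Zipf-like rank-frequency relation with exponent one. The nested transition tables studied in the paper generalize this uniform toy version.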

19.
LI Peng, YANG Ping. 《包装工程》 (Packaging Engineering), 2001, 22(3): 26-28
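Through an analysis of the elements of simplicity, vividness, national character, and contemporaneity in the visual language of graphic design, it is argued that compelling visual imagery can promote the sale of goods and keep a product in an unassailable position amid fierce competition.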

20.
The evolution of antibiotic resistance (AR) increases treatment cost and probability of failure, threatening human health worldwide. The relative importance of individual antibiotic use, environmental transmission and rates of introduction of resistant bacteria in explaining community AR patterns is poorly understood. Evaluating their relative importance requires studying a region where they vary. The construction of a new road in a previously roadless area of northern coastal Ecuador provides a valuable natural experiment to study how changes in the social and natural environment affect the epidemiology of resistant Escherichia coli. We conducted seven bi-annual 15 day surveys of AR between 2003 and 2008 in 21 villages. Resistance to both ampicillin and sulphamethoxazole was the most frequently observed profile, based on antibiogram tests of seven antibiotics from 2210 samples. The prevalence of enteric bacteria with this resistance pair in the less remote communities was 80 per cent higher than in more remote communities (OR = 1.8 [1.3, 2.3]). This pattern could not be explained with data on individual antibiotic use. We used a transmission model to help explain this observed discrepancy. The model analysis suggests that both transmission and the rate of introduction of resistant bacteria into communities may contribute to the observed regional scale AR patterns, and that village-level antibiotic use rate determines which of these two factors predominate. While usually conceived as a main effect on individual risk, antibiotic use rate is revealed in this analysis as an effect modifier with regard to community-level risk of resistance.
