Similar literature
20 similar documents found.
1.
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring “islands of pronunciation reduction” that contain (potential) MWEs can be identified in a large speech corpus.

2.
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.

3.
Yoshiharu Masuda 《AI & Society》1996,10(3-4):259-272
Although this study does not directly prove the effects of vibrotactile stimulation and body movements, it does try to exhibit some observable effects on production of connected speech among three intellectually disabled subjects. Analysis of the subjects' utterances seems to show a certain improvement in prosodic features such as rhythmic structures and F0 (fundamental frequency) movement. On the other hand, segmental features like articulation of vowels and consonants remained relatively unchanged.

4.
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign language into speech translation system and an SMS (Short Message Service) into speech translation system. The first tool is made up of three modules: an advanced visual interface (where a deaf person can specify a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, an emotional text to speech (TTS) converter to generate spoken Spanish. The visual interface allows a sign sequence to be defined using several utilities. The emotional TTS converter is based on Hidden Semi-Markov Models (HSMMs) permitting voice gender, type of emotion, and emotional strength to be controlled. The second tool is made up of an SMS message editor, a language translator and the same emotional text to speech converter. Both translation tools use a phrase-based translation strategy where translation and target language models are trained from parallel corpora. In the experiments carried out to evaluate the translation performance, the sign language-speech translation system reported a 96.45 BLEU and the SMS-speech system a 44.36 BLEU in a specific domain: the renewal of the Identity Document and Driving License. In the evaluation of the emotional TTS, it is important to highlight the improvement in the naturalness thanks to the morpho-syntactic features, and the high flexibility provided by HSMMs when generating different emotional strengths.

5.
The quality of a pull request is the primary factor integrators consider for its acceptance or rejection. Code smells indicate sub-optimal design or implementation choices in the source code that often lead to a fault-prone outcome, threatening the quality of pull requests. This study explores code smells in 21k pull requests from 25 popular Java projects. We find that both accepted (37%) and rejected (44%) pull requests have code smells, affected mainly by god classes and long methods. Besides, we observe that smelly pull requests are more complex and challenging to understand, as they have significantly larger sizes, longer latency times, and more discussion and review comments, and are submitted by contributors with less experience. Our results show that features used in previous studies for pull request acceptance prediction could potentially be employed to predict smells in incoming pull requests. We propose a dynamic approach to predict the presence of such code smells in newly added pull requests. We evaluate our approach on a dataset of 25 Java projects extracted from GitHub. We further conduct a benchmark study to compare the performance of eight machine learning classifiers. Results of the benchmark study show that XGBoost is the best-performing classifier for smell prediction.

6.
Schema matching is an important step in database integration. It identifies elements in two or more databases that have the same meaning. A multitude of schema matching methods have been proposed, but little is known about how humans assign meaning to database elements or assess the similarity of meaning of database elements. This paper presents an initial experimental study based on five theories of meaning that compares the effects of seven factors on the perceived similarity of database elements. Implications for schema matching research are discussed and guidance for future research is offered.

7.
Two hundred and forty-seven companies in the United States (US) were surveyed to determine whether various factors were associated with ISDN implementation success. Previously identified implementation factors from research in innovation adoption and diffusion, IT implementation, and IS planning were selected as likely to be important in the study. The results indicated that compatibility, relative advantage, complexity, champion, management support, openness, and formalization factors were indicative of ISDN implementation success.

8.
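Based on an analysis of general methods for building speech corpora, and taking into account practical corpus requirements and regional language characteristics, this paper proposes design specifications and speech collection and annotation methods suited to building a telephone-channel spoken Uyghur corpus. A 300-hour telephone-channel spoken Uyghur corpus was built, and the influence of the telephone channel on speech feature parameters such as linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and adaptive component weighting (ACW) cepstra is analyzed.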

9.
Spoken language resources (SLRs) are essential for both research and application development. In this article we clarify the concept of SLR validation. We define validation and how it differs from evaluation. Further, relevant principles of SLR validation are outlined. We argue that the best way to validate SLRs is to implement validation throughout SLR production and have it carried out by an external and experienced institute. We address which tasks should be carried out by the validation institute, and which not. Further, we list the basic issues that validation criteria for SLR should address. A standard validation protocol is shown, illustrating how validation can prove its value throughout the production phase in terms of pre-validation, full validation and pre-release validation.
Henk van den Heuvel

10.
Initial phonemes are used in spoken word recognition models to activate words starting with that phoneme. Such investigations are critical for classifying the initial phoneme into a phonetic group. This paper describes an artificial neural network (ANN) based approach to recognizing initial consonant phonemes of Assamese words. A self-organizing map (SOM) based algorithm is developed to segment the initial phoneme from its word. Using a combination of three types of ANN structures, namely a recurrent neural network (RNN), a SOM, and a probabilistic neural network (PNN), the proposed algorithm proves superior to conventional discrete wavelet transform (DWT) based phoneme segmentation. The algorithm is designed specifically around the Assamese phonemic structure, which has certain unique features and whose phonemes are grouped into six distinct families. Before the SOM-based segmentation is applied, an RNN makes localized decisions to classify the words into the six phoneme families. Next, the SOM-segmented phonemes are classified into individual phonemes. A two-class PNN classification is performed with clean Assamese phonemes to recognize the segmented phonemes. The recognized phonemes are validated by matching the first formant frequency of the phoneme. The formant frequencies of Assamese phonemes, estimated by locating the poles (formants) of the linear prediction model of the vocal tract, are used effectively as a priori knowledge in the proposed algorithm.

11.
Numerous studies on expert systems (ES) as a specific IT can be found in the literature. However, their focus has been mainly on system development from a technical perspective. This paper reports the findings of an empirical study on expert systems diffusion in 20 British banking organizations. The study finds that ES infusion is concentrated in specific processes that require extensive banking knowledge. These banking organizations take a strongly organizational, rather than technical, perspective on expert systems development. ES diffusion and top management commitment are closely associated. The existence of an IT strategy aligned with business strategy (and/or with an AI element within it) is not a good predictor of ES adoption. Three cases of expert systems applications in banking are given in the paper. These results are useful for practitioners managing their intelligent systems projects and for researchers pursuing further studies in this area.

12.
The question of the “manner in which an existing software architecture affects requirements decision-making” is considered important in the research community; however, to our knowledge, this issue has not been scientifically explored. We do not know, for example, the characteristics of such architectural effects. This paper describes an exploratory study on this question. Specific types of architectural effects on requirements decisions are identified, as are different aspects of the architecture together with the extent of their effects. This paper gives quantitative measures and qualitative interpretation of the findings. The understanding gained from this study has several implications in the areas of: project planning and risk management, requirements engineering (RE) and software architecture (SA) technology, architecture evolution, tighter integration of RE and SA processes, and middleware in architectures. Furthermore, we describe several new hypotheses that have emerged from this study and that provide grounds for future empirical work. This study involved six RE teams (of university students), whose task was to elicit new requirements for upgrading a pre-existing banking software infrastructure. The data collected was based on a new meta-model for requirements decisions, which is a by-product of this study.

13.
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
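To make the data structure in this abstract concrete, the following is a minimal sketch of a word confusion network as a sequence of bins, each mapping competing words to a confidence score. The type alias, the `<eps>` empty-word token, the function names, and the example scores are all assumptions for illustration; the paper's actual representation and scoring are not given here.

```python
from typing import Dict, List

# Hypothetical representation: one dict per aligned position ("bin"),
# mapping each competing word to its confidence score.
ConfusionNetwork = List[Dict[str, float]]

EPS = "<eps>"  # assumed token for "no word at this position"


def one_best(wcn: ConfusionNetwork) -> List[str]:
    """Pick the highest-scoring word in every bin (the ASR 1-best path)."""
    best = [max(bin_, key=bin_.get) for bin_ in wcn]
    return [word for word in best if word != EPS]


def candidates(wcn: ConfusionNetwork, min_conf: float) -> List[List[str]]:
    """Keep every real word whose score reaches min_conf, best first.

    This exposes alternatives that the 1-best path discards.
    """
    kept = []
    for bin_ in wcn:
        words = [w for w, p in bin_.items() if w != EPS and p >= min_conf]
        kept.append(sorted(words, key=bin_.get, reverse=True))
    return kept


example = [
    {"call": 0.9, "tall": 0.1},
    {"boston": 0.55, "austin": 0.45},
    {EPS: 0.7, "now": 0.3},
]
print(one_best(example))
print(candidates(example, 0.4))
```

In this toy network the 1-best path keeps only "boston", while the thresholded view also retains "austin" as a close competitor, which is the kind of information a downstream named entity detector could exploit.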

14.
Using the Internet, “public” computing grids can be assembled using “volunteered” PCs. To achieve this, volunteers download and install a software application capable of sensing periods of low local processor activity. During such times, this program on the local PC downloads and processes a subset of the project's data. At the completion of processing, the results are uploaded to the project and the cycle repeats.
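The download, process, upload cycle described above can be sketched as a small in-memory simulation. Everything here is hypothetical: `volunteer_loop`, the `is_idle` and `process` callbacks, and the `max_ticks` safety limit stand in for the real client software, idle detection, and network transfers, none of which are specified in the abstract.

```python
from collections import deque
from typing import Callable, Iterable, List, TypeVar

Unit = TypeVar("Unit")
Result = TypeVar("Result")


def volunteer_loop(
    work_units: Iterable[Unit],
    is_idle: Callable[[int], bool],
    process: Callable[[Unit], Result],
    max_ticks: int = 1000,
) -> List[Result]:
    """Simulate one volunteer PC working through a project's data.

    At every tick the client checks whether the local processor is idle.
    Only then does it take the next work unit ("download"), process it,
    and append the result to the returned list ("upload").
    """
    pending = deque(work_units)
    uploaded: List[Result] = []
    for tick in range(max_ticks):
        if not pending:
            break  # the project has no more data for this client
        if is_idle(tick):
            unit = pending.popleft()
            uploaded.append(process(unit))
    return uploaded


# Toy run: the PC is idle on even ticks, and "processing" sums the numbers.
results = volunteer_loop([[1, 2], [3, 4], [5]], lambda t: t % 2 == 0, sum)
print(results)
```

The `max_ticks` bound is a design choice for the simulation only, so that a client that is never idle cannot loop forever.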

15.
Palimpsest is a novel purely-visual language intended to support exploratory live programming. It demonstrates a new paradigm for the visual representation of constraint programming that may be appropriate to future generations of keyboardless and touchscreen devices. The current application domain is that of creative image manipulation, although the paradigm can support a wider range of computational expression. The combination of constraint semantics expressed via a novel image-layering metaphor provides a new approach to supporting a gradual slope of abstraction from direct manipulation to behaviour specification. Exploratory evaluations with a range of users give an indication of likely audiences, and opportunities for future development and application.

16.
The recent growth of service industries as well as the rise of e-commerce has increased the number of online customer service workers. Research on face-to-face service work has shown that these workers are expected to display certain emotions in the course of their work, a phenomenon known as emotional labor. However, little is known about emotional communication among online customer service workers. We explored emotional labor in an online context by examining the degree of emotional presence in mediated service interactions and its relationship with workers' acting strategies (i.e., surface acting, deep acting). Further, we examined if emotional presence and acting strategies predict job satisfaction as well as burnout. Data collected from 130 online customer service workers indicated that they perceive the highest emotional presence in phone conversations, followed by email and chat. Although there was little relationship between emotional presence and acting strategies, those who engage in surface acting are less satisfied with their job and more likely to experience burnout. In addition, those who feel a higher degree of emotional presence over the phone tend to experience higher job satisfaction and less burnout. These findings suggest that online customer service workers also engage in emotional labor.

17.
This paper presents work towards recognizing facial expressions that are used in sign language communication. Facial features are tracked to effectively capture temporal visual cues on the signer's face during signing. Face shape constraints are used for robust tracking within a Bayesian framework. The constraints are specified through a set of face shape subspaces learned by Probabilistic Principal Component Analysis (PPCA). An update scheme is also used to adapt to persons with different face shapes. Two tracking algorithms are presented, which differ in the way the face shape constraints are enforced. The results show that the proposed trackers can track facial features with large head motions, substantial facial deformations, and temporary facial occlusions by hand. The tracked results are input to a recognition system comprising Hidden Markov Models (HMM) and a support vector machine (SVM) to recognize six isolated facial expressions representing grammatical markers in American Sign Language (ASL). Tracking error of less than four pixels (on 640×480 videos) was obtained with probability greater than 90%; in comparison, the KLT tracker yielded this accuracy with 76% probability. Recognition accuracy obtained for ASL facial expressions was 91.76% in person-dependent tests and 87.71% in person-independent tests.

18.
Pronunciation variation is a major obstacle in improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond their listed forms in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus. The proposed method consists of performing phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine. The baseline system is based on a 5.4-hour speech corpus of modern standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations. The baseline system achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced by 2.22% when the variants are represented within the language model.
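The sequence-alignment step in this abstract can be illustrated with a plain unit-cost edit-distance alignment between a reference and a recognized phoneme sequence. This is a sketch under stated assumptions: the abstract does not say which alignment algorithm or costs the authors used, and the function names and the toy phoneme strings below are invented for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (reference phoneme, observed phoneme)


def align(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Align two phoneme sequences using unit-cost edit distance."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # reference phoneme was deleted
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra phoneme was inserted
            j -= 1
    pairs.reverse()
    return pairs


def differences(pairs: List[Pair]) -> List[Pair]:
    """Return only the aligned positions where the two sequences disagree."""
    return [(r, h) for r, h in pairs if r != h]


# Toy example: the recognizer dropped the second phoneme of the reference.
reference = ["k", "i", "t", "a", "b"]
observed = ["k", "t", "a", "b"]
alignment = align(reference, observed)
print(alignment)
print(differences(alignment))
```

A non-empty `differences` list marks the observed sequence as a candidate pronunciation variant; how variants are filtered before being added to the dictionary and language model is not described in the abstract.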

19.
We use a sequence of synthetic aperture radar (SAR) images to map differences in the flood and ebb tidal currents in a cove along the coast of Nova Scotia, Canada. The asymmetry in the tidal flow determines the flushing rate of the cove which, in turn, has a significant effect on biological production within the cove and its potential for aquaculture. We find significant differences in the SAR images collected on flood and ebb tides. Specifically there are well-defined lines on the flood images and large whorls on the ebb. A three-dimensional hydrodynamic model of the tidal currents is used to interpret the SAR images. In particular we use the model flow fields to wind back the SAR images to an earlier stage of the tide in an attempt to determine the physical origin of the features in the images. We conclude that the most likely explanation for the ebb-tide whorls is the advection of surface slicks.

20.
Computer and video gaming are often considered to be potential routes to the development of aptitude and interest in using other forms of information technology (IT). The purpose of this exploratory study was to determine the extent to which young people who play games engage in related IT practices, such as creating and sharing content or creating fan sites. Additional goals were to identify differences in such practices according to grade level, gender, and access to IT-related resources in the home, as well as to explore relationships between engagement in game-related practices and perceived proficiency in general computer-related skills.
