期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Towards the generation of dialogue acts in socio-affective ECAs: a corpus-based prosodic analysis

Rachel?Bawden Email author View author&#;s OrcID profile Chloé?Clavel Frédéric?Landragin 《Language Resources and Evaluation》2016,50(4):821-838

We present a corpus-based prosodic analysis with the aim of uncovering the relationship between dialogue acts, personality and prosody in view to providing guidelines for the ECA Greta’s text-to-speech system. The corpus used is the SEMAINE corpus, featuring four different personalities, further annotated for dialogue acts and prosodic features. In order to show the importance of the choice of dialogue act taxonomy, two different taxonomies were used, the first corresponding to Searle’s taxonomy of speech acts and the second, inspired by Bunt’s DIT++, including a division of directive acts into finer categories. Our results show that finer-grained distinctions are important when choosing a taxonomy. We also show with some preliminary results that the prosodic correlates of dialogue acts are not always as cited in the literature and prove more complex and variable. By studying the realisation of different directive acts, we also observe differences in the communicative strategies of the ECA depending on personality, in view to providing input to a speech system. 相似文献

2.

VoCMex: a voice corpus in Mexican Spanish for research in speaker recognition

José-Martín Olguín-Espinoza Pedro Mayorga-Ortiz Hugo Hidalgo-Silva Luis Vizcarra-Corral Mónica-Livier Mendiola-Cárdenas 《International Journal of Speech Technology》2013,16(3):295-302

Voice corpus is an essential element for automatic speaker recognition systems. In order for a corpus to be useful in recognition tasks, it must contain recordings from several speakers pronouncing phonetically balanced utterances; recorded through several sessions using different recording media. This work shows the methodology, development and evaluation of a Mexican Spanish Corpus referred as to VoCMex, which is aimed to support research on speaker recognition. It contains telephone and microphone recordings of 20 male and 13 female speakers, obtained through three sessions. In order to validate the usefulness of the corpus, a speaker identification system was developed and the recognition results were similar compared against those obtained using a known voice corpus. 相似文献

3.

An audio-visual corpus for multimodal automatic speech recognition

Andrzej Czyzewski Bozena Kostek Piotr Bratoszewski Jozef Kotus Marcin Szykulski 《Journal of Intelligent Information Systems》2017,49(2):167-192

相似文献

4.

Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan

Juan María Garrido David Escudero Lourdes Aguilar Valentín Cardeñoso Emma Rodero Carme de-la-Mota César González Carlos Vivaracho Sílvia Rustullet Olatz Larrea Yesika Laplaza Francisco Vizcaíno Eva Estebas Mercedes Cabrera Antonio Bonafonte 《Language Resources and Evaluation》2013,47(4):945-971

Literature review on prosody reveals the lack of corpora for prosodic studies in Catalan and Spanish. In this paper, we present a corpus intended to fill this gap. The corpus comprises two distinct data-sets, a news subcorpus and a dialogue subcorpus, the latter containing either conversational or task-oriented speech. More than 25 h were recorded by twenty eight speakers per language. Among these speakers, eight were professional (four radio news broadcasters and four advertising actors). The entire material presented here has been transcribed, aligned with the acoustic signal and prosodically annotated. Two major objectives have guided the design of this project: (i) to offer a wide coverage of representative real-life communicative situations which allow for the characterization of prosody in these two languages; and (ii) to conduct research studies which enable us to contrast the speakers different speaking styles and discursive practices. All material contained in the corpus is provided under a Creative Commons Attribution 3.0 Unported License. 相似文献

5.

InSight Interaction: a multimodal and multifocal dialogue corpus

Geert Brône Bert Oben 《Language Resources and Evaluation》2015,49(1):195-214

相似文献

6.

Stylistic analysis of a corpus of twentieth-century Spanish narrative

Estelle Irizarry 《Computers and the Humanities》1990,24(4):265-274

Statistical information on a substantial corpus of representative Spanish texts is needed in order to determine the significance of data about individual authors or texts by means of comparison. This study describes the organization and analysis of a 150,000-word corpus of 30 well-known twentieth-century Spanish authors. Tables show the computational results of analyses involving sentences, segments, quotations, and word length.The article explains the considerations that guided content, selection, and sample size, and describes special editing needed for the input of Spanish text. Separate sections highlight and comment upon some of the findings.The corpus and the tables provide objective data for studies of homogeneity and heterogeneity. The format of the tables permits others to add to the original 30 authors, organize the results by categories, or use the cumulative results for normative comparisons.Estelle Irizarry is Professor of Spanish at Georgetown University and author of 20 books and annotated editions dealing with Hispanic literature, art, and hoaxes. Her latest book, an edition of Infortunios de Alonso Ramirez, treats the disputed authorship of Spanish America's first novel. She is Courseware Editor of CHum. 相似文献

7.

An open diachronic corpus of historical Spanish

Felipe Sánchez-Martínez Isabel Martínez-Sempere Xavier Ivars-Ribes Rafael C. Carrasco 《Language Resources and Evaluation》2013,47(4):1327-1342

The impact-es diachronic corpus of historical Spanish compiles over one hundred books—containing approximately 8 million words—in addition to a complementary lexicon which links more than 10,000 lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) in order to permit their intensive exploitation in linguistic research. Approximately 7 % of the words in the corpus (a selection aimed at enhancing the coverage of the most frequent word forms) have been annotated with their lemma, part of speech, and modern equivalent. This paper describes the annotation criteria followed and the standards, based on the Text Encoding Initiative recommendations, used to represent the texts in digital form. 相似文献

8.

An interface for mutual disambiguation of recognition errors in a multimodal navigational assistant

Henry Lieberman Amy Chu 《Multimedia Systems》2007,12(4-5):393-402

Users often have tasks that can be accomplished with the aid of multiple media – for example with text, sound and pictures. For example, communicating an urban navigation route can be expressed with pictures and text. Today’s mobile devices have multimedia capabilities; cell phones have cameras, displays, sound output, and (soon) speech recognition. Potentially, these multimedia capabilities can be used for multimedia-intensive tasks, but two things stand in the way. First, recognition of visual input and speech recognition still remain unreliable. Second, the mechanics of integrating multiple media and recognition systems remains daunting for users. We address both these issues in a system, MARCO, multimodal agent for route construction. MARCO collects route information by taking pictures of landmarks, accompanied by verbal directions. We combine results from off-the-shelf speech recognition and optical character recognition to achieve better recognition of route landmarks than either recognition system alone. MARCO automatically produces an illustrated, step-by-step guide to the route. 相似文献

9.

Automatic coding of dialogue acts in collaboration protocols

Gijsbert Erkens Jeroen Janssen 《International Journal of Computer-Supported Collaborative Learning》2008,3(4):447-470

Although protocol analysis can be an important tool for researchers to investigate the process of collaboration and communication, the use of this method of analysis can be time consuming. Hence, an automatic coding procedure for coding dialogue acts was developed. This procedure helps to determine the communicative function of messages in online discussions by recognizing discourse markers and cue phrases in the utterances. Five main communicative functions are distinguished: argumentative, responsive, informative, elicitative, and imperative. A total of 29 different dialogue acts are specified and recognized automatically in collaboration protocols. The reliability of the automatic coding procedure was determined by comparing automatically coded dialogue acts to hand-coded dialogue acts by a human rater. The validity of the automatic coding procedure was examined using three different types of analyses. First, an examination of group differences was used (dialogue acts used by female versus male students). Ideally, the coding procedure should be able to distinguish between groups who are likely to communicate differently. Second, to examine the validity of the automatic coding procedure through examination of experimental intervention, the results of the automatic coding procedure of students, with access to a tool that visualizes the degree of participation of each student, were compared to students who did not have access to this tool. Finally, the validity of the automatic coding procedure of dialogue acts was examined using correlation analyses. Results of the automatic coding procedure of dialogue acts of utterances (form) were related to results of a manual coding procedure of the collaborative activities to which the utterances refer (content). The analyses presented in this paper indicate promising results concerning the reliability and validity of the automatic coding procedure for dialogue acts. However, limitations of the procedure were also found and discussed. 相似文献

10.

A multi-modal dialogue analysis method for medical interviews based on design of interaction corpus

Yuichi Koyama Yuichi Sawamoto Yasushi Hirano Shoji Kajita Kenji Mase Tomio Suzuki Kimiko Katsuyama Kazunobu Yamauchi 《Personal and Ubiquitous Computing》2010,14(8):767-778

We propose a multi-modal dialogue analysis method for medical interviews that hierarchically interprets nonverbal interaction patterns in a bottom-up manner and simultaneously visualizes the topic structure. Our method aims to provide physicians with the clues generally overlooked by conventional dialogue analysis to form a cycle of dialogue practice and analysis. We introduce a motif and a pattern cluster in the designs of the hierarchical indices of interaction and exploit the Jensen–Shannon divergence (JSD) metric to reduce the number of usable indices. We applied the proposed interpretation method of interaction patterns to develop a corpus of interviews. The results of a summary reading experiment confirmed the validity of the developed indices. Finally, we discussed the integrated analysis of the topic structure and a nonverbal summary. 相似文献

11.

An automatic speech recognition system for spontaneous Punjabi speech corpus

Yogesh Kumar Navdeep Singh 《International Journal of Speech Technology》2017,20(2):297-303

Automatic speech recognition is the central part of the wheel towards the natural person-to-machine interaction technique. Due to the high disparity of speaking styles, speech recognition surely demands composite methods to constitute this irregularity. A speech recognition method can work in numerous distinct states such as speaker dependent/independent speech, isolated/continuous/spontaneous speech recognition, for less to very large vocabulary. The Punjabi language is being spoken by concerning 104 million peoples in India, Pakistan and other countries with Punjabi migrants. The Punjabi language is written in Gurmukhi writing in Indian Punjab, while in Shahmukhi writing in Pakistani Punjab. In the paper, the objective is to build the speaker independent automatic spontaneous speech recognition system for the Punjabi language. The system is also capable to recognize the spontaneous Punjabi live speech. So far, no work has to be achieved in the area of spontaneous speech recognition system for the Punjabi language. The user interfaces for Punjabi live speech system is created by using the java programming. Till now, automatic speech system is trained with 6012 Punjabi words and 1433 Punjabi sentences. The performance measured in terms of recognition accuracy which is 93.79% for Punjabi words and 90.8% for Punjabi sentences. 相似文献

12.

An annotated corpus for the analysis of VP ellipsis

Johan Bos Jennifer Spenader 《Language Resources and Evaluation》2011,45(4):463-494

Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We annotated the raw files using a stand-off annotation scheme that codes the auxiliary verb triggering the elided verb phrase, the start and end of the antecedent, the syntactic type of antecedent (VP, TV, NP, PP or AP), and the type of syntactic pattern between the source and target clauses of the VPE and its antecedent. We found 487 instances of VPE (including predicative ellipsis, antecedent-contained deletion, comparative constructions, and pseudo-gapping) plus 67 cases of related phenomena such as do so anaphora. Inter-annotator agreement was high, with a 0.97 average F-score for three annotators for one section of the WSJ. Our annotation is theory neutral, and has better coverage than earlier efforts that relied on automatic methods, e.g. simply searching the parsed version of the Penn Treebank for empty VP’s achieves a high precision (0.95) but low recall (0.58) when compared with our manual annotation. The distribution of VPE source–target patterns deviates highly from the standard examples found in the theoretical linguistics literature on VPE, once more underlining the value of corpus studies. The resulting corpus will be useful for studying VPE phenomena as well as for evaluating natural language processing systems equipped with ellipsis resolution algorithms, and we propose evaluation measures for VPE detection and VPE antecedent selection. The stand-off annotation is freely available for research purposes. 相似文献

13.

An intelligent multimedia information system for multimodal content extraction and querying

Yazici Adnan Koyuncu Murat Yilmaz Turgay Sattari Saeid Sert Mustafa Gulen Elvan 《Multimedia Tools and Applications》2018,77(2):2225-2260

Multimedia Tools and Applications - This paper introduces an intelligent multimedia information system, which exploits machine learning and database technologies. The system extracts semantic... 相似文献

14.

HEU Emotion: a large-scale database for multimodal emotion recognition in the wild

Chen Jing Wang Chenhui Wang Kejun Yin Chaoqun Zhao Cong Xu Tao Zhang Xinyi Huang Ziqiang Liu Meichen Yang Tao 《Neural computing & applications》2021,33(14):8669-8685

Neural Computing and Applications - The study of affective computing in the wild setting is underpinned by databases. Existing multimodal emotion databases in the real-world conditions are few and... 相似文献

15.

Architecture and dialogue design for a voice operated information system

Luis Villarejo Javier Hernando Núria Castell Jaume Padrell Alberto Abad 《Applied Intelligence》2006,24(3):253-261

相似文献

16.

Method of multimodal biometric data analysis for optimal efficiency evaluation of recognition algorithms and systems

V. V. Lobantsov I. A. Matveev A. B. Murynin 《Pattern Recognition and Image Analysis》2011,21(3):515-518

A primary consideration of this paper is to determine different factors influencing the reliability of performance evaluations of remote person recognition algorithms and systems. The authors suggest a method for determining and computing quantitative quality criteria of multimodal biometric data and consider the possibility of extrapolating test results to various practical applications. The functions of biometric data quality and biometric data artificiality that are introduced as a measure of proximity of the available biometric data to biometric data registered “naturally,” i.e., data of unaware and noncollaborative subjects, are under examination in this paper. 相似文献

17.

A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals

Snaith Mark Conway Nicholas Beinema Tessa De Franco Dominic Pease Alison Kantharaju Reshmashree Janier Mathilde Huizing Gerwin Pelachaud Catherine op den Akker Harm 《Language Resources and Evaluation》2021,55(4):1077-1092

Language resources for studying doctor–patient interaction are rare, primarily due to the ethical issues related to recording real medical consultations. Rarer still are resources that involve more than one healthcare professional in consultation with a patient, despite many chronic conditions requiring multiple areas of expertise for effective treatment. In this paper, we present the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of simulated consultations between a patient portrayed by an actor, and at least two healthcare professionals with different areas of expertise. As well as the transcribed text from each consultation, the corpus also contains audio and video where for each consultation: the audio consists of individual tracks for each participant, allowing for clear identification of speakers; the video consists of two framings for each participant—upper-body and face—allowing for close analysis of behaviours and gestures. Having presented the design and construction of the corpus, we then go on to briefly describe how the multi-modal nature of the corpus allows it to be analysed from several different perspectives.

相似文献

18.

An integration of online and pseudo-online information for cursive word recognition

Steinherz T Rivlin E Intrator N Neskovic P 《IEEE transactions on pattern analysis and machine intelligence》2005,27(5):669-683

In this paper, we present a novel method to extract stroke order independent information from online data. This information, which we term pseudo-online, conveys relevant information on the offline representation of the word. Based on this information, a combination of classification decisions from online and pseudo-online cursive word recognizers is performed to improve the recognition of online cursive words. One of the most valuable aspects of this approach with respect to similar methods that combine online and offline classifiers for word recognition is that the pseudo-online representation is similar to the online signal and, hence, word recognition is based on a single engine. Results demonstrate that the pseudo-online representation is useful as the combination of classifiers perform better than those based solely on pure online information. 相似文献

19.

The CARES corpus: a database of older adult actor simulated emergency dialogue for developing a personal emergency response system

Victoria Young Alex Mihailidis 《International Journal of Speech Technology》2013,16(1):55-73

相似文献

20.

Multimedia design: the effects of relating multimodal information 总被引：1，自引：0，他引：1

M. Dubois & I. Vial 《Journal of Computer Assisted Learning》2000,16(2):157-165

Abstract Few models describe learner behaviour during the simultaneous processing of several types of information, yet this is the defining characteristic of the use of multimedia tools, which bring together media in different informational formats (fixed or moving images, sound, text). Following studies in cognitive psychology concerning the increase in the ability to form mental images of words, this study was aimed at defining how different multimedia presentation modes affect the learning of foreign language vocabulary (Russian). A statistically significant effect was observed on word memorisation in the different information presentation modes, suggesting better processing when there is co-referencing of the different sources, especially when the encoding and tests modes are the same. In addition to these experimental results, some principles for the design of multimodal learning tools are discussed. 相似文献