1.

Automated music video generation using multi-level feature-based segmentation (total citations: 1; self-citations: 1; citations by others: 0)

Jong-Chul Yoon In-Kwon Lee Siwoo Byun 《Multimedia Tools and Applications》2009,41(2):197-214

We show how to create a music video automatically, using computable characteristics of the video and music to promote coherent matching. We analyze the flow of both music and video, and then segment them into sequences of near-uniform flow. We extract features from both the video and music segments, and then find matching pairs. The granularity of the matching process can be adapted by extending the segmentation process to several levels. Our approach drastically reduces the skill required to make simple music videos.
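The segment-then-match idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the flow measure, the features, or the matching rule, so the sketch uses a scalar flow value per frame, the segment mean as the only feature, and a greedy nearest-feature pairing. The function names `segment_by_flow` and `match_segments` are hypothetical.

```python
from statistics import mean


def segment_by_flow(flow, tolerance):
    """Split a 1-D flow signal into (start, end) runs of near-uniform flow.

    A new segment starts when a sample differs from the mean of the
    current run by more than `tolerance` (an assumed criterion).
    """
    if not flow:
        return []
    segments, start = [], 0
    for i in range(1, len(flow)):
        if abs(flow[i] - mean(flow[start:i])) > tolerance:
            segments.append((start, i))
            start = i
    segments.append((start, len(flow)))
    return segments


def match_segments(music_feats, video_feats):
    """Greedily pair each music segment with the closest unused video segment.

    Features are plain floats here; returns (music_index, video_index) pairs.
    """
    unused = set(range(len(video_feats)))
    pairs = []
    for m, feat in enumerate(music_feats):
        if not unused:
            break
        best = min(unused, key=lambda v: abs(video_feats[v] - feat))
        pairs.append((m, best))
        unused.remove(best)
    return pairs
```

A coarser or finer matching granularity could be obtained in this sketch by calling `segment_by_flow` with a larger or smaller `tolerance`, which loosely mirrors the multi-level segmentation the abstract mentions.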

Jong-Chul Yoon received his B.S. and M.S. degrees in Media from Ajou University in 2003 and 2005, respectively. He is currently a Ph.D. candidate in Computer Science at Yonsei University. His research interests include computer animation, multimedia control, and geometric modeling.

In-Kwon Lee received his B.S. degree in Computer Science from Yonsei University in 1989 and earned his M.S. and Ph.D. in Computer Science from POSTECH in 1992 and 1997, respectively. Currently, he is teaching and researching in the area of computer animation, geometric modeling, and computational music at Yonsei University.

Siwoo Byun received his B.S. degree in Computer Science from Yonsei University in 1989 and earned his M.S. and Ph.D. in Computer Science from Korea Advanced Institute of Science and Technology (KAIST) in 1991 and 1999, respectively. Currently, he is teaching and researching in the area of distributed database systems, mobile computing, and fault-tolerant systems at Anyang University.

2.

Virtual assistant: enhancing content acquisition by eliciting information from humans

Motoyuki Ozeki Shunichi Maeda Kanako Obata Yuichi Nakamura 《Multimedia Tools and Applications》2009,44(3):433-448

In this paper, we propose the “Virtual Assistant,” a novel framework for supporting knowledge capturing in videos. The Virtual Assistant is an artificial agent that simulates a human assistant shown in TV programs and prompts users to provide feedback by asking questions. This framework ensures that sufficient information is provided in the captured content while users interact in a natural and enjoyable way with the agent. We developed a prototype agent based on a chatbot-like approach and applied it to a daily cooking scene. Experimental results demonstrate the potential of the Virtual Assistant framework, as it allows a person to provide feedback easily with few interruptions and elicits a variety of useful information.

Yuichi Nakamura URL: http://www.ccm.media.kyoto-u.ac.jp/index.php

Motoyuki Ozeki received his B.E., M.E. and Ph.D. degrees in engineering from University of Tsukuba, in 2000 and 2005, respectively. He worked as an assistant professor at Kyoto University from 2005. He is currently an assistant professor at Kyoto Institute of Technology. His research interests are in the areas of human-agent interaction and cognitive science.

Shunichi Maeda received his B.E. and M.E. degrees in electronic engineering from Kyoto University, in 2008. He is currently working at a patent office (KAJI-SUHARA & ASSOCIATES).

Kanako Obata received her B.E. degree in economics from Osaka Prefecture University in 2004. She has been an educational assistant at Kyoto University since 2004. Her research interests are human communication and cooking.

Yuichi Nakamura received his BE degree in 1985, and his ME and PhD degrees in electronic engineering from Kyoto University in 1987 and 1992, respectively. He worked as an assistant professor at University of Tsukuba from 1993 and as an associate professor from 1999. He is currently a professor at Kyoto University. His research interests and activities include human-computer interactions, video analysis, and video utilization for knowledge sources.

3.

Supplementary loss concealment technique for image transmission through data hiding

Kyung-Su Kim Hae-Yeoun Lee Heung-Kyu Lee 《Multimedia Tools and Applications》2009,44(1):1-16

4.

gBFlavor: a new tool for fast and automatic generation of generic bitstream syntax descriptions (total citations: 1; self-citations: 0; citations by others: 1)

Davy Van Deursen Wesley De Neve Davy De Schrijver Rik Van de Walle 《Multimedia Tools and Applications》2008,40(3):453-494

5.

Semantic image classification using statistical local spatial relations model (total citations: 1; self-citations: 1; citations by others: 0)

Dongfeng Han Wenhui Li Zongcheng Li 《Multimedia Tools and Applications》2008,39(2):169-188

In this paper, a statistical model called statistical local spatial relations (SLSR) is presented as a novel learning model that uses spatial and statistical information for semantic image classification. The model is inspired by probabilistic Latent Semantic Analysis (PLSA) for text mining. In text analysis, PLSA is used to discover topics in a corpus using the bag-of-words document representation. In SLSR, we treat image categories as topics; an image containing instances of multiple categories can therefore be modeled as a mixture of topics. More significantly, SLSR introduces spatial relation information as a factor which is not present in PLSA. SLSR has rotation, scale, translation and affine invariant properties and can solve partial occlusion problems. Using the Dirichlet process and variational Expectation-Maximization learning algorithm, SLSR is developed as an implementation of an image classification algorithm. SLSR uses an unsupervised process which can capture both spatial relations and statistical information simultaneously. Experiments on some standard data sets show that the SLSR model is promising for semantic image classification problems.

Wenhui Li (corresponding author)

Dongfeng Han received the B.Sc. (2002) and M.S. (2005) in computer science and technology from Jilin University, Changchun, P. R. China. Since 2005, he has been pursuing the PhD degree in computer science and technology at Jilin University. His research interests include computer vision, image processing, machine learning and pattern recognition.

Wenhui Li received the PhD degree in computer science from Jilin University in 1996. He is now a professor at Jilin University. His research interests include computer vision, computer graphics and virtual reality.

Zongcheng Li is an undergraduate student at Shandong University of Technology, P. R. China. His research interests include computer vision and image processing.

6.

Creating ambient music spaces in real and virtual worlds

Jakob Frank Thomas Lidy Ewald Peiszer Ronald Genswaider Andreas Rauber 《Multimedia Tools and Applications》2009,44(3):449-468

Sound and, specifically, music is a medium that is used for a wide range of purposes in different situations in very different ways. Ways for music selection and consumption range from completely passive, almost unnoticed perception of background sound environments to the very specific selection of a particular recording of a piece of music with a specific orchestra and conductor at a certain event. Different systems and interfaces exist for the broad range of needs in music consumption. Locating a particular recording is well supported by traditional search interfaces via metadata. Other interfaces support the automatic creation of playlists via artist or album selection, up to more artistic installations of sound environments that users can navigate through. In this paper we present a set of systems that support both the creation of and the navigation in musical spaces, in the real world as well as in virtual environments. We show common principles and point out further directions for a more direct coupling of the various spaces and interaction methods, creating ambient sound environments and providing organic interaction with music for different purposes.

Jakob Frank is a Research Assistant at the Department of Software Technology and Interactive Systems of the Vienna University of Technology (TU Vienna). He received his Bachelor in Computer Science from the Vienna University of Technology in 2006. His research focus is on music information retrieval, especially on mobile devices and multi-user audio interaction. He was co-organizer of the ISMIR 2007 conference and served as co-reviewer for several major international conferences.

Thomas Lidy is a Research Assistant at the Department of Software Technology and Interactive Systems of the Vienna University of Technology (TU Vienna). He received his MSc in Computer Science from the Vienna University of Technology in 2007. His research focus is on music information retrieval, in particular feature extraction methods for digital audio, music classification, and clustering and visualization of digital music libraries. He participates actively in the annual MIREX benchmarking campaign and was co-organizer of the ISMIR 2007 conference. He is author of numerous papers in refereed international conferences and workshops and served as co-reviewer for several major international conferences. In 2007, he was awarded the Distinguished Young Alumnus Award and also received a Microsoft Sponsorship Award.

Ewald Peiszer is a freelance web application and software developer with a strong scientific background. He received his MSc degree in Computer Science from Vienna University of Technology in 2007 with a master’s thesis on automatic audio segmentation. Working towards combining Music Information Retrieval (MIR) techniques with Virtual Reality infrastructure, he completed an internship at the Center for Computer Graphics and Virtual Reality, Ewha Womans University (Seoul). Occasionally, he (co-)authors articles on MIR topics, which are also a focus of his freelance projects.

Ronald Genswaider graduated as Master of Economics in 2008 at the Department of Software Technology and Interactive Systems of the Vienna University of Technology (TU Vienna) as well as Master of Arts in the Department of Digital Arts at the University of Applied Arts in Vienna. He is working in Vienna as a free digital artist, Web developer and researcher. Currently he is working on various research projects in the R&D department at bwin and taking part in the exhibition “YOU_ser—Century of the consumer” at the ZKM in Karlsruhe, Germany.

Andreas Rauber is Associate Professor at the Department of Software Technology and Interactive Systems of the Vienna University of Technology (TU Vienna). He received his MSc and PhD in Computer Science from the Vienna University of Technology in 1997 and 2000, respectively. He is actively involved in several research projects in the field of Digital Libraries, focusing on text and music information retrieval, the organization and exploration of large information spaces, as well as Web archiving and digital preservation. He has published numerous papers in refereed journals and international conferences and served as PC member and reviewer for several major journals, conferences and workshops. He also co-organized the ECDL 2005 and ISMIR 2007 conferences.

7.

Analyzing the efficacy of using digital ink devices in a learning environment

Akila Varadarajan Nilesh Patel Bruce Maxim William I. Grosky 《Multimedia Tools and Applications》2008,40(2):211-239

There has been increased interest in the impact of mobile devices such as PDAs and Tablet PCs in introducing new pedagogical approaches and active learning experiences. We propose an intelligent system that efficiently addresses the inherent subjectivity in student perception of note taking and information retrieval. We employ the idea of cross indexing the digital ink notes with matching electronic documents in the repository. Latent Semantic Indexing is used to perform document and page level indexing. Thus for each retrieved document, the user can go over to the relevant pages that match the query. Techniques to handle problems such as polysemy (multiple meanings of a word) in large databases, document folding and no match for query are discussed. We tested our system for its performance, usability and effectiveness in the learning process. The results from the exploratory studies reveal that the proposed system provides a highly enhanced student learning experience, thereby facilitating high test scores.
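The two-level lookup described in this abstract (rank documents, then point the user to the best-matching page inside each one) can be sketched as follows. This is not Latent Semantic Indexing: to stay dependency-free, the sketch swaps in plain term-frequency cosine similarity, whereas LSI would first project the term vectors into a reduced space obtained by singular value decomposition. The `search` function, its input layout, and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts using a naive whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents):
    """Rank documents by similarity to the query and pick each one's best page.

    `documents` maps a document name to a list of page texts.
    Returns (name, best_page_index, document_score) tuples, best first.
    Documents with no overlap with the query are omitted, so an empty
    list signals "no match for query".
    """
    q = tf_vector(query)
    results = []
    for name, pages in documents.items():
        doc_score = cosine(q, tf_vector(" ".join(pages)))
        if doc_score == 0.0:
            continue
        page_scores = [cosine(q, tf_vector(p)) for p in pages]
        best_page = max(range(len(pages)), key=page_scores.__getitem__)
        results.append((name, best_page, doc_score))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

In this sketch a recognized handwritten note would play the role of `query`; exact term matching cannot handle polysemy or synonyms, which is one motivation for the reduced-space representation that LSI provides.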

Akila Varadarajan is a Senior Software Engineer at Motorola, IL with the Mobile devices division. Prior to joining Motorola, she was a Software development intern at Autodesk, MI and Graduate Research assistant at University of Michigan - Dearborn. She received her MS in Computer Engineering from University of Michigan in 2006 and her BS in Computer Engineering from Madurai Kamaraj University, India in 2003. She is interested in Mobile computing - specifically Human Factors of Mobile Computing, Information retrieval and pattern recognition.

Nilesh Patel is Assistant Professor in the department of Computer Science and Engineering at Oakland University, MI. He received his PhD and MS in Computer Science from Wayne State University, MI in 1997 and 1993. He is interested in Multimedia Information Processing - specifically audio and video indexing, retrieval and event detection, Pattern Recognition, Distributed Data Mining in a heterogeneous environment, and Computer Vision with special interest in medical imaging. Dr. Patel has also served in the automotive sector for several years and developed interest in Telematics and Mobile Computing.

Bruce Maxim has worked as a software engineer for the past 31 years. He has been a member of the Computer and Information Science faculty at the University of Michigan-Dearborn since 1985. He serves as the computing laboratory supervisor and head of the undergraduate programs in Computer Science, Software Engineering, and Information Systems. He has created more than 15 Computer and Information Science courses dealing with software engineering, game design, artificial intelligence, user interface design, web engineering, software quality, and computer programming. He has authored or co-authored four books on programming and software engineering. He has most recently served on the pedagogy subcommittee for Software Engineering 2004 and contributed to the IGDA Game Curriculum Framework 2008 guidelines.

William I. Grosky is currently Professor and Chair of the Department of Computer and Information Science at University of Michigan - Dearborn, Dearborn, Michigan. Prior to joining the University of Michigan in 2001, he was Professor and Chair of the Department of Computer Science at Wayne State University, Detroit, Michigan. Before joining Wayne State University in 1976, he was an Assistant Professor in the Department of Information and Computer Science at Georgia Tech, Atlanta, Georgia. He received his B.S. in Mathematics from MIT in 1965, his M.S. in Applied Mathematics from Brown University in 1968, and his Ph.D. in Engineering and Applied Science from Yale University in 1971.

8.

Computational linguistics for metadata building (CLiMB): using text mining for the automatic identification, categorization, and disambiguation of subject terms for image metadata

Judith L. Klavans Carolyn Sheffield Eileen Abels Jimmy Lin Rebecca Passonneau Tandeep Sidhu Dagobert Soergel 《Multimedia Tools and Applications》2009,42(1):115-138

In this paper, we present a system using computational linguistic techniques to extract metadata for image access. We discuss the implementation, functionality and evaluation of an image catalogers’ toolkit, developed in the Computational Linguistics for Metadata Building (CLiMB) research project. We have tested components of the system, including phrase finding for the art and architecture domain, functional semantic labeling using machine learning, and disambiguation of terms in domain-specific text vis a vis a rich thesaurus of subject terms, geographic and artist names. We present specific results on disambiguation techniques and on the nature of the ambiguity problem given the thesaurus, resources, and domain-specific text resource, with a comparison of domain-general resources and text. Our primary user group for evaluation has been the cataloger expert with specific expertise in the fields of painting, sculpture, and vernacular and landscape architecture.

Judith L. Klavans is a Senior Research Scientist at the University of Maryland Institute for Advanced Computer Studies (UMIACS), and Principal Investigator on the Mellon-funded Computational Linguistics for Metadata Building (CLiMB) and IMLS-supported T³ research projects. Her research includes text-mining from corpora and dictionaries, disambiguation, and multilingual multidocument summarization. Previously, she directed the Center for Research on Information Access at Columbia University.

Carolyn Sheffield holds an M.L.S. from the University of Maryland and her research interests include access issues surrounding visual and time-based materials. She designs, conducts and analyzes the CLiMB user studies and works closely with image catalogers to ensure that the CLiMB system reflects their needs and workflow.

Eileen Abels is Masters’ Program Director and Professor in the College of Information Science and Technology at Drexel University. Prior to joining Drexel in January 2007, Dr. Abels spent more than 15 years at the College of Information Studies at the University of Maryland. Her research focuses on user needs and information behaviors. She works with a broad range of information users including translators, business school students and faculty, engineers, scientists, and members of the general public. Dr. Abels holds a PhD from the University of California, Los Angeles.

Jimmy Lin’s research interests lie at the intersection of natural language processing and information retrieval. His work integrates knowledge- and data-driven approaches to address users’ information needs.

Rebecca J. Passonneau is a Research Scientist at the Center for Computational Learning Systems, Columbia University. Her areas of interest include linking empirical research methods on corpora with computational models of language processing, the intersection of language and context in semantics and pragmatics, corpus design and analysis, and evaluation methods for NLP. Her current projects involve working with machine learning for the Consolidated Edison utility company, and designing an experimental dialog system to take patron book orders by phone for the Andrew Heiskell Braille and Talking Book library.

Tandeep Sidhu is the Software Developer and Research Assistant for the CLiMB project. He is in charge of designing the CLiMB Toolkit as well as the NLP modules behind the Toolkit. He is currently pursuing his MS degree in Computer Science.

Dagobert Soergel has been teaching information organization at the University of Maryland since 1970 and is an internationally known expert in Knowledge Organization Systems and in Digital Libraries. In the CLiMB project he served as general consultant and was specially involved in the design of a study on the relationship between an image and cataloging terms assigned to it.

9.

T-MAESTRO and its authoring tool: using adaptation to integrate entertainment into personalized t-learning (total citations: 1; self-citations: 1; citations by others: 0)

Marta Rey-López Rebeca P. Díaz-Redondo Ana Fernández-Vilas José J. Pazos-Arias Martín López-Nores Jorge García-Duque Alberto Gil-Solla Manuel Ramos-Cabrer 《Multimedia Tools and Applications》2008,40(3):409-451

Interactive Digital TV opens new learning possibilities where new forms of education are needed. On the one hand, the combination of education and entertainment is essential to boost the participation of viewers in TV learning (t-learning), overcoming their typical passiveness. On the other hand, researchers broadly agree that in order to prevent the learner from abandoning the learning experience, it is necessary to take into account his/her particular needs and preferences by means of a personalized experience. Bearing this in mind, this paper introduces a new approach to the conception of personalized t-learning: edutainment and entercation experiences. These experiences combine TV programs and learning contents in a personalized way, with the aim of using the playful nature of TV to make learning more attractive and to engage TV viewers in learning. This paper brings together our work in constructing edutainment/entercation experiences by relating TV and learning contents. Taking personalization one step further, we propose the adaptation of learning contents by defining A-SCORM (Adaptive-SCORM), an extension of the ADL SCORM standard. Over and above the adaptive add-ons, this paper focuses on two fundamental entities for the proposal: (1) an Intelligent Tutoring System, called T-MAESTRO, which constructs the t-learning experiences by applying semantic knowledge about the t-learners; and (2) the authoring tool which allows teachers to create adaptive courses with a minimal technical background.

Marta Rey-López is an assistant professor and a Ph.D. student in the Department of Telematics Engineering at the University of Vigo, where she received her degree in Telecommunication Engineering in 2004. Since 2004 she has belonged to the Interactive Digital TV Lab; her research interests focus on the combination of TV programs and interactive applications for TV to provide distance education through this medium. Her more recent research deals with the application of Web 2.0 technologies to establish the relationships between those two different types of contents.

Rebeca P. Díaz-Redondo is an associate professor in the Department of Telematics Engineering at the University of Vigo, where she received her Ph.D. in Computer Science in 2002, in the field of Software Engineering. She is a member of the Interactive Digital TV Lab, and her major research interests are interactive applications for TV as well as how they interact with the smart home environment.

Ana Fernández-Vilas received her Ph.D. in Computer Science from the University of Vigo in 2002, in the field of Software Engineering. Since 1997, she has been an associate professor in the Department of Telematics Engineering (University of Vigo). She is engaged in web services technologies and ubiquitous computing environments, being a member of the Interactive Digital TV Lab.

José J. Pazos-Arias received his Ph.D. in Computer Science from the Department of Telematics Engineering of the Polytechnic University of Madrid in 1995 in the field of Software Engineering. He is currently the head of the Networking and Software Engineering Group at the University of Vigo, which is currently involved with projects on middleware and applications for Interactive Digital TV that include learning through TV, recommendation of TV programmes, personalised advertising and t-government.

Martín López-Nores has been an assistant professor in the Department of Telematics Engineering of the University of Vigo since 2003, where he received his Ph.D. in Computer Science in 2006 in the field of Software Engineering techniques and its application to the field of Interactive Digital TV. He is a member of the Interactive Digital TV Lab, where he is especially interested in personalization of advertising and education.

Jorge García-Duque is an associate professor in the Department of Telematics Engineering at the University of Vigo, where he received his Ph.D. in Computer Science in 2000, in the field of Software Engineering. His major research interests are related to the development of new software methodologies and services for Interactive Digital TV.

Alberto Gil-Solla is an associate professor in the Department of Telematics Engineering at the University of Vigo, and a member of the Software Engineering Research Group. He received his Ph.D. in Computer Science from the University of Vigo in 2000, in the field of Software Engineering. He is involved with different aspects of middleware design and interactive multimedia services.

Manuel Ramos-Cabrer received his Ph.D. in Telematics from the University of Vigo in 2000, in the field of Software Engineering, where he has been an associate professor in Telematics Engineering since 2001. His research topics are Interactive Digital TV concentrating on recommender systems, integration with smart home environments and interactive applications design and development.

10.

Receiver-side semantic reasoning for digital TV personalization in the absence of return channels (total citations: 4; self-citations: 3; citations by others: 1)

Martín López-Nores Yolanda Blanco-Fernández José J. Pazos-Arias Jorge García-Duque Manuel Ramos-Cabrer Alberto Gil-Solla Rebeca P. Díaz-Redondo Ana Fernández-Vilas 《Multimedia Tools and Applications》2009,41(3):407-436

Experience has proved that interactive applications delivered through Digital TV must provide personalized information to the viewers in order to be perceived as a valuable service. Due to the limited computational power of DTV receivers (either domestic set-top boxes or mobile devices), most of the existing systems have opted to place the personalization engines in dedicated servers, assuming that a return channel is always available for bidirectional communication. However, in a domain where most of the information is transmitted through broadcast, there are still many cases of intermittent, sporadic or null access to a return channel. In such situations, it is impossible for the servers to learn who is watching TV at the moment, and so the personalization features become unavailable. To solve this problem without sacrificing much personalization quality, this paper introduces solutions to run a downsized semantic reasoning process in the DTV receivers, supported by a pre-selection of material driven by audience stereotypes in the head-end. Evaluation results are presented to prove the feasibility of this approach, and also to assess the quality it achieves in comparison with previous ones.

Martín López-Nores received the Ph.D. degree in Computer Science from the University of Vigo in 2006. His research deals primarily with the design of personalization architectures for a range of DTV applications, considering both fixed and mobile receivers.

Yolanda Blanco-Fernández received the Ph.D. degree in Computer Science from the University of Vigo in 2007. Her research is focused on knowledge representation, semantic reasoning technologies and recommender systems.

José J. Pazos-Arias received the Ph.D. degree in Computer Science from the Madrid University of Technology (UPM) in 1995, and worked with Alcatel Laboratories in Madrid prior to joining the University of Vigo. He is the founder and director of the Networking & Software Engineering Group, which is currently involved with several projects related to DTV middleware and applications.

Jorge García-Duque received the Ph.D. degree in Computer Science from the University of Vigo in 2000. His research is focused on the deployment of information services over heterogeneous networks of consumer devices.

Manuel Ramos-Cabrer received the Ph.D. degree in Computer Science from the University of Vigo in 2000. His research interests include the application of artificial intelligence techniques to personalization systems.

Alberto Gil-Solla received the Ph.D. degree in Computer Science from the University of Vigo in 2000. His research is currently involved with different aspects of middleware design and interactive multimedia services.

Rebeca P. Díaz-Redondo received the Ph.D. degree in Computer Science from the University of Vigo in 2002. Her research is now focused on interactive DTV applications playing a central role in the control of smart home environments.

Ana Fernández-Vilas received the Ph.D. degree in Computer Science from the University of Vigo in 2002. Her research interests deal with Web Services technologies and ubiquitous computing environments.

11.

Personalized retrieval of sports video based on multi-modal analysis and user preference acquisition

Yi-Fan Zhang Changsheng Xu Xiaoyu Zhang Hanqing Lu 《Multimedia Tools and Applications》2009,44(2):305-330

In this paper, we present a novel framework for personalized retrieval of sports video, which includes two research tasks: semantic annotation and user preference acquisition. For semantic annotation, web-casting texts corresponding to sports videos are first captured from the webpages using data region segmentation and labeling. Incorporating the text, we detect events in the sports video and generate video event clips. These video clips are annotated by the semantics extracted from web-casting texts and indexed in a sports video database. Based on the annotation, these video clips can be retrieved by different semantic attributes according to the user preference. For user preference acquisition, we utilize click-through data as feedback from the user. Relevance feedback is applied on text annotation and visual features to infer the user's intention and points of interest. A user preference model is learned to re-rank the initial results. Experiments are conducted on broadcast soccer and basketball videos and show an encouraging performance of the proposed method.

Yi-Fan Zhang received the B.E. degree from Southeast University, Nanjing, China, in 2004. He is currently pursuing the Ph.D. degree at National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. In 2007, he was an intern student in Institute for Infocomm Research, Singapore. Currently he is an intern student in China-Singapore Institute of Digital Media. His research interests include multimedia, video analysis and pattern recognition.

Changsheng Xu (M’97–SM’99) received the Ph.D. degree from Tsinghua University, Beijing, China in 1996. Currently he is Professor of Institute of Automation, Chinese Academy of Sciences and Executive Director of China-Singapore Institute of Digital Media. He was with Institute for Infocomm Research, Singapore from 1998 to 2008. He was with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences from 1996 to 1998. His research interests include multimedia content analysis, indexing and retrieval, digital watermarking, computer vision and pattern recognition. He has published over 150 papers in those areas. Dr. Xu is an Associate Editor of ACM/Springer Multimedia Systems Journal. He served as Short Paper Co-Chair of ACM Multimedia 2008, General Co-Chair of 2008 Pacific-Rim Conference on Multimedia (PCM2008) and 2007 Asia-Pacific Workshop on Visual Information Processing (VIP2007), Program Co-Chair of VIP2006, Industry Track Chair and Area Chair of 2007 International Conference on Multimedia Modeling (MMM2007). He also served as Technical Program Committee Member of major international multimedia conferences, including ACM Multimedia Conference, International Conference on Multimedia & Expo, Pacific-Rim Conference on Multimedia, and International Conference on Multimedia Modeling.

Xiaoyu Zhang received the B.S. degree in computer science from Nanjing University of Science and Technology in 2005. He is a Ph.D. candidate at National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He is currently a student in China-Singapore Institute of Digital Media. His research interests include image retrieval, video analysis, and machine learning.

Hanqing Lu (M’05–SM’06) received the Ph.D. degree from Huazhong University of Sciences and Technology, Wuhan, China in 1992. Currently he is Professor of Institute of Automation, Chinese Academy of Sciences. His research interests include image similarity measure, video analysis, object recognition and tracking. He has published more than 100 papers in those areas.

12.

Crossing textual and visual content in different application scenarios

Julien Ah-Pine Marco Bressan Stephane Clinchant Gabriela Csurka Yves Hoppenot Jean-Michel Renders 《Multimedia Tools and Applications》2009,42(1):31-56

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that can be straightforwardly generalized to the broader multimodal scenario. Both approaches fall in the trans-media pseudo-relevance feedback category. Our first method proposes using a mixture model of the aggregate components, considering them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how one can frame a large variety of problems in order to address them with the proposed techniques: image annotation or captioning, text illustration and multimedia retrieval and clustering. Finally, we present how these methods can be integrated in two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
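The second approach in the abstract above (a trans-media similarity built by aggregating monomodal similarities) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: cosine similarity as the monomodal measure for both modalities, a weighted sum as the aggregation, the function names, and the dictionary layout of the collection.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transmedia_similarity(query_image, target_text, collection, k=2):
    """Score a text against an image query via pseudo-relevance feedback.

    collection: list of {"image": vector, "text": vector} documents
    (hypothetical layout). The k documents whose images are closest to the
    query form the aggregate; their text similarities to the target are
    summed, weighted by the image similarity (an assumed aggregation).
    """
    neighbours = sorted(
        collection,
        key=lambda doc: cosine(query_image, doc["image"]),
        reverse=True,
    )[:k]
    return sum(
        cosine(query_image, doc["image"]) * cosine(doc["text"], target_text)
        for doc in neighbours
    )
```

With such a score, a text can be ranked against an image query even though the two never share a feature space: the comparison goes through documents in the collection that carry both modalities.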

Dr. Julien Ah-Pine joined the XRCE Grenoble as Research Engineer in 2007. He is part of the Textual and Visual Pattern Analysis group and his current research activities are related to multi-modal information retrieval and machine learning. He received his PhD degree in mathematics from Pierre and Marie Curie University (University of Paris 6). From 2003 to 2007, he was with Thales Communications, working on relational analysis, data and text mining methods and social choice theory. Dr. Marco Bressan is Area Manager of the Textual and Visual Pattern Analysis area at Xerox Research Centre Europe. His main research interests are statistical learning and classification; image and video semantic scene understanding; image enhancement and aesthetics; object detection and recognition, particularly when dealing with uncontrolled environments. Prior to Xerox, several of his contributions in these fields were applied to a variety of scenarios including biometric solutions, data mining, CBIR and industrial vision. Dr. Bressan holds a BA in Applied Mathematics from the University of Buenos Aires, an M.Sc. in Computer Vision from the Computer Vision Centre in Spain and a Ph.D. in Computer Science and Artificial Intelligence from the Autonomous University of Barcelona. He is an active member of the network of Argentinean researchers abroad and one of the founders of the network of computer vision and cognitive science researchers. Stephane Clinchant is a Ph.D. student at University Joseph Fourier (Grenoble, France) and at the Xerox Research Centre Europe, which he joined in 2005. Before joining XRCE, Stephane obtained a Master's degree in Computer Sciences in 2005 from the Ecole Nationale Superieure d’Electrotechnique, d’Informatique, d’Hydraulique et des Telecommunications (France). His current research interests mainly focus on Machine Learning for Natural Language Processing and Multimedia Information Access. Dr. Gabriela Csurka is a research scientist in the Textual and Visual Pattern Analysis team at Xerox Research Centre Europe (XRCE). She obtained her Ph.D. degree (1996) in Computer Science from University of Nice Sophia-Antipolis. Before joining XRCE in 2002, she worked in fields such as stereo vision and projective reconstruction at INRIA (Sophia Antipolis, Rhone Alpes and IRISA) and image and video watermarking at University of Geneva and Institute Eurécom, Sophia Antipolis. Author of several publications in main journals and international conferences, she is also an active reviewer both for journals and conferences. Her current research interest concerns the exploration of new technologies for image content and aesthetic analysis, cross-modal image categorization and semantic based image segmentation. Yves Hoppenot is in charge of the development and integration of new technologies in our European research Technology Showroom. He is a software expert for the production, office and services sectors. Yves joined the Xerox Research Centre Europe in 2001. He graduated from the Ecole National Superieure des Telecommunications, Brest in France, and received a Master of Science degree from the Tampere University of Technology in Finland. Dr. Jean-Michel Renders joined the XRCE Grenoble as Research Engineer in 2001. His current research interests mainly focus on Machine Learning techniques applied to Statistical Natural Language Processing and Text Mining. Before joining XRCE, Jean-Michel obtained a PhD in Applied Sciences from the University of Brussels in 1993. He started his research activities in 1988, in the field of Robotics Dynamics and Control. Then, he joined the Joint Research Center of the European Communities to work on biological metaphors (Genetic Algorithms, Neural Networks and Immune Networks) applied to process control. After spending one year as Visiting Scientist at York University (England), he spent 4 years applying Artificial Intelligence and Machine Learning Techniques in Industry (Tractebel - Suez). Then, he worked as Data Mining Senior Consultant and led projects in most major Belgian banks and utilities.

13.

Media objects for user-centered similarity matching

Jean Martinet Shin’ichi Satoh Yves Chiaramella Philippe Mulhem 《Multimedia Tools and Applications》2008,39(2):263-291

14.

Concept detection and keyframe extraction using a visual thesaurus (Total citations: 1; self-citations: 0; citations by others: 1)

Evaggelos Spyrou Giorgos Tolias Phivos Mylonas Yannis Avrithis 《Multimedia Tools and Applications》2009,41(3):337-373

15.

Improving the space cost of k-NN search in metric spaces by using distance estimators

Benjamin Bustos Gonzalo Navarro 《Multimedia Tools and Applications》2009,41(2):215-233

Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although optimal in several aspects, the disadvantage of this method is that its space requirements for the priority queue that stores unprocessed clusters can be linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist) cannot be applied in metric spaces, due to the lack of geometric properties. We propose a new k-NN algorithm that uses distance estimators, aiming to reduce the storage requirements of the search algorithm. The method stays optimal, yet it can significantly prune the priority queue without altering the output of the query. Experimental results with synthetic and real datasets confirm the reduction in storage space of our proposed algorithm, showing savings of up to 80% of the original space requirement.
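For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard best-first k-NN search over a flat list of clusters. It is not the authors' distance-estimator algorithm. The cluster layout (center, covering radius, members), the function name, and the use of Euclidean distance as the metric are assumptions made for illustration. The lower bound max(0, d(q, c) - r) follows from the triangle inequality, which is the only property a metric space guarantees.

```python
import heapq
import math


def best_first_knn(query, clusters, k, dist=math.dist):
    """Return the k nearest objects as a sorted list of (distance, object).

    clusters: list of (center, covering_radius, members) tuples, a
    hypothetical flat index. Every cluster enters the priority queue, which
    is why the queue can grow linearly with the size of the index.
    """
    queue = []
    for idx, (center, radius, _members) in enumerate(clusters):
        lower_bound = max(0.0, dist(query, center) - radius)
        heapq.heappush(queue, (lower_bound, idx))

    best = []  # max-heap of the k closest objects, stored as (-distance, obj)
    while queue:
        lower_bound, idx = heapq.heappop(queue)
        # No remaining cluster can contain anything closer than the k-th best.
        if len(best) == k and lower_bound >= -best[0][0]:
            break
        for obj in clusters[idx][2]:
            d = dist(query, obj)
            if len(best) < k:
                heapq.heappush(best, (-d, obj))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, obj))
    return sorted((-neg_d, obj) for neg_d, obj in best)
```

The early exit is what makes the search best-first: clusters are visited in order of increasing lower bound, so the first cluster that cannot improve the result ends the search.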

Benjamin Bustos is an assistant professor in the Department of Computer Science at the University of Chile. He is also a researcher at the Millennium Nucleus Center for Web Research. His research interests are similarity searching and multimedia information retrieval. He has a doctoral degree in natural sciences from the University of Konstanz, Germany. Contact him at bebustos@dcc.uchile.cl. Gonzalo Navarro earned his PhD in Computer Science at the University of Chile in 1998, where he is now Full Professor. His research interests include similarity searching, text databases, compression, and algorithms and data structures in general. He has coauthored a book on string matching and around 200 international papers. He has (co)chaired international conferences SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR Posters 2005, IFIP TCS 2006, and ENC 2007 Scalable Pattern Recognition track; and belongs to the Editorial Board of Information Retrieval Journal. He is currently Head of the Department of Computer Science at University of Chile, and Head of the Millennium Nucleus Center for Web Research, the largest Chilean project in Computer Science research.

16.

Parallel neural networks for multimodal video genre classification (Total citations: 2; self-citations: 2; citations by others: 0)

Maurizio Montagnuolo Alberto Messina 《Multimedia Tools and Applications》2009,41(1):125-159

Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable to efficiently access and retrieve desired content from such data. In this context, automatic genre classification provides a simple and effective solution to describe multimedia contents in a structured and well understandable way. We propose in this article a methodology for classifying the genre of television programmes. Features are extracted from four informative sources, which include visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot clusters duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used for training a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy rate of 95%.
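The abstract does not say how the outputs of the parallel networks are combined. As a purely illustrative sketch, assume one classifier per information source, each emitting a score per genre, and assume the scores are fused by simple averaging. The averaging rule, the function name, and the input layout are all assumptions; only the seven genre labels come from the abstract.

```python
GENRES = [
    "football", "cartoons", "music", "weather forecast",
    "newscast", "talk show", "commercials",
]


def fuse_predictions(per_source_scores):
    """Average per-source genre scores and return (best_genre, averaged).

    per_source_scores: one score list per information source (for example
    visual, structural, cognitive, aural), each aligned with GENRES.
    Averaging is an assumed fusion rule, not the one used in the paper.
    """
    count = len(per_source_scores)
    averaged = [
        sum(scores[i] for scores in per_source_scores) / count
        for i in range(len(GENRES))
    ]
    best_index = max(range(len(GENRES)), key=averaged.__getitem__)
    return GENRES[best_index], averaged
```

Averaging treats every source as equally reliable; a weighted combination would be the natural variation if some sources proved more informative than others.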

Maurizio Montagnuolo, born in 1975, received his Laurea degree in Telecommunications Engineering from the Polytechnic of Turin in 2004, after developing his thesis at the RAI Research Centre. Currently, he is attending the Ph.D. course in “Business and Management” at the University of Turin, in collaboration with RAI, and supported by EuriX S.r.l., Turin. His main research interests concern the semantic classification of audiovisual content. Alberto Messina is from the RAI (Radiotelevisione Italiana) Centre for Research and Technological Innovation (CRIT), Turin. He began his collaboration as a research engineer with RAI in 1996, when he completed his MS Thesis in Electronic Engineering (at Politecnico di Torino) about objective quality evaluation of MPEG2 video coding. After starting his career as a designer of RAI’s Multimedia Catalogue, he has been involved in several internal and international research projects in the field of digital archiving, with particular emphasis on automated documentation and automated production. His current interests range from file formats and metadata standards to content analysis and information extraction algorithms, which are now his main focus. Recently, he has started research activities concerning semantic information extraction from the numerical analysis of audiovisual material, particularly in the field of conceptual characterisation of multimedia objects, genre classification of multimedia items, and automatic editorial segmentation of TV programmes. He is also the author of technical and scientific publications in this subject area. He has extensive collaborations with the Computer Science Department of the local University of Torino, which include common research projects and students’ tutorship. To complete his scientific training, he recently decided to pursue a PhD in the area of Computer Science. He is an active member of several EBU projects including P/TVFILE, P/MAG and P/CP, and chairman of the P/SCAIE project dealing with automatic metadata extraction techniques. He is currently working in the EU PrestoSpace project in the Metadata Access and Delivery area. He has served as Programme Committee Member in a Special Track of the 10th Conference of Italian Association of Artificial Intelligence, and in the First Workshop on Ambient media Delivery and Interactive Television (AMDIT08).

17.

Attributing semantics to personal photographs (Total citations: 1; self-citations: 1; citations by others: 0)

Rodrigo F. Carvalho Sam Chapman Fabio Ciravegna 《Multimedia Tools and Applications》2009,42(1):73-96

A major bottleneck for the efficient management of personal photographic collections is the large gap between low-level image features and high-level semantic contents of images. This paper proposes and evaluates two methodologies for making appropriate (re)use of natural language photographic annotations for extracting references to people, location and objects and propagating any location references encountered to previously unannotated images. The evaluation identifies the strengths of each approach and shows extraction and propagation results with promising accuracy.

Rodrigo F. Carvalho is a Ph.D. student at the University of Sheffield, UK. His research interests lie in the application of contextual and social information for enhancing image related metadata with the intent of improving future retrieval and sharing of photographic resources. He has worked previously on European and commercial research projects targeted towards the extraction of information for use in emergency response scenarios and for the management of personal photographic memories. Sam Chapman is a senior Research Associate at the University of Sheffield, UK. His research investigates cutting edge semantic technology to facilitate knowledge processes across large organisations with a focus upon search, acquisition and integration of knowledge from various media. He works on a number of European, national and commercial research projects concerning the needs of aerospace, historical research, archaeology support, personal photographic memories and emergency response amongst others. He is also the Director of Technology for Knowledge Now Ltd where he commercialises knowledge acquisition and query technologies to aid a wide variety of industries. Fabio Ciravegna is Professor of Language and Knowledge Technologies at the University of Sheffield. He is Director of the European Integrated Project IST X-Media, and principal investigator in several European and National projects. He coordinates industrial projects funded by Rolls-Royce plc, Kodak Eastman and Lycos Europe. He is a member of the editorial board of the International Journal on “Web Semantics” and of the “International Journal of Human Computer Studies”. Fabio is general chair of the 6th European Semantic Web Conference (2009). He is director of K-Now Ltd, a spin-off company supporting dynamic distributed communities in large organizations. He holds a Ph.D. from the University of East Anglia and a doctorship from the University of Torino, Italy.

18.

An RTP/RTCP based approach for multimedia group and inter-stream synchronization (Total citations: 1; self-citations: 0; citations by others: 1)

Fernando Boronat Seguí Juan Carlos Guerri Cebollada Jaime Lloret Mauri 《Multimedia Tools and Applications》2008,40(2):285-319

Most multimedia group and inter-stream synchronization techniques define or use proprietary protocols with new control messages. Many multimedia applications have been developed using RTP/RTCP as the standard for transmission of multimedia streams over IP networks. Instead of defining a new protocol, we propose the use of RTP/RTCP to provide synchronization. We take advantage of the feedback capabilities provided by RTCP and the ability to extend the protocol by extending and creating RTCP messages containing synchronization information. We have implemented our proposal and tested it in our University WAN. Our experiments have shown that network load resulting from synchronization is minimized and that asynchronies are within acceptable limits for multimedia applications.
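The extended RTCP messages proposed by the authors are not described in the abstract, so they are not reproduced here. What can be sketched is the standard building block that RTP/RTCP already offers for inter-stream synchronization: each RTCP sender report pairs a wallclock (NTP) time with the RTP timestamp of the same instant, which lets a receiver place packets from different streams on a common timeline. The class and function names below are hypothetical, and the NTP time is simplified to a floating-point number of seconds.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    """The fields of an RTCP sender report needed for timestamp mapping."""
    ntp_seconds: float   # wallclock time of the report (simplified to a float)
    rtp_timestamp: int   # RTP timestamp corresponding to that wallclock time
    clock_rate: int      # media clock rate in Hz (e.g. 8000 audio, 90000 video)


def rtp_to_wallclock(rtp_ts, report):
    """Map a 32-bit RTP timestamp to wallclock seconds via a sender report."""
    delta = (rtp_ts - report.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:      # interpret as a signed 32-bit difference
        delta -= 0x100000000     # so timestamp wraparound is handled
    return report.ntp_seconds + delta / report.clock_rate


def inter_stream_skew(audio_ts, audio_report, video_ts, video_report):
    """Positive result: the video sample was captured later than the audio."""
    return (rtp_to_wallclock(video_ts, video_report)
            - rtp_to_wallclock(audio_ts, audio_report))
```

A receiver can use the skew to delay whichever stream is ahead until both samples share the same playout instant.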

Dr. Fernando Boronat Seguí was born in Gandia (Spain) and studied at the Polytechnic University of Valencia (UPV) in Spain, where he obtained his M.Sc. in Telecommunications Engineering in 1993. From 1994 he worked for a couple of years for telecommunication companies before moving back to the UPV in 1996, where he is Lecturer in the Communications Department at the Escuela Politécnica Superior de Gandia. He obtained his PhD degree in 2004 and his topics of interest are Communication networks, Multimedia Systems and Multimedia Synchronization Protocols. He has been an IEEE member since 1993 and is involved in several IPCs of national and international conferences. Dr. Juan Carlos Guerri Cebollada obtained his PhD degree in 1997 and is Lecturer at UPV, where he also heads the Multimedia Communications Research Group, part of the Instituto de Telecomunicaciones y Aplicaciones Multimedia (iTEAM) at the UPV. He is involved in several IPCs of national and international conferences. Dr. Jaime Lloret Mauri received his M.Sc. in Physics in 1997, his M.Sc. in Electronic Engineering in 2003 at University of Valencia (Spain) and his Ph.D. in telecommunication engineering from the UPV in 2006. He is a Cisco Certified Network Professional Instructor and he also teaches in the EPSG at the UPV. He has worked as a network administrator in several companies. He is currently researching P2P networks and sensor networks. He is a member of IASTED, and is involved in several IPCs of national and international conferences.

19.

Using visual and text features for direct marketing on multimedia messaging services domain

Sebastiano Battiato Giovanni Maria Farinella Giovanni Giuffrida Catarina Sismeiro Giuseppe Tribulato 《Multimedia Tools and Applications》2009,42(1):5-30

Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audience. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the industry has been under increased pressure to further optimize learning, in particular when facing severe time and learning space constraints. The main contribution of the present work is to demonstrate that direct marketing firms can exploit the information on visual content to optimize the learning phase. This paper proposes a two-phase learning strategy based on a cascade of regression methods that takes advantage of the visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial Multimedia Messaging Service (MMS) show the effectiveness of the proposed methods and a significant improvement over traditional learning techniques. The proposed approach can be used in any multimedia direct marketing domain in which offers comprise both a visual and text component.

Sebastiano Battiato was born in Catania, Italy, in 1972. He received the degree in Computer Science (summa cum laude) in 1995 and his Ph.D in Computer Science and Applied Mathematics in 1999. From 1999 to 2003 he led the “Imaging” team at STMicroelectronics in Catania. Since 2004 he has worked as a Researcher at the Department of Mathematics and Computer Science of the University of Catania. His research interests include image enhancement and processing, image coding and camera imaging technology. He has published more than 90 papers in international journals, conference proceedings and book chapters. He is co-inventor of about 15 international patents. He is a reviewer for several international journals and has regularly been a member of numerous international conference committees. He has participated in many international and national research projects. He is an Associate Editor of the SPIE Journal of Electronic Imaging (Specialty: digital photography and image compression). He is director of ICVSS (International Computer Vision Summer School). He is a Senior Member of the IEEE. Giovanni Maria Farinella is currently contract researcher at Dipartimento di Matematica e Informatica, University of Catania, Italy (IPLAB research group). He has also been an associate member of the Computer Vision and Robotics Research Group at University of Cambridge since 2006. His research interests lie in the fields of computer vision, pattern recognition and machine learning. In 2004 he received his degree in Computer Science (egregia cum laude) from University of Catania. He was awarded a Ph.D. (Computer Vision) from the University of Catania in 2008. He has co-authored several papers in international journals and conference proceedings. He also serves as a reviewer for numerous international journals and conferences. He is currently the co-director of the International Summer School on Computer Vision (ICVSS). Giovanni Giuffrida is an assistant professor at University of Catania, Italy. He received a degree in Computer Science from the University of Pisa, Italy in 1988 (summa cum laude), a Master of Science in Computer Science from the University of Houston, Texas, in 1992, and a Ph.D. in Computer Science from the University of California in Los Angeles (UCLA) in 2001. He has extensive experience in both the industrial and academic worlds. He has served as CTO and CEO in industry and as a consultant for various organizations. His research interest is in optimizing content delivery on new media such as the Internet, mobile phones, and digital TV. He has published several papers on data mining and its applications. He is a member of ACM and IEEE. Catarina Sismeiro is a senior lecturer at Imperial College Business School, Imperial College London. She received her Ph.D. in Marketing from the University of California, Los Angeles, and her Licenciatura in Management from the University of Porto, Portugal. Before joining Imperial College, Catarina was an assistant professor at Marshall School of Business, University of Southern California. Her primary research interests include studying pharmaceutical markets, modeling consumer behavior in interactive environments, and modeling spatial dependencies. Other areas of interest are decision theory, econometric methods, and the use of image and text features to predict the effectiveness of marketing communications tools. Catarina’s work has appeared in numerous marketing and management science conferences. Her research has also been published in the Journal of Marketing Research, Management Science, Marketing Letters, Journal of Interactive Marketing, and International Journal of Research in Marketing. She received the 2003 Paul Green Award and was a finalist for the 2007 and 2008 O’Dell Awards. Catarina was also a 2007 Marketing Science Institute Young Scholar, and she received the D. Antonia Adelaide Ferreira award and the ADMES/MARKTEST award for scientific excellence. Catarina is currently on the editorial boards of the Marketing Science journal and the International Journal of Research in Marketing. Giuseppe Tribulato was born in Messina, Italy, in 1979. He received the degree in Computer Science (summa cum laude) in 2004 and his Ph.D in Computer Science in 2008. Since 2005 he has led the research team at Neodata Group. His research interests include data mining techniques, recommendation systems and customer targeting.

20.

SMIL State: an architecture and implementation for adaptive time-based web applications (Total citations: 1; self-citations: 1; citations by others: 0)

Jack Jansen Dick C. A. Bulterman 《Multimedia Tools and Applications》2009,43(3):203-224

In this paper we examine adaptive time-based web applications (or presentations). These are interactive presentations where time dictates which parts of the application are presented (providing the major structuring paradigm), and that require interactivity and other dynamic adaptation. We investigate the current technologies available to create such presentations and their shortcomings, and suggest a mechanism for addressing these shortcomings. This mechanism, SMIL State, can be used to add user-defined state to declarative time-based languages such as SMIL or SVG animation, thereby enabling the author to create control flows that are difficult to realize within the temporal containment model of the host languages. In addition, SMIL State can be used as a bridging mechanism between languages, enabling easy integration of external components into the web application. Finally, SMIL State enables richer expressions for content control. This paper defines SMIL State in terms of an introductory example, followed by a detailed specification of the State model. Next, the implementation of this model is discussed. We conclude with a set of potential use cases, including dynamic content adaptation and delayed insertion of custom content such as advertisements.

Jack Jansen is a researcher at Centrum Wiskunde en Informatica (CWI), with over 25 years of experience in multimedia and distributed systems. Empowering people to put available technology to a use they themselves envision is his driving principle. This results in activities ranging from languages, such as Python, via web standardization work (SMIL, Rich Web Application Backplane) to implementing systems for accessible and reusable multimedia (Ambulant). He has recently started to pursue a PhD. Dick Bulterman is head of distributed multimedia systems research at CWI, the Dutch national center for mathematics and computer science in Amsterdam. He is also a professor of computer science at the VU University in Amsterdam. Dr. Bulterman received his Ph.D. in computer science from Brown University in Providence RI (USA) in 1981. He has been co-chair of the W3C working group on synchronized multimedia since 2007; this group released the SMIL 3.0 Recommendation in late 2008. Bulterman has been active in the Document Engineering community since 2005. He is past program chair and past general chair of the ACM DocEng Symposium. He is also past chair of ACM Multimedia and of IEEE ISM. Dick Bulterman lives in Amsterdam with his wife and two children.