Similar literature
 Found 20 similar articles (search time: 890 ms)
1.

Audio and music generation in the waveform domain has become possible thanks to recent advances in deep learning. Generative Adversarial Networks (GANs) are a type of generative model that has achieved success in areas such as image, video and audio generation. However, realistic audio generation with GANs is still a challenge, owing to the specific characteristics inherent to this kind of data. In this paper we propose a GAN model that employs the self-attention mechanism and produces small chunks of music conditioned on instrument. We compare our model to a baseline and run ablation studies to demonstrate its superiority. We also suggest some applications of the model, particularly in the area of computer-assisted composition.


2.

This paper presents Scene2Wav, a novel deep convolutional model for generating music from emotionally annotated video. This is important because, when paired with appropriate audio, the resulting music video has an enhanced emotional effect on viewers. The challenge lies in translating from the video domain to the audio domain and generating music. The proposed Scene2Wav encoder uses a convolutional sequence encoder to embed dynamic emotional visual features from low-level features in the colour space, namely Hue, Saturation and Value. The Scene2Wav decoder is a conditional SampleRNN that uses this emotional visual feature embedding as a condition to generate novel emotional music. The entire model is fine-tuned end to end to generate a music signal that evokes the intended emotional response in the listener. By considering both the emotional and the generative aspects, this work contributes to the field of Human-Computer Interaction. It is also a stepping stone towards an AI movie and/or drama director that can automatically generate appropriate music for trailers and movies. Experimental results show that users prefer the music generated by this model to that of the baseline model, and that it evokes the correct emotions.


3.
Zhang Hui, Zhang Kejun, Cao Yingping, Zheng Jun, Huang Xiaoyi, Yang Changyuan, Sun Lingyun. 《Multimedia Tools and Applications》2020,79(9-10):5649-5670

Video plays an important role in online apparel sales: it is a vital tool for publicity and gives consumers room for imagination. However, because the apparel market turns over large volumes of new items every day, creating videos for a rapidly growing catalogue of clothes is challenging and labor-intensive. We therefore present ApVideor, a music-driven video generation system customized for displaying clothes. The system consists of two main modules: a music recommendation module and an audio-visual synthesis module. The former helps users find background music that matches the apparel style, while the latter combines the audio and visuals into a video using music-driven approaches. Our user study suggests that this system makes video creation significantly easier and faster than manual creation. Meanwhile, the viewer test suggests that apparel-displaying videos created with our system are of comparable quality to those created manually by people with video editing experience.


4.

This article presents an innovative use of wearable technology for music performers: a mobile application that works as a hands-free music score turner, linking Google Glass with a portable device such as a smartphone or a tablet. Using the wink detection and head motion sensing features of Google Glass, the application lets users send a trigger, without using their hands, to turn the pages of documents displayed on a nearby mobile device. The impetus for this work was instrumentalists who need to turn the pages of a score during a performance.

Beyond serving music performers, this project could benefit users in any setting where restricted hand use makes it difficult to read documents or perform other computer-related tasks.

5.
Huang Weijian, Wu Jianhua, Song Weihu, Wang Zehua. 《Applied Intelligence》2022,52(9):10297-10306

Knowledge graphs have attracted wide attention in the field of recommendation, where they are usually applied as auxiliary information to address data sparsity. However, most recommendation models cannot effectively mine the associations between the items to be recommended and the entities in the knowledge graph. In this paper, we propose CAKR, a knowledge graph recommendation method based on a cross attention unit. It is similar to MKR, a general multi-task feature learning framework that uses knowledge graph embedding tasks to assist recommendation tasks. Specifically, we design a new method to optimize the feature interaction between items and their corresponding entities in the knowledge graph, and propose a feature cross unit combined with the attention mechanism to enhance the recommendation effect. Extensive experiments on public movie, book, and music datasets show that CAKR outperforms MKR and other knowledge graph recommendation methods, indicating that the new feature cross unit is effective in improving the accuracy of the recommendation system.


6.
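This work studies the role of contextual features in text classification and proposes a hierarchical bidirectional LSTM model for sentiment classification. The model first segments each sentence into words and feeds the word vectors into a first-layer bidirectional LSTM. It then extracts a dense, continuous vector from the document as the contextual feature. Next, the output vector of the first layer and the context vector are fed together into a second-layer bidirectional LSTM. Finally, the output vector of this hierarchical bidirectional LSTM is classified through a sigmoid function. Because the context vector acts on every sentence, sentiment consistent with the context is reinforced and inconsistent sentiment is weakened, which improves classification accuracy. Experiments on two public datasets show that the hierarchical bidirectional LSTM with contextual features achieves better accuracy. In addition, tests on a public dataset of more than 20,000 Chinese reviews show that the model's test accuracy improves on both a plain LSTM and a bidirectional LSTM, indicating that contextual features contribute markedly to sentiment classification.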

7.

For many years, musicians have composed music based on images they hold in their minds. Conversely, music affects listeners' imagination. This research provides a method that can transform shape into music and music into shape. To transform shapes into music, the method defines musical notations for horizontal, diagonal and vertical line segments, filled circles and curves in different colors, which are the basis of many shapes. These primary mappings are then generalized to more complex forms so that any shape can be transformed. For the reverse transformation, from music into shape, primary musical notations are defined: simple notes, notes joined by a legato, notes with a staccato, notes joined by a legato with a crescendo or decrescendo, and notes with an accent or a trill. These primary musical notations are generalized to more complex forms so that any music can be transformed into shape. The method can also be used in music cryptography. It maps notes in a twelve-tone equal musical system into shapes, and maps shapes with an equal line width and different colors into music.


8.
Music file-sharing systems based on peer-to-peer technology increasingly threaten the entire music industry. After initially neglecting the issue, the major music labels are now struggling to enforce their property rights on the Internet in order to maintain their profits, so far without substantial impact. The iTunes Music Store, presented by Apple Inc. in April 2003, is the first commercial service able to persuade a considerable number of Internet users to pay for the digital music they download. This article discusses the major factors that contribute to iTunes' success and analyses the service from different perspectives. From a technological point of view, the digital rights management system stands out because it is quite liberal and even includes peer-to-peer elements. The business perspective suggests that its success lies in reconciling the music industry's need for compensation with customers' requirements to exchange music files and write them to CD. From a new institutional economics perspective, the service acknowledges that in the Internet age some (intellectual) property rights cannot be enforced in the way the music industry was used to in pre-Internet times. By leaving some property rights unallocated, the remaining property rights can be enforced more efficiently through an innovative and easy-to-use service.

9.
Zhang Hong, Tian Chunwei, You Lei, Li Zhengming, Zong Ming, Huang Kan. 《Multimedia Tools and Applications》2021,80(21-23):32091-32109

Color music has attracted great interest in real applications. However, the mismatch between music and color has not been resolved. This paper proposes a novel mechanism for mapping music to color, embedded in a microprocessor-based device that flashes colors while music plays. The mechanism deduces a perfect-fifth relation among the wavelengths of light and determines 12 colors corresponding to the musical notes of twelve-tone equal temperament. Specifically, while a piece of music is playing, the audio signal is sampled and transformed by the Fast Fourier Transform (FFT). The method then determines the color corresponding to each note, and the mixed-light effect of an RGB LED driven by PWM outputs is studied. Extended experiments show that matching colors flash in real time as the music plays, and that the color of the mixed light automatically matches arbitrary music being played. The paper reveals relationships between music and color from the perspective of the frequency spectrum and promotes the development of color music, which has broad applications.
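The pipeline this abstract describes (sample the audio, transform it, identify the note, pick a color) can be illustrated with a minimal Python sketch. All function names are hypothetical, a naive DFT stands in for the FFT so the example needs only the standard library, and the hue mapping (30 degrees per step around the circle of fifths) is an assumption for illustration, not the wavelength-based mapping derived in the paper.

```python
import cmath
import math

# Hypothetical names throughout; the hue mapping is an illustrative assumption,
# not the wavelength-based mapping derived in the paper.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C) in 12-tone equal temperament."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    return (semitones_from_a4 + 9) % 12  # A is pitch class 9


def hue_for_pitch_class(pc):
    """Assumed mapping: position on the circle of fifths times 30 degrees of hue."""
    fifths_position = (pc * 7) % 12  # C=0, G=1, D=2, A=3, ...
    return fifths_position * 30


# A pure 440 Hz tone; 512 samples at 2048 Hz gives 4 Hz bins, so 440 Hz is bin 110.
sample_rate = 2048
tone = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(512)]
freq = dominant_frequency(tone, sample_rate)
pc = pitch_class(freq)
print(freq, NOTE_NAMES[pc], hue_for_pitch_class(pc))
```

On a real device the hue would then be converted to RGB duty cycles for the PWM outputs; that step is omitted here because the abstract does not specify it.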


10.

Over the past decades, a large number of music pieces have been uploaded to the Internet every day through social networks that concentrate on music and videos, such as Last.fm, Spotify and YouTube, and the amount of music data keeps growing. With so much online music data, users struggle every day to find the pieces that interest them. Music search and recommendation systems help users find their favorite content in a huge repository of music. However, social influence, which contains rich information about similar interests between users and users' frequent correlated actions, has been largely ignored in previous music recommender systems. In this work, we explore the effects of social influence on developing effective music recommender systems and focus on the problem of social influence aware music recommendation, which aims at recommending a list of music tracks to a target user. To exploit social influence, we first construct a heterogeneous social network, propose a novel meta path-based similarity measure called WPC, and define the framework of the similarity measure in this network. We then use the topological potential approach to mine social influence in heterogeneous networks. Finally, to improve music recommendation by incorporating social influence, we present a factor graph model based on social influence. Experimental results on one real-world dataset verify that the proposed approach substantially outperforms current state-of-the-art music recommendation methods.


11.
The correlation between music and human motion has attracted widespread research attention. Although recent studies have successfully generated motion for singers, dancers, and musicians, few have explored motion generation for orchestral conductors. The generation of music-driven conducting motion should consider not only the basic music beats, but also mid-level music structures, high-level music semantic expressions, and hints for different parts of the orchestra (strings, woodwind, etc.). However, most existing conducting motion generation methods rely heavily on human-designed rules, which significantly limits the quality of the generated motion. Therefore, we propose a novel Music Motion Synchronized Generative Adversarial Network (M2S-GAN), which generates motions according to automatically learned music representations. More specifically, M2S-GAN is a cross-modal generative network comprising four components: 1) a music encoder that encodes the music signal; 2) a generator that generates conducting motion from the music codes; 3) a motion encoder that encodes the motion; 4) a discriminator that differentiates the real and generated motions. These four components respectively imitate four key aspects of human conductors: understanding music, interpreting music, precision and elegance. The music and motion encoders are first jointly trained with a self-supervised contrastive loss, which helps music-motion synchronization during the subsequent adversarial learning. To verify the effectiveness of our method, we construct a large-scale dataset, named ConductorMotion100, which consists of an unprecedented 100 hours of conducting motion data. Extensive experiments on ConductorMotion100 demonstrate the effectiveness of M2S-GAN. Our proposed approach outperforms various comparison methods both quantitatively and qualitatively. Through visualization, we show that our approach can generate plausible, diverse, and music-synchronized conducting motion.
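The abstract does not say which contrastive loss is used to pre-train the two encoders. As a rough illustration of the idea only, the sketch below assumes an InfoNCE-style loss over cosine similarities, computed in one direction (each music clip must pick out its own motion clip). The function names, the temperature value and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(music_embs, motion_embs, temperature=0.1):
    """Mean cross-entropy of matching music clip i to motion clip i.

    Assumed InfoNCE form; the paper's exact loss is not given in the abstract.
    """
    losses = []
    for i, m in enumerate(music_embs):
        logits = [cosine(m, x) / temperature for x in motion_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: the loss is small when pairs line up and large when they do not.
music = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(music, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(music, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

Minimizing such a loss pulls the embeddings of a music clip and its simultaneous motion together while pushing mismatched pairs apart, which is one plausible way for the encoders to learn synchronization before adversarial training begins.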

12.

Learning music has been shown to provide many benefits for children. However, music students, especially beginners, often suffer from a lack of motivation and can even become frustrated if their musical skills do not improve despite repeated practice. In such situations, they usually end up dropping out of music school. To address this challenge, this work proposes a novel approach based on mixed reality and gamification to motivate music students. The approach has been validated with HoloMusic XP, a multimedia tool that helps students learn music and piano. The architecture that supports HoloMusic XP has been designed and developed to scale when new music concepts must be addressed. With mixed reality, the usually steep learning curve for beginner students can be mitigated, and complex music concepts can be simplified through visual metaphors. The system has been evaluated in a real environment by teachers and students to measure its effectiveness and usability. The experiments showed an increase in the students' motivation and a general understanding of the multimedia representation.


13.

Music categorization based on acoustic features extracted from music clips and user-defined tags forms the basis of recent music recommendation applications, because relevant tags can be automatically assigned based on the feature values and their relation to tags. In practice, especially on lightweight handheld mobile devices, computational capacity is limited by consumers' usage behavior and battery consumption. This also limits the maximum number of acoustic features that can be extracted, making it necessary to identify a compact feature subset for the music categorization process. In this study, we propose an approach to compact feature subset-based multi-label music categorization for mobile music recommendation services. Experimental results on various multi-labeled music datasets show that the proposed approach performs better than conventional approaches.


14.
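Music generation is the study of using algorithms to generate musical sequences. Addressing feature extraction from music samples and automatic composition, this paper proposes a polyphonic music generation algorithm based on latent musical features and a recurrent neural network (RNN). The method uses a stacked autoencoder to extract latent note features at each time step of a polyphonic music sequence and combines them with a long short-term memory (LSTM) recurrent network to build a latent-feature-based music generation model in a sequence-prediction fashion. Simulation results show that, after training on music data of a single style, the resulting model can generate polyphonic music in which melody and chords match well.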

15.

The reading of music text from a computer screen was compared to reading from paper in a controlled laboratory study. Three types of computer-based animated score tracking devices were tested, as well as a static screen representation of the music text and its paper-based counterpart. Subjects were given a proofreading exercise in which they listened to pieces of music and identified intentional errors in the score. Their subjective views were also recorded. No significant differences between the five presentation styles were apparent in the proofreading study. However, subjects showed a significant preference for animation over paper and static representation. The most popular style of animation was one in which each note on the score was marked in time to the music. Paper performed better overall than the static screen representation.

16.

Little attention has so far been paid to evaluating the success of online communities. This paper begins to identify some key determinants of sociability and usability that help to determine their success. Determinants of sociability include obvious measures such as the number of participants in a community, the number of messages per unit of time and members' satisfaction, and some less obvious measures such as the amount of reciprocity, the number of on-topic messages, trustworthiness and several others. Measures of usability include numbers of errors, productivity, user satisfaction and others. The list is not exhaustive, but it is intended to provide a starting point for research on this important topic that will lead to the development of metrics. To avoid creating false impressions, it is advisable to use several measures and to triangulate with qualitative data, particularly from ethnographic studies.
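Three of the sociability measures named above (number of participants, messages per unit of time, reciprocity) are easy to compute from a message log. The Python sketch below is only one possible operationalization: the log format, the function name and the definition of reciprocity (the share of directed sender-recipient pairs that are answered in the opposite direction) are assumptions, not definitions taken from the paper.

```python
from datetime import datetime


def community_metrics(messages):
    """Compute simple sociability measures from (sender, recipient, ISO timestamp) tuples.

    Reciprocity here is an assumed definition: the fraction of directed
    (sender, recipient) pairs for which the reverse pair also occurs.
    """
    if not messages:
        return {"participants": 0, "messages_per_day": 0.0, "reciprocity": 0.0}
    participants = {p for sender, recipient, _ in messages for p in (sender, recipient)}
    times = [datetime.fromisoformat(ts) for _, _, ts in messages]
    # Treat logs shorter than one day as one day to avoid inflating the rate.
    span_days = max((max(times) - min(times)).total_seconds() / 86400, 1.0)
    pairs = {(s, r) for s, r, _ in messages}
    reciprocated = sum(1 for s, r in pairs if (r, s) in pairs)
    return {
        "participants": len(participants),
        "messages_per_day": len(messages) / span_days,
        "reciprocity": reciprocated / len(pairs),
    }


# Hypothetical log covering two days.
log = [
    ("ann", "bob", "2024-01-01T09:00:00"),
    ("bob", "ann", "2024-01-01T10:00:00"),
    ("cat", "ann", "2024-01-03T09:00:00"),
    ("ann", "bob", "2024-01-03T09:00:00"),
]
metrics = community_metrics(log)
print(metrics)
```

As the abstract cautions, counts like these should be read alongside qualitative data rather than on their own.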

17.

According to narratology, or narrative theory, a piece of artwork should tell a story based on its various tensions. This study proposes an automated music composition algorithm based on musical tension energy; the algorithm generates a musical piece by varying the musical tension. The proposed Algorithmic Composition Musical Tension Energy (ACMTE) method uses the level of musical tension, which is determined primarily by the chord progression and also by the musical parameters of pitch interval and rhythm. The effects of musical tension energy on those parameters were analyzed. This paper presents a formula that unifies all generated parts. The experimental results demonstrate that thousands of beautiful pieces can easily be made without the use of a music database. This algorithmic composition method can easily be applied both to streaming media and to portable music devices, such as smart phones, notebooks, and MP3 players.


18.
《Ergonomics》2012,55(10):1504-1505
This study investigated whether a gradual or an abrupt change towards more calming music is more effective in calming drivers during high-demand driving situations. Twenty-eight participants were subjected to two types of music change (gradual, abrupt) in a within-subject design. First, a relatively happy mood was induced with personally selected music during an eight-minute simulated high-demand drive. The drive then continued and the mood was changed either gradually or abruptly. Subjective results showed successful music mood induction irrespective of gradual or abrupt changes. The results further showed lower skin conductance (less arousal) and more facial corrugator muscle tension (more sadness) during the abrupt music change. Fewer accidents occurred during the abrupt music mood change. To conclude, the results support the abrupt way of changing music type to down-regulate one's mood: during high-demand driving, abrupt changes in music led to more physiological calmness and improved driving performance, and were thus safer and more effective.

Practitioner Summary: The current study shows that during high-demand drives, drivers are calmed more effectively by abrupt music changes than by gradual music changes. This is illustrated by reductions in physiological arousal and improved driving behaviour. Hence, in-car music presentation can be used as a tool to improve drivers' mood and behaviour.

19.
Li Juan, Luo Jing, Ding Jianhang, Zhao Xi, Yang Xinyu. 《Multimedia Tools and Applications》2019,78(9):11563-11584

Music regional classification, an important branch of automatic music classification, aims at classifying folk songs according to their regional style. Chinese folk songs have developed various regional musical styles in the course of their evolution. Regional classification of Chinese folk songs can promote the development of music recommendation systems that recommend a suitable style of music to users, and can improve the efficiency of music retrieval systems. However, the accuracy of existing music regional classification systems is not high enough, because most methods do not consider the temporal characteristics of music in either feature extraction or classification. In this paper, we propose an approach based on a conditional random field (CRF) that takes full advantage of the temporal characteristics of musical audio features for music regional classification. Considering the continuity, high dimensionality and large size of the audio feature data, we employ two ways to calculate the label sequence of musical audio features in the CRF: a Gaussian Mixture Model (GMM) and a Restricted Boltzmann Machine (RBM). The experimental results demonstrate that the proposed CRF-RBM method outperforms other existing music regional classifiers, with a best accuracy of 84.71% on Chinese folk song datasets. Moreover, when the proposed methods are applied to a Greek folk song dataset, the CRF-RBM model also performs best.


20.