The task of audio and music generation in the waveform domain has become possible due to recent advances in deep learning. Generative Adversarial Networks (GANs) are a class of generative model that has achieved success in areas such as image, video and audio generation. However, realistic audio generation with GANs remains challenging, owing to the specific characteristics of this kind of data. In this paper we propose a GAN model that employs the self-attention mechanism and produces short chunks of music conditioned on instrument. We compare our model to a baseline and run ablation studies to demonstrate its advantages. We also suggest some applications of the model, particularly in the area of computer-assisted composition.
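A minimal numpy sketch of the self-attention computation such a generator can apply over a sequence of feature frames; the projection matrices `wq`, `wk`, `wv` stand in for learned weights and are random here purely for illustration, not the paper's actual architecture:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over a sequence of feature frames.

    x          : (T, d) feature frames (e.g. embedded chunks of a waveform)
    wq, wk, wv : (d, d) projection matrices (learned in a real model)
    """
    q, k, v = x @ wq, x @ wk, x @ wv             # queries, keys, values
    scores = q @ k.T / np.sqrt(x.shape[1])       # (T, T) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over key positions
    return attn @ v                              # each frame mixes all others

rng = np.random.default_rng(0)
T, d = 16, 8
x = rng.normal(size=(T, d))
out = self_attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (16, 8): same shape, but every frame now carries global context
```

The point of the mechanism in this setting is the last line of the function: every output frame is a weighted mixture of all input frames, which lets the generator model long-range temporal structure that plain convolutions miss.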
Related articles:

This paper presents Scene2Wav, a novel deep convolutional model proposed to handle the task of music generation from emotionally annotated video. This is important because, when paired with appropriate audio, the resulting music video can enhance the emotional effect it has on viewers. The challenge lies in transforming video to the audio domain and generating music. The proposed Scene2Wav encoder uses a convolutional sequence encoder to embed dynamic emotional visual features from low-level features in the HSV colour space (Hue, Saturation and Value). The Scene2Wav decoder is a conditional SampleRNN that uses the emotional visual feature embedding as a condition to generate novel emotional music. The entire model is fine-tuned end-to-end to generate a music signal that evokes the intended emotional response from the listener. By taking both the emotional and the generative aspects into consideration, this work is a significant contribution to the field of Human-Computer Interaction. It is also a stepping stone towards an AI movie and/or drama director that can automatically generate appropriate music for trailers and movies. Experimental results show that the model can effectively generate music that users prefer over the baseline model's output and that evokes the intended emotions.
Video plays a very important role in online apparel sales, serving as a vital tool for publicity and giving consumers room for imagination. However, as the apparel market updates rapidly with large numbers of new items every day, creating videos for the fast-growing catalogue of clothes can be challenging and labor-intensive. Considering this, we present ApVideor, a music-driven video generation system customized for displaying clothes. The system consists of two main modules: a music recommendation module and an audio-visual synthesis module. The former helps users search for background music that matches the apparel style, while the latter combines the audio and visuals into a video using music-driven approaches. Our user study suggests that the system makes video creation significantly easier and faster than manual creation. Meanwhile, the viewer test suggests that apparel-display videos created with our system are of comparable quality to those created manually by people experienced in video editing.
Knowledge Graphs have attracted wide attention in the field of recommendation, where they are usually applied as auxiliary information to alleviate data sparsity. However, most recommendation models cannot effectively mine the associations between the items to be recommended and the entities in the Knowledge Graph. In this paper, we propose CAKR, a knowledge graph recommendation method based on a cross attention unit; it is similar to MKR, a multi-task feature learning framework that uses knowledge graph embedding tasks to assist recommendation tasks. Specifically, we design a new method to optimize the feature interaction between items and their corresponding entities in the Knowledge Graph, and propose a feature cross unit combined with the attention mechanism to enhance recommendation. Through extensive experiments on public movie, book, and music datasets, we show that CAKR outperforms MKR and other knowledge graph recommendation methods, demonstrating that the new feature cross unit is effective in improving the accuracy of the recommendation system.
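A cross unit of this kind can be sketched roughly as follows; this is an illustrative reconstruction, not CAKR's actual formulation — the outer-product interaction and row-wise softmax attention are assumptions made for the sketch:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_attention_unit(item, entity):
    """Illustrative feature cross unit with attention: every item dimension
    interacts with every entity dimension via the outer product, and
    row-wise attention weights decide how entity features flow back."""
    cross = np.outer(item, entity)                  # (d, d) pairwise interactions
    attn = np.apply_along_axis(softmax, 1, cross)   # attention weights per item dim
    item_out = attn @ entity                        # entity info routed to the item
    entity_out = attn.T @ item                      # item info routed to the entity
    return item_out, entity_out

rng = np.random.default_rng(0)
item_vec = rng.normal(size=6)
entity_vec = rng.normal(size=6)
item_out, entity_out = cross_attention_unit(item_vec, entity_vec)
```

The design intuition matches the abstract: instead of fusing the item and entity vectors with a fixed operation, the attention weights let each feature dimension choose which counterpart dimensions matter for the interaction.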
For many years, musicians have composed music based on the images in their minds; conversely, music affects people's imagination as they listen. This research provides a method that can transform shape into music and music into shape. For transforming shapes into music, the method defines musical notations for horizontal, diagonal and vertical line segments, filled circles and curves with different colors, which are the basis of many shapes; these primary mappings are then generalized to more complex forms so that any shape can be transformed. Moreover, music can be transformed into shape by the same method. For this direction, primary musical notations are defined, such as simple notes, notes joined by a legato, notes with a staccato, notes joined by a legato with a crescendo or decrescendo, and notes with an accent or a trill; these primary notations are likewise generalized to more complex forms so that any music can be transformed into shape. The method can also be used in music cryptography: it maps notes in a twelve-tone equal-temperament system into shapes, and maps shapes with equal line width and different colors into music.
Color music has attracted great interest in real applications. However, the problem of mismatching between music and color has not been resolved. This paper proposes a novel mechanism to map the correspondence between music and color, embedded in a device with a micro-processor that plays music with color flashing. The proposed mechanism derives a perfect-fifth relation among the wavelengths of lights and determines 12 colors corresponding to the musical notes of twelve-tone equal temperament. Specifically, when a piece of music is playing, the audio signal is sampled and transformed by the Fast Fourier Transform (FFT). The method can judge the color corresponding to a note, and the mixed-light effect of RGB LEDs driven by PWM outputs is studied. Extended experiments show that music can be played with matching colors flashing in real time, and that the color of the mixed lights automatically matches arbitrary music being played. The paper reveals relationships between music and color from the perspective of the frequency spectrum and promotes the development of color music, which has broad applications.
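The note-detection step — sample the audio, take an FFT, and map the dominant frequency to one of the 12 equal-temperament notes (and hence one of the 12 colors) — can be sketched as below. The sampling setup and the reference pitch C0 ≈ 16.3516 Hz are standard 12-TET conventions chosen for the sketch, not necessarily the paper's exact pipeline:

```python
import numpy as np

SR = 8000  # sampling rate in Hz (illustrative)
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_note(signal, sr=SR):
    """Estimate the strongest pitch via FFT and map it to one of the
    12 equal-temperament note names (each of which gets a color)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    f0 = freqs[spectrum[1:].argmax() + 1]        # peak magnitude, skipping DC
    # 12-TET: note index = round(12 * log2(f / C0)), with C0 ~ 16.3516 Hz
    idx = int(round(12 * np.log2(f0 / 16.3516))) % 12
    return NOTES[idx], f0

t = np.arange(SR) / SR                           # one second of audio
note, f0 = dominant_note(np.sin(2 * np.pi * 440.0 * t))  # pure A4 tone
print(note, round(f0))  # A 440
```

With the note index in hand, driving the color is a lookup into the 12-entry color table and setting the RGB LED's PWM duty cycles accordingly.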
Over the past decades, a large number of music pieces have been uploaded to the Internet every day through social networks that concentrate on music and videos, such as Last.fm, Spotify and YouTube, and we have been witnessing an ever-increasing amount of music data. At the same time, faced with this huge amount of online music, users struggle daily to find the pieces that interest them. To solve this problem, music search and recommendation systems help users find their favorite content in a huge music repository. However, social influence, which carries rich information about similar interests between users and users' frequent correlated actions, has been largely ignored in previous music recommender systems. In this work, we explore the effects of social influence on building effective music recommender systems and focus on the problem of social-influence-aware music recommendation, which aims to recommend a list of music tracks to a target user. To exploit social influence, we first construct a heterogeneous social network, propose a novel meta-path-based similarity measure called WPC, and define the similarity-measure framework in this network. As a further step, we use the topological potential approach to mine social influence in heterogeneous networks. Finally, to improve music recommendation by incorporating social influence, we present a factor graph model based on social influence. Our experimental results on a real-world dataset verify that the proposed approach substantially outperforms current state-of-the-art music recommendation methods.
Learning music has been demonstrated to provide many benefits for children. However, music students, especially beginners, often lack motivation and can even become frustrated if their musical skills do not improve as they practice over and over; in such situations, they usually end up dropping out of music school. To address this challenge, this work proposes a novel approach based on mixed reality and gamification to motivate music students. The approach has been validated through HoloMusic XP, a multimedia tool that helps students learn music and the piano. The architecture that supports HoloMusic XP has been designed and developed to scale as new music concepts are added. Thanks to the use of mixed reality, the usually steep learning curve for beginner students can be mitigated and complex music concepts can be simplified through visual metaphors. The system has been evaluated in a real environment by teachers and students to measure its effectiveness and usability. The experiments showed an increase in the students' motivation and a general understanding of the multimedia representation.
Music categorization based on acoustic features extracted from music clips and on user-defined tags forms the basis of recent music recommendation applications, because relevant tags can be assigned automatically from the feature values and their relation to tags. In practice, especially on lightweight handheld mobile devices, computational capacity is limited, owing to consumers' usage behavior and battery consumption. This also limits the number of acoustic features that can be extracted, and makes it necessary to identify a compact feature subset for the music categorization process. In this study, we propose a compact feature-subset-based approach to multi-label music categorization for mobile music recommendation services. Experimental results on various multi-labeled music datasets reveal that the proposed approach outperforms the conventional approach.
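As a rough illustration of what selecting a compact feature subset can mean (not the paper's actual method), one can rank acoustic features by their absolute correlation with a tag label and keep only the k strongest, cutting the per-clip extraction cost on a mobile device:

```python
import numpy as np

def select_compact_subset(X, y, k):
    """Keep the k features most correlated (in absolute value) with a tag.

    X : (n_clips, n_features) acoustic feature matrix
    y : (n_clips,) binary tag labels
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom          # |Pearson correlation| per feature
    keep = np.argsort(corr)[::-1][:k]         # indices of the top-k features
    return np.sort(keep)

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=200).astype(float)
X = rng.normal(size=(200, 10))
X[:, 3] = y + 0.01 * rng.normal(size=200)     # feature 3 strongly predicts the tag
kept = select_compact_subset(X, y, k=3)
```

In a multi-label setting this filter would be applied per tag (or on a pooled score across tags); the point is only that a cheap univariate ranking already yields a much smaller feature set to extract on-device.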
According to narratology, or narrative theory, a piece of artwork should tell a story through its various tensions. In this study, an automated music composition algorithm using musical tension energy is proposed; the algorithm generates a musical piece by varying the musical tension. The proposed innovative Algorithmic Composition Musical Tension Energy (ACMTE) method uses the level of musical tension, which is determined primarily by the chord progression and also by the musical parameters of pitch interval and rhythm. The effects of musical tension energy on those parameters were analyzed, and this paper presents a formula that unifies all generated parts. The experimental results demonstrate that thousands of pieces can easily be generated without the use of a music database. This algorithmic composition method can be readily applied in streaming media and on portable music devices such as smartphones, notebooks, and MP3 players.
Music regional classification, an important branch of automatic music classification, aims to classify folk songs according to their regional styles. Chinese folk songs have developed various regional musical styles over the course of their evolution. Regional classification of Chinese folk songs can advance music recommendation systems that recommend the proper style of music to users, and can improve the efficiency of music retrieval systems. However, the accuracy of existing music regional classification systems is not high enough, because most methods do not consider the temporal characteristics of music in either feature extraction or classification. In this paper, we propose an approach based on conditional random fields (CRF) that fully exploits the temporal characteristics of musical audio features for regional classification. Considering the continuity, high dimensionality and large size of the audio feature data, we employ two ways to calculate the label sequences of musical audio features in the CRF: a Gaussian Mixture Model (GMM) and a Restricted Boltzmann Machine (RBM). The experimental results demonstrate that the proposed CRF-RBM method outperforms other existing music regional classifiers, with a best accuracy of 84.71% on a Chinese folk song dataset. Moreover, when the proposed methods are applied to a Greek folk song dataset, the CRF-RBM model also performs best.
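The GMM/RBM step quantizes per-frame audio features into discrete labels that the CRF then models as a sequence. A simplified stand-in using k-means quantization (an assumption made for illustration, not the paper's GMM or RBM) might look like this:

```python
import numpy as np

def kmeans_labels(frames, k, iters=20):
    """Quantize per-frame audio features into a discrete label sequence,
    a simplified stand-in for GMM/RBM labeling before the CRF stage.

    frames : (T, d) one feature vector per audio frame
    k      : number of discrete labels
    """
    # deterministic init: k frames spread evenly across the sequence
    centers = frames[np.linspace(0, len(frames) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = ((frames[:, None, :] - centers[None]) ** 2).sum(-1)  # (T, k) distances
        labels = d.argmin(1)                                     # nearest center
        for j in range(k):
            if (labels == j).any():
                centers[j] = frames[labels == j].mean(0)         # update centers
    return labels

# two well-separated synthetic clusters of "frames"
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0, 0.1, (50, 4)), rng.normal(5, 0.1, (50, 4))])
labels = kmeans_labels(frames, k=2)
```

The resulting label sequence preserves frame order, which is exactly what lets the downstream CRF exploit temporal structure instead of treating each frame independently.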