Similar literature
 Found 20 similar documents (search time: 31 ms)
1.

Steganography is a useful technique that aims to prevent loss of privacy during data communication, especially over the internet. It can involve different forms of media such as image, video (i.e., image sequence), and audio. We propose a novel steganographic approach in the spatial domain using the pixel value differencing (PVD) or sample value differencing (SVD) technique and Galois field (GF(2^8)) operations to provide two-layered security for hiding message bits. Our method not only has a very high embedding capacity but is also capable of withstanding statistical attacks. The proposed method embeds from 2 to 6 bits of the message per pixel in each image component, and from 6 to 13 bits of the message per sample in the audio component, with no perceivable distortion or loss of cover media quality.
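
As a rough illustration of the Galois field arithmetic mentioned in this abstract, the sketch below multiplies two bytes in GF(2^8). The abstract does not say which irreducible polynomial the authors use, so the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is assumed here, and the function name `gf256_mul` is hypothetical.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add.

    poly is the reduction polynomial. 0x11B (the AES polynomial) is an
    assumption for illustration; the abstract does not specify one.
    """
    result = 0
    while b:
        if b & 1:        # low bit of b set: add (XOR) the current a
            result ^= a
        a <<= 1          # multiply a by x
        if a & 0x100:    # degree reached 8: reduce modulo poly
            a ^= poly
        b >>= 1
    return result
```

Addition in this field is plain XOR, so a message byte could be masked or mixed with a key byte using only `^` and `gf256_mul`; how the paper combines this with PVD is not described in the abstract.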

2.
Abstract

While rich support for a wide variety of media such as text, video and image is common among contemporary hypermedia systems, so too is the inadequate support for audio. The primary reason that audio has not attracted as much attention as other media can be attributed to its obvious lack of visual identity. The main focus of this work was to identify a generic and meaningful visual representation of audio within a hypermedia context, and significantly promote hypermedia support for audio through the provision of a sound viewer.

This paper describes the inherent difficulties in providing a consistent interface to audio, and discusses in some depth the issues raised during the development process. The sound viewer is then introduced and the associated concepts described. The creation and traversal of links to and from audio are facilitated by the sound viewer across formats including WAV (proprietary digital sound file format from Microsoft), CD (Compact Disc) Audio and MIDI (Musical Instrument Digital Interface). The resultant viewer provides a unified and extensible framework for interacting with audio from within an open hypermedia environment. The open hypermedia system Microcosm was used as the development platform for this work. Microcosm can be augmented to supply a hypermedia link service to additional media with minimal overhead.

3.

Recently, with the advent of the Convolutional Neural Network (CNN) era, neural style transfer on images has become a very active research topic: the style of one image can be transferred to another through a CNN so that the result retains its own content while taking on the style of the other image. In this work, we propose an algorithm for audio style transfer that uses the power of CNNs to generate new audio from a style audio. We use the Continuous Wavelet Transform (CWT) to convert the audio into a spectrogram, treat the spectrogram as an image representation of the audio, apply an image style transfer method to obtain a new image, and finally generate audio using iterative phase reconstruction with Griffin-Lim. We succeeded in transferring audio such as light music but had difficulty transferring audio that has lyrics and high-level metrics such as emotion or tone. We propose several measures to improve audio quality, and extensive experimental results show that our method is better than other methods in terms of sound quality.

4.

The task of audio and music generation in the waveform domain has become possible due to recent advances in deep learning. Generative Adversarial Networks (GANs) are a type of generative model that has achieved success in areas such as image, video and audio generation. However, realistic audio generation with GANs is still a challenge, owing to the specific characteristics inherent to this kind of data. In this paper we propose a GAN model that employs the self-attention mechanism and produces small chunks of music conditioned by instrument. We compare our model to a baseline and run ablation studies in order to demonstrate its superiority. We also suggest some applications of the model, particularly in the area of computer-assisted composition.

5.

It is necessary to protect sensitive information in digital form from an adversary who may indulge in cyber-crimes such as modification, masquerading, and replaying of data. Security systems designed to counter such attacks must keep abreast of the adversary. In this paper, we propose a novel multi-image crypto-stego technique using the Rabin cryptosystem and the Arnold transform that provides a mechanism to hide digital data in the form of text, image, audio, and video. The proposed technique is a novel approach for (n,n) secret sharing that prevents attack by an intruder impersonating a shareholder. In the proposed technique, header information is created to retrieve data in the correct order. Randomized encrypted data and partial header information are camouflaged in the edges of multiple images in an adaptive manner. Minimal and distribution sequence keys distribute data in shares. Experimental results yield high values of PSNR and low values of MSE for audio, image, and video signals. Further, as the entropy values for the original cover image coincide with those of the crypto-stego image up to the third decimal place, the secret message will go unnoticed. Sensitivity analysis reveals that even a minor variation in a single share makes the recovery of the secret message infeasible. Comparison with state-of-the-art techniques indicates that the proposed technique either scores over its competitors or performs equally well in terms of standard evaluation metrics.

6.
Liu Caifeng, Feng Lin, Liu Guochao, Wang Huibing, Liu Shenglan. Multimedia Tools and Applications, 2021, 80(5): 7313-7331

Music genre classification based on visual representation has been successfully explored over the last years. Recently, there has been increasing interest in using convolutional neural networks (CNNs) for the task. However, most of the existing methods employ mature CNN structures proposed for image recognition without any modification, which results in learned features that are not adequate for music genre classification. To address this issue, we fully exploit the low-level information from spectrograms of audio and develop a novel CNN architecture in this paper. The proposed CNN architecture takes multi-scale time-frequency information into consideration, which transfers more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. The experiments are evaluated on benchmark datasets including GTZAN, Ballroom, and Extended Ballroom. The experimental results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2% respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the model trained with our proposed network is tiny, only 0.18M, so it can be deployed on mobile phones or other devices with limited computational resources. Code and model will be available at https://github.com/CaifengLiu/music-genre-classification.

7.
8.
A model of a digital image as a sequence of its mappings to a sum of invariant and variable representations is proposed. The invariant representation is independent of the changes in the variable representation within certain calculated limits and is interpreted as a virtual carrier of the variable representation (message). The information quantity (natural, intrinsic, and noise) is interpreted sensu stricto with respect to the quantity of the arbitrary digital data of the message transferred by the image. A characteristic application involves the solution of the problem of embedding digital information into the image via two independent communication channels. Mikhail V. Kharinov. Born 1953. Graduated from Leningrad State University in 1978. Received candidate’s degree in 1993. Senior researcher at the St. Petersburg Institute of Information Science and Automation, Russian Academy of Sciences. Scientific interests: analysis of numerical information, system of numerical representation, hierarchical data structures, idempotent transformations, unified algorithms for processing of images and audio signals, color transformation of images. Author of 60 papers, including a patent.

9.
Abstract

In order to determine whether videophones are appropriate communication tools for psychometric assessments, we need to determine whether the quality of videophones is adequate to enable this type of assessment or whether it places a burden on the communication. The purpose of this study is to measure the subjective quality of video and audio features of commercially available videophones in the context of a psychometric assessment session. We recruited 52 subjects who used the videophone to participate in a psychometric assessment using the Perceived Stress Scale. After each session, participants filled out the ITU-T P.920 that assesses the context-specific quality of the video-call. Findings indicate that the overall audio and image quality of the video-call was satisfactory and participants perceived the videophones as useful in the context of psychometric assessment. These findings strengthen the call for use of video mediated communication in home and hospice settings and disease management.

10.

Communication is the exchange of information (audio, video, text and image) from one end (transmitter) to another end (receiver). When video data are compressed before transmission, compression reduces the bandwidth and memory required to transmit the video. Some traditional techniques are used in video transmission, but they have drawbacks such as long compression times and low quality due to compression. To overcome these drawbacks, the MPEG7-MBBMC (Modified Block Based Motion Compensated) technique is developed. Here the input video signals are collected from the dataset and split into three bands. The Discrete Wavelet Transform (DWT) is applied to each band, followed by quantization. The DWT and quantization processes are applied in the MPEG7 compression, which offers high compression factors. Next, an encoder converts the packets into small packets using the modified block based motion compensated (MBBMC) technique. Motion compensation establishes a correspondence between elements of nearby images in the video sequence. Forward Error Correction (FEC) is used to reduce the distortion in the encoded video packets. Then Channel Pattern Integration (CPI) is applied to find the best available channel, over which the encoded video packets are transmitted. On the receiver side, the error correction code is applied to decode the video packets, and the decoded packets are reconstructed by decompression. The technique improves the quality of the video and may support further development in the field of multimedia.

11.
ABSTRACT

Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton's method for solving finite-sum optimization problems in which the number of variables and data points are both large. We study two forms of sketching that perform dimensionality reduction in data space: Hessian subsampling and randomized Hadamard transformations. Each has its own advantages, and their relative tradeoffs have not been investigated in the optimization literature. Our study focuses on practical versions of the two methods in which the resulting linear systems of equations are solved approximately, at every iteration, using an iterative solver. The advantages of using the conjugate gradient method vs. a stochastic gradient iteration are revealed through a set of numerical experiments, and a complexity analysis of the Hessian subsampling method is presented.
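
To make the Hessian subsampling idea in this abstract concrete, here is a minimal pure-Python sketch of a subsampled Newton iteration whose linear system is solved approximately by a few conjugate gradient (CG) steps. The least-squares objective, the unit step length, and all function names are assumptions chosen for illustration; the paper's actual problems, line search, and sample sizes are not given in the abstract.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(X, y, w):
    """Finite-sum objective f(w) = (1/2n) * sum_i (x_i . w - y_i)^2."""
    return sum((dot(xi, w) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * len(X))


def grad(X, y, w):
    """Full gradient of the objective above."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        r = dot(xi, w) - yi
        for j in range(len(w)):
            g[j] += r * xi[j] / len(X)
    return g


def hess_vec(X_sub, v):
    """Subsampled Hessian (1/|S|) * sum_{i in S} x_i x_i^T applied to v."""
    out = [0.0] * len(v)
    for xi in X_sub:
        c = dot(xi, v) / len(X_sub)
        for j in range(len(v)):
            out[j] += c * xi[j]
    return out


def cg(matvec, b, iters):
    """Run at most `iters` CG iterations on matvec(x) = b, starting from x = 0."""
    x, r, p = [0.0] * len(b), list(b), list(b)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < 1e-30:  # residual is already negligible
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def subsampled_newton(X, y, sample_size, steps, cg_iters, seed=0):
    """Newton-like iteration: full gradient, Hessian from a fresh random subsample."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = grad(X, y, w)
        X_sub = [X[i] for i in rng.sample(range(len(X)), sample_size)]
        step = cg(lambda v: hess_vec(X_sub, v), g, cg_iters)
        w = [wi - si for wi, si in zip(w, step)]
    return w
```

The Hessian is never formed explicitly: CG only needs Hessian-vector products, so each inner iteration costs one pass over the subsample rather than over the full data set.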

12.

The fast development of communication and technology has created new challenges for transferring data securely. The techniques widely used to secure data are cryptography and steganography. This paper presents a video steganography method to secure the information to be transmitted, which can be an image, audio, text or video. The technique embeds data in the spatial domain of the cover video frame. The method employs chaotic maps to generate Random Positions (RP) at which to hide the information bits, random numbers for selecting the frames in which the information is to be hidden, and a confusion order to encrypt the cover frame. A video frame is first selected by Frame Selection (FS) and encrypted by applying the Confusion Order (CO); embedding is then carried out at the random positions generated. After embedding, the decrypted cover frame is replaced in the video sequence for transmission. This method provides three-level security in extracting the hidden secret information and an embedding capacity of 25%. Experimental outcomes (PSNR and payload) confirm that the method is competent.
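
The idea of deriving hiding positions from a chaotic map can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the chaotic map, so the logistic map is assumed; the key values `x0` and `r`, the least-significant-bit replacement, and all function names are hypothetical; frame selection and the confusion-order encryption are omitted.

```python
def logistic_sequence(x0, r, count, burn_in=100):
    """Iterate the logistic map x -> r * x * (1 - x) and return `count` values."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1 - x)
    seq = []
    for _ in range(count):
        x = r * x * (1 - x)
        seq.append(x)
    return seq


def random_positions(n_pixels, n_bits, x0=0.3141, r=3.99):
    """Pick n_bits distinct pixel indices by ranking a chaotic sequence."""
    seq = logistic_sequence(x0, r, n_pixels)
    order = sorted(range(n_pixels), key=lambda i: seq[i])
    return order[:n_bits]


def embed(pixels, bits, x0=0.3141, r=3.99):
    """Write each message bit into the LSB of a key-dependent pixel."""
    out = list(pixels)
    for pos, bit in zip(random_positions(len(pixels), len(bits), x0, r), bits):
        out[pos] = (out[pos] & ~1) | bit
    return out


def extract(pixels, n_bits, x0=0.3141, r=3.99):
    """Read the message bits back using the same key (x0, r)."""
    return [pixels[pos] & 1 for pos in random_positions(len(pixels), n_bits, x0, r)]
```

Ranking the chaotic values, rather than scaling them to indices directly, guarantees that no position is used twice, so later bits never overwrite earlier ones.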

13.

The Secret Sharing Scheme plays a vital role in cryptography, allowing secret digital information (image, video, audio, handwriting, etc.) to be transmitted over a communication channel. This cryptographic technique involves encrypting the secret images into noisy shares, which are then transmitted. The transmitted image shares are reconstructed using simple logical computation. In this paper, we propose a secure (n, n) Multi-Secret-Sharing (MSS) scheme using an image scrambling algorithm based on a logistic chaotic sequence. The sequence is generated from a secret key retrieved from a geometric pattern, called a spirograph, that the users draw with their private values. The scheme also decomposes and recombines image pixels to change both the positions and the values of the pixels. Experimental results on the standard metrics NPCR, UACI, entropy, and correlation coefficient demonstrate the robustness of the implemented algorithm.
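
One common way to realise an (n, n) sharing whose reconstruction needs only "simple logical computation" is XOR-based sharing, sketched below. This is an assumed textbook construction, not the scheme from the paper: the spirograph-derived key and the chaotic scrambling are omitted, and the function names are hypothetical.

```python
import secrets


def make_shares(secret: bytes, n: int):
    """Split `secret` into n noisy shares; all n are needed to recover it."""
    # n - 1 shares are uniformly random bytes of the same length
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # the last share is the secret XORed with every random share
    last = bytearray(secret)
    for share in shares:
        for i, b in enumerate(share):
            last[i] ^= b
    shares.append(bytes(last))
    return shares


def reconstruct(shares):
    """XOR all shares together to recover the secret."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, b in enumerate(share):
            out[i] ^= b
    return bytes(out)
```

For an image, `secret` would be the raw pixel bytes, and each share would look like random noise on its own.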

14.
ABSTRACT

Medical image watermarking has been widely recognized as a relevant technique for enhancing data security, image fidelity, authenticity and content verification in the current e-health environment where medical images are stored, retrieved and transmitted over networks. Medical image watermarking preserves image quality that is mandatory for medical diagnosis and treatment. The present paper highlights essential needs of medical image watermarking with a review of developments since 2000 and simulated experiments to demonstrate the significance of watermarking in medical information management.

15.

Authenticating the veracity and integrity of digital media content is the most important application of the fragile watermarking technique. Recently, fragile watermarking schemes for digital audio signals have been developed not only to detect malicious falsification but also to recover the tampered audio content. However, they are fragile against the synchronization counterfeiting attack, which greatly narrows the applicability of audio watermarking schemes. In this paper, we propose a novel source coding scheme for authenticating audio signals, based on set partitioning in hierarchical trees (SPIHT) encoding and a chaotic dynamical system, with the capability of self-recovery and resistance to the synchronization counterfeiting attack. For self-recovery, a compressed version of the audio signal, generated by SPIHT source coding and protected against malicious tampering by repeated coding, is embedded into the original audio signal. For robustness against the synchronization counterfeiting attack, check bits are generated from the position and content of each audio section by a hash algorithm and a chaotic sequence, and taken as part of the fragile watermark. Simulation results show that the self-embedding audio authentication scheme recovers tampered content with proper audio quality and resists the synchronization counterfeiting attack.

16.
ABSTRACT

In modern electronic communication, ensuring security during data transfer is crucial. Digital steganography is a technique that provides this security by hiding secret data in cover media such as image, audio or video files. Resistance to removal and invisibility of hidden data are the two important requirements of any steganographic system. This paper proposes a new approach to hide any secret data in images. This paper also proposes a new approach that hides an executable file in images. The executable files we have considered are Windows PE (Portable-Executable) files. The experimental results show that the proposed approach is suitable for hiding all types of digital files, including exe files. Compared with existing approaches, this approach also shows better performance.

17.

We describe an artificial high-level vision system for the symbolic interpretation of data coming from a video camera that acquires the image sequences of moving scenes. The system is based on ARSOM neural networks that learn to generate the perception-grounded predicates obtained by image sequences. The ARSOM neural networks also provide a three-dimensional estimation of the movements of the relevant objects in the scene. The vision system has been employed in two scenarios: the monitoring of a robotic arm suitable for space operations, and the surveillance of an electronic data processing (EDP) center.

18.
ABSTRACT

Embedding a hidden stream of bits in a cover file to prevent illegal use is called digital watermarking. The cover file could be a text, image, video, or audio. In this study, we propose invisible watermarking based on the text included in a webpage. Watermarks are based on predefined structural and syntactic rules, which are encrypted and then converted into zero-width control characters using binary model classification before embedding into a webpage. This concept means that HTML (Hyper Text Markup Language) is used as a cover file to embed the hashed and transparent zero-width watermarks. We have implemented the proposed invisible watermarking against various attacks to reach optimum robustness.
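
The conversion of a watermark into zero-width characters can be illustrated with a small sketch. The bit-to-character mapping (U+200B for 0, U+200C for 1), the placement after the first word, and the function names are assumptions made for this example; the encryption, hashing, and rule-based steps mentioned in the abstract are left out.

```python
ZERO = "\u200b"  # zero-width space, assumed to encode bit 0
ONE = "\u200c"   # zero-width non-joiner, assumed to encode bit 1


def to_zero_width(payload: bytes) -> str:
    """Encode each payload bit as one invisible character."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(ONE if b == "1" else ZERO for b in bits)


def embed_watermark(text: str, payload: bytes) -> str:
    """Insert the invisible watermark after the first word of the text."""
    head, sep, tail = text.partition(" ")
    return head + to_zero_width(payload) + sep + tail


def extract_watermark(text: str) -> bytes:
    """Collect the zero-width characters and decode them back to bytes."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def strip_watermark(text: str) -> str:
    """Remove the watermark characters, recovering the visible text."""
    return text.replace(ZERO, "").replace(ONE, "")
```

A browser renders the marked string exactly like the original, but a copy of the page source still carries the payload until the zero-width characters are stripped.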

19.
Because of media digitization, a large amount of information such as speech, audio and video data is produced every day. In order to retrieve data from these databases quickly and precisely, multimedia technologies for structuring and retrieving speech, audio and video data are strongly required. In this paper, we give an overview of the multimedia technologies existing today for TV news databases, such as structuring and retrieval of speech, audio and video data, speaker indexing, audio summarization and cross-media retrieval. The main purpose of structuring is to produce tables of contents and indices from audio and video data automatically. In order to make these technologies feasible, first, processing units such as words in audio data and shots in video data are extracted. In a second step, they are meaningfully integrated into topics. Furthermore, the units extracted from different types of media are integrated for higher functions. Yasuo Ariki, Ph.D.: He is a Professor in the Department of Electronics and Informatics at Ryukoku University. He received his B.E., M.E. and Ph.D. in information science from Kyoto University in 1974, 1976 and 1979, respectively. He was an Assistant at Kyoto University from 1980 to 1990, and stayed at Edinburgh University as a visiting academic from 1987 to 1990. His research interests are in speech and image recognition and in information retrieval and databases. He is a member of IPSJ, IEICE, ASJ, Soc. Artif. Intel. and IEEE.

20.
As the amount of multimedia data is increasing day by day thanks to cheaper storage devices and an increasing number of information sources, machine learning algorithms are faced with large datasets. When the original data is huge, small sample sizes are preferred for various applications. This is typically the case for multimedia applications. But using a simple random sample may not obtain satisfactory results because such a sample may not adequately represent the entire data set due to random fluctuations in the sampling process. The difficulty is particularly apparent when small sample sizes are needed. Fortunately, the use of a good sampling set for training can improve the final results significantly. In KDD’03 we proposed EASE, which outputs a sample based on its ‘closeness’ to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitable initial large sample and iteratively halves. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count data set. EASIER, in addition, is shown to work on continuous data of images and audio features. We have successfully applied EASIER to image classification and audio event identification applications. Experimental results show that EASIER outperforms SRS significantly. Surong Wang received the B.E. and M.E. degrees from the School of Information Engineering, University of Science and Technology Beijing, China, in 1999 and 2002 respectively. She is currently studying toward the Ph.D. degree at the School of Computer Engineering, Nanyang Technological University, Singapore. Her research interests include multimedia data processing, image processing and content-based image retrieval. Manoranjan Dash obtained Ph.D. and M. Sc. (Computer Science) degrees from School of Computing, National University of Singapore. He has worked in academic and research institutes extensively and has published more than 30 research papers (mostly refereed) in various reputable machine learning and data mining journals, conference proceedings, and books. His research interests include machine learning and data mining, and their applications in bioinformatics, image processing, and GPU programming. Before joining School of Computer Engineering (SCE), Nanyang Technological University, Singapore, as Assistant Professor, he worked as a postdoctoral fellow in Northwestern University. He is a member of IEEE and ACM. He has served as program committee member of many conferences and he is in the editorial board of “International journal of Theoretical and Applied Computer Science.” Liang-Tien Chia received the B.S. and Ph.D. degrees from Loughborough University, in 1990 and 1994, respectively. He is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. He has recently been appointed as Head, Division of Computer Communications and he also holds the position of Director, Centre for Multimedia and Network Technology. His research interests include image/video processing & coding, multimodal data fusion, multimedia adaptation/transmission and multimedia over the Semantic Web. He has published over 80 research papers.
