Generative adversarial networks (GANs) have received increasing attention for end-to-end speech enhancement in recent years, and various GAN-based methods have been proposed to improve the quality of reconstructed speech. However, these GAN-based methods still underperform masking-based methods. To tackle this problem, we propose a speech enhancement method based on a residual dense generative adversarial network (RDGAN), which maps the log-power spectrum (LPS) of degraded speech to that of clean speech. Specifically, a residual dense block (RDB) architecture is designed to better estimate the LPS of clean speech; it extracts rich local LPS features through densely connected convolution layers. Meanwhile, sequential RDB connections are incorporated at multiple scales of the LPS, which significantly increases the flexibility and robustness of feature learning in the time-frequency domain. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. In particular, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratios (SNRs) and unmatched noise categories, RDGAN still outperforms existing GAN-based methods and a masking-based method in PESQ and other evaluation metrics, indicating that our method generalizes better to untrained conditions.
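The dense connectivity inside an RDB can be illustrated with a small channel-bookkeeping sketch (pure Python; the layer count, input width, and growth rate below are hypothetical, not the authors' configuration): each conv layer receives the concatenation of the block input and all preceding layers' outputs, and a final 1x1 fusion layer projects back to the input width before the residual addition.

```python
# Sketch of the channel bookkeeping inside a residual dense block (RDB).
# Hypothetical sizes: c0 input channels, growth rate g, n_layers conv layers.

def rdb_channel_trace(c0, g, n_layers):
    """Return the input-channel count seen by each conv layer in the RDB.

    Layer i receives the block input concatenated with the g-channel
    outputs of all i preceding layers, i.e. c0 + i * g channels.
    """
    return [c0 + i * g for i in range(n_layers)]

def rdb_fusion_input_channels(c0, g, n_layers):
    """Channels entering the 1x1 local-feature-fusion conv: the block input
    concatenated with every layer's g-channel output. The fusion conv then
    projects back to c0, and the residual connection adds the block input.
    """
    return c0 + n_layers * g

if __name__ == "__main__":
    print(rdb_channel_trace(64, 32, 4))         # [64, 96, 128, 160]
    print(rdb_fusion_input_channels(64, 32, 4)) # 192
```

The point of the sketch is that dense connectivity makes every layer see all earlier local features, at the cost of linearly growing input width, which is why a 1x1 fusion conv is needed before the residual shortcut.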
Most existing vision-language pre-training methods focus
on understanding tasks and use BERT-like loss functions (masked language
modeling and image-text matching) during pre-training. Despite their good
performance on understanding-oriented downstream tasks, such as visual
question answering, image-text retrieval, and visual entailment, these
methods lack the ability to generate text. To tackle this problem, this study
proposes Unified multimodal pre-training for Vision-Language understanding
and generation (UniVL). The proposed UniVL is capable of handling both
understanding tasks and generation tasks. It extends existing pre-training
paradigms by using random masks and causal masks simultaneously, where
causal masks are triangular masks that hide future tokens, giving the
pre-trained model autoregressive generation ability. Moreover, several
vision-language understanding tasks are reformulated as text generation
tasks, and a prompt-based method is employed for fine-tuning on different
downstream tasks. The experiments show that
there is a trade-off between understanding tasks and generation tasks when
the same model is used, and a feasible way to improve both tasks is to use
more data. The proposed UniVL framework attains comparable performance to
recent vision-language pre-training methods in both understanding tasks and
generation tasks. Moreover, the prompt-based generation method is more
effective and even outperforms discriminative methods in few-shot scenarios.
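The triangular causal mask described above can be sketched in a few lines (pure Python, illustrative only; this is the standard construction, not UniVL's actual code): query position i may attend only to key positions 0..i, so all future tokens are hidden during autoregressive generation.

```python
def causal_mask(n):
    """Boolean attention mask for a sequence of length n.

    mask[i][j] is True when query position i is allowed to attend to
    key position j, i.e. only current and past tokens (j <= i).
    A random (bidirectional) mask would instead permit all positions,
    which is what BERT-style masked language modeling uses.
    """
    return [[j <= i for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("x" if ok else "." for ok in row))
```

Applying this mask inside self-attention (typically by setting disallowed logits to -inf before the softmax) is what lets the same Transformer be trained with both objectives at once.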
This paper proposes a sequential design scheme for switching ℌ∞ LPV (Linear Parameter-Varying) control, aiming to reduce the computational complexity of the associated optimization problem. Unlike the traditional approach, which designs all switching LPV controllers simultaneously and solves a single high-dimensional optimization problem, the proposed sequential approach yields a set of low-dimensional optimization problems that are solved iteratively. An individual ℌ∞ LPV controller for each subregion is synthesized via independent PLMIs (Parametric Linear Matrix Inequalities) to guarantee ℌ∞ performance, and controller variables are interpolated on the overlapped subregions so that ℌ∞ performance is also guaranteed there. Numerical examples demonstrate the effectiveness of the method in reducing the computational load of each design iteration, and show improved ℌ∞ performance over the conventional simultaneous design, given a well-tuned interpolation coefficient.
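The interpolation step on an overlapped subregion can be sketched as follows (pure Python; the scalar gains and the linear blending coefficient are illustrative assumptions, whereas the paper interpolates full controller variables under PLMI certificates): as the scheduling parameter θ crosses the overlap [a, b], the applied controller blends continuously from the first subregion's controller K1 to the second's K2, avoiding a hard switch.

```python
def blend_coefficient(theta, a, b):
    """Linear interpolation coefficient alpha(theta) on the overlap [a, b]:
    alpha = 1 for theta <= a (pure K1), alpha = 0 for theta >= b (pure K2).
    """
    if theta <= a:
        return 1.0
    if theta >= b:
        return 0.0
    return (b - theta) / (b - a)

def interpolated_gain(theta, a, b, K1, K2):
    """Blend the two subregion controller gains on the overlap so the
    switched controller varies continuously with the scheduling parameter.
    """
    alpha = blend_coefficient(theta, a, b)
    return alpha * K1 + (1.0 - alpha) * K2

if __name__ == "__main__":
    # Hypothetical overlap [2, 4] and scalar gains for the two subregions.
    print(interpolated_gain(1.0, 2.0, 4.0, 10.0, 20.0))  # 10.0
    print(interpolated_gain(3.0, 2.0, 4.0, 10.0, 20.0))  # 15.0
    print(interpolated_gain(5.0, 2.0, 4.0, 10.0, 20.0))  # 20.0
```

Tuning how alpha varies across the overlap is the "interpolation coefficient" choice that the numerical examples show must be made carefully to retain the ℌ∞ guarantee.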