Abstract: | This paper addresses the problem of unsupervised speech separation based on robust non‐negative matrix factorization (RNMF) with β‐divergence, when neither speech nor noise training data is available beforehand. We propose a robust version of non‐negative matrix factorization, inspired by the recently developed sparse and low‐rank decomposition, in which the data matrix is decomposed into the sum of a low‐rank matrix and a sparse matrix. Efficient multiplicative update rules to minimize the β‐divergence‐based cost function are derived. A convolutional extension of the proposed algorithm is also proposed, which considers the time dependency of the non‐negative noise bases. Experimental speech separation results show that the proposed convolutional RNMF successfully separates the repeating time‐varying spectral structures from the magnitude spectrum of the mixture, and does so without any prior training. |