On sound field control problems in time-domain

Published

October 7, 2024

Abstract
Don’t solve them in frequency domain.

The synthesis of room acoustic impulse responses with some desired properties using an array of loudspeakers is a typical problem in acoustics. One measures the room impulse responses from each loudspeaker over a reproduction area and designs a finite impulse-response filter for each loudspeaker, optimized to reproduce a given target system response as closely as possible. As the goal is to match the pressure field of the loudspeaker array to a given target, we will call this idea pressure matching.

The pressure-matching problem can be cast as a simple linear Least-Squares problem \(\min_x ||Ax - b||\) with an enormous \(A\) matrix. In fact, it might be so large that it won’t fit into computer memory for scenarios with many sources, many measurement locations and long impulse responses.

A very common solution to the memory issue is to solve instead a related frequency-domain pressure-matching problem and hope that its solution will be a good substitute for the original time-domain problem (Kirkeby et al. 1998). However, the two problems are not equivalent. The frequency-domain variant of the problem

  1. is not well posed and needs to be regularized,
  2. has non-causal solutions that lead to artifacts like pre- and post-ringing.

Kirkeby, O., P. A. Nelson, H. Hamada, and F. Orduna-Bustamante. 1998. “Fast Deconvolution of Multichannel Systems Using Regularization.” IEEE Transactions on Speech and Audio Processing 6 (2): 189–94. https://doi.org/10.1109/89.661479.

The engineer will typically try to reduce these issues with modeling delays, windowing of the responses, and tuning of the regularization parameter. This often degenerates into a hopeless process of manual fiddling to get the solution right.

This note describes a simple, exact, and principled way of solving the pressure-matching problem in the time-domain, using the tool of applied mathematics that is made for handling large matrices: iterative, matrix-free solvers.

Pressure-matching

We first formalize the problem in discrete time.

Let there be \(N\) sound sources, \(M\) measurement locations and a finite impulse-response input-filter \(w_j\) for each sound source. Let \(k\) be the current sample, \(x\) the input signal and \(y_{ij}\) the output of source \(j\) at location \(i\). The linear system \(x \mapsto y_{ij}\) is given by the convolution with \(w_j\) and \(h_{ij},\) \[ \require{physics} y_{ij}[k] = (h_{ij} * w_j * x)[k] =: (p_{ij} * x)[k] \] The total system output at a single location is the sum of the pressure fields generated by each loudspeaker, \[ y_i[k] = \sum_j y_{ij}[k] = \left(\left[\sum_{j=0}^{N-1} h_{ij} * w_j \right] * x\right)[k] = \sum_j (p_{ij} * x)[k] =: (p_i * x)[k] . \] As the systems are linear, this holds for any signal \(x,\) and it is more convenient to work only with the impulse responses \[ p_i[k] = \sum_{j=0}^{N-1} (h_{ij} * w_j)[k]. \] The goal of pressure matching is to find a suitable set \(\{w_j\}\) such that the system responses \(\{p_i\}\) are close to some target system responses \(\{\hat p_i\}\).
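
As a quick numerical illustration (with arbitrary made-up signals; not part of the derivation), this identity can be checked with a few lines of NumPy:

import numpy as np

# Filtering the input through every source path and summing is the same as
# filtering once with the combined response p_i = sum_j h_ij * w_j.
rng = np.random.default_rng(0)
n, N = 8, 3
x = rng.standard_normal(16)            # arbitrary input signal
h_i = rng.standard_normal((n, N))      # h_ij for one location i
w = rng.standard_normal((n, N))        # one FIR filter w_j per source

y_i = sum(np.convolve(np.convolve(h_i[:, j], w[:, j]), x) for j in range(N))
p_i = sum(np.convolve(h_i[:, j], w[:, j]) for j in range(N))
assert np.allclose(y_i, np.convolve(p_i, x))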

If \(h_{ij}\) and \(w_j\) are nonzero only over some window of length \(n\), they can be seen as \(n\)-dimensional vectors, or finite impulse-response filters of length \(n\). A convolution with \(h_{ij}\) then implements a linear map via a Toeplitz matrix \(H_{ij}\in \mathbb R^{(2n-1) \times n}\) where each column is a shifted copy of the convolutional kernel \(h_{ij}\). With that, we can write the system response at location \(i\) as \[ \underbrace{\mqty*(p_i[0] \\ p_i[1] \\ \vdots \\ p_i[2n-2])}_{p_i} = \sum_j \underbrace{\mqty*( h_{ij}[0] & 0 & \cdots & 0 \\ h_{ij}[1] & h_{ij}[0] & \ddots & \vdots \\ \vdots & h_{ij}[1] & \ddots & 0 \\ h_{ij}[n-1] & \vdots & \ddots & h_{ij}[0] \\ 0 & h_{ij}[n-1] & & h_{ij}[1] \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_{ij}[n-1] )}_{H_{i,j}} \underbrace{\mqty*( w_j[0] \\ w_j[1] \\ \vdots \\ w_j[n-1] )}_{w_j}. \] The pressure-matching problem for all measurement locations in block-matrix form is then \[ \mqty*(p_0 \\ p_1 \\ \vdots \\ p_{M-1}) = \mqty*( H_{0,0} & H_{0,1} & \cdots & H_{0,N-1} \\ H_{1,0} & H_{1,1} & \cdots & H_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ H_{M-1,0} & H_{M-1,1} & \cdots & H_{M-1,N-1} \\ ) \mqty*( w_0 \\ w_1 \\ \vdots \\ w_{N-1} ) \] or simply \(p = Hw\).
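
As a small sanity check (illustration only; it assumes a SciPy version that provides scipy.linalg.convolution_matrix), the product with such a Toeplitz matrix indeed reproduces the full linear convolution:

import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(0)
n = 6
h_ij = rng.standard_normal(n)          # arbitrary impulse response
w_j = rng.standard_normal(n)           # arbitrary FIR filter

H_ij = convolution_matrix(h_ij, n, mode="full")  # shape (2n-1, n)
assert np.allclose(H_ij @ w_j, np.convolve(h_ij, w_j, mode="full"))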

To find the filters \(w\) that result in the best approximation of the target response \(\hat p\) we solve the (possibly regularized) linear Least-Squares problem \[ \min_w ||\hat p - Hw||^2. \qquad(1)\]

The above \(H\) matrix of size \((2n-1)M \times nN\) can be enormous. For example, let’s say there are \(N=20\) sources, \(M=60\) measurement locations and impulse responses of length \(n=800\) (50 ms at 16 kHz samplerate) encoded as 32 bit floats. Then almost 6 GB of memory are needed to store \(H\).

n = 50e-3 * 16e3; N = 20; M = 60
print(f"size(H) = {(2*n-1)*M * n*N * 4 / 1024 / 1024 / 1024:.1f} GB")
size(H) = 5.7 GB

The size even scales as \(\mathcal O(n^2),\) requiring roughly 23 GB of memory for 100 ms long impulse responses.

Solution via matrix-free solver

To solve Equation 1 on machines with limited memory, we use solvers that don’t require the materialization of \(H.\) Instead, algorithms like LSQR and LSMR only require subroutines for computing the matrix-vector product \(Hw\) and the transposed matrix-vector product \(H^T p\). This is why such algorithms are called matrix-free.

Matrix-vector multiplication with \(H\) is straightforward and just involves computing the convolutions \(H_{ij} v_j = h_{ij} * v_j\) and summing over all sources. In Python, this operation is implemented by np.convolve(h_ij, v_j, mode="full").

Multiplication with \(H^T\) (in this context called the adjoint operation) on the other hand involves products with the matrices \[ H_{ij}^T = \mqty*( h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] & 0 & \cdots & 0 \\ 0 & h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] \\ ) \in \mathbb R^{n \times (2n-1)}. \] Looking closely, \(H_{ij}^T v_i\) contains the values of the discrete-time cross-correlation of \(v_i\) and \(h_{ij}\) where the two input signals overlap completely. In Python, np.correlate(v_i, h_ij, mode="valid") computes this operation.
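
Again a small sanity check (illustration only, with arbitrary test vectors): the transposed Toeplitz block applied to a vector of length \(2n-1\) equals the valid part of the cross-correlation:

import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(1)
n = 6
h_ij = rng.standard_normal(n)
v_i = rng.standard_normal(2 * n - 1)   # e.g. one block of a residual vector

H_ij = convolution_matrix(h_ij, n, mode="full")
assert np.allclose(H_ij.T @ v_i, np.correlate(v_i, h_ij, mode="valid"))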

Here is an implementation of such a convolutional operator that is created from an \(n\times M \times N\) array that holds the measured impulse responses \(h_{ij}\). To use it with the SciPy solvers, we implement it as a subclass of LinearOperator.

from functools import reduce

import numpy as np
from scipy.sparse.linalg import LinearOperator


class ConvolutionOperator(LinearOperator):
    """Matrix-free representation of the block matrix H, built from the
    impulse responses A with shape (samples, locations, sources)."""

    def __init__(self, A, is_adjoint=False):
        self.dtype = A.dtype
        self.A = A
        self.is_adjoint = is_adjoint

        if is_adjoint:
            # Here A has its location and source axes swapped, see _adjoint()
            t, m, n = A.shape
            self.shape = (m * t, (t * 2 - 1) * n)  # n*N rows, (2n-1)*M columns
        else:
            t, n, m = A.shape
            self.shape = ((t * 2 - 1) * n, m * t)  # (2n-1)*M rows, n*N columns

    def _matvec(self, v):
        _, n, m = self.A.shape

        # Make sure _matvec accepts arrays of shape (_, 1) and (_,)
        v = v.squeeze()

        # Split the stacked vector
        vs = np.split(v, m)

        if self.is_adjoint:
            op = lambda ATij, vj: np.correlate(vj, ATij, mode="valid")
        else:
            op = lambda Aij, vj: np.convolve(Aij, vj, mode="full")

        ys = [
            reduce(np.add, (op(self.A[:, i, j], vs[j]) for j in range(m)))
            for i in range(n)
        ]

        return np.concatenate(ys)

    def _adjoint(self):
        AT = np.transpose(self.A, (0, 2, 1))
        return ConvolutionOperator(AT, is_adjoint=not self.is_adjoint)
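
Before plugging the operator into a solver, it is worth checking it on a toy problem. The following test (illustrative only; the dimensions are made up and small enough that the dense matrix easily fits into memory) compares the operator against an explicitly materialized block matrix and verifies the adjoint identity \(\langle Hv, u\rangle = \langle v, H^T u\rangle\):

from scipy.linalg import convolution_matrix

rng = np.random.default_rng(0)
n, M, N = 16, 3, 2
irs = rng.standard_normal((n, M, N))

H = ConvolutionOperator(irs)
H_dense = np.block([
    [convolution_matrix(irs[:, i, j], n, mode="full") for j in range(N)]
    for i in range(M)
])

v = rng.standard_normal(H.shape[1])
u = rng.standard_normal(H.shape[0])
assert np.allclose(H.matvec(v), H_dense @ v)
assert np.allclose(H.rmatvec(u), H_dense.T @ u)
assert np.isclose(np.dot(H.matvec(v), u), np.dot(v, H.rmatvec(u)))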

The following function solves the pressure-matching problem in Equation 1 using lsmr.

from scipy.sparse.linalg import lsmr


def pressure_matching_time_domain(irs, target, reg=0, **kwargs):
    """Compute FIR filters via pressure matching in time-domain.

    Solves the following regularized least-squares problem:

        min_w  ||p - H*w||² + reg*||w||²

    where H_ij is the Toeplitz matrix that represents the impulse response
    from source j to location i, the vector p_i is the target response at
    location i, and the vector w_j is the FIR filter for source j.

    Parameters
    ----------
    irs : np.ndarray[shape=(n, M, N), dtype=float]
        Impulse responses such that `H_ij = irs[:, i, j]`
    target : np.ndarray[shape=(n, M), dtype=float]
        Target field such that `p_i = target[:, i]`
    reg : float
        Regularization parameter

    Returns
    -------
    coeffs : np.ndarray[shape=(n, N), dtype=float]
        FIR filter coefficients such that `w_j = coeffs[:, j]`

    """
    n, M, N = irs.shape
    assert target.shape == (n, M), "Check your dimensions"
    # Zeropad target responses to length 2n-1
    pi = np.concatenate((target, np.zeros((n - 1, M))))
    # Stack along all microphones
    p = np.concatenate(pi.T, axis=0)
    H = ConvolutionOperator(irs)
    # lsmr minimizes ||p - H*w||² + damp²*||w||², hence damp = sqrt(reg)
    w, istop, *_ = lsmr(H, p, damp=np.sqrt(reg), **kwargs)
    coeffs = w.reshape(N, -1).T
    return coeffs
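
A minimal usage sketch (the scene, target delay, and parameters below are made up for illustration):

rng = np.random.default_rng(0)
n, M, N = 128, 4, 3
irs = rng.standard_normal((n, M, N))

# Target: a unit impulse, delayed by 32 samples, at every location
target = np.zeros((n, M))
target[32, :] = 1.0

coeffs = pressure_matching_time_domain(irs, target, reg=1e-3, maxiter=1000)
print(coeffs.shape)  # (128, 3), i.e. one length-n FIR filter per source

# Achieved system response at location 0 (length 2n-1)
p_0 = sum(np.convolve(irs[:, 0, j], coeffs[:, j]) for j in range(N))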

Solution to the frequency-domain variant

For comparison, here is a function that solves the related problem in frequency-domain:

def pressure_matching_frequency_domain(irs, target, reg=0):
    """Compute FIR filters via pressure matching in frequency-domain.

    Solves the regularized least-squares problem for each frequency:

        min_{w(f)}  ||p(f) - H(f)*w(f)||² + reg*||w(f)||²

    where, at each frequency, H_ij is the complex gain of the transfer
    function from source j to location i, p_i is the complex gain of
    the target field at location i, and w_j is the complex gain of the
    FIR filter of source j.

    Parameters
    ----------
    irs : np.ndarray[shape=(n, M, N), dtype=float]
        Impulse responses such that `H_ij = irs[:, i, j]`
    target : np.ndarray[shape=(n, M), dtype=float]
        Target field at microphones such that `p_i = target[:, i]`
    reg : float
        Regularization parameter

    Returns
    -------
    coeffs : np.ndarray[shape=(2*n-1, N), dtype=float]
        FIR filter coefficients such that `w_j = coeffs[:, j]`

    """
    n, M, N = irs.shape
    assert target.shape == (n, M), "Check your dimensions"

    # Transform to frequency-domain and zero-pad
    H = np.fft.rfft(irs, n=2 * n - 1, axis=0)
    p = np.fft.rfft(target, n=2 * n - 1, axis=0)

    # Solve equivalent unregularized Least-Squares for each frequency
    #
    # min_w ||[[H            ],         [[p],  ||^2
    #       || [sqrt(reg) * I]]  *  w -  [0]]  ||
    #
    nf = H.shape[0]
    w = np.zeros((nf, N), dtype=complex)
    sqrt_reg_times_I = np.sqrt(reg) * np.identity(N)
    zero = np.zeros(N)
    for f in range(nf):
        A = np.concatenate((H[f], sqrt_reg_times_I))
        b = np.concatenate((p[f], zero))
        w[f], *_ = np.linalg.lstsq(A, b, rcond=None)

    # Back to time-domain
    return np.fft.irfft(w, n=2 * n - 1, axis=0)
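
To compare the two approaches on the same synthetic scene (again purely illustrative, reusing irs and target from the sketch above), note that the frequency-domain variant returns filters of length \(2n-1\) that stem from its circular-convolution formulation:

w_td = pressure_matching_time_domain(irs, target, reg=1e-3)
w_fd = pressure_matching_frequency_domain(irs, target, reg=1e-3)
print(w_td.shape)  # (128, 3): FIR filters of length n
print(w_fd.shape)  # (255, 3): filters of length 2n-1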