= 50e-3 * 16e3; N = 20; M = 60
n print(f"size(H) = {(2*n-1)*M*n*M * 4 / 1024 / 1024 / 1024:.1f} GB")
size(H) = 17.2 GB
October 7, 2024
The synthesis of room acoustic impulse responses with some desired properties using an array of loudspeakers is a typical problem in acoustics. One measures the room impulse responses from each loudspeaker over a reproduction area, and designs finite impulse-response filters for each loudspeaker that are optimized to create a certain target system-response as closely as possible. As the goal is to match the pressure field of the loudspeaker array with a given target, we will call idea pressure matching.
The pressure-matching problem can be cast as a simple linear Least-Squares problem \(\min_b ||Ax - b||\) with an enormous \(A\) matrix. In fact, it might be so large that it won’t fit into computer memory for scenarios with many sources, many measurement locations and long impulse responses.
A very common solution to the memory issue is to solve instead a related frequency-domain pressure-matching problem and hope that its solution will be a good subsitute for the original time-domain problem (Kirkeby et al. 1998). However, the two problems are not equivalent. The frequency-domain variant of the problem
The engineer will typically try to reduce these issues with modeling-delays, windowing of the responses, and tuning of the regularization parameter. This often leads to a helplessly manual fiddling process to get the solution right.
This note describes a simple, exact, principled solution to solving the pressure-matching problem in the time-domain using the tool of applied mathematics that is made for handling the problem of large matrices: iterative, matrix-free solvers.
We first formalize the problem in discrete time.
Let there be \(N\) sound sources, \(M\) measurement locations and a finite impulse-response input-filter \(w_j\) for each sound source. Let \(k\) be the current sample, \(x\) the input signal and \(y_{ij}\) the output of source \(j\) at location \(i\). The linear system \(x \mapsto y_{ij}\) is given by the convolution with \(w_j\) and \(h_{ij},\) \[ \require{physics} y_{ij}[k] = (h_{ij} * w_j * x)[k] =: (p_{ij} * x)[k] \] The total system output at a single location is the sum of pressure fields generated by each loudspeaker, \[ y_i[k] = \sum_j y_{ij}[k] = \left(\left[\sum_{j=0}^{N-1} h_{ij} * w_j \right] * x\right)[k] = \sum_j (p_{ij} * x)[k] =: (p_i * x)[k] . \] As the systems are linear, this holds for any signal \(x,\) and it is more convienient to only work with the impulse responses \[ p_i[k] = \sum_{j=0}^{N-1} (h_{ij} * w_j)[k]. \] The goal of pressure-matching is finding a suitable set \(\{w_j\}\) such that the system responses \(\{p_i\}\) are close to some target system responses \(\{\hat p_i\}\).
If \(h_{ij}\) and \(w_j\) are nonzero only over some window of length \(n\), they can be seen as \(n\)-dimensional vectors — or finite impulse-response filters of length \(n\). A convolution with \(h_{ij}\) then implements a linear map via a Toeplitz matrix \(H_{ij}\in \mathbb R^{(2n-1) \times n}\) where each row is a shifted copy of the convolutional kernel \(h_{ij}\). With that, we can write the system response at location \(i\) as \[ \underbrace{\mqty*(p_i[0] \\ p_i[1] \\ \vdots \\ p_i[2n-2])}_{p_i} = \sum_j \underbrace{\mqty*( h_{ij}[0] & 0 & \cdots & 0 \\ h_{ij}[1] & h_{ij}[0] & \ddots & \vdots \\ \vdots & h_{ij}[1] & \ddots & 0 \\ h_{ij}[n-1] & \vdots & \ddots & h_{ij}[0] \\ 0 & h_{ij}[n-1] & & h_{ij}[1] \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_{ij}[n-1] )}_{H_{i,j}} \underbrace{\mqty*( w_j[0] \\ w_j[1] \\ \vdots \\ w_j[n-1] )}_{w_j}. \] The pressure-matching problem for all measurement locations in block-matrix form is then \[ \mqty*(p_0 \\ p_1 \\ \vdots \\ p_{M-1}) = \mqty*( H_{0,0} & H_{0,1} & \cdots & H_{0,M-1} \\ H_{1,0} & H_{1,1} & \cdots & H_{1,M-1} \\ \vdots & \vdots & \ddots & \vdots \\ H_{N-1,0} & H_{N-1,1} & \cdots & H_{N-1,M-1} \\ ) \mqty*( w_0 \\ w_1 \\ \vdots \\ w_{N-1} ) \] or simply \(p = Hw\).
To find the filters \(w\) that result in the best approximation of the target response \(\hat p\) we solve the (possibly regularized) linear Least-Squares problem \[ \min_w ||\hat p - Hw||^2. \qquad(1)\]
The above \(H\) matrix of size \((2n-1)M \times nN\) can be enormous. For example, let’s say there are \(N=20\) sources, \(M=60\) measurement locations and impulse responses of length \(n=800\) (50 ms at 16 kHz samplerate) encoded as 32 bit floats. Then 17 GB memory are needed to store \(H\).
n = 50e-3 * 16e3; N = 20; M = 60
print(f"size(H) = {(2*n-1)*M*n*M * 4 / 1024 / 1024 / 1024:.1f} GB")
size(H) = 17.2 GB
The size even scales as \(\mathcal O(n^2),\) requiring at least 68 GB of memory for 100ms long impulse responses.
To solve Equation 1 on machines with limited memory, we use solvers that don’t require the materialization of \(H.\) Instead, algorithms like LSQR
and LSMR
only require subroutines for computing the matrix-vector products \(Hw\) and vector-matrix products \(pH^T\). This is the reason why such algorithms are called matrix-free.
Matrix-vector multiplication with \(H\) is straightforward and just involves computing the convolutions \(H_{ij} v_j = h_{ij} * v_j\) and summing over all sources. In Python, this operation is implemented by np.convolve(h_ij, v_j, mode="full")
.
Left multiplications with \(H^T\) (in this context called the adjoint operation) on the other hand involve products with matrices \[
H_{ij}^T
= \mqty*(
h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] & 0 & \cdots & 0 \\
0 & h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\
0 & \cdots & 0 & h_{ij}[0] & h_{ij}[1] & \cdots & h_{ij}[n-1] \\
) \in \mathbb R^{n \times (2n-1)}.
\] Looking closely, \(v_j H_{ij}^T\) represents the values of the discrete-time cross-correlation of \(v_j\) and \(h_{ij}\) where the two input signals overlap completely. In Python, np.correlate(v_j, h_ij, mode="valid")
computes this operation.
Here is an implementation of such a convolutional operator that is created from an \(n\times N \times M\) array that holds the measured impulse responses \(h_{ij}\). To use it with the SciPy solvers, we implement it as a sublcass of LinearOperator
from functools import reduce
import numpy as np
from scipy.sparse.linalg import LinearOperator
class ConvolutionOperator(LinearOperator):
def __init__(self, A, is_adjoint=False):
self.dtype = A.dtype
self.A = A
self.is_adjoint = is_adjoint
if is_adjoint:
t, m, n = A.shape
self.shape = (m * t, (t * 2 - 1) * n)
else:
t, n, m = A.shape
self.shape = ((t * 2 - 1) * n, m * t)
def _matvec(self, v):
_, n, m = self.A.shape
# Make sure _matvec accepts arrays of shape (_, 1) and (_,)
v = v.squeeze()
# Split the stacked vector
vs = np.split(v, m)
if self.is_adjoint:
op = lambda ATij, vj: np.correlate(vj, ATij, mode="valid")
else:
op = lambda Aij, vj: np.convolve(Aij, vj, mode="full")
ys = [
reduce(np.add, (op(self.A[:, i, j], vs[j]) for j in range(m)))
for i in range(n)
]
return np.concatenate(ys)
def _adjoint(self):
AT = np.transpose(self.A, (0, 2, 1))
return ConvolutionOperator(AT, is_adjoint=not self.is_adjoint)
The following functions solves the pressure-matching problem in Equation 1 using lsmr
.
from scipy.sparse.linalg import lsmr
def pressure_matching_time_domain(irs, target, reg=0, **kwargs):
"""Compute FIR filters via pressure matching in time-domain.
Solves the following regularized least-square problem:
min_w ||p - H*w||² + reg*||w||²
Where H_ij is the Toeplitz matrix that represents the impulse response
from source j to location i, the vectors p_i is the target responses at
location i, and the vector w_j is the FIR filter for sources j.
Parameters
----------
irs : np.ndarray[shape=(n, M, N), dtype=float]
Impulse responses such that `H_ij = irs[:, i, j]`
target : np.ndarray[shape=(n, M), dtype=float]
Target field such that `p_i = target[:, i]`
reg : float
Regularization parameter
Returns
-------
coeffs : np.ndarray[shape=(n, N), dtype=float]
FIR filter coefficients such that `w_j = coeffs[:, j]`
"""
n, M, N = irs.shape
assert target.shape == (n, M), "Check your dimensions"
# Zeropad target responses to length 2n-1
pi = np.concatenate((target, np.zeros((n - 1, M))))
# Stack along all microphones
p = np.concatenate(pi.T, axis=0)
H = ConvolutionOperator(irs)
w, istop, *_ = lsmr(H, p, damp=reg, **kwargs)
coeffs = w.reshape(N, -1).T
return coeffs
For comparison, here is a function that solves the related problem in frequency-domain:
def pressure_matching_frequency_domain(irs, target, reg=0):
"""Compute FIR filters via pressure matching in frequency-domain.
Solves the regularized least-square problem for each frequency:
min_{w(f)} ||p(f) - H(f)*w(f)||² + reg*||w(f)||²
where, at each frequency, H_ij is the complex gain of the transfer
function from source j to location i, p_i is the complex gain of
the target field at location i, and w_j is the complex gain of the
FIR filter of source j.
Parameters
----------
irs : np.ndarray[shape=(n, M, N), dtype=float]
Impulse responses such that `H_ij = irs[:, i, j]`
target : np.ndarray[shape=(n, M), dtype=float]
Target field at microphones such that `p_i = target[:, i]`
reg : float
Regularization parameter
Returns
-------
coeffs : np.ndarray[shape=(n, N), dtype=float]
FIR filter coefficients such that `w_j = coeffs[:, j]`
"""
n, _, N = irs.shape
assert target.shape == (n, M), "Check your dimensions"
# Transform to frequency-domain and zero-pad
H = np.fft.rfft(irs, n=2 * n - 1, axis=0)
p = np.fft.rfft(target, n=2 * n - 1, axis=0)
# Solve equivalent unregularized Least-Squares for each frequency
#
# min_w ||[[H ], [[p], ||^2
# || [sqrt(reg) * I]] * w - [0]] ||
#
nf = H.shape[0]
w = np.zeros((nf, N), dtype=complex)
sqrt_reg_times_I = np.sqrt(reg) * np.identity(N)
zero = np.zeros(N)
for f in range(nf):
A = np.concatenate((H[f], sqrt_reg_times_I))
b = np.concatenate((p[f], zero))
w[f], *_ = np.linalg.lstsq(A, b, rcond=None)
# Back to time-domain
return np.fft.irfft(w, n=2 * n - 1, axis=0)