Nuit Blanche pointed me toward the interesting paper “Re-Weighted L1 Dynamic Filtering for Time-Varying Sparse Signal Estimation” by Charles and Rozell (2012). They investigate the generalized Kalman filter situation which is a hidden Markov model with an infinite number of states and real valued observations. More specifically
$$x_n = f_n (x_{n-1}) + v_n$$
$$y_n = G_n x_{n-1} + \epsilon_n$$
where $x_n$ is the state, $f_n$ is the known state transition function, $y_n$ is a real valued observation vector, $G_n$ is the known observation matrix (full rank), $v_n$ is the process vector noise (not necessarily Gaussian), and $\epsilon_n$ is the observation noise vector (once again, not necessarily Gaussian). The object of the game is to estimate $x_n$ given the transition function, the observation matrices, and the observations $y_n$.
They sparsely represent the state vector $x_n$ is by $x_n = W z_n$ where $W$ is a matrix and $z_n$ is a vector that is mostly zeros. In order to achieve a sparse representation, a L1 norm regularization term is applied. This regularization usually corresponds to a Laplacian prior as in Bias pursuit denoising or LASSO regression. As with neural nets, sparsity has had an impact in many fields. They write
In particular, outside of the dynamic estimation regime, signal priors based on the notion of sparsity have led to state-of-the-art performance in many linear inverse problems (e.g., undersampling, inpainting, denoising, compressed sensing, etc. [3]) across several different application domains (e.g. natural images [3], audio [4], hyperspectral imagery [5], etc.).
The previous work integrating sparsity models into dynamic estimation has not yet followed the spirit
of the successful Kalman filter by propagating higher-order statistics from one time step to the next in a
way that is well-matched to the assumptions of the problem.As is pointed out in [8], propagating a covariance matrix is no longer necessarily the correct or optimal method once the
assumptions of the Kalman filter are not met. Instead, such a matrix is no more special than a simple parameter matrix, which
may or may not assist in the signal estimation.”
The paper includes several experimental results in section IV.