SILO: Solving Inverse Problems with Latent Operators

tl;dr: solve inverse problems with latent diffusion models by mimicking degradations in the latent space

Example restorations of FFHQ and COCO images that have undergone masking, Gaussian blur, and down-sampling, using the RealisticVisionV5.1 and StableDiffusionV1.5 models.

Abstract

Consistent improvement of image priors over the years has led to the development of better inverse problem solvers. Diffusion models are the newcomers to this arena, offering the strongest known prior to date. Such models operating in a latent space have recently become predominant due to their efficiency, and several works have applied them to solving inverse problems. Working in the latent space typically requires multiple applications of an Autoencoder during the restoration process, which leads to both computational and restoration-quality challenges. In this work, we propose a new approach for handling inverse problems with latent diffusion models, where a learned degradation function operates within the latent space, emulating a known image-space degradation. Using the learned operator reduces the dependency on the Autoencoder to only the initial and final steps of the restoration process, enabling faster sampling and superior restoration quality. We demonstrate the effectiveness of our method on a variety of image restoration tasks and datasets, achieving significant improvements over prior art.
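To make the idea concrete, the sketch below shows one plausible way to train such a latent operator $H_\theta$: fit it so that its output on a clean-latent prediction matches the encoded degraded image $\mathcal{E}(A(x))$. The autoencoder interface (a diffusers-style `AutoencoderKL`), the MSE objective, and the source of the predictions `z0_hat` are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def h_theta_training_step(h_theta, vae, degrade, x, t, z0_hat):
    """Fit H_theta(z0_hat, t) ~= E(A(x)) for a known degradation A.

    Hypothetical sketch: `vae` is a diffusers-style AutoencoderKL,
    `degrade` is the known pixel-space operator A (e.g. Gaussian blur),
    and `z0_hat` is a clean-latent prediction at timestep `t`
    (how these predictions are generated during training is assumed here).
    """
    with torch.no_grad():
        y = degrade(x)                          # degrade in pixel space
        w = vae.encode(y).latent_dist.mean      # encode the measurement
        w = w.clamp(-4, 4)                      # clamp as in the algorithm below
    w_hat = h_theta(z0_hat, t)                  # latent-space emulation of A
    return F.mse_loss(w_hat, w)
```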

Computational schemes of prior work and SILO. (a) Prior work enforces consistency with the measurement in pixel space, requiring differentiation through the decoder. (b) SILO keeps all computations in the latent space, allowing faster reconstructions while improving their perceptual quality compared to prior work.
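Schematically, the two schemes differ in the data-consistency update. The prior-work form below paraphrases pixel-space guidance in the style of methods such as PSLD or ReSample; it is not quoted from those papers:

```latex
% Prior work (schematic pixel-space guidance):
% backpropagates through the decoder D at every sampling step.
z_{t-1} \leftarrow z'_{t-1}
  - \eta \, \nabla_{z_t} \bigl\| y - A\bigl(\mathcal{D}(\hat{z}_0^t)\bigr) \bigr\|_2

% SILO: consistency stays in latent space; E and D are each
% applied only once, at the start and end of sampling.
z_{t-1} \leftarrow z'_{t-1}
  - \eta \, \nabla_{z_t} \bigl\| w - H_\theta(\hat{z}_0^t, t) \bigr\|_2,
\qquad w = \mathcal{E}(y)
```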

SILO: Reconstruction Algorithm

Inputs:

  • Measurement $y$
  • Encoder $\mathcal{E}$
  • Decoder $\mathcal{D}$
  • Trained degradation operator $H_\theta$
  • Text condition $\mathcal{C}$
  • Consistency scale $\eta$

Output:

  • A reconstruction $\hat{x}$
  1. Initialization: Sample $z_T \sim \mathcal{N}(0, I)$
  2. Encoding: $w = \text{clamp}(\mathcal{E}(y), -4, 4)$
  3. For $t = T$ to $1$:
    • $\hat{\epsilon} \leftarrow \text{Denoise}(z_t, t, \mathcal{C})$
    • $\hat{z}_0^t \leftarrow \text{PredictCleanSample}(z_t, \hat{\epsilon})$
    • $z'_{t-1} \leftarrow \text{DDPMStep}(z_t, \hat{z}_0^t, \hat{\epsilon})$
    • $\hat{w}^t \leftarrow H_\theta(\hat{z}_0^t, t)$
    • $z_{t-1} \leftarrow z'_{t-1} - \eta \nabla_{z_t} \| w - \hat{w}^t \|_2$
  4. Decoding: $\hat{x} = \mathcal{D}(z_0)$
  5. Return $\hat{x}$
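For concreteness, here is a minimal PyTorch-style sketch of the loop above, assuming diffusers-like components (`unet`, `vae`, a DDPM `scheduler` with precomputed `timesteps`). The $\hat{\epsilon} \to \hat{z}_0$ conversion is the standard epsilon-parameterization identity, and latent scaling factors are omitted for brevity; this is an illustration, not the authors' released code.

```python
import torch

def silo_restore(y, unet, vae, h_theta, scheduler, text_emb, eta):
    """Sketch of the SILO reconstruction loop (hypothetical diffusers-style API)."""
    # Step 2: encode the measurement once, w = clamp(E(y), -4, 4)
    w = vae.encode(y).latent_dist.mean.clamp(-4, 4)
    # Step 1: z_T ~ N(0, I)
    z_t = torch.randn_like(w)

    for t in scheduler.timesteps:                        # Step 3: t = T ... 1
        z_t = z_t.detach().requires_grad_(True)
        # Denoise(z_t, t, C)
        eps_hat = unet(z_t, t, encoder_hidden_states=text_emb).sample
        # PredictCleanSample: epsilon-parameterization identity
        a_bar = scheduler.alphas_cumprod[t]
        z0_hat = (z_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
        # DDPMStep
        z_prev = scheduler.step(eps_hat, t, z_t).prev_sample
        # Latent-space consistency: z_{t-1} = z'_{t-1} - eta * grad ||w - H(z0_hat, t)||_2
        w_hat = h_theta(z0_hat, t)
        dist = torch.linalg.norm(w - w_hat)
        grad = torch.autograd.grad(dist, z_t)[0]
        z_t = (z_prev - eta * grad).detach()

    # Step 4: decode the final latent
    return vae.decode(z_t).sample
```

Note that the gradient is taken with respect to $z_t$, so the graph through the denoiser and $H_\theta$ must be kept, but never through the decoder, which is used only once at the end.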

Results

Below are some results of our method. More can be found in the paper.

Qualitative comparison. Columns, left to right: ground truth $x$, measurement $y$, Ours (RV), Ours (SD), ReSample, PSLD.
