Endoscopic-CT: Learning-Based Photometric Reconstruction for Endoscopic Sinus Surgery

Proc SPIE Int Soc Opt Eng. 2016 Feb-Mar;9784:978418. doi: 10.1117/12.2216296. Epub 2016 Mar 21.

Abstract

In this work we present a method for dense reconstruction of anatomical structures from white-light endoscopic imagery, based on a learning process that estimates a mapping between light reflectance and surface geometry. Our method is unique in that it makes few unrealistic assumptions (e.g., we assume neither a Lambertian reflectance model nor a point light source) and it learns a model on a per-patient basis, increasing accuracy and extensibility to different endoscopic sequences. The proposed method assumes accurate video-CT registration, obtained through a combination of Structure-from-Motion (SfM) and Trimmed-ICP, and then uses the registered 3D structure and motion to generate training data with which to learn a multivariate regression from observed pixel values to known 3D surface geometry. We demonstrate a non-linear regression technique using a neural network to estimate depth images and surface normal maps, producing high-resolution 3D reconstructions with average errors ranging from 0.53 mm (when the anatomy matches the CT precisely) to 1.12 mm (when liquids introduce scene geometry that is not present in the CT used for evaluation). Our results are demonstrated on patient data and validated against the associated CT scans. In total, we processed 206 endoscopic images from patient data, with each image yielding approximately 1 million reconstructed 3D points.
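
The following is a minimal sketch, not the authors' implementation, of the per-patient regression idea described above: a small neural network maps local pixel appearance to a depth value and a unit surface normal. The patch size, network widths, and loss terms are illustrative assumptions, and random tensors stand in for the training pairs that the paper derives from video-CT-registered frames.

import torch
import torch.nn as nn

PATCH = 5                    # hypothetical local window size (5x5 RGB patch)
IN_DIM = 3 * PATCH * PATCH   # flattened patch intensities fed to the regressor

class PhotometricRegressor(nn.Module):
    """MLP regressing a pixel patch to (depth, nx, ny, nz)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IN_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 4),   # 1 depth component + 3 normal components
        )

    def forward(self, x):
        out = self.net(x)
        depth = out[:, :1]
        normal = nn.functional.normalize(out[:, 1:], dim=1)  # unit-length normal
        return depth, normal

# Stand-in training data: patches and their registered CT depths/normals.
patches = torch.rand(4096, IN_DIM)            # observed pixel values
gt_depth = torch.rand(4096, 1) * 50.0         # depth in mm (placeholder)
gt_normal = nn.functional.normalize(torch.randn(4096, 3), dim=1)

model = PhotometricRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    pred_depth, pred_normal = model(patches)
    # L2 depth error plus a cosine term penalizing misaligned normals.
    loss = nn.functional.mse_loss(pred_depth, gt_depth) \
         + (1.0 - (pred_normal * gt_normal).sum(dim=1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

At test time, applying such a regressor to every pixel of an endoscopic frame would yield dense depth and normal maps, which is consistent with the roughly 1 million reconstructed 3D points per image reported above.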

Keywords: 3D reconstruction; shape from shading; structure from motion; video-CT registration.