Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery

1 Keio University, 2 University of Stuttgart

Multi-camera shadowless lamp (McSL). (a) The system comprises five units, each with six light sources and a camera. (b) The position and focus are adjusted to the surgical field by moving the central knob. (c) Our approach automatically aligns frames, centers the surgical field for stabilized output, and detects the frames at which the lamp moves so that the alignment can be reconfigured.

Abstract

Video recordings of open surgeries are in great demand for education and research. However, capturing unobstructed videos is challenging because surgeons frequently block the camera's field of view. To avoid occlusion, the camera's position and angle must be adjusted frequently, which is highly labor-intensive. Prior work addressed this issue by installing multiple cameras on a shadowless lamp so that they fully surround the surgical area, increasing the chance that at least one camera captures an unobstructed view. However, manual image alignment is still needed in post-processing because the camera configuration changes every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies the frames in which the lighting system moves, realigns the views, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of how easily the surgical area could be confirmed and how comfortable the videos were to watch. Our approach also improved video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.

Method Overview

Overview of the proposed method. The method consists of three major steps: view alignment, view selection, and visual enhancement. X, X', Y, and Y' denote the input, aligned, stabilized, and enhanced video frames, respectively.
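The selection step can be thought of as choosing, per frame, the camera whose view shows the largest visible surgical-field area. A minimal NumPy sketch under that assumption, taking per-camera boolean field masks (e.g., from hue thresholding) as input; this scoring rule is our illustrative simplification, not the paper's exact criterion:

```python
import numpy as np

def select_least_occluded(field_masks):
    """Given one boolean surgical-field mask per camera (True where the
    field is visible), return the index of the camera with the largest
    visible-field area, i.e. the least-occluded view."""
    areas = [np.count_nonzero(m) for m in field_masks]
    return int(np.argmax(areas))

# Toy usage: camera 0 sees 8 field pixels, camera 1 only 1.
m0 = np.zeros((4, 4), dtype=bool); m0[:2] = True
m1 = np.zeros((4, 4), dtype=bool); m1[0, 0] = True
best = select_least_occluded([m0, m1])
```

In practice the choice would be smoothed over time to avoid flickering between cameras on every frame.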

McSL Movement Detection

Robust McSL movement detection and homography estimation. (a) Whereas the previous approach requires the video viewer to find the consecutive frames at which McSL calibration should be performed, our algorithm detects such frames automatically. (b) Superimposed images of the five cameras (left) before and (right) after the McSL moves. The red circles highlight example feature points of a common scene point (i.e., a big toe) used to calculate the degree of misalignment. (c) We distinguish the surgical field from the rest of the scene by hue in order to collect non-occluded frames for stable homography calculation. The red highlights mark the detected surgical fields.
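The decision rule in (b) can be sketched as thresholding the mean displacement of matched feature points; the feature detection and matching itself (e.g., ORB + RANSAC in OpenCV) is assumed to happen upstream, and the 5-pixel threshold and hue range below are illustrative placeholders, not the paper's values:

```python
import numpy as np

def mean_misalignment(pts_before, pts_after):
    """Mean Euclidean distance between matched feature points (e.g. the
    big-toe landmark in the figure) seen before/after a candidate move."""
    a = np.asarray(pts_before, dtype=float)
    b = np.asarray(pts_after, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())

def lamp_moved(pts_before, pts_after, thresh_px=5.0):
    """Flag a McSL move when the mean misalignment exceeds a threshold."""
    return mean_misalignment(pts_before, pts_after) > thresh_px

def surgical_field_mask(hsv_frame, hue_lo=0, hue_hi=25):
    """Hue-gated mask for the (reddish) surgical field, used to keep
    non-occluded frames for stable homography estimation. Hue follows
    OpenCV's [0, 180) convention; the range is an assumed placeholder."""
    h = hsv_frame[..., 0]
    return (h >= hue_lo) & (h <= hue_hi)
```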

Auto-Alignment Results

Auto-alignment with respect to camera 1 over time. The entire surgery lasts 1:39:19; the timestamps 0:16:59 and 0:36:50 in the figure correspond to the frames flagged by the camera-movement detection, i.e., the moments when realignment becomes necessary.
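Once a move is detected, subsequent frames from each camera are re-warped into camera 1's reference view with a newly estimated homography. A toy NumPy stand-in for cv2.warpPerspective (inverse mapping, nearest-neighbour sampling), assuming the 3x3 matrix H maps source coordinates to reference coordinates:

```python
import numpy as np

def warp_to_reference(img, H, out_shape):
    """Warp `img` into the reference (camera 1) view with homography H.
    Output pixels whose pre-image falls outside the source stay 0."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous destination coordinates, shape (3, N).
    dst = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    # Map each destination pixel back into the source image.
    src = np.linalg.inv(H) @ dst
    src = src[:2] / src[2]
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((h_out, w_out) + img.shape[2:], dtype=img.dtype)
    flat = out.reshape((-1,) + img.shape[2:])
    flat[valid] = img[sy[valid], sx[valid]]
    return out
```

A real pipeline would use cv2.findHomography on the matched feature points and cv2.warpPerspective with bilinear interpolation; this sketch only illustrates the geometry.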

Pixel Filling

The procedure for combining two images when filling a missing region. The foreground image is the video Y with missing pixels, and the background image is the video from the warp-destination viewpoint. First, regions of the foreground image with pixel values above 10 are extracted and blurred to generate an alpha mask. Using this mask, the foreground and background are alpha-blended at the boundaries. Pixels that remain missing after this process are filled with a blurred copy of the previous frame.
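The filling procedure above can be sketched in NumPy on grayscale floats (the paper operates on colour video); the separable box blur stands in for whatever smoothing kernel the authors use, and only the threshold of 10 comes from the caption:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box blur on a 2-D array (toy stand-in for a Gaussian)."""
    kernel = np.ones(k) / k
    out = img.astype(float)
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)
    return out

def fill_missing(foreground, background, prev_frame, thresh=10, k=5):
    """1) treat foreground pixels with value > 10 as valid;
    2) blur the binary mask into a soft alpha matte;
    3) alpha-blend foreground over background at the boundaries;
    4) pixels still missing afterwards fall back to a blurred copy of
       the previous frame."""
    valid = (foreground > thresh).astype(float)
    alpha = np.clip(box_blur(valid, k), 0.0, 1.0)
    blended = alpha * foreground + (1 - alpha) * background
    still_missing = (blended <= thresh) & (background <= thresh)
    blended[still_missing] = box_blur(prev_frame, k)[still_missing]
    return blended
```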

Comparison with Baselines

Example video frames from the three methods (No-alignment, Manual-alignment, and Auto-alignment). The red lines indicate the positions and orientations of corresponding parallel reference lines. In videos generated with No-alignment, these positions and orientations change every time the light (i.e., the cameras) moves, which makes the video difficult to follow. Manual-alignment is more stable but shows greater misalignment than our method. Ours exhibits reduced misalignment between viewpoints.

Visual Enhancement Options

Example video frames from the three methods: Auto-alignment alone (Ours), Auto-alignment plus centering of the surgical field (Ours w/Centered), and Auto-alignment plus centering and missing-region filling (Ours w/Centered+Filled). In the Ours w/Centered video, the surgical field that the surgeon wants to see is centered; in the Ours w/Centered+Filled video, the regions with missing pixel values are filled in.

Expert Review

Results of the expert review. A nonparametric Friedman test showed significant differences for all videos and factors. In addition, a Wilcoxon signed-rank test was conducted to compare only the previously used manual alignment against the proposed auto-alignment method. We observed significant differences for almost all videos and factors, except for Video #3.
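For readers reproducing this kind of analysis, both tests are available in SciPy. The scores below are made-up illustrative Likert ratings, not the study's data:

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical 5-point ratings from 8 raters for the three conditions.
no_align = [2, 1, 2, 3, 2, 1, 2, 2]
manual   = [3, 3, 4, 3, 3, 4, 3, 3]
auto     = [5, 4, 5, 4, 5, 5, 4, 5]

# Omnibus test across the three related samples.
stat, p = friedmanchisquare(no_align, manual, auto)

# Post-hoc pairwise comparison between manual and auto alignment.
w_stat, w_p = wilcoxon(manual, auto)
print(f"Friedman p={p:.4f}, Wilcoxon p={w_p:.4f}")
```

With real data, the pairwise p-values would typically be corrected for multiple comparisons (e.g., Bonferroni or Holm).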

Subjective Evaluation of Visual Enhancements

Results of the subjective evaluation of the proposed method's synthesis options. The Friedman test confirmed significant differences only in factor 2 of Videos #3 and #4. For almost all videos and factors, neither Ours w/Centered nor Ours w/Centered+Filled scored significantly higher (or lower). * indicates a significant difference.