Fully Geometric Panoramic Localization

Abstract

We introduce a lightweight and accurate localization method that only utilizes the geometry of 2D-3D lines. Given a pre-captured 3D map, our approach localizes a panorama image, taking advantage of the holistic 360 degree view. The system mitigates potential privacy breaches or domain discrepancies by avoiding trained or hand-crafted visual descriptors. However, as lines alone can be ambiguous, we express distinctive yet compact spatial contexts from relationships between lines, namely the dominant directions of parallel lines and the intersection between non-parallel lines. The resulting representations are efficient in processing time and memory compared to conventional visual descriptor-based methods. Given the groups of dominant line directions and their intersections, we accelerate the search process to test thousands of pose candidates in less than a millisecond without sacrificing accuracy. We empirically show that the proposed 2D-3D matching can localize panoramas for challenging scenes with similar structures, dramatic domain shifts or illumination changes. Our fully geometric approach does not involve extensive parameter tuning or neural network training, making it a practical algorithm that can be readily deployed in the real world.

Video

Task Overview

We consider the task of localizing a panorama image against a 3D line map, which can be readily obtained from images or 3D scans. Unlike existing panoramic localization pipelines, our method performs localization under a fully geometric setup, only using lines in 2D and 3D.

Step 1: Input Preparation

Our method exploits lines and their intersections for performing localization. We first cluster lines using their principal directions in 2D and 3D, which are shown as the line colors in the figure above. Then, we pairwise intersect lines from distinct principal directions and obtain three groups of intersection points.
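The grouping step can be illustrated with a minimal sketch for the 2D (panorama) side. It assumes each line is given as a pair of unit vectors on the sphere and that the three principal directions are already known. All function names here (`circle_normal`, `assign_direction`, `intersect`, `group_intersections`) are hypothetical and not taken from the released code. The sketch intersects every pair of lines from different clusters, whereas a practical pipeline would likely add a proximity test between segments.

```python
import math
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0  # guard against zero-length vectors
    return tuple(x / n for x in v)

def circle_normal(segment):
    """Unit normal of the great circle through a segment's two endpoints."""
    return normalize(cross(segment[0], segment[1]))

def assign_direction(segment, directions):
    """A line along direction d lies on a great circle through d, so the
    circle normal is (nearly) orthogonal to d. Pick the best-fitting d."""
    n = circle_normal(segment)
    scores = [abs(dot(n, d)) for d in directions]
    return scores.index(min(scores))

def intersect(seg_a, seg_b):
    """Intersection of two great circles, choosing the antipode that lies
    on the same side of the sphere as the segments (assumed heuristic)."""
    x = normalize(cross(circle_normal(seg_a), circle_normal(seg_b)))
    centre = [sum(c) for c in zip(*seg_a, *seg_b)]
    return x if dot(x, centre) >= 0 else tuple(-c for c in x)

def group_intersections(segments, directions):
    """Return intersection points keyed by the pair of direction labels."""
    labels = [assign_direction(s, directions) for s in segments]
    groups = {pair: [] for pair in combinations(range(len(directions)), 2)}
    for (i, seg_i), (j, seg_j) in combinations(enumerate(segments), 2):
        a, b = sorted((labels[i], labels[j]))
        if a != b:  # only intersect lines from different clusters
            groups[(a, b)].append(intersect(seg_i, seg_j))
    return groups
```

With three principal directions, the dictionary has exactly three keys, `(0, 1)`, `(0, 2)` and `(1, 2)`, which correspond to the three groups of intersection points described above.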

Step 2: Pose Search with Distance Functions

Given the initial set of lines and intersections, we perform coarse pose search by comparing point and line distance functions. The distance functions are defined over the sphere as the geodesic distance to the nearest point or line. From uniformly sampled poses in the map, we extract poses that have similar distance function values to those in the query image.
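As a rough illustration of the coarse search, the sketch below evaluates a point distance function on approximately uniform sphere samples and ranks candidate translations by how closely their distance functions match the query. It is a simplification under stated assumptions: only point (intersection) distance functions are used, the rotation is assumed to be already known and applied, a Fibonacci lattice stands in for the uniform sampling, and a truncated absolute difference with threshold `tau` stands in for the robust cost. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0  # guard against zero-length vectors
    return tuple(x / n for x in v)

def geodesic(a, b):
    """Great-circle distance between two unit vectors, in radians."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def fibonacci_sphere(n):
    """Approximately uniform sample points on the unit sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts

def point_distance_function(samples, points):
    """For each sample, the geodesic distance to the nearest point."""
    return [min(geodesic(q, p) for p in points) for q in samples]

def project(points_3d, translation):
    """Project 3D points onto the unit sphere centred at `translation`."""
    return [normalize(tuple(p - t for p, t in zip(pt, translation)))
            for pt in points_3d]

def robust_cost(df_a, df_b, tau=0.1):
    """Truncated absolute difference (an assumed stand-in robust cost)."""
    return sum(min(abs(a - b), tau) for a, b in zip(df_a, df_b))

def coarse_search(query_points, map_points, translations,
                  n_samples=200, top_k=1):
    """Return the top_k (cost, translation) pairs, lowest cost first."""
    samples = fibonacci_sphere(n_samples)
    df_query = point_distance_function(samples, query_points)
    scored = []
    for t in translations:
        df_map = point_distance_function(samples, project(map_points, t))
        scored.append((robust_cost(df_query, df_map), t))
    scored.sort()
    return scored[:top_k]
```

Line distance functions would follow the same pattern, with the nearest-point query replaced by the geodesic distance to the nearest spherical line segment.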

Step 3: Pose Refinement with Line Intersections

We refine poses by aligning lines and their intersections on the sphere. First, at each pose we project 3D line segments onto the sphere. Then, we perform nearest neighbor matching within each intersection point group and optimize translation by minimizing the spherical distance between the matches. Finally, we refine rotation by aligning the line directions associated with each intersection point match.
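The translation part of the refinement can be sketched as follows. This is an illustrative stand-in rather than the authors' optimizer: it assumes the rotation is fixed and already applied, represents each intersection group as a list of points under a shared dictionary key, uses the sum of squared geodesic distances between nearest-neighbor matches as the objective, and minimizes it with a simple derivative-free coordinate search. The names `alignment_cost` and `refine_translation` are hypothetical, and the rotation refinement step is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0  # guard against zero-length vectors
    return tuple(x / n for x in v)

def geodesic(a, b):
    """Great-circle distance between two unit vectors, in radians."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def project(points_3d, translation):
    """Project 3D points onto the unit sphere centred at `translation`."""
    return [normalize(tuple(p - t for p, t in zip(pt, translation)))
            for pt in points_3d]

def alignment_cost(query_groups, map_groups, translation):
    """Sum of squared spherical distances between each 2D intersection
    and its nearest projected 3D intersection within the same group."""
    total = 0.0
    for key, query_pts in query_groups.items():
        projected = project(map_groups[key], translation)
        for q in query_pts:
            total += min(geodesic(q, p) for p in projected) ** 2
    return total

def refine_translation(query_groups, map_groups, t0, step=0.2, iters=60):
    """Coordinate search: try a step along each axis, keep any move that
    lowers the cost, and halve the step when no move helps."""
    t = list(t0)
    best = alignment_cost(query_groups, map_groups, t)
    for _ in range(iters):
        improved = False
        for axis in range(3):
            for sign in (1.0, -1.0):
                cand = t[:]
                cand[axis] += sign * step
                cost = alignment_cost(query_groups, map_groups, cand)
                if cost < best:
                    t, best, improved = cand, cost, True
        if not improved:
            step *= 0.5
    return tuple(t), best
```

Because matches are recomputed inside `alignment_cost` at every evaluation, the nearest-neighbor assignment is free to change as the pose moves, and matching never crosses group boundaries.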

Efficient Distance Function Comparison for Large-Scale Localization

To scale our pipeline for large-scale scenes, we propose an acceleration scheme for distance function comparison. We find that the distance function formulation in prior works does not scale well. Consider the equation below, which is the previously used formula for comparing distance functions [1].

Here, the robust cost function is evaluated over uniformly sampled points on the sphere. Notice that the 3D distance function term depends on the rotation.

This is problematic in large-scale scenarios, because the principal directions used to extract the rotation pool vary for each query and scene, so the rotation-dependent 3D term cannot be computed ahead of time. We therefore propose a modified comparison formula in which the rotation is factored out of the 3D distance function.

Because the 3D distance function no longer depends on rotation, its values can be pre-computed and cached prior to localization. This substantially reduces runtime and enables fast localization with little loss in accuracy.
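One way such caching could be organized is sketched below. It is not the paper's exact formulation: it assumes point distance functions only, a query generated as R(p - t) projected onto the sphere, and a truncated absolute difference as the robust cost. The 3D distance functions are tabulated once per candidate translation at identity rotation, and each candidate rotation is applied (as its transpose) to the query points instead. This substitution is exact only when the sample set is rotation-invariant, which a dense uniform sampling approximates. `MapCache` and `score` are hypothetical names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0  # guard against zero-length vectors
    return tuple(x / n for x in v)

def geodesic(a, b):
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def fibonacci_sphere(n):
    """Approximately uniform sample points on the unit sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts

def distance_function(samples, points):
    return [min(geodesic(q, p) for p in points) for q in samples]

def rotate(R, v):
    """Apply a 3x3 rotation matrix (nested sequences) to a vector."""
    return tuple(dot(row, v) for row in R)

def transpose(R):
    return tuple(zip(*R))

class MapCache:
    """Rotation-independent 3D distance functions, one per translation."""
    def __init__(self, map_points, translations, n_samples=200):
        self.samples = fibonacci_sphere(n_samples)
        self.table = {}
        for t in translations:
            projected = [normalize(tuple(a - b for a, b in zip(p, t)))
                         for p in map_points]
            self.table[t] = distance_function(self.samples, projected)

def score(cache, query_points, R, t, tau=0.1):
    """Compare the query, rotated back by R^T, with the cached 3D values."""
    R_t = transpose(R)
    rotated_query = [rotate(R_t, q) for q in query_points]
    df_2d = distance_function(cache.samples, rotated_query)
    return sum(min(abs(a - b), tau) for a, b in zip(df_2d, cache.table[t]))
```

With this layout, the per-query work for each pose candidate reduces to a table lookup and an element-wise comparison once the query distance function for a given rotation is available, and that query-side computation can be shared across all translations.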

Localization Performance Analysis

Our fully geometric pipeline can effectively localize in large-scale scenarios. The method shows competitive performance when compared against conventional visual feature-based localization methods, while maintaining a small map size and runtime.

Due to the fully geometric formulation based on lines, our method is robust to lighting changes. We evaluate our method under various lighting conditions, and show recall curves while shading the maximum and minimum recall values. Our method attains a high recall rate while showing only a small performance deviation amidst lighting variations.

Applicability to Floorplan Localization

As our method only exploits lines in 2D and 3D for localization, we find that it can also perform localization using floorplan maps. To elaborate, we evaluate our method using the lines from floorplan maps as shown above, without altering the localization hyperparameters.

Our method performs competitively against existing floorplan localization methods, which are specifically tailored for the task. Thus we expect our method to serve as a practical approach for various geometric localization setups.

Visualization of Pose Refinement

Below we show visualizations of our pose refinement process. Notice how the projected 3D lines approach the query image lines as the intersection points get aligned.

References

1. Junho Kim, Changwoon Choi, Hojun Jang, and Young Min Kim. LDL: Line Distance Functions for Panoramic Localization. ICCV 2023.

BibTeX

@InProceedings{Kim_2024_CVPR,
  author    = {Kim, Junho and Jeong, Jiwon and Kim, Young Min},
  title     = {Fully Geometric Panoramic Localization},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
}