Article written by Martin Humenberger
Visual localization, which estimates the position and orientation of a camera in a map from query images, and structure from motion (SfM), one of the most popular ways to build such a visual localization map, are fundamental components of technologies such as autonomous robots and self-driving vehicles.
However, a major barrier to research in visual localization and SfM lies in the format of the data itself. Although many good public evaluation datasets now exist, they are all structured in different formats. As a result, data importers and exporters must often be modified, and coordinate systems and camera parameters almost always need to be transformed. Even more conversion is necessary if you want to combine multiple tools in a single pipeline. Moreover, existing data formats often don't include the types of data needed for a specific application, especially non-image sensor data such as WiFi signals.
To overcome this research barrier, we created the kapture format. Kapture supports a wide range of data types and comes with several format converters and data processing tools. On the one hand, it makes public datasets easier to use; on the other, it allows processed data (such as local or global features or matches) to be shared easily, again in one common format. By sharing kapture with the research community, we aim to facilitate future research and development in topics such as visual localization, SfM, VSLAM, and sensor fusion.
The release contains the following main features:
- Data converters: To convert other datasets from and to kapture, we provide a set of converters for popular formats (e.g., COLMAP, OpenMVG, OpenSfM, bundler, nvm, and more).
- Example pipelines: As an example, we provide two full visual localization pipelines based on COLMAP. The first uses COLMAP SIFT features and COLMAP vocabulary tree matching, and the second uses our own custom features and matches, which obtained state-of-the-art results in the visual localization challenge at VisLocOdomMapCVPR2020.
- Converted datasets: To help kapture users get started, we provide several ready-to-use datasets. These datasets are used for the localization challenge sponsored by NAVER LABS and held at the Workshop on Long-Term Visual Localization under Changing Conditions at ECCV 2020.
Details of the format
Given a known 3D space representation, visual localization is the problem of estimating the position and orientation of a camera using query images. Structure from Motion (SfM) is one of the most popular ways to reconstruct a 3D scene from an unordered set of images.
Kapture can be used to store many types of data collected for visual localization and SfM. Examples include:
- sensor parameters such as intrinsic and extrinsic camera parameters
- raw sensor data such as camera images or LiDAR data
- other sensor data such as GPS or WiFi signals
It can also store data computed during various stages of the process, such as
- 2D local features (keypoints and descriptors)
- 2D–2D matches between local features
- global features (e.g. for image retrieval)
- 3D reconstructions consisting of 3D points and keypoint observations
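On disk, a kapture dataset is a plain directory of human-readable CSV-like text files plus binary feature files. The sketch below shows the typical top-level layout; file and folder names follow the kapture format specification, and the comments are a simplified summary (check the specification for the authoritative details):

```text
my_dataset/
├─ sensors/
│  ├─ sensors.txt          # sensor definitions (e.g. camera intrinsics)
│  ├─ trajectories.txt     # poses: timestamp, device, rotation, translation
│  ├─ records_camera.txt   # which image was taken by which camera and when
│  └─ records_data/        # the raw image files themselves
└─ reconstruction/
   ├─ keypoints/           # 2D local feature keypoints
   ├─ descriptors/         # local feature descriptors
   ├─ global_features/     # image-level descriptors for retrieval
   ├─ matches/             # 2D-2D matches between image pairs
   ├─ points3d.txt         # triangulated 3D points
   └─ observations.txt     # links between 3D points and keypoints
```

Because everything is a plain file, a dataset can be inspected with standard tools, versioned, and partially shared (e.g., only the global features).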
Kapture also includes a set of Python tools to load, save, and convert datasets to and from kapture. Supported formats include general ones such as COLMAP, OpenMVG, OpenSfM, bundler, image_folder, image_list, and nvm, as well as a few formats specific to particular datasets, such as IDL_dataset_cvpr17, RobotCar_Seasons, ROSbag cameras+trajectory, SILDa, and virtual_gallery.
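Since the kapture text files are simple CSVs, they can also be read directly without the library. Below is a minimal sketch, assuming the field order from the trajectories.txt convention (timestamp, device id, quaternion rotation, translation); the sample line and the helper function are hypothetical illustrations, not kapture API:

```python
import csv
from io import StringIO

# A trajectories.txt record follows the kapture CSV convention:
# timestamp, device_id, qw, qx, qy, qz, tx, ty, tz
# (see the kapture format specification for the exact pose convention).
sample = "100, cam0, 1.0, 0.0, 0.0, 0.0, 0.5, -0.2, 1.3"

def parse_trajectory_line(line):
    """Split one pose record into typed fields."""
    fields = [f.strip() for f in next(csv.reader(StringIO(line)))]
    timestamp, device_id = int(fields[0]), fields[1]
    qw, qx, qy, qz, tx, ty, tz = map(float, fields[2:9])
    return timestamp, device_id, (qw, qx, qy, qz), (tx, ty, tz)

print(parse_trajectory_line(sample))
```

For real workloads, the kapture Python tools handle this parsing (plus validation and the binary feature files) for you; the point here is only that the format is transparent enough to script against directly.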
Our example pipelines take you through the process of localizing query images on a map, which consists of two major parts: i) building the map and ii) localizing a query. In each pipeline, we first show you how to build the map using SfM and known poses, and then we show you how to localize query images. We also explain how to use kapture to evaluate the precision of the obtained localization against the ground truth.
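The evaluation step boils down to comparing each estimated query pose with its ground-truth pose: a position error and a rotation error (the angle between the two orientations). The sketch below shows the underlying math in plain Python; it is not kapture's own evaluation code, which additionally handles file I/O and pose conventions. Note that if poses are stored world-to-camera, you would convert translations to camera centers before measuring position error:

```python
import math

def pose_errors(q_est, t_est, q_gt, t_gt):
    """Position error (same unit as translations) and rotation error (degrees).

    Poses are unit quaternions (qw, qx, qy, qz) plus translation vectors;
    both poses must use the same convention.
    """
    # Euclidean distance between the two translation vectors.
    t_err = math.dist(t_est, t_gt)
    # Angle between the two rotations: 2 * acos(|<q_est, q_gt>|).
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt)))
    r_err = math.degrees(2.0 * math.acos(min(1.0, dot)))
    return t_err, r_err

# Identical poses yield zero error.
print(pose_errors((1, 0, 0, 0), (0, 0, 0), (1, 0, 0, 0), (0, 0, 0)))
```

Localization benchmarks typically report the fraction of queries whose errors fall under thresholds such as (0.25 m, 2°), (0.5 m, 5°), and (5 m, 10°).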
- Standard COLMAP pipeline: In the first example, we use COLMAP, a general-purpose SfM and multi-view stereo pipeline with both a graphical and a command-line interface. We use SIFT local features and vocabulary tree matching (with a downloaded vocabulary tree) to build the map and perform the localization in COLMAP. Then, we import the results into kapture to evaluate them.
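The COLMAP side of this pipeline can be sketched with COLMAP's standard command-line interface; all paths below are placeholders, and the full scripted pipeline (including the kapture conversions around these steps) lives in the kapture repository:

```shell
# Map building: SIFT extraction, vocabulary-tree matching, sparse mapping.
colmap feature_extractor --database_path map.db --image_path images/
colmap vocab_tree_matcher --database_path map.db \
    --VocabTreeMatching.vocab_tree_path vocab_tree.bin
colmap mapper --database_path map.db --image_path images/ --output_path sparse/

# Localization: register query images against the existing sparse model.
colmap image_registrator --database_path map.db \
    --input_path sparse/0 --output_path sparse_localized/
```

After registration, the estimated query poses are imported back into kapture for evaluation against the ground truth.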
- Custom features and matches: Using our custom R2D2 and AP-GeM features pre-extracted from the sample database, we use kapture scripts to determine potential image pairs through image retrieval, compute the 2D–2D matches, perform the query matching, and evaluate the results. To build the initial map and visualize the results, we use COLMAP again to take advantage of its graphical interface. In this example, data can be easily ported between kapture and COLMAP at different stages thanks to kapture’s conversion tools.
We’ve already converted a number of datasets to the kapture format. For instance, kapture can be used to process all the datasets included in ECCV 2020’s Visual Localization Challenge (the Aachen Day-Night, Inloc, RobotCar Seasons, Extended CMU-Seasons, and SILDa Weather and Time of Day datasets). The images in these datasets are meant to cover the many challenging scenarios that arise in real driving situations: changes in time of day and season, outdated reference representations, occlusion, motion blur, extreme viewpoint changes, and low-texture areas.
If you already have your SfM or visual localization processing tools up and running, you just need to integrate kapture support once, after which you can use all the datasets without any additional conversion or glue code writing. Additional details and how to obtain an updated list of datasets can be found in the kapture tutorial.
Contribute to kapture
If you find kapture useful, we encourage contributions!
You are welcome to provide your own dataset in kapture format (we’re happy to help), write new data converters, report bugs and suggest improvements, provide processed data (e.g., extracted features or matches) in kapture format, and add support for additional data types.
This article was first published on the blog of NAVER LABS Europe.
More information, news, and updates can be found on the NAVER LABS Europe website.
GitHub Repository: https://github.com/naver/kapture
License: BSD 3-Clause
Paper: M. Humenberger, Y. Cabon, N. Guerin, J. Morat, J. Revaud, P. Rerole, N. Pion, C. de Souza, V. Leroy, and G. Csurka: Robust Image Retrieval-based Visual Localization using kapture