Re: Bundler export functionality
Posted by Timofejev on
URL: http://official-rtab-map-forum.206.s1.nabble.com/Bundler-export-functionality-tp5121p5125.html
Thank you Mathieu, it makes sense.
I guess I will have to write some code myself then :) Could you point me in the right direction perhaps? I'd like to use the RTAB-Map datasets with the ACG Localizer (aka Active Search). The ACG Localizer expects the data in Bundler format (luckily it's all ASCII, so it should not be hard to reverse-engineer). Namely, it expects a list of images, a list of SIFT feature descriptors for every image and, finally, a list of 3D points, where for every 3D point the following is stored: its 3D coordinates, its color in RGB, the list of frame indices it is visible in, and the indices of the SIFT features which correspond to the 3D point in those views.
edit: so I have actually implemented a big part of it. I'd be thankful if you could correct me if I am wrong somewhere. I found almost all the data in the Map_Node_Word table:
1) depth_x, depth_y, depth_z are the 3D coordinates of the word in 3D space. The RGB color is missing here, but I can retrieve it from the actual image file using the keypoint's x,y coordinates.
2) from the node ID I can retrieve the camera pose and the frame indices in which the feature is visible
3) from the word ID I can retrieve the descriptor. There is something that I don't get yet, though: one word ID can have multiple different descriptor values for different node IDs. I guess you store the descriptors of the same feature seen from different perspectives here. Then what is stored under the same ID in the Words table? Is it some averaged feature descriptor vector?
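For reference, this is roughly how I pull those fields out with Python's sqlite3 module. The column names (node_id, word_id, pos_x, pos_y, depth_x, depth_y, depth_z) are my assumption from reading the table in my database, so they may differ between RTAB-Map versions; check with "PRAGMA table_info(Map_Node_Word)" first:

```python
import sqlite3

def load_word_observations(db_path):
    """Return {word_id: [(node_id, x, y, X, Y, Z), ...]} from Map_Node_Word.

    x, y are the keypoint image coordinates (usable for the RGB lookup
    from the image file); X, Y, Z are the 3D coordinates of the word.
    Column names are assumed, not guaranteed -- verify against your schema.
    """
    conn = sqlite3.connect(db_path)
    observations = {}
    for node_id, word_id, x, y, X, Y, Z in conn.execute(
        "SELECT node_id, word_id, pos_x, pos_y, depth_x, depth_y, depth_z "
        "FROM Map_Node_Word"
    ):
        observations.setdefault(word_id, []).append((node_id, x, y, X, Y, Z))
    conn.close()
    return observations
```

Grouping by word ID like this gives, for each 3D point, exactly the view list that Bundler wants: one (node, keypoint, x, y) entry per frame the word was observed in.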
Just in case, I attach the format description straight from Bundler's documentation.
bundler wrote
The bundle files contain the estimated scene and camera geometry and
have the following format:
# Bundle file v0.3
<num_cameras> <num_points> [two integers]
<camera1>
<camera2>
...
<cameraN>
<point1>
<point2>
...
<pointM>
Each camera entry <cameraI> contains the estimated camera intrinsics
and extrinsics, and has the form:
<f> <k1> <k2> [the focal length, followed by two radial distortion coeffs]
<R> [a 3x3 matrix representing the camera rotation]
<t> [a 3-vector describing the camera translation]
The cameras are specified in the order they appear in the list of
images.
Each point entry <pointI> has the form:
<position> [a 3-vector describing the 3D position of the point]
<color> [a 3-vector describing the RGB color of the point]
<view list> [a list of views the point is visible in]
The view list begins with the length of the list (i.e., the number of
cameras the point is visible in). The list is then given as a list of
quadruplets <camera> <key> <x> <y>, where <camera> is a camera index,
<key> the index of the SIFT keypoint where the point was detected in
that camera, and <x> and <y> are the detected positions of that
keypoint. Both indices are 0-based (e.g., if camera 0 appears in the
list, this corresponds to the first camera in the scene file and the
first image in "list.txt").
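Based on the spec above, writing the v0.3 file itself is straightforward. Here is a minimal sketch; the tuple layout for cameras (f, k1, k2, R, t) and points (position, rgb, view list) is my own choice for this snippet, not something RTAB-Map outputs directly:

```python
def write_bundle(path, cameras, points):
    """Write a Bundler v0.3 file.

    cameras: list of (f, k1, k2, R, t) with R a 3x3 row-major nested list
             and t a 3-vector, ordered as in list.txt.
    points:  list of (position, rgb, views) where position is a 3-vector,
             rgb a (r, g, b) triple, and views a list of
             (camera_index, key_index, x, y) quadruplets (0-based indices).
    """
    with open(path, "w") as f:
        f.write("# Bundle file v0.3\n")
        f.write("%d %d\n" % (len(cameras), len(points)))
        for focal, k1, k2, R, t in cameras:
            f.write("%g %g %g\n" % (focal, k1, k2))
            for row in R:
                f.write("%g %g %g\n" % tuple(row))
            f.write("%g %g %g\n" % tuple(t))
        for position, rgb, views in points:
            f.write("%g %g %g\n" % tuple(position))
            f.write("%d %d %d\n" % tuple(rgb))
            # View list: its length first, then one quadruplet per view.
            f.write("%d" % len(views))
            for cam, key, x, y in views:
                f.write(" %d %d %g %g" % (cam, key, x, y))
            f.write("\n")
```

The camera list order must match list.txt, since the camera indices in each view list refer to positions in that file.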