Assuming you have a map created from a Kinect run and you then want to use a webcam to localize in that 3D map: you can't do it with the code as is.
However, it is doable. Two options:
1) Load the database and use OdometryMono with a calibrated webcam. BUT the problem is that OdometryMono will have no idea of the scale of the 3D map if it runs on its own (the scale is initialized in the first phase of OdometryMono). Maybe it would be possible to synchronize the scale of the odometry with the one used in the map, e.g., by changing the initialization stage of OdometryMono to use a known pattern for which the 3D coordinates are known (using OpenCV's cv::solvePnP(), like calibrating with a chessboard). You would then have the scale directly. However, if you move the camera away from the known pattern, the scale will still drift over time. OdometryMono is very experimental and the scale may change rapidly outside the known pattern (when new 3D points are added to the local map). I would go with the second option if you just want to localize (without mapping) in a 3D map.
2) Initialize the local map of OdometryMono with the whole map contained in the database, then use only the PnP localization part of the odometry, without adding new points to the local map. The position of the camera will always be computed from the known 3D map, so you can't move the camera outside that region, but the scale will always be known. This may work well for small areas; I could check more deeply when I have some time (I opened an issue: https://github.com/introlab/rtabmap/issues/27).
I basically follow what you suggested:
I use a DBReader to import data from a DB file to an OdometryMono Thread.
I use the odometry in the DB, so OdometryMono merely saves all words and their 3D coordinates to memory_ (I suppose?).
Then I plan to change OdometryMono to only do the PnP to get an RGB image's rotation and location.
The RGB image is read from a JPG file using CameraImages.
Here is an illustration:
Phase 1: DBReader -> OdometryMono
Phase 2: CameraImages -> OdometryMono (Same thread instance in phase 1, only calculates PnP)
However, when I pass the RGB image from CameraImages thread to OdometryMono thread, I get an error message about missing fx and fy from line 119 in OdometryThread.cpp.
I'm wondering whether I can change CameraImages to let it calculate fx and fy from the image file's EXIF, as described here?
(Assuming the EXIF has all information that I need)
1) I think the best way to initialize the OdometryMono local 3D map is to initialize the memory_ object with the database directly. The database doesn't save the optimized poses of the map, so you should optimize them before adding the points to the local map. Here is an example, modifying lines 126-129 of OdometryMono like this:
std::string myDatabasePath = "rtabmap.db";
if(!memory_->init(myDatabasePath, false, ParametersMap()))
{
   UERROR("Error initializing the memory for Mono Odometry.");
}
// get the graph
std::map<int, int> ids = memory_->getNeighborsId(memory_->getLastSignatureId(), 0, 0);
std::map<int, Transform> poses;
std::multimap<int, Link> links;
memory_->getMetricConstraints(uKeysSet(ids), poses, links);
// optimize the graph ("optimizer" is a graph optimizer instance created elsewhere)
std::map<int, Transform> optimizedPoses = optimizer.optimize(poses.begin()->first, poses, links);
// fill the local map
for(std::map<int, Transform>::iterator posesIter=optimizedPoses.begin();
    posesIter!=optimizedPoses.end();
    ++posesIter)
{
   const Signature * s = memory_->getSignature(posesIter->first);
   // Transform 3D points according to the pose and add them to the local map
   const std::multimap<int, pcl::PointXYZ> & words3D = s->getWords3();
   for(std::multimap<int, pcl::PointXYZ>::const_iterator pointsIter=words3D.begin();
       pointsIter!=words3D.end();
       ++pointsIter)
   {
      localMap_.insert(std::make_pair(pointsIter->first,
         util3d::transformPoint(pointsIter->second, posesIter->second)));
   }
}
2) The RGB images should be rectified. If they are already rectified, you only need to specify the focal lengths fx and fy (and also cx and cy) for the sensor data sent to odometry. If you don't know the fx and fy values, you can calibrate your camera with the rtabmap-calibration tool. Unlike the CameraRGBD classes, which explicitly load calibration files, the Camera classes don't set fx,fy in CameraThread (see here). I plan to merge the Camera and CameraRGBD classes one day, but for now you can hard-code the fx and fy values. Example for line 133:
2a) Note that you could use the already calibrated RGB images from the Kinect for testing with OdometryMono (using CameraFreenect instead of CameraImages for example). There is an option to set CameraRGBD classes to send only RGB images.
In my case, I need to take RGB images from a smartphone, so I cannot use CameraRGBD.
I can hardcode fx, fy, cx, cy for now.
May I ask two more questions?
1. Why do I need to rectify the image to perform the PnP computation? In my understanding, image rectification is used to put matched points in two images onto the same epipolar lines, so you can find matched points more easily (along one dimension).
PnP is trying to use RANSAC to find the R and t of a camera based on 3D and 2D matched points.
The matched points are calculated based on a series of computation:
1. project localMap_'s 3D points to previous and current RGB image.
2. refine the projection using the image boundary, the 2D position if the word is unique, and optical flow.
However, I don't see which part needs the previous image and current image to be rectified.
2. What is _localTransform in SensorData? It has confused me a lot while reading the OdometryMono code.
Thanks again for your quick and detailed answers.
I really really appreciate that!
1. It is to remove distortion in the image so that the camera intrinsic matrix K can be used alone to reproject 3D points to the image plane, without using the distortion coefficients D. In cv::solvePnPRansac(), these are the cameraMatrix and distCoeffs parameters. If the image is rectified, you don't need to specify distCoeffs. Maybe your smartphone camera already has very low distortion, so you may not need to specify them or explicitly rectify the image. Otherwise, the rtabmap-calibration tool also gives the D coefficients.
2. The local transform of SensorData is the transform between a fixed frame on the robot and the frame of the sensor. In ROS, it can be referred to as /base_link -> /camera_link. By default in RTAB-Map hand-held mapping, the local transform is the rotation of the sensor from the image frame (x right, y down, z forward) to the world frame (x forward, y left, z up). This transform is generally constant.
I ended up doing these steps to localize the RGB image:
1. use memory_->computeLikelihood() to find the most similar signature (say it is signature N) in memory to the new RGB image (say it is signature M)
2. Compute the transform between M and N using memory_->computeVisualTransform().
(This is basically the loop closure code that you have.)
The step 1 works just fine, I can find a very similar signature in memory.
However, when I do the second step, bowEpipolarGeometry is called internally in Memory.
The problem is that even with two very similar images (M and N), I can never get enough inliers.
One example warning message is "Memory.cpp:1981::computeVisualTransform() Not enough inliers 9 < 20 ".
I have changed the util3d code to use a different K for each image (because they are taken by different cameras).
Do you have any suggestions on how to debug the EpipolarGeometry code?
Thank you very much!
The likelihood doesn't tell how many unique correspondences exist between two images. To have many inliers, correspondences should be unique and good, and depth should be defined for the features. I don't think bowEpipolarGeometry is the right thing to use. In your approach, you have the 3D words of the matched signature directly, the "words3" field of the RGB-D Signature, and those could be used for PnP estimation as the objectPoints argument. The imagePoints would be the matched 2D visual words in the RGB image. Look at, for example, the new PnP code in Memory::computeVisualTransform() on the devel branch. There, oldS would be the best-match RGB-D image, and newS the RGB image.
Thanks! I changed my code to use the PnP code in Memory.cpp.
However, even though I can find a similar image in the database, I often cannot find enough inliers.
For example, for the two images I attached, there are only 12 inliers, of which 0 are PnP inliers. (For some other images I can get 5-10 PnP inliers, so I can calculate the transform.)
These two images look very similar to me, and they seem to have many features.
So I'm wondering whether it's normal that there are not many inliers?
Based on my understanding, a word has to be unique in both images to be considered as an inlier.
It seems that I can increase the number of unique words by decreasing _nndrRatio in VWDictionary.cpp.
But I'm not sure whether that's a good idea?
Also, I'm using Kinect v1 to collect RGBD data.
Will using Xtion PRO LIVE increase the number of inliers?
I think you could test the RGB-only localization part using the same camera (Xtion without the depth) to see if you can have more inliers.
Also, you could have more inliers if you re-extract the features between matched images. By default, rtabmap extracts a maximum of 400 SURF features from each image when creating the map. So if the image you are trying to localize is not directly in the same field of view as the best matched image of the map, you may start with few matches. By re-extracting the features, you could extract more, like 1000 features, or even set no limit. I have done this kind of process when evaluating loop closures (which generally don't have many matches at large scale). Code is here. "dataTo" would be the best matched image in the map (RGB-D image) and "dataFrom" the new image (RGB-only)... it will only work if Memory::computeVisualTransform() uses PnP (for which depth is not required for the second image).