GPU feature detectors slow

DocBrowm
Hey!
I noticed that the GPU versions of FAST, SURF and ORB are slower than the CPU implementations. The total time is higher and the CPU load also increases. Is that normal (and if so, why should I use the GPU implementations?), or did I do something wrong when compiling OpenCV? Maybe someone has an idea.

Best Regards

Michael

Re: GPU feature detectors slow

matlabbe
Administrator
Hi,

I've made the same observations. On my laptop it is slower with the GPU, probably because of the time spent uploading/downloading features between the CPU and the GPU (some optimizations could probably be done to avoid too many transfers between the CPU and GPU). However, on a more recent computer (with a more recent NVIDIA card), the time with the GPU is 2x-3x lower than with the CPU.
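To make the transfer argument concrete, here is a small back-of-the-envelope sketch. The function names and all timings are hypothetical placeholders, not measurements from either machine; the only point is that the GPU path wins only when upload + kernel + download together are shorter than the CPU time.

```python
def gpu_total_ms(upload_ms, kernel_ms, download_ms):
    """Per-frame wall-clock time of the GPU path, including both transfers."""
    return upload_ms + kernel_ms + download_ms


def speedup(cpu_ms, upload_ms, kernel_ms, download_ms):
    """Ratio CPU time / GPU time; values below 1.0 mean the GPU path is slower."""
    return cpu_ms / gpu_total_ms(upload_ms, kernel_ms, download_ms)


# Hypothetical timings (ms per frame), chosen only for illustration.
# "Laptop": slow transfers dominate, so the GPU path loses overall.
laptop = speedup(cpu_ms=20.0, upload_ms=12.0, kernel_ms=10.0, download_ms=8.0)
# "Desktop": faster card and cheaper transfers, so the GPU path wins.
desktop = speedup(cpu_ms=20.0, upload_ms=2.0, kernel_ms=4.0, download_ms=2.0)

print(f"laptop speedup:  {laptop:.2f}x")
print(f"desktop speedup: {desktop:.2f}x")
```

Timing the three stages separately on your own machine would show which of them dominates.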

cheers

Re: GPU feature detectors slow

DocBrowm
Thanks! That makes sense.
Another question: on the RTAB-Map parameter tutorial page, in the section "visual world", you wrote that SURF and SIFT are preferred for large-scale environments.
Is the only reason that they are much more accurate than the others? Those two are extremely slow on my device, so I have to choose between performance and accuracy.

best regards
Michael

Re: GPU feature detectors slow

matlabbe
Administrator
The main reason I suggest SIFT or SURF for large-scale loop closure detection is that I've only benchmarked with SURF so far. I've seen recent papers using ORB for the vocabulary with good results. ORB is much faster to compute. However, RTAB-Map lacks a binary search tree to speed up nearest neighbor search for binary features like ORB (nearest neighbor search is then done by brute force instead of using a KD-tree as for SURF or SIFT). With ORB ("Kp/DetectorStrategy"=2, "Kp/NndrRatio"=0.9 and "Kp/NNStrategy"=3), I suggest using fewer features per image (like "Kp/WordsPerImage"=200 with "LccBow/InlierDistance"=0.1) but activating re-extraction of features on loop closure detection ("LccReextract/Activated"=true).
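To illustrate what brute-force matching of binary features with an NNDR test involves, here is a minimal pure-Python sketch. It is not RTAB-Map's code: the function names are made up for the example, the descriptors are 1 byte long to keep it readable (real ORB descriptors are 32 bytes), and the ratio test is written in its common form (accept when best distance < ratio * second-best distance).

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_nndr(queries, vocabulary, ratio=0.9):
    """Brute-force nearest neighbor search with a distance-ratio (NNDR) test.

    Every query is compared against every vocabulary word, so the cost grows
    linearly with the vocabulary size; this is what a search tree would avoid.
    Returns a list of (query_index, word_index, distance) for accepted matches.
    """
    matches = []
    for qi, q in enumerate(queries):
        dists = sorted((hamming(q, w), wi) for wi, w in enumerate(vocabulary))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbors
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # best match must be clearly better than the runner-up
            matches.append((qi, best, d1))
    return matches


# Toy 1-byte descriptors, for illustration only.
vocabulary = [bytes([0b00000000]), bytes([0b11111111]), bytes([0b00001111])]
queries = [bytes([0b00000001]), bytes([0b00111111])]

matches = match_nndr(queries, vocabulary, ratio=0.9)
print(matches)
```

The first query is accepted because its best distance (1) is well below 0.9 times the second best (3). The second query is rejected because its two nearest words are equally far away (distance 2 each), which is the ambiguous case the ratio test is meant to filter out.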

If you do large-scale mapping, you may also want to activate RTAB-Map's memory management. In the GUI it is called "T_time" and in ROS it is "Rtabmap/TimeThr" (in ms, default 0, i.e. deactivated). For a loop closure detection rate of 1 Hz, I set this time threshold to 700 ms so that mapping updates always take around 700 ms. I'll refer you to this paper for the results of activating the memory management.
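As a rough picture of what a time threshold like this does, here is a simplified sketch and not RTAB-Map's actual implementation. The function name, the `per_update` parameter and the selection rule (lowest weight first, oldest first among ties) are assumptions made for the example. The idea shown is only this: when an update takes longer than the threshold, some nodes are moved from working memory to long-term memory so that the next update has less to process.

```python
def enforce_time_threshold(working_memory, long_term_memory,
                           update_ms, time_thr_ms, per_update=2):
    """Move nodes out of working memory if the last update was too slow.

    working_memory: list of (node_id, weight) tuples, ordered oldest first.
    time_thr_ms:    threshold in ms; 0 means memory management is deactivated.
    per_update:     hypothetical number of nodes to move per slow update.
    Returns the number of nodes moved to long-term memory.
    """
    if time_thr_ms <= 0 or update_ms <= time_thr_ms:
        return 0
    # Rank indices by (weight, age): lowest weight first, oldest first on ties.
    ranked = sorted(range(len(working_memory)),
                    key=lambda i: (working_memory[i][1], i))
    # Pop from the highest index down so the remaining indices stay valid.
    chosen = sorted(ranked[:per_update], reverse=True)
    for i in chosen:
        long_term_memory.append(working_memory.pop(i))
    return len(chosen)


# Toy example: four nodes, the last update took 850 ms with a 700 ms threshold.
wm = [(1, 0), (2, 5), (3, 0), (4, 1)]
ltm = []
moved = enforce_time_threshold(wm, ltm, update_ms=850, time_thr_ms=700)
print(moved, wm, ltm)
```

With a threshold of 0 the function never moves anything, which mirrors the "deactivated" default mentioned above.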

cheers