1. Yes, the local map could be assembled with all valid 3D visual words contained in the memory (STM), but for convenience, we cache them in localMap_.
2. It is for the initialization step of OdometryMono. Here is the brief approach when starting odometry:
Phase init-a: Fill cornersMap_ with features from the first image acquired by the camera.
Phase init-b: Stay in this phase until the camera has moved enough (a translation of about 30-40 cm while looking at the same objects as in the first image) to get a second image, from which the first 3D points are computed and added to the local map. This step is critical because it defines the scale used. Note that the code is very experimental: triangulating from more than 2 images (n-view epipolar geometry), as in Structure from Motion (SfM) approaches, would give better 3D point estimates.
Phase estimate: Once localMap_ has been filled by the previous step, the position of the camera is estimated as new images are acquired, with 3D points being added to and removed from localMap_.
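The three phases above can be pictured as a small state machine. The Python sketch below is only an illustration under stated assumptions, not the actual OdometryMono code: the class name, the `process` method, the `triangulate` callback and the `min_parallax_px` threshold are all hypothetical, and "moved enough" is approximated here by the mean pixel displacement of the features shared with the first image.

```python
import math
from enum import Enum, auto


class Phase(Enum):
    INIT_A = auto()    # store features of the first image
    INIT_B = auto()    # wait for enough camera motion, then triangulate
    ESTIMATE = auto()  # local map is filled, estimate the camera position


def mean_displacement(ref, cur, ids):
    """Mean pixel displacement of the features `ids` between two images."""
    total = sum(math.hypot(cur[i][0] - ref[i][0], cur[i][1] - ref[i][1]) for i in ids)
    return total / len(ids)


class MonoOdometrySketch:
    """Hypothetical sketch of the init-a / init-b / estimate flow."""

    def __init__(self, min_parallax_px=20.0):
        self.phase = Phase.INIT_A
        self.corners_map = {}  # feature id -> (u, v) in the first image
        self.local_map = {}    # feature id -> (x, y, z)
        self.min_parallax_px = min_parallax_px  # assumed motion threshold

    def process(self, features, triangulate):
        """features: dict id -> (u, v); triangulate: (ref_uv, cur_uv) -> (x, y, z)."""
        if self.phase is Phase.INIT_A:
            # Phase init-a: remember the features of the first image.
            self.corners_map = dict(features)
            self.phase = Phase.INIT_B
        elif self.phase is Phase.INIT_B:
            # Phase init-b: only features seen in both images can be triangulated.
            shared = self.corners_map.keys() & features.keys()
            if shared and mean_displacement(self.corners_map, features, shared) >= self.min_parallax_px:
                for fid in shared:
                    self.local_map[fid] = triangulate(self.corners_map[fid], features[fid])
                self.phase = Phase.ESTIMATE
        else:
            # Phase estimate: the position would be computed against local_map
            # here, and points added or removed (omitted in this sketch).
            pass
        return self.phase
```

With a placeholder `triangulate` function, feeding a first image moves the sketch to `Phase.INIT_B`, a second image with too little displacement leaves it there with an empty local map, and an image with enough displacement fills the local map and switches to `Phase.ESTIMATE`.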