
Lec.11: 3D Deep Learning

Feature matching

  1. SuperPoint

  2. CNN-based detectors:

    • train CNNs to detect corners

    • train CNNs to enforce repeatability: warp the image with a known transformation $g$ and enforce equivariance of the detector $f$: $$ \min_f\frac{1}{n}\sum_{i=1}^n\|f(g(\mathbf{I}_i))-g(f(\mathbf{I}_i))\|^2 $$

  3. CNN-based descriptors

    • train descriptors by metric learning

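    • contrastive (triplet) loss: pull the anchor $A$ toward the matching patch $P$ and push it at least a margin $m$ further from the non-matching patch $N$ (the $N$ in $\frac{1}{N}$ counts training triplets)

    \[ L_{tri}=\frac{1}{N}\sum_{i=1}^N\max(0,m+\|F_I(A)-F_{I'}(P)\|-\|F_I(A)-F_{I'}(N)\|)^2 \]
    • where does the training data come from?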

      • synthetic data
      • use MVS
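
The triplet loss above can be sketched in a few lines of NumPy. This is only an illustration: the function name `triplet_loss` and the toy descriptors are made up for the example, and the hinge is squared to follow the formula in these notes (many implementations omit the square).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Squared-hinge triplet loss over a batch of descriptors of shape (B, D)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor to matching patch
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor to non-matching patch
    hinge = np.maximum(0.0, margin + d_pos - d_neg)    # zero once the margin is satisfied
    return float(np.mean(hinge ** 2))

# Toy descriptors (hypothetical values): the first triplet already satisfies
# the margin, the second one does not and therefore contributes to the loss.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
p = np.array([[0.0, 0.1], [1.0, 0.2]])
n = np.array([[3.0, 0.0], [1.0, 0.5]])
loss = triplet_loss(a, p, n, margin=1.0)
print(loss)
```

Only triplets that violate the margin produce a gradient, which is why hard-negative mining matters in practice.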

Object Pose Estimation

  1. feature-matching-based methods

    • First, reconstruct an SfM model of the object from the input multi-view images
    • Then obtain 2D-3D correspondences by lifting 2D-2D matches to 3D
    • Finally, solve for the object pose in the query image with PnP
  2. Direct Pose Regression Methods

    • Directly regress the object pose of the query image using a neural network
    • Need to render a large number of images for training
  3. Keypoint detection methods

    • Using a CNN to detect pre-defined keypoints
    • Need to render a large number of images for training
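
To make the last step of the feature-matching pipeline concrete, here is a minimal linear (DLT) PnP sketch in NumPy. It is an assumption-laden illustration, not the solver used in the lecture: it expects at least 6 noise-free, non-coplanar 2D-3D matches in normalized image coordinates, and the helper names `rodrigues` and `pnp_dlt` are invented for this example. Real pipelines typically combine a dedicated PnP solver with RANSAC.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pnp_dlt(points_3d, points_2d_norm):
    """Estimate (R, t) from 2D-3D matches; 2D points are K^{-1} [u, v, 1]."""
    rows = []
    for X, (x, y) in zip(points_3d, points_2d_norm):
        Xh = np.append(X, 1.0)
        # Each match gives two linear equations in the 12 entries of P = [R | t].
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    P = Vt[-1].reshape(3, 4)              # null vector, defined up to scale and sign
    if np.linalg.det(P[:, :3]) < 0:       # fix the sign so the rotation is proper
        P = -P
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r                          # closest rotation to the 3x3 block
    t = P[:, 3] / S.mean()                # undo the unknown scale
    return R, t

# Synthetic check with a known pose (all values are made up for the example).
rng = np.random.default_rng(0)
R_true = rodrigues([1.0, 2.0, 3.0], 0.4)
t_true = np.array([0.1, -0.2, 5.0])
pts = rng.uniform(-1.0, 1.0, size=(8, 3))
cam = pts @ R_true.T + t_true             # points in the camera frame
obs = cam[:, :2] / cam[:, 2:3]            # normalized image coordinates
R_est, t_est = pnp_dlt(pts, obs)
```

With noisy matches the linear estimate is usually refined by minimizing reprojection error.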

Dense Reconstruction

  1. MVSNet: predict cost volume from CNN features
  2. Implicit representations: describe the shape with a continuous function, e.g. occupancy or a signed distance function (SDF)

Implicit representations can be learned by training a neural network.
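
As a small illustration of what such a continuous function looks like, the sketch below uses an analytic sphere in place of a trained network. The function names and the sign convention (negative inside the shape) are assumptions made for this example.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def sphere_occupancy(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Occupancy: 1 strictly inside the shape, 0 elsewhere."""
    return (sphere_sdf(points, center, radius) < 0).astype(np.float32)

# The function can be queried at any 3D location, not only on a voxel grid.
queries = np.array([[0.0, 0.0, 0.0],   # center of the sphere
                    [1.0, 0.0, 0.0],   # on the surface
                    [2.0, 0.0, 0.0]])  # outside
sdf_values = sphere_sdf(queries)        # [-1, 0, 1]
occ_values = sphere_occupancy(queries)  # [1, 0, 0]
```

A learned implicit representation replaces `sphere_sdf` with a network that maps a 3D point (and usually a shape code) to the same kind of scalar.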

  3. Single image to 3D

    • monocular depth estimation: use a network to predict depth from a single image

    • scale ambiguity: the same object at different sizes and depths can produce the same image

    • loss function: scale-invariant depth error

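\[ D_{SI}(y,y^*)=\frac{1}{n}\sum_{i=1}^n\left(\log y_i-\log y_i^*+\alpha(y,y^*)\right)^2, \qquad \alpha(y,y^*)=\frac{1}{n}\sum_{j=1}^n(\log y_j^*-\log y_j) \]

Here $y$ is the predicted depth and $y^*$ the ground truth. The term $\alpha$ cancels the mean log difference, so multiplying $y$ by any global scale leaves the error unchanged.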
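
A minimal NumPy sketch of a scale-invariant log-depth error, assuming the offset term cancels the mean log difference (which is what makes the error invariant to a global scale). The function name and the depth values are invented for the example.

```python
import numpy as np

def scale_invariant_error(pred, target):
    """Mean squared log-depth difference after removing the mean log difference."""
    d = np.log(pred) - np.log(target)       # per-pixel log difference
    return float(np.mean((d - d.mean()) ** 2))

# Hypothetical depths in meters.
target = np.array([1.0, 2.0, 4.0])
pred = np.array([1.5, 2.5, 3.0])

err = scale_invariant_error(pred, target)
err_rescaled = scale_invariant_error(2.5 * pred, target)    # same prediction, other scale
err_perfect = scale_invariant_error(3.0 * target, target)   # correct up to scale
```

Rescaling the prediction leaves the error unchanged, and a prediction that is correct up to scale gets (numerically) zero error.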

Deep learning for 3D understanding

  1. 3D ConvNets: High space/time complexity of high resolution voxels \(O(N^3)\)

  2. Sparse ConvNets: exploit the sparsity of 3D shapes

    • Store the sparse surface signals (Octree)
    • Constrain the computation near the surface
    • Sparse convolution: compute inner product only at the active sites (nonzero entries)

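
The idea of computing only at active sites can be sketched with a dictionary of occupied voxels. This is a toy illustration under stated assumptions: single-channel features, a 3x3x3 kernel, and outputs restricted to the input's active sites; the function name `sparse_conv3d` is made up.

```python
import itertools
import numpy as np

def sparse_conv3d(active, kernel):
    """active: dict {(x, y, z): value}; kernel: array of shape (3, 3, 3).

    Outputs are computed only at active sites, and only active neighbors
    contribute, so the cost scales with the number of occupied voxels
    instead of the full N^3 grid.
    """
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    out = {}
    for (x, y, z) in active:
        acc = 0.0
        for dx, dy, dz in offsets:
            v = active.get((x + dx, y + dy, z + dz))
            if v is not None:  # inactive (empty) voxels are skipped entirely
                acc += kernel[dx + 1, dy + 1, dz + 1] * v
        out[(x, y, z)] = acc
    return out

# Three occupied voxels: two neighbors and one isolated voxel.
voxels = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 3.0}
result = sparse_conv3d(voxels, np.ones((3, 3, 3)))
```

With an all-ones kernel each output is the sum of the active values in its 3x3x3 neighborhood, which makes the result easy to check by hand.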

  3. deep learning on point clouds

    • challenge

      • Point clouds are unrasterized, irregular data
      • Standard grid convolution cannot be applied directly
    • PointNet

      • Representation: N unordered points, each represented by a D-dimensional coordinate

      • Handling the lack of order: max pooling makes the output invariant to the order of the input points

      • Making the network invariant to rotation: estimate the transformation using another network (T-Net)

      • Limitation: no local context for each point
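
The order-invariance argument can be checked numerically. The sketch below uses a shared per-point MLP with random, untrained weights followed by max pooling; the layer sizes and names are arbitrary choices for the illustration and the T-Net is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random weights stand in for a trained shared MLP (3 -> 16 -> 32).
W1, b1 = rng.normal(size=(3, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 32)), rng.normal(size=32)

def global_feature(points):
    """Apply the same MLP to every point, then max-pool over the point axis."""
    h = np.maximum(points @ W1 + b1, 0.0)  # shared layer 1 with ReLU
    h = np.maximum(h @ W2 + b2, 0.0)       # shared layer 2 with ReLU
    return h.max(axis=0)                   # symmetric function: order does not matter

cloud = rng.normal(size=(100, 3))
shuffled = cloud[rng.permutation(len(cloud))]

feat = global_feature(cloud)
feat_shuffled = global_feature(shuffled)
```

Because each point is processed independently and then pooled globally, no point sees its neighbors, which is the limitation noted above.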

  4. PointNet++: Multi-Scale PointNet

    1. Sampling: Sample anchor points by Farthest Point Sampling (FPS)
    2. Grouping: Find the neighborhood of each anchor point
    3. Apply PointNet in each neighborhood to mimic convolution

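
The sampling and grouping steps above can be sketched as follows. This is a simple greedy version for illustration; the function names, the fixed start index, and the radius-based grouping are assumptions of the example.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(selected)

def ball_query(points, center, radius):
    """Indices of all points within `radius` of `center` (the anchor's neighborhood)."""
    return np.where(np.linalg.norm(points - center, axis=1) <= radius)[0]

# Four points on a line (toy data).
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [10.0, 0.0, 0.0]])
anchors = farthest_point_sampling(pts, k=3)        # indices [0, 3, 2]
group = ball_query(pts, pts[anchors[0]], radius=1.5)  # indices [0, 1]
```

Each group would then be passed through a small PointNet to produce one feature per anchor.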

  5. 3D semantic segmentation

    • Input: sensor data of a 3D scene (RGB / depth / point cloud …)
    • Output: a category label for each point in the point cloud
    • Possible solution: fuse 2D segmentation results in 3D, since 2D segmentation already works well
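
One simple way to fuse 2D results in 3D is to project each point into every view and take a majority vote over the 2D labels. The sketch below assumes known pinhole cameras and ignores occlusion; `fuse_2d_labels` and the toy cameras are hypothetical.

```python
import numpy as np

def fuse_2d_labels(points, views, num_classes):
    """views: list of (K, R, t, label_img); returns one label per point, -1 if unseen."""
    votes = np.zeros((len(points), num_classes), dtype=int)
    for K, R, t, label_img in views:
        pix = (points @ R.T + t) @ K.T          # project into the image
        z = pix[:, 2]
        valid = z > 0                           # keep points in front of the camera
        safe_z = np.where(valid, z, 1.0)        # avoid dividing by non-positive depth
        u = np.round(pix[:, 0] / safe_z).astype(int)
        v = np.round(pix[:, 1] / safe_z).astype(int)
        h, w = label_img.shape
        valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.nonzero(valid)[0]
        votes[idx, label_img[v[idx], u[idx]]] += 1   # one vote per visible point
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1         # never observed in any view
    return labels

# Three toy views sharing one camera, with slightly different 2D predictions.
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
img_a = np.zeros((3, 3), dtype=int); img_a[1, 2] = 1
img_b = np.zeros((3, 3), dtype=int); img_b[1, 2] = 1; img_b[0, 0] = 2
img_c = np.zeros((3, 3), dtype=int); img_c[0, 0] = 2
pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [-1.0, -1.0, 1.0], [0.0, 0.0, -1.0]])
fused = fuse_2d_labels(pts, [(K, R, t, img_a), (K, R, t, img_b), (K, R, t, img_c)], 3)
```

The vote smooths out per-view mistakes; a depth test would be needed to keep occluded points from receiving labels of the surface in front of them.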
  6. 3D object detection

    • PointRCNN: RCNN for point clouds

    • Frustum PointNets: using 2D detectors to generate 3D proposals
