# Lec.11: 3D Deep Learning
## Feature matching
- SuperPoint
- CNN-based detectors:
  - train CNNs to detect corners
  - train CNNs to enforce repeatability: warp the image and enforce equivariance (see the sketch below)
    $$ \min_f\frac{1}{n}\sum_{i=1}^n\|f(g(\mathbf{I}))-g(f(\mathbf{I}))\|^2 $$
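A minimal sketch of the equivariance objective above, assuming `f` is a detector network that outputs a heatmap and `warp` is a helper applying the same random transform g to an image or a heatmap (both names are assumptions, not from a specific library):

```python
import torch.nn.functional as F

def repeatability_loss(f, I, warp):
    """Penalize f(g(I)) differing from g(f(I)): detections should move
    with the image when it is warped (equivariance / repeatability)."""
    return F.mse_loss(f(warp(I)), warp(f(I)))
```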
- CNN-based descriptors
  - train descriptors by metric learning
  - triplet loss (see the sketch below):
    $$ L_{tri}=\frac{1}{N}\sum_{i=1}^N\max\left(0,\;m+\|F_I(A)-F_{I'}(P)\|-\|F_I(A)-F_{I'}(N)\|\right)^2 $$
  - where does the training data come from?
    - synthetic data
    - real images reconstructed with MVS
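A PyTorch sketch of the triplet loss above; `anchor`, `positive`, `negative` are assumed to be batches of descriptors $F_I(A)$, $F_{I'}(P)$, $F_{I'}(N)$:

```python
import torch

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Squared hinge on the margin between the matching (anchor-positive)
    and non-matching (anchor-negative) descriptor distances; (N, D) inputs."""
    d_pos = (anchor - positive).norm(dim=1)
    d_neg = (anchor - negative).norm(dim=1)
    return torch.clamp(margin + d_pos - d_neg, min=0).pow(2).mean()
```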
## Object Pose Estimation
- Feature-matching-based methods (see the PnP sketch below)
  - First, reconstruct the object's SfM model from the input multi-view images
  - Then, obtain 2D-3D correspondences by lifting 2D-2D matches to 3D
  - Finally, solve the object pose of the query image with PnP
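A minimal sketch of the final PnP step using OpenCV's `cv2.solvePnPRansac`; the 2D-3D correspondences are synthesized here, whereas in the pipeline above they come from lifting 2D-2D matches into the SfM model:

```python
import cv2
import numpy as np

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts3d = np.random.rand(20, 3)                 # object points (from SfM model)
proj = (pts3d + [0., 0., 5.]) @ K.T           # synthetic pose: t = (0, 0, 5)
pts2d = proj[:, :2] / proj[:, 2:]             # matched query-image keypoints

# RANSAC rejects outlier matches; rvec/tvec is the recovered object pose.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, distCoeffs=None)
print(ok, tvec.ravel())                       # tvec is roughly (0, 0, 5)
```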
- Direct pose regression methods (sketched below)
  - Directly regress the object pose of the query image using a neural network
  - Need to render a large number of images for training
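A toy sketch of direct pose regression (the backbone is a stand-in, not any particular published model): a CNN maps the query image to a unit quaternion plus a translation vector:

```python
import torch.nn as nn
import torch.nn.functional as F

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(        # stand-in feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, 7)            # 4 quaternion + 3 translation

    def forward(self, img):                   # img: (B, 3, H, W)
        out = self.fc(self.backbone(img))
        q = F.normalize(out[:, :4], dim=1)    # unit quaternion = rotation
        return q, out[:, 4:]                  # rotation, translation
```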
- Keypoint detection methods
  - Use a CNN to detect pre-defined keypoints
  - Need to render a large number of images for training
## Dense Reconstruction
- MVSNet: predict a cost volume from CNN features (see the variance sketch below)
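A sketch of MVSNet's variance-based cost volume, assuming the source-view features have already been homography-warped to the reference view at each depth hypothesis (the warping step is omitted here):

```python
import torch

def variance_cost_volume(ref_feat, warped_src):
    """ref_feat: (B, C, H, W); warped_src: (B, V, C, D, H, W), source features
    warped to the reference view at D depth hypotheses. Returns the
    (B, C, D, H, W) variance over all views, low where features agree."""
    B, V, C, D, H, W = warped_src.shape
    ref = ref_feat.unsqueeze(1).unsqueeze(3).expand(B, 1, C, D, H, W)
    feats = torch.cat([ref, warped_src], dim=1)   # (B, V + 1, C, D, H, W)
    return feats.var(dim=1, unbiased=False)       # variance across views
```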

- Implicit representations: describe the shape with a continuous function, e.g. occupancy or a signed distance function (SDF)
  - Train the implicit representation with a neural network (see the sketch below)
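A minimal sketch of a neural implicit representation: an MLP as the continuous function, here mapping a 3D point to a signed distance (the surface is the zero level set); it would be trained by regressing sampled (point, SDF) pairs:

```python
import torch.nn as nn

class SDFNet(nn.Module):
    """Continuous implicit shape: xyz (B, 3) -> signed distance (B, 1)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):
        return self.net(xyz)
```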

- Single image to 3D
  - Monocular depth estimation: use a network to predict depth from a single image
  - Scale ambiguity: the same object at different sizes and depths produces the same image
  - Loss function: scale-invariant depth error (sketched below)
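A sketch of the scale-invariant log-depth error (in the style of Eigen et al.): with `lam = 1`, multiplying the predicted depth by any global scale only shifts the log-difference by a constant, which the loss ignores:

```python
import torch

def scale_invariant_loss(pred, gt, lam=1.0):
    """pred, gt: positive depth maps of the same shape."""
    d = torch.log(pred) - torch.log(gt)       # per-pixel log-depth error
    return (d ** 2).mean() - lam * d.mean() ** 2
```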
## Deep learning for 3D understanding
- 3D ConvNets: high space/time complexity of high-resolution voxels, $O(N^3)$
- Sparse ConvNets: exploit the sparsity of 3D shapes
  - Store the sparse surface signals (e.g. in an octree)
  - Constrain the computation to near the surface
  - Sparse convolution: compute inner products only at the active sites (nonzero entries); see the sketch below
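A minimal sketch of a submanifold-style sparse 3x3x3 convolution over a hand-rolled layout (the dict representation is an assumption for illustration, not a real library's format): output is computed only at the active sites:

```python
import numpy as np

def sparse_conv3d(active, weights, bias):
    """active: dict (x, y, z) -> (C_in,) feature; weights: dict of kernel
    offsets (dx, dy, dz) in {-1, 0, 1}^3 -> (C_out, C_in); bias: (C_out,)."""
    out = {}
    for (x, y, z), feat in active.items():    # only visit active sites
        acc = bias.copy()
        for (dx, dy, dz), w in weights.items():
            nbr = active.get((x + dx, y + dy, z + dz))
            if nbr is not None:               # inactive neighbors contribute 0
                acc = acc + w @ nbr
        out[(x, y, z)] = acc
    return out
```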
- Deep learning on point clouds
  - Challenges
    - A point cloud is unrasterized, irregular data
    - Standard convolution cannot be applied directly
- PointNet (sketched below)
  - Representation: N orderless points, each given by a D-dim coordinate
  - Handling orderlessness: max pooling makes the output invariant to the order of the input points
  - Making the network rotation-invariant: estimate the transformation with another network (T-Net)
  - Limitations
    - No local context for each point!

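A minimal PointNet sketch (T-Net omitted): a per-point MLP shared across all points, followed by max pooling, which is what makes the output invariant to the input order:

```python
import torch.nn as nn

class PointNetClassifier(nn.Module):
    def __init__(self, in_dim=3, num_classes=10):
        super().__init__()
        self.mlp = nn.Sequential(             # shared per-point MLP
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        self.head = nn.Linear(1024, num_classes)

    def forward(self, pts):                   # pts: (B, N, in_dim), orderless
        feat = self.mlp(pts)                  # (B, N, 1024)
        global_feat = feat.max(dim=1).values  # symmetric: order-invariant
        return self.head(global_feat)
```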
- PointNet++: multi-scale PointNet
  - Sampling: sample anchor points by Farthest Point Sampling (FPS; sketched below)
  - Grouping: find the neighbourhood of each anchor point
  - Apply PointNet in each neighborhood to mimic convolution
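A sketch of Farthest Point Sampling for the sampling step: greedily pick the point farthest from everything selected so far:

```python
import torch

def farthest_point_sampling(pts, k):
    """pts: (N, 3) points; returns indices (k,) of the sampled anchors."""
    N = pts.shape[0]
    idx = torch.zeros(k, dtype=torch.long)
    idx[0] = torch.randint(N, (1,)).item()    # random seed point
    dist = torch.full((N,), float('inf'))     # distance to selected set
    for i in range(1, k):
        d = ((pts - pts[idx[i - 1]]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)         # update nearest-selected dist
        idx[i] = dist.argmax()                # pick the farthest point
    return idx
```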
- 3D semantic segmentation
  - Input: sensor data of a 3D scene (RGB / depth / point cloud …)
  - Output: label each point in the point cloud with a category label
  - Possible solution: fuse 2D segmentation results into 3D, since 2D segmentation already works well (see the sketch below)
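A single-view sketch of the fusion idea (a hypothetical helper that ignores occlusion; a real system would project into many views and vote): project each 3D point into a labeled image and copy the pixel's label back:

```python
import numpy as np

def fuse_2d_labels(points, K, R, t, label_map):
    """points: (N, 3) world coords; K: (3, 3) intrinsics; R, t: world-to-
    camera pose; label_map: (H, W) 2D segmentation. Returns (N,) labels,
    -1 for points not visible in this view."""
    H, W = label_map.shape
    cam = points @ R.T + t                    # world -> camera frame
    pix = cam @ K.T                           # homogeneous pixel coords
    u = (pix[:, 0] / pix[:, 2]).round().astype(int)
    v = (pix[:, 1] / pix[:, 2]).round().astype(int)
    ok = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(len(points), -1)
    labels[ok] = label_map[v[ok], u[ok]]
    return labels
```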
- 3D object detection
  - PointRCNN: RCNN for point clouds
  - Frustum PointNets: use 2D detectors to generate 3D frustum proposals