2021 PatchmatchNet: Learned Multi-View Patchmatch Stereo - Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, Marc Pollefeys - [GITHUB] - [SUPP]
2022 IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo - Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Marc Pollefeys - [GITHUB]
Tanks and Temples (TNT):
Best performing methods
Multi-Scale Geometric Consistency Guided Multi-View Stereo - Qingshan Xu, Wenbing Tao
ETH3D: ACMH (75.89 in 967 s)
TNT: ACMH (54.82+33.73)
TNT: ACMM (57.27+34.02)
- Diffusion-like propagation with a checkerboard pattern instead of sequential propagation (enables massive parallelism, but leads to weaker performance; view-selection problems?)
- Checkerboard sampling: instead of evaluating just 8 local samples [Gipuma], we first choose the 8 best-scored samples from 4 V-shaped and 4 long strip areas (i.e. from 4*7+4*11=72 candidates); this helps a good plane of a locally shared region spread as far as possible. But maybe we should choose the single best candidate from each area instead (which also yields 8 hypotheses)?
- Multi-hypothesis joint view selection: for each of the 8 propagated hypotheses we estimate the matching cost with each neighboring view. For a bad view, all 8 costs are high. So good views are those that are good for enough hypotheses (and not too bad for the majority of hypotheses). Each good view has a weight, see (3). Now for each hypothesis we can estimate the total confidence w.r.t. the weighted good views.
- Most important view: if a good view was also the best view in the previous iteration, its weight is doubled. If it is not good now, it is still taken into account (with weight 0.2). See (5).
- Good matching cost threshold: decreases with the iteration number, see (2). Idea: maybe just check that a view's cost is not much worse than the cost of the original local sample (from which the hypothesis was propagated)?
- Cost function: bilateral weighted (Z?)NCC.
- Refinement after each red-black iteration: two new hypotheses are generated, a refined version of the current one and a new random one. There are three cases for the depth and normal: one of them, neither of them, or both of them are close to the optimal solution. So we combine our three hypotheses (current, refined and random) into six combinations of their depths and normals, and the one with the lowest aggregated cost is chosen.
- Filtering: 5x5 median filter.
- Coarse-to-fine scheme: to handle low-textured regions (texture richness is relative to scale: a region that is textureless at a fine resolution can be discriminative at a coarser one)
- Detail restorer: based on photometric consistency between adjacent scales
- Geometric consistency guidance: reliable depth estimates for low-textured areas obtained at coarser scales are retained at finer scales (to resolve the ambiguities in low-textured areas) by adding the reprojection error to each view's cost. But relying on the previous-level depth maps of other images is dangerous (they can be taken from a different distance). Can't we resolve the ambiguities in low-textured areas by adding a bonus for the hypothesis closest to the reliable depth from the previous level of the current reference image?
- Geometric consistency guidance is applied twice at each scale (so that once the neighboring depth maps are estimated more accurately, the depth map of the reference image is boosted further).
- Detail restorer: multi-scale geometric consistency guidance also blurs details. Disabling it in thin structures and at boundaries can help (leaving only pure photometric consistency). If the cost after upscaling is InitCost and after MVS (a single ACMH pass) it is NewCost, then if NewCost < InitCost - Epsilon the upscaled hypothesis should be replaced with the one refined by MVS. Idea: if after upscaling without MVS NewPhotoconsistency < OldPhotoconsistency - Epsilon, then the estimate is erroneous and we shouldn't use it as a prior? Or if it is not a local extremum (check hypotheses with depth +- delta).
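The multi-hypothesis joint view selection above (good/bad cost counts per view, per-view weights, a good-cost threshold shrinking with the iteration, and the most-important-view rule) can be sketched in plain Python. This is a minimal sketch, not the ACMH code: the threshold form tau_init*exp(-t^2/alpha), the confidence exp(-m^2/(2*beta^2)), the constants TAU_INIT, TAU_UP, ALPHA, BETA, N1, N2 and the fallback when no view is selected are all assumptions.

```python
import math

# Assumed constants (illustrative, not taken verbatim from the paper)
TAU_INIT, TAU_UP, ALPHA, BETA = 0.8, 1.2, 90.0, 0.3
N1, N2 = 2, 3  # a view needs > N1 good costs and < N2 bad costs


def good_threshold(t):
    """Good-cost boundary that shrinks with iteration t (assumed form)."""
    return TAU_INIT * math.exp(-t * t / ALPHA)


def view_weights(costs, t, prev_best_view=None):
    """costs[i][j]: matching cost of hypothesis i against source view j (e.g. 1 - NCC)."""
    n_views = len(costs[0])
    tau = good_threshold(t)
    weights = [0.0] * n_views
    for j in range(n_views):
        column = [row[j] for row in costs]
        good = [m for m in column if m < tau]
        n_bad = sum(1 for m in column if m > TAU_UP)
        if len(good) > N1 and n_bad < N2:
            # Weight = mean confidence of this view's good costs
            weights[j] = sum(math.exp(-m * m / (2 * BETA * BETA)) for m in good) / len(good)
    if prev_best_view is not None:
        # Most important view: doubled if selected, otherwise kept with weight 0.2
        if weights[prev_best_view] > 0:
            weights[prev_best_view] *= 2.0
        else:
            weights[prev_best_view] = 0.2
    return weights


def best_hypothesis(costs, t, prev_best_view=None):
    """Return (index of best hypothesis, its weighted multi-view cost)."""
    w = view_weights(costs, t, prev_best_view)
    total = sum(w)
    if total == 0:
        # No view selected: fall back to plain averaging (assumption)
        w, total = [1.0] * len(w), float(len(w))
    scores = [sum(wj * m for wj, m in zip(w, row)) / total for row in costs]
    i = min(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]
```

A view that is bad for every hypothesis gets weight 0 and stops influencing the choice, while a view that is good for most hypotheses dominates the weighted cost.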
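The geometric consistency term (photometric cost plus a forward-backward reprojection error against the source view's own depth map) can be sketched as follows, assuming pinhole cameras given as (K, R, t) with X_cam = R*X_world + t and K = (fx, fy, cx, cy). The weight `lam` and truncation `delta` are assumed values, and `src_depth_at` is a hypothetical callback that looks up the source depth map at a (sub)pixel position.

```python
import math


def backproject(K, R, t, u, v, depth):
    """Pixel (u, v) with depth -> world point, for X_cam = R @ X_world + t."""
    fx, fy, cx, cy = K
    xc = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    d = [xc[i] - t[i] for i in range(3)]
    # R is orthonormal, so multiplying by R^T undoes the rotation
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]


def project(K, R, t, X):
    """World point -> pixel (u, v) and depth in the given camera."""
    fx, fy, cx, cy = K
    xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    return fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy, xc[2]


def forward_backward_error(ref_cam, src_cam, u, v, depth, src_depth_at):
    """Project the ref pixel into src, lift it with src's own depth, project back."""
    X = backproject(*ref_cam, u, v, depth)
    us, vs, _ = project(*src_cam, X)
    Xs = backproject(*src_cam, us, vs, src_depth_at(us, vs))
    ub, vb, _ = project(*ref_cam, Xs)
    return math.hypot(ub - u, vb - v)


def geometric_cost(photo_cost, reproj_error, lam=0.2, delta=3.0):
    """Photometric cost plus truncated, weighted reprojection error (lam, delta assumed)."""
    return photo_cost + lam * min(reproj_error, delta)
```

Truncating the error at `delta` keeps a single inconsistent source depth map from overwhelming the photometric term, which is relevant to the concern above about other images being taken from a different distance.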
Pixelwise View Selection for Unstructured Multi-View Stereo - Johannes L. Schönberger, Enliang Zheng, Jan-Michael Frahm and Marc Pollefeys
Massively Parallel Multiview Stereopsis by Surface Normal Diffusion - Silvano Galliani, Katrin Lasinger, Konrad Schindler
Note that Gipuma fails on ETH3D and TNT because it relies on an object-centered prior (see the view selection point below: the camera viewing direction should be replaced with per-pixel ray directions).
Massive parallelism via a red-black checkerboard pattern (diffusion-like scheme).
Multi-view matcher instead of just a two-view one.
PatchMatch Stereo: soft segmentation, which decreases the influence of pixels whose color differs a lot from the central one.
PatchMatch Stereo: cost function, a weighted combination of 0.1*color and 0.9*gradient differences.
PatchMatch Stereo: minor post-processing: left-right consistency checks, filling holes by extending nearby planes, weighted median filtering.
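A minimal sketch of the red-black (checkerboard) scheme, written as sequential loops that stand in for one GPU thread per pixel. As a simplifying assumption it only looks at the 4 direct neighbours rather than Gipuma's larger sampling pattern; `cost` is a hypothetical matching-cost callback.

```python
def red_black_propagate(depth, cost, iterations=1):
    """In-place red-black propagation on a 2D grid (list of rows) of depths.

    Each half-step updates only cells of one color. Their 4-neighbours all have
    the other color and are not modified in that half-step, so the visiting
    order inside a half-step does not matter (it could run fully in parallel).
    """
    h, w = len(depth), len(depth[0])
    for _ in range(iterations):
        for color in (0, 1):  # 0 = "red", 1 = "black"
            for y in range(h):
                for x in range(w):
                    if (x + y) % 2 != color:
                        continue
                    best_d, best_c = depth[y][x], cost(x, y, depth[y][x])
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h:
                            c = cost(x, y, depth[ny][nx])
                            if c < best_c:
                                best_d, best_c = depth[ny][nx], c
                    depth[y][x] = best_d
    return depth
```

With a single correct depth in the middle of a 3x3 grid, one iteration spreads it to the 4 edge cells and a second iteration reaches the corners, which shows the diffusion-like (rather than sweep-like) spread.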
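The soft segmentation weights and the truncated color/gradient dissimilarity can be sketched as below. The GAMMA, TAU_COL and TAU_GRAD values are the ones I recall from the PatchMatch Stereo paper and should be treated as assumptions; the scalar `grad_*` inputs stand for precomputed grey-value gradients.

```python
import math

# Assumed parameter values (recalled, not verified)
GAMMA, ALPHA, TAU_COL, TAU_GRAD = 10.0, 0.9, 10.0, 2.0


def support_weight(color_p, color_q):
    """Soft segmentation: exp(-L1 color distance / GAMMA) to the center pixel p."""
    l1 = sum(abs(a - b) for a, b in zip(color_p, color_q))
    return math.exp(-l1 / GAMMA)


def dissimilarity(color_q, color_m, grad_q, grad_m):
    """(1 - ALPHA) * truncated color diff + ALPHA * truncated gradient diff."""
    dc = sum(abs(a - b) for a, b in zip(color_q, color_m))
    dg = abs(grad_q - grad_m)
    return (1 - ALPHA) * min(dc, TAU_COL) + ALPHA * min(dg, TAU_GRAD)


def aggregated_cost(center_color, window):
    """window: list of (color_q, color_matched, grad_q, grad_matched) tuples."""
    return sum(support_weight(center_color, cq) * dissimilarity(cq, cm, gq, gm)
               for cq, cm, gq, gm in window)
```

The truncation thresholds bound the contribution of any single pixel, so occluded or specular pixels inside the window cannot dominate the aggregated cost.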
Depth+Normal yields an affine distortion of the support windows - see PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.
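For rectified two-view stereo (the PatchMatch Stereo setting), a per-pixel plane turns the fixed window into a slanted one: the disparity inside the window varies linearly, so the matched window is an affine warp of the reference one. A small sketch, assuming the plane is given by a point (x0, y0, d0) in (x, y, disparity) space and a normal n with n_z != 0; `matched_window` is a hypothetical helper name.

```python
def plane_from_point_normal(x0, y0, d0, n):
    """Disparity plane d(x, y) = a*x + b*y + c through (x0, y0, d0) with normal n."""
    nx, ny, nz = n
    a = -nx / nz
    b = -ny / nz
    c = (nx * x0 + ny * y0 + nz * d0) / nz
    return a, b, c


def matched_window(x0, y0, d0, n, radius=1):
    """For each window pixel q = (x, y): x-coordinate x - d(q) in the other image."""
    a, b, c = plane_from_point_normal(x0, y0, d0, n)
    return {(x, y): x - (a * x + b * y + c)
            for y in range(y0 - radius, y0 + radius + 1)
            for x in range(x0 - radius, x0 + radius + 1)}
```

For n = (0, 0, 1) every window pixel gets the same disparity d0, i.e. the classic fronto-parallel window; tilting n shears and scales the window instead.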
The propagation scheme: 20 neighbours for propagation in a local patch with radius 5 (or 8 neighbours for a speedup).
Intensity instead of RGB (i.e. grayscale) for a 3x speedup.
Sparse Census Transform-style sampling: only every other row and column of the window is used when evaluating the matching cost, resulting in a 4x gain.
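A sketch of the sparse window, assuming only even offsets from the center are kept so that the center pixel is always sampled (the exact pattern used by Gipuma may differ). For an 11x11 window this keeps 25 of 121 samples; for larger windows the ratio approaches 4x.

```python
def window_offsets(radius, sparse=False):
    """(dx, dy) offsets of a (2r+1)x(2r+1) window; sparse keeps only even offsets."""
    offs = [o for o in range(-radius, radius + 1) if not sparse or o % 2 == 0]
    return [(dx, dy) for dy in offs for dx in offs]
```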
Parameterization in scene space: detailed math about plane-induced homography.
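A compact version of the plane-induced homography, assuming the plane is written as n^T X = d in reference-camera coordinates and (R, t) maps reference-camera to source-camera coordinates (X_s = R X_r + t). Then H = K_src (R + t n^T / d) K_ref^-1; with the other common convention n^T X + d = 0 the sign of the second term flips. Intrinsics are assumed to be (fx, fy, cx, cy) without skew.

```python
def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def plane_homography(K_ref, K_src, R, t, n, d):
    """H = K_src (R + t n^T / d) K_ref^-1 for the plane n^T X = d (ref coordinates)."""
    fx, fy, cx, cy = K_ref
    K_ref_inv = [[1 / fx, 0.0, -cx / fx], [0.0, 1 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    gx, gy, sx, sy = K_src
    K_src_m = [[gx, 0.0, sx], [0.0, gy, sy], [0.0, 0.0, 1.0]]
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K_src_m, M), K_ref_inv)


def apply_h(H, u, v):
    """Map pixel (u, v) through H with the homogeneous division."""
    x, y, w = (H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3))
    return x / w, y / w
```

Because one H maps the whole support window, each (depth, normal) hypothesis costs one 3x3 matrix per source view rather than a per-pixel backprojection.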
Random initialization: detailed math about random normal from visible hemisphere.
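One way to draw a random normal from the visible hemisphere: sample a uniformly distributed unit vector (here via a normalized 3D Gaussian, a swapped-in standard technique, not necessarily the one in the paper) and flip it if it points away from the camera, i.e. if its dot product with the pixel's viewing ray is positive.

```python
import math
import random


def random_visible_normal(view_ray, rng=random):
    """Uniform random unit normal on the hemisphere facing the camera (n . ray <= 0)."""
    while True:
        n = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in n))
        if norm > 1e-12:  # avoid dividing by a (practically impossible) zero vector
            break
    n = [c / norm for c in n]
    if sum(a * b for a, b in zip(n, view_ray)) > 0:
        n = [-c for c in n]  # mirror into the visible hemisphere
    return n
```

Flipping keeps the distribution uniform over the hemisphere, whereas rejecting back-facing samples would waste half of the draws.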
Depth resolution is anisotropic: a more densely sampled set of depths to choose from in the near field and a sparser set in the far field. The search interval for the plane refinement step should be set proportional to the depth.
View selection: the viewing directions should differ enough (for a big enough baseline) but not too much (to prevent too large perspective distortions). For speedup, a random subset of S=9 images is chosen. Why not a two-step process? Each image chooses random pairs; after the first round each image has a good-enough depth map, so we launch a second, fast round that propagates depth hypotheses from the pair images' depth maps.
Cost aggregation: to remove matches from occluded views from the cost, we use the sum of the K=3 best matches. Why not compute the second-best cost t and then sum all costs < 1.5*t?
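A sketch of both ideas, assuming that sampling uniformly in inverse depth is an acceptable way to get denser near-field depths, and that the refinement interval is +-`rel_range`*d (a hypothetical parameter).

```python
import random


def random_depth(d_min, d_max, rng=random):
    """Sample uniformly in inverse depth: dense near the camera, sparse far away."""
    return 1.0 / rng.uniform(1.0 / d_max, 1.0 / d_min)


def perturb_depth(d, rel_range, rng=random):
    """Refinement step whose search interval is proportional to the current depth."""
    return d + rng.uniform(-rel_range, rel_range) * d
```

With d_min=1 and d_max=100 roughly 90% of the samples fall below depth 10, which is what "denser in the near field" amounts to for a perspective camera.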
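A sketch of the angle-based view filter plus the random subset of S=9 views. The angular bounds `min_deg`/`max_deg` are hypothetical values, and comparing whole-camera viewing directions is exactly the object-centered simplification noted above for Gipuma; a more general version would use per-pixel ray directions.

```python
import math
import random


def angle_deg(a, b):
    """Angle between two 3D direction vectors in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))


def select_views(ref_dir, src_dirs, min_deg=5.0, max_deg=45.0, S=9, rng=random):
    """Indices of source views whose direction differs from the reference by an
    angle in [min_deg, max_deg]; at most S of them, chosen at random."""
    ok = [i for i, d in enumerate(src_dirs) if min_deg <= angle_deg(ref_dir, d) <= max_deg]
    return sorted(rng.sample(ok, min(S, len(ok))))
```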
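Both aggregation rules side by side, assuming non-negative per-view costs such as 1 - NCC; the 1.5 factor is the one suggested in the question above.

```python
def k_best_cost(costs, k=3):
    """Sum of the K lowest per-view costs (occluded views tend to have high costs)."""
    return sum(sorted(costs)[:k])


def adaptive_cost(costs, factor=1.5):
    """Alternative: sum all costs below factor * (second-best cost)."""
    s = sorted(costs)
    t = s[1] if len(s) > 1 else s[0]
    return sum(c for c in s if c < factor * t)
```

Since the second rule sums a variable number of terms (here 4 views instead of 3 for the costs 0.1, 0.2, 0.25, 0.28, 0.9), comparing hypotheses with it may require the mean instead of the sum.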
Planar Prior Assisted PatchMatch Multi-View Stereo - Qingshan Xu, Wenbing Tao
ETH3D: ACMP (81.51 in 1086 s)
Learning Inverse Depth Regression for Multi-View Stereo with Correlation Cost Volume - Qingshan Xu, Wenbing Tao
TNT: CIDER (46.76+23.12)
TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo - Andrea Romanoni, Matteo Matteucci
ETH3D: TAPA-MVS (79.15 in 3374 s)
ETH3D: PLC (78.05 in 2192 s)
TNT: PLC_ (54.56+34.44)
TBA on CVPR 2020
MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction - Zhenyu Xu, Yiguang Liu, Xuelei Shi, Ying Wang, Yunan Zheng
ETH3D: MAR-MVS (81.84 in 3973 s on CPU?)
PMSGM: PatchMatch Semi-Global Matching for Efficient Stereo Correspondence - Xuchong Zhang, He Dai, Miaomiao Sang, Hongbin Sun, Nanning Zheng
Random initialization: t random disparity hypotheses from the range [0, dmax].
Spatial propagation: on odd iterations, the initial t disparities of the current pixel are replaced by the best t candidate disparities chosen from the neighbour pixels to the right/bottom; on even iterations, the same with the neighbours to the left/top. Pixels are processed in raster-scan order, i.e. no massive parallelism!
Random search: the t disparities of a pixel can be replaced with new random hypotheses if those are better.
Winner-takes-all (WTA) on a per-pixel basis.
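The four steps above can be sketched for a whole image as follows. This is a sequential sketch under my own assumptions: the pixel's current hypotheses stay in the candidate pool during propagation, random search appends t fresh samples per pixel, and each pass scans opposite to its neighbour direction so updates chain across the image; `cost` is a hypothetical matching-cost callback.

```python
import random


def pm_multi_hypothesis(h, w, cost, d_max, t=3, iterations=2, rng=random):
    """PatchMatch keeping t disparity hypotheses per pixel (raster-scan, sequential).

    cost(x, y, d) -> matching cost of disparity d at pixel (x, y).
    Returns the WTA disparity map as a list of rows.
    """
    # Random initialization in [0, d_max]
    hyps = [[[rng.uniform(0, d_max) for _ in range(t)] for _ in range(w)] for _ in range(h)]
    for it in range(1, iterations + 1):
        odd = it % 2 == 1
        # Odd iterations look right/bottom, even ones left/top (as in the notes);
        # scanning in the opposite direction lets freshly updated neighbours be reused.
        ys = range(h - 1, -1, -1) if odd else range(h)
        xs = range(w - 1, -1, -1) if odd else range(w)
        offs = ((1, 0), (0, 1)) if odd else ((-1, 0), (0, -1))
        for y in ys:
            for x in xs:
                cands = list(hyps[y][x])
                for dx, dy in offs:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h:
                        cands.extend(hyps[ny][nx])
                # Random search: fresh samples survive only if they rank among the best t
                cands.extend(rng.uniform(0, d_max) for _ in range(t))
                cands = sorted(set(cands), key=lambda d: cost(x, y, d))
                hyps[y][x] = cands[:t]
    # Winner-takes-all per pixel
    return [[min(hyps[y][x], key=lambda d: cost(x, y, d)) for x in range(w)] for y in range(h)]
```

The raster-scan dependency (each pixel reads neighbours updated earlier in the same pass) is what makes this variant inherently sequential, unlike the red-black scheme.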
PatchMatchSGM - PMSGM
The same as PatchMatch, but a regularization is added to the spatial propagation: P1/P2 penalties as in SGM for a disparity change of 1 / >1.
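The SGM-style regularization can be illustrated by adding a penalty, relative to the neighbour's disparity, to each propagated candidate's data cost. A sketch assuming integer disparities; the p1/p2 values and the helper names are hypothetical, and this is not necessarily how PMSGM combines the terms.

```python
def smoothness_penalty(d, d_neighbor, p1=1.0, p2=8.0):
    """SGM-style penalty: 0 for equal disparity, p1 for a change of 1, p2 for more."""
    diff = abs(d - d_neighbor)
    if diff == 0:
        return 0.0
    return p1 if diff == 1 else p2


def best_candidate(candidates, data_cost, d_neighbor, p1=1.0, p2=8.0):
    """Pick the candidate minimizing data cost + penalty w.r.t. the neighbour."""
    return min(candidates, key=lambda d: data_cost(d) + smoothness_penalty(d, d_neighbor, p1, p2))
```

For example, with a neighbour at disparity 3 and data costs {3: 1.0, 4: 0.5, 9: 0.0}, the penalized scores are 1.0, 1.5 and 8.0, so 3 is kept although 9 has the lowest data cost.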
- ROB 2018 - Johannes Schönberger: COLMAP ROB (youtube)
- After each round replace all normals with normals estimated via plane-fitting on diagonal-neighbouring depth values?
- The number of initial random hypotheses should be proportional to dmax
- [Gipuma/Colmap] Compute NCC with soft segmentation / bilateral / guided filter weights (to respect object edges, e.g. to keep the sky from being attached to walls)
Take into account NCC for a pixel only from image pairs that satisfy:
- [Colmap] Non-occluded prior - youtube
- [Colmap] Triangulation angle prior - big enough baseline youtube
- [Colmap] Close resolution prior (w.r.t. the pixel's world normal; are image pyramids required?)
- [Colmap] Maximum observation angle prior (or a similar-observation-angle prior?): we should not take into account oblique photos close to 90 degrees, especially when the pixel is observed along its normal direction in other good images (is this the same as the close resolution prior?)
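The bilaterally weighted NCC mentioned above (the COLMAP-style way to respect object edges) can be sketched as follows. The weights depend only on the reference patch: the similarity of each pixel's intensity to the center pixel and its spatial distance from the center. `sigma_g` and `sigma_x` are assumed values for intensities in [0, 1], and returning the worst cost 2.0 for a flat patch is my own choice.

```python
import math


def bilateral_ncc_cost(ref, src, sigma_g=0.2, sigma_x=3.0):
    """1 - bilaterally weighted NCC between two equally sized square grayscale patches."""
    r = len(ref) // 2
    center = ref[r][r]
    w, a, b = [], [], []
    for y in range(len(ref)):
        for x in range(len(ref)):
            dg = ref[y][x] - center                 # intensity difference to center
            dx2 = (x - r) ** 2 + (y - r) ** 2       # squared spatial distance to center
            w.append(math.exp(-dg * dg / (2 * sigma_g ** 2) - dx2 / (2 * sigma_x ** 2)))
            a.append(ref[y][x])
            b.append(src[y][x])
    sw = sum(w)
    ma = sum(wi * ai for wi, ai in zip(w, a)) / sw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / sw
    cov = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b)) / sw
    va = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a)) / sw
    vb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b)) / sw
    if va < 1e-12 or vb < 1e-12:
        return 2.0  # textureless patch: treat as the worst cost (assumption)
    return 1.0 - cov / math.sqrt(va * vb)
```

Like plain NCC, the weighted version is invariant to gain and offset changes between the two patches, while pixels across an intensity edge (e.g. sky next to a wall) get small weights and barely influence the score.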