datjko/topcon-post-cvpr2017.md

## topcon-post-cvpr2017.md

      
CVPR 2017 topics


https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw/playlists all presentation and workshop videos

deep learning on 3d

tasks


classification
segmentation (by class, by instance)
denoising, restoring, in-fill, up-sampling
cross-channels inference
summarization (including rendering to 2d views)
attention regions and views
generating/hallucinating objects
feature learning

CVPR+ papers/projects


http://3dgan.csail.mit.edu/

3D Models From Photos, latent space, octree limitation


https://www.youtube.com/watch?v=HO1LYJb818Q AI Makes 3D Models From Photos | Two Minute Papers #122


https://arxiv.org/abs/1610.07584 Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling


OctNet

"sparse" octree, (256^3 as "High Resolution", haha)


https://www.youtube.com/watch?v=qYyephF2BBw (12min)


https://arxiv.org/abs/1611.05009 OctNet: Learning Deep 3D Representations at High Resolutions
https://github.com/griegler/octnet Gernot Riegler, Andreas Geiger
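
A quick back-of-the-envelope check of why 256^3 already counts as "high resolution" for voxel networks. This is only a sketch: it assumes a dense grid with one float32 value per voxel per feature channel, and the function name and defaults are made up for illustration, not taken from the paper.

```python
def dense_grid_bytes(resolution, channels=1, bytes_per_value=4):
    """Memory needed for a dense resolution^3 grid (float32 by default)."""
    return resolution ** 3 * channels * bytes_per_value


for res in (64, 128, 256, 512):
    mib = dense_grid_bytes(res) / 2 ** 20
    print(f"{res}^3 voxels, 1 channel: {mib:.0f} MiB")
```

Memory grows 8x with every doubling of resolution, and a real network keeps many channels per layer, which is the motivation for storing only the occupied part of the grid in an octree.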


O-CNN (Microsoft)

yet another "sparse" octree with 256^3 as max resolution, haha


http://wang-ps.github.io/O-CNN_files/CNN3D.pdf O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis


https://github.com/Microsoft/O-CNN


PointNet (Leonidas J. Guibas and his PhD students)

fixed number of points (1024, haha)


https://arxiv.org/abs/1612.00593 PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation


https://arxiv.org/abs/1706.02413 PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
http://stanford.edu/~rqi/pointnet/
https://www.youtube.com/watch?v=Cge-hot0Oc0 PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
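
A minimal sketch of the two ideas behind the "fixed number of points" remark: bring every cloud to a fixed size, then summarize it with a symmetric function so that point order does not matter. PointNet applies a shared per-point MLP before max pooling; that learned part is left out here and raw coordinates are pooled instead, so treat this as an illustration only. The helper names are hypothetical.

```python
import random


def sample_fixed(points, n=1024, seed=0):
    """Return exactly n points: subsample if there are enough, else pad by repetition."""
    rng = random.Random(seed)
    if len(points) >= n:
        return rng.sample(points, n)
    return points + [rng.choice(points) for _ in range(n - len(points))]


def global_max_feature(points):
    """Symmetric (order-independent) summary: per-coordinate maximum."""
    return tuple(max(p[i] for p in points) for i in range(3))


rng = random.Random(1)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(5000)]
fixed = sample_fixed(cloud)

shuffled = fixed[:]
rng.shuffle(shuffled)
print(len(fixed), global_max_feature(fixed) == global_max_feature(shuffled))
```

Max pooling is what makes the summary independent of the point order; the fixed size mostly keeps batching simple.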


FPNN (Leonidas J. Guibas and his PhD students)

good for global descriptor, bad for fine details recognition, loses in accuracy


https://arxiv.org/abs/1605.06240 FPNN: Field Probing Neural Networks for 3D Data


SnapNet

2d projections, requires custom training set, haha


https://sites.google.com/view/boulch/publications/2017_3dor_pointclouds Unstructured point cloud semantic labeling using deep segmentation networks


https://github.com/aboulch/snapnet


Why am I excited despite all these "haha"s?


TTTL to the rescue
TTTL as drop-in replacement of octree grids will require

GPU implementation of TTTL
adapting TTTL to store truncated signed distance fields
adapting TTTL to store occupancy grid maps


I'm sure there is a way to apply DL to data in TTTL representation directly


Octree Generating Networks (Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox)

recursively refines an occupancy grid. yet another "sparse" octree with 512^3 as "High-resolution 3D Outputs", haha


https://arxiv.org/abs/1703.09438 Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs


https://www.youtube.com/watch?v=KjSZpJUX5F4 (3min)


https://lmb.informatik.uni-freiburg.de/people/tatarchm/ogn/
https://lmb.informatik.uni-freiburg.de/people/dosovits/publication_all.html
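
A rough sketch of coarse-to-fine occupancy refinement: each cell is labeled empty, filled or mixed, and only mixed cells are split into eight children. In OGN these labels are predicted by the network; here a geometric test against a solid ball stands in for the prediction, and all names and sizes are assumptions chosen for illustration.

```python
def label_cell(origin, size, center, radius):
    """Label an axis-aligned cubic cell against a solid ball."""
    near2 = far2 = 0.0
    for o, c in zip(origin, center):
        lo, hi = o, o + size
        nearest = min(max(c, lo), hi)  # closest point of the cell to the center
        farthest = lo if abs(c - lo) > abs(c - hi) else hi
        near2 += (nearest - c) ** 2
        far2 += (farthest - c) ** 2
    if far2 <= radius ** 2:
        return "filled"
    if near2 > radius ** 2:
        return "empty"
    return "mixed"


def refine(resolution, center, radius):
    """Split only 'mixed' cells until they reach single-voxel size.

    resolution must be a power of two.
    """
    pending = [((0, 0, 0), resolution)]
    leaves = []
    while pending:
        origin, size = pending.pop()
        label = label_cell(origin, size, center, radius)
        if label != "mixed" or size == 1:
            leaves.append((origin, size, label))
            continue
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                for dz in (0, half):
                    child = (origin[0] + dx, origin[1] + dy, origin[2] + dz)
                    pending.append((child, half))
    return leaves


leaves = refine(64, center=(32, 32, 32), radius=24)
print(f"{len(leaves)} octree leaves instead of {64 ** 3} dense voxels")
```

The leaves still cover the whole volume, but fine cells appear only near the surface, so the cell count grows roughly with surface area rather than with volume.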


OctNetFusion (Gernot Riegler, Ali Osman Ulusoy, Horst Bischof, Andreas Geiger)

denoising/shape completion. is based on OctNet => 256^3 is the max resolution, haha


https://www.youtube.com/watch?v=NM0_wJHWnTk (1min)


https://arxiv.org/abs/1704.01047


3DMatch

Learning point descriptor stable to affine transformations => better cloud-to-cloud registration


https://www.youtube.com/watch?v=gZrsJJtDvvA (4min)


http://3dmatch.cs.princeton.edu/
https://www.youtube.com/watch?v=qNVZl7bCjsU 3DMatch: Learning Local Geometric Descriptors From RGB-D Reconstructions


DSAC

differentiable implementation of RANSAC as a back-propagation friendly operation in DL => huge improvement in accuracy


https://www.youtube.com/watch?v=YWSGq7CUSRA (12min) presentation


https://arxiv.org/abs/1611.05705 DSAC - Differentiable RANSAC for Camera Localization
http://cvlab-dresden.de/research/scene-understanding/pose-estimation/#DSAC
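
A toy sketch of the idea of making hypothesis selection smooth: instead of a hard argmax over RANSAC hypotheses, turn the scores into a softmax distribution and look at the expected task loss, which varies smoothly with the scores. This uses 2D line fitting rather than camera pose, a soft inlier count as the scoring function, and no autodiff; all of these are simplifying assumptions for illustration, not the setup from the paper.

```python
import math
import random


def fit_line(p, q):
    """Line through two points as (slope, intercept)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_inlier_count(line, points, threshold=0.1, sharpness=50.0):
    """Smooth replacement for counting residuals below a threshold."""
    slope, intercept = line
    return sum(
        sigmoid(sharpness * (threshold - abs(y - (slope * x + intercept))))
        for x, y in points
    )


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
true_slope, true_intercept = 2.0, 1.0
points = [(i / 10, true_slope * i / 10 + true_intercept + rng.gauss(0, 0.02))
          for i in range(30)]
points += [(rng.uniform(0, 3), rng.uniform(0, 8)) for _ in range(10)]  # outliers

# Minimal-set hypotheses: a line through two random points each.
hypotheses = []
while len(hypotheses) < 64:
    p, q = rng.sample(points, 2)
    if p[0] != q[0]:
        hypotheses.append(fit_line(p, q))

scores = [soft_inlier_count(h, points) for h in hypotheses]
probs = softmax(scores)
losses = [abs(s - true_slope) + abs(b - true_intercept) for s, b in hypotheses]

expected_loss = sum(pr * l for pr, l in zip(probs, losses))  # smooth in the scores
hard_loss = losses[scores.index(max(scores))]                # classic argmax pick
print(f"expected loss {expected_loss:.3f}, argmax loss {hard_loss:.3f}")
```

With an autodiff framework the expected loss could be back-propagated into whatever produces the scores and the points, which is the part that a hard argmax blocks.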


http://sscnet.cs.princeton.edu/ Semantic Scene Completion From a Single Depth Image

in-fill, octree limitations on resolution


https://www.youtube.com/watch?v=Yjpmouaap6M (4min)


https://arxiv.org/abs/1611.08974
https://www.youtube.com/watch?v=Aq7hLLIz5a0 (15min) CVPR presentation


http://3d-r2n2.stanford.edu/ 3D-R2N2: 3D Recurrent Reconstruction Neural Network


http://3d-r2n2.stanford.edu/viewer/


https://arxiv.org/abs/1708.01648 3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks


SurfaceNet (Yebin Liu et al)


https://arxiv.org/abs/1708.01749 SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis


http://www.liuyebin.com/


Deep Projective 3D Semantic Segmentation

like SnapNet, learns from 2d forward and back projections


https://arxiv.org/pdf/1705.03428.pdf "Deep Projective 3D Semantic Segmentation" Felix Jaremo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg


https://www.youtube.com/watch?v=H94ASpItkLI (14min)


ML on graphs and manifolds

Cool. I'll make a separate presentation. Let's skip it for now.


http://geometricdeeplearning.com/
SyncSpecCNN

https://arxiv.org/abs/1612.00606 SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation
https://github.com/ericyi/SyncSpecCNN
https://www.youtube.com/watch?v=ClvLCXQ9Ipw (4min)


more links


http://3ddl.stanford.edu/CVPR17_Tutorial_MVCNN_3DCNN_v3.pdf slides
https://liu.diva-portal.org/smash/get/diva2:1091059/FULLTEXT01.pdf "Semantic Segmentation of Point Clouds using Deep Learning" by Patrik Tosteberg. Master of Science Thesis. Gives an overview.
https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-2-W3/339/2017/isprs-archives-XLII-2-W3-339-2017.pdf "A REVIEW OF POINT CLOUDS SEGMENTATION AND CLASSIFICATION ALGORITHMS" 1–3 March 2017
http://www.semantic3d.net/ Large-Scale Point Cloud Classification Benchmark

Semantic Segmentation


Pyramid Scene Parsing Network


https://hszhao.github.io/projects/pspnet/index.html


https://www.youtube.com/watch?v=rB1BmBOkKTw (2min)


https://www.youtube.com/watch?v=aXdigiSDIak Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes (results t=10:00)


https://www.youtube.com/watch?v=NeHRthS32Fs (4min) Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation | Spotlight

http://personal.ie.cuhk.edu.hk/~ccloy/files/cvpr_2017_not.pdf


https://github.com/msracver/FCIS

https://arxiv.org/abs/1611.07709 Fully Convolutional Instance-aware Semantic Segmentation


https://arxiv.org/abs/1708.02551 Semantic Instance Segmentation with a Discriminative Loss Function


http://davischallenge.org/challenge2017/publications.html DAVIS Challenge on Video Object Segmentation 2017. CVPR Workshop. Challenge Publications.


http://www.vision.ee.ethz.ch/~cvlsegmentation/ segmentation meta project "Image and Video Segmentation @ ETHZ CVL. From evaluation to State-of-the-Art Results"

http://www.vision.ee.ethz.ch/~cvlsegmentation/osvos/ OSVOS: One-Shot Video Object Segmentation


training tricks


SimGAN

https://www.youtube.com/watch?v=vDW8qvsBtmQ (14min) Learning From Simulated and Unsupervised Images Through Adversarial Training (published by Apple, won best paper award)


https://www.youtube.com/watch?v=VhsTrWPvjcA (11min) Unsupervised Pixel-Level Domain Adaptation With Generative Adversarial Networks


https://www.youtube.com/watch?v=LV1slx9Ob7U (15min) Inverse Compositional Spatial Transformer Networks


https://www.youtube.com/watch?v=RDTcV9Zx1C4 (11min) Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach


https://www.youtube.com/watch?v=KMLSXxtguFE (4min) Local Binary Convolutional Neural Networks (Spotlight)


netdissect

Viewer for "Interpretable units". "Interpretable units are interesting because they hint that deep networks may not be completely opaque black boxes".


http://netdissect.csail.mit.edu/


https://www.youtube.com/watch?v=Xy6RcjXMa2c (14min)
http://netdissect.csail.mit.edu/final-network-dissection.pdf
https://github.com/CSAILVision/NetDissect


Black box optimization

Google Vizier

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf
https://cloud.google.com/ml-engine/docs/concepts/hyperparameter-tuning-overview
https://www.reddit.com/r/MachineLearning/duplicates/6rodo1/r_google_vizier_a_service_for_blackbox/


Facebook ActiVis (more ML specific)

https://arxiv.org/abs/1704.01942 ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models
https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/
https://www.nextplatform.com/2017/04/12/look-facebooks-interactive-neural-network-visualization-system/


https://arxiv.org/abs/1704.08792 DeepArchitect: Automatically Designing and Training Deep Architectures. https://github.com/negrinho/deep_architect


https://arxiv.org/abs/1705.07115 Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (Alex Kendall, Yarin Gal, Roberto Cipolla)
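
A small sketch of uncertainty-based loss weighting as I understand it: each task loss L_i is scaled by a learned log-variance s_i = log(sigma_i^2), plus a regularizer that stops the variance from growing without bound. The form below is the regression-style weighting (0.5 * exp(-s_i) * L_i + 0.5 * s_i); the function name is hypothetical and the numbers are made up.

```python
import math


def weighted_loss(task_losses, log_vars):
    """Sum over tasks of 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_vars)
    )


losses = [4.0, 0.25]  # e.g. a depth loss and a segmentation loss (made-up values)
for log_vars in ([0.0, 0.0], [math.log(4.0), math.log(0.25)], [3.0, 3.0]):
    print(log_vars, round(weighted_loss(losses, log_vars), 4))
```

For a fixed task loss the minimum over s_i is at s_i = log(L_i), so tasks with larger loss are automatically down-weighted; in training, s_i would be a learned parameter optimized together with the network.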


Illumination Estimation


Neural Face Editing With Intrinsic Image Disentangling (disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte)


http://www3.cs.stonybrook.edu/~cvl/content/neuralface/neuralface.html


https://arxiv.org/abs/1704.04131
https://www.youtube.com/watch?v=3D2a-O4RhHU (12min)


Real-time Geometry, Albedo and Motion Reconstruction Using a Single RGBD Camera

http://media.au.tsinghua.edu.cn/monofvv.html


https://www.youtube.com/watch?v=i7FpVaBA65I (8min)


http://www.liuyebin.com/


Deep Outdoor Illumination Estimation (+ camera parameters estimation)

https://arxiv.org/abs/1611.06403
https://www.youtube.com/watch?v=EAPZQeZuxSI (16min) CVPR


https://arxiv.org/abs/1704.00090 Learning to Predict Indoor Illumination from a Single Image.


Physically-Based Rendering for Indoor Scene Understanding Using CNNs.

https://arxiv.org/abs/1612.07429
http://robots.princeton.edu/projects/2016/PBRS/


http://www.eecs.harvard.edu/~kalyans/


(my humble proposal)


https://junyanz.github.io/CycleGAN/ Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
(weakly supervised picture to picture learning) deserves separate presentation!


train it on real and ambient-lit image datasets and see if it works


SLAM, depth, egomotion, flow, fusion etc


https://www.youtube.com/watch?v=z_NJxbkQnBU (2min) CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction


https://www.youtube.com/watch?v=HWu39YkGKvI (14min) Unsupervised Learning of Depth and Ego-Motion From Video


https://www.youtube.com/watch?v=JSzUdVBmQP4 (2:30min) FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks


https://www.youtube.com/watch?v=hkj3sVaC6jg (3min) Fast Multi-frame Stereo Scene Flow with Motion Segmentation


http://graphics.stanford.edu/projects/bundlefusion/ BundleFusion: Real-time Globally Consistent 3D Reconstruction


https://www.youtube.com/watch?v=zLzhsyeAie4 Bundlefusion: 3D Scenes from 2D Videos | Two Minute Papers #81


https://www.youtube.com/watch?v=h0T_XtDwmEc (4min) KillingFusion - Non-Rigid 3D Reconstruction Without Correspondences | Spotlight


https://www.youtube.com/watch?v=CwCqcd5ibHI (4min) UltraStereo - Efficient Learning-Based Matching for Active Stereo Systems | Spotlight


https://www.youtube.com/watch?v=lk_yX-O_Y5c (4min) VolumeDeform: Real-time Volumetric Non-rigid Reconstruction (2016) (Matthias Niessner)


http://www.scan-net.org/


https://www.youtube.com/watch?v=Olx4OnoZWQQ (4min) ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes (CVPR 2017 Spotlight)


https://arxiv.org/abs/1705.04300 Challenges in Monocular Visual Odometry: Photometric Calibration, Motion Bias and Rolling Shutter Effect


https://www.youtube.com/watch?v=tni56485tNs (4 hour tutorial) Tutorial : Large-Scale Visual Place Recognition and Image-Based Localization


https://arxiv.org/abs/1612.01079 End-to-end Learning of Driving Models from Large-scale Video Datasets

https://www.youtube.com/watch?v=jxlNfUzbGAY (14min) End-To-End Learning of Driving Models From Large-Scale Video Datasets (+large dataset for autonomous driving with imu, gps etc)
https://github.com/gy20073/BDD_Driving_Model/


Other (mostly 2d)


https://www.youtube.com/watch?v=RzdPkZHv62U What's in a Question - Using Visual Questions as a Form of Supervision | Spotlight


https://www.youtube.com/watch?v=3ZhQKmSbNug Deep Learning on Lie Groups for Skeleton-Based Action Recognition | Spotlight


https://github.com/zsdonghao/SRGAN


https://arxiv.org/abs/1609.04802
https://www.youtube.com/watch?v=BXIR_SVCrsE (12min) Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network


https://www.youtube.com/watch?v=Hcz-h_yut84 (14min) Unrolling the Shutter: CNN to Correct Motion Distortions


https://www.youtube.com/watch?v=yd4j8ue521g (14min) Universal Adversarial Perturbations


https://blog.kitware.com/kitware-maps-development-of-toolkit-for-image-and-video-analysis/


https://arxiv.org/abs/1708.02977 Hierarchically-Attentive RNN for Album Summarization and Storytelling


https://arxiv.org/abs/1708.00838 An End-to-End Compression Framework Based on Convolutional Neural Networks


https://arxiv.org/abs/1604.03505 Counting Everyday Objects in Everyday Scenes


http://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition


https://arxiv.org/abs/1509.07831 Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories


http://robobarista.cs.cornell.edu/ videos


https://www.youtube.com/watch?v=pW6nZXeWlGM (4.5 min) Realtime Multi-Person 2D Human Pose Estimation


Want more?


https://www.quora.com/What-are-the-most-interesting-CVPR-2017-papers-and-why


https://syncedreview.com/2017/08/07/cvpr-2017-the-fusion-of-deep-learning-and-computer-vision-whats-next/


https://people.eecs.berkeley.edu/~chaene/cvpr17tut/ CVPR 2017 Tutorial Geometric and Semantic 3D Reconstruction


https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw/playlists all presentation and workshop videos


http://openaccess.thecvf.com/CVPR2017.py all papers


http://davidstutz.de/3d-convolutional-neural-networks-a-reading-list/


https://research.fb.com/publications/

https://research.fb.com/publications/learning-features-by-watching-objects-move/
https://research.fb.com/publications/mask-r-cnn/
https://research.fb.com/advancing-computer-vision-technologies-at-cvpr-2017/
https://research.fb.com/introducing-caffe2-to-the-academic-community/


ML/DL generally


https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap
https://github.com/terryum/awesome-deep-learning-papers
https://github.com/shawnyuen/GANsPaperCollection
https://github.com/zhangqianhui/AdversarialNetsPapers/blob/master/README.md
http://yerevann.com/a-guide-to-deep-learning/
https://deeplearn.org/ papers
http://davidstutz.de