- https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw/playlists all presentation and workshop videos
- classification
- segmentation (by class, by instance)
- denoising, restoring, in-fill, up-sampling
- cross-channels inference
- summarization (including rendering to 2d views)
- attention regions and views
- generating/hallucinating objects
- feature learning
-
3D Models From Photos, latent space, octree limitation
-
https://www.youtube.com/watch?v=HO1LYJb818Q AI Makes 3D Models From Photos | Two Minute Papers #122
- https://arxiv.org/abs/1610.07584 Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
-
-
OctNet
"sparse" octree (256^3 as "High Resolution", haha)
-
- https://arxiv.org/abs/1611.05009 OctNet: Learning Deep 3D Representations at High Resolutions
- https://github.com/griegler/octnet Gernot Riegler, Andreas Geiger
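Why 256^3 is where things tend to stop: a dense grid grows with the cube of its side length. A back-of-the-envelope sketch, assuming one float32 value per voxel and a single feature channel (both are my assumptions for illustration, not figures from the OctNet paper); `dense_grid_bytes` is a hypothetical helper:

```python
def dense_grid_bytes(side, bytes_per_voxel=4, channels=1):
    """Memory for a dense side^3 voxel grid (assumed float32, one channel)."""
    return side ** 3 * bytes_per_voxel * channels

for side in (64, 128, 256, 512, 1024):
    mib = dense_grid_bytes(side) / 2 ** 20
    print(f"{side}^3 -> {mib:,.0f} MiB per feature channel")
```

Every doubling of the side length costs 8x the memory, and a real network keeps many channels and layers alive at once, so the dense equivalent of a sparse octree runs out of room quickly.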
-
-
O-CNN (Microsoft)
yet another "sparse" octree with 256^3 as max resolution, haha
-
http://wang-ps.github.io/O-CNN_files/CNN3D.pdf O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis
- https://github.com/Microsoft/O-CNN
-
-
PointNet (Leonidas J. Guibas and his PhD students)
fixed number of points (1024, haha)
-
https://arxiv.org/abs/1612.00593 PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- https://arxiv.org/abs/1706.02413 PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
- http://stanford.edu/~rqi/pointnet/
- https://www.youtube.com/watch?v=Cge-hot0Oc0 PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
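The fixed point count means every cloud has to be resampled before it goes in. A minimal sketch of that preprocessing step, assuming plain uniform random sampling and padding with duplicates when the cloud is too small; `resample_cloud` is a hypothetical helper, not code from the PointNet repository:

```python
import random

def resample_cloud(points, n=1024, seed=0):
    """Return exactly n points: subsample without replacement if the cloud
    is large enough, otherwise keep all points and pad with random repeats."""
    rng = random.Random(seed)
    if len(points) >= n:
        return rng.sample(points, n)
    return list(points) + [rng.choice(points) for _ in range(n - len(points))]

big = [(i * 0.001, 0.0, 0.0) for i in range(5000)]
small = [(float(i), 0.0, 0.0) for i in range(10)]
print(len(resample_cloud(big)), len(resample_cloud(small)))
```

Uniform sampling is the simplest choice; it throws away detail on large scans, which is exactly the "haha" above. An empty input is not handled here.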
-
-
FPNN (Leonidas J. Guibas and his PhD students)
good as a global descriptor, bad at recognizing fine details, loses in accuracy
-
https://arxiv.org/abs/1605.06240 FPNN: Field Probing Neural Networks for 3D Data
-
-
SnapNet
2d projections, requires custom training set, haha
-
https://sites.google.com/view/boulch/publications/2017_3dor_pointclouds Unstructured point cloud semantic labeling using deep segmentation networks
- https://github.com/aboulch/snapnet
-
-
Why am I excited despite all these "haha"s?
- TTTL to the rescue
- TTTL as a drop-in replacement for octree grids will require
- GPU implementation of TTTL
- adapting TTTL to store truncated signed distance fields
- adapting TTTL to store occupancy grid maps
- I'm sure there is a way to apply DL to data in TTTL representation directly
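For reference, the two payloads mentioned in this list are simple per-voxel values. A minimal sketch, assuming a unit sphere as the surface and a truncation distance of 0.25 (both picked only for illustration); nothing here is TTTL-specific, and all function names are hypothetical:

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def tsdf(p, trunc=0.25):
    """Signed distance divided by trunc and clamped to [-1, 1]."""
    return max(-1.0, min(1.0, sphere_sdf(p) / trunc))

def occupied(p):
    """Binary occupancy: True when the sample lies inside or on the surface."""
    return sphere_sdf(p) <= 0.0

for x in (0.0, 0.9, 1.0, 1.1, 2.0):
    p = (x, 0.0, 0.0)
    print(x, round(tsdf(p), 3), occupied(p))
```

The TSDF only carries information in a thin band around the surface; everything else saturates at -1 or +1, which is what makes it a good fit for sparse storage.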
-
Octree Generating Networks (Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox)
recursively refines the occupancy grid. Yet another "sparse" octree with 512^3 as "High-resolution 3D Outputs", haha
-
https://arxiv.org/abs/1703.09438 Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs
-
- https://lmb.informatik.uni-freiburg.de/people/tatarchm/ogn/
- https://lmb.informatik.uni-freiburg.de/people/dosovits/publication_all.html
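A rough sketch of what "recursively refines" means in practice: each cell gets a label, and only cells that straddle the surface are split into eight children. Here the learned per-cell prediction is replaced by an analytic test against a unit sphere, which is purely my stand-in for illustration; `classify` and `refine` are hypothetical names, not the OGN code:

```python
import math

def classify(cx, cy, cz, half, radius=1.0):
    """Label a cubic cell against a sphere at the origin (a stand-in for
    the learned per-cell prediction): 'empty', 'filled' or 'mixed'."""
    d = math.sqrt(cx * cx + cy * cy + cz * cz)
    reach = half * math.sqrt(3.0)        # centre-to-corner distance
    if d - reach > radius:
        return "empty"                   # whole cell is outside the sphere
    if d + reach < radius:
        return "filled"                  # whole cell is inside the sphere
    return "mixed"                       # may straddle the surface

def refine(cx, cy, cz, half, depth, max_depth, leaves):
    """Subdivide only 'mixed' cells until max_depth is reached."""
    label = classify(cx, cy, cz, half)
    if label != "mixed" or depth == max_depth:
        leaves.append((depth, label))
        return
    q = half / 2.0
    for dx in (-q, q):
        for dy in (-q, q):
            for dz in (-q, q):
                refine(cx + dx, cy + dy, cz + dz, q, depth + 1, max_depth, leaves)

leaves = []
refine(0.0, 0.0, 0.0, 2.0, 0, 5, leaves)   # depth 5 = 32 cells per side
print(len(leaves), "octree leaves vs", 32 ** 3, "dense cells")
```

The saving comes from large empty and filled cells that are never split; the cost still grows with the surface area, which is one reason the resolution ceiling moves but does not disappear.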
-
-
OctNetFusion (Gernot Riegler, Ali Osman Ulusoy, Horst Bischof, Andreas Geiger)
denoising/shape completion. Based on OctNet => 256^3 is the max resolution, haha
-
3DMatch
Learns a point descriptor that is stable under affine transformations => better cloud-to-cloud registration
-
- http://3dmatch.cs.princeton.edu/
- https://www.youtube.com/watch?v=qNVZl7bCjsU 3DMatch: Learning Local Geometric Descriptors From RGB-D Reconstructions
-
-
DSAC
differentiable implementation of RANSAC as a back-propagation-friendly operation in DL => huge improvement in accuracy
-
https://www.youtube.com/watch?v=YWSGq7CUSRA (12min) presentation
- https://arxiv.org/abs/1611.05705 DSAC - Differentiable RANSAC for Camera Localization
- http://cvlab-dresden.de/research/scene-understanding/pose-estimation/#DSAC
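The trick, as I understand it from the paper: hard argmax over hypothesis scores has no useful gradient, so one variant selects a hypothesis from a softmax distribution over the scores and trains on the expected loss. A minimal pure-Python sketch of just that selection step, with made-up scores and losses; the function names are hypothetical and this is not the authors' implementation:

```python
import math

def softmax(scores):
    m = max(scores)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_loss(scores, losses):
    """Expectation of the task loss under the selection distribution."""
    return sum(p * l for p, l in zip(softmax(scores), losses))

def grad_wrt_scores(scores, losses):
    """dE/ds_j = p_j * (loss_j - E): defined everywhere, unlike argmax."""
    probs = softmax(scores)
    e = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - e) for p, l in zip(probs, losses)]

scores = [2.0, 0.5, -1.0]    # hypothetical hypothesis scores
losses = [0.1, 0.8, 2.0]     # hypothetical pose errors per hypothesis
print(expected_loss(scores, losses))
print(grad_wrt_scores(scores, losses))
```

A gradient step lowers the score of hypotheses whose loss is above the expectation and raises the others, so the scoring network learns to rank good poses first.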
-
-
http://sscnet.cs.princeton.edu/ Semantic Scene Completion From a Single Depth Image
in-fill, octree limitations on resolution
-
- https://arxiv.org/abs/1611.08974
- https://www.youtube.com/watch?v=Aq7hLLIz5a0 (15min) CVPR presentation
-
-
http://3d-r2n2.stanford.edu/ 3D-R2N2: 3D Recurrent Reconstruction Neural Network
-
- https://arxiv.org/abs/1708.01648 3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks
-
-
SurfaceNet (Yebin Liu et al)
-
https://arxiv.org/abs/1708.01749 SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis
- http://www.liuyebin.com/
-
-
Deep Projective 3D Semantic Segmentation
like SnapNet, learns from 2d forward and back projections
-
https://arxiv.org/pdf/1705.03428.pdf "Deep Projective 3D Semantic Segmentation" Felix Jaremo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg
- https://www.youtube.com/watch?v=H94ASpItkLI (14min)
-
-
ML on graphs and manifolds
Cool. I'll make a separate presentation. Let's skip it for now.
- http://geometricdeeplearning.com/
- SyncSpecCNN
- https://arxiv.org/abs/1612.00606 SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation
- https://github.com/ericyi/SyncSpecCNN
- https://www.youtube.com/watch?v=ClvLCXQ9Ipw (4min)
- http://3ddl.stanford.edu/CVPR17_Tutorial_MVCNN_3DCNN_v3.pdf slides
- https://liu.diva-portal.org/smash/get/diva2:1091059/FULLTEXT01.pdf "Semantic Segmentation of Point Clouds using Deep Learning" by Patrik Tosteberg. Master of Science Thesis. Gives an overview.
- https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLII-2-W3/339/2017/isprs-archives-XLII-2-W3-339-2017.pdf "A REVIEW OF POINT CLOUDS SEGMENTATION AND CLASSIFICATION ALGORITHMS" 1–3 March 2017
- http://www.semantic3d.net/ Large-Scale Point Cloud Classification Benchmark
-
Pyramid Scene Parsing Network
-
https://www.youtube.com/watch?v=aXdigiSDIak Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes (results t=10:00)
-
https://www.youtube.com/watch?v=NeHRthS32Fs (4min) Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation | Spotlight
-
https://github.com/msracver/FCIS
- https://arxiv.org/abs/1611.07709 Fully Convolutional Instance-aware Semantic Segmentation
-
https://arxiv.org/abs/1708.02551 Semantic Instance Segmentation with a Discriminative Loss Function
-
http://davischallenge.org/challenge2017/publications.html DAVIS Challenge on Video Object Segmentation 2017. CVPR Workshop. Challenge Publications.
-
http://www.vision.ee.ethz.ch/~cvlsegmentation/ segmentation meta project "Image and Video Segmentation @ ETHZ CVL. From evaluation to State-of-the-Art Results"
- http://www.vision.ee.ethz.ch/~cvlsegmentation/osvos/ OSVOS: One-Shot Video Object Segmentation
-
SimGAN
- https://www.youtube.com/watch?v=vDW8qvsBtmQ (14min) Learning From Simulated and Unsupervised Images Through Adversarial Training (published by Apple, won best paper award)
-
https://www.youtube.com/watch?v=VhsTrWPvjcA (11min) Unsupervised Pixel-Level Domain Adaptation With Generative Adversarial Networks
-
https://www.youtube.com/watch?v=LV1slx9Ob7U (15min) Inverse Compositional Spatial Transformer Networks
-
https://www.youtube.com/watch?v=RDTcV9Zx1C4 (11min) Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
-
https://www.youtube.com/watch?v=KMLSXxtguFE (4min) Local Binary Convolutional Neural Networks (Spotlight)
-
netdissect
Viewer for "Interpretable units". "Interpretable units are interesting because they hint that deep networks may not be completely opaque black boxes".
-
Black box optimization
- Google Vizier
- Facebook ActiVis (more ML specific)
- https://arxiv.org/abs/1704.01942 ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models
- https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/
- https://www.nextplatform.com/2017/04/12/look-facebooks-interactive-neural-network-visualization-system/
- https://arxiv.org/abs/1704.08792 DeepArchitect: Automatically Designing and Training Deep Architectures. https://github.com/negrinho/deep_architect
-
https://arxiv.org/abs/1705.07115 Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (Alex Kendall, Yarin Gal, Roberto Cipolla)
-
Neural Face Editing With Intrinsic Image Disentangling (disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte)
-
Real-time Geometry, Albedo and Motion Reconstruction Using a Single RGBD Camera
-
Deep Outdoor Illumination Estimation (+ camera parameters estimation)
-
https://arxiv.org/abs/1704.00090 Learning to Predict Indoor Illumination from a Single Image.
-
Physically-Based Rendering for Indoor Scene Understanding Using CNNs.
-
-
(my humble proposal)
-
https://junyanz.github.io/CycleGAN/ Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (weakly supervised picture to picture learning) deserves separate presentation!
- train it on real and ambient-lit image datasets and see if it works
-
-
https://www.youtube.com/watch?v=z_NJxbkQnBU (2min) CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction
-
https://www.youtube.com/watch?v=HWu39YkGKvI (14min) Unsupervised Learning of Depth and Ego-Motion From Video
-
https://www.youtube.com/watch?v=JSzUdVBmQP4 (2:30min) FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
-
https://www.youtube.com/watch?v=hkj3sVaC6jg (3min) Fast Multi-frame Stereo Scene Flow with Motion Segmentation
-
http://graphics.stanford.edu/projects/bundlefusion/ BundleFusion: Real-time Globally Consistent 3D Reconstruction
-
https://www.youtube.com/watch?v=zLzhsyeAie4 Bundlefusion: 3D Scenes from 2D Videos | Two Minute Papers #81
-
-
https://www.youtube.com/watch?v=h0T_XtDwmEc (4min) KillingFusion - Non-Rigid 3D Reconstruction Without Correspondences | Spotlight
-
https://www.youtube.com/watch?v=CwCqcd5ibHI (4min) UltraStereo - Efficient Learning-Based Matching for Active Stereo Systems | Spotlight
-
https://www.youtube.com/watch?v=lk_yX-O_Y5c (4min) VolumeDeform: Real-time Volumetric Non-rigid Reconstruction (2016) (Matthias Niessner)
-
- https://www.youtube.com/watch?v=Olx4OnoZWQQ (4min) ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes (CVPR 2017 Spotlight)
-
https://arxiv.org/abs/1705.04300 Challenges in Monocular Visual Odometry: Photometric Calibration, Motion Bias and Rolling Shutter Effect
-
https://www.youtube.com/watch?v=tni56485tNs (4-hour tutorial) Tutorial: Large-Scale Visual Place Recognition and Image-Based Localization
-
https://arxiv.org/abs/1612.01079 End-to-end Learning of Driving Models from Large-scale Video Datasets
- https://www.youtube.com/watch?v=jxlNfUzbGAY (14min) End-To-End Learning of Driving Models From Large-Scale Video Datasets (+large dataset for autonomous driving with imu, gps etc)
- https://github.com/gy20073/BDD_Driving_Model/
-
https://www.youtube.com/watch?v=RzdPkZHv62U What's in a Question - Using Visual Questions as a Form of Supervision | Spotlight
-
https://www.youtube.com/watch?v=3ZhQKmSbNug Deep Learning on Lie Groups for Skeleton-Based Action Recognition | Spotlight
-
- https://arxiv.org/abs/1609.04802
- https://www.youtube.com/watch?v=BXIR_SVCrsE (12min) Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
-
https://www.youtube.com/watch?v=Hcz-h_yut84 (14min) Unrolling the Shutter: CNN to Correct Motion Distortions
-
https://www.youtube.com/watch?v=yd4j8ue521g (14min) Universal Adversarial Perturbations
-
https://blog.kitware.com/kitware-maps-development-of-toolkit-for-image-and-video-analysis/
-
https://arxiv.org/abs/1708.02977 Hierarchically-Attentive RNN for Album Summarization and Storytelling
-
https://arxiv.org/abs/1708.00838 An End-to-End Compression Framework Based on Convolutional Neural Networks
-
https://arxiv.org/abs/1604.03505 Counting Everyday Objects in Everyday Scenes
-
http://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition
-
https://arxiv.org/abs/1509.07831 Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories
-
https://www.youtube.com/watch?v=pW6nZXeWlGM (4.5 min) Realtime Multi-Person 2D Human Pose Estimation
-
https://www.quora.com/What-are-the-most-interesting-CVPR-2017-papers-and-why
-
https://people.eecs.berkeley.edu/~chaene/cvpr17tut/ CVPR 2017 Tutorial Geometric and Semantic 3D Reconstruction
-
http://openaccess.thecvf.com/CVPR2017.py all papers
-
http://davidstutz.de/3d-convolutional-neural-networks-a-reading-list/
- https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap
- https://github.com/terryum/awesome-deep-learning-papers
- https://github.com/shawnyuen/GANsPaperCollection
- https://github.com/zhangqianhui/AdversarialNetsPapers/blob/master/README.md
- http://yerevann.com/a-guide-to-deep-learning/
- https://deeplearn.org/ papers
- http://davidstutz.de