jim-ecker/ALICE_Scitech_gen_text.md

## ALICE_Scitech_gen_text.md

      
    Slide 1: James Ecker, Benjamin Kelley, Danette Allen
AIAA SciTech, January 2021
Slide 1: Synthetic Data Generation for 3D Mesh Prediction and Spatial Reasoning During Multi-Agent Robotic Missions
Slide 2: Computer Vision During In-Space Assembly
Slide 2: Difficulties
Illumination
Angle
Orientation
Movement
Constraints
Energy
Mass
Slide 2: January 2021
Slide 2: SciTech 2021
Slide 3: High Degree of Variation Requires More Information
Slide 4: More Sensors = More Information = More Energy Use & More Mass
Slide 5: Mitigating the Constraints
Slide 5: High Degree of Variation Requires More Information
Slide 5: More Sensors =
More Information =
More Energy Use & More Mass
Slide 6: Maximize
Slide 7: Minimize
Slide 8: Predict 3D Mesh from Single View
Slide 9: Single Camera
Slide 10: Related Work
Slide 10: [1] He et al. - Mask R-CNN
[2] Gkioxari et al. - Mesh R-CNN
[3] Sonawani et al. - Assistive Relative Pose Estimation for On-orbit Assembly using Convolutional Neural Networks
[4] Pal et al. - 3D Point Cloud Generation from 2D Depth Camera Images using Successive Triangulation
[5] Valsesia et al. - Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks
[6] Ramasinghe et al. - Spectral-GANs for High-Resolution 3D Point-cloud Generation
Slide 11: Synthesizing Data
Slide 11: 3D models of objects projected over a 3D background in Blender
Can be extended to full simulation environments
ROS, Gazebo, MuJoCo, etc.
Variations in observations
Orientation of camera and light source
Relative orientation between objects
Number of objects in scene
Background
Sim to Reality Problem
Domain Randomization
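The variations listed above can be sketched as a sampler that draws one randomized scene description per rendered image. This is only an illustration of domain randomization under stated assumptions: the `SceneConfig` fields, the `sample_scene` helper, the angle ranges, the object-count limit, and the background file names are all hypothetical and not taken from the slides. The Blender rendering step itself is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    """One randomized scene description (hypothetical layout)."""
    camera_euler_deg: tuple   # orientation of the camera
    light_euler_deg: tuple    # orientation of the light source
    object_eulers_deg: list   # one orientation per object in the scene
    background: str           # background asset to render behind the objects


def sample_scene(rng, backgrounds, max_objects=3):
    """Draw every varied quantity independently and uniformly at random."""

    def random_euler():
        return tuple(rng.uniform(0.0, 360.0) for _ in range(3))

    n_objects = rng.randint(1, max_objects)  # number of objects in the scene
    return SceneConfig(
        camera_euler_deg=random_euler(),
        light_euler_deg=random_euler(),
        object_eulers_deg=[random_euler() for _ in range(n_objects)],
        background=rng.choice(backgrounds),
    )


# Usage: a seeded generator makes the sampled scenes reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng, ["bg_a.png", "bg_b.png"]) for _ in range(5)]
```

Each `SceneConfig` would then be handed to the renderer, and the same values can be written into the sample's metadata as ground truth.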
Slide 12: Metadata for Training Mask R-CNN
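The metadata shown on this slide did not survive extraction. As a rough illustration only, the sketch below derives a bounding box and pixel area from a binary ground-truth mask and stores them in a COCO-style record. The COCO-style layout (`bbox` as `[x, y, width, height]`) is an assumption, as are the function name and the field names; the slides do not state which format was used.

```python
def mask_to_annotation(mask, image_id, category_id):
    """Build a COCO-style annotation dict from a binary mask (list of rows).

    Assumed layout, not confirmed by the slides.
    """
    ys = [r for r, row in enumerate(mask) for v in row if v]
    xs = [c for row in mask for c, v in enumerate(row) if v]
    if not xs:
        raise ValueError("mask contains no foreground pixels")
    x_min, y_min = min(xs), min(ys)
    width = max(xs) - x_min + 1
    height = max(ys) - y_min + 1
    return {
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x_min, y_min, width, height],  # [x, y, width, height]
        "area": len(xs),                        # number of foreground pixels
    }


# Usage: a 4x5 mask with a 2x3 block of foreground pixels.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
annotation = mask_to_annotation(mask, image_id=0, category_id=1)
```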
Slide 13: Metadata for Training Mesh R-CNN
Slide 14: Building the Dataset
Slide 14: Generate a parent pool of data
20,000 image/metadata pairs
For each sample generated
Extract/Calculate ground truth to build metadata
Configure metadata to conform to model
Sample training set from parent pool
Sample –n (default: 1500) instances from parent pool randomly
Split into training and validation sets
Training fraction: --training-split (default: 0.75); validation fraction: 1 - --training-split (default: 1 - 0.75 = 0.25)
Merge the metadata of each set into a single JSON file (one for training, one for validation)
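The sampling and splitting steps above can be sketched as follows, using the defaults from the slide (1500 sampled instances, 0.75 training split) and a 20,000-record parent pool. The function names, the record layout, and the JSON wrapper key are hypothetical; this is not the authors' script.

```python
import json
import random


def build_split(parent_pool, n=1500, training_split=0.75, seed=None):
    """Sample n records without replacement, then split into train/validation."""
    rng = random.Random(seed)
    sample = rng.sample(parent_pool, n)
    n_train = int(round(n * training_split))
    return sample[:n_train], sample[n_train:]


def merge_metadata(records):
    """Merge per-sample metadata into one JSON document (hypothetical layout)."""
    return json.dumps({"samples": records})


# Stand-in parent pool of 20,000 image/metadata pairs.
pool = [{"id": i, "image": f"img_{i:05d}.png"} for i in range(20000)]

train, val = build_split(pool, seed=0)
train_json = merge_metadata(train)  # one JSON for the training set
val_json = merge_metadata(val)      # one JSON for the validation set
```

With the defaults, the split yields 1125 training and 375 validation records, and no record appears in both sets because sampling is done without replacement.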
Slide 15: Mask Prediction – Mask R-CNN
Slide 15: Backbone
ResNet-50-FPN
Region Proposal Network
Applies a sliding window over a convolutional feature map to generate proposed bounding boxes for likely objects
Proposed regions are aligned to the feature map and passed to fully connected layers that regress the bounding box and classify the object (softmax)
Mask Prediction
Generate a binary mask for pixels in proposed region using the aligned features
Slide 15: The Mask R-CNN Architecture [1]
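The sliding-window step of the Region Proposal Network can be pictured as placing a fixed set of candidate (anchor) boxes at every cell of the feature map. The sketch below only enumerates those candidates. The stride, sizes, aspect ratios, and function name are illustrative assumptions rather than values from the slides, and the learned objectness scoring and box regression are omitted.

```python
def generate_anchors(feat_h, feat_w, stride, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchor boxes in image coordinates.

    One box per (cell, size, ratio); ratio is height / width and each
    box keeps an area of size**2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, mapped back to image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:
                    w = size / ratio ** 0.5
                    h = size * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Usage: a 2x3 feature map with stride 16 gives 2 * 3 * 6 candidate boxes.
anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
```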
Slide 16: Mask Prediction – Mask R-CNN Advantages
Slide 16: Transfer Learning
Use a pretrained network (ResNet) to initialize weights instead of training from scratch (random initial weights)
Lowers training time and generalization error
Region of Interest Alignment
Each region of interest (ROI) is fed into a fully connected (FC) layer with a fixed input size
Must account for all pixels in the ROI while conforming to the FC layer's fixed input size
Uses bilinear interpolation instead of quantization
Slide 16: The Mask R-CNN Architecture [1]
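The difference between quantization and bilinear interpolation can be seen on a single sampling point. In this minimal sketch, `sample_quantized` snaps a fractional coordinate to the containing cell, while `sample_bilinear` blends the four surrounding cells. Both function names and the toy feature map are illustrative, and the coordinate is assumed to lie inside the map.

```python
import math


def sample_quantized(fmap, y, x):
    """Snap the continuous coordinate to an integer cell (loses sub-cell position)."""
    return fmap[int(math.floor(y))][int(math.floor(x))]


def sample_bilinear(fmap, y, x):
    """Blend the four neighboring cells, weighted by distance to each."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(fmap) - 1)      # clamp at the bottom edge
    x1 = min(x0 + 1, len(fmap[0]) - 1)   # clamp at the right edge
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


fmap = [[0.0, 1.0],
        [2.0, 3.0]]

quantized = sample_quantized(fmap, 0.5, 0.5)  # 0.0: only the top-left cell
bilinear = sample_bilinear(fmap, 0.5, 0.5)    # 1.5: average of all four cells
```

At integer coordinates the two samplers agree; they differ only where the region boundary falls between cells, which is where quantization discards information.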
Slide 17: Masks provide a measure of visual explainability
Slide 18: Mask Prediction – Mask R-CNN Performance
Slide 18: 98% instance segmentation accuracy
object
bounding box
mask
Slide 19: Mesh Prediction
Slide 19: Extends Mask R-CNN
Mesh Predictor
Voxel Prediction
Mesh Refinement
Slide 19: The Mesh R-CNN Architecture [2]
Slide 20: Voxel Prediction
Predicts a voxel occupancy grid
The Cubify function binarizes the voxel occupancy probabilities against a threshold and generates a cuboid triangle mesh for each occupied voxel
Slide 20: The Mesh R-CNN Architecture Voxel Branch[2]
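The thresholding and cube-generation idea behind Cubify can be sketched in a few lines. This is a simplified illustration, not the Mesh R-CNN implementation: it emits an independent cube (8 vertices, 12 triangles) per occupied voxel and does not merge shared vertices, remove interior faces, or enforce a consistent triangle winding. The function name, the 0.5 threshold, and the grid layout `occupancy[z][y][x]` are assumptions.

```python
# Corner offsets of a unit cube and its 12 triangles (two per face).
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_FACES = [(0, 1, 2), (0, 2, 3),   # z = 0
              (4, 6, 5), (4, 7, 6),   # z = 1
              (0, 5, 1), (0, 4, 5),   # y = 0
              (3, 2, 6), (3, 6, 7),   # y = 1
              (0, 3, 7), (0, 7, 4),   # x = 0
              (1, 5, 6), (1, 6, 2)]   # x = 1


def cubify_simple(occupancy, threshold=0.5):
    """Binarize occupancy probabilities and emit one cube per occupied voxel."""
    verts, faces = [], []
    for z, plane in enumerate(occupancy):
        for y, row in enumerate(plane):
            for x, prob in enumerate(row):
                if prob > threshold:
                    base = len(verts)  # index of this cube's first vertex
                    verts.extend((x + dx, y + dy, z + dz) for dx, dy, dz in CORNERS)
                    faces.extend((base + a, base + b, base + c) for a, b, c in CUBE_FACES)
    return verts, faces


# Usage: a 2x2x2 grid in which two voxels exceed the threshold.
grid = [[[0.9, 0.1], [0.2, 0.3]],
        [[0.4, 0.8], [0.0, 0.5]]]
verts, faces = cubify_simple(grid)
```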
Slide 21: Mesh Refinement
2 passes
Vertex alignment
Graph convolution
Vertex refinement
Slide 21: The Mesh R-CNN Architecture Mesh Refinement Branch[2]
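The graph convolution step in the refinement branch propagates features along mesh edges. A common form, and the one assumed here, updates each vertex as f'_i = ReLU(W0 f_i + sum over neighbors j of W1 f_j). The sketch below is a plain-Python illustration with hypothetical names and toy weights; vertex alignment and vertex refinement are not shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def graph_conv(features, edges, w0, w1):
    """Apply f'_i = ReLU(W0 f_i + sum_{j in N(i)} W1 f_j) to every vertex."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in edges:  # mesh edges are undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i, feat in enumerate(features):
        acc = matvec(w0, feat)
        for j in neighbors[i]:
            acc = [x + y for x, y in zip(acc, matvec(w1, features[j]))]
        updated.append([max(0.0, x) for x in acc])  # ReLU
    return updated


# Usage: three vertices on a path 0 - 1 - 2 with scalar features.
features = [[1.0], [2.0], [-3.0]]
edges = [(0, 1), (1, 2)]
out = graph_conv(features, edges, w0=[[1.0]], w1=[[1.0]])
```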
Slide 22: Metrics
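The formulas on this slide did not survive extraction. The sketch below uses commonly used definitions of the two metrics reported in the results, which is an assumption about what the slide showed: Chamfer distance as the sum of mean squared nearest-neighbor distances in both directions, and F1 as the harmonic mean of precision and recall at a distance threshold `tau`. Function names are hypothetical, and the F1 here is a fraction in [0, 1] rather than a percentage.

```python
def _nearest_sq_dists(src, dst):
    """Squared distance from each point in src to its nearest point in dst."""
    return [min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst) for p in src]


def chamfer_distance(pred, gt):
    """Mean squared nearest-neighbor distance, summed over both directions."""
    d_pred = _nearest_sq_dists(pred, gt)
    d_gt = _nearest_sq_dists(gt, pred)
    return sum(d_pred) / len(pred) + sum(d_gt) / len(gt)


def f1_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pred = _nearest_sq_dists(pred, gt)
    d_gt = _nearest_sq_dists(gt, pred)
    precision = sum(d < tau ** 2 for d in d_pred) / len(pred)
    recall = sum(d < tau ** 2 for d in d_gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage: two small point sets that differ in one point.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
cd = chamfer_distance(pred, gt)
f1 = f1_score(pred, gt, tau=0.1)
```

In practice both sets would be points sampled from the predicted and ground-truth mesh surfaces.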
Slide 23: Results

| Model | Chamfer (lower is better) | F1 (higher is better) |
| --- | --- | --- |
| Ours (Custom Synthetic Data) | 0.621 | 47.51 |
| Theirs (Mesh R-CNN trained on Pix3D) | 0.306 | 74.84 |

Slide 24: Ours trained on 2 Quadro RTX 6000 GPUs
Slide 24: Theirs trained on 8 Tesla V100 GPUs
Slide 24: Requires hyperparameter tuning specific to hardware configuration
Slide 25: Conclusion
Slide 25: Generated a synthetic dataset capable of training state-of-the-art 2D mask and 3D mesh prediction models
Each model can be trained end-to-end, from no data to a trained model
Future work
Hyperparameter tuning
Further domain randomization
Randomize object’s rendered skin
Extending Mesh R-CNN to use a Generative Adversarial Network to generate point clouds instead of voxel model
Higher resolution 3D mesh prediction
Slide 26: References
Slide 26: [1] He, K., Gkioxari, G., Dollár, P., and Girshick, R., “Mask R-CNN,” 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988. https://doi.org/10.1109/ICCV.2017.322.
[2] Gkioxari, G., Johnson, J., and Malik, J., “Mesh R-CNN,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9784–9794. https://doi.org/10.1109/ICCV.2019.00988.
[3] Sonawani, S. D., Alimo, R., Detry, R., Jeong, D., Hess, A., and Amor, H. B., “Assistive Relative Pose Estimation for On-orbit Assembly using Convolutional Neural Networks,” ArXiv, Vol. abs/2001.10673, 2020.
[4] Pal, B., Khaiyum, S., and Kumaraswamy, Y. S., “3D point cloud generation from 2D depth camera images using successive triangulation,” 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2017, pp. 129–133. https://doi.org/10.1109/ICIMIA.2017.7975586.
[5] Valsesia, D., Fracastoro, G., and Magli, E., “Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks,” IEEE Transactions on Multimedia, 2019.
[6] Ramasinghe, S., Khan, S. H., Barnes, N., and Gould, S., “Spectral-GANs for High-Resolution 3D Point-cloud Generation,” CoRR, Vol. abs/1912.01800, 2019. URL http://arxiv.org/abs/1912.01800.
[7] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P., “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” CoRR, Vol. abs/1703.06907, 2017. URL http://arxiv.org/abs/1703.06907.
[8] He, K., Zhang, X., Ren, S., and Sun, J., “Deep Residual Learning for Image Recognition,” CoRR, Vol. abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.