Please generate an extended abstract for the following text, which was scraped from a presentation slide deck, by expanding on the topics presented based on your knowledge of them.

Slide 1: James Ecker, Benjamin Kelley, Danette Allen

AIAA SciTech, January 2021

Slide 1: Synthetic Data Generation for 3D Mesh Prediction and Spatial Reasoning During Multi-Agent Robotic Missions

Slide 2: Computer Vision During In-Space Assembly

Slide 2: Difficulties

Illumination

Angle

Orientation

Movement

Constraints

Energy

Mass


Slide 3: High Degree of Variation Requires More Information

Slide 4: More Sensors = More Information = More Energy Use & More Mass

Slide 5: Mitigating the Constraints

High Degree of Variation Requires More Information

More Sensors = More Information = More Energy Use & More Mass

Maximize information; Minimize energy use and mass

Single Camera: Predict 3D Mesh from Single View

Slide 10: Related Work

Slide 10: [1] He et al – Mask R-CNN

[2] Gkioxari et al – Mesh R-CNN

[3] Sonawani et al - Assistive Relative Pose Estimation for On-orbit Assembly using Convolutional Neural Networks

[4] Pal et al - 3D Point Cloud Generation from 2D Depth Camera Images using Successive Triangulation

[5] Valsesia et al - Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks

[6] Ramasinghe et al - Spectral-GANs for High-Resolution 3D Point Cloud Generation


Slide 11: Synthesizing Data

Slide 11: 3D model of objects projected over 3D background in Blender

Can be extended to full simulation environments

ROS, Gazebo, MuJoCo, etc.

Variations in observations

Orientation of camera and light source

Relative orientation between objects

Number of objects in scene

Background

Sim-to-Reality Problem

Mitigated via Domain Randomization [7]
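The randomized observations listed above can be sketched as a parameter sampler: in a Blender-based pipeline, each sampled configuration would drive one render. This is a minimal illustration, not the authors' pipeline; the function name, value ranges, and background choices are all assumptions.

```python
import json
import math
import random

def sample_scene_config(rng, max_objects=5):
    """Sample one randomized synthetic-scene configuration.

    Mirrors the variations named on the slide: orientation of camera
    and light source, relative orientation between objects, number of
    objects in the scene, and background. All names and ranges here
    are illustrative assumptions.
    """
    n_objects = rng.randint(1, max_objects)
    return {
        "camera": {
            # Spherical coordinates around the scene origin (radians).
            "azimuth": rng.uniform(0.0, 2.0 * math.pi),
            "elevation": rng.uniform(-math.pi / 2, math.pi / 2),
            "distance": rng.uniform(2.0, 10.0),
        },
        "light": {
            "azimuth": rng.uniform(0.0, 2.0 * math.pi),
            "elevation": rng.uniform(0.0, math.pi / 2),
            "energy": rng.uniform(0.5, 5.0),
        },
        "objects": [
            {"yaw": rng.uniform(0.0, 2.0 * math.pi),
             "pitch": rng.uniform(0.0, 2.0 * math.pi),
             "roll": rng.uniform(0.0, 2.0 * math.pi)}
            for _ in range(n_objects)
        ],
        # Hypothetical background set for domain randomization.
        "background": rng.choice(["starfield", "earth_limb", "iss_truss"]),
    }

rng = random.Random(42)
print(json.dumps(sample_scene_config(rng), indent=2))
```

Sampling every nuisance parameter independently per image is the core of domain randomization [7]: a model trained across enough synthetic variation treats the real world as just one more variation.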


Slide 12: Metadata for Training Mask R-CNN


Slide 13: Metadata for Training Mesh R-CNN


Slide 14: Building the Dataset

Slide 14: Generate a parent pool of data

20,000 image/metadata pairs

For each sample generated

Extract/Calculate ground truth to build metadata

Configure metadata to conform to model

Sample training set from parent pool

Sample -n (default: 1500) instances from the parent pool at random

Split into training and validation sets

--training-split (default: 0.75) for training; the remainder (1 − 0.75 = 0.25) for validation

Merge all training/validation set metadata into one JSON, respectively
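The sample/split/merge procedure above can be sketched as follows. The `-n` and `--training-split` defaults come from the slide; the function and file names are hypothetical stand-ins for the authors' tooling.

```python
import json
import random

def build_dataset(parent_pool, n=1500, training_split=0.75, seed=0):
    """Sample a training/validation split from the parent pool.

    `parent_pool` is a list of per-sample metadata dicts. `n` and
    `training_split` mirror the -n and --training-split options on
    the slide; everything else here is an illustrative assumption.
    """
    rng = random.Random(seed)
    sampled = rng.sample(parent_pool, n)       # n instances, no replacement
    n_train = int(round(n * training_split))   # 1500 * 0.75 = 1125
    return sampled[:n_train], sampled[n_train:]

def merge_metadata(samples):
    """Merge all per-sample metadata into one JSON document."""
    return json.dumps({"samples": samples})

# Toy parent pool standing in for the 20,000 image/metadata pairs.
pool = [{"image": f"img_{i:05d}.png", "id": i} for i in range(20000)]
train, val = build_dataset(pool)
print(len(train), len(val))  # 1125 375
```

The merged JSON (one file for training, one for validation) is what the downstream Mask R-CNN / Mesh R-CNN loaders would consume.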


Slide 15: Mask Prediction – Mask R-CNN

Slide 15: Backbone

ResNet-50-FPN

Region Proposal Network

Applies a sliding window over a convolutional feature map to generate proposed bounding boxes for likely objects

Proposed regions are aligned to the feature map and sent to fully connected layers that regress a bounding box and classify the object (softmax)

Mask Prediction

Generate a binary mask for pixels in proposed region using the aligned features
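The sliding-window proposal idea can be made concrete with an anchor generator: one candidate box per feature-map position, scale, and aspect ratio, which the RPN then scores and regresses. This is a generic Faster/Mask R-CNN-style sketch; the stride, scales, and ratios are typical defaults, not values from the paper.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate candidate anchor boxes over a feature map.

    One anchor per (position, scale, ratio) triple -- the "sliding
    window" over the convolutional feature map. Returns an array of
    (x1, y1, x2, y2) boxes in input-image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Width/height chosen so the box area stays s*s
                    # while the aspect ratio varies with r.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

boxes = generate_anchors(4, 4)
print(boxes.shape)  # (144, 4): 4*4 positions * 3 scales * 3 ratios
```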


Slide 15: The Mask R-CNN Architecture [1]

Slide 16: Mask Prediction – Mask R-CNN Advantages

Slide 16: Transfer Learning

Use a pretrained network (ResNet) to initialize weights instead of training from scratch (random initial weights)

Lowers training time and generalization error

Region of Interest Alignment

Each region of interest is fed into a fixed-size input fully connected (FC) layer

Need to account for all pixels in ROI while conforming to fixed input size of FC

Bilinear Interpolation instead of Quantization
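The bilinear-interpolation point is the heart of RoIAlign: instead of rounding a continuous sampling location to the nearest feature-map cell (quantization, as in RoIPool), the value is blended from the four surrounding cells. A minimal single-point sketch:

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at a continuous (y, x) location.

    Quantization would round (y, x) and read one cell, discarding
    sub-cell position; bilinear interpolation weights the four
    neighboring cells by proximity, so no information is lost to
    rounding. RoIAlign applies this at several points per RoI bin.
    """
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature.shape[0] - 1)
    x1 = min(x0 + 1, feature.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx)
            + feature[y0, x1] * (1 - dy) * dx
            + feature[y1, x0] * dy * (1 - dx)
            + feature[y1, x1] * dy * dx)

feat = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear_sample(feat, 1.5, 1.5))  # 7.5, the mean of 5, 6, 9, 10
```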


Slide 16: The Mask R-CNN Architecture [1]

Slide 17: Masks provide a measure of visual explainability

Slide 18: Mask Prediction – Mask R-CNN Performance

Slide 18: 98% instance segmentation accuracy

Per-instance outputs: object class, bounding box, mask


Slide 19: Mesh Prediction

Slide 19: Extends Mask R-CNN

Mesh Predictor

Voxel Prediction

Predicts a voxel occupancy grid

Cubify function binarizes voxel occupancy probabilities according to a threshold and generates a cuboid triangular mesh for each likely voxel

Mesh Refinement

2 passes, each applying vertex alignment, graph convolution, and vertex refinement

Slide 19: The Mesh R-CNN Architecture [2]

Slide 20: The Mesh R-CNN Architecture Voxel Branch [2]

Slide 21: The Mesh R-CNN Architecture Mesh Refinement Branch [2]
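The cubify step can be illustrated with a simplified sketch: binarize the occupancy probabilities at a threshold, then emit one unit cube (8 vertices, 12 triangles) per occupied voxel. Mesh R-CNN's actual cubify also merges shared vertices and removes interior faces between adjacent voxels, which this toy version does not.

```python
import numpy as np

def cubify(occupancy, threshold=0.5):
    """Turn a voxel occupancy-probability grid into a cuboid mesh.

    Simplified: one independent unit cube per voxel whose occupancy
    probability exceeds `threshold`. Returns (V, 3) vertices and
    (F, 3) triangle faces indexing into the vertex array.
    """
    verts, faces = [], []
    # The 12 triangles of a unit cube, indexing its 8 corners
    # (corners enumerated in z, y, x order below).
    cube_faces = [(0, 1, 2), (1, 3, 2), (4, 6, 5), (5, 6, 7),
                  (0, 4, 1), (1, 4, 5), (2, 3, 6), (3, 7, 6),
                  (0, 2, 4), (2, 6, 4), (1, 5, 3), (3, 5, 7)]
    for z, y, x in zip(*np.nonzero(occupancy > threshold)):
        base = len(verts)
        for dz in (0, 1):
            for dy in (0, 1):
                for dx in (0, 1):
                    verts.append((x + dx, y + dy, z + dz))
        faces.extend((base + a, base + b, base + c)
                     for a, b, c in cube_faces)
    return np.array(verts), np.array(faces)

occ = np.zeros((2, 2, 2))
occ[0, 0, 0] = 0.9   # one voxel above threshold
v, f = cubify(occ)
print(v.shape, f.shape)  # (8, 3) (12, 3)
```

The refinement branch then treats this coarse mesh as a graph and moves its vertices with graph convolutions; the cubify output only needs to get the topology roughly right.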

Slide 22: Metrics


Slide 23: Results

Ours (Custom Synthetic Data): Chamfer 0.621 (lower is better), F1 47.51 (higher is better)

Theirs (Mesh R-CNN trained on Pix3D): Chamfer 0.306, F1 74.84

Slide 24: Ours trained on 2 Quadro RTX 6000 GPUs; theirs on 8 Tesla V100 GPUs

Requires hyperparameter tuning specific to hardware configuration
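The two metrics on the results slide can be computed from point clouds sampled on the predicted and ground-truth meshes. The sketch below uses the common definitions (bidirectional Chamfer distance; F1 as the harmonic mean of precision and recall at a distance threshold); the exact sampling and normalization in the Mesh R-CNN evaluation may differ, and the threshold `tau` here is an assumption.

```python
import numpy as np

def chamfer_and_f1(pred, gt, tau=0.3):
    """Chamfer distance and F1@tau between two (N, 3) point clouds.

    Chamfer sums the mean squared nearest-neighbor distance in both
    directions (lower is better). F1 combines precision (fraction of
    predicted points within tau of ground truth) and recall (the
    reverse), so higher is better.
    """
    # Pairwise squared distances, shape (len(pred), len(gt)).
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)
    chamfer = d2.min(axis=1).mean() + d2.min(axis=0).mean()
    d = np.sqrt(d2)
    precision = (d.min(axis=1) < tau).mean()  # pred points near some gt point
    recall = (d.min(axis=0) < tau).mean()     # gt points near some pred point
    if precision + recall == 0:
        return chamfer, 0.0
    return chamfer, 100.0 * 2 * precision * recall / (precision + recall)

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
c, f1 = chamfer_and_f1(a, a)
print(float(c), float(f1))  # 0.0 100.0
```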

Slide 25: Conclusion

Slide 25: Generated a synthetic dataset capable of training state-of-the-art 2D mask and 3D mesh prediction models

Can train each model end-to-end from no data to trained model

Future work

Hyperparameter tuning

Further domain randomization

Randomize object’s rendered skin

Extending Mesh R-CNN to use a Generative Adversarial Network to generate point clouds instead of a voxel model

Higher resolution 3D mesh prediction


Slide 26: References

Slide 26: [1] He, K., Gkioxari, G., Dollár, P., and Girshick, R., “Mask R-CNN,” 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988. https://doi.org/10.1109/ICCV.2017.322.

[2] Gkioxari, G., Johnson, J., and Malik, J., “Mesh R-CNN,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9784–9794. https://doi.org/10.1109/ICCV.2019.00988.

[3] Sonawani, S. D., Alimo, R., Detry, R., Jeong, D., Hess, A., and Amor, H. B., “Assistive Relative Pose Estimation for On-orbit Assembly using Convolutional Neural Networks,” ArXiv, Vol. abs/2001.10673, 2020.

[4] Pal, B., Khaiyum, S., and Kumaraswamy, Y. S., “3D point cloud generation from 2D depth camera images using successive triangulation,” 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2017, pp. 129–133. https://doi.org/10.1109/ICIMIA.2017.7975586.

[5] Valsesia, D., Fracastoro, G., and Magli, E., “Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks,” IEEE Transactions on Multimedia, 2019.

[6] Ramasinghe, S., Khan, S. H., Barnes, N., and Gould, S., “Spectral-GANs for High-Resolution 3D Point-cloud Generation,” CoRR, Vol. abs/1912.01800, 2019. URL http://arxiv.org/abs/1912.01800.

[7] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P., “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” CoRR, Vol. abs/1703.06907, 2017. URL http://arxiv.org/abs/1703.06907.

[8] He, K., Zhang, X., Ren, S., and Sun, J., “Deep Residual Learning for Image Recognition,” CoRR, Vol. abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.
