Please generate an extended abstract for the following text, which was scraped from a presentation slide deck, by expanding upon the topics presented based on your knowledge of them.

Slide 1: Synthetic Data Generation for 3D Mesh Prediction and Spatial Reasoning During Multi-Agent Robotic Missions
James Ecker, Benjamin Kelley, Danette Allen. AIAA SciTech, January 2021.

Slides 2-4: Computer Vision During In-Space Assembly
Difficulties: illumination, angle, orientation, movement constraints, energy, mass.
A high degree of variation requires more information, but more sensors = more information = more energy use and more mass.

Slides 5-9: Mitigating the Constraints
Maximize information while minimizing energy use and mass: use a single camera and predict the 3D mesh from a single view.

Slide 10: Related Work
[1] He et al. – Mask R-CNN
[2] Gkioxari et al. – Mesh R-CNN
[3] Sonawani et al. – Assistive Relative Pose Estimation for On-Orbit Assembly using Convolutional Neural Networks
[4] Pal et al. – 3D Point Cloud Generation from 2D Depth Camera Images using Successive Triangulation
[5] Valsesia et al. – Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks
[6] Ramasinghe et al. – Spectral-GANs for High-Resolution 3D Point Cloud Generation

Slide 11: Synthesizing Data
3D models of objects are projected over a 3D background in Blender; the approach can be extended to full simulation environments (ROS, Gazebo, MuJoCo, etc.). Observations are varied in the orientation of the camera and light source, the relative orientation between objects, the number of objects in the scene, and the background. The sim-to-reality problem is addressed with domain randomization [7]. A sketch of this randomization loop follows.
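The deck does not include the generation script, so the following is a minimal sketch of the Slide 11 randomization loop, assuming Blender's Python API (bpy) and a scene that already contains objects named "Camera" and "Light" plus a part model; the object name "TrussSegment", the value ranges, and the output paths are illustrative assumptions, not details from the deck.

```python
import math
import random

import bpy

NUM_SAMPLES = 20000  # size of the parent pool built on Slide 14

scene = bpy.context.scene
camera = bpy.data.objects["Camera"]
light = bpy.data.objects["Light"]
part = bpy.data.objects["TrussSegment"]  # hypothetical assembly part

for i in range(NUM_SAMPLES):
    # Randomize illumination and viewing angle (Slide 2's difficulties).
    camera.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
    light.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
    light.data.energy = random.uniform(100.0, 1000.0)  # arbitrary wattage range

    # Randomize the part's pose relative to the camera and background.
    part.rotation_euler = [random.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
    part.location = [random.uniform(-1.0, 1.0) for _ in range(3)]

    # Render one observation; ground truth for the metadata (masks, meshes,
    # poses) would be extracted from the scene state at this point.
    scene.render.filepath = f"//renders/sample_{i:05d}.png"
    bpy.ops.render.render(write_still=True)
```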
Slide 12: Metadata for Training Mask R-CNN

Slide 13: Metadata for Training Mesh R-CNN

Slide 14: Building the Dataset
Generate a parent pool of 20,000 image/metadata pairs. For each sample generated, extract or calculate the ground truth to build its metadata and configure the metadata to conform to the model. Then sample the training set from the parent pool: draw -n (default: 1500) instances at random and split them into training and validation sets, with --training-split (default: 0.75) going to training and the remainder (1 - 0.75 = 0.25) to validation. Finally, merge all training and validation metadata into one JSON file per split. (A sketch of the sampling and splitting step is the first code block after Slide 21.)

Slide 15: Mask Prediction – Mask R-CNN
Backbone: ResNet-50-FPN. The Region Proposal Network applies a sliding window over a convolutional feature map to generate proposed bounding boxes for likely objects. Proposed regions are aligned to the feature map and sent to fully connected layers that regress a bounding box and classify the object with a softmax. Mask prediction then generates a binary mask for the pixels in each proposed region using the aligned features. (Figure: the Mask R-CNN architecture [1]. A usage sketch is the second code block after Slide 21.)

Slides 16-17: Mask Prediction – Mask R-CNN Advantages
Transfer learning: initialize weights from a pretrained network (ResNet) instead of training from scratch with random initial weights, which lowers both training time and generalization error. Region of interest alignment: each region of interest feeds a fully connected (FC) layer with a fixed input size, so all pixels in the ROI must be accounted for while conforming to that fixed size; Mask R-CNN uses bilinear interpolation instead of quantization. Masks also provide a measure of visual explainability. (Figure: the Mask R-CNN architecture [1].)

Slide 18: Mask Prediction – Mask R-CNN Performance
98% instance segmentation accuracy (object, bounding box, and mask).

Slides 19-21: Mesh Prediction
Mesh R-CNN extends Mask R-CNN with a mesh predictor made of a voxel prediction branch and a mesh refinement branch. Voxel prediction estimates a voxel occupancy grid; a cubify function binarizes the voxel occupancy probabilities according to a threshold and generates a cuboid triangular mesh for each likely voxel. Mesh refinement then runs two passes of vertex alignment, graph convolution, and vertex refinement. (Figures: the Mesh R-CNN architecture, voxel branch and mesh refinement branch [2]. A sketch of the cubify step is the third code block below.)
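A minimal sketch of Slide 14's sampling and splitting step, assuming the parent pool is a directory with one JSON metadata file per sample; the paths and the list-style merge are assumptions, since the deck does not show the metadata schema.

```python
import json
import random
from pathlib import Path

POOL_DIR = Path("parent_pool")  # hypothetical location of the 20,000 pairs
N = 1500                        # -n
TRAINING_SPLIT = 0.75           # --training-split

# Sample -n instances from the parent pool at random.
pool = sorted(POOL_DIR.glob("*.json"))
chosen = random.sample(pool, N)

# Split into training and validation sets.
cut = int(len(chosen) * TRAINING_SPLIT)
splits = {"train": chosen[:cut], "val": chosen[cut:]}

# Merge all metadata for each split into one JSON file.
for name, files in splits.items():
    merged = [json.loads(f.read_text()) for f in files]
    Path(f"{name}_metadata.json").write_text(json.dumps(merged))
```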
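A minimal sketch of mask prediction with the ResNet-50-FPN backbone from Slide 15, using torchvision's off-the-shelf Mask R-CNN for illustration rather than the authors' training code; the input tensor is a stand-in for a rendered observation.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# pretrained=True gives the transfer-learning initialization from Slide 16.
model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a rendered observation
with torch.no_grad():
    # One dict per image: bounding boxes, labels, scores, and per-instance
    # soft masks from the mask head.
    (pred,) = model([image])

boxes = pred["boxes"]
binary_masks = pred["masks"] > 0.5  # threshold soft masks to binary masks
```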
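A minimal sketch of the cubify step from Slides 20-21, using PyTorch3D's cubify operator, which the public Mesh R-CNN code builds on; the grid size and threshold here are illustrative.

```python
import torch
from pytorch3d.ops import cubify

# A predicted voxel occupancy grid: batch of 1, 32x32x32 probabilities
# (in the full model this comes from the voxel branch, not torch.rand).
voxel_probs = torch.rand(1, 32, 32, 32)

# Binarize occupancies at the threshold and emit a cuboid triangular mesh
# for each likely voxel, merging shared faces; returns a Meshes object that
# the mesh refinement branch would then deform.
meshes = cubify(voxel_probs, thresh=0.5)
verts, faces = meshes.verts_list()[0], meshes.faces_list()[0]
```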
Slide 22: Metrics
Chamfer distance (lower is better) and F1 score (higher is better) between predicted and ground-truth shapes. (A sketch of both metrics follows the references.)

Slides 23-24: Results
Ours (Mesh R-CNN trained on the custom synthetic data, on 2 Quadro RTX 6000 GPUs): Chamfer 0.621, F1 47.51.
Theirs (Mesh R-CNN trained on Pix3D, on 8 Tesla V100 GPUs): Chamfer 0.306, F1 74.84.
Training requires hyperparameter tuning specific to the hardware configuration.

Slide 25: Conclusion
Generated a synthetic dataset capable of training state-of-the-art 2D mask and 3D mesh prediction models; each model can be trained end to end, from no data to a trained model. Future work: hyperparameter tuning; further domain randomization, including randomizing each object's rendered skin; extending Mesh R-CNN with a generative adversarial network that generates point clouds instead of a voxel model; and higher-resolution 3D mesh prediction.

Slide 26: References
He, K., Gkioxari, G., Dollár, P., and Girshick, R., "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988. https://doi.org/10.1109/ICCV.2017.322
Gkioxari, G., Johnson, J., and Malik, J., "Mesh R-CNN," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9784–9794. https://doi.org/10.1109/ICCV.2019.00988
Sonawani, S. D., Alimo, R., Detry, R., Jeong, D., Hess, A., and Amor, H. B., "Assistive Relative Pose Estimation for On-Orbit Assembly using Convolutional Neural Networks," ArXiv, Vol. abs/2001.10673, 2020.
Pal, B., Khaiyum, S., and Kumaraswamy, Y. S., "3D Point Cloud Generation from 2D Depth Camera Images Using Successive Triangulation," 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2017, pp. 129–133. https://doi.org/10.1109/ICIMIA.2017.7975586
Valsesia, D., Fracastoro, G., and Magli, E., "Learning Localized Representations of Point Clouds with Graph-Convolutional Generative Adversarial Networks," IEEE Transactions on Multimedia, 2019.
Ramasinghe, S., Khan, S. H., Barnes, N., and Gould, S., "Spectral-GANs for High-Resolution 3D Point-cloud Generation," CoRR, Vol. abs/1912.01800, 2019. URL http://arxiv.org/abs/1912.01800.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P., "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World," CoRR, Vol. abs/1703.06907, 2017. URL http://arxiv.org/abs/1703.06907.
He, K., Zhang, X., Ren, S., and Sun, J., "Deep Residual Learning for Image Recognition," CoRR, Vol. abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.
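A minimal sketch of the Slide 22 metrics, computed with PyTorch3D between point clouds sampled from the predicted and ground-truth meshes; the point counts and the F1 inlier threshold tau are illustrative choices, not values from the deck.

```python
import torch
from pytorch3d.loss import chamfer_distance
from pytorch3d.ops import knn_points

pred = torch.rand(1, 10000, 3)  # points sampled from the predicted mesh
gt = torch.rand(1, 10000, 3)    # points sampled from the ground-truth mesh

# Chamfer distance: mean squared nearest-neighbor distance in both
# directions (lower is better).
chamfer, _ = chamfer_distance(pred, gt)

# F1 at threshold tau: harmonic mean of precision (fraction of predicted
# points within tau of the ground truth) and recall (the reverse).
tau = 0.1
d_pred_to_gt = knn_points(pred, gt, K=1).dists.squeeze(-1).sqrt()
d_gt_to_pred = knn_points(gt, pred, K=1).dists.squeeze(-1).sqrt()
precision = (d_pred_to_gt < tau).float().mean()
recall = (d_gt_to_pred < tau).float().mean()
f1 = 2 * precision * recall / (precision + recall + 1e-8)  # higher is better
```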