Manually configuring a text-to-3D pipeline
Zero123plus (https://github.com/SUDO-AI-3D/zero123plus) outputs views at a fixed set of camera poses:
Azimuth (relative to the input view): 30, 90, 150, 210, 270, 330 degrees.
Elevation (absolute): 30, -20, 30, -20, 30, -20 degrees.
To generate the images from Zero123++ it's easiest to just use:
https://huggingface.co/spaces/sudo-ai/zero123plus-demo-space
and enable both background-removal options.
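Those six fixed poses can be turned into rough camera-to-world matrices for a transforms.json. This is a minimal sketch, assuming the cameras orbit the origin at a fixed radius with +Z as world up (both assumptions on my part, since Zero123++ doesn't publish a radius):

```python
import math
import numpy as np

# The six fixed Zero123++ camera poses (degrees), per the list above.
AZIMUTHS = [30, 90, 150, 210, 270, 330]
ELEVATIONS = [30, -20, 30, -20, 30, -20]

def pose_to_c2w(azimuth_deg, elevation_deg, radius=2.0):
    """Build a 4x4 camera-to-world matrix for a camera on a sphere of
    the given radius, looking at the origin. Radius and +Z world-up
    are assumptions."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    pos = np.array([
        radius * math.cos(el) * math.cos(az),
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
    ])
    forward = -pos / np.linalg.norm(pos)  # camera looks at the origin
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right     # camera +X
    c2w[:3, 1] = true_up   # camera +Y
    c2w[:3, 2] = -forward  # camera +Z (OpenGL/NeRF: camera looks down -Z)
    c2w[:3, 3] = pos
    return c2w

matrices = [pose_to_c2w(a, e) for a, e in zip(AZIMUTHS, ELEVATIONS)]
```

Whether these line up with what Zero123++ actually rendered still has to be checked visually in nerf.studio or instant-ngp.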
StableSAM is commonly used to remove backgrounds:
https://github.com/abhishekkrthakur/StableSAM
Otherwise you can magic-select them in GIMP or Krita,
or use a lower-quality network such as https://huggingface.co/spaces/Xenova/remove-background-web
https://ezgif.com/sprite-cutter/
can quickly cut the multi-view image into individual images.
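If you'd rather split the sheet locally than use the sprite cutter, a few lines of Pillow will do it; this sketch assumes the 2-column by 3-row layout of 320x320 tiles (adjust cols/rows/tile if your output differs):

```python
from PIL import Image

def split_grid(sheet, tile=320, cols=2, rows=3):
    """Cut a Zero123++ multi-view sheet into individual tiles.
    The 2x3 grid of 320x320 tiles is an assumption."""
    tiles = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
            tiles.append(sheet.crop(box))
    return tiles

# Blank stand-in sheet; on a real run use Image.open("your_sheet.png").
sheet = Image.new("RGB", (640, 960))
views = split_grid(sheet)
```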
MVDream could be considered better than Zero123++ and allows custom angles. The main difference
between Zero123++ and MVDream is that MVDream starts from a text prompt, while
Zero123++ starts from an input image.
https://github.com/bytedance/MVDream
Zero123++ outputs 320x320 images and MVDream outputs 256x256 images.
While the transforms.json allows you to specify a depth map, and Omnidata is generally used to
produce depth and normal maps for text-to-3D, in practice it tends to make little to no difference:
https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/torch
But don't take my word for it, because the depth map seems to be an important step in Stable-Dreamfusion:
https://github.com/ashawkey/stable-dreamfusion
You can also get a depth map from MiDaS or ZoeDepth, although I'm not sure exactly what types
of depth map are supported by either nerf.studio or instant-ngp.
https://github.com/isl-org/MiDaS
https://github.com/isl-org/ZoeDepth
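Whichever network produces the depth, a common interchange format is a 16-bit grayscale PNG per frame. A minimal sketch; the millimetre-style scale factor here is my assumption, so record whatever scale you actually use so the NeRF tool can map pixel values back to metric depth:

```python
import numpy as np
from PIL import Image

def save_depth_png16(depth, path, depth_scale=1000.0):
    """Save a float depth map (metres) as a 16-bit grayscale PNG.
    depth_scale=1000 stores millimetres; an assumption, not a standard."""
    d = np.clip(depth * depth_scale, 0, 65535).astype(np.uint16)
    Image.fromarray(d).save(path)

# Fake 0-5 m depth map standing in for MiDaS/ZoeDepth output.
depth = np.random.rand(240, 320).astype(np.float32) * 5.0
save_depth_png16(depth, "frame_00001_depth.png")
```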
The transforms.json file is partly documented at these URLs:
https://docs.nerf.studio/quickstart/data_conventions.html
https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_dataset_tips.md
--- highlights
{
  "camera_model": "OPENCV_FISHEYE", // camera model type [OPENCV, OPENCV_FISHEYE]
  "fl_x": 1072.0, // focal length x
  "fl_y": 1068.0, // focal length y
  "cx": 1504.0, // principal point x
  "cy": 1000.0, // principal point y
  "w": 3008, // image width
  "h": 2000, // image height
  "k1": 0.0312, // first radial distortion parameter, used by [OPENCV, OPENCV_FISHEYE]
  "k2": 0.0051, // second radial distortion parameter, used by [OPENCV, OPENCV_FISHEYE]
  "k3": 0.0006, // third radial distortion parameter, used by [OPENCV_FISHEYE]
  "k4": 0.0001, // fourth radial distortion parameter, used by [OPENCV_FISHEYE]
  "p1": -6.47e-5, // first tangential distortion parameter, used by [OPENCV]
  "p2": -1.37e-7, // second tangential distortion parameter, used by [OPENCV]
  "frames": // ... per-frame intrinsics and extrinsics parameters
}
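Some datasets carry camera_angle_x instead of explicit focal lengths; the two are interchangeable, since the horizontal field of view follows from fl_x and the image width:

```python
import math

def fov_x(fl_x, w):
    """Horizontal field of view (radians) from focal length in pixels
    and image width: camera_angle_x = 2 * atan(w / (2 * fl_x))."""
    return 2.0 * math.atan(w / (2.0 * fl_x))

# FOV in radians for the fl_x/w intrinsics listed above.
angle = fov_x(1072.0, 3008)
```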
{
  // ...
  "frames": [
    {
      "file_path": "images/frame_00001.jpeg",
      "transform_matrix": [
        // [+X0 +Y0 +Z0 X]
        // [+X1 +Y1 +Z1 Y]
        // [+X2 +Y2 +Z2 Z]
        // [0.0 0.0 0.0 1]
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]
      ]
      // Additional per-frame info
    }
  ]
}
The aabb_scale parameter causes the NeRF implementation to trace rays out
to a larger or smaller bounding box containing the background elements.
This value needs to be a power of two; instant-ngp documents values from 1 up to 128.
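A quick sanity check for that constraint, taking the 1 to 128 range from instant-ngp's dataset tips:

```python
def is_valid_aabb_scale(n):
    """True when n is a power of two within instant-ngp's documented
    1..128 range for aabb_scale."""
    return 1 <= n <= 128 and (n & (n - 1)) == 0

powers = [s for s in range(1, 129) if is_valid_aabb_scale(s)]
print(powers)  # [1, 2, 4, 8, 16, 32, 64, 128]
```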
---
What isn't documented at those URLs is the rotation parameter, which seems
to be a fraction of 2*PI. It's hard to know which values can be
omitted and which can't, as it seems to vary from purpose to purpose, but
many NeRF files do contain this rotation parameter, as shown below.
{
  "camera_angle_x": 0.6194058656692505,
  "frames": [
    {
      "file_path": "./train/r_0",
      "rotation": 0.012566370614359171,
      "transform_matrix": [
        [-0.9754950404167175, -0.1484755426645279, -0.16237139701843262, -0.6545401215553284],
        [-0.22002151608467102, 0.6582863330841064, 0.7198954820632935, 2.901991844177246],
        [0.0, 0.7379797697067261, -0.6748228073120117, -2.7202980518341064],
        [0.0, 0.0, 0.0, 1.0]
      ]
    }
  ]
}
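For what it's worth, the rotation value in that sample checks out as an exact fraction of a full revolution, which supports reading it as a fixed angular step per frame:

```python
import math

# The rotation value from the sample equals 2*pi/500, i.e. 0.72 degrees,
# a clean division of a full revolution into 500 steps.
rotation = 0.012566370614359171
step = 2.0 * math.pi / 500.0
assert math.isclose(rotation, step)
```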
---
The Blender NeRF plugin might help in generating the camera matrices for the transforms.json file:
https://github.com/maximeraafat/BlenderNeRF
---
COLMAP (https://github.com/colmap/colmap) is a really great piece of
software for Structure-from-Motion (SfM) and Multi-View Stereo (MVS).
However, it is not suitable for generating a transforms.json from
Zero123plus outputs, because there is no camera motion to recover, only rotations.
sudo apt install colmap
It's also worth mentioning that nerf.studio won't load a transforms.json
unless you specify the image extension in the "file_path" parameter, whereas
instant-ngp (https://github.com/NVlabs/instant-ngp) will; none of the
original NeRF samples specify the extension. (https://github.com/bmild/nerf)
The original NeRF paper (https://www.matthewtancik.com/nerf) dataset can be downloaded here:
https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1?usp=sharing
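A small helper can patch an original-style transforms.json for nerf.studio by appending the extension to every file_path; the .png default here is my assumption, so match your actual images:

```python
import json
import os
import tempfile

def add_extension(src, dst, ext=".png"):
    """Append an image extension to every frames[].file_path entry so
    nerf.studio will load a dataset written in the original NeRF style."""
    with open(src) as f:
        data = json.load(f)
    for frame in data.get("frames", []):
        if not frame["file_path"].endswith(ext):
            frame["file_path"] += ext
    with open(dst, "w") as f:
        json.dump(data, f, indent=2)

# Round-trip on a minimal stand-in file.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "transforms.json")
dst = os.path.join(tmp, "transforms_ns.json")
with open(src, "w") as f:
    json.dump({"frames": [{"file_path": "./train/r_0"}]}, f)
add_extension(src, dst)
with open(dst) as f:
    fixed = json.load(f)
```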
---
Ratinod has an example transforms.json that he created specifically for the Zero123++ dataset:
https://github.com/SUDO-AI-3D/zero123plus/issues/11#issuecomment-1781951276
I loaded it into nerf.studio and the camera transforms look ok. It's hard to be completely
sure because, in its current state, nerf.studio gives very little statistical information about
cameras; it only shows a visual representation of their orientation.
It would really help if nerf.studio or instant-ngp allowed you to modify the camera transforms
inside the GUI until they visually looked correct. taichi-ngp-renderer, a CPU-only version
of instant-ngp written in Python, seems to allow this, but I haven't had much luck getting it to work yet.
https://github.com/Linyou/taichi-ngp-renderer
https://github.com/kwea123/ngp_pl
https://github.com/Kai-46/nerfplusplus
This saves some time setting up the transforms:
https://www.andre-gaschler.com/rotationconverter/
I am not sure if the latent-nerf project will perform better on Zero123plus outputs than instant-ngp does:
https://github.com/eladrich/latent-nerf
---
https://huggingface.co/spaces/LiheYoung/Depth-Anything
https://huggingface.co/spaces/bookbot/Image-Upscaling-Playground
https://huggingface.co/spaces/hongfz16/3DTopia
https://huggingface.co/spaces/stabilityai/TripoSR
https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo
https://huggingface.co/spaces/liuyuan-pal/SyncDreamer
https://huggingface.co/spaces/sudo-ai/zero123plus-demo-space
https://github.com/naver/dust3r - like instant-ngp but calculates the transforms.json, depth, etc. for you.
---
To be continued...
In the meantime I have an introductory article: https://ai.plainenglish.io/text-to-3d-b607bf245031
and an Itch.io mega-thread on the topic: https://itch.io/t/3519795/share-your-favourite-sources-of-free-3d-content-for-games
mirrored here: https://gist.github.com/mrbid/6a01c854b9279310f95d5601a8215574