CrowdFace Demo with BAGEL Integration (Complete Version with Secure Token Handling)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrowdFace Demo: Neural-Adaptive Crowd Segmentation with Contextual Pixel-Space Advertisement Integration\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BlackBoyZeus/CrowdFace/blob/main/CrowdFace_Demo.ipynb)\n",
"\n",
"This notebook demonstrates the CrowdFace system, which combines state-of-the-art segmentation models with contextual advertisement placement."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup and Dependencies\n",
"\n",
"First, let's install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install torch torchvision opencv-python transformers diffusers accelerate safetensors huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone the Bagel repository\n",
"!git clone https://github.com/ByteDance-Seed/Bagel.git\n",
"# Add Bagel to the Python path\n",
"import sys\n",
"sys.path.append(\"Bagel\")\n",
"\n",
"# Clone the CrowdFace repository to get the implementation files\n",
"!git clone https://github.com/BlackBoyZeus/CrowdFace.git\n",
"sys.path.append(\"CrowdFace\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import torch\n",
"import numpy as np\n",
"import cv2\n",
"from PIL import Image\n",
"from tqdm.notebook import tqdm\n",
"from huggingface_hub import snapshot_download\n",
"from copy import deepcopy\n",
"from typing import Dict, List, Optional, Tuple, Union, Any\n",
"\n",
"# Hugging Face token: set via Colab secrets or environment variables,\n",
"# or fill it in manually in the next cell\n",
"HUGGINGFACE_TOKEN = None\n",
"\n",
"# In Google Colab, the secrets manager is the more secure option\n",
"try:\n",
"    from google.colab import userdata\n",
"    HUGGINGFACE_TOKEN = userdata.get(\"HUGGINGFACE_TOKEN\")\n",
"    print(\"Running in Google Colab\")\n",
"except Exception:\n",
"    # Not in Colab (or the secret is missing); fall back to environment variables\n",
"    if os.environ.get(\"HUGGINGFACE_TOKEN\"):\n",
"        HUGGINGFACE_TOKEN = os.environ.get(\"HUGGINGFACE_TOKEN\")\n",
"        print(\"Running in local environment\")\n",
"\n",
"# If no token is set, prompt the user\n",
"if not HUGGINGFACE_TOKEN:\n",
"    print(\"\\nIMPORTANT: You need a Hugging Face token to access the models.\")\n",
"    print(\"Please set your token in the cell below.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set your Hugging Face token here if not set above\n",
"if not HUGGINGFACE_TOKEN:\n",
"    HUGGINGFACE_TOKEN = \"\"  # Enter your token here"
]
},
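{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional alternative to hardcoding the token, the sketch below reads it interactively with Python's standard `getpass`, so the token never appears in the saved notebook. This cell is a convenience addition, not part of the original CrowdFace code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: read the token interactively so it is never stored in the notebook.\n",
"# Generic getpass pattern; skip this cell if the token is already set.\n",
"if not HUGGINGFACE_TOKEN:\n",
"    from getpass import getpass\n",
"    HUGGINGFACE_TOKEN = getpass(\"Enter your Hugging Face token: \")"
]
},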
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Loading\n",
"\n",
"### 1. Load SAM2 (Segment Anything Model 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import SamModel, SamProcessor\n",
"\n",
"# Load the segmentation model and processor.\n",
"# Note: transformers' SamModel/SamProcessor load SAM v1-style checkpoints;\n",
"# if the SAM2 repo ID is unavailable, fall back to the public SAM ViT-H weights.\n",
"model_id = \"facebook/sam2\"\n",
"try:\n",
"    sam_processor = SamProcessor.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"    sam_model = SamModel.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"except Exception as e:\n",
"    print(f\"Could not load {model_id} ({e}); falling back to facebook/sam-vit-huge\")\n",
"    model_id = \"facebook/sam-vit-huge\"\n",
"    sam_processor = SamProcessor.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"    sam_model = SamModel.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"sam_model = sam_model.to(device)"
]
},
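{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before wiring the model into the full pipeline, a quick single-image check confirms it loaded correctly. The cell below is a minimal sketch using the standard `transformers` SAM API; the gray test image and the center point prompt are arbitrary placeholders, not CrowdFace inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sanity check: segment a dummy image with a single point prompt.\n",
"# Uses the standard transformers SAM API; image and prompt are placeholders.\n",
"test_image = Image.fromarray(np.full((256, 256, 3), 128, dtype=np.uint8))\n",
"input_points = [[[128, 128]]]  # one (x, y) prompt near the image center\n",
"\n",
"inputs = sam_processor(test_image, input_points=input_points, return_tensors=\"pt\").to(device)\n",
"with torch.no_grad():\n",
"    outputs = sam_model(**inputs)\n",
"\n",
"# Resize the predicted masks back to the original image resolution\n",
"masks = sam_processor.image_processor.post_process_masks(\n",
"    outputs.pred_masks.cpu(),\n",
"    inputs[\"original_sizes\"].cpu(),\n",
"    inputs[\"reshaped_input_sizes\"].cpu(),\n",
")\n",
"print(f\"Predicted mask tensor shape: {tuple(masks[0].shape)}\")"
]
},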
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Load RVM (Robust Video Matting)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone the RVM repository if not already present\n",
"if not os.path.exists(\"RobustVideoMatting\"):\n",
"    !git clone https://github.com/PeterL1n/RobustVideoMatting.git\n",
"sys.path.append(\"RobustVideoMatting\")\n",
"\n",
"try:\n",
"    from model import MattingNetwork\n",
"\n",
"    # Load RVM model\n",
"    rvm_model = MattingNetwork(\"mobilenetv3\").eval().to(device)\n",
"\n",
"    # Download RVM weights\n",
"    if not os.path.exists(\"rvm_mobilenetv3.pth\"):\n",
"        !wget https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3.pth\n",
"\n",
"    # Load weights\n",
"    rvm_model.load_state_dict(torch.load(\"rvm_mobilenetv3.pth\", map_location=device))\n",
"    print(\"RVM model loaded successfully\")\n",
"except Exception as e:\n",
"    print(f\"Error loading RVM model: {e}\")\n",
"    print(\"Will use fallback methods for matting\")\n",
"    rvm_model = None"
]
},
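{
"cell_type": "markdown",
"metadata": {},
"source": [
"RVM is recurrent: each call takes a frame plus four recurrent states and returns the foreground, the alpha matte, and updated states. The cell below is a single-frame smoke test on a random dummy frame, following the usage documented in the RobustVideoMatting README; it is a sanity check, not part of the pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Single-frame smoke test for RVM (skipped if the model failed to load).\n",
"# Input is a dummy normalized RGB frame of shape (1, 3, H, W).\n",
"if rvm_model is not None:\n",
"    rec = [None] * 4  # recurrent states; None on the first frame\n",
"    src = torch.rand(1, 3, 480, 640, device=device)\n",
"    with torch.no_grad():\n",
"        fgr, pha, *rec = rvm_model(src, *rec, downsample_ratio=0.25)\n",
"    print(f\"Foreground: {tuple(fgr.shape)}, alpha matte: {tuple(pha.shape)}\")"
]
},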
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Load BAGEL Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download and set up the BAGEL model\n",
"save_dir = \"models/BAGEL-7B-MoT\"\n",
"repo_id = \"ByteDance-Seed/BAGEL-7B-MoT\"\n",
"cache_dir = save_dir + \"/cache\"\n",
"\n",
"try:\n",
"    print(\"Downloading BAGEL model (this may take some time)...\")\n",
"    snapshot_download(\n",
"        cache_dir=cache_dir,\n",
"        local_dir=save_dir,\n",
"        repo_id=repo_id,\n",
"        local_dir_use_symlinks=False,\n",
"        resume_download=True,\n",
"        token=HUGGINGFACE_TOKEN,\n",
"        allow_patterns=[\"*.json\", \"*.safetensors\", \"*.bin\", \"*.py\", \"*.md\", \"*.txt\"],\n",
"    )\n",
"    print(\"BAGEL model downloaded successfully!\")\n",
"except Exception as e:\n",
"    print(f\"Error downloading BAGEL model: {e}\")\n",
"    print(\"Will use fallback methods for scene understanding and ad placement.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the BAGEL model\n",
"try:\n",
"    from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch, init_empty_weights\n",
"    from Bagel.data.transforms import ImageTransform\n",
"    from Bagel.data.data_utils import add_special_tokens\n",
"    from Bagel.modeling.bagel import (\n",
"        BagelConfig, Bagel, Qwen2Config, Qwen2ForCausalLM, SiglipVisionConfig, SiglipVisionModel\n",
"    )\n",
"    from Bagel.modeling.qwen2 import Qwen2Tokenizer\n",
"    from Bagel.modeling.bagel.qwen2_navit import NaiveCache\n",
"    from Bagel.modeling.autoencoder import load_ae\n",
"    from Bagel.inferencer import InterleaveInferencer\n",
"\n",
"    model_path = save_dir\n",
"\n",
"    # Prepare the LLM config\n",
"    llm_config = Qwen2Config.from_json_file(os.path.join(model_path, \"llm_config.json\"))\n",
"    llm_config.qk_norm = True\n",
"    llm_config.tie_word_embeddings = False\n",
"    llm_config.layer_module = \"Qwen2MoTDecoderLayer\"\n",
"\n",
"    # Prepare the ViT config\n",
"    vit_config = SiglipVisionConfig.from_json_file(os.path.join(model_path, \"vit_config.json\"))\n",
"    vit_config.rope = False\n",
"    vit_config.num_hidden_layers = vit_config.num_hidden_layers - 1\n",
"\n",
"    # Load the VAE\n",
"    vae_model, vae_config = load_ae(local_path=os.path.join(model_path, \"ae.safetensors\"))\n",
"\n",
"    # Prepare the Bagel config\n",
"    config = BagelConfig(\n",
"        visual_gen=True,\n",
"        visual_und=True,\n",
"        llm_config=llm_config,\n",
"        vit_config=vit_config,\n",
"        vae_config=vae_config,\n",
"        vit_max_num_patch_per_side=70,\n",
"        connector_act=\"gelu_pytorch_tanh\",\n",
"        latent_patch_size=2,\n",
"        max_latent_size=64,\n",
"    )\n",
"\n",
"    # Initialize the model with empty weights\n",
"    with init_empty_weights():\n",
"        language_model = Qwen2ForCausalLM(llm_config)\n",
"        vit_model = SiglipVisionModel(vit_config)\n",
"        model = Bagel(language_model, vit_model, config)\n",
"        model.vit_model.vision_model.embeddings.convert_conv2d_to_linear(vit_config, meta=True)\n",
"\n",
"    # Load the tokenizer and add special tokens\n",
"    tokenizer = Qwen2Tokenizer.from_pretrained(model_path)\n",
"    tokenizer, new_token_ids, _ = add_special_tokens(tokenizer)\n",
"\n",
"    # Set up transforms\n",
"    vae_transform = ImageTransform(1024, 512, 16)\n",
"    vit_transform = ImageTransform(980, 224, 14)\n",
"\n",
"    # Set up the device map for model loading\n",
"    device_map = infer_auto_device_map(\n",
"        model,\n",
"        max_memory={i: \"80GiB\" for i in range(torch.cuda.device_count())},\n",
"        no_split_module_classes=[\"Bagel\", \"Qwen2MoTDecoderLayer\"],\n",
"    )\n",
"\n",
"    # Pin tightly coupled modules to the same device. Note that accelerate's\n",
"    # load_checkpoint_and_dispatch has no same_device_modules argument, so the\n",
"    # device map is adjusted directly instead of passing it through\n",
"    same_device_modules = [\n",
"        \"language_model.model.embed_tokens\",\n",
"        \"time_embedder\",\n",
"        \"latent_pos_embed\",\n",
"        \"vae2llm\",\n",
"        \"llm2vae\",\n",
"        \"connector\",\n",
"    ]\n",
"    first_device = device_map.get(same_device_modules[0], \"cuda:0\")\n",
"    for module_name in same_device_modules:\n",
"        device_map[module_name] = first_device\n",
"\n",
"    # Load the model weights\n",
"    model = load_checkpoint_and_dispatch(\n",
"        model,\n",
"        os.path.join(model_path, \"pytorch_model.bin\"),\n",
"        device_map=device_map,\n",
"        offload_folder=None,\n",
"        offload_state_dict=False,\n",
"    )\n",
"\n",
"    # Initialize the inferencer\n",
"    bagel_inferencer = InterleaveInferencer(\n",
"        model=model,\n",
"        vae_model=vae_model,\n",
"        tokenizer=tokenizer,\n",
"        vae_transform=vae_transform,\n",
"        vit_transform=vit_transform,\n",
"        new_token_ids=new_token_ids,\n",
"    )\n",
"\n",
"    print(\"BAGEL model initialized successfully!\")\n",
"except Exception as e:\n",
"    print(f\"Error initializing BAGEL model: {e}\")\n",
"    print(\"Will use fallback methods for scene understanding and ad placement.\")\n",
"    bagel_inferencer = None"
]
},
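{
"cell_type": "markdown",
"metadata": {},
"source": [
"If BAGEL initialized, a quick understanding-mode smoke test confirms the inferencer is wired up end to end. The call below (`image=..., text=..., understanding_output=True`) follows the example in the Bagel repository's inference notebook; treat it as a sketch and adjust the keyword arguments if your checkout's `InterleaveInferencer` differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional BAGEL smoke test (skipped if initialization failed above).\n",
"# Keyword arguments follow the Bagel repo's example notebook; adjust if your\n",
"# checkout exposes a different InterleaveInferencer signature.\n",
"if bagel_inferencer is not None:\n",
"    try:\n",
"        test_image = Image.fromarray(np.full((512, 512, 3), 128, dtype=np.uint8))\n",
"        result = bagel_inferencer(\n",
"            image=test_image,\n",
"            text=\"Briefly describe this image.\",\n",
"            understanding_output=True,\n",
"        )\n",
"        print(result.get(\"text\", result) if isinstance(result, dict) else result)\n",
"    except Exception as e:\n",
"        print(f\"BAGEL smoke test failed: {e}\")"
]
},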
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import CrowdFace Components\n",
"\n",
"Now let's import the CrowdFace components from the repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import CrowdFace components\n",
"from CrowdFace.src.python.bagel.scene_understanding import BAGELSceneUnderstanding\n",
"from CrowdFace.src.python.bagel.ad_placement import BAGELAdPlacement\n",
"from CrowdFace.src.python.bagel.ad_optimization import BAGELAdOptimization\n",
"from CrowdFace.src.python.crowdface_pipeline import CrowdFacePipeline\n",
"\n",
"# Initialize the CrowdFace pipeline\n",
"pipeline = CrowdFacePipeline(\n",
"    sam_model=sam_model,\n",
"    sam_processor=sam_processor,\n",
"    rvm_model=rvm_model if \"rvm_model\" in locals() else None,\n",
"    bagel_inferencer=bagel_inferencer if \"bagel_inferencer\" in locals() else None\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload and Process a Video\n",
"\n",
"Now let's upload a video and process it with our CrowdFace pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add a file upload widget\n",
"try:\n",
"    from google.colab import files\n",
"    uploaded = files.upload()\n",
"    video_path = next(iter(uploaded.keys()))\n",
"    print(f\"Uploaded video: {video_path}\")\n",
"except (ImportError, StopIteration):\n",
"    # If not in Colab (or no file was uploaded), download a sample video\n",
"    !wget -O sample_video.mp4 \"https://pixabay.com/videos/download/video-41758_source.mp4?attachment\"\n",
"    video_path = \"sample_video.mp4\"\n",
"    print(f\"Using sample video: {video_path}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create or download a sample ad image\n",
"try:\n",
"    # Try to upload an ad image\n",
"    from google.colab import files\n",
"    print(\"Upload an ad image:\")\n",
"    uploaded = files.upload()\n",
"    ad_path = next(iter(uploaded.keys()))\n",
"    print(f\"Uploaded ad image: {ad_path}\")\n",
"except (ImportError, StopIteration):\n",
"    # If not in Colab or no file was uploaded, create a sample ad (white RGBA canvas)\n",
"    ad_img = np.ones((300, 500, 4), dtype=np.uint8) * 255\n",
"    # Add some text\n",
"    cv2.putText(ad_img, \"SAMPLE AD\", (50, 150), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255, 255), 5)\n",
"    cv2.imwrite(\"sample_ad.png\", ad_img)\n",
"    ad_path = \"sample_ad.png\"\n",
"    print(\"Created a sample ad image\")"
]
},
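{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full pipeline, it is worth eyeballing the inputs. The cell below previews the first video frame and the ad image side by side with matplotlib; it is a quick visual check, not part of the pipeline itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview the inputs: first video frame and the ad image (visual check only)\n",
"import matplotlib.pyplot as plt\n",
"\n",
"cap = cv2.VideoCapture(video_path)\n",
"ok, frame = cap.read()\n",
"cap.release()\n",
"\n",
"ad_preview = cv2.imread(ad_path, cv2.IMREAD_UNCHANGED)\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n",
"if ok:\n",
"    axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))\n",
"axes[0].set_title(\"First video frame\")\n",
"if ad_preview is not None:\n",
"    axes[1].imshow(cv2.cvtColor(ad_preview[:, :, :3], cv2.COLOR_BGR2RGB))\n",
"axes[1].set_title(\"Ad image\")\n",
"for ax in axes:\n",
"    ax.axis(\"off\")\n",
"plt.show()"
]
},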
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Process the video\n",
"output_path = \"output_video.mp4\"\n",
"pipeline.process_video(\n",
"    video_path=video_path,\n",
"    ad_image=ad_path,\n",
"    output_path=output_path,\n",
"    max_frames=100  # Limit to 100 frames for faster processing\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Display the Results\n",
"\n",
"Let's display the output video to see the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Video\n",
"\n",
"# Display the output video (embed=True so it also plays in Colab,\n",
"# which does not serve local files by relative path)\n",
"Video(output_path, embed=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add a download option\n",
"try:\n",
"    from google.colab import files\n",
"    files.download(output_path)\n",
"except ImportError:\n",
"    print(f\"Output video saved to {output_path}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we've demonstrated the CrowdFace system, which combines:\n",
"\n",
"1. **SAM2** for precise crowd segmentation\n",
"2. **RVM** for high-quality video matting\n",
"3. **BAGEL** for scene understanding and intelligent ad placement\n",
"\n",
"Together, these models do more than overlay ads on video: the system selects placements with an understanding of scene context and optimizes the ad content for each placement."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}