{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrowdFace Demo: Neural-Adaptive Crowd Segmentation with Contextual Pixel-Space Advertisement Integration\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BlackBoyZeus/CrowdFace/blob/main/CrowdFace_Demo.ipynb)\n",
"\n",
"This notebook demonstrates the CrowdFace system, which combines state-of-the-art segmentation models with contextual advertisement placement."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup and Dependencies\n",
"\n",
"First, let's install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install torch torchvision opencv-python transformers diffusers accelerate safetensors huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone the Bagel repository",
"!git clone https://github.com/ByteDance-Seed/Bagel.git",
"# Add Bagel to the Python path",
"import sys",
"sys.path.append(\"Bagel\")",
"",
"# Clone the CrowdFace repository to get the implementation files",
"!git clone https://github.com/BlackBoyZeus/CrowdFace.git",
"sys.path.append(\"CrowdFace\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os",
"import torch",
"import numpy as np",
"import cv2",
"from PIL import Image",
"from tqdm.notebook import tqdm",
"from huggingface_hub import snapshot_download",
"from copy import deepcopy",
"from typing import Dict, List, Optional, Tuple, Union, Any",
"",
"# Set Hugging Face token",
"# Replace with your own token or set in environment variables",
"HUGGINGFACE_TOKEN = None",
"",
"# For Google Colab, we can use the secrets module for more secure handling",
"try:",
" from google.colab import userdata",
" # If running in Colab and token is set in secrets, use that instead",
" if userdata.get(\"HUGGINGFACE_TOKEN\"):",
" HUGGINGFACE_TOKEN = userdata.get(\"HUGGINGFACE_TOKEN\")",
" print(\"Running in Google Colab\")",
"except:",
" # Try to get from environment variables",
" if os.environ.get(\"HUGGINGFACE_TOKEN\"):",
" HUGGINGFACE_TOKEN = os.environ.get(\"HUGGINGFACE_TOKEN\")",
" print(\"Running in local environment\")",
" ",
"# If no token is set, prompt the user",
"if not HUGGINGFACE_TOKEN:",
" print(\"\nIMPORTANT: You need a Hugging Face token to access the models.\")",
" print(\"Please set your token in the cell below.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set your Hugging Face token here if not set above",
"if not HUGGINGFACE_TOKEN:",
" HUGGINGFACE_TOKEN = \"\" # Enter your token here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Loading\n",
"\n",
"### 1. Load SAM2 (Segment Anything Model 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import SamModel, SamProcessor",
"",
"# Load SAM2 model and processor",
"model_id = \"facebook/sam2\"",
"sam_processor = SamProcessor.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)",
"sam_model = SamModel.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)",
"",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"",
"sam_model = sam_model.to(device)"
]
},
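{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the cell below runs the loaded model on a single image with a point prompt, following the `transformers` SAM API. It is a minimal sketch: `frame.jpg` is a placeholder path and the center-point prompt is only illustrative; it is not how the CrowdFace pipeline prompts the model internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sanity check: segment one image with a single point prompt.\n",
"# \"frame.jpg\" is a placeholder path; substitute any RGB image.\n",
"raw_image = Image.open(\"frame.jpg\").convert(\"RGB\")\n",
"\n",
"# One point prompt near the image center; coordinates are [x, y]\n",
"input_points = [[[raw_image.width // 2, raw_image.height // 2]]]\n",
"inputs = sam_processor(raw_image, input_points=input_points, return_tensors=\"pt\").to(device)\n",
"\n",
"with torch.no_grad():\n",
"    outputs = sam_model(**inputs)\n",
"\n",
"# Resize the predicted masks back to the original image resolution\n",
"masks = sam_processor.image_processor.post_process_masks(\n",
"    outputs.pred_masks.cpu(), inputs[\"original_sizes\"].cpu(), inputs[\"reshaped_input_sizes\"].cpu()\n",
")\n",
"print(masks[0].shape)  # (num_prompts, num_masks_per_prompt, H, W)"
]
},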
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Load RVM (Robust Video Matting)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone RVM repository if not already present",
"!git clone https://github.com/PeterL1n/RobustVideoMatting.git",
"sys.path.append(\"RobustVideoMatting\")",
"",
"try:",
" from model import MattingNetwork",
" ",
" # Load RVM model",
" rvm_model = MattingNetwork(\"mobilenetv3\").eval().to(device)",
" ",
" # Download RVM weights",
" if not os.path.exists(\"rvm_mobilenetv3.pth\"):",
" !wget https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3.pth",
" ",
" # Load weights",
" rvm_model.load_state_dict(torch.load(\"rvm_mobilenetv3.pth\", map_location=device))",
" print(\"RVM model loaded successfully\")",
"except Exception as e:",
" print(f\"Error loading RVM model: {e}\")",
" print(\"Will use fallback methods for matting\")",
" rvm_model = None"
]
},
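{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of RVM's recurrent interface (as documented in its README), the sketch below mattes a single placeholder frame. The recurrent state `rec` carries temporal context and should be reset at the start of each video; the random tensor stands in for a real frame and only demonstrates the call signature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: matte one frame with RVM's recurrent interface.\n",
"if rvm_model is not None:\n",
"    frame = torch.rand(1, 3, 480, 640, device=device)  # placeholder frame, values in [0, 1]\n",
"    rec = [None] * 4  # recurrent state; reset at the start of each video\n",
"    with torch.no_grad():\n",
"        fgr, pha, *rec = rvm_model(frame, *rec, downsample_ratio=0.375)\n",
"    print(pha.shape)  # alpha matte: (1, 1, 480, 640)"
]
},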
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Load BAGEL Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download and set up BAGEL model",
"save_dir = \"models/BAGEL-7B-MoT\"",
"repo_id = \"ByteDance-Seed/BAGEL-7B-MoT\"",
"cache_dir = save_dir + \"/cache\"",
"",
"try:",
" print(\"Downloading BAGEL model (this may take some time)...\")",
" snapshot_download(cache_dir=cache_dir,",
" local_dir=save_dir,",
" repo_id=repo_id,",
" local_dir_use_symlinks=False,",
" resume_download=True,",
" token=HUGGINGFACE_TOKEN,",
" allow_patterns=[\"*.json\", \"*.safetensors\", \"*.bin\", \"*.py\", \"*.md\", \"*.txt\"],",
" )",
" print(\"BAGEL model downloaded successfully!\")",
"except Exception as e:",
" print(f\"Error downloading BAGEL model: {e}\")",
" print(\"Will use fallback methods for scene understanding and ad placement.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize BAGEL model",
"try:",
" from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch, init_empty_weights",
" from Bagel.data.transforms import ImageTransform",
" from Bagel.data.data_utils import add_special_tokens",
" from Bagel.modeling.bagel import (",
" BagelConfig, Bagel, Qwen2Config, Qwen2ForCausalLM, SiglipVisionConfig, SiglipVisionModel",
" )",
" from Bagel.modeling.qwen2 import Qwen2Tokenizer",
" from Bagel.modeling.bagel.qwen2_navit import NaiveCache",
" from Bagel.modeling.autoencoder import load_ae",
" from Bagel.inferencer import InterleaveInferencer",
" ",
" model_path = save_dir",
" ",
" # LLM config preparing",
" llm_config = Qwen2Config.from_json_file(os.path.join(model_path, \"llm_config.json\"))",
" llm_config.qk_norm = True",
" llm_config.tie_word_embeddings = False",
" llm_config.layer_module = \"Qwen2MoTDecoderLayer\"",
" ",
" # ViT config preparing",
" vit_config = SiglipVisionConfig.from_json_file(os.path.join(model_path, \"vit_config.json\"))",
" vit_config.rope = False",
" vit_config.num_hidden_layers = vit_config.num_hidden_layers - 1",
" ",
" # VAE loading",
" vae_model, vae_config = load_ae(local_path=os.path.join(model_path, \"ae.safetensors\"))",
" ",
" # Bagel config preparing",
" config = BagelConfig(",
" visual_gen=True,",
" visual_und=True,",
" llm_config=llm_config, ",
" vit_config=vit_config,",
" vae_config=vae_config,",
" vit_max_num_patch_per_side=70,",
" connector_act=\"gelu_pytorch_tanh\",",
" latent_patch_size=2,",
" max_latent_size=64,",
" )",
" ",
" # Initialize model with empty weights",
" with init_empty_weights():",
" language_model = Qwen2ForCausalLM(llm_config)",
" vit_model = SiglipVisionModel(vit_config)",
" model = Bagel(language_model, vit_model, config)",
" model.vit_model.vision_model.embeddings.convert_conv2d_to_linear(vit_config, meta=True)",
" ",
" # Load tokenizer and add special tokens",
" tokenizer = Qwen2Tokenizer.from_pretrained(model_path)",
" tokenizer, new_token_ids, _ = add_special_tokens(tokenizer)",
" ",
" # Set up transforms",
" vae_transform = ImageTransform(1024, 512, 16)",
" vit_transform = ImageTransform(980, 224, 14)",
" ",
" # Set up device map for model loading",
" device_map = infer_auto_device_map(",
" model,",
" max_memory={i: \"80GiB\" for i in range(torch.cuda.device_count())},",
" no_split_module_classes=[\"Bagel\", \"Qwen2MoTDecoderLayer\"],",
" )",
" ",
" # Define modules that should be on the same device",
" same_device_modules = [",
" \"language_model.model.embed_tokens\",",
" \"time_embedder\",",
" \"latent_pos_embed\",",
" \"vae2llm\",",
" \"llm2vae\",",
" \"connector\",",
" ]",
" ",
" # Load model weights",
" model = load_checkpoint_and_dispatch(",
" model, ",
" os.path.join(model_path, \"pytorch_model.bin\"),",
" device_map=device_map,",
" offload_folder=None,",
" offload_state_dict=False,",
" same_device_modules=same_device_modules,",
" )",
" ",
" # Initialize the inferencer",
" bagel_inferencer = InterleaveInferencer(",
" model=model, ",
" vae_model=vae_model, ",
" tokenizer=tokenizer, ",
" vae_transform=vae_transform, ",
" vit_transform=vit_transform, ",
" new_token_ids=new_token_ids",
" )",
" ",
" print(\"BAGEL model initialized successfully!\")",
"except Exception as e:",
" print(f\"Error initializing BAGEL model: {e}\")",
" print(\"Will use fallback methods for scene understanding and ad placement.\")",
" bagel_inferencer = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import CrowdFace Components\n",
"\n",
"Now let's import the CrowdFace components from the repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import CrowdFace components",
"from CrowdFace.src.python.bagel.scene_understanding import BAGELSceneUnderstanding",
"from CrowdFace.src.python.bagel.ad_placement import BAGELAdPlacement",
"from CrowdFace.src.python.bagel.ad_optimization import BAGELAdOptimization",
"from CrowdFace.src.python.crowdface_pipeline import CrowdFacePipeline",
"",
"# Initialize the CrowdFace pipeline",
"pipeline = CrowdFacePipeline(",
" sam_model=sam_model,",
" sam_processor=sam_processor,",
" rvm_model=rvm_model if \"rvm_model\" in locals() else None,",
" bagel_inferencer=bagel_inferencer if \"bagel_inferencer\" in locals() else None",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload and Process a Video\n",
"\n",
"Now let's upload a video and process it with our CrowdFace pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add file upload widget",
"try:",
" from google.colab import files",
" uploaded = files.upload()",
" video_path = next(iter(uploaded.keys()))",
" print(f\"Uploaded video: {video_path}\")",
"except ImportError:",
" # If not in Colab, use a sample video",
" # Download a sample video",
" !wget -O sample_video.mp4 https://pixabay.com/videos/download/video-41758_source.mp4?attachment",
" video_path = \"sample_video.mp4\"",
" print(f\"Using sample video: {video_path}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create or download a sample ad image",
"try:",
" # Try to upload an ad image",
" from google.colab import files",
" print(\"Upload an ad image:\")",
" uploaded = files.upload()",
" ad_path = next(iter(uploaded.keys()))",
" print(f\"Uploaded ad image: {ad_path}\")",
"except (ImportError, StopIteration):",
" # If not in Colab or no file uploaded, create a sample ad",
" ad_img = np.ones((300, 500, 4), dtype=np.uint8) * 255",
" # Add some text",
" cv2.putText(ad_img, \"SAMPLE AD\", (50, 150), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255, 255), 5)",
" cv2.imwrite(\"sample_ad.png\", ad_img)",
" ad_path = \"sample_ad.png\"",
" print(\"Created a sample ad image\")"
]
},
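{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full pipeline, here is a minimal sketch of the pixel-space compositing operation at its core: alpha-blending a BGRA ad into a BGR frame at a chosen position. The `overlay_ad` helper is illustrative only; the actual pipeline also uses the segmentation masks and BAGEL's scene analysis to decide where and how to place the ad."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative helper (not the pipeline's actual placement logic):\n",
"# alpha-blend a BGRA ad into a BGR frame, top-left corner at (x, y).\n",
"def overlay_ad(frame, ad_bgra, x, y):\n",
"    h, w = ad_bgra.shape[:2]\n",
"    roi = frame[y:y + h, x:x + w].astype(np.float32)\n",
"    alpha = ad_bgra[:, :, 3:4].astype(np.float32) / 255.0  # per-pixel opacity in [0, 1]\n",
"    blended = alpha * ad_bgra[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi\n",
"    frame[y:y + h, x:x + w] = blended.astype(np.uint8)\n",
"    return frame\n",
"\n",
"# Example: composite the sample ad onto a gray test frame\n",
"test_frame = np.full((720, 1280, 3), 128, dtype=np.uint8)\n",
"ad = cv2.imread(ad_path, cv2.IMREAD_UNCHANGED)\n",
"if ad is not None and ad.ndim == 3 and ad.shape[2] == 4:\n",
"    composited = overlay_ad(test_frame, ad, 100, 100)"
]
},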
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Process the video",
"output_path = \"output_video.mp4\"",
"pipeline.process_video(",
" video_path=video_path,",
" ad_image=ad_path,",
" output_path=output_path,",
" max_frames=100 # Limit to 100 frames for faster processing",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Display the Results\n",
"\n",
"Let's display the output video to see the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Video",
"",
"# Display the output video",
"Video(output_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add download option",
"try:",
" from google.colab import files",
" files.download(output_path)",
"except ImportError:",
" print(f\"Output video saved to {output_path}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we've demonstrated the CrowdFace system, which combines:\n",
"\n",
"1. **SAM2** for precise crowd segmentation",
"2. **RVM** for high-quality video matting",
"3. **BAGEL** for intelligent scene understanding and ad placement\n",
"\n",
"This integration creates a sophisticated product that's difficult to replicate because it combines multiple cutting-edge AI models in a way that enhances each component. The result is a system that not only places ads in videos but does so with an understanding of scene context, optimal placement, and content optimization."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}