CrowdFace Demo with BAGEL Integration (Complete Version with Secure Token Handling)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrowdFace Demo: Neural-Adaptive Crowd Segmentation with Contextual Pixel-Space Advertisement Integration\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BlackBoyZeus/CrowdFace/blob/main/CrowdFace_Demo.ipynb)\n",
"\n",
"This notebook demonstrates the CrowdFace system, which combines state-of-the-art segmentation models with contextual advertisement placement."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup and Dependencies\n",
"\n",
"First, let's install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install torch torchvision opencv-python transformers diffusers accelerate safetensors huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone the Bagel repository\n",
"!git clone https://github.com/ByteDance-Seed/Bagel.git\n",
"# Add Bagel to the Python path\n",
"import sys\n",
"sys.path.append(\"Bagel\")\n",
"\n",
"# Clone the CrowdFace repository to get the implementation files\n",
"!git clone https://github.com/BlackBoyZeus/CrowdFace.git\n",
"sys.path.append(\"CrowdFace\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import torch\n",
"import numpy as np\n",
"import cv2\n",
"from PIL import Image\n",
"from tqdm.notebook import tqdm\n",
"from huggingface_hub import snapshot_download\n",
"from copy import deepcopy\n",
"from typing import Dict, List, Optional, Tuple, Union, Any\n",
"\n",
"# Hugging Face token: set via Colab secrets or environment variables,\n",
"# or fill it in manually in the next cell\n",
"HUGGINGFACE_TOKEN = None\n",
"\n",
"# In Google Colab, the secrets manager is the more secure option\n",
"try:\n",
"    from google.colab import userdata\n",
"    HUGGINGFACE_TOKEN = userdata.get(\"HUGGINGFACE_TOKEN\")\n",
"    print(\"Running in Google Colab\")\n",
"except Exception:\n",
"    # Not in Colab (or the secret is missing); fall back to environment variables\n",
"    if os.environ.get(\"HUGGINGFACE_TOKEN\"):\n",
"        HUGGINGFACE_TOKEN = os.environ.get(\"HUGGINGFACE_TOKEN\")\n",
"        print(\"Running in local environment\")\n",
"\n",
"# If no token is set, prompt the user\n",
"if not HUGGINGFACE_TOKEN:\n",
"    print(\"\\nIMPORTANT: You need a Hugging Face token to access the models.\")\n",
"    print(\"Please set your token in the cell below.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set your Hugging Face token here if not set above\n",
"if not HUGGINGFACE_TOKEN:\n",
"    HUGGINGFACE_TOKEN = \"\"  # Enter your token here"
]
},
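{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional alternative to hardcoding the token, the sketch below reads it interactively with Python's standard `getpass`, so the token never appears in the saved notebook. This cell is a convenience addition, not part of the original CrowdFace code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: read the token interactively so it is never stored in the notebook.\n",
"# Generic getpass pattern; skip this cell if the token is already set.\n",
"if not HUGGINGFACE_TOKEN:\n",
"    from getpass import getpass\n",
"    HUGGINGFACE_TOKEN = getpass(\"Enter your Hugging Face token: \")"
]
},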
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Loading\n",
"\n",
"### 1. Load SAM2 (Segment Anything Model 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import SamModel, SamProcessor\n",
"\n",
"# Load the segmentation model and processor.\n",
"# Note: transformers' SamModel/SamProcessor load SAM v1-style checkpoints;\n",
"# if the SAM2 repo ID is unavailable, fall back to the public SAM ViT-H weights.\n",
"model_id = \"facebook/sam2\"\n",
"try:\n",
"    sam_processor = SamProcessor.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"    sam_model = SamModel.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"except Exception as e:\n",
"    print(f\"Could not load {model_id} ({e}); falling back to facebook/sam-vit-huge\")\n",
"    model_id = \"facebook/sam-vit-huge\"\n",
"    sam_processor = SamProcessor.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"    sam_model = SamModel.from_pretrained(model_id, token=HUGGINGFACE_TOKEN)\n",
"\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"sam_model = sam_model.to(device)"
]
},
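{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before wiring the model into the full pipeline, a quick single-image check confirms it loaded correctly. The cell below is a minimal sketch using the standard `transformers` SAM API; the gray test image and the center point prompt are arbitrary placeholders, not CrowdFace inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sanity check: segment a dummy image with a single point prompt.\n",
"# Uses the standard transformers SAM API; image and prompt are placeholders.\n",
"test_image = Image.fromarray(np.full((256, 256, 3), 128, dtype=np.uint8))\n",
"input_points = [[[128, 128]]]  # one (x, y) prompt near the image center\n",
"\n",
"inputs = sam_processor(test_image, input_points=input_points, return_tensors=\"pt\").to(device)\n",
"with torch.no_grad():\n",
"    outputs = sam_model(**inputs)\n",
"\n",
"# Resize the predicted masks back to the original image resolution\n",
"masks = sam_processor.image_processor.post_process_masks(\n",
"    outputs.pred_masks.cpu(),\n",
"    inputs[\"original_sizes\"].cpu(),\n",
"    inputs[\"reshaped_input_sizes\"].cpu(),\n",
")\n",
"print(f\"Predicted mask tensor shape: {tuple(masks[0].shape)}\")"
]
},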
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Load RVM (Robust Video Matting)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Clone the RVM repository if not already present\n",
"if not os.path.exists(\"RobustVideoMatting\"):\n",
"    !git clone https://github.com/PeterL1n/RobustVideoMatting.git\n",
"sys.path.append(\"RobustVideoMatting\")\n",
"\n",
"try:\n",
"    from model import MattingNetwork\n",
"\n",
"    # Load RVM model\n",
"    rvm_model = MattingNetwork(\"mobilenetv3\").eval().to(device)\n",
"\n",
"    # Download RVM weights\n",
"    if not os.path.exists(\"rvm_mobilenetv3.pth\"):\n",
"        !wget https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3.pth\n",
"\n",
"    # Load weights\n",
"    rvm_model.load_state_dict(torch.load(\"rvm_mobilenetv3.pth\", map_location=device))\n",
"    print(\"RVM model loaded successfully\")\n",
"except Exception as e:\n",
"    print(f\"Error loading RVM model: {e}\")\n",
"    print(\"Will use fallback methods for matting\")\n",
"    rvm_model = None"
]
},
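{
"cell_type": "markdown",
"metadata": {},
"source": [
"RVM is recurrent: each call takes a frame plus four recurrent states and returns the foreground, the alpha matte, and updated states. The cell below is a single-frame smoke test on a random dummy frame, following the usage documented in the RobustVideoMatting README; it is a sanity check, not part of the pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Single-frame smoke test for RVM (skipped if the model failed to load).\n",
"# Input is a dummy normalized RGB frame of shape (1, 3, H, W).\n",
"if rvm_model is not None:\n",
"    rec = [None] * 4  # recurrent states; None on the first frame\n",
"    src = torch.rand(1, 3, 480, 640, device=device)\n",
"    with torch.no_grad():\n",
"        fgr, pha, *rec = rvm_model(src, *rec, downsample_ratio=0.25)\n",
"    print(f\"Foreground: {tuple(fgr.shape)}, alpha matte: {tuple(pha.shape)}\")"
]
},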
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Load BAGEL Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download and set up the BAGEL model\n",
"save_dir = \"models/BAGEL-7B-MoT\"\n",
"repo_id = \"ByteDance-Seed/BAGEL-7B-MoT\"\n",
"cache_dir = save_dir + \"/cache\"\n",
"\n",
"try:\n",
"    print(\"Downloading BAGEL model (this may take some time)...\")\n",
"    snapshot_download(\n",
"        cache_dir=cache_dir,\n",
"        local_dir=save_dir,\n",
"        repo_id=repo_id,\n",
"        local_dir_use_symlinks=False,\n",
"        resume_download=True,\n",
"        token=HUGGINGFACE_TOKEN,\n",
"        allow_patterns=[\"*.json\", \"*.safetensors\", \"*.bin\", \"*.py\", \"*.md\", \"*.txt\"],\n",
"    )\n",
"    print(\"BAGEL model downloaded successfully!\")\n",
"except Exception as e:\n",
"    print(f\"Error downloading BAGEL model: {e}\")\n",
"    print(\"Will use fallback methods for scene understanding and ad placement.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the BAGEL model\n",
"try:\n",
"    from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch, init_empty_weights\n",
"    from Bagel.data.transforms import ImageTransform\n",
"    from Bagel.data.data_utils import add_special_tokens\n",
"    from Bagel.modeling.bagel import (\n",
"        BagelConfig, Bagel, Qwen2Config, Qwen2ForCausalLM, SiglipVisionConfig, SiglipVisionModel\n",
"    )\n",
"    from Bagel.modeling.qwen2 import Qwen2Tokenizer\n",
"    from Bagel.modeling.bagel.qwen2_navit import NaiveCache\n",
"    from Bagel.modeling.autoencoder import load_ae\n",
"    from Bagel.inferencer import InterleaveInferencer\n",
"\n",
"    model_path = save_dir\n",
"\n",
"    # Prepare the LLM config\n",
"    llm_config = Qwen2Config.from_json_file(os.path.join(model_path, \"llm_config.json\"))\n",
"    llm_config.qk_norm = True\n",
"    llm_config.tie_word_embeddings = False\n",
"    llm_config.layer_module = \"Qwen2MoTDecoderLayer\"\n",
"\n",
"    # Prepare the ViT config\n",
"    vit_config = SiglipVisionConfig.from_json_file(os.path.join(model_path, \"vit_config.json\"))\n",
"    vit_config.rope = False\n",
"    vit_config.num_hidden_layers = vit_config.num_hidden_layers - 1\n",
"\n",
"    # Load the VAE\n",
"    vae_model, vae_config = load_ae(local_path=os.path.join(model_path, \"ae.safetensors\"))\n",
"\n",
"    # Prepare the Bagel config\n",
"    config = BagelConfig(\n",
"        visual_gen=True,\n",
"        visual_und=True,\n",
"        llm_config=llm_config,\n",
"        vit_config=vit_config,\n",
"        vae_config=vae_config,\n",
"        vit_max_num_patch_per_side=70,\n",
"        connector_act=\"gelu_pytorch_tanh\",\n",
"        latent_patch_size=2,\n",
"        max_latent_size=64,\n",
"    )\n",
"\n",
"    # Initialize the model with empty weights\n",
"    with init_empty_weights():\n",
"        language_model = Qwen2ForCausalLM(llm_config)\n",
"        vit_model = SiglipVisionModel(vit_config)\n",
"        model = Bagel(language_model, vit_model, config)\n",
"        model.vit_model.vision_model.embeddings.convert_conv2d_to_linear(vit_config, meta=True)\n",
"\n",
"    # Load the tokenizer and add special tokens\n",
"    tokenizer = Qwen2Tokenizer.from_pretrained(model_path)\n",
"    tokenizer, new_token_ids, _ = add_special_tokens(tokenizer)\n",
"\n",
"    # Set up transforms\n",
"    vae_transform = ImageTransform(1024, 512, 16)\n",
"    vit_transform = ImageTransform(980, 224, 14)\n",
"\n",
"    # Set up the device map for model loading\n",
"    device_map = infer_auto_device_map(\n",
"        model,\n",
"        max_memory={i: \"80GiB\" for i in range(torch.cuda.device_count())},\n",
"        no_split_module_classes=[\"Bagel\", \"Qwen2MoTDecoderLayer\"],\n",
"    )\n",
"\n",
"    # Pin tightly coupled modules to the same device. Note that accelerate's\n",
"    # load_checkpoint_and_dispatch has no same_device_modules argument, so the\n",
"    # device map is adjusted directly instead of passing it through\n",
"    same_device_modules = [\n",
"        \"language_model.model.embed_tokens\",\n",
"        \"time_embedder\",\n",
"        \"latent_pos_embed\",\n",
"        \"vae2llm\",\n",
"        \"llm2vae\",\n",
"        \"connector\",\n",
"    ]\n",
"    first_device = device_map.get(same_device_modules[0], \"cuda:0\")\n",
"    for module_name in same_device_modules:\n",
"        device_map[module_name] = first_device\n",
"\n",
"    # Load the model weights\n",
"    model = load_checkpoint_and_dispatch(\n",
"        model,\n",
"        os.path.join(model_path, \"pytorch_model.bin\"),\n",
"        device_map=device_map,\n",
"        offload_folder=None,\n",
"        offload_state_dict=False,\n",
"    )\n",
"\n",
"    # Initialize the inferencer\n",
"    bagel_inferencer = InterleaveInferencer(\n",
"        model=model,\n",
"        vae_model=vae_model,\n",
"        tokenizer=tokenizer,\n",
"        vae_transform=vae_transform,\n",
"        vit_transform=vit_transform,\n",
"        new_token_ids=new_token_ids,\n",
"    )\n",
"\n",
"    print(\"BAGEL model initialized successfully!\")\n",
"except Exception as e:\n",
"    print(f\"Error initializing BAGEL model: {e}\")\n",
"    print(\"Will use fallback methods for scene understanding and ad placement.\")\n",
"    bagel_inferencer = None"
]
},
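{
"cell_type": "markdown",
"metadata": {},
"source": [
"If BAGEL initialized, a quick understanding-mode smoke test confirms the inferencer is wired up end to end. The call below (`image=..., text=..., understanding_output=True`) follows the example in the Bagel repository's inference notebook; treat it as a sketch and adjust the keyword arguments if your checkout's `InterleaveInferencer` differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional BAGEL smoke test (skipped if initialization failed above).\n",
"# Keyword arguments follow the Bagel repo's example notebook; adjust if your\n",
"# checkout exposes a different InterleaveInferencer signature.\n",
"if bagel_inferencer is not None:\n",
"    try:\n",
"        test_image = Image.fromarray(np.full((512, 512, 3), 128, dtype=np.uint8))\n",
"        result = bagel_inferencer(\n",
"            image=test_image,\n",
"            text=\"Briefly describe this image.\",\n",
"            understanding_output=True,\n",
"        )\n",
"        print(result.get(\"text\", result) if isinstance(result, dict) else result)\n",
"    except Exception as e:\n",
"        print(f\"BAGEL smoke test failed: {e}\")"
]
},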
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import CrowdFace Components\n",
"\n",
"Now let's import the CrowdFace components from the repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import CrowdFace components\n",
"from CrowdFace.src.python.bagel.scene_understanding import BAGELSceneUnderstanding\n",
"from CrowdFace.src.python.bagel.ad_placement import BAGELAdPlacement\n",
"from CrowdFace.src.python.bagel.ad_optimization import BAGELAdOptimization\n",
"from CrowdFace.src.python.crowdface_pipeline import CrowdFacePipeline\n",
"\n",
"# Initialize the CrowdFace pipeline\n",
"pipeline = CrowdFacePipeline(\n",
"    sam_model=sam_model,\n",
"    sam_processor=sam_processor,\n",
"    rvm_model=rvm_model if \"rvm_model\" in locals() else None,\n",
"    bagel_inferencer=bagel_inferencer if \"bagel_inferencer\" in locals() else None\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload and Process a Video\n",
"\n",
"Now let's upload a video and process it with our CrowdFace pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add a file upload widget\n",
"try:\n",
"    from google.colab import files\n",
"    uploaded = files.upload()\n",
"    video_path = next(iter(uploaded.keys()))\n",
"    print(f\"Uploaded video: {video_path}\")\n",
"except (ImportError, StopIteration):\n",
"    # If not in Colab (or no file was uploaded), download a sample video\n",
"    !wget -O sample_video.mp4 \"https://pixabay.com/videos/download/video-41758_source.mp4?attachment\"\n",
"    video_path = \"sample_video.mp4\"\n",
"    print(f\"Using sample video: {video_path}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create or download a sample ad image\n",
"try:\n",
"    # Try to upload an ad image\n",
"    from google.colab import files\n",
"    print(\"Upload an ad image:\")\n",
"    uploaded = files.upload()\n",
"    ad_path = next(iter(uploaded.keys()))\n",
"    print(f\"Uploaded ad image: {ad_path}\")\n",
"except (ImportError, StopIteration):\n",
"    # If not in Colab or no file was uploaded, create a sample ad (white RGBA canvas)\n",
"    ad_img = np.ones((300, 500, 4), dtype=np.uint8) * 255\n",
"    # Add some text\n",
"    cv2.putText(ad_img, \"SAMPLE AD\", (50, 150), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255, 255), 5)\n",
"    cv2.imwrite(\"sample_ad.png\", ad_img)\n",
"    ad_path = \"sample_ad.png\"\n",
"    print(\"Created a sample ad image\")"
]
},
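{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full pipeline, it is worth eyeballing the inputs. The cell below previews the first video frame and the ad image side by side with matplotlib; it is a quick visual check, not part of the pipeline itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview the inputs: first video frame and the ad image (visual check only)\n",
"import matplotlib.pyplot as plt\n",
"\n",
"cap = cv2.VideoCapture(video_path)\n",
"ok, frame = cap.read()\n",
"cap.release()\n",
"\n",
"ad_preview = cv2.imread(ad_path, cv2.IMREAD_UNCHANGED)\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n",
"if ok:\n",
"    axes[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))\n",
"axes[0].set_title(\"First video frame\")\n",
"if ad_preview is not None:\n",
"    axes[1].imshow(cv2.cvtColor(ad_preview[:, :, :3], cv2.COLOR_BGR2RGB))\n",
"axes[1].set_title(\"Ad image\")\n",
"for ax in axes:\n",
"    ax.axis(\"off\")\n",
"plt.show()"
]
},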
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Process the video\n",
"output_path = \"output_video.mp4\"\n",
"pipeline.process_video(\n",
"    video_path=video_path,\n",
"    ad_image=ad_path,\n",
"    output_path=output_path,\n",
"    max_frames=100  # Limit to 100 frames for faster processing\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Display the Results\n",
"\n",
"Let's display the output video to see the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Video\n",
"\n",
"# Display the output video (embed=True so it also plays in Colab,\n",
"# which does not serve local files by relative path)\n",
"Video(output_path, embed=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, add a download option\n",
"try:\n",
"    from google.colab import files\n",
"    files.download(output_path)\n",
"except ImportError:\n",
"    print(f\"Output video saved to {output_path}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we've demonstrated the CrowdFace system, which combines:\n",
"\n",
"1. **SAM2** for precise crowd segmentation\n",
"2. **RVM** for high-quality video matting\n",
"3. **BAGEL** for scene understanding and intelligent ad placement\n",
"\n",
"Together, these models do more than overlay ads on video: the system selects placements with an understanding of scene context and optimizes the ad content for each placement."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}