@TarickWake
Last active June 19, 2024 12:30
Fast import of TIFF images into a single array with axes ["frame", "y_im", "x_im", "rgb"]

📸 TIFF loading FAST !

This Python script loads images from a directory concurrently using a ThreadPoolExecutor. It reads every file in the "tif" directory (the listing is not filtered by extension, so the directory should contain only .tif images) and loads them on multiple threads to speed up the process.

Important Note: The order of the images is preserved throughout this process. The executor.map method guarantees that the results are returned in the same order as the input paths, ensuring that the correspondence between the file paths and the loaded images is maintained.
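
The ordering guarantee can be checked with a small sketch that needs no image files. The `slow_identity` function and its delays are hypothetical stand-ins for `load_image`: the tasks finish out of order, yet `executor.map` should still yield the results in input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_identity(item):
    """Hypothetical stand-in for load_image: earlier items take longer."""
    index, delay = item
    time.sleep(delay)
    return index


# Item 0 sleeps longest, so it finishes last, but it is still yielded first.
items = [(0, 0.03), (1, 0.02), (2, 0.01), (3, 0.0)]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(slow_identity, items))

print(results)  # [0, 1, 2, 3]
```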

🛠️ Dependencies

  • 🖥️ os (built-in): For interacting with the operating system to list files in a directory.
  • 📷 cv2 (OpenCV, to be installed): For reading image files.
  • 📁 pathlib (built-in): For manipulating filesystem paths in a platform-independent way.
  • 🌐 concurrent.futures.ThreadPoolExecutor (built-in): For managing a pool of threads to execute calls asynchronously.
  • 📊 tqdm (to be installed): For displaying a progress bar during the image loading process.
  • 🔢 numpy (to be installed): For handling arrays efficiently (assumed to be imported as np).

📦 Installation

Make sure you have the required dependencies installed:

pip install opencv-python-headless tqdm numpy

📂 Directory Structure

Place your .tif images in a directory named tif in the script's working directory.
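
If the directory may also contain other files, the listing can be restricted to TIFF files. The following is a minimal sketch; `list_tif_paths` is a hypothetical helper that is not part of the script, and accepting both `.tif` and `.tiff` is an assumption.

```python
import pathlib


def list_tif_paths(folder_path):
    """Return sorted absolute POSIX paths of the .tif/.tiff files in folder_path."""
    folder = pathlib.Path(folder_path)
    return sorted(
        entry.absolute().as_posix()
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() in {".tif", ".tiff"}
    )
```

Sorting is lexicographic, so frame numbers should be zero-padded (frame_002.tif, frame_010.tif) for the frames to stay in acquisition order.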

🚀 Usage

  1. Define the tif_folder_path variable as shown below.
  2. Run the script to load images concurrently:
    import os
    import cv2
    import pathlib
    from concurrent.futures import ThreadPoolExecutor
    from tqdm import tqdm
    import numpy as np  # Ensure numpy is imported
    
    tif_folder_path = 'tif'
    
    list_path = sorted([
        pathlib.Path(os.path.join(tif_folder_path, file)).absolute().as_posix()
        for file in os.listdir(tif_folder_path)
    ])
    
    def load_image(path):
        return cv2.imread(path, cv2.IMREAD_COLOR)
    
    with ThreadPoolExecutor() as executor:
        images = np.array(
            list(tqdm(executor.map(load_image, list_path), total=len(list_path))))

📝 Output

  • The script outputs a NumPy array (images) containing the loaded images. 📢 Output format: ["frame", "y_im", "x_im", "rgb"]. Note that cv2.imread returns the color channels in BGR order.
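
When downstream code expects RGB, the channel axis can be reversed, since `cv2.imread` stores color as BGR. A minimal sketch, assuming a 4-D `uint8` stack shaped (frame, y_im, x_im, channel); the synthetic `images` array below only stands in for the loaded data.

```python
import numpy as np

# Synthetic stand-in for the loaded stack: 2 frames of 3 x 4 pixels, 3 channels.
images = np.zeros((2, 3, 4, 3), dtype=np.uint8)
images[..., 0] = 255  # channel 0 is blue in OpenCV's BGR layout

# Reverse the last axis to go from BGR to RGB; copy() makes the result contiguous.
images_rgb = images[..., ::-1].copy()

print(images_rgb.shape)  # (2, 3, 4, 3)
```

After the conversion, the blue value sits in the last channel of `images_rgb`.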

📖 Explanation of the Final Line

  • executor.map(load_image, list_path): This uses the ThreadPoolExecutor to map the load_image function across all paths in list_path. The executor.map method is similar to the built-in map function but executes the function concurrently across multiple threads, allowing for parallel processing of image loading.

  • tqdm(..., total=len(list_path)): The tqdm function wraps around the iterator returned by executor.map, providing a progress bar that shows the progress of loading all images. The total parameter indicates the total number of items to process, enabling tqdm to accurately display the progress.

  • list(...): The list function consumes the tqdm-wrapped iterator and collects the results into a list. This is necessary because executor.map returns a lazy iterator, and we need a concrete list of images.

  • np.array(...): Finally, the list of images is converted into a NumPy array. This yields a single 4-D array only if every image has the same shape and was read successfully. The array allows for efficient storage and manipulation of the image data, which is useful for further processing steps in scientific and engineering applications.
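
Because `cv2.imread` returns `None` for a file it cannot read, and images of different sizes cannot form a regular 4-D array, it can help to validate the list before stacking. This is a sketch under those assumptions; `stack_images` is a hypothetical helper, not part of the script.

```python
import numpy as np


def stack_images(images, paths):
    """Stack equally shaped images, failing loudly on unreadable files."""
    failed = [path for image, path in zip(images, paths) if image is None]
    if failed:
        raise ValueError(f"could not read: {failed}")
    shapes = {image.shape for image in images}
    if len(shapes) > 1:
        raise ValueError(f"images have different shapes: {sorted(shapes)}")
    # np.stack adds the leading "frame" axis.
    return np.stack(images)
```

It would be called as `stack_images(loaded_list, list_path)` in place of `np.array(loaded_list)`.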

ImageLoader Class Overview 📸 (Advanced memory management, with an example of context management and garbage collection)

The ImageLoader class streamlines loading images from a directory. It uses a ThreadPoolExecutor to load the images concurrently and releases its reference to them when its context exits. This class is especially useful in data processing and machine learning projects where handling large datasets is common. 🚀

Key Features 🗝️

  • Concurrent Loading: Leverages multithreading to load images simultaneously, drastically reducing load times. 🕒
  • Memory Management: Clears its reference to the loaded images upon exiting the context, so the memory can be reclaimed once nothing else refers to them. 💾
  • Flexibility: Offers the ability to specify the color mode for loading images, accommodating various image processing requirements. 🌈

How It Works 🛠️

Upon entering its context (with statement), ImageLoader begins the concurrent loading of all images from the specified directory. This is achieved by mapping the load_image method across all image paths using a ThreadPoolExecutor. The loaded images are stored in an array, accessible via the images attribute. Exiting the context sets the images attribute to None and calls gc.collect(); the array is actually released only if no other variable still references it.
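
The effect of outside references can be illustrated without loading any images. In this sketch, `Payload` and `Holder` are hypothetical stand-ins for the image array and for ImageLoader; the behavior shown assumes CPython's reference counting.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for the image array (supports weak references)."""


class Holder:
    """Mimics the enter/exit pattern: create on entry, drop on exit."""

    def __init__(self):
        self.data = None

    def __enter__(self):
        self.data = Payload()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.data = None
        gc.collect()


# Case 1: only the holder references the payload, so exiting frees it.
with Holder() as holder:
    probe_freed = weakref.ref(holder.data)

# Case 2: an outside name still references the payload after exit.
with Holder() as holder:
    kept = holder.data
    probe_kept = weakref.ref(kept)

print(probe_freed() is None)  # the payload was released
print(probe_kept() is kept)   # the payload is still alive through `kept`
```

In practice this means that work on the images should happen inside the with block, or that any copy taken out of it should be deleted once it is no longer needed.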

Use Case and Limitations 📉

Memory Efficiency

Designed with memory efficiency in mind, ImageLoader ensures that loaded images do not linger in memory longer than necessary, which is crucial for long-running applications where memory leaks can lead to performance issues or crashes.

Limitations

Despite its efficiency, ImageLoader is bound by the system's available memory. It cannot load more data into memory than the system can handle, which means it may still be possible to encounter memory-related errors with extremely large datasets or high-resolution images. 🧱

In such cases, alternative strategies like loading images in batches, downsampling, or using disk-based caching might be necessary to process the dataset effectively without exceeding memory limits.
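
Batch loading can be sketched as a generator over the path list. `iter_batches` and the file names below are hypothetical and only illustrate the idea; each batch would be loaded, processed, and discarded before the next one is read.

```python
def iter_batches(paths, batch_size):
    """Yield consecutive slices of paths, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Hypothetical file names, used only to show the batch sizes.
paths = [f"frame_{i:03d}.tif" for i in range(5)]
batches = list(iter_batches(paths, 2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Peak memory then scales with `batch_size` rather than with the total number of frames.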

Conclusion 🎉

ImageLoader is an invaluable class for anyone working with large sets of images, providing a balance between speed and memory efficiency. However, users must be mindful of its limitations regarding system memory capacity. By understanding and working within these constraints, ImageLoader can be a powerful tool in your data processing and machine learning arsenal. 💪

Doc written with AI

import os
import cv2
import pathlib
import numpy as np  # required for np.array below
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Folder containing the .tif images
tif_folder_path = 'tif'

# Sorted absolute paths of every file in the folder
list_path = sorted([
    pathlib.Path(os.path.join(tif_folder_path, file)).absolute().as_posix()
    for file in os.listdir(tif_folder_path)
])


def load_image(path):
    # Returns a BGR array, or None if the file cannot be read
    return cv2.imread(path, cv2.IMREAD_COLOR)


# executor.map preserves input order, so images[i] corresponds to list_path[i]
with ThreadPoolExecutor() as executor:
    images = np.array(
        list(tqdm(executor.map(load_image, list_path), total=len(list_path))))

import os
import pathlib
import cv2
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm


def get_sorted_file_paths(folder_path):
    """
    Returns a sorted list of absolute file paths within the given folder.
    """
    return sorted([
        pathlib.Path(os.path.join(folder_path, file)).absolute().as_posix()
        for file in os.listdir(folder_path)
    ])


def load_image(path):
    """
    Loads an image from the given path using OpenCV.
    """
    return cv2.imread(path, cv2.IMREAD_COLOR)


def load_images_concurrently(folder_path):
    """
    Loads all images from the specified folder path concurrently and returns them as a numpy array.
    """
    list_path = get_sorted_file_paths(folder_path)
    with ThreadPoolExecutor() as executor:
        images = np.array(
            list(
                tqdm(executor.map(load_image, list_path),
                     total=len(list_path))))
    return images

import os
import pathlib
import cv2
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
import gc


class ImageLoader:
    """
    Manages concurrent image loading from a specified directory using a context manager.

    Utilizes ThreadPoolExecutor to enhance the loading speed of images. Designed to load
    all images into memory upon context entry and clear them upon exit, ensuring efficient
    resource management.

    Parameters
    ----------
    folder_path : str
        Path to the directory containing images.
    cv2_flag : int
        Flag for OpenCV's imread function to specify the color mode of the image.
        Default is cv2.IMREAD_COLOR.

    Attributes
    ----------
    images : np.ndarray or None
        Loaded images array; None until images are loaded.

    Methods
    -------
    __enter__()
        Concurrently loads images upon context entry.
    __exit__(exc_type, exc_val, exc_tb)
        Clears loaded images and triggers garbage collection on context exit.
    get_sorted_file_paths()
        Fetches sorted list of image file paths in the directory.
    load_image(path)
        Loads a single image using OpenCV with the specified cv2_flag.
    load_images_concurrently()
        Concurrently loads all images from the directory using the specified cv2_flag.
    to_array()
        Loads and returns all images into a numpy array.

    Examples
    --------
    >>> loader = ImageLoader('/path/to/images', cv2.IMREAD_GRAYSCALE)
    >>> with loader as img_loader:
    ...     images = img_loader.images
    ...     # images now contains all images from '/path/to/images' in grayscale
    """

    def __init__(self, folder_path, cv2_flag=cv2.IMREAD_COLOR):
        self.folder_path = folder_path
        self.cv2_flag = cv2_flag
        self.images = None

    def __enter__(self):
        self.images = self.load_images_concurrently()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Drop the loader's reference, then ask the garbage collector to run.
        self.images = None
        gc.collect()

    def get_sorted_file_paths(self):
        file_paths = [
            os.path.join(self.folder_path, file)
            for file in os.listdir(self.folder_path)
        ]
        return sorted(
            pathlib.Path(path).absolute().as_posix() for path in file_paths)

    def load_image(self, path):
        return cv2.imread(path, self.cv2_flag)

    def load_images_concurrently(self):
        paths = self.get_sorted_file_paths()
        with ThreadPoolExecutor() as executor:
            images = list(
                tqdm(executor.map(self.load_image, paths),
                     total=len(paths)))
        return np.array(images)

    def to_array(self):
        return self.load_images_concurrently()