This Python script loads images from a specified directory using a ThreadPoolExecutor for concurrent execution. It reads all .tif files from the "tif" directory and processes them in parallel, using multiple threads to speed up the image loading process.
✅ Important Note: The order of the images is preserved throughout this process. The executor.map method guarantees that the results are returned in the same order as the input paths, so the correspondence between the file paths and the loaded images is maintained.
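The ordering guarantee can be checked with a small sketch that uses only the standard library. The function `slow_identity` and the file names are hypothetical stand-ins for the real loader and paths; the delays are chosen so that the tasks finish in reverse order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_identity(item):
    # Earlier items sleep longer, so they are the last ones to finish.
    delay, label = item
    time.sleep(delay)
    return label


# Hypothetical stand-ins for image paths, paired with artificial delays.
items = [(0.3, "a.tif"), (0.2, "b.tif"), (0.1, "c.tif")]

with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map yields results in input order, not completion order.
    results = list(executor.map(slow_identity, items))

print(results)  # ['a.tif', 'b.tif', 'c.tif']
```

Even though "c.tif" completes first, it is still returned last, which is why the loaded images line up with the sorted path list.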
- 🖥️ os (built-in): For interacting with the operating system to list files in a directory.
- 📷 cv2 (OpenCV, to be installed): For reading image files.
- 📁 pathlib (built-in): For manipulating filesystem paths in a platform-independent way.
- 🌐 concurrent.futures.ThreadPoolExecutor (built-in): For managing a pool of threads to execute calls asynchronously.
- 📊 tqdm (to be installed): For displaying a progress bar during the image loading process.
- 🔢 numpy (to be installed): For handling arrays efficiently (assumed to be imported as np).
Make sure you have the required dependencies installed:
pip install opencv-python-headless tqdm numpy
Place your .tif images in a directory named tif in the script's working directory.
- Define the tif_folder_path variable as shown below.
- Run the script to load images concurrently:
    import os
    import pathlib
    from concurrent.futures import ThreadPoolExecutor

    import cv2
    import numpy as np  # Ensure numpy is imported
    from tqdm import tqdm

    tif_folder_path = 'tif'

    # Sorted absolute paths of the .tif files in the folder
    list_path = sorted([
        pathlib.Path(os.path.join(tif_folder_path, file)).absolute().as_posix()
        for file in os.listdir(tif_folder_path)
        if file.lower().endswith('.tif')
    ])

    def load_image(path):
        # cv2.imread returns None if the file cannot be read
        return cv2.imread(path, cv2.IMREAD_COLOR)

    with ThreadPoolExecutor() as executor:
        images = np.array(
            list(tqdm(executor.map(load_image, list_path), total=len(list_path))))
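- The script outputs a NumPy array (images) containing the loaded images. 📢 Output format: ["frame", "y_im", "x_im", "rgb"]. Note that OpenCV stores the color channels in BGR order, so the last axis holds blue, green, red unless you convert it.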
- executor.map(load_image, list_path): This uses the ThreadPoolExecutor to apply the load_image function to every path in list_path. The executor.map method is similar to the built-in map function but runs the calls concurrently across multiple threads, so several images can be loaded at once.
- tqdm(..., total=len(list_path)): The tqdm function wraps the iterator returned by executor.map and displays a progress bar while the images load. The total parameter gives the number of items to process; tqdm needs it here because the iterator has no length of its own.
- list(...): The list function consumes the wrapped iterator and collects the results into a list. This is necessary because executor.map returns a generator, and we need a concrete list of images.
- np.array(...): Finally, the list of images is converted into a NumPy array. This allows for efficient storage and manipulation of the image data in later processing steps. The result is a regular 4-D array only if every image has the same dimensions and every file was read successfully.
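The final conversion step assumes every file loaded and every image has the same shape. The following is a minimal sketch of a more defensive version. The helper `stack_loaded_images` is hypothetical and not part of the original script, and synthetic arrays stand in for real `cv2.imread` results so the example runs without any image files.

```python
import numpy as np


def stack_loaded_images(paths, loaded):
    """Stack the successfully loaded images and report the paths that failed.

    `loaded` holds one entry per path: an array, or None for a failed read
    (cv2.imread returns None when a file cannot be read).
    """
    good_paths, good_images, failed = [], [], []
    for path, image in zip(paths, loaded):
        if image is None:
            failed.append(path)
        else:
            good_paths.append(path)
            good_images.append(image)
    if not good_images:
        raise ValueError("no image could be loaded")
    # np.stack raises ValueError if the images do not all share one shape.
    return good_paths, np.stack(good_images), failed


# Synthetic stand-ins: two 4x5 three-channel frames and one failed read.
paths = ["a.tif", "b.tif", "c.tif"]
loaded = [np.zeros((4, 5, 3), np.uint8), None, np.full((4, 5, 3), 7, np.uint8)]
loaded[0][..., 0] = 255  # fill channel 0 (blue in OpenCV's BGR order)

kept, images, failed = stack_loaded_images(paths, loaded)
images_rgb = images[..., ::-1]  # reverse the last axis: BGR -> RGB

print(images.shape)  # (2, 4, 5, 3), i.e. (frame, y_im, x_im, channel)
print(failed)        # ['b.tif']
```

Returning the kept paths alongside the stack preserves the path-to-image correspondence even when some files are skipped.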
ImageLoader Class Overview 📸 (Advanced memory management, with an example of context management and garbage collection)
The ImageLoader class streamlines the process of loading images from a directory, aiming to be both memory-efficient and fast. It uses a ThreadPoolExecutor to load the images concurrently, which speeds up loading. This class is especially useful in data processing and machine learning projects where handling large datasets is common. 🚀
- Concurrent Loading: Leverages multithreading to load images simultaneously, drastically reducing load times. 🕒
- Memory Management: Automatically manages memory by clearing loaded images upon exiting the context, preventing memory leaks. 💾
- Flexibility: Offers the ability to specify the color mode for loading images, accommodating various image processing requirements. 🌈
Upon entering its context (with statement), ImageLoader begins the concurrent loading of all images from the specified directory. This is achieved by mapping the load_image method across all image paths using a ThreadPoolExecutor. The loaded images are stored in an array, accessible via the images attribute. Exiting the context triggers the clearing of this array from memory and initiates garbage collection, ensuring efficient memory use.
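The class itself is not listed in this document, so the following is only a hypothetical sketch of how the behavior described above could be implemented. The `reader` parameter is an assumption added so the example can run without OpenCV or real image files, and the images are kept in a plain list rather than an array to avoid assuming that all files share one shape.

```python
import gc
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class ImageLoader:
    """Hypothetical sketch: load a directory on entry, release it on exit."""

    def __init__(self, directory, color_mode=None, reader=None):
        self.directory = directory
        self.color_mode = color_mode  # e.g. cv2.IMREAD_COLOR; None = default
        self._reader = reader         # assumption: injectable reader for tests
        self.images = None

    def load_image(self, path):
        if self._reader is not None:
            return self._reader(path)
        import cv2  # imported lazily so a custom reader needs no OpenCV
        if self.color_mode is None:
            return cv2.imread(path)
        return cv2.imread(path, self.color_mode)

    def __enter__(self):
        paths = sorted(
            os.path.join(self.directory, name)
            for name in os.listdir(self.directory)
        )
        with ThreadPoolExecutor() as executor:
            # executor.map keeps the results in the same order as `paths`.
            self.images = list(executor.map(self.load_image, paths))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.images = None  # drop the loader's reference to the images
        gc.collect()        # ask the collector to reclaim unreachable objects
        return False        # do not suppress exceptions from the with-body


# Usage with a stand-in reader that returns the file name instead of pixels.
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.tif", "a.tif"):
        with open(os.path.join(folder, name), "wb") as handle:
            handle.write(name.encode())
    with ImageLoader(folder, reader=os.path.basename) as loader:
        loaded_inside = list(loader.images)
    cleared_after = loader.images is None

print(loaded_inside)  # ['a.tif', 'b.tif']
print(cleared_after)  # True
```

Note that clearing the attribute frees the memory only if no other variable still references the images; copies kept outside the context stay alive.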
Designed with memory efficiency in mind, ImageLoader ensures that loaded images do not linger in memory longer than necessary, which is crucial for long-running applications where memory leaks can lead to performance issues or crashes.
Despite its efficiency, ImageLoader is bound by the system's available memory. It cannot load more data into memory than the system can handle, so memory-related errors are still possible with extremely large datasets or high-resolution images. 🧱
In such cases, alternative strategies like loading images in batches, downsampling, or using disk-based caching might be necessary to process the dataset effectively without exceeding memory limits.
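As a rough illustration of the batching strategy, the sketch below estimates the memory an uncompressed image stack would need and then loads paths in fixed-size batches. Both helpers (`estimate_stack_bytes`, `iter_batches`) are hypothetical names introduced here, and `str.upper` stands in for a real image loader so the example runs without any files.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_stack_bytes(n_frames, height, width, channels=3, bytes_per_value=1):
    """Bytes needed for an uncompressed stack of equally sized frames."""
    return n_frames * height * width * channels * bytes_per_value


def iter_batches(paths, load_fn, batch_size):
    """Yield lists of loaded items, at most `batch_size` per list."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with ThreadPoolExecutor() as executor:
        for start in range(0, len(paths), batch_size):
            chunk = paths[start:start + batch_size]
            # Only one batch of results is held in memory at a time.
            yield list(executor.map(load_fn, chunk))


# 1,000 frames of 2048 x 2048 pixels, 3 channels, 8 bits per channel.
needed = estimate_stack_bytes(1000, 2048, 2048)
print(needed / 1024**3)  # 11.71875 (GiB)

# Stand-in loader: str.upper replaces a real image-reading function.
paths = [f"img_{i}.tif" for i in range(5)]
batches = list(iter_batches(paths, str.upper, batch_size=2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

In real use each batch would be processed and discarded inside the loop instead of being collected into one list, otherwise the memory saving is lost.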
ImageLoader is an invaluable class for anyone working with large sets of images, providing a balance between speed and memory efficiency. However, users must be mindful of its limitations regarding system memory capacity. By understanding and working within these constraints, ImageLoader can be a powerful tool in your data processing and machine learning arsenal. 💪
Doc written with AI.