@guanarp
Last active January 26, 2024 15:11
Google Summer of Code 2023 with OpenCV

PR Thread here

What has been done

  • Base classes for multiple object tracking with their wrappers (bindings): Track, Detection and MultiTracker. Found here
  • ByteTracker interface for public access. Found here
  • ByteTracker implementation with the Strack class under detail. Here and here
  • Wrapped the classes to make them available in other languages such as Java and Python
  • Introduced a linear assignment problem solver using the Jonker-Volgenant method, with its bindings, into the OpenCV codebase. Here
  • Created demos for ByteTracker in both C++ and Python. Here and Here
  • Worked on tests for ByteTracker and LAPJV. Here and Here

Note: the ByteTracker tests run, but the results are not as expected because the tracker outputs are not in the same order as the reference data. That's why I need the LAPJV function to work for this test. I can call LAPJV, but first I need to call a cost-matrix calculator, which right now is not public.

30/08/23 Update: LAPJV and the cost-matrix calculator are now available and the test passes. I just need to upload my files to opencv_extra.

Also, even though the library builds perfectly on my local machine, the build bot gets a Java error while trying to build. I replicated the environment on my local machine and it still works as expected, so I need to find out why that is happening before opening a PR.

28/08/23 Update: This happens because my MultiTracker base class has the same name as a legacy class. Renaming it to MultipleTracker works fine on Linux systems, but I still get a DLL error on Windows.

Further steps

  • Solve re-identification problems when objects cross paths; it's probably a matching problem (maybe adjust the match threshold).
  • Get a cost-matrix function callback for the ByteTracker test. ✅
  • Tweak the Kalman filter parameters for better performance.
  • Check whether ByteTracker still has output differences with the original Python repo after matching, and why they happen.
  • Add an enum for the cost matrix to choose between IoU and other metrics. Could be exposed through a parser in the demo.
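The last step could look something like this. This is only a sketch with made-up names (`CostType`, `iou`, `buildCostMatrix`), not the actual OpenCV API:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical names for illustration; not the actual OpenCV API.
struct BBox { float x, y, w, h; };

enum class CostType { IoU };  // more metrics (e.g. GIoU) could be added here

// Intersection-over-union of two axis-aligned boxes.
float iou(const BBox& a, const BBox& b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// cost[i][j] = 1 - IoU(track i, detection j); lower cost means a better match.
std::vector<std::vector<float>> buildCostMatrix(
        const std::vector<BBox>& tracks, const std::vector<BBox>& dets,
        CostType type = CostType::IoU) {
    std::vector<std::vector<float>> cost(tracks.size(),
                                         std::vector<float>(dets.size()));
    for (size_t i = 0; i < tracks.size(); ++i)
        for (size_t j = 0; j < dets.size(); ++j)
            switch (type) {
                case CostType::IoU: cost[i][j] = 1.f - iou(tracks[i], dets[j]);
            }
    return cost;
}
```

A demo parser would then only need to map a string flag such as `--cost=iou` to the enum value.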

Short video

https://youtu.be/g6hWEWd7VJ0

Experience

This spring I took an AI elective at my college. As a way to start working with the topic, I also applied to the OpenCV GSoC proposal "real-time object MOT".

This is my proposal letter. At the time I was not sure which algorithm to implement because, even though I had attended lectures on the topic, I still didn't have any hands-on experience with it (which was a good reason to apply to GSoC).

OpenCV GSOC proposal Jose Rios

Fortunately, I was selected to do a ByteTrack and/or OCSort C++ implementation for the library.

Even though my proposal focused on a lightweight tracker, ByteTracker is a more general solution: depending on the detection model chosen it can also be lightweight, but that is not its main focus.

As a first step, I started digging into how ByteTrack works, its usual backbone, and common state-of-the-art industry practices.

Bonding period

After I was accepted, my mentors approached me to draw up a more specific project proposal. The idea now was to implement the ByteTrack and/or OCSort tracking algorithms in C++.

At first this really scared me because I had never had real C++ experience apart from college courses (OOP and C++ programming). Even though I enjoyed them, I believe there is a huge gap between classwork code and production/contribution code. It was also a big change for me because all my ML practice and examples were in Python; at that point I was quite skilled at figuring out a Python notebook and modifying it, but I was not trained in C++, which, at least at the beginning, would be a hard step to take.

I accepted because I also believed this was a huge opportunity for me: my first step into a real development environment, with specific coding practices, documentation rules, and the related challenges.

When I applied to GSoC I was starting my journey into machine learning in general with my first personal projects, and stepping into computer vision/robotics AI was one of my goals. I had a decent background in neural network and computer vision theory (box matching, detection, metrics, etc.) but, as mentioned before, I had never put it into practice.

My first step was to learn how the current ByteTrack implementations work. From what I read, a common setup is to use YOLOX as the backbone, a Kalman filter to estimate box movements, and the Hungarian method to assign boxes to objects.

After talking to my mentors, I saw that what I previously called "the backbone", i.e. the detector, is not fixed and can be swapped out. Depending on what is available in the OpenCV library, and what I require, the object detection model can simply be changed, since it is just a callback that provides the boxes for each frame.
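A sketch of that idea, with hypothetical names (the real code uses OpenCV's Detection class and `cv::Mat` frames): the tracker only consumes boxes and never cares which model produced them.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; the real implementation uses OpenCV's
// Detection class and DNN models (e.g. YOLOX, YOLOv5) behind the callback.
struct DetBox { float x, y, w, h, score; };
using Frame = int;  // stand-in for cv::Mat

// Any detector that maps a frame to a list of boxes can be plugged in.
using DetectorFn = std::function<std::vector<DetBox>(const Frame&)>;

// The tracker consumes boxes; swapping the detector needs no tracker changes.
std::size_t trackFrame(const DetectorFn& detect, const Frame& frame) {
    std::vector<DetBox> dets = detect(frame);
    // ... association with existing tracks would happen here ...
    return dets.size();
}
```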

Another option presented to me was to consider using the Earth Mover's Distance algorithm for box matching, perhaps tweaked to make it faster.

Until this point these were all theoretical concepts, but I also had to figure out how I should implement them in C++.

Week 1-4

After I was accepted, I started looking for information on ByteTracker, but the project officially started (in the bonding period) with my first project discussion meeting on May 5. We agreed that I would start coding on May 29.

The meeting was a good heads-up on what the project plan should be:

1. Read about metrics and datasets.

2. Read the original ByteTrack paper.

3. Create a private development repo (on GitHub or GitLab) and share access.

4. Run the Python implementation of ByteTrack with the MOT dataset and write down the metrics.

5. Implement a C++ version with the OpenCV Kalman Filter.

6. Compare the C++ and Python metrics.

7. Build OpenCV with the new tracker.

8. Write tests with gtest.

9. Check the Python bindings.

10. Write documentation and examples.

Even though I had already started steps 1 and 2 before this, I was still a little confused about the inner workings of MOT in general. Also, implementing it in C++ felt like a very big adventure.

These weeks were the heaviest part for me because of the theory required. I had previously used object trackers and detectors for small coursework demos, but I had never dug into their practical inner workings (I knew how a DNN worked theoretically, but practice was different). This caused a delay: I was lost about how to start, which made it difficult to take the first step.

The other two big parts of the implementation were the Kalman filter and the linear assignment problem (Hungarian algorithm).
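To make the second part concrete: a linear assignment solver takes a cost matrix and picks one column (detection) per row (track) so that the total cost is minimal. The brute-force sketch below is a hypothetical helper for intuition only; it shows what the solver computes, while LAPJV solves the same problem in O(n^3) instead of O(n!).

```cpp
#include <algorithm>
#include <limits>
#include <numeric>
#include <vector>

// Brute-force solver for small square cost matrices: tries every permutation
// of columns and keeps the cheapest total. For intuition only; the LAPJV
// algorithm solves the identical problem far more efficiently.
std::vector<int> solveAssignment(const std::vector<std::vector<float>>& cost) {
    const int n = static_cast<int>(cost.size());
    std::vector<int> perm(n), best(n);
    std::iota(perm.begin(), perm.end(), 0);  // start from 0, 1, ..., n-1
    float bestTotal = std::numeric_limits<float>::max();
    do {
        float total = 0.f;
        for (int i = 0; i < n; ++i) total += cost[i][perm[i]];
        if (total < bestTotal) { bestTotal = total; best = perm; }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;  // best[i] = column assigned to row i
}
```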

Something else that delayed me were my initial tests with the original ByteTracker version. I had far more problems than expected setting up the environment, running it correctly without memory problems, and getting the detection and tracking outputs into a text file. I probably should have moved on and started working on my C++ implementation rather than focusing on getting the Python examples to run.

On the other hand, it was a nice exercise where I could get examples like this one, from a YouTube video of my local downtown.

The main idea of the algorithm is to do the following for every frame:

  • Get the frame's detections and separate them into high-score and low-score sets
  • Do a first matching between high-score detections and current tracks, minimizing cost with LAPJV
  • Do a second matching between low-score detections and the previously unmatched tracks
  • Create new tracks for unmatched high-score detections
  • Keep unmatched tracks for a specific number of frames before discarding them
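The steps above can be compressed into a sketch. All types here are hypothetical minimal stand-ins, and greedy IoU matching stands in for the LAPJV solver:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal types; greedy IoU matching stands in for LAPJV.
struct Rect { float x, y, w, h; };
struct Det { Rect box; float score; };

float iouRect(const Rect& a, const Rect& b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// One ByteTrack-style frame update over `tracks` (in place); returns how many
// detections were matched to an existing track.
int updateFrame(std::vector<Rect>& tracks, const std::vector<Det>& dets,
                float highThresh = 0.5f, float iouThresh = 0.3f) {
    // 1. Split detections into high- and low-score sets.
    std::vector<Det> high, low;
    for (const Det& d : dets) (d.score >= highThresh ? high : low).push_back(d);

    std::vector<bool> trackMatched(tracks.size(), false);
    int matched = 0;
    // 2./3. Two matching rounds: high-score first, then low-score against
    //       the tracks that are still unmatched.
    for (const std::vector<Det>* group : {&high, &low}) {
        for (const Det& d : *group) {
            int bestIdx = -1;
            float bestIoU = iouThresh;
            for (size_t t = 0; t < tracks.size(); ++t) {
                float s = iouRect(tracks[t], d.box);
                if (!trackMatched[t] && s > bestIoU) { bestIoU = s; bestIdx = (int)t; }
            }
            if (bestIdx >= 0) {
                trackMatched[bestIdx] = true;
                tracks[bestIdx] = d.box;  // adopt the matched detection's box
                ++matched;
            } else if (group == &high) {
                tracks.push_back(d.box);  // 4. new track for unmatched high det
            }
        }
    }
    // 5. (omitted) unmatched tracks would be kept "lost" for a few frames.
    return matched;
}
```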

Until week 3 I did not have any tangible coding progress. The implementation idea was harder to digest than expected; it was a big challenge.

Furthermore, my limited experience programming in C++ outside of an IDE was an obstacle at first: I had no idea how to compile a program, build a library, use CMake, or how linkers work.

My idea and implementation of the problem consisted of using two classes:

  • ByteTracker: the main tracker with its methods
    • Uses a ByteTrackerImpl class; ByteTracker is just an interface for the public API under the video/include/tracking.hpp header. The source lives behind an ABI boundary in src/tracking
    • Updates every frame
    • Calculates the cost matrix over all detections
    • Uses LAPJV for matching
  • Strack
    • Implemented under src/tracking/detail
    • Creates tracks
    • Reactivates lost tracks
    • Uses a Kalman filter to predict new positions and dimensions for existing bounding boxes (tracks)
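The interface/implementation split mentioned above follows OpenCV's usual pattern. A stripped-down sketch with made-up names (not the real headers) looks like this: the public header exposes only the abstract class and a factory, while the concrete class stays in the source tree.

```cpp
#include <memory>

// Illustrative names only, not the actual OpenCV classes. The public header
// would declare TrackerIface; detail::TrackerImpl stays in the source tree,
// so the ABI only depends on the interface.
class TrackerIface {
public:
    virtual ~TrackerIface() = default;
    virtual void update() = 0;           // called once per frame
    virtual int frameCount() const = 0;
    static std::shared_ptr<TrackerIface> create();
};

namespace detail {
class TrackerImpl : public TrackerIface {
public:
    void update() override { ++frames_; }  // real impl: predict + associate
    int frameCount() const override { return frames_; }
private:
    int frames_ = 0;
};
}  // namespace detail

std::shared_ptr<TrackerIface> TrackerIface::create() {
    return std::make_shared<detail::TrackerImpl>();
}
```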

At the end of this period I had an initial C++ implementation and was starting a first demo to check that everything was working.

Week 5-8

My demo was somewhat working, but it had bugs related to some detections missing in specific frames. Using the MOT train data, this was one of my results.

This example was made with YOLOv5 as the detector, specifically the s version. Note that when using a more advanced detector such as YOLOX-X, the detections (and also the trackings) change a lot, as shown in the example image.

In the previous weeks, and during the bonding period, I read a notable amount of information and documentation about the OpenCV codebase, macros, and contribution guidelines.

My challenge now was to refactor my code to get it into OpenCV video module.

The main problems were the following: the tracking API didn't have any multiple object tracker implementation, and the video module didn't have a LAP solver to use.

In this period my focus was on separating my ByteTracker header, Strack header, and their initial implementations from what would go into the tracking.hpp API.

The powerful thing about using this interface is that it enables wrapping implementations with macros that automatically create bindings for other languages such as Java and Python.

By the end of this period my midterm evaluation was done.

My OpenCV branch was building and working as expected, but I needed to restructure my code.

Week 9-12

I was suggested to use the following structure:

  1. Create a new folder and present an implementation of LAPJV, because there is no LAP solver in OpenCV.
  2. Create three public base classes: Track, Detection and MultiTracker. With these changes I'm contributing a solid base for future MOT implementations; also, a LAP solver is a useful tool across CV, AI and data work in general, so I believe contributing it to the codebase will have a great impact.

After implementing the ByteTracker interface (which inherits from MultiTracker), I worked on having two options for the update method. One uses InputArray and OutputArray to make bindings easier:

CV_WRAP bool update(InputArray inputDetections, CV_OUT OutputArray& outputTracks) CV_OVERRIDE = 0;

The other (without Python bindings) uses a vector of Detections and a vector of Tracks:

CV_EXPORTS virtual void update(const std::vector<Detection>& detections, CV_OUT std::vector<Track>& tracks) CV_OVERRIDE = 0;

During this period I also modified the update algorithm to be more efficient with its data structures.

At this point my main focus was on making the interface work as expected with the wrappers, tweaking some Kalman parameters, and writing tests for ByteTracker and LAPJV.

While testing I hit a major problem: I am struggling to find a way to make LAPJV public and visible to the linker. It works well internally (called by ByteTracker), but when I try to use it externally I get linker errors.

This is an obstacle because of it I cannot build and run the test for the LAPJV function. Also, my ByteTracker test outputs are not in the same order as the reference data; the ideal solution is to do output matching with LAPJV, but for that I also need to solve the linker problems.

All in all, the results I am getting are quite good but not perfect; the tracker is not exactly the same as the Python version. I believe the main reason is that the Kalman filter structure differs between the two implementations, and I still need to work on my prediction parameters if I'm going to use OpenCV's filter (which is the most reasonable thing to do here).
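For intuition about what those prediction parameters do, here is a toy 1D constant-velocity Kalman filter. It is only a stand-in for the multi-dimensional box filter the tracker actually uses; all names and noise values are illustrative, but the roles of q (process noise) and r (measurement noise) are the same: they trade responsiveness against smoothness.

```cpp
#include <cmath>

// Toy 1D constant-velocity Kalman filter with state (position x, velocity v).
struct Kalman1D {
    double x = 0, v = 0;                        // state estimate
    double p00 = 1, p01 = 0, p10 = 0, p11 = 1;  // state covariance P
    double q = 1e-3, r = 1e-1;                  // noise parameters to tune

    // Time update: P = F P F^T + Q with F = [[1, dt], [0, 1]].
    void predict(double dt) {
        x += v * dt;
        double n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q;
        double n01 = p01 + dt * p11;
        double n10 = p10 + dt * p11;
        p00 = n00; p01 = n01; p10 = n10; p11 += q;
    }

    // Measurement update: z observes the position only (H = [1, 0]).
    void update(double z) {
        double s = p00 + r;                 // innovation covariance
        double k0 = p00 / s, k1 = p10 / s;  // Kalman gain
        double y = z - x;                   // innovation
        x += k0 * y;
        v += k1 * y;
        double n00 = (1 - k0) * p00, n01 = (1 - k0) * p01;
        double n10 = p10 - k1 * p00, n11 = p11 - k1 * p01;
        p00 = n00; p01 = n01; p10 = n10; p11 = n11;
    }
};
```

Raising q makes the filter trust new measurements more (jumpier boxes); raising r makes it trust its own motion model more (smoother but laggier boxes).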

It feels great to get started in the big world of open-source contribution; these weeks were a challenging but enjoyable starting point for me. At the same time, it is quite intimidating to think about providing resources (code) that millions of people could actually see and maybe use.

Demonstration

  1. Build OpenCV
  2. Download the required files
  3. Set the file paths:

string home = getenv("HOME");
string DETECTIONS_OUTPUT_PATH = home + "/files/det.txt";
string TRACKINGS_OUTPUT_PATH = home + "/files/tracked.txt";
string VIDEO_OUTPUT_PATH = home + "/files/output.mp4";
string COCO_NAMES = home + "/files/coco.names";
string NET_PATH = home + "/files/yolov8x.onnx";

  4. Build opencv/samples/cpp/bytetracker_demo.cpp if it wasn't built yet:

g++ -ggdb bytetracker_demo.cpp -o bytetracker_demo -L <install path>/lib/ \
-I <install path>/include/opencv4/ -lopencv_core -lopencv_videoio -lopencv_imgproc \
-lopencv_dnn -lopencv_video -lopencv_highgui && export LD_LIBRARY_PATH=<install path>/lib/:$LD_LIBRARY_PATH

  5. Run opencv/samples/cpp/bytetracker_demo
  6. Check the detection output txt files and the video. They will be under <home>/files

If desired, you can also build the Python version and run it in the same manner.

It is under opencv/samples/python/bytetracker_demo.py

python3 bytetracker_demo.py

Tests

These are the required files for the tests:
