The IMUMouse Project

@lutet88 · August 11, 2021

While half-jokingly discussing alternative input methods for cursor movement with my gamer friends a month ago, I proposed an idea: a fully IMU-based mouse that would map the screen to a 2D plane in the real world. At first I did not take this seriously, since I understood the impracticality of double-integrating acceleration for position, especially once rotation is also taken into account. Still, I questioned whether this type of input device would be possible. Thus, I spent dozens of hours over the second half of summer figuring it out, as part of a research project mentored by Oxford Professor Alex Rogers. Here is the rough idea, in the form of a dated 5-minute paint.net sketch:

Test Methodology

In order to develop an algorithm able to calculate the exact position of the IMU device, I needed a ground truth against which different algorithms could be compared and evaluated.

To do so, I used my HUION HS64 Drawing Tablet and its stock tablet pen, on an area of 128mm x 72mm (or exactly 0.05 times my monitor's resolution in pixels). I 3D-printed mounts for each IMU and MCU I used throughout the process. In all cases, OpenTabletDriver v0.5.3.* and a Python data retriever were used to retrieve the absolute position.

(Left: Original Onboard LSM6DS device, Right: BNO055 version used for gathering Machine Learning training data, Underneath: Huion HS64 Drawing Tablet)

Double Integration Algorithms

Naive Double Integration

At first, I tried a simple algorithm: basic double integration on an LSM6DS33, using the Adafruit_AHRS library for Madgwick orientation calculations and manual deltaTime-based calculations to perform the double integration itself. Results were, well, not astonishing.
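The double-integration step itself is tiny; a minimal numpy sketch (assuming orientation has already been used to isolate linear acceleration, which the Madgwick filter handles in the real firmware) looks like this:

```python
import numpy as np

def double_integrate(accel, dt):
    """Naively integrate linear acceleration (m/s^2) twice to get position (m).

    accel: array of shape (n, 2), X/Y linear acceleration per timestep
    dt:    sample period in seconds (e.g. 1/208 for the LSM6DS33 at 208Hz)
    """
    velocity = np.cumsum(accel * dt, axis=0)   # a -> v
    position = np.cumsum(velocity * dt, axis=0)  # v -> p
    return position
```

The catch is visible directly in the math: any constant bias in `accel` grows linearly in velocity and quadratically in position, which is exactly the drift shown below.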

Mean Absolute Error (MAE) is used, as it calculates the geometric distance between expected and predicted points, which is probably the best measure of error in this circumstance. All MAE values are a mean of at least 3 trials. (I frankly can't remember exactly how many I did for each algorithm)
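Concretely, the MAE used here is the mean Euclidean distance between predicted and expected points (a small numpy sketch of my metric as described, not the exact evaluation script):

```python
import numpy as np

def mae(predicted, expected):
    """Mean absolute error: mean Euclidean distance between point pairs.

    predicted, expected: arrays of shape (n, 2), positions in mm.
    """
    return float(np.mean(np.linalg.norm(predicted - expected, axis=1)))
```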

| Algorithm | Filtering | MAE (1s) | MAE (10s) | Specs |
| --- | --- | --- | --- | --- |
| Double Integration | None | 6333.9mm | 794266.2mm | 208Hz, max sensitivity for all sensors |

In this naive approach, we can examine each step of the integration for drift and error. The LSM6DS33's velocity graph drifts significantly:

And thus, its position cannot be measured accurately.

I also tried using some basic filtering to improve results:

| Algorithm | Filtering | MAE (1s) | MAE (10s) | Specs |
| --- | --- | --- | --- | --- |
| Double Integration | DC Filter | 311.0mm | 8179.9mm | 208Hz, max sensitivity for all sensors |
| Double Integration | Kalman Filter | 685.4mm | Not measured | 208Hz, max sensitivity for all sensors |

Although both the DC and Kalman filters significantly reduced the absolute error after 1 second, neither was accurate enough to create a useful human input device.
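For reference, one standard form of a DC-blocking filter (the post doesn't pin down the exact filter I used, so treat this as an illustrative sketch) subtracts the slowly-varying bias from each acceleration channel:

```python
def dc_block(samples, r=0.995):
    """One-pole DC-blocking filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    Removes the near-constant bias that double integration turns into
    quadratic drift; r closer to 1 gives a lower cutoff frequency.
    """
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A constant input decays toward zero at the output, which is exactly the bias-rejection behavior that cut the 1-second MAE from 6333.9mm to 311.0mm above.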

Recurrent Neural Network

Another approach to this task uses Recurrent Neural Networks (RNNs), specifically LSTMs and GRUs.

Recurrent neural networks work by time-step, feeding some of their output back into the next time-step's input, making the output at each time-step dependent on previous time-steps. LSTMs (Long Short-Term Memory networks) and GRUs (Gated Recurrent Units) improve upon this by implementing various gates that control what is carried between time-steps more effectively, forming short-term and long-term memory, hence the name.
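The core recurrence, common to all of these, can be written as:

$$ h_t = f\left(W x_t + U h_{t-1} + b\right) $$

where $x_t$ is the input at time-step $t$ and $h_{t-1}$ is the carried-over state, so the output at step $t$ depends (indirectly) on every previous step. LSTMs and GRUs replace the single function $f$ with gated updates to $h_t$.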

As shown by Machine Learning Improvements to Human Motion Tracking with IMUs (2020), human tracking can be optimized with LSTMs, performing significantly better than double integration in cases where no other metric is available. However, even this paper concludes that the LSTM provides similar performance to the robust double integration algorithm detailed in RIDI: Robust IMU Double Integration (2017).

Following these works, this is the algorithm I landed on for my project:

In this design, raw IMU data is first fed into a sensor fusion algorithm. To avoid implementing this myself (and risking inaccuracies), so that I could focus on the neural network instead, I used a Bosch BNO055 sensor, which performs sensor fusion onboard the IMU so that linear acceleration and precise yaw, pitch, and roll are output and recorded directly. The BNO055 only runs at 100Hz, so the sample rate had to be reduced accordingly.

Then, a single-layer LSTM classifier classifies each data timestep into a single value, with 0 meaning stationary and 1 meaning currently moving. Its 3 inputs consist of Linear Acceleration vectors in m/s^2 on the X, Y, and Z axes.

  • LSTM(24, input_shape=(None, 3), dropout=0.2, recurrent_dropout=0.4, activation=None)

  • Dense(1, activation='sigmoid')

The network uses BinaryCrossentropy as the loss function with the Adam optimizer, and was trained for 100 epochs at a batch size of 30, and validated with a split of 0.3.
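Assembled in Keras, the classifier described above looks like the following (layer parameters are taken from the description; the `Input` layer is equivalent to `input_shape=(None, 3)`, and the commented `fit` call shows the stated training settings):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Classifier: 3 linear-acceleration inputs per timestep,
# one sigmoid output (0 = stationary, 1 = moving).
model = keras.Sequential([
    layers.Input(shape=(None, 3)),  # variable-length sequences of (ax, ay, az)
    layers.LSTM(24, dropout=0.2, recurrent_dropout=0.4, activation=None),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(loss=keras.losses.BinaryCrossentropy(), optimizer='adam')

# Training as described in the text:
# model.fit(x_train, y_train, epochs=100, batch_size=30, validation_split=0.3)
```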

The results from the classifier are then added to the list of values in the input of the double integration network, which takes Linear Acceleration vectors (m/s^2) on X and Y, Yaw in radians offset from data sample 0, and the value between 0 and 1 from the classifier as inputs.

  • GRU(30, input_shape=(None, 4), dropout=0.2, recurrent_dropout=0.4, activation=None, return_sequences=True)

  • GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None)

  • Dense(2, activation=None)

This network uses MeanSquaredError as its loss function also with the Adam optimizer, with an epsilon of 1e-3. GRUs were used instead of LSTMs, as I was unable to find hyperparameters that wouldn't result in a gradient explosion using LSTMs. This network was also trained for 100 epochs, but with a batch size of 150, in order to take advantage of my GPU. (it took around 15+8 hours, with the 15 hour run dead due to a gradient explosion around epoch 95) It was also validated with a split of 0.3.
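The regressor, likewise sketched in Keras from the layer list and loss/optimizer settings above:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Regressor: 4 inputs per timestep (linear accel X/Y, yaw offset,
# classifier output), 2 outputs (predicted X/Y position).
model = keras.Sequential([
    layers.Input(shape=(None, 4)),
    layers.GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None,
               return_sequences=True),
    layers.GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None),
    layers.Dense(2, activation=None),
])
model.compile(loss=keras.losses.MeanSquaredError(),
              optimizer=keras.optimizers.Adam(epsilon=1e-3))

# model.fit(x_train, y_train, epochs=100, batch_size=150, validation_split=0.3)
```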

In addition to these algorithms, I also created a few baseline algorithms. These include a naive classifier for stationary periods that returns

$$ \tanh\left(\frac{1}{0.023}\sqrt{A_x^2+A_y^2+A_z^2}\right) $$

i.e., a scaled hyperbolic tangent of the magnitude of the acceleration vector. This function returns 0 at |A|=0m/s^2, 0.5 at |A|≈0.013m/s^2, and approaches 1 as |A| increases.

For double integration regression, the classifier simply multiplies the intended input by 0 if a random value between 0 and 1 is greater than the classifier's output.
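Both baselines fit in a few lines; here they are in plain Python, directly following the formula and the stochastic gating rule above:

```python
import math
import random

def naive_classifier(ax, ay, az):
    """tanh((1/0.023) * |A|): ~0 when stationary, approaching 1 as
    the magnitude of the linear-acceleration vector grows."""
    return math.tanh(math.sqrt(ax**2 + ay**2 + az**2) / 0.023)

def gate(value, p):
    """Stochastic gating for the regressor baseline: zero the input when
    a uniform random draw exceeds the classifier output p."""
    return 0.0 if random.random() > p else value
```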

Training Data

Training data was collected by manually moving the 3D-printed devices (mentioned above) across a HUION HS64 drawing tablet in pseudo-random directions, attempting to cover all possible movements at least a few times. The BNO055 sensor was calibrated before each test according to Bosch's manual. Data was output from the microcontroller via UART at 230400 baud and collected via pyserial. 3 samples of data were collected, consisting of 179,999 timesteps of linear acceleration on all axes, yaw, pitch, roll, angular velocity on all axes, raw acceleration on all axes, physical position in mm, and calibration status. Only linear acceleration and yaw were used.
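The collection side is a straightforward read-and-parse loop. The exact wire format isn't reproduced in this post, so the field layout below is an assumption (one comma-separated record per line); the pyserial usage is shown commented out since it needs the hardware attached:

```python
def parse_sample(line):
    """Parse one UART line into a list of floats.

    Assumed format (hypothetical): comma-separated fields per line, e.g.
    lin_ax,lin_ay,lin_az,yaw,pitch,roll,...,pos_x_mm,pos_y_mm,calib
    """
    return [float(field) for field in line.strip().split(',')]

# Hypothetical collection loop (port name is an assumption):
# import serial
# with serial.Serial('/dev/ttyUSB0', 230400) as port:
#     while True:
#         sample = parse_sample(port.readline().decode())
```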

In addition to these data points, an additional value was added based on the standard deviation of a point's physical position, marking whether the cursor was moving or not. This is used as the ground truth for training the classifier; the physical position in mm is used as the ground truth for training the regressor.
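A sketch of that labeling step, using the rolling standard deviation of the tablet position (the window size and threshold here are illustrative, since the post doesn't record the exact values I used):

```python
import numpy as np

def motion_labels(positions, window=10, threshold=0.5):
    """Label each timestep 1 (moving) or 0 (stationary) from the rolling
    standard deviation of the ground-truth physical position.

    positions: (n, 2) array of X/Y position in mm.
    window, threshold: illustrative values, not the originals.
    """
    n = len(positions)
    labels = np.zeros(n, dtype=int)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Sum the per-axis std over the window as a simple spread measure.
        spread = positions[lo:hi].std(axis=0).sum()
        labels[i] = int(spread > threshold)
    return labels
```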

Results & Conclusion

All results produced are the mean of at least 3 trials, each lasting around 1 minute. Shorter periods are calculated as the mean of that statistic for each period within each trial. All values are rounded to 3 significant figures, as I somewhat doubt my methodology.

| Regressor | Classifier | MAE (0.01s) | MAE (1s) | MAE (10s) | Hardware |
| --- | --- | --- | --- | --- | --- |
| Double Integration | None | Not measured | 6340mm | 794000mm | LSM6DS33 @ 208Hz |
| Double Integration + DC Filter | Naive Implementation | Not measured | 61.4mm | 704mm | BNO055 @ 100Hz |
| Double Integration + DC Filter | LSTM | Not measured | 82.0mm | 1510mm | BNO055 @ 100Hz |
| 2-GRU Integration | Naive Implementation | 0.143mm | 20.7mm | 280mm | BNO055 @ 100Hz |
| 2-GRU Integration | LSTM | 0.151mm | 24.9mm | 302mm | BNO055 @ 100Hz |

As expected, the 2-GRU implementation of the regressor performs significantly (3.0x) better than even the highest-performing double integration regressor, using the naive function as the classifier.

Keep in mind that although double integration with a DC filter produced decent numbers, these results are likely not very accurate and were heavily compensated for by the neural network.

An interesting thing to note is that although in theory an LSTM should outperform a manually calibrated naive algorithm at classification, my implementation simply doesn't. Perhaps this is due to poor training data, or inaccurate ground-truth labeling of moving and unmoving periods.

Additionally, while the 2-GRU method performed admirably compared to simple double integration, its drift of roughly 2cm/s is still undesirable. In order to develop a mouse-like device, drift must be kept below 1mm/s if possible, and even then the device would drift over time and would not be practical to use. This is a possible continuation of the project in the future, or a potential field of additional research.

Another interesting point is that when comparing my data to the data in Machine Learning Improvements to Human Motion Tracking with IMUs, similar results are achieved:

| Regressor | Classifier | MAE (10s) | MAE (20s) | MAE (adjusted 20s) |
| --- | --- | --- | --- | --- |
| "A - Integrative" | 6-LSTM | Not measured | 1.2m | 1.2m |
| "C - No Initialization" | 6-LSTM | Not measured | 0.6m | 0.6m |
| 2-GRU Integration | Naive Implementation | 0.28m | Not measured | 0.6-0.7m |
| 2-GRU Integration | LSTM | 0.30m | Not measured | 0.65-0.75m |

My 2-GRU model, also without initialization, performs quite similarly to Ribeiro et al.'s non-initialized LSTM regression network, despite being applied to quite a different use case. This suggests that some LSTM/GRU double integration models may offer similar performance on similar-grade IMU devices across different applications.

Another possible extension of this project is to replicate it with a higher-end IMU and continue to optimize the algorithm. As accuracy must improve by at least an order of magnitude to fit the application, further research into both consumer-grade and more advanced industrial IMU sensors is needed to reopen the possibility of an entirely IMU-based absolute-position mouse.

However, due to this accuracy concern, I saw no point in converting the algorithm to TFLite and loading it onto the microcontroller, as the result would likely have been unusable. (This was the original end product of the project.)

Although I was not able to successfully complete the original task of developing an absolute-positioned IMU mouse, this project has been relatively successful in its intent: to evaluate whether such a device is possible. While there can never be a definitive answer, with the hardware I have on hand and the algorithms detailed in this post, I was unable to achieve accuracy even remotely close to that necessary for a decent input device. However, with alternative algorithms, such as an additional error-corrector for Yan et al.'s robust IMU double integration algorithm, or a modified implementation of the hybrid convolutional network outlined in PEEK, it may still be possible to develop such a device and get it running on microcontroller-grade hardware. Still, I am happy that my experiments achieved similar results to Ribeiro et al.'s work, as it acts as somewhat of a scientific contribution.

I might continue this in a year, not sure.

Citations

Drumond, Rafael Rego, et al. "PEEK: An LSTM Recurrent Network for Motion Classification from Sparse Data." Retrieved 11 August 2021, from https://www.scitepress.org/Papers/2018/65852/65852.pdf

Ribeiro, Pedro Manuel Santos, et al. “Machine Learning Improvements to Human Motion Tracking with IMUs.” Sensors, vol. 20, no. 21, MDPI AG, Nov. 2020, p. 6383. Crossref, doi:10.3390/s20216383.

Yan, Hang, et al. “RIDI: Robust IMU Double Integration.” ArXiv:1712.09004 [Cs], Dec. 2017. arXiv.org, http://arxiv.org/abs/1712.09004.
