The IMUMouse Project

@lutet88 · August 11, 2021

While half-jokingly discussing alternative input methods for cursor movement with my gamer friends a month ago, I proposed an idea: a fully IMU-based mouse that would map the screen to a 2D plane in the real world. At first I did not take this seriously, since I understood the impracticality of double-integrating acceleration for position, especially once rotation is also taken into account. Still, I questioned whether this type of input device would be possible. Thus, I spent dozens of hours over the second half of summer figuring it out, as part of a research project mentored by Oxford Professor Alex Rogers. Here is the rough idea, in the form of a dated 5-minute paint.net sketch:

Test Methodology

In order to develop an algorithm able to calculate the exact position of the IMU device, I needed a ground truth against which different algorithms could be compared and evaluated.

To do so, I used my HUION HS64 Drawing Tablet and its stock tablet pen, on an area of 128mm x 72mm (or exactly 0.05 times my monitor's resolution in pixels). I 3D-printed mounts for each IMU and MCU I used throughout the process. In all cases, OpenTabletDriver v0.5.3.* and a Python data retriever were used to retrieve the absolute position.

(Left: Original Onboard LSM6DS device, Right: BNO055 version used for gathering Machine Learning training data, Underneath: Huion HS64 Drawing Tablet)

Double Integration Algorithms

Naive Double Integration

At first, I tried a simple algorithm: basic double integration on an LSM6DS33, using the Adafruit_AHRS library for Madgwick orientation calculations and manual deltaTime-based calculations to perform the double integration itself. Results were, well, not astonishing.
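The double-integration step itself is tiny; a minimal numpy sketch (assuming orientation has already been used to isolate linear acceleration, which the Madgwick filter handles in the real firmware) looks like this:

```python
import numpy as np

def double_integrate(accel, dt):
    """Naively integrate linear acceleration (m/s^2) twice to get position (m).

    accel: array of shape (n, 2), X/Y linear acceleration per timestep
    dt:    sample period in seconds (e.g. 1/208 for the LSM6DS33 at 208Hz)
    """
    velocity = np.cumsum(accel * dt, axis=0)   # a -> v
    position = np.cumsum(velocity * dt, axis=0)  # v -> p
    return position
```

The catch is visible directly in the math: any constant bias in `accel` grows linearly in velocity and quadratically in position, which is exactly the drift shown below.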

Mean Absolute Error (MAE) is used, as it calculates the geometric distance between expected and predicted points, which is probably the best measure of error in this circumstance. All MAE values are a mean of at least 3 trials. (I frankly can't remember exactly how many I did for each algorithm)
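Concretely, the MAE used here is the mean Euclidean distance between predicted and expected points (a small numpy sketch of my metric as described, not the exact evaluation script):

```python
import numpy as np

def mae(predicted, expected):
    """Mean absolute error: mean Euclidean distance between point pairs.

    predicted, expected: arrays of shape (n, 2), positions in mm.
    """
    return float(np.mean(np.linalg.norm(predicted - expected, axis=1)))
```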

| Algorithm | Filtering | MAE (1s) | MAE (10s) | Specs |
| --- | --- | --- | --- | --- |
| Double Integration | None | 6333.9mm | 794266.2mm | 208Hz, max sensitivity for all sensors |

In this naive approach, we can examine each step of the integration for drift and error. The LSM6DS33's velocity graph drifts significantly:

And thus, its position cannot be measured accurately.

I also tried using some basic filtering to improve results:

| Algorithm | Filtering | MAE (1s) | MAE (10s) | Specs |
| --- | --- | --- | --- | --- |
| Double Integration | DC Filter | 311.0mm | 8179.9mm | 208Hz, max sensitivity for all sensors |
| Double Integration | Kalman Filter | 685.4mm | Not measured | 208Hz, max sensitivity for all sensors |

Although both the DC and Kalman filters significantly reduced the absolute error after 1 second, neither was accurate enough to create a useful human input device.
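For reference, one standard form of a DC-blocking filter (the post doesn't pin down the exact filter I used, so treat this as an illustrative sketch) subtracts the slowly-varying bias from each acceleration channel:

```python
def dc_block(samples, r=0.995):
    """One-pole DC-blocking filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    Removes the near-constant bias that double integration turns into
    quadratic drift; r closer to 1 gives a lower cutoff frequency.
    """
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A constant input decays toward zero at the output, which is exactly the bias-rejection behavior that cut the 1-second MAE from 6333.9mm to 311.0mm above.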

Recurrent Neural Network

Another approach to this task uses Recurrent Neural Networks (RNNs), specifically LSTMs and GRUs.

Recurrent neural networks work by time-step, feeding some of their output back into the next time-step's input, making the output at each time-step dependent on previous time-steps. LSTMs (Long Short-Term Memory networks) and GRUs (Gated Recurrent Units) improve upon this by implementing various gates that control what is carried between time-steps more effectively, forming short-term and long-term memory, hence the name.
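The core recurrence, common to all of these, can be written as:

$$ h_t = f\left(W x_t + U h_{t-1} + b\right) $$

where $x_t$ is the input at time-step $t$ and $h_{t-1}$ is the carried-over state, so the output at step $t$ depends (indirectly) on every previous step. LSTMs and GRUs replace the single function $f$ with gated updates to $h_t$.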

As shown by Machine Learning Improvements to Human Motion Tracking with IMUs (2020), human tracking can be optimized with LSTMs, performing significantly better than double integration in cases where no other metric is available. However, even this paper concludes that the LSTM provides similar performance to the robust double integration algorithm detailed in RIDI: Robust IMU Double Integration (2017).

Following these works, this is the algorithm I landed on for my project:

In this design, raw IMU data is first fed into a sensor fusion algorithm. To avoid implementing this myself (and risking inaccuracies), so that I could focus on the neural network instead, I used a Bosch BNO055 sensor, which performs sensor fusion onboard the IMU so that linear acceleration and precise yaw, pitch, and roll are output and recorded directly. The BNO055 only runs at 100Hz, so the sample rate had to be reduced accordingly.

Then, a single-layer LSTM classifier classifies each data timestep into a single value, with 0 meaning stationary and 1 meaning currently moving. Its 3 inputs consist of Linear Acceleration vectors in m/s^2 on the X, Y, and Z axes.

  • LSTM(24, input_shape=(None, 3), dropout=0.2, recurrent_dropout=0.4, activation=None)

  • Dense(1, activation='sigmoid')

The network uses BinaryCrossentropy as the loss function with the Adam optimizer, and was trained for 100 epochs at a batch size of 30, and validated with a split of 0.3.
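Assembled in Keras, the classifier described above looks like the following (layer parameters are taken from the description; the `Input` layer is equivalent to `input_shape=(None, 3)`, and the commented `fit` call shows the stated training settings):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Classifier: 3 linear-acceleration inputs per timestep,
# one sigmoid output (0 = stationary, 1 = moving).
model = keras.Sequential([
    layers.Input(shape=(None, 3)),  # variable-length sequences of (ax, ay, az)
    layers.LSTM(24, dropout=0.2, recurrent_dropout=0.4, activation=None),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(loss=keras.losses.BinaryCrossentropy(), optimizer='adam')

# Training as described in the text:
# model.fit(x_train, y_train, epochs=100, batch_size=30, validation_split=0.3)
```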

The results from the classifier are then added to the list of values in the input of the double integration network, which takes Linear Acceleration vectors (m/s^2) on X and Y, Yaw in radians offset from data sample 0, and the value between 0 and 1 from the classifier as inputs.

  • GRU(30, input_shape=(None, 4), dropout=0.2, recurrent_dropout=0.4, activation=None, return_sequences=True)

  • GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None)

  • Dense(2, activation=None)

This network uses MeanSquaredError as its loss function also with the Adam optimizer, with an epsilon of 1e-3. GRUs were used instead of LSTMs, as I was unable to find hyperparameters that wouldn't result in a gradient explosion using LSTMs. This network was also trained for 100 epochs, but with a batch size of 150, in order to take advantage of my GPU. (it took around 15+8 hours, with the 15 hour run dead due to a gradient explosion around epoch 95) It was also validated with a split of 0.3.
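The regressor, likewise sketched in Keras from the layer list and loss/optimizer settings above:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Regressor: 4 inputs per timestep (linear accel X/Y, yaw offset,
# classifier output), 2 outputs (predicted X/Y position).
model = keras.Sequential([
    layers.Input(shape=(None, 4)),
    layers.GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None,
               return_sequences=True),
    layers.GRU(30, dropout=0.2, recurrent_dropout=0.4, activation=None),
    layers.Dense(2, activation=None),
])
model.compile(loss=keras.losses.MeanSquaredError(),
              optimizer=keras.optimizers.Adam(epsilon=1e-3))

# model.fit(x_train, y_train, epochs=100, batch_size=150, validation_split=0.3)
```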

In addition to these algorithms, I also created a few baseline algorithms. These include a naive classifier for stationary periods that returns

$$ \tanh\left(\frac{1}{0.023}\sqrt{A_x^2+A_y^2+A_z^2}\right) $$

i.e., a scaled hyperbolic tangent of the magnitude of the acceleration vector. This function returns 0 at |A|=0m/s^2, 0.5 at |A|≈0.013m/s^2, and approaches 1 as |A| increases.

For double integration regression, the classifier simply multiplies the intended input by 0 if a random value between 0 and 1 is greater than the classifier's output.
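Both baselines fit in a few lines; here they are in plain Python, directly following the formula and the stochastic gating rule above:

```python
import math
import random

def naive_classifier(ax, ay, az):
    """tanh((1/0.023) * |A|): ~0 when stationary, approaching 1 as
    the magnitude of the linear-acceleration vector grows."""
    return math.tanh(math.sqrt(ax**2 + ay**2 + az**2) / 0.023)

def gate(value, p):
    """Stochastic gating for the regressor baseline: zero the input when
    a uniform random draw exceeds the classifier output p."""
    return 0.0 if random.random() > p else value
```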

Training Data

Training data was collected by manually moving the 3D-printed devices (mentioned above) across a HUION HS64 drawing tablet in pseudo-random directions, attempting to cover all possible movements at least a few times. The BNO055 sensor was calibrated before each test according to Bosch's manual. Data was output from the microcontroller via UART at 230400 baud and collected via pyserial. 3 samples of data were collected, consisting of 179,999 timesteps of linear acceleration on all axes, yaw, pitch, roll, angular velocity on all axes, raw acceleration on all axes, physical position in mm, and calibration status. Only linear acceleration and yaw were used.
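The collection side is a straightforward read-and-parse loop. The exact wire format isn't reproduced in this post, so the field layout below is an assumption (one comma-separated record per line); the pyserial usage is shown commented out since it needs the hardware attached:

```python
def parse_sample(line):
    """Parse one UART line into a list of floats.

    Assumed format (hypothetical): comma-separated fields per line, e.g.
    lin_ax,lin_ay,lin_az,yaw,pitch,roll,...,pos_x_mm,pos_y_mm,calib
    """
    return [float(field) for field in line.strip().split(',')]

# Hypothetical collection loop (port name is an assumption):
# import serial
# with serial.Serial('/dev/ttyUSB0', 230400) as port:
#     while True:
#         sample = parse_sample(port.readline().decode())
```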

In addition to these data points, an additional value was added based on the standard deviation of a point's physical position, marking whether the cursor was moving or not. This is used as the ground truth for training the classifier; the physical position in mm is used as the ground truth for training the regressor.
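A sketch of that labeling step, using the rolling standard deviation of the tablet position (the window size and threshold here are illustrative, since the post doesn't record the exact values I used):

```python
import numpy as np

def motion_labels(positions, window=10, threshold=0.5):
    """Label each timestep 1 (moving) or 0 (stationary) from the rolling
    standard deviation of the ground-truth physical position.

    positions: (n, 2) array of X/Y position in mm.
    window, threshold: illustrative values, not the originals.
    """
    n = len(positions)
    labels = np.zeros(n, dtype=int)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Sum the per-axis std over the window as a simple spread measure.
        spread = positions[lo:hi].std(axis=0).sum()
        labels[i] = int(spread > threshold)
    return labels
```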

Results & Conclusion

All results produced are the mean of at least 3 trials, each lasting around 1 minute. Shorter periods are calculated as the mean of that statistic for each period within each trial. All values are rounded to 3 significant figures, as I somewhat doubt my methodology.

| Regressor | Classifier | MAE (0.01s) | MAE (1s) | MAE (10s) | Hardware |
| --- | --- | --- | --- | --- | --- |
| Double Integration | None | Not measured | 6340mm | 794000mm | LSM6DS33 @ 208Hz |
| Double Integration + DC Filter | Naive Implementation | Not measured | 61.4mm | 704mm | BNO055 @ 100Hz |
| Double Integration + DC Filter | LSTM | Not measured | 82.0mm | 1510mm | BNO055 @ 100Hz |
| 2-GRU Integration | Naive Implementation | 0.143mm | 20.7mm | 280mm | BNO055 @ 100Hz |
| 2-GRU Integration | LSTM | 0.151mm | 24.9mm | 302mm | BNO055 @ 100Hz |

As expected, the 2-GRU implementation of the regressor performs significantly (3.0x) better than even the highest-performing double integration regressor, using the naive function as the classifier.

Keep in mind that although double integration with a DC filter produced decent numbers, these results are likely not very accurate and were heavily compensated for by the neural network.

An interesting thing to note is that although in theory an LSTM should outperform a manually calibrated naive algorithm at classification, my implementation simply doesn't. Perhaps this is due to poor training data, or inaccurate ground-truth labeling of moving and unmoving periods.

Additionally, while the 2-GRU method performed admirably compared to simple double integration, its drift of roughly 2cm/s is still undesirable. In order to develop a mouse-like device, drift must be kept below 1mm/s if possible, and even then the device would drift over time and would not be practical to use. This is a possible continuation of the project in the future, or a potential field of additional research.

Another interesting point is that when comparing my data to the data in Machine Learning Improvements to Human Motion Tracking with IMUs, similar results are achieved:

| Regressor | Classifier | MAE (10s) | MAE (20s) | MAE (adjusted 20s) |
| --- | --- | --- | --- | --- |
| "A - Integrative" | 6-LSTM | Not measured | 1.2m | 1.2m |
| "C - No Initialization" | 6-LSTM | Not measured | 0.6m | 0.6m |
| 2-GRU Integration | Naive Implementation | 0.28m | Not measured | 0.6-0.7m |
| 2-GRU Integration | LSTM | 0.30m | Not measured | 0.65-0.75m |

My 2-GRU model, also without initialization, performs quite similarly to Ribeiro et al.'s non-initialized LSTM regression network, despite being applied to quite a different use case. This suggests that some LSTM/GRU double integration models may offer similar performance on similar-grade IMU devices across different applications.

Another possible extension of this project is to replicate it with a higher-end IMU and continue to optimize the algorithm. As accuracy must improve by at least an order of magnitude to fit the application, further research into both consumer-grade and more advanced industrial IMU sensors is needed to reopen the possibility of an entirely IMU-based absolute-position mouse.

However, due to this accuracy concern, I saw no point in converting the algorithm to TFLite and loading it onto the microcontroller, as the result would likely have been unusable. (This was the original end product of the project.)

Although I was not able to successfully complete the original task of developing an absolute-positioned IMU mouse, this project has been relatively successful in its intent: to evaluate whether such a device is possible. While there can never be a definitive answer, with the hardware I have on hand and the algorithms detailed in this post, I was unable to achieve accuracy even remotely close to that necessary for a decent input device. However, with alternative algorithms, such as an additional error-corrector for Yan et al.'s robust IMU double integration algorithm, or a modified implementation of the hybrid convolutional network outlined in PEEK, it may still be possible to develop such a device and get it running on microcontroller-grade hardware. Still, I am happy that my experiments achieved similar results to Ribeiro et al.'s work, as it acts as somewhat of a scientific contribution.

I might continue this in a year, not sure.

Citations

Drumond, Rafael Rego, et al. "PEEK: An LSTM Recurrent Network for Motion Classification from Sparse Data." Retrieved 11 August 2021, from https://www.scitepress.org/Papers/2018/65852/65852.pdf

Ribeiro, Pedro Manuel Santos, et al. “Machine Learning Improvements to Human Motion Tracking with IMUs.” Sensors, vol. 20, no. 21, MDPI AG, Nov. 2020, p. 6383. Crossref, doi:10.3390/s20216383.

Yan, Hang, et al. “RIDI: Robust IMU Double Integration.” ArXiv:1712.09004 [Cs], Dec. 2017. arXiv.org, http://arxiv.org/abs/1712.09004.
