roblesch/vsb-fault-detection.md

VSB Power Line Fault Detection

This gist contains notes and links pertaining to the VSB Power Line Fault Detection Kaggle competition.
Readings

[0] Analysis of Time Series Data - T. Vantuch (Thesis)
[1] A Complex Classification Approach of Partial Discharges - T. Vantuch et al.
Reference Kernels

Optimizing Probabilities for best MCC
DWT Signal Denoising
Fast Fourier Transform & Denoising
Preprocessing Techniques Utilized in [1]

2.2 Signal Characterization


Raw signals contain a large amount of noise. Everything in the impulse component of a raw signal is treated as background noise, except for the partial discharge (PD) pattern. Noise sources include radio emissions, power electronics, random pulse interference (RPI) from lightning, switching and corona, and ambient and amplifier noise.
Radio emissions can be recognized using an FFT based on their modulation. RPI often takes the form of a corona discharge and creates false hit peaks that may be mistaken for a PD-pattern. False peaks can be identified by their position, shape, amplitude and periodicity.
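
As a rough illustration of the spectral view, the sketch below finds the strongest non-DC frequency bin of a signal. It uses a naive O(n^2) DFT from the standard library in place of an FFT (the result is the same, only slower), and the synthetic tone, the function names and the "strongest bin" criterion are assumptions made for this example, not the method used in [1].

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return |X[k]| for k = 0 .. n//2 using a naive O(n^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags

def dominant_bin(signal):
    """Index of the strongest frequency bin, ignoring the DC bin 0."""
    mags = dft_magnitudes(signal)
    return max(range(1, len(mags)), key=lambda k: mags[k])

# Synthetic stand-in for a narrowband emission: a pure tone at bin 5.
n = 64
tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
strongest = dominant_bin(tone)
```

On real measurements an FFT routine from a numerical library would be the practical choice; the point here is only that a narrowband source shows up as an isolated spectral peak.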
2.3 Subset Selection

Rareness of PD leads to significant class imbalance. Class imbalance was addressed by applying under-sampling for subset construction, with an equal distribution of class labels for reasonable representation of background noise.
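
A minimal sketch of random under-sampling, assuming binary labels held in a plain list. The function name, the fixed seed and the choice to shrink every class to the size of the rarest one are assumptions for illustration; the exact subset construction in [1] may differ.

```python
import random

def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Every class is randomly reduced to the size of the rarest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(indices) for indices in by_class.values())
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, n_min))
    return sorted(chosen)

# Hypothetical imbalanced data: 95 background signals, 5 PD signals.
labels = [0] * 95 + [1] * 5
subset = undersample(labels)
```

The returned indices can then be used to select the matching rows of the feature matrix.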
2.4 Feature Extraction


Kraskov mutual information estimation was used to rank feature relevance. The most relevant features were the number of peaks and the features derived from the detected peaks.
3.1 Univariate Wavelet De-Noising and Peaks Extraction

Univariate wavelet de-noising is used to suppress the majority of the noise. Wavelet decomposition was performed at level 1 using the Daubechies 4 (db4) wavelet. Small peaks are suppressed using thresholding, and the preserved peaks are examined. Each peak is described by its starting index, amplitude and width.
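
The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation from [1]: the periodic boundary handling, the soft universal threshold estimated from the median absolute detail coefficient, and the helper names `dwt1`, `idwt1`, `denoise` and `extract_peaks` are all choices made for this example.

```python
import math

# db4 scaling (low-pass) filter: 8 taps that sum to sqrt(2).
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# High-pass filter derived from the low-pass one (quadrature mirror relation).
DB4_HI = [(-1) ** n * DB4_LO[-1 - n] for n in range(len(DB4_LO))]

def dwt1(x):
    """Level-1 DWT with periodic extension; len(x) must be even."""
    n = len(x)
    approx = [sum(h * x[(2 * k + i) % n] for i, h in enumerate(DB4_LO))
              for k in range(n // 2)]
    detail = [sum(g * x[(2 * k + i) % n] for i, g in enumerate(DB4_HI))
              for k in range(n // 2)]
    return approx, detail

def idwt1(approx, detail):
    """Inverse of dwt1 (transpose of the orthogonal analysis step)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for i in range(len(DB4_LO)):
            x[(2 * k + i) % n] += approx[k] * DB4_LO[i] + detail[k] * DB4_HI[i]
    return x

def denoise(x, threshold=None):
    """Soft-threshold the detail coefficients and reconstruct."""
    approx, detail = dwt1(x)
    if threshold is None:
        # Assumed rule: universal threshold with a median-based noise estimate.
        mad = sorted(abs(d) for d in detail)[len(detail) // 2]
        threshold = mad / 0.6745 * math.sqrt(2 * math.log(len(x)))
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    return idwt1(approx, shrunk)

def extract_peaks(x, min_amplitude):
    """Return (start_index, amplitude, width) for each run above min_amplitude."""
    peaks, start = [], None
    for i, v in enumerate(list(x) + [0.0]):  # trailing 0.0 closes an open run
        if abs(v) > min_amplitude and start is None:
            start = i
        elif abs(v) <= min_amplitude and start is not None:
            peaks.append((start, max(x[start:i], key=abs), i - start))
            start = None
    return peaks
```

In practice a wavelet library would replace the hand-written transform; the explicit loops are only meant to show what a level-1 decomposition, thresholding and peak description involve.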

3.2 Cancellation of False Hit Peaks

A false hit peak is often followed by another peak of opposite polarity, forming a symmetric pair. Corona peaks were identified as "pulse trains" in the time domain of the signal, and their symmetric peak pairs were identified and removed. Oscillations following the symmetric pair are also removed to prevent misdetection as a PD-pattern.
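
One way to picture the cancellation step is the sketch below, which works on a list of `(index, amplitude)` peaks. The parameters `max_gap`, `amp_tolerance` and `tail`, and the pairing rule itself, are hypothetical values chosen for this example; the notes do not give the criteria used in [1].

```python
def cancel_symmetric_pairs(peaks, max_gap=10, amp_tolerance=0.25, tail=50):
    """Drop symmetric peak pairs and the oscillations that follow them.

    peaks: list of (index, amplitude) tuples sorted by index.
    A pair is two consecutive peaks of opposite polarity, at most max_gap
    samples apart, whose magnitudes differ by at most amp_tolerance
    (relative to the larger one). Peaks within tail samples after a pair
    are treated as trailing oscillation and dropped as well.
    """
    kept = []
    skip_until = -1
    i = 0
    while i < len(peaks):
        idx, amp = peaks[i]
        if idx <= skip_until:
            i += 1
            continue
        if i + 1 < len(peaks):
            next_idx, next_amp = peaks[i + 1]
            opposite = amp * next_amp < 0
            close = next_idx - idx <= max_gap
            larger = max(abs(amp), abs(next_amp))
            similar = abs(abs(amp) - abs(next_amp)) <= amp_tolerance * larger
            if opposite and close and similar:
                skip_until = next_idx + tail
                i += 2
                continue
        kept.append((idx, amp))
        i += 1
    return kept

# Hypothetical peaks: a symmetric pair at 400/404 followed by a small ripple.
peaks = [(100, 5.0), (400, 20.0), (404, -19.0), (420, 3.0), (900, -6.0)]
remaining = cancel_symmetric_pairs(peaks)
```

Here the pair at 400 and 404 and the ripple at 420 are removed, while the isolated peaks at 100 and 900 survive as PD candidates.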
3.3 Selection of Relevant Areas in Raw Signals

Studies indicate that peaks mostly cluster in specific subparts of the sinusoidal signal. The input signal was divided equally into four parts. The first part was ignored. The second and last parts were considered relevant, and the third part was used as a reference, since a large difference in peak-based features between the reference and the relevant parts can hypothetically imply the occurrence of a PD-pattern. Conversely, a consistent peak distribution across all parts implies that the peaks were introduced by noise.
The previously described features were computed on the three selected parts of the input signal, forming a classification matrix of 28 columns.
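
The partitioning can be sketched as below, using only the peak count as a stand-in for the full feature set. The function names, the feature names and the difference-to-reference features are assumptions for illustration and do not reproduce the 28-column matrix from [1].

```python
def quarter_bounds(n_samples):
    """Half-open (start, stop) index ranges of four equal parts."""
    q = n_samples // 4
    return [(k * q, (k + 1) * q) for k in range(3)] + [(3 * q, n_samples)]

def peak_counts_by_part(peak_indices, n_samples):
    """Count how many peak start indices fall into each of the four parts."""
    counts = [0, 0, 0, 0]
    for idx in peak_indices:
        for part, (start, stop) in enumerate(quarter_bounds(n_samples)):
            if start <= idx < stop:
                counts[part] += 1
    return counts

def part_features(peak_indices, n_samples):
    """Peak counts for parts 2-4; part 1 is ignored, part 3 is the reference."""
    _first, second, third, fourth = peak_counts_by_part(peak_indices, n_samples)
    return {
        "second_count": second,
        "third_count_reference": third,
        "fourth_count": fourth,
        "second_minus_reference": second - third,
        "fourth_minus_reference": fourth - third,
    }

# Hypothetical peak positions in a signal of 800 samples.
features = part_features([10, 250, 260, 270, 450, 700, 710], n_samples=800)
```

A large positive difference against the reference part hints at a PD-pattern, while similar counts in all parts point to noise.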
Wavelets

The wavelet transform is the projection of a discrete signal into two spaces: the set of approximation coefficients and the set of detail coefficients.
CWT vs DWT

The major difference between the CWT and discrete wavelet transforms, such as the dwt and modwt, is how the scale parameter is discretized. The CWT discretizes scale more finely than the discrete wavelet transform. In the CWT, you typically fix some base which is a fractional power of two, for example, 2^(1/v) where v is an integer greater than 1. The v parameter is often referred to as the number of “voices per octave”.


In the discrete wavelet transform, the scale parameter is always discretized to integer powers of 2, 2^j, j = 1, 2, 3, ..., so that the number of voices per octave is always 1. The difference between scales on a log2 scale is always 1 for discrete wavelet transforms. Note that this is a much coarser sampling of the scale parameter, s, than is the case with the CWT. Further, in the decimated (downsampled) discrete wavelet transform (DWT), the translation parameter is always proportional to the scale.
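
To make the difference in scale sampling concrete, the short sketch below lists the scales each transform would visit. The helper names and the choice of 4 voices per octave are assumptions for the example.

```python
def cwt_scales(num_octaves, voices_per_octave):
    """Scales 2^(k/v) for k = 1 .. num_octaves * v (v voices per octave)."""
    total = num_octaves * voices_per_octave
    return [2 ** (k / voices_per_octave) for k in range(1, total + 1)]

def dwt_scales(num_levels):
    """Dyadic scales 2^j for j = 1 .. num_levels (one voice per octave)."""
    return [2 ** j for j in range(1, num_levels + 1)]

fine = cwt_scales(num_octaves=3, voices_per_octave=4)   # 12 scales from 2^(1/4) to 8
coarse = dwt_scales(num_levels=3)                       # [2, 4, 8]
```

Both lists cover the same three octaves, but the CWT samples each octave at four points while the DWT samples it once.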

Choosing a Wavelet

If you want to find closely spaced features, choose wavelets with smaller support, such as haar, db2, or sym2. The support of the wavelet should be small enough to separate the features of interest. Wavelets with larger support tend to have difficulty detecting closely spaced features. Using wavelets with large support can result in coefficients that do not distinguish individual features. For an example, see Effect of Wavelet Support on Noisy Data. If your data has sparsely spaced transients, you can use wavelets with larger support.


Names for many wavelets are derived from the number of vanishing moments. For example, db6 is the Daubechies wavelet with six vanishing moments and sym3 is the symlet with three vanishing moments. For coiflet wavelets, coif3 is the coiflet with six vanishing moments. For Fejér-Korovkin wavelets, fk8 is the Fejér-Korovkin wavelet with a length 8 filter. Biorthogonal wavelet names are derived from the number of vanishing moments the analysis wavelet and synthesis wavelet each have. For instance, bior3.5 is the biorthogonal wavelet with three vanishing moments in the synthesis wavelet and five vanishing moments in the analysis wavelet.

Wavelet Browser
LSTM

Reference Kernels

https://www.kaggle.com/afajohn/cnn-lstm-for-signal-classification-lb-0-513
https://www.kaggle.com/suicaokhoailang/5-fold-lstm-with-threshold-tuning-0-618-lb
Supplemental Reading

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://www.cs.bham.ac.uk/~jxb/INC/l12.pdf