@yasyf
Created July 29, 2017 08:50
[root@15d3c1a2801b speaker-diarization]# ./spk-diarization2.py meeting.wav
Reading file: meeting.wav
Writing output to: stdout
Using feacat from: /speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /speaker-diarization/lna
Writing exp files in: /speaker-diarization/exp
Writing features in: /speaker-diarization/fea
Performing exp generation and feacat concurrently
tokenpass: ./VAD/tokenpass/test_token_pass
Reading recipe: /tmp/initlIJC0P.recipe
Using model: ./hmms/mfcc_16g_11.10.2007_10
Writing `.lna` files in: /speaker-diarization/lna
Writing `.exp` files in: /speaker-diarization/exp
Processing file 1/1
Input: meeting.wav
Output: /speaker-diarization/lna/meeting.lna
FAN OUT: 0 nodes, 0 arcs
FAN IN: 0 nodes, 0 arcs
Prefix tree: 3 nodes, 6 arcs
WARNING: No tokens in final nodes. The result will be incomplete. Try increasing beam.
Calling voice-detection2.py
Reading recipe from: /tmp/initlIJC0P.recipe
Reading .exp files from: /speaker-diarization/exp
Writing output to: /tmp/vadHuAeKI.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadHuAeKI.recipe
Reading feature files from: /speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkc_0J8dR.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
---------------------------------------------------
Average between windows distance: -370.524562364
Maximum between windows distance: 2039.10263549
Minimum between windows distance: -1222.91049332
Total windows: 346
Total segments: 64
Average between detected segments distance: 327.139641148
Maximum between detected segments distance: 2043.41634976
Minimum between detected segments distance: 11.1822260761
Total detected speaker changes: 41
Calling spk-clustering.py
Reading recipe from: /tmp/spkc_0J8dR.recipe
Reading feature files from: /speaker-diarization/fea
Feature files extension: .fea
Writing output to: stdout
Conversion rate set to frame rate: 125.0
Using hierarchical clustering
Using BIC as distance measure, lambda = 1.3
Threshold distance: 0.0
Maximum speakers: 0
Initial cluster with: 64 speakers
Merging: 38 and 44 distance: -2921.76944564
Merging: 38 and 40 distance: -2951.21353662
Merging: 38 and 43 distance: -2871.71348074
Merging: 38 and 44 distance: -2917.25872908
Merging: 51 and 53 distance: -2871.05437544
Merging: 51 and 54 distance: -2940.83512461
Merging: 28 and 38 distance: -2852.69573475
Merging: 50 and 51 distance: -2850.94326678
Merging: 28 and 39 distance: -2759.01409284
Merging: 49 and 52 distance: -2695.00341959
Merging: 44 and 49 distance: -2756.06638545
Merging: 44 and 49 distance: -2710.34849493
Merging: 28 and 36 distance: -2667.44821857
Merging: 28 and 39 distance: -2660.35830143
Merging: 20 and 28 distance: -2657.11499677
Merging: 20 and 35 distance: -2715.17996197
Merging: 18 and 20 distance: -2710.6547319
Merging: 17 and 18 distance: -2684.77184028
Merging: 17 and 19 distance: -2617.35061331
Merging: 15 and 17 distance: -2620.85138956
Merging: 15 and 28 distance: -2607.31238421
Merging: 18 and 24 distance: -2471.32688989
Merging: 1 and 4 distance: -2433.14561303
Merging: 10 and 14 distance: -2358.08500264
Merging: 16 and 25 distance: -2350.12722152
Merging: 16 and 21 distance: -2387.94145847
Merging: 16 and 19 distance: -2393.68683048
Merging: 16 and 18 distance: -2417.33812362
Merging: 20 and 28 distance: -2339.78094975
Merging: 3 and 22 distance: -2321.18497749
Merging: 28 and 30 distance: -2288.37399579
Merging: 10 and 12 distance: -2284.99770592
Merging: 7 and 10 distance: -2266.63776959
Merging: 18 and 25 distance: -2181.96396457
Merging: 7 and 29 distance: -2173.48090795
Merging: 7 and 9 distance: -2130.32932914
Merging: 4 and 7 distance: -2130.82075976
Merging: 4 and 15 distance: -2110.68070368
Merging: 4 and 14 distance: -2115.52832853
Merging: 4 and 16 distance: -2171.81672547
Merging: 4 and 6 distance: -2114.09630116
Merging: 3 and 8 distance: -1928.85407494
Merging: 3 and 7 distance: -2005.49337413
Merging: 2 and 4 distance: -1928.80405363
Merging: 3 and 15 distance: -1922.85094438
Merging: 17 and 19 distance: -1843.92849553
Merging: 14 and 15 distance: -1815.35085063
Merging: 2 and 9 distance: -1797.8385882
Merging: 2 and 4 distance: -1908.88004705
Merging: 2 and 5 distance: -1845.96847883
Merging: 6 and 9 distance: -1655.75553701
Merging: 2 and 4 distance: -1625.96419614
Merging: 2 and 7 distance: -1404.1768073
Merging: 2 and 7 distance: -1317.07970031
Merging: 3 and 4 distance: -1299.29447592
Merging: 7 and 9 distance: -1179.8780516
Merging: 5 and 8 distance: -1144.80951179
Merging: 1 and 3 distance: -741.754094786
Merging: 4 and 5 distance: -618.754819342
Final speakers: 5
Useful metrics for determining the right threshold:
---------------------------------------------------
Maximum between segments distance: 21370.5775165
Minimum between segments distance: -2951.21353662
Total segments: 64
Total detected speakers: 5
[root@15d3c1a2801b speaker-diarization]#
[root@15d3c1a2801b speaker-diarization]# cat stdout
audio=meeting.wav lna=a_1 start-time=0.384 end-time=5.82 speaker=speaker_1
audio=meeting.wav lna=a_2 start-time=5.82 end-time=31.648 speaker=speaker_2
audio=meeting.wav lna=a_3 start-time=31.648 end-time=58.272 speaker=speaker_1
audio=meeting.wav lna=a_4 start-time=60.032 end-time=66.536 speaker=speaker_1
audio=meeting.wav lna=a_5 start-time=66.536 end-time=68.748 speaker=speaker_2
audio=meeting.wav lna=a_6 start-time=68.748 end-time=70.576 speaker=speaker_2
audio=meeting.wav lna=a_7 start-time=70.576 end-time=78.264 speaker=speaker_2
audio=meeting.wav lna=a_8 start-time=79.84 end-time=80.248 speaker=speaker_2
audio=meeting.wav lna=a_9 start-time=80.248 end-time=82.792 speaker=speaker_2
audio=meeting.wav lna=a_10 start-time=82.792 end-time=83.372 speaker=speaker_2
audio=meeting.wav lna=a_11 start-time=83.372 end-time=88.96 speaker=speaker_2
audio=meeting.wav lna=a_12 start-time=88.96 end-time=93.288 speaker=speaker_1
audio=meeting.wav lna=a_13 start-time=93.288 end-time=93.9 speaker=speaker_2
audio=meeting.wav lna=a_14 start-time=93.9 end-time=96.436 speaker=speaker_1
audio=meeting.wav lna=a_15 start-time=96.436 end-time=98.436 speaker=speaker_2
audio=meeting.wav lna=a_16 start-time=98.436 end-time=102.736 speaker=speaker_2
audio=meeting.wav lna=a_17 start-time=102.736 end-time=103.284 speaker=speaker_2
audio=meeting.wav lna=a_18 start-time=103.284 end-time=103.888 speaker=speaker_2
audio=meeting.wav lna=a_19 start-time=103.888 end-time=110.156 speaker=speaker_1
audio=meeting.wav lna=a_20 start-time=110.156 end-time=114.2 speaker=speaker_2
audio=meeting.wav lna=a_21 start-time=119.936 end-time=124.256 speaker=speaker_2
audio=meeting.wav lna=a_22 start-time=124.256 end-time=126.512 speaker=speaker_3
audio=meeting.wav lna=a_23 start-time=126.512 end-time=140.956 speaker=speaker_2
audio=meeting.wav lna=a_24 start-time=140.956 end-time=143.256 speaker=speaker_3
audio=meeting.wav lna=a_25 start-time=148.76 end-time=152.472 speaker=speaker_3
audio=meeting.wav lna=a_26 start-time=157.208 end-time=166.98 speaker=speaker_2
audio=meeting.wav lna=a_27 start-time=166.98 end-time=171.5 speaker=speaker_3
audio=meeting.wav lna=a_28 start-time=171.5 end-time=173.588 speaker=speaker_2
audio=meeting.wav lna=a_29 start-time=173.588 end-time=190.016 speaker=speaker_3
audio=meeting.wav lna=a_30 start-time=190.016 end-time=193.208 speaker=speaker_2
audio=meeting.wav lna=a_31 start-time=195.176 end-time=195.88 speaker=speaker_4
audio=meeting.wav lna=a_32 start-time=195.88 end-time=199.672 speaker=speaker_2
audio=meeting.wav lna=a_33 start-time=201.888 end-time=203.436 speaker=speaker_2
audio=meeting.wav lna=a_34 start-time=203.436 end-time=209.304 speaker=speaker_3
audio=meeting.wav lna=a_35 start-time=210.912 end-time=212.88 speaker=speaker_1
audio=meeting.wav lna=a_36 start-time=215.256 end-time=216.708 speaker=speaker_2
audio=meeting.wav lna=a_37 start-time=216.708 end-time=218.912 speaker=speaker_2
audio=meeting.wav lna=a_38 start-time=224.424 end-time=226.968 speaker=speaker_2
audio=meeting.wav lna=a_39 start-time=226.968 end-time=227.448 speaker=speaker_2
audio=meeting.wav lna=a_40 start-time=227.448 end-time=240.544 speaker=speaker_2
audio=meeting.wav lna=a_41 start-time=242.92 end-time=243.628 speaker=speaker_2
audio=meeting.wav lna=a_42 start-time=243.628 end-time=257.08 speaker=speaker_3
audio=meeting.wav lna=a_43 start-time=257.08 end-time=259.384 speaker=speaker_2
audio=meeting.wav lna=a_44 start-time=261.096 end-time=293.136 speaker=speaker_2
audio=meeting.wav lna=a_45 start-time=298.96 end-time=301.064 speaker=speaker_2
audio=meeting.wav lna=a_46 start-time=301.064 end-time=304.952 speaker=speaker_2
audio=meeting.wav lna=a_47 start-time=304.952 end-time=306.896 speaker=speaker_2
audio=meeting.wav lna=a_48 start-time=339.76 end-time=357.404 speaker=speaker_4
audio=meeting.wav lna=a_49 start-time=357.404 end-time=360.664 speaker=speaker_1
audio=meeting.wav lna=a_50 start-time=360.664 end-time=365.416 speaker=speaker_4
audio=meeting.wav lna=a_51 start-time=369.728 end-time=370.428 speaker=speaker_4
audio=meeting.wav lna=a_52 start-time=370.428 end-time=382.376 speaker=speaker_4
audio=meeting.wav lna=a_53 start-time=382.376 end-time=390.176 speaker=speaker_5
audio=meeting.wav lna=a_54 start-time=390.176 end-time=414.136 speaker=speaker_4
audio=meeting.wav lna=a_55 start-time=417.936 end-time=448.504 speaker=speaker_4
audio=meeting.wav lna=a_56 start-time=451.032 end-time=465.808 speaker=speaker_4
audio=meeting.wav lna=a_57 start-time=473.504 end-time=487.584 speaker=speaker_4
audio=meeting.wav lna=a_58 start-time=492.048 end-time=493.64 speaker=speaker_4
audio=meeting.wav lna=a_59 start-time=495.992 end-time=499.336 speaker=speaker_4
audio=meeting.wav lna=a_60 start-time=501.68 end-time=525.328 speaker=speaker_4
audio=meeting.wav lna=a_61 start-time=537.92 end-time=545.268 speaker=speaker_4
audio=meeting.wav lna=a_62 start-time=545.268 end-time=549.18 speaker=speaker_5
audio=meeting.wav lna=a_63 start-time=549.18 end-time=549.768 speaker=speaker_2
audio=meeting.wav lna=a_64 start-time=549.768 end-time=565.584 speaker=speaker_4
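
The segment listing above is a series of space-separated `key=value` pairs, one segment per line, so it can be summarised with a few lines of Python. The following is a minimal sketch, not part of the diarization toolkit: the helper names `parse_segment` and `speaking_time` are hypothetical, and it assumes that no value (for example the audio file name) contains a space. It totals the speaking time attributed to each speaker label, using the first three segments of the listing as sample input.

```python
from collections import defaultdict


def parse_segment(line):
    """Parse one 'key=value key=value ...' line into a dict.

    Assumes values contain no spaces; the two time fields become floats.
    """
    fields = dict(token.split("=", 1) for token in line.split())
    fields["start-time"] = float(fields["start-time"])
    fields["end-time"] = float(fields["end-time"])
    return fields


def speaking_time(lines):
    """Return {speaker label: total seconds} over all segment lines."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        # Skip shell prompts, blank lines, and anything that is not a segment.
        if not line.startswith("audio="):
            continue
        seg = parse_segment(line)
        totals[seg["speaker"]] += seg["end-time"] - seg["start-time"]
    return dict(totals)


# First three segments copied from the listing above.
sample = [
    "audio=meeting.wav lna=a_1 start-time=0.384 end-time=5.82 speaker=speaker_1",
    "audio=meeting.wav lna=a_2 start-time=5.82 end-time=31.648 speaker=speaker_2",
    "audio=meeting.wav lna=a_3 start-time=31.648 end-time=58.272 speaker=speaker_1",
]

totals = speaking_time(sample)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.3f} s")
```

To run it over the whole listing, pass the lines of the saved output file instead of `sample`, for example `speaking_time(open("stdout"))`. Lines that do not start with `audio=` are ignored, so shell prompts left in the capture do no harm.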