Created July 29, 2017 08:50
[root@15d3c1a2801b speaker-diarization]# ./spk-diarization2.py meeting.wav
Reading file: meeting.wav
Writing output to: stdout
Using feacat from: /speaker-diarization/feacat
Writing temporal files in: /tmp
Writing lna files in: /speaker-diarization/lna
Writing exp files in: /speaker-diarization/exp
Writing features in: /speaker-diarization/fea
Performing exp generation and feacat concurrently
tokenpass: ./VAD/tokenpass/test_token_pass
Reading recipe: /tmp/initlIJC0P.recipe
Using model: ./hmms/mfcc_16g_11.10.2007_10
Writing `.lna` files in: /speaker-diarization/lna
Writing `.exp` files in: /speaker-diarization/exp
Processing file 1/1
Input: meeting.wav
Output: /speaker-diarization/lna/meeting.lna
FAN OUT: 0 nodes, 0 arcs
FAN IN: 0 nodes, 0 arcs
Prefix tree: 3 nodes, 6 arcs
WARNING: No tokens in final nodes. The result will be incomplete. Try increasing beam.
Calling voice-detection2.py
Reading recipe from: /tmp/initlIJC0P.recipe
Reading .exp files from: /speaker-diarization/exp
Writing output to: /tmp/vadHuAeKI.recipe
Sample rate set to: 125
Minimum speech turn duration: 0.5 seconds
Minimum nonspeech between-turns duration: 1.5 seconds
Segment before expansion set to: 0.0 seconds
Segment end expansion set to: 0.0 seconds
Waiting for feacat to end.
Calling spk-change-detection.py
Reading recipe from: /tmp/vadHuAeKI.recipe
Reading feature files from: /speaker-diarization/fea
Feature files extension: .fea
Writing output to: /tmp/spkc_0J8dR.recipe
Conversion rate set to frame rate: 125.0
Using a growing window
Deltaws set to: 0.096 seconds
Using BIC as distance measure, lambda = 1.0
Window size set to: 1.0 seconds
Window step set to: 3.0 seconds
Threshold distance: 0.0
Useful metrics for determining the right threshold:
---------------------------------------------------
Average between windows distance: -370.524562364
Maximum between windows distance: 2039.10263549
Minimum between windows distance: -1222.91049332
Total windows: 346
Total segments: 64
Average between detected segments distance: 327.139641148
Maximum between detected segments distance: 2043.41634976
Minimum between detected segments distance: 11.1822260761
Total detected speaker changes: 41
Calling spk-clustering.py
Reading recipe from: /tmp/spkc_0J8dR.recipe
Reading feature files from: /speaker-diarization/fea
Feature files extension: .fea
Writing output to: stdout
Conversion rate set to frame rate: 125.0
Using hierarchical clustering
Using BIC as distance measure, lambda = 1.3
Threshold distance: 0.0
Maximum speakers: 0
Initial cluster with: 64 speakers
Merging: 38 and 44 distance: -2921.76944564
Merging: 38 and 40 distance: -2951.21353662
Merging: 38 and 43 distance: -2871.71348074
Merging: 38 and 44 distance: -2917.25872908
Merging: 51 and 53 distance: -2871.05437544
Merging: 51 and 54 distance: -2940.83512461
Merging: 28 and 38 distance: -2852.69573475
Merging: 50 and 51 distance: -2850.94326678
Merging: 28 and 39 distance: -2759.01409284
Merging: 49 and 52 distance: -2695.00341959
Merging: 44 and 49 distance: -2756.06638545
Merging: 44 and 49 distance: -2710.34849493
Merging: 28 and 36 distance: -2667.44821857
Merging: 28 and 39 distance: -2660.35830143
Merging: 20 and 28 distance: -2657.11499677
Merging: 20 and 35 distance: -2715.17996197
Merging: 18 and 20 distance: -2710.6547319
Merging: 17 and 18 distance: -2684.77184028
Merging: 17 and 19 distance: -2617.35061331
Merging: 15 and 17 distance: -2620.85138956
Merging: 15 and 28 distance: -2607.31238421
Merging: 18 and 24 distance: -2471.32688989
Merging: 1 and 4 distance: -2433.14561303
Merging: 10 and 14 distance: -2358.08500264
Merging: 16 and 25 distance: -2350.12722152
Merging: 16 and 21 distance: -2387.94145847
Merging: 16 and 19 distance: -2393.68683048
Merging: 16 and 18 distance: -2417.33812362
Merging: 20 and 28 distance: -2339.78094975
Merging: 3 and 22 distance: -2321.18497749
Merging: 28 and 30 distance: -2288.37399579
Merging: 10 and 12 distance: -2284.99770592
Merging: 7 and 10 distance: -2266.63776959
Merging: 18 and 25 distance: -2181.96396457
Merging: 7 and 29 distance: -2173.48090795
Merging: 7 and 9 distance: -2130.32932914
Merging: 4 and 7 distance: -2130.82075976
Merging: 4 and 15 distance: -2110.68070368
Merging: 4 and 14 distance: -2115.52832853
Merging: 4 and 16 distance: -2171.81672547
Merging: 4 and 6 distance: -2114.09630116
Merging: 3 and 8 distance: -1928.85407494
Merging: 3 and 7 distance: -2005.49337413
Merging: 2 and 4 distance: -1928.80405363
Merging: 3 and 15 distance: -1922.85094438
Merging: 17 and 19 distance: -1843.92849553
Merging: 14 and 15 distance: -1815.35085063
Merging: 2 and 9 distance: -1797.8385882
Merging: 2 and 4 distance: -1908.88004705
Merging: 2 and 5 distance: -1845.96847883
Merging: 6 and 9 distance: -1655.75553701
Merging: 2 and 4 distance: -1625.96419614
Merging: 2 and 7 distance: -1404.1768073
Merging: 2 and 7 distance: -1317.07970031
Merging: 3 and 4 distance: -1299.29447592
Merging: 7 and 9 distance: -1179.8780516
Merging: 5 and 8 distance: -1144.80951179
Merging: 1 and 3 distance: -741.754094786
Merging: 4 and 5 distance: -618.754819342
Final speakers: 5
Useful metrics for determining the right threshold:
---------------------------------------------------
Maximum between segments distance: 21370.5775165
Minimum between segments distance: -2951.21353662
Total segments: 64
Total detected speakers: 5
[root@15d3c1a2801b speaker-diarization]#
[root@15d3c1a2801b speaker-diarization]# cat stdout
audio=meeting.wav lna=a_1 start-time=0.384 end-time=5.82 speaker=speaker_1
audio=meeting.wav lna=a_2 start-time=5.82 end-time=31.648 speaker=speaker_2
audio=meeting.wav lna=a_3 start-time=31.648 end-time=58.272 speaker=speaker_1
audio=meeting.wav lna=a_4 start-time=60.032 end-time=66.536 speaker=speaker_1
audio=meeting.wav lna=a_5 start-time=66.536 end-time=68.748 speaker=speaker_2
audio=meeting.wav lna=a_6 start-time=68.748 end-time=70.576 speaker=speaker_2
audio=meeting.wav lna=a_7 start-time=70.576 end-time=78.264 speaker=speaker_2
audio=meeting.wav lna=a_8 start-time=79.84 end-time=80.248 speaker=speaker_2
audio=meeting.wav lna=a_9 start-time=80.248 end-time=82.792 speaker=speaker_2
audio=meeting.wav lna=a_10 start-time=82.792 end-time=83.372 speaker=speaker_2
audio=meeting.wav lna=a_11 start-time=83.372 end-time=88.96 speaker=speaker_2
audio=meeting.wav lna=a_12 start-time=88.96 end-time=93.288 speaker=speaker_1
audio=meeting.wav lna=a_13 start-time=93.288 end-time=93.9 speaker=speaker_2
audio=meeting.wav lna=a_14 start-time=93.9 end-time=96.436 speaker=speaker_1
audio=meeting.wav lna=a_15 start-time=96.436 end-time=98.436 speaker=speaker_2
audio=meeting.wav lna=a_16 start-time=98.436 end-time=102.736 speaker=speaker_2
audio=meeting.wav lna=a_17 start-time=102.736 end-time=103.284 speaker=speaker_2
audio=meeting.wav lna=a_18 start-time=103.284 end-time=103.888 speaker=speaker_2
audio=meeting.wav lna=a_19 start-time=103.888 end-time=110.156 speaker=speaker_1
audio=meeting.wav lna=a_20 start-time=110.156 end-time=114.2 speaker=speaker_2
audio=meeting.wav lna=a_21 start-time=119.936 end-time=124.256 speaker=speaker_2
audio=meeting.wav lna=a_22 start-time=124.256 end-time=126.512 speaker=speaker_3
audio=meeting.wav lna=a_23 start-time=126.512 end-time=140.956 speaker=speaker_2
audio=meeting.wav lna=a_24 start-time=140.956 end-time=143.256 speaker=speaker_3
audio=meeting.wav lna=a_25 start-time=148.76 end-time=152.472 speaker=speaker_3
audio=meeting.wav lna=a_26 start-time=157.208 end-time=166.98 speaker=speaker_2
audio=meeting.wav lna=a_27 start-time=166.98 end-time=171.5 speaker=speaker_3
audio=meeting.wav lna=a_28 start-time=171.5 end-time=173.588 speaker=speaker_2
audio=meeting.wav lna=a_29 start-time=173.588 end-time=190.016 speaker=speaker_3
audio=meeting.wav lna=a_30 start-time=190.016 end-time=193.208 speaker=speaker_2
audio=meeting.wav lna=a_31 start-time=195.176 end-time=195.88 speaker=speaker_4
audio=meeting.wav lna=a_32 start-time=195.88 end-time=199.672 speaker=speaker_2
audio=meeting.wav lna=a_33 start-time=201.888 end-time=203.436 speaker=speaker_2
audio=meeting.wav lna=a_34 start-time=203.436 end-time=209.304 speaker=speaker_3
audio=meeting.wav lna=a_35 start-time=210.912 end-time=212.88 speaker=speaker_1
audio=meeting.wav lna=a_36 start-time=215.256 end-time=216.708 speaker=speaker_2
audio=meeting.wav lna=a_37 start-time=216.708 end-time=218.912 speaker=speaker_2
audio=meeting.wav lna=a_38 start-time=224.424 end-time=226.968 speaker=speaker_2
audio=meeting.wav lna=a_39 start-time=226.968 end-time=227.448 speaker=speaker_2
audio=meeting.wav lna=a_40 start-time=227.448 end-time=240.544 speaker=speaker_2
audio=meeting.wav lna=a_41 start-time=242.92 end-time=243.628 speaker=speaker_2
audio=meeting.wav lna=a_42 start-time=243.628 end-time=257.08 speaker=speaker_3
audio=meeting.wav lna=a_43 start-time=257.08 end-time=259.384 speaker=speaker_2
audio=meeting.wav lna=a_44 start-time=261.096 end-time=293.136 speaker=speaker_2
audio=meeting.wav lna=a_45 start-time=298.96 end-time=301.064 speaker=speaker_2
audio=meeting.wav lna=a_46 start-time=301.064 end-time=304.952 speaker=speaker_2
audio=meeting.wav lna=a_47 start-time=304.952 end-time=306.896 speaker=speaker_2
audio=meeting.wav lna=a_48 start-time=339.76 end-time=357.404 speaker=speaker_4
audio=meeting.wav lna=a_49 start-time=357.404 end-time=360.664 speaker=speaker_1
audio=meeting.wav lna=a_50 start-time=360.664 end-time=365.416 speaker=speaker_4
audio=meeting.wav lna=a_51 start-time=369.728 end-time=370.428 speaker=speaker_4
audio=meeting.wav lna=a_52 start-time=370.428 end-time=382.376 speaker=speaker_4
audio=meeting.wav lna=a_53 start-time=382.376 end-time=390.176 speaker=speaker_5
audio=meeting.wav lna=a_54 start-time=390.176 end-time=414.136 speaker=speaker_4
audio=meeting.wav lna=a_55 start-time=417.936 end-time=448.504 speaker=speaker_4
audio=meeting.wav lna=a_56 start-time=451.032 end-time=465.808 speaker=speaker_4
audio=meeting.wav lna=a_57 start-time=473.504 end-time=487.584 speaker=speaker_4
audio=meeting.wav lna=a_58 start-time=492.048 end-time=493.64 speaker=speaker_4
audio=meeting.wav lna=a_59 start-time=495.992 end-time=499.336 speaker=speaker_4
audio=meeting.wav lna=a_60 start-time=501.68 end-time=525.328 speaker=speaker_4
audio=meeting.wav lna=a_61 start-time=537.92 end-time=545.268 speaker=speaker_4
audio=meeting.wav lna=a_62 start-time=545.268 end-time=549.18 speaker=speaker_5
audio=meeting.wav lna=a_63 start-time=549.18 end-time=549.768 speaker=speaker_2
audio=meeting.wav lna=a_64 start-time=549.768 end-time=565.584 speaker=speaker_4
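
The segment lines above are space-separated key=value pairs, so they are easy to post-process. The following is a minimal sketch, not part of the speaker-diarization toolkit: the function names parse_segment and talk_time are hypothetical, and it assumes that no value (for example the audio file name) contains spaces. It sums the speaking time per speaker label, using the first three segment lines as sample input.

```python
from collections import defaultdict


def parse_segment(line):
    """Parse one 'key=value ...' segment line into a dict.

    Hypothetical helper: assumes values contain no spaces.
    Tokens without '=' (stray separators, for instance) are ignored.
    """
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    fields["start-time"] = float(fields["start-time"])
    fields["end-time"] = float(fields["end-time"])
    return fields


def talk_time(lines):
    """Return total speaking time in seconds per speaker label."""
    totals = defaultdict(float)
    for line in lines:
        # Skip prompts, blank lines, and anything that is not a segment line.
        if not line.startswith("audio="):
            continue
        seg = parse_segment(line)
        totals[seg["speaker"]] += seg["end-time"] - seg["start-time"]
    return dict(totals)


# Sample input: the first three segment lines from the listing above.
sample = [
    "audio=meeting.wav lna=a_1 start-time=0.384 end-time=5.82 speaker=speaker_1",
    "audio=meeting.wav lna=a_2 start-time=5.82 end-time=31.648 speaker=speaker_2",
    "audio=meeting.wav lna=a_3 start-time=31.648 end-time=58.272 speaker=speaker_1",
]

totals = talk_time(sample)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.3f} s")
```

To run it over the full listing, pass the lines of the saved output file to talk_time instead of the sample list.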