@adolfont
Last active October 18, 2023 21:51
Transcribing an audio file into a .srt file (Using Livebook 0.11.1 on a HuggingFace Livebook space)

Transcribing an audio file into a .srt file

Mix.install(
  [
    {:kino_bumblebee, "~> 0.4.0"},
    {:exla, ">= 0.0.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

Code

{:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-large"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-large"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-large"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-large"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model_info,
    featurizer,
    tokenizer,
    generation_config,
    compile: [batch_size: 4],
    chunk_num_seconds: 30,
    timestamps: :segments,
    stream: true,
    defn_options: [compiler: EXLA]
  )

audio_input = Kino.Input.audio("Audio", sampling_rate: featurizer.sampling_rate)
form = Kino.Control.form([audio: audio_input], submit: "Run")
frame = Kino.Frame.new()

Kino.listen(form, fn %{data: %{audio: audio}} ->
  if audio do
    audio =
      audio.file_ref
      |> Kino.Input.file_path()
      |> File.read!()
      |> Nx.from_binary(:f32)
      |> Nx.reshape({:auto, audio.num_channels})
      |> Nx.mean(axes: [1])

    Kino.Frame.render(frame, Kino.Text.new("--START--\n", chunk: true))

    for chunk <- Nx.Serving.run(serving, audio) do
      [start_mark, end_mark] =
        for seconds <- [chunk.start_timestamp_seconds, chunk.end_timestamp_seconds] do
          seconds |> round() |> Time.from_seconds_after_midnight() |> Time.to_string()
        end

      text = "1\n#{start_mark} --> #{end_mark}\n#{chunk.text}\n\n"
      Kino.Frame.append(frame, Kino.Text.new(text, chunk: true))
    end

    Kino.Frame.append(frame, Kino.Text.new("\n--END--", chunk: true))
  end
end)

Kino.Layout.grid([form, frame], boxed: true, gap: 16)
@adolfont

How to create the .srt file

Copy everything between (but not including) "--START--" and "--END--", paste it into a text editor, and save the file with the .srt extension.
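
The notebook above writes `1` as the index of every cue and prints timestamps as `HH:MM:SS`. SubRip files conventionally use a running counter and `HH:MM:SS,mmm` timestamps, and some players may be strict about that. Below is a minimal sketch of a formatter that produces such cues, assuming chunks shaped like the ones the serving returns (`start_timestamp_seconds`, `end_timestamp_seconds`, `text`). The `SrtCue` module and the sample `chunks` list are hypothetical and not part of the notebook.

```elixir
defmodule SrtCue do
  # Hypothetical helper module, not part of the original notebook.

  # Formats a number of seconds (integer or float) as HH:MM:SS,mmm.
  def timestamp(seconds) do
    total_ms = round(seconds * 1000)
    ms = rem(total_ms, 1000)
    total_s = div(total_ms, 1000)

    hms =
      [div(total_s, 3600), div(rem(total_s, 3600), 60), rem(total_s, 60)]
      |> Enum.map_join(":", &pad(&1, 2))

    "#{hms},#{pad(ms, 3)}"
  end

  # Builds one cue: index line, time range line, text line, blank line.
  def format(index, start_seconds, end_seconds, text) do
    "#{index}\n#{timestamp(start_seconds)} --> #{timestamp(end_seconds)}\n#{String.trim(text)}\n\n"
  end

  # Left-pads a non-negative integer with zeros up to the given width.
  defp pad(n, width), do: n |> Integer.to_string() |> String.pad_leading(width, "0")
end

# Sample data standing in for the chunks produced by the serving (assumption).
chunks = [
  %{start_timestamp_seconds: 0.0, end_timestamp_seconds: 8.0, text: " Hello"},
  %{start_timestamp_seconds: 8.0, end_timestamp_seconds: 12.5, text: " world"}
]

# Number the cues starting at 1 and concatenate them into one SRT string.
srt =
  chunks
  |> Enum.with_index(1)
  |> Enum.map_join(fn {chunk, index} ->
    SrtCue.format(index, chunk.start_timestamp_seconds, chunk.end_timestamp_seconds, chunk.text)
  end)

IO.puts(srt)
```

In the notebook, the same idea would probably amount to iterating over `Stream.with_index(Nx.Serving.run(serving, audio), 1)` and appending the output of `SrtCue.format/4` to the frame instead of the hard-coded `"1\n..."` string. That variant has not been run against the serving here.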

adolfont commented Oct 18, 2023

How was this code generated?

I adapted the code from the Livebook Smart cell that transcribes audio. To see this Smart cell in action, watch the video "Transcription with Whisper using Neural Network Smart cell".

adolfont commented Oct 18, 2023

Result for "There is no partial application in Elixir! Nor in Haskell!" https://www.youtube.com/watch?v=YfwW_lpBW4U

1
00:00:00 --> 00:00:08
Hello, my name is Adolfo Neto and I'm here to show you why I think there is no partial

1
00:00:08 --> 00:00:12
application in Elixir or in Haskell.

1
00:00:12 --> 00:00:27
Let me just start by opening the Haskell REPL, GHCi, then I'm going to create a function f that receives two numbers, num1, num2,

1
00:00:27 --> 00:00:32
and returns the product of those two numbers.

1
00:00:32 --> 00:00:36
Okay, if I call this function with three and four,

1
00:00:36 --> 00:00:43
the result is 12. If I call it with five and three, the result is 15. And that's okay.

1
00:00:43 --> 00:00:48
And if I call it with only one argument,

1
00:00:48 --> 00:00:50
it returns an error.

1
00:00:50 --> 00:00:55
But take a look, the error is not that you cannot apply f

1
00:00:55 --> 00:00:59
to only six, is that there is no instance

1
00:00:59 --> 00:01:03
for show integer integer.

1
00:01:03 --> 00:01:07
So let me ask Haskell,

1
00:01:07 --> 00:01:10
what's the type of f?

1
00:01:10 --> 00:01:14
The type of f is this.

1
00:01:14 --> 00:01:18
It means that a is a number

1
00:01:18 --> 00:01:22
and it is number, number, number.

1
00:01:22 --> 00:01:27
So f is not a function that receives two arguments

1
00:01:28 --> 00:01:30
and returns a result.

1
00:01:30 --> 00:01:33
It's a function that receives one argument

1
00:01:33 --> 00:01:38
and then returns another function.

1
00:01:39 --> 00:01:44
And then, if you apply to the resulting function,

1
00:01:44 --> 00:01:48
another argument, it returns another value.

1
00:01:48 --> 00:01:51
So let me show you what I mean by that.

1
00:01:51 --> 00:01:57
Suppose I say now that g equals to f of 6.

1
00:01:57 --> 00:01:59
See no error.

1
00:01:59 --> 00:02:03
I ask now the type of g, it returns.

1
00:02:03 --> 00:02:08
It's a function that receives a number and returns a number. So now if

1
00:02:08 --> 00:02:20
I call g of three, it returns 18. So I could get the same result by doing this.

1
00:02:20 --> 00:02:24
So in this, I'm calling f to six,

1
00:02:24 --> 00:02:27
the result is a function,

1
00:02:27 --> 00:02:32
and then I call the result to three,

1
00:02:32 --> 00:02:35
and the return is 18.

1
00:02:35 --> 00:02:39
Why am I saying that this is not partial application?

1
00:02:39 --> 00:02:41
Because for me, partial application

1
00:02:41 --> 00:02:44
is what you have in Elixir.

1
00:02:44 --> 00:02:49
Let me show here the Livebook.

1
00:02:52 --> 00:03:05
Now, I'm going to create a function, defmodule Math do, oops, sorry.

1
00:03:07 --> 00:03:11
And def f, it receives two numbers,

1
00:03:12 --> 00:03:17
num one, num two, and then it returns

1
00:03:18 --> 00:03:22
num one times num two.

1
00:03:24 --> 00:03:30
Sorry, let me increase the font here.

1
00:03:30 --> 00:03:39
Okay, now if I click on re-evaluate, oh there's no space here, re-evaluate. Okay,

1
00:03:39 --> 00:03:50
now I have a function called Math.f, and you see here it's telling me Math.f, which is the name of

1
00:03:50 --> 00:04:00
this function here. It's a function with arity 2, so there's no way that I can

1
00:04:00 --> 00:04:05
call it with just one argument.

1
00:04:05 --> 00:04:09
I have to use both arguments.

1
00:04:09 --> 00:04:11
So this is an error,

1
00:04:11 --> 00:04:15
but you see here, there is no instance for show,

1
00:04:15 --> 00:04:19
so it was not possible to show this,

1
00:04:19 --> 00:04:24
but it was possible to assign the result of f of 6 to g,

1
00:04:24 --> 00:04:28
and to use it with other arguments.

1
00:04:29 --> 00:04:32
But here I cannot, it's saying, no,

1
00:04:32 --> 00:04:35
there is no function with just one argument.

1
00:04:35 --> 00:04:39
You cannot do partial application with Elixir.

1
00:04:40 --> 00:04:43
So if I want to do that,

1
00:04:43 --> 00:04:46
suppose I want to have, for instance,

1
00:04:46 --> 00:04:50
what I have here with g,

1
00:04:50 --> 00:04:52
which is f of six.

1
00:04:52 --> 00:04:55
Let me define a g,

1
00:04:55 --> 00:05:02
def g, which receives a number.

1
00:05:02 --> 00:05:04
What does it do?

1
00:05:04 --> 00:05:09
It calls f with six fixed and then num.

1
00:05:11 --> 00:05:13
Let me reevaluate.

1
00:05:13 --> 00:05:18
Now I can have Math.g,

1
00:05:18 --> 00:05:21
which number did I use here?

1
00:05:21 --> 00:05:22
Three.

1
00:05:23 --> 00:05:24
Three.

1
00:05:24 --> 00:05:28
Now I can have 18 here.

1
00:05:28 --> 00:05:29
So it's different.

1
00:05:29 --> 00:05:33
It's in Elixir,

1
00:05:33 --> 00:05:38
you can have functions with one, two, three or more arguments

1
00:05:40 --> 00:05:43
and you cannot have partial application.

1
00:05:43 --> 00:05:48
You must always apply a function using all arguments.

1
00:05:50 --> 00:05:53
But okay, in Elixir you have default arguments,

1
00:05:53 --> 00:05:55
but that's another story.

1
00:05:55 --> 00:05:58
But you cannot have partial application.

1
00:05:58 --> 00:06:01
And in Haskell, you also don't have

1
00:06:01 --> 00:06:03
exactly partial application.

1
00:06:03 --> 00:06:05
But what we call partial application,

1
00:06:06 --> 00:06:11
something like this, when you get a function, which is in fact a function

1
00:06:11 --> 00:06:17
that returns another function, and then you just use the first argument,

1
00:06:17 --> 00:06:19
you just fix the first argument.

1
00:06:21 --> 00:06:31
So, in fact, you can see add Int -> Int -> Int as add Int × Int -> Int.

1
00:06:31 --> 00:06:39
So that's what you call partial application in Haskell.

1
00:06:39 --> 00:06:45
In the next video, maybe we will show you how to do currying,

1
00:06:45 --> 00:06:49
which is related to partial

1
00:06:49 --> 00:06:56
application. It's, yes, it's a bit related. See you next video. Bye bye.
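
The transcript describes the Elixir side of the demo without showing the code. Below is a rough reconstruction, assuming the module is named `Math` and defines `f/2` and `g/1` as dictated in the video. The `apply/3` call, `g2`, and `curried_f` are additions for illustration and do not appear in the recording.

```elixir
defmodule Math do
  # Arity-2 function: both arguments must be supplied at the call site.
  def f(num1, num2), do: num1 * num2

  # A separate named function that fixes the first argument of f/2 to 6.
  def g(num), do: f(6, num)
end

IO.inspect(Math.f(3, 4))
IO.inspect(Math.g(3))

# Calling f with a single argument looks up Math.f/1, which does not exist.
# apply/3 is used so the failure happens at runtime and can be rescued.
result =
  try do
    apply(Math, :f, [6])
  rescue
    e in UndefinedFunctionError -> {:error, e.arity}
  end

IO.inspect(result)

# Fixing an argument explicitly by building a new anonymous function
# with the capture operator (an addition, not shown in the video).
g2 = &Math.f(6, &1)
IO.inspect(g2.(3))

# A hand-written curried version: a function that returns a function,
# which is the shape the Haskell f has by default.
curried_f = fn num1 -> fn num2 -> num1 * num2 end end
IO.inspect(curried_f.(6).(3))
```

The point of the contrast is that `Math.f/2` and a would-be `Math.f/1` are different functions in Elixir, so any argument fixing has to be written out, while the curried closure mirrors what Haskell does for every multi-argument definition.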
