You won't believe how fast it is | Raspberry Pi Speech-to-Text

4 months ago

Faster-than-real-time offline speech transcription on Raspberry Pi - or any other computing system, including Orange Pi, Jetson Nano and many other Linux SBCs. A quick hands-on guide, from installing the necessary packages to running the Whisper model with whisper.cpp or faster-whisper.
Whisper.cpp Python bindings repository:
github.com/AIWintermuteAI/whi...
faster-whisper:
github.com/SYSTRAN/faster-whi...
Benchmark gist:
gist.github.com/AIWintermuteA...

COMMENTS: 72

@Hardwareai 9 days ago

The follow-up video is also live on UKposts - find it in my channel. Support my work on making tutorials and guides on Patreon! www.patreon.com/hardware_ai

@C0ldSpace 3 months ago

I need this because I'm building a translator for my sister. There's a new person in her class who can only speak Spanish, so I'm making this.

@Hardwareai A month ago

Good usage!

@brianmeyer107 3 months ago

Love this video! I rarely find myself pausing and rewinding, but here the details were coming fast enough that I became the weak link. Love this.

@Hardwareai 3 months ago

Glad to hear I found the right pace. Thank you for the feedback!

@newtownsmells 3 months ago

Hey, this is incredible. Really appreciate your work.

@Hardwareai 3 months ago

Thank you so much 😀

@tribelessa 4 months ago

Hello! Great work, I will try to test it. Your projects are interesting (I have been following them since the Kendryte K210).

@Hardwareai 3 months ago

Thanks! I see you have been following my channel for a while :)

@emanuelepapa3548 15 days ago

I'm using your repository. Thank you!

@Hardwareai 14 days ago

Thanks for the feedback!

@exploring-electronic 4 months ago

Thanks for the work done fixing the whisper.cpp python bindings! I'll check them out.

@user-jo9ih9uj5p 4 months ago

😊

@Hardwareai 4 months ago

Yes, let me know if you run into any issues.

@newtownsmells 3 months ago

Would you consider showing how to implement live real time streaming with faster-whisper? Seems like that would be a huge way forward

@Hardwareai 3 months ago

Yes, this is much requested. So stay tuned.

@levbereggelezo 4 months ago

Well done! Was whisper.cpp compiled with BLAS optimizations?

@Hardwareai 4 months ago

No, it wasn't. It is a possible way to slightly improve the results, but at least on Raspberry Pi it will not change the outcome much: faster-whisper will still be faster. The Jetson series, on the other hand, might take advantage of cuBLAS, so it is more interesting there.

@antoniorodriguez-ynyestosa5907 3 months ago

Hi! This is amazing! Thank you very much! Just a quick question, should it work on Windows? Because I get an error when I run "python -m build -w": * Building wheel... running bdist_wheel Building pybind11 extension... error: [WinError 193] %1 not a valid Win32 app ERROR Backend subprocess exited when trying to invoke build_wheel

@Hardwareai 3 months ago

Thank you for the feedback! While theoretically it SHOULD run on Windows as well, I only tested it on Raspberry Pi (so Debian Linux) and macOS...

@bens4446 5 days ago

I had heard about faster-whisper on other channels but thought it couldn't work on an SBC because it uses a GPU, which an SBC doesn't have. I have no idea how you did this. Thanks!

@Hardwareai 4 days ago

Interesting. No, it certainly can run on CPU - I made a follow-up on this video, explaining more about faster-whisper specifically, you can find it on my channel.

@phillipreay A month ago

How hard would it be to add a continuous background search process taking keywords from the conversation? I wanna have a screen in my office that's supporting the dialogue with more right brain material. Of course, they need to interrupt and follow the sauce for resource would be important.

@Hardwareai A month ago

follow the sauce for resource? Very interesting. Anyways, this is already shown in the example here: github.com/AIWintermuteAI/whispercpp/blob/e46fd2da91bab8cfd98a0af886230cc773afd982/examples/stream/stream.py#L18

@user-nf2pe4kr3n 2 months ago

Can the program be modified so that all recognized texts are consolidated into a single paragraph upon exiting the program?

@Hardwareai A month ago

Append strings to the list and then concatenate and print them at the end?
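
A minimal sketch of that suggestion, assuming each recognized piece of text reaches your code as a plain string. The `TranscriptCollector` class and its method names are hypothetical and are not taken from stream.py; treat this as an illustration rather than the repository's implementation.

```python
class TranscriptCollector:
    """Hypothetical helper: collects recognized text and joins it on exit."""

    def __init__(self):
        self.segments = []

    def on_segment(self, text):
        # Call this with every recognized piece of text.
        text = text.strip()
        if text:  # skip empty or whitespace-only results
            self.segments.append(text)

    def as_paragraph(self):
        # Join everything collected so far into a single paragraph.
        return " ".join(self.segments)


collector = TranscriptCollector()
for piece in [" Hello there.", "", "This is a test. "]:
    collector.on_segment(piece)

# Print the consolidated paragraph once, at the end of the program.
print(collector.as_paragraph())
```

In a real program the final print would go wherever the program exits, for example in a `finally:` block around the transcription loop.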

@danilovaz9839 4 months ago

Oh man, please teach me the ways. Like, for real. I saw you provide 1:1 consultancy, but I need to know if your price is per meeting or for a full project.

@Hardwareai 4 months ago

The ways of hardware, tricky they are, young padawan... Okay, jokes aside - I did reply in the other comment xD long story short - I'm focused on getting my YT channel back on track at the moment, at least getting back monetization would be nice (YT took it away from me). So I'm not really doing consulting - but if your project is based on my videos/tutorials, I can provide some feedback.

@danilovaz9839 4 months ago

@@Hardwareai Oh master. Sorry I missed your last message! Thanks for replying again, though! Oh man, sad to hear you're not doing consulting. But I still appreciate watching your upcoming videos, so that's a win anyway. And yeah, your videos are my main source of inspiration. So it'd be amazing to get some feedback, as I'm sure I'll get stuck with something along the way - as is usual with all things computer related. May I let you know when that happens?

@Hardwareai 4 months ago

If you are doing something related to my projects, then yes :) QA is always welcome

@abdullahdogan5822 A month ago

Hi, what should I do to make it understand more than one language? Is this possible?

@Hardwareai A month ago

Use the tiny model instead of tiny.en. Do keep in mind that the quality of recognition is likely to be worse with the multi-language model.

@yashvishah9315 A month ago

Can I use an INMP441 I2S microphone module instead of the reSpeaker 2-mics Pi HAT for real-time transcription? If yes, what will my pin configuration be for that? And will there be any changes to the code?

@Hardwareai A month ago

In theory you can use any audio input device. In practice your mileage will vary; some hardware choices will be more difficult to work with from a software perspective. For pin configuration you can have a look at the INMP441-related docs. The code uses SDL for audio capture, so if the INMP441 can work with that, there should be minimal to no code changes. Can't say for sure tho until you try :)

@yashvishah9315 A month ago

@@Hardwareai oh understood! So I have to select that microphone which supports SDL!?

@Hardwareai A month ago

If you want minimum code changes - yes. Otherwise, you could of course rewrite the code to support any audio input device - the Whisper model by itself is obviously device agnostic, as long as you can provide audio in a format supported by the model.
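
To illustrate what "a format supported by the model" can mean in practice: to my knowledge Whisper models work on 16 kHz mono audio, and bindings typically take the samples as floats in the range [-1, 1]. The sketch below assumes your capture device hands you raw little-endian 16-bit PCM bytes; the function name `pcm16_to_float` is made up for this example, and the exact input type a given binding expects is something to check in its docs.

```python
import struct

WHISPER_SAMPLE_RATE = 16000  # Hz; assumption stated above, verify for your binding


def pcm16_to_float(raw):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2  # two bytes per sample; a trailing odd byte is ignored
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    return [s / 32768.0 for s in samples]


# Three samples: silence, the largest positive value, the largest negative value.
print(pcm16_to_float(b"\x00\x00\xff\x7f\x00\x80"))
```

If the device records at another sample rate or in stereo, resampling and downmixing would have to happen before this step.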

@yashvishah9315 A month ago

Ohkii! Understood 😃 thank you!!

@user-cl2og 4 months ago

I downloaded this on the Raspberry Pi 4, bookworm 64 bit and I got the following error: fatal: remote error: upload-pack: not our ref c9d5095f0c64455b201f1cd0b547efcf093ee7c3 fatal: Fetched in submodule path 'extern/whispercpp/bindings/ios', but it did not contain c9d5095f0c64455b201f1cd0b547efcf093ee7c3. Direct fetching of that commit failed. fatal: Failed to recurse into submodule path 'extern/whispercpp'. Any suggestions?

@Hardwareai 4 months ago

It sounds like you git cloned the upstream? I solved exactly the same issue in my fork

@Hardwareai 4 months ago

Can you do git log and paste the output?

@user-cl2og 4 months ago

@@Hardwareai I messaged you on linkedin because I think the youtube spam filter is not letting me paste the output.

@Hardwareai 4 months ago

Oh, all right, that is possible. For future generations, who find this comment - if it is code related, creating an issue in GH is preferable.

@lagkdd2913 A month ago

When I run it on a Raspberry Pi, it raises an error: AttributeError: module 'os' has no attribute 'add_dll_directory'

@Hardwareai A month ago

I built it just today and it works as expected. If you are still struggling, it's best to open an issue in my fork on GH.
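
Some background on that error: `os.add_dll_directory` only exists on Windows (Python 3.8+), so this AttributeError on a Raspberry Pi suggests that some code calls it unconditionally. Which code that is cannot be told from the comment. The sketch below shows one way to guard such a call; the helper name `add_dll_dir_if_supported` is hypothetical and is not part of the whispercpp bindings.

```python
import os


def add_dll_dir_if_supported(path):
    """Register a DLL search directory on Windows; do nothing elsewhere."""
    add_dir = getattr(os, "add_dll_directory", None)
    if add_dir is None:
        # Linux and macOS builds of Python do not provide this function.
        return None
    return add_dir(path)


# On a Raspberry Pi (Linux) this is a no-op instead of an AttributeError.
add_dll_dir_if_supported(os.getcwd())
```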

@BogdanMnikov 4 months ago

I did find this video while coding the next big thing, how did you know 🤣

@Hardwareai 4 months ago

Magic 8 ball xD

@harokk4242 A month ago

I get an error on pip install build: externally-managed-environment

@Hardwareai A month ago

I think you missed one step in my video - regardless, here it is stackoverflow.com/questions/75608323/how-do-i-solve-error-externally-managed-environment-every-time-i-use-pip-3.

@Tyrone-Ward 4 months ago

Does this require Internet?

@Hardwareai 4 months ago

Nope. Completely offline.

@moneshraghu5598 A month ago

How to correct SDL error of audio device not found and what mic are you using???

@Hardwareai A month ago

Mic - reSpeaker 2-mic hat for Raspberry Pi. Your SDL troubles will heavily depend on the device you are trying to run this on ....

@moneshraghu5598 A month ago

@@Hardwareai Which file in whispercpp do I alter to use a USB mic as input?

@Hardwareai A month ago

Since it relies on SDL2 for sound capture, theoretically you don't have to change anything...

@LoneEntrepreneur 4 months ago

It's not real time. It's from a file. If you want to test real time, stream it and get the output back as you speak.

@Hardwareai 4 months ago

It is though? I start with the whisper.cpp streaming example, which is also real-time for the quantized model.

@LoneEntrepreneur 4 months ago

@@Hardwareai Yes, but you keep repeating "real time" while using a pre-recorded file sent to the cloud. That's not the definition of real time: the module may be real time, but the method and technique used are not.

@Hardwareai 4 months ago

/scratching the head/ are we talking about the same video? at 2:30 I run whisper model on my voice in real-time, not from file, but from respeaker mic.

@LoneEntrepreneur 4 months ago

@@Hardwareai My bad, you're right, it's a different video. lol.

@40centuriones A month ago

It would be awesome if you combined this with Piper (Rhasspy) to make a hardware device capable of STT to TTS. It would be a Zentreya-style portable voice changer.

@Hardwareai A month ago

The TTS part of it would be more computationally expensive - but you can have a try!

@SouvikPal-notionvidz 2 months ago

I was trying to use this on an Intel-based mini PC running Ubuntu 22.04 and ran into audio issues. When I run python stream.py --model_name tiny, I get:
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
Traceback (most recent call last):
  File "/home/souvik/whispercpp/examples/stream/stream.py", line 30, in main
    transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
  File "/home/souvik/whisper/lib/python3.10/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/souvik/whispercpp/examples/stream/stream.py", line 100, in <module>
    transcriber.main(**vars(args))
  File "/home/souvik/whispercpp/examples/stream/stream.py", line 32, in main
    assert transcription is not None, "Something went wrong!"
AssertionError: Something went wrong!

@SouvikPal-notionvidz 2 months ago

The audio system is fine: I can record audio using parecord, arecord, etc., and the SDL libraries are all installed.

@Hardwareai A month ago

Yeah, these issues can be tough to diagnose unfortunately. The problem as you can see is not really with whisper.cpp, but rather with SDL not wanting to play nicely with your audio setup.

@newtownsmells 3 months ago

I can't seem to get stream.py to work. It gives this error:
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available
Traceback (most recent call last):
  File ".../whispercpp/examples/stream/stream.py", line 30, in main
    transcription = self.transcriber.stream_transcribe(callback=self.store_transcript_handler, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "..whisper/lib/python3.11/site-packages/whispercpp/__init__.py", line 257, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "../examples/stream/stream.py", line 100, in <module>
    transcriber.main(**vars(args))
  File "..whispercpp/examples/stream/stream.py", line 32, in main
    assert transcription is not None, "Something went wrong!"
           ^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Something went wrong!
Any ideas? I get this when I run list audio devices:
ERROR: Failed to initialized SDL: Audio target 'pulseaudio' not available

@Hardwareai 3 months ago

Hmm, I am able to find something on Google for Audio target 'pulseaudio' not available, but it is for OpenSUSE. Are you using the latest Raspberry Pi OS?

@newtownsmells 3 months ago

@@Hardwareai Yeah, I am using the latest Pi OS.

@Hardwareai 3 months ago

Okay, then it is likely something specific to the mic setup. I was using the reSpeaker 2-mic Raspberry Pi HAT. The first thing to try would be to see if the mic works correctly (with arecord) and then, if it does, debug the issue with this particular mic and SDL. To summarize, the issue is not in the stream.py code or even whisper.cpp, but rather that SDL does not seem to be working with your mic setup...
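
One thing that may help with the SDL debugging step: SDL2 reads the `SDL_AUDIODRIVER` environment variable to pick its audio backend, so forcing `alsa` is a cheap experiment when the `pulseaudio` target is reported as unavailable. The sketch below only builds the environment for such a run. The helper name is hypothetical, and whether the ALSA backend works with a given mic is something I can't promise.

```python
import os


def env_with_sdl_driver(driver, base_env=None):
    """Return a copy of the environment with SDL_AUDIODRIVER set to `driver`."""
    env = dict(os.environ if base_env is None else base_env)
    env["SDL_AUDIODRIVER"] = driver  # e.g. "alsa" or "pulseaudio"
    return env


# Example with a small stand-in environment:
print(env_with_sdl_driver("alsa", {"HOME": "/home/pi"}))

# To try it on the streaming example (run from examples/stream):
#   import subprocess, sys
#   subprocess.run([sys.executable, "stream.py", "--model_name", "tiny"],
#                  env=env_with_sdl_driver("alsa"))
```

If the error message changes or disappears with a different driver, that points at the SDL backend selection rather than at the microphone itself.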