Month 3 Box - AI Deep Dive

Lesson 5: Speech Recognition and Text-to-Speech on Raspberry Pi

Today, your Raspberry Pi learns to listen and speak. In this lesson, we’ll explore offline voice interaction using speech-to-text (STT) and text-to-speech (TTS) so your Pi can follow voice commands and talk back—all with the included USB microphone and speaker.


🧠 What You’ll Learn Today:

  • How to recognize simple voice commands using offline STT tools like Vosk
  • How to convert text into speech using tools like eSpeak NG
  • How to connect and use your USB microphone
  • How to build a basic two-way voice interface for your AI assistant


🎙️ Speech Recognition (STT):

  • We’ll use Vosk, a fast and lightweight offline STT engine
  • Capture audio input using sounddevice in Python
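Here’s a minimal sketch of how Vosk and sounddevice fit together. It assumes you’ve installed both (`pip install vosk sounddevice`) and unzipped a small English Vosk model into a folder named `model` next to your script (that folder name is just our assumption). The microphone streams 16-bit audio into a queue, and Vosk’s `KaldiRecognizer` hands back a JSON result once it decides you’ve finished a phrase. The helper names `listen_once()` and `extract_text()` are our own.

```python
import json
import queue

SAMPLE_RATE = 16000  # assumption: your USB mic supports 16 kHz capture


def extract_text(result_json):
    """Pull the recognized sentence out of a Vosk result JSON string."""
    return json.loads(result_json).get("text", "").strip()


def listen_once(model_path="model", device=None):
    """Block until Vosk finalizes one spoken phrase from the mic, then return its text."""
    # Imported here so extract_text() can be tried on a machine without these packages.
    import sounddevice as sd
    from vosk import KaldiRecognizer, Model

    audio_q = queue.Queue()

    def on_audio(indata, frames, time_info, status):
        # Runs on the audio thread: just hand the raw 16-bit PCM bytes to the main loop.
        audio_q.put(bytes(indata))

    recognizer = KaldiRecognizer(Model(model_path), SAMPLE_RATE)
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000, device=device,
                           dtype="int16", channels=1, callback=on_audio):
        while True:
            # AcceptWaveform() returns True once it detects the end of an utterance.
            if recognizer.AcceptWaveform(audio_q.get()):
                text = extract_text(recognizer.Result())
                if text:  # skip empty results caused by silence or background noise
                    return text
```

On the Pi, `print(listen_once())` waits for you to say something and prints what Vosk heard. Passing `device=` lets you pick a specific mic if the default isn’t your USB one.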


🔌 Getting Audio In:

  • Plug in your USB microphone
  • Run arecord -l to confirm your Pi detects it
  • Use a Python library such as sounddevice to capture short audio snippets
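To check that audio is really coming in, you can record a few seconds and play it back. This is a sketch assuming the `sounddevice` package is installed (along with NumPy, which `sd.rec()` uses to return the audio); `record_snippet()` and `save_wav()` are hypothetical helper names, and Python’s built-in `wave` module writes the file.

```python
import wave


def save_wav(path, pcm_bytes, sample_rate=16000, channels=1):
    """Write raw 16-bit PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def record_snippet(seconds=3.0, sample_rate=16000, device=None):
    """Record a mono 16-bit clip from the microphone and return it as raw PCM bytes."""
    import sounddevice as sd  # imported here so save_wav() works without it

    frames = int(seconds * sample_rate)
    audio = sd.rec(frames, samplerate=sample_rate, channels=1, dtype="int16", device=device)
    sd.wait()  # sd.rec() returns immediately, so block until the recording is done
    return audio.tobytes()
```

For example, run `save_wav("test.wav", record_snippet(3))`, then `aplay test.wav` in the terminal to hear what the mic picked up.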


🔊 Making the Pi Talk (TTS):

  • Options include eSpeak NG, Flite, or Google TTS (which needs an internet connection)
  • Useful for:

✅ Reading out detected objects

✅ Confirming commands

✅ Building personality into your AI
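eSpeak NG is the quickest of these to try offline. Here’s a small sketch that calls its command-line tool from Python, assuming you installed it with `sudo apt install espeak-ng`. The `-v` (voice) and `-s` (speed in words per minute) flags are standard eSpeak NG options; the `speak()` helper and its default values are just our choices.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed=160):
    """Build the espeak-ng argument list. Passing a list (no shell) keeps quotes in text safe."""
    return ["espeak-ng", "-v", voice, "-s", str(speed), text]


def speak(text, voice="en", speed=160):
    """Say text out loud through the Pi's default audio output."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found; install it with: sudo apt install espeak-ng")
    subprocess.run(build_espeak_command(text, voice, speed), check=True)
```

For example, `speak("I can see a cup")` could read out an object your detector found, and lowering `speed` is an easy way to give your assistant a calmer personality.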


🧪 Hands-On Activity:

Create a simple voice assistant that can:

  • Listen to a question
  • Convert speech to text
  • Match the phrase to a command
  • Speak the appropriate response back (or send it to Discord/Telegram)


Starter command ideas:

  • “What’s your name?”
  • “Tell me a joke.”
  • “What’s the weather?”
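One simple way to handle the “match the phrase to a command” step is keyword matching: clean up the recognized text, split it into words, and pick the first command whose keywords all appear. This sketch uses only the Python standard library; the name “PiBot”, the jokes, and the weather reply are placeholders to swap for your own (a real weather answer would need an online API).

```python
import random
import string

# Placeholder replies: swap in your own assistant's name and jokes.
JOKES = [
    "Why did the robot go on vacation? It needed to recharge.",
    "I would tell you a UDP joke, but you might not get it.",
]

# Each entry: (keywords that must ALL appear, reply text or a function that returns one).
COMMANDS = [
    ({"your", "name"}, "My name is PiBot."),
    ({"joke"}, lambda: random.choice(JOKES)),
    ({"weather"}, "I can't check the weather while I'm offline yet."),
]


def normalize(text):
    """Lowercase and drop punctuation so 'What's your name?' becomes 'whats your name'."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def match_command(text):
    """Return the reply for the first command whose keywords all appear, or None."""
    words = set(normalize(text).split())
    for keywords, reply in COMMANDS:
        if keywords <= words:  # subset test: every keyword was heard
            return reply() if callable(reply) else reply
    return None
```

In your assistant loop, pass the text from Vosk into `match_command()` and speak whatever it returns; `None` is a good moment to reply “Sorry, I didn’t catch that.”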


🛠️ Troubleshooting Tips:

  • 🎙 If the mic isn’t working:

  → Run arecord -l to confirm the Pi lists the mic, and alsamixer to check its capture level

  → Ensure your script is listening to the correct device ID

  • 🔊 If the speaker is too quiet:

  → Check the connections and make sure your power supply can handle the speaker’s power draw

  → Adjust volume with alsamixer
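If your script is grabbing the wrong mic, you can look up the device ID by name instead of guessing. This sketch assumes the `sounddevice` package, whose `sd.query_devices()` returns one entry per audio device with `name` and `max_input_channels` fields. The `find_input_device()` helper and the `"USB"` name hint are our own assumptions, so check the name your mic actually shows.

```python
def find_input_device(devices, name_hint="USB"):
    """Return the ID of the first device with name_hint in its name that can record audio."""
    for device_id, info in enumerate(devices):
        if name_hint.lower() in info["name"].lower() and info["max_input_channels"] > 0:
            return device_id
    return None  # nothing matched: print the device list and pick a better hint
```

On the Pi, run `import sounddevice as sd`, call `find_input_device(sd.query_devices())`, and pass the result as `device=` when you open your input stream. If nothing matches, `print(sd.query_devices())` shows every device with its ID.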


📝 Homework:

  • Complete your two-way voice interaction script
  • Post a short video of your Pi talking back in #MONTH3 on Discord

🔥 Bonus: Add a custom voice command that triggers hardware (like a light or buzzer)
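Here’s one possible starting point for the bonus, assuming an LED (or buzzer) wired to GPIO17 (physical pin 11) and the `gpiozero` library. `handle_hardware_command()` is a hypothetical helper: it accepts any object with `on()` and `off()` methods, so a gpiozero `LED` or `Buzzer` both work, and you can try the logic without any hardware attached.

```python
def handle_hardware_command(text, output_device):
    """Switch an output (LED, buzzer, ...) on or off if the phrase asks for it.

    Returns a reply to speak back, or None if the phrase wasn't a hardware command.
    """
    words = set(text.lower().split())
    if not words & {"light", "lights", "buzzer"}:
        return None  # not about hardware: let your normal command matching handle it
    if "on" in words:
        output_device.on()
        return "Okay, switching it on."
    if "off" in words:
        output_device.off()
        return "Okay, switching it off."
    return None
```

On the Pi, run `from gpiozero import LED`, create `led = LED(17)`, and call `handle_hardware_command(heard_text, led)` before your regular command matching, speaking the reply if one comes back.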


🚀 Up Next:

We’re taking things to the next level with offline chatbots powered by local LLMs. Your Pi is about to start understanding you and holding real conversations, all without the cloud.