Yt2txt - YouTube Video to Text Transcriber

Convert YouTube video soundtracks to text using Faster Whisper and PyTorch. Supports GPU acceleration (CUDA/MPS) for fast transcription and audio extraction.


Yt2txt

Description

This project downloads YouTube videos, extracts their audio, and transcribes the audio files to text using the Faster Whisper library.
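The three stages described above (download, extract audio, transcribe) can be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes yt-dlp (with ffmpeg on the PATH) as the downloader, which this README does not confirm, and the function names download_audio, transcribe, and segments_to_text are hypothetical. The download helper handles a single video URL, not a playlist.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments):
    """Turn transcript segments into one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(seg.start)}] {seg.text.strip()}" for seg in segments
    )


def download_audio(url, out_dir):
    """Download the best audio stream of one video and convert it to MP3."""
    # Assumption: yt-dlp is the downloader; imported lazily so the
    # formatting helpers above work without it installed.
    import yt_dlp

    options = {
        "format": "bestaudio/best",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
    }
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=True)
    return f"{out_dir}/{info['id']}.mp3"


def transcribe(audio_path, model_size="large-v3"):
    """Transcribe an audio file with Faster Whisper and return plain text."""
    from faster_whisper import WhisperModel

    # Device and compute type are left at the library defaults here.
    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path)
    return segments_to_text(segments)
```

Calling transcribe(download_audio(url, "output")) would chain the stages. Keeping the text formatting in its own function makes it easy to swap in another output layout later.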

Installation

1. Prerequisites

  • Python 3.8 or newer must be installed (available from python.org).
  • On Windows/Linux, a CUDA-compatible NVIDIA GPU (e.g., RTX 4090) is recommended for better performance.
  • On macOS, Apple Silicon machines (M1 or later) can use the standard PyTorch build, which accelerates computation on the GPU through Metal Performance Shaders (MPS).

2. Set Up a Virtual Environment

Create and activate a virtual environment:

On Windows

python -m venv .venv
.\.venv\Scripts\Activate

On macOS/Linux

python3 -m venv .venv
source .venv/bin/activate
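To confirm that the environment was activated and that the interpreter meets the Python 3.8 requirement, a small check like the one below can be run from the activated shell. The function names are hypothetical and only serve the illustration.

```python
import sys


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment.

    Inside a virtual environment, sys.prefix points at the environment,
    while sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

If the second line prints False, the activation step probably did not take effect in the current shell.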

3. Install Dependencies

Install the libraries the project requires:

pip install -r requirements.txt

4. Install PyTorch

On Windows/Linux with NVIDIA GPU (CUDA)

Install PyTorch with CUDA support. The cu118 tag selects the CUDA 11.8 build; replace it with the tag matching your CUDA version if necessary.

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

On macOS with Apple Silicon (MPS)

The default PyTorch build for macOS includes MPS support, so no special index URL is needed:

pip install torch torchvision torchaudio

For CPU-only

For systems without a GPU:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
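The three install variants above differ only in the index URL passed to pip. As a minimal sketch, a setup script could build the right command from a backend tag. The helper name torch_install_command and the "default" tag are assumptions made for this example, not part of the project.

```python
import sys

# Base URL taken from the install commands in this README.
PYTORCH_INDEX = "https://download.pytorch.org/whl/"


def torch_install_command(backend):
    """Build the pip command for a PyTorch backend.

    backend is "default" (plain PyPI wheel, used on macOS), "cpu",
    or a CUDA tag such as "cu118".
    """
    command = [
        sys.executable, "-m", "pip", "install",
        "torch", "torchvision", "torchaudio",
    ]
    if backend != "default":
        command += ["--index-url", PYTORCH_INDEX + backend]
    return command


if __name__ == "__main__":
    # Print the commands rather than running them.
    for tag in ("cu118", "default", "cpu"):
        print(" ".join(torch_install_command(tag)))
```

Passing the returned list to subprocess.run(command, check=True) would perform the installation with the interpreter of the active virtual environment.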

5. Verify Installation

Run the following command to check that PyTorch is configured correctly:

python -c "import torch; print(torch.cuda.is_available()); print(torch.backends.mps.is_available()); print(torch.__version__)"
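  • On Windows/Linux with an NVIDIA GPU and a CUDA build of PyTorch, torch.cuda.is_available() should return True.
  • On macOS with Apple Silicon, torch.backends.mps.is_available() should return True.
  • On CPU-only systems, both checks return False, which is expected.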
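The same two checks can drive device selection at runtime. The sketch below prefers CUDA, then MPS, then falls back to the CPU. The names pick_device and detect_device are hypothetical. Note that this selects a PyTorch device string; whether a given transcription backend accepts each value is a separate question that this example does not settle.

```python
def pick_device(cuda_available, mps_available):
    """Choose a device name from the two availability flags."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch for available accelerators; fall back to "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU can be assumed.
        return "cpu"

    # Older PyTorch releases have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Separating the decision (pick_device) from the hardware query (detect_device) keeps the priority order testable on machines without a GPU.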

Usage

Run the project with the following command, replacing your_script.py with the project's entry-point script:

python your_script.py --url "https://youtube.com/..." -o output -m large-v3
  • --url: The URL of the YouTube video or playlist.
  • -o: The output directory for the transcribed files.
  • -m: Whisper model size to use (e.g., large-v3).
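For reference, the three options listed above could be declared with argparse as in the sketch below. Only --url, -o, and -m come from this README; the long names --output and --model, the default values, and whether --url is required are assumptions made for the example.

```python
import argparse


def build_parser():
    """Create a parser for the three documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Transcribe a YouTube video or playlist to text."
    )
    parser.add_argument(
        "--url", required=True,
        help="URL of the YouTube video or playlist",
    )
    # The long option names and defaults below are assumed.
    parser.add_argument(
        "-o", "--output", default="output",
        help="output directory for the transcribed files",
    )
    parser.add_argument(
        "-m", "--model", default="large-v3",
        help="Whisper model size to use",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.output, args.model)
```

Returning the parser from a function, rather than parsing at import time, lets the options be exercised with an explicit argument list, for example build_parser().parse_args(["--url", "https://youtube.com/...", "-m", "small"]).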

Additional Notes

Installing CUDA for Windows/Linux with GPU

If using CUDA on an NVIDIA GPU, make sure to:

Optimization for macOS

PyTorch leverages MPS (Metal Performance Shaders) to accelerate computations on Apple Silicon. No additional configuration is required.


Development and Contributions

If you'd like to contribute, ensure you test the project on multiple platforms and configurations (Windows, macOS, GPU, CPU).


Authors

Project developed by [Your Name/Team].