yt2doc: transcribe your videos into Markdown documents | Open source

Sunday 24th November 2024 08:43 AM

Are you tired of spending hours transcribing your YouTube videos by hand? Or maybe you're looking for an efficient way to turn your podcasts into blog posts? Well, I have good news for you: yt2doc is here to streamline your workflow!

It's a tool that can automatically turn any YouTube video or podcast into a perfectly structured Markdown document, with AI-generated paragraphs, chapters, and even titles.

Developed by the talented Shun Liang, this open-source tool is a true virtual assistant for all content creators, journalists, students or simply curious people who wish to make the most of the audio and video resources available online. Moreover, yt2doc is designed to work entirely locally, without relying on external APIs, which guarantees the confidentiality of your data.

yt2doc relies on the power of Whisper, the speech recognition model developed by OpenAI. Thanks to it, the tool can transcribe the audio content of your videos or podcasts with remarkable precision. But where yt2doc really stands out is in the post-processing of this raw transcription.

Indeed, most existing transcription tools are primarily geared toward generating subtitles and often provide a continuous block of text without line breaks or segmentation, making reading difficult. Whisper, for example, does not generate line breaks in its transcriptions. Without post-processing, you end up with a huge block of indigestible text.

yt2doc, for its part, prioritizes readability. It goes further by intelligently structuring content to create an easy-to-read document. To do this, it uses Segment Any Text (SaT), a library specializing in text segmentation. Thanks to it, your transcription is automatically divided into logical sentences and paragraphs, which makes reading much more pleasant and natural. Additionally, you have the option to customize the SaT model used according to your preferences.

And if your video is not already chaptered (which is often the case for podcasts, for example), yt2doc can use a large language model (LLM) to automatically generate relevant chapter headings. It's like having a built-in assistant editor! Lightweight models that work well include gemma2:9b, llama3.1:8b and qwen2.5:7b.

As you can see, yt2doc is not just a transcription tool, but a true all-in-one solution for transforming your audio and video content into structured and usable documents.

Before installing it, make sure you have ffmpeg installed on your system. This is an essential prerequisite for yt2doc to function correctly, since ffmpeg is used to process audio and video streams. If you haven't already done so, here are the commands to install it:

On macOS:

brew install ffmpeg

On Debian/Ubuntu:

sudo apt install ffmpeg
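If you are not sure whether ffmpeg is already there, a quick check saves a failed run later. The helper below is only a small sketch: `require_cmd` is a name I made up for illustration, and it relies on nothing more than the standard `command -v` lookup.

```shell
# require_cmd NAME (hypothetical helper, not part of yt2doc)
# Prints "NAME: found" and returns 0 if NAME is on PATH,
# otherwise prints "NAME: missing" on stderr and returns 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing" >&2
        return 1
    fi
}

# ffmpeg is the prerequisite named above; print a reminder if it is absent.
require_cmd ffmpeg || echo "Install ffmpeg before installing yt2doc."
```

The same helper can be reused later, for example `require_cmd yt2doc` once the installation step is done.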

Then you can install yt2doc. The recommended method is to use pipx, a handy tool for installing Python applications in isolated environments:

pipx install yt2doc

If you prefer to use uv, a super-fast Python package manager, it's also possible:

uv tool install yt2doc

To get help using the tool, you can use the command:

yt2doc --help

Now that yt2doc is installed, let's see how to use it. The basic command for transcribing a YouTube video is:

yt2doc --video <video-url>

For example, if you want to transcribe a TED talk, you could use:

yt2doc --video <url-of-the-ted-talk>

By default, yt2doc will display the transcript directly in your terminal. But you can of course save the result in a Markdown file for later consultation:

yt2doc --video <video-url> -o ma_transcription.md
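Once the transcript is saved, you may want a quick look at its structure before opening it. The sketch below assumes the chapters and title are written as ordinary Markdown `#` headings, which I have not verified against yt2doc's actual output; `md_outline` is a hypothetical helper name.

```shell
# md_outline FILE (hypothetical helper)
# Prints every ATX-style Markdown heading in FILE, i.e. lines that start
# with one to six '#' characters followed by a space.
# Assumption: yt2doc writes titles and chapters as such headings.
md_outline() {
    grep -E '^#{1,6} ' "$1"
}
```

For example, `md_outline ma_transcription.md` would list the title and chapter headings, one per line, if the assumption holds.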

What if you want to transcribe an entire YouTube playlist? No problem:

yt2doc --playlist <playlist-url> -o dossier_de_sortie
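And if your videos are not grouped in a playlist, a small loop over a list of URLs does the job with the same `--video` and `-o` options. This is just a sketch: `transcribe_list`, the `DRY_RUN` variable and the `video_N.md` naming scheme are my own assumptions, not something yt2doc provides.

```shell
# transcribe_list FILE OUTDIR (hypothetical helper)
# Runs one yt2doc invocation per URL listed in FILE (one URL per line).
# Blank lines and lines starting with '#' are skipped.
# Set DRY_RUN=1 to print the commands instead of running them.
transcribe_list() {
    list=$1
    outdir=$2
    n=0
    # Only create the output directory when commands will really run.
    [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$outdir" || return 1
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in
            ''|'#'*) continue ;;
        esac
        n=$((n + 1))
        out="$outdir/video_$n.md"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "yt2doc --video $url -o $out"
        else
            # Redirect stdin so yt2doc cannot consume the rest of the list.
            yt2doc --video "$url" -o "$out" </dev/null
        fi
    done < "$list"
}
```

A first pass with `DRY_RUN=1 transcribe_list urls.txt dossier_de_sortie` shows what would be executed; here `urls.txt` stands for any file of your own with one URL per line.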

As I said in my intro, one of the most interesting features of yt2doc is its ability to automatically segment and chapter videos that are not already chaptered. For this you will need Ollama, a tool that allows you to run language models locally. Once Ollama is installed and configured, you can use the following command:

yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>

For example, with the model gemma2:9b:

yt2doc --video <video-url> --segment-unchaptered --llm-model gemma2:9b

This command will not only transcribe the video, but also cut it into logical chapters with AI-generated titles. This is especially useful for long videos or podcasts that don't have predefined chapters.
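The model has to be available locally before yt2doc can use it. Assuming the local runner is Ollama (the model tags above follow its naming), the sketch below pulls a model only if `ollama list` does not already show it. The two function names are hypothetical, and the parsing assumes the model name sits in the first column of `ollama list`, after a header line.

```shell
# model_is_installed MODEL (hypothetical helper)
# Succeeds if MODEL appears in the first column of `ollama list`,
# ignoring the header line. Assumes Ollama is the local model runner.
model_is_installed() {
    ollama list 2>/dev/null |
        awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# ensure_model MODEL (hypothetical helper)
# Pulls MODEL with `ollama pull` only when it is not already present.
ensure_model() {
    if model_is_installed "$1"; then
        echo "$1 already installed"
    else
        ollama pull "$1"
    fi
}
```

Running `ensure_model gemma2:9b` before the yt2doc command avoids downloading several gigabytes again when the model is already on disk.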

yt2doc is not limited to YouTube. You can also use it to transcribe podcast episodes from Apple Podcasts:

yt2doc --audio <apple-podcast-episode-url> --segment-unchaptered --llm-model <model-name>

Another interesting aspect of yt2doc is its flexibility in terms of configuration. By default, it uses faster-whisper as its transcription backend, but you can adjust various settings to optimize performance depending on your hardware:

yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <device> --whisper-compute-type <compute-type>

The options for --whisper-model, --whisper-device and --whisper-compute-type are detailed in the faster-whisper documentation.
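To give an idea of how these settings depend on the hardware, here is a rough sketch that picks a device and compute type automatically. It assumes that the presence of the `nvidia-smi` tool means a usable CUDA GPU, and that `cuda`/`float16` and `cpu`/`int8` are sensible pairs (both are values faster-whisper accepts). `pick_whisper_flags` and the `GPU_PROBE` override are hypothetical names of mine.

```shell
# pick_whisper_flags (hypothetical helper)
# Prints device and compute-type flags for this machine:
# cuda + float16 when the GPU probe command exists, otherwise cpu + int8.
# GPU_PROBE (assumption, defaults to nvidia-smi) names the command whose
# presence is taken as a sign that an NVIDIA GPU is usable.
pick_whisper_flags() {
    probe=${GPU_PROBE:-nvidia-smi}
    if command -v "$probe" >/dev/null 2>&1; then
        echo "--whisper-device cuda --whisper-compute-type float16"
    else
        echo "--whisper-device cpu --whisper-compute-type int8"
    fi
}
```

The result can be spliced into the command line with `$(pick_whisper_flags)`; treat the chosen values as a starting point and check the faster-whisper documentation for what suits your machine.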

If you're using a Mac with an Apple Silicon chip, you can take advantage of whisper.cpp for even better performance, as it leverages Apple's integrated GPU. yt2doc supports whisper.cpp as an alternative backend:

yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>
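If you share scripts between machines, you can detect Apple Silicon before deciding which backend to try. The sketch below relies only on `uname`; the function names are hypothetical, and the printed strings are hints for a human reader, not values I have checked against yt2doc's option parser (only `whisper_cpp` appears in the command above).

```shell
# is_apple_silicon (hypothetical helper)
# Succeeds only on macOS running natively on an arm64 (Apple Silicon) CPU.
# Note: a shell running under Rosetta reports x86_64 and will not match.
is_apple_silicon() {
    [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]
}

# whisper_backend_hint (hypothetical helper)
# Prints which backend this article suggests trying first on this machine.
whisper_backend_hint() {
    if is_apple_silicon; then
        echo "whisper_cpp"
    else
        echo "faster-whisper (default)"
    fi
}
```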

As mentioned previously, yt2doc uses Segment Any Text (SaT) to segment the transcription into sentences and paragraphs. You can also customize the SaT model used:

yt2doc --video <video-url> --sat-model <sat-model>

The list of available SaT models can be found in the Segment Any Text documentation.

As you can see, yt2doc is a powerful and flexible tool that can adapt to a multitude of use cases. But like any AI-based tool, yt2doc is not perfect. The quality of the transcription will always depend on the audio quality of the source, and automatically generated titles may sometimes require some manual adjustments. Still, compared to the time you save, these minor inconveniences are negligible!

Many thanks to NiKo for the info! You can follow him on Twitter @N1K0 for more exciting tech discoveries.
