Automated Transcription and Translation
Today, on a whim, I decided to study a few subjects in English, such as biology and geography. Bilibili has quite a few courses from The Great Courses, but in my experience they are heavy on terminology, and I can't fully follow them by listening alone, even with English subtitles. So I decided to add bilingual Chinese–English subtitles to the course videos before watching them. The technical plan is straightforward: use yutto to download the videos from Bilibili, ffmpeg to extract the audio, whisper.cpp to transcribe the audio into SRT subtitles (I'm running it on an Apple Silicon M3 machine), and finally the DeepSeek API to translate the SRT subtitles. ...
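To make the plan a bit more concrete, here is a minimal Python sketch of the glue around the ffmpeg and translation steps. It is an illustration under assumptions, not the exact scripts I ran: the names `ffmpeg_extract_cmd`, `parse_srt`, and `to_bilingual_srt` are made up for this example, the 16 kHz mono 16-bit WAV settings follow what the whisper.cpp README asks for as input, and `translate` is a placeholder for whatever function ends up calling the DeepSeek API. Putting the Chinese line above the English one is just one possible layout.

```python
import re
from typing import Callable, List, NamedTuple


class Cue(NamedTuple):
    index: int
    timing: str  # e.g. "00:00:01,000 --> 00:00:04,000"
    text: str    # subtitle text, joined onto a single line


def ffmpeg_extract_cmd(video: str, wav: str) -> List[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM audio.

    Assumption: this is the input format whisper.cpp expects.
    Run the result with subprocess.run(cmd, check=True).
    """
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]


def parse_srt(srt: str) -> List[Cue]:
    """Split SRT text into cues; malformed blocks are skipped."""
    srt = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", srt):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), lines[1].strip(), text))
    return cues


def to_bilingual_srt(cues: List[Cue], translate: Callable[[str], str]) -> str:
    """Render cues as SRT with the translation above the original text.

    `translate` is a hypothetical hook; in the real pipeline it would
    wrap a DeepSeek API call.
    """
    blocks = []
    for cue in cues:
        translated = translate(cue.text)
        blocks.append(f"{cue.index}\n{cue.timing}\n{translated}\n{cue.text}\n")
    return "\n".join(blocks)
```

Taking the translator as a plain callable keeps the SRT handling testable offline. It also leaves room to batch several cues per request later, which should give the model more context for terminology.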