MELON SOUR

A Walkthrough on Clipping and Subbing VTubers

Media

16/05/2024

hololive, vtubers, kdenlive, aegisub, houshou marine, oozora subaru

Like many other people, I got introduced to VTubers, specifically Hololive, during the pandemic. I can't remember exactly when it happened, but I eventually settled on Marine as my favorite member.

There's one stream featuring Marine and Subaru that I particularly liked, so on a whim I decided to take a shot at clipping and translating a segment. I went through the trouble of trying out different software and settled on a decent workflow, which I thought I could share.

The basic flow of making a clip and subbing it is as follows:

  • Use yt-dlp to download the video
  • Edit it using Kdenlive
  • Create and time the subtitles with Aegisub
  • Burn the subtitles into the video with ffmpeg

Downloading the Video

yt-dlp is a fork of youtube-dl, a command-line tool for downloading audio/video from not only YouTube but thousands of other sites. Since I'm using Windows, I installed it by simply downloading yt-dlp.exe. It's not required, but ffmpeg, an awesome open-source suite of video processing tools and libraries, is recommended for some additional functionality. yt-dlp provides patched builds of ffmpeg that are tweaked so that they work smoothly together.

Running the yt-dlp command from anywhere requires adding its location to the PATH, but I opted to place both exes in the same directory and run the commands from there.

# go to the directory with yt-dlp.exe
.\yt-dlp.exe -f "bestvideo+bestaudio" <youtube-video-url>

I wanted the best possible quality at this point since we're re-encoding down the line, but the options for the yt-dlp command were hilariously convoluted, so I ended up using the above command suggested by ChatGPT.
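If you'd rather script the download than retype it, here is a minimal Python sketch that builds the same command as an argument list. The function names and the output template default are my own placeholders, and the "/best" fallback is an addition that isn't in the command above. It assumes yt-dlp is reachable on the PATH, with ffmpeg alongside it to merge the separate video and audio streams.

```python
import subprocess


def build_download_command(url, output_template="%(title)s.%(ext)s"):
    """Return the argument list for a best-quality yt-dlp download.

    The function name and default template are illustrative only.
    """
    return [
        "yt-dlp",
        # Best video-only stream merged with the best audio-only stream;
        # "/best" falls back to the best single pre-merged file.
        "-f", "bestvideo+bestaudio/best",
        # Output filename template (yt-dlp's -o option).
        "-o", output_template,
        url,
    ]


def download(url):
    """Run the download; raises CalledProcessError if yt-dlp fails."""
    subprocess.run(build_download_command(url), check=True)
```

Passing the arguments as a list sidesteps the shell quoting differences between cmd and PowerShell, since each item reaches yt-dlp exactly as written.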

Editing the Video

I tried various free video editing software including OpenShot, DaVinci Resolve, Shotcut and Kdenlive. I settled on Kdenlive since it's open-source, lightweight and doesn't require any sort of sign-up to use. Shoutout to LosslessCut for when you only want to grab a simple segment without any edits.

Since we're adding subtitles to specific timings of the video later on, it's way easier to finish all the editing in this step. Edits like image overlays that don't change the overall length of the video can be squeezed in later but things like cuts and transition length should be set in stone.

Kdenlive has an unorthodox method of creating a crossfade between clips. In most video editing software you simply overlap two clips on top of each other, but in Kdenlive you put the clips on two separate tracks and click on the bottom corner of the top track (where a purple circle shows up) to create a wipe transition. Similarly, you drag the edges of the audio clip to insert a fade in/fade out effect.

Adding subtitles in the next step requires a video file, but it's only for referencing the timings and doesn't necessarily have to be the final file that the subtitles get merged into. Therefore it's okay to tentatively render a low-quality video. I found that the hardware-accelerated options like NVENC H265 ABR have a pretty nice speed-to-quality ratio.

Making the Subtitles

I could have added subtitles using the text feature in Kdenlive, but that results in a pretty awkward workflow of managing every line as a text asset. I decided on using Aegisub, which is software made specifically for creating and timing subtitles. It has a slight learning curve at first, but it was infinitely smoother to work with once I got used to it. Before starting I recommend the following settings.

Snapping allows you to easily set a subtitle to the start or end of another line, and 'show inactive lines' just gives you more information to work with. Also turn off spectrum analyzer mode, which is among the icons in the top right.

The subtitling process is as follows:

  • Load the video made in the previous step. This shows the audio waveform in the top right.
  • Use the top-right audio display to select a duration. You can either drag continuously or use left and right click individually to set the start and end times respectively. Space, Q, W, E and D are audio playback controls that help with getting the right timing; I recommend experimenting with them.
  • After matching the times to exactly when the line starts and ends, I like to add some lead-in (C) and lead-out (V) for easier reading.
  • Commit the current line with Enter and then start setting the duration of the next one. Instead of filling in the content of the line right away, I first go through the entire video just setting the durations for all lines. It's also easier to do this per speaker if there are multiple speakers in the video.
  • Repeat the previous three steps until all the lines for a single speaker are timed, then repeat from the start of the video for any other speakers. I use Subtitle > Sort Selected Lines to group lines by speaker first, then chronologically second.
  • Now go through your lines and add the content of your translation. For styles I used a huge sans-serif font positioned towards the middle of the screen. The vertical margins are staggered by about 200 between the speakers. You can also manually insert a line break with Shift-Enter in a more natural position than the automatic one.
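To make the timing steps a bit more concrete, here is a rough Python sketch of what padding a line with lead-in/lead-out and sorting by speaker amount to. This is not Aegisub's code: the Line class, the function names and the 200 ms / 300 ms padding values are all assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """One subtitle line; a hypothetical stand-in for a row in the grid."""
    speaker: str
    start_ms: int
    end_ms: int
    text: str = ""


def add_lead(line, lead_in_ms=200, lead_out_ms=300):
    """Start a little earlier and end a little later for readability.

    The padding defaults are assumed values, and the start is clamped
    so it never goes before the beginning of the video.
    """
    return replace(
        line,
        start_ms=max(0, line.start_ms - lead_in_ms),
        end_ms=line.end_ms + lead_out_ms,
    )


def group_by_speaker(lines):
    """Sort by speaker first, then chronologically within each speaker."""
    return sorted(lines, key=lambda line: (line.speaker, line.start_ms))
```

For example, a line timed at 1000-2500 ms becomes 800-2800 ms after padding, and a mixed list of Marine and Subaru lines comes out with all of Marine's lines in time order followed by Subaru's.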

The UI with some points of interest.

Some tips

  • Aegisub doesn't set default shortcuts for some common actions like insert new line or video controls but anything can be set in View > Options > Interface > Hotkeys. You click New under the Default tree and assign an action to your preferred hotkey; for example I added video/play/line to Ctrl-Q.
  • Subtitles that have too many words relative to their duration show up in red, but they can be beneficial in certain cases to convey a rushed tone.
  • One of the options on the top-left is Set start of selected subtitles to current video frame. This is useful to sync a line to a specific change in visuals like a zoom-in. (Technically the keyframe markers in the audio spectrum are supposed to help with this, but they weren't showing up for me by default.)
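The red highlighting comes down to a reading-speed check. Below is a small Python sketch of that idea, measured in characters per second. The function names and the threshold of 15 are assumptions on my part, and Aegisub's exact counting rules may differ (for instance in how it treats spaces and punctuation).

```python
import re

# Matches ASS override blocks such as {\i1} or {\pos(960,540)}.
_OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def visible_text(ass_text):
    """Drop override blocks and turn ASS line breaks into spaces."""
    text = _OVERRIDE_BLOCK.sub("", ass_text)
    return text.replace(r"\N", " ").replace(r"\n", " ")


def chars_per_second(ass_text, start_ms, end_ms):
    """Visible characters divided by the line's duration in seconds."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("line must have a positive duration")
    return len(visible_text(ass_text)) / duration_s


def is_rushed(ass_text, start_ms, end_ms, threshold=15.0):
    """True when the line exceeds the (assumed) reading-speed threshold."""
    return chars_per_second(ass_text, start_ms, end_ms) > threshold
```

A quick pass like this over a finished script can list the lines that would be flagged, so you can decide which ones are intentionally rushed and which just need more time or fewer words.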

Hardsubbing the Video

There are two prominent subtitle formats that Aegisub can export to. .srt is the barebones format, which simply specifies the text and timings. .ass is the other one, which includes styling properties like color, font and position. Obviously I wanted to use the latter, but it turns out a lot of video editing software doesn't support importing this format directly.
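To show the difference between the two formats, here is a short Python sketch that writes the same line as an .srt cue and as an .ass Dialogue event. The helper names are made up, and the "Marine" style is a hypothetical style that would have to be defined in the script's styles section.

```python
def srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def ass_time(ms):
    """Format milliseconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = ms // 10
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def srt_cue(index, start_ms, end_ms, text):
    """An SRT cue: a counter, a timing line and the text."""
    return f"{index}\n{srt_time(start_ms)} --> {srt_time(end_ms)}\n{text}\n"


def ass_dialogue(start_ms, end_ms, text, style="Default", name="", margin_v=0):
    """An ASS event: Layer, Start, End, Style, Name, MarginL, MarginR,
    MarginV, Effect, Text. A margin of 0 means "use the style's margin"."""
    return (
        f"Dialogue: 0,{ass_time(start_ms)},{ass_time(end_ms)},"
        f"{style},{name},0,0,{margin_v},,{text}"
    )
```

The .srt cue only carries timing and text, while the .ass event also points at a named style and can override the vertical margin per line, which is what makes things like staggering two speakers possible.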

Theoretically Kdenlive does support .ass imports, but for some reason I ran into a pretty consistent crash when doing so. I looked on the forums but couldn't find any resolution. It sucks, but I guess that just comes with open-source development. As a workaround I decided to use the aforementioned ffmpeg, which is awesome and supports burning in subtitles. If you used a temporary low-quality video for subtitling, now's the time to render a good-quality video file; I used the Ultra High Definition MP4-H265 setting.

.\ffmpeg.exe -i <the-video-file.mp4> -vf "ass=<the-subtitle-file.ass>" -c:v libx264 -crf 18 -c:a copy <the-output-file.mp4>

The above command hardsubs using the libx264 encoder with a CRF value of 18. Apparently the valid range is 0-51, with a lower number meaning higher quality and 18 considered to be 'visually lossless'.
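For repeated renders, the same command can be assembled in Python with a sanity check on the CRF value. This is only a sketch: the function name and file names are placeholders, and it assumes ffmpeg is on the PATH and that the subtitle file sits in the working directory, since paths containing characters like ":" or "\" need extra escaping inside the filter string.

```python
import subprocess


def build_hardsub_command(video, subtitles, output, crf=18):
    """Return the ffmpeg argument list for burning in an .ass file.

    Lower CRF means higher quality; values outside 0-51 are rejected.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"ass={subtitles}",  # render the subtitles onto each frame
        "-c:v", "libx264",          # re-encode the video with x264
        "-crf", str(crf),
        "-c:a", "copy",             # keep the audio stream untouched
        output,
    ]


def hardsub(video, subtitles, output, crf=18):
    """Run the render; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_hardsub_command(video, subtitles, output, crf), check=True)
```

Something like hardsub("clip.mp4", "clip.ass", "clip-subbed.mp4") would then redo the burn-in after every subtitle tweak without retyping the command.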

Finished Video

I've uploaded the completed video here. All in all it took around 10 to 20 hours, with the most time spent on translation, followed by the trial and error of trying and learning new tools. I'd say the tedious part is rewatching the same segment to the point where the jokes and content start to become uninteresting, so I have a newfound appreciation for the passion of people who consistently upload many clips.