Media
16/05/2024
hololive, vtubers, kdenlive, aegisub, houshou marine, oozora subaru
Like many other people, I got introduced to VTubers, specifically Hololive, during the pandemic. I can't remember exactly when it happened, but I eventually settled on Marine as my favorite member.
There's this one stream featuring Marine and Subaru which I particularly liked, so on a whim I thought of taking a shot at clipping + translating a segment. I went through the trouble of trying out different software and settled on a decent workflow which I thought I could share.
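The basic flow of making a clip and subbing it is as follows: download the stream with yt-dlp, cut and edit the clip in Kdenlive, time and write the subtitles in Aegisub, then burn them into the final video with ffmpeg.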
yt-dlp is a fork of youtube-dl, a command-line tool to download audio/video from not only YouTube but thousands of other sites. Since I'm using Windows I installed it by simply downloading the yt-dlp.exe. It's not required, but ffmpeg, an awesome open-source suite of video processing libraries, is recommended for some additional functionality. yt-dlp provides patched builds of ffmpeg that are tweaked so that they work smoothly together.
Using the yt-dlp command anywhere requires adding it to the PATH, but I opted to place both exes in the same directory and execute the commands from there.
From the directory containing yt-dlp.exe:
.\yt-dlp.exe -f "bestvideo+bestaudio" <youtube-video-url>
I wanted the best possible quality at this point since we're re-encoding down the line, but the options for the yt-dlp command were hilariously convoluted, so I ended up using the above command suggested by ChatGPT.
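If you end up clipping more than once, the download step can be wrapped in a small script. The following is only a minimal sketch in Python, assuming yt-dlp is reachable under the name passed as exe (either on the PATH or as a path to yt-dlp.exe); the function names build_download_command and download are made up for illustration and are not part of yt-dlp.

```python
import subprocess


def build_download_command(url, exe="yt-dlp", fmt="bestvideo+bestaudio"):
    """Return the argument list for the download command shown above.

    `exe` is an assumption: adjust it to wherever yt-dlp.exe actually lives.
    """
    # Passing a list (instead of one string) sidesteps shell quoting
    # problems around the format selector.
    return [exe, "-f", fmt, url]


def download(url, exe="yt-dlp"):
    """Run yt-dlp and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(url, exe), check=True)
```

Calling download("<youtube-video-url>") with a real URL then behaves like typing the command by hand.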
I tried various free video editing software including OpenShot, DaVinci Resolve, Shotcut and Kdenlive. I settled on Kdenlive since it's open-source, lightweight and doesn't require any sort of sign-up to use. Shoutout to LosslessCut for when you only want to grab a simple segment without any edits.
Since we're adding subtitles to specific timings of the video later on, it's way easier to finish all the editing in this step. Edits like image overlays that don't change the overall length of the video can be squeezed in later but things like cuts and transition length should be set in stone.
Kdenlive has an unorthodox method of creating a crossfade between clips. In most video editing software you simply overlap two clips on top of each other, but in Kdenlive you put the clips on two separate tracks and click on the bottom corner of the clip on the top track (where a purple circle shows up) to create a wipe transition. Similarly, you drag the edges of an audio clip to insert a fade in/fade out effect.
Adding subtitles in the next step requires a video file, but it's only for referencing the timings and doesn't necessarily have to be the final file which the subtitles get merged into. Therefore it's okay to tentatively render a low-quality video. I found that the hardware-accelerated options like NVENC H265 ABR have a pretty nice speed-to-quality ratio.
I could have added subtitles using the text feature in Kdenlive, but that results in a pretty awkward workflow of managing every line as a text asset. I decided on using Aegisub, which is software made specifically for timing and creating subtitles. It has a slight learning bump at first, but it was infinitely smoother to work with once I got used to it. Before starting I recommend the following settings.
Snapping allows you to easily set a subtitle to the start or end of another line, and 'show inactive lines' just gives you more information to work with. Also turn off spectrum analyzer mode, situated within the top right icons.
The subtitling process is as follows:
Space Q W E D are audio controls that help with getting the right timing; I recommend experimenting with them.
C adds lead in and V adds lead out for easier reading.
Commit the line with Enter and then start setting the duration of the next one. Instead of filling in the content of each line right away, I first go through the entire video just setting the durations for all lines. Additionally, it's easier to do this per speaker if there are multiple speakers in the video.
Use Subtitle > Sort Selected Lines to group lines by speaker first, then chronologically second.
Use Shift-Enter to break a line in a more natural position than the automatic one.
The UI with some points of interest.
Hotkeys for actions like insert new line or the video controls, or anything else, can be set in View > Options > Interface > Hotkeys. You click New under the Default tree and assign an action to your preferred hotkey; for example I added video/play/line to Ctrl-Q.
Set start of selected subtitles to current video frame is useful to sync a line to a specific change in visuals like a zoom-in. (Technically the keyframe visualizer in the audio spectrum is supposed to help with this, but the keyframes weren't showing up for me by default.)
There are two prominent formats for subtitles that Aegisub can export to. .srt is the barebones format which simply specifies the text and timings. .ass is the other one, which includes styling properties like color, font and position. Obviously I wanted to use the latter, but it turns out a lot of video editing software doesn't support directly importing this format.
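To make the difference concrete, here is a rough sketch in Python that turns a single .ass Dialogue line into an .srt cue. It assumes the standard .ass event field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text); the helper names and the sample line are made up for illustration. Note how the style name is simply dropped, and that the sketch does not strip override tags or convert \N line breaks.

```python
def ass_to_srt_time(ass_time):
    """Convert an .ass timestamp (H:MM:SS.cc) to an .srt one (HH:MM:SS,mmm)."""
    hours, minutes, seconds = ass_time.split(":")
    whole, centis = seconds.split(".")
    # .ass stores centiseconds, .srt stores milliseconds.
    millis = int(centis) * 10
    return f"{int(hours):02d}:{int(minutes):02d}:{int(whole):02d},{millis:03d}"


def dialogue_to_srt(index, dialogue_line):
    """Build one .srt cue from one .ass 'Dialogue:' line."""
    # Split at most 9 times so commas inside the subtitle text survive.
    fields = dialogue_line.split(":", 1)[1].strip().split(",", 9)
    start, end, text = fields[1], fields[2], fields[9]
    return f"{index}\n{ass_to_srt_time(start)} --> {ass_to_srt_time(end)}\n{text}\n"


# Hypothetical sample line; "Marine" is the style name, which .srt cannot keep.
sample = "Dialogue: 0,0:00:01.00,0:00:03.50,Marine,,0,0,0,,Ahoy, everyone!"
print(dialogue_to_srt(1, sample))
```

Everything that makes .ass attractive (the style, margins and effect fields) has no place to go in the .srt cue, which is why I wanted to keep the .ass file all the way to the final render.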
Theoretically Kdenlive does support .ass imports, but for some reason I ran into a pretty consistent crash when doing so. I looked on the forums but couldn't find any resolution. It sucks, but I guess that just comes with open-source development. As a workaround I decided to use the aforementioned ffmpeg, which is awesome and supports burning in subtitles. If you used a temporary low-quality video for subtitling, now's the time to render a good-quality video file; I used the Ultra High Definition MP4-H265 setting.
.\ffmpeg.exe -i <the-video-file.mp4> -vf "ass=<the-subtitle-file.ass>" -c:v libx264 -crf 18 -c:a copy <the-output-file.avi>
The above command hardsubs using the libx264 encoder with a crf value of 18. Apparently the valid range is 0-51, with a lower number meaning higher quality and 18 considered to be 'visually lossless'.
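The same command can be assembled from a script, which also makes it easy to guard against a crf value outside that range. This is just a sketch in Python, assuming ffmpeg is reachable under the name passed as exe and that the subtitle file is a plain relative filename sitting next to the video (paths containing colons, commas or backslashes need extra escaping inside the filter string, which this does not handle). The function name build_hardsub_command is hypothetical.

```python
import subprocess


def build_hardsub_command(video, subtitles, output, crf=18, exe="ffmpeg"):
    """Return the ffmpeg argument list for burning .ass subtitles into a video."""
    # 0-51 is the crf range quoted above for libx264; lower means higher quality.
    if not 0 <= crf <= 51:
        raise ValueError(f"crf must be between 0 and 51, got {crf}")
    return [
        exe,
        "-i", video,
        "-vf", f"ass={subtitles}",  # assumes a simple relative filename
        "-c:v", "libx264",
        "-crf", str(crf),
        "-c:a", "copy",             # audio is copied without re-encoding
        output,
    ]


def hardsub(video, subtitles, output, crf=18, exe="ffmpeg"):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_hardsub_command(video, subtitles, output, crf, exe), check=True)
```

Raising crf a few points is a quick way to get a smaller file if the upload doesn't need to be near lossless.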
I've uploaded the completed video here. All in all it took around 10 to 20 hours, with the most time spent on translation and, after that, the trial and error of trying and learning new tools. I'd say the tedious part is rewatching the same segment to the point where the jokes and content start to become uninteresting, so I have a newfound appreciation for the passion of people who consistently upload many clips.