YouTube automatic captions to .srt subtitle format


If you know how to download a video from YouTube, you may also want to download its automatic captions (in English) to use as subtitles. Automatic captions are not the same as “closed captions”. Closed captions can be downloaded with a userscript such as Download YouTube Captions, which saves them in the .srt subtitle format.

Automatic captions, however, are different. They are generated by YouTube using speech recognition, so they are not very accurate. Still, I personally find them somewhat useful, so I have written a script that solves the problem semi-manually. It is semi-manual because the caption text has to be prepared by hand; I did not spend the time to write a userscript for that part.

To convert the automatic captions into .srt:

  1. Go to the YouTube page of the video you are interested in.
  2. Click the “Transcript” icon, which is beside “About”, “Share”, and “Add to”. This opens a frame labelled English (automatic captions). We are going to copy all the text in that frame.
  3. Right-click inside the frame and open the web browser’s Inspector. Select the parent HTML element of the captions, right-click it, choose “Copy Inner HTML”, and paste the result into a plain text file saved with an .html extension.
  4. Open that HTML file in the web browser. Each caption is shown as two lines: the time, then the subtitle text. Copy this text into another plain text file.
  5. Finally, use the Perl script below to convert the plain text file.
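#!/usr/bin/perl
# Download the auto-generated captions (English) from the internet and save them as text.
# This script then converts that text into the .srt format.
# The input alternates between a time line (m:ss or h:mm:ss) and a subtitle line.
# Usage: perl <this script> transcript.txt > subtitles.srt
use strict;
use warnings;

my $file = $ARGV[0];
die "Usage: $0 transcript.txt\n" unless defined $file;

my (@time, @subtitles);    # start times in seconds, and subtitle texts

open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
while (my $line = <$fh>) {
    $line =~ s/^\s+|\s+$//g;    # trim whitespace and line endings
    next unless length $line;
    if ($line =~ /^(?:(\d+):)?(\d+):(\d+)/) {    # Updated (thanks to Daniel)
        push @time, ($1 || 0) * 3600 + $2 * 60 + $3;
    }
    else {
        push @subtitles, $line;
    }
}
close($fh);

die "No subtitles found in $file\n" unless @subtitles;
die "Fewer time lines than subtitle lines in $file\n" if @time < @subtitles;

# Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm.
sub srt_time {
    my ($s) = @_;
    return sprintf("%02d:%02d:%02d,000", int($s / 3600), int($s / 60) % 60, $s % 60);
}

for my $i (0 .. $#subtitles) {
    # Each subtitle ends when the next one starts; the last one is shown for 5 seconds.
    my $end = $i < $#subtitles ? $time[$i + 1] : $time[$i] + 5;
    # An SRT entry is: entry number, time range, text, blank line.
    printf "%d\n%s --> %s\n%s\n\n", $i + 1, srt_time($time[$i]), srt_time($end), $subtitles[$i];
}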

Steps 1-4 are done manually. They could be automated with JavaScript (a userscript), but writing one would be too time-consuming for me.
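
Short of a full userscript, step 4 could perhaps be scripted as well. The following is only a rough sketch, not part of the workflow above. It assumes that the HTML saved in step 3 puts each time and each caption in its own element, so that replacing every tag with a line break leaves one item per line; the real markup of the transcript frame may differ. The name `html_to_lines` and the sample markup are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: turn the HTML saved in step 3 into plain text, one item per line.
# Assumption: each time and each caption sits in its own HTML element.
use strict;
use warnings;

# Replace every tag with a line break, decode a few common entities,
# and return the trimmed, non-empty lines.
sub html_to_lines {
    my ($html) = @_;
    $html =~ s/<[^>]*>/\n/g;    # tags become line breaks
    my %entity = (amp => '&', lt => '<', gt => '>', quot => '"', '#39' => "'", nbsp => ' ');
    $html =~ s/&(#39|amp|lt|gt|quot|nbsp);/$entity{$1}/g;
    my @lines;
    for my $line (split /\n/, $html) {
        $line =~ s/^\s+|\s+$//g;
        push @lines, $line if length $line;
    }
    return @lines;
}

my $html;
if (@ARGV) {
    # Read the whole HTML file given on the command line.
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    local $/;    # slurp mode
    $html = <$fh>;
    close($fh);
}
else {
    # Made-up sample markup, used when no file is given.
    $html = '<div><span>0:00</span><span>hello &amp; welcome</span></div>'
          . '<div><span>0:04</span><span>to the show</span></div>';
}
print "$_\n" for html_to_lines($html);
```

Run it with the saved HTML file as its argument and redirect the output to a plain text file; if the assumption about the markup holds, that file can be passed to the conversion script in place of the text copied by hand in step 4.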
