
How to extract a zip file containing SHIFT_JIS-encoded filenames in Ubuntu


If a zip file contains Japanese filenames, they are normally encoded in SHIFT_JIS, especially if the archive was created on Windows. The normal “unzip” will not work for these files, but 7z is a good solution.

The following commands are run in a terminal. First, we need to change the LANG environment variable, because the default LANG normally uses UTF-8. Since the filenames are in SHIFT_JIS rather than UTF-8, we need to change it.

LANG=ja_JP # Don't use UTF-8, use "export" if needed
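As one of the comments below points out, this may only take effect if a matching locale is actually installed. A minimal sketch for checking that, assuming a system where `locale -a` is available; the helper name `have_locale` is made up for illustration:

```shell
# have_locale NAME: succeed if an installed locale name starts with NAME.
# (hypothetical helper, not part of any tool)
have_locale() {
    locale -a 2>/dev/null | grep -i "^$1" >/dev/null
}

if have_locale ja_JP; then
    echo "ja_JP locale found"
else
    echo "no ja_JP locale found; LANG=ja_JP may have no effect"
fi
```

If no ja_JP locale is listed, it probably has to be generated first before the LANG setting does anything.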

Then extract the archive:

7z x jp.zip    #extract the files and preserve the encoding

As a result, files with unreadable names are extracted. Then use the convmv command to convert the filenames, assuming all the files are in the same folder.

convmv --notest -f shift-jis -t utf8 *.* # convert all the filenames to UTF-8
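Because --notest renames the files immediately, it can be safer to preview first: without --notest, convmv only prints what it would do. The sketch below wraps both modes and adds -r so that subdirectories are covered too, as suggested in the comments. The function name `fix_sjis_names` and its arguments are assumptions for illustration, not part of convmv.

```shell
# fix_sjis_names DIR [extra convmv options]
# Convert Shift_JIS filenames under DIR to UTF-8, recursing into subfolders.
# Without --notest this is a dry run that only lists the planned renames.
fix_sjis_names() {
    dir=${1:-.}
    [ $# -gt 0 ] && shift
    convmv -f shift-jis -t utf8 -r "$@" "$dir"
}

# Usage:
#   fix_sjis_names .            # preview only
#   fix_sjis_names . --notest   # actually rename
```

Passing the directory instead of a `*.*` glob also picks up files that have no extension.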

If we don’t know the character encoding of the filenames, we can also use iconv to check it before extracting them:

env LANG=ja_JP 7z l jp.zip | iconv -f SHIFT-JIS -t UTF8
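To see what the iconv step does to the bytes, here is a small sketch that does not need an archive. It feeds the Shift_JIS bytes for 日本語 (written as octal escapes) through the same conversion. The function name `sjis_to_utf8` is made up for illustration, and the sketch assumes an iconv build that knows the SHIFT-JIS charset.

```shell
# sjis_to_utf8: read Shift_JIS bytes on stdin, write UTF-8 on stdout.
sjis_to_utf8() {
    iconv -f SHIFT-JIS -t UTF-8
}

# 0x93FA, 0x967B and 0x8CEA are the Shift_JIS codes for 日, 本 and 語.
printf '\223\372\226\173\214\352\n' | sjis_to_utf8
```

On a UTF-8 terminal this should print 日本語. In the same way, if the archive listing comes out as readable Japanese, the guessed encoding was probably right; if iconv complains about the input or the names still look wrong, another source charset may be worth trying.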

Update (2011-05-16):
The easiest way is to use 7-Zip through Wine and install all the required fonts through winetricks.


About Allen Choong

A cognitive science student, a programmer, a philosopher, a Catholic.

15 responses »

  1. Please fix your article:

    “7z a jp.zip” doesn’t extract. Use the command “7z e” to extract.

  2. “7z e” extracts without preserving paths, “7z x” extracts while preserving paths

    The same goes for convmv: you need the “-r” switch for it to work recursively through all directories. Using “*” instead of “*.*” is also a good idea; unlike on Windows, “*.*” doesn’t match files without an extension…

  3. Thank you very much for this! Broken encoding was driving me totally insane!

  4. unzip -O CP932 japanese_sjis.zip

    • This is Arch only. It can be compiled for other Linux distros, but Info-ZIP is a real pain in the ass (I did it, though). It’s the best option here.

  5. Allemcch, thank you so much for posting this. I needed some Shift-JIS-encoded documents for work, and with your help I was able to read the filenames properly.

  6. it works like a charm!

  7. Pingback: Extracting files from zip which contains non-UTF8 filename in Linux | Allen's Blog 2.0

  8. Though this solution is excellent (including the ones in the comments with the sneaky -O option in unzip, neat!), I guess it will only work if you have this locale installed and/or created with locale.gen.

    Here, I have a Lubuntu installation which, for instance, does not even provide an ISO-8859-15 encoding by default: everything is UTF-8 only. Wait, how do I know that?
    By typing

    locale -a

  9. Hi Allen, Could you please give me the full command line using this statement?
    env LANG=ja_JP 7z l jp.zip | iconv -f SHIFT-JIS -t UTF8. You mention in your article that you can detect the encoding prior to extracting the file using 7-zip. I tried using 7za env LANG=ja_JP 7z l [file name].zip | iconv -f SHIFT-JIS -t UTF8 but that didn’t work. Thanks!

  10. Hey, have you checked out The Unarchiver? It’s the only utility I had zero problems with when extracting Japanese archives. Just works.

