I have an HP 1 TB external HDD. It is dying. Sh*t!
So I made one last struggle with it. It did not work 100% as planned, but I still ended up with around 320 GB of usable space.
First, I found that some files could not be deleted, and I had no choice but to pull out the USB cable. Since it is an NTFS partition, I tried chkdsk on Windows, but chkdsk stalled partway and stopped responding. Then I knew these were bad signs. So I had to stop using the drive, get a new HDD, and move whatever files could still be moved.
Backup like sh*t
I used Linux instead of Windows for this, because copying the files out with Windows was damn slow. While copying from this pitiful HDD to the new one, the copy would stop without warning and the HDD would stop working. Worse, there was no way to cancel the operation; the only solution was to unplug it. I repeated this procedure approximately 60-70 times (a rough guess only).
There is one thing I have to mention: S.M.A.R.T. I had enabled it on the external HDD, but it did not show anything useful that could SAVE my data.
After copying out the primary data and giving up on some secondary data, I decided to re-format the disk as NTFS. Though I really dislike Windows, NTFS is still the mainstream. NTFS and FAT32 are widely supported by devices, for instance, an LG video player. I also considered exFAT, but it is not as well supported on Linux as NTFS.
So I booted into Windows, plugged in the USB cable, and formatted the disk without “Quick Format”, because I wanted a thorough “chkdsk”-style scan for bad sectors.
Unfortunately, it was DAMN slow. Going from 0% to 1% took about 30 minutes. How am I going to live the rest of my life?
Then I cancelled the format and gave up on NTFS.
Since there was no more hope for NTFS, I planned to format as ext4. I ran mkfs.ext4 with “-c” to check for bad blocks (something like bad sectors). But it failed. The hard disk stopped working until I re-plugged the USB cable. I checked dmesg and found a lot of errors like
usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
usb 2-2: LPM exit latency is zeroed, disabling LPM.
blk_update_request: I/O error, dev sdb, sector 721688448
I concluded that the problem was not just a few bad sectors/blocks: the drive simply failed to read those blocks and then stopped responding.
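To get a first idea of where the trouble is, the failing sector numbers can be pulled out of the dmesg text. This is only a minimal sketch: the function name and the `device` parameter are my own, and it assumes the kernel lines contain "I/O error, dev <name>, sector <number>" as in the sample above.

```python
import re

# Matches kernel lines such as:
#   blk_update_request: I/O error, dev sdb, sector 721688448
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")

def failing_sectors(dmesg_text, device="sdb"):
    """Return the sorted, de-duplicated sector numbers reported for `device`."""
    sectors = set()
    for match in IO_ERROR.finditer(dmesg_text):
        if match.group(1) == device:
            sectors.add(int(match.group(2)))
    return sorted(sectors)

# The three lines quoted in this post, used as sample input.
sample = """usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
usb 2-2: LPM exit latency is zeroed, disabling LPM.
blk_update_request: I/O error, dev sdb, sector 721688448"""

print(failing_sectors(sample))
```

In practice the text would come from saving the dmesg output to a file and reading it in. Note that these are sector numbers, not block numbers.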
So I assumed that mkfs.ext4 with “-c”, or even fsck.ext4, would not solve my problem. Those bad sectors should never be accessed. So I decided to give up those sectors, meaning, leave them out of the partitions. This can be done with fdisk or cfdisk, by creating the partitions based on “sector” units instead of size. We can create a partition that occupies the bad-sector area, then create the next partition after it, and then delete the partition that contains the bad sectors.
Then, the next question is how to find the bad sectors. mkfs.ext4 and fsck.ext4 cannot help, because they check for bad sectors thoroughly across the whole partition. The solution is to use “badblocks”, which can be pointed at any range of blocks.
There are two things you have to know about: i) sectors and ii) blocks. They are different units. The badblocks command reports bad blocks, not bad sectors, while fdisk works in sectors and tells us the total number of sectors.
Now, we can do some maths here. Let’s say you run “fdisk -l” and get this,
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
The total number of the sectors of the hard disk is 1953525168.
Now, to get the total number of blocks, use “cat /proc/partitions”. You will get something like this,
major minor  #blocks  name
   8     0  976762584 sdb
So, that is the total number of blocks, 976762584. Doing the calculation, 976762584 * 2 = 1953525168, so 1 block = 2 sectors. (These blocks are 1024 bytes each, and the sectors reported by fdisk are 512 bytes each.)
This is important information, because badblocks reports its values in block units, but when we create partitions we work in sector units.
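The conversion can be written down as a small sketch. The function names are my own; the sizes are the ones from this disk (512-byte sectors, from dividing the byte count by the sector count in the fdisk output, and 1024-byte blocks, which is also the default block size of badblocks). A different disk or a different “-b” value would need different constants.

```python
SECTOR_SIZE = 512   # bytes per sector: 1000204886016 / 1953525168 from fdisk -l
BLOCK_SIZE = 1024   # bytes per block: /proc/partitions and the badblocks default

def block_to_sectors(block):
    """Return (first_sector, last_sector) covered by one block."""
    per_block = BLOCK_SIZE // SECTOR_SIZE
    first = block * per_block
    return first, first + per_block - 1

def sector_to_block(sector):
    """Return the block that contains the given sector."""
    return sector * SECTOR_SIZE // BLOCK_SIZE

# The last block of the disk maps to the last two sectors.
print(block_to_sectors(976762583))
# The sector from the dmesg error above, expressed as a block number.
print(sector_to_block(721688448))
```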
So, when we run badblocks like this,
badblocks -v -s -o bad.txt /dev/sdb
It will show something like
Checking blocks 0 to 976762583
which runs from the first block (0) to the last block (976762584 - 1).
Now, in case we stop/interrupt badblocks, we can continue from anywhere we want. Or we can simply start from any block. For instance,
badblocks -v -s -o bad.txt /dev/sdb 976762583 488381292
where 976762583 is the last block we want to check and 488381292 is the start block. Note that the last block comes first on the command line. (Please read the manual in detail.)
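Since the disk kept hanging, it can help to scan it in chunks, so results from different runs stay in separate files. Here is a rough sketch that only prints the commands and runs nothing. The helper names, the chunk size, and the per-chunk output file names are my own assumptions, not something I actually used.

```python
def badblocks_command(device, first_block, last_block, output_file):
    """Build the badblocks argument list for one block range."""
    # badblocks expects the device, then the LAST block, then the FIRST block.
    return ["badblocks", "-v", "-s", "-o", output_file,
            device, str(last_block), str(first_block)]

def chunk_ranges(total_blocks, chunk_blocks):
    """Split blocks 0..total_blocks-1 into inclusive (first, last) ranges."""
    return [(first, min(first + chunk_blocks, total_blocks) - 1)
            for first in range(0, total_blocks, chunk_blocks)]

# Print one command per chunk instead of running it, so nothing touches the disk.
for first, last in chunk_ranges(976762584, 100_000_000):
    cmd = badblocks_command("/dev/sdb", first, last, f"bad-{first}.txt")
    print(" ".join(cmd))
```

If the disk hangs in the middle of a chunk, only that chunk has to be looked at again after re-plugging.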
So, based on these tools and this information, I finally found that there is around 320 GB of contiguous safe space. I created a partition for it and formatted it as ext4. Since it is a dying hard disk, I will not use it to store primary data, only secondary data and secondary backups. (Secondary data is unimportant; a secondary backup is a duplicate backup, not the one and only copy.)
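Finding the safe space from the badblocks output can be sketched like this. It assumes the output file has one bad block number per line and that it is complete; on a disk like mine, any area where the scan hung has to be added to the list by hand. The function names, the made-up sample numbers, and the 2048-sector alignment for the partition start are my own choices for illustration.

```python
def parse_bad_blocks(text):
    """Parse badblocks -o output: one block number per line."""
    return [int(item) for item in text.split() if item.isdigit()]

def good_ranges(bad_blocks, total_blocks):
    """Yield inclusive (first, last) block ranges that contain no bad block."""
    start = 0
    for bad in sorted(set(bad_blocks)):
        if bad > start:
            yield start, bad - 1
        start = bad + 1
    if start < total_blocks:
        yield start, total_blocks - 1

def largest_good_range(bad_blocks, total_blocks):
    """Return the longest good range, or None if there is none."""
    return max(good_ranges(bad_blocks, total_blocks),
               key=lambda r: r[1] - r[0], default=None)

def partition_sectors(first_block, last_block, sectors_per_block=2, align=2048):
    """Convert a block range to (first_sector, last_sector) for fdisk."""
    first_sector = first_block * sectors_per_block
    first_sector = -(-first_sector // align) * align  # round up to alignment
    last_sector = (last_block + 1) * sectors_per_block - 1
    return first_sector, last_sector

# Made-up example: a 10000-block disk with three bad blocks.
bad = parse_bad_blocks("1000\n1001\n5000\n")
first, last = largest_good_range(bad, total_blocks=10000)
print(first, last, partition_sectors(first, last))
```

The two sector numbers are what would be typed into fdisk as the first and last sector of the partition.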
Actual plan for the next stage
Unfortunately, because the hard disk has a critical failure, I could not implement the next stage.
I had expected that the hard disk might have multiple large, contiguous, safe regions. For example, 0-25% from the beginning is safe, and 60-100% is safe. In that case, I could create 2 partitions for these regions. This is what I actually wanted. If this had really happened, I could have used LVM to combine both partitions into one logical volume, and finally run mkfs on it.
But since it did not happen, I could not test LVM.
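For the record, the LVM idea would look roughly like the sketch below. It is untested on this disk, it only prints the commands, and the partition names (/dev/sdb1, /dev/sdb2) and the volume group and logical volume names are hypothetical.

```python
def lvm_commands(partitions, vg_name="rescue_vg", lv_name="rescue_lv"):
    """Build the commands to join partitions into one ext4 logical volume."""
    return [
        ["pvcreate", *partitions],                          # mark as physical volumes
        ["vgcreate", vg_name, *partitions],                 # group them together
        ["lvcreate", "-l", "100%FREE", "-n", lv_name, vg_name],  # one LV using all space
        ["mkfs.ext4", f"/dev/{vg_name}/{lv_name}"],         # file system on the LV
    ]

# Print only; nothing is executed here.
for cmd in lvm_commands(["/dev/sdb1", "/dev/sdb2"]):
    print(" ".join(cmd))
```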