I have around 15,000 image, video, and audio files, along with a few others, which I have been transferring back and forth over FTP from my phone. Unfortunately, I discovered that some of the files were corrupted after so many transfers. This is not a serious problem in itself, since all of the media is backed up on Google Photos or other storage.
The problem is that the local and backup sets of files are not identical, so comparing checksums between them would be difficult. The most practical solution seems to be to use a scanner to identify the corrupted files and then manually download clean copies of them.
I have searched extensively but have not been able to find a suitable tool. The closest I came was a Python script on GitHub called “check-media-integrity,” but I was unable to get it to work on Windows 10.
Any suggestions would be greatly appreciated.
Thank you.
3 Answers
I have successfully resolved all the issues I was facing, and here is what worked for me:
To use “check-media-integrity,” I had to follow several steps:
- First, I had to use the fork by garygan89, available on GitHub.
- Second, I installed the required modules under both Python 2 and Python 3. The original version was written for Python 2 and the fork is a conversion to Python 3, so installing the requirements for both versions avoided any issues.
- Third, I installed Pillow-SIMD from the prebuilt Python wheels available at https://www.lfd.uci.edu. This was the biggest obstacle, because compiling it from source on Windows was difficult.
- Fourth, I had to install the ffmpeg-python module. It is not mentioned in the project’s readme.md, but it is listed in a separate requirements file.
- Fifth, I had to specify all of the ffmpeg-related switches to broaden the tool’s output.
- Sixth, I experimented with different values for optional parameters such as timeout and threads. The values that worked best for me were around 75 threads and a timeout of at least 120 seconds (400 seconds when dealing with big files).
- Lastly, I used “WizTree” and “robocopy” to run the tool on smaller groups of files. WizTree helped me identify which file extensions were present in each folder, and robocopy moved the files that had been checked into a “passed” folder.
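For anyone who wants to see the idea behind the timeout, thread, and “passed” folder steps, here is a rough sketch in Python. It is not how “check-media-integrity” works internally. It assumes `ffmpeg` is on the PATH and uses the common decode-to-null invocation (`-v error -i FILE -f null -`). The function names, the default of 8 threads, and the folder layout are my own placeholders.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ffmpeg_command(path):
    """Build a decode-only ffmpeg call: log errors only and discard the output."""
    return ["ffmpeg", "-v", "error", "-i", str(path), "-f", "null", "-"]


def check_file(path, build_command=ffmpeg_command, timeout=120):
    """Return 'ok', 'suspect', or 'timeout' for a single file."""
    try:
        result = subprocess.run(
            build_command(path),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    # ffmpeg can exit with code 0 even after decode errors,
    # so anything written to stderr is treated as a warning sign too.
    if result.returncode != 0 or result.stderr.strip():
        return "suspect"
    return "ok"


def scan_folder(folder, passed_dir, build_command=ffmpeg_command,
                timeout=120, threads=8):
    """Check every file directly inside `folder`; move clean ones to `passed_dir`."""
    folder, passed_dir = Path(folder), Path(passed_dir)
    passed_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=threads) as pool:
        statuses = list(
            pool.map(lambda p: check_file(p, build_command, timeout), files)
        )
    report = {}
    for path, status in zip(files, statuses):
        report[path.name] = status
        if status == "ok":
            shutil.move(str(path), str(passed_dir / path.name))
    return report
```

A call might look like `scan_folder("D:/media/batch1", "D:/media/passed", timeout=400)` (hypothetical paths). Files reported as `suspect` or `timeout` stay where they are, so they can be rechecked with a longer timeout or replaced from the backup.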
For PDF files, “check-media-integrity” did not give me the results I wanted, so I used “PDFtk” instead. It can load and read a few dozen files at a time and displays an error message if any of them is broken.
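As a quick pre-filter before a tool like PDFtk, a truncated PDF can often be spotted without any third-party library. The sketch below is not what PDFtk does and is my own assumption about what is useful here. It only checks that the `%PDF-` header appears near the start and the `%%EOF` marker appears near the end, which tends to catch files cut short during transfer but not damage in the middle of the file. The function name and the 1024-byte windows are placeholders.

```python
from pathlib import Path


def pdf_looks_complete(path, window=1024):
    """Cheap structural check: header near the start, %%EOF near the end.

    Passing this check does not prove the PDF is valid; failing it is a
    strong hint that the file is truncated or is not a PDF at all.
    """
    data = Path(path).read_bytes()
    has_header = b"%PDF-" in data[:window]
    has_eof_marker = b"%%EOF" in data[-window:]
    return has_header and has_eof_marker
```

Files that fail this check are good candidates for replacing from the backup first. Files that pass still deserve a real parse with PDFtk or a similar tool.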
There were also a few files without any extension, which I opened in “Notepad++” to guess their type from the headers and file size. I found some .mp4, .pdf, .pgn, .png, .jpeg, and .apk files, and almost all of them worked fine after being renamed properly.
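The header guessing can also be scripted. This is a minimal sketch that compares the first bytes of a file against a handful of well-known signatures. The table is deliberately short and the function names are my own. Note that the `ftyp` marker is shared by other MP4-family containers (for example .mov and .m4a), and that an .apk is a ZIP archive, so the header alone only tells you it is ZIP-based.

```python
# (offset, signature bytes, guessed extension)
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", ".png"),
    (0, b"\xff\xd8\xff", ".jpeg"),
    (0, b"%PDF-", ".pdf"),
    (4, b"ftyp", ".mp4"),        # also matches other MP4-family containers
    (0, b"PK\x03\x04", ".zip"),  # .apk files are ZIP archives
]


def guess_extension(header):
    """Return a likely extension for the given leading bytes, or None."""
    for offset, signature, extension in SIGNATURES:
        if header[offset:offset + len(signature)] == signature:
            return extension
    return None


def guess_file(path):
    """Read the first 16 bytes of a file and guess its extension."""
    with open(path, "rb") as handle:
        return guess_extension(handle.read(16))
```

For example, `guess_file("IMG_0001")` (a hypothetical extensionless file) returns `".png"` if the file starts with the PNG signature and `None` if nothing in the table matches. Text formats without a fixed signature still need a manual look.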
Using all of the above methods, I was able to scan through my fileset (which had grown to around 25,000 files) and find approximately 50 corrupt files that I could replace as desired.