I have around 15,000 image, video, and audio files, along with a few others, which I have been transferring back and forth over FTP from my phone. Unfortunately, I discovered that some of the files were corrupted after so many transfers. This is not a significant issue in itself, since all of the media is backed up on Google Photos or other storage.
The problem is that the local and backup sets of files are not identical, and it would be challenging to compare checksums. The most effective solution seems to be to use a scanner to identify the corrupted files and manually download their copies.
I have searched extensively but have not found a suitable tool. The closest I came was a Python script on GitHub called “check-media-integrity,” but I could not get it to work on Windows 10.
Any suggestions would be greatly appreciated.
Thank you.
3 Answers
Introduction
Media files are a crucial part of our digital lives, and it is essential to ensure their integrity. When transferring media files through FTP or any other method, there is always a risk of corruption. Corrupted media files can mean lost data, which is a significant issue for professional photographers and videographers. This answer covers several methods for batch checking the integrity of media files.
Checksum Comparison
Checksum comparison is the most reliable method for checking the integrity of media files, provided you have a checksum of the known-good original. It involves generating a checksum for each file and then comparing it to the original checksum. If the two checksums are the same, the file is considered intact. If they do not match, the file differs from the original, either through corruption or through modification.
There are many tools available for generating checksums. md5sum, sha1sum, and sha256sum ship with most Linux distributions; on Windows 10, the built-in equivalents are certutil -hashfile and the PowerShell cmdlet Get-FileHash; macOS provides md5 and shasum. To generate a checksum, open a command prompt or terminal and run the checksum command followed by the file name.
For example, to generate an md5 checksum for a file named “example.jpg,” you would run the following command:
md5sum example.jpg
You can then compare the generated checksum to the original checksum to determine if the file is intact.
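If you would rather not depend on which command-line tool is installed, the same check can be sketched in Python with the standard hashlib module. The function names file_sha256 and matches_reference are only illustrative, and SHA-256 is used here instead of MD5 as an assumption, not a requirement:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large videos do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, reference_hex):
    """Compare a file's digest with a known-good digest (case-insensitive)."""
    return file_sha256(path) == reference_hex.strip().lower()
```

Calling matches_reference("example.jpg", known_good_digest) returns True only when the file is byte-for-byte identical to the one the reference digest was computed from.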
Media File Scanner
A media file scanner is a tool that scans a directory or folder for media files and checks their integrity. The scanner generates a report that lists all the corrupted files. The report includes information such as the file name, file type, and location.
There are many such tools available for Windows, Mac, and Linux operating systems. Some popular ones include File Check MD5, FileVerifier++, and MultiHasher. Note that these compute and verify hashes, so they detect changes relative to a stored checksum rather than decoding the media itself. They are easy to use and can process thousands of files in one run.
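When no reference checksums exist, a scanner has to look at the file contents instead. As a rough sketch of that idea (not how any of the tools named above work), the script below flags JPEG and PNG files whose start or end marker is missing, which is a common sign of a truncated transfer. It is only a heuristic: it does not decode the image, it cannot see damage in the middle of a file, and a valid JPEG with extra bytes appended after its end marker would be flagged. The names looks_truncated and scan are illustrative.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_END = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC
JPEG_START = b"\xff\xd8"     # start-of-image marker
JPEG_END = b"\xff\xd9"       # end-of-image marker


def looks_truncated(path):
    """Return True if a JPEG/PNG lacks its start or end marker, None for other types."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".png":
        data = path.read_bytes()
        return not (data.startswith(PNG_SIGNATURE) and data.endswith(PNG_END))
    if suffix in (".jpg", ".jpeg"):
        data = path.read_bytes()
        return not (data.startswith(JPEG_START) and data.endswith(JPEG_END))
    return None  # file type not covered by this sketch


def scan(folder):
    """Yield the files under folder that fail the marker check."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and looks_truncated(path):
            yield path
```

A report can then be produced with a loop such as: for path in scan("D:/media"): print(path), where the folder name is a placeholder.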
Automated Backup and Sync Tools
Automated backup and sync tools are another way to protect the integrity of media files. These tools automatically back up and sync files between different devices and cloud storage services. They generally verify transfers, which protects against corruption in transit, but they cannot tell whether a file was already damaged before it was synced.
Some popular automated backup and sync tools include Google Backup and Sync (since replaced by Google Drive for desktop), Dropbox, and Microsoft OneDrive. These tools are easy to use and can be set up to back up and sync files automatically on a regular basis.
Cloud Storage Services with Integrity Checks
Cloud storage services such as Google Photos, Amazon S3, and Microsoft Azure have built-in integrity checks for the data they store. These services use checksums to detect corruption of stored data, and a damaged copy is typically repaired from a redundant one. These checks apply only after upload: a file that was already corrupted when it was uploaded is stored as-is.
Using a cloud storage service with integrity checks is an effective method for ensuring the integrity of media files. These services are easy to use and can be accessed from anywhere with an internet connection.
Conclusion
In conclusion, there are several methods for batch checking the integrity of media files. Checksum comparison, media file scanners, automated backup and sync tools, and cloud storage services with integrity checks are all effective methods for ensuring the integrity of media files.
It is essential to check the integrity of media files regularly to avoid data loss. Using one or more of these methods helps you confirm that your media files are intact.
There are several approaches you can take to batch check the integrity of media files on Windows 10.
One option is to use a file verification tool, such as HashCheck, to compute and compare checksums for your files. This can help you identify any files that have been corrupted or modified in some way.
Another option is to use a media player that is capable of detecting and handling corrupted media files. For example, VLC Media Player has a built-in feature for handling corrupted video files, and can automatically skip over problematic frames or sections of the file.
You could also try a file recovery tool, such as Recuva. Note that tools of this kind mainly recover deleted or lost files rather than repair corrupted ones, so they help only if an intact copy of the file can still be recovered from the disk.
Finally, if you are comfortable with programming, you could try writing a script to automate the process of checking the integrity of your media files. For example, you could use Python’s built-in hashlib module to compute checksums for your files, and then compare them to a reference set of known-good checksums.
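A minimal sketch of such a script is shown below. It assumes you have a known-good copy of the folder to build the reference set from, and that both folders use the same relative paths; the function names build_manifest and compare_manifests are made up for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 hex digest."""
    root = Path(folder)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes loads the whole file; read in chunks for very large videos.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(reference, current):
    """Return (changed, missing): names whose digest differs, and names absent from current."""
    changed = sorted(
        name for name in reference
        if name in current and reference[name] != current[name]
    )
    missing = sorted(name for name in reference if name not in current)
    return changed, missing
```

For example, compare_manifests(build_manifest("backup"), build_manifest("local")) lists the local files that no longer match the backup, where "backup" and "local" are placeholder folder names.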
I hope this information helps! Let me know if you have any questions or need further assistance.
I have successfully resolved all the issues I was facing, and here is what worked for me:
To use “check-media-integrity,” I had to follow several steps:
- Firstly, I had to use the fork from garygan89, available on GitHub.
- Secondly, I needed to have all the required modules installed on both Python 2 and 3, as the original version was intended for Python 2, and the fork required a conversion to Python 3. So, to avoid any issues, I installed the requirements on both versions.
- Thirdly, I had to install Pillow-SIMD from the Python wheels available at https://www.lfd.uci.edu, since compiling it from source on Windows was challenging. This was the most significant issue I had to resolve.
- Fourthly, I had to install the ffmpeg-python module. It was not listed in the project’s readme.md, but it was listed in a separate requirements file.
- Fifthly, I had to specify all of the tool’s ffmpeg-related switches to broaden its output.
- Sixthly, I experimented with different values for optional parameters such as timeout and threads. My optimal values were around 75 threads and a timeout of at least 120 seconds (400 seconds when dealing with big files).
- Lastly, I used the “WizTree” and “robocopy” tools to apply the tool to smaller groups of files. They helped me identify the file extensions present in each folder and then move the files that passed to a “passed” folder.
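For anyone who would rather script that last step, here is a rough Python stand-in for what I did by hand with WizTree and robocopy. It is not how either tool works internally; count_extensions and move_passed are names I made up, and the folder layout is an assumption.

```python
import shutil
from collections import Counter
from pathlib import Path


def count_extensions(folder):
    """Count files under folder by lowercase extension ('' means no extension)."""
    return Counter(
        path.suffix.lower() for path in Path(folder).rglob("*") if path.is_file()
    )


def move_passed(files, source_root, passed_root):
    """Move files into passed_root, keeping their path relative to source_root."""
    for path in files:
        path = Path(path)
        target = Path(passed_root) / path.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
```

Printing count_extensions(folder).most_common() shows which file types dominate a folder, which makes it easier to decide how to split the set into smaller batches.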
To check PDF files, I didn’t get the desired result from “check-media-integrity,” so I used “PDFtk” instead. It can load and read a few dozen files at a time and displays an error message if any of them are broken.
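A cheaper first pass for PDFs, before loading them in a tool like PDFtk, is to check that each file still has its header and end-of-file marker. The sketch below is only a heuristic and not what PDFtk does: it assumes the file starts with the %PDF- header and that the %%EOF marker sits within the last 1024 bytes, so it catches truncated files but not damage in the middle. The function name is hypothetical.

```python
from pathlib import Path


def pdf_looks_complete(path, tail_size=1024):
    """Return True if the file starts with %PDF- and has %%EOF near its end."""
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-tail_size:]
```

Files for which this returns False are the ones worth opening first.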
There were also a few files without any extension, which I opened in “Notepad++” to guess their type from the headers and file size. I found some .mp4, .pdf, .pgn, .png, .jpeg, and .apk files, and almost all of them worked fine after being renamed properly.
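The header guessing can also be scripted. The sketch below checks the first bytes of a file against a handful of well-known signatures. The list is deliberately short, the name guess_extension is made up, and two results are ambiguous: the ZIP signature is shared by .apk and other ZIP-based formats, and the "ftyp" box is shared by .mp4, .mov, .m4a, and related formats, so those two results are a starting point rather than an answer.

```python
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".zip"),  # ZIP container; .apk files start the same way
]


def guess_extension(path):
    """Guess a file extension from the first bytes of the file, or return None."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, extension in SIGNATURES:
        if head.startswith(magic):
            return extension
    # ISO base media files carry "ftyp" at byte offset 4.
    if head[4:8] == b"ftyp":
        return ".mp4"
    return None
```

A file for which this returns None still needs a manual look, as described above.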
Using all of the above methods, I was able to scan through my fileset (which had grown to around 25,000 files) and find approximately 50 corrupt files that I could replace as desired.