Zipfile.badzipfile file is not a zip file

Sometimes while dealing with zip files, we may face Zipfile.badzipfile: file is not a zip file error. There can be many possible reasons for this error. This error can occur when the input file format is incorrect, the file is corrupted or the file that you are trying to open is not a zip file anymore. In this short article, we will learn how we can solve the Zipfile.badzipfile: file is not a zip file error using various methods. Moreover, we will also discuss how to understand errors in Python by taking Zipfile.badzipfile: file is not a zip file as an example.

Solve Zipfile.badzipfile: file is not a zip file in Python

Before going into the possible solutions let us first figure out the reasons for getting this error. Some of the main causes are listed below:

  • The file name is incorrect: If the file is in zip format but the contents are not in the correct format then we will get the same error:
  • Corrupted file: Sometime, we may face the same error if the file has been damaged or the contents inside are not readable anymore.
  • Incorrect Path: Before opening the file, make sure that the path is correct.
  • Opening non-zip file: Another thing to be sure that the file should be with .zip extension.
Zipfile.badzipfile-file-is-not-a-zip-file-error

Let us now solve the Zipfile.badzipfile: file is not a zip file error using various possible methods:

Using a user-defined function – fixed the corrupted zip file

One way to solve the Zipfile.badzipfile: file is not a zip file error is to use a user-defined function and then use the conditional statements to get rid of the error:

def fixBadZipfile(zipFile):  
 f = open(zipFile, 'r+b')  
 data = f.read()  
 pos = data.find('\x50\x4b\x05\x06') 
 if (pos > 0):  
     self._log("Trancating file at location " + str(pos + 22)+ ".")  
     f.seek(pos + 22)   
     f.truncate()  
     f.close()  
 else:  
     # raising the error
     print("Error occurs")

The code is a Python function named fixBadZipfile that attempts to fix a corrupted zip file by finding the end of the central directory signature (‘\x50\x4b\x05\x06’) and truncating the file from that position. The function opens the specified zip file in binary mode (‘r+b’) for reading and writing, reads its content into memory, and uses the find method to search for the end of the central directory signature. If the signature is found, the function seeks to the position immediately after the signature, truncates the file to remove any corrupted data that might be present after the signature, and closes the file. If the signature is not found, an error is raised indicating that the file is truncated.

Repair the corrupted zip file

Now let us see how we can repair the corrupted zip file and get rid of the error:

You may also like: No module named cv2 ModuleNotFoundError in Python

def main(zipFileContainer):
    content = zipFileContainer.read()
    pos = content.rfind('\x50\x4b\x05\x06')
    if pos>0:
        zipFileContainer.seek(pos+20)  link above.
        zipFileContainer.truncate()
        zipFileContainer.write('\x00\x00')  
        zipFileContainer.seek(0)

    return zipFileContainer

This solution takes a file-like object, zipFileContainer, assumed to contain a zip archive, and attempts to repair a corrupt zip file by finding and truncating the end of the zip’s central directory. The code reads the contents of zipFileContainer into memory, performs a reverse find of the end of central directory signature (‘\x50\x4b\x05\x06’), and if found, seeks to a position 20 bytes after the signature, truncates the file to remove any corrupted data that might be present after the signature, writes two bytes (‘\x00\x00’) to the file to indicate a zero-byte comment length, and returns the zipFileContainer object. If the end of central directory signature is not found, the code will return the zipFileContainer object without modifying it. Hopefully, now you will no more get the Zipfile.badzipfile: file is not a zip file error.

Downloading 7zip

Another method to get rid of the error is to download the 7zip and add it to the same directory. Then the next step is to use the following script:

import subprocess
ziploc = "C:/Program Files/7-Zip/7z.exe"
cmd = [ziploc, 'e',your_Zip_file.zip ,'-o'+ OutputDirectory ,'-r' ] 
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)

Hopefully, now you will not get the error anymore.

Closing the file before unzipping the file

If none of the above-mentioned methods helped you then most probably you are not closing the file before unzipping it. Try to close the file before unzipping as shown below:

with open(path, 'wb') as outFile:
    outFile.write(data)
    outFile.close()   # was missing this
    with zipfile.ZipFile(path, 'r') as zip:
        zip.extractall(destination)

As you can see that, The code opens a file at the specified path in binary write mode (‘wb’) as outFile, writes the contents of data to the file, and closes it. Then it opens the file as a zip archive using the zipfile module’s ZipFile class in read mode (‘r’) as zip and extracts all its contents to the specified destination directory. The code ensures that the file is closed properly after writing the contents and extracting the contents from the zip archive.

Confirm the gzip format

Sometimes you have to confirm if the zip file is actually in a gzip format or not. You can use the following simple script to confirm the gzip file.

import requests
import tarfile
url = ".tar.gz link"
response = requests.get(url, stream=True)
file = tarfile.open(fileobj=response.raw, mode="r|gz")
file.extractall(path=".")

Summary

In this short article, we discussed how we can solve the Zipfile.badzipfile: file is not a zip file error in Python. we covered 5 different solutions and explained each of them.

Leave a Comment