How to rip files (and images) from Chrome and Firefox web browser cache

Disclaimer: Please observe licenses when using this information.

Whether you are developing a website or would like to access hard-to-get information in a web browser, it may be useful to retrieve files from the web browser cache. All browsers provide a way of viewing the cache, and you can dig through the cache files yourself, but sometimes that is not enough. In many cases, the web browser only provides you with a report on a file, not the actual file.

For instance, if you type “about:cache” in Firefox, you will be able to get a list of entries, where each generates a report like this:

FirefoxBrowserCacheReport

While there is good information here, like the size of the file, the type of the file, and the raw content of the file in hexadecimal, it’s not very useful in this format. (For reference, you can type “chrome://cache” in Google Chrome to access a very similar view of the cache files.)

While there are extensions available for these browsers that make it easier to get at the original files, those extensions are susceptible to the day-to-day whims of the browser developers and, in my experience, are often broken. Some of them can also be rather buggy. I decided it would be easier to write a tool that recovers the original file outside the browser.

I am providing a Perl script, originally developed for recovering images, that reads the above HTML report and reproduces the original file. Because it is an external script, it is cross-platform (Windows, Mac, Linux, Unix) and cross-browser (Firefox, Chrome, others?). You can download the script at the bottom of the post.

In order to use the script to retrieve cache files, you must first save the report file from the web browser. This can be done by right-clicking on the page and selecting “Save As”, “Save Page”, etc. This will save the HTML file to a local directory.

You must then make sure you have Perl installed. If you are running Mac, Linux, or Unix, you probably already have it. You can check by opening a command prompt (or terminal) and typing “perl --version”, which should print version information like this:

 

PerlVersion

 

If you don’t have it, you can get it from www.perl.org. Once it is installed, download the script at the bottom of the page. You can then recreate the original file by running the script with the cache report file as a parameter:

 

perl webcachefile.pl mycachefile.html

 

PerlWebcacheRun

When you run it, you will see that it first tries to identify the file, then finds the hex data block, and finally recreates the file, and that’s it!
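To make the "find the hex data block and recreate the file" step more concrete, here is a minimal sketch of the idea in Python. It is not the Perl script itself, whose source is not shown in this post. It assumes each dump row in the saved report looks like "00000000:  89 50 4e 47 ...  .PNG....", that is, an eight-digit hex offset, a colon, space-separated byte pairs, and then an ASCII column set off by two spaces. The function name extract_hex_blocks is made up for this illustration. A row at offset 0 starts a new block, in case a report contains more than one dump.

```python
import re

# Assumed row layout (not confirmed by the post):
#   00000000:  89 50 4e 47 0d 0a 1a 0a  .PNG....
# Group 1 is the offset, group 2 the run of "xx " byte pairs.
HEX_ROW = re.compile(r"([0-9a-fA-F]{8}):\s+((?:[0-9a-fA-F]{2} )+)")


def extract_hex_blocks(report_text):
    """Return a list of byte strings, one per hex dump found in the report."""
    blocks = []
    for line in report_text.splitlines():
        match = HEX_ROW.search(line)
        if not match:
            continue  # not a dump row (HTML markup, headers, blank line)
        offset = int(match.group(1), 16)
        if offset == 0 or not blocks:
            blocks.append(bytearray())  # offset 0 marks the start of a dump
        # bytes.fromhex ignores the spaces between the byte pairs
        blocks[-1].extend(bytes.fromhex(match.group(2)))
    return [bytes(block) for block in blocks]


# Hypothetical usage: write the last dump in a saved report to disk.
# with open("mycachefile.html", encoding="utf-8", errors="replace") as fh:
#     blocks = extract_hex_blocks(fh.read())
# with open("mycachefile.raw", "wb") as out:
#     out.write(blocks[-1])
```

Which block holds the file content depends on the browser's report layout, so picking the last one in the usage comment is itself an assumption worth checking against a real report.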

Now, in my case, I often have a bunch of files I want to convert at once. If I save these cache reports into the same directory, I can run this shell loop on Linux (it probably works on Mac and Unix too):

 

for f in *.html ; do perl webcachefile.pl "$f" ; done

 

PerlWebcacheRunMulti
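The loop above relies on a Unix-style shell. Where one is not available, for example in the default Windows command prompt, the same batch run could be driven by a small wrapper. The following is only a sketch in Python, assuming perl is on the PATH and webcachefile.pl sits in the directory you run it from; build_commands and run_all are hypothetical helper names, not part of the script.

```python
import subprocess
from pathlib import Path


def build_commands(directory, script="webcachefile.pl"):
    """Return one perl command line per saved .html report in directory."""
    reports = sorted(Path(directory).glob("*.html"))
    return [["perl", script, str(report)] for report in reports]


def run_all(directory="."):
    """Run the Perl script once for every report, like the shell loop does."""
    for command in build_commands(directory):
        # check=False: keep going even if one report fails to convert
        subprocess.run(command, check=False)


# Hypothetical usage, from the directory holding the saved reports:
# run_all(".")
```

Passing each command as a list avoids any quoting problems with file names that contain spaces, which is the same job the quotes around $f do in the shell loop.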

Currently, the script identifies only a few content types (images, icons, text files, etc.) and adds the correct file extension for them. If the script does not recognize the file, it gives it the extension “.raw”, which you can then rename to the original file’s extension. You can also add further content types directly to the script if you’d like.
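The content-type lookup described above can be pictured as a small table with a “.raw” fallback. This is a sketch in Python rather than the script's actual Perl code, and the entries in the table are illustrative guesses, not the list the script really ships with. The function name extension_for is made up for this example.

```python
# Illustrative mapping only; the real script's list is not shown in the post.
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/x-icon": ".ico",
    "text/plain": ".txt",
    "text/html": ".html",
}


def extension_for(content_type):
    """Map a Content-Type value to a file extension, or ".raw" if unknown."""
    # Drop parameters such as "; charset=utf-8" and normalise the case
    mime = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(mime, ".raw")
```

Adding support for a new type in this scheme means adding one line to the table, which is roughly what extending the script for an unrecognized file would involve.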

Feel free to leave comments below.

 

Current Version of Web Cache File Download script:

  • Version 1.0, March 29, 2013
  • Validated on Firefox 19.0.2 and Chrome 26.0 cache files
  • If you make changes to the script and think I might find them useful, please send them over to me and I may add them to the released script here.