Google Books, Linux, and PDF files

As many people know, Google has a project in which they have digitized (scanned) a large number of public domain books (books for which copyright has expired). They provide a way to view these books online, which is nice, but I always find myself wanting to download a PDF file of the whole thing. You can actually do this quite easily, but Google hides the option. Either they want to discourage it, or the Google Books interface sucks (and I vote for the latter).

As of this writing, when you visit a Google book there is a tiny gear-like icon in the upper right corner. If you click on the arrow next to it, a menu appears which includes "download PDF". When you click this, you must type in some text to prove that you are a living, breathing human, and away you go.

Here is the link to the google book for

Linux PDF tools

Before I discovered the hidden PDF download, I figured there must be a way, and a search turned up a little project called: This is a fairly compact bit of Python code that downloads a Google book page by page, saving each page as a PNG file. Curiously, even though the program gives every image a "png" name, some are actually JPEG files, but it doesn't seem to matter.
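If you are curious which of the downloaded pages are really JPEG files, you can check the first few bytes of each file rather than trusting the name. This is just a minimal sketch of my own, not part of the downloader; the function names are made up for illustration.

```python
# Identify an image by its leading "magic" bytes instead of its file name.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # the 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker plus next marker byte


def sniff_image_type(header):
    """Return 'png', 'jpeg', or 'unknown' given the first bytes of a file."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def sniff_file(path):
    """Read just enough of the file at 'path' to classify it."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(len(PNG_MAGIC)))
```

Calling sniff_file() on each downloaded page would tell you which ones are mislabeled. As far as I know, the imaging library also looks at file contents rather than names, which would explain why the wrong extension does no harm.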

On my Fedora system, I had to install a few other packages before I could get this to assemble a PDF for me:

yum install python-reportlab
yum install python-reportlab-docs
yum install python-imaging

At first I had installed only the reportlab package and was getting confusing tracebacks because of the missing imaging (PIL) package. If you are interested in reportlab, you should install the docs package and look under /usr/share/doc to find several very nice documents describing the library.
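To give a feel for what reportlab does in this job, here is a rough sketch of assembling a pile of page images into a single PDF. This is my own illustration, not the code from the downloader; the helper names and the "page12.png" style of file name are assumptions. It needs both reportlab and the imaging package installed.

```python
import os
import re


def page_number(path):
    """Pull the first run of digits out of a file name, e.g. 'page12.png' -> 12."""
    match = re.search(r"\d+", os.path.basename(path))
    return int(match.group()) if match else 0


def sorted_pages(paths):
    """Order pages numerically, so that page2 comes before page10."""
    return sorted(paths, key=lambda p: (page_number(p), p))


def build_pdf(image_paths, out_path):
    """Write one PDF page per image, each page sized to fit its image."""
    # Imported here so the helpers above work even without reportlab installed.
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(out_path)
    for path in sorted_pages(image_paths):
        image = ImageReader(path)            # uses PIL to read the image
        width, height = image.getSize()      # size in pixels
        pdf.setPageSize((width, height))     # treat one pixel as one point
        pdf.drawImage(image, 0, 0, width=width, height=height)
        pdf.showPage()                       # finish this page, start the next
    pdf.save()
```

Something like build_pdf(glob.glob("pages/*.png"), "book.pdf") would then produce the PDF, where "pages" is whatever directory holds the downloaded images. Treating one pixel as one point makes for big pages, but it keeps the sketch simple.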

Note that all of this is really unnecessary given that Google does provide a PDF download. Reportlab is interesting, though. It is a Python library for generating PDF files. It is available as open source, but unfortunately they are also promoting a commercial big brother "pro" package.


Have any comments? Questions? Drop me a line!

Adventures in Computing / [email protected]