Sciencemadness Discussion Board

GREAT old books site

a_bab - 30-3-2003 at 13:42

I just found a French site that is something like the MOA site. As you may know, the MOA site doesn't let you save a whole book at once (just like NAP.edu). Instead, you have to browse the book page by page and save each page individually, which is a nasty task when you are dealing with a book of 400 pages or more.

Well, this site lets you save the whole book as a nice, clean PDF. Right now I am saving "Elements of chemistry, theoretical and practical / by D. B. Reid", and it's about 40 MB (over 900 pages). I suppose it could be compressed further with the Xerox tool, since the pages are black and white.

A simple query with "chimie" (French for chemistry) as the subject returned more than 500 entries! The good news is that there are SOME books in English (maybe less than 10%, from what I saw). That's to be expected, because the site is French, so nearly all the books are in French. There are also lots of German books, plus Russian and others.

So, for those who love old science books (chemistry in my case), happy downloading!

Oops, nearly forgot. The address is this one:

Blind Angel - 30-3-2003 at 19:12

Since my mother tongue is French, I was very happy to see that this site exists, so I searched for "chimie organique" (organic chemistry), which returned about 89 results. But there's a big problem: every time I try to download a file, it tells me the file is protected by copyright and can't be downloaded :( :(

a_bab - 31-3-2003 at 12:03

Which book did you try to download? It worked fine for me. There may be a few that are copyrighted, though. Dunno...

had the same problem

Polverone - 31-3-2003 at 12:17

More often than not, the books that interested me were copyrighted. I tried to grab Beilstein and couldn't, for example.

a_bab - 31-3-2003 at 12:26

Damn. Is it possible to get it page by page? I can set something up if necessary.
The link is still worth it for the old ones, though.
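If a book can only be viewed one page at a time, the fetch-and-save loop can be scripted. Below is a minimal Python sketch, assuming a hypothetical URL template with a `{page}` placeholder (the real site's page URLs aren't given in this thread) and assuming each page comes back as a PDF; adjust the template and the `suffix` to whatever the site actually serves.

```python
# Minimal page-by-page downloader sketch (hypothetical URL scheme).
import time
import urllib.request
from pathlib import Path
from typing import Callable, List


def page_urls(template: str, first: int, last: int) -> List[str]:
    """Expand a hypothetical '{page}' URL template into one URL per page."""
    return [template.format(page=n) for n in range(first, last + 1)]


def http_get(url: str) -> bytes:
    """Fetch one URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def download_pages(template: str, first: int, last: int, out_dir: str,
                   fetch: Callable[[str], bytes] = http_get,
                   delay: float = 1.0, suffix: str = ".pdf") -> List[Path]:
    """Save pages first..last to out_dir as page_0001.pdf, page_0002.pdf, ...

    'fetch' is injectable so the loop can be tested offline; 'delay' is a
    pause in seconds between requests so the server isn't hammered.
    The '.pdf' suffix is an assumption about what each page URL returns.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for n, url in enumerate(page_urls(template, first, last), start=first):
        # Zero-padded names keep the files in page order when sorted.
        target = out / f"page_{n:04d}{suffix}"
        target.write_bytes(fetch(url))
        saved.append(target)
        if delay > 0:
            time.sleep(delay)
    return saved
```

For example, `download_pages("http://example.org/book?page={page}", 1, 400, "book")` would save 400 files into `book/`, pausing a second between requests. The `fetch` parameter exists so the loop can be exercised without touching the network.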

sorry

Polverone - 31-3-2003 at 12:31

I was actually using the page-by-page system, and it wouldn't let me view any of Beilstein. I'm not sure why they even list books online that they have no intention of allowing people to view, but I didn't build the site.

tangent - 1-4-2003 at 00:33

Think about Perl for automating the fetching and processing.

This book is online.

http://www.oreilly.com/catalog/webclient/

This one is more current, but you need to buy a copy or borrow it from a library:

http://www.oreilly.com/catalog/perllwp/

Other resources:

http://www.linpro.no/lwp/

http://ftp.ics.uci.edu/pub/websoft/libwww-perl/

http://search.cpan.org/author/GAAS/libwww-perl/lwpcook.pod

Other options include Micro$haft's “save page/site/directory offline” feature in IE and Adobe Acrobat 5’s ability to gather web pages and convert them into various formats.

There are a variety of programs out there that will harvest news or data of various types including archiving specific sites.

Also check out this one, and its online examples, next time you're at the bookstore:

http://www.oreilly.com/catalog/googlehks/


-t

Organikum - 1-4-2003 at 14:02

Beilstein III works for me page by page, but how do I get the whole file in one piece?
I don't speak French at all.

Polverone - 1-4-2003 at 14:20

Your advice in general is good, Tangent. Have you ever tried the Google Web API system? It's really nice for automated queries! I haven't used it since last summer, though, so I don't know if it ever moved beyond the limitations of the beta stage.

However, the books of the BNF archive can be downloaded as single-file PDFs if you know the unique identification code for the work in question. Web-rippers and offline browsers are great for situations where you can't just get one big file, like my Muspratt pages.

linux web mirroring

blazter - 1-4-2003 at 15:40

Anyone who runs Linux should already have a very powerful web mirroring tool installed: wget. Anything wget can't do can be done by another tool called pavuk, which should be available at freshmeat.net. In a previous life, someone I knew saw these tools being used to rip complete ebook web sites :D
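Under the hood, a mirroring tool like wget (in its recursive/mirror mode) or pavuk repeatedly downloads a page, pulls out its links, and queues the ones that stay on the same site. Here is a minimal sketch of just the link-extraction step using Python's standard library. It is not how wget itself is implemented, and it ignores things a real mirror handles, such as robots.txt, links inside CSS/JavaScript, and depth limits.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found on any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_links(base_url: str, html: str) -> List[str]:
    """Return absolute, de-duplicated links that stay on base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for raw in parser.links:
        # Resolve relative links, then drop '#fragment' parts, which point
        # into the same document rather than to a new one.
        absolute, _fragment = urldefrag(urljoin(base_url, raw))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

A spider would feed each returned URL back into a download queue, skipping ones it has already fetched.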

a_bab - 1-4-2003 at 23:33

There are several tools for ripping an entire site; they're called web spiders. The best one, in my opinion, is Teleport Pro, followed by Snake, etc. With them you can get the entire site, relink the files, recreate the folder structure as it is on the server, and so on.
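The "recreate the folder structure as on the server" part boils down to mapping each URL to a local file path. Below is a minimal Python sketch of one possible mapping. The `mirror/` root, the `index.html` name for directory URLs, and the way query strings are folded into file names are assumptions for illustration, not what Teleport Pro or any other spider actually does.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def local_path_for(url: str, root: str = "mirror") -> PurePosixPath:
    """Map a URL to a local path of the form root/host/dir/.../file.

    Assumed conventions (illustrative only):
    - a URL ending in '/' (or with no path) is saved as 'index.html';
    - a query string is folded into the file name, so '?page=2' and
      '?page=3' don't overwrite each other;
    - '..' segments are dropped so a URL can't escape the root folder.
    """
    parts = urlparse(url)
    path = unquote(parts.path) or "/"
    if path.endswith("/"):
        path += "index.html"
    file_path = PurePosixPath(path)
    if parts.query:
        safe_query = re.sub(r"[^A-Za-z0-9._-]", "_", parts.query)
        file_path = file_path.with_name(f"{file_path.name}_{safe_query}")
    # Skip the leading root ('/' or '//') and any '..' segments.
    segments = [s for s in file_path.parts if s.strip("/") and s != ".."]
    return PurePosixPath(root, parts.netloc, *segments)
```

Relinking then amounts to rewriting each saved page's links so they point at these local paths instead of the original URLs.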

Organikum, to download a whole book from the Gallica site, click the "Téléchargement de l'ouvrage" (download the work) option at the top right. Under "Choisissez le début de votre sélection:" (choose the start of your selection), select "La 1ère page" (the first page), and under "Choisissez le nombre de pages : " (choose the number of pages), select "Jusqu'à la fin de l'ouvrage" (until the end of the work). In other words, the selection starts at the first page of the book and runs to the end. Then click "Fichier PDF" (PDF file). You may run into trouble because of heavy traffic; if so, just wait and try again. When it works, you'll get a link to download the book.
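The "wait and try again" advice for busy periods can be automated as well. Here is a minimal retry sketch with a doubling wait. The 5 attempts and the 30-second first wait are arbitrary assumptions, and it treats any exception as a temporary failure, which a real script might want to narrow (for example, to network errors only).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 5, first_wait: float = 30.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action(); if it raises, wait and try again, doubling the wait.

    After the last failed attempt the final exception is re-raised.
    'sleep' is injectable so the back-off can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = first_wait
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(wait)
            wait *= 2
    # Not reached: every loop iteration either returns or raises.
    raise AssertionError("unreachable")
```

For example, `retry(lambda: urllib.request.urlopen(url, timeout=60).read())`, where `url` is whatever download link the site hands you, would wait 30, 60, 120 and 240 seconds between its five tries before giving up.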

Organikum - 2-4-2003 at 13:56

Thanks a_bab, I'll try it this way.