Sciencemadness Discussion Board

Poll: Chemistry eBooks: OCR or bitmaps, you decide!
I prefer an OCR'ed book; it is worth the additional effort and I'm willing to wait longer until I can get it. --- 11 (61.11%)
A bitmap PDF would be alright; its disadvantages compared with the OCR'ed version won't lower my interest. --- 7 (38.89%)

Author: Subject: Chemistry eBooks: OCR or bitmaps, you decide!
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 24-11-2002 at 17:19
Chemistry eBooks: OCR or bitmaps, you decide!


Hi

I'm currently scanning some chemistry books and will upload them for you once they are done. But I cannot decide whether to OCR them or to publish them as PDF files composed of bitmap images.
The OCR will take significantly longer and will be much more work, but the result looks better and the file size is smaller.
The bitmap version will be done much faster, but it will not look as good.

Some of the books will be OCR'ed, some won't. But there are still ones that I'm not sure what to do with, so please help me to decide.
raistlin
Hazard to Others
***




Posts: 200
Registered: 5-7-2002
Location: Ohio
Member Is Offline

Mood: No Mood

[*] posted on 24-11-2002 at 17:36


Hey Rhadon, do an OCR. Drop me a U2U if you want any help; I think my dad might have some OCR software around here somewhere...



"To ignite, or not to ignite, that is the question."
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 24-11-2002 at 17:48


Thank you for offering your help, Raistlin. Unfortunately, the OCR itself is not the most laborious part: proofreading the text, correcting mistakes, positioning images, labelling images and giving the document a nice layout are far more work.
Thank you also for offering software, but I already have FineReader 6.0, which I suppose is the best program for this job.
Polverone
Now celebrating 18 years of madness
*********




Posts: 3164
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 25-11-2002 at 00:49
how about this


I notice that archived electronic journal articles from the ACS are bitmap scans of the original paper but also include an OCR version somehow embedded in the same file beneath the bitmap. The OCR information that is included has not been hand-corrected - so it has a rather high error rate - but it is still very useful if you're just looking for certain words in the text. The bitmap version, of course, retains all the diagrams and formulae that OCR wouldn't handle very well. Can you do this? It won't save space, but it will make the texts more useful than plain bitmaps and a lot faster to process than complete human-proofed OCR.
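
To illustrate why an uncorrected OCR layer is still useful for searching, here is a minimal Python sketch of approximate word matching using the standard library's difflib. The function name, the 0.8 similarity cutoff and the sample sentence with its misread characters are all invented for the example; nothing here reflects how the ACS files or any PDF viewer actually search.

```python
import difflib
import re

def fuzzy_find(query, ocr_text, cutoff=0.8):
    """Return the distinct words in ocr_text that closely resemble query."""
    hits = []
    for word in re.findall(r"[A-Za-z0-9]+", ocr_text):
        # ratio() is 1.0 for identical strings and drops as they diverge
        ratio = difflib.SequenceMatcher(None, query.lower(), word.lower()).ratio()
        if ratio >= cutoff and word not in hits:
            hits.append(word)
    return hits

# Hypothetical OCR output with two misread characters (i read as 1)
sample = "The n1tration of toluene gives nitrotoluenes; nitrat1on of benzene is slower."
print(fuzzy_find("nitration", sample))
```

An exact search for "nitration" would miss both damaged words in the sample, while the approximate match still finds them and skips the unrelated "nitrotoluenes".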
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 25-11-2002 at 11:01


The books that will be published as bitmaps will most likely be processed with Adobe Acrobat Capture. I haven't had time to test it yet, but it should create a bitmap PDF file as you described (text "beneath" the bitmap).
The same system was used for many eBooks on mathematics that I own, and they have a very low error rate (except for the special characters)! I think that this depends strongly on the resolution of the source images, which will be 600 DPI in most cases.
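
For a sense of scale, the following Python sketch estimates the raw, uncompressed size of one page at several resolutions. The A4 page size (210 x 297 mm) and the 1-bit black-and-white depth are assumptions made for the example; the thread only states the 600 DPI figure, and real PDFs compress the images considerably.

```python
def uncompressed_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_mm / 25.4 * dpi)    # 25.4 mm per inch
    height_px = round(height_mm / 25.4 * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed A4 page, 1 bit per pixel (pure black and white)
for dpi in (150, 300, 600):
    size_mb = uncompressed_page_bytes(210, 297, dpi, 1) / 1e6
    print(f"{dpi} DPI, 1-bit: {size_mb:.2f} MB per page")
```

Doubling the resolution roughly quadruples the pixel count, which is why higher-resolution source images give the OCR more detail to work with but also make bitmap files grow quickly.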
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 26-11-2002 at 16:24


Damn, I have some severe problems with Acrobat Capture. I have tested two releases now, and neither of them works properly. Some of the problems are the same in both versions; others occur in only one of them.
At the very least, this will delay the release date of the books.
Blind Angel
International Hazard
*****




Posts: 845
Registered: 24-11-2002
Location: Québec
Member Is Offline

Mood: Meh!

[*] posted on 28-11-2002 at 18:56


Stupid question, but what is OCR?



/}/_//|//) /-\/|//¬/=/_
My PGP Key Fingerprint: D4EA A609 55E4 7ADD 8529 359D D6E2 33F6 4C76 78ED
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 29-11-2002 at 08:58


Blind Angel: It's not a stupid question. OCR stands for "Optical Character Recognition". Here is what it is and why we do it:
When you scan a text, you get an image at first. The image shows the text page exactly as it appears in the book. When I was talking about a bitmap version, I meant that those images would be put together into a PDF file. Unfortunately, such PDF files are either quite large or, if the image quality is reduced to keep them small, unreadable. Usually you also cannot search the images the way you can search ordinary text (which can be a great disadvantage if you are looking for a particular piece of information).
So we use an OCR program, such as "ABBYY FineReader", which is able to recognize the characters in the image and lets you export the whole text thus obtained. Since there are always some characters that are not recognized correctly (e.g. the small 'l' looks quite similar to the capital 'I', and 'O' to zero), the text has to be proofread for mistakes. If it contains images, they will have to be placed in appropriate positions in the text.
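
The look-alike characters mentioned above suggest a simple aid for proofreading, sketched here in Python. The rule (a digit 0 or 1 directly after a lowercase letter), the function name and the sample sentence are assumptions made for this illustration, not a feature of FineReader or any other OCR program. The heuristic is deliberately crude: it ignores formulas such as H2SO4 and plain numbers, and it will miss many other kinds of error.

```python
import re

# A digit 0 or 1 directly after a lowercase letter is rarely intended in
# running text, so it hints at a misread 'o', 'O', 'l' or 'I'.
SUSPECT = re.compile(r"[a-z][01]")

def flag_for_proofreading(text):
    """Return the words that contain a suspicious letter-digit sequence."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [word for word in words if SUSPECT.search(word)]

# Hypothetical OCR output with three misread characters
sample = "Add 10 ml of H2SO4 to the t0luene and coo1 the so1ution."
print(flag_for_proofreading(sample))
```

A list like this only tells the proofreader where to look first; the page image remains the reference for every correction.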
blazter
Hazard to Self
**




Posts: 71
Registered: 3-9-2002
Member Is Offline

Mood: No Mood

[*] posted on 30-11-2002 at 09:00
bitmaps


Personally, I think bitmaps are the way to go. Just make sure that they can be extracted from the PDF or whatever format they are put in. That way, if someone is ambitious enough, they can do the OCR themselves and proof it against the bitmaps that they have.

Sometimes chemistry books are better in bitmap format because of the images. But they would be even better if they were converted to an easily printed format like PDF or even HTML, which could be searched easily.
Blind Angel
International Hazard
*****




Posts: 845
Registered: 24-11-2002
Location: Québec
Member Is Offline

Mood: Meh!

[*] posted on 30-11-2002 at 11:58


That answers more questions than I thought. I always wondered how they made those e-books you can find on the net.



/}/_//|//) /-\/|//¬/=/_
My PGP Key Fingerprint: D4EA A609 55E4 7ADD 8529 359D D6E2 33F6 4C76 78ED
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 1-12-2002 at 09:45


It wasn't until today that I realized that FineReader offers the user an option to do exactly what I wanted to do with Acrobat Capture. The release of the first book is coming closer :)
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 1-12-2002 at 15:08
Announcement: The first book is done


Those who have access to EliteForums FTP will be able to download "Nitration and aromatic reactivity" by J. G. Hoggett in an hour or so.
I will upload it to some webspace that everyone can access when I find the time to do so. However, I'd be glad if someone else could do that, since I'm quite busy.
Eliteforum
International Hazard
*****




Posts: 571
Registered: 18-11-2002
Location: United Kingdom
Member Is Offline

Mood: Enjoying the journey

[*] posted on 2-12-2002 at 18:27


I might set up Apache (or IIS depending) so people can upload via FTP and download via HTTP.

If it sounds like a good idea, drop me a line.

The only downside is that we may have a lot of people downloading/leeching, and speeds may be affected.




All that glitters isn't gold.
Rhadon
Hazard to Others
***




Posts: 169
Registered: 26-5-2002
Location: Germany
Member Is Offline

Mood: No Mood

[*] posted on 2-12-2002 at 23:25


The idea is a good one, but we must find a way to keep the leechers out. Password protection would be nice, but things like that are tricky.
