Sciencemadness Discussion Board
Author: Subject: Download an open forum backup!
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 29-4-2005 at 10:41
Download an open forum backup!


Sciencemadness.org is now offering for the first time what I hope will become a standard fixture in online chemistry communities: an open, freely available offline archive of messages and files from this discussion board. This is only a first release, and it has a few glitches and rough edges to work out. Here's what it already has to offer:

-An archive of static HTML copies of forum index pages from all sections but Whimsy and Detritus (Whimsy may be added at a later date)

-An archive of static HTML copies of all threads from all sections but Whimsy and Detritus (Whimsy may be added at a later date)

-Modifications to make the static HTML indices refer to the static HTML threads, and to make all static HTML pages use local copies of graphics files and attachments

-An optional media archive containing copies of all attachments and inline images from threads

Here are some of the rough edges that I hope to address in the future:

-User-added links from one thread to another still refer to the online site, not the offline archive; there are also other opportunities to rewrite links for local use

-There's considerable page clutter that I can and should remove before the next release; there's no use for Post New Topic, Today's Posts, etc. in an offline archive

-A disturbingly large fraction of attachments seemed to download with errors; I'm not sure if this is a problem with my archiving software, the board, or the original uploads

-All threads appear as single HTML files; this is taxing to browsers/computers on large threads

The base archive, containing board icons/graphics and threads, can be found here: http://www.sciencemadness.org/archive/sm_main.zip, 21,840,110 bytes.

The media archive, containing inline images and attachments, can be found here: http://www.sciencemadness.org/archive/sm_media.zip, 92,212,812 bytes.

The media archive is still uploading from my home machine, so I would suggest waiting a couple of hours before attempting to download it. After you have downloaded the main archive or both archives, unzip them and point your web browser at index.html to begin enjoying your offline copy of the forum. Depending on how heavily people download these files, I may make them available all the time, or for only a limited time window near the end of each month.

I will continue to upload encrypted database dumps from time to time, since those are easier to use for board recovery, but this is the archive you want to download if you've ever feared losing something from the forum, or if you want to refer to it even when you're not on the internet, or if you'd like a local copy to search/analyze/whatever.

Please let me know of any glitches you encounter or enhancements you'd like to see in this thread. Enjoy!




chemoleo
Biochemicus Energeticus
*****




Posts: 3005
Registered: 23-7-2003
Location: England Germany
Member Is Offline

Mood: crystalline

[*] posted on 29-4-2005 at 18:03


Thank you very much Polverone! I see all this heavy bandwidth is being put to *good* use!


I tested it out a little.
I guess the major problem I could see is that the links are absolute paths on *your* computer system. I.e., when I load index.html, open a subforum, and click on a thread, it produces e.g. this link: file:///extra/sciencegrab/chemistry_in_general/0000023.html
This is of course not the directory I installed this into, so clicking this link produces nothing. But it shouldn't be a problem to fix, as the links between index.html and the different subdirectories work fine.
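For anyone who wants to patch a local copy by hand, here is a rough sketch of the kind of fix involved. It is not the archiver's actual code. It assumes the broken links all start with the /extra/sciencegrab prefix seen in the example above, and the function name and the page_dir parameter are made up for illustration.

```python
import posixpath
import re

# Assumption: the absolute prefix baked into the broken links, taken from
# the example link reported above.
ARCHIVE_ROOT = "/extra/sciencegrab"

# Matches href attributes that point at an absolute file:// path.
HREF_RE = re.compile(r'href="file://(/[^"]*)"')


def relativize_links(html, page_dir):
    """Rewrite file:// links under ARCHIVE_ROOT relative to page_dir.

    page_dir is the directory of the page being fixed, relative to the
    archive root ("" for a page at the top level).
    """
    base = posixpath.join(ARCHIVE_ROOT, page_dir)

    def fix(match):
        target = match.group(1)
        if not target.startswith(ARCHIVE_ROOT + "/"):
            return match.group(0)  # leave unrelated links untouched
        return 'href="%s"' % posixpath.relpath(target, base)

    return HREF_RE.sub(fix, html)
```

Running this over each index page, with page_dir set to wherever that page sits in the unzipped archive, should make the links work from any install directory.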

Very nice work otherwise, btw. I wonder how you combined several paged threads into a single html.




Never Stop to Begin, and Never Begin to Stop...
Tolerance is good. But not with the intolerant! (Wilhelm Busch)
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 29-4-2005 at 19:52


Ugh, you're right. I obviously made a typo in preparing that section. I have uploaded a zip file with just the fixed index information:

http://www.sciencemadness.org/archive/fixedindices.zip

I am also uploading a fixed version of the sm_main archive. Edit: the fixed version is now in place. Anyone who downloads sm_main.zip now should get the correct index information.

It was very easy to make the threads into one long piece: I created a new user account named archiver, went to my control panel, and had it show 1000 posts per page (longer than any existing thread). Then I just had my script log in as archiver and grab each of the one-page threads.

I must thank everyone who contributed financially to sciencemadness. This sort of project would not have been possible on the old site, due to the much more limited bandwidth and disk space.

[Edited on 4-30-2005 by Polverone]




Ramiel
Vicious like a ferret
***




Posts: 484
Registered: 19-8-2002
Location: Room at the Back, Australia
Member Is Offline

Mood: Semi-demented

[*] posted on 30-4-2005 at 06:25


Both backups downloaded.



Caveat Orator
The_Davster
A pnictogen
*******




Posts: 2861
Registered: 18-11-2003
Member Is Offline

Mood: .

[*] posted on 30-4-2005 at 09:43


I downloaded both of them, but when I attempt to extract the media zip file I get several errors and the folder into which I extracted them is empty. Anyone else have this problem?



Rosco Bodine
Banned





Posts: 6370
Registered: 29-9-2004
Member Is Offline

Mood: analytical

[*] posted on 30-4-2005 at 11:19
Backups are a great idea !


In these times of troubling disappearances of websites and data
and discussions , particularly of obscure
or not well known information of the nature which makes such knowledge

" SENSITIVE IN NATURE " ........

then under such circumstances , the free distribution of such information spread far and wide is an effective countermeasure for the censors and " thought police " who
are doing their tyrannical best to keep people ignorant subjects whose extent of
knowledge is limited only to what they are
deemed " authorized " to know .

Any small victory against those Orwellian ,
Machiavellian fascists , is a worthy accomplishment .

And that is the larger matter which should govern us all in these times , seeing what
has happened with the hive , and the direction things seem to be going for E&W also , the priority should be to preserve hard gotten data assembled in such ways
nowhere else on earth , and guarantee that information's continued availability ,
as much so as if it were Winchesters being
passed out to the pioneers , as they circle the wagons and see what the savages are going to do to interfere with progress .
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 30-4-2005 at 13:29


Hi rogue chemist, I just tested downloading the media file and unzipping it. I had no problems. Are you sure the file you downloaded is exactly 92,212,812 bytes in size? Its md5 checksum is a109745ae876bdff3a9273b1b01ed225.
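If you don't have an md5 tool handy, something like the following Python sketch should do the same check. The size and checksum are the ones quoted above; the function name is just illustrative.

```python
import hashlib
import os

# Values quoted in the post above for sm_media.zip.
EXPECTED_SIZE = 92212812
EXPECTED_MD5 = "a109745ae876bdff3a9273b1b01ed225"


def check_download(path, expected_size, expected_md5):
    """Return (size_ok, md5_ok) for the file at path."""
    size_ok = os.path.getsize(path) == expected_size
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return size_ok, digest.hexdigest() == expected_md5
```

Calling check_download("sm_media.zip", EXPECTED_SIZE, EXPECTED_MD5) on a good copy should give (True, True); a wrong size usually means the download was cut short.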

[Edited on 4-30-2005 by Polverone]




The_Davster
A pnictogen
*******




Posts: 2861
Registered: 18-11-2003
Member Is Offline

Mood: .

[*] posted on 30-4-2005 at 14:13


I tried re-downloading it; the first time, my wireless connection died on me, which could have caused some corruption. In any case it works fine now. I still get a few errors during unzipping, but the files work, so all is good. :cool:
Now to save this archive to disk.




Rosco Bodine
Banned





Posts: 6370
Registered: 29-9-2004
Member Is Offline

Mood: analytical

[*] posted on 1-5-2005 at 18:52


Downloaded the backup quick and easy ,
and got no errors on decompressing the zip files .

Everything appears to work perfectly ,
navigation and page loading is instantaneous ....

Never seen the forum work so fast :D

Nothing like a data drive for a local file server , and it would probably be quick
even on a CD .

Oh , just a reminder to anybody having any problems , it can be a firewall glitch on your local machine , being spoofed by explorer activity and blocking the unrecognized activity which may be blocked as suspect " traffic " . If you
have any trouble check your firewall allow settings or turn off filtering .
chemoleo
Biochemicus Energeticus
*****




Posts: 3005
Registered: 23-7-2003
Location: England Germany
Member Is Offline

Mood: crystalline

[*] posted on 2-5-2005 at 07:33


Works great here, too, including attachments and pictures!
Even pictures stored elsewhere were grabbed, which is great because once those sites go down this data isn't irretrievably lost!

One minor issue - I noticed that, once browsing in actual threads, trying to go back by clicking Sciencemadness Discussion Board (file:///.../sciencemadness%20html/sciencegrab/energetic_materials/index.php), or Organic chemistry (i.e. the forum, file:///.../sciencemadness%20html/sciencegrab/organic_chemistry/forumdisplay.php?fid=10) or whatever doesn't work - the file is not found, so internal crossreferencing by board-links is seemingly not applied to all internal links.
Essentially this can be avoided by using the back button, of course.
Maybe there's an easy fix for this. Although it's not essential, so all is good.




Never Stop to Begin, and Never Begin to Stop...
Tolerance is good. But not with the intolerant! (Wilhelm Busch)
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 2-5-2005 at 10:42


It's correct that I made no effort to fix those additional links. I will add that fix in a future release.



Rosco Bodine
Banned





Posts: 6370
Registered: 29-9-2004
Member Is Offline

Mood: analytical

[*] posted on 2-5-2005 at 13:58


One feature I would like to see enabled
is the " printable version " view , since
that makes it much easier to capture
and export any text .

Saves a lot of ink when you want to print
something too .

[Edited on 2-5-2005 by Rosco Bodine]
MadHatter
International Hazard
*****




Posts: 1332
Registered: 9-7-2004
Location: Maine
Member Is Offline

Mood: Enjoying retirement

[*] posted on 2-5-2005 at 22:30
Backups


Both backups downloaded. Thanks, Polverone !



From opening of NCIS New Orleans - It goes a BOOM ! BOOM ! BOOM ! MUHAHAHAHAHAHAHA !
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 13-5-2005 at 00:48


New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time. Links have been improved in this version. The visual appearance has been cleaned up too. Finally, the archive now includes printable versions of the threads.

One oddity that you may notice with this archive or the last is that the index pages can be slightly more up to date than the actual threads. For example, a thread may be listed as having three replies but you only see one when you click on the thread. This is a bit of a wart but not actually a bug; it's due to the way the archiver caches threads but always downloads fresh index pages.




Axt
National Hazard
****




Posts: 778
Registered: 28-1-2003
Member Is Offline

Mood: No Mood

[*] posted on 14-5-2005 at 02:05


One of the things I think would improve the search capabilities is using the topic title as the "page title".

For example, if one searches through Windows or Google for a word, the page is always named "Sciencemadness Discussion Board - Powered by XMB 1.8 Partagium Final S..". On other forums the topic title becomes part of the "page title", so it's easy to identify threads.

Another example: look at the top title bar of the page here (http://www.sciencemadness.org/talk/viewthread.php?tid=3295), compared to here (http://www.xsorbit2.com/users/apcforum/index.cgi?board=general&action=display&num=1095230413).

Since the search function doesn't work in the archive, this makes it hard to use "3rd party" search engines on the archive, as they only pull up the page title.
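To illustrate the idea, a post-processing step along these lines could rewrite the title of each archived page. This is only a sketch: it assumes the thread subject is already known to the script and that each page has a single title element. The function name and the default board name argument are made up.

```python
import html
import re

# Matches the first <title>...</title> element, even if it spans lines.
TITLE_RE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)


def set_page_title(page, thread_subject,
                   board_name="Sciencemadness Discussion Board"):
    """Return page with its <title> replaced by 'subject - board name'."""
    new_title = "<title>%s - %s</title>" % (html.escape(thread_subject), board_name)
    # A function replacement keeps backslashes in the subject from being
    # interpreted as regex escapes.
    return TITLE_RE.sub(lambda _match: new_title, page, count=1)
```

With the subject in the title, desktop search tools that only show page titles would list something recognizable for each thread file.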

Other than that .... great :D
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 26-7-2005 at 21:18


New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time.



Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 3-12-2005 at 18:16


New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time.

Later I'm going to try to take the forum down for a bit while I upgrade the software, so you might want to take the opportunity to download an offline copy now.

[Edited on 12-4-2005 by Polverone]




MadHatter
International Hazard
*****




Posts: 1332
Registered: 9-7-2004
Location: Maine
Member Is Offline

Mood: Enjoying retirement

[*] posted on 4-12-2005 at 07:38
Backups


Both now in the UPLOAD folder on my FTP.



From opening of NCIS New Orleans - It goes a BOOM ! BOOM ! BOOM ! MUHAHAHAHAHAHAHA !
Nerro
National Hazard
****




Posts: 596
Registered: 29-9-2004
Location: Netherlands
Member Is Offline

Mood: Whatever...

[*] posted on 4-12-2005 at 10:58


This is a quick reply :D



#261501 +(11351)- [X]

the "bishop" came to our church today
he was a fucken impostor
never once moved diagonally

courtesy of bash
wa gwan
Harmless
*




Posts: 37
Registered: 15-4-2005
Member Is Offline

Mood: No Mood

[*] posted on 19-8-2006 at 12:48


Is there another backup coming soon?
Rosco Bodine
Banned





Posts: 6370
Registered: 29-9-2004
Member Is Offline

Mood: analytical

[*] posted on 25-1-2007 at 08:31


Yesterday I noticed some error message script superimposed on the image of the main page
and a few glitches otherwise which were a transient
problem ...... and remembering some connectivity
problems not too long ago the two things made
me a bit nervous and caused me to wonder about
how up to date is the present backup .

There's been a lot of interesting information and discussion added since the last known backup which really should be protected , archived data , secured
by an up to date backup .

So please .....at the earliest opportunity ,
let's get an updated backup . It's cheap insurance
and peace of mind .
solo
International Hazard
*****




Posts: 3967
Registered: 9-12-2002
Location: Estados Unidos de La Republica Mexicana
Member Is Offline

Mood: ....getting old and drowning in a sea of knowledge

[*] posted on 25-1-2007 at 08:40


I have a question: do all the articles that have been uploaded become part of the backup? Also, is there a file folder where all of the uploaded articles reside, and can they be accessed by members? The reason for asking is that there are an awful lot of citations being uploaded and no way to see an index of what's available. At WD I keep a folder for all the references ever requested and fulfilled, with their upload links available for future researchers, also to avoid reinventing the wheel...........solo



It's better to die on your feet, than live on your knees....Emiliano Zapata.
Polverone
Now celebrating 21 years of madness
*********




Posts: 3186
Registered: 19-5-2002
Location: The Sunny Pacific Northwest
Member Is Offline

Mood: Waiting for spring

[*] posted on 25-1-2007 at 20:14


I do like having the cheap insurance of a backup, but doing the sort of transformations that are necessary to make the offline archive presentable and navigable is a bit painful. In case someone with relevant programming experience is reading: I am using the Python module BeautifulSoup to locate elements in each page (e.g. the "New Topic" button) and then I do string replacements to delete or alter elements (operating on an entire page as one large string). The problem is that the strings returned by BeautifulSoup may not be exactly the same sequence of characters that appeared in the web page -- whitespace may be changed. Each one of these discrepancies must have a special case in the code, which is ugly and time-consuming to develop. I did it once, but several months later the forum software was upgraded and the work needed to be done again. I still haven't re-done this work.
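One possible way around the special cases, offered only as a sketch and not as what the archiver does: instead of an exact string replacement, build a regular expression from the fragment that treats every run of whitespace as interchangeable. The function names here are hypothetical, and only the standard re module is used.

```python
import re


def whitespace_tolerant_pattern(fragment):
    """Compile a regex that matches fragment with any whitespace runs.

    Each run of whitespace in fragment matches one or more whitespace
    characters in the page; everything else must match literally.
    """
    parts = [re.escape(part) for part in fragment.split()]
    return re.compile(r"\s+".join(parts))


def remove_fragment(page, fragment):
    """Delete every whitespace-tolerant occurrence of fragment from page."""
    return whitespace_tolerant_pattern(fragment).sub("", page)
```

This only covers differences in the amount or kind of whitespace. If the parser drops or adds whitespace entirely (for example between two adjacent tags), the pattern would still miss. An alternative would be to modify the parsed tree itself and write that back out, at the cost of reformatting the whole page.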

Solo: yes, attachments are downloaded and stored by the code. You would still need an indexing system to go with them, because the file names may be something uninformative like "068374_methanol.pdf", where the numerical prefix is the number of the post that the attachment was found in.
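A simple starting point for such an index, sketched under the assumption that every attachment name follows the "postnumber_originalname" pattern described above (the function name is made up):

```python
import re
from collections import defaultdict

# Numeric post-number prefix, an underscore, then the original file name.
ATTACHMENT_RE = re.compile(r"^(\d+)_(.+)$")


def index_attachments(file_names):
    """Group attachment file names by the post number in their prefix.

    Names that do not match the prefix pattern are skipped.
    """
    index = defaultdict(list)
    for name in file_names:
        match = ATTACHMENT_RE.match(name)
        if match:
            index[int(match.group(1))].append(match.group(2))
    return dict(index)
```

Feeding it the listing of the media directory gives a mapping from post number to original file names, which could then be joined against the thread pages to recover a description for each file.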




Waffles
Hazard to Others
***




Posts: 196
Registered: 1-10-2006
Member Is Offline

Mood: No Mood

[*] posted on 28-1-2007 at 11:25


Quote:
Originally posted by prica
X > B.D.
I cane Send vaglias,but if ya came doon this parts, you're my guest(x2 p. 2weeks).Augh !


WHY DO THESE PEOPLE THINK THAT WE UNDERSTAND THEM

THIS IS NOT LANGUAGE




\"…\'tis man\'s perdition to be safe, when for the truth he ought to die.\"
gambler
Harmless
*




Posts: 44
Registered: 30-6-2006
Member Is Offline

Mood: No Mood

[*] posted on 31-3-2007 at 18:46


Are there plans in the works to prepare a current open forum backup?
Thank you in advance