Sciencemadness Discussion Board

Lengthy, time-consuming overhaul of forum software

Melgar - 14-2-2018 at 16:52

Okay, so the problem is that XMB (the software SM uses) has been more or less discontinued a long, long time ago. I kind of like the old-school feel of the site, but it would be better if we used an old-school forum software that's still supported.

I checked around a bit, and phpBB is an early forum software package that's still going strong, and seems familiar. Overhauling it would be a bitch though. A bunch of php code would need to be rewritten so that links stay working from outside this site. The theme would have to be remade for phpBB, and a lot of the text would have to have its tags edited. On the plus side, there's none of that BS with "quote" vs. "rquote", and where to put which end tags, so that's a bonus. Lots of other annoyances could finally be put to rest, including blowing up moderators' inboxes every time someone reports spam.

I looked into vBulletin too, but it seems a lot more bloated than it used to be, whereas phpBB has the same old-school feel that it always has.

I don't want to be overly critical, or give away too much to people who we'd prefer not have that info, but this software would have been outdated a decade ago, and won't even run on newer systems. Upgrading is pretty much a necessity. Polverone will back me up on this.

Fortunately, this is something I can do. I'm actually pretty handy with SQL, and have some time on my hands. Unlike most people who do this sort of thing, I'd accept donations of reagents and glassware in lieu of money. (PM me for details ;) )

I wouldn't expect anything upfront, but I should be able to put something together. This site runs on Linode, which, unsurprisingly, is an old-school host. It would probably be a good idea to switch hosts to AWS, because they make it really easy to add new features in a way that reduces their cost by understanding the priorities of commonly-used features. For instance, at my last job, we had a lot of large media files that were seldomly accessed, so those were put in the AWS S3 (simple storage service) system, which is a lot cheaper than using a dedicated drive. They have this really complex "a la carte" system of offering services, so learning how it works is daunting, but it ends up actually being really nice because you're only paying for what you need and what you use.

This is mainly a proposal of my recommended approach. Obviously I don't want to start work on it unless people are onboard. I really don't want things to reach the point where there's nobody in the world that's familiar with the software that this board uses to run. Right now, that seems to be our main line of defense against hackers though, so it isn't ALL bad. :)

phpBB is the software that I think most closely resembles XMB:

https://www.phpbb.com/

Here is phpBB's support forum, which obviously runs phpBB:

https://www.phpbb.com/community/

Keep in mind, themes would be matched as closely as possible, so try not to focus on the theme there.

Since I know how much we all love reading PDF files, here's a link to the phpBB documentation:

https://download.phpbb.com/pub/documentation/3.2/rhea_docume...

Feedback is welcome.

Vosoryx - 14-2-2018 at 17:29

phpBB looks much more powerful than XMB.
I'm on board, but can't really offer much help.

Quote: Originally posted by Melgar  

Keep in mind, themes would be matched as closely as possible, so try not to focus on the theme there.


Whoops - i'm talking about themes. How many people don't use the sciencemadness theme? I tried out the dark theme for a while (It's my theme on discord and youtube) but I disliked the XMB version of the dark theme. We could run a poll to see how many people use the standard theme. If it's only one or two main themes used, then switching it over would be no problem.


My question:
Would we be able to keep our usernames/posts? Is it possible to update the software while keeping all 16 years of posting history? One of my favourite things about SM is the decade and a half of conversations (even if some of them are kinda stupid long. For example, I was contemplating condensing the resolutions of the toluene ->benzaldehyde down to a useful, readable document and not have it 24 pages.)

I'm really keen on this idea, but can't offer much. But i'll be there if I can.

Edit:

For further discussions (if we decide to do this), since SM isn't great for quick back and forth conversing, should we set up another group on something like discord? If so, i'm happy to set that up.

[Edited on 15-2-2018 by Vosoryx]

Melgar - 14-2-2018 at 18:09

Quote: Originally posted by Vosoryx  
Would we be able to keep our usernames/posts? Is it possible to update the software while keeping all 16 years of posting history? One of my favourite things about SM is the decade and a half of conversations (even if some of them are kinda stupid long. For example, I was contemplating condensing the resolutions of the toluene ->benzaldehyde down to a useful, readable document and not have it 24 pages.)

Yes, it's just a huge pain in the ass. What I'd do, is to do a database dump, import it into an SQL database on a new server, then write a bunch of scripts to convert the old database to match the format of the new one, and make sure everything is linked properly. When everyone is satisfied with how the new site looks, I'd do a final database dump, run all the scripts again on the most recent data, then switch the DNS servers over to the new IP address. The old site would live on at least for a time, accessible only by IP address, in case we end up forgetting something.

This is something I've actually done twice before now, with a similar stack, and I figure if I don't do it, nobody ever will. Which would be a shame.

Quote:
I'm really keen on this idea, but can't offer much. But i'll be there if I can.

Edit:

For further discussions (if we decide to do this), since SM isn't great for quick back and forth conversing, should we set up another group on something like discord? If so, i'm happy to set that up.

[Edited on 15-2-2018 by Vosoryx]

Sure. I plan to set it up on the latest LTS version of Ubuntu, but AWS has all sorts of its own services that might be better for discord. It'd also be possible to set up a system for uploading large files using your SM login, then have the files stored on S3. phpBB has a million different extensions too, and new ones are constantly being developed, unlike with XMB.

The way I'd planned to do it, there's really no risk of losing data, since two sites would be running until we've determined that we're sure that nothing is missing. I'm less sure about the wiki, but that'd just be a matter of moving everything over as-is, and I can't imagine it being very involved.

Themes can both be imported and edited, so it should be possible to match things pretty closely to the default theme here. Just so the site is recognizable, if nothing else.

[Edited on 2/15/18 by Melgar]

Vosoryx - 14-2-2018 at 18:28

Right then - just cuz it took me about three minutes i set up a discord channel.
For those of you who don't know, discord is like skype but actually works. It does require an account, but that's easy.
This link should never expire:

https://discord.gg/KqMqDnX

I doubt anyone would be interested in moving conversation over there, but just in case. It allows talking, and text-chatting is way easier.

Edit:
Call me amateur (that's fine) but what does LTS and AWS stand for? As mentioned, i'm not great on the whole programming/computer lingo thing. I'm hoping that I can help in an organizational manner, as that seems to be the only thing I can do well.

[Edited on 15-2-2018 by Vosoryx]

Melgar - 14-2-2018 at 18:59

LTS is "Long Term Support". Ubuntu releases a new version every six months, but only an LTS version every two years. LTS is considered more stable, and bugfixes for LTS versions are given a much higher priority.

AWS is "Amazon Web Services". Basically, they're able to charge the lowest prices by making their services "a la carte". That is, you add services as you need them, and scale up when you need it. About half of the internet runs on AWS, and when there's an outage every few years, the internet becomes practically useless. Their pricing schemes can be confusing, but only because they charge based on what services cost them, which isn't necessarily intuitive to customers. For storing large files, for example, they charge based on both file size and how often it's downloaded.

The trouble with discord is that it doesn't generate the sort of lengthy discussions that are preserved here for posterity. So I'm a bit ambivalent.

Vosoryx - 14-2-2018 at 19:23

Quote: Originally posted by Melgar  

The trouble with discord is that it doesn't generate the sort of lengthy discussions that are preserved here for posterity. So I'm a bit ambivalent.


Yeah I thought about that, but seen as how updating the site is going to mostly involve short conversations, not in depth scientific discussions, I thought it might be appropriate. And it's I guess we'll find out.


I want to clarify my idea here, because it might be easy to confuse.
I just made the discord to talk about the update, not as a replacement. I was just thinking that a lot of the update will be talking, conversations and debugging, and discord could be an easier place to talk about that because:

A. Messaging is faster and easier. You're right, it's not good (at all) for long, in depth discussions. But:
"Yeah I ran into problem X. Here's a screenshot. Any ideas?"
Is way faster and easier on discord.

B. So much of the change is going to be talking, I don't want to annoy the users of this site during the change who are talking about actual science. It would be needless clutter - and we could talk endlessly on discord without bothering anyone.

Again, I don't really mind. I just put it out there because i'm trying to help out.

Melgar - 14-2-2018 at 20:08

Ok, for anyone else reading this, I logged into the discord server, and we're talking there. I'll try and keep it open if possible.

woelen - 15-2-2018 at 04:42

First let me say that I am absolutely positive on someone having the forum migrated to a more modern platforum, which still is supported by developers and has regular (security) updates.

If the forum is transferred, then keep in mind that besides user accounts, threads and individual posts we have other things:
- embedded pictures, available from threads
- references to other posts
- quotes from other posts (you may have to translate the quote and rquote tags)
- LaTeX-like formatted formulas
- Tags, which change appearance of text (bigger, smaller, bold, italics, superscript, subscript, color)

We also have things around the forums (FAQ, library of many many books, Wiki, publications)

I can imagine that it is possible to transfer all these things to the new forum, but we have to be careful that EVERYTHING is transferred to the new forum software.

If you are willing to do this, then I certainly would like it. I also am willing to donate something in return (maybe a nice package of small quantities of different reagent(s) like iodate, bromate, or some metal salts, I have many interesting inorganic compounds, which can be shipped internationally without risk or hassle).

If we transfer the software, then we will need an extensive testing period before we really switch and probably it would also be wise to have someone else review the code of the migration software and/or data of the migrated database and maybe also review the suggested migration plan.
I myself am a software engineer having quite deep knowledge of SQL, ASP, JSP, Java/C/C#/C++, Servlet containers, and I am willing to review things. I, however, am too busy to help with actual migration. I am willing though to do some sparring (e.g. by email or in a special thread).

You also will need Polverone's support, because he has system access and low-level database access.



[Edited on 15-2-18 by woelen]

Melgar - 15-2-2018 at 13:44

Right. What I have right now is a test database, where all the user passwords hashes are reset to the same thing. Trouble is, it's on a virtual machine, and it's a hassle to move files back and forth and such. Also, it's running the 2010 version of Ubuntu, which doesn't seem to work well with IPv6 addresses, along with a host of other problems.

I spun up a 2016 LTS version of Ubuntu on my AWS account. It's only free below 8GB, but you can pay by the hour if that's all you need, so I upped it to 100GB. Also set up an S3 bucket for storing static files. It's easy enough to do a find/replace on any links to internal files, and possible to set up a redirect so that external links (to static files) still work.

Fortunately, most of these issues are easy to solve via global find/replace, regular expressions, and scripts. I'm partial to Ruby, so most of my scripts use that.

For anyone that would like to help, but isn't much for web programming, maybe look up extensions and themes for phpBB that would work as counterparts to what's here.

I really tried to work with this virtual machine, but it felt like I was blowing a decade's worth of dust off of it. I'm hoping Polverone doesn't mind using AWS. It's the best host to use if you want to control costs and don't know exactly what you'll need when you start out. I had a 8GB EBS drive, and scaled it to 100GB when I saw the size of the database dump, for example. And that doesn't include static files, which would be hosted on S3, for a lot less than regular storage. I set up an FTP server that will automatically put uploaded files into S3, but I haven't decided how that will work with authentication. I'd imagine there's some phpBB extension that allows file uploading, but haven't looked into it much.

Right now, I'm trying to upload an 8GB file over a connection that can upload about 5 MB every minute. According to my calculations, the upload should take 26 hours.

So, how's everyone doing?

aga - 15-2-2018 at 14:07

Polverone Owns this board.

Basically it is His decision, not a consensus.

I admire some of what you're trying to do.

Chances are that Other things are happening at the moment.

There is no way on earth that Polverone would choose something that originated on SM (probably) given the invasive Evil.

He'd more likely go with his own choice, seeing as it's HIS board and HIS money.

woelen - 15-2-2018 at 14:20

Have a look at hetzner.de for cloud servers. A quad-core CPU, 16 GByte of RAM and 80 GByte SSD system costs appr. EUR 15 per month. With 2 CPU's, 8 GByte RAM you'll pay less than EUR 10 per month.

Things like backups, taking image snapshots and so on also is arranged. You provide your own OS (e.g. Ubuntu 16.0x LTS) and can run whatever you like. You have 500 MBit/s internet access and can use many TBytes per month of data.

Hosting is in Germany. I have very good experience with this.

https://www.hetzner.com/cloud

I have no other relation with Hetzner than just being a happy user.

madcedar - 15-2-2018 at 15:16

Quote: Originally posted by woelen  

https://www.hetzner.com/cloud

I have no other relation with Hetzner than just being a happy user.


Thank you for the link Woelen, nice find.

RogueRose - 15-2-2018 at 15:21

I've installed and done some admin on phpBB 2 and 3 and found it a nice platform, easy to install, admin, etc and it has a very nice support community. There are lots of nice themes, plugins, mods, etc and there are LOTS of developers who will gladly make custom ones for a very reasonable price. It seems to be maybe the second most popular forum software next to vBulletin when it comes to mods and support (at least in my research back in ~2014).

I would imagine migration to be pretty simple and found a few of my migrations to be painless, though it was only thread content, members, forum lists and didn't do PM's or thread data, which is kind of important in this forum.

I also used vBulletin 3, 4 and 5 and thought 5 was a major step back for them and I stopped using it there and focused on phpBB 2 (before 3 came out in official release). IDK if they have changed much since 5 but I know it seemed to be a general consensus that 5 was not as "easy" or "nice" as the earlier versions and there was something about security if I remember correctly. While I do like the layout a lot, it is basically my favorite forum software, I don't know if it is really any better than phpBB.

[Edited on 2-15-2018 by RogueRose]

RogueRose - 15-2-2018 at 15:31

Have you looked into XenForo? It is being run on a forum I use and it seems pretty nice and they moved from vBulletin ver 3.XX. It supposedly works very well on small or mobile screens, which I've heard can be difficult (and some complain about this forum related to that issue). Also, supposedly the same coders from vBulletin came over to XenForo, so there may be some decent experience behind it.

Melgar - 15-2-2018 at 19:07

I've been talking with Polverone, and he helped me get the virtual machine running with test data. He's been meaning to upgrade the software for years now, but it's an enormous amount of work to transition the database to another bulletin board software. I'm willing to give it a shot, and am using AWS because I'm familiar with it, and it's a lot easier for me to work with than a virtual machine. Ultimately, I'd just be developing some scripts, which I'd hand over to Polverone, and which he could then use to transition everything to phpBB.

I appreciate the suggestions for hosts, however AWS is really what all the professionals use, because you don't pay for any services you don't use, and because it can be scaled up as needed. For the main disk drive, price is $0.10 per GB used per month. For large, static files, pricing is $0.03 per GB per month. And storage is the main thing that you pay for. That doesn't preclude Polverone or anyone else from going with a different host anyway, but it makes things easier and cheaper for me, as I figure out what sort of system parameters the site would gain the most benefit from.

I'm really only looking into the possibility of transitioning to phpBB anyway. It might not even be possible. Right now, I'm just putting the public test database onto a remote server so that I can do a better analysis of it and better understand my options. I tried to do that on the virtual machine, but it's just too unwieldily, and I don't have a lot of the analysis tools I'm used to using. But XMB isn't necessarily compatible with the newer releases.

Now that I see how everything is set up, it's clear that only Polverone is able to make changes to the live version, which isn't necessarily a problem. But that means that the only thing I could potentially solve in XMB is the forum spam, and he'd have to do the rest. I can only make recommendations if I understand how the database is set up.

But if I'm going to bother messing around with that, I may as well also look into the possibility of migrating to phpBB while I'm at it, since that would solve dozens of problems at once (though it'd be a lot of work).

I looked into other forum software, but my top priority was keeping a similar look and feel to the site. I like phpBB because it's about the same age as XMB, so it's still pretty old-school looking. When I think of chemistry, I don't necessarily think of polished, high-tech stuff. I think of JACS articles from 1910, chemists recklessly discovering new generations of unstable explosives, and stained wooden benchtops. It's has sort of a steampunk vibe to it, I guess. And the word "madness" hearkens back to that day, even.

If we change any other parts of the site in any capacity, I feel like it should make this site look like it was designed in the 1890s, not the 1990s.

Of course these are all just my own opinions, and you're all welcome to ridicule them.

[Edited on 2/16/18 by Melgar]

woelen - 16-2-2018 at 00:02

Melgar, what you describe sounds wonderful. Having an 1890's look would just be awesome. I indeed think that fits sciencemadness very well. It would be nice to have a retro look and feel, and at the same time showing high level/quality and responsibility.

For me, the precise hosting system is not important, as long as it can be maintained easily, is future-proof and reliable. Just discuss it with Polverone, as the site's technical owner and creator.

Migration to phpBB certainly must be possible. I gave Polverone a link to a commercial service who can do it for us and who only ask money, once everything is set up correctly and tested. It would be even better if you could do it. You know many peculiarities of the forum and the community.

I cannot help with my time, but I can help with some nice stuff to play with for you, so that in times when migration to phpBB becomes too hot for a while you can cool down with some relaxed and fun chem-playing :P

NEMO-Chemistry - 16-2-2018 at 04:39

I got a lot of experience with phpbb, it has alot of power and add on features. Its extremely easy to migrate most platforms too, V3.2 has some problems, big problems that if you join the developer club on the forum, you get to learn about.

From memory Polverone knows his php, thats going to be handy. The often touted Xenforo is shite beyond comprehension, i wish people wouldnt keep recommending that shit. Woltlab buliten board is better than xenforo in many ways, but they sting you for add ons and the core still has problems.

If it was sorted i would go wolab myself, but i really dont like the business model behind it. So i also use phpbb 3.2. I would use the Git files though, watch the plugins really carefully, some have horrendous security holes.

But the main ones you likely want for here are good, banning people effectively is pretty easy, spammers are quickly dealt with if you grab the ban hammer plugin. Others that are more persistent will need you to go into the backend and fiddle around a bit.

SQL wise phpbb apart from the password side of things, is easy to transfer other boards to. I have migrated from xenforo and woltlab without the plugins and bridges with zero problem.

I have also migrated VB boards to phpbb, those are the hardest to do, VB has some awful quirks and migrating it is a waste of time and effort.

SMB i have never used, i have not seen the sql structure. But its old, so i am guessing the transfer should be easy. If you want a hand or some testing let me know, I got a couple of VPS sitting about doing nothing and a server at home for testing.

You will need to rewrite the front code of the phpbb, well actually its more of a edit. The main defacers have scripts that look for some of the back links in phpbb3.2, take those out and your not on the radar.

Even the latest build (git build), of phpbb3 is not so different from the other cores, which is good, its a solid core tried and tested, just about all problems with security are plug in related, except one hole they are still trying to fix.

But its good software, surprising its been free all this time! If like many here you can code php then it knocks xenforo out the water. Xenforo is for those that need training wheels on a forum software, its over priced and over rated. Its used a great deal because most forums of size are owned by marketing companies.

Its kind of set up for them, its really easy to do adds etc, which you dont need here.

I take it you have joined area 51 on the phpbb forum?

Melgar - 19-2-2018 at 11:11

Okay, so I pulled the virtual machine into an AWS server, and got it running there. Logged into it, and dumped the database. It was about 9GB in size. This seemed way too large. Turns out, file attachments are stored in the database rather than as files. While this is somewhat easier to configure and get working, and is fine for small attachments here and there, there would be significant speed and cost benefits to storing large static files on S3 rather than on the higher-write-speed SSD that the database is stored on. I already set up an S3 science madness repo, and currently have the virtual machine (13 GB file) hosted on it:

https://s3.amazonaws.com/sciencemadness/SM-offline2.ova

Straightforward, no? It's optimized for hosting large static files, that you write once, and where you read the entire file whenever you access it. Once you host it, you can't edit it either, but you can remove it and replace it with a new one. It's perfect for things like attachments though, that you don't want to edit anyway. And cost is about $0.03 per GB per month.

Right now, I'm trying to migrate the database into PostgreSQL, which is somewhat more stable, faster, and more powerful than mySQL. It does have a steeper learning curve, but I'm more familiar with using it already, and Polverone has said that he prefers it as well. Also, phpBB is compatible with both. To convert the database to work with phpBB, I'm going to have to use some pretty fancy SQL, and that's something Postgres excels at.

edit: I exported just the attachments table, then set all the attachment data to null, then exported the database again. (ie, exported the entire database minus the attachment data that was stored in it) Now it's 273 MB as opposed to 9 GB. Far more manageable. :D

The plan would be to parse the attachments table into actual files and host those in a normal file hosting system rather than have them be columns in a database.

[Edited on 2/19/18 by Melgar]

Melgar - 19-2-2018 at 19:37

Okay, so work is underway on converting the XMB database into something useable with phpBB. Trouble is, mySQL uses a bunch of weird workarounds for implementing features that PostgreSQL does much better, and even with their "PostgreSQL compatibility" export, is still incompatible with Postgres without manually editing the schema. A bunch of things would be much easier to do manually than they would be to script. Some things, I'm not sure how important they'd be to transition over, like polls, for instance. We don't use them much. Fortunately, I got pretty good at using regular expressions, which allow for very flexible find and replace functionality, but it's still not simple or quick.

Some things I definitely want to preserve, like external links to discussions on this site. That's a high priority. External links to files that are hosted here seem like a lower priority though. For internal links, it should work fine to just modify the database text to point to the new URL. And I can't imagine external super common.

I'm also setting this up in a way that should drastically lower the hosting costs for Polverone, as long as he doesn't mind switching from Linode to AWS. He still hasn't read my last three U2U messages, so I guess I don't know where he stands on that. But considering his job, there's no way he doesn't already have experience with it.

I definitely promise to pay attention to detail. I have many fond memories of typing some sort of reaction into Google and getting excited when I'd see that one of the results was a discussion from SM. And the two times I posted undocumented reactions and people didn't believe me at first. Then further experimentation by members vindicated me. I certainly wouldn't want to see some half-witted script kiddie break anything.

Okay, now I'm motivated to work on the site some more again. :D Even if it does mean manually fixing all of mySQL's ancient index creation SQL. At least it only needs to be done once, and then it'd be used as a template for any future importing and exporting.

JJay - 19-2-2018 at 22:58

MySQL's Postgres export mode really is horrible. But why use Postgres? It's a forum; you don't need common table expressions and time series analysis. MySQL is optimized for document retrievals and whatnot. It actually has better performance than Postgres when doing simple queries of the sort used by forum software.


Melgar - 19-2-2018 at 23:42

Main reason is that I have four years of experience using Postgres, and Polverone prefers using it too. It's not actually that annoying though, I just had to separate the schema (CREATE TABLEs mostly) from the data SQL. The hard part of doing the upgrade is going to be linking the whole thing together in whatever way phpBB does it. I foresee having to do a lot of complex SQL, and possibly even writing some SQL functions. I wouldn't know where to start with mySQL, but I'm somewhat of an expert in Postgres.

Straight out of the box with no optimization, maybe mySQL works better, but if you use a bunch of partial indices for the most common queries, and use some of the more recently-released indexing algorithms as well, I can't imagine you wouldn't get better performance from Postgres.

Mainly though, it'd be a lot less work for me since I already know it, and even if I was just as experienced with mySQL, the work is mostly going to be in figuring out how to convert the old data into the new format.

JJay - 20-2-2018 at 00:58

You do realize that if you use a "bunch" of indexes, you use up cache space and consume more disk bandwidth, right? You also increase the distance that the read-write heads have to travel.

I have in excess of 20 years of database experience, Postgres about 4, MySQL more like 12. I'd say use Postgres over MySQL for almost anything else, but MySQL is really probably better for forums and blogs.

Melgar - 20-2-2018 at 02:44

Oh, interesting. Would you be interested in auditing the test server once I get everything working?

The difference in performance would be so minor that nobody would be able to tell, I'm sure you'd agree. Like I said, main reason I'm using it is because I know how to. A mySQL database configured by someone who doesn't know what he's doing will certainly run slower than a Postgres database configured by someone who knows what he's doing. And the Postgres query planner will just ignore indices that don't improve lookup speed. The only real downside to having a lot of them is that write speed is reduced, since both the tables and the indices have to be written to.

Quote:
You also increase the distance that the read-write heads have to travel.

Only on magnetic disks. It's been a while since I've seen a host not offer SSD-based storage for the primary disk. :P

Granted, this probably wouldn't apply here, but have you seen anything recently about how the optimizations to Postgres GIN have made it much more useful for multicolumn indexing? At my last job, I was able to slash 100+ MB indexes down to like 16 MB. The main downside is that you can't sort using that index, and you'll only see gains like that when the number of distinct values in the column is relatively small. Anyway, article is here:

http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bi...

Any suggestions on how to implement HTTPS? Most sites stick to one or the other, and usually HTTPS. On the other hand, sometimes when you connect to a public wifi and you need to bring up the screen that shows you that "connect" button, it won't come up if you go to an HTTPS site. There are fewer and fewer sites that have any components that aren't encrypted. So I've always gone to sciencemadness.org when HTTPS was interfering with my ability to log on to public wifi. But that's probably not the best reason to choose a system. :P

JJay - 20-2-2018 at 11:43

I've done migrations from MySQL to Postgres and have some idea of how hard they actually are and how long they should take.

You can actually get higher throughput for less if you use HDDs on Amazon: https://aws.amazon.com/blogs/aws/amazon-ebs-update-new-cold-...

Of course, this discussion is entirely academic. Amazon has database services that would usually provide better performance than a database server running on a highly optimized and expensive general-purpose compute instance.

RawWork - 20-2-2018 at 11:56

I noticed everybody today uses amazon. All android apps, even google ones. When I used NoRoot Firewall to monitor my connections, I expected I will see different domains being accessed for each app or developer (e.g. google.com or similar for google, facebook.com for facebook) but unfortunately almost all apps accesed the same domain: *.amazonaws.com
Second most common was *.1e400.net...

JJay - 20-2-2018 at 14:29

Amazon's cloud services are the best. Google's come close, and Google can do some things that Amazon can't (the Cloud Machine Learning Engine and Cloud Spanner come to mind). Also, Google gives everyone a free cloud node. Yet most of Amazon's services are more advanced than Google's. Amazon also allows doing things like building a network appliance using a standard file format in VirtualBox and uploading it directly to create a machine instance (Google requires uploading a disk image).

Melgar - 20-2-2018 at 22:47

Personally, I made most extensive use of Amazon's S3, but generally like the fact that when you start a project, you can add services as you need them rather than try to estimate what you'll eventually use.

The main annoying thing about AWS is that they use a lot of their own terminology that you have to learn, and they tend to solve problems by adding complexity.

I noticed AWS has machine learning now too. It must have been added just recently.

JJay - 21-2-2018 at 01:10

AWS has had it for a lot longer actually, but comparing their machine learning capabilities to Google's is like comparing a BB gun to a tank.

froot - 21-2-2018 at 11:04

Hosting is not really as big an issue as software conversion. You can move hosts before or after converting, or not converting, having worked with phpbb as an admin let me suggest another option.

Install the latest Phpbb and sql database alongside this forum and make front end access to that.

Find the most appropriate 'skin' and set up the discussion layout as it is here. One can also try copy, edit and re-use layout templates from this forum to make it look almost identical.

In each topic section add another forum/topic at the bottom calling it 'archive' or something with a link to the same forum/topic on this board which has been closed from posting and for reference only.

If a user wants to contribute to a topic here in the archive you can possibly write a script that automatically opens a new topic with a link to where reply was meant to go in the new forum and the discussion can continue relatively seamlessly.

Once you've set everything up point the domain to the new phpbb forum and make the old one inaccessible without registration.

The advantages are:

All links in the old database remain valid.

No need to double up on temporary disk space while doing the conversion.

Much quicker.

Nothing lost.

Negligible downtime.

This board's vulnerabilities can be hidden behind the new forum's security.

Some of this board's functionality becomes dispensible.

The disadvantages:

Users will have to re-register on the new forum. Once they've registered they will be able to access their old accounts on this board. Most people will carry their old usernames with them but there will always be the anomalies.

I think it's a no-brainer unless future hosting will render XMB absolutely dead.




Melgar - 21-2-2018 at 14:44

Yeah, I've adminned phpBB too. The difference seems to be that I'm also a programmer. Since all of this software is open source, and because the data exists entirely in SQL tables that can be easily queried and manipulated, it's possible to migrate from XMB to phpBB without any loss of data. The main issue is that Polverone controls access to the server, and is busy most of the time. However, he's distributed a virtual machine containing SM archives, with passwords reset and private messages erased. That's the database that I have. The goal here is to refactor the database to phpBB's format, then just make sure that can be done with a script. Then Polverone could run the scripts with the backup database on a new server, and it should be up and running.

I've been busy with other things the last 2-3 days, but it's in the works.

madcedar - 21-2-2018 at 16:26

Melgar, you're a star mate. If you are successful with this transfer does that mean we'll be able to view the forum with ease on a mobile device?

Cheers

Melgar - 8-3-2018 at 01:29

Well, I hope I finish it. If it works, then it'll be able to do whatever the most recent version of phpBB can do. (I haven't used it in a while, and I honestly don't know)

Melgar - 9-3-2018 at 19:22

I concluded that the best way to migrate the data is by having mySQL output a tab-delineated csv-type file, then import that data into postgres using COPY. It's sort of annoying how XMB will have a SET-type column that contains two values, "on" and "off", but heaven forbid they just make the column a boolean. Then they make the default an empty string, which is totally different from NULL, and not part of the set anyway. I should probably just make all those weird columns strings, then write functions to convert them to their phpBB counterparts. I'll have to do that anyway for a lot of data fields, and it'd be easy to just make that part of the process. If possible, I'd like to have php handlers that are either at the same url as the XMB versions, or at least redirect to the phpBB urls. Thread and post IDs should be easy enough to keep unchanged.

I've been running into issues of money, actually. My internet has been predominantly through tethering to my phone, and the data charges have added up. I just got $200 via paypal from my aspiring alchemist friend, which helps. And I'm having to spend a lot of time compiling evidence against my former employer just to collect unemployment. But once that happens, I should get months of unemployment all at once. It's been a rough couple of months though.

If anyone wants to help, it would be awesome if someone could purchase a Dash license for me. It's an offline documentation browser, which would allow me to download all the programming documentation I'd need via public wifi, then use it for reference when I'm paying for data via my phone. So far, I've picked up the tab for AWS fees, but fortunately they haven't been much.

Dash licenses can be used on multiple devices too, so if you have a license or want one for yourself, you can share it. I used to have a license, but then they released a new version and required you to purchase an upgrade. I missed the deadline where I could upgrade at a lower price though.

Melgar - 7-4-2018 at 20:39

Good news! I figured out a straightforward way to copy all the data over. First I had to completely rewrite the schema, to make sure data types were compatible. But I finally got the "forums" table, the "members" table, and the "posts" table to import successfully into Postgres. The data had to be exported via a CSV though, otherwise the encoding got messed up, and none of the fixes seemed easy.

To output a table in mySQL as CSV, this was how I did it:

Code:
SELECT * INTO OUTFILE '/tmp/sm_threads.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM xmb_threads;


Then I just uploaded all the CSVs to the host machine via scp to import them. A typical Postgres command to import a CSV is:

Code:
\copy xmb_posts FROM '/home/ubuntu/sm_posts.csv' WITH (FORMAT CSV, ESCAPE '\')


You can also use the standard SQL COPY command, but there are issues with permissions that are annoying.

I'm also realizing that most of the tables won't even need to be transferred over, like "ban" tables, and XMB settings tables. Easier just to figure that out via the web interface.

The important part is that I've figured out how to extract data from really old software that I'm unfamiliar with, and import it all into a more modern database system.

Once the imported tables are in the same database as the (empty) phpBB tables, I can start writing functions to format the XMB data for phpBB. But I need to play around with phpBB a bit first.

Attachments are stored really inefficiently, so I plan to write a script to store them in AWS S3, which charges about 1/3 the price. It also lets us get away with a much smaller primary volume.

My plan was to use nginx instead of Apache, mainly because you can make it as simple or complex as you want, whereas Apache is always pretty complex. Also, I'm more familiar with it, and the learning curve seems lower to me. They're practically neck and neck as far as popularity, but I've always had an easier time getting necessarily-bizarre configurations working with nginx.

Presumably Polverone can understand the process well enough to set it up once I do all the legwork.

Would anyone care if I made it HTTPS exclusively, and made "whisper" redirect to "talk"? Also, what's up with our wiki? Seems like half the time I see it, it's full of ads for old sci-fi shows. Easiest would be to host it all one one server, I'd imagine.

DavidJR - 7-4-2018 at 21:02

Quote: Originally posted by Melgar  

Would anyone care if I made it HTTPS exclusively, and made "whisper" redirect to "talk"?

There's absolutely no good reason to have the site accessible via http if https is already set up, and even less reason to have separate sub-paths /talk and /whisper ?! So that sounds like a good idea. Just be sure to set up redirects properly so no links are broken.

Quote: Originally posted by Melgar  

Also, what's up with our wiki? Seems like half the time I see it, it's full of ads for old sci-fi shows. Easiest would be to host it all one one server, I'd imagine.

Do you mean the old wikia one? Personally I think that should be deleted since the wiki moved 'home'

Melgar - 8-4-2018 at 11:58

Okay, basically what I'm not sure of is how much of this stuff Polverone can figure out on his own. I'd appreciate it if we could just spend some time on the phone when we're ready to switch. Just because I don't know him well enough to know what he's comfortable with. On the one hand, he's certainly someone I look up to, but on the other hand, it would really suck if I overestimated him Also, I accidentally deleted the attachment data from the virtual machine, but Polverone should be able to write a script easy enough to convert them into files. Then it'd just be a matter of using some regex magic to fix all the links to attachments. I believe attachments are in base64 format, which can be easily decoded via the command line like so:

printf "SGV5LCB5b3UgY2FuIHVzZSBhIHRlcm1pbmFsISBXYW50IHRvIGhlbHAgdXBncmFkZSBzY2llbmNlbWFkbmVzcy5vcmc/IDpE" | base64 -d

The "-d" is capitalized on Mac/BSD.

If it's binary, it can just be piped to an output file, and a script could handle them easily enough. My real point is that I don't want to reset the virtual machine just to get the attachments table, since it wouldn't be complete anyway, and parsing it into files would be simple. I actually have the metadata for the files, but erased to data column for attachments, which was 90% of the database size.

I'm also looking into linking an ftp client to the users database, and then figuring out some criteria for posting there. (100 posts over 3 months, maybe?) Files would be available automatically via the s3 bucket. I set up s3fs and confirmed that it works, I just haven't figured out the best way to make FTP work, which is a low priority anyway.

Next step is configuring phpBB before setting it up, to make sure that paths are as close to they are now as possible. Fortunately, phpBB is open-source, and changing something little like that should be simple.

Tsjerk - 9-4-2018 at 02:01

@Melgar; I didn't read this thread before, but you are a hero. Don't forget your cape!

Which version of phpBB are you using? 3.2.2?

Edit; could you let me know when you have the board running somewhere? Also when it is not for public yet? I want to try some attacks. I don't think I can do any damage, but I like to try.


[Edited on 9-4-2018 by Tsjerk]

Melgar - 19-4-2018 at 11:01

I got a very rudimentary mockup of this site in phpBB. Don't worry about the theme though, that can easily be customized:

http://ec2-184-72-122-13.compute-1.amazonaws.com/talk/index....

I need a bunch of sample posts and such in order to reverse engineer our older data, so have at it. Note that permissions have not been set properly, but there's nothing there to read yet, so that's not actually a problem.

Feedback is appreciated. If you're not an obvious troll, let me know and I'll give you mod/admin privileges. (Do it here though, so I can confirm who you are)

Corrosive Joeseph - 19-4-2018 at 11:14

Looks exactly like an engine management forum I know.
Maybe this might give peeps an example preview

http://www.hrtuning.com/forum/


/CJ

Melgar - 19-4-2018 at 15:26

Yep, it says "phpBB" right at the bottom, and it clearly uses the default theme. It's really easy to have a new theme for a board though, and there are plenty of public ones out there to draw from. The point was that I need people to post and register and all that on this board, in order to make it easier to reverse-engineer the XMB format.

My point was to try to make everything seem like the current boards, but with a different theme (that can easily be changed).

Tsjerk - 19-4-2018 at 15:38

Hello Melgar, nice going! I just registered under "Tsjerk". I will make some posts

Melgar - 19-4-2018 at 17:32

Thanks! The database formats are different enough that I've been having to keep an eye on them to see how activity in them changes table data.

I appreciate ANY feedback on the way the site is set up and looks, EXCEPT as it relates to the theme. No sense in building a custom them if nobody ends up being interested in using it. So yeah, please have all sorts of discussions there, including trolling and being mean and posting "foolproof" meth syntheses you thought up after binge-watching Breaking Bad. :P

The content doesn't matter. I just need to be able to compare how the two database systems store data

madcedar - 20-4-2018 at 03:17

One feature that I love about the current forum site is the "Today's Posts" button at the top. Is it possible to add this functionality to the new site or is that part of a theme?

Edit: It's okay, I've just noticed that the new software has an Active Topics button that can be filtered to one day.

[Edited on 20-4-2018 by madcedar]

Melgar - 20-4-2018 at 17:49

Anyone interested in commissioning an artist to redo the theme for this site? We'd try to keep a similar look and feel, though perhaps also with the look and feel of a well-utilized workshop somehow. Everyone would be able to contribute ideas, if they wanted to.

This wouldn't happen until the site is fully transferred, but it would be nice to know if that's something that's in the works.

Also, please, everyone, go there and register and try and use as many features as you can. I need data to reverse engineer this shit.

chemplayer... - 20-4-2018 at 19:48

Had a play and it feels good. Differences I can see are:

- Less intuitive to add hyperlinks, images, etc. The buttons at the top of the full post editing screen add [] square bracket code but it's hard to know how to use these (e.g. lists are not easy to do without experimentation), and no visualisation on-screen of the result unless you press 'preview'.
- Sometimes getting a "The submitted form was invalid. Try submitting again." error message when trying to use the embedded image tag function.
- Minor cosmetic stuff like no user 'location / mood' and no smilies in the 'quick reply' section.

Melgar - 21-4-2018 at 00:04

@chemplayer I like that hovering over the buttons will show you the context, so you don't even need to use them.

There might be some sort of restriction on new users including images. I was hoping one of the admins would take a look at the admin tools over there.

Right now, I'm trying to set up a better attachment system. If it works, you guys are really gonna like it. :D

streety - 17-5-2018 at 04:50

Quote: Originally posted by Tsjerk  
@Melgar; I didn't read this thread before, but you are a hero. Don't forget your cape!


I agree with Tsjerk on this.

One area that might need some extra testing is search. I have in the past been a member on a phpBB forum where the search became unusable due to the size of the forum. It was probably smaller than SM. I actually built a dedicated search engine just for the forum to replace the default.

That was back in 2009-2010 so hopefully no longer an issue but something to check.

sodium_stearate - 17-5-2018 at 06:29

I honestly do not understand why anything needs to
be done to this place.

It seems to all work very well just the way it presently is.

Why attempt to fix it, if it's not broken?

Again, my suggestion is simply to leave well enough alone!:D

streety - 17-5-2018 at 10:27

How would we know if it is broken or not? For any security issues any issues will only become obvious after the site is hacked. Far better to be proactive.

One approach would be to just patch this site whenever an issue is discovered. There are a small number of people able to do that though.

By moving to a more popular forum software there is a larger pool of people watching for issues and able to fix them.

It might be this site is fine. No security vulnerabilities to be found. However, a quick look at the html source for just this page does not fill me with confidence.

sodium_stearate - 17-5-2018 at 17:49

Kind of sounds to me like making more work
for one's self (and then later on, the users).

Sounds to me like inventing possible problems simply
as an excuse to change the whole works here at the forum.

I say, leave it alone! It works. It works well.

Why on earth are people these days so eager to always
change things just for the sake of change? Take something
that works well and is highly reliable, and then replace it
with some piece of junk "cloud" software deal that probably
does not work worth a hang....

That is my impression of it.

Possibly the present software on this forum works
just a little *too* well?

Maybe having it work well does not quite suit some
people around here.

Maybe having something that is all hosed up and does not
work well at all is somehow preferable?

Seriously. Just leave it alone.

Melgar - 17-5-2018 at 19:14

Okay, well one of the issues is that spambots come here all the time, and while the admins do remove them, whenever you report a post, that sends U2U messages to EVERY admin and moderator. The admin tools kind of suck, really. Also, the only thing preventing this site from getting hacked and getting every U2U message stolen is the fact that most people don't actually care. We actually had a hacker break into our site and make himself an admin and try to demand ransom, but the mods just kept banning him until he got tired of making messes and left. But his coding skills were really bad, which was the main thing that limited what he could do. If someone with better coding skills wanted to do some real damage though, they'd certainly be able to.

I've been working on this for a while, and think it might help to write up a "diagnosis" or something, where I lay out the different pieces that would have to be reverse-engineered and reassembled. It's doable for sure, and it's not like I'm too busy to do it, it's just hard to stay motivated. Also, my internet connection can be kind of crappy, which makes it hard to remote into the web server I set up. But I have been on it.

CouchHatter - 17-5-2018 at 20:56

The site has zero mobile optimization, which seems to be affecting a portion of the total users. I would like to see a migration to a more popular forum software, based on others' assessments of its current state.
Quote: Originally posted by Polverone  
I appreciate the thought, but I am well-funded from my day job nowadays. I don't need money. What I am short of is time.

Our forum software is ancient. It was one of maybe two "good" open source options back in 2002. It's not good any more, security-wise. (I actually rather like its minimalism apart from security issues.) Transitioning to another forum has been oft-discussed but it's a monumental undertaking.

In the mean time, if you are familiar with PHP, I would gladly welcome security audits and enhancements in the form of pull requests against this github repository:

https://github.com/mattbernst/xmbforum


It sounds to me like Melgar is doing the Lord's work.

Tsjerk - 18-5-2018 at 14:54

Maybe all members could mail Melgar a token of appreciation? Doesn't have to be much, should be non suspicious to customs, but I for example have quite some chemicals which are not too everyday, and I would happily send some to the hero who is getting our forum fixed.

Anything to help Melgar get reminded that what he does is appreciated.

Without transfer soon or late someone is going to do a lot of damage.

streety - 18-5-2018 at 17:35

Quote: Originally posted by Melgar  
I've been working on this for a while, and think it might help to write up a "diagnosis" or something, where I lay out the different pieces that would have to be reverse-engineered and reassembled.


Absolutely!

This type of thing interests me and I could easily see it sucking me in. No point needlessly repeating your work though.

Perhaps put it up as a github repository to facilitate collaboration?

JJay - 19-5-2018 at 11:19

My major concern with this project is that it shouldn't have taken more than a weekend to complete, and there are also serious concerns about the security practices and individuals involved. A github repository is pretty much the minimum measure that should be taken to ensure that no one is secretly inserting backdoors or encrypting new passwords with MD5, etc.

streety - 19-5-2018 at 16:42

Quote: Originally posted by JJay  
My major concern with this project is that it shouldn't have taken more than a weekend to complete


To me, that seems a little ambitious but I'm a scientist who programs as necessary rather than as my full time profession.

Quote: Originally posted by JJay  
there are also serious concerns about the security practices and individuals involved. A github repository is pretty much the minimum measure that should be taken


From reading this thread and a few other posts on the subject I think the intention is that a guide would be created so that Polverone can migrate the site without having to solve every problem. Addressing each issue will take a lot of work but when it's working it should be a straightforward process. It would be very difficult to hide a backdoor in the directions. If the forum needs to be customized at all it would be a simple process to look at the difference from the official phpBB release.

Quote: Originally posted by JJay  
encrypting new passwords with MD5, etc.


This is how passwords are currently stored. For example, here is the code for updating a password. It isn't even salted.

How to handle this going forward is an interesting question. The options seem to be:

  1. Continue using unsalted md5 hashes
  2. Silently migrate to something better and update passwords as users log in
  3. Migrate to something better and delete all existing passwords forcing everyone to go through the forgot password process


For something better, phpBB supports several options. Bcrypt is by far the best although the default work factor seems a little low.

Melgar - 20-5-2018 at 04:53

Lol @ JJay.

I basically have to reverse engineer both forum database schemas, (though phpBB's is at least well-documented, I guess) then write scripts and a guide for Polverone to do the transition himself. Also, one of the problems is that he deleted all the U2U messages in the test database, making it a lot harder for me to reverse-engineer that table. I'd been meaning to ask him if he could release a version of the test database where instead of deleting all the U2U messages, he just replaced the text of them with "hello!" Then at least I could work on that and be reasonably confident that it'll work on the real data if it works on the test data.

Also, I need to make sure Polverone is okay with AWS as a host. I've been charged about $7 a month for an Ubuntu server with 100GB of SSD primary storage, (Needed to be able to run the virtual machine) so I think if we did go that route, there would be huge savings on hosting. Also not sure if I'd set up the server with test data, transfer it to Polverone, then he'd import the real data. I don't foresee any problem doing it this way, as it'd eliminate the need for Polverone to set up a server from scratch. I mean, if I'd have wanted everyone's data, I'd have already gotten it. :P (I don't though.)

The trouble with github is that there isn't going to be much coding involved. The hard part is going to be all the SQL commands that would be needed to transform the XMB data such that it works with phpBB, and that will involve a lot of trial and error. However, I could start a github repo for the SQL I've written so far, mostly to export the data from the test server as CSV, create a similar database structure in Postgres, then import all the CSV data into Postgres. Also, we'd probably have to modify the base phpBB installation in order to minimize broken links and such.

@streety: Already figured that part out! phpBB actually checks for several different formats of encrypted passwords, including MD5. I guess way back in the day, phpBB used MD5 too. However, if it detects an MD5 hash, it will automatically bump up the encryption to bcrypt. (I think)

I'm in the process of writing up a proposal where I break down all the different parts that will need to be done. I know it doesn't look like I've done much yet, but I've gotten the XMB databases into Postgres, which was tricky, and I've also done a lot of research into what needs to be done. Still writing my plan of action, mainly. I'll be posting that soon, so you guys can tell me what you think of it, propose changes, etc.

streety - 20-5-2018 at 09:06

Quote: Originally posted by Melgar  
I'd been meaning to ask him if he could release a version of the test database where instead of deleting all the U2U messages, he just replaced the text of them with "hello!" Then at least I could work on that and be reasonably confident that it'll work on the real data if it works on the test data.


I haven't looked at the virtual image but another option would be to copy it, get it running, and populate it with new data. Then do the migration from the copy to test.

Quote: Originally posted by Melgar  
Also, I need to make sure Polverone is okay with AWS as a host. I've been charged about $7 a month for an Ubuntu server with 100GB of SSD primary storage, (Needed to be able to run the virtual machine) so I think if we did go that route, there would be huge savings on hosting.


This is a real apples to oranges comparison. I think you would be in for a nasty surprise. $7/month is probably a t2.nano or a t2.micro with at best a single cpu core and 1 GB of memory. That's fine for a small test server but to handle the higher traffic for the live site something with a little more power is probably needed. That's not the surprise though. The price on EC2 does not include traffic. You pay extra for the traffic. In that way it is quite different to a more traditional host.

I don't know what traffic the site gets so I'm just making up numbers - 10GB/month in and 500GB/month out. This would push up the price to $54.90/month.

Currently, I believe the site is hosted on linode. A server there with an 80GB SSD has 2 CPU cores, 4GB memory and 4TB data transfer all included for just $20/month. If 80GB is too small you can get 160GB for $40/month and CPUs and memory also double.


Quote: Originally posted by Melgar  
Also not sure if I'd set up the server with test data, transfer it to Polverone, then he'd import the real data. I don't foresee any problem doing it this way, as it'd eliminate the need for Polverone to set up a server from scratch. I mean, if I'd have wanted everyone's data, I'd have already gotten it. :P (I don't though.)


This seems quite fragile and messy. A set of steps including every task would make the process easily repeated and testable. It would also allay any security concerns if the process started from a default version of ubuntu and phpBB.

Quote: Originally posted by Melgar  
The trouble with github is that there isn't going to be much coding involved. The hard part is going to be all the SQL commands that would be needed to transform the XMB data such that it works with phpBB, and that will involve a lot of trial and error. However, I could start a github repo for the SQL I've written so far, mostly to export the data from the test server as CSV, create a similar database structure in Postgres, then import all the CSV data into Postgres. Also, we'd probably have to modify the base phpBB installation in order to minimize broken links and such.


Doesn't matter if it isn't going to be computer code. You could almost think of Polverone as the human "computer" your instructions will run on. :D

The advantage of something like git is you can go through multiple iterations and keep track of where you have been. It's perfect for trial and error.

Great to read you are already thinking about the password hashes.

Melgar - 20-5-2018 at 18:26

Yeah, I've used git extensively. The trouble is that the process I'd use for migrating the tables isn't exactly the sort of process git helps with. It's more of a trial-and-error type of process where you look at a lot of data and try to ascertain how it fits into the larger process, based on the patterns you see in it. After a while, it gets to be like in The Matrix, where the guy's looking at a screen full of changing numbers, and describing the people they represent. Fortunately, it's pretty rare to lose data permanently if you're not stupid about making backups and copies.

I will start at least two git repos though. One for the list of commands/scripts to migrate the data from XMB/MySQL to phpBB/PostgreSQL, and another for a custom version of phpBB.

As far as AWS, there's a reason half the internet runs on it. Although Linode gives you a flat rate, they can only do this because only a fraction of customers ever approach hitting it. AWS, on the other hand, charges only what customers actually use. Say you're off by an order of magnitude in your estimation? After all, this site mainly hosts text-based content. And a bulletin board with the traffic that this one gets is hardly at the high end of what might tax a server. Another advantage of AWS is that you can start off with a low-end instance, and can easily add services or upgrade services when you need them, rather than guessing what you might end up using eventually, and going with that. There are other nice things too, like you never have to worry about running out of disk space from attachments and other static files, since you'd just move them to S3.

Thinking about it further, I uploaded the entire virtual machine image to S3, which adds 40GB to the total space I'm using. And I logged in and downloaded huge amounts of data, including the virtual machine multiple times if I'm not mistaken. I should see if I can find some breakdown of the charges on my account, and see what it comes down to, just out of curiosity.

One thing about github is that you have to have a paid account to have a private repo. Probably no need though, right?

streety - 21-5-2018 at 16:34

I think a public repo should be fine but there are other free options if private is important. I think bitbucket is probably the largest.

For the type of scaling you are talking about any provider would be sufficient. For the setup being discussed here the work involved in adding an extra server and utilizing it on AWS or linode or anywhere else is going to be mostly changing the application. AWS would only have a big advantage if using the different services from the beginning, S3 for file hosting, RDS instead of postgresql, etc.

For the traffic it was just a guess and running some numbers through the calculator so it could well be very wrong. The traffic may well be relatively low and the costs better than alternatives. There is no upper limit though. If there is a spike in traffic for any reason the costs could go up dramatically. With conventional hosts you know what you are spending ahead of time.


Melgar - 21-5-2018 at 19:08

When I ended up looking at the contents of the virtual machine, out of 12.2 GB of the space that it takes up, 9 GB of that was attachments, in the MySQL database. Without attachments, the MySQL database is only 273 MB. Looking at the $7 a month I'm paying for hosting, almost all of that is because of the 100 GB SSD that I'm using. Of course, the only reason the SSD is so big is because it had to accommodate both its own software and virtual machines. I had to host both the original virtual machine file, the working virtual machine, the data that was extracted from the virtual machine via SQL and CSV files (both inside and outside of the virtual machine), and the working data in the host database (both in XMB and phpBB format), all at once. When everything goes live, that will no longer be the case.

I've used Linode, DigitalOcean, Rackspace, and AWS, and AWS seemed like it would require me to pay the least amount of my own money out-of-pocket while I tinkered around with the data trying to get everything live. That, and S3 is nice for static file hosting, which is maybe 70% of the data in the virtual machine.

I set up a github repo containing the SQL code to set up a postgres database with tables that can store the data from XMB CSV dumps. CSV dumps were MUCH easier to work with than SQL dumps, mainly because XMB hails from an era when UTF-8 wasn't standard. Although there are comments indicating it was automatically dumped from MySQL, I basically had to go through the whole thing and rewrite it by hand anyway, which is the main reason I thought it was important enough to include:

https://github.com/toldani/sm-transition

I'll add more files and comments once I'm more sure of the process myself, obviously. I'm using Ubuntu 16.04 LTS, php-7.0, PostgreSQL 9.5, and phpBB 3.2.2.

streety - 22-5-2018 at 04:33

Awesome! I've starred the repository and will take a look/give it a run in the next few days. I have the same username on github as here.

AWS is perfect for getting something like this up and running, my concern was more for the future when it is in the harsh glare of the larger world.

Are you thinking of going to S3 for hosting attachments from the beginning? wasabi also looks interesting. Faster than S3 but at Glacier like prices and data egress is free.

Melgar - 14-6-2018 at 19:23

If anyone wants the .pem file for accessing the AWS server, PM me with your email. This is just a test server anyway, I suppose, and so there's no need to be too careful about it.

I've been thinking about how to store the attachments such that their actual filenames are used, despite the fact that there can be duplicates. XMB seems to use some redirect mechanism that ensures that the filename is correct when downloading attachments, and that would need to be carried over somehow with the S3 attachment system. There's a phpBB plugin that allows S3 to work as the attachment system, so I'd kind of like to continue using that route, because it seems like there would be the least resistance.

I registered the "sciencemadness" S3 subdomain, so for the attachments, I'd probably just figure out how to transfer that whole system over to a different account, if Polverone wants to keep it under his name (which I assume he does). The attachments aren't exactly a secret anyway, I'd just need to maintain a consistent system for storing the attachments.

streety - 15-6-2018 at 15:32

Quote: Originally posted by Melgar  
If anyone wants the .pem file for accessing the AWS server, PM me with your email. This is just a test server anyway, I suppose, and so there's no need to be too careful about it.


This seems like a bad idea.

If anyone wants access I suggest you ask for their public key and then add that to the list of authorized keys. Your private key should be just that ... private.

Quote: Originally posted by Melgar  
I registered the "sciencemadness" S3 subdomain, so for the attachments, I'd probably just figure out how to transfer that whole system over to a different account, if Polverone wants to keep it under his name (which I assume he does). The attachments aren't exactly a secret anyway, I'd just need to maintain a consistent system for storing the attachments.


Do you mean you got the bucket with the name sciencemadness? Another way to handle this would be to alias a subdomain of sciencemadness.com to the s3 bucket. I'm not sure how either option works with using https though.

I downloaded the backup image and set up Apache, phpBB, and MySQL using docker on my laptop. I started writing a conversion script for the users and then started looking at the forums. Then I realized you intended to change from MySQL to postgresql. Should be easy to change what I have though.

XMB is quite simple. There are some peculiarities but easy enough to grasp. phpBB seems like the polar opposite, it's very complex. I wonder whether it is actually the best option or whether there is a simpler alternative that also has ongoing support.

Melgar - 16-6-2018 at 09:28

Quote: Originally posted by streety  
This seems like a bad idea.

If anyone wants access I suggest you ask for their public key and then add that to the list of authorized keys. Your private key should be just that ... private.

Yeah, that is a better idea. Initially I was thinking it didn't matter because the server wouldn't ultimately be used anyway, and I don't use that key or AWS account for anything else. But you're right that it's bad practice, and it would be a lot easier to just add public keys to the authorized_keys file.
Quote:
Do you mean you got the bucket with the name sciencemadness? Another way to handle this would be to alias a subdomain of sciencemadness.com to the s3 bucket. I'm not sure how either option works with using https though.

Yes, that is what I mean. For attachments, I think it's best to just stick to the S3 bucket, with that URL. The URLs would be automatically generated anyway, and there's a phpBB extension that allows attachments to use S3 for storage.

Quote:
I downloaded the backup image and set up Apache, phpBB, and MySQL using docker on my laptop. I started writing a conversion script for the users and then started looking at the forums. Then I realized you intended to change from MySQL to postgresql. Should be easy to change what I have though.

XMB is quite simple. There are some peculiarities but easy enough to grasp. phpBB seems like the polar opposite, it's very complex. I wonder whether it is actually the best option or whether there is a simpler alternative that also has ongoing support.

That's the problem I got stuck on too. This seems to be caused by all the new features phpBB has added over the years.

The best solution I could come up with is to install phpBB 2.X, transition the XMB data to that (which would be a lot easier), then upgrade phpBB. I hadn't decided whether to try and go direct to phpBB 3.2.2 anyway or do it in two steps, but what you're saying reinforces that it'd be better to do the transition with an older version of phpBB that would have a much more similar database structure.

As far as using PostgreSQL or MySQL, I have a lot of experience doing things like regex find/replace with PostgreSQL, which seems like it'd be necessary to make sure all the bbcode tags convert correctly, all the links work, etc. It seems like it's inevitable that there will be various formatting problems caused by different tag behavior, so I anticipate there will be a lot of global tag-fixing needed.

If anyone DOES want their public key added, let me know and I'll add you.

[Edited on 6/16/18 by Melgar]

streety - 16-6-2018 at 14:21

Quote: Originally posted by Melgar  

That's the problem I got stuck on too. This seems to be caused by all the new features phpBB has added over the years.

The best solution I could come up with is to install phpBB 2.X, transition the XMB data to that (which would be a lot easier), then upgrade phpBB. I hadn't decided whether to try and go direct to phpBB 3.2.2 anyway or do it in two steps, but what you're saying reinforces that it'd be better to do the transition with an older version of phpBB that would have a much more similar database structure.


That's a brilliant idea.

It looks like the oldest version is 2.0.12 on an "official" phpBB page - https://sourceforge.net/projects/phpbb/files/phpBB%202/

There are other sites with earlier copies but not sure I would trust them.


Quote: Originally posted by Melgar  
As far as using PostgreSQL or MySQL, I have a lot of experience doing things like regex find/replace with PostgreSQL, which seems like it'd be necessary to make sure all the bbcode tags convert correctly, all the links work, etc. It seems like it's inevitable that there will be various formatting problems caused by different tag behavior, so I anticipate there will be a lot of global tag-fixing needed.


:o

You want to do all that in SQL? I'm perfectly comfortable with the basics but that is way beyond my level of familiarity.

There are some complex transformations needed for the users table but I used python instead. I'm more familiar with it and there are a greater variety of packages available.

I had intended to include a link to what I had in my previous post but forgot about it. It's up on github now.

User migration: https://github.com/streety/new-sm-forum/blob/master/migrate_... - needs testing but I think this is close.

Forum and post migration: https://github.com/streety/new-sm-forum/blob/master/migrate_... - only just started, I still don't understand the structure fully.

Melgar - 16-6-2018 at 19:33

I've written C extensions and PLPGSQL functions for Postgres, so I'm pretty comfortable with optimizing it. I'm also used to using 20+ GB databases, and SM's database is only 273 MB if you subtract the file attachments. So it's be really simple to do the sort of DB manipulation that'd be needed.

My experience is mainly as a Rails backend dev, so my go-to language is Ruby, but I'm pretty comfortable writing complex SQL queries too. Not much experience with Python, but the use cases are similar to Ruby, from what I've seen. I have used Python a little for something to do with Sublime extensions, but I'm more comfortable with Ruby, mainly due to the nicer console and the more forgiving syntax. But if you just comment your Python code, I'm sure I'd be able to figure it out.

I don't think it should be necessary to go as far back as possible with phpBB. The jump from 2.X to 3.X was the major one, and I don't think that there are significant increases in complexity between the various minor versions of phpBB 2.X. Also, if the goal is to eventually upgrade to phpBB 3.X, then it would be better to have one of the later 2.X versions as a target, since upgrading to 3.X would then be easier.

streety - 18-6-2018 at 17:22

It looks like the latest version in the 2.0.x series is 2.0.23. I doubt I'll have time during the week but I can take a look at the weekend.

Melgar - 21-8-2018 at 14:12

Sorry to leave everyone hanging. The spam problem certainly seems to have become too serious to ignore anymore. Had my development laptop break (Macbook) and it was going to cost $700 to fix. So I picked up this crappy little Lenovo Ideapad for $150 just to hold me over until I can afford something better. It came with Windows 10, which it ran incredibly poorly, but wiping everything and replacing it with Linux really improved its performance. So I spent yesterday getting it set up for writing code, which it's actually been halfway decent at. I do most development via a remote terminal anyway.

Anyway, I've been wanting to try out GCP (Google Cloud / Google's version of Amazon Web Services) for a while, since I've always wondered what AWS would be like if it was scrapped and rebuilt by people who were trying to be helpful. And it really is pretty nice. They had the benefit of learning from Amazon's mistakes, plus Google is head and shoulders above Amazon when it comes to developing software that people actually want to use. Rather than force people to learn their own flavor of SQL, Google has graciously used MySQL and Postgres as the engines powering its relational database service. You even have your choice of which one to use! And Google's documentation is not only up-to-date, but has helpful links that take you to the appropriate part of the control panel scattered throughout it. So yeah, I really like what Google's doing, and I feel way better about supporting their innovation than the sort of bean-counting that characterizes Amazon.

What was I talking about again? Oh right, the SM database. Now my first order of business now that I'm working with GCP is going to be converting the attachment table into files, then storing them in GCP's static data storage system. Then I can reduce the database size from 9GB to 273MB, and the data will be much easier to work with.

IP address for the main GCP server is 35.196.251.155. I'd be glad to add your public keys to the list, if anyone feels they have anything to contribute.

Melgar - 26-8-2018 at 18:02

Now I have the virtual machine running on a Google Cloud instance, and I set up port forwarding so you can access it here:

http://35.185.63.230:8080/smtalk/

I uploaded all the attachments in the database to Google Cloud Storage. Here's an example of one of them:

https://storage.googleapis.com/sciencemadness/attachments/10...

If you want to try checking to see if they're all there, look at any of the attachment links in this forum. There will be two numbers in the link, one of which is the thread id, and the other that's the attachment id. You can tell which one is the thread id because it's in the URL of the page you're on. The filename should constitute the text of the link. The only thing is, I had to sanitize the filesnames (I used the Ruby CGI.escape method) that had special characters in them. So files without spaces or commas or parentheses in their file names are easier to figure out the path for.

streety - 27-8-2018 at 04:30

I'm glad to see you're open to alternative hosting options. You had seen quite set on AWS.

At the risk of throwing a spanner in the works I had been wondering whether phpBB is actually the best target for this effort. Implementing the spam captcha on this forum was somewhat challenging, largely because it was

a) a language I had not used in many years
b) poorly organized
c) lacked separation between data, logic, and presentation
d) not written with modification in mind

phpBB is an improvement but far from a complete solution. I fear writing each of the changes we would need would be a lengthy and difficult process even after the database has been migrated across.

Contrast that with working on the spam monitoring script and there have been times when I've thought it would be easier to re-implement the forum than to migrate it to phpBB. I'm working in a language I know (python) and I can separate functionality across organized files.

How long would it take you to re-implement the forum in a language you know (Ruby) using a modern framework, the MVC pattern, an ORM etc? How long would it take you to make changes later on? It's not obvious to me it would be any longer than migrating to phpBB.

I'm not suggesting we switch to writing our own forum software now but we should get a better idea of how much work will be needed to make the necessary modifications to phpBB. You have the attachments in google cloud storage. I don't know what your next intended step was but I suggest modifying phpBB to store attachments there and then display them.

JJay - 27-8-2018 at 04:48

I use PHP all the time, mainly on Zend (I could use Laravel or something but it wouldn't be my first choice). I have a strong bias against Ruby, but Python is cool. SimpleMachines is pretty popular....

I've worked on forum software before... I don't think it would be very hard to implement a forum, but if I were doing it for myself, I would be tempted to use something like bbPress.

Melgar - 27-8-2018 at 10:26

My thought was that phpBB has been going strong for a very long time, and will probably continue to be maintained in the future. So even if we make changes to the code in places, then if we move on like Polverone has, our successors here won't be stuck with a huge mess on their hands.

The main reason I thought phpBB would be a good choice had more to do with the similarities to the XMB interface. I went around to all the major open-source bulletin board software sites and looked around, trying to rank them in how difficult they might be to get used to for people who don't seem to want things to change very much. Here's how I ranked at least some of them:

1. phpBB https://www.phpbb.com/community/
2. Simple Machines https://www.simplemachines.org/community/
3. myBB https://community.mybb.com/
4. vBulletin https://www.vbulletin.com/forum/
5. Discourse https://meta.discourse.org/

I assumed the consensus among long-time contributors would be that the less "web 2.0" we went with it, the better. Correct me if I'm wrong. I've used bbPress, and don't have any complaints about it. I'd assume there would be some hesitation about tying this site to the Wordpress ecosystem though. When I was a system administrator, I cut our routing errors in half by setting up this route to where the Wordpress login page would be if we were running Wordpress. The security isn't necessarily bad, but it's definitely where hackers are focusing their energies.

And yes, thank you to whoever pushed me in the direction of Google Cloud. I'd incorrectly assumed that since AWS was a huge mess to figure out, that the same might hold true of Google's offerings. Unlike Amazon though, Google actually seems to put some priority on ease-of-use and having good documentation.

As an aside, I think this has a lot to do with their corporate cultures. Amazon is notoriously competitive, even among themselves. So they don't spend a lot of time thinking about how they can improve things for their uses. Google seems to prioritize making things more useful and helpful a lot more, even if they aren't immediately profitable.

Once things are set up, you still spend most of the time using Linux through a terminal over ssh in any case. Google seems to use a less confusing system of authentication between services too. For whatever reason, AWS made you copy and paste this modified JSON object into a textbox to set up your S3 bucket to automatically allow publicly accessible downloads. With Google, not only can you just turn it on from the console, but they have a few really powerful and really useful command-line utilities that are way more intuitive to use for authenticating connections between different services. Like, it makes a lot more sense to log into your secured Linux server, and then use a command line utility there to establish your credentials on that side. That way, you only need to look up the credentials for the other side.

As far as Ruby, I was only planning to use it for converting data between the databases. It's really good for writing "glue code", where you have to make several different applications process each other's data in ways that they aren't quite set up to do. It fills a similar role as Python does, but is more forgiving of syntax. I also use shells a lot, and Ruby does those really well. For the record, here's the Ruby script I used to pull out all the attachment data and save them as files. After doing that, I used Google's command-line tool to transfer the whole attachment filesystem into cloud storage:

https://github.com/toldani/sm-transition/blob/master/parse_a...

I've been making it a point to comment my code too, so even if Ruby isn't your language, it should be pretty obvious how it works.

Loptr - 27-8-2018 at 10:33

Quote: Originally posted by Melgar  
My thought was that phpBB has been going strong for a very long time, and will probably continue to be maintained in the future. So even if we make changes to the code in places, then if we move on like Polverone has, our successors here won't be stuck with a huge mess on their hands.

The main reason I thought phpBB would be a good choice had more to do with the similarities to the XMB interface. I went around to all the major open-source bulletin board software sites and looked around, trying to rank them in how difficult they might be to get used to for people who don't seem to want things to change very much. Here's how I ranked at least some of them:

1. phpBB https://www.phpbb.com/community/
2. Simple Machines https://www.simplemachines.org/community/
3. myBB https://community.mybb.com/
4. vBulletin https://www.vbulletin.com/forum/
5. Discourse https://meta.discourse.org/

I assumed the consensus among long-time contributors would be that the less "web 2.0" we went with it, the better. Correct me if I'm wrong. I've used bbPress, and don't have any complaints about it. I'd assume there would be some hesitation about tying this site to the Wordpress ecosystem though. When I was a system administrator, I cut our routing errors in half by setting up this route to where the Wordpress login page would be if we were running Wordpress. The security isn't necessarily bad, but it's definitely where hackers are focusing their energies.

And yes, thank you to whoever pushed me in the direction of Google Cloud. I'd incorrectly assumed that since AWS was a huge mess to figure out, that the same might hold true of Google's offerings. Unlike Amazon though, Google actually seems to put some priority on ease-of-use and having good documentation.

As an aside, I think this has a lot to do with their corporate cultures. Amazon is notoriously competitive, even among themselves. So they don't spend a lot of time thinking about how they can improve things for their uses. Google seems to prioritize making things more useful and helpful a lot more, even if they aren't immediately profitable.

Once things are set up, you still spend most of the time using Linux through a terminal over ssh in any case. Google seems to use a less confusing system of authentication between services too. For whatever reason, AWS made you copy and paste this modified JSON object into a textbox to set up your S3 bucket to automatically allow publicly accessible downloads. With Google, not only can you just turn it on from the console, but they have a few really powerful and really useful command-line utilities that are way more intuitive to use for authenticating connections between different services. Like, it makes a lot more sense to log into your secured Linux server, and then use a command line utility there to establish your credentials on that side. That way, you only need to look up the credentials for the other side.

As far as Ruby, I was only planning to use it for converting data between the databases. It's really good for writing "glue code", where you have to make several different applications process each other's data in ways that they aren't quite set up to do. It fills a similar role as Python does, but is more forgiving of syntax. I also use shells a lot, and Ruby does those really well. For the record, here's the Ruby script I used to pull out all the attachment data and save them as files. After doing that, I used Google's command-line tool to transfer the whole attachment filesystem into cloud storage:

https://github.com/toldani/sm-transition/blob/master/parse_a...

I've been making it a point to comment my code too, so even if Ruby isn't your language, it should be pretty obvious how it works.


Polverone hasn't moved on, he is here every so often.

Melgar - 27-8-2018 at 11:26

Well, I just meant he's moved on to having other priorities in his life. And if we do the same thing, we want to make sure that most of the software we implement is going to be supported in the future.

phpBB has a nice system for developing extensions, and my impression was that it would offer the most room for the site to evolve. I know there hadn't been any sort of consensus reached yet, but the security and spam problems have gotten so bad that it was clear somebody needed to just grab the bull by the horns. And phpBB was one of the top contenders already, so it seemed like as good a choice as any.

Initially, my plan was to import all the MySQL data into Postgres, then use SQL functions as a means of converting the data. But since I don't know what types of problems I'd run into doing it that way, the way I'm approaching it now is to just use something like Ruby or Python to pull the data in from one database, modify it as needed, then write it to the other database. Considering a Ruby script extracted 9GB of binary attachment data and saved it as files in a few hours, I think having flexibility in our data conversion tools should take precedence. For these one-off scripts, language isn't important, and would fall to whoever writes the scripts. We'd have to just make sure to comment them well enough that it's obvious how they work.

Initially, I was set on using Postgres as the database, since I would have needed to write SQL functions to do the conversion. But now that plans have changed, the database software isn't going to be critical to my choice. I've also been using MySQL more, so if someone had a compelling reason to use MySQL going forward, I wouldn't be opposed.

Apache, though, is a whole other story. After getting lost in a maze of configuration files, trying to figure out which one was responsible for the weird caching behavior that was happening, my vote is for Nginx. Your configuration only needs to be complicated if your setup is! Imagine that!

Any bulletin board software we do go with though, I think we should disable any features where you can "like" or "rate" people's posts. That seems to almost universally divide people up into factions, and I like that we don't have that here.

streety - 28-8-2018 at 14:17

Melgar, I think your selection process is sensible and phpBB probably is the best of the bunch. I just find that somewhat depressing.

I'm glad you're thinking of switching from sql functions to ruby. Hopefully you will be able to use the python functions I put together a while ago.

Quote:
My thought was that phpBB has been going strong for a very long time, and will probably continue to be maintained in the future. So even if we make changes to the code in places, then if we move on like Polverone has, our successors here won't be stuck with a huge mess on their hands.


Quote:
phpBB has a nice system for developing extensions


I haven't taken a good look at the phpBB code or the extension system so the situation may be far better than I fear.

From looking at the database structure though I do worry that due to all the added complexity we don't need and the historical baggage phpBB will be a mess not for the whoever comes after us but for us from day one. From there it will only get worse.

The mention of Zend by JJay reminds me that before working more with python I used Zend for probably a year or so. At the time PHP had a very bad reputation for 'spaghetti code' and the release of the Zend framework, and others like it, seemed to be the turning point. phpBB 3 looks like it was under development long before the PHP community really got its act together.

I think we need hands on experience to know whether phpBB is the right choice for us and a simple extension would be a good first step.


Melgar - 30-8-2018 at 13:35

Here is the latest working link to the phpBB test site

Although it's true that phpBB is based on an older development philosophy, that isn't necessarily a bad thing for us. For one, it means that the databases have quite a bit in common even if there are a lot of new tables that correspond with new features. I attempted to set up a working installation of phpBB 2.x, but it seems to have been designed under php4. Documents seem to indicate that php5 can work with some fixes, but there were issues relating to whether older versions of php can work with nginx.

Along the way, I got more familiar with the phpBB table system, and concluded that it's not as intimidating as it looked. For one, the table structure isn't that different between versions, and the differences mostly come down to things that were added when new features were added. I don't necessarily know what every field does, but they mostly seem to relate to user options and permissions. So I can just register a user with basic permissions, save the values that get stored in these fields, and then populate them with these default values as users get transferred over. This would preserve things like post count, register date, signature, etc.

Another benefit of phpBB's long history is that there used to be a time when encrypted passwords were stored as MD5 hashes. This means that there are still plenty of phpBB installations out there where some of the passwords are MD5-hashed, if a user hasn't logged in for maybe a decade. So phpBB already has code built in that will check if a password is MD5-hashed. Then, if the password hash matches the one stored in the database, it will automatically hash the password again using a stronger algorithm and overwrite the MD5 password hash with that one. It's hard to overstate how convenient this feature is for preserving forum security during a transition.

Another thing that works in our favor is that phpBB uses bbcode, and although the variant it uses is slightly different, they're probably about 90% the same. I wrote some helper classes and a few Ruby functions for converting XMB records to phpBB records. The code isn't that polished yet, but I like how there is just about nothing that Ruby doesn't let you do if you think it would help you solve a problem. It makes it really convenient for banging out quick scripts that you might only use once, but that are a huge help at the time. You can see my progress here:

https://github.com/toldani/sm-transition/blob/master/sql_hel...

I've mostly set up the environment for interfacing with the databases, and also written two functions. One is a hashing function that's generated from the email address and is used by phpBB to ensure that new users register with an email address that hasn't been used already, or get a notice that they've already registered or something. The other is a function to replace the "rquote" tag that's used here, with the quote tag system that phpBB uses. They serve the exact same function, but with phpBB, you just use the regular quote tag but with options specified.

My recommendation is that we continue with phpBB, but try to find a good balance between transferring data using scripts and just setting things up in the phpBB control panel. As an example, I'd put forth moderator privileges. I don't fully understand how they work in phpBB, but I don't think I need to. We'd just transfer every user over with the privileges of a normal registered user, and then use the admin control panel to find and reinstate mod/admin privileges for whoever should have them.

I read your blog post about optimizing the phpBB search function. Your problem is that MySQL hasn't properly implemented full-text indexing until relatively recently. Prior to its implementation, I would have recommended transitioning the database to Postgres and implementing proper full-text indexing there. But since MySQL seems to have implemented full-text indexing in version 5.6, then all you need to do is upgrade that, then set up a proper index. Here's a good place to start:

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-inde...

I've actually decided to use MySQL for the phpBB installation. I've been studying it a lot lately, and as a result, I've come to realize that many impressions I had about its limitations have been overcome in recent years. And there are definitely benefits to using the same databases on both ends.

Table Conversion

Ultimately, there are only maybe five tables that are important to transfer over on the backend.

xmb_posts: Most important. Message text will have to be processed significantly. Some HTML and bbcode tags will need to be modified, but it's less than I thought. The "rquote" tags will need to be fixed, but I've got that taken care of. "file" tags work the same way in phpBB, just need to be replaced with "attachment".

xmb_threads: This data mainly connects the other tables together, and allows for the grouping of posts into threads. Not much work.

xmb_members: Very important. Stores users, registration dates, post counts, encrypted passwords, etc. Moderate amount of work. Since phpBB can accommodate XMB hashed passwords (MD5) and will even automatically upgrade password security when users log in, this can allow for the most seamless, secure transition possible.

xmb_attachments: Quite important. phpBB stores attachments differently than XMB, in a way that uses the filesystem rather than the SQL tables to hold raw data. By mounting a cloud storage bucket as a volume, this can allow phpBB to store attachments indefinitely without ever having to worry about running out of disk space. The conversion of this table is very straightforward once the filesystem is in place.

xmb_u2u: Also very important. They'll probably have to be processed similar to posts. This presents some difficulty, as the test database I have has been wiped of u2u messages. I can write the code, but I wouldn't know if it worked until it's been tested.

Features Being Researched

Post Edit Time Limit: We want to make sure people can't go back and delete their old posts. You might already be able to implement this in phpBB, but I'd need to import the post data to test it properly. Suffice to say, it's on the table and definitely possible.

Avatars and Smilies: Smilies are replaced on-the-fly in messages, so it's just a matter of someone going in and wiping the phpBB smilies table, then replacing them with whatever images we want. Avatars can be added easily enough, although they can be distracting. I thought we could go with an old-fashioned theme, and convert avatars to grayscale.

Redirecting Links: This is really easy to do in Nginx, which is the open-source web server that I've set up on the test site. The only real alternative is Apache, which I personally hate. Doing it this way though, it's easy to redirect old links to sciencemadness threads, so links continue to work to this site from the rest of the internet.

Theme: If someone wants to help with this project, setting up and configuring themes would really help. I haven't looked into that at all, mainly since I wanted to make sure the backend stuff was possible first. But that's been falling into place more and more lately.

Donations and Such: I had a few people ask about helping or donating money or such. I really didn't want to accept anything from anyone yet, and don't plan to unless there's been enough progress to justify it. But if anyone wants to agree to donate specific amounts of money if and when certain goals get met, I'm sure that would help keep me motivated. I've probably spent about $40-$50 and about a week and a half of work counting research, IT, and programming. Sorry for not emphasizing more of the technical reasons for selecting phpBB, but I tried to cover a bunch of them in this thread. I'm also getting to the point where it's starting to come together on the backend, so expect significant visible progress to be made in the near future.

WGTR - 30-8-2018 at 17:06

Thanks for all your work Melgar. Both myself and probably everybody else (even the ones that are lurking silently) appreciate it a lot. It sounds like you're making some real headway with this, and I'm hopeful (without putting pressure on you) that we'll be ready to migrate after so many years.

SM is a great forum for people of all different backgrounds to get together and discuss mad science. I'm not so sure that there's anything else out there that is really like it. There's a lot of information contained within the threads of the past 16 years or so, and hopefully more to come.


streety - 31-8-2018 at 17:23

I took at a look at the phpBB extension system and it is much better than I thought it would be. Between that and your great idea to target an earlier version and then upgrade it is looking much better than I thought.

I don't think anything like what I did previously for search will be needed. phpBB already comes with support for several options built in. We can try MySQL full text search first and if it doesn't work just change the option in the phpBB admin interface.

Other tables to consider are forums and polls.


j_sum1 - 31-8-2018 at 18:23

I am really encouraged by all of this.

I'd love to hear Polverone's thoughts.
He has not dropped off the scene: he logged in and dealt to some spam this morning.

Melgar - 3-9-2018 at 20:09

I may as well give an update. I've been able to pull in users from the XMB table pretty easily, and preserve just about everything. I'm going to make it a point to keep ids the same, so that thread and post ids will carry over. Same with user ids and attachment ids. One of the troubles I'm running into is that everything seems to be cross-referenced all over the place to the point of significant redundancy, and it's hard to set up the linking when the tables you need to link to haven't been imported yet. But fortunately, phpBB has such a similar underlying table structure to XMB that all the important id columns will carry over. This means that redirecting can be done on a URL just by changing the syntax, and thus wouldn't require any sophisticated lookups.

The code I've been writing in ruby is kind of haphazard, and is the inevitable result of reverse-engineering. It's worked to pull in users though. Next step is properly linking attachments, which shouldn't be difficult. There's a slight difference in the way that thumbnails are handled in phpBB for images, but it's pretty minor. Next up would be threads, then posts. But the attachments table needs to be there first, since importing threads and posts would need to pull data from the attachments table.

I also figured out how to disable editing posts after a certain amount of time. Turns out it was just somewhere in the admin settings, and was pretty simple. I'm using the latest version of phpBB, incidentally, not going with my earlier plan to start with phpBB 2.X and then upgrade. They're really not all that different on the backend, and the table structure is less intimidating now that I've identified the important parts of it. Not to mention, the documentation is updated for new versions, and there's a public wiki that describes table structures;

https://wiki.phpbb.com/Table.phpbb_attachments

Also, I've gotten a lot of good information by reading the comments in the source code.

Melgar - 16-9-2018 at 18:58

So I've been working pretty extensively on reverse-engineering XMB's database format. Fortunately phpBB has some internal tools to reparse all the bbcode once everything's all imported.

I imported all the users and all the attachments. That seemed to work well. Everyone will be able to keep things like their signatures, post counts, even things like location and mood. The only things that might not carry over are maybe some user preferences that don't have counterparts in phpBB. Also, time zones, just because it seemed like too much work having to import the actual timezone abbreviations into phpBB rather than the hour offsets XMB uses. So everyone will just have to fix their timezones manually.

One thing I'm not sure how to handle is the fact that it used to be possible for people to use HTML in their posts. So there are a bunch of old posts like this one:

https://www.sciencemadness.org/whisper/viewthread.php?tid=9&...

Now, I can pretty easily parse that and make it show whatever it's supposed to show. I just have no idea what that is. There seem to be weird bits of html embedded all over the place on this site. Since I'm doing the work anyway, I'd rather parse it and have it be fixed when the conversion goes live. But I've been seeing a bunch of really weird malformed HTML too, which I don't know what to make of. BBCode tags seem to carry over perfectly fine, but the two forums seem to handle attachments differently. I guess I have to convert all the posts over before I'll know if that's a problem. I've been using really slow, unoptimized Ruby code to do conversions, but I doubt that's a problem. It might take a day to finish running, but I don't think anyone would mind.

Also, you Europeans sure do have a lot of special characters that can screw up SQL insertion.

Jackson - 16-9-2018 at 19:21

Would this finally fix the spam?

fusso - 16-9-2018 at 21:42

Quote: Originally posted by Melgar  
One thing I'm not sure how to handle is the fact that it used to be possible for people to use HTML in their posts. So there are a bunch of old posts like this one:

https://www.sciencemadness.org/whisper/viewthread.php?tid=9&...
Isn't imageshack notorious for deleting/privatize uploaded images?

Melgar - 17-9-2018 at 06:33

Quote: Originally posted by Jackson  
Would this finally fix the spam?

Yes. It'd also fix some other security holes.

As for imageshack, none of the links seem to work, so there's nothing I can do about that. Still, it seems like this software used to allow embedded HTML in posts and no longer does, and I'm not sure to what extent I should try to parse it and reencode it as bbcode.

streety - 26-9-2018 at 17:40

I think I signed up here after the backup you are using was created so I'm not in the database. I've created an account. Given how thoroughly defeated the image based CAPTCHAs are it might as well be deactivated.

I've said this already in the 'tired of spam' thread but I'll say it again here: I doubt switching to phpBB by itself will help with the spam. It should make maintenance easier and prevent future security issues but handling the spam is a somewhat separate issue. Hopefully this just means using one or more of the already developed anti-spam extensions. Or, at worse, implementing our own which would still be easier than the equivalent on the current forum software.

For the tasks remaining might it be useful to use the issues feature on the github repository? What do you have on your list?

Taking a quick look I noticed http://35.185.63.230/talk/viewtopic.php?p=396775#p396775 Not sure why that isn't displaying as an image.

Melgar - 28-9-2018 at 19:26

Quote: Originally posted by streety  
I think I signed up here after the backup you are using was created so I'm not in the database. I've created an account. Given how thoroughly defeated the image based CAPTCHAs are it might as well be deactivated.

My experience has been that the bots are only programmed to solve CAPTCHAs on their default settings. If you change the fonts and the background pictures that the CAPTCHA software uses to generate images, that's enough to trip up nearly all of them. I haven't had a single bot register on that site yet, and not for a lack of Chinese and Russian IP addresses requesting pages from the server. Presumably, we would try and see if anything works, rather than give up immediately?

Quote:
I've said this already in the 'tired of spam' thread but I'll say it again here: I doubt switching to phpBB by itself will help with the spam. It should make maintenance easier and prevent future security issues but handling the spam is a somewhat separate issue. Hopefully this just means using one or more of the already developed anti-spam extensions. Or, at worse, implementing our own which would still be easier than the equivalent on the current forum software.

My experience has been that the more popular an anti-spam software package is, the more spambot programmers there are out there trying to write scripts to defeat them. Makes sense, no? Right now, this site doesn't have any anti-spam measures put in place at all, which would probably account for the enormous volume of spam.

Quote:
For the tasks remaining might it be useful to use the issues feature on the github repository? What do you have on your list?

Well, it's only been me so far, and I've been writing mainly in SQL and Ruby. I've been doing more reverse-engineering than anything. As a result, I'll build up a collection of functions until I start seeing an appropriate use for object-oriented programming, then I'll tidy up my code and organize it more into classes with inheritance and so forth. Since you've expressed some interest now, I updated the README.md to explain what I have so far. But I don't know if you'd want to work in Ruby or not.

Quote:
Taking a quick look I noticed http://35.185.63.230/talk/viewtopic.php?p=396775#p396775 Not sure why that isn't displaying as an image.

Yeah, there were some glitches in the automatic phpBB bbcode parser when I used it on the XMB post content. I think it had to do with the fact that it parsed links before it parsed bbcode. phpBB actually stores posts in a sort of XML, where bbcode start tags are surrounded by <s> and </s> and bbcode end tags are surrounded by <e> and </e>. It takes a little getting used to, but there's really not much to it. The image you pointed out is fixed now, I believe. It takes about a minute or two to run a regex find/replace on all the posts in the database, so once I find one pattern that isn't working right, I can just fix all of them at once.

What I'm working on is less of an automatic transfer script and more a set of tools that facilitate the conversion of data between the two message board formats. Right now, it'd work by transferring all the posts over with a few modifications, then use the phpBB parser on the bbcode. Then it'd have to fix all the glitches with that. But now that I understand the phpBB post format a lot better, I'm leaning towards just doing all the parsing on the fly, during the initial transfer. That way, I could also transfer the older messages from back when HTML was enabled in posts, by parsing it and converting it back to bbcode.

So yeah, I guess that means the next major thing on my agenda is writing a more robust text parser that can deal with basic HTML as well as bbcode. But I plan to write that whole thing in Ruby. My mind is drawing a blank right now, but I'm sure there's a lot to do in PHP. You'd probably need access to the server to do very much though, and I'd need your public key for that.

Falling asleep at the keyboard here, sorry. Later.

streety - 29-9-2018 at 04:44

It certainly would be nice if small changes to the CAPTCHA were enough to defeat most/all of the spambots. I logged in and took a look at the admin interface. It seems any admin can change to different plugins including a Q&A CAPTCHA with custom questions. I couldn't see any options to change the font or background image with the currently used plugin but there are definitely options.

It is probably a good idea to have options in place before the board goes live. I assume that after a burst of activity in migrating the board over access to the server might again be limited.

I tested out the recommendation of the visually impaired of sending a message to the admin. I didn't receive anything. Did you? If not, that is something we should fix. We need to make sure that goes to someone who can promptly act on it.

I would not give too much significance to no spam signups despite visits by multiple IP addresses. The bots hunting around for vulnerabilities on random IP addresses are probably different to the bots posting spam on forum boards indexed in the search engines.

What is your workflow when you make changes to the conversion script? It sounds like you are not deleting the phpBB database and starting again. Rather than have another person working on the same server it might be best if I set up a local server and work from that. Then any changes when tested can be applied to the running server.

I'll send you my public key and start posting issues for remaining steps.

j_sum1 - 29-9-2018 at 05:17

It has been mentioned before but I thought it worth saying it again.

One potential method for weeding out bots involves having an invisible field during registration. A human will always leave it blank. A bot will generally fill in everything. So any accounts where this field is non-empty can be instantly deregistered.

This device combined with a captcha will likely eliminate most of the spam problem IMO.

streety - 29-9-2018 at 05:35

I've started posting issues to address at https://github.com/toldani/sm-transition/issues

I think we are less likely to overlook anything in that todo list format than here where the discussion of each issue is interlaced.

By the same thinking it may be useful to compile a list of all the different options for handling spam.

fusso - 29-9-2018 at 14:02

Could we develop our own forum software based on XMB?

[Edited on 29/09/18 by fusso]

streety - 29-9-2018 at 14:27

It really depends on what exactly you mean by develop our own forum software based on XMB.

Adding a basic feature on a single page would be possible and using XMB as a base would make sense. I wrote a modification adding a chemistry inspired CAPTCHA question. Melgar independently did the same.

Adding multiple features would be possible but it wouldn't take much before starting from scratch would make more sense than using XMB as a base.

The XMB software doesn't do very much so it would actually be fairly easy to write an equivalent piece of software. At one point, looking at the complexity of phpBB, I semi-seriously suggested this. Melgar has now gotten so much of the migration done that, to my mind, he is clearly on the right path.

Melgar - 29-9-2018 at 17:24

I set up Q&A captcha in phpBB. I initially put in questions like "what element has the symbol 'C'?", but spambots often work by entering the question into Google and parsing the top answer. And Google is really good at answering those types of easy questions. Now I made them questions like "What does the 'C' mean in 'CO2'?"

Right now, I'm coming at this from a reverse-engineering approach, so I'm looking through XMB data, both raw and on this board. I'd like to be able to parse the HTML from when HTML was enabled on the XMB boards. I'm also looking into downloading the images that are hosted on external servers and saving them locally, so we don't lose those pictures if the hosts go down or people let their accounts lapse.

I looked into the issues you posted, and closed all but the one that would actually take a lot of time. That being improving parsing from the XMB post text to phpBB's format. I've also imported signatures and post/thread titles without removing the escape marks from in front of single and double quotes, so we just need to make sure to remember to run those fields through the unescape function when importing data.

RogueRose - 30-9-2018 at 06:28

Have you looked at pulling all the images off of photobucket and other image hosting sources (and anything else hosted off-site for that matter) and pull them down onto a local drive?

I've been browsing a lot of threads checking out where the images are hosted, what the HTTP links link to and other things, and there is A LOT of data that isn't stored locally on the site.

What are your plans for this? Have you figured out a way to do this? If you need help, I've figured out a way to do this, may not be the best, but I think it would work pretty well with a little bit if manual manipulation.

So many of these free hosting sites end up just disappearing over night or they shutdown people accounts for no reason. Maybe saving that data would be a good place to start and if you need someone else to do it, I'd be happy to do it.

Melgar - 30-9-2018 at 09:24

Quote: Originally posted by RogueRose  
Have you looked at pulling all the images off of photobucket and other image hosting sources (and anything else hosted off-site for that matter) and pull them down onto a local drive?

Yeah, the method of downloading images is slightly different for each host. The tricky ones are the ones that give you the bbcode to paste into whatever forum you're using. Photobucket does this. What their code actually does is show a thumbnail inline, and then make that a link that takes you to a web page with the large-size version of that image on it, along with some ads. In browsers, you can right click on images and go to "copy image address". Then, the idea is to compare the two urls. These were the thumbnail and full-sized image, respectively:

http://i454.photobucket.com/albums/qq261/Arkoma_USA/celestro...
http://i454.photobucket.com/albums/qq261/Arkoma_USA/celestro...

Next step is to come up with a "regular expression" or "regex" to match the one that's going to show up in posts, which is the thumbnail. This image came from post #332030. You can get a posts's id by quoting it. When you see the "[rquote=..." tag, the number right after the = sign is the post id. But anyway, the following is a Ruby regular expression that matches photobucket thumbnails:

Code:
/https?:\/\/\w+\.photobucket\.com\/[^\[]+\/th_.+?\.(?:jpg|gif|png|jpeg)/i


I'm not sure if anyone here uses regular expressions much, but they're really useful for finding text that fits a specific pattern. With that regular expression, I can find every instance of a Photobucket thumbnail. You'll see that the full-sized image is almost identical, but without that "th_" prefacing the image name. It's pretty straightforward to just strip out the "th_" and then download the image from the internet. If you try to do it in a browser, it won't work, but apparently it does work using "curl", which is a common tool used for downloading stuff off the internet from a command prompt. Then I'd just save that data as a new attachment, and change all the Photobucket tags to attachment tags. The initial work is kind of annoying, but once there's a system in place, it's pretty easy to apply it to the whole database.

If anyone wants to help in this process, try and look for posts that have formatting glitches and link to them. Like this one, for example:

https://www.sciencemadness.org/whisper/viewthread.php?tid=89...

If you notice, the signature has single quotes and double quotes escaped in it, (preceded by a backslash) which is obviously a glitch. Those backslashes shouldn't be there. I'm going to need a really glitchy sample of posts to test my parsing tools on, so I guess if you have some free time, try and find the posts with the most convoluted mixtures of bbcode and/or HTML that exist in the database, and post them here.

Thanks!

streety - 30-9-2018 at 10:54

Quote: Originally posted by Melgar  
I set up Q&A captcha in phpBB. I initially put in questions like "what element has the symbol 'C'?", but spambots often work by entering the question into Google and parsing the top answer. And Google is really good at answering those types of easy questions. Now I made them questions like "What does the 'C' mean in 'CO2'?"

Right now, I'm coming at this from a reverse-engineering approach, so I'm looking through XMB data, both raw and on this board. I'd like to be able to parse the HTML from when HTML was enabled on the XMB boards. I'm also looking into downloading the images that are hosted on external servers and saving them locally, so we don't lose those pictures if the hosts go down or people let their accounts lapse.

I looked into the issues you posted, and closed all but the one that would actually take a lot of time. That being improving parsing from the XMB post text to phpBB's format. I've also imported signatures and post/thread titles without removing the escape marks from in front of single and double quotes, so we just need to make sure to remember to run those fields through the unescape function when importing data.


There are still problems with some of the closed issues. I don't know if it makes more sense to re-open these or create new ones. I'll post in these and you can decide what to do.

Search seems to be broken now. I think it is the issue I mentioned a while ago about the phpBB implementation being inadequate for large sites.

The redirects I think will take several iterations to complete.

Melgar - 30-9-2018 at 12:59

I've been working on getting Sphinx working with phpBB. I've managed to query it from a shell, and the results it returns are really good ones. They're practically instant, and there are lots of options that can be configured to customize the weighting of different parameters. The way it works, phpBB queries sphinx outside of MySQL to get its results, but for some reason it's just been timing out, even though sphinx's log files are showing it's been getting the search params. This shouldn't be a hard problem to fix, but for some reason it's been eluding me since this morning. In the worst case scenario, I can build sphinx extensions into MySQL so that the indexing was done natively, but that would require compiling MySQL from source, so I'm trying not to have to go that route. But sphinx does work for yielding good search results very quickly, there's just something misconfigured that's preventing the results from getting back to phpBB.

I reopened the search issue, and yeah, you're right, I should have reopened it as soon as I started trying to switch to using sphinx. I closed it when I got MySQL fulltext search working. It did work for most queries, but got hung up on a few, and I didn't think that was good enough to ever go live.

Melgar - 1-10-2018 at 05:46

So, I was trying to think of cool features that I could add to this site that would accelerate people's interest in switching forum software, and came across this:

https://www.glmol.com/

Try clicking and dragging around any one of those four boxes. Pretty cool, no? This might not work with all mobile devices, but should work in most other browsers. We could make it so boxes like that could be embedded in posts, and scrape PubChem to pull in any molecular structure data, given a specific enough chemical name.

There's more information on the open-source components here:

https://web.chemdoodle.com/docs/chemdoodle-json-format/

[Edited on 10/1/18 by Melgar]

fusso - 16-1-2019 at 09:35

When can Polverone return to do the forum transition? What do we, as members, need to do before the transition?

WangleSpong5000 - 2-3-2019 at 07:35

phpBB is pretty solid when it comes to forums. It does seem to shit itself at a certain point however... this may have nothing to do with phpBB itself, i have no idea. The only full stack experience I have is with MERN stack setup... React library/framework for JavaScript, React-Redux to handle state, ExpressJS for coding a server, MongoDB as a non SQL database, PassportJS for username/PW shiz... all booted up using Node js and compatible with Git Bash etc.