Sciencemadness Discussion Board

anti "spam-bot" Bot?

Ubya - 16-12-2021 at 20:35

This board always had some spam posts, but I think many of us remember the mess that was 2019 and the daily floods that the whole community had to manage.

I'm bringing up the issue again simply because I think sending an email to an admin to be manually registered by them is quite an inefficient process.
It has worked for nearly 2 years, but since I've been neck deep in automating stuff I figured I could drop in my 2 cents.

I have 2 ideas.

The first idea is to simply write a script that checks for new posts/topics and deletes them if they are spam, and eventually bans the users that wrote them after 3 offenses (or any value really).
This would be the same bot Melgar did at the time.
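
A minimal sketch of the "ban after 3 offenses" bookkeeping, in Python. This is not Melgar's bot; the class name, method name and the in-memory counter are all hypothetical, and a real script would have to persist the counts and call whatever delete/ban actions the board offers.

```python
from collections import Counter

MAX_OFFENSES = 3  # "or any value really"


class StrikeTracker:
    """Counts spam offenses per user and reports when a ban is due."""

    def __init__(self, max_offenses: int = MAX_OFFENSES):
        self.max_offenses = max_offenses
        self.offenses = Counter()  # username -> number of deleted spam posts

    def record_spam(self, username: str) -> bool:
        """Register one deleted spam post; return True if the user should now be banned."""
        self.offenses[username] += 1
        return self.offenses[username] >= self.max_offenses
```

The script would call record_spam() each time it deletes a post and ban the user once it returns True.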

The second idea is to keep the current system, but automate it. New users could send a request to a specific email address, and a script would read and evaluate it, create the account, and then send them the details via email.

I personally have played a lot with web scraping (I have more data than I need on stuff I shouldn't have) and bots (mostly Discord, but integrated with a wide range of things like WooCommerce, email, Minecraft, image generation and more), so in theory these 2 ideas are already doable.

For long-time lurkers, creating an account the current way probably isn't too much of a hassle, and the fact that registering is no longer a 2 minute process has turned away a lot of people who would join just to get their homework done by us. So in a sense we have fewer registrations, but of higher value. At the same time, I don't like making the whole thing harder than it has to be.
From personal experience, if I had to register now I'd be quite put off by this process :(

What do you guys think?

Oh, and a question for staff: what's the ratio of valid users you register vs users you decide aren't worth registering?

Oxy - 16-12-2021 at 21:20

Quote: Originally posted by Ubya  

The first idea is to simply write a script that checks for new posts/topics and deletes them if they are spam, and eventually bans the users that wrote them after 3 offenses (or any value really).
This would be the same bot Melgar did at the time.


In general this is a simple solution that will work. There is one problem, however: how would you classify post content as spam? We could use some simple algorithm, but it probably won't be a robust one. We could use more sophisticated spam detection techniques, but those may require additional resources, and I don't know if we could use them, as we might need to pay for them.

Quote: Originally posted by Ubya  

The second idea is to keep the current system, but automate it. New users could send a request to a specific email address, and a script would read and evaluate it, create the account, and then send them the details via email.

Yes, this is simple. But I suspect that human reading is actually a potential spam protection mechanism, as whoever reads the mail can judge whether the email address and content look legit.

Tsjerk - 16-12-2021 at 23:01

There is a much, much simpler solution. There has been an update for XMB board 1.9.11, namely 1.9.12. Yes, 11 years after the last update some bugs were fixed and most importantly, reCaptcha was implemented in the registration process.

I've been saying this for over a year now. Polverone doesn't read his messages.

http://www.sciencemadness.org/talk/viewthread.php?tid=18651&...

Ubya - 17-12-2021 at 00:24

Quote: Originally posted by Tsjerk  
There is a much, much simpler solution. There has been an update for XMB board 1.9.11, namely 1.9.12. Yes, 11 years after the last update some bugs were fixed and most importantly, reCaptcha was implemented in the registration process.

I've been saying this for over a year now. Polverone doesn't read his messages.

http://www.sciencemadness.org/talk/viewthread.php?tid=18651&...


Yeah, this would solve the issue at its core. Even though reCaptcha has been cracked, it is way better than having nothing.

Ubya - 17-12-2021 at 00:40

Quote: Originally posted by Oxy  
Quote: Originally posted by Ubya  

The first idea is to simply write a script that checks for new posts/topics and deletes them if they are spam, and eventually bans the users that wrote them after 3 offenses (or any value really).
This would be the same bot Melgar did at the time.


In general this is a simple solution that will work. There is one problem, however: how would you classify post content as spam? We could use some simple algorithm, but it probably won't be a robust one. We could use more sophisticated spam detection techniques, but those may require additional resources, and I don't know if we could use them, as we might need to pay for them.


Bots create posts using some rules; once you know them you can detect spam posts pretty easily.
At the time, for example, a lot of spam posts were in Russian. Nobody here writes in Russian, so one of the anti-spam checks would be to see how many Russian characters are used in a post; if it is a high percentage, there's a very high probability it is spam.
You can also look for links. The purpose of some spam bots is to increase the traffic to sketchy websites, so if a post has many links to spammy websites, the post itself is spam (there are huge public blacklists).
Spambots also apply some SEO rules to their posts, like repeating the same set of keywords to increase SEO visibility. Those are keywords we'd rarely use in a chemistry post, so keeping track of them and their percentage is another check.
I wouldn't worry about paying for anything advanced.
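
The three checks above (share of Cyrillic letters, links to blacklisted sites, share of spammy keywords) could be sketched roughly like this in Python. Everything concrete here is an assumption for illustration: the blacklist entry, the keyword set and the thresholds are made up, and a real script would load a public blacklist and tune the numbers against actual board posts.

```python
import re
from urllib.parse import urlparse

# Hypothetical example values, not a real blacklist or keyword list.
BLACKLISTED_DOMAINS = {"spam-example.test"}
SPAM_KEYWORDS = {"casino", "loan", "bonus"}

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z]+")


def cyrillic_ratio(text: str) -> float:
    """Fraction of the alphabetic characters that fall in the Cyrillic block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for c in letters if "\u0400" <= c <= "\u04ff")
    return cyrillic / len(letters)


def blacklisted_links(text: str) -> int:
    """Number of links whose host is a blacklisted domain or a subdomain of one."""
    count = 0
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname or ""
        except ValueError:  # malformed URL, ignore it
            continue
        if any(host == d or host.endswith("." + d) for d in BLACKLISTED_DOMAINS):
            count += 1
    return count


def keyword_ratio(text: str) -> float:
    """Fraction of the words that are on the spam keyword list."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in SPAM_KEYWORDS) / len(words)


def looks_like_spam(text: str) -> bool:
    """Flag the post if any single check crosses its (assumed) threshold."""
    return (
        cyrillic_ratio(text) > 0.5
        or blacklisted_links(text) > 0
        or keyword_ratio(text) > 0.2
    )
```

Only links to blacklisted hosts count here, so ordinary posts with several reference links are not penalized.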

Quote: Originally posted by Oxy  

Quote: Originally posted by Ubya  

The second idea is to keep the current system, but automate it. New users could send a request to a specific email address, and a script would read and evaluate it, create the account, and then send them the details via email.

Yes, this is simple. But I suspect that human reading is actually a potential spam protection mechanism, as whoever reads the mail can judge whether the email address and content look legit.


That is something I need to ask the admins: what are their criteria for telling a spammy registration request from a legit one?
If we implement a specific email request format, automating the reading, data extraction and registration becomes as easy as reading a form. If the email is more vague it is still doable, but the percentage of errors would be higher (imagine someone asking for a specific password while someone else says anything is fine, or a user asking for a username that already exists).
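
As a rough illustration of the "specific email request format" idea, here is a Python sketch that reads a request body with "Username:" and "Reason:" lines. The field names, the username rule and the three outcomes are all assumptions made up for this example; nothing here reflects how the admins actually handle requests.

```python
import re

# Assumed request format: one "Field: value" pair per line.
FIELD_RE = re.compile(r"^(username|reason)\s*:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
# Assumed username rule: 3 to 25 letters, digits or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,25}")


def parse_request(body: str) -> dict:
    """Extract the known fields from the email body, keyed by lowercase field name."""
    return {key.lower(): value.strip() for key, value in FIELD_RE.findall(body)}


def evaluate_request(body: str, existing_usernames: set) -> tuple:
    """Return (status, detail); anything unclear is handed to a human."""
    fields = parse_request(body)
    name = fields.get("username")
    if name is None:
        return ("needs_human", "no username field found")
    if not USERNAME_RE.fullmatch(name):
        return ("needs_human", "username has unexpected characters or length")
    if name.lower() in {u.lower() for u in existing_usernames}:
        return ("rejected", "username already exists")
    return ("ok", name)
```

Free-form emails simply fall into "needs_human", so the manual process stays as the fallback.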

Tsjerk - 17-12-2021 at 05:06

Quote: Originally posted by Ubya  

Yeah, this would solve the issue at its core. Even though reCaptcha has been cracked, it is way better than having nothing.


I'm pretty sure the people who can get a bot past reCaptcha wouldn't spam SM. And besides, reCaptcha is an external service which is under continuous development by Google, so it is updated automatically. I don't know what version is implemented and if that version is still maintained, but I guess it is.

aab18011 - 17-12-2021 at 06:01

If I may add my two cents as well,

I have been messing around with programming for a while and just finished some classes on AI detection methodology.

One can use a bunch of previously detected spam accounts, messages, links (by a human eye of course) as training data for the AI. The AI can be trained and tested until it recognizes spam 99.99% of the time.

And it would be both free and relatively easy. All you need is training data and some free time to get it learning and to test it out.
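
To make "train it on human-labelled spam" concrete, here is a small sketch. Note that it swaps in a plain naive Bayes word model rather than a neural network, because that fits in a few lines of standard-library Python; the class and method names are hypothetical, and how accurate it gets depends entirely on the training data.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters, in any alphabet


def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing over two labels."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Add one human-labelled post ('spam' or 'ham') to the model."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimated P(spam | text); needs at least one trained post per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long posts.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

Training is just a loop over old posts that mods already marked as spam or legit, followed by testing on posts the model has not seen.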

Texium - 17-12-2021 at 07:52

All of the email requests I’ve received have looked legit. I haven’t turned any down. I assume it’s just not worth it for spammers to manually send an email to someone just for a chance to spam a forum, or to write a script that will email specific people to trick them into making spam accounts. As impractical as the system is for us and new members, it’s even less practical for spammers, and that’s the whole point.

I handle requests the same way I do with the wiki. I ask what they want their username to be if they didn’t tell me in their first email, then I open registrations temporarily, make the account with a temporary password, close registration, and tell the new user their password, encouraging them to change it when they log in. Funny enough, most accounts that I registered in this way still haven’t posted anything despite going to the trouble to reach out to me.

woelen - 17-12-2021 at 08:00

Requests for getting an account on sciencemadness always are personal emails, and each request differs from others. My reply also is a natural language email, which also is (somewhat) variable. I choose a password, when making an account, and in the email response I give that password, just being part of a sentence. I also ask the people to change their password as soon as they are in.

The entire process of manually creating accounts is a very good measure against spam. Automated bots simply cannot deal with that process, due to variations in free-format text, no free choice of password, and variable response time (sometimes just a few hours, sometimes a few days if I am away for a while). The amount of work for me in the process is approx. 10 minutes per week. I think that I never created a spam account.

If a software update would give us a good mechanism against automated spambots, then I would encourage the use of such a solution, but I do not think it is worth the effort to make your own mechanisms for registration, spam discovery and spam removal. Melgar's solution sort of worked, but it was not flawless and sometimes there also were issues with legit posts.

Oxy - 17-12-2021 at 09:23

Quote: Originally posted by Ubya  

Bots create posts using some rules; once you know them you can detect spam posts pretty easily.
At the time, for example, a lot of spam posts were in Russian. Nobody here writes in Russian, so one of the anti-spam checks would be to see how many Russian characters are used in a post; if it is a high percentage, there's a very high probability it is spam.
You can also look for links. The purpose of some spam bots is to increase the traffic to sketchy websites, so if a post has many links to spammy websites, the post itself is spam (there are huge public blacklists).
Spambots also apply some SEO rules to their posts, like repeating the same set of keywords to increase SEO visibility. Those are keywords we'd rarely use in a chemistry post, so keeping track of them and their percentage is another check.
I wouldn't worry about paying for anything advanced.


I would not be so sure about counting links, as people here also add them, often more than one per post. The same keyword multiple times, hmmm... elimination, bromine, reaction, stirred, added, filtered and many, many more.
The Russian example looks like a simple one, and it would actually work until someone mixes in Latin letters and drops below the threshold. Or someone posts a procedure from a Russian book, which would then be a false positive. The problem will be with spammers using the Latin alphabet, and to be honest, I've only seen bots here that were using it. There the script will not do its job, as it will quickly fail, producing a lot of false positives or negatives. Spam will stay on the board, and legit messages will be marked as potential spam waiting for a mod's reaction.
For this purpose I would rather look for a tool that can process natural language and train it against the board database. Some spam should be collected as well. Then we could train a neural network and make it very, very efficient. There are even some research papers about spam detection systems with neural networks.

I have another idea, probably the best and the cheapest, which worked great on one board I used a lot in the past. There was a strict policy that a user's first 10 or 20 posts had to be accepted by a moderator. Before acceptance they were not visible; once the user passed the threshold he could use the board normally, without any overhead from waiting for a mod. That would require finding more mods or making the existing ones more active, but it would do the trick. I also propose removing all accounts older than 1 or 2 months that have no posts. Bots use this type of account; they almost always had 1 post after submitting the spam message. That could even be implemented simply as a database job.
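
Both parts of this idea are small enough to sketch. The Python below uses sqlite3 with a made-up "members" table (username, post_count, registered_at as a Unix timestamp); this is not the real XMB schema, and the threshold of 10 posts and the 60-day cutoff are just the example values from the paragraph above.

```python
import sqlite3

APPROVAL_THRESHOLD = 10   # first N posts need a moderator (assumed value)
SECONDS_PER_DAY = 86400

# Hypothetical schema for illustration only, not the board's actual tables.
SCHEMA = """
CREATE TABLE members (
    username      TEXT PRIMARY KEY,
    post_count    INTEGER NOT NULL,
    registered_at INTEGER NOT NULL  -- Unix timestamp
)
"""


def needs_approval(approved_posts: int) -> bool:
    """True while the user is still below the moderated-posts threshold."""
    return approved_posts < APPROVAL_THRESHOLD


def prune_idle_accounts(conn: sqlite3.Connection, now: int, max_age_days: int = 60) -> int:
    """Delete accounts with zero posts registered more than max_age_days ago.

    Returns the number of accounts removed.
    """
    cutoff = now - max_age_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM members WHERE post_count = 0 AND registered_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

The prune function is the kind of thing that could run as a nightly job; it would also remove genuine lurkers who never post, so the cutoff is a policy choice.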

If you need any code/db help I would be glad to help. It would be a good idea to migrate to the newer version as Tsjerk proposed.
reCaptcha could also be implemented without migration, I suppose.

Also, another idea: we could add a captcha to the post form for every user with fewer than, let's say, 5 or 10 posts.

Tsjerk - 17-12-2021 at 09:24

I implemented reCaptcha v3 on my employer's website, on a request form to be precise. Version 3 works so flawlessly you don't even know it's there anymore. You don't have to check the familiar pictures with the question "which pictures have stoplights" and such.

The software recognizes things like mouse movements and browser settings and determines whether you are a bot or not that way. My employer's website is a WordPress one, so a simple plugin was enough. For SM it would only require an update from 1.9.11 to 1.9.12 and the reCaptcha should work out of the box.
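
For anyone curious what the server side of v3 looks like: the site sends the token from the form to Google's siteverify endpoint and gets back JSON with, among other fields, "success", "score" (0.0 to 1.0) and "action". The Python sketch below only interprets such a response; the function name and the 0.5 cutoff are assumptions, the network call is left out, and this says nothing about which reCaptcha version XMB 1.9.12 actually ships or how it uses it.

```python
import json


def is_probably_human(verify_json: str, expected_action: str, min_score: float = 0.5) -> bool:
    """Decide from a reCAPTCHA v3 siteverify response whether to accept the request.

    The token must be valid, issued for the expected form action, and scored
    at or above min_score (an assumed cutoff that a site would tune).
    """
    data = json.loads(verify_json)
    return (
        data.get("success") is True
        and data.get("action") == expected_action
        and float(data.get("score", 0.0)) >= min_score
    )
```

A low score does not have to mean rejection; a site could also fall back to manual approval for those requests.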

The use is free up to a million times a month.

Apparently there are still a lot of bots randomly trying to get into the registration form, as now and then one comes through. This must happen when the system is temporarily activated by the admins to make someone an account.

[Edited on 17-12-2021 by Tsjerk]

aab18011 - 17-12-2021 at 09:33

Quote: Originally posted by woelen  
Requests for getting an account on sciencemadness always are personal emails, and each request differs from others. My reply also is a natural language email, which also is (somewhat) variable. I choose a password, when making an account, and in the email response I give that password, just being part of a sentence. I also ask the people to change their password as soon as they are in.

The entire process of manually creating accounts is a very good measure against spam. Automated bots simply cannot deal with that process, due to variations in free-format text, no free choice of password, and variable response time (sometimes just a few hours, sometimes a few days if I am away for a while). The amount of work for me in the process is approx. 10 minutes per week. I think that I never created a spam account.

If a software update would give us a good mechanism against automated spambots, then I would encourage the use of such a solution, but I do not think it is worth the effort to make your own mechanisms for registration, spam discovery and spam removal. Melgar's solution sort of worked, but it was not flawless and sometimes there also were issues with legit posts.


I'd strongly argue for the use of an AI. The finalized algorithm script is very small, easy to use, and very, very fast. If they are trained well enough they can literally pretend to be human. I'm not saying it doesn't take some time to set up, but if you guys are interested, it could be of great use. I can provide resources when I get some time. I can even put together a small model to show you what it can do.

Tsjerk - 17-12-2021 at 10:29

Really, everyone, please stop thinking about hacking scripts into the forum's code, training AI, migrating the forum and what not.

The 1.9.12 update is backwards compatible, meaning no one will ever notice a thing besides an 11 changing to a 12 somewhere. No one has to write any code, that has already been done by the XMB community. Tested for over a year now.

All that has to happen is Polverone running an update on the backend.

karlos³ - 17-12-2021 at 12:28

Keep in mind "sleeper agent" bots.
We have experienced them next door, and even a few more "sophisticated" ones (asking for the admin; the first one was suspicious to me, but by the second one I realized it was just a fucking script making them ask for that).
And there are others that are obviously scripted, with stolen or bought mail accounts; they register and go offline.... Joe and I have annihilated literally hundreds of them, and they all had different scripts for how to act.
Some turned to instant spamming, and some registered and never turned up for several months, and then started spamming.

I still remember the huge battle of that sunday morning in 2019.... everywhere dead bot corpses(oh wait that was only in my mind after looking at the mod log :D).


Tsjerk - 17-12-2021 at 12:34

I saw a couple of them here literally registered that day, one a couple weeks ago and one longer ago :D

karlos³ - 17-12-2021 at 12:36

You were an observer of that huge battle? :o

We had to take turns literally to hack them into pieces.
I hope you did not get traumatized by this :D

Tsjerk - 17-12-2021 at 13:29

No :) I was talking about some bots, just a couple of them, which registered here on SM. They spammed the same day as they registered, not too long ago.

karlos³ - 17-12-2021 at 14:38

Oh yeah, that is the typical way they enter next door too.

But the SMF2 boards use different software of course, and we have 2-3 certain clues that tell us it's bots.
It's confidential though :P

They do not have that here (or might, not sure what functions are available to the staff here).
That's why we can easily get rid of the sleeper agent bots as well.
Some will slip through as always.
But it is always a learning experience.

But the few years between the different board software make quite a difference... it seems much harder to introduce bots here, but also much harder to recognize them as bots.
It's the other way around with the SMF board engine.

[Edited on 17-12-2021 by karlos³]

Ubya - 17-12-2021 at 20:20

Quote: Originally posted by Tsjerk  
Really, everyone, please stop thinking about hacking scripts into the forum's code, training AI, migrating the forum and what not.

The 1.9.12 update is backwards compatible, meaning no one will ever notice a thing besides an 11 changing to a 12 somewhere. No one has to write any code, that has already been done by the XMB community. Tested for over a year now.

All that has to happen is Polverone running an update on the backend.


That's without a doubt the best thing to do; my idea was more of something to implement in case this drags on for longer.

j_sum1 - 17-12-2021 at 21:45

I have not noticed any spam for a long time. And trolls seem to be pretty much absent too. Although it seems like a cumbersome system (and I had my doubts at first), it seems to be working really well.

This contrasts significantly with 2019 when I would delete 120 spam messages before breakfast each morning and remove another 50-60 a couple of times later in the day. Glad to not be still doing that.
J.

karlos³ - 18-12-2021 at 10:33

Quote: Originally posted by j_sum1  
I have not noticed any spam for a long time. And trolls seem to be pretty much absent too. Although it seems like a cumbersome system (and I had my doubts at first), it seems to be working really well.

This contrasts significantly with 2019 when I would delete 120 spam messages before breakfast each morning and remove another 50-60 a couple of times later in the day. Glad to not be still doing that.
J.

Although this does not necessarily have to do with any changes!
We had such a huge wave too, in early 2019, and in the next two years we had barely a percent of that amount of spambots daily, some weeks not a single one at all.

I don't know, maybe some botnet got taken down that year, who knows?
We just reacted, that was enough work already...
Why that happened, we have only theories.
But the fact is, we never had a single spambot in the three years before that.

Our registration is also still open, as opposed to SM.
So it has nothing to do with the closed registration, I suppose (except that you do not have the three spambots a week we might get at worst now, and not our trolls, although they do not find much ground next door in any case :D).

My own suspicion is that it was a botnet, now that I've heard it was that bad here too!
I never knew, and never would have expected it to be that bad here too :o
You guys know I reported immediately whenever I noticed any spam post, but I guess I just saw a fraction of what you got here.
Wow.

Does that make more sense?
We should maybe exchange information a little better in the future, because given that time span (worst in mid 2019? almost over at the end of the year?), it seems like something attacked us both with the same focus?
No idea.
But it looks like this.

Me or maybe Joe will contact one of the SM staff if we experience that again; maybe it really was concerted? (targeting specific words on chemistry boards maybe, who knows)
And who knows this either?
That number is surprisingly close to what we experienced.

Something else: the bots had mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.

Have you guys noticed something similar with bot flood back then?
I would assume so...

What about the theory with a bot net being taken down?
Could have happened.
It almost came to a halt and we were always open like a fucking barn :D

Tsjerk - 18-12-2021 at 10:48

I'm pretty sure that if registration were opened again, the bots would be back here within an hour. They already get through once every three months or so, with the admins opening it for just ten minutes to register someone.

I think it is well known XMB 1.9.11 is vulnerable, a quick Google for the term "powered by XMB 1.9.11" (use the quotes) gives you a list of boards using the software, so these boards are targeted.

You don't need a botnet to terrorize a board, a single laptop is sufficient, you only need a lot of email addresses. But there are enough shady parties giving away email addresses for free without too much verification whether you are a human or not.

I could probably whip up some scripts in a day, maybe two, which would spam the shit out of random XMB boards. I would use Robot Framework :D lovely tool for test automation with a nice Selenium Library available. Include some Python email library so you don't have to worry about checking the verification emails via the GUI and you are good to go.

Covering the blockage of your IP address is also easy, just run your bot via TOR, normal websites are available via TOR, but the requests would come from different IPs every time.

[Edited on 18-12-2021 by Tsjerk]

karlos³ - 18-12-2021 at 11:22

Luckily neither our troll(s) nor the bots were that smart... they always, both, did that (even the trolls we knew of, who at least had read that we don't log IPs, really did use that... buncha morons :D).

Texium - 18-12-2021 at 17:53

Quote: Originally posted by karlos³  
Something else: the bots had mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.

Have you guys noticed something similar with bot flood back then?
I would assume so...
Yes, we had noticed that many of the bots were using yandex email addresses. If I’m not mistaken, we banned that domain from registering, and that helped with stemming the flow of new spammers somewhat. It wasn’t enough though.
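
A domain ban like that is simple to express. This Python sketch is only an illustration: the blocked set contains the one domain mentioned above as an example, the function name is made up, and it is not how XMB implements its own restrictions.

```python
# Example entry only; a real list would be maintained by the admins.
BLOCKED_DOMAINS = {"yandex.ru"}


def is_blocked_email(address: str) -> bool:
    """True if the address is malformed or its domain (or a parent domain) is blocked."""
    _, at, domain = address.rpartition("@")
    if not at or not domain:
        return True  # no "@" or empty domain: treat as not acceptable
    domain = domain.strip().lower().rstrip(".")
    return any(domain == d or domain.endswith("." + d) for d in BLOCKED_DOMAINS)
```

As noted, this only slows spammers down, since they can switch to another mail provider.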

karlos³ - 19-12-2021 at 09:15

Quote: Originally posted by Texium  
Quote: Originally posted by karlos³  
Something else: the bots had mostly (despite us not requiring a real email) all used Google or yandex.ru mail accounts.
So it's likely those were stolen, and that was definitely a concerted action.

Have you guys noticed something similar with bot flood back then?
I would assume so...
Yes, we had noticed that many of the bots were using yandex email addresses. If I’m not mistaken, we banned that domain from registering, and that helped with stemming the flow of new spammers somewhat. It wasn’t enough though.

I'd say in total the yandex bots made up like 20-30%, but the Google mails were "almost" (not sure if not all, were there even others? I can't remember a single one, I would have noticed though) the total remainder of them.

None of them had ever used a fictional mail though.
Only one or another of the smarter trolls did, but smart people do not keep on trolling for months or years.

I suddenly coughed and it almost sounded like "feedeechemist", whatever that word means :P