Sciencemadness Discussion Board » Non-chemistry » Forum Matters » Tired of reporting spam

Marklet
Harmless

Posts: 1
Registered: 24-9-2018
Member Is Offline

Mood: WGTR

 Quote: Originally posted by WGTR Huh. I didn't know you could add folders like that, but I figured it out. Thanks! It looks like that's exactly what I needed.

I also just forever retired the username Marklet. May the spammy bot that kept registering this username forever rest uncomfortably in the confines of a diaper pail.

WGTR
Texium (zts16)

Posts: 3083
Registered: 11-1-2014
Location: San Marcos, TX
Member Is Offline

Nice! At first I thought the spammer had re-registered and ironically posted to this thread.

Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

It's not coded messages, it's a formula that changes the words it uses to make it harder to block certain phrases and keywords.

These boards don't seem to have had much in the way of human activity lately, but there are a few decisions we have to make that it'd be nice to have some input on. First off, the bbcode is mostly the same between the two systems, but there are some minor differences; for example, the tags for attachments and images are slightly different. It would be easier to just automatically replace all the XMB bbcode with the corresponding phpBB bbcode during the transfer, rather than try to redo phpBB's logic. It really shouldn't be a problem, though, since it's about 90% the same.
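A minimal sketch of the automatic bbcode conversion, assuming a table of regex rewrite rules applied to each post during the transfer. The XMB tag shown (`[img=WxH]`) is an assumption for illustration, not a confirmed difference between the two systems; the real rule list would come from comparing both parsers.

```python
import re

# Hypothetical rewrite rules as (XMB pattern, phpBB replacement) pairs.
# The real differences between the two dialects would have to be read
# off both codebases; this single rule is only an illustration.
RULES = [
    # Assumes XMB allows a sized image tag such as [img=100x50]url[/img],
    # whereas phpBB's stock [img] tag takes no size parameter.
    (re.compile(r"\[img=\d+x\d+\](.*?)\[/img\]", re.IGNORECASE | re.DOTALL),
     r"[img]\1[/img]"),
]


def convert_post(text: str) -> str:
    """Apply every rewrite rule in order; anything unmatched passes through unchanged."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(convert_post("Setup: [img=640x480]http://example.com/still.jpg[/img] [b]done[/b]"))
```

Keeping the rules as data means a missed tag difference found during the audit period is a one-line addition rather than a change to phpBB's own parser.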

The other thing is that I'm not sure whether to bother with trying to import polls. The way poll data is stored isn't very intuitive, and it's not a feature we even use that much. What I can do for old poll threads though, is just pull in the results and have them show as text in any threads that had polls in them. Polls would be available in phpBB going forward though, obviously.
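One way the "results as text" idea could look, assuming the vote counts have already been pulled out of XMB's poll tables as (option, votes) pairs. The output layout is hypothetical.

```python
from __future__ import annotations


def poll_as_text(question: str, options: list[tuple[str, int]]) -> str:
    """Render a finished poll as plain text for an archived thread."""
    total = sum(votes for _, votes in options)
    lines = [f"[Poll] {question}"]
    for label, votes in options:
        # Guard against polls that closed with no votes at all.
        pct = 100.0 * votes / total if total else 0.0
        lines.append(f"  {label}: {votes} vote(s) ({pct:.1f}%)")
    lines.append(f"  Total votes: {total}")
    return "\n".join(lines)


print(poll_as_text("Switch to phpBB?", [("Yes", 3), ("No", 1)]))
```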

One major thing I haven't been able to work on is transferring the private messages. This is because they've all been erased in the test database I'm using. I can't imagine they'll be hard to transfer over, I just haven't had a chance to test it.

If anyone wants to help figure out phpBB styles and extensions, that would help a lot. And at some point I'm going to have to coordinate with Polverone, and I'm assuming that would be on his terms. I'd also like to

In the meantime, are there any features you guys think we should have? I found some software that allows MediaWiki to use phpBB's authentication, so we can go to the wiki and edit it without having a separate login. That'd be nice to have.

The first step in the process of learning something is admitting that you don't know it already.

I'm givin' the spam shields max power at full warp, but they just dinna have the power! We're gonna have to evacuate to new forum software!
j_sum1

Posts: 5022
Registered: 4-10-2014
Location: Oz
Member Is Offline

Mood: Metastable, and that's good enough.

I'd like an opportunity to back up my private messages before any migration. Actually, I'd like an opportunity to back them up anyway. I haven't spotted a way of doing that.
andy1988
Hazard to Self

Posts: 96
Registered: 11-2-2018
Location: NW Americus (in re Amerigo Vespucci)
Member Is Offline

Mood: No Mood

 Quote: Originally posted by Melgar XMB hasn't been actively developed since 2009, and I'm pretty sure we had a consensus a long time ago that we're going to have to transition to new software if we ever want to address this problem. Correct me if I'm wrong.

I think XMB will work fine with changes, but it's not up to me... I'm a "lazy" software developer; I hate redoing things that are already "solved" or good enough. Vulnerabilities would be a credible reason to change the platform IMO (e.g. a former workplace had ISIS photographs/crap put on its homepage... that was interesting). Hopefully this website's data is backed up somewhere, just in case.

My suggestion, which anyone with "trusted" credentials can do, is to set up your own "bot" to report the other bots... Use either Selenium or Sikuli:

1. Selenium: If on Linux, use either the chromium-driver or firefoxdriver package, plus the python-selenium package. Selenium automates browsers and uses their APIs to parse pages.
2. Sikuli: An alternate approach is to do everything visually (no browser based API), using software like Sikuli. Sikuli uses Java... it is cross-platform.

I've used Sikuli a lot before... I can write a spam report script for it if you ask. But Sikuli has to use optical character recognition to understand text (error prone), while Selenium handles text perfectly. It's less effort, though, to just update this forum's software with one of streety's solutions...

Either can run on Linux in its own X11 server 24/7, so as not to clutter up your main desktop... but the computer would have to be on 24/7, or you could run it in the "cloud" at some interval...

Using either approach... every ~5-10 minutes... refresh 'Today's Posts', parse new posts, and report new users with 2+ posts on the same day containing links. Give it the credentials of both a "trusted" user and an admin/mod (the script could log in/out of each) and you'd wipe out classified spam.
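The decision rule described above (new users with 2+ link-containing posts on the same day) could be sketched in plain Python, separate from whatever drives the browser. This assumes the posts have already been scraped into simple records; the `Post` fields and the 7-day definition of a "new" user are assumptions, and the actual reporting step (via Selenium or Sikuli) is left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime

# Rough link detector: explicit scheme or a bare "www." prefix.
LINK_RE = re.compile(r"https?://|www\.", re.IGNORECASE)


@dataclass
class Post:
    username: str
    registered: date   # account registration date
    posted: datetime   # when the post was made
    body: str


def users_to_report(posts, max_account_age_days=7, min_link_posts=2):
    """Return new users with at least `min_link_posts` link-bearing posts on one day."""
    counts = defaultdict(int)  # (username, calendar day) -> link-bearing posts
    for p in posts:
        account_age = (p.posted.date() - p.registered).days
        if account_age <= max_account_age_days and LINK_RE.search(p.body):
            counts[(p.username, p.posted.date())] += 1
    return sorted({user for (user, _day), n in counts.items() if n >= min_link_posts})
```

Counting per (user, day) rather than per user keeps a new member who posts one link a day from being swept up with the bots.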

 Quote: Originally posted by WGTR Now, since I have a few dozen sent messages that I'd like to keep, I have to go into my outbox and manually look through all 5,600 of these spam reports so that I don't accidentally delete something important. Sigh.

Similarly, a script could do this for you by parsing each u2u message, i.e. "IF spam report, THEN delete." If you attach a screenshot of one of these spam reports, or better, forward a bunch to me by u2u... EDIT: Never mind! I have them in my outbox too, so I'll attach the script as a zip. You can watch it as it moves the mouse.
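The "IF spam report, THEN delete" filter might look like this, assuming the messages have already been read out of the inbox/outbox. The marker text `spam report` is a hypothetical stand-in; the real subject line of XMB's automated reports would need checking before deleting anything.

```python
from dataclasses import dataclass

# Hypothetical marker: the actual subject line of XMB spam reports is not given.
SPAM_REPORT_MARKER = "spam report"


@dataclass
class U2U:
    msg_id: int
    subject: str
    body: str


def partition_outbox(messages):
    """Split message IDs into (to_delete, to_keep) based on the subject line."""
    to_delete, to_keep = [], []
    for m in messages:
        if SPAM_REPORT_MARKER in m.subject.lower():
            to_delete.append(m.msg_id)
        else:
            to_keep.append(m.msg_id)
    return to_delete, to_keep
```

Returning both lists, rather than deleting directly, leaves room to eyeball the "keep" pile first, which matters when a few dozen real messages are mixed in with thousands of reports.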

Sikuli is pretty great (but its text parsing is imperfect and hard to get right).

EDIT2: I've attached a Sikuli script you can use to delete all those spam reports in your u2u outbox/inbox. It should go in ~/.Sikulix/scripts. I'm using v1.1.0, but it should work with v1.1.4 too. If your browser (Chrome) has no status bar, it may skip over the last item on every screen. If your fonts or screen size differ from mine, you may have to change the images/regions.

[Edited on 25-9-2018 by andy1988]
Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

 Quote: Originally posted by j_sum1 I'd like an opportunity to back up my private messages before any migration. Actually, I'd like an opportunity to back them up anyway. I haven't spotted a way of doing that.

First of all, what format would you want your private messages in? CSV? HTML with minimal formatting? Plain text?
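If CSV were the choice, Python's standard `csv` module already handles quoting of commas and line breaks inside message bodies. This is a sketch; the column names are hypothetical and not XMB's actual u2u schema.

```python
from __future__ import annotations

import csv
import io

# Hypothetical column set; XMB's real u2u fields may differ.
FIELDS = ["date", "sender", "recipient", "subject", "body"]


def messages_to_csv(messages: list[dict]) -> str:
    """Serialize private messages to CSV; csv quotes commas and newlines for us."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for msg in messages:
        writer.writerow(msg)
    return buf.getvalue()
```

The output opens in any spreadsheet program and can be read back with `csv.DictReader`, so multi-line messages survive the round trip.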

During any migration, there would be a period of time where both sites would exist simultaneously, one running phpBB and one running XMB. During that time period, we can all have the chance to audit the new site and make sure everything looks right. And we'd keep a backup of the server before taking it down, for certain.

Also, since phpBB has a separate system of reporting spam that doesn't use U2U messages, I should probably have my back-end scripts clear out all the spam reports that are in people's inboxes and outboxes.

As for automated GUI-based scripting software, my go-to has always been AutoHotkey. It's free and open-source, and works quite beautifully with Windows. I've even written scripts with it to grind my skills in MMORPG games back when I used to play them. Still, those are all half-measures and can only do so much to stem the rising tide of spam.

Right now, I set the maximum attachment size per file to 8MB, to match that here. For larger files, I can set up an FTP server that uses our forum passwords, and probably make it so an admin has to enable FTP access for users.

I've been trying to think of custom bbcodes that might be useful for chemistry too. Maybe something that can automatically work out the molar mass of a substance from its molecular formula, or by scraping the data from PubChem and storing it locally if it doesn't recognize it? Maybe link to the Wiki page if one exists, and have that also show common abbreviations for chemicals that can be used to reference their properties on the forums? This is all backend SQL stuff, which is my forte, so it'd actually be easier for me to do than creating a style or writing PHP code.
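A sketch of the molar-mass part of that idea, assuming formulas written with element symbols, integer counts and (possibly nested) parentheses. The atomic-weight table is deliberately partial (standard atomic weights for a few common elements), and the PubChem lookup and bbcode wiring are not shown.

```python
import re

# Standard atomic weights (abridged) for a few common elements only.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Na": 22.990,
    "Mg": 24.305, "S": 32.06, "Cl": 35.45, "K": 39.098, "Ca": 40.078,
    "Cu": 63.546,
}

# One token: element symbol + optional count, "(", or ")" + optional count.
TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def molar_mass(formula: str) -> float:
    """Molar mass in g/mol for formulas like 'H2O' or 'Ca(OH)2'."""
    stack = [0.0]  # running mass for each open parenthesis level
    pos = 0
    while pos < len(formula):
        m = TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, count, open_paren, _close_paren, group_count = m.groups()
        if element:
            if element not in ATOMIC_MASS:
                raise ValueError(f"no atomic weight for {element!r}")
            stack[-1] += ATOMIC_MASS[element] * int(count or 1)
        elif open_paren:
            stack.append(0.0)
        else:
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1] += group * int(group_count or 1)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

An unknown symbol raises an error rather than guessing, which is the point where a PubChem fallback would slot in.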

unionised
International Hazard

Posts: 4184
Registered: 1-11-2003
Location: UK
Member Is Offline

Mood: No Mood

If there's anything I can do to help get rid of the spammers, let me know.
streety
Hazard to Others

Posts: 110
Registered: 14-5-2018
Member Is Offline

 Quote: Originally posted by symboom Anyone know if they are using AI (artificial intelligence) yet Got to be ahead of the game. Divergent and convergent thinking Open message To streety This guy lesterpq11 look at for pattern see key words to outwit the AI AI seems to try to imitate humans where ever it gets the info from [Edited on 22-9-2018 by symboom]

A well designed AI model could do far better than what we are seeing. The approach being taken is definitely quantity over quality.

If the goal were for posts to stay on a site, and someone involved knew what they were doing, there have been some groundbreaking advances in creating fakes almost indistinguishable from the real thing. If you are interested in this topic, take a look at generative adversarial networks. This would be just the start.

I'll take a look at lesterpq11.

 Quote: Originally posted by RogueRose This may have been asked and stated but I haven't seen it. How many reports of spam are needed for all the users posts to be deleted? What I want to know is if I see 10 posts by a user, how many need to be reported for the program to sweep up all their posts?

This is the most complete description I know.

 Quote: Originally posted by Polverone I have tweaked the reporting process, somewhat inspired by violet sin. Before spam had to be reported by two trusted members or one moderator before it was auto-deleted. Now it will be auto-deleted if it is reported by one trusted member and one 'semi-trusted' member. Everyone who has been registered at least 60 days and has at least 50 posts automatically becomes a member of the 'semi-trusted' group. This should reduce the waiting time for spam deletion, and means that any frequent user of the site can help stamp out spam.
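The rule as Polverone describes it could be expressed like this. It is a sketch, not the forum's actual code: it assumes the older paths (two trusted members, or one moderator) still apply, and that the semi-trusted reporter must be a different person from the trusted one.

```python
from dataclasses import dataclass

SEMI_TRUSTED_MIN_DAYS = 60    # "registered at least 60 days"
SEMI_TRUSTED_MIN_POSTS = 50   # "at least 50 posts"


@dataclass
class Reporter:
    name: str
    is_moderator: bool = False
    is_trusted: bool = False
    days_registered: int = 0
    post_count: int = 0

    @property
    def is_semi_trusted(self) -> bool:
        return (self.days_registered >= SEMI_TRUSTED_MIN_DAYS
                and self.post_count >= SEMI_TRUSTED_MIN_POSTS)


def should_auto_delete(reporters) -> bool:
    """Decide whether a reported post is deleted without waiting for a human."""
    people = list({r.name: r for r in reporters}.values())  # one vote per person
    if any(r.is_moderator for r in people):
        return True
    trusted = [r for r in people if r.is_trusted]
    if len(trusted) >= 2:
        return True
    if len(trusted) == 1:
        # Needs a second, different person who is at least semi-trusted.
        return any(r.is_semi_trusted for r in people if r.name != trusted[0].name)
    return False
```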

The recommendation seems to be to report the thread with the most views already.

 Quote: Originally posted by CuReUS My own suggestions- 1.Most spammers have a link in their message,so we could detect posts with links and block them(for 1st post only) 2.Do not allow newly registered members to post more than 1 message or in more than 1 sub forum. 3.Block usernames or posts with non english alphabets someone had posted an amazing idea is this thread,but I can't seem to find that post now.The idea was to run usernames through a password strength checker.Since bots use long alphanumerical strings,they would indirectly make very strong passwords,which could be detected and blocked. We must do something fast,or pretty soon we would have to build another arc to escape this flood [Edited on 23-9-2018 by CuReUS]

Currently the limiting issue is getting involvement from Polverone. The work-around is running a script off the server to clean up any spam soon after it's posted.

I've written a script that woelen is now able to run; the percentage of non-English characters is an important feature in its machine learning model for spam prediction.
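streety's script and model aren't shown in the thread. As an assumption about what such a feature might look like, here is one simple way to compute it, treating "non-English" as any alphabetic character outside ASCII.

```python
def non_ascii_letter_fraction(text: str) -> float:
    """Fraction of alphabetic characters that fall outside ASCII (0.0 if no letters)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_ascii = sum(1 for ch in letters if not ch.isascii())
    return non_ascii / len(letters)
```

Counting only letters keeps digits, punctuation and URLs from diluting the ratio.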

The password strength metric on username idea is not one I've seen before and sounds interesting. I'll give that a try.

 Quote: Originally posted by RogueRose I have a feeling that there is more going on than just spam. I have a feeling that the posts might be a way to pass messages to others with no record of them. The posts are up for a short time and then the "system" erases them. While they are up they are grabbed. I would suspect that the spam bots wouldn't continue to post here if it wasn't getting some kind of return. It wouldn't post with such furry unless there was a benefit being had and I don't think it is members buying access to adult sites or ED pills. This could be a serious issue that really needs taken care of and is wreckless allowing it to continue.

This is an interesting idea I had not considered. I agree with others in thinking it is unlikely, but I agree with you in thinking it would be virtually impossible to detect. Ultimately, though, our situation is the same: the forum receives posts we don't want.

I think Melgar touched on this but I want to discuss your comment on "members buying access to adult sites or ED pills". This would be the primary effect but I don't think this is where the money is. Instead it is more likely to be in the secondary or tertiary effects.

1. People see the spam on this forum and click through to buy the thing/have their computer hijacked by viruses etc.
2. Across the thousands of forums these spam programs are run against, someone sees the spam and clicks through to buy the thing/have their computer hijacked by viruses etc.
3. People keep the idea alive that money can be made by spamming forums so they can sell forum spamming software to the people they are able to deceive.

Quote: Originally posted by andy1988 My suggestion, as someone with "trusted" credentials can do, is set up your own "bot" to report the other bots... Use either Selenium or Sikuli: [...] Using either approach... every ~5-10 minutes... refresh 'Today's Posts', parse new posts, and report new users with 2+ posts on the same day containing links. [...]

The script has been written, woelen has run it to test, and it just needs to be set on automatic.

Switching to phpBB isn't going to be an end to the spam. It will make the site easier to maintain in future but other steps will be needed to prevent spam beyond the default. It is also a longer term change. We have no idea how long it will be before Polverone has time to work on the site.

Currently we have people with the power to make changes but no time/interest and people with time/interest but no power.
Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

 Quote: Originally posted by streety Currently we have people with the power to make changes but no time/interest and people with time/interest but no power.

Yes, I do believe this gets at the heart of the problem.

Have you looked at my phpBB implementation recently? I've made some decent progress lately. If you register there, I can make you an admin, or if you send me a public key, I can add you to authorized_keys. Spam can definitely be reduced from a flood to a drip by switching forum software, not to mention, Polverone and others have been calling for it for years now.

Rogue pics removed.

[Edited on 26-9-2018 by j_sum1]

j_sum1

Posts: 5022
Registered: 4-10-2014
Location: Oz
Member Is Offline

Mood: Metastable, and that's good enough.

Heh. Nice pics, Melgar.
I think they migrated from another thread.
I'll remove them for you.
streety
Hazard to Others

Posts: 110
Registered: 14-5-2018
Member Is Offline

I've not taken a good look at the progress you've made yet. I was out of town over the weekend, actually up in NYC, and just now getting back on top of things. I'll take a good look this evening, sign up etc.

Are you sure about the spam? I would have thought that the default phpBB install would be more susceptible to spam. I thought it would be the extensions that exist thanks to its ongoing development and community, and the ease with which we can add additional countermeasures, that would make the difference.
streety
Hazard to Others

Posts: 110
Registered: 14-5-2018
Member Is Offline

I found all the posts created by the account lesterpq11 and ran them through the spam classification model. There were a few close to 0.6 that I'm not too happy about but nothing too challenging. The model will eventually need to be updated as the spam changes but for the moment it should be fine.

 Code: 
Topic                                   |Post date         |Spam score
----------------------------------------+------------------+----------
Matured site                            |2018-09-21 21:25  |0.947
Delivered adult galleries               |2018-09-21 21:25  |0.733
Sexual pictures                         |2018-09-21 21:24  |0.932
Communal pictures                       |2018-09-21 21:24  |0.867
My unfamiliar website                   |2018-09-21 21:23  |0.783
Matured purlieus                        |2018-09-21 21:23  |0.920
Pictures from community networks        |2018-09-21 21:22  |0.845
Adult galleries                         |2018-09-21 21:21  |0.911
Grown up galleries                      |2018-09-21 21:21  |0.663
Social pictures                         |2018-09-21 21:20  |0.760
Loose galleries                         |2018-09-21 21:20  |0.898
Matured site                            |2018-09-21 21:19  |0.688
Pictures from collective networks       |2018-09-21 21:18  |0.840
Communal pictures                       |2018-09-21 21:18  |0.688
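The scores above can be summarised in a few lines. The 0.5 deletion threshold and the 0.7 "comfortable" margin are assumptions for illustration; the post doesn't state the model's cutoff.

```python
# (topic, spam score) pairs copied from the table above.
SCORES = [
    ("Matured site", 0.947), ("Delivered adult galleries", 0.733),
    ("Sexual pictures", 0.932), ("Communal pictures", 0.867),
    ("My unfamiliar website", 0.783), ("Matured purlieus", 0.920),
    ("Pictures from community networks", 0.845), ("Adult galleries", 0.911),
    ("Grown up galleries", 0.663), ("Social pictures", 0.760),
    ("Loose galleries", 0.898), ("Matured site", 0.688),
    ("Pictures from collective networks", 0.840), ("Communal pictures", 0.688),
]


def summarize(scores, threshold=0.5, comfortable=0.7):
    """Count posts at or above `threshold`; list flagged scores below `comfortable`."""
    flagged = [s for _, s in scores if s >= threshold]
    borderline = sorted(s for s in flagged if s < comfortable)
    return len(flagged), borderline


n_flagged, borderline = summarize(SCORES)
print(f"{n_flagged}/{len(SCORES)} flagged; borderline: {borderline}")
```

Under those assumed cutoffs every post would be caught, with three sitting in the 0.6s.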

I ran all the usernames through a password strength checker (zxcvbn) and plotted histograms for spam accounts and for accounts with 0, 1-24, and 25+ posts.

There is a lot of overlap so I doubt much could be done with this. It's possible greater separation could be achieved by limiting the types of permutations considered.
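zxcvbn models dictionary words, keyboard patterns and repeats, which is what makes it interesting for usernames. For comparison, a much cruder brute-force estimate (length times log2 of the character pool) is sketched below; this is a swapped-in technique, not zxcvbn, and it would likely separate the groups even less.

```python
import math
import string


def naive_entropy_bits(name: str) -> float:
    """Brute-force estimate: len(name) * log2(size of the character pool used)."""
    pool = 0
    if any(c in string.ascii_lowercase for c in name):
        pool += 26
    if any(c in string.ascii_uppercase for c in name):
        pool += 26
    if any(c in string.digits for c in name):
        pool += 10
    if any(not (c.isascii() and c.isalnum()) for c in name):
        pool += 33  # ASCII punctuation + space; non-ASCII is lumped in here too
    return len(name) * math.log2(pool) if pool else 0.0
```

Because this ignores patterns entirely, a long ordinary username scores as "strong" as a random one of the same length and character mix, which is exactly the weakness zxcvbn's pattern matching addresses.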
fusso
International Hazard

Posts: 1747
Registered: 23-6-2017
Location: Nowhere
Member Is Offline

Mood:

What pic did Melgar post?

j_sum1

Posts: 5022
Registered: 4-10-2014
Location: Oz
Member Is Offline

Mood: Metastable, and that's good enough.

 Quote: Originally posted by fusso What pic did Melgar post?

Not that it matters much:

How the pictures got appended to another post, I don't know. But it has happened before.
CuReUS
International Hazard

Posts: 925
Registered: 9-9-2014
Member Is Offline

Mood: No Mood

 Quote: Originally posted by streety There is a lot of overlap so I doubt much could be done with this
Maybe we could tell future members (while they are signing up) not to use complicated usernames and instead keep them short and simple?
streety
Hazard to Others

Posts: 110
Registered: 14-5-2018
Member Is Offline

The password strength is calculated using the zxcvbn package. A very thorough description is at https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-pass...

Asking new members to keep their usernames short and simple might not be well received. Many people have a username they re-use in multiple places and identify strongly with.
Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

 Quote: Originally posted by streety I've not taken a good look at the progress you've made yet. I was out of town over the weekend, actually up in NYC, and just now getting back on top of things. I'll take a good look this evening, sign up etc. Are you sure about the spam? I would have thought that the default phpBB install would be more susceptible to spam. I thought it would be the extensions that exist due to its ongoing development and community and the ease we can add additional countermeasures that would make the difference.

I'm using the most recent version of phpBB now. Actually, a new version came out just last week. Once I figured out the database structure, having the better documentation for the newer versions ended up being more useful to me for figuring out how to make everything work. The main reason that XMB is so susceptible to spam is that there hasn't been a new version released in ~5 years or so, whereas phpBB has at least kept up with the pace of spambot evolution.

Anyway, I noticed you registered at the test site I set up, so I made you an admin. Feel free to poke around. Here's the test site again, for anyone curious:

http://35.185.63.230/talk/index.php

The style was chosen more or less randomly, and can easily be changed at any time. I've been focused a lot more on making sure that content is preserved.

And yeah, I'm not sure why those pictures were duplicated from my previous post. Just another weird glitch with this software, I guess.

[Edited on 9/28/18 by Melgar]

fusso
International Hazard

Posts: 1747
Registered: 23-6-2017
Location: Nowhere
Member Is Offline

Mood:

Why do we have so fricking much spam today???

phlogiston
International Hazard

Posts: 1306
Registered: 26-4-2008
Location: Neon Thorium Erbium Lanthanum Neodymium Sulphur
Member Is Offline

Mood: pyrophoric

Someone here said it before, I forget who, but it is interesting that most spam posts contain incomprehensible crap: just strings of characters that do not make up words in any language.
The effort put into posting them suggests they serve some kind of purpose.
Apparently they contain information that is useful to someone, and our forum provides a means of transmitting it from A to B, even if the posts exist only briefly before being deleted.

Regardless, it is becoming extremely annoying.
I find myself just giving up sometimes. 9 out of 10 times when I actually bother to log in, I do so only to report spam, rather than to enjoy discussing mad science with like-minded souls.

-----
"If a rocket goes up, who cares where it comes down, that's not my concern said Wernher von Braun" - Tom Lehrer
Assured Fish
National Hazard

Posts: 318
Registered: 31-8-2015
Location: Noo Z Land
Member Is Offline

Mood: Misanthropic

I don't think anyone is manually posting that spam or creating the accounts; I suspect it's all automated, and the SM system just makes that rather easy to do.

I think we need to set up a bot verification system on login, or at the very least when creating an account. Just about every site on the net has one nowadays; we are very, very, very behind the times.
Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

 Quote: Originally posted by phlogiston The effort put into posting them suggests they serve some kind of purpose. Apparently they contain information that is useful to someone, and our forum provides a means of transmitting it from A to B. Even if the posts exist only briefly before being deleted.

The effort at making these posts is zero. Or at least, it's so little that it's basically nothing: the exact same amount of effort as sending a spam email. A lot of these programs are poorly written by programmers in India and China and Russia.

Have you ever seen the movie "A Beautiful Mind"? Where he believes that there are hidden messages in the newspapers, and he has to decode them in order to stop some communist plot? Down that road, madness lies. Literally. And not the good kind of madness.

j_sum1

Posts: 5022
Registered: 4-10-2014
Location: Oz
Member Is Offline

Mood: Metastable, and that's good enough.

I have just done a bit of a clean-up. I noticed it building before I went to bed last night but didn't have time to attend to it. My apologies.
Anyway, 118 spam posts gone. More than our reporting system was able to handle. (I don't know why some spam posts persist even after being reported.)
I am now about to check my inbox to see if there is anything in there other than spam reports.

Thanks for sticking in there. I am really looking forward to a better solution than this.
Melgar
Anti-Spam Agent

Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

Hey, I just had an idea. You know the saying "use a thief to catch a thief"? Well, how about using a bot to catch bots? If someone gave me moderator privileges, I could set up my old laptop with a macro on it that'd refresh the boards like every ten seconds and delete all the posts from obvious bots.
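A sketch of the rule side of such an auto-moderator, assuming new posts have already been fetched on each refresh. The two rules (a link in a user's first post, or more than three links in one post) are hypothetical examples drawn from ideas floated in this thread; testing a different idea only means editing the `RULES` list. The polling loop and the actual delete action are omitted.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

LINK_RE = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class NewPost:
    post_id: int
    author_post_count: int  # author's total posts, including this one
    body: str


def link_in_first_post(post: NewPost) -> bool:
    return post.author_post_count <= 1 and LINK_RE.search(post.body) is not None


def too_many_links(post: NewPost, limit: int = 3) -> bool:
    return len(LINK_RE.findall(post.body)) > limit


# Hypothetical rule set; each rule takes a post and returns True if it looks like a bot.
RULES: list[Callable[[NewPost], bool]] = [link_in_first_post, too_many_links]


def posts_to_delete(posts: list[NewPost], rules=RULES) -> list[int]:
    """IDs of posts matched by any rule."""
    return [p.post_id for p in posts if any(rule(p) for rule in rules)]
```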

Or Polverone could implement that fix for the user registration page. That would work better, I imagine, but if that's not happening, an auto-moderator could help with this. I could also change the rules it uses to identify bots, which would then mean that some of the ideas that have been floated could be put to the test.

Edit: I'm a bit worried that I'll completely fix the phpBB migration issues that are left, only to have Polverone be AWOL with no way of contacting him. I'm not sure to what extent I'd be trusted with forum data when we're ready to switch. I would say that I'm really terrible at being dishonest so I think you can trust me just fine. And I'm much more mature professionally than socially. I know it's weird to ask about these things, but there are certain systems that I can't test with the database that I have. Also, I quite like this site, and want it to continue to operate for as long as possible.

[Edited on 10/7/18 by Melgar]

Elrik
Hazard to Self

Posts: 52
Registered: 1-9-2018
Member Is Offline

 Quote: Originally posted by phlogiston ...Apparently they contain information that is useful to someone, and our forum provides a means of transmitting it from A to B. Even if the posts exist only briefly before being deleted...
That's exactly right.
You'll notice the bot spam has URLs in it. One way Google calculates search ranking is by how often it sees URLs posted on websites. The bot designer spams URLs that make him money somehow, be it ads shown on those pages, scams to get people's money, etc., and when Google indexes this site those URLs go up in the search rankings. This is a very old tactic.
It's nostalgic seeing a forum use software from the turn of the century, but human verification at registration should really be implemented.