Sciencemadness Discussion Board
Subject: Tired of reporting spam
streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 7-8-2018 at 18:36


It's difficult to get a true impression of the number and frequency of spam posts from occasional visits to the forum so I decided to collect some data. I wrote a simple script that downloaded the "Today's Posts" page every 5 minutes and recorded any new topics. If any topics it had seen previously were missing it checked whether they were deleted and recorded when that happened.
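A minimal Python sketch of the snapshot-diff idea described above. It is not streety's actual script. The `TopicTracker` class, its `update` method and the `is_deleted` callback are hypothetical names, and fetching and parsing the "Today's Posts" page is left out. The sketch uses the time a topic was first seen as a stand-in for its post time, so its durations are only accurate to the 5-minute polling interval.

```python
from dataclasses import dataclass, field


@dataclass
class TopicTracker:
    """Hypothetical tracker for topics seen on successive polls of "Today's Posts"."""
    first_seen: dict = field(default_factory=dict)  # topic id -> minute first seen
    deleted_at: dict = field(default_factory=dict)  # topic id -> minute found deleted
    aged_out: set = field(default_factory=set)      # vanished but not deleted

    def update(self, now, visible_ids, is_deleted):
        """Process one poll taken at time `now` (in minutes).

        visible_ids: topic ids listed on the page in this poll
        is_deleted:  callable(topic_id) -> bool, assumed to check whether a
                     topic that vanished from the page was actually deleted
        """
        visible = set(visible_ids)
        for tid in visible:
            self.first_seen.setdefault(tid, now)
            self.aged_out.discard(tid)  # topic reappeared, e.g. after a new reply
        for tid in self.first_seen:
            if tid in visible or tid in self.deleted_at or tid in self.aged_out:
                continue
            if is_deleted(tid):
                self.deleted_at[tid] = now
            else:
                # Dropped off the page without being deleted (no longer "today")
                self.aged_out.add(tid)

    def durations(self):
        """Minutes from first sighting to detected deletion."""
        return {tid: self.deleted_at[tid] - self.first_seen[tid]
                for tid in self.deleted_at}
```

Called every 5 minutes with the topic ids parsed from the page, this yields per-topic deletion times of the kind plotted in the histogram below.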

The results were quite interesting so I thought I would share what I found.

I stopped the script ~12 hours ago and at that point it had been running for 6 days, 2 hours. During that time I recorded 645 spam topics for a rate of 106/day.
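As a quick check of that rate, with 6 days 2 hours expressed in days:

```latex
\frac{645\ \text{topics}}{6 + \tfrac{2}{24}\ \text{days}} \approx \frac{645}{6.083} \approx 106\ \text{topics/day}
```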

The histogram below shows the time it took to delete each spam topic.

spam_duration.png - 53kB

The minimum was barely above 5 minutes so I probably missed some topics that were deleted so quickly they were gone before my script downloaded the page. :o

The average was 84 minutes. 32% were deleted within 30 minutes, 47% within 60 minutes, and 79% within 120 minutes. The median was 65 minutes.
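These summary statistics can be computed from a list of deletion times with Python's standard `statistics` module. A sketch, assuming durations in minutes; `deletion_summary` is a hypothetical helper, and counting a post deleted at exactly 30 minutes as "within 30" is a choice made here rather than something stated in the thread.

```python
from statistics import mean, median


def deletion_summary(durations_min, thresholds=(30, 60, 120)):
    """Mean, median and fraction deleted within each threshold (minutes)."""
    n = len(durations_min)
    summary = {"mean": mean(durations_min), "median": median(durations_min)}
    for t in thresholds:
        # Fraction of spam topics removed no later than t minutes after posting
        summary[f"within_{t}"] = sum(d <= t for d in durations_min) / n
    return summary
```

A mean (84 minutes) noticeably above the median (65 minutes) is what a right-skewed distribution with a tail of slow deletions would produce.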

time_of_day.png - 139kB

Messages posted around 6-7am board time seemed to take the longest time to be removed.

I've stopped the script for the moment but can restart it if there is continued interest in this type of analysis.
diddi
National Hazard
****




Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia
Member Is Offline

Mood: Fluorescent

[*] posted on 7-8-2018 at 19:14


excellent info streety



Beginning construction of periodic table display
j_sum1
Administrator
********




Posts: 6219
Registered: 4-10-2014
Location: Unmoved
Member Is Offline

Mood: Organised

[*] posted on 7-8-2018 at 22:09


Thanks for doing this streety. We have not had good information on numbers until now. I think the volume is pretty consistent with what I have observed.
What I am most interested in is the time of day that spam appears. If there is a strong pattern then it is useful to know the heaviest times. It is worth my effort to do a targeted clean up if I know that a flood is coming.

To clarify... The time of day on that graph is the time that the spam appears and not the time it is deleted???
Help me translate the times on your graph to my local time zone.


Edit
There are some interesting diagonal stripes on that graph.
Looks like a spam bot was posting at regular five minute intervals and then all posts being deleted at once.

[Edited on 8-8-2018 by j_sum1]
streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 8-8-2018 at 03:32


The x axis is the time posted and uses the default for the board, i.e. GMT-8. I think Sydney would be 18 hours ahead.
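If the board really uses a constant GMT-8, converting to another zone is a fixed offset. A sketch with Python's `datetime`; the Sydney offset of +10 hours is an assumption that holds for Australian Eastern Standard Time (no daylight saving, which is the case in August), and `board_to_local` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

BOARD_TZ = timezone(timedelta(hours=-8))     # board default, fixed GMT-8
SYDNEY_AEST = timezone(timedelta(hours=10))  # assumption: AEST, no daylight saving


def board_to_local(board_time, local_tz):
    """Interpret a naive board timestamp as GMT-8 and convert it to local_tz."""
    return board_time.replace(tzinfo=BOARD_TZ).astimezone(local_tz)


# 6am board time, around where the spam uptick appears
print(board_to_local(datetime(2018, 8, 7, 6, 0), SYDNEY_AEST))
```

This prints `2018-08-08 00:00:00+10:00`: 18 hours ahead, so 6am board time is midnight in Sydney.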

This version uses the time of deletion instead:
time_of_day_deleted.png - 139kB

This is color coded for the day. It does seem like the uptick around 6-10am occurs on multiple days. Whether that pattern will continue or not is an open question.
time_of_day_posted_by_day.png - 175kB

[edit]
I've just looked at the users and there is a surprise there as well.

There were 216 members linked to spam posts.

I assumed they were registering and then immediately posting spam. Most accounts did follow that pattern but 15 accounts were registered days before they started posting. For these 15 accounts the median delay was 8 days. There were two accounts around 40-50 days and one account that was registered 71 days before it started posting.
[/edit]
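The delay analysis amounts to subtracting the registration date from the first-post date for each account and taking the median over the accounts that waited. A sketch; the input layout, the helper names and the one-day cutoff for "registered days before" are assumptions, and the example accounts are invented.

```python
from datetime import date
from statistics import median


def registration_delays(accounts):
    """accounts: mapping username -> (registration date, first post date)."""
    return {user: (first_post - registered).days
            for user, (registered, first_post) in accounts.items()}


def delayed_accounts(delays, min_days=1):
    """Accounts whose first post came at least min_days after registering."""
    return {user: d for user, d in delays.items() if d >= min_days}


accounts = {  # invented example data, not the real spam accounts
    "a": (date(2018, 8, 1), date(2018, 8, 1)),
    "b": (date(2018, 7, 1), date(2018, 7, 9)),
    "c": (date(2018, 5, 1), date(2018, 7, 11)),
}
late = delayed_accounts(registration_delays(accounts))
print(late, median(late.values()))
```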

[Edited on 8-8-2018 by streety]
WGTR
National Hazard
****




Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline

Mood: Outline

[*] posted on 8-8-2018 at 04:37


That means that some of those accounts are old enough that they are no longer "new", and require a manual delete of the spam. That explains why some posts tend to linger after being reported.

I've noticed before that a bot will register and make an innocuous post like "Hi, nice post!" or something similar, and then wait some weeks before posting spam, presumably to get around a spam deletion script.

Do you see a pattern on the types of posts/accounts that register and then wait several weeks to post, or is it pretty much the same as those that post immediately?




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 8-8-2018 at 05:28


Nothing jumps out at me. The text seems to be a mix of English and Cyrillic characters. The number of posts per user is similar, at ~3. The topics seem varied.
WGTR
National Hazard
****




Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline

Mood: Outline

[*] posted on 8-8-2018 at 16:44


I'm afraid that spam_laurie just had a very short and meaningless existence on sciencemadness. Her life was snuffed out in the flower of her youth. I wish I could say that I miss her, but I truly don't.



Abromination
Hazard to Others
***




Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline

Mood: 1,4 tar

[*] posted on 10-8-2018 at 07:16


Maybe I'm simply not a good observer, but it appears that in the last few days, when someone spams, the spam appears in all subforums (except Detritus, ironically).
Everyone knows it's a bot, that's obvious (Xrumertest maybe), but I had never noticed them doing this before. They would usually just post in genchem and occasionally some of the others.




List of materials made by ScienceMadness.org users:
https://docs.google.com/spreadsheets/d/1nmJ8uq-h4IkXPxD5svnT...
--------------------------------
Elements Collected: H, Li, B, C, N, O, Mg, Al, Si, P, S, Fe, Ni, Cu, Zn, Ag, I, Au, Pb, Bi, Am
Last Acquired: B
Next: Na
--------------
WGTR
National Hazard
****




Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline

Mood: Outline

[*] posted on 10-8-2018 at 11:10


So perhaps if a new account posts to all sub forums in a short time, the spam deletion script should nuke the account? That's one idea. There are quite a few spam accounts that only post once or twice, but I've seen some like you say.
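The heuristic could be expressed as a sliding-window check over a new account's posts. A sketch only: `floods_subforums`, the one-hour window and the 80% threshold are hypothetical choices, not anything the board software implements. The threshold is below 100% because, as noted above, the bots skip at least one subforum.

```python
from datetime import datetime, timedelta


def floods_subforums(posts, n_subforums, window=timedelta(hours=1), fraction=0.8):
    """Return True if one account hit most subforums within a short window.

    posts: list of (timestamp, subforum) pairs for a single new account.
    """
    posts = sorted(posts)
    for i, (start, _) in enumerate(posts):
        # Distinct subforums posted to within `window` of this post
        forums = {forum for t, forum in posts[i:] if t - start <= window}
        if len(forums) >= fraction * n_subforums:
            return True
    return False
```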

I've corresponded with woelen and j_sum1, and a possible solution is in the works. Everybody please hang in there until then and keep reporting spammers as soon as they pop their slimy little heads into the daylight.




Texium
Administrator
********




Posts: 4508
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline

Mood: PhD candidate!

[*] posted on 10-8-2018 at 11:15


Right now our solutions are quite patchy. Ultimately we will need to migrate to newer software. That's inevitable, as we can't just keep patching up this old ship forever. But, until we do that, we're just going to have to keep bailing her out as best as we can. Thanks for your patience, everyone.

And streety, thank you for presenting those analytics, that was very interesting. I can confirm that spammers sometimes get killed within 5 minutes because there are times when a new one pops up as I'm refreshing Today's Posts and I hit it within the minute it was posted!




Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
Abromination
Hazard to Others
***




Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline

Mood: 1,4 tar

[*] posted on 18-8-2018 at 16:36


I'm still trying to figure out the aim of this PHD troll. Why would he bug us? I'm sure he would have better things to do than constantly come back and interfere with the calm and order of a hugely diverse international forum of chemists.



Abromination
Hazard to Others
***




Posts: 432
Registered: 10-7-2018
Location: Alaska
Member Is Offline

Mood: 1,4 tar

[*] posted on 18-8-2018 at 16:55


Maybe you wouldn't be rejected if you didn't behave like you do.
I almost pity you. You should be concerned if a high school student is telling you that.




j_sum1
Administrator
********




Posts: 6219
Registered: 4-10-2014
Location: Unmoved
Member Is Offline

Mood: Organised

[*] posted on 18-8-2018 at 17:04


You err in thinking there is some logic behind it.
There is no reason. The guy is not reasonable. That's more or less the point.

There are some indicators that our troll is also a spammer. I think he gets his jollies messing with us when he gets bored of his meaningless existence pushing spam for pennies.

On the spam front I get the impression that we are staying ahead of things at the moment. Thanks for everyone's vigilance in reporting. I think the spam is getting obliterated pretty quickly. I would love to see some stats if you have them, streety.

As far as a long-term solution goes, we basically won't make much headway without a software tweak or a migration to a new system. Both of these require approval and input from Polverone, who has been frequently absent of late. Unfortunately, this means glacially slow progress.

We have had a bit of a discussion on a workaround involving passwords on the forums. There are numerous problems with this idea, principally that a password would be required to view as well as to post. I think it would also blind search engines, which would mean no spam but probably few new members. It would place an obstacle in front of infrequent long-standing members. And if Google can't find the board (which is what I suspect), then goodbye to a lot of the board's usefulness.
j_sum1
Administrator
Thread Closed
18-8-2018 at 17:15
j_sum1
Administrator
Thread Opened
18-8-2018 at 18:34
streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 19-8-2018 at 16:55


After the previous run of 6 days I stopped the script so no new update. I have been updating it to produce something more robust and, after discussing with woelen what would be needed, beginning to develop a script to actively delete spam posts.

I've just now set the spam monitoring script running again on a more permanent basis so will begin providing updates going forward.

The intention for the active part of the script will be to submit a spam report to trigger the automatic deletion script on the server. This might not work for all the spam but seemed like the safest first step as very little damage can be done by sending a U2U.

As far as I know the most recent description of the automatic deletion script is from page 8 of this topic:

Quote: Originally posted by Polverone  
I have tweaked the reporting process, somewhat inspired by violet sin. Before spam had to be reported by two trusted members or one moderator before it was auto-deleted. Now it will be auto-deleted if it is reported by one trusted member and one 'semi-trusted' member. Everyone who has been registered at least 60 days and has at least 50 posts automatically becomes a member of the 'semi-trusted' group. This should reduce the waiting time for spam deletion, and means that any frequent user of the site can help stamp out spam.
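Read as a rule, the quoted policy looks roughly like the following. This is a sketch under assumptions: it presumes the older conditions (two trusted members, or one moderator) still apply alongside the new one, that the reporters passed in are distinct people, and the `Member` type and function names are hypothetical rather than taken from the forum software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    registered: date
    posts: int
    trusted: bool = False
    moderator: bool = False


def is_semi_trusted(m, today):
    """Registered at least 60 days and at least 50 posts, per the quoted rule."""
    return (today - m.registered).days >= 60 and m.posts >= 50


def should_auto_delete(reporters, today):
    """reporters: distinct members who reported the post."""
    if any(r.moderator for r in reporters):
        return True
    trusted = [r for r in reporters if r.trusted]
    # Count semi-trusted reporters separately so one person is not counted twice
    semi = [r for r in reporters if not r.trusted and is_semi_trusted(r, today)]
    return len(trusted) >= 2 or (len(trusted) >= 1 and len(semi) >= 1)
```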


The idea is to make this more aggressive by

- expanding the pool of members able to trigger a deletion (but still requiring 2 members)
- automatically triggering deletion if a new member starts multiple new topics

I'm currently looking at whether we can use machine learning to delete more aggressively while keeping some safeguards. For example, deletion could be triggered by a report from a single user instead of two if the machine learning model also thinks the post is spam.
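The proposed safeguard can be expressed as a report threshold that drops from two to one when the model agrees. A sketch; the function name and the 0.9 probability cutoff are illustrative assumptions, not values from the thread.

```python
def should_trigger(num_reports, spam_probability, threshold=0.9):
    """Require two member reports normally, but only one if the model is confident."""
    required = 1 if spam_probability >= threshold else 2
    return num_reports >= required
```

For example, `should_trigger(1, 0.97)` is `True`, while `should_trigger(1, 0.4)` is `False`.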

I'll finish with some historical statistics. For the machine learning model I have examples of spam posts but I also need examples of genuine posts. I could use everything on the forum but the first posts from users would be best. There are 8835 members on the forum with at least one post. Somewhat to my surprise the majority (5062) started a new topic with their first post. Most of those (4206) posted within a day of registering on the forum.
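In proportions:

```latex
\frac{5062}{8835} \approx 57.3\%\ \text{started a topic with their first post},\qquad
\frac{4206}{5062} \approx 83.1\%\ \text{of those posted within a day of registering}
```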
JJay
International Hazard
*****




Posts: 3440
Registered: 15-10-2015
Member Is Offline


[*] posted on 20-8-2018 at 11:41


Question: When you see a spammer posting several new posts, which one do you report? The oldest one?



WGTR
National Hazard
****




Posts: 971
Registered: 29-9-2013
Location: Online
Member Is Offline

Mood: Outline

[*] posted on 20-8-2018 at 11:43


Usually I do, but it depends on which one(s) have the most view counts. Usually the oldest one looks like several people have already looked at it (and have presumably reported it).




fusso
International Hazard
*****




Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline


[*] posted on 21-8-2018 at 05:09


Quote: Originally posted by streety  
The x axis is the time posted and uses the default for the board, i.e. GMT-8. I think Sydney would be 18 hours ahead.

This is adjusted for time deleted.


This is color coded for the day. It does seem like the uptick around 6-10am occurs on multiple days. Whether that pattern will continue or not is an open question.


[edit]
I've just looked at the users and there is a surprise there as well.

There were 216 members linked to spam posts.

I assumed they were registering and then immediately posting spam. Most accounts did follow that pattern but 15 accounts were registered days before they started posting. For these 15 accounts the median delay was 8 days. There were two accounts around 40-50 days and one account that was registered 71 days before it started posting.
[/edit]

[Edited on 8-8-2018 by streety]
@streety can you please also analyse the frequencies of real posts?



fusso
International Hazard
*****




Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline


[*] posted on 23-8-2018 at 10:22
Highlight reported posts


Quote: Originally posted by Polverone  
I have tweaked the reporting process, somewhat inspired by violet sin. Before spam had to be reported by two trusted members or one moderator before it was auto-deleted. Now it will be auto-deleted if it is reported by one trusted member and one 'semi-trusted' member. Everyone who has been registered at least 60 days and has at least 50 posts automatically becomes a member of the 'semi-trusted' group. This should reduce the waiting time for spam deletion, and means that any frequent user of the site can help stamp out spam.

To let others know that a post has already been reported, put an icon before the subject if a trusted member has reported it, and another icon if a semi-trusted member has reported it. This could reduce the number of reports and U2Us.
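A minimal sketch of the suggested display rule. The `[T]` and `[S]` text markers stand in for whatever icons the forum would actually use, and `marked_subject` is a hypothetical function.

```python
def marked_subject(subject, reported_by_trusted, reported_by_semi,
                   trusted_icon="[T]", semi_icon="[S]"):
    """Prefix a topic subject with markers showing who has already reported it."""
    prefix = ""
    if reported_by_trusted:
        prefix += trusted_icon + " "
    if reported_by_semi:
        prefix += semi_icon + " "
    return prefix + subject
```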




fusso
International Hazard
*****




Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline


[*] posted on 24-8-2018 at 08:36
What should we write when reporting trolls?


Should we write "spam" or "troll" when reporting trolls?



streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 28-8-2018 at 13:58


fusso, what would you like to know about the legitimate posts? I can extract quite a lot of information from the backup.

The icons would be a good idea if we continue to struggle with spam. I'm hoping we are close to a major improvement.

I've been able to train a machine learning model to detect spam posts with high accuracy and have now modified the monitoring script to send spam reports. If run by an admin or mod account, the reports should trigger the automatic deletion script on the server. To avoid mistakes it will only trigger if:
- a new user (less than 48 hours old) starts two or more new topics detected as spam
- a new user starts one topic detected as spam that is also reported by a member
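The two trigger conditions can be written as a single predicate. A sketch with hypothetical names; how the script counts a user's flagged topics and learns about member reports is not described in the thread.

```python
from datetime import datetime, timedelta

NEW_USER_AGE = timedelta(hours=48)


def should_report(user_registered, now, spam_topics, member_reported):
    """Decide whether the monitoring script should send a spam report.

    spam_topics:     number of new topics by this user that the model flags as spam
    member_reported: whether a member has also reported one of those topics
    """
    if now - user_registered >= NEW_USER_AGE:
        return False  # only users less than 48 hours old are handled automatically
    if spam_topics >= 2:
        return True
    return spam_topics >= 1 and member_reported
```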

As the spammers change their content the performance of the model will eventually degrade but it will be easy to update it.
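The thread does not say which kind of model was trained. As one illustration of the general approach (labelled spam and genuine posts in, a classifier out, retrained when the spam changes), here is a small multinomial naive Bayes classifier with add-one smoothing in plain Python. This is a swapped-in technique, not necessarily what streety used, and the training texts are invented. In Python 3, `\w+` matches Unicode word characters, so Cyrillic text is tokenised too.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; \\w is Unicode-aware, so Cyrillic words count."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for w in tokenize(doc):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented training examples; retraining is just calling fit() on updated data
docs = ["cheap pills buy now", "buy cheap watches",
        "synthesis of copper sulfate", "help with distillation of ethanol"]
labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayes().fit(docs, labels)
```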
diddi
National Hazard
****




Posts: 723
Registered: 23-9-2014
Location: Victoria, Australia
Member Is Offline

Mood: Fluorescent

[*] posted on 28-8-2018 at 14:05


just write
"spam"




j_sum1
Administrator
********




Posts: 6219
Registered: 4-10-2014
Location: Unmoved
Member Is Offline

Mood: Organised

[*] posted on 29-8-2018 at 04:16


For interest's sake...
I have not cleared my u2u trash for about four months. (Late April or early May.) The only thing in there is spam reports.

You are all doing a grand job. And even though the bots are persistent, I get the feeling we are staying ahead of the mess.
But this does illustrate the size of the problem.



trashbox.jpg - 36kB
fusso
International Hazard
*****




Posts: 1922
Registered: 23-6-2017
Location: 4 ∥ universes ahead of you
Member Is Offline


[*] posted on 29-8-2018 at 07:46


Quote: Originally posted by streety  
fusso, what would you like to know about the legitimate posts? I can extract quite a lot of information from the backup.

The icons would be a good idea if we continue to struggle with spam. I'm hoping we are close to a major improvement.

I've been able to train a machine learning model to detect spam posts with high accuracy and have now modified the monitoring script to send spam reports. If run by a admin or mod account the reports should trigger the automatic deletion script on the server. To avoid mistakes it will only trigger if:
- a new user (less than 48 hours old) starts two or more new topics detected as spam
- a new user starts one topic detected as spam that is also reported by a member

As the spammers change their content the performance of the model will eventually degrade but it will be easy to update it.
I'd like to know the distribution of posts across different times of day.



streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 1-9-2018 at 07:05


This first figure is the frequency of posting over the complete period of the board:

post_frequency.png - 62kB

This figure shows the frequency of posts for each hour of the day. The time is the same as the board (UTC-8), so the peak is at approximately 12pm Pacific Standard Time, 3pm Eastern Standard Time, 8pm UTC, or 6am in Sydney.
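Bucketing posts by hour of day in board time is a short computation if the database stores Unix timestamps (the timestamp values of 0 and 1 mentioned in the next paragraph suggest it does, but that is an inference). A sketch; `posts_per_hour` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BOARD_TZ = timezone(timedelta(hours=-8))  # board default, UTC-8


def posts_per_hour(unix_timestamps):
    """Count posts in each hour 0-23 of the day, in board time."""
    hours = Counter(datetime.fromtimestamp(ts, BOARD_TZ).hour for ts in unix_timestamps)
    return [hours.get(h, 0) for h in range(24)]
```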

time_of_day_all_posts.png - 59kB

In putting together these figures I discovered two odd posts with a post date in 1969. In the database they are represented by timestamp values of 0 and 1. They are clearly spam, but the thread they start has 8 pages of legitimate content.
https://www.sciencemadness.org/whisper/viewthread.php?tid=21...
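The 1969 dates follow directly from those timestamps: 0 is the Unix epoch, 1970-01-01 00:00 UTC, which in the board's UTC-8 falls on the afternoon of 31 December 1969.

```python
from datetime import datetime, timedelta, timezone

BOARD_TZ = timezone(timedelta(hours=-8))  # board default, UTC-8

for ts in (0, 1):
    # Timestamps 0 and 1 are the first two seconds of the Unix epoch
    print(ts, datetime.fromtimestamp(ts, BOARD_TZ).isoformat())
```

This prints `0 1969-12-31T16:00:00-08:00` and `1 1969-12-31T16:00:01-08:00`.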
Diachrynic
Hazard to Others
***




Posts: 219
Registered: 23-9-2017
Location: western spiral arm of the galaxy
Member Is Offline

Mood: zenosyne

[*] posted on 2-9-2018 at 03:58


Can someone confirm this pattern? The bots seem to post in a certain order when spamming. Just an observation. Might be interesting to check if the pattern holds. (I just realized there is not so much order as I had hoped with that last one. Still.)



ic-5302.jpg.png - 125kB

ic-8857.jpg.png - 129kB




we apologize for the inconvenience