Sciencemadness Discussion Board

Post #2000 and an apology for accidentally deleting a thread

Melgar - 13-12-2018 at 17:26

I was trying to think of something clever to do for my 2000th post, but then BotKilla accidentally deleted the "detecting Hg in street drugs" thread. So I figured I owed everyone an explanation of how that happened and what I've done to make sure that it doesn't happen again.

When new threads are created, they're assigned thread ids in sequential order, so a new thread will always have the largest thread id of any thread ever created. To make it impossible for BotKilla to accidentally delete an old thread, there was a hard-coded cutoff constant: BotKilla would totally ignore any thread with a thread id below this number. This worked well enough for preserving old threads that are part of the SM legacy, but because the number was a constant, it didn't increase over time. When I started running BotKilla, thread ids were typically around 96000. I raised the thread cutoff once, manually, to 99000. But raising it over time to account for increasing thread ids wasn't a very high priority, which meant that threads with ids above 99000 could be deleted under certain unlikely circumstances. That's basically what happened to the "detecting Hg in street drugs" thread: a bunch of unrecognized links were in it, and it got flagged as spam.

Thread ids for new threads are now around 112000. This means that BotKilla has killed over 15,000 spambot-created threads. About 99.8% of new threads are spam threads, with a new one being created about 12 times per hour, or every 5 minutes. So clearly there is plenty of spam to automatically sort through, and the script does seem to pick up almost all of it. However, when I noticed that this legitimate thread got deleted, I made two changes: a) removed the code to penalize a post for additional unrecognized links that are included in it beyond the first one, and b) changed the hard-coded constant to a number that's automatically incremented, and can only ever be increased over time.
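Change (b) can be pictured with a minimal sketch. This is not BotKilla's actual code, which isn't shown in this thread; the class name `ThreadCutoff`, its methods, and the choice of when to raise the cutoff are all assumptions made for illustration. The only property taken from the description above is that the cutoff can go up but never down.

```python
class ThreadCutoff:
    """Hypothetical cutoff: threads with ids below it are never touched."""

    def __init__(self, initial: int) -> None:
        self._value = initial

    @property
    def value(self) -> int:
        return self._value

    def raise_to(self, candidate: int) -> int:
        # max() keeps the update monotonic: a smaller candidate is ignored,
        # so the cutoff can only ever increase.
        self._value = max(self._value, candidate)
        return self._value

    def is_protected(self, thread_id: int) -> bool:
        # Anything below the cutoff is off limits to the bot.
        return thread_id < self._value


cutoff = ThreadCutoff(99000)
cutoff.raise_to(112000)  # assumed trigger: everything below this id has been reviewed
cutoff.raise_to(100000)  # ignored, because the cutoff never decreases
```

With the old constant, a thread with id 105000 would stay eligible for deletion forever. Here it becomes protected as soon as the cutoff passes it.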

So a few key points:

At this point, I don't think that it's an option to stop the script, and I think most people would agree. So I've restarted it with the above code changes. Sorry about accidentally deleting that thread. I can probably recover parts of the thread with some effort, but it wasn't very long, and the consensus seemed to be that you can buy mercury test strips for testing groundwater and paint and such, and that using those test strips is probably the way to go.

Anyway, I'm open to any feedback, and hope the 15,000 automatically-deleted spam threads were worth losing a few recently-started threads. There have only been three that I'm aware of, and measures are in place such that none of those three would be deleted if they were posted again.

JJay - 13-12-2018 at 18:07

Why don't you use support vector machines or naive Bayes or some other machine-learning method for flagging spam?

clearly_not_atara - 13-12-2018 at 18:16

Machine learning is a pain in the ass

fusso - 13-12-2018 at 18:17

Why not make a post flaggable only once, so that it becomes undeletable once it has been scanned and determined to be genuine, like res judicata in law (you can't sue someone for the same thing again, if proven innocent)?

[Edited on 181214 by fusso]

JJay - 13-12-2018 at 18:25

Quote: Originally posted by clearly_not_atara  
Machine learning is a pain in the ass


Not really... there are dozens of open-source packages for doing it... that's just how professional antispam software works.

BromicAcid - 13-12-2018 at 18:32

Quote: Originally posted by Melgar  
This means that BotKilla has killed over 15,000 spambot-created threads. About 99.8% of new threads are spam threads, with a new one being created about 12 times per hour, or every 5 minutes.


Wow, thank you....

Did I mention thank you?

Phosphor-ing - 13-12-2018 at 19:29

Wow, I didn’t realize we received that many spam posts every day. Thank you for all you do.

Melgar - 13-12-2018 at 19:57

Quote: Originally posted by fusso  
Why not make a post flaggable only once, so that it becomes undeletable once it has been scanned and determined to be genuine, like res judicata in law (you can't sue someone for the same thing again, if proven innocent)?

[Edited on 181214 by fusso]

I actually had such a system, but it started with an empty array and added threads to it over time. The trouble was that the array's contents were reset whenever the script was restarted. The problem arose from threads that had been started after the cutoff constant but before the current iteration of the script had started running.
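One way around the reset problem is to keep the list of cleared threads on disk, so that it survives a restart. The sketch below is only an illustration under that assumption, not a description of what BotKilla does; the `ClearedThreads` class, the JSON file format, and the file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ClearedThreads:
    """Hypothetical store of thread ids already judged genuine, kept on disk."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._ids: set[int] = set()
        if path.exists():
            # Reload whatever a previous run of the script saved.
            self._ids = set(json.loads(path.read_text()))

    def mark_cleared(self, thread_id: int) -> None:
        self._ids.add(thread_id)
        # Write to a temporary file, then swap it in, so that a crash
        # mid-write does not leave a half-written list behind.
        tmp_path = self._path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(sorted(self._ids)))
        tmp_path.replace(self._path)

    def is_cleared(self, thread_id: int) -> bool:
        return thread_id in self._ids


# Simulate a restart: a second instance reads back what the first one saved.
with tempfile.TemporaryDirectory() as tmp_dir:
    store_path = Path(tmp_dir) / "cleared_threads.json"
    ClearedThreads(store_path).mark_cleared(101234)
    restarted = ClearedThreads(store_path)
    survived_restart = restarted.is_cleared(101234)
    unknown_thread_cleared = restarted.is_cleared(101235)
```

An in-memory array behaves like the first instance alone: everything it learned is gone once the process exits.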

As far as why I didn't use a third-party Bayesian filter library or whatever, that's easy: it wouldn't have been anywhere close to as accurate. The spam posts were designed to thwart common Bayesian filter algorithms, which only look at contents, and nothing else. By including user registration data, user post count, and checking link domains against a whitelist, accuracy got to about 1% false negatives and 0.03% false positives. That's better than my GMail's spam filter, for both figures.
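To show how signals like these might be combined, here is a rough sketch. The signals (registration data, post count, link whitelist) come from the post above, but the weights, the threshold, the whitelist entries, and every name in the code are invented for illustration and say nothing about BotKilla's real rules or its accuracy.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical whitelist; the real list of recognized domains is not given.
ALLOWED_DOMAINS = {"sciencemadness.org", "wikipedia.org"}
SPAM_THRESHOLD = 6  # made-up threshold for this sketch


@dataclass
class Post:
    author_post_count: int
    author_account_age_days: float
    links: list[str]


def is_allowed(url: str) -> bool:
    # hostname is None for malformed URLs; treat those as unrecognized.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def spam_score(post: Post) -> int:
    score = 0
    if post.author_post_count == 0:
        score += 2  # author has never posted before
    if post.author_account_age_days < 1:
        score += 2  # account registered very recently
    # Flat penalty if any link is unrecognized. Extra unrecognized links
    # add nothing, in line with change (a) from the first post.
    if any(not is_allowed(url) for url in post.links):
        score += 3
    return score


def looks_like_spam(post: Post) -> bool:
    return spam_score(post) >= SPAM_THRESHOLD


new_account_with_link = Post(0, 0.01, ["http://cheap-pills.example/buy"])
regular_with_links = Post(850, 1200.0, ["https://a.example/1", "https://b.example/2"])
```

Under these made-up weights, the brand-new account posting an unrecognized link crosses the threshold, while a long-standing member posting several unrecognized links does not.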

JJay - 13-12-2018 at 22:26

Quote: Originally posted by Melgar  
That's better than my GMail's spam filter, for both figures.


I call bullshit.

https://techcrunch.com/2017/05/31/google-says-its-machine-le...


Tsjerk - 13-12-2018 at 23:46

You are great Melgar!

JJay - 14-12-2018 at 00:23

Quote: Originally posted by Tsjerk  
You are great Melgar!


I have very sound and logical reasons for calling bullshit, but whether or not what Melgar is saying is true is immaterial to the question of whether machine learning could improve an existing spam detection algorithm, and Melgar knows it, or he should know it. Please, let's set the blatant self-promotion aside.

Melgar, do you understand how machine learning could improve your algorithm, or not?

Tsjerk - 14-12-2018 at 03:43

You are one angry bugger, aren't you?

mayko - 14-12-2018 at 06:35

Thanks for keeping the front page clean Melgar!

woelen - 14-12-2018 at 12:54

The loss of that thread is a small accident and I appreciate very much that Melgar has been honest about this and let us know about it. I fully accept Melgar's apology.
Melgar's work is of great value for Sciencemadness. Without it the system would be next to useless and we would hardly be able to work with the forums anymore.

Such an error can happen. Good that the scripts are improved and that the chance of accidental deletion of a legit thread is further reduced.

@JJay: Being a little more constructive in your communication would be nice. Maybe with machine learning one could do even better, but the big difference between your words and Melgar's words is that he is showing a working system (in which he put a lot of effort) and lets us enjoy it for free, while you just have angry words and have done no real work.

JJay - 14-12-2018 at 16:54

woelen: In response to my questions, Melgar gave a false and self-promoting excuse that wouldn't even logically excuse him if it were true. I will not assist in this, but are you really incapable of seeing how trivial it would be to use machine learning to improve Melgar's system?

Without Melgar's scripts, someone else would have put scripts into place. His scripts are a sum zero improvement to the board. A half dozen people here could have done better. I'm not saying that absolutely nothing would have been deleted, but we would be better off without Melgar's scripts.

I understand that you'd probably prefer that I work with Melgar on this, and I've considered it, but I consider the legal risks unacceptable. Who knows... maybe Melgar could say something to set my mind at ease. But I doubt it.

fusso - 14-12-2018 at 17:01

@JJ maybe he hates AI stuff? I don't think it's a problem to hate AI.

JJay - 14-12-2018 at 17:52

So let's say I come up with a working system, and it's better than Melgar's. Will it replace his? Seriously.

phlogiston - 14-12-2018 at 19:03

Quote: Originally posted by JJay  
Without Melgar's scripts, someone else would have put scripts into place. His scripts are a sum zero improvement to the board. A half dozen people here could have done better. I'm not saying that absolutely nothing would have been deleted, but we would be better off without Melgar's scripts.

I understand that you'd probably prefer that I work with Melgar on this, and I've considered it, but I consider the legal risks unacceptable. Who knows... maybe Melgar could say something to set my mind at ease. But I doubt it.


So, Melgar actually did the work, voluntarily. Moreover, he did a very good job. You, on the other hand, are only complaining about how worried you are about 'legal risks'.
Frankly, the forum was becoming unusable due to spam, and Melgar's script pretty much saved it.

Perhaps indeed 'half a dozen people could have done better'. But they didn't. Melgar did.

Also, Melgar was transparent and frank about accidentally deleting that thread, and I strongly feel that some collateral damage is acceptable in the battle against spam. I fully trust that with time, he'll be able to improve his algorithms (using AI or not) to delete almost nothing but spam.

Vomaturge - 14-12-2018 at 19:54

Can't speak for the mods, but if you made a system which was very obviously better than Melgar's I think it would be accepted.

For now, BotKilla has spoiled us as far as providing a low-spam forum. Thank you, Melgar!

It is also worth noting that at least one thread (the old "everyday chemistry" thread) was deleted for some reason prior to BotKilla's startup.

Would machine learning make a better spam filter? Can't say, since I don't have the tech knowledge to know the practical limits of such a program. Is Melgar somehow wrong for not using it? No, he chose a different approach, and applied it competently with good results. Did he show poor judgement in not using machine learning? No, there wasn't a highly obvious case for it based on the trials he did do.

Should we keep our minds open to new solutions to the barrage of spam which was (still is) flowing towards the forum? Absolutely! Everyone should feel free to share their own ideas for improvements, so long as they aren't upset if their proposals get turned down.

JJay - 14-12-2018 at 22:48

So, I reiterate, and please, unless you are a mod, hold your peace:

If I write a superior spam-fighting program, will you use it instead of Melgar's?

j_sum1 - 14-12-2018 at 23:40

JJay. We are not about to install competing systems. What Melgar has done is working remarkably well within considerable constraints. He does not have access to the back end of the system and he has devised something that works. Could something similar be done via a machine learning system? Probably. Would it be an improvement? Maybe a marginal one. It is tough to improve on something that is catching most spam within minutes and has made so few errors. The ultimate solution awaits a new platform.

In summary, you may well be right on technical details. But your approach to this discussion has been counterproductive. I don't think it wise to take down what Melgar has done just so that you can have your play in the sandbox.

JJay - 15-12-2018 at 00:16

My approach to this discussion was not counterproductive. Melgar's approach to this discussion was dishonest and counterproductive. I am extremely disappointed with the leadership of this forum, and I will be leaving.

Metacelsus - 15-12-2018 at 00:20

@JJay: would you consider collaborating with Melgar to improve the spam filter using machine learning? Spam filters don't have to be solo projects.

JJay - 15-12-2018 at 00:25

No, not a chance, absolutely not. As a U.S. citizen, I will not be a part of a forum where Melgar is running things. This is nothing personal against Melgar; seriously, I don't dislike you, Melgar. There are certain individuals I won't work with, though, and Melgar is one of those individuals. Sorry, that's just how it is.

Loptr - 15-12-2018 at 08:33

I think this has to do with the posts that Melgar made about the drug synthesis accusations from his family, and with not wanting someone who put all of that out there so publicly to be even remotely linked to the direction of the forum.

morganbw - 15-12-2018 at 11:16

Thank you, Mr. @Loptr.
This thread turned sour for no apparent reason. Perhaps your post brings some sanity.

Perhaps our Harvard grad could give a bit more info when he goes psycho, I had thought that perhaps it was a bad synth.

Either way, it was handled badly.

Tsjerk - 15-12-2018 at 11:46

Don't forget Melgar uses shills to work against you @JJay!

diddi - 15-12-2018 at 12:25

Mods. it's time to close this one down i think

It detracts from the collegial nature and ethos of the forum imo

j_sum1 - 15-12-2018 at 14:06

Quote: Originally posted by diddi  
Mods. it's time to close this one down i think

It detracts from the collegial nature and ethos of the forum imo

OK.