| Pages:
1
2
3 |
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
1454 now and the forum is still usable. Weird.
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
All right, this one's got to be a joke.
And yet the forum was running smooth, https and whisper and all.
|
|
|
charley1957
Hazard to Others
 
Posts: 197
Registered: 18-2-2012
Location: Texas
Member Is Offline
Mood: Hotter than the hinges of Hell! Well, not yet, but fixing to be!
|
|
Today is the first day I’ve been able to get on the website in a week. I keep getting timeouts and "server stopped responding" messages.
You can’t claim you drank all day if you didn’t start early in the morning.
|
|
|
yobbo II
National Hazard
  
Posts: 820
Registered: 28-3-2016
Member Is Offline
Mood: No Mood
|
|
01:17 GMT 4 Feb
Are there normally (before the bots) this many guests?
The bots may have registered?
Yob
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
The guests are bots, or rather the majority of the guests are. Take 20 guests as real persons and what remains will be those bloody leeches.
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
Same joke again. Ha ha. 

Different scrapers?
|
|
|
charley1957
Hazard to Others
 
Posts: 197
Registered: 18-2-2012
Location: Texas
Member Is Offline
Mood: Hotter than the hinges of Hell! Well, not yet, but fixing to be!
|
|
Wow, second day in a row I’ve been able to get in. Fast too, instantly. Are there that many bots out there all using this website at one time?
I’m not real up on how all that works, what the bots are for, who runs them, etc. It’s my partial understanding that at least some of them are
here to get data for training AI models. That’s just from some of the discussion in here about it. But are they all here doing that, all at one
time?
You can’t claim you drank all day if you didn’t start early in the morning.
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
Stable. Number of 'guests' below 800.
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
Well...
|
|
|
Twospoons
International Hazard
   
Posts: 1392
Registered: 26-7-2004
Location: Middle Earth
Member Is Offline
Mood: A trace of hope...
|
|
Certain times seem to be worse than others : I can rarely load the website before midday (local time is GMT+12).
Helicopter: "helico" -> spiral, "pter" -> with wings
|
|
|
esquizete_electrolysis
Harmless
Posts: 26
Registered: 9-10-2018
Location: N.C. RT
Member Is Offline
|
|
Haven't been able to load the site in the last week, but had an idea during that time. Why not implement a captcha to view the site? It's better than
having to log in to view it, and many other forums (albeit on more modern codebases, mostly XenForo) have used it to great success. It may have the
downside of reducing the ability to find threads via Google, but I reckon that's a fair trade-off for being able to use the site more than
once a week.
Edit: I wanted to flesh out my thought a bit more so this doesn't seem like a piss in the wind. Since the bulk of the traffic is likely AI
crawlers, some simpler systems could be implemented before a captcha. I also don't particularly care for most captcha providers, since Google
datamines as much as possible and Cloudflare has a tendency to pull service for nebulous reasons.
The simplest is an updated robots.txt config. If it's mostly crawlers from the most popular sources, this should take care of the bulk. One is
probably already in use, but it may need updating. GitHub typically hosts decent lists for this.
If that doesn't work, the next step would be a noindex tag on most pages. This would unfortunately neuter the ability to search for relevant
info in our more expansive threads. This could be applied to body text so post titles will still remain searchable.
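A rough sketch of the noindex idea, assuming the forum software could attach an `X-Robots-Tag` response header per page. The path prefix and the function name `robots_headers` are made up for illustration and are not the forum's actual layout. Keep in mind that a crawler has to fetch a page before it can see a noindex directive, so this affects search listings rather than server load.

```python
from typing import List, Tuple

# Hypothetical path prefix for thread pages; the real forum URLs may differ.
NOINDEX_PREFIXES = ("/viewthread.php",)


def robots_headers(path: str) -> List[Tuple[str, str]]:
    """Return extra response headers for a request path.

    Thread pages get "X-Robots-Tag: noindex"; everything else gets nothing.
    """
    if path.startswith(NOINDEX_PREFIXES):
        return [("X-Robots-Tag", "noindex")]
    return []


if __name__ == "__main__":
    print(robots_headers("/viewthread.php?tid=1"))
    print(robots_headers("/index.php"))
```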
If neither of these worked, the next step would be to find out whether the site is actively being attacked by someone. At that point, implementing a
captcha along with DDoS protection frameworks like CrowdSec could be a decent way of mitigating the issue. It's unlikely to be that deep, but who knows,
there may be a long-disgruntled loser who wants to take down this wonderful resource.
Finally, scorched earth. Put the noindex tag on every page and require a trusted tag to access anything other than the login page. This could be
expanded to a list of trusted IPs, or to cookies that grant a trust token. All of that is unnecessary, since it still just amounts to requiring users
to log in to use the site.
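For what a cookie-based trust token might look like, here is a minimal sketch, assuming an HMAC-signed value with an expiry time. The names `SECRET_KEY`, `TOKEN_LIFETIME`, `issue_token` and `verify_token` are all hypothetical, and nothing here reflects how the forum software actually handles sessions.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical server-side secret
TOKEN_LIFETIME = 24 * 3600  # assumed lifetime: one day, in seconds


def _sign(payload: str) -> str:
    """HMAC-SHA256 signature of the payload, hex encoded."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(client_id: str, now: Optional[float] = None) -> str:
    """Build a cookie value of the form 'client_id:expiry:signature'."""
    current = time.time() if now is None else now
    payload = f"{client_id}:{int(current + TOKEN_LIFETIME)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        client_id, expires, signature = token.rsplit(":", 2)
    except ValueError:
        return False  # not enough fields
    expected = _sign(f"{client_id}:{expires}")
    # Constant-time comparison on bytes avoids leaking how much matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return expires.isdigit() and int(expires) > current


if __name__ == "__main__":
    cookie = issue_token("guest-1")
    print(verify_token(cookie))
```

A scraper that passes the challenge once can replay the cookie until it expires, so the lifetime is a trade-off between annoying real visitors and limiting reuse.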
[Edited on 24-2-2026 by esquizete_electrolysis]
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
The leeches always find a workaround. Solving captcha is within their abilities. robots.txt is not a rule, the crawlers comply if they were instructed
to do so. A gentlemen's agreement, in fact.
If it were an attack, there would be more than just plain unavailability. If you have read older threads, there was a guy who messed up posts and
changed passwords and stuff until he was caught. I can't remember which post it was.
Is there an inverse relation between leech number and ease of access?
|
|
|
esquizete_electrolysis
Harmless
Posts: 26
Registered: 9-10-2018
Location: N.C. RT
Member Is Offline
|
|
Quote: Originally posted by bnull  | The leeches always find a workaround. Solving captcha is within their abilities. robots.txt is not a rule, the crawlers comply if they were instructed
to do so. A gentlemen's agreement, in fact.
|
Perhaps, but compliance is necessary for any large company to continue its operations. Legal precedent (at least in the US) has been set that
considers noncompliance either trespassing (kinda odd) or a violation of the DMCA.
I did look and we do have a decent robots.txt; however, it only blocks some of the large ones, 11 to be precise, with only 4 of those being AI-specific
crawlers. Of those 4, only 1 is still in major use, while the other 35 that make up 90% of crawler traffic are not blocked.
I attached both ours (as a quote since it's short) and a random AI megalist one I pulled off GitHub for comparison.
| Quote: |
User-agent: Amazonbot
User-agent: Amazonbot/0.1
User-agent: GPTBot  # Note: this is the only one of their 6 bots that is blocked.
User-agent: Applebot
User-agent: SemrushBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: coccocbot-web
User-agent: AhrefsBot
User-agent: PetalBot
User-agent: BLEXBot
|
Attachment: robots.txt (3kB) This file has been downloaded 26 times
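To check which crawlers a given robots.txt would turn away (if they honor it at all), Python's standard `urllib.robotparser` can be pointed at the text. This is only a sketch: the quote above lists User-agent lines without their rules, so the `Disallow: /` line in the sample is an assumption, and `ExampleBot` is a made-up name standing in for any crawler that is not listed.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Two agents from the quoted list; the Disallow rule is assumed, not quoted.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: Bytespider
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: Iterable[str], path: str = "/") -> Dict[str, bool]:
    """Map each user agent to True if robots_txt disallows it from path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    # "ExampleBot" is hypothetical and stands for an unlisted crawler.
    report = blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "Bytespider", "ExampleBot"])
    for agent, blocked in report.items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Running the same function over both the current file and the megalist would show which user agents only the longer list covers.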
|
|
|
chempyre235
Hazard to Others
 
Posts: 207
Registered: 21-10-2024
Location: Between Nb and Tc
Member Is Offline
Mood: Quite distracted
|
|
It's been almost two weeks since I've been able to access the forum (I was getting a bit worried). I now see that the sheer number of bots constantly
on this forum is far past what this site was designed for. As I type this, there are well over 900 users on this site, with only three
members (including myself) logged in.
"However beautiful the strategy, you should occasionally look at the results." -Winston Churchill
"I weep at the sight of flaming acetic anhydride." -@Madscientist
"...the elements shall melt with fervent heat..." -2 Peter 3:10
|
|
|
macckone
Dispenser of practical lab wisdom
   
Posts: 2211
Registered: 1-3-2013
Location: Over a mile high
Member Is Offline
Mood: Electrical
|
|
Compliance with robots.txt is necessary under the Computer Fraud and Abuse Act. It is technically illegal to exceed authority granted to access a
website. They can also be sued for damages, which at this point are substantial.
|
|
|
bnull
National Hazard
  
Posts: 994
Registered: 15-1-2024
Location: East Woods
Member Is Offline
Mood: Fecking annoyed
|
|
| Quote: | | Compliance with robots.txt is necessary under the computer fraud and abuse act. |
No. First of all, robots.txt was created in 1994 and | Quote: | | "is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and
there no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community
to protect WWW server against unwanted accesses by their robots." (From https://www.robotstxt.org/) |
As I wrote before, compliance with robots.txt is based on a gentlemen's agreement, which the leeches are not obliged to honor.
The Computer Fraud and Abuse Act was enacted in 1986, almost a decade before robots.txt. It does not define what constitutes unauthorized access.
| Quote: | | It is technically illegal to exceed authority granted to access a website. |
Yes, but unauthorized access requires that the leeches scrape content that is only visible to logged-in members (the u2u system and the References,
Whimsy, and Detritus subforums). The content that is visible to the general public without the necessity of username and password (the rest of the
subforums) does not fall under that definition. From Van Buren v. United States (attachment, page 5):
| Quote: | We must decide whether Van Buren also violated the Computer Fraud and Abuse Act of 1986 (CFAA), which makes it illegal "to access a computer with
authorization and to use such access to obtain or alter information in the computer that the accesser is not entitled so to obtain or alter."
He did not. This provision covers those who obtain information from particular areas in the computer—such as files, folders, or
databases—to which their computer access does not extend. It does not cover those who, like Van Buren, have improper motives for obtaining
information that is otherwise available to them. |
robots.txt is not legally binding, so whether the leeches comply with it or not is immaterial for the definition of unauthorized access.
Attachment: Van Buren.pdf (207kB) This file has been downloaded 34 times
|
|
|
BromicAcid
International Hazard
   
Posts: 3323
Registered: 13-7-2003
Location: Wisconsin
Member Is Offline
Mood: Rock n' Roll
|
|
Since we're so popular, poison?
https://rnsaffn.com/poison3/
Edit: I suppose we have Detritus though which is about the same thing.
[Edited on 2/27/2026 by BromicAcid]
|
|
|
Twospoons
International Hazard
   
Posts: 1392
Registered: 26-7-2004
Location: Middle Earth
Member Is Offline
Mood: A trace of hope...
|
|
There's always the nuclear option : no guest access. I really don't like the idea of blocking casual users, but the current situation is blocking
everyone.
[Edited on 27-2-2026 by Twospoons]
Helicopter: "helico" -> spiral, "pter" -> with wings
|
|
|
BromicAcid
International Hazard
   
Posts: 3323
Registered: 13-7-2003
Location: Wisconsin
Member Is Offline
Mood: Rock n' Roll
|
|
Quote: Originally posted by Twospoons  | | There's always the nuclear option : no guest access. I really don't like the idea of blocking casual users, but the current situation is blocking
everyone. |
Or maybe just block access to everything except Beginnings or something?
|
|
|