Conversation
Notices
-
Somebody went to a huge amount of trouble to set up thousands of accounts. And there's no URLs or spam or whatever. So strange.
-
It's hard to do, really. You have to solve a captcha and confirm an email address. It's a lot of work for not a lot of gain.
-
bet that amount of work can be had for tiny fraction of $5 and whatever odd objective pursued, worth <$5. just sayin' #fivebucksignup
-
Ah, good point.
-
They might lie dormant for a couple of months and then activate (based on my Black Friday spam observations).
-
Sadly $5 USD is a day wage in some countries. I don't like this idea.
-
Well, I've silenced about 5000 already, and there are probably a few more to come. Hope we can handle them by hand.
-
@evan I wasn't counting how many, but I've been silencing them for a few hours now.
-
Thanks a ton. It seems like we're down to just a few.
-
@evan @evan@identi.ca something similar took down demo.friendika.com as well...
-
Really!? That's bizarre. Same kind of junk posts?
-
massive registrations swamped the db and took it down. many of the sn sites also saw massive increases in hits to their…
-
One other thing worth noting is that when someone on identi.ca gets silenced, everyone who registered from the same IP also gets silenced.
-
...so whoever did this needed tons of IPs to register from.
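A minimal Python sketch of the cascade described above, where silencing one account also silences every account registered from the same IP. The data shape (a username-to-IP mapping) and the function name are assumptions for illustration, not StatusNet's actual schema or code.

```python
from collections import defaultdict


def silence_with_ip_cascade(registrations, flagged_user):
    """Return every user silenced when `flagged_user` is silenced.

    `registrations` maps username -> registration IP (hypothetical shape).
    """
    # Group usernames by the IP they registered from.
    by_ip = defaultdict(set)
    for user, ip in registrations.items():
        by_ip[ip].add(user)

    # Everyone sharing the flagged user's registration IP goes down with them.
    return set(by_ip[registrations[flagged_user]])


# Made-up sample data using documentation-reserved IP ranges.
registrations = {
    "bot1": "203.0.113.5",
    "bot2": "203.0.113.5",
    "bot3": "198.51.100.7",
    "alice": "192.0.2.10",
}
silenced = silence_with_ip_cascade(registrations, "bot1")
```

Here `bot3` survives because it registered from a different address, which is why a mass registration that resists this rule needs many distinct IPs.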
-
What about Diaspora?
-
@evan apparently not diaspora.
-
I should probably dig up the registration IPs. I wonder if they come from some particular country?
-
That has been a concern, that this could spread to joindiaspora.com and rstat.us.
-
@parlementum @evan Rather suspicious that Diaspora wasn't targeted either, what with all the public hype it has had.
-
If you ever need any more modhelper help, my ol’ identica account is @jpopehasmoved@identi.ca ;)
-
If it's spammers, might make sense. Diaspora doesn't have the same public interface we do. (Public timeline, tag pages, group pages.)
-
You should probably direct that notice elsewhere @jpope :/
-
*cough* Invite-only *cough* !Identica
-
Spammers are like roaches, once they're in your apartment, they're in.
-
@zoowar gratis with invite. but I know you hate that idea too.
-
You just created an underground economy selling invites. I don't hate the invite approach, I don't think it solves the problem.
-
When g+ was invite only, the only people who couldn't get an invite were people who didn't know anyone with a g+ account.
-
Possibly, but doubtful. AFAIK markets in invites have been at best fleeting as tx cost > value of invite.
-
AFAIK current troubles appear not to be the cockroaches in the apartment, but the swarms that keep coming in because the door is open !Identica
-
Spam is the devil we know, an easy target to point our finger. However, spam *volume* is not the issue http://ur1.ca/6ynli
-
It's not difficult to rent a botnet.
-
Spammers could set up an invitation network to distribute invitations among themselves.
-
To fight this one would use the same social classifiers that would also identify dents as spam. Spam filtering is more democratic.
-
Been through this already — spammers _will_ do anything. Why is not accepting that and the consequences (a downed site) OK?
-
What's a social classifier?
-
@zoowar depends on how invites doled out. how many modhelpers are spammers?
-
It's meant to convey a convergence of social graphing and Bayesian classification.
-
Does Bayesian classification work when documents are so small? (i.e. 140 characters)
-
Also, I don't think anyone has time to implement new code. Site is being overwhelmed _now_
-
Also, Bayesian classification would have been worthless for the past downtime. Added accounts apparently had no URLs/spam to classify
-
Yes
-
Agreed, but I'm not arguing to do anything about spam (right now). I'm arguing that it's not the problem.
-
Another spam-herring (misled by spam).
-
@samatjain maybe a temporary solution with no new code: $ rm actions/register.php
-
Don't understand where numbers are from… How do you know whether 5k notices/sec is reasonable? Also, isn't 7k notices/hour = 2 notices/sec?
-
I would read a proposal. But besides proof of concept startups and google, what status networks are using invites?
-
Have evidence of that? Math doesn't work—not enough feature coefficients. Experience agrees: short spam e-mail doesn't get marked spam
-
360 seconds in an hour. 7200/360 = 20. Even if you don't believe 5k, I know you're not arguing that 20 tps is reasonable.
-
3600 seconds in an hour… http://identi.ca/url/63094227. Was pointing out numbers may be wrong, 20 tps is very reasonable
-
Not run a production StatusNet instance myself, but my impression is that isn't well-tuned… Don't expect >400–500 req/s for untuned app
-
Also, that's averaging over an hour. What if all those requests came within 5 min, and site unable to recover?
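The arithmetic in this exchange can be checked with a short Python sketch. The 7200 figure is the one used in the thread; the five-minute burst is the hypothetical raised just above, and the function name is made up for illustration.

```python
SECONDS_PER_HOUR = 3600  # not 360


def average_rate(notices, window_seconds):
    """Average notices per second over a window of the given length."""
    return notices / window_seconds


# 7200 notices spread evenly over one hour.
hourly_rate = average_rate(7200, SECONDS_PER_HOUR)

# The same 7200 notices arriving within a five-minute burst.
burst_rate = average_rate(7200, 5 * 60)
```

Spread over the hour the load is 2 notices/sec; packed into five minutes it is 24 notices/sec, twelve times higher, so an hourly average can hide a much sharper peak.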
-
You got me. But that only strengthens my argument about spam load not being an issue.
-
That's extreme—site already has invite functionality. Literally a checkbox or 1 line config change to enable—AND turn off!
-
@samatjain it should be possible to use Bayesian classification for patterns of use, not just content!
-
How does that work? Not obvious IMHO, and written a Bayesian classifier or two in my time.
-
@samatjain I'd string stuff together and feed that to the engine - e.g., last N posts (if any), IPs, DNS results, profile data, & repeat
-
@samatjain it doesn't need to look for patterns in 'content', but patterns in *stuff*, so just make *stuff* into strings it can look at
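One way to read "make *stuff* into strings" is to encode metadata as extra tokens next to the words of the post, so a token-based classifier sees both. The Python sketch below is only an illustration under that assumption: the token names, the /24 network prefix, and the timing buckets are all invented here, and nothing in the thread says StatusNet stores or exposes these fields.

```python
def notice_tokens(text, ip, seconds_since_last_post):
    """Turn one notice plus its metadata into a flat list of string tokens."""
    # Content tokens: the lowercased words of the notice.
    tokens = [word.lower() for word in text.split()]

    # Metadata token: does the notice carry a link at all?
    has_link = "http://" in text or "https://" in text
    tokens.append("has_link" if has_link else "no_link")

    # Metadata token: first three octets of the IPv4 address, so nearby
    # addresses share a token (an arbitrary choice for this sketch).
    tokens.append("ipnet:" + ".".join(ip.split(".")[:3]))

    # Metadata token: coarse bucket for the gap since the previous post.
    if seconds_since_last_post is None:
        bucket = "first"
    elif seconds_since_last_post < 60:
        bucket = "fast"
    elif seconds_since_last_post < 3600:
        bucket = "medium"
    else:
        bucket = "slow"
    tokens.append("gap:" + bucket)
    return tokens


tokens = notice_tokens("Buy now http://example.com", "203.0.113.5", 30)
```

Whether such tokens actually help a classifier is exactly what is disputed in this thread; the sketch only shows that the encoding step itself is mechanical.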
-
Stringing together posts is a nice idea. Not so sure about IP, etc… also, you're not really classifying behavior; still classifying content.
-
@samatjain just giving examples... same act, varying IPs is not content; also look at things like timing between posts
-
How to turn "stuff" into a string? Should point out: billions of $ spent in behavior detection for homeland security. And most doesn't work.
-
@samatjain another pattern I've seen is so many text-only posts, then one with a link... use a sliding window to see how it develops
-
@samatjain WE can recognize patterns - I think the trick is to encode post metadata across multiple posts in such a way that...
-
Not sure what you mean when you say "pattern". Seems like you really mean heuristics? Spammers _will_ defeat heuristics
-
@samatjain ... a Bayesian engine can analyze and learn to detect them. there is already plenty of material about patterns WE've seen
-
@samatjain we've seen, and reported, a lot of patterns already - like 'three links & fill up with hashtags', or '1 dent each hr', etc.
-
@samatjain but most patterns are seen across multiple dents, so you need a sliding window.
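A minimal Python sketch of the sliding-window idea, using the "several text-only posts, then one with a link" pattern mentioned earlier in the thread. The window size and the link test are assumptions for illustration, and a fixed rule like this is a heuristic in the sense debated here, not a learned model.

```python
from collections import deque


def has_link(text):
    """Crude link test; a real check would parse URLs properly."""
    return "http://" in text or "https://" in text


def warmup_then_link(posts, window=4):
    """Return indexes of linked posts preceded by window-1 text-only posts."""
    recent = deque(maxlen=window)  # link flags for the last `window` posts
    hits = []
    for index, text in enumerate(posts):
        recent.append(has_link(text))
        flags = list(recent)
        # Match only when the window is full, the newest post has a link,
        # and every earlier post in the window has none.
        if len(flags) == window and flags[-1] and not any(flags[:-1]):
            hits.append(index)
    return hits


# Made-up timeline for one account, oldest post first.
posts = [
    "hello",
    "nice day",
    "coffee time",
    "cheap pills http://example.com",
    "bye",
]
hits = warmup_then_link(posts)
```

The window slides one post at a time, so the same function keeps working as an account's timeline grows.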
-
@samatjain you'd need to experiment but I'm sure it's possible to use a Bayesian engine for that - plenty of stuff already to feed it to learn
-
That's a heuristic, not a pattern. Heuristic = rule(s) you follow. Problem w/ hard-coded rules is that they are easily defeated…
-
…Show us the code, I guess. Is an !Identica corpus available for download somewhere?
-
@samatjain no, I *don't* mean hard-coded rules. WE see the patterns, a Bayesian engine could learn them when fed 'spam' material
-
@samatjain this is backend stuff - there's a whole big database there... I don't know if things like posting IP are stored though
-
@samatjain if not, it should be. for IP addresses, add DNS lookups (zombies, proxies etc) before feeding the engine.
-
@samatjain I *know* some 'pro' software provides the option to cycle through proxies for instance...
-
…Mmm, while it exists, it's not available? Sort of pointless to talk about things _we_ can't actually do (don't expect StatusNet to do it)
-
@samatjain ...that should be detectable as a pattern if combined with other (meta) data
-
Just wondering, do you actually know how Bayesian classification works? "Feeding", "patterns", etc are confusing ways to talk about it
-
@samatjain *just* content of single dents is definitely not enough - you need it to look for patterns across dents (like we do)
-
Mm, OK. Bayesian statistics are complicated, mapping to variables is difficult. Mentioned: real-life behavior detection systems don't work
-
@samatjain from what I know, you *start* teaching a Bayesian engine by giving it a bunch of spam and a bunch of ham
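For reference, a minimal Python sketch of that training step: a naive Bayes classifier with add-one (Laplace) smoothing over word counts. This is a generic textbook version, not the algorithm used by SpamBayes or Bogofilter, and the class name and training sentences are made up.

```python
import math
from collections import Counter


class NaiveBayes:
    """Word-count naive Bayes over two labels, "spam" and "ham"."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def score(self, label, text):
        """Log of P(label) times the product of P(word | label).

        Assumes each label has been trained at least once.
        """
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.counts[label].values())
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            logp += math.log((self.counts[label][word] + 1) / (total + len(vocab)))
        return logp

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.score(label, text))


nb = NaiveBayes()
nb.train("spam", "cheap pills buy now")
nb.train("spam", "buy cheap watches now")
nb.train("ham", "lunch with friends today")
nb.train("ham", "reading a good book today")

spam_guess = nb.classify("buy cheap pills")
ham_guess = nb.classify("lunch with friends")
```

With only a handful of words per message, each message contributes few counts, which is one reason short texts need a larger training sample.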
-
@samatjain define 'don't work' :)
-
All in all, rather see @evan fulfill StatusNet's business plan instead of maintain !Identica. Corp sites don't need heavy spam protection
-
How many such systems (that cost millions) do you know of that have caught any terrorists? Pretty sure if they did, we'd hear about it…
-
@samatjain that depends - if purely internal, they don't need it - if for customer-facing things like support, they definitely do!
-
For that matter, how many *MUCH* cheaper systems (like surveillance cameras) have caught any terrorists? tl;dr: Not a technology problem
-
@samatjain no - but here we're talking about patterns WE *can* see - if we can, we can encode their elements to be analyzed
-
Mmm I think this discussion is getting to the point: show us the code. You're mixing a lot of different unrelated concepts together
-
@samatjain what 'unrelated concepts'? spam fighting is all about pattern recognition, and it's never about content alone
-
@samatjain I'm thinking outside of the box, because applying Bayesian detection to microblogs is (apparently) new... just get started!
-
Bayesian classification (e.g. SpamBayes, Bogofilter) is NOT about pattern recognition
-
@marjoleink@identi.ca federated statusnet sites seem a better option for a number of reasons. Statusnet is open source …
-
Just a good read, http://ur1.ca/6yt2u
-
And since most spam wants you to navigate to a url... Monarch http://ur1.ca/6yt2u
-
Fascinating! Though, is this proprietary? Paper isn't loading for me (i.e. what are the actual heuristics involved)
-
Never mind, got the paper. It's very vague. =/
-
Alternate url http://ur1.ca/6yujo
-
@samatjain Yes, Bayesian works on small messages, it just takes more of them to get a representative sample.
-
Not a proposal, nor even a direct answer to parent, but pure conjecture http://gondwanaland.com/mlog/2011/12/25/fsw-invite/ !fsw
-
I make a lot of sites that use invitations for both rationing and/or exclusivity. It helps my hobby projects, because m…
-
Lately I've been really getting into the use cases for #Diaspora, versus what I am calling the #OStatusphere. I get the…
-
Also, I wonder if we will need to revisit the commercial and invitation system side of this once !MediaGoblin implement…
-
@zoowar a lot of spam is actually profile spam which doesn't want *you* to navigate there, but search engines to index it
-
Search engines are free to use monarch in deciding what to index or how to weight results.
-
A decade ago, invitations were used to generate sales leads.
-
Then the open source renaissance put an end to that.
-
Flickr and Smugmug aren't really free (as in beer) services. But Flickr, at least, does have quite a bit of spam for their free accounts
-
I wasn't thinking of free (beer) services, rather what kind of spam media hosting sites get. I imagine it is mostly com…
-
@zoowar of course ;) but they need a link to find it - which is the main function of profile spam (sometimes a chain of profile links!)
-
A lot of spam is obvious to other users, and leads to still more spam (on vandalized forums and college sites). Their target: search engines
-
Spam links in profiles are one obvious example of this, and the slowness to nofollow all links only hurt #Identica and the SN cloud.
-
@lnxwalt okay, i'm back. rawr!! let's start disabling those spammer bot accounts!
-
@lnxwalt they are link farming. :(
-
@lnxwalt yes, exactly. *sigh*