I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.

  • 0 Posts
  • 306 Comments
Joined 2 years ago
Cake day: June 29th, 2023

  • I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log the right or wrong answer, or the lack of one if the time expires, and then they must move on.

    I can imagine that metrics showing whether your advertising is actually reaching people are valid. But to make people answer, and especially to make them watch more if they answer wrong, is about as dystopian as it gets.

    If (and I say if, I really don’t want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.


  • Why? Because you can. But in terms of useful reasons?

    Cellphones and the Internet need infrastructure to work, and that can be disabled during a natural disaster or a war. Even by your own government in some cases.

    But if I want to communicate, I just need a piece of wire, somewhere to hang it, and a 12 V battery, and I can communicate over thousands of miles.

    Personally I just think that’s cool.


  • Yeah, it shouldn’t happen in a release. But if I had a penny for every time I’ve seen a last-minute development that wasn’t tested yet, and wasn’t even due for the current release, squeezed in, I’d literally have a pound, or a dollar, or whatever else has 100 pennies in it.




  • Yeah. I didn’t understand what they meant by the wtf there. It seemed to me that someone wondered whether the Action would have a localised version of i (making this stay lowercase on a phone was harder than it should be) or whether it used the same i, so they made a simple test for it.

    Not really sure it’s a wtf unless they expected a different result.




  • My mbin instance is behind Cloudflare, so I filter the AS numbers there. They don’t even reach my server.

    On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it. But since I was already blocking at the nginx level, I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.

    You need to block the whole ASN too. Those that use Chrome/Firefox UAs change IP every 5 minutes, picking a random other one from their huuuuuge pools.


  • Yeah, I probably should look to see if there are any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx, this is what I’m using:

    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

    That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!

    I also have a huuuuuge IP based block list (generated by adding all ranges returned from looking up the following AS numbers):

    AS45102 (Alibaba cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    Since these guys run or have run bots that impersonate real browser agents.

    There are various tools online to return prefix/ip lists for an autonomous system number.

    I put both into a single file and include it into my web site config files.

    EDIT: Just to add, keeping on top of this is a full time job!

    EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.
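
A minimal sketch of the IP block list step described above, assuming you already have a plain list of CIDR prefixes for an AS from one of those lookup tools. The function name `prefixes_to_nginx_deny` and the sample ranges are made up for illustration (the ranges are reserved documentation prefixes, not real AS data). It merges adjacent ranges and prints nginx `deny` lines that could go into an included file.

```python
import ipaddress


def prefixes_to_nginx_deny(prefixes):
    """Turn CIDR strings into merged nginx `deny` lines (hypothetical helper)."""
    v4, v6 = [], []
    for raw in prefixes:
        text = raw.strip()
        # Skip blank lines and comments that lookup tools sometimes emit.
        if not text or text.startswith("#"):
            continue
        net = ipaddress.ip_network(text, strict=False)
        (v4 if net.version == 4 else v6).append(net)

    lines = []
    # collapse_addresses needs a single IP version per call, so merge each
    # family separately; it also returns the networks in sorted order.
    for group in (v4, v6):
        for net in ipaddress.collapse_addresses(group):
            lines.append(f"deny {net};")
    return lines


if __name__ == "__main__":
    # Documentation-only ranges, standing in for a real AS prefix list.
    sample = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "2001:db8::/32"]
    for line in prefixes_to_nginx_deny(sample):
        print(line)
```

Merging first keeps the included file shorter, which matters when an AS announces thousands of small, adjacent prefixes.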








  • Now see, I kinda had the idea for a syndicated delivery service (not online orders, but the internet would have been used to create the order data that would assign drivers) decades ago. I did some part time work delivering food back in the late 90s/early 2000s, and I always thought it was so inefficient. The place I worked at was very busy and the owner had a very large delivery area, but even so, there would be times he was paying people to sit outside talking shit to each other in their cars.

    I thought it would make sense to have a larger pool of drivers that service multiple restaurants/take-aways. That would bring economies of scale to the problem, keeping people utilised and lowering the cost for each place using the service. Of course, some money would also go to the person running the business that brought it all together.

    I don’t think I ever considered paying less than this guy did (which wasn’t a lot, but would likely translate to $5 or so an hour in the 90s/2000s).

    One thing I find really interesting about Uber Eats/DoorDash (US)/Deliveroo (UK/EU): when you add up their fees, they take a delivery fee from the user, a service fee from the user, and an even bigger service fee from the restaurant, and they pay the lowest possible fee that will keep drivers interested. Yet I always hear the services are losing money too. How is that even possible?

    Take Deliveroo in the UK. Looking now (I don’t live in a city, so most places are some distance away), I can see a place 4.5 miles away charging £4.29 for delivery. Let’s make up an imaginary order:

    Order total: £20 (including sales tax/VAT)
    User’s service fee: £2.39 (it seems to be 11% including the VAT with a maximum set of which I am not sure how much)
    User’s delivery fee: £4.29 (including VAT, since they need to charge VAT on a service)
    Restaurant service fee: £6 (30% on the VAT included total). I am really unsure how this works entirely in terms of tax though…
    Total for user: £26.68

    Total deliveroo service revenue:
    Net: £10.57
    VAT: £2.11
    Total: £12.68

    Reading between the lines from what I can see delivery riders are paid between £3 and £6 per delivery. Now, in the cities this is probably great. I do wonder how they do it in the towns and villages. When I look at the list of places available to me most are 3 miles or more away, with some up to 6 miles away. I do wonder how £6 compensates someone doing a 10+ mile round trip at times.

    But OK, the price they pay drivers doesn’t include any tax, so it comes from the net total. This means that on this example order they keep roughly £4.50 or more in revenue per delivery, even when paying the rider the full £6.
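
The arithmetic above can be checked with a short sketch. It uses only the fee figures from the imaginary order and the £3 to £6 rider pay range, and it assumes the standard 20% UK VAT rate applies to all three fees, which is what the net/VAT split above implies. The function name `margin_after_rider` is made up for illustration.

```python
VAT_RATE = 0.20  # assumption: standard UK VAT applies to all three fees

# Fee figures from the imaginary £20 order (all VAT-inclusive).
user_service_fee = 2.39
user_delivery_fee = 4.29
restaurant_fee = 6.00  # 30% of the £20 order total

gross_revenue = user_service_fee + user_delivery_fee + restaurant_fee
net_revenue = gross_revenue / (1 + VAT_RATE)  # strip the VAT back out
vat_due = gross_revenue - net_revenue


def margin_after_rider(rider_pay):
    """Net revenue left on this one order after paying the rider."""
    return net_revenue - rider_pay


if __name__ == "__main__":
    print(f"Gross: £{gross_revenue:.2f}  Net: £{net_revenue:.2f}  VAT: £{vat_due:.2f}")
    for pay in (3.00, 6.00):
        print(f"Rider paid £{pay:.2f} -> £{margin_after_rider(pay):.2f} left")
```

This only covers the per-order margin. It says nothing about support, development, hosting, or marketing costs.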

    Yes, they need to pay support staff, but they are in low cost geographies. Yes, they need to keep development staff and the usual management overhead. And yes, they need servers/cloud time to host this stuff.

    Looking this up (not sure how good the source is), their revenue in 2023 was £2.7 billion, which I believe. However, they lost £38 million. Where all the costs come from, I am not sure.

    I wonder how these numbers compare to US based operators?


  • I think it’ll be a “we’ll see” situation. This was the main concern for y2k. And I don’t doubt there’s some stuff that was partially patched from y2k still around that is still using string dates.

    But the vast majority of software now works with timestamps, and of course some things will need work. With y2k, the vast majority of business software needed changing. I think in this case the vast majority will be working correctly already, and it’ll be the job of developers (probably in a panic less than a year before, as is the custom) to catch the few outliers. And yes, some will slip through the cracks, but that was also the case last time round.


  • You’re right on every point. But, I’m not sure how that goes against what I said.

    Most applications now use the epoch for date and time storage, and for the 2038 problem the work will be down to making sure either a 64-bit time_t or 64-bit long values (and matching storage) are used, which will be a much smaller change than was the case for y2k. Since more people also use libraries for date and time handling, it’s also likely this will be handled there.

    Most databases have datetime types which again are almost certainly already ready for 2038.

    I just don’t think the scale is going to be close to the same.
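
For anyone who hasn’t seen the 2038 limit spelled out, here is a small sketch of it. It simulates a signed 32-bit `time_t` by hand, since Python’s own integers don’t overflow. The helper names `wrap_int32` and `to_utc` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(seconds):
    """Simulate storing a value in a signed 32-bit integer (two's complement)."""
    return (seconds + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the epoch to a UTC datetime."""
    # Adding a timedelta avoids platform limits on negative timestamps.
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    # The last second a signed 32-bit time_t can represent.
    print(to_utc(INT32_MAX))                  # 2038-01-19 03:14:07+00:00
    # One second later the 32-bit value wraps to a large negative number.
    print(to_utc(wrap_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00
    # A 64-bit value simply keeps counting.
    print(to_utc(INT32_MAX + 1))              # 2038-01-19 03:14:08+00:00
```

The fix is the same in every case: widen the value and whatever stores it, which is why anything already on a 64-bit `time_t` or a proper datetime type has nothing to do.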