Hacker Remix

Show HN: A blocklist to remove spam and bad websites from search results

217 points by popcar2 4 days ago | 83 comments

Hi HN!

I've been fed up with search results so much that I decided to make a giant blocklist to remove garbage links by using uBlacklist.

I browsed other blocklists and wasn't very satisfied with what exists now; the goal of this one is to be super organized and transparent, explaining why each site was blocked via issues. Contributions welcome!

Even though only around 100 domains are blocked so far, I've already noticed a big improvement in casual searches. You'd be surprised how some AI-generated websites can dominate the first page on DuckDuckGo.

cormorant 4 days ago

I'm fed up too. Spammy, AI-looking sites are showing up more and more. For some reason, many of them use the same WordPress theme with a light gray table of contents - they look like this: https://imgur.com/a/totally-not-ai-generated-efsumgZ

The problem seems worse on "alternative" search engines, e.g. DuckDuckGo and Kagi, which both use Bing. It's been driving me back to Google.

A blocklist seems like a losing proposition, unless, like adblock filter lists, it balloons to tens of thousands of entries and gets updated constantly.

Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

popcar2 4 days ago

Even Google is plagued by spam. I've tried all sorts of search techniques and alternative engines, but I feel like the only solution is doing things manually. I was already starting to block things by myself, but I thought it'd be more productive to make the list public and try crowdsourcing. Even now, searching "how to partition a hard disk" will often lead you to low-effort sites telling you to use their software.

> Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

It's definitely a bit opinionated, but it's open to discussion - you can create an unblock request issue (if you care enough to do so, of course!). The reason I blocked MSN is that it just re-hosts articles from other websites, so I'd rather see the original source than be tricked into visiting Microsoft's site, which has annoying behavior such as opening another article if you scroll down too fast.

maximilianthe1 3 days ago

Recently learned a little trick for Google: adding `-ai` at the end of the query helps. Not much, but something.

radicality 3 days ago

Afaik DDG is just Bing, whereas Kagi is using Google, Bing, (Yandex?) among others - https://help.kagi.com/kagi/search-details/search-sources.htm...

As a Kagi user I actually haven’t encountered much search result spam, surprised you’re seeing enough there to drive you back to Google!

BigGreenJorts 8 hours ago

> Unfortunately, this kind of blocklist is highly subjective. This list blocks MSN.com! That's hardly what I would have chosen.

I'm wondering how much the blacklist can be broken down into categories of spam. Sponsorblock for YouTube has a lot of options around the types of things it'll skip through, and the user has a choice in how they're handled (skipped automatically, prompted to skip, simply highlighted in the scrubbar) at the category level.

rendaw 3 days ago

I get tons when looking up recipes and cooking related information. Things that will say "X can be refrigerated for up to two weeks" then in the next paragraph "X is fine to refrigerate and eat for 2-3 days" or similar.

I'd block them, but there seem to be infinitely many. They're probably buying 10+ character domains using random words/names/phrases in bulk.

econ 2 days ago

I was just thinking... Depending on the type of article, one can describe pretty decently what makes it a good one. Recipes should be short texts that may link to a gallery, a video and to a text about it. They should have a section called ingredients and one for preparation and may have an author and a date. Research articles should cite sources elaborately.
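As a rough illustration of the recipe criteria above, here is a minimal sketch of a structural check. The function name `looks_like_recipe` and the `max_words` threshold are hypothetical, and a plain substring test is only a stand-in for real section detection.

```python
def looks_like_recipe(text: str, max_words: int = 600) -> bool:
    """Heuristic: short text with 'ingredients' and 'preparation' sections.

    The 600-word limit is an arbitrary assumption for illustration.
    """
    lowered = text.lower()
    has_ingredients = "ingredients" in lowered
    has_preparation = "preparation" in lowered
    short_enough = len(text.split()) <= max_words
    return has_ingredients and has_preparation and short_enough


good = "Ingredients\n2 eggs, 1 cup flour\n\nPreparation\nMix and bake."
bad = "Ten reasons why pancakes are amazing. " * 200

print(looks_like_recipe(good))
print(looks_like_recipe(bad))
```

A check like this only looks at structure, so it would not catch the contradictory storage advice mentioned earlier in the thread.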

LeoPanthera 3 days ago

It's not going to be long before we need to move to a whitelist model, rather than a blacklist model.

It ironically makes me think of the Yahoo Web Directory in the 90s.

Time is a flat circle.

dredmorbius 3 days ago

Yes and no.

Power-law relations mean that a small number of domains will account for the lion's share of low-relevance results, and filtering those out will result in dramatic improvements in relevance.

That small set is probably fairly dynamic, however, and will likely change at a fairly high rate over time.

Penny-ante sites are less likely to appear in generic results, but might well be whatever the spam/phish term is for junk general Web search results.

We may well come to rely more on whitelisting, but I think at least for now that's not necessary, largely due to the dynamics of publishing / attention economies themselves.
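To make the power-law point concrete, here is a small sketch that assumes junk results follow a Zipf distribution across domains (weight proportional to 1/rank). The exponent and the domain counts are illustrative assumptions, not measurements.

```python
def top_k_share(n_domains: int, k: int, exponent: float = 1.0) -> float:
    """Fraction of all junk results produced by the k worst domains,
    assuming result counts are proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_domains + 1)]
    return sum(weights[:k]) / sum(weights)


# Hypothetical numbers: 10,000 junk domains, block only the worst 100.
share = top_k_share(n_domains=10_000, k=100)
print(f"Top 100 of 10,000 domains: {share:.0%} of junk results")
```

Under these assumptions, blocking 1% of the domains removes roughly half of the junk results, which is why a short list can already help.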

manx 3 days ago

This. I think well-curated web directories (by humans and machines) deserve a comeback.

OlivOnTech 3 days ago

Ringz 3 days ago

Installed! This should not be a function of the search engine nor a plugin. This should be integrated in the browser.

Another great function (not for this plugin) should be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry. I hate going through lists and pages of apple/google/synology/sonos/crab urls when I already know that I have to search somewhere else.
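The "bundle by domain" idea could look roughly like the sketch below. It groups by full hostname, which is a simplification: merging `support.example.com` with `www.example.com` would need a public suffix list. The function name and the example URLs are made up.

```python
from urllib.parse import urlsplit


def bundle_by_domain(results: list[str]) -> dict[str, list[str]]:
    """Group result URLs by hostname, keeping first-seen order."""
    bundles: dict[str, list[str]] = {}
    for url in results:
        host = (urlsplit(url).hostname or "").lower()
        bundles.setdefault(host, []).append(url)
    return bundles


results = [
    "https://support.vendor.example/kb/1",
    "https://forum.other.example/thread/42",
    "https://support.vendor.example/kb/2",
    "https://support.vendor.example/kb/3",
]

for host, urls in bundle_by_domain(results).items():
    print(f"{host} ({len(urls)} results)")
```

Each key would then become one collapsible entry in the results page.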

troyvit 3 days ago

You could get a step closer to that and integrate it into your DNS: https://github.com/StevenBlack/hosts

The upside is that it would go beyond your browser to anything on your machine that makes a DNS request.
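For reference, a hosts-style list maps each unwanted domain to a non-routable address such as `0.0.0.0`. Here is a minimal sketch of converting a domain list into that format; the function name and the domains are hypothetical. Hosts files have no wildcards, so each subdomain needs its own line.

```python
def to_hosts_lines(domains: list[str]) -> list[str]:
    """Turn a list of domains into sorted, de-duplicated hosts-file lines."""
    cleaned = {d.strip().lower() for d in domains if d.strip()}
    return [f"0.0.0.0 {domain}" for domain in sorted(cleaned)]


lines = to_hosts_lines(["Spam.example", "spam.example", " junk.example "])
print("\n".join(lines))
```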

> Another great function (not for this plugin) should be the option to "bundle" all search results from the same domain. Stuff them under one collapsible entry.

That would be really cool. Just zip it up if you don't want to see that domain for that specific search.

antithesis-nl 4 days ago

So, if you already run uBlock Origin (and of course you do), you can use this list without installing any additional extensions by going to 'Filter lists' in the uBlock settings, then Import, then enter https://raw.githubusercontent.com/popcar2/BadWebsiteBlocklis... as the URL.

Not saying you should, just that you could...

popcar2 4 days ago

I think this would block you from visiting the websites, but they'd still show up in search results. uBlacklist doesn't block them, but rather just hides them from the search results page, which IMO is a better approach.
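The difference is that hiding happens on the results list itself rather than at navigation time. Here is a simplified sketch of that filtering step. It matches on hostname suffix only and is not uBlacklist's actual rule engine; the function names and domains are made up.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def hide_blocked(results: list[str], blocked_domains: set[str]) -> list[str]:
    """Return only the search results that are not on the blocklist."""
    return [url for url in results if not is_blocked(url, blocked_domains)]


blocked = {"spam.example"}
results = [
    "https://www.spam.example/how-to-partition",
    "https://docs.good.example/partitioning",
    "https://notspam.example/page",
]
visible = hide_blocked(results, blocked)
print(visible)
```

The leading dot in the suffix check keeps `notspam.example` from being treated as a subdomain of `spam.example`.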

antithesis-nl 4 days ago

Yeah, I just tested this, and you're right. Going to google.com and entering solveyourtech as a search term did indeed still return their site as a result.

On clicking it, uBlock blocked my visit, but that may or may not be enough for you, in which case an additional plugin may be warranted.