
URL: https://dev.to/gnomeman4201/r4b1th0l3-5aa3

r4b1t_h0l3 - DEV Community


→ Try it: gnomeman4201.github.io/r4b1t

It's a curated random link generator for security and OSINT researchers. 53,869 verified live URLs. Roll one. See what happens.

Basically, it's StumbleUpon for your niche. That's the point. StumbleUpon worked. Nobody built a replacement when it died, especially not for security research. So I wanted to make something that breaks you out of that feeling of being boxed in by the algorithms and echo chambers we all congregate in nowadays.


What a real session looks like

I opened the tool this morning. Here's what I did, in order:

  1. cvedb.shodan.io — Shodan's CVE database. Structured vulnerability data, searchable, free.
  2. hnd.techlearningcollective.com — Hackers Next Door, an infosec conference I'd never heard of.
  3. easyperf.net — Performance engineering blog. Low-level, serious, no fluff.
  4. engineeringblog.yelp.com/2014/11/scaling-elasticsearch — 2014 Yelp post on Elasticsearch at scale. Still accurate.
  5. domains-index.com — Domain registration intelligence. OSINT pivoting tool.
  6. metapicz.com — EXIF metadata viewer. Forgot this existed. Bookmarked.
  7. discover.maxar.com — Satellite imagery browser. Geospatial OSINT.
  8. github.com/jamesm0rr1s/BurpSuite-Add-and-Track-Custom-Issues — BurpSuite extension I didn't know existed.
  9. insanecoding.blogspot.co.uk/2014/05/a-good-idea-with-bad-usage-devurandom — Post on /dev/urandom misuse from 2014. Referenced everywhere, never read it until now.
  10. en.wikipedia.org/wiki/Software_development — Wikipedia. In a pool of 53,000 URLs this came up. I hit SPROUT on it anyway.

Ten rolls. Two tools I'm adding to my workflow. One conference I'm looking up. Three things I'd completely forgotten existed.

[Image: r4b1t_h0l3, OG card preview with trail bar]

And those are just a few examples. I have found some hidden gems while developing this in my spare time.


How it works — entirely in your browser

(My main goal was to just keep it simple. Bare bones.)

The entire pool loads into a JavaScript array in memory on page open. One 2MB fetch of a plain text file. Everything else runs client-side. No backend decides what you see. No server query on each roll. No tracking.
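A minimal sketch of what that load step could look like. The file name `pool.txt`, the one-URL-per-line layout, and both function names are assumptions for illustration, not taken from the repo.

```javascript
// Hypothetical helper: turn the raw text file into an in-memory array.
// Assumes one URL per line; blank lines and unparseable entries are dropped.
function parsePool(text) {
  const pool = [];
  for (const line of text.split(/\r?\n/)) {
    const candidate = line.trim();
    if (!candidate) continue;
    try {
      new URL(candidate); // throws on malformed input
      pool.push(candidate);
    } catch {
      // skip anything that is not a valid absolute URL
    }
  }
  return pool;
}

// Hypothetical loader: "pool.txt" is a placeholder path, not the real file name.
async function loadPool(path = "pool.txt") {
  const response = await fetch(path);
  if (!response.ok) throw new Error(`pool fetch failed: ${response.status}`);
  return parsePool(await response.text());
}
```

After `loadPool()` resolves, every roll is just an array lookup, which is why no server round trip is needed per roll.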

Rolling a URL:

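// Pick a random URL from the pool, steering away from over-represented domains.
// domainCount and lastUrl are state kept elsewhere in the app.
function ee(pool) {
  let url, attempts = 0;
  do {
    url = pool[Math.floor(Math.random() * pool.length)];
    attempts++;
    const domain = new URL(url).hostname.replace(/^www\./, "");
    // Re-roll if this domain has already been counted twice; stop checking on attempt 25.
    if ((domainCount[domain] || 0) >= 2 && attempts < 25) continue;
    // Never return the same URL twice in a row.
    if (url !== lastUrl) break;
  } while (attempts < 30); // hard cap so the loop always terminates
  return url;
}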

Domain diversity is enforced in the roll: a URL whose domain has already been counted twice gets re-rolled, and only on the 25th attempt does the roll stop checking and take what it drew. 14,488 unique domains in the pool.
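The roll function reads `domainCount` and `lastUrl`, but the post does not show how they get updated. Here is one plausible arrangement, sketched as a closure. `createRoller`, the `Map`-based counter, and the injectable `random` parameter are all assumptions for illustration.

```javascript
// Assumed bookkeeping: one way the roll state could be kept between rolls.
function createRoller(pool, random = Math.random) {
  const domainCount = new Map(); // domain -> times rolled so far
  let lastUrl = null;

  const domainOf = (url) => new URL(url).hostname.replace(/^www\./, "");

  return function roll() {
    let url;
    let attempts = 0;
    do {
      url = pool[Math.floor(random() * pool.length)];
      attempts++;
      // Re-roll over-represented domains until the 25th attempt.
      if ((domainCount.get(domainOf(url)) || 0) >= 2 && attempts < 25) continue;
      if (url !== lastUrl) break;
    } while (attempts < 30);

    // Record the pick so later rolls can see it.
    const domain = domainOf(url);
    domainCount.set(domain, (domainCount.get(domain) || 0) + 1);
    lastUrl = url;
    return url;
  };
}
```

Passing in `random` makes the behaviour reproducible in tests. In the browser you would just call `createRoller(pool)` once and reuse the returned `roll`.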

SPROUT — semantic navigation without AI

(Visually, I needed something other than a plain list of useful URLs.)

Hit SPROUT on any URL and get four directional suggestions:

  • DEEPER — further into this niche
  • SIDEWAYS — adjacent territory
  • OPPOSITE — contrasting view
  • WEIRD — unexpected tangent

I originally used the Anthropic API for this. It worked. Then I ripped it out. Paying for tokens was a pain, and I needed a way around it.

What actually happens now:

  1. Read the OG title + description already fetched for the preview card (zero extra cost)
  2. Query Wikipedia's free API for the domain — get the intro extract and article categories
  3. Extract top keywords by frequency, filtered against a stopword list
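  4. Score the pool by keyword overlap using Jaccard similarity

// Score one URL: the fraction of keywords that appear in its hostname + path.
// Note: this normalizes by keyword count only, so it is a simplified overlap
// ratio rather than a full Jaccard index.
function xe(url, keywords) {
  const { hostname, pathname } = new URL(url); // parse so hostname/pathname are defined
  const text = (hostname + pathname).toLowerCase();
  let hits = 0;
  for (const kw of keywords) {
    if (text.includes(kw)) hits++;
  }
  // Guard against an empty keyword list.
  return hits / Math.max(keywords.length, 1);
}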
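For the keyword step, a rough sketch of frequency-based extraction with a stopword filter, plus a textbook set-based Jaccard for comparison. The stopword list, the token length cutoff, and the function names are illustrative assumptions; the real list and thresholds are not shown in the post.

```javascript
// A small illustrative stopword list; the real one is not shown in the post.
const STOPWORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"]);

// Lowercase, split on non-alphanumerics, drop stopwords and very short tokens.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2 && !STOPWORDS.has(t));
}

// Top-N keywords by raw frequency, ties broken alphabetically.
function topKeywords(text, limit = 10) {
  const counts = new Map();
  for (const token of tokenize(text)) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([token]) => token);
}

// Jaccard similarity between two token collections: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const item of setA) if (setB.has(item)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

The input to `topKeywords` would be the OG title and description joined with the Wikipedia extract. Set-based Jaccard penalizes long token lists on either side, which a plain hit ratio does not.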

Runs against up to 3,000 randomly sampled URLs. In milliseconds. In your browser. No API key. No cost. No rate limit.
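Sampling a few thousand URLs and ranking them is cheap enough to do on every SPROUT click. A sketch of how that might look, using a partial Fisher-Yates shuffle. `sample`, `rankSample`, the `top` cutoff, and the pluggable `scoreFn` are assumptions for illustration.

```javascript
// Draw up to `size` distinct items using a partial Fisher-Yates shuffle on a copy.
function sample(pool, size, random = Math.random) {
  const copy = pool.slice();
  const n = Math.min(size, copy.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

// Score each sampled URL with the supplied function and keep the best few.
function rankSample(pool, keywords, scoreFn, { sampleSize = 3000, top = 4, random = Math.random } = {}) {
  return sample(pool, sampleSize, random)
    .map((url) => ({ url, score: scoreFn(url, keywords) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, top);
}
```

Any scorer with a `(url, keywords)` signature can be passed as `scoreFn`. Sampling keeps the cost flat no matter how large the pool grows.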

Where it fails: OG metadata is often garbage: generic descriptions, SEO spam, or missing entirely. When that happens SPROUT falls back to URL-only token matching, which is coarser. I accepted this tradeoff over adding an AI dependency. The WEIRD direction is intentionally low-signal… it's supposed to surprise you.
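A sketch of what the URL-only fallback might look like. The 20-character usability threshold and the names `urlTokens` and `keywordSource` are made up for illustration; the post does not say how "garbage" metadata is detected.

```javascript
// Split a URL into lowercase word-like tokens from its hostname and path.
function urlTokens(url) {
  const { hostname, pathname } = new URL(url);
  return (hostname.replace(/^www\./, "") + pathname)
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2);
}

// Hypothetical fallback: use OG text when it looks usable, otherwise URL tokens.
function keywordSource(ogText, url, minLength = 20) {
  const usable = typeof ogText === "string" && ogText.trim().length >= minLength;
  return usable
    ? { kind: "og", text: ogText.trim() }
    : { kind: "url", tokens: urlTokens(url) };
}
```

URL tokens are coarse because many paths are opaque IDs or dates, but they cost nothing to compute and never depend on a third party.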

[Image: BRANCH mode with SPROUT directions]

(I’m aware long-term maintenance of giant pools is always gonna be a nightmare later on. This is the route I chose for the time being.)

The pool — and the link rot problem

Starting corpus: ~120,000 URLs from Start.me OSINT pages and GitHub awesome-lists across 21 categories. Every URL swept with HEAD requests, 10 second timeout, 50 concurrent workers, results checkpointed to SQLite.
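The sweep itself runs in a Python script, so the following is not that code. It is a JavaScript sketch of the same pattern (HEAD request, 10 second timeout, 50 workers), kept in JavaScript for consistency with the rest of the post. Treating any 2xx response as "live" is an assumption, and SQLite checkpointing is left out.

```javascript
// Check one URL with a HEAD request and a timeout.
// AbortSignal.timeout needs a recent browser or Node 18+.
async function isAlive(url, timeoutMs = 10_000) {
  try {
    const res = await fetch(url, { method: "HEAD", signal: AbortSignal.timeout(timeoutMs) });
    return res.ok; // assumption: any 2xx counts as live
  } catch {
    return false; // network error, DNS failure, or timeout
  }
}

// Run `check` over all URLs with at most `concurrency` requests in flight.
async function sweep(urls, { concurrency = 50, check = isAlive } = {}) {
  const results = new Array(urls.length);
  let next = 0;
  async function worker() {
    while (next < urls.length) {
      const index = next++; // safe: no await between the read and the increment
      results[index] = { url: urls[index], alive: await check(urls[index]) };
    }
  }
  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Each worker pulls the next index as soon as it finishes, so slow hosts only tie up one slot instead of stalling a whole batch.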

What survived: 53,869 verified live URLs across 14,488 unique domains.

Yes, it's my curated bookmarks folder. That's also the point — random across the whole internet is noise. The curation is what makes the randomness useful.

Link rot: A GitHub Actions workflow runs pool_sweep.py every Sunday, hits every URL, and auto-commits the pruned pool. Dead links get culled weekly. It's not perfect — a site can return 200 while serving a parking page — but it catches the obvious rot.

What it doesn't do

No login. No analytics. No recommendation engine. No ads. The Cloudflare Worker handles OG metadata fetching only — origin-locked, rate limited at 60 req/min per IP, RFC1918 blocked. The core loop — roll, visit, skip — works without it entirely.
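To make those Worker guards concrete, here is a rough sketch of an RFC 1918 check and a fixed-window rate limiter. Both are illustrative and not taken from the actual Worker; the function names and the in-memory `Map` are assumptions.

```javascript
// True when `host` is an IPv4 literal inside the RFC 1918 private ranges:
// 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
function isRfc1918(host) {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false;
  const [a, b, c, d] = match.slice(1).map(Number);
  if ([a, b, c, d].some((octet) => octet > 255)) return false;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Fixed-window limiter: at most `limit` requests per `windowMs` per key.
function createRateLimiter(limit = 60, windowMs = 60_000, now = Date.now) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key) {
    const t = now();
    const entry = windows.get(key);
    if (!entry || t - entry.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= limit;
  };
}
```

Two caveats with this sketch: a literal-IP check does not catch hostnames that resolve to private addresses, and an in-memory counter is not shared between Worker instances, so a real deployment would need something sturdier for both.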


gnomeman4201.github.io/r4b1t

Source: github.com/GnomeMan4201/r4b1t

Submit a URL: GitHub Issues


badBANANA Research Collective