Self-OSINT: Find Yourself Before They Do
Run the same playbook an adversary would. Map your exposure, score the leaks, and shut the doors that matter.
TL;DR
Run the adversary's reconnaissance playbook on yourself, quarterly, with structure. Six tools, one map keyed by identifier, severity-scored leaks, and a closure queue that sorts itself. You can only defend what you can see.
What you'll be able to do
- ▸A reproducible self-recon procedure you can run every quarter.
- ▸A structured exposure map keyed by email, handle, phone, and name.
- ▸Severity-scored leaks (1–5) so the highest-impact doors close first.
- ▸Working familiarity with Sherlock, Maigret, Holehe, HIBP, Epieos, and IntelX.
- ▸A diff workflow so this quarter's findings compare cleanly to last quarter's.
Prerequisites
- ·The Gray Man guide, with its outer-ring closures already underway.
- ·Comfort with a terminal and a few small CLI tools.
- ·An isolated environment (VM, Tails, or fresh browser profile).
Threat model
The opportunistic attacker who buys a credential dump, runs automated reconnaissance, and joins the results into a profile of you. Same playbook, run preemptively. Not nation-state adversaries with legal process; that's a different guide.
Somebody, somewhere, has already done the reconnaissance. Not a human, a script. It scraped your username off an old gaming forum, joined it to a breach record from a defunct startup, joined that to the email you used for newsletter signups in 2014, and now a row exists in a database you will never see, sold to anyone with twelve dollars and a use case. The cost of looking at you fell to near-zero a decade ago. The cost of aggregating what was looked at fell with it.
The defender's problem is that the attacker has the map and you don't. They can see the join keys (the email that ties three accounts together, the handle you reused across a decade, the avatar hash that links a pseudonymous profile to your LinkedIn) because they ran the recon. You didn't. So you are defending blind: closing the doors you can see while the ones you can't see stay open.
Self-OSINT is the discipline of running the adversary's playbook on yourself, on a schedule, with structure. Same tools, same order, same join keys. The output is a map: every public fact about you, what it links to, and how badly it hurts if joined. From that map a closure queue falls out for free.
You cannot defend what you cannot see. The map is the work.
By the end of this guide you will have a reproducible self-recon procedure you can run quarterly, a structured exposure map keyed by identifier, a severity score on each leak so the worst doors close first, and a working familiarity with the same six tools any opportunistic operator reaches for. That is the entire promise. It is also more visibility than most internet users ever have.
§ 01
The recon mindset.
Reconnaissance comes in two flavours. Passive recon touches only third parties (search engines, public breach indices, broker mirrors) and never tells the target it happened. Active recon touches the target directly: port scans, login probes, password resets that trigger emails. When you are the target, stay passive. You don't need to trip your own alarms to learn what's outside.
The order matters. An operator starts wide and cheap (Google, username enumeration, HIBP) and escalates to paid corpora or manual digging only if the wide pass leaves blind spots. Run it in the same order. The first hour produces 80% of the map.
§ 02
Identifiers: the join keys.
Every join in an attacker's database happens on one of a small set of keys. Memorize them: they are the only things you actually have to defend.
| Key | What it links | Why it leaks | Defence |
|---|---|---|---|
| Email address | Accounts, breaches, password-reset graph | Reused across every signup for a decade | Per-service aliasing |
| Username / handle | Cross-platform identity (the same name on 40 sites) | Forum culture; vanity; muscle memory | Distinct handle per persona; retire old ones |
| Phone number | Real-name brokers, SIM-swap surface, 2FA | Carriers and apps sell it; you give it for 2FA | Real number only for bank/gov/family |
| Real name + DOB | Public records, KYC, broker spine | Required by governments and banks | Cannot defend; must minimize disclosure |
| Avatar hash / photo | Reverse image search across platforms | Same headshot used everywhere | Distinct avatars per persona; strip EXIF |
| Home address | Brokers, voter rolls, court records | Public records; old leases; package deliveries | Broker opt-outs; PO box for non-essential |
Notice the asymmetry: the keys you can rotate (email, handle, avatar) are cheap to defend; the keys you can't (name, DOB, address) demand discipline about where you disclose them. The map you are about to build will show you exactly which keys are doing the most damage right now.
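A join is easier to respect once you've watched one happen. A minimal sketch with two hypothetical records (a breach row and a forum row, both invented for illustration) that share nothing except an email address; coreutils `join` merges them on that key the same way an aggregator's database would.

```bash
# Illustration only: two made-up exports that share a single key, the email address.
demo_join() {
  local breach forum
  breach="$(mktemp)"; forum="$(mktemp)"
  printf '%s\n' 'you@example.com,hashed_pw' 'other@example.net,hashed_pw' > "$breach"
  printf '%s\n' 'you@example.com,oldhandle' > "$forum"
  # join(1) needs both inputs sorted on the key; -t, splits on commas and,
  # by default, joins column 1 of the first file to column 1 of the second.
  join -t, <(sort "$breach") <(sort "$forum")
  rm -f "$breach" "$forum"
}

demo_join   # prints: you@example.com,hashed_pw,oldhandle
```

One shared key turned two unrelated rows into a single profile line. Every row in the table above is a candidate for that first column.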
§ 03
The toolkit.
Six tools cover the wide pass. None is obscure; all are free or cheap. Install them in an isolated environment (a throwaway VM or a Tails session) so the artefacts of running them don't sit in your normal shell history.
Sherlock
Username enumeration · CLI · Python
Checks ~400 sites for a given handle. Fast, noisy, ~10% false-positive rate. The first thing you run.
Maigret
Username enumeration + extraction · CLI
Sherlock's deeper cousin: ~3000 sites, pulls profile metadata when present. Slower; better signal.
Holehe
Email-to-account enumeration · CLI
Tells you which of ~120 services have an account on a given email. Uses password-reset side channels.
Have I Been Pwned
Breach corpus index · web + API
Authoritative public breach index. API key is $4/mo. Run every email you've ever used.
Epieos
Email/phone OSINT · web
Reverse-lookup against Google, Skype, social platforms. Free tier covers the basics.
IntelX
Paid breach + paste corpus · web
Indexes pastes, leaks, and dark-web dumps HIBP doesn't see. Use sparingly; the free tier is enough for self-recon.
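Before a pass, a quick preflight confirms the command-line tools are actually on `PATH`. A small sketch; it assumes the command names match the pipx packages used in § 04 (`sherlock`, `maigret`, `holehe`) and adds `curl` and `jq` for the HIBP calls. The three web tools have nothing to check.

```bash
# Report which recon CLIs are installed; return non-zero if any are missing.
check_tools() {
  local t missing=0
  for t in "$@"; do
    if command -v "$t" >/dev/null 2>&1; then
      printf 'ok       %s\n' "$t"
    else
      printf 'missing  %s\n' "$t"
      missing=1
    fi
  done
  return "$missing"
}

check_tools sherlock maigret holehe curl jq || echo "install the missing tools first" >&2
```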
§ 04
The procedure.
Sixty to ninety minutes, start to finish. Don't multitask; findings are sensitive and you want them all in one place.
- STEP 01
Set up an isolated workspace.
A throwaway VM (Ubuntu in VirtualBox), a Tails USB, or, at minimum, a fresh browser profile with no logged-in accounts. You are about to type your real identifiers into tools whose telemetry you don't fully control. Don't do it from your daily driver.
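Whichever option you pick, a little shell hygiene keeps the session's traces contained. A sketch under stated assumptions: bash, a directory layout (`~/self-recon/<date>`) invented for illustration, and Firefox, whose `--no-remote` and `--profile` options start a separate instance on an empty, disposable profile directory.

```bash
# Create an owner-only directory for this session's findings and print its path.
make_workspace() {
  local dir="${1:-$HOME/self-recon/$(date +%F)}"
  # umask 077 keeps any newly created parent directories private too.
  ( umask 077 && mkdir -p "$dir" ) && chmod 700 "$dir" && printf '%s\n' "$dir"
}

# Usage inside the VM or Tails session:
#   unset HISTFILE                                   # keep identifiers and keys out of saved history
#   cd "$(make_workspace)"
#   firefox --no-remote --profile "$(mktemp -d)" &   # throwaway browser profile
```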
- STEP 02
Enumerate handles.
Run Sherlock first for breadth, then Maigret for depth. Save the output to files; you will diff them next quarter, so stamp each filename with the quarter (for example sherlock-yourhandle-2026Q1.txt).
▌ handles.sh
```bash
# Sherlock, fast breadth
pipx install sherlock-project
sherlock yourhandle --print-found --output sherlock-yourhandle.txt

# Maigret, slower depth
pipx install maigret
maigret yourhandle --html --folderoutput ./maigret-yourhandle

# Pro tip: also run common variants (yourhandle1, your_handle, ...)
for h in yourhandle yourhandle1 your_handle yourhandle.real; do
  sherlock "$h" --print-found --output "sherlock-$h.txt"
done
```
↳ Username pass. Repeat for every handle you've used in the last decade.
- STEP 03
Enumerate emails.
For every email address you've used in the last ten years, run Holehe for account discovery and HIBP for breach history. Old, abandoned addresses are the highest-value finds: they are usually tied to accounts and passwords you've forgotten you ever set up.
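The email pass is only as good as the address list behind it. A minimal sketch, assuming a hand-maintained `emails.txt` (hypothetical name) with one address per line and `#` comments. It lowercases everything, which matches HIBP's case-insensitive account matching but is an assumption for other services.

```bash
# Read addresses on stdin; print one normalized, deduplicated address per line.
normalize_emails() {
  sed -e 's/#.*//' -e 's/[[:space:]]//g' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^$' \
    | sort -u
}

# Usage: normalize_emails < emails.txt > emails.clean.txt
# Then loop the emails.sh commands over the clean list, pausing between HIBP calls
# (assumption: a 10-requests-per-minute key; check your own tier's limit):
#   while IFS= read -r EMAIL; do
#     # run the emails.sh commands for "$EMAIL" here
#     sleep 6
#   done < emails.clean.txt
```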
▌ emails.sh
```bash
pipx install holehe

EMAIL="you@example.com"
HIBP_KEY="hibp_..."
# HIBP expects the account URL-encoded in the path (matters for aliases like you+tag@example.com)
ENC="$(jq -rn --arg e "$EMAIL" '$e | @uri')"

# Account discovery
holehe "$EMAIL"

# Breach history (structured)
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/$ENC?truncateResponse=false" \
  -H "hibp-api-key: $HIBP_KEY" \
  -H "user-agent: self-recon" \
  | jq -r '.[] | "\(.BreachDate) \(.Name) \(.DataClasses | join(", "))"' \
  | tee "hibp-$EMAIL.txt"

# Paste appearances
curl -s "https://haveibeenpwned.com/api/v3/pasteaccount/$ENC" \
  -H "hibp-api-key: $HIBP_KEY" \
  -H "user-agent: self-recon" | jq .
```
↳ Email pass. The HIBP key is $4/mo and worth it for the structured output.
- STEP 04
Pivot on the phone.
Search your phone number in Epieos (which checks Google account linkage and WhatsApp/Telegram presence) and in quotes on Google. Then check the major US/EU people-search sites for any entry that pairs the number with your name or address. That pairing is the broker spine.
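A quoted search matches only the exact string, and brokers, forums, and PDFs each write numbers differently. A small sketch that prints the common written forms of one number, already quoted, so each can be searched in turn. It assumes a North American (+1, NANP) number; other numbering plans group digits differently.

```bash
# Print common written forms of a NANP number, each wrapped in quotes for searching.
phone_variants() {
  local d="${1//[^0-9]/}"                     # keep digits only
  [ "${#d}" -eq 11 ] && [ "${d:0:1}" = 1 ] && d="${d:1}"
  if [ "${#d}" -ne 10 ]; then
    echo "expected a 10-digit NANP number" >&2
    return 1
  fi
  local a="${d:0:3}" p="${d:3:3}" l="${d:6:4}"
  printf '"%s"\n' \
    "+1 $a $p $l" \
    "+1$a$p$l" \
    "($a) $p-$l" \
    "$a-$p-$l" \
    "$a.$p.$l" \
    "$a$p$l"
}

# Usage: phone_variants '+1 (555) 123-4567'
```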
- STEP 05
Reverse-image-search the avatars.
Pull every profile photo you've used in the last five years and run each through Google Images, Yandex (better at faces than Google), and TinEye. Any match between a "real" profile and a "pseudonymous" one is a join key you didn't know existed.
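Reverse image search catches crops and re-encodes; a cheaper local check catches the laziest join key of all, the exact same file uploaded to two personas. A sketch assuming every avatar you've used sits in one directory and GNU `sha256sum` is available (on macOS, `shasum -a 256` prints the same layout). It finds byte-identical files only, so it complements the search engines rather than replacing them.

```bash
# List groups of byte-identical image files (same SHA-256) under a directory.
dup_avatars() {
  local dir="${1:-.}"
  find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.webp' \) \
    -exec sha256sum {} + \
    | sort \
    | awk '{ n[$1]++; files[$1] = files[$1] "  " substr($0, 67) "\n" }
           END { for (h in n) if (n[h] > 1) printf "%s\n%s", h, files[h] }'
}

# Usage: dup_avatars ~/self-recon/avatars
```

When you replace a shared avatar, stripping metadata before upload (for example with `exiftool -all= new-avatar.jpg`) keeps the EXIF block from becoming its own join key.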
- STEP 06
Dork yourself on Google.
Search operators turn the public web into a database. Run the patterns below for each identifier; pages four through ten are where the old, forgotten leaks live.
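Because the same patterns get re-run every quarter, generating them from your identifiers keeps the wording identical between runs, which keeps the results comparable. A sketch; the function name is hypothetical and the templates mirror the dorks.txt list that follows, with the address line printed only when an address is given.

```bash
# Print the dork patterns for one set of identifiers, one query per line.
make_dorks() {
  local handle="$1" email="$2" name="$3" addr="${4:-}"
  cat <<EOF
"$handle"
"$handle" -site:linkedin.com -site:github.com
"$handle" intext:"member since"
"$email"
"$email" site:pastebin.com OR site:ghostbin.com
"$name" filetype:pdf
"$name" filetype:xlsx OR filetype:docx
"$name" "resume" OR "curriculum vitae"
EOF
  if [ -n "$addr" ]; then
    printf '"%s" "%s"\n' "$addr" "$name"
  fi
}

# Usage: make_dorks yourhandle you@example.com "Your Real Name" "123 Real St" > dorks-2026Q2.txt
```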
▌ dorks.txt
```text
# Exact-string handle hunting
"yourhandle"
"yourhandle" -site:linkedin.com -site:github.com

# Email leakage on paste sites and forums
"you@example.com"
"you@example.com" site:pastebin.com OR site:ghostbin.com

# Documents that shouldn't be indexed
"Your Real Name" filetype:pdf
"Your Real Name" filetype:xlsx OR filetype:docx

# Resume / CV exposure
"Your Real Name" "resume" OR "curriculum vitae"

# Forum signatures, old profiles
"yourhandle" intext:"member since"

# Address / phone in public records
"123 Real St" "Your Real Name"
```
↳ Run each pattern, then page past the first three pages of results; that's where the old leaks sit.
- STEP 07
Check the breach mirrors (carefully).
With your own emails only, query IntelX and, if you must, Dehashed/Snusbase. The goal is to see which of your credentials are floating in dumps. If a password appears, treat the account it belongs to as compromised, and treat the password itself as burned forever on every service where you reused it.
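The mirrors tell you which old passwords leaked. For passwords you still use, HIBP's Pwned Passwords range API answers the same question without the password leaving your machine: only the first five hex characters of its SHA-1 hash are sent, the API returns every hash suffix sharing that prefix with a count, and the comparison happens locally. A sketch assuming bash, GNU `sha1sum`, and `curl`; the function names are illustrative.

```bash
# Print "PREFIX SUFFIX": the uppercase SHA-1 of $1 split after 5 hex characters.
sha1_split() {
  local h
  h="$(printf '%s' "$1" | sha1sum | cut -d' ' -f1 | tr '[:lower:]' '[:upper:]')"
  printf '%s %s\n' "${h:0:5}" "${h:5}"
}

# Read a range response (SUFFIX:COUNT lines, CRLF endings) on stdin;
# print the count for suffix $1, or 0 if it is absent.
count_in_range() {
  tr -d '\r' | awk -F: -v s="$1" '$1 == s { print $2; found = 1 } END { if (!found) print 0 }'
}

# Only the 5-character prefix goes over the network; a failed request returns 1.
pwned_count() {
  local prefix suffix body
  read -r prefix suffix < <(sha1_split "$1")
  body="$(curl -fsS "https://api.pwnedpasswords.com/range/$prefix")" || return 1
  printf '%s\n' "$body" | count_in_range "$suffix"
}

# Usage (no echo, nothing in history):
#   read -rsp 'password: ' PW; echo; pwned_count "$PW"; unset PW
```

Any count above zero means the password is in a known dump and belongs on the burned list.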
§ 05
Building the exposure map.
Findings are noise until they have structure. The map is one table: six columns, one row per finding. CSV, Markdown, or a spreadsheet all work; the format doesn't matter as long as the schema stays the same every quarter so you can diff cleanly.
```csv
identifier,source,fields_leaked,severity,action,status
you@example.com,LinkedIn 2012 breach,"email, hashed pw",4,rotate-password,done
you@example.com,Adobe 2013 breach,"email, plaintext hint",5,rotate-password,done
oldhandle,GameForum (Sherlock),"handle, signature, join date",2,pseudonymize,todo
+15551234567,Spokeo,"phone, address, relatives",5,opt-out,in-progress
johndoe1990,reddit (Maigret),"handle, post history",3,monitor,todo
avatar.jpg,LinkedIn + pseudonymous Twitter,"face match",4,unique-avatar,todo
123 Real St,County recorder,"address, dob, spouse",5,cannot-close,accepted
```
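Findings from the handle pass can seed the map directly. A sketch, assuming Sherlock's text output lists one found-profile URL per line; lines that don't start with a URL (such as the closing total) are skipped. Severity and action are left blank on purpose, so every seeded row still gets a human score against the rubric.

```bash
# Turn a Sherlock result file into unscored exposure-map rows, one per site.
sherlock_to_rows() {
  local handle="$1" file="$2"
  grep -Eo '^https?://[^/[:space:]]+' "$file" \
    | sed -E 's#^https?://(www\.)?##' \
    | sort -u \
    | while IFS= read -r host; do
        printf '%s,%s (Sherlock),"handle",,,todo\n' "$handle" "$host"
      done
}

# Usage: sherlock_to_rows yourhandle sherlock-yourhandle.txt >> exposure-map.csv
```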
Severity is not a guess. Use a fixed rubric so this quarter's score is comparable to next quarter's.
| Score | Meaning | Example |
|---|---|---|
| 5, Critical | Credential or PII that enables account takeover or doxxing today | Plaintext password in a recent breach; home address on a broker page |
| 4, High | Strong join key that links personas or enables targeted phishing | Reused email across pseudonymous and real accounts |
| 3, Moderate | Useful context for an attacker; not directly exploitable | Old forum post revealing employer or hometown |
| 2, Low | Public by design; minor surface area | Active LinkedIn profile with current job |
| 1, Noise | False positive or fully expected disclosure | Your name on your own published website |
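A fixed rubric only helps if every row follows it. A small lint, assuming the six-column schema above, the four status values used in the sample map (todo, in-progress, done, accepted), and that only fields_leaked ever contains quoted commas, so the last three columns can be read from the end of each line. A row with a blank severity fails until it's scored.

```bash
# Lint an exposure map: exact header, severity 1-5, known status. Non-zero exit on problems.
lint_map() {
  awk -F, '
    NR == 1 {
      if ($0 != "identifier,source,fields_leaked,severity,action,status") {
        print "line 1: unexpected header"; bad = 1
      }
      next
    }
    NF < 6 { printf "line %d: fewer than 6 columns\n", NR; bad = 1; next }
    {
      sev = $(NF - 2); status = $NF
      if (sev !~ /^[1-5]$/) {
        printf "line %d: severity \"%s\" not in 1-5\n", NR, sev; bad = 1
      }
      if (status !~ /^(todo|in-progress|done|accepted)$/) {
        printf "line %d: unknown status \"%s\"\n", NR, status; bad = 1
      }
    }
    END { exit bad }
  ' "$@"
}

# Usage: lint_map exposure-map.csv
```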
§ 06
The closure queue.
The map sorts itself into a queue. Rank by severity × (1 / effort): work the high-impact, low-friction items first. The 5s that only need one form submission go before the 4s that need a weekend.
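Effort is not a column in the map, so the ranking needs an estimate from somewhere. A sketch that assigns hypothetical per-action effort weights (1 for a single form or password change, 3 for a weekend of work, 2 for anything unlisted), skips rows marked done or accepted, and sorts the rest by severity ÷ effort. It assumes only fields_leaked contains quoted commas, so severity, action, and status are read from the end of each line.

```bash
# Print open rows as: score<TAB>severity<TAB>action<TAB>identifier, best first.
closure_queue() {
  awk -F, '
    BEGIN {
      # Hypothetical effort weights per action; tune them to your own experience.
      effort["rotate-password"] = 1; effort["opt-out"] = 1
      effort["unique-avatar"] = 2;   effort["pseudonymize"] = 3
    }
    NR == 1 || NF < 6 { next }                    # header or malformed line
    $NF == "done" || $NF == "accepted" { next }   # closed, or risk accepted
    {
      sev = $(NF - 2); act = $(NF - 1)
      e = (act in effort) ? effort[act] : 2
      printf "%.2f\t%s\t%s\t%s\n", sev / e, sev, act, $1
    }
  ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2nr
}

# Usage: closure_queue exposure-map.csv | head
```

On the sample map this puts the Spokeo opt-out (5 ÷ 1) first and the forum pseudonymization (2 ÷ 3) last.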
§ CHECKLIST · Triage rules for the queue
§ 07
Cadence: quarterly, with diffs.
Self-recon is not a one-time event because the input is not static. New breaches drop. New accounts get created. Brokers re-add you ninety days after you opt out. The discipline is the cadence, not the spike of effort.
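The diff only works if each quarter's files have predictable names. A tiny helper that turns a date into the YYYYQn label used in the filenames below (2026Q1, 2026Q2); the helper name is illustrative.

```bash
# Print the YYYYQn label for a date given as YYYY-MM-DD (defaults to today).
quarter_label() {
  local d="${1:-$(date +%F)}"
  local y="${d:0:4}" m="${d:5:2}"
  # 10# forces base 10 so months "08" and "09" aren't read as octal.
  printf '%sQ%d\n' "$y" $(( (10#$m - 1) / 3 + 1 ))
}

# Usage:
#   Q="$(quarter_label)"
#   sherlock yourhandle --print-found --output "sherlock-yourhandle-$Q.txt"
```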
§ CHECKLIST · Quarterly re-run: what to look for in the diff
```bash
# Last quarter
sort sherlock-yourhandle-2026Q1.txt > /tmp/q1.txt
# This quarter
sort sherlock-yourhandle-2026Q2.txt > /tmp/q2.txt

# New hits to triage
comm -13 /tmp/q1.txt /tmp/q2.txt

# Disappeared hits (closure worked, or the site went down)
comm -23 /tmp/q1.txt /tmp/q2.txt
```
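The same comm pattern works on the exposure map itself. A sketch that skips the header and compares rows verbatim; because whole rows are compared, a status change (say todo to done) shows up twice, once as a disappeared row and once as a new one. Both sort and comm run under the C locale so their orderings agree.

```bash
# Compare two quarterly maps row by row, ignoring the header line.
map_diff() {
  local old="$1" new="$2"
  echo "## new or changed"
  LC_ALL=C comm -13 <(tail -n +2 "$old" | LC_ALL=C sort) <(tail -n +2 "$new" | LC_ALL=C sort)
  echo "## gone or changed"
  LC_ALL=C comm -23 <(tail -n +2 "$old" | LC_ALL=C sort) <(tail -n +2 "$new" | LC_ALL=C sort)
}

# Usage: map_diff exposure-map-2026Q1.csv exposure-map-2026Q2.csv
```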
§ 08
What this does NOT find.
The honest section. The wide pass above catches what opportunistic operators catch. It does not catch what a determined, funded, or law-enforcement adversary can reach.
✓ PROTECTS AGAINST
- +Public breach corpora indexed by HIBP and IntelX.
- +Cross-platform identity links via shared handles, emails, and avatars.
- +Most US/EU data-broker exposure with name, phone, or address as a key.
- +Public-web disclosures: forum posts, leaked documents, cached pages.
- +Drift over time: new leaks and new account creep, caught quarterly.
✗ DOES NOT PROTECT AGAINST
- −Sealed court records, government databases, and private credit-bureau files.
- −Broker-internal data that isn't published on their free preview pages.
- −Private breach corpora traded on closed forums you don't have access to.
- −Telecom CDRs, financial transaction logs, and any data held under legal process.
- −Anything you've disclosed offline: paper forms, voice calls, in-person.
- −Future leaks that haven't happened yet; only cadence catches those.
§ 09
Going further.
The map you just built is the input to every other discipline. With the queue in hand, three guides do the actual closures.
IDENTITY HARDENING
YubiKeys & Hardware 2FA · For every account a credential leak touched.
SELF-HOSTING
Your First Private Server · Stop being a tenant on someone else's stack.
AMNESIC RECON
Tails: The Amnesic Machine · A clean environment for the sensitive passes.
§ REFERENCES