Module 2: External Attack Surface Discovery
Overview
Your external attack surface is everything an adversary can see about your organization without authenticating -- domains, subdomains, login portals, exposed services, and the technologies behind them. In Module 1, we established that publicly discoverable information enabled every attack we reviewed. This module teaches you how to discover that information systematically, the same way an adversary would.
What Adversaries See
External reconnaissance requires no credentials and no special access. An adversary with a web browser and free tools can discover:
- Domains and subdomains -- every domain your organization registers, and every SSL/TLS certificate issued for it, reveals infrastructure
- Login portals -- VPN, webmail, remote desktop, and application login pages are visible to anyone who finds them
- Technology stack -- HTTP headers, certificate details, and login page appearance reveal specific products and versions
- Development and staging environments -- subdomains like dev., staging., or test. often have weaker security controls
Discovery Techniques
This module uses five complementary techniques, all browser-based:
| Technique | What It Finds | Tools |
|---|---|---|
| Certificate transparency | Subdomains from SSL certificate issuance logs | crt.sh |
| Subdomain enumeration | Subdomains with IP addresses, hosting providers, infrastructure mapping | Subdomain Finder |
| DNS enumeration | DNS records, mail servers, name servers, network maps | DNSDumpster, SecurityTrails |
| Internet device search | Exposed services, banners, product identification | Shodan, Censys |
| Search engine dorking | Indexed login pages, documents, configuration files | Google advanced search operators |
Naming Conventions That Reveal Infrastructure
Organizations often use predictable subdomain naming patterns. These are valuable reconnaissance targets because they indicate what is behind each service:
| Pattern | What It Suggests |
|---|---|
| vpn., remote., access. | VPN or remote access portals |
| citrix., rdp., vdi. | Virtual desktop / remote desktop services |
| mail., owa., webmail. | Email services (login portals) |
| dev., staging., test. | Development/staging environments (often less secured) |
| plant1., substation-west., site- | Facility-specific infrastructure (reveals OT network presence) |
| api., portal., dashboard. | Application interfaces (potential data exposure) |
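The pattern table above can be turned into a rough first-pass triage of a subdomain list. The following is a minimal sketch, not a definitive classifier: the category names, the `PREFIX_CATEGORIES` map, and the `classify_subdomain` helper are all hypothetical, and simple prefix matching will produce false positives (for example, a label such as `accessibility` would match `access`).

```python
# Hypothetical category map derived from the naming-pattern table above.
# Extend or adjust the prefixes for your own environment.
PREFIX_CATEGORIES = {
    "remote access": ("vpn", "remote", "access"),
    "virtual desktop": ("citrix", "rdp", "vdi"),
    "email": ("mail", "owa", "webmail"),
    "dev/staging": ("dev", "staging", "test"),
    "facility/OT": ("plant", "substation", "site-"),
    "application interface": ("api", "portal", "dashboard"),
}


def classify_subdomain(fqdn: str) -> str:
    """Return the first category whose prefix matches the leftmost DNS label."""
    label = fqdn.lower().split(".")[0]
    for category, prefixes in PREFIX_CATEGORIES.items():
        # str.startswith accepts a tuple and matches any of its entries
        if label.startswith(prefixes):
            return category
    return "uncategorized"


if __name__ == "__main__":
    for name in ("vpn.example.com", "plant1.example.com", "www.example.com"):
        print(name, "->", classify_subdomain(name))
```

Treat the output as a prompt for manual review rather than a finding: a matching prefix suggests what may sit behind a hostname, not what actually does.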
Lab: Build Your External Attack Surface Inventory
In this lab, you will discover your target organization's external attack surface using browser-based tools and AI-assisted analysis. Follow each step and document your findings -- the output becomes Artifact 2.
Step 1: Target Selection
Select your target organization for this exercise:
- Your own organization (with authorization) -- provides the most operationally relevant results
- NRECA (practice target) -- follow along using the worked examples included in each step
Identify all root domains you will investigate. Most organizations have more than one -- check the main website, email domain, and any known portals. The primary website domain often reveals the least infrastructure; member portals, internal service domains, and email domains are typically far richer targets.
Step 2: Domain and Subdomain Enumeration
This step uses two complementary approaches: certificate transparency (CT) logs reveal subdomains from SSL certificate issuance records, and subdomain enumeration tools aggregate results from multiple sources with IP address data. Together they give much broader coverage than either approach alone.
2a. Certificate transparency search
Certificate transparency logs record every SSL/TLS certificate issued by public certificate authorities. When an organization requests a certificate for a subdomain, that subdomain becomes permanently recorded in CT logs -- even if the subdomain is not linked from any public page.
crt.sh provides a JSON API that returns structured certificate data. Appending &output=json to any crt.sh query returns JSON instead of an HTML table -- this makes it easy to feed results directly to your AI client for analysis.
Query the JSON endpoint for each of your root domains:
https://crt.sh/?q=%25.[yourdomain.com]&output=json
Save the JSON response to a local file using your browser's Save Page As (Ctrl+S / Cmd+S). Name it something descriptive like yourdomain_crt.json. Then attach or paste the saved JSON file into your AI client with this prompt:
I am analyzing certificate transparency data from crt.sh for
[organization name]. The attached/pasted JSON contains certificate
records for %.[yourdomain.com].
Analyze this data and provide:
1. A deduplicated list of all unique subdomain and domain names
2. Naming patterns and conventions the organization uses
3. Subdomains suggesting admin interfaces, login portals,
VPN/remote access, dev/staging environments, or API endpoints
4. Any wildcard certificates and what infrastructure they might
cover
5. Certificate authorities used (reveals vendor relationships
and hosting providers)
6. Anything unusual or potentially sensitive that a human reviewer
would miss scrolling through hundreds of certificate rows
Output a clean categorized subdomain list organized by
function/risk category.
Timeout fallback: crt.sh JSON responses can be slow for domains with large result sets. If the endpoint times out or returns an error, use the standard HTML view at https://crt.sh/?q=%25.[yourdomain.com], select all results in the table (Ctrl+A), copy (Ctrl+C), and paste into your AI client instead.
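If you would rather pre-process the saved JSON locally before handing it to your AI client, the deduplication step can be sketched in a few lines of standard-library Python. This sketch assumes each crt.sh record carries a `name_value` field containing one or more newline-separated hostnames, which is how the JSON output commonly looks; the function names are hypothetical and the file name is whatever you chose when saving.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build the JSON query URL; urlencode turns the % wildcard into %25."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def unique_names(records: list[dict]) -> list[str]:
    """Return a sorted, deduplicated list of hostnames from crt.sh records.

    Assumption: each record has a "name_value" field with one or more
    newline-separated names. Records without it are skipped.
    """
    names = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


def load_crtsh_file(path: str) -> list[str]:
    """Read a saved crt.sh JSON response and return its unique hostnames."""
    with open(path, encoding="utf-8") as handle:
        return unique_names(json.load(handle))
```

For example, `load_crtsh_file("yourdomain_crt.json")` would return the unique names from the file saved earlier. Wildcard entries such as `*.example.com` are kept as-is so they remain visible in the list.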
2b. Subdomain enumeration
CT logs reveal certificate-level data but may miss subdomains covered by wildcard certificates. Subdomain enumeration tools aggregate results from multiple sources and provide IP address data that CT logs do not include.
subdomainfinder.c99.nl is a subdomain enumeration tool that exports JSON with subdomain, IP address, and Cloudflare status for each result:
{"subdomain":"portal.example.com","ip":"203.0.113.50","cloudflare":"false"}
Search each of your root domains and save the JSON results. Use additional DNS enumeration tools to supplement:
| Tool | What to Do | Account Required |
|---|---|---|
| Subdomain Finder | Search each root domain -- export/save JSON results (includes IP addresses and Cloudflare status) | No |
| DNSDumpster | Enter root domain -- review DNS records, subdomains, and network map | No |
| SecurityTrails | Search domain -- review historical DNS records and subdomain history | Free tier |
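Because each exported record pairs a subdomain with an IP address, a quick local grouping shows which hosts share infrastructure before any AI analysis. The sketch below assumes the export is a JSON array of objects shaped like the sample record shown earlier (`subdomain`, `ip`, `cloudflare`); the `cluster_by_ip` name and the `"unresolved"` placeholder are hypothetical.

```python
from collections import defaultdict


def cluster_by_ip(records: list[dict]) -> dict[str, list[str]]:
    """Group subdomains by the IP address they resolve to.

    Records with a missing or empty "ip" value are grouped under "unresolved".
    Clusters are ordered largest first, then by IP string for stable output.
    """
    clusters = defaultdict(list)
    for record in records:
        ip = record.get("ip") or "unresolved"
        clusters[ip].append(record["subdomain"])
    ordered = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))
    return dict(ordered)


if __name__ == "__main__":
    sample = [
        {"subdomain": "portal.example.com", "ip": "203.0.113.50", "cloudflare": "false"},
        {"subdomain": "mail.example.com", "ip": "203.0.113.50", "cloudflare": "false"},
        {"subdomain": "dev.example.com", "ip": "198.51.100.7", "cloudflare": "false"},
    ]
    for ip, hosts in cluster_by_ip(sample).items():
        print(ip, hosts)
```

A large cluster on a single address often points to shared hosting or a load balancer, while the handful of outliers are usually the more interesting hosts to look at individually.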
2c. AI-assisted analysis of combined results
Attach or paste the saved JSON files from all enumerated domains into your AI client. The combined analysis across all root domains is where the most valuable patterns emerge:
I am analyzing subdomain enumeration data for [organization name],
a [sector] organization. The attached/pasted JSON files contain
subdomain records for [domain1.com] and [domain2.com]. Each record
includes subdomain name, IP address, and Cloudflare status.
Analyze the combined data and provide:
1. All unique subdomains grouped by root domain
2. IP address clustering -- which subdomains share the same IP
(suggests shared hosting or load balancer)
3. Authentication and identity infrastructure (SSO portals,
federation servers, access management)
4. Remote access or VPN-related subdomains
5. API endpoints and what backend systems they suggest
6. Staging, test, or development environments
7. Blue/green or canary deployment patterns that reveal
infrastructure architecture
8. Subdomains revealing sensitive business functions (benefits,
payroll, financial, compliance, HR)
9. Which subdomains use cloud provider IPs vs on-premise IP
ranges
10. Which root domain reveals the most infrastructure detail
and why that matters for reconnaissance
Output a categorized inventory organized by: authentication/
identity, remote access, benefits/HR, financial/compliance,
APIs/backend, staging/test, content/public, and other.
Worked Example: NRECA Domain and Subdomain Enumeration
Certificate Transparency: electric.coop
Query the crt.sh JSON API for NRECA's primary domain:
https://crt.sh/?q=%25.electric.coop&output=json
Save the JSON response and attach it to your AI client using the CT JSON Analysis Prompt above.
Example AI Response (CT JSON Analysis for electric.coop)
Certificate Transparency Analysis: electric.coop
Unique Domains/Subdomains (deduplicated):
Public-Facing / Content:
- electric.coop -- primary public website
- www.electric.coop -- www redirect
- remagazine.cooperative.com -- RE Magazine content platform
- careers.electric.coop -- job postings and recruitment
- nrecainternational.coop -- international programs site
Member/Employee Portals (Login Required):
- cooperative.com -- member portal (SharePoint-based)
- benefits.cooperative.com -- employee benefits enrollment
- resap.cooperative.com -- Rural Electric Safety Achievement Program
- ecba.cooperative.com -- Electric Cooperative Bar Association
Email / Communication:
- nreca.coop -- email domain (confirmed format: firstname.lastname@nreca.coop)
Naming Patterns Identified:
- Program-specific subdomains under cooperative.com (benefits, resap, ecba) -- suggests each internal program gets its own subdomain
- Public content uses electric.coop; internal/member services use cooperative.com -- clear split between public and authenticated surfaces
- Email on a separate domain (nreca.coop) from web properties
Wildcard Certificates:
- *.cooperative.com -- covers all subdomains under cooperative.com. This means additional subdomains may exist that are not individually logged in CT. Consider DNS enumeration tools to find subdomains covered by this wildcard.
- *.electric.coop -- same pattern; additional subdomains may be hidden behind the wildcard
Certificate Authorities Used:
- DigiCert -- used for primary domains, suggests enterprise certificate management
- Let's Encrypt -- used for some subdomains, may indicate cloud-hosted or developer-managed services
- Sectigo -- used for specific portals
Notable Observations:
- The wildcard certificates for *.cooperative.com and *.electric.coop mean the CT logs undercount the actual subdomain footprint. DNS enumeration (DNSDumpster, SecurityTrails) should be used to find subdomains covered by these wildcards.
- The split between DigiCert (enterprise) and Let's Encrypt (free/automated) may indicate some services are managed outside the primary IT team -- worth investigating for shadow IT.
- Benefits portal (benefits.cooperative.com) is a high-value target -- benefits systems typically contain SSNs, banking details, and salary data.
Subdomain Enumeration: cooperative.com vs electric.coop
NRECA operates across at least three root domains: electric.coop (public website), cooperative.com (member services), and nreca.coop (email). Searching each with Subdomain Finder reveals a critical lesson:
- Search cooperative.com: returns 100+ subdomains with rich infrastructure detail
- Search electric.coop: returns only ~2 subdomains (www, careers)
The public-facing website domain (electric.coop) reveals almost nothing, while the member-services domain (cooperative.com) exposes far more of the organization's infrastructure. An adversary who only searches the primary website domain misses most of the attack surface -- and so will a defender who does the same.
Example AI Response (Subdomain Analysis for NRECA)
Subdomain Enumeration Analysis: NRECA
Domain Contrast: electric.coop returned only 2 subdomains (www.electric.coop, careers.electric.coop). cooperative.com returned 100+ subdomains exposing authentication infrastructure, benefits systems, financial portals, APIs, and staging environments. The member-facing domain is far richer for reconnaissance than the public website.
Authentication / Identity:
- okta.cooperative.com -- Okta SSO portal. Reveals the organization uses Okta as its identity provider. High-value target: compromising SSO provides access to all federated applications.
- pingfed.cooperative.com -- PingFederate identity federation server. Indicates a dual-IdP architecture (Okta + Ping) or migration between platforms.
- pingaccess.cooperative.com -- PingAccess API security gateway. Controls access to web applications and APIs behind the Ping identity stack.
Benefits / HR (High-Value PII Targets):
- benefits.cooperative.com -- primary benefits portal
- beneficiaries.cooperative.com -- beneficiary designation (life insurance, retirement accounts)
- cobra.cooperative.com -- COBRA benefits continuation
- divorce.cooperative.com, marriage.cooperative.com, newchild.cooperative.com -- life event processing portals (contain highly sensitive personal data)
- w2salary.cooperative.com -- W-2 and salary data (tax documents, compensation)
- retirement.cooperative.com -- retirement/pension management
This cluster represents a massive PII exposure surface. Benefits systems typically contain SSNs, banking details, salary data, and family information. Each subdomain is a potential credential stuffing target.
Financial / Compliance:
- form990.cooperative.com -- IRS Form 990 (nonprofit tax filings, reveals financial data)
- financialpower.cooperative.com -- financial services portal
- invoicepreferences.cooperative.com -- invoice management (potential for business email compromise targeting)
APIs / Backend:
- Multiple *api.* subdomains revealing backend service architecture
- API endpoints suggest microservices architecture with dedicated backends for member services, content delivery, and internal tooling
Staging / Test (Often Less Secured):
- stage.cooperative.com, stagesearch.cooperative.com -- staging environment
- test.community.cooperative.com -- test instance of community platform
- pae-ext-test.cooperative.com, pfe-ext-test.cooperative.com -- external test instances of PingAccess and PingFederate (identity infrastructure test environments are especially sensitive)
Staging and test environments frequently have weaker authentication, default credentials, or disabled security controls. Identity infrastructure test instances (pae-ext-test, pfe-ext-test) are high-priority findings.
Infrastructure Architecture:
- Blue/green deployment: Extensive blue.* subdomain pattern (blue.benefits, blue.retirement, etc.) reveals blue/green deployment architecture. This tells an adversary how the organization manages releases and where redundant infrastructure exists.
- IP clustering: The majority of cooperative.com subdomains resolve to 74.127.88.162, suggesting centralized hosting or a load balancer fronting most services. A small number of subdomains resolve to different IPs, indicating cloud-hosted or externally managed services.
Content / Public:
- electric.coop -- public website, WordPress
- careers.electric.coop -- job postings
- remagazine.cooperative.com -- RE Magazine
- nrecainternational.coop -- international programs
Key Takeaway: Searching only electric.coop would have found 2 subdomains. Searching cooperative.com revealed authentication infrastructure, benefits/PII systems, financial portals, staging environments, and deployment architecture. Always enumerate all known root domains.
Validation Checkpoint
Compare the AI-generated inventory against what is identifiable from NRECA's public website and contact pages:
| Domain | Purpose | Notes |
|---|---|---|
| electric.coop | Public website (WordPress) | News, advocacy, public-facing |
| cooperative.com | Member portal (SharePoint) | Login required -- attack surface |
| nreca.coop | Email domain | Confirmed format: firstname.lastname@nreca.coop |
| benefits.cooperative.com | Employee benefits portal | Login required -- sensitive |
| careers.electric.coop | Job postings | Reveals tech stack, org structure |
| resap.cooperative.com | Safety program portal | Member login |
| ecba.cooperative.com | Bar Association portal | Member login |
| remagazine.cooperative.com | RE Magazine | Content platform |
| nrecainternational.coop | International programs | Separate org site |
Manual website review found 9 domains. Subdomain enumeration of cooperative.com alone found 100+, including authentication infrastructure (Okta, PingFederate), benefits/PII systems, financial portals, and staging environments that do not appear anywhere on the public website. Enumeration routinely finds infrastructure that manual review misses.
Step 3: Remote Access Identification
Use internet device search engines to identify what services are exposed on your target's known domains or IP ranges. Look for:
- Web servers -- identified by HTTP response headers (Server, X-Powered-By)
- Mail servers -- SMTP, IMAP, Exchange endpoints
- VPN login portals -- identified by product-specific login pages (FortiGate, Cisco ASA, Palo Alto GlobalProtect)
- Load balancers and CDN endpoints -- may indicate cloud-hosted infrastructure
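As an illustration of the header-based identification mentioned above, the sketch below pulls the `Server` and `X-Powered-By` values out of response headers you have already copied from a browser's developer tools or a search-engine result. It only parses pasted text and does not contact any host; the `fingerprint_headers` name is hypothetical.

```python
# Headers named in the list above as technology indicators (compared lowercase).
FINGERPRINT_HEADERS = ("server", "x-powered-by")


def fingerprint_headers(raw_headers: str) -> dict[str, str]:
    """Extract technology-revealing headers from pasted HTTP response headers."""
    found = {}
    for line in raw_headers.splitlines():
        # Split on the first colon only; the status line has none and is skipped
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() in FINGERPRINT_HEADERS:
            found[name.strip().lower()] = value.strip()
    return found


if __name__ == "__main__":
    pasted = "HTTP/1.1 200 OK\nServer: nginx/1.24.0\nX-Powered-By: PHP/8.2.1"
    print(fingerprint_headers(pasted))
```

Header values can be altered or removed by the operator, so treat them as a lead to verify, not as proof of the product or version.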
Manual verification required. This step must be performed using browser-based tools. AI clients cannot access Shodan, Censys, or live internet infrastructure -- they will generate plausible but fabricated results if asked to identify exposed services. Use AI for analyzing data you have already collected (Steps 2 and 4), not for discovering live services.
| Tool | What to Do | Account Required |
|---|---|---|
| Shodan | Search by domain, hostname, or organization name -- review banners for product identification | Free tier |
| Censys | Search by domain or IP -- review certificates and service details | Free tier |
Focus on identifying:
- VPN login portals -- FortiGate, Cisco ASA/AnyConnect, Palo Alto GlobalProtect, SonicWall
- Remote desktop services -- RDP, Citrix, VMware Horizon
- Vendor remote support tools -- TeamViewer, AnyDesk, ConnectWise ScreenConnect
- Web application login pages -- OWA, SharePoint, custom applications
Organizations with Shodan or Censys API access (paid tiers) can automate and scale these queries programmatically -- that is a natural next step beyond this workshop.
3a. Documenting discovered assets
The value of this step is not just finding exposed services -- it is recording them in a way that supports ongoing tracking and remediation. For each discovered asset, capture information across three categories:
Asset identification:
- Hostname / FQDN
- IP address
- Port and protocol
- Product vendor and name
- Version or firmware (where identifiable from banners, headers, or login pages)
- Certificate details: issuer, expiration date, Subject Alternative Name (SAN) entries
Ownership and responsibility:
- Asset owner or responsible team
- Business unit or facility
- IT vs OT designation
- Contact for patching or change management
Operational context:
- Business function or purpose
- Criticality to operations (a perimeter firewall protecting a substation vs a marketing blog)
- Network zone: internet-facing, DMZ, dual-homed
- Managed by internal staff or a third-party vendor/integrator
You will not be able to fill in every field from external discovery alone -- ownership and some operational context require internal knowledge. Record what you can from Shodan/Censys results and flag the gaps. The complete baseline template is provided in the Templates section.
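One way to make the gaps explicit is to model the three categories as a record whose unfilled fields can be listed automatically. This is a minimal sketch under the assumption that you track findings in a script or notebook; the `AssetRecord` class and its field names are hypothetical and do not correspond to the columns of the baseline template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AssetRecord:
    # Asset identification
    hostname: str
    ip_address: Optional[str] = None
    port_protocol: Optional[str] = None
    product: Optional[str] = None
    version: Optional[str] = None
    certificate: Optional[str] = None
    # Ownership and responsibility (usually needs internal knowledge)
    owner: Optional[str] = None
    facility: Optional[str] = None
    it_ot_designation: Optional[str] = None
    patch_contact: Optional[str] = None
    # Operational context
    function: Optional[str] = None
    criticality: Optional[str] = None
    network_zone: Optional[str] = None
    managed_by: Optional[str] = None

    def gaps(self) -> list[str]:
        """Names of fields still empty, i.e. items needing internal follow-up."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    asset = AssetRecord(
        hostname="vpn.example-coop.com",
        ip_address="203.0.113.50",
        port_protocol="443/TCP",
    )
    print("Follow up on:", ", ".join(asset.gaps()))
```

Listing the gaps per asset gives you a ready-made set of questions to take to the teams that hold the internal knowledge.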
3b. Connecting to asset management
Many organizations maintain IT asset inventories in platforms like ServiceNow, but OT assets are frequently absent from these systems. External discovery often reveals OT-adjacent assets -- VPN concentrators serving SCADA networks, historian web interfaces, vendor remote access portals -- that should be tracked alongside IT assets.
If your organization has an asset management system, cross-reference your external findings against it. Gaps between what you discovered externally and what appears in your inventory are themselves a finding -- they represent assets that exist on the internet but are not being tracked, patched, or managed through your standard processes.
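The cross-reference described above amounts to a set difference between externally discovered hostnames and the hostnames in your inventory. The sketch below assumes you can export both as plain lists of hostnames; the `untracked_assets` function is hypothetical and is not tied to ServiceNow or any other platform's API.

```python
from typing import Iterable


def _normalize(names: Iterable[str]) -> set[str]:
    """Lowercase, trim whitespace, and drop any trailing dot so names compare cleanly."""
    return {name.strip().lower().rstrip(".") for name in names if name.strip()}


def untracked_assets(discovered: Iterable[str], inventory: Iterable[str]) -> list[str]:
    """Return hostnames found externally that do not appear in the inventory."""
    return sorted(_normalize(discovered) - _normalize(inventory))


if __name__ == "__main__":
    found = ["VPN.example.com", "mail.example.com", "historian.example.com"]
    tracked = ["mail.example.com."]
    print(untracked_assets(found, tracked))
```

Matching on hostname alone will miss assets recorded under an IP address or an internal name, so a manual pass over the unmatched results is still worthwhile.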
Worked Example: NRECA Remote Access Documentation
Here is how a single discovered service would be documented using the three-category framework above:
Example: FortiGate VPN Portal
| Field | Value |
|---|---|
| Hostname | vpn.example-coop.com |
| IP | 203.0.113.50 |
| Port | 443/TCP |
| Product | Fortinet FortiGate, FortiOS 7.4.6 |
| Cert | DigiCert, expires 2026-09-15, SAN includes vpn.example-coop.com |
| Owner | IT Security (confirm) |
| Facility | Main office |
| Designation | IT perimeter |
| Function | SSL-VPN remote access for staff and vendors |
| Criticality | High -- primary remote access path to internal network |
| Zone | Internet-facing |
| Management | Internal IT staff |
This FortiGate finding connects directly to Module 4, where you will correlate the product and version against vulnerability databases.
Step 4: Consolidation and Google Dorking
At this point you have three sets of findings: certificate transparency data, subdomain enumeration with IP mapping, and manually verified exposed services with product identification. This step uses Google dorking to find resources that the other techniques miss, then consolidates everything into a single baseline.
4a. Google dorking
Search engine dorking uses Google's advanced search operators to find pages and files that the organization may not intend to be publicly accessible. Use your AI client to generate targeted dork queries based on your discoveries so far:
I am performing external attack surface discovery on [organization
name], a [sector] organization. I have identified these root
domains: [domain1.com, domain2.com, domain3.com].
Based on these domains, generate Google dork queries to find:
1. Login pages and authentication portals
2. Exposed documents (PDF, DOCX, XLSX) that may contain
sensitive information
3. Directory listings that reveal file structures
4. Error pages or debug output that reveals technology stack
5. Configuration files, backup files, or admin panels
6. Pages referencing VPN, remote access, or SCADA/OT systems
Format each query so I can paste it directly into Google.
Run the generated queries in Google and review the first 2-3 pages of results. Add any new findings to your inventory. Google dorking frequently surfaces resources that subdomain enumeration and Shodan/Censys miss -- particularly indexed documents, forgotten login pages, and configuration files.
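If you reuse the same dork patterns across several root domains, a small template expander keeps the queries consistent between runs. This sketch uses only standard Google operators (`site:`, `inurl:`, `intitle:`, `filetype:`); the three templates are illustrative rather than a complete set, and `DORK_TEMPLATES` and `build_dorks` are hypothetical names.

```python
# Illustrative templates; {domain} is replaced with each root domain.
DORK_TEMPLATES = {
    "login pages": "site:{domain} inurl:login",
    "documents": 'site:{domain} filetype:pdf "confidential" OR "internal use only"',
    "directory listings": 'site:{domain} intitle:"index of"',
}


def build_dorks(domains: list[str]) -> list[tuple[str, str]]:
    """Return (category, query) pairs for every domain and template."""
    return [
        (category, template.format(domain=domain))
        for domain in domains
        for category, template in DORK_TEMPLATES.items()
    ]


if __name__ == "__main__":
    for category, query in build_dorks(["example.com"]):
        print(f"{category}: {query}")
```

The script only builds the query strings; you still paste each one into the search box and review the results by hand.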
4b. Consolidated analysis
Feed your combined findings from all steps into your AI client to produce the final attack surface inventory:
I have completed external attack surface discovery for
[organization name]. Here is a summary of all findings:
CERTIFICATE TRANSPARENCY (crt.sh):
[paste or summarize CT findings]
SUBDOMAIN ENUMERATION:
[paste or summarize subdomain/IP findings]
EXPOSED SERVICES (Shodan/Censys):
[paste or summarize service discovery findings]
GOOGLE DORKING:
[paste or summarize any additional findings]
Consolidate all findings into a single attack surface baseline:
1. A categorized inventory of all discovered assets organized by:
remote access, authentication/identity, email, web applications,
benefits/HR/financial, APIs, staging/test, content/public
2. For each category, note the highest-risk assets and why
3. Gaps in my discovery -- what should I investigate further
4. A prioritized list of assets to carry forward to vulnerability
correlation (Module 4) and monitoring (Module 5)
5. A summary table suitable for use as a baseline document
Review and correct the AI-generated baseline. This consolidated inventory becomes your Artifact 2.
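When reviewing the AI-generated baseline, it helps to have an independent merge of the raw hostnames to check it against. The following sketch combines per-source hostname lists into one map that records which sources reported each host; the `consolidate` function and the source labels are hypothetical.

```python
from collections import defaultdict


def consolidate(findings_by_source: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each hostname to the sorted list of sources that reported it."""
    merged = defaultdict(set)
    for source, hostnames in findings_by_source.items():
        for hostname in hostnames:
            merged[hostname.strip().lower()].add(source)
    # Sort hostnames and source names for stable, comparable output
    return {host: sorted(sources) for host, sources in sorted(merged.items())}


if __name__ == "__main__":
    inventory = consolidate({
        "crt.sh": ["VPN.example.com", "www.example.com"],
        "subdomain finder": ["vpn.example.com"],
    })
    for host, sources in inventory.items():
        print(host, sources)
```

Hosts reported by only one source are worth a second look: they may be stale records, or they may be exactly the assets the other techniques missed.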
Worked Example: NRECA Consolidation and Google Dorking
Using the Google Dork Query Generation Prompt above with NRECA's three root domains (electric.coop, cooperative.com, nreca.coop):
Example AI Response (Google Dork Queries for NRECA)
Google Dork Queries for NRECA Domains
Login Pages and Authentication Portals:
site:cooperative.com inurl:login
site:cooperative.com intitle:"sign in" OR intitle:"log in"
site:electric.coop inurl:auth OR inurl:sso OR inurl:portal
Exposed Documents:
site:electric.coop filetype:pdf "confidential" OR "internal use only"
site:cooperative.com filetype:xlsx OR filetype:csv
site:nreca.coop filetype:docx "draft" OR "not for distribution"
Directory Listings:
site:electric.coop intitle:"index of" OR intitle:"directory listing"
site:cooperative.com intitle:"index of" inurl:uploads OR inurl:files
Technology Stack and Debug Information:
site:cooperative.com inurl:error OR inurl:debug OR intitle:"500"
site:electric.coop ext:xml OR ext:json OR ext:conf
VPN / Remote Access / OT References:
"electric.coop" OR "cooperative.com" inurl:vpn OR inurl:remote
site:electric.coop "SCADA" OR "EMS" OR "substation" OR "FortiGate"
"nreca.coop" "remote access" OR "VPN" filetype:pdf
Note: Run each query in Google and review the first 2-3 pages of results. Look for pages that expose internal information, login endpoints not found through subdomain enumeration, or documents that reveal technology choices, vendor relationships, or organizational structure.
Output
Artifact 2: External attack surface inventory. A categorized list of your target organization's domains, subdomains, remote access services, and identified technologies. This inventory feeds directly into Module 3 (identifying which personnel have access to these services), Module 4 (correlating identified products against vulnerability databases), and Module 5 (establishing a monitoring baseline to detect changes).
Record your findings in the Baseline Document Template (Excel), Section 1: External Attack Surface.