Black Box, White Box, and Gray Box Testing
The three foundational engagement models in penetration testing — black box, white box, and gray box — define how much information a tester receives about a target environment before and during an assessment. These models shape scope, cost, duration, and the type of vulnerabilities most likely to surface. Understanding where each model applies, and what its structural limits are, is essential for professionals commissioning or delivering security assessments across regulated industries.
Definition and scope
Penetration testing engagements are classified primarily by the level of prior knowledge granted to the testing team. This classification is not merely procedural — it determines which attack paths are exercised, how realistic the threat simulation is, and which compliance requirements the test can satisfy.
NIST SP 800-115, Technical Guide to Information Security Testing and Assessment, provides a foundational reference for these distinctions, describing penetration testing in terms of the assessor's starting knowledge state and the degree to which that state mirrors a real-world adversary's position. The three models map to distinct adversary archetypes:
- Black box: The tester receives no prior knowledge of the target's internal architecture, source code, network topology, or credentials. The tester operates as an external threat actor would — with only publicly available information and what can be discovered through active reconnaissance.
- White box: The tester receives full disclosure — network diagrams, source code, system architecture documentation, credentials, and sometimes access to internal staff. This model is also called crystal box or clear box testing.
- Gray box: The tester receives partial information — often a set of user-level credentials, IP ranges, or application role definitions — simulating an insider threat, a compromised account, or a position of limited authenticated access.
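The three definitions above can be read as a classification over what the client discloses at kickoff. The following is a minimal sketch of that idea, not a standard taxonomy: the `Disclosure` field names and the rule that white box means every listed item is provided are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    BLACK_BOX = "black box"
    GRAY_BOX = "gray box"
    WHITE_BOX = "white box"


@dataclass(frozen=True)
class Disclosure:
    """Materials handed to the tester at kickoff (hypothetical field names)."""
    credentials: bool = False
    ip_ranges: bool = False
    network_diagrams: bool = False
    source_code: bool = False


def classify(d: Disclosure) -> Model:
    """Map a disclosure package to the closest engagement model."""
    provided = [d.credentials, d.ip_ranges, d.network_diagrams, d.source_code]
    if not any(provided):
        return Model.BLACK_BOX   # nothing disclosed
    if all(provided):
        return Model.WHITE_BOX   # assumption: full disclosure means every item
    return Model.GRAY_BOX        # anything in between is partial knowledge


# User-level credentials plus IP ranges is a typical gray box package.
print(classify(Disclosure(credentials=True, ip_ranges=True)).value)
```

In practice the boundary between gray box and white box is negotiated per engagement, so a real scoping document would record the disclosed items explicitly rather than rely on a label alone.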
These models bear directly on compliance testing. The Payment Card Industry Data Security Standard (PCI DSS v4.0, Requirement 11.4) requires penetration testing from both internal and external perspectives, and NIST SP 800-53 Rev. 5, Control CA-8, defines penetration testing as an assessment control for federal information systems.
How it works
Each model follows the same general assessment lifecycle but diverges in setup time, reconnaissance effort, and depth of code-level or configuration-level analysis.
A standard engagement proceeds through these phases regardless of model:
- Scoping and authorization: Rules of engagement, target boundaries, and written legal authorization are established before testing begins. Without that authorization, testing can constitute unauthorized access under 18 U.S.C. § 1030, the Computer Fraud and Abuse Act.
- Information gathering: In black box tests, this phase is extensive; in white box tests, relevant materials are provided at kickoff, which shortens this phase significantly.
- Vulnerability identification: Testers enumerate attack surfaces using both automated tools and manual techniques. White box engagements permit static source code analysis; black box engagements rely on dynamic probing.
- Exploitation and chaining: Testers attempt to exploit identified weaknesses and chain vulnerabilities to demonstrate business impact. Gray box models typically produce the broadest chain coverage because testers can reach authenticated attack surfaces without first spending effort to obtain credentials from scratch.
- Post-exploitation and reporting: Findings are documented with exploitation evidence, risk ratings, and remediation guidance. PTES (Penetration Testing Execution Standard) provides a widely referenced structure for this reporting phase.
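The scoping and authorization phase is the one most easily expressed as a guard in tooling: nothing should be probed unless authorization is on file and the target falls inside the agreed boundaries. The sketch below illustrates that check under stated assumptions. The `RulesOfEngagement` class and its fields are hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class RulesOfEngagement:
    """Hypothetical record of what the client has authorized."""
    authorized: bool                                # signed authorization on file
    in_scope: list = field(default_factory=list)    # CIDR strings agreed at scoping

    def permits(self, target: str) -> bool:
        """Return True only if testing this IP address is authorized and in scope."""
        if not self.authorized:
            return False
        addr = ipaddress.ip_address(target)
        return any(addr in ipaddress.ip_network(cidr) for cidr in self.in_scope)


roe = RulesOfEngagement(authorized=True, in_scope=["203.0.113.0/24"])
print(roe.permits("203.0.113.10"))   # inside the agreed range
print(roe.permits("198.51.100.7"))   # outside the agreed range
```

A check like this does not replace the legal documents themselves; it only helps keep automated tools from drifting beyond the boundaries those documents define.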
The knowledge differential between models produces measurable differences in coverage. Black box tests exercise the same reconnaissance techniques an external attacker would use but may miss deeply embedded logic flaws inaccessible without source code. White box tests surface a higher total vulnerability count — including design-level and code-level flaws — but do not validate whether those flaws are externally reachable. Gray box tests occupy the operational middle ground and are particularly effective at validating authenticated threat scenarios.
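The coverage difference can be made concrete with a toy model. The sketch below is illustrative only: the three findings are invented examples, and the visibility rules are a deliberate simplification of the paragraph above rather than a description of how any real assessment behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    externally_reachable: bool  # reachable without credentials
    needs_auth: bool            # only reachable after login
    needs_source: bool          # only evident from code or design review


def visible_to(model: str, f: Finding) -> bool:
    """Simplified rule for whether a model is likely to surface a finding."""
    if model == "white box":
        return True                      # full disclosure: everything is in view
    if f.needs_source:
        return False                     # hidden without code or design access
    if model == "gray box":
        return f.externally_reachable or f.needs_auth
    return f.externally_reachable        # black box


# Invented findings, for illustration only.
findings = [
    Finding("exposed admin panel", True, False, False),
    Finding("broken access control in account API", False, True, False),
    Finding("hard-coded key in unused module", False, False, True),
]

for model in ("black box", "gray box", "white box"):
    print(model, [f.name for f in findings if visible_to(model, f)])
```

The white box list is the longest, but it includes a finding that is not externally reachable, which is why a higher count alone does not establish real-world exposure.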
Common scenarios
The choice of model is driven by the threat scenario an organization wants to simulate and the compliance framework governing the engagement. The provider listings on this resource include providers specializing in each model across distinct verticals.
Black box is applied most frequently in:
- External network perimeter assessments where the organization wants to simulate an internet-based attacker with no insider knowledge
- Social engineering engagements where realistic ignorance of internal systems is required
- Pre-launch security assessments for externally facing web applications or APIs
White box is applied most frequently in:
- Secure development lifecycle (SDL) assessments where source code review is a primary deliverable
- FedRAMP authorization support, where NIST SP 800-53 control testing requires documented internal architecture review
- Post-incident forensic validation, where organizations want to confirm whether a known vulnerability class was exploitable across the full codebase
Gray box is applied most frequently in:
- Internal network assessments simulating a compromised user account or malicious insider
- Application testing under PCI DSS Requirement 11.4, where application-layer tests exercise authenticated functionality alongside unauthenticated testing
- Cloud environment assessments where the tester receives limited IAM role permissions to evaluate privilege escalation paths
The provider network purpose and scope reference on this site maps how provider categories align with these engagement types across industry verticals.
Decision boundaries
Selecting the appropriate model requires evaluating four factors: threat model alignment, compliance requirement specificity, available budget and timeline, and target attack surface type.
| Factor | Black Box | Gray Box | White Box |
|---|---|---|---|
| Threat model | External, unauthenticated attacker | Authenticated insider or compromised credential | Full internal access or developer threat |
| Compliance fit | External perimeter requirements | Authenticated and internal requirements | Code-level and architecture requirements |
| Relative time cost | High (reconnaissance-heavy) | Moderate | Low to moderate (materials provided) |
| Vulnerability coverage | Externally reachable flaws | Authenticated and privilege-escalation flaws | Broadest, including flaws that may not be externally exploitable |
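The threat model and compliance fit rows of the table can be condensed into a first-pass selection rule. This is a rough sketch under stated assumptions: the function name and its two boolean inputs are hypothetical, and budget and timeline are deliberately left out.

```python
def suggest_model(attacker_has_credentials: bool, needs_code_level_review: bool) -> str:
    """Return the engagement model suggested by the table's first two rows."""
    if needs_code_level_review:
        return "white box"   # code-level and architecture requirements
    if attacker_has_credentials:
        return "gray box"    # authenticated insider or compromised credential
    return "black box"       # external, unauthenticated attacker


print(suggest_model(attacker_has_credentials=False, needs_code_level_review=False))
# prints: black box
```

A rule this small is only a starting point for the scoping conversation; real engagements weigh all four factors together.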
A critical decision boundary separates black box from gray box in regulated healthcare environments. The HIPAA Security Rule (45 C.F.R. § 164.306) requires covered entities to protect against reasonably anticipated threats, which include internal as well as external sources. Black box testing alone does not address the internal threat component; gray box or white box testing is needed to cover insider risk scenarios.
Similarly, organizations using this penetration testing resource as a starting point for provider selection should confirm that a candidate provider explicitly states which model governs each test type in its service documentation. Ambiguity in this classification is a common source of scope disputes and compliance gaps in post-assessment audits.
The decision is not always binary. Hybrid engagements, which begin with black box reconnaissance and transition to gray box for authenticated testing, are common in comprehensive assessments, particularly those aligned with CISA's Cybersecurity Performance Goals or used to satisfy the internal and external testing requirements of PCI DSS v4.0.
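One way to sanity-check a hybrid plan is to confirm that disclosure only grows from stage to stage, since a team that has already seen credentials or documentation cannot credibly return to a zero-knowledge posture. That ordering rule is an assumption of this sketch rather than a requirement of any standard, and the stage names are hypothetical.

```python
# Relative amount of prior knowledge granted under each model.
LEVEL = {"black box": 0, "gray box": 1, "white box": 2}


def is_valid_hybrid(stages):
    """stages: list of (description, model) pairs in execution order.

    Returns True if the knowledge level never decreases between stages.
    """
    levels = [LEVEL[model] for _, model in stages]
    return all(a <= b for a, b in zip(levels, levels[1:]))


plan = [
    ("external reconnaissance", "black box"),
    ("authenticated application testing", "gray box"),
]
print(is_valid_hybrid(plan))
# prints: True
```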