Personally Identifiable Information: Definitions and Scope

Personally identifiable information (PII) sits at the center of US data protection law, triggering compliance obligations across federal agencies, regulated industries, and state-level privacy regimes. The definition of PII is not uniform across all frameworks — it shifts depending on the statute, regulatory body, and sector involved. This page maps the authoritative definitions, scope boundaries, classification types, and operational scenarios governing PII under major US regulatory instruments.

Definition and scope

The National Institute of Standards and Technology (NIST) provides one of the most widely referenced federal definitions. NIST Special Publication 800-122 defines PII as "any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."

The Office of Management and Budget (OMB) reinforces this framing in Memorandum M-07-16, which directed federal agencies to review PII holdings and implement safeguarding procedures. The Federal Trade Commission applies a functionally similar scope in its enforcement of Section 5 of the FTC Act, treating data that identifies or is reasonably linkable to a consumer as subject to unfair or deceptive practices oversight — a standard covered in detail at FTC Data Security Enforcement.

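Two structural classification types govern how PII is analyzed:

Linked information: Data that is already logically associated with other information about a specific individual, such as a name stored alongside a Social Security number.

Linkable information: Data that is not currently associated with an individual but could be through combination with other available information. A ZIP code, birth date, or gender value on its own falls into this category.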

This linked/linkable distinction, formalized in NIST SP 800-122, determines the sensitivity tier applied to a dataset and shapes the controls required under frameworks such as the NIST Privacy Framework.

How it works

PII classification follows a structured assessment process that organizations and agencies apply across data inventories. The general sequence operates in four phases:

  1. Identification: Catalog all data elements collected, stored, or transmitted. This inventory distinguishes raw identifiers (name, SSN, driver's license number) from contextual identifiers (IP addresses, device IDs, behavioral profiles).
  2. Classification: Apply the linked/linkable test to each element, accounting for whether the data exists in isolation or in combination with other fields. A standalone ZIP code is linkable; a ZIP code paired with age and gender may effectively become linked.
  3. Sensitivity scoring: Assign a sensitivity level based on potential harm from unauthorized disclosure. NIST SP 800-122 identifies harm categories including embarrassment, discrimination, financial loss, and physical safety risk.
  4. Control mapping: Assign security and privacy controls proportionate to the sensitivity level. Federal agencies operating under the Federal Information Security Modernization Act (FISMA) map controls through NIST SP 800-53 control families, particularly the Personally Identifiable Information Processing and Transparency (PT) and Access Control (AC) families.
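The four phases above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of NIST SP 800-122: the field names, the lookup tables, the rule that promotes ZIP code plus age plus gender to linked, and the mapping from linkage to sensitivity level and control families are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Linkage(Enum):
    LINKED = "linked"      # tied to a specific individual
    LINKABLE = "linkable"  # could identify when combined with other data
    NONE = "none"


# Hypothetical lookup tables; a real data inventory would be far larger.
DIRECT_IDENTIFIERS = {"name", "ssn", "drivers_license"}
CONTEXTUAL_IDENTIFIERS = {"zip_code", "age", "gender", "ip_address", "device_id"}
# Assumption: this combination of contextual fields is treated as linked.
LINKED_COMBINATIONS = [{"zip_code", "age", "gender"}]

# Assumption: sensitivity follows linkage directly. A real assessment would
# also weigh the harm categories (embarrassment, discrimination, and so on).
SENSITIVITY = {Linkage.LINKED: "high", Linkage.LINKABLE: "moderate", Linkage.NONE: "low"}
# Assumption: illustrative mapping to SP 800-53 family codes.
CONTROLS = {"high": ["AC", "PT"], "moderate": ["AC"], "low": []}


@dataclass
class Assessment:
    field: str
    linkage: Linkage
    sensitivity: str
    controls: list[str]


def classify(field: str, present: set[str]) -> Linkage:
    """Phase 2: apply the linked/linkable test to one field in context."""
    if field in DIRECT_IDENTIFIERS:
        return Linkage.LINKED
    if field in CONTEXTUAL_IDENTIFIERS:
        # Promote a contextual field when a known combination is fully present.
        for combo in LINKED_COMBINATIONS:
            if field in combo and combo <= present:
                return Linkage.LINKED
        return Linkage.LINKABLE
    return Linkage.NONE


def assess(fields: list[str]) -> list[Assessment]:
    present = set(fields)  # Phase 1: the inventory of collected fields
    results = []
    for field in fields:
        linkage = classify(field, present)  # Phase 2: classification
        level = SENSITIVITY[linkage]        # Phase 3: sensitivity scoring
        results.append(Assessment(field, linkage, level, CONTROLS[level]))  # Phase 4
    return results


for a in assess(["name", "zip_code", "order_total"]):
    print(a.field, a.linkage.value, a.sensitivity, a.controls)
```

Passing the whole field set into `classify` is the point of the sketch: the same field can receive a different classification depending on what it is stored with.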

Sector-specific frameworks overlay additional requirements. Under HIPAA Data Protection Requirements, the 18 identifiers enumerated for Protected Health Information (PHI) form a regulatory PII subset with stricter deidentification standards under the Safe Harbor and Expert Determination methods (45 CFR §164.514). Under the Gramm-Leach-Bliley Act, "nonpublic personal information" (NPI) constitutes financial-sector PII governed by the FTC's Safeguards Rule (16 CFR Part 314), whose amended requirements took effect in 2023.

Common scenarios

PII exposure and misclassification arise in predictable operational patterns across sectors:

Healthcare records transfer: Patient records moving between providers may contain three or more of HIPAA's 18 PHI identifiers simultaneously. Even after partial deidentification, residual linkable fields, such as rare diagnosis codes paired with geographic data, may re-identify individuals, creating residual PII liability. See Healthcare Cybersecurity and Data Protection for sector-specific breach patterns.
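One way to make the residual-field risk concrete is a k-anonymity check. It counts how many records share each combination of quasi-identifier values and flags combinations that occur fewer than k times. k-anonymity is brought in here as an illustrative technique only; it is not named in the frameworks above and is not by itself a HIPAA-recognized deidentification method. The field names `zip3` and `diagnosis` and the sample records are hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group (0 for an empty dataset)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_records(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] < k]


# Hypothetical, partially deidentified records: names removed, but a
# three-digit ZIP prefix and a diagnosis code remain.
records = [
    {"zip3": "021", "diagnosis": "J45"},
    {"zip3": "021", "diagnosis": "J45"},
    {"zip3": "021", "diagnosis": "E75.2"},  # rare code: unique in this dataset
    {"zip3": "946", "diagnosis": "J45"},
    {"zip3": "946", "diagnosis": "J45"},
]

print(smallest_group_size(records, ["zip3", "diagnosis"]))
print(risky_records(records, ["zip3", "diagnosis"], k=2))
```

A record that is alone in its group is the kind of residual linkable combination described above: no direct identifier remains, yet the pairing singles out one person.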

Employee data processing: HR systems routinely aggregate name, SSN, home address, bank account details, and performance records. Each element is PII individually; their combination in a single record raises the sensitivity classification. Employee Data Privacy Protections covers the overlapping federal and state obligations governing this category.

Third-party vendor data flows: When organizations share customer data with vendors for analytics, marketing, or cloud storage, PII obligations transfer with the data. Contractual controls — data processing agreements, access restrictions, and breach notification clauses — are required under frameworks including CCPA/CPRA, which grants California consumers rights over data shared with service providers.

Children's data: COPPA (15 U.S.C. §6501 et seq.) applies a more restrictive PII standard for children under 13, requiring verifiable parental consent before collecting name, address, email, phone number, or persistent identifiers. The FTC's COPPA Rule (16 CFR Part 312) is detailed at COPPA Children's Data Protection.
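The consent trigger described above reduces to a simple gate in a collection pipeline. The sketch below is a simplification: the field names are hypothetical, and it leaves out questions a real analysis must answer first, such as whether the service is directed to children or the operator has actual knowledge of the user's age.

```python
COPPA_AGE_THRESHOLD = 13

# Hypothetical field names standing in for the data types listed above.
COVERED_FIELDS = {"name", "address", "email", "phone_number", "persistent_identifier"}


def needs_parental_consent(age: int, requested_fields: set[str]) -> bool:
    """True when a user under 13 would have any covered field collected."""
    return age < COPPA_AGE_THRESHOLD and bool(requested_fields & COVERED_FIELDS)


print(needs_parental_consent(11, {"email", "favorite_color"}))
print(needs_parental_consent(15, {"email"}))
```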

Decision boundaries

Three threshold questions govern whether data qualifies as PII in a given regulatory context:

  1. Identifiability test: Can the data, alone or in combination with other reasonably available information, identify a specific natural person? If yes under any applicable statute, it is PII.
  2. Sector applicability: Which statute governs the data holder: HIPAA, GLBA, FERPA, COPPA, or a state privacy law? Each statute may define PII or its equivalent differently. Where more than one applies to the same data, organizations generally treat the most restrictive applicable definition as controlling their compliance obligations.
  3. Deidentification sufficiency: Has the data been stripped of identifiers through a legally recognized method? Under HIPAA, only Safe Harbor deidentification (removal of all 18 specified identifiers, with no actual knowledge that the remaining information could identify the individual) or Expert Determination (statistical certification that re-identification risk is very small) produces data that exits PHI scope.
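The three questions can be combined into a single scoping check. The sketch below is an assumption-laden illustration: the `DataSet` type, the method labels, and the statute names as plain strings are all hypothetical, and only HIPAA's two deidentification methods are modeled. For every other statute the code conservatively keeps identifiable data in scope, because those statutes' own deidentification rules are not represented here.

```python
from dataclasses import dataclass
from typing import Optional

# Labels for the two HIPAA methods named above (label strings are hypothetical).
RECOGNIZED_HIPAA_METHODS = {"safe_harbor", "expert_determination"}


@dataclass
class DataSet:
    identifiable: bool                             # Question 1: identifiability test
    governing_statutes: list[str]                  # Question 2: sector applicability
    deidentification_method: Optional[str] = None  # Question 3: method applied, if any


def in_scope_statutes(ds: DataSet) -> list[str]:
    """Return the statutes under which the dataset is still treated as PII."""
    if not ds.identifiable:
        return []
    in_scope = []
    for statute in ds.governing_statutes:
        if statute == "HIPAA":
            # Only a recognized method takes data out of PHI scope.
            if ds.deidentification_method not in RECOGNIZED_HIPAA_METHODS:
                in_scope.append(statute)
        else:
            # Assumption: other statutes' deidentification rules are not modeled.
            in_scope.append(statute)
    return in_scope


print(in_scope_statutes(DataSet(True, ["HIPAA", "GLBA"], "safe_harbor")))
print(in_scope_statutes(DataSet(True, ["HIPAA"], "hashing")))
```

Returning a list per statute rather than a single yes/no reflects the second question: the same dataset can exit one regime while remaining covered by another.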

Aggregated or anonymized datasets that nonetheless retain re-identification risk do not exit PII scope under NIST SP 800-122 or most state privacy frameworks. Sensitive Data Categories covers the subset of PII — including biometric, genetic, and precise geolocation data — that triggers heightened obligations beyond baseline PII requirements under state data privacy laws.

The Data Breach Notification Requirements framework activates specifically when PII is compromised in an unauthorized disclosure, making accurate pre-breach classification a direct compliance prerequisite.
