Personally Identifiable Information: Definitions and Scope

Personally identifiable information (PII) sits at the center of U.S. data protection law, compliance frameworks, and breach notification obligations across federal and state jurisdictions. This reference covers how PII is formally defined by authoritative public bodies, the classification boundaries that separate PII from non-PII, and the operational scenarios where those distinctions carry legal consequence. Professionals working in data protection, compliance, privacy engineering, and information governance rely on precise PII definitions to scope their obligations correctly.


Definition and scope

The National Institute of Standards and Technology (NIST) provides the most widely cited federal definition of PII. NIST Special Publication 800-122 defines PII as "any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."

The Office of Management and Budget (OMB) Memorandum M-07-16 reinforces this framework for federal agencies, distinguishing between two functional categories:

  1. Directly identifying information — data elements that alone identify a specific individual: Social Security numbers, passport numbers, driver's license numbers, biometric identifiers, and full legal names combined with addresses.
  2. Indirectly identifying information (linkable PII) — data elements that do not alone identify an individual but become identifying when combined with other available data: ZIP codes, birth dates, gender, and device identifiers.

This two-tier structure reflects the "mosaic effect" documented in NIST SP 800-122, where individually innocuous data points combine to produce a profile sufficient for identification. The Federal Trade Commission (FTC) applies a similar combination-based analysis when assessing whether a commercial dataset constitutes PII under Section 5 of the FTC Act.
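The combination effect described above can be illustrated with a small sketch. It measures what share of records become unique as more indirectly identifying fields are considered together. The records, field names, and the `unique_fraction` helper are all invented for illustration and are not drawn from any NIST or FTC publication.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1985, "gender": "M"},
    {"zip": "02139", "birth_year": 1990, "gender": "F"},
    {"zip": "02140", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
]

def unique_fraction(rows, fields):
    """Return the share of rows whose values on `fields` occur exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    singletons = sum(1 for key in keys if counts[key] == 1)
    return singletons / len(rows)

# Each added field isolates more records, even though no field is a name or ID.
for combo in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(combo, unique_fraction(records, combo))
```

In this toy dataset the unique share rises from 0.2 to 0.4 to 0.6 as fields are added, which is the pattern a combination-based analysis looks for.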

Sensitive PII — a subset recognized by OMB M-10-23 — includes Social Security numbers, financial account numbers, medical records, biometric data, and information about children under 13. Sensitive PII attracts heightened handling requirements under statutes including the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act (GLBA), and the Children's Online Privacy Protection Act (COPPA).


How it works

PII identification and classification operate through a structured process within privacy programs. The classification process follows discrete phases:

  1. Data inventory and mapping — cataloguing all data elements collected, processed, stored, or transmitted by a system or organization.
  2. Sensitivity assessment — applying the NIST SP 800-122 two-tier test to each element: Is it directly identifying? Is it linkable when combined with other held data?
  3. Contextual analysis — evaluating the operational context. NIST SP 800-188 notes that the same data element (e.g., a ZIP code) may or may not constitute PII depending on the population size it represents and the other data held alongside it.
  4. Risk categorization — assigning a risk level based on sensitivity classification, which drives downstream controls: encryption requirements, access restrictions, retention limits, and breach notification timelines.
  5. Control implementation — applying safeguards consistent with the categorized risk level, drawing on NIST SP 800-53 Rev 5 control families including Access Control (AC), Audit and Accountability (AU), and System and Communications Protection (SC).
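The phases above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the `DataElement` type, the risk thresholds in `risk_level`, and the `CONTROLS` table are hypothetical and would be replaced by an organization's own policy and its selected NIST SP 800-53 controls.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    name: str
    directly_identifying: bool  # tier 1: identifies a person on its own
    linkable: bool              # tier 2: identifying when combined with other held data
    sensitive: bool = False     # e.g. SSN, financial account number, medical record

# Hypothetical control sets keyed by risk level (assumption, not a standard).
CONTROLS = {
    "high": ["encrypt at rest and in transit", "restricted access", "audit logging"],
    "moderate": ["encrypt in transit", "restricted access"],
    "low": ["standard access controls"],
    "none": [],
}

def risk_level(element: DataElement) -> str:
    """Map the sensitivity assessment to an illustrative risk category."""
    if element.sensitive:
        return "high"
    if element.directly_identifying:
        return "moderate"
    if element.linkable:
        return "low"
    return "none"

# Phase 1: a toy data inventory (invented element names).
inventory = [
    DataElement("ssn", True, False, sensitive=True),
    DataElement("full_name_with_address", True, False),
    DataElement("zip_code", False, True),
    DataElement("page_load_ms", False, False),
]

# Phases 2-5: assess each element, categorize its risk, and look up controls.
plan = {e.name: (risk_level(e), CONTROLS[risk_level(e)]) for e in inventory}
for name, (level, controls) in plan.items():
    print(f"{name}: {level} -> {controls}")
```

The contextual analysis in phase 3 is represented here only by the `linkable` flag, which a reviewer would set after considering what other data is held alongside the element.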

The Privacy Act of 1974 (5 U.S.C. § 552a) establishes the federal government's statutory obligations for PII held in systems of records, requiring agencies to publish System of Records Notices (SORNs) that describe what PII is collected and for what purpose.


Common scenarios

PII classification disputes and compliance failures cluster around five recurring operational scenarios:

Employment records — Employer HR systems hold a concentration of directly identifying PII: names, Social Security numbers, banking details for payroll, and health plan enrollment data. HIPAA's employer-plan provisions apply to self-insured health plan data; GLBA applies where financial products are involved.

Web analytics and advertising technology — IP addresses, cookie identifiers, and device fingerprints are treated as PII under the California Consumer Privacy Act (Cal. Civ. Code § 1798.140) and under FTC guidance, but are not universally classified as PII under all federal frameworks. The gap between state and federal classification standards is a documented source of compliance complexity.

Healthcare data aggregation — De-identification under HIPAA's Safe Harbor method requires removal of the 18 specific identifiers enumerated at 45 CFR § 164.514(b). Re-identification risk increases when de-identified datasets are combined with publicly available demographic data.
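A few of the Safe Harbor reductions can be sketched in code. This is a partial illustration only: the field names, the `DROP_FIELDS` set, and the `safe_harbor_sketch` function are hypothetical, the drop list covers only a handful of the 18 identifier categories, and the set of low-population ZIP prefixes is a placeholder that a real implementation would source from current Census data.

```python
from datetime import date

# Partial, illustrative list of fields to drop; hypothetical column names
# that do not cover all 18 Safe Harbor identifier categories.
DROP_FIELDS = {"name", "ssn", "email", "phone", "medical_record_number"}

def safe_harbor_sketch(record, low_population_zip3):
    """Return a copy of `record` with a few Safe Harbor style reductions applied."""
    out = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        # Prefixes covering 20,000 or fewer people are replaced with "000".
        out["zip3"] = "000" if zip3 in low_population_zip3 else zip3
    if "birth_date" in out:
        # Dates directly related to an individual are reduced to the year.
        out["birth_year"] = out.pop("birth_date").year
    if "age" in out and out["age"] >= 90:
        # Ages over 89 are aggregated into a single 90+ category.
        out["age"] = "90+"
        out.pop("birth_year", None)  # the year would otherwise reveal the age
    return out

record = {
    "name": "Jane Example",          # invented values throughout
    "ssn": "000-00-0000",
    "zip": "02139",
    "birth_date": date(1985, 3, 14),
    "age": 40,
    "diagnosis_code": "J45",
}
print(safe_harbor_sketch(record, low_population_zip3={"036"}))  # placeholder set
```

Passing this kind of transformation does not by itself satisfy the rule, since Safe Harbor also depends on what the covered entity actually knows about re-identification risk in the remaining data.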

Children's data — COPPA (16 CFR Part 312) defines personal information for children under 13 to include persistent identifiers, geolocation data precise to street level, and photographs, all of which are treated as sensitive PII regardless of the broader framework applied.

Breach notification thresholds — All 50 states, along with the District of Columbia, maintain breach notification statutes, and these statutes define PII differently, creating multi-jurisdictional complexity. The National Conference of State Legislatures tracks state-level variation in these definitions, with some states including biometric data and medical information as covered PII while others limit coverage to financial account numbers combined with access credentials.


Decision boundaries

The central analytical question in PII classification is not whether a data element carries a name but whether it carries identity. Four boundary conditions govern that determination:

PII vs. anonymized data — True anonymization under HIPAA Safe Harbor requires removing 18 enumerated identifiers and also requires that the covered entity have no actual knowledge that the remaining data could be used to re-identify an individual. Data that fails this standard remains PII regardless of how it is labeled internally.

Pseudonymized vs. de-identified data — Pseudonymization (replacing identifiers with tokens while retaining a re-identification key) does not produce non-PII. NIST SP 800-188 treats pseudonymized data as PII because re-identification is technically possible. The EU General Data Protection Regulation (GDPR) makes the same distinction explicitly. Although GDPR does not govern domestic U.S. compliance, U.S. privacy engineers have adopted its technical definitions as reference standards.
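A short sketch shows why pseudonymized data stays re-identifiable. It uses a keyed hash (HMAC-SHA256 from the Python standard library) to produce tokens and keeps a lookup table as the re-identification key. The key value, the `token_vault` table, and the function names are hypothetical, and this is one possible tokenization approach rather than a method prescribed by NIST or GDPR.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for illustration

# The retained re-identification key: a mapping from token back to identifier.
token_vault = {}

def tokenize(identifier: str) -> str:
    """Replace an identifier with a keyed-hash token and record the mapping."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    token = digest.hexdigest()[:16]
    token_vault[token] = identifier
    return token

def reidentify(token: str) -> str:
    """Recover the original identifier; possible only because the vault exists."""
    return token_vault[token]

token = tokenize("000-00-0000")  # invented identifier
print(token != "000-00-0000")    # the stored value no longer shows the identifier
print(reidentify(token))         # but whoever holds the vault can reverse it
```

As long as the vault or the key is retained anywhere, the tokenized dataset remains linkable to individuals, which is why it is treated as PII.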

Aggregated vs. individual-level data — Population-level statistics (e.g., "35% of users are aged 25–34") are not PII. The boundary shifts when aggregation is sufficiently granular to isolate individuals — a documented risk in small-cell demographic reporting.
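Small-cell risk is commonly handled by suppressing counts below a minimum size before publication. The sketch below assumes a threshold of 5, which is an illustrative choice rather than a value taken from any cited standard. The rows and the `suppressed_counts` helper are likewise invented.

```python
from collections import Counter

def suppressed_counts(rows, fields, threshold=5):
    """Count rows per group, replacing counts below `threshold` with None."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {group: (n if n >= threshold else None) for group, n in counts.items()}

# Invented rows: six users in one cell and a single user in another.
rows = [{"age_band": "25-34", "zip": "02139"} for _ in range(6)]
rows.append({"age_band": "65+", "zip": "02140"})

report = suppressed_counts(rows, ["age_band", "zip"])
print(report)
```

The cell containing a single person is withheld because publishing it would effectively describe one individual.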

Public vs. private information — Publicly available information is not automatically excluded from PII classification. The FTC's 2012 Privacy Report established that public records combined with other data can produce PII, a position consistent with NIST's mosaic-effect framework.

Professionals navigating these boundaries across jurisdictions can consult "How to Use This Data Protection Resource" for guidance on locating qualified service providers and compliance specialists operating within specific regulatory frameworks.

