De-identification Strategies

When should you use de-identification techniques?

De-identification techniques are output privacy controls: you apply them whenever data is shared or released to external parties. They conceal sensitive information while still providing useful insights and analysis to authorized users.

What are statistical disclosure controls? 👀

Statistical disclosure controls, such as generalization, pseudonymization, differential privacy, and synthetic data, are effective techniques for ensuring output privacy.

These methods provide reliable de-identification strategies for sharing data in structured databases while safeguarding sensitive information.

Generalization: 👥

Generalization is the process of replacing specific data values with broader categories to protect individual identities while enabling analysis.

  • For example, we can replace exact ages with age groups.

| Original Data | Generalized Data |
| --- | --- |
| 🧑‍💼 Age: 28 | 👥 Age Group: 20-30 |
| 👩 Age: 42 | 👥 Age Group: 40-50 |
| 👨‍🎓 Age: 19 | 👥 Age Group: 10-20 |
| 👵 Age: 68 | 👥 Age Group: 60-70 |
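
A minimal sketch of this idea in Python; the ten-year band width is an illustrative assumption, not something the guide prescribes:

```python
# Generalization: replace an exact age with a coarse ten-year band.
def generalize_age(age: int, band_width: int = 10) -> str:
    """Map an exact age onto a band such as '20-30'."""
    lower = (age // band_width) * band_width
    return f"{lower}-{lower + band_width}"

ages = [28, 42, 19, 68]
print([generalize_age(a) for a in ages])
# ['20-30', '40-50', '10-20', '60-70']
```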

Pseudonymization: 🎭

Pseudonymization involves replacing identifying information with pseudonyms or aliases, allowing data linking for legitimate purposes while safeguarding identities.

  • For example, we can replace names with unique identifiers.

| Original Data | Pseudonymized Data |
| --- | --- |
| 👤 Name: John Smith | 🆔 ID: ABC123 |
| 👤 Name: Jane Doe | 🆔 ID: XYZ456 |
| 👤 Name: Bob Johnson | 🆔 ID: PQR789 |
| 👤 Name: Mary Brown | 🆔 ID: DEF321 |
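
A minimal sketch in Python, assuming random hex identifiers and an in-memory lookup table; in practice the mapping is stored separately under strict access control, so re-linking records stays a deliberate, authorized act:

```python
import secrets

# The name-to-pseudonym map must live apart from the released data,
# under access control, so only authorized users can re-link records.
pseudonym_map: dict[str, str] = {}

def pseudonymize(name: str) -> str:
    """Return a stable pseudonym for a name, minting one on first use."""
    if name not in pseudonym_map:
        pseudonym_map[name] = secrets.token_hex(3).upper()
    return pseudonym_map[name]

for name in ["John Smith", "Jane Doe", "John Smith"]:
    print(pseudonymize(name))
# "John Smith" receives the same pseudonym both times, so records
# about the same person can still be linked when needed.
```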

Perturbation: 🔊

What is perturbation? 👀

Perturbation involves adding random noise to the data attributes of individuals, making it difficult for intruders to recover the true values.

Adding Noise to Data Attributes: 🔊

In perturbation, we introduce random variations to individual data points, preserving the overall statistical properties while obscuring precise information.

Example - Healthcare Records:

Let's continue with a healthcare example. In a traditional healthcare dataset, medical conditions may be represented by specific numerical codes, which can make records identifiable. With perturbation, we add random noise to those codes.

Before Perturbation:

| SoldierID | Medical Condition |
| --- | --- |
| 👤 001 | 2 |
| 👤 002 | 1 |
| 👤 003 | 3 |

After Perturbation (Noisy Dataset):

| SoldierID | Medical Condition |
| --- | --- |
| 👤 001 | 3 |
| 👤 002 | 2 |
| 👤 003 | 4 |

By adding random noise to the medical conditions, it becomes challenging to infer the true values, providing an additional layer of privacy protection.
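
A minimal sketch in Python, assuming uniform integer noise of at most ±1; the noise range is an illustrative choice that governs the privacy-versus-utility trade-off:

```python
import random

# Condition codes from the example table above.
records = {"001": 2, "002": 1, "003": 3}

def perturb(value: int, max_noise: int = 1) -> int:
    """Add integer noise drawn uniformly from [-max_noise, +max_noise]."""
    return value + random.randint(-max_noise, max_noise)

noisy = {soldier_id: perturb(code) for soldier_id, code in records.items()}
print(noisy)  # e.g. {'001': 3, '002': 2, '003': 4}
```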

Differential Privacy: 📊

Differential privacy adds carefully calibrated random noise to query results, preserving individual privacy while maintaining data utility.

  • For example, we can add noise to aggregated age statistics.

| Original Aggregated Age | Noisy Aggregated Age |
| --- | --- |
| 👥 Age Group: 20-30 | 👥 Age Group: 19-31 |
| 👥 Age Group: 40-50 | 👥 Age Group: 39-51 |
| 👥 Age Group: 10-20 | 👥 Age Group: 9-21 |
| 👥 Age Group: 60-70 | 👥 Age Group: 59-71 |
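
A minimal sketch of the textbook Laplace mechanism, a standard way to make a numeric query such as a count differentially private. The epsilon value and the sensitivity of 1 (adding or removing one person changes a count by at most 1) are assumptions for the demo, not values from this guide:

```python
import numpy as np

def private_count(true_count: int, epsilon: float = 1.0,
                  sensitivity: float = 1.0) -> float:
    """Answer a count query with Laplace noise scaled to sensitivity/epsilon."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

ages = [28, 42, 19, 68, 25, 44]
true_count = sum(1 for age in ages if 20 <= age < 30)  # exactly 2
print(private_count(true_count))  # e.g. 2.7 -- close to 2, rarely equal to it
```

Smaller epsilon means more noise and stronger privacy; larger epsilon means less noise and better utility.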

Synthetic Data: 🤖

Synthetic Data involves generating artificial data that resembles real data, protecting privacy during research and testing.

  • For example, we can create synthetic customer profiles.

| Real Data | Synthetic Data |
| --- | --- |
| 👩 Name: Emily, Age: 32 | 👤 Name: Sarah, Age: 35 |
| 👨‍💼 Occupation: Engineer | 👨‍💼 Occupation: Consultant |
| 📞 Phone: 555-1234 | 📞 Phone: 555-5678 |
| 📧 Email: emily@example.com | 📧 Email: sarah@example.com |
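
A minimal sketch in Python. Real synthetic-data generators fit a statistical or machine-learning model to the source data so the fake records preserve its distributions; the simple value pools below are illustrative assumptions:

```python
import random

FIRST_NAMES = ["Sarah", "Alex", "Priya", "Tom"]
OCCUPATIONS = ["Consultant", "Teacher", "Analyst", "Nurse"]

def synthetic_profile() -> dict:
    """Build one artificial customer profile that resembles real data."""
    name = random.choice(FIRST_NAMES)
    return {
        "name": name,
        "age": random.randint(20, 70),
        "occupation": random.choice(OCCUPATIONS),
        "phone": f"555-{random.randint(1000, 9999)}",
        "email": f"{name.lower()}@example.com",
    }

print(synthetic_profile())
# e.g. {'name': 'Sarah', 'age': 35, 'occupation': 'Consultant', ...}
```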

Grouping: 📉

Grouping is a strategy that aggregates related data attributes of individuals to obscure individual information.

Aggregating Related Data Attributes:

In grouping, we combine data attributes of individuals who share common characteristics, creating collective profiles that hide specific details while preserving trends.

Example - Educational Records:

Let's consider an example in an educational dataset. In a traditional dataset, educational records may contain student names, test scores, and subjects studied. However, with grouping, we aggregate data attributes based on subjects studied.

Before Grouping:

| Student Name | Math Score | Science Score | English Score |
| --- | --- | --- | --- |
| 👨‍💼 John | 85 | 90 | 78 |
| 👩 Jane | 95 | 92 | 88 |
| 👨‍💼 Mike | 78 | 85 | 80 |

After Grouping (Aggregated Dataset):

| Subject | Average Score |
| --- | --- |
| Math | 👥 86 |
| Science | 👥 89 |
| English | 👥 82 |

By grouping data based on subjects studied, individual test scores are hidden, and only collective average scores are presented, safeguarding student privacy.
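
A minimal sketch in Python that reproduces the aggregation above: per-student scores go in, only per-subject averages come out:

```python
from statistics import mean

records = [
    {"name": "John", "Math": 85, "Science": 90, "English": 78},
    {"name": "Jane", "Math": 95, "Science": 92, "English": 88},
    {"name": "Mike", "Math": 78, "Science": 85, "English": 80},
]

# Release only the aggregate per subject; individual scores stay hidden.
averages = {subject: round(mean(r[subject] for r in records))
            for subject in ("Math", "Science", "English")}
print(averages)  # {'Math': 86, 'Science': 89, 'English': 82}
```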

Mixing: 🔀

What is mixing? 🎰

Mixing involves shuffling or rearranging the data attributes of individuals so that meaningful patterns are hard to recover.

In mixing, we rearrange the data attributes within a dataset so that sensitive information no longer corresponds to the original individual it belonged to.

This obscures the relationships between attributes, enhancing data privacy.

Let's consider an example. In a traditional dataset, records may contain attributes like income, age, and account balance, which could potentially identify individuals.

With mixing, we shuffle these attribute values across the records in the dataset.

Before Mixing:

| SoldierID | Income | Age | Account Balance |
| --- | --- | --- | --- |
| 👤 001 | $50,000 | 30 | $10,000 |
| 👤 002 | $40,000 | 25 | $8,000 |
| 👤 003 | $60,000 | 28 | $12,000 |

After Mixing (Shuffled Dataset):

| SoldierID | Age | Account Balance | Income |
| --- | --- | --- | --- |
| 👤 001 | 25 | $10,000 | $40,000 |
| 👤 002 | 28 | $8,000 | $50,000 |
| 👤 003 | 30 | $12,000 | $60,000 |

By shuffling attribute values across records, it becomes challenging to link a specific value back to a particular individual, preserving their privacy.
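
A minimal sketch in Python, assuming each attribute column is shuffled independently so the values that end up on one row no longer describe the same person:

```python
import random

rows = [
    {"Income": 50_000, "Age": 30, "Account Balance": 10_000},
    {"Income": 40_000, "Age": 25, "Account Balance": 8_000},
    {"Income": 60_000, "Age": 28, "Account Balance": 12_000},
]

# Shuffle each column on its own, then stitch the rows back together.
mixed_columns = {}
for attr in ["Income", "Age", "Account Balance"]:
    column = [row[attr] for row in rows]
    random.shuffle(column)
    mixed_columns[attr] = column

mixed_rows = [{attr: mixed_columns[attr][i] for attr in mixed_columns}
              for i in range(len(rows))]
print(mixed_rows)
```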
