Evaluation#

Evaluation is a critical component of NeMo Safe Synthesizer that helps you understand both the utility and privacy of your synthetic data. The evaluation step is enabled by default and provides comprehensive reports comparing your original and synthetic datasets across multiple dimensions.

How It Works#

The evaluation system compares your original and synthetic datasets using two main frameworks:

  1. Synthetic Quality Score (SQS): Measures how well the synthetic data preserves statistical properties and utility

  2. Data Privacy Score (DPS): Assesses privacy protection and resistance to various attack vectors

Each framework consists of multiple metrics that are combined into an overall score.

Synthetic Quality Score (SQS)#

The SQS measures data utility across several dimensions:

Column Correlation Stability#

Analyzes the correlation between every pair of columns:

  • Compares correlation matrices between original and synthetic data

  • Ensures relationships between variables are preserved

  • Critical for maintaining predictive power in ML models
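
The idea can be sketched as follows. This is an illustrative stand-in, not the actual SQS computation (which is internal to NeMo Safe Synthesizer): compute both correlation matrices and score how little they differ.

```python
import numpy as np
import pandas as pd

def correlation_stability(original: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Illustrative score: 1 minus the mean absolute difference between
    the two correlation matrices (numeric columns, upper triangle only)."""
    cols = original.select_dtypes("number").columns
    diff = (original[cols].corr() - synthetic[cols].corr()).abs()
    # Average only the upper triangle to avoid double-counting pairs.
    return 1.0 - diff.values[np.triu_indices(len(cols), k=1)].mean()
```

A score near 1 means pairwise relationships survived synthesis; identical data scores exactly 1.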

Deep Structure Stability#

Uses Principal Component Analysis (PCA) to reduce dimensionality before comparing datasets:

  • Captures overall data structure and patterns

  • Evaluates high-dimensional relationships

  • Assesses whether data maintains its fundamental characteristics
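
One simple way to approximate this check (a sketch only; the real metric may compare datasets differently) is to fit PCA to each dataset and compare how variance is distributed across components:

```python
import numpy as np
from sklearn.decomposition import PCA

def deep_structure_similarity(original, synthetic, n_components=2):
    """Illustrative: compare the variance structure captured by PCA.
    Returns 1 minus the total absolute difference in explained-variance
    ratios, clipped to [0, 1]."""
    evr_o = PCA(n_components=n_components).fit(original).explained_variance_ratio_
    evr_s = PCA(n_components=n_components).fit(synthetic).explained_variance_ratio_
    return float(np.clip(1.0 - np.abs(evr_o - evr_s).sum(), 0.0, 1.0))
```

If the synthetic data concentrates variance in different directions than the original, the score drops.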

Column Distribution Stability#

Compares the distribution for each column in the original data to the matching column in the synthetic data:

  • Statistical tests for numeric columns (KS test, Wasserstein distance)

  • Frequency comparison for categorical columns

  • Identifies distribution drift
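
The per-column comparisons named above can be sketched with SciPy and pandas (illustrative thresholds and scoring are up to the real implementation):

```python
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance

def distribution_drift(original: pd.Series, synthetic: pd.Series):
    """Numeric columns: KS statistic and Wasserstein distance
    (lower values mean closer distributions)."""
    ks = ks_2samp(original, synthetic).statistic
    wd = wasserstein_distance(original, synthetic)
    return {"ks_statistic": ks, "wasserstein": wd}

def frequency_drift(original: pd.Series, synthetic: pd.Series) -> float:
    """Categorical columns: total variation distance between
    category frequencies (0 = identical, 1 = disjoint)."""
    p = original.value_counts(normalize=True)
    q = synthetic.value_counts(normalize=True)
    return 0.5 * p.subtract(q, fill_value=0).abs().sum()
```

Identical columns yield zero drift under both measures.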

Text Structure Similarity#

For text columns, calculates sentence, word, and character counts:

  • Compares structural properties of text

  • Ensures text length and complexity are preserved

  • Validates text generation quality
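
A minimal sketch of the counts involved (the sentence splitter here is a naive placeholder; the actual metric may tokenize differently):

```python
import re

def text_structure(texts):
    """Per-corpus averages of sentence, word, and character counts."""
    n = len(texts)
    # Naive sentence split on ., !, ? -- a stand-in for real tokenization.
    sentences = sum(
        len([s for s in re.split(r"[.!?]+", t) if s.strip()]) for t in texts
    ) / n
    words = sum(len(t.split()) for t in texts) / n
    chars = sum(len(t) for t in texts) / n
    return {"sentences": sentences, "words": words, "chars": chars}
```

Comparing these averages between the original and synthetic corpora reveals whether text length and complexity were preserved.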

Text Semantic Similarity#

Assesses whether the semantic meaning of the text is preserved after synthesis:

  • Uses embedding-based similarity measures

  • Captures contextual and semantic properties

  • Ensures text maintains intended meaning
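
In outline, an embedding-based check pairs each original text with its synthetic counterpart and averages their cosine similarity. The `embed` function below is a hypothetical placeholder for whatever sentence-embedding model the evaluator actually uses:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_similarity(original_texts, synthetic_texts, embed):
    """Average pairwise cosine similarity between embeddings.
    `embed` is any text -> vector function (hypothetical here)."""
    sims = [cosine_similarity(embed(o), embed(s))
            for o, s in zip(original_texts, synthetic_texts)]
    return float(np.mean(sims))
```

A score near 1 suggests the synthetic text conveys the same meaning even when the wording differs.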

Data Privacy Score (DPS)#

The DPS assesses privacy protection through attack simulations:

Membership Inference Protection#

Tests whether attackers can determine if specific records were in the training data:

  • Simulates membership inference attacks

  • Measures how distinguishable training records are

  • Higher scores indicate better privacy protection
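
To make the attack concrete, here is a toy distance-based membership test: the attacker flags a candidate as a suspected training record if a synthetic record lies unusually close to it. Real attacks, and the DPS simulation, are considerably more sophisticated; this is only a sketch of the threat model.

```python
import numpy as np

def nearest_distance(record: np.ndarray, data: np.ndarray) -> float:
    """Euclidean distance from a record to its nearest neighbor in data."""
    return float(np.linalg.norm(data - record, axis=1).min())

def membership_attack(candidates, synthetic, threshold):
    """Toy membership inference: claim membership if a very close
    synthetic record exists. A robust synthesizer keeps training
    records indistinguishable under such tests."""
    return [nearest_distance(c, synthetic) < threshold for c in candidates]
```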

Attribute Inference Protection#

Assesses whether sensitive attributes can be inferred when other attributes are known:

  • Tests ability to predict hidden values

  • Measures information leakage

  • Validates that synthesis doesn’t create inference vulnerabilities
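
A toy version of this attack uses the synthetic data itself to guess a hidden attribute from known ones, here via a single nearest neighbor (a sketch of the threat model, not the actual DPS simulation):

```python
import numpy as np

def attribute_inference(known, synthetic_known, synthetic_sensitive):
    """Toy 1-nearest-neighbor attribute inference: predict each record's
    hidden attribute from the most similar synthetic record. DPS measures
    how often such guesses succeed against the synthetic data."""
    idx = [int(np.linalg.norm(synthetic_known - k, axis=1).argmin())
           for k in known]
    return synthetic_sensitive[idx]
```

If these predictions are much more accurate than a baseline guess, the synthetic data is leaking attribute information.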

PII Replay#

Evaluates the frequency with which sensitive values from the original data appear in the synthetic version:

  • Checks for exact matches of PII values

  • Identifies potential memorization

  • Critical for compliance and privacy guarantees
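
The core of this check is an exact-match count, sketched below (illustrative; a production check might also normalize formatting before comparing):

```python
import pandas as pd

def pii_replay_rate(original: pd.Series, synthetic: pd.Series) -> float:
    """Fraction of synthetic values that exactly match an original
    PII value. Nonzero rates signal potential memorization."""
    original_values = set(original.dropna())
    hits = synthetic.dropna().isin(original_values).sum()
    return hits / max(len(synthetic), 1)
```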

Privacy Guarantees and Evaluation#

The Data Privacy Score (DPS) measures empirical privacy through attack simulations and real-world privacy tests. When you enable differential privacy during synthesis, you gain both:

  1. Mathematical privacy guarantees (epsilon/delta bounds from DP) - Formal proof that the algorithm limits information leakage

  2. Empirical privacy measurement (DPS from evaluation) - Practical testing of privacy protection against attacks

These two approaches are complementary:

  • Differential privacy provides worst-case theoretical guarantees regardless of data or model

  • DPS evaluation measures actual privacy in practice for your specific dataset and configuration

Impact of Differential Privacy on Scores:

  • Enabling DP typically improves DPS by reducing memorization and attack success rates

  • Lower epsilon (stronger privacy) generally yields higher DPS scores

  • DP may reduce SQS because of the privacy-utility tradeoff (added noise affects quality)

Interpreting Combined Metrics:

  • High DPS + High SQS = Excellent privacy and utility balance

  • High DPS + Lower SQS = Strong privacy with acceptable quality loss

  • Lower DPS + High SQS = Good utility but consider enabling DP for stronger privacy

For more on differential privacy configuration and privacy-utility tradeoffs, see Data Synthesis and Differential Privacy Deep Dive.

Evaluation Reports#

Every NeMo Safe Synthesizer job automatically generates an HTML evaluation report containing:

  • Overall SQS and DPS scores

  • Detailed subscores for each metric

  • Visualizations comparing original and synthetic data

  • Statistical test results

  • Recommendations for improvement

The report provides both high-level summaries for stakeholders and detailed technical metrics for data scientists.

Configuration#

Evaluation is enabled by default but can be customized:

```json
{
    "evaluation": {
        "mia_enabled": true,
        "aia_enabled": true
    }
}
```

The `mia_enabled` and `aia_enabled` flags toggle the membership inference and attribute inference attack simulations, respectively.

Interpreting Scores#

SQS Interpretation#

  • 90-100: Excellent - synthetic data closely matches original utility

  • 70-89: Good - suitable for most use cases with minor differences

  • 50-69: Fair - noticeable differences, may impact some analyses

  • Below 50: Poor - significant utility loss, review configuration

DPS Interpretation#

  • 90-100: Excellent - strong privacy protection

  • 70-89: Good - adequate privacy for most use cases

  • 50-69: Fair - some privacy risks, consider differential privacy

  • Below 50: Poor - insufficient privacy protection
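
The two band tables above share the same thresholds, so a small helper (illustrative only) can map either score to its label:

```python
def interpret_score(score: float) -> str:
    """Map an SQS or DPS value (0-100) to the bands described above."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```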