About Safe Synthesizer
NVIDIA NeMo Safe Synthesizer provides a comprehensive platform for creating private synthetic versions of sensitive tabular datasets. This section covers the core concepts and components that power the Safe Synthesizer capabilities.
The Safe Synthesizer system consists of several main components that work together:
Data Synthesis: Uses LLM-based fine-tuning to generate realistic synthetic data that preserves the statistical properties of the source data while protecting individual privacy. Differential privacy can optionally be applied for formal, mathematical privacy guarantees.
PII Replacement: Detects and replaces personally identifiable information before synthesis using configurable detection methods (GLiNER, LLM classification, regex) and transformation rules.
Evaluation: Assesses synthetic data quality and privacy using metrics that include column correlation stability, distribution analysis, membership inference protection, and attribute inference protection.
Job Management: Orchestrates the complete pipeline from data preparation through synthesis to evaluation, with flexible configuration and monitoring capabilities.
Together, these components enable you to generate privacy-preserving synthetic data that maintains utility for downstream AI tasks and analytics.
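To make the PII Replacement step more concrete, the following is a minimal sketch of the regex-based detection path only, written in plain Python. It does not use the Safe Synthesizer API: the names `PII_PATTERNS`, `replace_pii`, and `replace_pii_in_rows` are hypothetical, and the two patterns are deliberately simple stand-ins rather than the rules the product ships with. GLiNER and LLM classification are not shown.

```python
import re

# Hypothetical rule set: entity label -> compiled pattern.
# These patterns are illustrative assumptions, not Safe Synthesizer defaults.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def replace_pii(text: str) -> str:
    """Replace every regex match with a placeholder naming the entity type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def replace_pii_in_rows(rows, columns):
    """Apply replace_pii to the named string columns of each row.

    rows is a list of dicts (one dict per record); columns is a set of
    column names to scan. Non-string values are passed through unchanged.
    """
    return [
        {
            key: replace_pii(value)
            if key in columns and isinstance(value, str)
            else value
            for key, value in row.items()
        }
        for row in rows
    ]


if __name__ == "__main__":
    sample = [{"id": 1, "note": "Contact jane.doe@example.com, SSN 123-45-6789"}]
    print(replace_pii_in_rows(sample, {"note"}))
```

Placeholders are only one possible transformation rule. A real pipeline might instead substitute realistic fake values so that the replaced columns stay useful for synthesis.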
Core Concepts
Learn about LLM-based synthesis, differential privacy, and tabular fine-tuning for generating synthetic data.
Understand how PII detection and replacement work to protect sensitive information before synthesis.
Learn about the quality and privacy metrics used to assess synthetic data, including the SQS and DPS scores.
Understand the job lifecycle, configuration, and execution for Safe Synthesizer pipelines.
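As a rough illustration of the idea behind a metric such as column correlation stability, the sketch below compares pairwise Pearson correlations in a real dataset against the same correlations in a synthetic one. This is an assumption about the general approach, not the formula Safe Synthesizer uses, which this section does not describe. The functions `pearson` and `correlation_gap` are hypothetical names introduced for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant, since the correlation
    is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_gap(real, synthetic):
    """Mean absolute difference between pairwise column correlations.

    real and synthetic map column name -> list of numbers and must share
    the same column names. 0.0 means the synthetic data reproduces every
    pairwise correlation exactly; larger values mean more drift.
    """
    columns = sorted(real)
    gaps = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            real_corr = pearson(real[a], real[b])
            synth_corr = pearson(synthetic[a], synthetic[b])
            gaps.append(abs(real_corr - synth_corr))
    return sum(gaps) / len(gaps) if gaps else 0.0
```

A gap near zero suggests the synthetic data keeps the relationships between columns that downstream analytics depend on. It says nothing about privacy, which is why the evaluation also reports separate privacy metrics.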