API Reference#
This page documents the NMP-specific APIs for the Data Designer service.
For comprehensive documentation on building configurations (column types, constraints, processors, etc.), see the open-source library documentation.
Client#
NeMoDataDesignerClient#
The primary interface to the Data Designer service.
Initialization:
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
# Option 1: From SDK instance
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url="...", workspace="default")
client = NeMoDataDesignerClient(sdk=sdk)
# Option 2: Direct initialization
client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",
    workspace="default"
)
Methods:
preview(config_builder, num_records=10)#
Generate a small sample dataset for fast iteration.
Parameters:
config_builder (DataDesignerConfigBuilder): Configuration defining the dataset
num_records (int, optional): Number of records to generate (default: 10)
Returns: PreviewResults object with:
dataset: pandas DataFrame with generated records
analysis: Statistical analysis of the dataset
display_sample_record(): Method to display a random record
Example:
preview = client.preview(config_builder, num_records=20)
preview.display_sample_record()
df = preview.dataset
create(config_builder, num_records, wait_until_done=False)#
Submit a job to generate a full dataset.
Parameters:
config_builder (DataDesignerConfigBuilder): Configuration defining the dataset
num_records (int): Number of records to generate
wait_until_done (bool, optional): Whether to block until the job completes (default: False)
Returns: DataDesignerJobResults object with:
wait_until_done(): Block until the job completes
load_dataset(): Load the generated dataset as a pandas DataFrame
load_analysis(): Load the statistical analysis
job_id: ID of the submitted job
Example:
job = client.create(config_builder, num_records=1000)
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()
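The two methods are typically used together: preview first, then submit the full job. The following is a minimal sketch of that workflow, not part of the SDK. The wrapper name `preview_then_create` and its default record counts are hypothetical; it relies only on the methods and attributes documented above (`preview`, `create`, `dataset`, `wait_until_done`, `load_dataset`).

```python
def preview_then_create(client, config_builder, preview_records=10, full_records=1000):
    """Hypothetical helper: validate a config with a preview, then run the full job.

    `client` is expected to behave like NeMoDataDesignerClient and
    `config_builder` like DataDesignerConfigBuilder.
    """
    # Generate a small sample first so configuration problems surface quickly.
    preview = client.preview(config_builder, num_records=preview_records)
    if len(preview.dataset) == 0:
        raise RuntimeError("Preview produced no records; fix the configuration first.")

    # Submit the full job and block until it completes.
    job = client.create(config_builder, num_records=full_records)
    job.wait_until_done()

    # Return the generated dataset (a pandas DataFrame per the reference above).
    return job.load_dataset()
```

Calling `preview_then_create(client, config_builder, full_records=5000)` would run a 10-record preview before submitting the 5000-record job.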
Model Provider Configuration#
When defining models in your DataDesignerConfigBuilder, reference NMP model providers:
import data_designer.config as dd
model_config = dd.ModelConfig(
    provider="default/build-nvidia",  # Format: workspace/provider-name
    model="nvidia/nemotron-3-nano-30b-a3b",  # Model name from provider
    alias="text",
    inference_parameters=dd.ChatCompletionInferenceParams(
        temperature=1.0,
        top_p=1.0,
        max_tokens=512,
    ),
)
Provider format:
Fully qualified: workspace/provider-name (recommended)
Implicit workspace: provider-name (uses client’s workspace)
Model name: Must match the model identifier expected by the external provider (e.g., NVIDIA Build, OpenAI).
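To make the two provider formats concrete, here is a small illustrative sketch. The helper `resolve_provider` is hypothetical and not part of the SDK. It assumes that provider names contain no `/` and that the implicit form falls back to the client's workspace, as described above.

```python
def resolve_provider(provider: str, client_workspace: str) -> tuple[str, str]:
    """Hypothetical helper: split a provider reference into (workspace, name)."""
    if not provider:
        raise ValueError("Provider reference must not be empty.")

    workspace, sep, name = provider.partition("/")
    if not sep:
        # Implicit form: only the provider name was given,
        # so the client's workspace is assumed.
        return client_workspace, provider

    if not workspace or not name:
        raise ValueError(f"Malformed provider reference: {provider!r}")

    # Fully qualified form: workspace/provider-name.
    return workspace, name
```

Under these assumptions, `"default/build-nvidia"` always resolves to the `default` workspace, while `"build-nvidia"` resolves to whichever workspace the client was created with. This is one reason the fully qualified form is the safer choice in shared configurations.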
Seed Sources#
FilesetFileSeedSource#
Use data from the NMP Files service as seed data.
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource
seed_source = FilesetFileSeedSource(
    path="default/my-fileset#data.parquet"  # Format: workspace/fileset#file-path
)
config_builder.with_seed_dataset(seed_source)
Path format:
Fully qualified: workspace/fileset-name#file-path (recommended)
Implicit workspace: fileset-name#file-path (uses client’s workspace)
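The path format can be illustrated with a short sketch. The helper `parse_fileset_path` is hypothetical and not part of the SDK. It assumes that `#` separates the fileset reference from the file path, and that fileset names contain no `/`.

```python
def parse_fileset_path(path: str, client_workspace: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a seed path into (workspace, fileset, file_path)."""
    # Everything before the first '#' identifies the fileset;
    # everything after it is the file path inside that fileset.
    fileset_ref, sep, file_path = path.partition("#")
    if not sep or not fileset_ref or not file_path:
        raise ValueError(f"Expected '[workspace/]fileset#file-path', got {path!r}")

    workspace, slash, fileset = fileset_ref.partition("/")
    if not slash:
        # Implicit form: no workspace given, so the client's workspace is assumed.
        return client_workspace, fileset_ref, file_path

    if not workspace or not fileset:
        raise ValueError(f"Malformed fileset reference: {fileset_ref!r}")

    # Fully qualified form: workspace/fileset-name#file-path.
    return workspace, fileset, file_path
```

Because only the part before `#` is split on `/`, a nested file path such as `my-fileset#dir/data.parquet` keeps its directory component intact in this sketch.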
HuggingFaceSeedSource#
Use data from Hugging Face as seed data.
import data_designer.config as dd
seed_source = dd.HuggingFaceSeedSource(
    path="datasets/my-username/my-dataset/data/*.parquet",
    token="default/huggingface-token"  # Reference to NMP secret
)
config_builder.with_seed_dataset(seed_source)
Token: Must reference a secret created via sdk.secrets.create().
Unsupported seed sources:
LocalFileSeedSource (use FilesetFileSeedSource instead)
DataFrameSeedSource (upload to Fileset first)
Configuration Building#
Configuration building uses the open-source library’s API. See the library documentation for details on column types, constraints, and processors.
Example:
import data_designer.config as dd
config_builder = dd.DataDesignerConfigBuilder(model_configs=[...])
# Add columns
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
config_builder.add_column(dd.ExpressionColumnConfig(...))
# Add constraints
config_builder.add_constraint(...)
# Add processors
config_builder.add_processor(...)
# Configure seed data
config_builder.with_seed_dataset(seed_source)
# Build final config
config = config_builder.build()