Metric Job Management#

Note

Performance Tuning: You can improve evaluation performance by setting job.params.parallelism to control the number of concurrent requests. The default is typically 16, but you may need to adjust it based on your model’s capacity and rate limits.
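
For reference, the sketch below shows where the parallelism setting sits in a job specification. The surrounding fields and values are placeholders for the job-creation step earlier in this guide, not a verified schema; only params.parallelism is the setting discussed in the note.

# Hypothetical job specification sketch; field names other than params.parallelism
# are placeholders, not a verified schema.
job_spec = {
    "target": "my-model-target",    # placeholder model target
    "config": "my-metric-config",   # placeholder metric configuration
    "params": {
        "parallelism": 16,          # number of concurrent requests sent to the model
    },
}

Lower the value if the target model hits rate limits; raise it if the endpoint has spare capacity.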

Monitor Job#

Monitor the status of a job.

import time

# Poll the job status every 10 seconds until it leaves a non-terminal state.
job_status = client.evaluation.metric_jobs.get_status(name=job.name)
while job_status.status in ("active", "pending", "created"):
    time.sleep(10)
    job_status = client.evaluation.metric_jobs.get_status(name=job.name)
    print("status:", job_status.status, job_status.status_details)
print(job_status)

Refer to Troubleshooting NeMo Evaluator for help with diagnosing job failures.
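
If the polling loop exits in a non-successful state, printing the final status details is a quick first check before consulting the troubleshooting guide. A minimal sketch, assuming "completed" is the terminal success value (confirm the exact status strings for your service version):

final_status = client.evaluation.metric_jobs.get_status(name=job.name)
if final_status.status != "completed":  # assumed terminal success value
    # status_details typically carries the reason for the failure
    print(f"Job {job.name} ended with status: {final_status.status}")
    print("Details:", final_status.status_details)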

Fetch Job Logs#

Fetch JSON logs with pagination. Logs are available while a job is active and after it terminates.

# Fetch the first page of logs for the metric job.
logs_response = client.evaluation.metric_jobs.get_logs(name=job.name)
for log_entry in logs_response.data:
    print(f"[{log_entry.timestamp}] {log_entry.message.strip()}")

# Handle pagination
while logs_response.next_page:
    logs_response = client.evaluation.metric_jobs.get_logs(
        name=job.name,
        page_cursor=logs_response.next_page
    )
    for log_entry in logs_response.data:
        print(f"[{log_entry.timestamp}] {log_entry.message.strip()}")

View Evaluation Results#

Evaluation results are available once the evaluation job completes successfully. See Metric Results for details on how to fetch them.

Download Job Artifacts#

Files generated during job execution are available for download. Job artifacts are useful for inspecting the details of the evaluation.

# Download the job artifacts as a gzipped tarball.
artifacts_tarball = client.evaluation.metric_jobs.results.artifacts.download(name=job.name, workspace=workspace)
artifacts_tarball.write_to_file("evaluation_artifacts.tar.gz")
print("Saved artifacts to evaluation_artifacts.tar.gz")

Extract the files from the tarball with the following command; an artifacts directory is created.

tar -xf evaluation_artifacts.tar.gz
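
If you prefer to stay in Python, the standard library tarfile module performs the same extraction; the archive filename matches the download step above:

import tarfile

with tarfile.open("evaluation_artifacts.tar.gz") as archive:
    # Extract into the current directory, mirroring the tar command above.
    archive.extractall(path=".")
    # List the extracted members for a quick inspection.
    for member in archive.getnames():
        print(member)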