Schema References for mostlyai.sdk.domain ¶
This module is auto-generated to represent pydantic-based classes of the defined schema in the Public API.
mostlyai.sdk.domain ¶
AboutService ¶
General information about the service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
version | str \| None | The version number of the service. | None |
assistant | bool \| None | A flag indicating if the assistant is enabled. | None |
AccountType ¶
The type of account, either a user or an organization.
Accuracy ¶
Metrics regarding the accuracy of synthetic data, measured as the closeness of discretized lower dimensional marginal distributions.
- Univariate Accuracy: The accuracy of the univariate distributions for all target columns.
- Bivariate Accuracy: The accuracy of all pair-wise distributions for target columns, as well as for target columns with respect to the context columns.
- Coherence Accuracy: The accuracy of the auto-correlation for all target columns.
Accuracy is defined as 100% - Total Variation Distance (TVD), where TVD is half the sum of the absolute differences of the relative frequencies of the corresponding distributions.
These accuracies are calculated for all discretized univariate and bivariate distributions and, in the case of sequential data, also for all coherence distributions. Overall metrics are then calculated as the average across these accuracies.
All metrics can be compared against a theoretical maximum accuracy, which is calculated for a same-sized holdout. The accuracy metrics shall be as close as possible to the theoretical maximum, but not significantly higher, as this would indicate overfitting.
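The definition of accuracy as 100% minus TVD can be illustrated with a minimal sketch. It assumes two already-discretized distributions given as relative frequencies per bin; the function names `total_variation_distance` and `accuracy` and the example frequencies are hypothetical and not part of the SDK.

```python
def total_variation_distance(p: dict, q: dict) -> float:
    """Half the sum of absolute differences of the relative frequencies."""
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)


def accuracy(p: dict, q: dict) -> float:
    """Accuracy as 100% - TVD, expressed as a fraction between 0 and 1."""
    return 1.0 - total_variation_distance(p, q)


# Hypothetical relative frequencies of one discretized column.
original = {"a": 0.5, "b": 0.3, "c": 0.2}
synthetic = {"a": 0.4, "b": 0.4, "c": 0.2}

# TVD = 0.5 * (0.1 + 0.1 + 0.0) = 0.1, so accuracy is about 0.9.
univariate_accuracy = accuracy(original, synthetic)
print(f"{univariate_accuracy:.2f}")
```

Averaging such per-distribution values would give the univariate, bivariate, and coherence metrics described above; how the SDK discretizes columns is not shown here.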
Parameters:
Name | Type | Description | Default |
---|---|---|---|
overall | float \| None | Overall accuracy of synthetic data, averaged across univariate, bivariate, and coherence. | None |
univariate | float \| None | Average accuracy of discretized univariate distributions. | None |
bivariate | float \| None | Average accuracy of discretized bivariate distributions. | None |
coherence | float \| None | Average accuracy of discretized coherence distributions. Only applicable for sequential data. | None |
overall_max | float \| None | Expected overall accuracy of a same-sized holdout. Serves as a reference for | None |
univariate_max | float \| None | Expected univariate accuracy of a same-sized holdout. Serves as a reference for | None |
bivariate_max | float \| None | Expected bivariate accuracy of a same-sized holdout. Serves as a reference for | None |
coherence_max | float \| None | Expected coherence accuracy of a same-sized holdout. Serves as a reference for | None |
Compute ¶
A compute resource for executing tasks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str \| None | The unique identifier of a compute resource. Not applicable for SDK. | None |
name | str \| None | The name of a compute resource. | None |
type | ComputeType \| None | | None |
config | dict[str, Any] \| None | | None |
secrets | dict[str, Any] \| None | | None |
resources | ComputeResources \| None | | None |
order_index | int \| None | The index for determining the sort order when listing computes | None |
ComputeConfig ¶
The configuration for creating a new compute resource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of a compute resource. | None |
type | ComputeType \| None | | None |
resources | ComputeResources \| None | | None |
config | dict[str, Any] \| None | | None |
secrets | dict[str, Any] \| None | | None |
order_index | int \| None | The index for determining the sort order when listing computes | None |
ComputeListItem ¶
Essential compute details for listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str \| None | The unique identifier of a compute resource. Not applicable for SDK. | None |
type | ComputeType \| None | | None |
name | str \| None | The name of a compute resource. | None |
resources | ComputeResources \| None | | None |
ComputeResources ¶
A set of available hardware resources for a compute resource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cpus | int \| None | The number of CPU cores | None |
memory | float \| None | The amount of memory in GB | None |
gpus | int \| None | The number of GPUs | 0 |
gpu_memory | float \| None | The amount of GPU memory in GB | 0 |
ComputeType ¶
The type of compute.
Connector ¶
A connector is a connection to a data source or a data destination.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a connector. | required |
name | str \| None | The name of a connector. | None |
type | ConnectorType | | required |
access_type | ConnectorAccessType \| None | | <ConnectorAccessType.read_protected: 'READ_PROTECTED'> |
config | dict[str, Any] \| None | | None |
secrets | dict[str, str] \| None | | None |
ssl | dict[str, str] \| None | | None |
metadata | Metadata \| None | | None |
usage | ConnectorUsage \| None | | None |
table_id | str \| None | Optional. ID of a source table or a synthetic table, that this connector belongs to. If not set, then this connector is managed independently of any generator or synthetic dataset. | None |
delete_data ¶
Delete data from the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The target location within the connector to delete data from. | required |
locations ¶
List connector locations.
List the available databases, schemas, tables, or folders for a connector.
For storage connectors, this returns the list of folders and files at the root or, if a prefix is provided, at the prefix level.
For DB connectors, this returns the list of schemas (or databases for DBs without schemas) or, if a prefix is provided, the list of tables.
The formats of the locations are:
- Cloud storage:
  - AZURE_STORAGE: container/path
  - GOOGLE_CLOUD_STORAGE: bucket/path
  - S3_STORAGE: bucket/path
- Database:
  - BIGQUERY: dataset.table
  - DATABRICKS: schema.table
  - HIVE: database.table
  - MARIADB: database.table
  - MSSQL: schema.table
  - MYSQL: database.table
  - ORACLE: schema.table
  - POSTGRES: schema.table
  - SNOWFLAKE: schema.table
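The location formats listed above differ mainly in their separator: a slash for cloud storage and a dot for databases. As a minimal sketch under that assumption, a location string could be split as follows; `split_location` is a hypothetical helper and not part of the SDK.

```python
# Connector types grouped by the location formats listed above.
CLOUD_STORAGE = {"AZURE_STORAGE", "GOOGLE_CLOUD_STORAGE", "S3_STORAGE"}
DATABASE = {
    "BIGQUERY", "DATABRICKS", "HIVE", "MARIADB", "MSSQL",
    "MYSQL", "ORACLE", "POSTGRES", "SNOWFLAKE",
}


def split_location(connector_type: str, location: str) -> tuple:
    """Split a location into (first part, remainder) for a connector type.

    Hypothetical helper: cloud storage uses "container/path" or "bucket/path",
    databases use "schema.table", "database.table", or "dataset.table".
    """
    if connector_type in CLOUD_STORAGE:
        separator = "/"
    elif connector_type in DATABASE:
        separator = "."
    else:
        raise ValueError(f"unsupported connector type: {connector_type}")
    head, _, tail = location.partition(separator)
    return head, tail


print(split_location("POSTGRES", "public.users"))
print(split_location("S3_STORAGE", "my-bucket/data/users.csv"))
```

Only the first separator is used for splitting, so nested storage paths stay intact in the second element.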
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | The prefix to filter the results by. Defaults to an empty string. | '' |
Returns:
Type | Description |
---|---|
list[str] | A list of locations (schemas, databases, directories, etc.). |
read_data ¶
Retrieve data from the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The target location within the connector to read data from. | required |
limit | int \| None | The maximum number of rows to return. Returns all if not specified. | None |
shuffle | bool \| None | Whether to shuffle the results. | False |
Returns:
Type | Description |
---|---|
DataFrame | A pd.DataFrame containing the retrieved data. |
schema ¶
Retrieve the schema of the table at a connector location.
Please refer to locations() for the format of the location.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The location of the table. | required |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | The retrieved schema. |
update ¶
Update a connector with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the connector. | None |
config | dict[str, Any] \| None | Connector configuration. | None |
secrets | dict[str, str] \| None | Secret values for the connector. | None |
ssl | dict[str, str] \| None | SSL configuration for the connector. | None |
test_connection | bool \| None | If true, validates the connection before saving. | True |
write_data ¶
Write data to the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame \| None | The DataFrame to write, or None to delete the location. | required |
location | str | The target location within the connector to write data to. | required |
if_exists | Literal['append', 'replace', 'fail'] | The behavior if the target location already exists (append, replace, fail). Default is "fail". | 'fail' |
ConnectorAccessType ¶
The access permissions of a connector.
- READ_PROTECTED: The connector is restricted to being used solely as a source for training a generator. Direct data access is not permitted, only schema access is available.
- READ_DATA: This connector allows full read access. It can also be used as a source for training a generator.
- WRITE_DATA: This connector allows full read and write access. It can also be used as a source for training a generator, as well as a destination for delivering a synthetic dataset.
- SOURCE: DEPRECATED - equivalent to READ_PROTECTED
- DESTINATION: DEPRECATED - equivalent to WRITE_DATA
ConnectorConfig ¶
The structures of the config, secrets and ssl parameters depend on the connector type.
- Cloud storage:
    - type: AZURE_STORAGE
      config:
        accountName: string
        clientId: string (required for auth via service principal)
        tenantId: string (required for auth via service principal)
      secrets:
        accountKey: string (required for regular auth)
        clientSecret: string (required for auth via service principal)
    - type: GOOGLE_CLOUD_STORAGE
      config:
      secrets:
        keyFile: string
    - type: S3_STORAGE
      config:
        accessKey: string
        endpointUrl: string (only needed for S3-compatible storage services other than AWS)
        sslEnabled: boolean, default: false
      secrets:
        secretKey: string
      ssl:
        caCertificate: base64-encoded string
- Database:
    - type: BIGQUERY
      config:
      secrets:
        keyFile: string
    - type: DATABRICKS
      config:
        host: string
        httpPath: string
        catalog: string
        clientId: string (required for auth via service principal)
        tenantId: string (required for auth via service principal)
      secrets:
        accessToken: string (required for regular auth)
        clientSecret: string (required for auth via service principal)
    - type: HIVE
      config:
        host: string
        port: integer, default: 10000
        username: string (required for regular auth)
        kerberosEnabled: boolean, default: false
        kerberosServicePrincipal: string (required if kerberosEnabled)
        kerberosClientPrincipal: string (optional if kerberosEnabled)
        kerberosKrb5Conf: string (required if kerberosEnabled)
        sslEnabled: boolean, default: false
      secrets:
        password: string (required for regular auth)
        kerberosKeytab: base64-encoded string (required if kerberosEnabled)
      ssl:
        caCertificate: base64-encoded string
    - type: MARIADB
      config:
        host: string
        port: integer, default: 3306
        username: string
      secrets:
        password: string
    - type: MSSQL
      config:
        host: string
        port: integer, default: 1433
        username: string
        database: string
      secrets:
        password: string
    - type: MYSQL
      config:
        host: string
        port: integer, default: 3306
        username: string
      secrets:
        password: string
    - type: ORACLE
      config:
        host: string
        port: integer, default: 1521
        username: string
        connectionType: enum {SID, SERVICE_NAME}, default: SID
        database: string, default: ORCL
      secrets:
        password: string
    - type: POSTGRES
      config:
        host: string
        port: integer, default: 5432
        username: string
        database: string
        sslEnabled: boolean, default: false
      secrets:
        password: string
      ssl:
        rootCertificate: base64-encoded string
        sslCertificate: base64-encoded string
        sslCertificateKey: base64-encoded string
    - type: SNOWFLAKE
      config:
        account: string
        username: string
        warehouse: string, default: COMPUTE_WH
        database: string
      secrets:
        password: string
    - type: SQLITE
      config:
        database: string
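As an illustration of the structure above, the following sketch assembles a connector configuration for the POSTGRES type as a plain Python dictionary. All values (host, user, database, password placeholder, connector name) are made up for the example, and the snippet only builds and prints the dictionary; how it is handed to the SDK is not shown here.

```python
import json

# Illustrative values only; replace with your own connection details.
postgres_connector = {
    "name": "Example PostgreSQL source",
    "type": "POSTGRES",
    "access_type": "READ_PROTECTED",
    "config": {
        "host": "db.example.com",
        "port": 5432,           # documented default for POSTGRES
        "username": "reader",
        "database": "sales",
        "sslEnabled": False,    # documented default
    },
    "secrets": {
        "password": "<your-password>",
    },
}

print(json.dumps(postgres_connector, indent=2))
```

Note that the top-level keys follow the snake_case parameter names of ConnectorConfig, while the keys inside config, secrets, and ssl use the camelCase names listed per connector type.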
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of a connector. | None |
type | ConnectorType | | required |
access_type | ConnectorAccessType \| None | | <ConnectorAccessType.read_protected: 'READ_PROTECTED'> |
config | dict[str, Any] \| None | | None |
secrets | dict[str, str] \| None | | None |
ssl | dict[str, str] \| None | | None |
ConnectorDeleteDataConfig ¶
Configuration for deleting data from a connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | Specifies the target within the connector to delete. The format of this parameter varies by connector type. | required |
ConnectorListItem ¶
Essential connector details for listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a connector. | required |
name | str \| None | The name of a connector. | None |
type | ConnectorType | | required |
access_type | ConnectorAccessType \| None | | <ConnectorAccessType.read_protected: 'READ_PROTECTED'> |
metadata | Metadata \| None | | None |
usage | ConnectorUsage \| None | | None |
ConnectorReadDataConfig ¶
Configuration for reading data from a connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str \| None | Specifies the target within the connector from which to retrieve the data. The format of this parameter varies by connector type. | None |
limit | int \| None | The maximum number of rows to return. Return all if not specified. | None |
shuffle | bool \| None | Whether to shuffle the results. | False |
ConnectorType ¶
The type of a connector.
The type determines the structure of the config, secrets and ssl parameters.
- MYSQL: MySQL database
- POSTGRES: PostgreSQL database
- MSSQL: Microsoft SQL Server database
- ORACLE: Oracle database
- MARIADB: MariaDB database
- SNOWFLAKE: Snowflake cloud data platform
- BIGQUERY: Google BigQuery cloud data warehouse
- HIVE: Apache Hive database
- DATABRICKS: Databricks cloud data platform
- SQLITE: SQLite database
- AZURE_STORAGE: Azure Blob Storage
- GOOGLE_CLOUD_STORAGE: Google Cloud Storage
- S3_STORAGE: Amazon S3 Storage
- FILE_UPLOAD: File upload
ConnectorUsage ¶
Usage statistics of a connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
no_of_generators | int \| None | Number of generators using this connector. | None |
ConnectorWriteDataConfig ¶
Configuration for writing data to a connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | bytes | Binary Parquet file containing the data to write. | required |
location | str | Specifies the target within the connector to which to write the data. The format of this parameter varies by connector type. | required |
if_exists | IfExists \| None | The behavior if the target location already exists. | None |
Credits ¶
The credit balance and limit for the current time period
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current | float \| None | The credit balance for the current time period | None |
limit | float \| None | The credit limit for the current time period. If empty, then there is no limit. | None |
period_start | datetime \| None | The UTC date and time when the current time period started | None |
period_end | datetime \| None | The UTC date and time when the current time period ends | None |
CurrentUser ¶
Information on the current user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str \| None | The unique identifier of a user. | None |
name | str \| None | The name of a user. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique considering organizations and users. | None |
first_name | str \| None | First name of a user | None |
last_name | str \| None | Last name of a user | None |
email | str \| None | The email of a user | None |
avatar | str \| None | The URL of the user's avatar | None |
settings | dict[str, Any] \| None | | None |
usage | UserUsage \| None | | None |
unread_notifications | int \| None | Number of unread notifications for the user | None |
organizations | list[OrganizationListItem] \| None | The organizations the user belongs to | None |
DifferentialPrivacyConfig ¶
The optional differential privacy configuration for training the model. If not provided, then no differential privacy will be applied.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_epsilon | float \| None | Specifies the maximum allowable epsilon value. If the training process exceeds this threshold, it will be terminated early. Only model checkpoints with epsilon values below this limit will be retained. If not provided, the training will proceed without early termination based on epsilon constraints. | None |
noise_multiplier | float \| None | The ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added (How much noise to add). | 1.5 |
max_grad_norm | float \| None | The maximum norm of the per-sample gradients for training the model with differential privacy. | 1.0 |
delta | float \| None | The delta value for differential privacy. It is the probability of the privacy guarantee not holding. The smaller the delta, the more confident you can be that the privacy guarantee holds. | '1e-5' |
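To make the roles of max_grad_norm and noise_multiplier more concrete, here is a minimal sketch of the general DP-SGD idea: clip each per-sample gradient to a maximum L2 norm, sum the clipped gradients, and add Gaussian noise whose standard deviation is noise_multiplier times max_grad_norm. This illustrates the common technique only and is not taken from the SDK's training code; the function names are hypothetical.

```python
import math
import random


def clip_gradient(grad: list, max_grad_norm: float) -> list:
    """Scale a per-sample gradient so that its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_gradient_sum(per_sample_grads, max_grad_norm=1.0, noise_multiplier=1.5, rng=None):
    """Sum clipped per-sample gradients and add Gaussian noise to each coordinate."""
    rng = rng or random.Random(0)
    clipped = [clip_gradient(g, max_grad_norm) for g in per_sample_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # After clipping, one sample changes the sum by at most max_grad_norm (the
    # L2-sensitivity), so the noise standard deviation is the multiplier times that bound.
    sigma = noise_multiplier * max_grad_norm
    return [s + rng.gauss(0.0, sigma) for s in summed]


grads = [[3.0, 4.0], [0.3, 0.4]]          # hypothetical per-sample gradients
print(clip_gradient(grads[0], 1.0))       # norm 5.0 is scaled down to norm 1.0
print(noisy_gradient_sum(grads))
```

A larger noise_multiplier adds more noise per step, which generally yields a smaller epsilon at the cost of accuracy.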
Distances ¶
Metrics regarding the nearest neighbor distances between training, holdout, and synthetic samples in an embedding space. Useful for assessing the novelty / privacy of synthetic data.
The provided data is first down-sampled so that the number of samples matches across all datasets. Note that, for optimal sensitivity of this privacy assessment, it is recommended to use a 50/50 split between training and holdout data, and then generate synthetic data of the same size.
The embeddings of these samples are then computed, and the L2 nearest neighbor distances are calculated for each synthetic sample to the training and holdout samples. Based on these nearest neighbor distances the following metrics are calculated:
- Identical Match Share (IMS): The share of synthetic samples that are identical to a training or holdout sample.
- Distance to Closest Record (DCR): The average distance of synthetic to training or holdout samples.
For privacy-safe synthetic data we expect to see about as many identical matches, and about the same distances for synthetic samples to training, as we see for synthetic samples to holdout.
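The metrics described above can be sketched on a toy example. The snippet below assumes the embeddings are already available as small 2-D points, treats a distance of exactly zero as an identical match, and counts ties as "not closer to training"; these choices and the function names are assumptions for illustration, not the SDK's implementation.

```python
import math


def nearest_distance(sample, reference) -> float:
    """L2 distance from one sample to its nearest neighbor in the reference set."""
    return min(math.dist(sample, r) for r in reference)


def distance_metrics(synthetic, training, holdout) -> dict:
    """Compute IMS, DCR, and DCR share from nearest-neighbor distances."""
    d_trn = [nearest_distance(s, training) for s in synthetic]
    d_hol = [nearest_distance(s, holdout) for s in synthetic]
    n = len(synthetic)
    return {
        "ims_training": sum(d == 0 for d in d_trn) / n,
        "ims_holdout": sum(d == 0 for d in d_hol) / n,
        "dcr_training": sum(d_trn) / n,
        "dcr_holdout": sum(d_hol) / n,
        "dcr_share": sum(t < h for t, h in zip(d_trn, d_hol)) / n,
    }


# Hypothetical 2-D embeddings, equally sized as recommended above.
training = [(0.0, 0.0), (1.0, 0.0)]
holdout = [(0.0, 1.0), (1.0, 1.0)]
synthetic = [(0.0, 0.0), (1.0, 0.8)]

metrics = distance_metrics(synthetic, training, holdout)
print(metrics)
```

In this toy case one synthetic point copies a training point exactly, so ims_training is well above ims_holdout, which is the kind of asymmetry the comparison against the holdout is meant to reveal.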
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ims_training | float \| None | Share of synthetic samples that are identical to a training sample. | None |
ims_holdout | float \| None | Share of synthetic samples that are identical to a holdout sample. Serves as a reference for | None |
dcr_training | float \| None | Average L2 nearest-neighbor distance between synthetic and training samples. | None |
dcr_holdout | float \| None | Average L2 nearest-neighbor distance between synthetic and holdout samples. Serves as a reference for | None |
dcr_share | float \| None | Share of synthetic samples that are closer to a training sample than to a holdout sample. This should not be significantly larger than 50%. | None |
ErrorEvent ¶
An error event containing an error message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event | Literal[str] \| None | | None |
data | ErrorMessage \| None | | None |
ErrorMessage ¶
An error message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str \| None | The error message | None |
FairnessConfig ¶
Configure a fairness objective for the table. Only applicable for a subject table. The generated synthetic data will maintain robust statistical parity between the target column and the specified sensitive columns. All these columns must be categorical.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_column | str | The name of the target column. | required |
sensitive_columns | list[str] | The names of the sensitive columns. | required |
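Statistical parity between a target column and a sensitive column means that the distribution of the target is roughly the same within every sensitive group. The following sketch measures the largest gap in category shares across groups on a toy table; `parity_gap` and the column names are hypothetical, and the check is a simplified illustration rather than the SDK's fairness procedure.

```python
from collections import Counter, defaultdict


def parity_gap(rows: list, target_column: str, sensitive_column: str) -> float:
    """Largest difference, across sensitive groups, in the share of any target category."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[sensitive_column]][row[target_column]] += 1
    categories = {c for counter in counts.values() for c in counter}
    gap = 0.0
    for category in categories:
        shares = [counter[category] / sum(counter.values()) for counter in counts.values()]
        gap = max(gap, max(shares) - min(shares))
    return gap


# Hypothetical categorical data: "approved" is the target, "gender" is sensitive.
rows = [
    {"gender": "F", "approved": "yes"},
    {"gender": "F", "approved": "no"},
    {"gender": "M", "approved": "yes"},
    {"gender": "M", "approved": "yes"},
]

gap = parity_gap(rows, target_column="approved", sensitive_column="gender")
print(gap)
```

A gap of 0 would indicate identical target distributions in every group; data generated with a fairness objective would be expected to show a gap close to 0.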
FilterByUser ¶
Determines whether to filter usage reports for all users or only the current user.
- ALL: Filter usage reports for all users. Only accessible for SuperAdmins.
- ME: Filter usage reports for the current user.
Generator ¶
A generator is a set of models that can generate synthetic data.
The generator can be trained on one or more source tables. A quality assurance report is generated for each model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a generator. | required |
name | str \| None | The name of a generator. | None |
description | str \| None | The description of a generator. | None |
training_status | ProgressStatus | | required |
training_time | datetime \| None | The UTC date and time when the training has finished. | None |
usage | GeneratorUsage \| None | | None |
metadata | Metadata \| None | | None |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
tables | list[SourceTable] \| None | The tables of this generator | None |
training | Any \| None | | None |
Training ¶
logs ¶
Download the training logs and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the logs. Default is the current working directory. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
progress ¶
Retrieve job progress of training.
Returns:
Name | Type | Description |
---|---|---|
JobProgress | JobProgress | The job progress of the training process. |
wait ¶
Poll training progress and loop until training has completed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | If true, displays the progress bar. Default is True. | True |
interval | float | The interval in seconds to poll the job progress. Default is 2 seconds. | 2 |
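The polling behaviour described for wait can be sketched generically: query the status, stop once it is terminal, otherwise sleep for the interval and try again. The snippet below is a stand-alone illustration with a simulated job; the status names and the `wait_until_done` helper are assumptions and do not reproduce the SDK's implementation.

```python
import time
from typing import Callable

# Assumed terminal status names; the ProgressStatus values are not listed in this section.
TERMINAL_STATUSES = frozenset({"DONE", "FAILED", "CANCELED"})


def wait_until_done(
    get_status: Callable[[], str],
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status repeatedly, sleeping `interval` seconds between polls."""
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)


# Simulated job that reaches a terminal status on the third poll.
statuses = iter(["QUEUED", "IN_PROGRESS", "DONE"])
final_status = wait_until_done(lambda: next(statuses), interval=0.0)
print(final_status)
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.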
clone ¶
Clone the generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
training_status | Literal['new', 'continue'] | The training status of the cloned generator. Default is "new". | 'new' |
Returns:
Name | Type | Description |
---|---|---|
Generator | Generator | The cloned generator object. |
config ¶
Retrieve writable generator properties.
Returns:
Name | Type | Description |
---|---|---|
GeneratorConfig | GeneratorConfig | The generator properties as a configuration object. |
export_to_file ¶
Export generator and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the generator. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
reports ¶
Download or display the quality assurance reports.
If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the zipped reports (ignored if display=True). | None |
display | bool | If True, render the report inline instead of downloading it. | False |
Returns:
Type | Description |
---|---|
Path \| None | The path to the saved file if downloading, or None if display=True. |
update ¶
Update a generator with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the generator. | None |
description | str \| None | The description of the generator. | None |
GeneratorCloneTrainingStatus ¶
The training status of the new generator. The available options are:
- NEW: The new generator will re-use existing data and model configurations.
- CONTINUE: The new generator will re-use existing data and model configurations, as well as model weights.
GeneratorConfig ¶
The configuration for creating a new generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of a generator. | None |
description | str \| None | The description of a generator. | None |
tables | list[SourceTableConfig] \| None | The tables of a generator | None |
GeneratorImportFromFileConfig ¶
Configuration for importing a generator from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | bytes | | required |
GeneratorListItem ¶
Essential generator details for listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a generator. | required |
name | str \| None | The name of a generator. | None |
description | str \| None | The description of a generator. | None |
training_status | ProgressStatus | | required |
training_time | datetime \| None | The UTC date and time when the training has finished. | None |
usage | GeneratorUsage \| None | | None |
metadata | Metadata \| None | | None |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
GeneratorUsage ¶
Usage statistics of a generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
total_datapoints | int \| None | The total number of datapoints generated by this generator. Deprecated: This field is no longer valid and will always return | None |
total_compute_time | int \| None | The total compute time in seconds used for training this generator. This is the sum of the elapsed compute time of all training tasks. | None |
total_credits | float \| None | The amount of credits consumed for training the generator. | None |
total_virtual_cpu_time | float \| None | The total virtual CPU time in seconds used for training this generator. This is the sum of the elapsed time multiplied by number of allocated virtual CPUs across all training tasks. | None |
total_virtual_gpu_time | float \| None | The total virtual GPU time in seconds used for training this generator. This is the sum of the elapsed time multiplied by number of allocated virtual GPUs across all training tasks. | None |
no_of_synthetic_datasets | int \| None | Number of synthetic datasets generated by this generator. | None |
no_of_likes | int \| None | Number of likes of this generator. | None |
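The relationship between the compute time fields can be shown with a small worked example. The task records below are invented for illustration, and the key names are hypothetical; only the arithmetic follows the field descriptions above.

```python
# Hypothetical training tasks: elapsed seconds and allocated virtual CPUs/GPUs.
tasks = [
    {"elapsed": 120.0, "cpus": 4, "gpus": 0},
    {"elapsed": 300.0, "cpus": 8, "gpus": 1},
]

# Sum of the elapsed compute time of all training tasks.
total_compute_time = sum(t["elapsed"] for t in tasks)

# Sum of elapsed time multiplied by the number of allocated virtual CPUs / GPUs.
total_virtual_cpu_time = sum(t["elapsed"] * t["cpus"] for t in tasks)
total_virtual_gpu_time = sum(t["elapsed"] * t["gpus"] for t in tasks)

print(total_compute_time, total_virtual_cpu_time, total_virtual_gpu_time)
```

Here the totals are 420 seconds of compute time, 2880 virtual CPU seconds (120 × 4 + 300 × 8), and 300 virtual GPU seconds.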
HeartbeatEvent ¶
A heartbeat event to keep the connection alive
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event | Literal[str] \| None | | None |
IfExists ¶
The behavior if the target location already exists.
- APPEND: Append the data to the existing target.
- REPLACE: Replace the existing target with the new data.
- FAIL: Fail if the target already exists.
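The three modes can be illustrated with an in-memory stand-in for a target location. The sketch below is not the SDK's write path; `write_rows` and the dictionary-based store are hypothetical and only mirror the semantics listed above.

```python
VALID_MODES = ("append", "replace", "fail")


def write_rows(store: dict, location: str, rows: list, if_exists: str = "fail") -> None:
    """Write rows to an in-memory store, honoring the if_exists mode."""
    if if_exists not in VALID_MODES:
        raise ValueError(f"if_exists must be one of {VALID_MODES}")
    if location not in store or if_exists == "replace":
        store[location] = list(rows)          # new target, or overwrite existing
    elif if_exists == "append":
        store[location].extend(rows)          # keep existing rows, add new ones
    else:                                     # "fail" and the target already exists
        raise FileExistsError(f"target already exists: {location}")


store = {}
write_rows(store, "public.users", [1, 2])                    # creates the target
write_rows(store, "public.users", [3], if_exists="append")   # target holds [1, 2, 3]
write_rows(store, "public.users", [9], if_exists="replace")  # target holds [9]
print(store)
```

With the default mode, a second write to the same location raises an error instead of silently changing existing data.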
ImputationConfig ¶
Configure imputation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | list[str] | The names of the columns to be imputed. Imputed columns will suppress the sampling of NULL values. | required |
JobProgress ¶
The progress of a job.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str \| None | | None |
start_date | datetime \| None | The UTC date and time when the job has started. If the job has not started yet, then this is None. | None |
end_date | datetime \| None | The UTC date and time when the job has ended. If the job has not ended yet, then this is None. | None |
progress | ProgressValue \| None | | None |
status | ProgressStatus \| None | | None |
steps | list[ProgressStep] \| None | | None |
MemberRole ¶
The role of the user in the organization
- VIEWER: The user can view and use all resources of the organization
- CONTRIBUTOR: The user can create new resources for an organization, and becomes resource ADMIN
- ADMIN: The user can manage members and all resources of an organization
MessageEvent ¶
A message event containing an assistant message delta
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event | Literal[str] \| None | | None |
data | AssistantMessageDelta \| None | | None |
MessageStreamEvent ¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | MessageEvent \| HeartbeatEvent \| ErrorEvent | An event in the server-sent event stream | required |
Metadata ¶
The metadata of a resource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
creator_id | str \| None | The unique identifier of a user. | None |
creator_name | str \| None | The name of a user. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique considering organizations and users. | None |
created_at | datetime \| None | The UTC date and time when the resource has been created. | None |
owner_id | str \| None | The unique identifier of an account (either a user or an organization). | None |
owner_name | str \| None | The name of an account (either a user or an organization). | None |
owner_type | AccountType \| None | | None |
owner_image | str \| None | The URL of the account's image. | None |
visibility | Visibility \| None | | None |
current_user_permission_level | PermissionLevel \| None | | None |
current_user_like_status | bool \| None | A boolean indicating whether the user has liked the entity or not | None |
short_lived_file_token | str \| None | An auto-generated short-lived file token ( | None |
ModelConfiguration ¶
The training configuration for the model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str \| None | The model to be used for training. | None |
max_sample_size | int \| None | The maximum number of samples to consider for training. If not provided, then all available samples will be taken. | None |
batch_size | int \| None | The batch size used for training the model. If not provided, batchSize will be chosen automatically. | None |
max_training_time | float \| None | The maximum number of minutes to train the model. | 14400 |
max_epochs | float \| None | The maximum number of epochs to train the model. | 100 |
max_sequence_window | int \| None | The maximum sequence window to consider for training. Only applicable for TABULAR models. | 100 |
enable_flexible_generation | bool \| None | If true, then the trained generator can be used for conditional generation, rebalancing, imputation and fairness. If none of these will be needed, then one can gain extra accuracy by disabling this feature. This will then result in a fixed column order being fed into the training process, rather than a column order, that is randomly permuted for every batch. | True |
value_protection | bool \| None | Defines if Rare Category, Extreme value, or Sequence length protection will be applied. | True |
rare_category_replacement_method | RareCategoryReplacementMethod \| None | Specifies how rare categories will be sampled. Only applicable if value protection has been enabled. | <RareCategoryReplacementMethod.constant: 'CONSTANT'> |
differential_privacy | DifferentialPrivacyConfig \| None | | None |
compute | str \| None | The unique identifier of a compute resource. Not applicable for SDK. | None |
enable_model_report | bool \| None | If false, then the Model report is not generated. | True |
ModelEncodingType ¶
The encoding type used for model training and data generation.
- AUTO: Model chooses among available encoding types based on the column's data type.
- TABULAR_CATEGORICAL: Model samples from existing (non-rare) categories.
- TABULAR_NUMERIC_AUTO: Model chooses among 3 numeric encoding types based on the values.
- TABULAR_NUMERIC_DISCRETE: Model samples from existing discrete numerical values.
- TABULAR_NUMERIC_BINNED: Model samples from binned buckets, to then sample randomly within a bucket.
- TABULAR_NUMERIC_DIGIT: Model samples each digit of a numerical value.
- TABULAR_CHARACTER: Model samples each character of a string value.
- TABULAR_DATETIME: Model samples each part of a datetime value.
- TABULAR_DATETIME_RELATIVE: Model samples the relative difference between datetimes within a sequence.
- TABULAR_LAT_LONG: Model samples a latitude-longitude column. The format is "latitude,longitude".
- LANGUAGE_TEXT: Model will sample free text, using a LANGUAGE model.
- LANGUAGE_CATEGORICAL: Model samples from existing (non-rare) categories, using a LANGUAGE model.
- LANGUAGE_NUMERIC: Model samples from the valid numeric value range, using a LANGUAGE model.
- LANGUAGE_DATETIME: Model samples from the valid datetime value range, using a LANGUAGE model.
ModelMetrics ¶
Metrics regarding the quality of synthetic data, measured in terms of accuracy, similarity, and distances.
- Accuracy: Metrics regarding the accuracy of synthetic data, measured as the closeness of discretized lower dimensional marginal distributions.
- Similarity: Metrics regarding the similarity of the full joint distributions of samples within an embedding space.
- Distances: Metrics regarding the nearest neighbor distances between training, holdout, and synthetic samples in an embedding space. Useful for assessing the novelty / privacy of synthetic data.
The quality of synthetic data is assessed by comparing these metrics to the same metrics of a holdout dataset. The holdout dataset is a subset of the original training data, that was not used for training the synthetic data generator. The metrics of the synthetic data should be as close as possible to the metrics of the holdout data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
accuracy
|
Accuracy | None
|
|
None
|
distances
|
Distances | None
|
|
None
|
similarity
|
Similarity | None
|
|
None
|
ModelType ¶
The type of model.
- TABULAR: A generative AI model tailored towards tabular data, trained from scratch.
- LANGUAGE: A generative AI model built upon a (pre-trained) language model.
Notification ¶
A notification for a user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of the notification. |
required |
type
|
NotificationType
|
|
required |
message
|
str
|
The message of the notification. |
required |
status
|
NotificationStatus
|
|
required |
created_at
|
datetime
|
The UTC date and time when the notification has been created. |
required |
resource_uri
|
str | None
|
The service URI of the entity |
None
|
NotificationStatus ¶
The status of the notification.
NotificationType ¶
The type of the notification
Organization ¶
An organization that owns resources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of an organization. |
required |
name
|
str
|
The name of an organization. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique. |
required |
display_name
|
str
|
The display name of an organization. |
required |
description
|
str | None
|
The description of an organization. Supports markdown. |
None
|
logo
|
str | None
|
The URL of the organization's logo. |
None
|
email
|
str | None
|
The email address of the organization. |
None
|
website
|
str | None
|
The URL of the organization's website. |
None
|
members
|
list[UserListItem] | None
|
|
None
|
metadata
|
OrganizationMetadata | None
|
|
None
|
OrganizationConfig ¶
The configuration for creating a new organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of an organization. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique. |
required |
display_name
|
str
|
The display name of an organization. |
required |
description
|
str | None
|
The description of an organization. Supports markdown. |
None
|
logo_base64
|
str | None
|
The base64-encoded image of the organization's logo. |
None
|
email
|
str | None
|
The email address of the organization. |
None
|
website
|
str | None
|
The URL of the organization's website. |
None
|
OrganizationInvite ¶
A non-personalized time-boxed invite to join an organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token
|
str | None
|
The generated token, encrypting organization, expiration timestamp, and role (VIEW). |
None
|
link
|
str | None
|
The generated invite link. |
None
|
expiration_date
|
datetime | None
|
The expiration date of the invite link. 72 hours after creation. |
None
|
organization_id
|
str | None
|
The unique identifier of an organization. |
None
|
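The `expiration_date` description states that an invite expires 72 hours after creation. A minimal client-side sketch of that rule follows; the helper names `invite_expiration` and `is_invite_expired` are hypothetical and not part of the SDK, and the service remains the authority on whether an invite is still valid.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetime taken from the expiration_date description (72 hours after creation).
INVITE_LIFETIME = timedelta(hours=72)


def invite_expiration(created_at: datetime) -> datetime:
    """Return the expiry timestamp for an invite created at `created_at`."""
    return created_at + INVITE_LIFETIME


def is_invite_expired(expiration_date: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once `now` (default: current UTC time) has reached the expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= expiration_date


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = invite_expiration(created)
print(expires.isoformat())  # 2024-01-04T12:00:00+00:00
```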
OrganizationListItem ¶
Essential organization details for listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of an organization. |
required |
name
|
str | None
|
The name of an organization. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique. |
None
|
display_name
|
str
|
The display name of an organization. |
required |
description
|
str | None
|
The description of an organization. Supports markdown. |
None
|
logo
|
str | None
|
The URL of the organization's logo. |
None
|
metadata
|
OrganizationMetadata | None
|
|
None
|
OrganizationMember ¶
A member of an organization with their associated role.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
user
|
UserListItem | None
|
|
None
|
role
|
MemberRole | None
|
|
None
|
OrganizationMetadata ¶
The metadata of an organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current_user_member_role
|
MemberRole | None
|
|
None
|
PaginatedTotalCount ¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root
|
int
|
The total number of entities within the list |
required |
ParallelGenerationJobs ¶
The number of currently running generation jobs and the limit
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current
|
int | None
|
The number of currently running generation jobs. |
None
|
limit
|
int | None
|
The maximum number of running generation jobs at any time. If empty, then there is no limit. |
None
|
ParallelTrainingJobs ¶
The number of currently running training jobs and the limit
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current
|
int | None
|
The number of currently running training jobs |
None
|
limit
|
int | None
|
The maximum number of running training jobs at any time. If empty, then there is no limit. |
None
|
PermissionLevel ¶
The permission level of the user with respect to this resource
- VIEW: The user can view and use the resource.
- ADMIN: The user can edit, delete and transfer ownership of the resource.
Probe ¶
The generated synthetic samples returned as a result of the probe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str | None
|
The name of the table. |
None
|
rows
|
list[dict[str, Any]] | None
|
An array of sample data objects. |
None
|
ProgressStatus ¶
The status of a job or a step.
- NEW: The job/step is being configured, and has not started yet.
- CONTINUE: The job/step is being configured, but has existing artefacts.
- ON_HOLD: The job/step has been started, but is kept on hold.
- QUEUED: The job/step has been started, and is waiting for resources to execute.
- IN_PROGRESS: The job/step is currently running.
- DONE: The job/step has finished successfully.
- FAILED: The job/step has failed.
- CANCELED: The job/step has been canceled.
ProgressStep ¶
The progress of a step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str | None
|
The unique identifier of the step. |
None
|
model_label
|
str | None
|
The unique label for the model, consisting of table name and a suffix for the model type. This will be empty for steps that are not related to a model. |
None
|
compute_name
|
str | None
|
The name of a compute resource. |
None
|
restarts
|
int | None
|
The number of previous restarts for the corresponding task. |
0
|
task_type
|
TaskType | None
|
|
None
|
step_code
|
StepCode | None
|
|
None
|
start_date
|
datetime | None
|
The UTC date and time when the job has started. If the job has not started yet, then this is None. |
None
|
end_date
|
datetime | None
|
The UTC date and time when the job has ended. If the job is still running, then this is None. |
None
|
compute_resources
|
ComputeResources | None
|
|
None
|
messages
|
list[dict[str, Any]] | None
|
|
None
|
error_message
|
str | None
|
|
None
|
progress
|
ProgressValue | None
|
|
None
|
status
|
ProgressStatus | None
|
|
None
|
ProgressValue ¶
The progress of a job or a step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value
|
int | None
|
|
0
|
max
|
int | None
|
|
1
|
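The table above gives `value` and `max` no description. Assuming they are meant to be read as "units completed" out of "units in total", a completion fraction could be derived as sketched below. `progress_fraction` is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def progress_fraction(value: Optional[int] = 0, max_value: Optional[int] = 1) -> float:
    """Fraction completed in [0, 1]; None falls back to the documented defaults (0 and 1)."""
    value = 0 if value is None else value
    max_value = 1 if max_value is None else max_value
    if max_value <= 0:
        # Guard against division by zero for a degenerate maximum.
        return 0.0
    # Clamp so that unexpected values never report below 0% or above 100%.
    return min(max(value / max_value, 0.0), 1.0)


print(f"{progress_fraction(3, 12):.0%}")  # 25%
```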
RareCategoryReplacementMethod ¶
Specifies how rare categories will be sampled. Only applicable if value protection has been enabled.
- CONSTANT: Replace rare categories by a constant _RARE_ token.
- SAMPLE: Replace rare categories by a sample from non-rare categories.
RebalancingConfig ¶
Configure rebalancing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column
|
str
|
The name of the column to be rebalanced. Only applicable for a subject table. Only applicable for categorical columns. |
required |
probabilities
|
dict[str, float]
|
The target distribution of sample values. The keys are the categorical values, and the values are the probabilities. |
required |
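As an illustration of the two fields above, the following sketch assembles a rebalancing payload as a plain dictionary. The helper `build_rebalancing`, the column name `income`, and its categories are hypothetical. The sanity checks (each probability in [0, 1], total not above 1) are assumptions about what a sensible target distribution looks like, not rules confirmed by this reference.

```python
import math


def build_rebalancing(column: str, probabilities: dict[str, float]) -> dict:
    """Assemble a rebalancing payload after basic sanity checks (assumed rules)."""
    if not column:
        raise ValueError("column must be a non-empty column name")
    for category, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {category!r} must be in [0, 1], got {p}")
    total = sum(probabilities.values())
    # Allow for floating-point rounding when the total is meant to be exactly 1.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, which exceeds 1")
    return {"column": column, "probabilities": dict(probabilities)}


# Keys are categorical values of the column, values are target probabilities.
rebalancing = build_rebalancing("income", {">50K": 0.5, "<=50K": 0.5})
```

Such a dictionary mirrors the field names of `RebalancingConfig`; whether it can be passed as-is depends on the client being used.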
SetVisibilityConfig ¶
Configuration for setting the visibility of a resource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
visibility
|
Visibility | None
|
|
None
|
Similarity ¶
Metrics regarding the similarity of the full joint distributions of samples within an embedding space.
- Cosine Similarity: The cosine similarity between the centroids of synthetic and training samples.
- Discriminator AUC: The AUC of a discriminative model to distinguish between synthetic and training samples.
The SentenceTransformer model all-MiniLM-L6-v2 is used to compute the embeddings of a string-ified representation of individual records. In case of sequential data, the records that belong to the same group are concatenated. We then calculate the cosine similarity between the centroids of the provided datasets within the embedding space.
Again, we expect the similarity metrics to be as close as possible to 1, but not significantly higher than what is measured for the holdout data, as this would again indicate overfitting.
In addition, a discriminative ML model is trained to distinguish between training and synthetic samples. The ability of this model to distinguish between training and synthetic samples is measured by the AUC score. For synthetic data to be considered realistic, the AUC score should be close to 0.5, which indicates that the synthetic data is indistinguishable from the training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cosine_similarity_training_synthetic
|
float | None
|
Cosine similarity between training and synthetic centroids. |
None
|
cosine_similarity_training_holdout
|
float | None
|
Cosine similarity between training and holdout centroids.
Serves as a reference for |
None
|
discriminator_auc_training_synthetic
|
float | None
|
Cross-validated AUC of a discriminative model to distinguish between training and synthetic samples. |
None
|
discriminator_auc_training_holdout
|
float | None
|
Cross-validated AUC of a discriminative model to distinguish between training and holdout samples.
Serves as a reference for |
None
|
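The centroid-based cosine similarity described above can be sketched in a few lines of plain Python. The two-dimensional vectors below are toy stand-ins for real embeddings (the reference states that all-MiniLM-L6-v2 embeddings are used), and the helper names are hypothetical.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": one row per record.
training = [[1.0, 0.0], [0.0, 1.0]]
synthetic = [[0.9, 0.1], [0.1, 0.9]]
holdout = [[0.8, 0.2], [0.3, 0.7]]

sim_training_synthetic = cosine_similarity(centroid(training), centroid(synthetic))
sim_training_holdout = cosine_similarity(centroid(training), centroid(holdout))
```

The second value plays the role of the holdout reference: the synthetic score is read relative to it rather than in isolation.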
SourceColumn ¶
A column as part of a source table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of a source column. |
required |
name
|
str
|
The name of a source column. It must be unique within a source table. |
required |
included
|
bool | None
|
If true, the column will be included in the training. If false, the column will be excluded from the training. |
None
|
model_encoding_type
|
ModelEncodingType
|
|
required |
value_range
|
SourceColumnValueRange | None
|
|
None
|
SourceColumnConfig ¶
The configuration for a source column when creating a new generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of a source column. It must be unique within a source table. |
required |
model_encoding_type
|
ModelEncodingType | None
|
|
<ModelEncodingType.auto: 'AUTO'>
|
SourceColumnValueRange ¶
The (privacy-safe) range of values detected within a source column. These values can then be used as seed values for conditional generation. For CATEGORICAL and NUMERIC_DISCRETE encoding types, this will be given as a list of unique values, sorted by popularity. For other NUMERIC and for DATETIME encoding types, this will be given as a min and max value. Note that this property will only be populated once the analysis step for the training of the generator has been completed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min
|
str | None
|
The minimum value of the column. For dates, this is represented in ISO format. |
None
|
max
|
str | None
|
The maximum value of the column. For dates, this is represented in ISO format. |
None
|
values
|
list[str] | None
|
The list of distinct values of the column. Limited to a maximum of 1000 values. |
None
|
has_null
|
bool | None
|
If true, null value was detected within the column. |
None
|
SourceForeignKey ¶
A foreign key relationship in a source table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of a foreign key. |
required |
column
|
str
|
The column name of a foreign key. |
required |
referenced_table
|
str
|
The table name of the referenced table. That table must have a primary key already defined. |
required |
is_context
|
bool
|
If true, then the foreign key will be considered as a context relation. Note, that only one foreign key relation per table can be a context relation. |
required |
SourceForeignKeyConfig ¶
Configuration for defining a foreign key relationship in a source table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column
|
str
|
The column name of a foreign key. |
required |
referenced_table
|
str
|
The table name of the referenced table. That table must have a primary key already defined. |
required |
is_context
|
bool
|
If true, then the foreign key will be considered as a context relation. Note, that only one foreign key relation per table can be a context relation. |
required |
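The `is_context` description notes that only one foreign key per table can be a context relation. A pre-flight check for that rule might look like the sketch below, which works on plain dictionaries mirroring the fields above. `validate_foreign_keys` and the table and column names are hypothetical, and the service may apply further validation of its own.

```python
def validate_foreign_keys(foreign_keys: list[dict]) -> None:
    """Raise ValueError if more than one foreign key is flagged as a context relation."""
    context_columns = [fk["column"] for fk in foreign_keys if fk["is_context"]]
    if len(context_columns) > 1:
        raise ValueError(
            "only one foreign key per table can be a context relation, "
            f"got {len(context_columns)}: {context_columns}"
        )


foreign_keys = [
    {"column": "player_id", "referenced_table": "players", "is_context": True},
    {"column": "team_id", "referenced_table": "teams", "is_context": False},
]
validate_foreign_keys(foreign_keys)  # passes: exactly one context relation
```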
SourceTable ¶
A table as part of a generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of a source table. |
required |
source_connector_id
|
str | None
|
The unique identifier of a connector. |
None
|
location
|
str | None
|
The location of a source table. Together with the source connector it uniquely identifies a source, and samples data from there. |
None
|
name
|
str
|
The name of a source table. It must be unique within a generator. |
required |
primary_key
|
str | None
|
The column name of the primary key. |
None
|
columns
|
list[SourceColumn] | None
|
The columns of this generator table. |
None
|
foreign_keys
|
list[SourceForeignKey] | None
|
The foreign keys of a table. |
None
|
tabular_model_metrics
|
ModelMetrics | None
|
|
None
|
language_model_metrics
|
ModelMetrics | None
|
|
None
|
tabular_model_configuration
|
ModelConfiguration | None
|
|
None
|
language_model_configuration
|
ModelConfiguration | None
|
|
None
|
total_rows
|
int | None
|
The total number of rows in the source table while fetching data for training. |
None
|
SourceTableAddConfig ¶
Configuration for adding a new source table to a generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_connector_id
|
str
|
The unique identifier of a connector. |
required |
location
|
str
|
The location of a source table. Together with the source connector it uniquely identifies a source, and samples data from there. |
required |
name
|
str | None
|
The name of a source table. It must be unique within a generator. |
None
|
include_children
|
bool | None
|
If true, all tables that are referenced by foreign keys will be included. If false, only the selected table will be included. |
None
|
tabular_model_configuration
|
ModelConfiguration | None
|
|
None
|
language_model_configuration
|
ModelConfiguration | None
|
|
None
|
SourceTableConfig ¶
The configuration for a source table when creating a new generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of a source table. It must be unique within a generator. |
required |
source_connector_id
|
str | None
|
The unique identifier of a connector. |
None
|
location
|
str | None
|
The location of a source table. Together with the source connector it uniquely identifies a source, and samples data from there. |
None
|
data
|
str | None
|
The base64-encoded string derived from a Parquet file containing the specified source table. |
None
|
tabular_model_configuration
|
ModelConfiguration | None
|
|
None
|
language_model_configuration
|
ModelConfiguration | None
|
|
None
|
primary_key
|
str | None
|
The column name of the primary key. |
None
|
foreign_keys
|
list[SourceForeignKeyConfig] | None
|
The foreign key configurations of this table. |
None
|
columns
|
list[SourceColumnConfig] | None
|
The column configurations of this table. |
None
|
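The `data` field expects a base64-encoded string derived from a Parquet file. The sketch below shows the encoding step with the standard library only. `encode_file_base64` and `build_source_table_config` are hypothetical helpers, and the file written in the demo contains placeholder bytes rather than a real Parquet file.

```python
import base64
import tempfile
from pathlib import Path
from typing import Optional, Union


def encode_file_base64(path: Union[str, Path]) -> str:
    """Read a file and return its content as a base64-encoded ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_source_table_config(
    name: str, parquet_path: Union[str, Path], primary_key: Optional[str] = None
) -> dict:
    """Assemble a dictionary that mirrors the SourceTableConfig fields used here."""
    config = {"name": name, "data": encode_file_base64(parquet_path)}
    if primary_key is not None:
        config["primary_key"] = primary_key
    return config


# Demo with placeholder bytes; in practice the file would be a real Parquet file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "players.parquet"
    sample.write_bytes(b"PAR1 placeholder bytes, not a real Parquet file")
    table_config = build_source_table_config("players", sample, primary_key="id")
```

Supplying `data` is an alternative to pointing at a source via `source_connector_id` and `location`.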
StepCode ¶
The unique code for the step.
SyntheticDataset ¶
A synthetic dataset is created based on a trained generator.
It consists of synthetic samples, as well as a quality assurance report.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of a synthetic dataset. |
required |
generator_id
|
str | None
|
The unique identifier of a generator. |
None
|
metadata
|
Metadata | None
|
|
None
|
name
|
str | None
|
The name of a synthetic dataset. |
None
|
description
|
str | None
|
The description of a synthetic dataset. |
None
|
generation_status
|
ProgressStatus
|
|
required |
generation_time
|
datetime | None
|
The UTC date and time when the generation has finished. |
None
|
tables
|
list[SyntheticTable] | None
|
The tables of this synthetic dataset. |
None
|
delivery
|
SyntheticDatasetDelivery | None
|
|
None
|
accuracy
|
float | None
|
The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. |
None
|
usage
|
SyntheticDatasetUsage | None
|
|
None
|
compute
|
str | None
|
The unique identifier of a compute resource. Not applicable for SDK. |
None
|
generation
|
Any | None
|
|
None
|
Generation ¶
logs ¶
Download the generation logs and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path
|
str | Path | None
|
The file path to save the logs. Default is the current working directory. |
None
|
Returns:
Name | Type | Description |
---|---|---|
Path |
Path
|
The path to the saved file. |
progress ¶
Retrieve the progress of the generation process.
Returns:
Name | Type | Description |
---|---|---|
JobProgress |
JobProgress
|
The progress of the generation process. |
wait ¶
Poll the generation progress and wait until the process is complete.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar
|
bool
|
If true, displays a progress bar. Default is True. |
True
|
interval
|
float
|
Interval in seconds to poll the job progress. Default is 2 seconds. |
2
|
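The polling behaviour described for `wait` can be pictured as the loop below. This is a sketch of the general pattern, not the SDK's implementation: `wait_until_complete` and `fetch_status` are hypothetical, and treating DONE, FAILED and CANCELED as the terminal states is an assumption based on the `ProgressStatus` values listed in this reference.

```python
import time
from typing import Callable

# Assumed terminal states, taken from the ProgressStatus values.
TERMINAL_STATUSES = {"DONE", "FAILED", "CANCELED"}


def wait_until_complete(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` every `interval` seconds until a terminal status is seen."""
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)


# Demo with a canned sequence of statuses instead of a real service call.
statuses = iter(["QUEUED", "IN_PROGRESS", "IN_PROGRESS", "DONE"])
final_status = wait_until_complete(lambda: next(statuses), interval=0.0)
print(final_status)  # DONE
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.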
config ¶
Retrieve writable synthetic dataset properties.
Returns:
Name | Type | Description |
---|---|---|
SyntheticDatasetConfig |
SyntheticDatasetConfig
|
The synthetic dataset properties as a configuration object. |
data ¶
Download synthetic dataset and return as dictionary of pandas DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
return_type
|
Literal['auto', 'dict']
|
The format of the returned data. Default is "auto". |
'auto'
|
Returns:
Type | Description |
---|---|
DataFrame | dict[str, DataFrame]
|
Union[pd.DataFrame, dict[str, pd.DataFrame]]: The synthetic dataset as a dictionary of pandas DataFrames. |
download ¶
Download synthetic dataset and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path
|
str | Path | None
|
The file path to save the synthetic dataset. |
None
|
format
|
Literal['parquet', 'csv', 'json']
|
The format of the synthetic dataset. Default is "parquet". |
'parquet'
|
Returns:
Name | Type | Description |
---|---|---|
Path |
Path
|
The path to the saved file. |
reports ¶
Download or display the quality assurance reports.
If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path
|
str | Path | None
|
The file path to save the zipped reports (ignored if display=True). |
None
|
display
|
bool
|
If True, render the report inline instead of downloading it. |
False
|
Returns:
Type | Description |
---|---|
Path | None
|
Path | None: The path to the saved file if downloading, or None if display=True. |
update ¶
Update a synthetic dataset with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str | None
|
The name of the synthetic dataset. |
None
|
description
|
str | None
|
The description of the synthetic dataset. |
None
|
delivery
|
SyntheticDatasetDelivery | None
|
The delivery configuration for the synthetic dataset. |
None
|
SyntheticDatasetConfig ¶
The configuration for creating a new synthetic dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator_id
|
str | None
|
The unique identifier of a generator. |
None
|
name
|
str | None
|
The name of a synthetic dataset. |
None
|
description
|
str | None
|
The description of a synthetic dataset. |
None
|
tables
|
list[SyntheticTableConfig] | None
|
|
None
|
delivery
|
SyntheticDatasetDelivery | None
|
|
None
|
compute
|
str | None
|
The unique identifier of a compute resource. Not applicable for SDK. |
None
|
SyntheticDatasetDelivery ¶
Configuration for delivering a synthetic dataset to a destination.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
overwrite_tables
|
bool
|
If true, tables in the destination will be overwritten. If false and any of the tables already exist, the delivery will fail. |
required |
destination_connector_id
|
str
|
The unique identifier of a connector. |
required |
location
|
str
|
The location for the destination connector. |
required |
SyntheticDatasetListItem ¶
Essential synthetic dataset details for listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str
|
The unique identifier of a synthetic dataset. |
required |
metadata
|
Metadata | None
|
|
None
|
name
|
str | None
|
The name of a synthetic dataset. |
None
|
description
|
str | None
|
The description of a synthetic dataset. |
None
|
generation_status
|
ProgressStatus
|
|
required |
generation_time
|
datetime | None
|
The UTC date and time when the generation has finished. |
None
|
usage
|
SyntheticDatasetUsage | None
|
|
None
|
SyntheticDatasetUsage ¶
Usage statistics of a synthetic dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
total_datapoints
|
int | None
|
The number of datapoints in the synthetic dataset.
Deprecated: This field is no longer valid and will always return |
None
|
total_compute_time
|
int | None
|
The total compute time in seconds used for generating this synthetic dataset. This is the sum of the compute time of all trained tasks. |
None
|
total_credits
|
float | None
|
The amount of credits consumed for generating the synthetic dataset. |
None
|
total_virtual_cpu_time
|
float | None
|
The total virtual CPU time in seconds used for training this generator. This is the sum of the elapsed time multiplied by number of allocated virtual CPUs across all training tasks. |
None
|
total_virtual_gpu_time
|
float | None
|
The total virtual GPU time in seconds used for training this generator. This is the sum of the elapsed time multiplied by number of allocated virtual GPUs across all training tasks. |
None
|
no_of_likes
|
int | None
|
Number of likes of this synthetic dataset. |
None
|
SyntheticProbeConfig ¶
The configuration for probing for new synthetic samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator_id
|
str | None
|
The unique identifier of a generator. |
None
|
tables
|
list[SyntheticTableConfig] | None
|
|
None
|
SyntheticTable ¶
A synthetic table that will be generated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str | None
|
The unique identifier of a synthetic table. |
None
|
name
|
str
|
The name of a source table. It must be unique within a generator. |
required |
configuration
|
SyntheticTableConfiguration | None
|
|
None
|
tabular_model_metrics
|
ModelMetrics | None
|
|
None
|
language_model_metrics
|
ModelMetrics | None
|
|
None
|
foreign_keys
|
list[SourceForeignKey] | None
|
The foreign keys of a table. |
None
|
total_rows
|
int | None
|
The total number of rows for that table in the generated synthetic dataset. |
None
|
total_datapoints
|
int | None
|
Deprecated: This field is no longer valid and will always return |
None
|
source_table_total_rows
|
int | None
|
The total number of rows in the source table while fetching data for training. |
None
|
SyntheticTableConfig ¶
The configuration for a synthetic table when creating a new synthetic dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of a synthetic table. This matches the name of a corresponding SourceTable. |
required |
configuration
|
SyntheticTableConfiguration | None
|
|
None
|
SyntheticTableConfiguration ¶
The sample configuration for a synthetic table
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_size
|
int | None
|
Number of generated samples. Only applicable for subject tables. If neither size nor seed is provided, then the default behavior for Synthetic Datasets is to generate the same size of samples as the original, and the default behavior for Synthetic Probes is to generate one subject only. |
None
|
sample_seed_connector_id
|
str | None
|
The connector id of the seed data for conditional generation. Only applicable for subject tables. |
None
|
sample_seed_dict
|
str | None
|
The base64-encoded string derived from a json line file containing the specified sample seed data. This allows conditional live probing via non-python clients. Only applicable for subject tables. |
None
|
sample_seed_data
|
str | None
|
The base64-encoded string derived from a Parquet file containing the specified sample seed data. This allows conditional generation as well as live probing via python clients. Only applicable for subject tables. |
None
|
sampling_temperature
|
float | None
|
The temperature used for sampling. |
None
|
sampling_top_p
|
float | None
|
The top-p value used for sampling. |
None
|
rebalancing
|
RebalancingConfig | None
|
|
None
|
imputation
|
ImputationConfig | None
|
|
None
|
fairness
|
FairnessConfig | None
|
|
None
|
enable_data_report
|
bool | None
|
If false, then the Data report is not generated. If enableDataReport is set to false on generator, then enableDataReport is automatically set to false. |
True
|
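The `sample_seed_dict` field is described as a base64-encoded string derived from a JSON lines file. The sketch below shows one way to produce and decode such a string with the standard library. The helper names, the `country` seed column, and the sampling values are hypothetical; the dictionary keys mirror the field names listed above.

```python
import base64
import json


def encode_seed_rows(rows: list[dict]) -> str:
    """Serialize rows as JSON lines (one object per line) and base64-encode the result."""
    json_lines = "\n".join(json.dumps(row) for row in rows)
    return base64.b64encode(json_lines.encode("utf-8")).decode("ascii")


def decode_seed_rows(encoded: str) -> list[dict]:
    """Inverse of encode_seed_rows, useful for checking a payload before sending it."""
    text = base64.b64decode(encoded).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Seed rows fix the values of selected columns for conditional generation.
seed_rows = [{"country": "AT"}, {"country": "DE"}]
configuration = {
    "sample_seed_dict": encode_seed_rows(seed_rows),
    "sampling_temperature": 1.0,
    "sampling_top_p": 1.0,
}
```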
TaskType ¶
The type of the task.
TransferOwnershipConfig ¶
The configuration for transferring ownership of a resource to an account.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
account_id
|
str | None
|
The unique identifier of an account (either a user or an organization). |
None
|
User ¶
The public attributes of a user of the service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str | None
|
The unique identifier of a user. |
None
|
name
|
str | None
|
The name of a user. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique considering organizations and users. |
None
|
first_name
|
str | None
|
First name of a user |
None
|
last_name
|
str | None
|
Last name of a user |
None
|
avatar
|
str | None
|
The URL of the user's avatar |
None
|
organizations
|
list[OrganizationListItem] | None
|
The organizations the user belongs to |
None
|
UserListItem ¶
Essential information about a user for public listings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id
|
str | None
|
The unique identifier of a user. |
None
|
name
|
str | None
|
The name of a user. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique considering organizations and users. |
None
|
first_name
|
str | None
|
First name of a user |
None
|
last_name
|
str | None
|
Last name of a user |
None
|
avatar
|
str | None
|
The URL of the user's avatar |
None
|
UserSettingsAssistantUpdateConfig ¶
Configuration for updating a user's assistant-related settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
about_user_message
|
str | None
|
Instructions on what the Assistant should know about the user to provide better responses. |
None
|
about_model_message
|
str | None
|
Instructions on how the Assistant should respond. |
None
|
UserSettingsProfileUpdateConfig ¶
Configuration for updating a user's profile settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str | None
|
The name of a user. Contains only alphanumeric characters, hyphens, and underscores. Must start or end with alphanumeric. It must be globally case-insensitive unique considering organizations and users. |
None
|
first_name
|
str | None
|
First name of a user |
None
|
last_name
|
str | None
|
Last name of a user |
None
|
avatar
|
str | None
|
The base64-encoded image of the user's avatar |
None
|
UserSettingsUpdateConfig ¶
The configuration for updating user settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
profile
|
UserSettingsProfileUpdateConfig | None
|
|
None
|
assistant
|
UserSettingsAssistantUpdateConfig | None
|
|
None
|
UserUsage ¶
Usage statistics and limits for the current user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
credits
|
Credits | None
|
The credit balance and limit for the current time period |
None
|
parallel_training_jobs
|
ParallelTrainingJobs | None
|
The number of currently running training jobs and the limit |
None
|
parallel_generation_jobs
|
ParallelGenerationJobs | None
|
The number of currently running generation jobs and the limit |
None
|
Visibility ¶
Indicates the visibility of the resource.
- PUBLIC: Everyone can access the resource.
- UNLISTED: Anyone with the direct link can access the resource. No public listings.
- PRIVATE: Accessible only by the owner. For organizations, all members can access.
_SyntheticDataConfigValidation ¶
Validation logic for SyntheticDatasetConfig and SyntheticProbeConfig against Generator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_config
|
SyntheticDatasetConfig | SyntheticProbeConfig
|
|
required |
generator
|
Generator
|
|
required |
_SyntheticTableConfigValidation ¶
Validation logic for SyntheticTableConfig against SourceTable
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_table
|
SyntheticTableConfig
|
|
required |
source_table
|
SourceTable
|
|
required |