API Reference¶
MOSTLY AI Client¶
Instantiate an SDK instance, either in CLIENT or in LOCAL mode.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
base_url | str \| None | The base URL. If not provided, env var | None |
api_key | str \| None | The API key for authenticating. If not provided, env var | None |
local | bool \| None | Whether to run in local mode or not. If not provided, user is prompted to choose between CLIENT and LOCAL mode. | None |
local_dir | str \| Path \| None | The directory to use for local mode. If not provided, | None |
local_port | int \| None | The port to use for local mode with TCP transport. If not provided, UDS transport is used. | None |
timeout | float | Timeout for HTTPS requests in seconds. Default is 60 seconds. | 60.0 |
ssl_verify | bool | Whether to verify SSL certificates. Default is True. | True |
quiet | bool | Whether to suppress rich output. Default is False. | False |
Example for SDK in CLIENT mode with explicit arguments
Example for SDK in CLIENT mode with environment variables
Example for SDK in LOCAL mode connecting via UDS
Example for SDK in LOCAL mode connecting via TCP
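As a minimal sketch of how the modes above differ, the following builds the keyword arguments that each mode would pass to the constructor, using only the parameters listed in the table. The helper names, the placeholder URL and the placeholder key are hypothetical and not part of the SDK. CLIENT mode with environment variables would simply pass neither base_url nor api_key.

```python
from pathlib import Path


def client_mode_kwargs(base_url: str, api_key: str) -> dict:
    """Keyword arguments for CLIENT mode with explicit credentials."""
    return {"base_url": base_url, "api_key": api_key, "local": False}


def local_mode_kwargs(local_dir=None, local_port=None) -> dict:
    """Keyword arguments for LOCAL mode.

    Leaving local_port unset selects UDS transport; setting it selects TCP.
    """
    kwargs = {"local": True}
    if local_dir is not None:
        kwargs["local_dir"] = Path(local_dir)
    if local_port is not None:
        kwargs["local_port"] = int(local_port)
    return kwargs


# Placeholder values, not a real endpoint or credential.
explicit = client_mode_kwargs("https://mostly.example.invalid", "INSERT_YOUR_API_KEY")
via_uds = local_mode_kwargs()
via_tcp = local_mode_kwargs(local_port=8080)

# With the SDK installed, each dictionary could be unpacked into the constructor:
#   from mostlyai.sdk import MostlyAI
#   mostly = MostlyAI(**via_tcp)
```

Keeping the arguments in a dictionary makes it easy to switch between modes from configuration without touching the call site.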
mostlyai.sdk.client.api.MostlyAI.about ¶
Retrieve information about the platform.
Returns:
Name | Type | Description |
---|---|---|
AboutService | AboutService | Information about the platform. |
mostlyai.sdk.client.api.MostlyAI.computes ¶
Retrieve a list of available compute resources that can be used for executing tasks.
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of available compute resources. |
mostlyai.sdk.client.api.MostlyAI.connect ¶
Create a connector and optionally validate the connection before saving.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | ConnectorConfig \| dict[str, Any] | Configuration for the connector. Can be either a ConnectorConfig object or an equivalent dictionary. | required |
test_connection | bool \| None | Whether to validate the connection before saving. Default is True. | True |
Returns:
Name | Type | Description |
---|---|---|
Connector | Connector | The created connector. |
Example for creating a connector to AWS S3 storage
The structures of the config, secrets and ssl parameters depend on the connector type:
- Cloud storage:
  - type: AZURE_STORAGE
    config:
      accountName: string
      clientId: string (required for auth via service principal)
      tenantId: string (required for auth via service principal)
    secrets:
      accountKey: string (required for regular auth)
      clientSecret: string (required for auth via service principal)
  - type: GOOGLE_CLOUD_STORAGE
    config:
    secrets:
      keyFile: string
  - type: S3_STORAGE
    config:
      accessKey: string
      endpointUrl: string (only needed for S3-compatible storage services other than AWS)
    secrets:
      secretKey: string
- Database:
  - type: BIGQUERY
    config:
    secrets:
      keyFile: string
  - type: DATABRICKS
    config:
      host: string
      httpPath: string
      catalog: string
      clientId: string (required for auth via service principal)
      tenantId: string (required for auth via service principal)
    secrets:
      accessToken: string (required for regular auth)
      clientSecret: string (required for auth via service principal)
  - type: HIVE
    config:
      host: string
      port: integer, default: 10000
      username: string (required for regular auth)
      kerberosEnabled: boolean, default: false
      kerberosPrincipal: string (required if kerberosEnabled)
      kerberosKrb5Conf: string (required if kerberosEnabled)
      sslEnabled: boolean, default: false
    secrets:
      password: string (required for regular auth)
      kerberosKeytab: base64-encoded string (required if kerberosEnabled)
    ssl:
      caCertificate: base64-encoded string
  - type: MARIADB
    config:
      host: string
      port: integer, default: 3306
      username: string
    secrets:
      password: string
  - type: MSSQL
    config:
      host: string
      port: integer, default: 1433
      username: string
      database: string
    secrets:
      password: string
  - type: MYSQL
    config:
      host: string
      port: integer, default: 3306
      username: string
    secrets:
      password: string
  - type: ORACLE
    config:
      host: string
      port: integer, default: 1521
      username: string
      connectionType: enum {SID, SERVICE_NAME}, default: SID
      database: string, default: ORCL
    secrets:
      password: string
  - type: POSTGRES
    config:
      host: string
      port: integer, default: 5432
      username: string
      database: string
      sslEnabled: boolean, default: false
    secrets:
      password: string
    ssl:
      rootCertificate: base64-encoded string
      sslCertificate: base64-encoded string
      sslCertificateKey: base64-encoded string
  - type: SNOWFLAKE
    config:
      account: string
      username: string
      warehouse: string, default: COMPUTE_WH
      database: string
    secrets:
      password: string
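To make the per-type structures more concrete, here is a small sketch that assembles connector configuration dictionaries for S3_STORAGE and POSTGRES from the field names listed above. The two builder functions are hypothetical helpers, not SDK functions, and all values are placeholders. Further top-level fields of a connector configuration, such as a name, are left out.

```python
def s3_connector_config(access_key, secret_key, endpoint_url=None):
    """Build an S3_STORAGE connector configuration dictionary."""
    config = {"accessKey": access_key}
    # endpointUrl is only needed for S3-compatible services other than AWS.
    if endpoint_url is not None:
        config["endpointUrl"] = endpoint_url
    return {
        "type": "S3_STORAGE",
        "config": config,
        "secrets": {"secretKey": secret_key},
    }


def postgres_connector_config(host, username, password, database,
                              port=5432, ssl_enabled=False):
    """Build a POSTGRES connector configuration with the listed defaults."""
    return {
        "type": "POSTGRES",
        "config": {
            "host": host,
            "port": port,
            "username": username,
            "database": database,
            "sslEnabled": ssl_enabled,
        },
        "secrets": {"password": password},
    }


s3 = s3_connector_config("INSERT_ACCESS_KEY", "INSERT_SECRET_KEY")
pg = postgres_connector_config("db.example.invalid", "reader", "INSERT_PASSWORD", "sales")

# With the SDK installed, either dictionary could be passed on, for example:
#   connector = mostly.connect(config=s3, test_connection=True)
```

Non-secret settings go under config and credentials under secrets, which mirrors the split in the listing.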
mostlyai.sdk.client.api.MostlyAI.generate ¶
generate(
    generator,
    config=None,
    size=None,
    seed=None,
    name=None,
    start=True,
    wait=True,
    progress_bar=True,
)
Generate synthetic data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Generator \| str | The generator instance or its UUID. | required |
config | SyntheticDatasetConfig \| dict \| None | Configuration for the synthetic dataset. | None |
size | int \| dict[str, int] \| None | Sample size(s) for the subject table(s). | None |
seed | Seed \| dict[str, Seed] \| None | Seed data for the subject table(s). | None |
name | str \| None | Name of the synthetic dataset. | None |
start | bool | Whether to start generation immediately. Default is True. | True |
wait | bool | Whether to wait for generation to finish. Default is True. | True |
progress_bar | bool | Whether to display a progress bar during generation. Default is True. | True |
Returns:
Name | Type | Description |
---|---|---|
SyntheticDataset | SyntheticDataset | The created synthetic dataset. |
Example configuration using short-hand notation
Example configuration using a dictionary
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')  # the trained generator to sample from
sd = mostly.generate(
    config={
        'generator': g,
        'tables': [
            {
                'name': 'data',
                'configuration': {  # all parameters are optional!
                    'sample_size': None,  # set to None to generate as many samples as original; otherwise, set to an integer; only applicable for subject tables
                    # 'sample_seed_data': seed_df,  # provide a DataFrame to conditionally generate samples; only applicable for subject tables
                    'sampling_temperature': 1.0,
                    'sampling_top_p': 1.0,
                    'rebalancing': {
                        'column': 'gender',  # categorical column whose values are rebalanced
                        'probabilities': {'male': 0.5, 'female': 0.5},
                    },
                    'imputation': {
                        'columns': ['age'],
                    },
                    'fairness': {
                        'target_column': 'income',
                        'sensitive_columns': ['gender'],
                    },
                    'enable_data_report': True,  # disable for faster generation
                }
            }
        ]
    }
)
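The short-hand size argument and the sample_size entry of the dictionary configuration describe the same setting. As a rough sketch, and assuming that a short-hand size maps onto the sample_size of each subject table, a hypothetical helper could expand one form into the other. The helper name and the expansion rule are illustrative assumptions, not SDK behavior.

```python
def size_to_table_configs(size, subject_tables):
    """Expand `int | dict[str, int] | None` into per-table configurations.

    An integer is applied to every subject table, a dictionary is looked up
    by table name, and None leaves sample_size unset for all tables.
    """
    if size is None:
        sizes = {name: None for name in subject_tables}
    elif isinstance(size, int):
        sizes = {name: size for name in subject_tables}
    else:
        unknown = sorted(set(size) - set(subject_tables))
        if unknown:
            raise ValueError(f"size given for unknown tables: {unknown}")
        sizes = {name: size.get(name) for name in subject_tables}
    return [
        {"name": name, "configuration": {"sample_size": n}}
        for name, n in sizes.items()
    ]


single = size_to_table_configs(1000, ["data"])
mixed = size_to_table_configs({"users": 5}, ["users", "orders"])
```

The resulting list has the same shape as the 'tables' entry of the dictionary configuration, so it could be merged with further per-table settings.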
mostlyai.sdk.client.api.MostlyAI.me ¶
Retrieve information about the current user.
Returns:
Name | Type | Description |
---|---|---|
CurrentUser | CurrentUser | Information about the current user. |
mostlyai.sdk.client.api.MostlyAI.models ¶
Retrieve the available models, grouped by model type.
Returns:
Type | Description |
---|---|
dict[str, list[str]] | A dictionary with a list of available models for each ModelType. |
mostlyai.sdk.client.api.MostlyAI.probe ¶
Probe a generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Generator \| str | The generator instance or its UUID. | required |
size | int \| dict[str, int] \| None | Sample size(s) for the subject table(s). Default is 1, if no seed is provided. | None |
seed | Seed \| dict[str, Seed] \| None | Seed data for the subject table(s). | None |
config | SyntheticProbeConfig \| dict \| None | Configuration for the probe. | None |
return_type | Literal['auto', 'dict'] | Format of the return value. "auto" for pandas DataFrame if a single table, otherwise a dictionary. Default is "auto". | 'auto' |
Returns:
Type | Description |
---|---|
DataFrame \| dict[str, DataFrame] | The created synthetic probe. |
Example for probing a generator for 10 synthetic samples
Example for conditional probing a generator for 10 synthetic samples
import pandas as pd
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')
# inspect the column names of the first table
print('columns:', [c.name for c in g.tables[0].columns])
# columns: ['age', 'workclass', 'fnlwgt', ...]
# inspect the known values of a categorical column
col = g.tables[0].columns[1]
print(col.name, col.value_range.values)
# workclass ['Federal-gov', 'Local-gov', 'Never-worked', ...]
# probe conditionally, with one synthetic sample per seed row
mostly.probe(
    generator=g,
    seed=pd.DataFrame({
        'age': [63, 45],
        'sex': ['Female', 'Male'],
        'workclass': ['Federal-gov', 'Local-gov'],
    }),
)
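The example above inspects column names and value ranges before building the seed. A minimal sketch of that check, written against plain dictionaries rather than SDK objects, could look as follows. The function check_seed and its input format are hypothetical, and whether the SDK itself rejects unseen seed values is not stated here.

```python
def check_seed(seed, allowed_values):
    """Return a list of problems found in a seed.

    seed maps column names to lists of seed values. allowed_values maps
    column names to the known categorical values, or to None for columns
    whose values are not checked (for example numeric columns).
    """
    problems = []
    for column, values in seed.items():
        if column not in allowed_values:
            problems.append(f"unknown column: {column}")
            continue
        allowed = allowed_values[column]
        if allowed is None:
            continue
        for value in values:
            if value not in allowed:
                problems.append(f"{column}: unseen value {value!r}")
    return problems


allowed = {
    "age": None,
    "sex": ["Female", "Male"],
    "workclass": ["Federal-gov", "Local-gov", "Never-worked"],
}
good = check_seed({"age": [63, 45], "sex": ["Female", "Male"]}, allowed)
bad = check_seed({"workclass": ["Sales"], "zip": [1010]}, allowed)
```

Here good is an empty list, while bad reports one unseen value and one unknown column.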
mostlyai.sdk.client.api.MostlyAI.train ¶
Train a generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | GeneratorConfig \| dict \| None | The configuration parameters of the generator to be created. Either | None |
data | DataFrame \| str \| Path \| None | A single pandas DataFrame, or a path to a CSV or PARQUET file. Either | None |
name | str \| None | Name of the generator. | None |
start | bool | Whether to start training immediately. Default is True. | True |
wait | bool | Whether to wait for training to finish. Default is True. | True |
progress_bar | bool | Whether to display a progress bar during training. Default is True. | True |
Returns:
Name | Type | Description |
---|---|---|
Generator | Generator | The created generator. |
Example of single table with default configurations
# read original data
import pandas as pd
df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# train generator
g = mostly.train(
    name='census',
    data=df,  # alternatively, pass a path to a CSV or PARQUET file
    start=True,  # start training immediately
    wait=True,  # wait for training to finish
)
Example of single table with custom configurations
# read original data
import pandas as pd
df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/baseball/players.csv.gz')
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# configure generator via dictionary
g = mostly.train(
    config={  # see `mostlyai.sdk.domain.GeneratorConfig`
        'name': 'Baseball Players',
        'tables': [
            {  # see `mostlyai.sdk.domain.SourceTableConfig`
                'name': 'players',  # name of the table (required)
                'data': df,  # either provide data as a pandas DataFrame
                'source_connector_id': None,  # - or pass a source_connector_id
                'location': None,  # - together with a table location
                'primary_key': 'id',  # specify the primary key column, if one is present
                'tabular_model_configuration': {  # see `mostlyai.sdk.domain.ModelConfiguration`; all settings are optional!
                    'model': 'MOSTLY_AI/Medium',  # check `mostly.models()` for available models
                    'batch_size': None,  # set a custom physical training batch size
                    'max_sample_size': 100_000,  # cap sample size to 100k; set to None for max accuracy
                    'max_epochs': 50,  # cap training to 50 epochs; set to None for max accuracy
                    'max_training_time': 60,  # cap runtime to 60min; set to None for max accuracy
                    'enable_flexible_generation': True,  # allow seed, imputation, rebalancing and fairness; set to False for max accuracy
                    'value_protection': True,  # privacy protect value ranges; set to False for allowing all seen values
                    'differential_privacy': {  # set DP configs if explicitly requested
                        'max_epsilon': 10.0,  # - max epsilon value, used as stopping criterion
                        'noise_multiplier': 1.5,  # - DP noise multiplier
                        'max_grad_norm': 1.0,  # - DP max grad norm
                        'delta': 1e-5,  # - DP delta value
                    },
                    'enable_model_report': True,  # generate a model report, including quality metrics
                },
                'columns': [  # list columns (optional); see `mostlyai.sdk.domain.ModelEncodingType`
                    {'name': 'id', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'bats', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'throws', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'birthDate', 'model_encoding_type': 'TABULAR_DATETIME'},
                    {'name': 'weight', 'model_encoding_type': 'TABULAR_NUMERIC_AUTO'},
                    {'name': 'height', 'model_encoding_type': 'TABULAR_NUMERIC_AUTO'},
                ],
            }
        ]
    },
    start=True,  # start training immediately
    wait=True,  # wait for training to finish
)
Example of multi-table with custom configurations
# read original data
import pandas as pd
df_purchases = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/refs/heads/dev/cdnow/purchases.csv.gz')
df_users = df_purchases[['users_id']].drop_duplicates()
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# train generator
g = mostly.train(config={
    'name': 'CDNOW',  # name of the generator
    'tables': [{  # provide list of all tables
        'name': 'users',
        'data': df_users,
        'primary_key': 'users_id',  # define PK column
    }, {
        'name': 'purchases',
        'data': df_purchases,
        'foreign_keys': [{  # define FK columns, with one providing the context
            'column': 'users_id',
            'referenced_table': 'users',
            'is_context': True
        }],
        'tabular_model_configuration': {
            'max_sample_size': 10_000,  # cap sample size to 10k users; set to None for max accuracy
            'max_training_time': 60,  # cap runtime to 60min; set to None for max accuracy
            'max_sequence_window': 10,  # optionally limit the sequence window
        },
    }],
}, start=True, wait=True)
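In the multi-table example, the purchases table points to users through a foreign key, with one foreign key providing the context. The following sketch checks such a table list before it is sent for training. The function validate_tables is a hypothetical helper, and the rule that a referenced table needs a primary key is an assumption made for illustration. The 'data' entries are omitted because the check only looks at keys.

```python
def validate_tables(tables):
    """Return a list of problems in the key definitions of a table list."""
    errors = []
    primary_keys = {t["name"]: t.get("primary_key") for t in tables}
    for table in tables:
        context_count = 0
        for fk in table.get("foreign_keys", []):
            ref = fk["referenced_table"]
            where = f"{table['name']}.{fk['column']}"
            if ref not in primary_keys:
                errors.append(f"{where}: unknown table {ref!r}")
            elif primary_keys[ref] is None:
                # Assumption: a referenced table must define a primary key.
                errors.append(f"{where}: table {ref!r} has no primary key")
            if fk.get("is_context"):
                context_count += 1
        if context_count > 1:
            errors.append(f"{table['name']}: more than one context foreign key")
    return errors


cdnow = [
    {"name": "users", "primary_key": "users_id"},
    {"name": "purchases", "foreign_keys": [
        {"column": "users_id", "referenced_table": "users", "is_context": True},
    ]},
]
broken = [
    {"name": "purchases", "foreign_keys": [
        {"column": "users_id", "referenced_table": "users", "is_context": True},
        {"column": "shop_id", "referenced_table": "users", "is_context": True},
    ]},
]
cdnow_errors = validate_tables(cdnow)
broken_errors = validate_tables(broken)
```

For the CDNOW layout the list of errors is empty. The broken variant reports two references to a missing table and a duplicated context key.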
Example of multi-model with TABULAR and LANGUAGE models
# read original data
import pandas as pd
df = pd.read_parquet('https://github.com/mostly-ai/public-demo-data/raw/refs/heads/dev/headlines/headlines.parquet')
# instantiate SDK
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# print out available LANGUAGE models
print(mostly.models()["LANGUAGE"])
# train a generator
g = mostly.train(config={
    'name': 'Headlines',
    'tables': [{
        'name': 'headlines',
        'data': df,
        'columns': [  # configure TABULAR + LANGUAGE cols
            {'name': 'category', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
            {'name': 'date', 'model_encoding_type': 'TABULAR_DATETIME'},
            {'name': 'headline', 'model_encoding_type': 'LANGUAGE_TEXT'},
        ],
        'tabular_model_configuration': {  # tabular model configuration (optional)
            'max_sample_size': 20_000,  # cap sample size to 20k; set None for max accuracy
            'max_training_time': 30,  # cap runtime to 30min; set None for max accuracy
        },
        'language_model_configuration': {  # language model configuration (optional)
            'max_sample_size': 1_000,  # cap sample size to 1k; set None for max accuracy
            'max_training_time': 60,  # cap runtime to 60min; set None for max accuracy
            'model': 'MOSTLY_AI/LSTMFromScratch-3m',  # use a light-weight LSTM model, trained from scratch (GPU recommended)
            # 'model': 'microsoft/phi-1.5',  # alternatively use a pre-trained HF-hosted LLM model (GPU required)
        }
    }],
}, start=True, wait=True)
Generators¶
mostlyai.sdk.client.generators._MostlyGeneratorsClient.create ¶
Create a generator. The generator will be in the NEW state and will need to be trained before it can be used.
See mostly.train for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | GeneratorConfig \| dict | Configuration for the generator. | required |
Returns:
Type | Description |
---|---|
Generator | The created generator object. |
Example for creating a generator
import pandas as pd
from mostlyai.sdk import MostlyAI
# read original data
trn_df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
mostly = MostlyAI()
g = mostly.generators.create(
    config={
        "name": "US Census",
        "tables": [{
            "name": "census",
            "data": trn_df,
        }]
    }
)
print("status:", g.training_status)
# status: NEW
g.training.start()  # start training
print("status:", g.training_status)
# status: QUEUED
g.training.wait()  # wait for training to complete
print("status:", g.training_status)
# status: DONE
mostlyai.sdk.client.generators._MostlyGeneratorsClient.get ¶
Retrieve a generator by its ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator_id | str | The unique identifier of the generator. | required |
Returns:
Name | Type | Description |
---|---|---|
Generator | Generator | The retrieved generator object. |
mostlyai.sdk.client.generators._MostlyGeneratorsClient.import_from_file ¶
Import a generator from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path | Local file path or URL of the generator to import. | required |
Returns:
Type | Description |
---|---|
Generator | The imported generator object. |
mostlyai.sdk.client.generators._MostlyGeneratorsClient.list ¶
List generators.
Paginate through all generators accessible by the user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for the entities in the response. | 0 |
limit | int \| None | Limit for the number of entities in the response. | None |
status | str \| list[str] \| None | Filter by training status. | None |
search_term | str \| None | Filter by name or description. | None |
owner_id | str \| list[str] \| None | Filter by owner ID. | None |
Returns:
Type | Description |
---|---|
Iterator[GeneratorListItem] | An iterator over generator list items. |
Example for listing all generators
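The list method pages through results using offset and limit and hands back an iterator. The sketch below shows one way such lazy pagination can work, using a stand-in fetch function instead of real API calls. The names paginate and fake_fetch, and the page size of 50, are assumptions for illustration and do not describe the SDK's internals.

```python
def paginate(fetch_page, offset=0, limit=None, page_size=50):
    """Yield items lazily, requesting one page at a time.

    fetch_page(offset, limit) must return a list with at most `limit` items.
    Iteration stops at the overall `limit` or at the first short page.
    """
    yielded = 0
    while limit is None or yielded < limit:
        want = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(offset + yielded, want)
        if not page:
            return
        for item in page:
            yield item
        yielded += len(page)
        if len(page) < want:
            return


# Stand-in for a server holding 120 generators.
ITEMS = [f"generator-{i}" for i in range(120)]


def fake_fetch(offset, limit):
    return ITEMS[offset:offset + limit]


first_ten = list(paginate(fake_fetch, limit=10))
tail = list(paginate(fake_fetch, offset=100))
everything = list(paginate(fake_fetch))
```

With the SDK itself, the equivalent loop would iterate over the result of mostly.generators.list() directly.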
Generator¶
A generator is a set of models that can generate synthetic data.
The generator can be trained on one or more source tables. A quality assurance report is generated for each model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a generator. | required |
name | str \| None | The name of a generator. | None |
description | str \| None | The description of a generator. | None |
training_status | ProgressStatus | | required |
training_time | datetime \| None | The UTC date and time when the training has finished. | None |
usage | GeneratorUsage \| None | | None |
metadata | Metadata \| None | | None |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
tables | list[SourceTable] \| None | The tables of this generator. | None |
training | Any \| None | | None |
mostlyai.sdk.domain.Generator.Training ¶
mostlyai.sdk.domain.Generator.Training.logs ¶
Download the training logs and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the logs. Default is the current working directory. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
mostlyai.sdk.domain.Generator.Training.progress ¶
Retrieve job progress of training.
Returns:
Name | Type | Description |
---|---|---|
JobProgress | JobProgress | The job progress of the training process. |
mostlyai.sdk.domain.Generator.Training.wait ¶
Poll training progress and loop until training has completed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | If true, displays the progress bar. Default is True. | True |
interval | float | The interval in seconds to poll the job progress. Default is 2 seconds. | 2 |
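The wait method polls the job progress at a fixed interval until training has completed. A minimal sketch of such a polling loop, driven by a stand-in status function, is shown below. Only the statuses NEW, QUEUED and DONE appear in this reference, so IN_PROGRESS, FAILED and CANCELED are assumed names, and wait_for is a hypothetical helper rather than the SDK implementation.

```python
import time

# DONE is documented above; FAILED and CANCELED are assumed terminal states.
TERMINAL_STATUSES = {"DONE", "FAILED", "CANCELED"}


def wait_for(get_status, interval=2.0, sleep=time.sleep, on_poll=None):
    """Poll get_status() every `interval` seconds until a terminal status."""
    while True:
        status = get_status()
        if on_poll is not None:
            on_poll(status)  # e.g. update a progress bar
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)


# Simulated job that is queued, runs, and then finishes.
statuses = iter(["QUEUED", "IN_PROGRESS", "DONE"])
seen = []
final = wait_for(
    lambda: next(statuses),
    interval=0.0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
    on_poll=seen.append,
)
```

Passing sleep as a parameter keeps the loop easy to exercise without waiting for real time to pass.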
mostlyai.sdk.domain.Generator.clone ¶
Clone the generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
training_status | Literal['new', 'continue'] | The training status of the cloned generator. Default is "new". | 'new' |
Returns:
Name | Type | Description |
---|---|---|
Generator | Generator | The cloned generator object. |
mostlyai.sdk.domain.Generator.config ¶
Retrieve writable generator properties.
Returns:
Name | Type | Description |
---|---|---|
GeneratorConfig | GeneratorConfig | The generator properties as a configuration object. |
mostlyai.sdk.domain.Generator.export_to_file ¶
Export generator and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the generator. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
mostlyai.sdk.domain.Generator.reports ¶
Download or display the quality assurance reports.
If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the zipped reports (ignored if display=True). | None |
display | bool | If True, render the report inline instead of downloading it. | False |
Returns:
Type | Description |
---|---|
Path \| None | The path to the saved file if downloading, or None if display=True. |
mostlyai.sdk.domain.Generator.update ¶
Update a generator with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the generator. | None |
description | str \| None | The description of the generator. | None |
Synthetic Datasets¶
mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.create ¶
Create a synthetic dataset. The synthetic dataset will be in the NEW state and will need to be generated before it can be used.
See mostly.generate for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | SyntheticDatasetConfig \| dict[str, Any] | Configuration for the synthetic dataset. | required |
Returns:
Type | Description |
---|---|
SyntheticDataset | The created synthetic dataset object. |
Example for creating a synthetic dataset
from mostlyai.sdk import MostlyAI
from mostlyai.sdk.domain import SyntheticDatasetConfig
mostly = MostlyAI()
sd = mostly.synthetic_datasets.create(
    config=SyntheticDatasetConfig(
        generator_id="INSERT_YOUR_GENERATOR_ID",
    )
)
print("status:", sd.generation_status)
# status: NEW
sd.generation.start()  # start generation
print("status:", sd.generation_status)
# status: QUEUED
sd.generation.wait()  # wait for generation to complete
print("status:", sd.generation_status)
# status: DONE
mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.get ¶
Retrieve a synthetic dataset by its ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_dataset_id | str | The unique identifier of the synthetic dataset. | required |
Returns:
Name | Type | Description |
---|---|---|
SyntheticDataset | SyntheticDataset | The retrieved synthetic dataset object. |
mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.list ¶
List synthetic datasets.
Paginate through all synthetic datasets accessible by the user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for the entities in the response. | 0 |
limit | int \| None | Limit for the number of entities in the response. | None |
status | str \| list[str] \| None | Filter by generation status. | None |
search_term | str \| None | Filter by name or description. | None |
owner_id | str \| list[str] \| None | Filter by owner ID. | None |
Returns:
Type | Description |
---|---|
Iterator[SyntheticDatasetListItem] | An iterator over synthetic datasets. |
Example for listing all synthetic datasets
Synthetic Dataset¶
A synthetic dataset is created based on a trained generator.
It consists of synthetic samples, as well as a quality assurance report.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a synthetic dataset. | required |
generator_id | str \| None | The unique identifier of a generator. | None |
metadata | Metadata \| None | | None |
name | str \| None | The name of a synthetic dataset. | None |
description | str \| None | The description of a synthetic dataset. | None |
generation_status | ProgressStatus | | required |
generation_time | datetime \| None | The UTC date and time when the generation has finished. | None |
tables | list[SyntheticTable] \| None | The tables of this synthetic dataset. | None |
delivery | SyntheticDatasetDelivery \| None | | None |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
usage | SyntheticDatasetUsage \| None | | None |
compute | str \| None | The unique identifier of a compute resource. Not applicable for SDK. | None |
generation | Any \| None | | None |
mostlyai.sdk.domain.SyntheticDataset.Generation ¶
mostlyai.sdk.domain.SyntheticDataset.Generation.logs ¶
Download the generation logs and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the logs. Default is the current working directory. | None |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
mostlyai.sdk.domain.SyntheticDataset.Generation.progress ¶
Retrieve the progress of the generation process.
Returns:
Name | Type | Description |
---|---|---|
JobProgress | JobProgress | The progress of the generation process. |
mostlyai.sdk.domain.SyntheticDataset.Generation.wait ¶
Poll the generation progress and wait until the process is complete.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | If true, displays a progress bar. Default is True. | True |
interval | float | Interval in seconds to poll the job progress. Default is 2 seconds. | 2 |
mostlyai.sdk.domain.SyntheticDataset.config ¶
Retrieve writable synthetic dataset properties.
Returns:
Name | Type | Description |
---|---|---|
SyntheticDatasetConfig | SyntheticDatasetConfig | The synthetic dataset properties as a configuration object. |
mostlyai.sdk.domain.SyntheticDataset.data ¶
Download the synthetic dataset and return it as a pandas DataFrame or as a dictionary of pandas DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
return_type | Literal['auto', 'dict'] | The format of the returned data. Default is "auto". | 'auto' |
Returns:
Type | Description |
---|---|
DataFrame \| dict[str, DataFrame] | The synthetic dataset as a pandas DataFrame or as a dictionary of pandas DataFrames. |
mostlyai.sdk.domain.SyntheticDataset.download ¶
Download synthetic dataset and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the synthetic dataset. | None |
format | Literal['parquet', 'csv', 'json'] | The format of the synthetic dataset. Default is "parquet". | 'parquet' |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | The path to the saved file. |
mostlyai.sdk.domain.SyntheticDataset.reports ¶
Download or display the quality assurance reports.
If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the zipped reports (ignored if display=True). | None |
display | bool | If True, render the report inline instead of downloading it. | False |
Returns:
Type | Description |
---|---|
Path \| None | The path to the saved file if downloading, or None if display=True. |
mostlyai.sdk.domain.SyntheticDataset.update ¶
Update a synthetic dataset with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the synthetic dataset. | None |
description | str \| None | The description of the synthetic dataset. | None |
delivery | SyntheticDatasetDelivery \| None | The delivery configuration for the synthetic dataset. | None |
Connectors¶
mostlyai.sdk.client.connectors._MostlyConnectorsClient.create ¶
Create a connector and optionally validate the connection before saving.
See mostly.connect for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | ConnectorConfig \| dict[str, Any] | Configuration for the connector. | required |
test_connection | bool \| None | Whether to test the connection before saving the connector. | True |
Returns:
Type | Description |
---|---|
Connector | The created connector object. |
mostlyai.sdk.client.connectors._MostlyConnectorsClient.get ¶
Retrieve a connector by its ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
connector_id | str | The unique identifier of the connector. | required |
Returns:
Name | Type | Description |
---|---|---|
Connector | Connector | The retrieved connector object. |
mostlyai.sdk.client.connectors._MostlyConnectorsClient.list ¶
List connectors.
Paginate through all connectors accessible by the user. Only connectors that are independent of a table will be returned.
Example for listing all connectors
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for entities in the response. | 0 |
limit | int \| None | Limit for the number of entities in the response. | None |
access_type | str \| None | Filter by access type (e.g., "SOURCE" or "DESTINATION"). | None |
search_term | str \| None | Filter by string in the connector name. | None |
owner_id | str \| list[str] \| None | Filter by owner ID. | None |
Returns:
Type | Description |
---|---|
Iterator[ConnectorListItem] | An iterator over connector list items. |
Connector¶
A connector is a connection to a data source or a data destination.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a connector. | required |
name | str \| None | The name of a connector. | None |
type | ConnectorType | | required |
access_type | ConnectorAccessType \| None | | <ConnectorAccessType.read_protected: 'READ_PROTECTED'> |
config | dict[str, Any] \| None | | None |
secrets | dict[str, str] \| None | | None |
ssl | dict[str, str] \| None | | None |
metadata | Metadata \| None | | None |
usage | ConnectorUsage \| None | | None |
table_id | str \| None | Optional. ID of a source table or a synthetic table that this connector belongs to. If not set, this connector is managed independently of any generator or synthetic dataset. | None |
mostlyai.sdk.domain.Connector.delete_data ¶
Delete data from the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The target location within the connector to delete data from. | required |
mostlyai.sdk.domain.Connector.locations ¶
List connector locations.
List the available databases, schemas, tables, or folders for a connector.
For storage connectors, this returns the folders and files at the root level, or at the level of prefix if one is provided.
For DB connectors, this returns the schemas (or the databases, for DBs without schemas), or the tables if prefix is provided.
The formats of the locations are:
- Cloud storage:
  - AZURE_STORAGE: container/path
  - GOOGLE_CLOUD_STORAGE: bucket/path
  - S3_STORAGE: bucket/path
- Database:
  - BIGQUERY: dataset.table
  - DATABRICKS: schema.table
  - HIVE: database.table
  - MARIADB: database.table
  - MSSQL: schema.table
  - MYSQL: database.table
  - ORACLE: schema.table
  - POSTGRES: schema.table
  - SNOWFLAKE: schema.table
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | The prefix to filter the results by. Defaults to an empty string. | '' |
Returns:
Type | Description |
---|---|
list[str] | A list of locations (schemas, databases, directories, etc.). |
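Since location strings use "/" for storage connectors and "." for database connectors, client code sometimes needs to take them apart. The following sketch splits a location according to the formats listed above. The function split_location is a hypothetical helper and not part of the SDK.

```python
STORAGE_TYPES = {"AZURE_STORAGE", "GOOGLE_CLOUD_STORAGE", "S3_STORAGE"}
DATABASE_TYPES = {
    "BIGQUERY", "DATABRICKS", "HIVE", "MARIADB", "MSSQL",
    "MYSQL", "ORACLE", "POSTGRES", "SNOWFLAKE",
}


def split_location(connector_type, location):
    """Split a location into (container, path) or (namespace, table)."""
    if connector_type in STORAGE_TYPES:
        # Everything after the first "/" is the path within the container.
        container, _, path = location.partition("/")
        return container, path
    if connector_type in DATABASE_TYPES:
        # The part before the first "." is the schema, database or dataset.
        namespace, _, table = location.partition(".")
        return namespace, table
    raise ValueError(f"unsupported connector type: {connector_type}")


bucket, key = split_location("S3_STORAGE", "my-bucket/raw/2024/data.csv")
schema, table = split_location("POSTGRES", "public.users")
```

A location without a separator, such as a bare schema name, yields an empty second element.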
mostlyai.sdk.domain.Connector.read_data ¶
Retrieve data from the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The target location within the connector to read data from. | required |
limit | int \| None | The maximum number of rows to return. Returns all if not specified. | None |
shuffle | bool \| None | Whether to shuffle the results. | False |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the retrieved data. |
mostlyai.sdk.domain.Connector.schema ¶
Retrieve the schema of the table at a connector location.
Please refer to locations() for the format of the location.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The location of the table. | required |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | The retrieved schema. |
mostlyai.sdk.domain.Connector.update ¶
Update a connector with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the connector. | None |
config | dict[str, Any] \| None | Connector configuration. | None |
secrets | dict[str, str] \| None | Secret values for the connector. | None |
ssl | dict[str, str] \| None | SSL configuration for the connector. | None |
test_connection | bool \| None | If true, validates the connection before saving. | True |
mostlyai.sdk.domain.Connector.write_data ¶
Write data to the specified location within the connector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame \| None | The DataFrame to write, or None to delete the location. | required |
location | str | The target location within the connector to write data to. | required |
if_exists | Literal['append', 'replace', 'fail'] | The behavior if the target location already exists (append, replace, fail). Default is "fail". | 'fail' |
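The three if_exists options can be pictured with a small in-memory model, in which a dictionary stands in for the connector and lists of rows stand in for DataFrames. This is only a sketch of the semantics. The function write_rows is hypothetical, and raising FileExistsError for "fail" is an illustrative choice, since the error raised by the SDK is not stated here.

```python
def write_rows(store, location, rows, if_exists="fail"):
    """Write rows to store[location], honoring the if_exists policy."""
    if if_exists not in ("append", "replace", "fail"):
        raise ValueError(f"invalid if_exists: {if_exists!r}")
    if location in store:
        if if_exists == "fail":
            raise FileExistsError(f"location already exists: {location}")
        if if_exists == "append":
            store[location] = store[location] + list(rows)
            return
    # New location, or an existing one that is being replaced.
    store[location] = list(rows)


store = {}
write_rows(store, "public.users", [{"id": 1}, {"id": 2}])
write_rows(store, "public.users", [{"id": 3}], if_exists="append")
after_append = list(store["public.users"])
write_rows(store, "public.users", [{"id": 9}], if_exists="replace")
after_replace = list(store["public.users"])
```

Keeping "fail" as the default means existing data is never touched unless the caller asks for it explicitly.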