API Reference

MOSTLY AI Client

Instantiate an SDK instance, either in CLIENT or in LOCAL mode.

Parameters:

Name Type Description Default
base_url str | None

The base URL. If not provided, env var MOSTLY_BASE_URL is used if available, otherwise https://app.mostly.ai.

None
api_key str | None

The API key for authenticating. If not provided, env var MOSTLY_API_KEY is used if available.

None
local bool | None

Whether to run in local mode. If not provided, the user is prompted to choose between CLIENT and LOCAL mode.

None
local_dir str | Path | None

The directory to use for local mode. If not provided, ~/mostlyai is used.

None
local_port int | None

The port to use for local mode with TCP transport. If not provided, UDS transport is used.

None
timeout float

Timeout for HTTPS requests in seconds. Default is 60 seconds.

60.0
ssl_verify bool

Whether to verify SSL certificates. Default is True.

True
quiet bool

Whether to suppress rich output. Default is False.

False
Example for SDK in CLIENT mode with explicit arguments
from mostlyai.sdk import MostlyAI
mostly = MostlyAI(
    api_key='INSERT_YOUR_API_KEY',
    base_url='https://app.mostly.ai',
)
mostly
# MostlyAI(base_url='https://app.mostly.ai', api_key='***')
Example for SDK in CLIENT mode with environment variables
import os
from mostlyai.sdk import MostlyAI
os.environ["MOSTLY_API_KEY"] = "INSERT_YOUR_API_KEY"
os.environ["MOSTLY_BASE_URL"] = "https://app.mostly.ai"
mostly = MostlyAI()
mostly
# MostlyAI(base_url='https://app.mostly.ai', api_key='***')
Example for SDK in LOCAL mode connecting via UDS
from mostlyai.sdk import MostlyAI
mostly = MostlyAI(local=True)
mostly
# MostlyAI(local=True)
Example for SDK in LOCAL mode connecting via TCP
from mostlyai.sdk import MostlyAI
mostly = MostlyAI(local=True, local_port=8080)
mostly
# MostlyAI(local=True, local_port=8080)
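
The fallback order described for base_url and api_key can be sketched with the standard library alone. This is a minimal illustration and not the SDK's implementation: `resolve_client_settings` is a hypothetical helper, and only the environment variable names and the default URL are taken from the parameter descriptions above.

```python
import os

DEFAULT_BASE_URL = "https://app.mostly.ai"


def resolve_client_settings(base_url=None, api_key=None, env=None):
    """Hypothetical helper: explicit argument, then env var, then default."""
    env = os.environ if env is None else env
    resolved_url = base_url or env.get("MOSTLY_BASE_URL") or DEFAULT_BASE_URL
    resolved_key = api_key or env.get("MOSTLY_API_KEY")  # stays None if unset
    return {"base_url": resolved_url, "api_key": resolved_key}


# an explicit argument wins over the environment
print(resolve_client_settings(
    base_url="https://example.test",
    env={"MOSTLY_BASE_URL": "https://env.test"},
))
# nothing provided: fall back to the documented default URL
print(resolve_client_settings(env={}))
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.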

mostlyai.sdk.client.api.MostlyAI.about

about()

Retrieve information about the platform.

Returns:

Name Type Description
AboutService AboutService

Information about the platform.

Example for retrieving information about the platform
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
mostly.about()
# {'version': 'v316', 'assistant': True}

mostlyai.sdk.client.api.MostlyAI.computes

computes()

Retrieve a list of available compute resources that can be used for executing tasks.

Returns:

Type Description
list[dict[str, Any]]

A list of available compute resources.

Example for retrieving available compute resources
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
mostly.computes()
# [{'id': '...', 'name': 'CPU Large', ...}, ...]

mostlyai.sdk.client.api.MostlyAI.connect

connect(config, test_connection=True)

Create a connector and optionally validate the connection before saving.

Parameters:

Name Type Description Default
config ConnectorConfig | dict[str, Any]

Configuration for the connector. Can be either a ConnectorConfig object or an equivalent dictionary.

required
test_connection bool | None

Whether to validate the connection before saving. Default is True.

True

Returns:

Name Type Description
Connector Connector

The created connector.

Example for creating a connector to an AWS S3 storage
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
c = mostly.connect(
    config={
        'type': 'S3_STORAGE',
        'config': {
            'accessKey': '...',
        },
        'secrets': {
            'secretKey': '...'
        }
    }
)

The structures of the config, secrets and ssl parameters depend on the connector type:

  • Cloud storage:
    - type: AZURE_STORAGE
      config:
        accountName: string
        clientId: string (required for auth via service principal)
        tenantId: string (required for auth via service principal)
      secrets:
        accountKey: string (required for regular auth)
        clientSecret: string (required for auth via service principal)
    
    - type: GOOGLE_CLOUD_STORAGE
      config:
      secrets:
        keyFile: string
    
    - type: S3_STORAGE
      config:
        accessKey: string
        endpointUrl: string (only needed for S3-compatible storage services other than AWS)
      secrets:
        secretKey: string
    
  • Database:
    - type: BIGQUERY
      config:
      secrets:
        keyFile: string
    
    - type: DATABRICKS
      config:
        host: string
        httpPath: string
        catalog: string
        clientId: string (required for auth via service principal)
        tenantId: string (required for auth via service principal)
      secrets:
        accessToken: string (required for regular auth)
        clientSecret: string (required for auth via service principal)
    
    - type: HIVE
      config:
        host: string
        port: integer, default: 10000
        username: string (required for regular auth)
        kerberosEnabled: boolean, default: false
        kerberosPrincipal: string (required if kerberosEnabled)
        kerberosKrb5Conf: string (required if kerberosEnabled)
        sslEnabled: boolean, default: false
      secrets:
        password: string (required for regular auth)
        kerberosKeytab: base64-encoded string (required if kerberosEnabled)
      ssl:
        caCertificate: base64-encoded string
    
    - type: MARIADB
      config:
        host: string
        port: integer, default: 3306
        username: string
      secrets:
        password: string
    
    - type: MSSQL
      config:
        host: string
        port: integer, default: 1433
        username: string
        database: string
      secrets:
        password: string
    
    - type: MYSQL
      config:
        host: string
        port: integer, default: 3306
        username: string
      secrets:
        password: string
    
    - type: ORACLE
      config:
        host: string
        port: integer, default: 1521
        username: string
        connectionType: enum {SID, SERVICE_NAME}, default: SID
        database: string, default: ORCL
      secrets:
        password: string
    
    - type: POSTGRES
      config:
        host: string
        port: integer, default: 5432
        username: string
        database: string
        sslEnabled: boolean, default: false
      secrets:
        password: string
      ssl:
        rootCertificate: base64-encoded string
        sslCertificate: base64-encoded string
        sslCertificateKey: base64-encoded string
    
    - type: SNOWFLAKE
      config:
        account: string
        username: string
        warehouse: string, default: COMPUTE_WH
        database: string
      secrets:
        password: string
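
As an illustration of the structure listed above, the following sketch assembles a POSTGRES connector configuration as a plain dictionary. `build_postgres_config` is a hypothetical helper, the host and credentials are placeholders, and encoding the raw certificate bytes with base64 is an assumption about what "base64-encoded string" means for the ssl section.

```python
import base64


def build_postgres_config(host, username, database, password,
                          root_cert_pem=None, port=5432):
    """Hypothetical helper that assembles a POSTGRES connector config dict."""
    config = {
        "type": "POSTGRES",
        "config": {
            "host": host,
            "port": port,  # 5432 is the documented default
            "username": username,
            "database": database,
            "sslEnabled": root_cert_pem is not None,
        },
        "secrets": {"password": password},
    }
    if root_cert_pem is not None:
        # assumption: the ssl values are the base64 encoding of the file bytes
        encoded = base64.b64encode(root_cert_pem).decode("ascii")
        config["ssl"] = {"rootCertificate": encoded}
    return config


cfg = build_postgres_config(
    host="db.example.test",
    username="reader",
    database="sales",
    password="...",
    root_cert_pem=b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
)
print(sorted(cfg), cfg["config"]["sslEnabled"])
```

The resulting dictionary has the same top-level keys (type, config, secrets, ssl) as the `config` argument shown in the S3 example.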
    

mostlyai.sdk.client.api.MostlyAI.generate

generate(
    generator,
    config=None,
    size=None,
    seed=None,
    name=None,
    start=True,
    wait=True,
    progress_bar=True,
)

Generate synthetic data.

Parameters:

Name Type Description Default
generator Generator | str

The generator instance or its UUID.

required
config SyntheticDatasetConfig | dict | None

Configuration for the synthetic dataset.

None
size int | dict[str, int] | None

Sample size(s) for the subject table(s).

None
seed Seed | dict[str, Seed] | None

Seed data for the subject table(s).

None
name str | None

Name of the synthetic dataset.

None
start bool

Whether to start generation immediately. Default is True.

True
wait bool

Whether to wait for generation to finish. Default is True.

True
progress_bar bool

Whether to display a progress bar during generation. Default is True.

True

Returns:

Name Type Description
SyntheticDataset SyntheticDataset

The created synthetic dataset.

Example configuration using short-hand notation
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')  # a previously trained generator
sd = mostly.generate(generator=g, size=1000)
Example configuration using a dictionary
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')  # a previously trained generator
sd = mostly.generate(
    config={
        'generator': g,
        'tables': [
            {
                'name': 'data',
                'configuration': {  # all parameters are optional!
                    'sample_size': None,  # set to None to generate as many samples as original; otherwise, set to an integer; only applicable for subject tables
                    # 'sample_seed_data': seed_df,  # provide a DataFrame to conditionally generate samples; only applicable for subject tables
                    'sampling_temperature': 1.0,
                    'sampling_top_p': 1.0,
                    'rebalancing': {
                        'column': 'gender',
                        'probabilities': {'male': 0.5, 'female': 0.5},
                    },
                    'imputation': {
                        'columns': ['age'],
                    },
                    'fairness': {
                        'target_column': 'income',
                        'sensitive_columns': ['gender'],
                    },
                    'enable_data_report': True,  # disable for faster generation
                }
            }
        ]
    }
)
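
The relationship between the short-hand `size` argument and the dictionary form can be sketched as follows. This assumes that `size` corresponds to `sample_size` on each subject table, as the inline comments above suggest; `build_generate_config` is a hypothetical helper and does not reflect how the SDK expands the argument internally.

```python
def build_generate_config(generator_id, subject_tables, size=None):
    """Hypothetical helper: expand `size` into per-table `sample_size` entries."""
    if isinstance(size, dict):
        sizes = dict(size)  # per-table sizes, keyed by table name
    else:
        sizes = {name: size for name in subject_tables}  # one size for all
    unknown = set(sizes) - set(subject_tables)
    if unknown:
        raise ValueError(f"size given for unknown tables: {sorted(unknown)}")
    return {
        "generator": generator_id,
        "tables": [
            {"name": name, "configuration": {"sample_size": sizes.get(name)}}
            for name in subject_tables
        ],
    }


config = build_generate_config("INSERT_YOUR_GENERATOR_ID", ["data"], size=1000)
print(config["tables"])
```

Leaving `size` as None keeps `sample_size` at None, which the example above describes as generating as many samples as the original.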

mostlyai.sdk.client.api.MostlyAI.me

me()

Retrieve information about the current user.

Returns:

Name Type Description
CurrentUser CurrentUser

Information about the current user.

Example for retrieving information about the current user
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
mostly.me()
# {'id': '488f2f26-...', 'first_name': 'Tom', ...}

mostlyai.sdk.client.api.MostlyAI.models

models()

Retrieve the available models, grouped by model type.

Returns:

Type Description
dict[str, list[str]]

dict[str, list[str]]: A dictionary mapping each ModelType to its list of available models.

Example for retrieving available models
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
mostly.models()
# {
#    'TABULAR': ['MOSTLY_AI/Small', 'MOSTLY_AI/Medium', 'MOSTLY_AI/Large'],
#    'LANGUAGE': ['MOSTLY_AI/LSTMFromScratch-3m', 'microsoft/phi-1_5', ...],
# }
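
A dictionary of this shape can be used to choose a model before training. The sketch below works on a literal stand-in for the return value, so it runs without a platform connection; `pick_model` is a hypothetical helper, not part of the SDK.

```python
def pick_model(available, model_type, preferred):
    """Hypothetical helper: first preferred model offered for `model_type`."""
    offered = available.get(model_type, [])
    for name in preferred:
        if name in offered:
            return name
    raise LookupError(f"none of {preferred} available for {model_type}: {offered}")


# literal stand-in for the return value of mostly.models()
available = {
    "TABULAR": ["MOSTLY_AI/Small", "MOSTLY_AI/Medium", "MOSTLY_AI/Large"],
    "LANGUAGE": ["MOSTLY_AI/LSTMFromScratch-3m", "microsoft/phi-1_5"],
}
print(pick_model(available, "TABULAR", ["MOSTLY_AI/Large", "MOSTLY_AI/Medium"]))
# MOSTLY_AI/Large
```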

mostlyai.sdk.client.api.MostlyAI.probe

probe(
    generator,
    size=None,
    seed=None,
    config=None,
    return_type="auto",
)

Probe a generator.

Parameters:

Name Type Description Default
generator Generator | str

The generator instance or its UUID.

required
size int | dict[str, int] | None

Sample size(s) for the subject table(s). Default is 1, if no seed is provided.

None
seed Seed | dict[str, Seed] | None

Seed data for the subject table(s).

None
config SyntheticProbeConfig | dict | None

Configuration for the probe.

None
return_type Literal['auto', 'dict']

Format of the return value. "auto" for pandas DataFrame if a single table, otherwise a dictionary. Default is "auto".

'auto'

Returns:

Type Description
DataFrame | dict[str, DataFrame]

pd.DataFrame | dict[str, pd.DataFrame]: The created synthetic probe.

Example for probing a generator for 10 synthetic samples
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
probe = mostly.probe(
    generator='INSERT_YOUR_GENERATOR_ID',
    size=10
)
Example for conditionally probing a generator with seed data
import pandas as pd
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')
print('columns:', [c.name for c in g.tables[0].columns])
# columns: ['age', 'workclass', 'fnlwgt', ...]
col = g.tables[0].columns[1]
print(col.name, col.value_range.values)
# workclass ['Federal-gov', 'Local-gov', 'Never-worked', ...]
mostly.probe(
    generator=g,
    seed=pd.DataFrame({
        'age': [63, 45],
        'sex': ['Female', 'Male'],
        'workclass': ['Federal-gov', 'Local-gov'],
    }),
)
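
The effect of `return_type` described above can be sketched without pandas. This is an illustration of the documented behaviour only ("auto" yields a single object for a single table, otherwise a dictionary), with plain lists standing in for DataFrames; `shape_result` is a hypothetical name.

```python
def shape_result(tables, return_type="auto"):
    """Hypothetical helper mirroring the documented `return_type` semantics."""
    if return_type not in ("auto", "dict"):
        raise ValueError(f"unsupported return_type: {return_type!r}")
    if return_type == "auto" and len(tables) == 1:
        # single table: hand back the table itself
        return next(iter(tables.values()))
    # several tables, or "dict" requested explicitly: keep the mapping
    return dict(tables)


single = {"census": [{"age": 63}, {"age": 45}]}  # lists stand in for DataFrames
multi = {"users": [{"users_id": 1}], "purchases": [{"users_id": 1}]}

print(type(shape_result(single)).__name__)
print(type(shape_result(single, return_type="dict")).__name__)
print(sorted(shape_result(multi)))
```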

mostlyai.sdk.client.api.MostlyAI.train

train(
    config=None,
    data=None,
    name=None,
    start=True,
    wait=True,
    progress_bar=True,
)

Train a generator.

Parameters:

Name Type Description Default
config GeneratorConfig | dict | None

The configuration parameters of the generator to be created. Either config or data must be provided.

None
data DataFrame | str | Path | None

A single pandas DataFrame, or a path to a CSV or PARQUET file. Either config or data must be provided.

None
name str | None

Name of the generator.

None
start bool

Whether to start training immediately. Default is True.

True
wait bool

Whether to wait for training to finish. Default is True.

True
progress_bar bool

Whether to display a progress bar during training. Default is True.

True

Returns:

Name Type Description
Generator Generator

The created generator.

Example of single table with default configurations
# read original data
import pandas as pd
df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# train generator
g = mostly.train(
    name='census',
    data=df,     # alternatively, pass a path to a CSV or PARQUET file
    start=True,  # start training immediately
    wait=True,   # wait for training to finish
)
Example of single table with custom configurations
# read original data
import pandas as pd
df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/baseball/players.csv.gz')
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# configure generator via dictionary
g = mostly.train(
    config={                                             # see `mostlyai.sdk.domain.GeneratorConfig`
        'name': 'Baseball Players',
        'tables': [
            {                                            # see `mostlyai.sdk.domain.SourceTableConfig`
                'name': 'players',                       # name of the table (required)
                'data': df,                              # either provide data as a pandas DataFrame
                'source_connector_id': None,             # - or pass a source_connector_id
                'location': None,                        # - together with a table location
                'primary_key': 'id',                     # specify the primary key column, if one is present
                'tabular_model_configuration': {         # see `mostlyai.sdk.domain.ModelConfiguration`; all settings are optional!
                    'model': 'MOSTLY_AI/Medium',         # check `mostly.models()` for available models
                    'batch_size': None,                  # set a custom physical training batch size
                    'max_sample_size': 100_000,          # cap sample size to 100k; set to None for max accuracy
                    'max_epochs': 50,                    # cap training to 50 epochs; set to None for max accuracy
                    'max_training_time': 60,             # cap runtime to 60min; set to None for max accuracy
                    'enable_flexible_generation': True,  # allow seed, imputation, rebalancing and fairness; set to False for max accuracy
                    'value_protection': True,            # privacy protect value ranges; set to False for allowing all seen values
                    'differential_privacy': {            # set DP configs if explicitly requested
                        'max_epsilon': 10.0,               # - max epsilon value, used as stopping criterion
                        'noise_multiplier': 1.5,           # - DP noise multiplier
                        'max_grad_norm': 1.0,              # - DP max grad norm
                        'delta': 1e-5,                     # - DP delta value
                    },
                    'enable_model_report': True,         # generate a model report, including quality metrics
                },
                'columns': [                             # list columns (optional); see `mostlyai.sdk.domain.ModelEncodingType`
                    {'name': 'id', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'bats', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'throws', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
                    {'name': 'birthDate', 'model_encoding_type': 'TABULAR_DATETIME'},
                    {'name': 'weight', 'model_encoding_type': 'TABULAR_NUMERIC_AUTO'},
                    {'name': 'height', 'model_encoding_type': 'TABULAR_NUMERIC_AUTO'},
                ],
            }
        ]
    },
    start=True,  # start training immediately
    wait=True,   # wait for training to finish
)
Example of multi-table with custom configurations
# read original data
import pandas as pd
df_purchases = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/refs/heads/dev/cdnow/purchases.csv.gz')
df_users = df_purchases[['users_id']].drop_duplicates()
# instantiate client
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# train generator
g = mostly.train(config={
    'name': 'CDNOW',                      # name of the generator
    'tables': [{                          # provide list of all tables
        'name': 'users',
        'data': df_users,
        'primary_key': 'users_id',        # define PK column
    }, {
        'name': 'purchases',
        'data': df_purchases,
        'foreign_keys': [{                 # define FK columns, with one providing the context
            'column': 'users_id',
            'referenced_table': 'users',
            'is_context': True
        }],
        'tabular_model_configuration': {
            'max_sample_size': 10_000,     # cap sample size to 10k users; set to None for max accuracy
            'max_training_time': 60,       # cap runtime to 60min; set to None for max accuracy
            'max_sequence_window': 10,     # optionally limit the sequence window
        },
    }],
}, start=True, wait=True)
Example of multi-model with TABULAR and LANGUAGE models
# read original data
import pandas as pd
df = pd.read_parquet('https://github.com/mostly-ai/public-demo-data/raw/refs/heads/dev/headlines/headlines.parquet')

# instantiate SDK
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()

# print out available LANGUAGE models
print(mostly.models()["LANGUAGE"])

# train a generator
g = mostly.train(config={
    'name': 'Headlines',
    'tables': [{
        'name': 'headlines',
        'data': df,
        'columns': [                                 # configure TABULAR + LANGUAGE cols
            {'name': 'category', 'model_encoding_type': 'TABULAR_CATEGORICAL'},
            {'name': 'date', 'model_encoding_type': 'TABULAR_DATETIME'},
            {'name': 'headline', 'model_encoding_type': 'LANGUAGE_TEXT'},
        ],
        'tabular_model_configuration': {              # tabular model configuration (optional)
            'max_sample_size': 20_000,                # cap sample size to 20k; set None for max accuracy
            'max_training_time': 30,                  # cap runtime to 30min; set None for max accuracy
        },
        'language_model_configuration': {             # language model configuration (optional)
            'max_sample_size': 1_000,                 # cap sample size to 1k; set None for max accuracy
            'max_training_time': 60,                  # cap runtime to 60min; set None for max accuracy
            'model': 'MOSTLY_AI/LSTMFromScratch-3m',  # use a light-weight LSTM model, trained from scratch (GPU recommended)
            #'model': 'microsoft/phi-1_5',            # alternatively use a pre-trained HF-hosted LLM model (GPU required)
        }
    }],
}, start=True, wait=True)
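
Before submitting a multi-table configuration it can help to check the key relationships locally. The sketch below inspects only the dictionary keys used in the examples above. The rule that a table has at most one context foreign key is an assumption drawn from the comment "with one providing the context", and `check_table_configs` is a hypothetical helper.

```python
def check_table_configs(tables):
    """Hypothetical helper: return a list of problems found in `tables`."""
    names = {table["name"] for table in tables}
    problems = []
    for table in tables:
        foreign_keys = table.get("foreign_keys", [])
        for fk in foreign_keys:
            if fk["referenced_table"] not in names:
                problems.append(
                    f"{table['name']}.{fk['column']} references "
                    f"unknown table {fk['referenced_table']!r}"
                )
        # assumption: at most one foreign key per table provides the context
        context_count = sum(1 for fk in foreign_keys if fk.get("is_context"))
        if context_count > 1:
            problems.append(f"{table['name']} has {context_count} context foreign keys")
    return problems


tables = [
    {"name": "users", "primary_key": "users_id"},
    {
        "name": "purchases",
        "foreign_keys": [
            {"column": "users_id", "referenced_table": "users", "is_context": True}
        ],
    },
]
print(check_table_configs(tables))
# []
```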

Generators

mostlyai.sdk.client.generators._MostlyGeneratorsClient.create

create(config)

Create a generator. The generator will be in the NEW state and will need to be trained before it can be used.

See mostly.train for more details.

Parameters:

Name Type Description Default
config GeneratorConfig | dict

Configuration for the generator.

required

Returns:

Type Description
Generator

The created generator object.

Example for creating a generator
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.create(
    config={
        "name": "US Census",
        "tables": [{
            "name": "census",
            "data": trn_df,
        }]
    }
)
print("status:", g.training_status)
# status: NEW
g.training.start()  # start training
print("status:", g.training_status)
# status: QUEUED
g.training.wait()   # wait for training to complete
print("status:", g.training_status)
# status: DONE

mostlyai.sdk.client.generators._MostlyGeneratorsClient.get

get(generator_id)

Retrieve a generator by its ID.

Parameters:

Name Type Description Default
generator_id str

The unique identifier of the generator.

required

Returns:

Name Type Description
Generator Generator

The retrieved generator object.

Example for retrieving a generator
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')
g

mostlyai.sdk.client.generators._MostlyGeneratorsClient.import_from_file

import_from_file(file_path)

Import a generator from a file.

Parameters:

Name Type Description Default
file_path str | Path

Local file path or URL of the generator to import.

required

Returns:

Type Description
Generator

The imported generator object.

Example
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()

# Import from local file
g = mostly.generators.import_from_file('path/to/generator')

# Or import from URL
g = mostly.generators.import_from_file('https://example.com/path/to/generator.zip')

mostlyai.sdk.client.generators._MostlyGeneratorsClient.list

list(
    offset=0,
    limit=None,
    status=None,
    search_term=None,
    owner_id=None,
)

List generators.

Paginate through all generators accessible by the user.

Parameters:

Name Type Description Default
offset int

Offset for the entities in the response.

0
limit int | None

Limit for the number of entities in the response.

None
status str | list[str] | None

Filter by training status.

None
search_term str | None

Filter by name or description.

None
owner_id str | list[str] | None

Filter by owner ID.

None

Returns:

Type Description
Iterator[GeneratorListItem]

Iterator[GeneratorListItem]: An iterator over generator list items.

Example for listing all generators
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
for g in mostly.generators.list():
    print(f"Generator `{g.name}` ({g.training_status}, {g.id})")
Example for searching trained generators by keyword
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
generators = list(mostly.generators.list(search_term="census", status="DONE"))
print(f"Found {len(generators)} generators")
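
The offset and limit parameters follow the usual offset-based pagination pattern. The sketch below shows that pattern in general terms against an in-memory list; `iterate_pages`, `fetch_page` and `page_size` are hypothetical names, and nothing here describes how the SDK requests pages internally.

```python
def iterate_pages(fetch_page, offset=0, limit=None, page_size=50):
    """Lazily yield items from an offset-based paged source."""
    yielded = 0
    while limit is None or yielded < limit:
        want = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(offset=offset, limit=want)
        if not page:
            return  # source exhausted
        for item in page:
            yield item
        yielded += len(page)
        offset += len(page)
        if len(page) < want:
            return  # short page: nothing left to fetch


records = [f"generator-{i}" for i in range(120)]


def fake_fetch_page(offset, limit):
    # stand-in for a remote call returning one page of results
    return records[offset:offset + limit]


print(len(list(iterate_pages(fake_fetch_page))))
print(list(iterate_pages(fake_fetch_page, offset=5, limit=3)))
```

Because the function is a generator, wrapping it in `list(...)` as in the example above materialises all pages at once.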

Generator

A generator is a set of models that can generate synthetic data.

The generator can be trained on one or more source tables. A quality assurance report is generated for each model.

Parameters:

Name Type Description Default
id str

The unique identifier of a generator.

required
name str | None

The name of a generator.

None
description str | None

The description of a generator.

None
training_status ProgressStatus
required
training_time datetime | None

The UTC date and time when the training has finished.

None
usage GeneratorUsage | None
None
metadata Metadata | None
None
accuracy float | None

The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models.

None
tables list[SourceTable] | None

The tables of this generator.

None
training Any | None
None

mostlyai.sdk.domain.Generator.Training

mostlyai.sdk.domain.Generator.Training.cancel

cancel()

Cancel training.

mostlyai.sdk.domain.Generator.Training.logs

logs(file_path=None)

Download the training logs and save to file.

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the logs. Default is the current working directory.

None

Returns:

Name Type Description
Path Path

The path to the saved file.

mostlyai.sdk.domain.Generator.Training.progress

progress()

Retrieve job progress of training.

Returns:

Name Type Description
JobProgress JobProgress

The job progress of the training process.

mostlyai.sdk.domain.Generator.Training.start

start()

Start training.

mostlyai.sdk.domain.Generator.Training.wait

wait(progress_bar=True, interval=2)

Poll training progress and loop until training has completed.

Parameters:

Name Type Description Default
progress_bar bool

If true, displays the progress bar. Default is True.

True
interval float

The interval in seconds to poll the job progress. Default is 2 seconds.

2
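
The polling behaviour described for wait can be sketched as a plain loop. This is a generic illustration under stated assumptions: `wait_until_finished` and `max_polls` are hypothetical, the status names FAILED and CANCELED are assumed, and only NEW, QUEUED and DONE appear in the examples of this reference.

```python
import time

# DONE appears in this reference; FAILED and CANCELED are assumed names
TERMINAL_STATUSES = {"DONE", "FAILED", "CANCELED"}


def wait_until_finished(get_status, interval=2, sleep=time.sleep, max_polls=None):
    """Poll `get_status` every `interval` seconds until a terminal status."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATUSES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"still {status} after {polls} polls")
        sleep(interval)


# simulated status sequence; the no-op sleep keeps the demo instantaneous
fake_statuses = iter(["QUEUED", "QUEUED", "DONE"])
final = wait_until_finished(lambda: next(fake_statuses), interval=2,
                            sleep=lambda seconds: None)
print(final)
# DONE
```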

mostlyai.sdk.domain.Generator.clone

clone(training_status='new')

Clone the generator.

Parameters:

Name Type Description Default
training_status Literal['new', 'continue']

The training status of the cloned generator. Default is "new".

'new'

Returns:

Name Type Description
Generator Generator

The cloned generator object.

mostlyai.sdk.domain.Generator.config

config()

Retrieve writable generator properties.

Returns:

Name Type Description
GeneratorConfig GeneratorConfig

The generator properties as a configuration object.

mostlyai.sdk.domain.Generator.delete

delete()

Delete the generator.

mostlyai.sdk.domain.Generator.export_to_file

export_to_file(file_path=None)

Export generator and save to file.

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the generator.

None

Returns:

Name Type Description
Path Path

The path to the saved file.

mostlyai.sdk.domain.Generator.reports

reports(file_path=None, display=False)

Download or display the quality assurance reports.

If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the zipped reports (ignored if display=True).

None
display bool

If True, render the report inline instead of downloading it.

False

Returns:

Type Description
Path | None

Path | None: The path to the saved file if downloading, or None if display=True.

mostlyai.sdk.domain.Generator.update

update(name=None, description=None)

Update a generator with specific parameters.

Parameters:

Name Type Description Default
name str | None

The name of the generator.

None
description str | None

The description of the generator.

None

Synthetic Datasets

mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.create

create(config)

Create a synthetic dataset. The synthetic dataset will be in the NEW state and will need to be generated before it can be used.

See mostly.generate for more details.

Parameters:

Name Type Description Default
config SyntheticDatasetConfig | dict[str, Any]

Configuration for the synthetic dataset.

required

Returns:

Type Description
SyntheticDataset

The created synthetic dataset object.

Example for creating a synthetic dataset
from mostlyai.sdk import MostlyAI
from mostlyai.sdk.domain import SyntheticDatasetConfig
mostly = MostlyAI()
sd = mostly.synthetic_datasets.create(
    config=SyntheticDatasetConfig(
        generator_id="INSERT_YOUR_GENERATOR_ID",
    )
)
print("status:", sd.generation_status)
# status: NEW
sd.generation.start()  # start generation
print("status:", sd.generation_status)
# status: QUEUED
sd.generation.wait()   # wait for generation to complete
print("status:", sd.generation_status)
# status: DONE

mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.get

get(synthetic_dataset_id)

Retrieve a synthetic dataset by its ID.

Parameters:

Name Type Description Default
synthetic_dataset_id str

The unique identifier of the synthetic dataset.

required

Returns:

Name Type Description
SyntheticDataset SyntheticDataset

The retrieved synthetic dataset object.

Example for retrieving a synthetic dataset
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
sd = mostly.synthetic_datasets.get('INSERT_YOUR_SYNTHETIC_DATASET_ID')
sd

mostlyai.sdk.client.synthetic_datasets._MostlySyntheticDatasetsClient.list

list(
    offset=0,
    limit=None,
    status=None,
    search_term=None,
    owner_id=None,
)

List synthetic datasets.

Paginate through all synthetic datasets accessible by the user.

Parameters:

Name Type Description Default
offset int

Offset for the entities in the response.

0
limit int | None

Limit for the number of entities in the response.

None
status str | list[str] | None

Filter by generation status.

None
search_term str | None

Filter by name or description.

None
owner_id str | list[str] | None

Filter by owner ID.

None

Returns:

Type Description
Iterator[SyntheticDatasetListItem]

An iterator over synthetic datasets.

Example for listing all synthetic datasets
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
for sd in mostly.synthetic_datasets.list():
    print(f"Synthetic Dataset `{sd.name}` ({sd.generation_status}, {sd.id})")
Example for searching generated synthetic datasets by keyword
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
datasets = list(mostly.synthetic_datasets.list(search_term="census", status="DONE"))
print(f"Found {len(datasets)} synthetic datasets")
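
The status and owner_id filters accept either a single string or a list of strings. One way to handle both forms uniformly is sketched below; `build_list_filters` is a hypothetical helper that only normalises the arguments into a dictionary and makes no claim about the request the SDK actually sends.

```python
def build_list_filters(offset=0, limit=None, status=None,
                       search_term=None, owner_id=None):
    """Hypothetical helper: normalise list() filters into a plain dict."""

    def as_list(value):
        # accept a single string or any iterable of strings
        if value is None:
            return None
        return [value] if isinstance(value, str) else list(value)

    filters = {
        "offset": offset,
        "limit": limit,
        "status": as_list(status),
        "search_term": search_term,
        "owner_id": as_list(owner_id),
    }
    # drop filters that were not provided
    return {key: value for key, value in filters.items() if value is not None}


print(build_list_filters(search_term="census", status="DONE"))
# {'offset': 0, 'status': ['DONE'], 'search_term': 'census'}
```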

Synthetic Dataset

A synthetic dataset is created based on a trained generator.

It consists of synthetic samples, as well as a quality assurance report.

Parameters:

Name Type Description Default
id str

The unique identifier of a synthetic dataset.

required
generator_id str | None

The unique identifier of a generator.

None
metadata Metadata | None
None
name str | None

The name of a synthetic dataset.

None
description str | None

The description of a synthetic dataset.

None
generation_status ProgressStatus
required
generation_time datetime | None

The UTC date and time when the generation has finished.

None
tables list[SyntheticTable] | None

The tables of this synthetic dataset.

None
delivery SyntheticDatasetDelivery | None
None
accuracy float | None

The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models.

None
usage SyntheticDatasetUsage | None
None
compute str | None

The unique identifier of a compute resource. Not applicable for SDK.

None
generation Any | None
None

mostlyai.sdk.domain.SyntheticDataset.Generation

mostlyai.sdk.domain.SyntheticDataset.Generation.cancel

cancel()

Cancel the generation process.

mostlyai.sdk.domain.SyntheticDataset.Generation.logs

logs(file_path=None)

Download the generation logs and save to file.

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the logs. Default is the current working directory.

None

Returns:

Name Type Description
Path Path

The path to the saved file.

mostlyai.sdk.domain.SyntheticDataset.Generation.progress

progress()

Retrieve the progress of the generation process.

Returns:

Name Type Description
JobProgress JobProgress

The progress of the generation process.

mostlyai.sdk.domain.SyntheticDataset.Generation.start

start()

Start the generation process.

mostlyai.sdk.domain.SyntheticDataset.Generation.wait

wait(progress_bar=True, interval=2)

Poll the generation progress and wait until the process is complete.

Parameters:

Name Type Description Default
progress_bar bool

If true, displays a progress bar. Default is True.

True
interval float

Interval in seconds to poll the job progress. Default is 2 seconds.

2

mostlyai.sdk.domain.SyntheticDataset.config

config()

Retrieve writable synthetic dataset properties.

Returns:

Name Type Description
SyntheticDatasetConfig SyntheticDatasetConfig

The synthetic dataset properties as a configuration object.

mostlyai.sdk.domain.SyntheticDataset.data

data(return_type='auto')

Download the synthetic dataset and return it as a pandas DataFrame or a dictionary of pandas DataFrames.

Parameters:

Name Type Description Default
return_type Literal['auto', 'dict']

The format of the returned data. Default is "auto".

'auto'

Returns:

Type Description
DataFrame | dict[str, DataFrame]

Union[pd.DataFrame, dict[str, pd.DataFrame]]: The synthetic dataset as a single pandas DataFrame or as a dictionary of pandas DataFrames, depending on return_type.

mostlyai.sdk.domain.SyntheticDataset.delete

delete()

Delete the synthetic dataset.

mostlyai.sdk.domain.SyntheticDataset.download

download(file_path=None, format='parquet')

Download synthetic dataset and save to file.

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the synthetic dataset.

None
format Literal['parquet', 'csv', 'json']

The format of the synthetic dataset. Default is "parquet".

'parquet'

Returns:

Name Type Description
Path Path

The path to the saved file.
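
Example sketch for downloading in a chosen format

A small wrapper can reject unsupported formats before any request is made. This is a sketch only: `download_dataset` and `SUPPORTED_FORMATS` are hypothetical names, and the tuple simply mirrors the values listed for the format parameter above.

```python
from pathlib import Path

# Mirrors the Literal values documented for `format`.
SUPPORTED_FORMATS = ("parquet", "csv", "json")


def download_dataset(sd, fmt="parquet", file_path=None):
    """Download the synthetic dataset of `sd` and return the saved path."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"format must be one of {SUPPORTED_FORMATS}, got {fmt!r}")
    # file_path=None leaves the choice of location to the SDK.
    return Path(sd.download(file_path=file_path, format=fmt))
```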

mostlyai.sdk.domain.SyntheticDataset.reports

reports(file_path=None, display=False)

Download or display the quality assurance reports.

If display is True, the report is rendered inline via IPython display and no file is downloaded. Otherwise, the report is downloaded and saved to file_path (or a default location if None).

Parameters:

Name Type Description Default
file_path str | Path | None

The file path to save the zipped reports (ignored if display=True).

None
display bool

If True, render the report inline instead of downloading it.

False

Returns:

Type Description
Path | None

Path | None: The path to the saved file if downloading, or None if display=True.

mostlyai.sdk.domain.SyntheticDataset.update

update(name=None, description=None, delivery=None)

Update a synthetic dataset with specific parameters.

Parameters:

Name Type Description Default
name str | None

The name of the synthetic dataset.

None
description str | None

The description of the synthetic dataset.

None
delivery SyntheticDatasetDelivery | None

The delivery configuration for the synthetic dataset.

None

Connectors

mostlyai.sdk.client.connectors._MostlyConnectorsClient.create

create(config, test_connection=True)

Create a connector and optionally validate the connection before saving.

See mostly.connect for more details.

Parameters:

Name Type Description Default
config ConnectorConfig | dict[str, Any]

Configuration for the connector.

required
test_connection bool | None

Whether to test the connection before saving the connector.

True

Returns:

Type Description
Connector

The created connector object.
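
Example sketch for creating a connector from a dictionary

The sketch below shows one way a configuration dictionary might be assembled and passed to create(). The top-level keys follow the Connector fields documented on this page; the keys inside "config" and "secrets" (host, database, username, password) are illustrative assumptions, and `create_postgres_connector` is a hypothetical helper name.

```python
def create_postgres_connector(mostly, name, host, database, username, password):
    """Create a POSTGRES connector via `mostly.connectors.create`."""
    config = {
        "name": name,
        "type": "POSTGRES",
        # The inner keys below are assumptions made for illustration only.
        "config": {"host": host, "database": database, "username": username},
        "secrets": {"password": password},
    }
    # test_connection=True asks for the connection to be validated before saving.
    return mostly.connectors.create(config=config, test_connection=True)
```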

mostlyai.sdk.client.connectors._MostlyConnectorsClient.get

get(connector_id)

Retrieve a connector by its ID.

Parameters:

Name Type Description Default
connector_id str

The unique identifier of the connector.

required

Returns:

Name Type Description
Connector Connector

The retrieved connector object.

Example for retrieving a connector
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
c = mostly.connectors.get('INSERT_YOUR_CONNECTOR_ID')
c

mostlyai.sdk.client.connectors._MostlyConnectorsClient.list

list(
    offset=0,
    limit=None,
    access_type=None,
    search_term=None,
    owner_id=None,
)

List connectors.

Paginate through all connectors accessible by the user. Only connectors that are independent of a table will be returned.

Example for listing all connectors
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
for c in mostly.connectors.list():
    print(f"Connector `{c.name}` ({c.access_type}, {c.type}, {c.id})")

Parameters:

Name Type Description Default
offset int

Offset for entities in the response.

0
limit int | None

Limit for the number of entities in the response.

None
access_type str | None

Filter by access type (e.g., "READ_PROTECTED").

None
search_term str | None

Filter by string in the connector name.

None
owner_id str | list[str] | None

Filter by owner ID.

None

Returns:

Type Description
Iterator[ConnectorListItem]

Iterator[ConnectorListItem]: An iterator over connector list items.
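
Example sketch for searching connectors by name

Since list() returns an iterator, the number of results can be capped on the caller's side without relying on how limit interacts with pagination. `find_connectors` is a hypothetical helper name; the `id` and `name` attributes are the ones used in the listing example above.

```python
from itertools import islice


def find_connectors(mostly, search_term, max_results=10):
    """Return (id, name) pairs for up to `max_results` matching connectors."""
    matches = mostly.connectors.list(search_term=search_term)
    # islice stops consuming the iterator once enough items were collected.
    return [(c.id, c.name) for c in islice(matches, max_results)]
```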

Connector

A connector is a connection to a data source or a data destination.

Parameters:

Name Type Description Default
id str

The unique identifier of a connector.

required
name str | None

The name of a connector.

None
type ConnectorType
required
access_type ConnectorAccessType | None
<ConnectorAccessType.read_protected: 'READ_PROTECTED'>
config dict[str, Any] | None
None
secrets dict[str, str] | None
None
ssl dict[str, str] | None
None
metadata Metadata | None
None
usage ConnectorUsage | None
None
table_id str | None

Optional. ID of the source table or synthetic table that this connector belongs to. If not set, this connector is managed independently of any generator or synthetic dataset.

None

mostlyai.sdk.domain.Connector.delete

delete()

Delete the connector.

mostlyai.sdk.domain.Connector.delete_data

delete_data(location)

Delete data from the specified location within the connector.

Parameters:

Name Type Description Default
location str

The target location within the connector to delete data from.

required

mostlyai.sdk.domain.Connector.locations

locations(prefix='')

List connector locations.

List the available databases, schemas, tables, or folders for a connector. For storage connectors, this returns the list of folders and files at the root, or at the prefix level if a prefix is provided. For DB connectors, this returns the list of schemas (or databases for DBs without schemas), or the list of tables if a prefix is provided.

The formats of the locations are:

  • Cloud storage:
    • AZURE_STORAGE: container/path
    • GOOGLE_CLOUD_STORAGE: bucket/path
    • S3_STORAGE: bucket/path
  • Database:
    • BIGQUERY: dataset.table
    • DATABRICKS: schema.table
    • HIVE: database.table
    • MARIADB: database.table
    • MSSQL: schema.table
    • MYSQL: database.table
    • ORACLE: schema.table
    • POSTGRES: schema.table
    • SNOWFLAKE: schema.table
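
Example sketch for splitting a location string

The formats above use "/" for cloud storage and "." for databases. A minimal sketch of splitting a location into its first component and the remainder follows; `split_location` and `STORAGE_TYPES` are hypothetical names and not part of the SDK.

```python
# Connector types listed above under "Cloud storage".
STORAGE_TYPES = {"AZURE_STORAGE", "GOOGLE_CLOUD_STORAGE", "S3_STORAGE"}


def split_location(connector_type, location):
    """Split a location into (first component, remainder).

    Storage locations are split on the first "/", database locations on
    the first ".". The remainder is "" if no separator is present.
    """
    separator = "/" if connector_type in STORAGE_TYPES else "."
    head, _, tail = location.partition(separator)
    return head, tail


print(split_location("S3_STORAGE", "my-bucket/folder/file.csv"))
# ('my-bucket', 'folder/file.csv')
print(split_location("POSTGRES", "public.users"))
# ('public', 'users')
```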

Parameters:

Name Type Description Default
prefix str

The prefix to filter the results by. Defaults to an empty string.

''

Returns:

Type Description
list[str]

list[str]: A list of locations (schemas, databases, directories, etc.).
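
Example sketch for listing all tables of a DB connector

Calling locations() once without a prefix and then once per returned schema gives an overview of a database connector. This is a sketch under the behavior described above; `list_tables` is a hypothetical helper name.

```python
def list_tables(connector):
    """Map each schema (or database) of `connector` to its table locations."""
    overview = {}
    for schema in connector.locations():
        # With a prefix, a DB connector is documented to return tables.
        overview[schema] = connector.locations(prefix=schema)
    return overview
```

For storage connectors the same loop would descend only one folder level, so a recursive walk would be needed there.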

mostlyai.sdk.domain.Connector.read_data

read_data(location, limit=None, shuffle=False)

Retrieve data from the specified location within the connector.

Parameters:

Name Type Description Default
location str

The target location within the connector to read data from.

required
limit int | None

The maximum number of rows to return. Returns all if not specified.

None
shuffle bool | None

Whether to shuffle the results.

False

Returns:

Type Description
DataFrame

pd.DataFrame: A DataFrame containing the retrieved data.
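
Example sketch for previewing a few rows

Reading a small shuffled sample is a cheap way to inspect a table before using it. `preview` is a hypothetical helper name; the location argument follows the formats listed under locations().

```python
def preview(connector, location, rows=5):
    """Fetch and print up to `rows` shuffled rows from `location`."""
    sample = connector.read_data(location, limit=rows, shuffle=True)
    print(sample)
    return sample
```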

mostlyai.sdk.domain.Connector.schema

schema(location)

Retrieve the schema of the table at a connector location. Please refer to locations() for the format of the location.

Parameters:

Name Type Description Default
location str

The location of the table.

required

Returns:

Type Description
list[dict[str, Any]]

list[dict[str, Any]]: The retrieved schema.

mostlyai.sdk.domain.Connector.update

update(
    name=None,
    config=None,
    secrets=None,
    ssl=None,
    test_connection=True,
)

Update a connector with specific parameters.

Parameters:

Name Type Description Default
name str | None

The name of the connector.

None
config dict[str, Any] | None

Connector configuration.

None
secrets dict[str, str] | None

Secret values for the connector.

None
ssl dict[str, str] | None

SSL configuration for the connector.

None
test_connection bool | None

If true, validates the connection before saving.

True
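
Example sketch for updating a single secret

A common use of update() is replacing one credential while validating the connection. The sketch below is an assumption-laden illustration: `rotate_secret` is a hypothetical helper name, the secret key depends on the connector type, and whether secrets that are not passed are kept or cleared is not stated in this reference.

```python
def rotate_secret(connector, key, new_value):
    """Replace one secret of `connector` and validate the connection."""
    # Only the given key is sent; handling of omitted secrets is up to the SDK.
    connector.update(secrets={key: new_value}, test_connection=True)
```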

mostlyai.sdk.domain.Connector.write_data

write_data(data, location, if_exists='fail')

Write data to the specified location within the connector.

Parameters:

Name Type Description Default
data DataFrame | None

The DataFrame to write, or None to delete the location.

required
location str

The target location within the connector to write data to.

required
if_exists Literal['append', 'replace', 'fail']

The behavior if the target location already exists (append, replace, fail). Default is "fail".

'fail'
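
Example sketch for copying a table between two connectors

read_data() and write_data() can be combined to move a table from one connector to another. The sketch below holds the whole table in memory, so it suits small tables only; `copy_table` is a hypothetical helper name.

```python
def copy_table(source, destination, src_location, dst_location, if_exists="fail"):
    """Copy all rows from `src_location` to `dst_location`; return the row count."""
    df = source.read_data(src_location)
    # if_exists="fail" (the default) refuses to overwrite an existing target.
    destination.write_data(df, dst_location, if_exists=if_exists)
    return len(df)
```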