API Reference
MOSTLY AI Client
Instantiate a client for interacting with the MOSTLY AI platform via its Public API.
Example for instantiating the client with explicit arguments
Example for instantiating the client with environment variables
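With the real package, you either pass arguments explicitly, as in MostlyAI(base_url=..., api_key=...), or call MostlyAI() without arguments and let it fall back to environment variables. The minimal sketch below makes that precedence explicit (explicit argument, then environment variable, then default) without contacting the platform. The environment variable names MOSTLY_API_KEY and MOSTLY_BASE_URL, the default URL and the helper resolve_client_settings are assumptions for illustration, not documented API; check them against your SDK version.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names and default URL (illustrative only).
API_KEY_ENV = "MOSTLY_API_KEY"
BASE_URL_ENV = "MOSTLY_BASE_URL"
DEFAULT_BASE_URL = "https://app.mostly.ai"


def resolve_client_settings(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: explicit arguments win, then environment variables, then defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": base_url or env.get(BASE_URL_ENV) or DEFAULT_BASE_URL,
        # There is no default key; None means "not configured".
        "api_key": api_key or env.get(API_KEY_ENV),
    }


# Explicit arguments, corresponding to MostlyAI(base_url=..., api_key=...)
explicit = resolve_client_settings(
    base_url="https://mostly.example.com", api_key="INSERT_YOUR_API_KEY", env={}
)
# Environment only, corresponding to MostlyAI() after exporting the variable
from_env = resolve_client_settings(env={API_KEY_ENV: "INSERT_YOUR_API_KEY"})
print(explicit)
print(from_env)
```

In real code only the MostlyAI(...) call is needed; keeping the key in an environment variable avoids committing it to source control.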
Parameters:
Name | Type | Description | Default |
---|---|---|---|
base_url | str \| None | The base URL. If not provided, a default value is used. | None |
api_key | str \| None | The API key for authenticating. If not provided, the client falls back to environment variables. | None |
timeout | float | Timeout for HTTPS requests in seconds. | 60.0 |
ssl_verify | bool | Whether to verify SSL certificates. | True |

mostlyai.client.api.MostlyAI.about
Retrieve information about the platform.
Example for retrieving information about the platform
Returns:
Type | Description |
---|---|
AboutService | Information about the platform. |

mostlyai.client.api.MostlyAI.computes
Retrieve a list of available compute resources that can be used for executing tasks.
Example for retrieving available compute resources
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of available compute resources. |

mostlyai.client.api.MostlyAI.connect
Create a connector and optionally validate the connection before saving.
See ConnectorConfig for more information on the available configuration parameters.
Example for creating a connector to an AWS S3 storage
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | ConnectorConfig \| dict[str, Any] | Configuration for the connector. Can be either a ConnectorConfig object or an equivalent dictionary. | required |
test_connection | bool \| None | Whether to validate the connection before saving. | True |

The structures of the config, secrets and ssl parameters depend on the connector type:
- Cloud storage:
  - type: AZURE_STORAGE
    config:
      accountName: string
      clientId: string (required for auth via service principal)
      tenantId: string (required for auth via service principal)
    secrets:
      accountKey: string (required for regular auth)
      clientSecret: string (required for auth via service principal)
  - type: GOOGLE_CLOUD_STORAGE
    config: (no parameters)
    secrets:
      keyFile: string
  - type: S3_STORAGE
    config:
      accessKey: string
      endpointUrl: string (only needed for S3-compatible storage services other than AWS)
    secrets:
      secretKey: string
- Database:
  - type: BIGQUERY
    config: (no parameters)
    secrets:
      keyFile: string
  - type: DATABRICKS
    config:
      host: string
      httpPath: string
      catalog: string
      clientId: string (required for auth via service principal)
      tenantId: string (required for auth via service principal)
    secrets:
      accessToken: string (required for regular auth)
      clientSecret: string (required for auth via service principal)
  - type: HIVE
    config:
      host: string
      port: integer, default: 10000
      username: string (required for regular auth)
      kerberosEnabled: boolean, default: false
      kerberosPrincipal: string (required if kerberosEnabled)
      kerberosKrb5Conf: string (required if kerberosEnabled)
      sslEnabled: boolean, default: false
    secrets:
      password: string (required for regular auth)
      kerberosKeytab: base64-encoded string (required if kerberosEnabled)
    ssl:
      caCertificate: base64-encoded string
  - type: MARIADB
    config:
      host: string
      port: integer, default: 3306
      username: string
    secrets:
      password: string
  - type: MSSQL
    config:
      host: string
      port: integer, default: 1433
      username: string
      database: string
    secrets:
      password: string
  - type: MYSQL
    config:
      host: string
      port: integer, default: 3306
      username: string
    secrets:
      password: string
  - type: ORACLE
    config:
      host: string
      port: integer, default: 1521
      username: string
      connectionType: enum {SID, SERVICE_NAME}, default: SID
      database: string, default: ORCL
    secrets:
      password: string
  - type: POSTGRES
    config:
      host: string
      port: integer, default: 5432
      username: string
      database: string
      sslEnabled: boolean, default: false
    secrets:
      password: string
    ssl:
      rootCertificate: base64-encoded string
      sslCertificate: base64-encoded string
      sslCertificateKey: base64-encoded string
  - type: SNOWFLAKE
    config:
      account: string
      username: string
      warehouse: string, default: COMPUTE_WH
      database: string
    secrets:
      password: string
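A minimal sketch of preparing the S3 connector dictionary described above and checking it before it is passed to mostly.connect(config=...). REQUIRED_KEYS is a hand-transcribed subset of the list, and missing_keys is a hypothetical helper, not part of the SDK. The top-level keys name, type and access_type are assumed to mirror the Connector attributes documented further below.

```python
from typing import Any

# Hand-transcribed subset of the list above. Keys with a default (e.g. port) or
# that depend on the auth mode (e.g. clientId) are deliberately not required here.
REQUIRED_KEYS: dict[str, dict[str, set[str]]] = {
    "GOOGLE_CLOUD_STORAGE": {"secrets": {"keyFile"}},
    "S3_STORAGE": {"config": {"accessKey"}, "secrets": {"secretKey"}},
    "POSTGRES": {"config": {"host", "username", "database"}, "secrets": {"password"}},
}


def missing_keys(connector: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check: list 'section.key' entries the dict lacks."""
    missing: list[str] = []
    for section, keys in REQUIRED_KEYS[connector["type"]].items():
        present = connector.get(section) or {}
        missing += [f"{section}.{key}" for key in sorted(keys) if key not in present]
    return missing


s3_connector = {
    "name": "Demo S3 bucket",
    "type": "S3_STORAGE",
    "access_type": "SOURCE",  # assumed field, mirrors Connector.access_type
    "config": {"accessKey": "INSERT_YOUR_ACCESS_KEY"},
    "secrets": {"secretKey": "INSERT_YOUR_SECRET_KEY"},
}
print(missing_keys(s3_connector))  # []
# With a real client, the dictionary would then be passed as:
# connector = mostly.connect(config=s3_connector)
```

Leaving test_connection=True remains the authoritative check, since only the platform can verify that the credentials actually work.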
Returns:
Type | Description |
---|---|
Connector | The created connector. |

mostlyai.client.api.MostlyAI.generate
generate(
generator=None,
config=None,
size=None,
seed=None,
name=None,
start=True,
wait=True,
progress_bar=True,
)
Generate synthetic data.
See SyntheticDatasetConfig for more information on the available configuration parameters.
Example configuration using short-hand notation
Example configuration using SyntheticDatasetConfig
Example configuration using a dictionary
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Generator \| str \| None | The generator instance or its UUID. | None |
config | SyntheticDatasetConfig \| dict \| None | Configuration for the synthetic dataset. | None |
size | int \| dict[str, int] \| None | Sample size(s) for the subject table(s). | None |
seed | Seed \| dict[str, Seed] \| None | Seed data for the subject table(s). | None |
name | str \| None | Name of the synthetic dataset. | None |
start | bool | Whether to start generation immediately. | True |
wait | bool | Whether to wait for generation to finish. | True |
progress_bar | bool | Whether to display a progress bar during generation. | True |

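The size argument takes either one integer, e.g. mostly.generate(g, size=1000) for a single-table generator, or a mapping from subject table name to sample size. The hypothetical helper below sketches how the two forms map onto subject tables; it is not SDK code. The table names players and teams are made up, and applying a plain int to every subject table is an assumption of this sketch, since the rule for generators with several subject tables is not stated here.

```python
from typing import Optional, Union


def expand_sizes(
    size: Union[int, dict[str, int], None],
    subject_tables: list[str],
) -> dict[str, Optional[int]]:
    """Hypothetical helper: map the size argument to one entry per subject table.

    None stands for "no explicit size", i.e. whatever default the platform applies.
    """
    if size is None:
        return {name: None for name in subject_tables}
    if isinstance(size, int):
        # Assumption of this sketch: a plain int applies to every subject table.
        return {name: size for name in subject_tables}
    unknown = sorted(set(size) - set(subject_tables))
    if unknown:
        raise ValueError(f"not a subject table: {unknown}")
    return {name: size.get(name) for name in subject_tables}


print(expand_sizes(1000, ["census"]))  # {'census': 1000}
print(expand_sizes({"players": 500}, ["players", "teams"]))  # {'players': 500, 'teams': None}
```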
Returns:
Type | Description |
---|---|
SyntheticDataset | The created synthetic dataset. |

mostlyai.client.api.MostlyAI.me
Retrieve information about the current user.
Example for retrieving information about the current user
Returns:
Type | Description |
---|---|
CurrentUser | Information about the current user. |

mostlyai.client.api.MostlyAI.models
Retrieve a list of available models of a specific type.
Example for retrieving available models
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_type | str \| ModelType | The type of model to retrieve. Can be a string or a ModelType enum. | required |

Returns:
Type | Description |
---|---|
list[str] | A list of available models of the specified type. |

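Accepting both a string and an enum member is convenient when the enum mixes in str. The sketch below uses a stand-in ModelType with assumed members TABULAR and LANGUAGE; it is not the SDK's own enum, and whether mostly.models("LANGUAGE") is a valid call on your platform is an assumption. It shows that lookups by value and comparisons with plain strings behave the same.

```python
from enum import Enum
from typing import Union


class ModelType(str, Enum):
    # Stand-in for the SDK's ModelType; member names and values are assumptions.
    TABULAR = "TABULAR"
    LANGUAGE = "LANGUAGE"


def to_model_type(model_type: Union[str, ModelType]) -> ModelType:
    """Accept either an enum member or its string value."""
    return model_type if isinstance(model_type, ModelType) else ModelType(model_type)


# Because the enum mixes in str, the member compares equal to its plain string value.
print(to_model_type("LANGUAGE") == "LANGUAGE")  # True
```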
mostlyai.client.api.MostlyAI.probe
Probe a generator.
See SyntheticProbeConfig for more information on the available configuration parameters.
Example for probing a generator for 10 synthetic samples
Example for conditionally probing a generator with seed data
import pandas as pd
from mostlyai import MostlyAI
mostly = MostlyAI()
g = mostly.generators.get('INSERT_YOUR_GENERATOR_ID')
print('columns:', [c.name for c in g.tables[0].columns])
# columns: ['age', 'workclass', 'fnlwgt', ...]
col = g.tables[0].columns[1]
print(col.name, col.value_range.values)
# workclass ['Federal-gov', 'Local-gov', 'Never-worked', ...]
# seed the probe with values that occur in the original columns
df = mostly.probe(
    generator=g,
    seed=pd.DataFrame({
        'age': [63, 45],
        'sex': ['Female', 'Male'],
        'workclass': ['Federal-gov', 'Local-gov'],
    }),
)
print(df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator | Generator \| str \| None | The generator instance or its UUID. | None |
size | int \| dict[str, int] \| None | Sample size(s) for the subject table(s). | None |
seed | Seed \| dict[str, Seed] \| None | Seed data for the subject table(s). | None |
config | SyntheticProbeConfig \| dict \| None | Configuration for the probe. | None |
return_type | Literal['auto', 'dict'] | Format of the return value: 'auto' returns a pandas DataFrame if there is a single table, otherwise a dictionary of DataFrames; 'dict' always returns a dictionary. | 'auto' |

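The return_type rule matters for code that must handle both single-table and multi-table generators; the same parameter also appears on SyntheticDataset.data. The function below is a hypothetical sketch of the rule as documented in the table above, not the SDK's implementation, and plain lists stand in for pandas DataFrames so it runs without pandas.

```python
from typing import Any, Literal, Union


def shape_result(
    tables: dict[str, Any],
    return_type: Literal["auto", "dict"] = "auto",
) -> Union[Any, dict[str, Any]]:
    """Sketch of the documented rule: 'auto' unwraps a single table, 'dict' never does."""
    if return_type == "auto" and len(tables) == 1:
        return next(iter(tables.values()))
    return tables


# Plain lists stand in for pandas DataFrames.
single = {"census": [[63, "Female"], [45, "Male"]]}
multi = {"players": [["A"]], "seasons": [[2020]]}
print(type(shape_result(single)).__name__)  # list
print(type(shape_result(single, "dict")).__name__)  # dict
print(sorted(shape_result(multi)))  # ['players', 'seasons']
```

Passing return_type='dict' is the simplest way to always receive a dictionary keyed by table name, regardless of how many tables the generator has.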
Returns:
Type | Description |
---|---|
DataFrame \| dict[str, DataFrame] | The created synthetic probe. |

mostlyai.client.api.MostlyAI.train
Train a generator.
See GeneratorConfig for more information on the available configuration parameters.
Example of short-hand notation, reading data from a file path or URL:
from mostlyai import MostlyAI
mostly = MostlyAI()
g = mostly.train(
data='https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz',
)
Example of short-hand notation, passing data as pandas DataFrame:
# read original data
import pandas as pd
df_original = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/titanic/titanic.csv')
# instantiate client
from mostlyai import MostlyAI
mostly = MostlyAI()
# train generator
g = mostly.train(
name='titanic',
data=df_original,
)
Example configuration using GeneratorConfig
# read original data
import pandas as pd
df_original = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/titanic/titanic.csv')
# instantiate client
from mostlyai import MostlyAI
mostly = MostlyAI()
# configure generator via GeneratorConfig
from mostlyai.domain import GeneratorConfig, SourceTableConfig
g = mostly.train(
config=GeneratorConfig(
name='titanic',
tables=[
SourceTableConfig(
name='data',
data=df_original
)
]
)
)
Example configuration using a dictionary
# read original data
import pandas as pd
df_original = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/titanic/titanic.csv')
# instantiate client
from mostlyai import MostlyAI
mostly = MostlyAI()
# configure generator via dictionary
g = mostly.train(
config={
'name': 'titanic',
'tables': [
{
'name': 'data',
'data': df_original
}
]
}
)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | GeneratorConfig \| dict \| None | The configuration parameters of the generator to be created. Either config or data must be provided. | None |
data | DataFrame \| str \| Path \| None | A single pandas DataFrame, or a path to a CSV or PARQUET file. Either config or data must be provided. | None |
name | str \| None | Name of the generator. | None |
start | bool | Whether to start training immediately. | True |
wait | bool | Whether to wait for training to finish. | True |
progress_bar | bool | Whether to display a progress bar during training. | True |

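The short-hand examples and the dictionary example above describe the same kind of one-table generator. The hypothetical helper below spells out that correspondence; it assumes the short-hand form maps to a single table named data, as in the explicit examples, and it does not reproduce any validation the SDK performs.

```python
from typing import Any, Optional


def build_train_config(
    config: Optional[dict[str, Any]] = None,
    data: Any = None,
    name: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: expand short-hand train() arguments into a config dict."""
    if config is not None:
        return config
    if data is None:
        raise ValueError("Either config or data must be provided")
    # Assumption: short-hand data becomes a single table named 'data',
    # matching the table name used in the explicit examples above.
    expanded: dict[str, Any] = {"tables": [{"name": "data", "data": data}]}
    if name is not None:
        expanded["name"] = name
    return expanded


print(build_train_config(data="census.csv.gz", name="census"))
```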
Returns:
Type | Description |
---|---|
Generator | The created generator. |

Generators
mostlyai.client.generators._MostlyGeneratorsClient.create
Create a generator. The generator will be in the NEW state and will need to be trained before it can be used.
See mostly.train for more details.
Example for creating a generator
import pandas as pd
from mostlyai import MostlyAI
# read the original data
original_df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
mostly = MostlyAI()
g = mostly.generators.create(
    config={
        "name": "US Census",
        "tables": [{
            "name": "census",
            "data": original_df,
        }],
    }
)
print("status:", g.training_status)
# status: NEW
g.training.start()  # start training
print("status:", g.training_status)
# status: QUEUED
g.training.wait()  # wait for training to complete
print("status:", g.training_status)
# status: DONE
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | GeneratorConfig \| dict | Configuration for the generator. | required |

Returns:
Type | Description |
---|---|
Generator | The created generator object. |

mostlyai.client.generators._MostlyGeneratorsClient.get
Retrieve a generator by its ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
generator_id | str | The unique identifier of the generator. | required |

Example for retrieving a generator
Returns:
Type | Description |
---|---|
Generator | The retrieved generator object. |

mostlyai.client.generators._MostlyGeneratorsClient.import_from_file
Import a generator from a file.
Example for importing a generator from a file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path | Path to the file to import. | required |

Returns:
Type | Description |
---|---|
Generator | The imported generator object. |

mostlyai.client.generators._MostlyGeneratorsClient.list
List generators.
Paginate through all generators accessible by the user.
Example for listing all generators
Example for searching trained generators via key word
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for the entities in the response. | 0 |
limit | int | Limit for the number of entities in the response. | 50 |
status | str \| list[str] \| None | Filter by training status. | None |
search_term | str \| None | Filter by name or description. | None |

Returns:
Type | Description |
---|---|
Iterator[GeneratorListItem] | An iterator over generator list items. |

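Because list() returns an iterator, with a real client you can simply loop over mostly.generators.list(status="DONE", search_term="census") (the filter values are illustrative). The sketch below shows the general offset/limit paging pattern such an iterator presumably follows, run against a fake in-memory backend; paginate, normalize_status and fake_fetch are hypothetical and not part of the SDK. The same pattern applies to the list() methods for synthetic datasets and connectors.

```python
from typing import Callable, Iterator, Optional, Union

Page = list[dict]


def normalize_status(status: Union[str, list[str], None]) -> Optional[list[str]]:
    """The status filter accepts one value or a list; represent both as a list."""
    if status is None:
        return None
    return [status] if isinstance(status, str) else list(status)


def paginate(
    fetch_page: Callable[[int, int], Page], limit: int = 50, offset: int = 0
) -> Iterator[dict]:
    """Yield items page by page until a page comes back shorter than limit."""
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            return
        offset += limit


# Fake backend standing in for the platform: 120 items, every odd one is DONE.
items = [{"id": str(i), "status": "DONE" if i % 2 else "NEW"} for i in range(120)]
wanted = normalize_status("DONE")


def fake_fetch(offset: int, limit: int) -> Page:
    matching = [item for item in items if wanted is None or item["status"] in wanted]
    return matching[offset : offset + limit]


done = list(paginate(fake_fetch))
print(len(done))  # 60
```

Stopping at the first short page saves one request whenever the total is not a multiple of limit; if it is, the final empty page ends the loop.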
Generator
A generator is a set of models that can generate synthetic data.
The generator can be trained on one or more source tables. A quality assurance report is generated for each model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a generator. | required |
name | str \| None | The name of a generator. | None |
description | str \| None | The description of a generator. | None |
training_status | ProgressStatus | | required |
training_time | datetime \| None | The UTC date and time when the training has finished. | None |
usage | GeneratorUsage \| None | | None |
metadata | Metadata | | required |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
tables | list[SourceTable] \| None | The tables of this generator. | None |
training | Any \| None | | None |

mostlyai.domain.Generator.Training
mostlyai.domain.Generator.Training.progress
Retrieve job progress of training.
Returns:
Type | Description |
---|---|
JobProgress | The job progress of the training process. |

mostlyai.domain.Generator.Training.wait
Poll training progress and loop until training has completed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | If true, displays the progress bar. | True |
interval | float | The interval in seconds to poll the job progress. | 2 |

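Training.wait (and likewise SyntheticDataset.Generation.wait) polls progress every interval seconds until the job has finished. The loop below is a minimal sketch of that polling pattern, not the SDK's code. It assumes DONE, FAILED and CANCELED are the terminal ProgressStatus values and that IN_PROGRESS is an intermediate one; get_status is a hypothetical stand-in for reading the current status from the platform.

```python
import time
from typing import Callable

# Assumed terminal values of ProgressStatus; NEW and QUEUED appear in the examples above.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELED"}


def wait_until_finished(get_status: Callable[[], str], interval: float = 2) -> str:
    """Poll get_status() every interval seconds until a terminal state is reached."""
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)


# Simulated status sequence instead of real API calls.
statuses = iter(["QUEUED", "IN_PROGRESS", "IN_PROGRESS", "DONE"])
print(wait_until_finished(lambda: next(statuses), interval=0))  # DONE
```

A fixed interval keeps the loop simple; a shorter interval reacts faster but sends more requests to the platform.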
mostlyai.domain.Generator.clone
Clone the generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
training_status | Literal['NEW', 'CONTINUE'] | The training status of the cloned generator. | 'NEW' |

Returns:
Type | Description |
---|---|
Generator | The cloned generator object. |

mostlyai.domain.Generator.config
Retrieve writable generator properties.
Returns:
Type | Description |
---|---|
GeneratorConfig | The generator properties as a configuration object. |

mostlyai.domain.Generator.delete
Delete the generator.
Returns:
Type | Description |
---|---|
None | None |

mostlyai.domain.Generator.export_to_file
Export generator and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| None | The file path to save the generator. | None |

Returns:
Type | Description |
---|---|
Path | The path to the saved file. |

mostlyai.domain.Generator.update
Update a generator with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the generator. | None |
description | str \| None | The description of the generator. | None |

Synthetic Datasets
mostlyai.client.synthetic_datasets._MostlySyntheticDatasetsClient.create
Create a synthetic dataset. The synthetic dataset will be in the NEW state and will need to be generated before it can be used.
See mostly.generate for more details.
Example for creating a synthetic dataset
from mostlyai import MostlyAI
from mostlyai.domain import SyntheticDatasetConfig
mostly = MostlyAI()
sd = mostly.synthetic_datasets.create(
config=SyntheticDatasetConfig(
generator_id="INSERT_YOUR_GENERATOR_ID",
)
)
print("status:", sd.generation_status)
# status: NEW
sd.generation.start() # start generation
print("status:", sd.generation_status)
# status: QUEUED
sd.generation.wait() # wait for generation to complete
print("status:", sd.generation_status)
# status: DONE
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | SyntheticDatasetConfig \| dict[str, Any] | Configuration for the synthetic dataset. | required |

Returns:
Type | Description |
---|---|
SyntheticDataset | The created synthetic dataset object. |

mostlyai.client.synthetic_datasets._MostlySyntheticDatasetsClient.get
Retrieve a synthetic dataset by its ID.
Example for retrieving a synthetic dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synthetic_dataset_id | str | The unique identifier of the synthetic dataset. | required |

Returns:
Type | Description |
---|---|
SyntheticDataset | The retrieved synthetic dataset object. |

mostlyai.client.synthetic_datasets._MostlySyntheticDatasetsClient.list
List synthetic datasets.
Paginate through all synthetic datasets accessible by the user.
Example for listing all synthetic datasets
Example for searching generated synthetic datasets via key word
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for the entities in the response. | 0 |
limit | int | Limit for the number of entities in the response. | 50 |
status | str \| list[str] \| None | Filter by generation status. | None |
search_term | str \| None | Filter by name or description. | None |

Returns:
Type | Description |
---|---|
Iterator[SyntheticDatasetListItem] | An iterator over synthetic datasets. |

Synthetic Dataset
A synthetic dataset is created based on a trained generator.
It consists of synthetic samples, as well as a quality assurance report.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a synthetic dataset. | required |
generator | BaseResource \| None | | None |
metadata | Metadata | | required |
name | str | The name of a synthetic dataset. | required |
description | str \| None | The description of a synthetic dataset. | None |
generation_status | ProgressStatus | | required |
generation_time | datetime \| None | The UTC date and time when the generation has finished. | None |
tables | list[SyntheticTable] \| None | The tables of this synthetic dataset. | None |
delivery | SyntheticDatasetDelivery \| None | | None |
accuracy | float \| None | The overall accuracy of the trained generator. This is the average of the overall accuracy scores of all trained models. | None |
usage | SyntheticDatasetUsage \| None | | None |
generation | Any \| None | | None |

mostlyai.domain.SyntheticDataset.Generation
mostlyai.domain.SyntheticDataset.Generation.progress
Retrieve the progress of the generation process.
Returns:
Type | Description |
---|---|
JobProgress | The progress of the generation process. |

mostlyai.domain.SyntheticDataset.Generation.wait
Poll the generation progress and wait until the process is complete.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | If true, displays a progress bar. | True |
interval | float | Interval in seconds to poll the job progress. | 2 |

mostlyai.domain.SyntheticDataset.config
Retrieve writable synthetic dataset properties.
Returns:
Type | Description |
---|---|
SyntheticDatasetConfig | The synthetic dataset properties as a configuration object. |

mostlyai.domain.SyntheticDataset.data
Download the synthetic dataset and return it as a pandas DataFrame or as a dictionary of pandas DataFrames, depending on return_type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
return_type | Literal['auto', 'dict'] | The format of the returned data: 'auto' returns a pandas DataFrame if the dataset has a single table, otherwise a dictionary; 'dict' always returns a dictionary. | 'auto' |

Returns:
Type | Description |
---|---|
DataFrame \| dict[str, DataFrame] | The synthetic data as a single DataFrame, or as a dictionary of DataFrames keyed by table name. |

mostlyai.domain.SyntheticDataset.delete
Delete the synthetic dataset.
Returns:
Type | Description |
---|---|
None | None |

mostlyai.domain.SyntheticDataset.download
Download synthetic dataset and save to file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | SyntheticDatasetFormat | The format of the synthetic dataset. | 'PARQUET' |
file_path | str \| Path \| None | The file path to save the synthetic dataset. | None |

Returns:
Type | Description |
---|---|
Path | The path to the saved file. |

mostlyai.domain.SyntheticDataset.update
Update a synthetic dataset with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the synthetic dataset. | None |
description | str \| None | The description of the synthetic dataset. | None |
delivery | SyntheticDatasetDelivery \| None | The delivery configuration for the synthetic dataset. | None |

Connectors
mostlyai.client.connectors._MostlyConnectorsClient.create
Create a connector and optionally validate the connection before saving.
See mostly.connect for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | ConnectorConfig \| dict[str, Any] | Configuration for the connector. | required |
test_connection | bool \| None | Whether to test the connection before saving the connector. | True |

Returns:
Type | Description |
---|---|
Connector | The created connector object. |

mostlyai.client.connectors._MostlyConnectorsClient.get
Retrieve a connector by its ID.
Example for retrieving a connector
Parameters:
Name | Type | Description | Default |
---|---|---|---|
connector_id | str | The unique identifier of the connector. | required |

Returns:
Type | Description |
---|---|
Connector | The retrieved connector object. |

mostlyai.client.connectors._MostlyConnectorsClient.list
List connectors.
Paginate through all connectors accessible by the user. Only connectors that are independent of a table will be returned.
Example for listing all connectors
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | Offset for entities in the response. | 0 |
limit | int | Limit for the number of entities in the response. | 50 |
access_type | str \| None | Filter by access type (e.g., "SOURCE" or "DESTINATION"). | None |
search_term | str \| None | Filter by string in the connector name. | None |

Returns:
Type | Description |
---|---|
Iterator[ConnectorListItem] | An iterator over connector list items. |

Connector
A connector is a connection to a data source or a data destination.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | The unique identifier of a connector. | required |
name | str | The name of a connector. | required |
type | ConnectorType | | required |
access_type | ConnectorAccessType | | required |
config | dict[str, Any] \| None | | None |
secrets | dict[str, str] \| None | | None |
ssl | dict[str, str] \| None | | None |
metadata | Metadata \| None | | None |
usage | ConnectorUsage \| None | | None |
table_id | str \| None | Optional. ID of the source table or synthetic table that this connector belongs to. If not set, the connector is managed independently of any generator or synthetic dataset. | None |

mostlyai.domain.Connector.delete
Delete the connector.
Returns:
Type | Description |
---|---|
None | None |

mostlyai.domain.Connector.locations
List connector locations.
List the available databases, schemas, tables, or folders for a connector.
For storage connectors, this returns the folders and files at the root level or, if prefix is provided, at the prefix level.
For database connectors, this returns the schemas (or the databases, for databases without schemas) or, if prefix is provided, the tables.
The formats of the locations are:
- Cloud storage:
  - AZURE_STORAGE: container/path
  - GOOGLE_CLOUD_STORAGE: bucket/path
  - S3_STORAGE: bucket/path
- Database:
  - BIGQUERY: dataset.table
  - DATABRICKS: schema.table
  - HIVE: database.table
  - MARIADB: database.table
  - MSSQL: schema.table
  - MYSQL: database.table
  - ORACLE: schema.table
  - POSTGRES: schema.table
  - SNOWFLAKE: schema.table
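Because the separator depends on the connector family, it can help to split locations explicitly, for example to take the schema part as the prefix for a follow-up locations() call and pass the full string to schema(). The helper below is a hypothetical sketch based only on the formats listed above; it is not part of the SDK, and the bucket, schema and table names are made up.

```python
# Connector types whose locations use the container-or-bucket/path format listed above.
STORAGE_TYPES = {"AZURE_STORAGE", "GOOGLE_CLOUD_STORAGE", "S3_STORAGE"}


def split_location(connector_type: str, location: str) -> tuple[str, str]:
    """Split a location into its first component and the remainder.

    Storage: 'bucket/path' -> ('bucket', 'path'); databases: 'schema.table' -> ('schema', 'table').
    Quoted identifiers that contain '.' are not handled by this sketch.
    """
    separator = "/" if connector_type in STORAGE_TYPES else "."
    head, _, rest = location.partition(separator)
    return head, rest


print(split_location("S3_STORAGE", "my-bucket/census/census.csv.gz"))  # ('my-bucket', 'census/census.csv.gz')
print(split_location("POSTGRES", "public.census"))  # ('public', 'census')
```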
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | The prefix to filter the results by. | '' |

Returns:
Type | Description |
---|---|
list | A list of locations (schemas, databases, directories, etc.). |

mostlyai.domain.Connector.schema
Retrieve the schema of the table at a connector location.
Please refer to locations() for the format of the location.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | The location of the table. | required |

Returns:
Type | Description |
---|---|
list[dict[str, Any]] | The retrieved schema. |

mostlyai.domain.Connector.update
Update a connector with specific parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str \| None | The name of the connector. | None |
config | dict[str, Any] | Connector configuration. | None |
secrets | dict[str, str] | Secret values for the connector. | None |
ssl | dict[str, str] | SSL configuration for the connector. | None |
test_connection | bool \| None | If true, validates the connection before saving. | True |