pipelinex.mlflow_on_kedro.datasets.mlflow package
Submodules
pipelinex.mlflow_on_kedro.datasets.mlflow.mlflow_dataset module
- class pipelinex.mlflow_on_kedro.datasets.mlflow.mlflow_dataset.MLflowDataSet(dataset=None, filepath=None, dataset_name=None, saving_tracking_uri=None, saving_experiment_name=None, saving_run_id=None, loading_tracking_uri=None, loading_run_id=None, caching=True, copy_mode=None, file_caching=True)
Bases: AbstractDataset
MLflowDataSet saves data to, and loads data from, MLflow.
You can also specify an MLflowDataSet in catalog.yml.
Example:
    test_ds:
      type: MLflowDataSet
      dataset: pkl
- __init__(dataset=None, filepath=None, dataset_name=None, saving_tracking_uri=None, saving_experiment_name=None, saving_run_id=None, loading_tracking_uri=None, loading_run_id=None, caching=True, copy_mode=None, file_caching=True)
- Parameters:
dataset (Union[AbstractDataset, Dict, str]) – Specify how to treat the dataset as an MLflow metric, parameter, or artifact.
If set to “p”, the value will be saved/loaded as an MLflow parameter (string).
If set to “m”, the value will be saved/loaded as an MLflow metric (numeric).
If set to “a”, the value will be saved/loaded based on the data type.
If the data type is either {float, int}, the value will be saved/loaded as an MLflow metric.
If the data type is either {str, list, tuple, set}, the value will be saved/loaded as an MLflow parameter.
If the data type is dict, the value will be flattened with dot (“.”) as the separator and then saved/loaded as either an MLflow metric or parameter based on each data type as explained above.
If set to either {“json”, “csv”, “xls”, “parquet”, “png”, “jpg”, “jpeg”, “img”, “pkl”, “txt”, “yml”, “yaml”}, the backend dataset instance will be created accordingly to save/load as an MLflow artifact.
If set to a Kedro DataSet object or a dictionary, it will be used as the backend dataset to save/load as an MLflow artifact.
If set to None (default), MLflow logging will be skipped.
filepath (str) – File path, usually in the local file system, to save to and load from. Used only if the dataset arg is a string. If None (default), <temp directory>/<dataset_name arg>.<dataset arg> is used.
dataset_name (str) – Used only if the dataset arg is a string and the filepath arg is None. If None (default), the Python object ID is used, but it will be overwritten by MLflowCatalogLoggerHook.
saving_tracking_uri (str) – MLflow Tracking URI to save to. If None (default), the MLFLOW_TRACKING_URI environment variable is used.
saving_experiment_name (str) – MLflow experiment name to save to. If None (default), a new experiment will not be created or started. Ignored if saving_run_id is set.
saving_run_id (str) – An existing MLflow experiment run ID to save to. If None (default), no existing experiment run will be resumed.
loading_tracking_uri (str) – MLflow Tracking URI to load from. If None (default), the MLFLOW_TRACKING_URI environment variable is used.
loading_run_id (str) – MLflow experiment run ID to load from. If None (default), the current active run ID will be used if available.
caching (bool) – Enable caching if the parallel runner is not used. True by default.
copy_mode (str) – The copy mode used to copy the data. Possible values are: “deepcopy”, “copy” and “assign”. If not provided, it is inferred based on the data type. Ignored if the caching arg is False.
file_caching (bool) – Attempt to use the file at filepath when loading if no cache is found in memory. True by default.
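The type-based routing described for dataset set to “a” can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: flatten_dict and split_metrics_and_params are hypothetical names, prefixing flattened keys with the dataset name is an assumption, and the treatment of bool (a subclass of int in Python) and of types outside the documented sets is not specified by the text above.

```python
from typing import Any, Dict, Tuple


def flatten_dict(d: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts, joining nested keys with `sep` (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def split_metrics_and_params(name: str, value: Any) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (metrics, params) following the documented rules for dataset="a".

    Assumption: keys of a flattened dict are prefixed with the dataset name.
    """
    metrics: Dict[str, float] = {}
    params: Dict[str, str] = {}
    # A dict is flattened with "." as the separator; any other value is a single item.
    items = flatten_dict({name: value}) if isinstance(value, dict) else {name: value}
    for key, item in items.items():
        if isinstance(item, (float, int)):
            # Numeric values go to metrics (bool also lands here in this sketch).
            metrics[key] = float(item)
        elif isinstance(item, (str, list, tuple, set)):
            # String-like and collection values go to parameters, stored as strings.
            params[key] = str(item)
        # Other types are outside the documented rules and are skipped here.
    return metrics, params
```

For instance, a nested dict such as {"lr": 0.1, "model": {"name": "cnn", "layers": 3}} saved under the name cfg would, under these assumptions, produce the metrics cfg.lr and cfg.model.layers and the parameter cfg.model.name.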
- load()
Loads data by delegation to the provided load method.
- Return type:
None
- Returns:
Data returned by the provided load method.
- Raises:
DatasetError – When underlying load method raises error.
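The interaction between the caching and file_caching arguments when loading can be pictured with a small stand-alone sketch. It is not the library's code: CachedLoader and remote_load are hypothetical names, the local file is assumed to be a pickle (as with dataset set to “pkl”), and falling back to a remote load when neither cache is available is an assumption about what happens next.

```python
import os
import pickle
from typing import Any, Callable


class CachedLoader:
    """Hypothetical helper: memory cache first, then local file, then remote load."""

    _EMPTY = object()  # sentinel so that a cached None is still a cache hit

    def __init__(
        self,
        filepath: str,
        remote_load: Callable[[], Any],
        caching: bool = True,
        file_caching: bool = True,
    ) -> None:
        self.filepath = filepath
        self.remote_load = remote_load  # stands in for loading from MLflow
        self.caching = caching
        self.file_caching = file_caching
        self._cache: Any = self._EMPTY

    def load(self) -> Any:
        # 1. Use the in-memory cache when caching is enabled and populated.
        if self.caching and self._cache is not self._EMPTY:
            return self._cache
        # 2. Otherwise try the local file at filepath, if file caching is enabled.
        if self.file_caching and os.path.isfile(self.filepath):
            with open(self.filepath, "rb") as f:
                data = pickle.load(f)
        else:
            # 3. Assumed fallback: fetch the data from the remote store.
            data = self.remote_load()
        if self.caching:
            self._cache = data
        return data
```

With both flags left at True, a second call to load() returns the in-memory object without touching the file or the remote store; setting both to False forces the remote path on every call.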
- save(data)
Saves data by delegation to the provided save method.
- Parameters:
data (Any) – The value to be saved by the provided save method.
- Raises:
DatasetError – when underlying save method raises error.
FileNotFoundError – when save method got file instead of dir, on Windows.
NotADirectoryError – when save method got file instead of dir, on Unix.
- Return type:
None