pipelinex.mlflow_on_kedro.hooks.mlflow package

Submodules

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_artifacts_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_artifacts_logger.MLflowArtifactsLoggerHook(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]

Bases: object

Logs artifacts of specified file paths and dataset names to MLflow

__init__(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]
Parameters:
  • filepaths_before_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log before the pipeline is run.

  • filepaths_after_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log after the pipeline is run.

  • datasets_after_node_run (Optional[List[str]]) – The names of datasets to log after each node is run.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
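
Like other Kedro hooks, this hook takes effect once an instance is registered with the Kedro project. A configuration sketch (the file paths below are placeholders, not paths the library requires):

```python
# settings.py of a Kedro project (configuration sketch; paths are examples)
from pipelinex import MLflowArtifactsLoggerHook

HOOKS = (
    MLflowArtifactsLoggerHook(
        filepaths_before_pipeline_run=["conf/base/parameters.yml"],
        filepaths_after_pipeline_run=["data/06_models/model.pkl"],
        enable_mlflow=True,
    ),
)
```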

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.MLflowBasicLoggerHook(uri=None, experiment_name=None, artifact_location=None, run_name=None, run_id=None, nested=False, tags=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]

Bases: object

Configures MLflow and logs the pipeline run duration to MLflow

__init__(uri=None, experiment_name=None, artifact_location=None, run_name=None, run_id=None, nested=False, tags=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]
Parameters:
  • uri (Optional[str]) – MLflow tracking server URI passed to mlflow.set_tracking_uri.

  • experiment_name (Optional[str]) – Name of the MLflow experiment passed to mlflow.create_experiment.

  • artifact_location (Optional[str]) – artifact_location arg passed to mlflow.create_experiment.

  • run_name (Optional[str]) – Name of the MLflow run shown in the MLflow UI.

  • run_id (Optional[str]) – ID of an existing MLflow run to resume; passed to mlflow.start_run.

  • nested (bool) – nested arg passed to mlflow.start_run.

  • tags (Optional[Dict[str, Any]]) – tags arg passed to mlflow.start_run.

  • offset_hours (float) – Hour offset (e.g. 0 for UTC) applied to the logged timestamps.

  • enable_logging_time_begin (bool) – Enable logging the time the pipeline run begins.

  • enable_logging_time_end (bool) – Enable logging the time the pipeline run ends.

  • enable_logging_time (bool) – Enable logging the duration time of the pipeline run.

  • logging_kedro_run_params (List[str]) – Keys of Kedro run params to log to MLflow as parameters.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_catalog_created()[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamp(dt=None, offset_hours=0, fmt='%Y-%m-%dT%H:%M:%S')[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamp_int(dt=None, offset_hours=0)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamps(dt=None, offset_hours=0)[source]
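
The timestamp helpers take an optional datetime, an hour offset, and (for get_timestamp) a strftime format. A minimal stdlib sketch of the behavior implied by these signatures (an assumption for illustration, not the library's actual code):

```python
from datetime import datetime, timedelta

# Sketch of the assumed behavior of get_timestamp: take dt (or the
# current UTC time when dt is None), shift it by offset_hours, and
# render it with the given strftime format.
def get_timestamp(dt=None, offset_hours=0, fmt="%Y-%m-%dT%H:%M:%S"):
    dt = dt or datetime.utcnow()
    return (dt + timedelta(hours=offset_hours)).strftime(fmt)

print(get_timestamp(datetime(2021, 1, 2, 3, 4, 5), offset_hours=9))
# 2021-01-02T12:04:05
```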

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.MLflowCatalogLoggerHook(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]

Bases: object

Logs datasets to MLflow

__init__(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]
Parameters:
  • auto (bool) – If True, each dataset (Python func input/output) not listed in mlflow_catalog will be logged following the same rule as the “a” option below.

  • mlflow_catalog (Dict[str, Union[str, AbstractDataSet]]) – [Deprecated in favor of MLflowDataSet] Specify how to log each dataset (Python func input/output):

    • If set to “p”, the value will be saved/loaded as an MLflow parameter (string).

    • If set to “m”, the value will be saved/loaded as an MLflow metric (numeric).

    • If set to “a”, the value will be saved/loaded based on the data type.

      • If the data type is either {float, int}, the value will be saved/loaded as an MLflow metric.

      • If the data type is either {str, list, tuple, set}, the value will be saved/loaded as an MLflow parameter.

      • If the data type is dict, the value will be flattened with dot (“.”) as the separator and then saved/loaded as either an MLflow metric or parameter based on each data type as explained above.

    • If set to either {“json”, “csv”, “xls”, “parquet”, “png”, “jpg”, “jpeg”, “img”, “pkl”, “txt”, “yml”, “yaml”}, the backend dataset instance will be created accordingly to save/load as an MLflow artifact.

    • If set to a Kedro DataSet object or a dictionary, it will be used as the backend dataset to save/load as an MLflow artifact.

    • If set to None (default), MLflow logging will be skipped.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
Return type:

None

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.get_kedro_runner()[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.mlflow_log_dataset(dataset, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.running_parallel()[source]
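
The dict-flattening and type-routing rule of the “a” option above can be sketched as a standalone illustration (the function names are hypothetical, not the library's internals):

```python
# Hypothetical helpers illustrating the "a" option described above.

def flatten(d, parent_key="", sep="."):
    # Flatten a nested dict using "." as the separator,
    # e.g. {"model": {"lr": 0.01}} -> {"model.lr": 0.01}.
    out = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else str(k)
        if isinstance(v, dict):
            out.update(flatten(v, key, sep))
        else:
            out[key] = v
    return out

def classify(value):
    # Route a leaf value to an MLflow metric or parameter by type.
    if isinstance(value, (float, int)):
        return "metric"
    if isinstance(value, (str, list, tuple, set)):
        return "param"
    return None

print(flatten({"model": {"lr": 0.01, "name": "resnet"}}))
# {'model.lr': 0.01, 'model.name': 'resnet'}
```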

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook(enable_mlflow=True)[source]

Bases: object

Logs datasets of float, int, and str types (or lists of them) to MLflow

__init__(enable_mlflow=True)[source]
Parameters:

enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowOutputsLoggerHook(enable_mlflow=True)[source]

Bases: pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook

Deprecated alias for MLflowDataSetsLoggerHook

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.MLflowEnvVarsLoggerHook(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]

Bases: object

Logs environment variables to MLflow

__init__(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]
Parameters:
  • param_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as parameters

  • metric_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as metrics

  • prefix (Optional[str]) – Prefix added to the name of each MLflow parameter and metric (“env..” by default)

  • enable_mlflow (bool) – Enable logging to MLflow.

after_pipeline_run()[source]
before_pipeline_run()[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.env_vars_to_dict(env_vars=[], prefix='')[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.log_metric_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.log_param_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
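
A minimal stdlib sketch of the behavior implied by the env_vars_to_dict signature (an assumption for illustration, not the library's actual code): read each named environment variable and prepend the prefix to its name.

```python
import os

# Sketch of the assumed behavior of env_vars_to_dict: look up each
# named environment variable and prefix its name in the result dict.
def env_vars_to_dict(env_vars=[], prefix=""):
    return {prefix + name: os.environ.get(name) for name in env_vars}

os.environ["MY_EXPERIMENT_ID"] = "42"
print(env_vars_to_dict(["MY_EXPERIMENT_ID"], prefix="env.."))
# {'env..MY_EXPERIMENT_ID': '42'}
```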

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.MLflowTimeLoggerHook(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]

Bases: object

Logs duration time to run each node (task) to MLflow. Optionally, the execution logs can be visualized as a Gantt chart by plotly.figure_factory.create_gantt (https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html) if plotly is installed.

__init__(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]
Parameters:
  • gantt_filepath (Optional[str]) – File path to save the generated Gantt chart.

  • gantt_params (Dict[str, Any]) – Args fed to: https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html

  • metric_name_prefix (str) – Prefix for the metric names. The metric names are metric_name_prefix concatenated with the string returned by task_name_func.

  • task_name_func (Callable[[Node], str]) – Callable that returns the task name given a kedro.pipeline.node.Node object.

  • time_log_filepath (Optional[str]) – File path to save the time log in JSON format.

  • enable_plotly (bool) – Enable visualization of logged time as a Gantt chart.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_node_run(node, catalog, inputs)[source]
load_time_dict(key)[source]
update_time_dict(key, d)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.dump_dict(filepath, d)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.load_dict(filepath)[source]
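
A registration sketch for this hook (values are examples; gantt_params entries are forwarded to plotly.figure_factory.create_gantt):

```python
# settings.py of a Kedro project (configuration sketch; values are examples)
from pipelinex import MLflowTimeLoggerHook

HOOKS = (
    MLflowTimeLoggerHook(
        gantt_filepath="data/08_reporting/gantt.html",
        # fed to plotly.figure_factory.create_gantt
        gantt_params={"bar_width": 0.4},
        metric_name_prefix="_time_to_run ",
    ),
)
```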

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils module

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_end_run(enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_artifacts(paths, artifact_path=None, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_metrics(metrics, step=None, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_params(params, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_values(d, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_start_run(uri=None, run_id=None, experiment_name=None, artifact_location=None, run_name=None, nested=False, tags=None, enable_mlflow=True)[source]
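
Each of these wrappers takes an enable_mlflow flag, which suggests a shared guard pattern: the call becomes a no-op when MLflow logging is disabled, so pipelines run unchanged without an MLflow installation. A sketch of that likely pattern (the actual implementation may differ):

```python
# Sketch of the assumed guard pattern shared by the mlflow_utils wrappers:
# silently skip logging when disabled or when mlflow is not importable.
def mlflow_log_metrics(metrics, step=None, enable_mlflow=True):
    if not enable_mlflow:
        return
    try:
        import mlflow
    except ImportError:
        return
    # mlflow.log_metrics logs a dict of numeric metrics to the active run
    mlflow.log_metrics(metrics, step=step)

# With enable_mlflow=False this is a silent no-op:
mlflow_log_metrics({"accuracy": 0.95}, enable_mlflow=False)
```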