pipelinex.mlflow_on_kedro.hooks.mlflow package

Submodules

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_artifacts_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_artifacts_logger.MLflowArtifactsLoggerHook(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]

Bases: object

Logs artifacts of specified file paths and dataset names to MLflow

__init__(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]
Parameters:
  • filepaths_before_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log before the pipeline is run.

  • filepaths_after_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log after the pipeline is run.

  • datasets_after_node_run (Optional[List[str]]) – The names of datasets to log after each node is run.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
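
Like other Kedro hooks, this hook takes effect once an instance is registered with the Kedro project. A configuration sketch (the file paths below are placeholders, not paths the library requires):

```python
# settings.py of a Kedro project (configuration sketch; paths are examples)
from pipelinex import MLflowArtifactsLoggerHook

HOOKS = (
    MLflowArtifactsLoggerHook(
        filepaths_before_pipeline_run=["conf/base/parameters.yml"],
        filepaths_after_pipeline_run=["data/06_models/model.pkl"],
        enable_mlflow=True,
    ),
)
```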

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.MLflowBasicLoggerHook(uri=None, experiment_name=None, artifact_location=None, run_name=None, run_id=None, nested=False, tags=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]

Bases: object

Configures MLflow and logs the pipeline run duration to MLflow

__init__(uri=None, experiment_name=None, artifact_location=None, run_name=None, run_id=None, nested=False, tags=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]
Parameters:
  • uri (Optional[str]) – MLflow tracking server URI passed to mlflow.set_tracking_uri.

  • experiment_name (Optional[str]) – Name of the MLflow experiment passed to mlflow.create_experiment.

  • artifact_location (Optional[str]) – artifact_location arg passed to mlflow.create_experiment.

  • run_name (Optional[str]) – Name of the MLflow run shown in the MLflow UI.

  • run_id (Optional[str]) – ID of an existing MLflow run to resume; passed to mlflow.start_run.

  • nested (bool) – nested arg passed to mlflow.start_run.

  • tags (Optional[Dict[str, Any]]) – tags arg passed to mlflow.start_run.

  • offset_hours (float) – Hour offset (e.g. 0 for UTC) applied to the logged timestamps.

  • enable_logging_time_begin (bool) – Enable logging the time the pipeline run begins.

  • enable_logging_time_end (bool) – Enable logging the time the pipeline run ends.

  • enable_logging_time (bool) – Enable logging the duration time of the pipeline run.

  • logging_kedro_run_params (List[str]) – Keys of Kedro run params to log to MLflow as parameters.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_catalog_created()[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamp(dt=None, offset_hours=0, fmt='%Y-%m-%dT%H:%M:%S')[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamp_int(dt=None, offset_hours=0)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_basic_logger.get_timestamps(dt=None, offset_hours=0)[source]
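
The timestamp helpers take an optional datetime, an hour offset, and (for get_timestamp) a strftime format. A minimal stdlib sketch of the behavior implied by these signatures (an assumption for illustration, not the library's actual code):

```python
from datetime import datetime, timedelta

# Sketch of the assumed behavior of get_timestamp: take dt (or the
# current UTC time when dt is None), shift it by offset_hours, and
# render it with the given strftime format.
def get_timestamp(dt=None, offset_hours=0, fmt="%Y-%m-%dT%H:%M:%S"):
    dt = dt or datetime.utcnow()
    return (dt + timedelta(hours=offset_hours)).strftime(fmt)

print(get_timestamp(datetime(2021, 1, 2, 3, 4, 5), offset_hours=9))
# 2021-01-02T12:04:05
```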

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.MLflowCatalogLoggerHook(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]

Bases: object

Logs datasets to MLflow

__init__(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]
Parameters:
  • auto (bool) – If True, each dataset (Python func input/output) not listed in mlflow_catalog will be logged following the same rule as the “a” option below.

  • mlflow_catalog (Dict[str, Union[str, AbstractDataSet]]) – [Deprecated in favor of MLflowDataSet] Specify how to log each dataset (Python func input/output):

    • If set to “p”, the value will be saved/loaded as an MLflow parameter (string).

    • If set to “m”, the value will be saved/loaded as an MLflow metric (numeric).

    • If set to “a”, the value will be saved/loaded based on the data type.

      • If the data type is either {float, int}, the value will be saved/loaded as an MLflow metric.

      • If the data type is either {str, list, tuple, set}, the value will be saved/loaded as an MLflow parameter.

      • If the data type is dict, the value will be flattened with dot (“.”) as the separator and then saved/loaded as either an MLflow metric or parameter based on each data type as explained above.

    • If set to either {“json”, “csv”, “xls”, “parquet”, “png”, “jpg”, “jpeg”, “img”, “pkl”, “txt”, “yml”, “yaml”}, the backend dataset instance will be created accordingly to save/load as an MLflow artifact.

    • If set to a Kedro DataSet object or a dictionary, it will be used as the backend dataset to save/load as an MLflow artifact.

    • If set to None (default), MLflow logging will be skipped.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
Return type:

None

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.get_kedro_runner()[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.mlflow_log_dataset(dataset, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_catalog_logger.running_parallel()[source]
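
The dict-flattening and type-routing rule of the “a” option above can be sketched as a standalone illustration (the function names are hypothetical, not the library's internals):

```python
# Hypothetical helpers illustrating the "a" option described above.

def flatten(d, parent_key="", sep="."):
    # Flatten a nested dict using "." as the separator,
    # e.g. {"model": {"lr": 0.01}} -> {"model.lr": 0.01}.
    out = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else str(k)
        if isinstance(v, dict):
            out.update(flatten(v, key, sep))
        else:
            out[key] = v
    return out

def classify(value):
    # Route a leaf value to an MLflow metric or parameter by type.
    if isinstance(value, (float, int)):
        return "metric"
    if isinstance(value, (str, list, tuple, set)):
        return "param"
    return None

print(flatten({"model": {"lr": 0.01, "name": "resnet"}}))
# {'model.lr': 0.01, 'model.name': 'resnet'}
```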

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook(enable_mlflow=True)[source]

Bases: object

Logs datasets of float, int, and str types (or lists of them) to MLflow

__init__(enable_mlflow=True)[source]
Parameters:

enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowOutputsLoggerHook(enable_mlflow=True)[source]

Bases: pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook

Deprecated alias for MLflowDataSetsLoggerHook

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.MLflowEnvVarsLoggerHook(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]

Bases: object

Logs environment variables to MLflow

__init__(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]
Parameters:
  • param_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as parameters

  • metric_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as metrics

  • prefix (Optional[str]) – Prefix added to the name of each MLflow parameter and metric (“env..” by default)

  • enable_mlflow (bool) – Enable logging to MLflow.

after_pipeline_run()[source]
before_pipeline_run()[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.env_vars_to_dict(env_vars=[], prefix='')[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.log_metric_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_env_vars_logger.log_param_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
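
A minimal stdlib sketch of the behavior implied by the env_vars_to_dict signature (an assumption for illustration, not the library's actual code): read each named environment variable and prepend the prefix to its name.

```python
import os

# Sketch of the assumed behavior of env_vars_to_dict: look up each
# named environment variable and prefix its name in the result dict.
def env_vars_to_dict(env_vars=[], prefix=""):
    return {prefix + name: os.environ.get(name) for name in env_vars}

os.environ["MY_EXPERIMENT_ID"] = "42"
print(env_vars_to_dict(["MY_EXPERIMENT_ID"], prefix="env.."))
# {'env..MY_EXPERIMENT_ID': '42'}
```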

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger module

class pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.MLflowTimeLoggerHook(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]

Bases: object

Logs duration time to run each node (task) to MLflow. Optionally, the execution logs can be visualized as a Gantt chart by plotly.figure_factory.create_gantt (https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html) if plotly is installed.

__init__(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]
Parameters:
  • gantt_filepath (Optional[str]) – File path to save the generated Gantt chart.

  • gantt_params (Dict[str, Any]) – Args fed to: https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html

  • metric_name_prefix (str) – Prefix for the metric names. The metric names are metric_name_prefix concatenated with the string returned by task_name_func.

  • task_name_func (Callable[[Node], str]) – Callable that returns the task name given a kedro.pipeline.node.Node object.

  • time_log_filepath (Optional[str]) – File path to save the time log in JSON format.

  • enable_plotly (bool) – Enable visualization of logged time as a Gantt chart.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_node_run(node, catalog, inputs)[source]
load_time_dict(key)[source]
update_time_dict(key, d)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.dump_dict(filepath, d)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_time_logger.load_dict(filepath)[source]
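
A registration sketch for this hook (values are examples; gantt_params entries are forwarded to plotly.figure_factory.create_gantt):

```python
# settings.py of a Kedro project (configuration sketch; values are examples)
from pipelinex import MLflowTimeLoggerHook

HOOKS = (
    MLflowTimeLoggerHook(
        gantt_filepath="data/08_reporting/gantt.html",
        # fed to plotly.figure_factory.create_gantt
        gantt_params={"bar_width": 0.4},
        metric_name_prefix="_time_to_run ",
    ),
)
```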

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils module

pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_end_run(enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_artifacts(paths, artifact_path=None, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_metrics(metrics, step=None, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_params(params, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_log_values(d, enable_mlflow=True)[source]
pipelinex.mlflow_on_kedro.hooks.mlflow.mlflow_utils.mlflow_start_run(uri=None, run_id=None, experiment_name=None, artifact_location=None, run_name=None, nested=False, tags=None, enable_mlflow=True)[source]
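
Each of these wrappers takes an enable_mlflow flag, which suggests a shared guard pattern: the call becomes a no-op when MLflow logging is disabled, so pipelines run unchanged without an MLflow installation. A sketch of that likely pattern (the actual implementation may differ):

```python
# Sketch of the assumed guard pattern shared by the mlflow_utils wrappers:
# silently skip logging when disabled or when mlflow is not importable.
def mlflow_log_metrics(metrics, step=None, enable_mlflow=True):
    if not enable_mlflow:
        return
    try:
        import mlflow
    except ImportError:
        return
    # mlflow.log_metrics logs a dict of numeric metrics to the active run
    mlflow.log_metrics(metrics, step=step)

# With enable_mlflow=False this is a silent no-op:
mlflow_log_metrics({"accuracy": 0.95}, enable_mlflow=False)
```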