pipelinex.extras.hooks.mlflow package

Submodules

pipelinex.extras.hooks.mlflow.mlflow_artifacts_logger module

class pipelinex.extras.hooks.mlflow.mlflow_artifacts_logger.MLflowArtifactsLoggerHook(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]

Bases: object

Logs files at the specified paths, and the specified datasets, to MLflow as artifacts

__init__(filepaths_before_pipeline_run=None, filepaths_after_pipeline_run=None, datasets_after_node_run=None, enable_mlflow=True)[source]
Parameters
  • filepaths_before_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log before the pipeline is run.

  • filepaths_after_pipeline_run (Optional[List[str]]) – The file paths of artifacts to log after the pipeline is run.

  • datasets_after_node_run (Optional[List[str]]) – The dataset names to log after the node is run.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
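For example, the hook can be registered in a recent Kedro project's settings.py (older Kedro versions register hooks via the project context instead). The file paths and dataset name below are illustrative, not prescribed by the API:

```python
# settings.py of a Kedro project (paths and dataset name are illustrative)
from pipelinex.extras.hooks.mlflow.mlflow_artifacts_logger import (
    MLflowArtifactsLoggerHook,
)

HOOKS = (
    MLflowArtifactsLoggerHook(
        filepaths_before_pipeline_run=["conf/base/parameters.yml"],  # logged before the run
        filepaths_after_pipeline_run=["logs/info.log"],  # logged after the run
        datasets_after_node_run=["model"],  # dataset logged after its node runs
        enable_mlflow=True,
    ),
)
```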

pipelinex.extras.hooks.mlflow.mlflow_basic_logger module

class pipelinex.extras.hooks.mlflow.mlflow_basic_logger.MLflowBasicLoggerHook(uri=None, experiment_name=None, artifact_location=None, run_name=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]

Bases: object

Configures MLflow and logs the duration time of the pipeline run to MLflow

__init__(uri=None, experiment_name=None, artifact_location=None, run_name=None, offset_hours=0, enable_logging_time_begin=True, enable_logging_time_end=True, enable_logging_time=True, logging_kedro_run_params=[], enable_mlflow=True)[source]
Parameters
  • uri (Optional[str]) – URI of the MLflow tracking server.

  • experiment_name (Optional[str]) – Name of the MLflow experiment.

  • artifact_location (Optional[str]) – Location to store the artifacts of the MLflow experiment.

  • run_name (Optional[str]) – Name of the MLflow run.

  • offset_hours (float) – Hour offset applied to the current time when generating timestamps.

  • enable_logging_time_begin (bool) – Enable logging the time when the pipeline run begins.

  • enable_logging_time_end (bool) – Enable logging the time when the pipeline run ends.

  • enable_logging_time (bool) – Enable logging the duration time of the pipeline run.

  • logging_kedro_run_params (List[str]) – Keys of Kedro's run_params to log to MLflow as parameters.

  • enable_mlflow (bool) – Enable logging to MLflow.
after_catalog_created()[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
pipelinex.extras.hooks.mlflow.mlflow_basic_logger.get_timestamp(dt=None, offset_hours=0, fmt='%Y-%m-%dT%H:%M:%S')[source]
pipelinex.extras.hooks.mlflow.mlflow_basic_logger.get_timestamp_int(dt=None, offset_hours=0)[source]
pipelinex.extras.hooks.mlflow.mlflow_basic_logger.get_timestamps(dt=None, offset_hours=0)[source]
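These timestamp helpers take an optional datetime and an hour offset. A plausible implementation consistent with the documented signature of get_timestamp (a sketch for illustration; the actual pipelinex source may differ) is:

```python
from datetime import datetime, timedelta


def get_timestamp(dt=None, offset_hours=0, fmt="%Y-%m-%dT%H:%M:%S"):
    # Default to the current time, shift by the given number of hours,
    # then format using the given strftime pattern.
    dt = dt or datetime.now()
    return (dt + timedelta(hours=offset_hours)).strftime(fmt)


# A fixed datetime keeps the output deterministic:
print(get_timestamp(datetime(2021, 3, 14, 9, 26, 53), offset_hours=2))
# -> 2021-03-14T11:26:53
```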

pipelinex.extras.hooks.mlflow.mlflow_catalog_logger module

class pipelinex.extras.hooks.mlflow.mlflow_catalog_logger.MLflowCatalogLoggerHook(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]

Bases: object

Logs datasets to MLflow

__init__(auto=True, mlflow_catalog={}, enable_mlflow=True)[source]
Parameters
  • auto (bool) – If True, each dataset (Python func input/output) not listed in mlflow_catalog is logged automatically: as a metric for float and int values, and as a param for str, list, tuple, dict, and set values.

  • mlflow_catalog (Dict[str, Union[str, AbstractDataSet]]) – Specifies how to log each dataset (Python func input/output). Use “p” to log as an MLflow parameter, “m” to log as a metric, a file extension (“json”, “csv”, “xls”, “parquet”, “png”, “jpg”, “jpeg”, “img”, “pkl”, “txt”, “yml”, or “yaml”) to log as a file artifact in that format, or a Kedro DataSet instance to log as the corresponding file artifact.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
before_pipeline_run(run_params, pipeline, catalog)[source]
Return type

None

pipelinex.extras.hooks.mlflow.mlflow_catalog_logger.get_kedro_runner()[source]
pipelinex.extras.hooks.mlflow.mlflow_catalog_logger.mlflow_log_dataset(dataset, enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_catalog_logger.running_parallel()[source]
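The auto mode's type dispatch described above can be sketched as a standalone function (a simplified illustration of the documented rule, not the hook's actual code):

```python
def classify_for_mlflow(value):
    # float and int values are logged as MLflow metrics.
    if isinstance(value, (float, int)):
        return "metric"
    # str, list, tuple, dict, and set values are logged as MLflow params.
    if isinstance(value, (str, list, tuple, dict, set)):
        return "param"
    # Anything else is left to mlflow_catalog / Kedro DataSet handling.
    return "unhandled"


print(classify_for_mlflow(0.98))      # metric
print(classify_for_mlflow({"k": 1}))  # param
```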

pipelinex.extras.hooks.mlflow.mlflow_datasets_logger module

class pipelinex.extras.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook(enable_mlflow=True)[source]

Bases: object

Logs datasets of float, int, and str types (and lists of these) to MLflow

__init__(enable_mlflow=True)[source]
Parameters

enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
class pipelinex.extras.hooks.mlflow.mlflow_datasets_logger.MLflowOutputsLoggerHook(enable_mlflow=True)[source]

Bases: pipelinex.extras.hooks.mlflow.mlflow_datasets_logger.MLflowDataSetsLoggerHook

Deprecated alias for MLflowDataSetsLoggerHook
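Registration follows the same pattern as the other hooks; this hook takes only the enable_mlflow flag. A settings.py fragment for a recent Kedro project (older versions register hooks via the project context):

```python
# settings.py: prefer MLflowDataSetsLoggerHook over the deprecated
# MLflowOutputsLoggerHook alias.
from pipelinex.extras.hooks.mlflow.mlflow_datasets_logger import (
    MLflowDataSetsLoggerHook,
)

HOOKS = (MLflowDataSetsLoggerHook(enable_mlflow=True),)
```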

pipelinex.extras.hooks.mlflow.mlflow_env_vars_logger module

class pipelinex.extras.hooks.mlflow.mlflow_env_vars_logger.MLflowEnvVarsLoggerHook(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]

Bases: object

Logs environment variables to MLflow

__init__(param_env_vars=None, metric_env_vars=None, prefix=None, enable_mlflow=True)[source]
Parameters
  • param_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as parameters

  • metric_env_vars (Optional[List[str]]) – Environment variables to log to MLflow as metrics

  • prefix (Optional[str]) – Prefix added to the name of each MLflow parameter and metric (defaults to “env..”)

  • enable_mlflow (bool) – Enable logging to MLflow.

after_pipeline_run()[source]
before_pipeline_run()[source]
pipelinex.extras.hooks.mlflow.mlflow_env_vars_logger.env_vars_to_dict(env_vars=[], prefix='')[source]
pipelinex.extras.hooks.mlflow.mlflow_env_vars_logger.log_metric_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_env_vars_logger.log_param_env_vars(env_vars=[], prefix='', enable_mlflow=True)[source]
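env_vars_to_dict presumably reads each named variable from os.environ and prepends the prefix to its key. A minimal sketch under that assumption (the real function's handling of missing variables and key casing may differ):

```python
import os


def env_vars_to_dict(env_vars=[], prefix=""):
    # Read each named environment variable and prefix its key;
    # missing variables are recorded as empty strings in this sketch.
    return {prefix + name: os.environ.get(name, "") for name in env_vars}


os.environ["RANK"] = "0"  # illustrative variable
print(env_vars_to_dict(["RANK"], prefix="env.."))
# -> {'env..RANK': '0'}
```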

pipelinex.extras.hooks.mlflow.mlflow_time_logger module

class pipelinex.extras.hooks.mlflow.mlflow_time_logger.MLflowTimeLoggerHook(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]

Bases: object

Logs the duration time to run each node (task) to MLflow. Optionally, the execution logs can be visualized as a Gantt chart by plotly.figure_factory.create_gantt (https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html) if plotly is installed.

__init__(gantt_filepath=None, gantt_params={}, metric_name_prefix='_time_to_run ', task_name_func=<function _get_task_name>, time_log_filepath=None, enable_plotly=True, enable_mlflow=True)[source]
Parameters
  • gantt_filepath (Optional[str]) – File path to save the generated Gantt chart.

  • gantt_params (Dict[str, Any]) – Arguments passed to plotly.figure_factory.create_gantt: https://plotly.github.io/plotly.py-docs/generated/plotly.figure_factory.create_gantt.html

  • metric_name_prefix (str) – Prefix for the metric names. Each metric name is metric_name_prefix concatenated with the string returned by task_name_func.

  • task_name_func (Callable[[Node], str]) – Callable that returns the task name given a kedro.pipeline.node.Node object.

  • time_log_filepath (Optional[str]) – File path to save the time log in JSON format.

  • enable_plotly (bool) – Enable visualization of the logged time as a Gantt chart.

  • enable_mlflow (bool) – Enable logging to MLflow.

after_node_run(node, catalog, inputs, outputs)[source]
after_pipeline_run(run_params, pipeline, catalog)[source]
before_node_run(node, catalog, inputs)[source]
load_time_dict(key)[source]
update_time_dict(key, d)[source]
pipelinex.extras.hooks.mlflow.mlflow_time_logger.dump_dict(filepath, d)[source]
pipelinex.extras.hooks.mlflow.mlflow_time_logger.load_dict(filepath)[source]
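task_name_func receives the kedro.pipeline.node.Node being timed and returns the string appended to metric_name_prefix, so a custom function can replace the default _get_task_name. The sketch below uses Node.name, a standard Kedro attribute, with a stand-in class so it runs without Kedro; the exact default formatting in pipelinex may differ:

```python
def short_task_name(node):
    # Use the node's name, truncated so the metric name stays readable.
    return node.name[:20]


# Stand-in for kedro.pipeline.node.Node, just for demonstration:
class FakeNode:
    name = "train_model_node"


print("_time_to_run " + short_task_name(FakeNode()))
# -> _time_to_run train_model_node
```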

pipelinex.extras.hooks.mlflow.mlflow_utils module

pipelinex.extras.hooks.mlflow.mlflow_utils.mlflow_end_run(enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_utils.mlflow_log_artifacts(paths, artifact_path=None, enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_utils.mlflow_log_metrics(metrics, step=None, enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_utils.mlflow_log_params(params, enable_mlflow=True)[source]
pipelinex.extras.hooks.mlflow.mlflow_utils.mlflow_start_run(uri, experiment_name, artifact_location, run_name=None, enable_mlflow=True)[source]
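Each of these helpers takes an enable_mlflow flag, which presumably short-circuits the call so pipelines run unchanged when MLflow logging is disabled. A sketch of that guard pattern for mlflow_log_params (an assumption about the structure, not the actual source; mlflow.log_params is the real MLflow API):

```python
def mlflow_log_params(params, enable_mlflow=True):
    # When disabled, skip MLflow entirely (no import, no tracking call).
    if not enable_mlflow:
        return
    import mlflow  # imported lazily so MLflow stays an optional dependency

    mlflow.log_params(params)


# With the flag off, this is a harmless no-op:
mlflow_log_params({"lr": 0.01}, enable_mlflow=False)
```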