## Flex-Kedro: Kedro plugin for flexible config [API document](https://pipelinex.readthedocs.io/en/latest/pipelinex.flex_kedro.html) Flex-Kedro provides more options to configure Kedro projects flexibly and thus quickly by KFlex-Kedro-Pipeline and Flex-Kedro-Context features. ### Flex-Kedro-Pipeline: Kedro plugin for quicker pipeline set up If you want to define Kedro pipelines quickly, you can consider to use `pipelinex.FlexiblePipeline` instead of `kedro.pipeline.Pipeline`. `pipelinex.FlexiblePipeline` adds the following options to `kedro.pipeline.Pipeline`. #### Dict for nodes To define each node, dict can be used instead of `kedro.pipeline.node`. Example: ```python pipelinex.FlexiblePipeline( nodes=[dict(func=task_func1, inputs="my_input", outputs="my_output")] ) ``` will be equivalent to: ```python kedro.pipeline.Pipeline( nodes=[ kedro.pipeline.node(func=task_func1, inputs="my_input", outputs="my_output") ] ) ``` #### Sequential nodes For sub-pipelines consisting of nodes of only single input and single output, you can optionally use Sequential API similar to PyTorch (`torch.nn.Sequential`) and Keras (`tf.keras.Sequential`) Example: ```python pipelinex.FlexiblePipeline( nodes=[ dict( func=[task_func1, task_func2, task_func3], inputs="my_input", outputs="my_output", ) ] ) ``` will be equivalent to: ```python kedro.pipeline.Pipeline( nodes=[ kedro.pipeline.node( func=task_func1, inputs="my_input", outputs="my_output__001" ), kedro.pipeline.node( func=task_func2, inputs="my_output__001", outputs="my_output__002" ), kedro.pipeline.node( func=task_func3, inputs="my_output__002", outputs="my_output" ), ] ) ``` #### Decorators without using the method - Optionally specify the Python function decorator(s) to apply to multiple nodes under the pipeline using `decorator` argument instead of using [`decorate`](https://kedro.readthedocs.io/en/stable/kedro.pipeline.Pipeline.html#kedro.pipeline.Pipeline.decorate) method of `kedro.pipeline.Pipeline`. Example: ```python pipelinex.FlexiblePipeline( nodes=[ kedro.pipeline.node(func=task_func1, inputs="my_input", outputs="my_output") ], decorator=[task_deco, task_deco], ) ``` will be equivalent to: ```python kedro.pipeline.Pipeline( nodes=[ kedro.pipeline.node(func=task_func1, inputs="my_input", outputs="my_output") ] ).decorate(task_deco, task_deco) ``` - Optionally specify the default python module (path of .py file) if you do not want to repeat the same (deep and/or long) Python module (e.g. `foo.bar.my_task1`, `foo.bar.my_task2`, etc.) ### Flex-Kedro-Context: Kedro plugin for YAML lovers If you want to take advantage of YAML more than Kedro supports, you can consider to use `pipelinex.FlexibleContext` instead of `kedro.framework.context.KedroContext`. `pipelinex.FlexibleContext` adds preprocess of `parameters.yml` and `catalog.yml` to `kedro.framework.context.KedroContext` to provide flexibility. This option is for YAML lovers only. If you don't like YAML very much, skip this one. #### Define Kedro pipelines in `parameters.yml` You can define the inter-task dependency (DAG) for Kedro pipelines in `parameters.yml` using `PIPELINES` key. To define each Kedro pipeline, you can use the `kedro.pipeline.Pipeline` or its variant such as `pipelinex.FlexiblePipeline` as shown below. ```yaml # parameters.yml PIPELINES: __default__: =: pipelinex.FlexiblePipeline module: # Optionally specify the default Python module so you can omit the module name to which functions belongs decorator: # Optionally specify function decorator(s) to apply to each node nodes: - inputs: ["params:model", train_df, "params:cols_features", "params:col_target"] func: sklearn_demo.train_model outputs: model - inputs: [model, test_df, "params:cols_features"] func: sklearn_demo.run_inference outputs: pred_df ``` #### Configure Kedro run config in `parameters.yml` You can specify the run config in `parameters.yml` using `RUN_CONFIG` key instead of specifying the args for `kedro run` command for every run. You can still set the args for `kedro run` to overwrite. In addition to the args for `kedro run`, you can opt to run only missing nodes (skip tasks which have already been run to resume pipeline using the intermediate data files or databases.) by `only_missing` key. ```yaml # parameters.yml RUN_CONFIG: pipeline_name: __default__ runner: SequentialRunner # Set to "ParallelRunner" to run in parallel only_missing: False # Set True to run only missing nodes tags: # None node_names: # None from_nodes: # None to_nodes: # None from_inputs: # None load_versions: # None ``` #### Use `HatchDict` feature in `parameters.yml` You can use `HatchDict` feature in `parameters.yml`. ```yaml # parameters.yml model: =: sklearn.linear_model.LogisticRegression C: 1.23456 max_iter: 987 random_state: 42 cols_features: # Columns used as features in the Titanic data table - Pclass # The passenger's ticket class - Parch # # of parents / children aboard the Titanic col_target: Survived # Column used as the target: whether the passenger survived or not ``` #### Enable caching for Kedro DataSets in `catalog.yml` Enable caching using `cached` key set to True if you do not want Kedro to load the data from disk/database which were in the memory. ([`kedro.io.CachedDataSet`](https://kedro.readthedocs.io/en/latest/kedro.io.CachedDataSet.html#kedro.io.CachedDataSet) is used under the hood.) #### Use `HatchDict` feature in `catalog.yml` You can use `HatchDict` feature in `catalog.yml`.