pipelinex.extras.datasets.pandas_profiling package

Submodules

pipelinex.extras.datasets.pandas_profiling.pandas_profiling module

class pipelinex.extras.datasets.pandas_profiling.pandas_profiling.PandasProfilingDataSet(filepath, save_args=None, sample_args=None, version=None)

Bases: pipelinex.extras.datasets.core.AbstractVersionedDataSet

PandasProfilingDataSet is an AbstractVersionedDataSet that generates a pandas-profiling report from a DataFrame. See https://github.com/pandas-profiling/pandas-profiling for details.
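The save behaviour can be approximated with the minimal sketch below. It is not the pipelinex implementation: the helper save_profile_report is hypothetical, and the order of operations (sample with sample_args, call df.profile_report with save_args merged over DEFAULT_SAVE_ARGS, then write the result with the report's to_file method) is an assumption based on the parameter descriptions of __init__. Note that DataFrame.profile_report only exists once pandas_profiling has been imported, because the import registers that accessor.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Mirrors the class attribute documented for this dataset; the merge below is assumed.
DEFAULT_SAVE_ARGS: Dict[str, Any] = {}


def save_profile_report(
    data: Any,
    filepath: str,
    save_args: Optional[Dict[str, Any]] = None,
    sample_args: Optional[Dict[str, Any]] = None,
) -> Path:
    """Hypothetical helper approximating what saving the dataset does.

    `data` is expected to be a pandas DataFrame whose `profile_report`
    accessor has been registered by `import pandas_profiling`.
    """
    save_path = Path(filepath)
    # Make sure the target directory exists before writing the report.
    save_path.parent.mkdir(parents=True, exist_ok=True)

    # Optionally profile only a sample of the rows (forwarded to df.sample).
    if sample_args is not None:
        data = data.sample(**sample_args)

    # Defaults first; user-supplied save_args override them (assumed merge order).
    kwargs = {**DEFAULT_SAVE_ARGS, **(save_args or {})}
    report = data.profile_report(**kwargs)

    # Write the report; an .html suffix produces a standalone HTML page.
    report.to_file(save_path)
    return save_path
```

With pandas and pandas-profiling installed, a call such as save_profile_report(df, "data/08_reporting/profile.html", save_args={"title": "Profile"}, sample_args={"n": 1000, "random_state": 0}) would profile a reproducible 1,000-row sample of a frame that has at least that many rows.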

DEFAULT_SAVE_ARGS: Dict[str, Any] = {}

__init__(filepath, save_args=None, sample_args=None, version=None)

Creates a new instance of PandasProfilingDataSet pointing to a concrete filepath.

Parameters:

filepath (str) – path of the local report file to write (for example, an .html file).
save_args (Optional[Dict[str, Any]]) – Arguments passed to df.profile_report, such as title. See https://pandas-profiling.github.io/pandas-profiling/docs/ for details. See https://github.com/pandas-profiling/pandas-profiling/blob/master/pandas_profiling/config_default.yaml for default values.
sample_args (Optional[Dict[str, Any]]) – Arguments passed on to df.sample. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html for details.
version (Optional[Version]) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
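The load/save semantics of version can be illustrated with the sketch below. It assumes the Kedro conventions that kedro.io.core.Version is a (load, save) pair, that versions are UTC timestamps which sort lexicographically, and that a versioned file lives at <filepath>/<version>/<filename>. The Version class here is a local stand-in, and resolve_load_version, resolve_save_version and versioned_path are hypothetical helpers, not pipelinex or Kedro API.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Iterable, NamedTuple, Optional

# Kedro-style version format (assumed); such strings sort chronologically.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


class Version(NamedTuple):
    """Local stand-in for kedro.io.core.Version: a (load, save) pair."""

    load: Optional[str]
    save: Optional[str]


def generate_timestamp(now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp truncated to milliseconds."""
    now = now or datetime.now(tz=timezone.utc)
    ts = now.strftime(VERSION_FORMAT)
    # Drop the last three microsecond digits but keep the trailing "Z".
    return ts[:-4] + ts[-1:]


def resolve_load_version(version: Version, existing: Iterable[str]) -> Optional[str]:
    """Use the pinned load version, else the newest existing one (or None)."""
    if version.load is not None:
        return version.load
    candidates = sorted(existing)
    return candidates[-1] if candidates else None


def resolve_save_version(version: Version, now: Optional[datetime] = None) -> str:
    """Use the pinned save version, else autogenerate a timestamp."""
    return version.save if version.save is not None else generate_timestamp(now)


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Build <filepath>/<version>/<filename>."""
    path = PurePosixPath(filepath)
    return path / version / path.name
```

Because the timestamps sort lexicographically, picking the latest version reduces to taking the maximum of the existing version directory names.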