pipelinex.extras.datasets.pandas package

Submodules

pipelinex.extras.datasets.pandas.csv_local module

This module provides CSVLocalDataSet, which loads data from and saves data to a local CSV file using pandas.

class pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet(filepath, load_args=None, save_args=None, version=None)[source]

Bases: kedro.io.core.AbstractVersionedDataSet

CSVLocalDataSet loads data from and saves data to a local CSV file. Reading and writing are delegated to pandas, so the data set accepts every option that pandas supports for loading and saving CSV files.

Example:

from pipelinex.extras.datasets.pandas.csv_local import CSVLocalDataSet
import pandas as pd

data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
                     'col3': [5, 6]})
data_set = CSVLocalDataSet(filepath="test.csv",
                           load_args=None,
                           save_args={"index": False})
data_set.save(data)
reloaded = data_set.load()

assert data.equals(reloaded)
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {'index': False}
__init__(filepath, load_args=None, save_args=None, version=None)[source]

Creates a new instance of CSVLocalDataSet pointing to a concrete filepath.

Parameters
Raises

ValueError – If ‘filepath’ looks like a remote path.
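The class-level DEFAULT_LOAD_ARGS and DEFAULT_SAVE_ARGS attributes listed for CSVLocalDataSet suggest that the load_args and save_args passed to the constructor are layered over those defaults. The following is a minimal sketch of that pattern, assuming a copy-then-update merge. The helper name merge_args is hypothetical and is not part of pipelinex or kedro.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Defaults copied from the CSVLocalDataSet attributes documented above.
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"index": False}


def merge_args(defaults: Dict[str, Any],
               overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of defaults updated with overrides."""
    merged = deepcopy(defaults)  # never mutate the shared class-level dict
    if overrides is not None:
        merged.update(overrides)
    return merged


# A user option is added next to the default.
save_args = merge_args(DEFAULT_SAVE_ARGS, {"sep": ";"})
# Passing None keeps the defaults unchanged.
load_args = merge_args(DEFAULT_LOAD_ARGS, None)
# A user option with the same key wins over the default.
overridden = merge_args(DEFAULT_SAVE_ARGS, {"index": True})
```

Under this assumption, passing save_args={"index": True} would override the default of not writing the index, while any key the caller leaves out keeps its default value.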

pipelinex.extras.datasets.pandas.efficient_csv_local module

class pipelinex.extras.datasets.pandas.efficient_csv_local.EfficientCSVLocalDataSet(*args, preview_args=None, margin=100.0, verbose=True, **kwargs)[source]

Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet

DEFAULT_LOAD_ARGS: Dict[str, Any] = {'engine': 'c', 'keep_default_na': False, 'na_values': [''], 'skiprows': 0}
DEFAULT_PREVIEW_ARGS: Dict[str, Any] = {'low_memory': False, 'nrows': None}
__init__(*args, preview_args=None, margin=100.0, verbose=True, **kwargs)[source]

Creates a new instance of EfficientCSVLocalDataSet pointing to a concrete filepath.

Parameters
pipelinex.extras.datasets.pandas.efficient_csv_local.dict_string_val_prefix(d, prefix)[source]
pipelinex.extras.datasets.pandas.efficient_csv_local.dict_val_replace_except(d, to_except, new_value)[source]
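The DEFAULT_LOAD_ARGS of EfficientCSVLocalDataSet change how missing values are detected compared with plain pandas. The sketch below passes the same options directly to pandas.read_csv to show their effect. It does not use EfficientCSVLocalDataSet itself, and the sample CSV text and variable names are made up for illustration.

```python
import io

import pandas as pd

# Load defaults copied from EfficientCSVLocalDataSet.DEFAULT_LOAD_ARGS.
load_args = {
    "engine": "c",
    "keep_default_na": False,
    "na_values": [""],
    "skiprows": 0,
}

# In-memory CSV: the literal text "NA" in row 1, an empty field in row 2.
csv_text = "code,count\nNA,1\n,2\n"

# With the defaults above, only the empty field is treated as missing.
with_defaults = pd.read_csv(io.StringIO(csv_text), **load_args)

# With plain pandas defaults, the string "NA" is also treated as missing.
plain_pandas = pd.read_csv(io.StringIO(csv_text))
```

With keep_default_na=False and na_values=[''], a value such as "NA" survives as text, which matters for columns like country or element codes where "NA" is a legitimate value.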

pipelinex.extras.datasets.pandas.histgram module

class pipelinex.extras.datasets.pandas.histgram.HistgramDataSet(filepath, save_args=None, hist_args=None)[source]

Bases: kedro.io.core.AbstractDataSet

__init__(filepath, save_args=None, hist_args=None)[source]

Creates a new instance of HistgramDataSet pointing to a concrete filepath.

pipelinex.extras.datasets.pandas.pandas_cat_matrix module

class pipelinex.extras.datasets.pandas.pandas_cat_matrix.PandasCatMatrixDataSet(*args, describe_args={}, **kwargs)[source]

Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet

PandasDescribeDataSet saves output of df.describe.

__init__(*args, describe_args={}, **kwargs)[source]

Creates a new instance of PandasCatMatrixDataSet pointing to a concrete filepath.

Parameters

pipelinex.extras.datasets.pandas.pandas_describe module

class pipelinex.extras.datasets.pandas.pandas_describe.PandasDescribeDataSet(*args, describe_args={}, **kwargs)[source]

Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet

PandasDescribeDataSet saves the output of df.describe() to a local CSV file.

__init__(*args, describe_args={}, **kwargs)[source]

Creates a new instance of PandasDescribeDataSet pointing to a concrete filepath.

Parameters
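PandasDescribeDataSet is described as saving the output of df.describe. The sketch below reproduces that idea with pandas alone, assuming describe_args is forwarded as keyword arguments to DataFrame.describe. It does not call PandasDescribeDataSet, and the file name and sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5]})

# Assumption: describe_args is forwarded as keyword arguments to df.describe.
describe_args = {"percentiles": [0.5]}
summary = data.describe(**describe_args)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "describe.csv")
    # Keep the index here: it holds the statistic names (count, mean, ...).
    summary.to_csv(path)
    reloaded = pd.read_csv(path, index_col=0)
```

The summary has one row per statistic and one column per numeric input column. Whether the data set itself writes the index when saving is not stated in this reference, so the sketch keeps it explicitly.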