pipelinex.extras.datasets.pandas package
Submodules
pipelinex.extras.datasets.pandas.csv_local module
CSVLocalDataSet loads and saves data to a local csv file. The underlying functionality is provided by pandas, so it supports all pandas options for loading and saving csv files.
class pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet(filepath, load_args=None, save_args=None, version=None)
Bases: kedro.io.core.AbstractVersionedDataSet
Example:
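    # Import the class from the module documented on this page.
    from pipelinex.extras.datasets.pandas.csv_local import CSVLocalDataSet
    import pandas as pd

    data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5], 'col3': [5, 6]})

    # Write the frame to test.csv without the index column, then read it back.
    data_set = CSVLocalDataSet(filepath="test.csv", load_args=None, save_args={"index": False})
    data_set.save(data)
    reloaded = data_set.load()

    assert data.equals(reloaded)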
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {'index': False}
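The two class-level dictionaries serve as defaults on top of which the caller's load_args and save_args are layered. A minimal sketch of such a merge follows, assuming the defaults are copied and then updated with the caller's options. The helper name merge_args is hypothetical and not part of pipelinex.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Mirrors the DEFAULT_SAVE_ARGS class attribute documented above.
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"index": False}


def merge_args(defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of `defaults` updated with `overrides` (hypothetical helper)."""
    merged = deepcopy(defaults)  # copy first so the shared class-level dict is never mutated
    if overrides is not None:
        merged.update(overrides)  # caller-supplied options win over the defaults
    return merged


# A caller that only sets the separator still gets index=False from the defaults.
save_args = merge_args(DEFAULT_SAVE_ARGS, {"sep": ";"})
print(save_args)
```

Copying before updating keeps one instance's options from leaking into the defaults shared by every other instance.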
__init__(filepath, load_args=None, save_args=None, version=None)
Creates a new instance of CSVLocalDataSet pointing to a concrete filepath.
- Parameters
    filepath (str) – Path to a csv file.
    load_args (Optional[Dict[str, Any]]) – Pandas options for loading csv files. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html for all available arguments. All defaults are preserved.
    save_args (Optional[Dict[str, Any]]) – Pandas options for saving csv files. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html for all available arguments. All defaults are preserved except "index", which is set to False.
    version (Optional[Version]) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
- Raises
    ValueError – If 'filepath' looks like a remote path.
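The ValueError described above guards against passing a remote location to a dataset meant for local files. The sketch below shows one way such a check could work, assuming that "looks like a remote path" means the string starts with a URL scheme followed by "://". The names looks_like_remote_path and ensure_local are hypothetical, and the actual check used by the library may differ.

```python
import re

# Assumption: a path is "remote" when it begins with "<scheme>://", e.g. "s3://".
# Windows drive prefixes such as "C:\" do not match because they lack "//".
_SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_remote_path(filepath: str) -> bool:
    """Return True if `filepath` starts with a URL scheme (hypothetical helper)."""
    return bool(_SCHEME_PREFIX.match(filepath))


def ensure_local(filepath: str) -> str:
    """Return `filepath` unchanged, or raise ValueError if it looks remote."""
    if looks_like_remote_path(filepath):
        raise ValueError(f"{filepath!r} looks like a remote path; a local path is required.")
    return filepath


local_path = ensure_local("data/01_raw/test.csv")  # passes through unchanged
```

Note that this sketch also rejects "file://" URLs, which is a simplification rather than documented behavior.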
pipelinex.extras.datasets.pandas.efficient_csv_local module
class pipelinex.extras.datasets.pandas.efficient_csv_local.EfficientCSVLocalDataSet(*args, preview_args=None, margin=100.0, verbose=True, **kwargs)
Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet
DEFAULT_LOAD_ARGS: Dict[str, Any] = {'engine': 'c', 'keep_default_na': False, 'na_values': [''], 'skiprows': 0}
DEFAULT_PREVIEW_ARGS: Dict[str, Any] = {'low_memory': False, 'nrows': None}
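The keys in DEFAULT_LOAD_ARGS are ordinary pandas.read_csv options. The sketch below passes them directly to pandas to show their effect. It does not use EfficientCSVLocalDataSet itself, and the sample data is made up for illustration. With keep_default_na=False and na_values=[''], only empty fields are parsed as missing values, while strings such as NA are kept as text.

```python
import io

import pandas as pd

# Same values as the DEFAULT_LOAD_ARGS attribute documented above.
load_args = {"engine": "c", "keep_default_na": False, "na_values": [""], "skiprows": 0}

# Illustrative data: the first row holds the literal string "NA", the second an empty field.
csv_text = "name,score\nNA,1\n,2\n"

# With the options above, "NA" survives as a string and only the empty field is missing.
df = pd.read_csv(io.StringIO(csv_text), **load_args)

# With pandas defaults, both "NA" and the empty field are parsed as missing values.
default_df = pd.read_csv(io.StringIO(csv_text))

print(df["name"].tolist())
print(default_df["name"].isna().tolist())
```

These settings matter for columns in which values like NA, null, or nan are legitimate data rather than placeholders.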
__init__(*args, preview_args=None, margin=100.0, verbose=True, **kwargs)
Creates a new instance of EfficientCSVLocalDataSet pointing to a concrete filepath.
- Parameters
    args – Positional arguments for CSVLocalDataSet.
    preview_args (Optional[Dict[str, Any]]) – Options for the preview step. Judging by DEFAULT_PREVIEW_ARGS (low_memory, nrows), these are pandas.read_csv options; see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html for details.
    kwargs – Keyword arguments for CSVLocalDataSet.
pipelinex.extras.datasets.pandas.histgram module
pipelinex.extras.datasets.pandas.pandas_cat_matrix module
class pipelinex.extras.datasets.pandas.pandas_cat_matrix.PandasCatMatrixDataSet(*args, describe_args={}, **kwargs)
Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet
PandasDescribeDataSet saves output of df.describe.
__init__(*args, describe_args={}, **kwargs)
Creates a new instance of PandasCatMatrixDataSet pointing to a concrete filepath.
- Parameters
    args – Positional arguments for CSVLocalDataSet.
    describe_args (Dict[str, Any]) – Arguments passed on to df.describe. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html for details.
    kwargs – Keyword arguments for CSVLocalDataSet.
pipelinex.extras.datasets.pandas.pandas_describe module
class pipelinex.extras.datasets.pandas.pandas_describe.PandasDescribeDataSet(*args, describe_args={}, **kwargs)
Bases: pipelinex.extras.datasets.pandas.csv_local.CSVLocalDataSet
PandasDescribeDataSet saves output of df.describe.
__init__(*args, describe_args={}, **kwargs)
Creates a new instance of PandasDescribeDataSet pointing to a concrete filepath.
- Parameters
    args – Positional arguments for CSVLocalDataSet.
    describe_args (Dict[str, Any]) – Arguments passed on to df.describe. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html for details.
    kwargs – Keyword arguments for CSVLocalDataSet.
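To make "saves output of df.describe" concrete, the sketch below does the equivalent with plain pandas. It assumes the dataset calls df.describe(**describe_args) and writes the resulting summary as csv. An in-memory buffer stands in for the file, and the sample data and describe_args values are made up for illustration.

```python
import io

import pandas as pd

# Illustrative input frame.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [10.0, 20.0, 30.0, 40.0]})

# Any keyword accepted by DataFrame.describe can go here; this one limits the
# reported percentiles to the median.
describe_args = {"percentiles": [0.5]}

# Assumption: this is the summary the dataset would persist on save.
summary = df.describe(**describe_args)

# Write the summary as csv; the statistic names form the first (index) column.
buffer = io.StringIO()
summary.to_csv(buffer)
print(buffer.getvalue())
```

The saved file therefore holds one row per statistic and one column per numeric input column, rather than the original data.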