pipelinex.extras.datasets.requests package

Submodules

pipelinex.extras.datasets.requests.api_dataset module

APIDataSet loads data from HTTP(S) APIs and returns it either as a string or as a JSON dict. It uses the Python requests library: https://requests.readthedocs.io/en/master/

class pipelinex.extras.datasets.requests.api_dataset.APIDataSet(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})[source]

Bases: kedro.io.core.AbstractDataSet

APIDataSet loads data from HTTP(S) APIs. It uses the Python requests library: https://requests.readthedocs.io/en/master/

Example:

# Import the class documented on this page.
from pipelinex.extras.datasets.requests.api_dataset import APIDataSet


data_set = APIDataSet(
    url="https://quickstats.nass.usda.gov",
    params={
        "key": "SOME_TOKEN",  # placeholder API token
        "format": "JSON",
        "commodity_desc": "CORN",
        "statisticcat_des": "YIELD",
        "agg_level_desc": "STATE",
        "year": 2000,
    },
)
# Send the request and return the loaded data.
data = data_set.load()
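
To inspect the request described by the url and params above without touching the network, the following sketch prepares an equivalent request with the requests library directly. It assumes that APIDataSet forwards url, method and params to requests largely unchanged, which this page does not state; build_request is a hypothetical helper name, not part of pipelinex.

```python
from urllib.parse import parse_qs, urlsplit

import requests


def build_request(url, params, method="GET"):
    """Prepare (but do not send) a request so its final URL can be inspected.

    Hypothetical helper for illustration; not part of pipelinex or kedro.
    """
    return requests.Request(method=method, url=url, params=params).prepare()


prepared = build_request(
    "https://quickstats.nass.usda.gov",
    {
        "key": "SOME_TOKEN",  # placeholder API token
        "format": "JSON",
        "commodity_desc": "CORN",
        "statisticcat_des": "YIELD",
        "agg_level_desc": "STATE",
        "year": 2000,
    },
)

# Parse the encoded query string back into a dict of lists for inspection.
query = parse_qs(urlsplit(prepared.url).query)
print(prepared.method, prepared.url)
```

Preparing the request only builds and encodes it, so the sketch is safe to run offline and useful for checking query parameters before calling load().
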
__init__(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})[source]

Creates a new instance of APIDataSet to fetch data from an API endpoint.
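
The keys of the default pool_config (pool_connections, pool_maxsize, max_retries, pool_block) match the keyword arguments of requests.adapters.HTTPAdapter, and its top-level keys look like URL prefixes for Session.mount. The sketch below shows how such a mapping could be applied to a requests session. This is an assumption about how the configuration is used, not a copy of the pipelinex implementation; build_session is a hypothetical name.

```python
import requests
from requests.adapters import HTTPAdapter

# Same shape as the default pool_config shown in the signature above.
pool_config = {
    "http://": {
        "max_retries": 0,
        "pool_block": False,
        "pool_connections": 10,
        "pool_maxsize": 10,
    },
    "https://": {
        "max_retries": 0,
        "pool_block": False,
        "pool_connections": 10,
        "pool_maxsize": 10,
    },
}


def build_session(pool_config):
    """Create a Session with one HTTPAdapter mounted per URL prefix.

    Hypothetical helper for illustration; not part of pipelinex.
    """
    session = requests.Session()
    for prefix, adapter_kwargs in pool_config.items():
        session.mount(prefix, HTTPAdapter(**adapter_kwargs))
    return session


session = build_session(pool_config)
# Look up which adapter would serve an HTTPS URL; no request is sent.
adapter = session.get_adapter("https://quickstats.nass.usda.gov")
```

Under this reading, raising pool_maxsize or max_retries for one prefix would change connection pooling and retry behaviour only for URLs that start with that prefix.
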

Parameters