pipelinex.extras.datasets.requests package

Submodules

pipelinex.extras.datasets.requests.api_dataset module

APIDataSet loads data from HTTP(S) APIs and returns it either as a string or as a JSON dict. It uses the Python requests library: https://requests.readthedocs.io/en/master/

class pipelinex.extras.datasets.requests.api_dataset.APIDataSet(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})

Bases: pipelinex.extras.datasets.core.AbstractDataSet

APIDataSet loads data from HTTP(S) APIs. It uses the Python requests library: https://requests.readthedocs.io/en/master/

Example:

from pipelinex.extras.datasets.requests.api_dataset import APIDataSet


data_set = APIDataSet(
    url="https://quickstats.nass.usda.gov",
    params={
        "key": "SOME_TOKEN",
        "format": "JSON",
        "commodity_desc": "CORN",
        "statisticcat_des": "YIELD",
        "agg_level_desc": "STATE",
        "year": 2000
    }
)
data = data_set.load()
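
The params entry in the parameter list points to the requests section on passing parameters in URLs, so the params dict in the example above ends up in the query string of the request URL. The block below is a minimal sketch of that encoding using only the standard library's urllib.parse.urlencode. It illustrates the general idea for simple string and integer values and is not the code path that APIDataSet or requests actually runs.

```python
from urllib.parse import urlencode

# Same query parameters as in the example above.
params = {
    "key": "SOME_TOKEN",
    "format": "JSON",
    "commodity_desc": "CORN",
    "statisticcat_des": "YIELD",
    "agg_level_desc": "STATE",
    "year": 2000,
}

# urlencode joins key=value pairs with "&" and percent-encodes where needed.
query_string = urlencode(params)
print(query_string)
```

The resulting string is what would follow the "?" in the request URL.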

__init__(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})

Creates a new instance of APIDataSet to fetch data from an API endpoint.

Parameters:

url (Union[str, List[str], Dict[str, str], None]) – The API URL endpoint.
method (str) – The HTTP method of the request: GET, POST, PUT, DELETE, HEAD, etc.
data (Optional[Any]) – The request payload, used for POST, PUT, and similar requests. https://requests.readthedocs.io/en/master/user/quickstart/#more-complicated-post-requests
params (Optional[Dict[str, Any]]) – The URL query parameters of the API. https://requests.readthedocs.io/en/master/user/quickstart/#passing-parameters-in-urls
headers (Optional[Dict[str, Any]]) – The HTTP headers. https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers
auth (Union[Tuple[str], AuthBase, None]) – Anything requests accepts. Normally either a ('login', 'password') tuple or, for more complex cases, an AuthBase instance such as HTTPBasicAuth.
timeout (int) – The wait time in seconds for a response. Defaults to 60 (1 minute). https://requests.readthedocs.io/en/master/user/quickstart/#timeouts
attribute (str) – The attribute of the response to return. Normally one of 'text', which returns plain text; 'json', which returns the JSON parsed into a Python dict; 'content', which returns the raw content; or '' (empty string), which returns the response object itself. Defaults to '' (empty string).
skip_errors (bool) – If True, exceptions do not interrupt loading; the _load method returns them in place of the expected responses. Defaults to False.
transforms (List[callable]) – List of callables to transform the output.
session_config (Dict[str, Any]) – Dict of arguments fed to the session.
pool_config (Dict[str, Dict[str, Any]]) – Dict mapping each mount prefix (e.g. 'http://') to a dict of requests.adapters.HTTPAdapter keyword arguments. https://requests.readthedocs.io/en/master/user/advanced/#transport-adapters https://urllib3.readthedocs.io/en/latest/advanced-usage.html
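
The attribute, transforms, and skip_errors parameters described above can be pictured with a small standalone sketch. It is an illustration of the documented semantics only, not pipelinex's implementation: FakeResponse, sketch_load, and failing_fetch are hypothetical names, no network access is involved, and the order in which real errors and transforms interact is an assumption.

```python
from json import loads
from typing import Any, Callable, Sequence


class FakeResponse:
    """Hypothetical stand-in for a requests.Response (no network access)."""

    text = '{"yield": 137}'
    content = b'{"yield": 137}'

    def json(self) -> Any:
        return loads(self.text)


def sketch_load(
    fetch: Callable[[], Any],
    attribute: str = "",
    skip_errors: bool = False,
    transforms: Sequence[Callable[[Any], Any]] = (),
) -> Any:
    """Illustrative only: fetch, pick an attribute, then apply transforms."""
    try:
        output = fetch()
        if attribute:
            value = getattr(output, attribute)
            # Methods such as json() are called; plain attributes are used as-is.
            output = value() if callable(value) else value
        for transform in transforms:
            # Each callable receives the previous callable's result.
            output = transform(output)
    except Exception as exc:
        if not skip_errors:
            raise
        # Assumption: with skip_errors=True the exception is returned, not raised.
        output = exc
    return output


def failing_fetch() -> Any:
    """Hypothetical fetcher that always fails, to exercise skip_errors."""
    raise ConnectionError("simulated network failure")


as_response = sketch_load(FakeResponse)  # attribute='' keeps the response object
as_text = sketch_load(FakeResponse, attribute="text")
as_json = sketch_load(FakeResponse, attribute="json")
doubled = sketch_load(
    FakeResponse,
    attribute="json",
    transforms=[lambda d: d["yield"], lambda y: y * 2],
)
skipped = sketch_load(failing_fetch, skip_errors=True)
```

In this sketch an empty attribute leaves the response object untouched, which matches the documented default, and the transforms run in list order on whatever the attribute step produced.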