pipelinex.extras.datasets.requests package
Submodules
pipelinex.extras.datasets.requests.api_dataset module
APIDataSet loads data from HTTP(S) APIs and returns it as either a string or a JSON dict.
It uses the Python requests library: https://requests.readthedocs.io/en/master/
class pipelinex.extras.datasets.requests.api_dataset.APIDataSet(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})
Bases: pipelinex.extras.datasets.core.AbstractDataSet
APIDataSet loads data from HTTP(S) APIs. It uses the Python requests library: https://requests.readthedocs.io/en/master/

Example:

    from pipelinex.extras.datasets.requests.api_dataset import APIDataSet

    data_set = APIDataSet(
        url="https://quickstats.nass.usda.gov",
        params={
            "key": "SOME_TOKEN",
            "format": "JSON",
            "commodity_desc": "CORN",
            "statisticcat_des": "YIELD",
            "agg_level_desc": "STATE",
            "year": 2000,
        },
    )
    data = data_set.load()
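To illustrate what the params argument amounts to at the HTTP level, the following sketch uses the requests library directly to prepare (but not send) a GET request with the same URL and parameters as the example. It assumes that APIDataSet hands url, method and params to requests unchanged; that is an assumption about the implementation, not something this page states.

```python
import requests

# Same URL and query parameters as in the example.
url = "https://quickstats.nass.usda.gov"
params = {
    "key": "SOME_TOKEN",
    "format": "JSON",
    "commodity_desc": "CORN",
    "statisticcat_des": "YIELD",
    "agg_level_desc": "STATE",
    "year": 2000,
}

# Build the request without sending it, so no network access is needed.
# Assumption: APIDataSet forwards these arguments to requests as-is.
prepared = requests.Request(method="GET", url=url, params=params).prepare()

# The params dict has been URL-encoded into the query string of the final URL.
print(prepared.method)
print(prepared.url)
```

Inspecting a prepared request like this can be a convenient way to check the final URL before pointing a data set at a rate-limited API.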
__init__(url=None, method='GET', data=None, params=None, headers=None, auth=None, timeout=60, attribute='', skip_errors=False, transforms=[], session_config={}, pool_config={'http://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}, 'https://': {'max_retries': 0, 'pool_block': False, 'pool_connections': 10, 'pool_maxsize': 10}})
Creates a new instance of APIDataSet to fetch data from an API endpoint.

Parameters:
url (Union[str, List[str], Dict[str, str], None]) – The API URL endpoint.
method (str) – The method of the request: GET, POST, PUT, DELETE, HEAD, etc.
data (Optional[Any]) – The request payload, used for POST, PUT, etc. requests. https://requests.readthedocs.io/en/master/user/quickstart/#more-complicated-post-requests
params (Optional[Dict[str, Any]]) – The URL parameters of the API. https://requests.readthedocs.io/en/master/user/quickstart/#passing-parameters-in-urls
headers (Optional[Dict[str, Any]]) – The HTTP headers. https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers
auth (Union[Tuple[str], AuthBase, None]) – Anything requests accepts. Normally it's either ('login', 'password'), or an AuthBase or HTTPBasicAuth instance for more complex cases.
timeout (int) – The wait time in seconds for a response. Defaults to 60 (1 minute). https://requests.readthedocs.io/en/master/user/quickstart/#timeouts
attribute (str) – The attribute of the response to return. Normally it's either text, which returns plain text; json, which returns JSON as a Python dict; content, which returns the raw content; or '' (empty string), which returns the response object itself. Defaults to '' (empty string).
skip_errors (bool) – If True, exceptions do not interrupt loading; the _load method returns them in place of the expected responses. Defaults to False.
transforms (List[callable]) – List of callables to transform the output.
session_config (Dict[str, Any]) – Dict of arguments fed to the session.
pool_config (Dict[str, Dict[str, Any]]) – Dict mapping each mount prefix to a dict of requests.adapters.HTTPAdapter parameters. https://requests.readthedocs.io/en/master/user/advanced/#transport-adapters https://urllib3.readthedocs.io/en/latest/advanced-usage.html
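
The pool_config, attribute and transforms parameters can be pictured with plain requests calls. The sketch below is not the pipelinex implementation: build_session and extract are hypothetical helpers written for illustration, and the stand-in response object replaces a real HTTP response so that nothing is sent over the network. It assumes that each pool_config entry becomes one HTTPAdapter mounted on the session, and that transforms are applied in order to the selected attribute.

```python
from types import SimpleNamespace

import requests
from requests.adapters import HTTPAdapter

# Default pool_config from the signature: one set of adapter parameters per prefix.
pool_config = {
    "http://": {"max_retries": 0, "pool_block": False,
                "pool_connections": 10, "pool_maxsize": 10},
    "https://": {"max_retries": 0, "pool_block": False,
                 "pool_connections": 10, "pool_maxsize": 10},
}


def build_session(pool_config):
    """Hypothetical helper: mount one HTTPAdapter per URL prefix on a Session."""
    session = requests.Session()
    for prefix, adapter_params in pool_config.items():
        session.mount(prefix, HTTPAdapter(**adapter_params))
    return session


def extract(response, attribute="", transforms=()):
    """Hypothetical helper: select an attribute of the response, then transform it."""
    if attribute:
        value = getattr(response, attribute)
        # On a requests response, json is a method while text and content are properties.
        if callable(value):
            value = value()
    else:
        # An empty attribute means the response object itself is returned.
        value = response
    # Assumption: transforms are applied one after another, in list order.
    for transform in transforms:
        value = transform(value)
    return value


session = build_session(pool_config)
adapter = session.get_adapter("https://example.com")  # the adapter mounted for https://

# Stand-in for a real response, so the sketch runs without network access.
fake_response = SimpleNamespace(text='{"a": 1}', json=lambda: {"a": 1})
result = extract(fake_response, attribute="json", transforms=[lambda d: d["a"]])
```

Mounting a separate adapter per prefix is what allows different retry and pool settings for http:// and https:// URLs.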