Data
Datasets
A dataset is a collection of timeseries that share a common schema and owner and are generally subject to consistent entitlements. The easiest way to learn how to interact with datasets is to use the WEATHER example, which contains weather information for several major US cities from the National Weather Service. This dataset is available to all applications and does not need to be requested explicitly.
You can view and request available datasets from the Marquee Data Catalog.
Dataset Objects
All our datasets are either time-based (intraday) or date-based (daily). Datasets return fields that are categorized as either measures or dimensions:
- Measures are facts that are usually quantities and that can be aggregated, such as closing prices and volumes
- Dimensions describe or provide context to measures, like tickers and exchanges
If the data is a curve or surface, then the axes will usually be dimensions.
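To make the distinction concrete, here is a minimal sketch in plain Python that groups a measure by a dimension. It uses the two Boston rows from the WEATHER sample output on this page and treats city as the dimension and the temperature fields as measures; the aggregate helper is hypothetical and is not part of gs_quant.

```python
from collections import defaultdict

# Two rows copied from the WEATHER sample output on this page.
rows = [
    {"city": "Boston", "date": "2016-01-01", "maxTemperature": 41.0, "minTemperature": 33.0},
    {"city": "Boston", "date": "2016-01-02", "maxTemperature": 40.0, "minTemperature": 31.0},
]


def aggregate(rows, dimension, measure, func):
    """Group rows by a dimension and reduce one measure per group (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: func(values) for key, values in groups.items()}


# Measures can be aggregated; the dimension only says which group a value belongs to.
warmest = aggregate(rows, "city", "maxTemperature", max)
coldest = aggregate(rows, "city", "minTemperature", min)
print(warmest)  # {'Boston': 41.0}
print(coldest)  # {'Boston': 31.0}
```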
Date-based Datasets
Date-based datasets have date listed as a field in their data description. The following parameters are valid for queries to date-based datasets:
- startDate (optional): defaults to the end date minus a dataset-specific interval, which is currently 30 days
- endDate (optional): defaults to the current date
- dates (optional): used to return data from a specific set of dates
Time-based Datasets
Time-based datasets have time and updateTime listed as fields in their data description page. The following parameters are valid for queries to time-based datasets:
- startTime (optional): defaults to the end time minus a dataset-specific interval, which is currently 24 hours for all real-time datasets
- endTime (optional): defaults to the current datetime
- times (optional): used to return data from a specific set of datetimes
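The default windows described for date-based and time-based queries can be sketched locally as follows. This is only an illustration of the documented defaults (30 days and 24 hours), assuming the end of the range is resolved first and the start is derived from it; resolve_window and the two constants are hypothetical names, and the actual defaults are applied by the service.

```python
from datetime import date, datetime, timedelta, timezone

# Look-back windows as stated in the text; the constant names are hypothetical.
DEFAULT_DATE_WINDOW = timedelta(days=30)
DEFAULT_TIME_WINDOW = timedelta(hours=24)


def resolve_window(start=None, end=None, *, now, default_window):
    """Return (start, end) with any missing bound filled in from the defaults."""
    if end is None:
        end = now  # end defaults to the current date or datetime
    if start is None:
        start = end - default_window  # start defaults to end minus the window
    return start, end


# Date-based query with no bounds supplied.
date_range = resolve_window(now=date(2016, 1, 31), default_window=DEFAULT_DATE_WINDOW)

# Time-based query with no bounds supplied.
current = datetime(2016, 1, 31, 12, 0, tzinfo=timezone.utc)
time_range = resolve_window(now=current, default_window=DEFAULT_TIME_WINDOW)

print(date_range)
print(time_range)
```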
Using Intervals
Time-based datasets can have a very large number of observations over a given window (thousands per second). To make this data manageable, our APIs can down-sample it on our servers over given intervals. The intervals parameter returns data that is evenly distributed across the specified number of intervals within the requested time or date range. Example:
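from datetime import date
from gs_quant.data import Dataset

# Assumes an authenticated GsSession has already been set up.
weather_ds = Dataset('WEATHER')
# Request January 2016 for Boston, down-sampled to 3 evenly spaced intervals.
data_frame = weather_ds.get_data(date(2016, 1, 1), date(2016, 1, 31), city=["Boston"], intervals=3)
print(data_frame)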
Output:
     city        date  dewPoint  ...  snowfall                updateTime  windSpeed
0  Boston  2016-01-11      12.0  ...       0.0  2017-03-06T16:49:36.472Z       19.0
1  Boston  2016-01-21       7.0  ...       0.0  2017-03-06T16:49:36.473Z       13.6
2  Boston  2016-01-31      29.0  ...       0.0  2017-03-06T16:49:36.476Z       11.6

[3 rows x 10 columns]
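For intuition, the sketch below reproduces the dates in the output above on the client side. It assumes the range is cut into equal-width buckets and the last observation in each bucket is kept; that rule is a guess that happens to match this sample, not a description of the server's actual algorithm, and downsample is a hypothetical helper.

```python
import math
from datetime import date, timedelta


def downsample(observations, start, end, intervals):
    """Keep the last observation in each of `intervals` equal-width buckets.

    Buckets are open on the left and closed on the right, so the range end
    always falls in the final bucket. This is an assumed rule for illustration.
    """
    width = (end - start) / intervals  # timedelta / int -> timedelta
    picked = {}
    for when, value in sorted(observations, key=lambda obs: obs[0]):
        if not (start <= when <= end):
            continue
        # Bucket k covers (start + k*width, start + (k+1)*width].
        index = max(math.ceil((when - start) / width) - 1, 0)
        picked[index] = (when, value)  # later observations overwrite earlier ones
    return [picked[i] for i in sorted(picked)]


# One placeholder observation per day for January 2016 (values are just day offsets).
daily = [(date(2016, 1, 1) + timedelta(days=i), float(i)) for i in range(31)]
sampled = downsample(daily, date(2016, 1, 1), date(2016, 1, 31), intervals=3)
print([when.isoformat() for when, _ in sampled])
# ['2016-01-11', '2016-01-21', '2016-01-31']
```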
Note
API responses are limited to approximately 100MB. If you receive a 400 Bad Request exception with the message "Number of rows returned... are more than maximum allowed", split your query into multiple smaller queries.
Consider using smaller date or time ranges (adjust startTime and endTime, or startDate and endDate, as needed) or querying for fewer entities (e.g. asset ids, reports) in each request.
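One way to split a large date range is sketched below. The date_chunks helper is hypothetical and not part of gs_quant, and the right chunk size depends on the dataset; each pair it produces could be passed to a separate get_data call.

```python
from datetime import date, timedelta


def date_chunks(start, end, days_per_chunk):
    """Yield inclusive, non-overlapping (chunk_start, chunk_end) pairs covering start..end."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    one_day = timedelta(days=1)
    chunk_start = start
    while chunk_start <= end:
        # Each chunk spans days_per_chunk days, except possibly the last one.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk) - one_day, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + one_day


chunks = list(date_chunks(date(2016, 1, 1), date(2016, 1, 31), 10))
for chunk_start, chunk_end in chunks:
    # A real query would go here, for example:
    # weather_ds.get_data(chunk_start, chunk_end, city=["Boston"])
    print(chunk_start, chunk_end)
```

The per-chunk results can then be combined, for example with pandas.concat, once all requests have completed.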
Field Selection
To ensure the response contains only the fields you are interested in, use the fields parameter. In the weather dataset, say that you are only interested in maxTemperature and minTemperature. To return only these two fields, pass them in as the fields argument. Example:
from datetime import date
from gs_quant.data import Dataset

# Assumes an authenticated GsSession has already been set up.
weather_ds = Dataset('WEATHER')
# Restrict the response to the two temperature fields.
data_frame = weather_ds.get_data(date(2016, 1, 1), date(2016, 1, 2), city=["Boston"], fields=['maxTemperature', 'minTemperature'])
print(data_frame)
Output:
     city        date  maxTemperature  minTemperature
0  Boston  2016-01-01            41.0            33.0
1  Boston  2016-01-02            40.0            31.0
Related Content
- Weather Dataset
- Dataset Catalog
- Pandas