sailor.sap_iot.wrappers
The timeseries module can be used to retrieve timeseries data from the SAP IoT abstract timeseries API.
Here we define some convenience wrappers for timeseries data.
- class sailor.sap_iot.wrappers.TimeseriesDataset(df, indicator_set, equipment_set, nominal_data_start, nominal_data_end, is_normalized=False)[source]
Bases:
object
A wrapper class to make accessing timeseries data from SAP IoT more convenient.
- Parameters
df (pd.DataFrame) –
indicator_set (IndicatorSet) –
equipment_set (EquipmentSet) –
nominal_data_start (pd.Timestamp) –
nominal_data_end (pd.Timestamp) –
is_normalized (bool) –
- aggregate(aggregation_interval, aggregation_functions='mean')[source]
Aggregate the TimeseriesDataset to a fixed interval, returning a new TimeseriesDataset.
This operation will change the unique feature IDs, as the new IDs need to encode the additional information on the aggregation function. Accordingly, there will also be an additional column index level for the aggregation function on the DataFrame returned by sailor.timeseries.wrappers.TimeseriesDataset.as_df() when using speaking_names=True. Note that the resulting timeseries is not equidistant if gaps larger than the aggregation interval are present in the original timeseries.
- Parameters
aggregation_interval (Union[str, pd.Timedelta]) – String specifying the aggregation interval, e.g. ‘1h’ or ‘30min’. Follows the same rules as the freq parameter in a pandas.Grouper object.
aggregation_functions (Union[Iterable[Union[str, Callable]], str, Callable]) – Aggregation function or iterable of aggregation functions to use. Each aggregation function can be a string (e.g. ‘mean’, ‘min’) or a function (e.g. np.max).
- Return type
TimeseriesDataset
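The gap behaviour described for aggregate() can be illustrated with plain pandas. This is only a sketch, assuming the aggregation behaves like a pandas groupby over an equipment key and a pandas.Grouper time bin; it is not sailor's implementation, and the column names equipment_id, timestamp and value are hypothetical.

```python
import pandas as pd

# Hypothetical raw readings for one equipment, with a gap between 00:40 and 03:20.
raw = pd.DataFrame(
    {
        "equipment_id": ["A", "A", "A"],
        "timestamp": pd.to_datetime(
            ["2020-01-01 00:10", "2020-01-01 00:40", "2020-01-01 03:20"]
        ),
        "value": [1.0, 2.0, 5.0],
    }
)

# freq follows the same rules as aggregation_interval, e.g. "1h" or "30min".
bins = pd.Grouper(key="timestamp", freq="1h")

# Passing several aggregation functions adds a column index level
# for the aggregation function.
aggregated = raw.groupby(["equipment_id", bins])[["value"]].agg(["mean", "max"])

# Keep only the bins that actually contain data. The hours inside the gap
# have no rows here, so the remaining timestamps are not equidistant.
populated = aggregated.dropna()
print(populated)
```

If an equidistant result is needed, interpolating to a fixed interval before or after aggregating is one possible approach.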
- as_df(speaking_names=False, include_model=False)[source]
Return the data stored within this TimeseriesDataset object as a pandas dataframe.
By default the data is returned with opaque column headers. If speaking_names is set to true, the data is converted such that equipment_id and model_id are replaced by human-readable names, and the opaque column headers are replaced by a hierarchical index of template_id, indicator_group_name, indicator_name and aggregation_function.
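A hierarchical column index of this kind can be sliced by level name in pandas. The sketch below builds a small stand-in frame by hand; the template, group and indicator values are hypothetical, and the frame only mimics the four level names listed above rather than reproducing what as_df() returns.

```python
import pandas as pd

# Stand-in for the feature columns of a frame with speaking names.
columns = pd.MultiIndex.from_tuples(
    [
        ("template_1", "group_a", "temperature", "mean"),
        ("template_1", "group_a", "temperature", "max"),
        ("template_1", "group_a", "pressure", "mean"),
    ],
    names=[
        "template_id",
        "indicator_group_name",
        "indicator_name",
        "aggregation_function",
    ],
)
frame = pd.DataFrame([[20.0, 25.0, 1.2], [21.0, 26.0, 1.3]], columns=columns)

# Select every column that was aggregated with 'mean'.
means = frame.xs("mean", axis=1, level="aggregation_function")

# Select every aggregation of a single indicator.
temperature = frame.xs("temperature", axis=1, level="indicator_name")
```

Selecting by level name rather than by position keeps such code readable when the number of index levels changes.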
- filter(start=None, end=None, equipment_set=None, indicator_set=None)[source]
Return a new TimeseriesDataset extracted from the original data using the filter parameters.
Only indicator data specified in filters are returned.
- Parameters
start (Union[str, pd.Timestamp, datetime]) – Optional start time from which timeseries data are returned.
end (Union[str, pd.Timestamp, datetime]) – Optional end time until which timeseries data are returned.
equipment_set (EquipmentSet) – Optional EquipmentSet to filter timeseries data. Takes precedence over equipment_ids.
indicator_set (Union[IndicatorSet, AggregatedIndicatorSet]) – Optional IndicatorSet to filter dataset columns.
- Return type
TimeseriesDataset
Example
Filter the indicator data ‘My_indicator_data’ down to the equipment contained in an EquipmentSet ‘my_equipment_set’:
My_indicator_data.filter(equipment_set=my_equipment_set)
- get_feature_columns(speaking_names=False)[source]
Get the names of all feature columns.
- Parameters
speaking_names – If False, returns the feature columns of the data set; if True, returns the corresponding names of the feature columns.
Example
Get the template ID, indicator group name and indicator name of the columns containing indicator values in the data set ‘my_indicator_data’:
my_indicator_data.get_feature_columns(speaking_names=True)
- get_index_columns(speaking_names=False, include_model=False)[source]
Return the names of all index columns (key columns and time column).
- Return type
list
- get_key_columns(speaking_names=False, include_model=False)[source]
Return those columns of the data that identify the asset.
Currently we only support asset type ‘Equipment’ so this will always return columns based on the equipment. In the future other types (like System) will be supported here.
- Parameters
speaking_names – If False, returns the key columns; if True, returns the corresponding names of the key columns.
Example
Get key columns of the indicator data set ‘my_indicator_data’:
my_indicator_data.get_key_columns()
- interpolate(interval, method='pad', **kwargs)[source]
Interpolate the TimeseriesDataset to a fixed interval, returning a new TimeseriesDataset.
Additional arguments for the interpolation function can be passed and are forwarded to the pandas interpolate function. The resulting TimeseriesDataset will always be equidistant with timestamps between self.nominal_data_start and self.nominal_data_end. However, values at these timestamps may be NA depending on the interpolation parameters. By default values will be forward-filled, with no limit to the number of interpolated points between two given values, and no extrapolation before the first known point. The following keyword arguments can be used to achieve some common behaviour:
method='slinear' will use linear interpolation between any two known points.
method='index' will use a pandas interpolation method instead of the scipy-based method, which automatically forward-fills the last known value to the end of the time-series.
fill_value='extrapolate' will extrapolate beyond the last known value (but not backwards before the first known value); only applicable to scipy-based interpolation methods.
limit=N will limit the number of interpolated points between known points to N.
Further details on this behaviour can be found in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html
- Parameters
interval (Union[str, Timedelta]) –
- Return type
TimeseriesDataset
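The difference between forward filling and method='index' can be seen on a small pandas series. This is a sketch of the underlying pandas behaviour only, not of sailor's implementation; the series, the 10-minute grid and the use of ffill() as a stand-in for the default forward fill are assumptions made for illustration.

```python
import pandas as pd

# Two known points, 30 minutes apart.
raw = pd.Series(
    [0.0, 3.0],
    index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:30"]),
)

# Equidistant target grid that extends past the last known point.
grid = pd.date_range("2020-01-01 00:00", "2020-01-01 00:50", freq="10min")

# Put the known points and the grid timestamps on one common index.
combined = raw.reindex(raw.index.union(grid))

# Forward fill: every grid point takes the last known value.
forward_filled = combined.ffill().reindex(grid)

# method="index": linear in time between known points; the last known
# value is carried forward to the end of the series.
index_based = combined.interpolate(method="index").reindex(grid)

print(pd.DataFrame({"ffill": forward_filled, "index": index_based}))
```

Both results share the same equidistant timestamps; only the values between and after the known points differ.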
- normalize(fitted_scaler=None, scaler=StandardScaler())[source]
Normalize the data using the given scaler, or using a previously fitted scaler if one is provided.
- Parameters
fitted_scaler – Optional fitted scaler, to be used to normalize self._df
scaler – Type of scaler to use for normalization. The default setting implies x -> (x - m) / s, where m is the mean and s is the standard deviation. Both are computed along the columns.
- Returns
new_wrapper – TimeseriesDataset with self._df updated to be the normalized dataframe.
fitted_scaler – Fitted scaler to be used to normalize the data.
- Return type
tuple[TimeseriesDataset, Any]
Example
Get normalized values for indicators in the indicator data set ‘My_indicator_data’:
My_indicator_data.normalize()[0]
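The default transformation x -> (x - m) / s, and the reason a fitted scaler is returned, can be sketched with plain pandas. The frames and column names below are hypothetical, and the column statistics stand in for a fitted scaler; the population standard deviation (ddof=0) is used because that is what scikit-learn's StandardScaler computes.

```python
import pandas as pd

# Hypothetical training data and later data with the same columns.
train = pd.DataFrame(
    {"temperature": [10.0, 20.0, 30.0], "pressure": [1.0, 1.5, 2.0]}
)
new = pd.DataFrame({"temperature": [40.0], "pressure": [1.5]})

# "Fitting": column-wise mean and population standard deviation.
mean = train.mean()
std = train.std(ddof=0)

# Normalizing the training data gives zero mean and unit variance per column.
normalized_train = (train - mean) / std

# Reusing the fitted statistics keeps new data on the same scale,
# which is the role of the fitted_scaler argument.
normalized_new = (new - mean) / std
```

Fitting a fresh scaler on the new data instead would make its values incomparable with the training data.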
- plot(start=None, end=None, indicator_set=None, equipment_set=None)[source]
Plot the timeseries data stored within this wrapper.
The plot will create different panels for each indicator_group_name and template in the data, as well as each indicator. Data from different equipment will be represented by different colors. The plotnine object returned by this method will be rendered in Jupyter notebooks, but can also be further modified by the caller.
- Parameters
start – Optional start time from which the timeseries data is plotted.
end – Optional end time until which the timeseries data is plotted.
indicator_set – Optional indicators to plot.
equipment_set – Optional equipment whose indicator data is plotted.
- Returns
Line charts of timeseries data.
- Return type
plot
Example
Plot all Indicators for a period from 2020-07-02 to 2020-09-01 in the data set ‘my_indicator_data’:
my_indicator_data.plot('2020-07-02','2020-09-01')
- property equipment_set
Return all equipment present in the TimeseriesDataset.
- property indicator_set
Return all Indicators present in the TimeseriesDataset.