sailor.sap_iot.wrappers

The timeseries module can be used to retrieve timeseries data from the SAP IoT abstract timeseries API.

Here we define some convenience wrappers for timeseries data.

class sailor.sap_iot.wrappers.TimeseriesDataset(df, indicator_set, equipment_set, nominal_data_start, nominal_data_end, is_normalized=False)[source]

Bases: object

A wrapper class to make accessing timeseries data from SAP IoT more convenient.

Parameters
  • df (pd.DataFrame) –

  • indicator_set (IndicatorSet) –

  • equipment_set (EquipmentSet) –

  • nominal_data_start (pd.Timestamp) –

  • nominal_data_end (pd.Timestamp) –

  • is_normalized (bool) –

aggregate(aggregation_interval, aggregation_functions='mean')[source]

Aggregate the TimeseriesDataset to a fixed interval, returning a new TimeseriesDataset.

This operation changes the unique feature IDs, because the new IDs need to encode the aggregation function as additional information. Accordingly, the DataFrame returned by sailor.sap_iot.wrappers.TimeseriesDataset.as_df() has an additional column index level for the aggregation function when speaking_names=True is used. Note that the resulting timeseries is not equidistant if the original timeseries contains gaps larger than the aggregation interval.

Parameters
  • aggregation_interval (Union[str, pd.Timedelta]) – String specifying the aggregation interval, e.g. '1h' or '30min'. Follows the same rules as the freq parameter of a pandas.Grouper object.

  • aggregation_functions (Union[Iterable[Union[str, Callable]], str, Callable]) – Aggregation function or iterable of aggregation functions to use. Each aggregation function can be a string (e.g. 'mean' or 'min') or a function (e.g. np.max).

Return type

TimeseriesDataset
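
The note on gaps can be illustrated with plain pandas. The following is a minimal sketch, not sailor's implementation: it assumes a long-format frame with the hypothetical columns equipment_id, timestamp and temperature, and groups by equipment and by a pandas.Grouper with a one-hour frequency. Hours without any raw data produce no row, so the result is not equidistant.

```python
import pandas as pd

# Hypothetical raw data: one equipment, with a gap between 00:40 and 03:05.
raw = pd.DataFrame({
    "equipment_id": ["E1", "E1", "E1", "E1"],
    "timestamp": pd.to_datetime([
        "2021-01-01 00:10", "2021-01-01 00:40",
        "2021-01-01 03:05", "2021-01-01 03:50",
    ]),
    "temperature": [1.0, 3.0, 10.0, 20.0],
})

# Group by equipment and by fixed one-hour bins, then apply two aggregation functions.
aggregated = (
    raw.groupby(["equipment_id", pd.Grouper(key="timestamp", freq="1h")])
    .agg(["mean", "max"])
)

# The columns gain an extra level for the aggregation function:
# ("temperature", "mean") and ("temperature", "max").
print(aggregated)

# Only the bins that contain data appear; the empty hours 01:00 and 02:00 are absent.
bins = aggregated.index.get_level_values("timestamp")
print(list(bins))
```

If an equidistant result is needed, interpolating to a fixed interval is the operation to reach for instead.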

as_df(speaking_names=False, include_model=False)[source]

Return the data stored within this TimeseriesDataset object as a pandas DataFrame.

By default the data is returned with opaque column headers. If speaking_names is True, equipment_id and model_id are replaced by human-readable names, and the opaque column headers are replaced by a hierarchical index of template_id, indicator_group_name, indicator_name and aggregation_function.
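
To make the shape of a hierarchical column index concrete, here is a minimal sketch of such a frame built by hand with plain pandas. All values (template, group and indicator names) are hypothetical, and the exact layout returned by as_df may differ in detail.

```python
import pandas as pd

# Hypothetical column header with the four levels named in the description.
columns = pd.MultiIndex.from_tuples(
    [
        ("template_a", "group_1", "temperature", "mean"),
        ("template_a", "group_1", "temperature", "max"),
        ("template_a", "group_1", "pressure", "mean"),
    ],
    names=["template_id", "indicator_group_name", "indicator_name", "aggregation_function"],
)
frame = pd.DataFrame([[20.5, 22.0, 1.1], [21.0, 23.5, 1.2]], columns=columns)

# Select every column of one indicator, regardless of aggregation function.
temperature = frame.xs("temperature", axis=1, level="indicator_name")

# Select every column produced by one aggregation function.
means = frame.xs("mean", axis=1, level="aggregation_function")

print(temperature)
print(means)
```

Selecting by level name rather than by position keeps such code readable when the number of levels changes, for example before and after aggregation.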

filter(start=None, end=None, equipment_set=None, indicator_set=None)[source]

Return a new TimeseriesDataset containing the subset of the original data that matches the filter parameters.

Only indicator data matching the specified filters is returned.

Parameters
  • start (Union[str, pd.Timestamp, datetime]) – Optional start time from which timeseries data is returned.

  • end (Union[str, pd.Timestamp, datetime]) – Optional end time until which timeseries data is returned.

  • equipment_set (EquipmentSet) – Optional EquipmentSet to filter timeseries data. Takes precedence over equipment_ids.

  • indicator_set (Union[IndicatorSet, AggregatedIndicatorSet]) – Optional IndicatorSet to filter dataset columns.

Return type

TimeseriesDataset

Example

Restrict the dataset ‘my_indicator_data’ to the equipment contained in the EquipmentSet ‘my_equipment_set’:

my_indicator_data.filter(equipment_set=my_equipment_set)

get_feature_columns(speaking_names=False)[source]

Get the names of all feature columns.

Parameters

speaking_names – If False, return the feature column IDs of the dataset. If True, return the corresponding human-readable names of the feature columns.

Example

Get the template ID, indicator group name and indicator name of each column that contains indicator values in the dataset ‘my_indicator_data’:

my_indicator_data.get_feature_columns(speaking_names=True)

get_index_columns(speaking_names=False, include_model=False)[source]

Return the names of all index columns (key columns and time column).

Return type

list

get_key_columns(speaking_names=False, include_model=False)[source]

Return those columns of the data that identify the asset.

Currently we only support asset type ‘Equipment’ so this will always return columns based on the equipment. In the future other types (like System) will be supported here.

Parameters

speaking_names – If False, return the key columns. If True, return the corresponding human-readable names of the key columns.

Example

Get key columns of the indicator data set ‘my_indicator_data’:

my_indicator_data.get_key_columns()

static get_time_column()[source]

Return the name of the column containing the time information.

interpolate(interval, method='pad', **kwargs)[source]

Interpolate the TimeseriesDataset to a fixed interval, returning a new TimeseriesDataset.

Additional arguments for the interpolation function can be passed and are forwarded to the pandas interpolate function. The resulting TimeseriesDataset will always be equidistant with timestamps between self.nominal_data_start and self.nominal_data_end. However, values at these timestamps may be NA depending on the interpolation parameters. By default values will be forward-filled, with no limit to the number of interpolated points between two given values, and no extrapolation before the first known point. The following keyword arguments can be used to achieve some common behaviour:

  • method='slinear' will use linear interpolation between any two known points

  • method='index' will use a pandas interpolation method instead of the scipy-based method, which automatically forward-fills the last known value to the end of the timeseries

  • fill_value='extrapolate' will extrapolate beyond the last known value (but not backwards before the first known value); this is only applicable to scipy-based interpolation methods

  • limit=N will limit the number of interpolated points between known points to N

Further details on this behaviour can be found in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html

Parameters

interval (Union[str, Timedelta]) –

Return type

TimeseriesDataset
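
The default forward-fill behaviour can be sketched with plain pandas. This is an illustration under assumptions, not sailor's implementation: the series, the names nominal_start and nominal_end, and the one-hour grid are hypothetical, and whether the real method includes the end timestamp in its grid is not stated in this documentation.

```python
import pandas as pd

# Hypothetical irregular readings for a single indicator.
raw = pd.Series(
    [1.0, 3.0],
    index=pd.to_datetime(["2021-01-01 00:20", "2021-01-01 02:40"]),
)

# Equidistant target grid between the (hypothetical) nominal start and end.
nominal_start = pd.Timestamp("2021-01-01 00:00")
nominal_end = pd.Timestamp("2021-01-01 04:00")
grid = pd.date_range(nominal_start, nominal_end, freq="1h")

# Forward-fill on the union of raw and target timestamps, then keep only the grid.
combined = raw.reindex(raw.index.union(grid))
padded = combined.ffill().reindex(grid)

# 00:00 stays NA because nothing is known before the first reading;
# every later grid point carries the most recent known value.
print(padded)
```

The result is equidistant by construction, and the leading NA shows why values at the grid timestamps may still be missing.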

normalize(fitted_scaler=None, scaler=StandardScaler())[source]

Normalize the data using the given scaler.

Parameters
  • fitted_scaler – Optional fitted scaler, to be used to normalize self._df

  • scaler – Type of scaler to use for normalization. The default settings apply x -> (x - m) / s, where m is the mean and s is the standard deviation. Both are computed per column.

Returns

  • new_wrapper – TimeseriesDataset with self._df updated to be the normalized dataframe.

  • fitted_scaler – Fitted scaler to be used to normalize the data.

Return type

tuple[TimeseriesDataset, Any]
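
The default transformation x -> (x - m) / s can be sketched without sailor. The snippet below is an illustration only: it uses plain pandas rather than a scikit-learn scaler, the column names are hypothetical, and the population standard deviation (ddof=0) is chosen because that is what scikit-learn's StandardScaler computes. It also shows why the fitted statistics are worth keeping: they can be reused to normalize other data consistently.

```python
import pandas as pd

# Hypothetical feature columns for training data and for new data.
train = pd.DataFrame({"temperature": [10.0, 20.0, 30.0], "pressure": [1.0, 1.0, 4.0]})
new = pd.DataFrame({"temperature": [20.0, 40.0], "pressure": [2.0, 5.0]})

# "Fit": compute the per-column mean and population standard deviation.
mean = train.mean()
std = train.std(ddof=0)

# "Transform": apply x -> (x - m) / s column by column.
train_normalized = (train - mean) / std

# Reuse the statistics fitted on `train` for the new data.
new_normalized = (new - mean) / std

print(train_normalized)
print(new_normalized)
```

In the API documented here, the same reuse is expressed by passing the returned fitted scaler as fitted_scaler in a later call.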

Example

Get normalized values for the indicators in the dataset ‘my_indicator_data’:

my_indicator_data.normalize()[0]

plot(start=None, end=None, indicator_set=None, equipment_set=None)[source]

Plot the timeseries data stored within this wrapper.

The plot creates separate panels for each indicator_group_name and template in the data, as well as for each indicator. Data from different equipment is represented by different colors. The plotnine object returned by this method is rendered in Jupyter notebooks, but can also be modified further by the caller.

Parameters
  • start – Optional start time from which the timeseries data is plotted.

  • end – Optional end time until which the timeseries data is plotted.

  • indicator_set – Optional indicators to plot.

  • equipment_set – Optional equipment for which indicator data is plotted.

Returns

Line charts of timeseries data.

Return type

plot

Example

Plot all indicators for the period from 2020-07-02 to 2020-09-01 in the dataset ‘my_indicator_data’:

my_indicator_data.plot('2020-07-02','2020-09-01')

property equipment_set

Return all equipment present in the TimeseriesDataset.

property indicator_set

Return all Indicators present in the TimeseriesDataset.