Aggregation Methods

Trump currently has two types of aggregation methods:

  1. Apply-Row
  2. Choose-Column

As the names imply, the Apply-Row methods build the final data values by looking at each row of the datatable, one at a time. The Choose-Column methods compare the data available in each column, then return an entire series. Apply-Row methods all take a pandas Series and return a value. Choose-Column methods all take a pandas DataFrame and return a Series.

Apply-Row functions are invoked using the pseudocode below:

df['final'] = df.apply(row_apply_method, axis=1)

Choose-Column functions are invoked using the pseudocode below:

df['final'] = column_choose_method(df)
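
The two calling conventions above can be sketched as a runnable example. The column names and both method bodies are hypothetical stand-ins, not Trump's own functions; the point is only the difference in what each kind of method receives and returns.

```python
import numpy as np
import pandas as pd

# Hypothetical datatable; the column names are illustrative only.
df = pd.DataFrame(
    {
        "feed001": [1.0, np.nan, 3.0],
        "feed002": [10.0, 20.0, 30.0],
    }
)


def row_apply_method(row: pd.Series) -> float:
    """Hypothetical Apply-Row method: gets one row, returns one value."""
    return row.dropna().iloc[0]


def column_choose_method(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical Choose-Column method: gets the frame, returns a column."""
    return frame["feed002"]


# Called once per row by pandas.
by_row = df.apply(row_apply_method, axis=1)

# Called exactly once with the whole frame.
by_column = column_choose_method(df)
```

Either result could then be assigned to df['final'], as in the pseudocode.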

Both types of method have access to the data in the override and failsafe columns, so it’s technically possible to create a method that redefines the behaviour of these columns. It is the responsibility of each method to implement the override and failsafe logic.

Apply-Row Methods

Each of these methods can be thought of as a for-loop that looks at each row of the datatable, then decides on the correct value for the final column on a row-by-row basis.

These methods are applied to the datatable, as a DataFrame, one row at a time. The columns are sorted before each row is passed, so the value at index 0 is always the override datapoint, if it exists, and the value at index -1 is always the failsafe datapoint, if it exists. Everything else, that is, the feeds, is in columns 1 through n, where n is the number of feeds.
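
As a rough illustration of that layout, the sketch below splits one row by position. The labels are hypothetical, and it assumes both the override and the failsafe columns are present.

```python
import numpy as np
import pandas as pd

# Hypothetical row of a sorted datatable; the labels are illustrative only
# and assume both an override and a failsafe column exist.
row = pd.Series(
    [np.nan, 1.5, 2.5, 9.9],
    index=["override", "feed001", "feed002", "failsafe"],
)

override = row.iloc[0]   # first position: the override datapoint
failsafe = row.iloc[-1]  # last position: the failsafe datapoint
feeds = row.iloc[1:-1]   # everything in between: the feeds, in priority order
```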

static ApplyRow.priority_fill(adf)

Looks at each row and chooses the value from the highest-priority (lowest-numbered) feed, one row at a time.
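
One way to get this behaviour, given the sorted column layout, is to take the first non-null value in each row. This is a minimal sketch, not necessarily how Trump implements priority_fill. The function name and column names are hypothetical, and it assumes the override should win whenever it is set and the failsafe should be used only when every feed is empty.

```python
import numpy as np
import pandas as pd


def priority_fill_sketch(row: pd.Series) -> float:
    """Return the first non-null value in the row, or NaN if there is none."""
    valid = row.dropna()
    return valid.iloc[0] if not valid.empty else np.nan


# Hypothetical datatable, already sorted: override, feeds, failsafe.
df = pd.DataFrame(
    {
        "override": [np.nan, np.nan, 7.0, np.nan],
        "feed001": [1.0, np.nan, 3.0, np.nan],
        "feed002": [10.0, 20.0, 30.0, np.nan],
        "failsafe": [np.nan, np.nan, np.nan, 0.5],
    }
)

final = df.apply(priority_fill_sketch, axis=1)
```

Because the override sits first and the failsafe last, a single left-to-right scan covers all three cases.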

static ApplyRow.mean_fill(adf)

Looks at each row and calculates the mean of the feeds. Honours the Trump override/failsafe logic.
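
A minimal sketch of a mean that respects the override and failsafe positions follows. It is not Trump's implementation. The function name and column names are hypothetical, and it assumes that an override replaces the mean outright and that the failsafe applies only when no feed has a value.

```python
import numpy as np
import pandas as pd


def mean_fill_sketch(row: pd.Series) -> float:
    """Mean of the feeds, with override first and failsafe as a last resort."""
    override, feeds, failsafe = row.iloc[0], row.iloc[1:-1], row.iloc[-1]
    if pd.notna(override):
        return override
    mean = feeds.mean()  # NaN values are skipped by default
    return mean if pd.notna(mean) else failsafe


# Hypothetical datatable, already sorted: override, feeds, failsafe.
df = pd.DataFrame(
    {
        "override": [np.nan, 7.0, np.nan],
        "feed001": [1.0, 3.0, np.nan],
        "feed002": [3.0, 5.0, np.nan],
        "failsafe": [np.nan, np.nan, 0.5],
    }
)

final = df.apply(mean_fill_sketch, axis=1)
```

Swapping feeds.mean() for feeds.median() would give a median variant of the same shape.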

static ApplyRow.median_fill(adf)

Looks at each row and chooses the median of the feeds. Honours the Trump override/failsafe logic.

static ApplyRow.custom(adf)

A custom Apply-Row aggregator can be defined as any function which accepts a Series and returns any number-like object, which will get assigned to the DataFrame’s ‘final’ column using the pandas .apply function.
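
For illustration, here is a hypothetical custom function with that shape. It only reports a value when the available feeds agree within a tolerance. The name agree_or_nan, the tol parameter, and the column names are all made up for this sketch, and how such a function is registered with Trump is not shown here. For brevity it ignores override and failsafe columns, which a real method would need to handle itself.

```python
import numpy as np
import pandas as pd


def agree_or_nan(row: pd.Series, tol: float = 0.01) -> float:
    """Return the mean of the non-null values if they agree within tol."""
    values = row.dropna()
    if values.empty:
        return np.nan
    if values.max() - values.min() <= tol:
        return values.mean()
    return np.nan


# Hypothetical feeds-only datatable.
df = pd.DataFrame(
    {
        "feed001": [1.0, 2.0, np.nan],
        "feed002": [1.005, 2.5, 4.0],
    }
)

# pandas calls the function once per row and collects the results.
final = df.apply(agree_or_nan, axis=1)
```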

Note

The aggregation methods are organized in the code using private mixin classes. The FeedAggregator object handles the implementation of every static method based solely on its name. This means that the name of any new method must be unique across both mixins.
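
The sketch below shows why a name clash would be a problem when methods are looked up by name across two mixins. Every class and method name in it is hypothetical; it illustrates the general pattern and is not Trump's FeedAggregator code.

```python
import pandas as pd


class _RowMethodsSketch:
    """Hypothetical mixin holding row-wise methods."""

    @staticmethod
    def row_max(row: pd.Series) -> float:
        return row.max()


class _ColMethodsSketch:
    """Hypothetical mixin holding column-wise methods."""

    @staticmethod
    def first_col(frame: pd.DataFrame) -> pd.Series:
        return frame.iloc[:, 0]


class AggregatorSketch(_RowMethodsSketch, _ColMethodsSketch):
    """Looks a method up by name, then calls it the way its mixin expects."""

    def aggregate(self, name: str, frame: pd.DataFrame) -> pd.Series:
        method = getattr(self, name)
        if name in vars(_RowMethodsSketch):
            return frame.apply(method, axis=1)  # row-wise: once per row
        return method(frame)  # column-wise: one call


df = pd.DataFrame({"feed001": [1.0, 5.0], "feed002": [4.0, 2.0]})
agg = AggregatorSketch()
by_row = agg.aggregate("row_max", df)
by_col = agg.aggregate("first_col", df)
```

If both mixins defined a method with the same name, getattr would return only the one from the first mixin in the inheritance order, and the other would be unreachable.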

Choose-Column Methods

Each of these methods can be thought of as a for-loop that looks at each column of the datatable, then chooses the appropriate feed to use as final. They all still apply overrides and failsafes on a row-by-row basis.

The datatable, as a DataFrame, is passed to these methods in a single call.

static ChooseCol.most_populated(adf)

Looks at each column and uses the one with the most values. Honours the Trump override/failsafe logic.
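
A minimal sketch of this idea, under stated assumptions: the first column is the override, the last is the failsafe, and a tie between feeds goes to the higher-priority one. The function name and column names are hypothetical, and this is not necessarily how Trump implements most_populated.

```python
import numpy as np
import pandas as pd


def most_populated_sketch(frame: pd.DataFrame) -> pd.Series:
    """Pick the feed with the most non-null values, then patch row by row."""
    override = frame.iloc[:, 0]
    feeds = frame.iloc[:, 1:-1]
    failsafe = frame.iloc[:, -1]
    # count() ignores NaN; idxmax() returns the first label on a tie.
    chosen = feeds[feeds.count().idxmax()]
    # Override wins where set; failsafe fills whatever is still missing.
    return override.combine_first(chosen).combine_first(failsafe)


# Hypothetical datatable, already sorted: override, feeds, failsafe.
df = pd.DataFrame(
    {
        "override": [np.nan, np.nan, 7.0, np.nan],
        "feed001": [1.0, np.nan, np.nan, np.nan],
        "feed002": [10.0, 20.0, 30.0, np.nan],
        "failsafe": [np.nan, np.nan, np.nan, 0.5],
    }
)

final = most_populated_sketch(df)
```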

static ChooseCol.most_recent(adf)

Looks at each column and chooses the feed with the most recent data point. Honours the Trump override/failsafe logic.
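
One possible reading of "most recent" is the feed whose last non-null value has the latest index label. The sketch below uses that reading with a date index. It is an assumption rather than Trump's implementation, and the function name and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def most_recent_sketch(frame: pd.DataFrame) -> pd.Series:
    """Pick the feed whose last valid index is latest, then patch row by row."""
    override = frame.iloc[:, 0]
    feeds = frame.iloc[:, 1:-1]
    failsafe = frame.iloc[:, -1]
    # Last index label holding a value, per feed; None for an empty feed.
    last_seen = {name: feeds[name].last_valid_index() for name in feeds.columns}
    candidates = {name: idx for name, idx in last_seen.items() if idx is not None}
    if candidates:
        # max() keeps the first name on a tie, i.e. the higher-priority feed.
        chosen = feeds[max(candidates, key=candidates.get)]
    else:
        chosen = pd.Series(np.nan, index=frame.index)
    return override.combine_first(chosen).combine_first(failsafe)


# Hypothetical datatable, already sorted: override, feeds, failsafe.
dates = pd.date_range("2020-01-01", periods=4)
df = pd.DataFrame(
    {
        "override": [5.0, np.nan, np.nan, np.nan],
        "feed001": [1.0, 2.0, np.nan, np.nan],
        "feed002": [np.nan, 20.0, 30.0, np.nan],
        "failsafe": [np.nan, np.nan, np.nan, 0.5],
    },
    index=dates,
)

final = most_recent_sketch(df)
```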

static ChooseCol.custom(adf)

A custom Choose-Column aggregator can be defined as any function which accepts a DataFrame and returns any Series-like object, which will get assigned to the DataFrame’s ‘final’ column.
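
As an illustration of that shape, the hypothetical function below returns the column with the lowest standard deviation. The name steadiest_feed and the column names are made up for this sketch, and it leaves out the override and failsafe handling that a real method would be responsible for.

```python
import pandas as pd


def steadiest_feed(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical custom method: return the least volatile column."""
    return frame[frame.std().idxmin()]


# Hypothetical feeds-only datatable.
df = pd.DataFrame(
    {
        "feed001": [1.0, 5.0, 1.0],
        "feed002": [2.0, 2.1, 2.2],
    }
)

# A single call with the whole frame returns an entire column.
final = steadiest_feed(df)
```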

Note

See the note in the previous section about custom method naming.