A framework for defining ratings for open source projects. In particular, the framework offers a security rating for open source projects that may be used to assess the security risk that comes with open source components.
This page provides definitions for a feature, a score and a rating. Then, it describes how a rating for an open source project can be defined.
Various data may be used to build a rating for an open source project. The data may be very diverse and may have different types.
Let’s define a feature as a measurable characteristic of an open source project. A feature has a type and may have constraints.
Here are several examples of features, their types and constraints:
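To illustrate what "a type and constraints" can mean in practice, here is a minimal Python sketch. The `Feature` class and both example features (`number_of_contributors`, `has_security_policy`) are hypothetical names chosen for this illustration; they are not taken from the framework itself.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Feature:
    """A measurable characteristic of a project (hypothetical helper)."""
    name: str
    value_type: type
    # Constraint that a valid value must satisfy; accepts everything by default.
    constraint: Callable[[Any], bool] = lambda value: True

    def accepts(self, value: Any) -> bool:
        """Return True if the value has the right type and meets the constraint."""
        return isinstance(value, self.value_type) and self.constraint(value)


# Hypothetical features: an integer that can't be negative, and a boolean.
number_of_contributors = Feature("number_of_contributors", int, lambda n: n >= 0)
has_security_policy = Feature("has_security_policy", bool)
```

With these definitions, `number_of_contributors.accepts(-1)` is rejected by the constraint, while `has_security_policy.accepts(True)` passes.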
A number of features may describe a particular property of an open source project.
Let’s define a scoring function as a procedure that takes a number of features or scores
and produces a floating-point number in the interval [0, 10].
The number is a score that describes a particular property of an open source project.
The higher the score, the better the property is implemented in the project.
A score can also be called a synthetic feature.
Here are several examples of scores:
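As a rough sketch of what a scoring function might look like, the following Python function turns two feature values into a number in [0, 10]. The function name, the two input features and the saturation points (100 commits, 10 contributors) are assumptions made for this example only.

```python
def project_activity_score(commits: int, contributors: int) -> float:
    """Hypothetical scoring function: maps two features to a score in [0, 10]."""
    # Illustrative saturation points, not values from the framework.
    commit_part = min(commits / 100, 1.0)
    contributor_part = min(contributors / 10, 1.0)
    score = 10.0 * (0.5 * commit_part + 0.5 * contributor_part)
    # Clamp so the result always stays within the required interval.
    return max(0.0, min(10.0, score))
```

The final clamp is a defensive choice: whatever the inputs are, the caller can rely on the result being a valid score.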
A number of properties of an open source project may be combined in order to describe a more general property of the project.
Let’s define a rating procedure as a combination of a scoring function, a set of labels and a label function that maps a score to one of the defined labels. First, a rating procedure takes a set of feature values and passes them to the scoring function. Next, the scoring function produces a score value. Then, the label function converts the score to a label.
In other words, a rating procedure interprets a score by mapping it to a label.
For example, a security rating procedure for an open source project is based on a scoring function
that assesses the security level of the project.
The rating procedure may then return the GOOD label if the score is greater than 7.0,
and the BAD label otherwise.
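The GOOD/BAD example can be sketched in Python as follows. Only the 7.0 threshold and the two labels come from the text; `make_rating_procedure`, `placeholder_security_score` and the `uses_static_analysis` feature are hypothetical names used to keep the example self-contained.

```python
from enum import Enum
from typing import Callable, Mapping, Tuple


class SecurityLabel(Enum):
    GOOD = "GOOD"
    BAD = "BAD"


def security_label(score: float) -> SecurityLabel:
    """Label function: GOOD if the score is greater than 7.0, BAD otherwise."""
    return SecurityLabel.GOOD if score > 7.0 else SecurityLabel.BAD


def make_rating_procedure(
    scoring_function: Callable[[Mapping[str, object]], float],
    label_function: Callable[[float], SecurityLabel],
) -> Callable[[Mapping[str, object]], Tuple[float, SecurityLabel]]:
    """Combine a scoring function and a label function into a rating procedure."""
    def rate(feature_values):
        score = scoring_function(feature_values)  # feature values -> score
        return score, label_function(score)       # score -> label
    return rate


# Placeholder scoring function (an assumption for this demo only).
def placeholder_security_score(feature_values):
    return 10.0 if feature_values.get("uses_static_analysis") else 5.0


security_rating = make_rating_procedure(placeholder_security_score, security_label)
```

Calling `security_rating({"uses_static_analysis": True})` returns both the score and the label, so a caller can show the interpretation next to the number it was derived from.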
Dependencies between features, scoring functions and rating procedures may be described as a graph.

The graph looks like a tree. In this graph, a rating procedure is the root of the tree, scoring functions are inner nodes, and features are leaves. Strictly speaking, the graph is not a tree: a feature can contribute to multiple scores, so a leaf may have several parents and the graph may contain loops.
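One way to picture such a graph is an adjacency mapping from each node to the nodes it consumes. The sketch below uses hypothetical node names and a small helper, `shared_nodes`, that lists the nodes with more than one parent, which is exactly what prevents the graph from being a tree.

```python
from collections import Counter

# Each node maps to the nodes it consumes (all names are hypothetical).
dependencies = {
    "security_rating": ["security_score"],
    "security_score": ["testing_score", "activity_score"],
    "testing_score": ["uses_ci", "number_of_commits"],
    "activity_score": ["number_of_commits", "number_of_contributors"],
}


def shared_nodes(graph):
    """Return the nodes that are consumed by more than one parent."""
    parents = Counter(child for children in graph.values() for child in children)
    return sorted(node for node, count in parents.items() if count > 1)
```

Here `number_of_commits` feeds two scoring functions, so `shared_nodes(dependencies)` reports it.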
It may happen that a value for a feature can’t be gathered for some reason. In this case, the feature value is unknown. A scoring function should expect unknown values and still produce a score.
It may happen that a score makes sense for one project, but doesn’t make much sense for another one. In this case, the scoring function may return a special value Not Applicable, which means that the scoring function can’t be applied to the project.
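Both special cases can be sketched in one small scoring function. Everything below is an assumption made for illustration: the `Special` enum, the `memory_safety_score` function, its input features, and the decision to let an unknown value contribute nothing to the score.

```python
from enum import Enum
from typing import Union


class Special(Enum):
    UNKNOWN = "unknown"                # a feature value that couldn't be gathered
    NOT_APPLICABLE = "not applicable"  # the score doesn't make sense for the project


def memory_safety_score(
    language: str,
    uses_sanitizers: Union[bool, Special],
    uses_fuzzing: Union[bool, Special],
) -> Union[float, Special]:
    """Hypothetical score that only makes sense for C and C++ projects."""
    if language.lower() not in {"c", "c++"}:
        return Special.NOT_APPLICABLE
    score = 0.0
    for value in (uses_sanitizers, uses_fuzzing):
        # Assumption: an unknown value is tolerated and simply adds no points.
        if value is True:
            score += 5.0
    return score
```

A caller has to check for `Special.NOT_APPLICABLE` before treating the result as a number, for example to leave the score out of an aggregate.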
The following steps describe how a rating procedure may be built:
Define a set of features F = { f[1], f[2], ... , f[N] }.
Define a set of scoring functions S = { s[1], s[2], ... , s[M] }.
For each scoring function s[i] where i = 1..M:
Assign a set of features F_s[i] that are used by a scoring function s[i].
Each set F_s[i] is a subset of F. The sets F_s[i] may overlap.
Define a scoring function s[i] that takes features from F_s[i]
and returns a score in the interval [0, 10].
In other words, s[i]: F_s[i] -> [0, 10].
For each scoring function s[i], assign a weight w[i] in the interval (0, 1].
Define an overall scoring function s* that is based on the scoring functions s[i].
The overall scoring function s* takes a vector v of values for the features f[1], ... , f[N]
and calculates a weighted average of the scores produced by the scoring functions s[i]:
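def overall_score(v):
    # w[i], s[i] and F_s[i] are the weights, scoring functions and feature sets defined above;
    # v is assumed to map each feature to its value
    sum_of_weights = sum(w[i] for i in range(1, M + 1))
    score = 0.0
    for i in range(1, M + 1):
        # select the values of the features in F_s[i] from v
        F_s_v = {f: v[f] for f in F_s[i]}
        score += w[i] * s[i](F_s_v)
    return score / sum_of_weights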
The weight w[i] defines how much the scoring function s[i] contributes to the overall score.
In other words, the weight w[i] defines the importance of the scoring function s[i].
The function overall_score(v) always returns a number in the interval [0, 10].
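The result stays in [0, 10] because every score s[i] lies in [0, 10] and every weight is positive, so the weighted average cannot leave that interval. A self-contained sketch with concrete numbers is shown below; the `entries` list, the feature names `uses_ci` and `open_issues`, and the two scoring lambdas are hypothetical and only serve the worked example.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

# One entry per scoring function: (weight w[i], feature names F_s[i], function s[i]).
ScoringEntry = Tuple[float, Sequence[str], Callable[[Mapping[str, object]], float]]


def overall_score(v: Mapping[str, object], entries: List[ScoringEntry]) -> float:
    """Weighted average of the scores produced by all scoring functions."""
    sum_of_weights = sum(weight for weight, _, _ in entries)
    total = 0.0
    for weight, feature_names, scoring_function in entries:
        # Pass each scoring function only the feature values it declared.
        selected = {name: v[name] for name in feature_names}
        total += weight * scoring_function(selected)
    return total / sum_of_weights


# Hypothetical scoring functions with weights 1.0 and 0.5.
entries = [
    (1.0, ["uses_ci"], lambda f: 10.0 if f["uses_ci"] else 0.0),
    (0.5, ["open_issues"], lambda f: max(0.0, 10.0 - f["open_issues"])),
]
values = {"uses_ci": True, "open_issues": 4}
result = overall_score(values, entries)
```

For these values the two scores are 10.0 and 6.0, so the overall score is (1.0 * 10.0 + 0.5 * 6.0) / 1.5, roughly 8.67.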
Define a set of labels L = { l[1], ... , l[K] }.
Define a function label(s) that maps a score value s to one of the labels from L.
In other words, label(s): s -> l where s belongs to the interval [0, 10] and l belongs to L.
Define a rating procedure r as a combination of the overall scoring function s*,
the set of labels L, and the label function label(s).
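The last steps can be sketched as a threshold-based label function over K labels plus a thin wrapper that plays the role of the rating procedure. The three labels, the thresholds 4.0 and 7.0, and the convention that a score must be strictly greater than a threshold to reach the next label are assumptions chosen for this example.

```python
from bisect import bisect_left
from typing import Callable, Mapping

LABELS = ["BAD", "MODERATE", "GOOD"]  # L = { l[1], ..., l[K] } (hypothetical labels)
THRESHOLDS = [4.0, 7.0]               # K - 1 boundaries between labels (hypothetical)


def label(s: float) -> str:
    """Map a score in [0, 10] to one of the labels in LABELS."""
    if not 0.0 <= s <= 10.0:
        raise ValueError(f"score {s} is outside [0, 10]")
    # bisect_left counts the thresholds strictly below s,
    # so a score equal to a threshold keeps the lower label.
    return LABELS[bisect_left(THRESHOLDS, s)]


def rating_procedure(
    v: Mapping[str, object],
    overall_scoring_function: Callable[[Mapping[str, object]], float],
) -> str:
    """r: feature values -> overall score -> label."""
    return label(overall_scoring_function(v))
```

Keeping the thresholds in a sorted list means that adding a label only requires one more entry in each list, without touching `label`.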