A framework for defining ratings for open source projects. In particular, the framework offers a security rating for open source projects that may be used to assess the security risk that comes with open source components.
This page provides definitions for a feature, a score and a rating. Then, it describes how a rating for an open source project can be defined.
Various data may be used to build a rating for an open source project. The data may be very diverse and may have different types.
Let’s define a feature as a measurable characteristic of an open source project. A feature has a type and may have constraints.
Here are several examples of features, their types and constraints:
A number of features may describe a particular property of an open source project.
Let’s define a scoring function as a procedure that takes a number of features or scores
and produces a float number in the interval [0, 10]
.
The number is a score that describes a particular property of an open source project.
The higher a score value is, the better a property is implemented in an open source project.
A score can be also called a synthetic feature.
Here are several examples of scores:
A number of properties of an open source project may be combined in order to describe a more general property of the project.
Let’s define a rating procedure as a combination of a scoring function, a set of labels and a label function that maps a score to one of the defined labels. First, a rating procedure takes a set of feature values and passes them to the scoring function. Next, the scoring function produces a score value. Then, the label function converts the score to a label.
In other words, a rating procedure interprets a score by mapping it to a label.
For example, a security rating procedure for an open source project is based on a score function
that assesses security level in the project.
The rating procedure may then return GOOD
label if a score is greater than 7.0,
and BAD
label otherwise.
Dependencies between features, scoring functions and rating procedures may be described as a graph.
The graph looks like a tree. In this graph, a rating procedure is the root of the tree, scoring functions are nodes, and features are leafs. Strictly speaking, the graph is not a tree because a feature can contribute to multiple scores, so that the graph may have loops.
It may happen that a value for a feature couldn’t be gathered for some reasons. In this case, a feature value is unknown. A score should expect unknown values and still produce a score.
It may happen that a score makes sense for one project, but doesn’t make much sense for another one. In this case, the scoring function may return a special value Not Applicable which means that the scoring function can’t be applied to the project.
The following steps describe how a rating procedure may be built:
F = { f[1], f[2], ... , f[N] }
.S = { s[1], s[2], ... , s[M] }
.s[i]
where i = 1..M
:
Assign a set of features F_s[i]
that are used by a scoring function s[i]
.
Each set F_s[i]
is a subset of F
. The sets F_s[i]
may overlap.
Define a scoring function s[i]
that takes features from F_s[i]
and return a score in the interval [0, 10]
.
In other words, s[i]: F_s[i] -> [0, 10]
.
For each scoring function s[i]
, assign a weight w[i]
in the interval (0, 1]
.
Define an overall scoring function s*
that is based on the scoring functions s[i]
.
The overall scoring function s*
takes a vector of features f[i]
and calculates a weighted average of scores produced by scoring functions s[i]
:
def overall_score(v) {
score = 0
sum_of_weights = sum of w[i] for i=1..M
for i in 1..M
F_s_v = select F_s[i] values from v
score = score + w[i] * score[i](F_s_v)
return score / sum_of_weights
}
The weight w[i]
defines how much the scoring function s[i]
contributes to the overall score.
In other words, the weight w[i]
defines the importance of the scoring function s[i]
.
The function overall_score(v)
always returns a number in the interval [0, 10]
.
L = { l[1], ... , l[K] }
.Define a function label(s)
that maps a score value s
to one of the labels from L
.
In other words, label[s]: s -> l
where r
belongs to the interval [0, 10]
and l
belongs to L
.
r
as a combination of the scoring function s*
,
the set of labels L
, and the label function label(s)
.Next: Example