A Level Playing Field

Scores are normalized to ensure a fair, open, and transparent process.

Once a valid application has been submitted, a minimum of five reviewers will be assigned to score each submission. These judges will offer both scores and comments for each of four distinct traits. Each trait will be scored on a scale of 0 to 5 points, in increments of 0.1; possible scores for a trait include 0.4, 3.7, and 5.0. The four trait scores will combine to produce a total score.

The most straightforward way to ensure that everyone is held to the same set of standards would be to have the same judges score every application; unfortunately, due to the number of applications, that is not possible. Since the same judges will not score every application, the question of fairness needs to be carefully addressed. One judge may be a “tough grader”, scoring every assigned submission between 1.0 and 2.0; meanwhile, another judge may be more generous and score every submission between 4.0 and 5.0.

For illustrative purposes, let’s look at the scores from two hypothetical judges:

[Figure: raw score distributions for Judge 1 and Judge 2]

The first judge is far more generous than the second judge, who gives much lower scores. If your application were rated by the first judge, it would earn a much higher total score than if it were assigned to the second judge.

We have a way to address this issue and ensure that each application is treated fairly, no matter which judges are assigned to it. To do this, we use a mathematical technique that relies on two measures of a distribution: the mean and the standard deviation.

The mean is the average score: we add up all the scores assigned by a judge and divide the sum by the number of scores assigned.

Formally, we denote the mean like this: 

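$$\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij}$$

where $x_{ij}$ is the $i$-th score assigned by judge $j$ and $n_j$ is the number of scores that judge assigned.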

The standard deviation measures the “spread” of a judge’s scores. As an example, imagine that two judges both give the same mean (average) score, but one gives many zeros and fives, while the other gives more ones and fours. In a competition that seeks to fund the “cream of the crop”, it wouldn't be fair if we didn’t consider this difference.

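Formally, we denote the standard deviation like this:

$$\sigma_j = \sqrt{\frac{1}{n_j} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2}$$

where $x_{ij}$ is the $i$-th score assigned by judge $j$, $n_j$ is the number of scores that judge assigned, and $\bar{x}_j$ is the mean of those scores.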
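To make the two measures concrete, here is a minimal Python sketch of the two hypothetical judges described above, one giving zeros and fives and the other giving ones and fours. The score lists and the `summarize` helper are invented for illustration, and the sketch assumes the population form of the standard deviation (dividing by the number of scores).

```python
from statistics import mean, pstdev

# Hypothetical scores: both judges average 2.5, but with different spread.
judge_a = [0.0, 5.0, 0.0, 5.0]  # many zeros and fives
judge_b = [1.0, 4.0, 1.0, 4.0]  # ones and fours


def summarize(scores):
    """Return (mean, population standard deviation) of one judge's scores."""
    return mean(scores), pstdev(scores)


mean_a, sd_a = summarize(judge_a)
mean_b, sd_b = summarize(judge_b)

print(f"Judge A: mean={mean_a:.2f}, standard deviation={sd_a:.2f}")
print(f"Judge B: mean={mean_b:.2f}, standard deviation={sd_b:.2f}")
```

Both judges share a mean of 2.5, yet the first has a standard deviation of 2.5 and the second 1.5, which is exactly the difference in spread that the mean alone cannot reveal.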

To ensure that the judging process is fair, we rescale all the scores to match the judging population. To do this, we measure the mean and the standard deviation of all scores across all judges. Then, we adjust each judge’s scores so that their mean and standard deviation match those of the full judging population.

We rescale the standard deviation like this: 

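$$x'_{ij} = x_{ij} \cdot \frac{\sigma}{\sigma_j}$$

where $x_{ij}$ is the $i$-th score assigned by judge $j$, $\sigma_j$ is the standard deviation of that judge’s scores, and $\sigma$ is the standard deviation of all scores across all judges.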

Then, we rescale the mean like this:

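$$x''_{ij} = x'_{ij} + \left(\bar{x} - \bar{x}'_j\right)$$

where $x'_{ij}$ is the score after its spread has been rescaled, $\bar{x}'_j$ is the mean of judge $j$’s spread-rescaled scores, and $\bar{x}$ is the mean of all scores across all judges.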

In short, we find the difference between a single judge’s mean and standard deviation and those of all the judges combined, then adjust each score so that no one is treated unfairly because of which judges they are assigned.
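The two-step adjustment can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the competition's actual implementation: the function name `rescale`, the sample scores, the use of the population standard deviation, and the handling of a judge whose scores have no spread are all assumptions made here.

```python
from statistics import mean, pstdev


def rescale(judge_scores, pool_mean, pool_sd):
    """Rescale one judge's scores to the pool's mean and standard deviation."""
    judge_sd = pstdev(judge_scores)
    if judge_sd == 0:
        # Assumption: a judge with no spread maps every score to the pool mean.
        return [pool_mean for _ in judge_scores]
    # Step 1: stretch or shrink the spread to match the pool.
    factor = pool_sd / judge_sd
    stretched = [score * factor for score in judge_scores]
    # Step 2: shift the stretched scores so their mean matches the pool.
    shift = pool_mean - mean(stretched)
    return [score + shift for score in stretched]


# Hypothetical judges: a tough grader and a generous grader.
tough = [1.0, 1.2, 1.5, 1.8, 2.0]
generous = [4.0, 4.2, 4.5, 4.8, 5.0]

# The pool statistics come from all scores across all judges.
pool = tough + generous
pool_mean, pool_sd = mean(pool), pstdev(pool)

tough_scaled = rescale(tough, pool_mean, pool_sd)
generous_scaled = rescale(generous, pool_mean, pool_sd)

print([round(score, 2) for score in tough_scaled])
print([round(score, 2) for score in generous_scaled])
```

In this made-up example the generous judge's scores are simply the tough judge's scores shifted up by three points, so after rescaling the two lists coincide. Rescaled values can land slightly outside the original 0 to 5 range; this sketch leaves them as they are.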

If we apply this rescaling process to the same two judges in the example above, we can see the resulting final scores. They appear more similar because they are now aligned with the distribution of scores across the total judging population.

[Figure: rescaled score distributions for Judge 1 and Judge 2]
