# Methodology

## DFC

Document Frequency Factor – a multiplier applied to the Number of Documents variable in the TF-cIDF term-weighting model. If DFC = 0, the model reduces to plain TF; if DFC = 1, it is the standard TF-IDF.

## Gold Standard

Ground truth, truth table, or gold standard refers to a set of predefined correct results used for evaluation purposes, usually in evaluation scenarios involving precision, recall, and F1 score.
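As an illustration of how a gold standard is used, the following minimal sketch scores a set of predicted results against a set of known-correct ones. The function name `evaluate` and the document IDs are hypothetical, chosen only for this example; it assumes results can be compared as set members.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, f1) of predicted items against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # items that are both predicted and correct

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical data: four correct items, three predictions, two of them correct.
gold = {"doc1", "doc2", "doc3", "doc4"}
predicted = {"doc2", "doc3", "doc5"}

precision, recall, f1 = evaluate(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predictions are in the gold standard (precision 2/3) and two of the four gold items were found (recall 1/2).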

## Missing at Random

Missing at random means that the propensity for a data point to be missing is not related to the missing data itself, but is related to some of the observed data. Source: https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4

## Missing Completely at Random

The fact that a certain value is missing has nothing to do with its hypothetical value or with the values of other variables. Source: https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4
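The difference between the two mechanisms defined so far can be sketched with a small simulation. Everything here is an assumption made for illustration: the survey columns (age, income), the missingness probabilities, and the helper names `mcar`, `mar`, and `missing_rate`.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical survey rows: (age, income). Age is always observed.
rows = [(random.randint(20, 70), random.randint(20_000, 120_000))
        for _ in range(10_000)]


def mcar(rows, p=0.3):
    """Missing completely at random: every income is dropped with the same probability."""
    return [(age, None if random.random() < p else income) for age, income in rows]


def mar(rows, p_young=0.5, p_old=0.1):
    """Missing at random: the drop probability depends only on the observed age."""
    out = []
    for age, income in rows:
        p = p_young if age < 40 else p_old
        out.append((age, None if random.random() < p else income))
    return out


def missing_rate(rows, keep):
    """Share of missing incomes among rows whose age satisfies `keep`."""
    group = [income for age, income in rows if keep(age)]
    return sum(income is None for income in group) / len(group)


def is_young(age):
    return age < 40


def is_old(age):
    return age >= 40


mcar_rows, mar_rows = mcar(rows), mar(rows)
print("MCAR:", missing_rate(mcar_rows, is_young), missing_rate(mcar_rows, is_old))
print("MAR: ", missing_rate(mar_rows, is_young), missing_rate(mar_rows, is_old))
```

Under MCAR the missing rate is roughly the same in both age groups, while under MAR it differs by group, which is what makes the missingness predictable from observed data.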

## Missing not at Random

Two possible reasons are that the missing value depends on the hypothetical value (e.g. people with high salaries generally do not want to reveal their incomes in surveys) or that the missing value depends on some other variable’s value (e.g. let’s assume that females generally don’t want to reveal their ages! Here the missing value in…

## Outlier

In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses. Source: https://en.wikipedia.org/wiki/Outlier
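The definition above does not say how distant an observation must be. One common convention, used here only as an assumed example, is Tukey's rule: flag values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The function name `iqr_outliers` and the sample measurements are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one value far from the rest.
measurements = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(measurements)
print(outliers)
```

Whether a flagged value should be excluded still depends on its cause; the rule only identifies candidates for inspection.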