This page contains the code, data and resulting plots of the experiments in the EPIA 2019 Paper: Visual Interpretation of Regression Error.
First, we explain how the full experiments can be reproduced, followed by an example of the EDP tool.
Finally, we provide a zip file with the plots for all 18 data sets, which were not included in the paper due to space limitations.
R is required to reproduce the experiments. The following additional R packages, available on CRAN, are also needed:
Tested on R 3.5.2
The full code and the data sets can be downloaded here. The resulting plots can be downloaded here.
## Load the required functions (and the packages they depend on)
source("performance_functions.R")
## Load the data sets together with the model predictions
load("dataSetsWithPreds.Rdata")
names(DSsPreds) <- c('a1', 'a2', 'a3', 'a4', 'a6', 'a7',
                     'Abalone', 'acceleration', 'availPwr', 'bank8FM', 'cpuSm', 'fuelCons',
                     'boston', 'maxTorque', 'machineCpu', 'servo', 'airfoild', 'concreteStrength')
## Number of models whose predictions are stored as the last columns of each data set
nmod <- 4
Run the experiment script from a shell:
Rscript DataSet_Performance.R
Choose the type of error to analyse: absolute (“Absolute”), logarithmic (“Log”) or residual (“Residual”):
err <- "Absolute"
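As a rough illustration of how the three error types differ, here is a minimal sketch. The helper `sketch_error` is hypothetical and is not the function shipped in performance_functions.R. The absolute error matches the values shown in the servo example on this page; the sign convention of the residual (observed minus predicted) and the logarithmic form (`log(1 + |residual|)`) are assumptions made for illustration only.

```r
# Hypothetical helper, NOT the function from performance_functions.R.
sketch_error <- function(observed, predicted,
                         type = c("Absolute", "Log", "Residual")) {
  type <- match.arg(type)
  residual <- observed - predicted          # assumed sign convention
  switch(type,
         Absolute = abs(residual),
         Log      = log1p(abs(residual)),   # assumed definition of the log error
         Residual = residual)
}

# First three servo rows shown on this page (observed target and svm prediction)
observed  <- c(0.5062525, 0.3562515, 5.5000330)
predicted <- c(0.2900076, 0.3578499, 2.3349060)

abs_err <- sketch_error(observed, predicted, "Absolute")
res_err <- sketch_error(observed, predicted, "Residual")
log_err <- sketch_error(observed, predicted, "Log")
print(abs_err)
```

Unlike the absolute error, the residual keeps its sign, so it can show whether a model tends to over- or under-predict in a given region.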
Choose a particular data set from the full benchmark
y_observed <- "servo"
dataset <- DSsPreds[[y_observed]][complete.cases(DSsPreds[[y_observed]]),]
# Show the first rows of the data set
head(dataset, n=10)
##        class motor screw pgain vgain       svm randomForest      nnet       gbm
## 2  0.5062525     B     D     6     5 0.2900076    0.6149509 0.4978377 0.5082631
## 3  0.3562515     D     D     4     3 0.3578499    0.7451938 0.3744596 1.5328608
## 4  5.5000330     B     A     3     2 2.3349060    3.5569774 5.3794819 6.5008834
## 5  0.3562515     D     B     6     5 0.1132223    0.5061218 0.5189971 0.3285793
## 6  0.8062546     E     C     4     3 0.6389633    0.7591382 0.7645144 0.8242099
## 7  5.1000140     C     A     3     2 2.4161065    3.3112188 4.7620321 5.6361385
## 8  5.7000422     A     A     3     2 2.5394272    3.6095884 5.6397336 6.1413950
## 9  0.7687544     C     A     6     5 0.8673949    0.7002441 0.6786782 0.6652546
## 10 1.0312537     D     A     4     1 0.2768152    0.9303311 0.5163344 0.4583367
## 11 0.4687523     B     E     6     5 0.3456481    0.6151508 0.5313446 0.4178387
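The selection step above picks one data frame out of a named list and keeps only its complete rows. A minimal sketch of that pattern on made-up data (the list `toy_benchmark` and its values are invented for illustration and are not part of the benchmark):

```r
# Toy stand-in for DSsPreds: a named list of data frames (values are made up).
toy_benchmark <- list(
  servo = data.frame(class = c(0.51, NA, 5.50),
                     motor = c("B", "D", "B"),
                     svm   = c(0.29, 0.36, 2.33)),
  other = data.frame(class = c(1, 2),
                     svm   = c(1.1, NA))
)

chosen <- "servo"

# complete.cases() returns TRUE for rows without any NA;
# indexing with it drops the incomplete rows.
toy_dataset <- toy_benchmark[[chosen]][complete.cases(toy_benchmark[[chosen]]), ]
print(toy_dataset)
```

Row names are preserved by this kind of filtering, which may be why the listing above starts at row 2 rather than row 1.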
Choose the model to analyse and prepare the data:
mod <- "svm"
single_error_ds <- single_model_data(data=dataset[1:(length(dataset) - nmod)], model=dataset[[mod]], feature_y=names(dataset)[1], type=err)
head(single_error_ds, n=10)
##    motor screw pgain vgain      pred       error
## 2      B     D     6     5 0.2900076 0.216244940
## 3      D     D     4     3 0.3578499 0.001598371
## 4      B     A     3     2 2.3349060 3.165127041
## 5      D     B     6     5 0.1132223 0.243029150
## 6      E     C     4     3 0.6389633 0.167291290
## 7      C     A     3     2 2.4161065 2.683907517
## 8      A     A     3     2 2.5394272 3.160615011
## 9      C     A     6     5 0.8673949 0.098640528
## 10     D     A     4     1 0.2768152 0.754438469
## 11     B     E     6     5 0.3456481 0.123104197
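Judging from the output above, the preparation step drops the target column, keeps the predictors, and appends the chosen model's prediction and its error. The sketch below reproduces that shape for the absolute error only. `sketch_single_model_data` is a hypothetical stand-in, not the implementation of `single_model_data` in performance_functions.R, which may differ in its details.

```r
# Hypothetical stand-in for single_model_data(); absolute error only.
sketch_single_model_data <- function(data, model, feature_y) {
  out <- data[, setdiff(names(data), feature_y), drop = FALSE]  # predictors only
  out$pred  <- model                                            # model predictions
  out$error <- abs(data[[feature_y]] - model)                   # absolute error
  out
}

# First three servo rows shown on this page, with two of the four model columns
toy <- data.frame(
  class = c(0.5062525, 0.3562515, 5.5000330),
  motor = c("B", "D", "B"),
  screw = c("D", "D", "A"),
  pgain = c(6, 4, 3),
  vgain = c(5, 3, 2),
  svm   = c(0.2900076, 0.3578499, 2.3349060),
  gbm   = c(0.5082631, 1.5328608, 6.5008834)
)
n_models <- 2  # number of trailing prediction columns in this toy example

# Same indexing idea as on this page: strip the trailing model columns first
features_and_target <- toy[1:(length(toy) - n_models)]
svm_errors <- sketch_single_model_data(features_and_target,
                                       model = toy[["svm"]],
                                       feature_y = names(toy)[1])
print(svm_errors)
```

Note that `length()` of a data frame is its number of columns, so `1:(length(dataset) - nmod)` selects every column except the last `nmod` prediction columns.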
Choose a predictor and calculate the EDP. The example below uses a numeric feature (“vgain”) and a categorical feature (“screw”), and also shows the bivariate EDP for both features:
## Using freq as weighting variable
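To convey the idea behind these plots without the EDP functions themselves, here is a minimal base R sketch under stated assumptions: the error is summarised per bin of a predictor, using quantile-based bins for a numeric feature and the levels of a categorical one. The data are synthetic, the column names `num_feature` and `cat_feature` are hypothetical, and the choice of four quantile bins is arbitrary. The plots on this page are produced by the functions in performance_functions.R, which may bin and draw things differently.

```r
set.seed(1)

# Synthetic data: one numeric feature, one categorical feature, one error column
toy_errors <- data.frame(
  num_feature = runif(200, min = 1, max = 5),
  cat_feature = factor(sample(c("A", "B", "C", "D", "E"), 200, replace = TRUE),
                       levels = c("A", "B", "C", "D", "E")),
  error       = rexp(200)
)

# Numeric feature: quantile-based bins (quartiles here, an arbitrary choice)
breaks <- unique(quantile(toy_errors$num_feature, probs = seq(0, 1, by = 0.25)))
toy_errors$num_bin <- cut(toy_errors$num_feature, breaks = breaks,
                          include.lowest = TRUE)

# Error distribution per bin of the numeric feature
boxplot(error ~ num_bin, data = toy_errors,
        xlab = "num_feature (quantile bins)", ylab = "error")

# Error distribution per level of the categorical feature
boxplot(error ~ cat_feature, data = toy_errors,
        xlab = "cat_feature", ylab = "error")

# Bivariate view: median error for each (bin, level) combination
median_grid <- with(toy_errors,
                    tapply(error, list(num_bin, cat_feature), median))
print(round(median_grid, 2))
```

Regions where the boxes sit clearly higher than elsewhere point to predictor values for which the model's errors tend to be larger.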