EDPs: a Visual Interpretation of Regression Error

Inês Areosa, Luís Torgo

This page contains the code, data and resulting plots of the experiments in the EPIA 2019 Paper: Visual Interpretation of Regression Error.

First, we explain how the full experiments can be reproduced, followed by an example of the EDP tool.

Finally, we provide a zip file with the plots for all 18 data sets, which were not included in the paper due to space limitations.

Requirements

R is required to reproduce the experiments. The following additional R packages, available on CRAN, are also needed:

dplyr
GGally
plyr
gridExtra
reshape2
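
Before running the scripts, it can help to check which of these packages are already installed. The following is a minimal sketch using base R only; the variable names `required`, `installed` and `missing_pkgs` are illustrative, and the install call is left commented out so that nothing is installed unintentionally.

```r
# Packages listed in the Requirements section
required <- c("dplyr", "GGally", "plyr", "gridExtra", "reshape2")

# requireNamespace() returns FALSE (rather than failing) when a package is absent
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```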

Usage

Tested on R 3.5.2

The full code and the data sets can be downloaded here. The resulting plots can be downloaded here.

Load Functions and Dataset

## Loading functions required (as well as required packages)
source("performance_functions.R")

## Loading Dataset
load("dataSetsWithPreds.Rdata")
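## Name the 18 benchmark data sets
names(DSsPreds) <- c('a1','a2','a3','a4','a6','a7','Abalone','acceleration','availPwr','bank8FM','cpuSm','fuelCons','boston','maxTorque','machineCpu','servo','airfoild','concreteStrength')
## Number of models; their predictions are the last nmod columns of each data set
nmod <- 4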

Run the Experiments for the Full Benchmark Dataset

Rscript DataSet_Performance.R

An Example on How to Employ EDP Functions

Choose the type of error to analyse: “Absolute”, logarithmic (“Log”) or “Residual”.

err <- "Absolute"

Choose a particular data set from the full benchmark

y_observed <- "servo"
dataset <- DSsPreds[[y_observed]][complete.cases(DSsPreds[[y_observed]]),]
## Show the first 10 rows of the data set
head(dataset, n=10)

##        class motor screw pgain vgain       svm randomForest      nnet       gbm
## 2  0.5062525     B     D     6     5 0.2900076    0.6149509 0.4978377 0.5082631
## 3  0.3562515     D     D     4     3 0.3578499    0.7451938 0.3744596 1.5328608
## 4  5.5000330     B     A     3     2 2.3349060    3.5569774 5.3794819 6.5008834
## 5  0.3562515     D     B     6     5 0.1132223    0.5061218 0.5189971 0.3285793
## 6  0.8062546     E     C     4     3 0.6389633    0.7591382 0.7645144 0.8242099
## 7  5.1000140     C     A     3     2 2.4161065    3.3112188 4.7620321 5.6361385
## 8  5.7000422     A     A     3     2 2.5394272    3.6095884 5.6397336 6.1413950
## 9  0.7687544     C     A     6     5 0.8673949    0.7002441 0.6786782 0.6652546
## 10 1.0312537     D     A     4     1 0.2768152    0.9303311 0.5163344 0.4583367
## 11 0.4687523     B     E     6     5 0.3456481    0.6151508 0.5313446 0.4178387
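
The selection step above does two things: it picks one data frame from the `DSsPreds` list by name, and it drops rows that contain missing values. A minimal sketch of the same pattern on made-up data follows; `toy_list` and its values are invented for illustration and are not part of the benchmark.

```r
# Toy stand-in for DSsPreds: a named list of data frames (values are invented)
toy_list <- list(
  toyA = data.frame(class = c(1.0, 2.0, NA, 4.0),
                    x     = c(0.1, NA, 0.3, 0.4),
                    svm   = c(1.1, 1.9, 3.2, 3.8))
)

# List the available data set names, then select one by name
names(toy_list)
chosen <- toy_list[["toyA"]]

# complete.cases() flags rows without any NA; keep only those rows
toy_clean <- chosen[complete.cases(chosen), ]
toy_clean
```

Row names are preserved after filtering, so gaps in the row numbering of a filtered data set may simply indicate rows that were dropped.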

Choose which model is to be analysed and prepare the data

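mod <- "svm"
## Pass the target and predictors (all but the last nmod columns)
## together with the predictions of the chosen model
single_error_ds <- single_model_data(data=dataset[1:(length(dataset) - nmod)],
                                     model=dataset[[mod]],
                                     feature_y=names(dataset)[1],
                                     type=err)
head(single_error_ds, n=10)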

##    motor screw pgain vgain      pred       error
## 2      B     D     6     5 0.2900076 0.216244940
## 3      D     D     4     3 0.3578499 0.001598371
## 4      B     A     3     2 2.3349060 3.165127041
## 5      D     B     6     5 0.1132223 0.243029150
## 6      E     C     4     3 0.6389633 0.167291290
## 7      C     A     3     2 2.4161065 2.683907517
## 8      A     A     3     2 2.5394272 3.160615011
## 9      C     A     6     5 0.8673949 0.098640528
## 10     D     A     4     1 0.2768152 0.754438469
## 11     B     E     6     5 0.3456481 0.123104197
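
Judging from the output above, the prepared data keeps the predictors, drops the target, and adds the model's predictions (`pred`) together with the chosen error (`error`). With `err <- "Absolute"`, the `error` values match |class - pred|, e.g. |0.5062525 - 0.2900076| ≈ 0.2162449 for row 2. The following is a rough sketch of that behaviour under those assumptions. `sketch_single_model_data` is a hypothetical stand-in, not the function defined in performance_functions.R, and the sign convention used for "Residual" (observed minus predicted) is assumed. The "Log" type is left out because its definition is not given here.

```r
# Hypothetical stand-in for single_model_data(); inferred from the printed output only
sketch_single_model_data <- function(data, model, feature_y,
                                     type = c("Absolute", "Residual")) {
  type <- match.arg(type)
  residual <- data[[feature_y]] - model   # assumed sign: observed minus predicted
  out <- data[, setdiff(names(data), feature_y), drop = FALSE]  # keep predictors only
  out$pred  <- model
  out$error <- switch(type, Absolute = abs(residual), Residual = residual)
  out
}

# First three servo rows shown above (target, predictors, svm predictions)
toy <- data.frame(class = c(0.5062525, 0.3562515, 5.5000330),
                  motor = c("B", "D", "B"),
                  screw = c("D", "D", "A"),
                  pgain = c(6, 4, 3),
                  vgain = c(5, 3, 2))
svm_pred <- c(0.2900076, 0.3578499, 2.3349060)

res <- sketch_single_model_data(toy, svm_pred, feature_y = "class", type = "Absolute")
res
```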

Choose a predictor and calculate the EDP. The example below uses a numeric predictor (“vgain”) and a categorical predictor (“screw”), and also shows the bivariate EDP for both features:

## Using freq as weighting variable