Other Information

R Version

The R code in the book was executed with the following version of R:

R.version

##                _                           
## platform       x86_64-apple-darwin15.6.0   
## arch           x86_64                      
## os             darwin15.6.0                
## system         x86_64, darwin15.6.0        
## status                                     
## major          3                           
## minor          6.0                         
## year           2019                        
## month          04                          
## day            26                          
## svn rev        76424                       
## language       R                           
## version.string R version 3.6.0 (2019-04-26)
## nickname       Planting of a Tree
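
If you want to see how your own installation compares with the version listed above, a minimal sketch along these lines may help. The variable names `book_r`, `current_r` and `cmp` are ours, introduced only for illustration; `getRversion()` and `compareVersion()` are base R functions.

```r
# R version used to run the book's code (taken from the listing above).
book_r <- "3.6.0"

# Version of the R session running this snippet.
current_r <- as.character(getRversion())

# compareVersion() returns -1, 0 or 1 when the first version is
# older than, equal to, or newer than the second.
cmp <- compareVersion(current_r, book_r)

if (cmp < 0) {
  message("R ", current_r, " is older than the book's R ", book_r)
} else if (cmp == 0) {
  message("R ", current_r, " matches the book's R version")
} else {
  message("R ", current_r, " is newer than the book's R ", book_r)
}
```

A newer version of R does not by itself mean the book's code will fail; the check only tells you where you stand.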

Book Package

The book has an associated R package, DMwR2, which includes several functions and datasets used in the book. Install it to take full advantage of the code and examples shown in the book. It can be installed like any standard R package:

install.packages("DMwR2")

Alternatively, you may wish to install the development version, which may include bug fixes that have not yet been pushed to the central R repository (CRAN). Still, unless you have a good reason to do otherwise, we recommend that you stick to the CRAN version installed as shown above. For further information on the development version (including how to install it), check the web page of the package.

Packages

The book uses many packages. Most of them are developed by others, and it is only natural that their version numbers change over time, either to introduce new features or to fix bugs. If any of these future changes "break" the code shown in the book, we will try to keep on this web page the changes required to what was printed in the book. Still, for your information, these are the packages and versions that were used in the book:

pcks <- c("DMwR2", "ggplot2", "tibble", "readr", "DBI","RMySQL", "readxl", "tidyr", "lubridate", "dplyr", "stringr", "Hmisc", "xts", "sp", "ggmap", "tm", "CORElearn", "GGally", "arules", "arulesViz", "cluster", "fpc", "forcats", "UBL", "e1071", "rpart.plot", "NeuralNetTools", "h2o", "adabag", "ipred", "randomForest", "gbm", "performanceEstimation", "rmarkdown", "shiny", "car", "corrplot", "rpart", "quantmod", "TTR", "nnet", "kernlab", "earth", "PerformanceAnalytics", "ROCR", "RWeka", "Biobase", "ALL", "genefilter", "class")
pcks <- sort(pcks)
# Query the installed packages once and keep only the name and version columns
ip <- installed.packages()
knitr::kable(ip[rownames(ip) %in% pcks, c("Package", "Version")], row.names = FALSE, format = "html")

Package	Version
adabag	4.2
arules	1.6-3
arulesViz	1.3-3
car	3.0-2
class	7.3-15
cluster	2.0.8
CORElearn	1.53.1
corrplot	0.84
DBI	1.0.0
DMwR2	0.0.2
dplyr	0.8.3
e1071	1.7-1
earth	5.1.1
forcats	0.4.0
fpc	2.2-1
gbm	2.1.5
GGally	1.4.0
ggmap	3.0.0
ggplot2	3.1.1
Hmisc	4.2-0
ipred	0.9-9
kernlab	0.9-27
lubridate	1.7.4
NeuralNetTools	1.5.2
nnet	7.3-12
PerformanceAnalytics	1.5.2
performanceEstimation	1.1.0
quantmod	0.4-14
randomForest	4.6-14
readr	1.3.1
readxl	1.3.1
rmarkdown	1.13
ROCR	1.0-7
rpart	4.1-15
rpart.plot	3.0.7
shiny	1.3.2
sp	1.3-1
stringr	1.4.0
tibble	2.1.3
tidyr	0.8.3
tm	0.7-6
TTR	0.23-4
UBL	0.0.6
xts	0.11-2
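
To compare your own installation with the versions in the table above, something like the following sketch could be used. The helper `check_versions()` and the vector `book_versions` are hypothetical names introduced here for illustration; only a few packages from the table are included, and you can extend the vector as needed.

```r
# A few package versions copied from the table above (extend as needed).
book_versions <- c(DMwR2 = "0.0.2", dplyr = "0.8.3",
                   ggplot2 = "3.1.1", tidyr = "0.8.3")

# Hypothetical helper: compares installed versions against expected ones.
# 'expected' is a named character vector (package name -> version string).
check_versions <- function(expected) {
  all_installed <- installed.packages()[, "Version"]
  # Packages that are not installed come back as NA.
  installed <- unname(all_installed[names(expected)])
  status <- vapply(seq_along(expected), function(i) {
    if (is.na(installed[i])) return("not installed")
    switch(as.character(compareVersion(installed[i], expected[[i]])),
           "-1" = "older", "0" = "same", "1" = "newer")
  }, character(1))
  data.frame(Package = names(expected),
             Book = unname(expected),
             Installed = installed,
             Status = status,
             stringsAsFactors = FALSE)
}

print(check_versions(book_versions))
```

The `Status` column describes your installed version relative to the one used in the book.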

Datasets

The datasets containing the data of the case studies are included in the book package. Install the package, load it, and then use the data() function to load them, as shown in the book.
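
As a rough illustration of the steps just described, the sketch below lists the datasets shipped with a package and then loads one of them. The helper `list_datasets()` is a hypothetical name of ours, and the dataset name `algae` is assumed here to be the one used for the Algae Blooms case study; check the listing printed by the first call if it differs in your installed version.

```r
# Hypothetical helper: returns the names of the datasets in a package,
# or an empty character vector when the package is not installed.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed")
    return(character(0))
  }
  unname(data(package = pkg)$results[, "Item"])
}

# Show what the book package provides.
print(list_datasets("DMwR2"))

# Load one dataset (name assumed, see the note above) and inspect it.
if (requireNamespace("DMwR2", quietly = TRUE)) {
  data(algae, package = "DMwR2")
  str(algae)
}
```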

There are also a few parts of the book that involve running code that takes a considerable amount of time to execute (depending on your hardware). In these situations, I have typically mentioned in the book the existence of Rdata files containing the objects with the results of these long-running code snippets. Below you will find these files, which you can download to avoid having to run these parts of the code.

Chapter 3 (Introduction to Data Mining)
- Data set about forest fires loaded on page 71
  - forestFires.txt
Chapter 4 (Algae Blooms case study)
- Text files with the data mentioned on page 195 (please note that it is much easier to load the data directly from the book R package as mentioned in the book)
Chapter 5 (Stock Market case study)
- Files containing the results of the experiments described on pages 275-276, which are loaded on page 278
Chapter 7 (Micro Arrays case study)
- File containing the pre-processed data set loaded on page 376
  - myALL.Rdata
- Files containing the results of the experiments described on page 377 that are loaded on the same page
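
Once one of the Rdata files above has been downloaded to your working directory, it can be read with the base R function `load()`. The sketch below is one cautious way of doing it: the helper `load_results()` is a hypothetical name of ours, and it loads the file into a separate environment so that objects already in your workspace are not overwritten. The book itself may simply call `load()` directly.

```r
# Hypothetical helper: loads an Rdata file and returns its objects as a
# named list, or NULL (with a message) when the file is missing.
load_results <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path, " (download it from this page first)")
    return(invisible(NULL))
  }
  env <- new.env()
  # load() returns the names of the objects it restored.
  loaded <- load(path, envir = env)
  mget(loaded, envir = env)
}

# Example with the pre-processed data set of Chapter 7.
objs <- load_results("myALL.Rdata")
if (!is.null(objs)) print(names(objs))
```

Printing `names(objs)` shows which objects the file contains before you decide to copy any of them into your workspace.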