Title: | Transductive and Inductive Conformal Predictions for Classification Problems |
---|---|
Description: | Implementation of transductive conformal prediction (see Vovk, 2013, <doi:10.1007/978-3-642-41142-7_36>) and inductive conformal prediction (see Balasubramanian et al., 2014, ISBN:9780124017153) for classification problems. |
Authors: | Niharika Gauraha and Ola Spjuth |
Maintainer: | Niharika Gauraha <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-03 03:19:03 UTC |
Source: | https://github.com/cran/conformalClassification |
The conformalClassification package implements Transductive Conformal Prediction (TCP) and Inductive Conformal Prediction (ICP) for classification problems.
Currently, the package is built upon the random forests method, where the votes of the random forest for each class are used as the conformity scores for each data point. The package mainly generates conformal prediction errors (p-values) for classification problems; it also provides various diagnostic measures such as deviation from validity, error rate, efficiency, observed fuzziness and calibration plots. In future releases, we plan to extend the package to use other machine learning algorithms (e.g. support vector machines) for model fitting.
Draws the calibration plot
CPCalibrationPlot(pValues, testSet, color = "blue")
testSet | The test set |
color | colour of the calibration line |
pValues | Matrix of p-values |
CPEfficiency, CPErrorRate, CPValidity, CPObsFuzziness.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    CPCalibrationPlot(pValues, testSet, "blue")
Computes the efficiency of a conformal predictor, defined as the ratio of the number of predictions containing more than one class to the size of the test set
CPEfficiency(matPValues, testLabels, sigfLevel = 0.05)
matPValues | Matrix of p-values |
testLabels | True labels for the test-set |
sigfLevel | Significance level |
The efficiency
CPCalibrationPlot, CPErrorRate, CPValidity, CPObsFuzziness.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    testLabels = testSet[,1]
    CPEfficiency(pValues, testLabels)
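The efficiency definition above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper name `sketch_efficiency` and the toy matrix are hypothetical, and the rule that a class belongs to the prediction region when its p-value is strictly greater than the significance level is an assumption.

```r
# Hypothetical helper (not part of conformalClassification): fraction of
# test examples whose prediction region contains more than one class.
sketch_efficiency <- function(matPValues, sigfLevel = 0.05) {
  # Assumption: a class is in the prediction region when its p-value
  # exceeds the significance level.
  regionSize <- rowSums(matPValues > sigfLevel)
  mean(regionSize > 1)
}

# Toy p-values: 4 test examples (rows), 3 classes (columns)
toyPValues <- matrix(c(0.80, 0.02, 0.01,
                       0.30, 0.40, 0.03,
                       0.01, 0.02, 0.90,
                       0.10, 0.20, 0.60),
                     nrow = 4, byrow = TRUE)

effToy <- sketch_efficiency(toyPValues, sigfLevel = 0.05)
print(effToy)
```

Under this definition a smaller value is better, since it means fewer test examples receive an ambiguous, multi-class prediction.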
Computes the error rate of a conformal predictor, defined as the ratio of the number of predictions that miss the true class label to the size of the test set
CPErrorRate(matPValues, testLabels, sigfLevel = 0.05)
matPValues | Matrix of p-values |
testLabels | True labels for the test-set |
sigfLevel | Significance level |
The error rate
CPCalibrationPlot, CPEfficiency, CPValidity, CPObsFuzziness.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    testLabels = testSet[,1]
    CPErrorRate(pValues, testLabels)
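As a rough illustration of the error-rate definition above, the following base-R sketch counts how often the true class falls outside the prediction region. The helper name `sketch_error_rate` and the toy data are hypothetical, and treating a p-value less than or equal to the significance level as an exclusion is an assumption, not a statement about the package's code.

```r
# Hypothetical helper (not part of conformalClassification): fraction of
# test examples whose true class is missing from the prediction region.
sketch_error_rate <- function(matPValues, testLabels, sigfLevel = 0.05) {
  # Pick the p-value of the true class for every test example
  # (labels are assumed to be the column indices 1, 2, ...).
  idx <- cbind(seq_len(nrow(matPValues)), testLabels)
  truePValues <- matPValues[idx]
  # Assumption: the true class is excluded when its p-value does not
  # exceed the significance level.
  mean(truePValues <= sigfLevel)
}

# Toy p-values: 4 test examples (rows), 3 classes (columns)
toyPValues <- matrix(c(0.80, 0.02, 0.01,
                       0.30, 0.40, 0.03,
                       0.01, 0.02, 0.90,
                       0.10, 0.20, 0.60),
                     nrow = 4, byrow = TRUE)
toyLabels <- c(1, 3, 3, 1)

errToy <- sketch_error_rate(toyPValues, toyLabels, sigfLevel = 0.05)
print(errToy)
```

For a valid conformal predictor the error rate is expected to stay close to the chosen significance level.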
Computes observed fuzziness, which is defined as the sum of all p-values for the incorrect class labels.
CPObsFuzziness(matPValues, testLabels)
matPValues | Matrix of p-values |
testLabels | True labels for the test-set |
The observed fuzziness
CPCalibrationPlot, CPEfficiency, CPErrorRate, CPValidity.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    testLabels = testSet[,1]
    CPObsFuzziness(pValues, testLabels)
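The observed fuzziness described above can be sketched in base R as follows. The helper name `sketch_obs_fuzziness` and the toy data are hypothetical; the sketch follows the wording of the description literally (a plain sum), and whether the package additionally normalises by the number of test examples is not covered here.

```r
# Hypothetical helper (not part of conformalClassification): sum of the
# p-values assigned to the incorrect class labels.
sketch_obs_fuzziness <- function(matPValues, testLabels) {
  # Positions of the true-class p-values (labels assumed to be 1, 2, ...)
  idx <- cbind(seq_len(nrow(matPValues)), testLabels)
  # Total of all p-values minus the p-values of the true classes
  sum(matPValues) - sum(matPValues[idx])
}

# Toy p-values: 4 test examples (rows), 3 classes (columns)
toyPValues <- matrix(c(0.80, 0.02, 0.01,
                       0.30, 0.40, 0.03,
                       0.01, 0.02, 0.90,
                       0.10, 0.20, 0.60),
                     nrow = 4, byrow = TRUE)
toyLabels <- c(1, 3, 3, 1)

fuzzToy <- sketch_obs_fuzziness(toyPValues, toyLabels)
print(fuzzToy)
```

Lower values indicate that the incorrect labels receive small p-values, which is the desired behaviour.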
Computes the deviation from exact validity as the Euclidean norm of the difference between the observed error and the expected error
CPValidity(matPValues = NULL, testLabels = NULL)
matPValues | Matrix of p-values |
testLabels | True labels for the test-set |
The deviation from exact validity
CPCalibrationPlot, CPEfficiency, CPErrorRate, CPObsFuzziness.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    testLabels = testSet[,1]
    CPValidity(pValues, testLabels)
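One way to read the deviation-from-validity measure above is sketched below in base R. The description does not say over which significance levels the observed and expected errors are compared, so the grid `sigfLevels` is an assumption, as are the helper name `sketch_validity` and the toy data. The expected error at each significance level is taken to be the level itself.

```r
# Hypothetical helper (not part of conformalClassification): Euclidean norm
# of (observed error - expected error) over a grid of significance levels.
sketch_validity <- function(matPValues, testLabels,
                            sigfLevels = seq(0.01, 0.99, by = 0.01)) {
  # p-value of the true class for every test example
  idx <- cbind(seq_len(nrow(matPValues)), testLabels)
  truePValues <- matPValues[idx]
  # Observed error at each level: share of true classes that are excluded
  observedError <- vapply(sigfLevels,
                          function(eps) mean(truePValues <= eps),
                          numeric(1))
  # Assumption: the expected error of an exactly valid predictor equals
  # the significance level.
  sqrt(sum((observedError - sigfLevels)^2))
}

# Toy p-values: 4 test examples (rows), 3 classes (columns)
toyPValues <- matrix(c(0.80, 0.02, 0.01,
                       0.30, 0.40, 0.03,
                       0.01, 0.02, 0.90,
                       0.10, 0.20, 0.60),
                     nrow = 4, byrow = TRUE)
toyLabels <- c(1, 3, 3, 1)

# Two-point grid keeps the arithmetic easy to follow by hand
devToy <- sketch_validity(toyPValues, toyLabels, sigfLevels = c(0.05, 0.5))
print(devToy)
```

A value near zero means the observed error tracks the significance level closely across the grid.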
Fits the model and returns the fitted model
fitModel(trainingSet=NULL, method = "rf", nrTrees = 100)
trainingSet | The training set |
method | Method for modeling |
nrTrees | Number of trees for RF |
The fitted model
Class-conditional inductive conformal classifier for multi-class problems
ICPClassification(trainingSet, testSet, ratioTrain = 0.7, method = "rf", nrTrees = 100)
trainingSet | Training set |
testSet | Test set |
ratioTrain | The ratio for proper training set |
method | Method for modeling |
nrTrees | Number of trees for RF |
The p-values
TCPClassification, parTCPClassification.
    ## load the library
    library(mlbench)
    #library(caret)
    library(conformalClassification)

    ## load the DNA dataset
    data(DNA)
    originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    nrAttr = ncol(originalData) #no of attributes
    tempColumn = originalData[, 1]
    originalData[, 1] = originalData[, nrAttr]
    originalData[, nrAttr] = tempColumn
    originalData[, 1] = as.factor(originalData[, 1])
    originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    size = nrow(originalData)
    result = sample(1:size, 0.8*size)
    trainingSet = originalData[result, ]
    testSet = originalData[-result, ]

    ##ICP classification
    pValues = ICPClassification(trainingSet, testSet)
    #perfValues = pValues2PerfMetrics(pValues, testSet)
    #print(perfValues)
    #CPCalibrationPlot(pValues, testSet, "blue")
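The class-conditional inductive step can be sketched in base R once conformity scores are available. This is a minimal illustration, not the package's code: the helper name `sketch_icp_pvalues`, its arguments, and the toy score matrices are hypothetical. The package derives conformity scores from random forest votes on a calibration part of the training set (controlled by `ratioTrain`); here the scores are simply passed in as matrices with one column per class.

```r
# Hypothetical helper (not part of conformalClassification).
# calibScores: n_cal x K matrix of conformity scores for calibration examples
# calibLabels: true labels (1, ..., K) of the calibration examples
# testScores:  n_test x K matrix of conformity scores for test examples
sketch_icp_pvalues <- function(calibScores, calibLabels, testScores) {
  nrClasses <- ncol(testScores)
  pValues <- matrix(NA_real_, nrow = nrow(testScores), ncol = nrClasses)
  for (k in seq_len(nrClasses)) {
    # Class-conditional: use only calibration examples whose true class is k
    classScores <- calibScores[calibLabels == k, k]
    for (i in seq_len(nrow(testScores))) {
      # Share of class-k calibration scores that are no larger than the
      # test score; the +1 terms count the test example itself.
      pValues[i, k] <- (sum(classScores <= testScores[i, k]) + 1) /
        (length(classScores) + 1)
    }
  }
  pValues
}

# Toy conformity scores: 6 calibration examples, 2 test examples, 2 classes
calibScores <- matrix(c(0.9, 0.1,
                        0.7, 0.3,
                        0.6, 0.4,
                        0.2, 0.8,
                        0.4, 0.6,
                        0.1, 0.9),
                      ncol = 2, byrow = TRUE)
calibLabels <- c(1, 1, 1, 2, 2, 2)
testScores <- matrix(c(0.8, 0.2,
                       0.3, 0.7),
                     ncol = 2, byrow = TRUE)

icpToy <- sketch_icp_pvalues(calibScores, calibLabels, testScores)
print(icpToy)
```

Each row of the result holds one p-value per candidate class, which is the shape the diagnostic functions above take as `matPValues`.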
Class-conditional transductive conformal classifier for multi-class problems, with parallel computation
parTCPClassification(trainSet, testSet, method = "rf", nrTrees = 100, nrClusters = 12)
testSet | Test set |
method | Method for modeling |
nrTrees | Number of trees for RF |
nrClusters | Number of clusters |
trainSet | Training set |
The p-values
TCPClassification, ICPClassification.
    ## load the library
    #library(mlbench)
    #library(caret)
    #library(conformalClassification)

    ## load the DNA dataset
    #data(DNA)
    #originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    #nrAttr = ncol(originalData) #no of attributes
    #tempColumn = originalData[, 1]
    #originalData[, 1] = originalData[, nrAttr]
    #originalData[, nrAttr] = tempColumn
    #originalData[, 1] = as.factor(originalData[, 1])
    #originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    #trainingSet = originalData[result, ]
    #testSet = originalData[-result, ]

    ##parallel TCP classification
    #pValues = parTCPClassification(trainingSet, testSet)
    #perfValues = pValues2PerfMetrics(pValues, testSet)
    #print(perfValues)
    #CPCalibrationPlot(pValues, testSet, "blue")
    #not run
Class-conditional transductive conformal classifier for multi-class problems
TCPClassification(trainSet, testSet, method = "rf", nrTrees = 100)
testSet | Test set |
method | Method for modeling |
nrTrees | Number of trees for RF |
trainSet | Training set |
The p-values
parTCPClassification, ICPClassification.
    ## load the library
    #library(mlbench)
    #library(caret)
    #library(conformalClassification)

    ## load the DNA dataset
    #data(DNA)
    #originalData <- DNA

    ## make sure first column is always the label and class labels are always 1, 2, ...
    #nrAttr = ncol(originalData) #no of attributes
    #tempColumn = originalData[, 1]
    #originalData[, 1] = originalData[, nrAttr]
    #originalData[, nrAttr] = tempColumn
    #originalData[, 1] = as.factor(originalData[, 1])
    #originalData[, 1] = as.numeric(originalData[, 1])

    ## partition the data into training and test set
    #result = createDataPartition(originalData[, 1], p = 0.8, list = FALSE)
    #trainingSet = originalData[result, ]
    #testSet = originalData[-result, ]

    ##reduce the size of the training set, because TCP is slow
    #result = createDataPartition(trainingSet[, 1], p=0.8, list=FALSE)
    #trainingSet = trainingSet[-result, ]

    ##TCP classification
    #pValues = TCPClassification(trainingSet, testSet)
    #perfValues = pValues2PerfMetrics(pValues, testSet)
    #print(perfValues)
    #CPCalibrationPlot(pValues, testSet, "blue")
    #not run
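The transductive procedure can be sketched in base R as a loop over test examples and candidate labels, refitting on an augmented training set each time, which is also why TCP is slow. This is an illustration under stated assumptions, not the package's implementation: the helper name `sketch_tcp_pvalues` is hypothetical, the inputs are numeric matrices with the label in the first column, and the random forest votes are replaced by a much simpler swapped-in conformity measure, the negative distance to the class mean.

```r
# Hypothetical helper (not part of conformalClassification).
# trainSet, testSet: numeric matrices, first column = class label (1, 2, ...),
# remaining columns = features.
sketch_tcp_pvalues <- function(trainSet, testSet) {
  classes <- sort(unique(trainSet[, 1]))
  pValues <- matrix(NA_real_, nrow = nrow(testSet), ncol = length(classes))
  for (i in seq_len(nrow(testSet))) {
    for (k in seq_along(classes)) {
      # Augment the training set with the test example, tentatively
      # labelled with the candidate class
      candidate <- testSet[i, , drop = FALSE]
      candidate[1, 1] <- classes[k]
      augSet <- rbind(trainSet, candidate)
      # Class-conditional: keep only examples carrying the candidate label;
      # the test example is the last row
      sameClass <- augSet[augSet[, 1] == classes[k], -1, drop = FALSE]
      # Swapped-in conformity measure (assumption): negative Euclidean
      # distance to the class mean, so larger means more conforming
      centroid <- colMeans(sameClass)
      scores <- -sqrt(rowSums(sweep(sameClass, 2, centroid)^2))
      # Share of same-class examples that conform no better than the test one
      pValues[i, k] <- mean(scores <= scores[length(scores)])
    }
  }
  pValues
}

# Toy data: one feature, two well-separated classes
toyTrain <- cbind(label = c(1, 1, 1, 2, 2, 2),
                  x = c(0, 1, 2, 10, 11, 12))
toyTest <- cbind(label = 1, x = 1)

tcpToy <- sketch_tcp_pvalues(toyTrain, toyTest)
print(tcpToy)
```

In the toy run the test point sits in the middle of class 1, so it receives a large p-value for class 1 and a small one for class 2.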
Fits the model and computes p-values
tcpPValues(augTrainSet, method = "rf", nrTrees = 100)
augTrainSet | Augmented training set |
method | Method for modeling |
nrTrees | Number of trees for RF |
The p-values