# Different parameters involved in Bloom Classifier

1. Classifier
1. Logistic regression
1. Equal weight to both classes
2. Decision Tree
1. max. depth of the tree
2. Equal weight to both classes
2. No. of elements used for training the classifiers
3. Size of the bit array
4. Method of Indexing from the probability given by the classifier
5. Dimension of the underlying space

# Output parameters

1. Size of the bit array + Size of the classifier
2. False positive rate/No. of false positive elements

# Experimentation

We have done the following experimentation by varying different parameters:

1. Varying the no. of elements used to train the classifiers, with same Size of the bit array (No. of elements used to train = No. of elements inserted)
2. Different dimensional space : 1D and 2D
3. Varying the size of the bit array with constant no. of elements used to train the classifiers
4. Different Indexing strategies: I1 = floor(p*S) , I2 = floor((p*C)mod S) , p = prob. of element given by classifier, S = Size of bitarray, C = large constant no.

# Some results obtained

## Avg. Probability distribution for Logistic Regression : 1D

No. of elements used to train =400, No. of iterations = 150

## Avg. Probability distribution for Decision Tree : 1D

No. of elements,m =400, No. of iterations = 150, max depth of decision tree = 10

## Observations

1. Probabilities of elements either present in the set or not are very close in logistic regression(i.e, around 0.5) because of which only 2 and 3 indices are set in bitarray,but for false positive elements also it gives same indices. This results in high false positive rate. So, though the size of the classifier + size of the bit array can be very small but false positive rate is very high.
2. In decision tree, as for all elements from 0-1000 and 2000-3000 since probability is zero , there is no false positive in 0-1000 and 2000-3000

# Results with new indexing scheme for logistic regression

New Index = floor( (p*C)mod S ) Old Index = floor( P*S ) , p = prob. of element, C= large constant no. (here = 100000), S = size of bitarray