Random Forest from Scratch

Build a random forest model from scratch with the following conditions:

The model takes as input a dataframe data and an array new_point with length equal to the number of feature columns in data (every column except Target)
All values of both data and new_point are 0 or 1, i.e., all fields are dummy variables and there are only two classes
Rather than randomly choosing which subspace of the data each tree uses, as a standard random forest does, build the forest from one decision tree per permutation of the value columns of the dataframe; each tree goes through its columns in order and splits the data according to the value seen in new_point for that column
Return the majority vote on the class of new_point
You may use pandas and NumPy but NOT scikit-learn

Bonus: The permutations function in the itertools package can help you easily get all permutations of any iterable object.
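
As a quick illustration of the hint, the snippet below lists every ordering of a small set of column names. The three names are hypothetical placeholders, not taken from a real dataset.

```python
from itertools import permutations

# Hypothetical feature names, used only for illustration
columns = ["Var1", "Var2", "Var3"]

# permutations() yields tuples, one per ordering of the input
orders = list(permutations(columns))

print(len(orders))  # 6 orderings, i.e. 3!
print(orders[0])    # ('Var1', 'Var2', 'Var3')
```

The number of orderings grows factorially with the number of columns, so this approach is only practical for a handful of features.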

Example:

Input:

new_point = [0,1,0,1]
print(data)
...
    Var1  Var2  Var3  Var4  Target
0    1.0   1.0   1.0   0.0       1
1    0.0   0.0   0.0   0.0       0
2    1.0   0.0   1.0   0.0       0
3    0.0   1.0   1.0   1.0       1
4    1.0   0.0   1.0   0.0       0
..   ...   ...   ...   ...     ...
95   0.0   1.0   0.0   1.0       0
96   1.0   1.0   0.0   0.0       0
97   0.0   0.0   1.0   1.0       0
98   1.0   0.0   0.0   0.0       0
99   0.0   1.0   0.0   0.0       0

[100 rows x 5 columns]

Output:

def random_forest(new_point, data) -> 0
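
One possible reading of the conditions above can be sketched as follows. This is a minimal sketch, not a reference solution. It rests on several assumptions the problem does not state: the class column is named Target, a tree stops splitting as soon as no rows match, and ties vote for class 0. The helper names tree_vote and forest_vote are hypothetical. The core logic works on plain row dictionaries, and random_forest is a thin wrapper that converts the dataframe with to_dict.

```python
from itertools import permutations


def tree_vote(point, rows, column_order, target):
    """One 'tree': keep only the rows that match the point, column by column."""
    subset = rows
    for col in column_order:
        matching = [r for r in subset if r[col] == point[col]]
        if not matching:
            break  # assumption: stop splitting once no rows match
        subset = matching
    ones = sum(r[target] for r in subset)
    return int(2 * ones > len(subset))  # assumption: ties vote for class 0


def forest_vote(new_point, rows, features, target="Target"):
    """Majority vote over one tree per permutation of the feature columns."""
    point = dict(zip(features, new_point))
    votes = [tree_vote(point, rows, order, target)
             for order in permutations(features)]
    return int(2 * sum(votes) > len(votes))


def random_forest(new_point, data, target="Target"):
    """Thin wrapper for a pandas DataFrame (assumes the class column is `target`)."""
    features = [c for c in data.columns if c != target]
    rows = data.to_dict(orient="records")
    return forest_vote(new_point, rows, features, target)
```

With the four feature columns in the example, the forest contains 4! = 24 trees. The order of the columns only changes a tree's vote when the early stop is triggered, since filtering on all columns gives the same rows regardless of order.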
