# How to Tune a Random Forest Classifier

## 1. Description of the Abalone data set

http://archive.ics.uci.edu/ml/datasets/Abalone

The data set contains 4177 records in total. Each record has 8 feature values and one target value (the number of rings, which indicates the abalone's age), so the task can be treated as either a classification problem or a regression problem.

Attribute descriptions and a partial view of the data:

| Name | Data type | Unit | Description |
|---|---|---|---|
| Sex | nominal | - | M, F, and I (infant) |
| Length | continuous | mm | longest shell measurement |
| Diameter | continuous | mm | perpendicular to length |
| Height | continuous | mm | with meat in shell |
| Whole weight | continuous | grams | whole abalone |
| Shucked weight | continuous | grams | weight of meat |
| Viscera weight | continuous | grams | gut weight (after bleeding) |
| Shell weight | continuous | grams | after being dried |
| Rings | integer | - | +1.5 gives the age in years |

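| Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|
| M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |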
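As a rough illustration, records like the five above could be parsed with the Python standard library as sketched below. The sketch assumes the raw file is comma-separated, one record per line; `COLUMNS`, `SAMPLE`, and `parse_abalone` are hypothetical names introduced only for this example. The `Age` field applies the "+1.5" rule from the attribute description.

```python
import csv
import io

# Column names taken from the attribute description above.
COLUMNS = ["Sex", "Length", "Diameter", "Height", "Whole weight",
           "Shucked weight", "Viscera weight", "Shell weight", "Rings"]

# The five sample records shown above, assumed here to be comma-separated.
SAMPLE = """\
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
"""


def parse_abalone(text):
    """Parse comma-separated abalone records into a list of dicts."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        rec = {"Sex": row[0]}
        # The seven continuous features sit between Sex and Rings.
        for name, value in zip(COLUMNS[1:-1], row[1:-1]):
            rec[name] = float(value)
        rec["Rings"] = int(row[-1])
        rec["Age"] = rec["Rings"] + 1.5  # rings + 1.5 gives the age in years
        records.append(rec)
    return records


records = parse_abalone(SAMPLE)
for rec in records:
    print(rec["Sex"], rec["Rings"], rec["Age"])
```

Using `Rings` as the label gives the classification view of the problem, while using `Rings` (or `Age`) as a numeric value gives the regression view.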

To make the program easier to write, I replaced M, F, and I in the data file with 1, 2, and 3, respectively. (Note: why do this? It is purely for programming convenience. Some machine learning libraries run into problems when they read non-numeric values directly.)
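The replacement described above could be done in a few lines rather than by hand. This is only a sketch under the assumption that each record is a comma-separated line whose first field is the sex letter; `SEX_CODES` and `encode_sex` are hypothetical names, not part of any library.

```python
# Mapping used in this post: M -> 1, F -> 2, I -> 3.
SEX_CODES = {"M": "1", "F": "2", "I": "3"}


def encode_sex(line):
    """Replace the leading sex letter of one comma-separated record with its code."""
    sex, sep, rest = line.partition(",")
    return SEX_CODES[sex.strip()] + sep + rest


lines = [
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9",
    "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7",
]
encoded = [encode_sex(line) for line in lines]
print("\n".join(encoded))
```

Note that the codes 1, 2, and 3 are arbitrary labels; the numeric order between them carries no meaning.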

We can observe how the different methods behave by controlling the number of records read from the data set.
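One way to set up such a comparison is sketched below with scikit-learn. The three methods are not named in this section, so the choice of a decision tree, a random forest, and extra trees is an assumption, as are the function name `compare_methods`, the 3-fold cross-validation, and the 10 trees per ensemble. The data here is a synthetic stand-in so the snippet runs on its own; its scores say nothing about the abalone results.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_methods(X, y, n_records, cv=3, seed=0):
    """Mean cross-validated accuracy of each method on the first n_records rows."""
    X_sub, y_sub = X[:n_records], y[:n_records]
    # Assumed set of three methods; swap in whichever classifiers are being compared.
    models = {
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "random forest": RandomForestClassifier(n_estimators=10, random_state=seed),
        "extra trees": ExtraTreesClassifier(n_estimators=10, random_state=seed),
    }
    return {name: cross_val_score(model, X_sub, y_sub, cv=cv).mean()
            for name, model in models.items()}


# Synthetic stand-in: 300 rows, 8 features, 3 classes (NOT the abalone data).
rng = np.random.default_rng(0)
X = rng.random((300, 8))
y = rng.integers(0, 3, size=300)

for n in (30, 100, 300):
    scores = compare_methods(X, y, n)
    print(n, {name: round(score, 3) for name, score in scores.items()})
```

With the real data, `X` would hold the 8 feature columns, `y` the ring counts, and `n_records` would take the values 10, 100, 1000, 3000, and 4177.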

Results when reading 10 records:

Results when reading 100 records:

Results when reading 1000 records:

Results when reading 3000 records:

Results when reading all 4177 records:

Judging only from this experiment, the random forest gives the best classification results, but it is relatively time-consuming to compute. As the amount of training data increases, the results of all three methods tend to get worse.