Suggestions for RTree documentation (Patch #1656)
Description
RTree is lacking documentation for some of its parameters. I would like to suggest the following small additions to help beginners use RTree more effectively:
max_depth - the maximum depth of the tree. A low value will likely underfit and, conversely, a high value will likely overfit. The optimal value can be found using cross-validation or other suitable methods.
min_sample_count - the minimum number of samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data, e.g. 1%.
max_categories - appears to be unused (?), as far as I can tell from searching the code.
max_num_of_trees_in_the_forest - the maximum number of trees in the forest (surprise, surprise). Typically, the more trees you have, the better the accuracy. However, the improvement generally diminishes and levels off past a certain number of trees. Also keep in mind that prediction time increases linearly with the number of trees.
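To make the max_depth and min_sample_count tips concrete, here is a rough sketch in plain Python. It makes no OpenCV calls. The function names and the `evaluate` callback are hypothetical, made up for illustration only; `evaluate` is assumed to train a forest with the given max_depth on the training indices and return the error rate on the validation indices.

```python
import random


def suggest_min_sample_count(n_samples, fraction=0.01):
    """Starting value for min_sample_count: a small fraction of the data, at least 1."""
    return max(1, round(n_samples * fraction))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def pick_max_depth(candidates, n_samples, evaluate, k=5):
    """Return the candidate depth with the lowest mean validation error.

    evaluate(depth, train_idx, val_idx) is a hypothetical user-supplied
    callback: it should train with max_depth=depth on train_idx and
    return the error rate measured on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best_depth, best_error = None, float("inf")
    for depth in candidates:
        errors = []
        for i, val_idx in enumerate(folds):
            # Every fold except the i-th one is used for training.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            errors.append(evaluate(depth, train_idx, val_idx))
        mean_error = sum(errors) / len(errors)
        if mean_error < best_error:
            best_depth, best_error = depth, mean_error
    return best_depth
```

In practice the callback would wrap whatever training and prediction calls are being used, and the same loop could be repeated over max_num_of_trees_in_the_forest to see where the accuracy gain levels off.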
Maybe in the future the documentation should contain a section on using some of the machine learning algorithms. Doesn't have to be in depth, just short useful tips and caveats about each algorithm. I think it would be very useful for beginners.
Associated revisions
improved description of RTreeParams (ticket #1656; thanks to Nghia Ho)
History
Updated by Alexander Shishkov almost 13 years ago
- Tracker changed from Feature to Patch
- Target version deleted ()
Updated by Alexander Shishkov almost 13 years ago
- Priority changed from Normal to Low
- Category changed from ml to documentation
Updated by Alexander Shishkov almost 13 years ago
- Assignee deleted (Maria Dimashova)
Updated by Vadim Pisarevsky almost 13 years ago
thanks! the description has been added in SVN trunk, r7661
- Status changed from Open to Done
- Assignee set to Vadim Pisarevsky
Updated by Alexander Shishkov almost 13 years ago
- Target version set to 2.4.0