1. Pre-processing:

  • Tree-based models like xgboost are not sensitive to feature scales, so scaling or normalizing features is usually unnecessary for them. (by SRK link)
  • When label encoding, combine the train and test sets before fitting the encoder so that identical values get the same code in both sets. (by SRK link)
  • Dealing with categorical data: one-hot encoding vs. label encoding. (Stack Exchange)
  • Transform continuous data to discrete data. (In Chinese link)
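
The encoding and discretization tips above can be sketched in plain Python. This is only a minimal illustration, not the method from the linked posts: the toy columns, the bin edges, and the helper names (`fit_label_mapping`, `label_encode`, `one_hot_encode`, `discretize`) are all hypothetical and chosen for the example.

```python
from bisect import bisect_right

# Hypothetical toy data: one categorical column from a train and a test set.
train_city = ["paris", "tokyo", "paris", "lima"]
test_city = ["tokyo", "oslo", "lima"]  # "oslo" never appears in train


def fit_label_mapping(*columns):
    """Build one value -> integer code mapping from the union of all columns."""
    categories = sorted(set().union(*columns))
    return {value: code for code, value in enumerate(categories)}


def label_encode(column, mapping):
    """Replace each value with its integer code."""
    return [mapping[value] for value in column]


def one_hot_encode(column, mapping):
    """Turn each value into a 0/1 vector with a single 1 at its code."""
    width = len(mapping)
    return [[int(mapping[value] == i) for i in range(width)] for value in column]


def discretize(values, edges):
    """Map each continuous value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# Fit on train and test together so both sets share the same codes.
mapping = fit_label_mapping(train_city, test_city)
train_codes = label_encode(train_city, mapping)    # [2, 3, 2, 0]
test_codes = label_encode(test_city, mapping)      # [3, 1, 0]
test_one_hot = one_hot_encode(test_city, mapping)

# Hypothetical bin edges: below 18, 18 to 64, 65 and above.
ages = [3, 17, 18, 42, 70]
age_bins = discretize(ages, edges=[18, 65])        # [0, 0, 1, 1, 2]
```

Had the mapping been fitted on `train_city` alone, encoding `"oslo"` in the test set would raise a `KeyError`, which is the problem the combined fit avoids. With scikit-learn, the same idea amounts to fitting a `LabelEncoder` on the concatenated train and test column.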

2. Models:

3. Cross-validation:

4. Ensembling:

5. Post-processing:

  • Clipping is a simple operation on our predictions where we set a maximum and a minimum certainty. This avoids a very harsh penalty when a confident prediction turns out to be wrong. The better your model gets, the less clipping will help your score. (by Simon link)
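
A rough sketch of clipping, assuming the competition metric is binary log loss (the note above does not name the metric). The bounds 0.05 and 0.95, the toy labels, and the predictions are hypothetical values picked for illustration.

```python
import math


def clip(probabilities, lower=0.05, upper=0.95):
    """Bound each predicted probability to the interval [lower, upper]."""
    return [min(max(p, lower), upper) for p in probabilities]


def log_loss(labels, probabilities):
    """Mean binary log loss; probabilities must lie strictly between 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0, 1, 0]
raw = [0.99, 0.02, 0.001, 0.90]  # the last two are confident and wrong
clipped = clip(raw)              # [0.95, 0.05, 0.05, 0.9]

raw_loss = log_loss(labels, raw)
clipped_loss = log_loss(labels, clipped)
print(f"raw: {raw_loss:.3f}  clipped: {clipped_loss:.3f}")
```

In this toy case the clipped loss is lower because the one extreme mistake (0.001 for a positive label) is pulled up to 0.05. Clipping also slightly raises the loss on confident predictions that were correct, which is why it helps less as the model improves; the bounds are best treated as values to tune on validation data.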


Further Readings:
