XGBoost, cross-validation

I am checking out the sigopt.xgboost API. How do I perform cross-validation with it?

I think I see how to do it using sigopt.create_experiment, since that expects training to happen external to the API call, so I can use a cross-validation technique I am familiar with (e.g. sklearn.model_selection.cross_val_score). However, sigopt.create_experiment also requires more work to set up the parameter search ranges compared to sigopt.xgboost.

The sigopt.xgboost API was designed to follow the xgboost.train API. If you want to use a custom CV procedure, you will need to use sigopt.create_experiment. As you point out, this requires you to define the parameter space yourself. If you are looking for guidance on how to construct it, you can reuse the defaults of sigopt.xgboost by accessing the corresponding constants from that module.

Eddie-SigOpt, thanks for the feedback.

While the ability to do a “custom CV procedure” would be nice, the ability to do any form of cross-validation (or other subsampling/validation technique) seems important. Otherwise it seems like the auto-tuning (HPO) will just maximize the training score, which will likely lead to overfitting. Unless the experiment has some form of model validation happening under the hood that I missed in the documentation?

I guess I am thinking of something like xgboost.cv, except it could auto-tune instead of using a single set of hyperparameters.

Yes, good point. If you do not provide any validation datasets in the evals argument, you will be optimizing for metrics computed on the training dataset, which increases the likelihood of producing an overfit model. If you include entries in the evals argument of xgboost.train, the first entry of the list will be the dataset used for the optimization metric(s).

On another note, if you would like to have a chat about the possibility of extending the existing sigopt.xgboost calls (which work with xgboost.train) to include xgboost.cv or more general CV workflows, please reach out to me at eddie.mattia@intel.com and we can schedule some time to discuss.

evals, I forgot about that one; that should definitely improve things over just maximizing the training score. Thanks for the tip.

I think adding CV would be helpful. Either something similar to xgboost.cv, or perhaps allowing a callable to be passed into dtrain or evals which could be used to get splits… probably too much of a hack. I’ll reach out if I get a chance.
Thanks again!
