I am checking out the
sigopt.xgboost API. How do I perform cross-validation with it?
I think I see how to do it using sigopt.create_experiment, because that expects training to happen outside of the API call, so I can use a cross-validation technique I am familiar with (e.g. …). But sigopt.create_experiment also requires me to do more work setting up parameter search ranges compared to sigopt.xgboost.
The sigopt.xgboost API was designed to follow the xgboost.train API. If you want to use a custom CV procedure, you will need to use sigopt.create_experiment. As you point out, this requires you to define the parameter space yourself. If you are looking for guidance on how to construct the parameter space, you can reuse the defaults of sigopt.xgboost by accessing constants from the sigopt.xgboost module.
Eddie-SigOpt, thanks for the feedback.
While the ability to do a “custom CV procedure” would be nice, the ability to do any form of cross-validation (or other subsampling/validation technique) seems important. Otherwise it seems like the auto-tuning (HPO) will just maximize the training score, which will likely lead to overfitting. Unless the experiment has some form of model validation happening under the hood that I missed in the documentation?
I guess I am thinking of something like xgboost.cv, except it could auto-tune instead of using a single set of hyperparameters.
Yes, good point. If you do not provide any validation datasets in the
evals argument, you will be optimizing for metrics computed on the training dataset, which increases the likelihood of producing an overfit model. If you include entries in the
evals argument of
xgboost.train, the first entry of the list will be the dataset used for the optimization metric(s).
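To make the evals behavior concrete, here is a hedged sketch. I'm assuming the sigopt.xgboost integration exposes a run call mirroring xgboost.train's signature; the dataset names and the small ordered_evals helper are my own for illustration, not part of the package.

```python
# Sketch: putting the held-out dataset first in `evals` so the
# optimization metric(s) come from validation data, not training data.

def ordered_evals(optimize_on, *others):
    """Hypothetical helper: build an evals list with the dataset used
    for the optimization metric(s) as the first entry. Each item is an
    (xgb.DMatrix, name) pair, as in xgboost.train."""
    return [optimize_on, *others]

def demo_run():
    """Illustrative only -- assumes the sigopt.xgboost `run` call
    mirrors xgboost.train and needs a SigOpt API token to execute."""
    import numpy as np
    import xgboost as xgb
    from sigopt.xgboost import run

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(300, 4)), rng.normal(size=300)
    dtrain = xgb.DMatrix(X[:200], label=y[:200])
    dval = xgb.DMatrix(X[200:], label=y[200:])

    # First entry of evals drives the optimization metric(s).
    run(
        {"objective": "reg:squarederror"},
        dtrain,
        evals=ordered_evals((dval, "val"), (dtrain, "train")),
        num_boost_round=50,
    )
```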
On another note, if you would like to have a chat about the possibility of extending the existing
sigopt.xgboost calls (which work with
xgboost.train) to include
xgboost.cv or more general CV workflows, please reach me at firstname.lastname@example.org and we can schedule some time to discuss.
evals, I forgot about that one; that should definitely improve things over just maximizing the training score. Thanks for the tip.
I think adding CV would be helpful. Either something similar to xgboost.cv, or perhaps allowing a callable to be passed into
evals which could be used to get splits… probably too much of a hack. I’ll reach out if I get a chance.