Correct me if I'm wrong, but it seems the bayesian_optimization.py optimizer in this library assumes that the sampled points are exact, i.e. that their variance is zero. It doesn't seem to re-sample existing points.
This will cause the algorithm to "chase random noise", as morelandjs wrote below.
I haven't looked at his source, but usually there's a white noise term in the covariance structure of the Gaussian process regressor that adds some statistical uncertainty at each evaluation point. So even when it evaluates a point of parameter space the GPR is still somewhat uncertain about the value of the optimization function at that point. So it should balance exploration versus exploitation taking that statistical uncertainty into account.
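To make the white-noise point concrete, here is a minimal sketch (not this library's code) of the GP posterior variance at a single point that has been evaluated n times. It assumes a zero-mean GP with prior variance `k > 0` at that point and i.i.d. Gaussian observation noise with variance `noise_var`; the function and parameter names are made up for illustration. Under those assumptions the posterior variance has the closed form k * noise_var / (noise_var + n * k).

```python
def posterior_variance_at_sampled_point(k, noise_var, n):
    """Posterior variance of the latent function at a point observed n times.

    Assumes a zero-mean GP with prior variance k > 0 at the point and
    i.i.d. Gaussian observation noise with variance noise_var.
    All names here are illustrative, not taken from any library.
    """
    if n == 0:
        return k  # no data yet: the prior variance is unchanged
    return k * noise_var / (noise_var + n * k)


# Noise-free model: a single evaluation collapses the variance to zero.
exact = posterior_variance_at_sampled_point(k=1.0, noise_var=0.0, n=1)

# Noisy model: the variance stays positive and shrinks as the point is re-sampled.
noisy = [
    posterior_variance_at_sampled_point(k=1.0, noise_var=0.25, n=n)
    for n in (1, 2, 4)
]
```

With `noise_var=0.0` the model treats the first evaluation as exact, which is the behaviour being questioned in this thread. With a positive noise term, each repeat evaluation still reduces the uncertainty, so re-sampling can be worthwhile.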
I would be very disappointed if that were the case... no, it looks like it’s set up to capture variance. The BO algo wraps an “Expected Improvement Optimizer”:
I'm afraid it never samples a point more than once, since it estimates already-sampled points as having zero variance, and therefore zero expected improvement.
IMHO that's wrong. The variance of a single sample should be infinite (classical statistics), or similar to the variance of nearby points (Bayesian + model), or some pre-defined prior (not a great idea... I'd prefer some automatic method). But not zero.
Ah, good catch. So in the event the GPR predicts zero variance, the optimizer says EI is zero and thus won’t sample there again. That may depend on the settings of the GPR: if I’m not mistaken, there are ways for a GPR to model noise and not collapse to zero variance at every sampled point.
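For reference, a rough sketch of why zero predicted variance shuts down re-sampling. This is the textbook expected improvement for maximization under a Gaussian posterior, not necessarily what this optimizer implements; `expected_improvement`, `f_best`, and `xi` are illustrative names, and returning 0 when `sigma` is 0 is a common convention that I'm assuming here.

```python
import math


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """Textbook EI for maximization under a Gaussian posterior N(mu, sigma**2).

    Returns 0 when sigma is 0 (an assumed, commonly used convention).
    f_best is the best value observed so far; xi is an optional margin.
    """
    if sigma <= 0.0:
        return 0.0
    improvement = mu - f_best - xi
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return improvement * cdf + sigma * pdf


# At the current best point, a noise-free GP reports sigma = 0, so EI is 0.
ei_exact = expected_improvement(mu=1.0, sigma=0.0, f_best=1.0)

# If the GP keeps some uncertainty there, EI is positive and the point can be revisited.
ei_noisy = expected_improvement(mu=1.0, sigma=0.2 ** 0.5, f_best=1.0)
```

So whether the optimizer ever revisits a point comes down to whether the GPR is allowed to report a nonzero `sigma` at locations it has already evaluated.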
Anyway, I guess I stand by my original suggestion that BO is the best tool for gradient-free optimization with slow and noisy function evaluations, but to my knowledge nobody has built a way to dial in the hyperparameters automatically. And there are quite a few. Entire companies exist for this purpose; SigOpt comes to mind.