I probably won’t cover everything you need to know about multiple metrics, but this post discusses how to frame problems with multiple metrics using SigOpt, from the most basic “smush all metrics together” approach to some experiment types unique to SigOpt. Here’s a summary:
Motivation and running example
Combining all metrics into a single objective
Experiments with metric constraints
Optimization is an excellent tool for decision-making, but we often want to optimize multiple competing objectives. For example, we want our computers and cellphones to be super fast, but the highest performance settings quickly drain the batteries. Optimizing performance while minimizing energy consumption is a common engineering challenge. Similarly, in finance, investors seek to balance the risk and return of their investments (or at least, in theory, they should!). And if we want to get more technical, we could talk about bias/variance or exploitation/exploration tradeoffs in machine learning and artificial intelligence. In any case, if we think about it, virtually any complex problem will require balancing competing goals; after all, there’s no free lunch!
For visualization purposes, let’s consider a simple two-dimensional two-metric problem. Below, we show our metrics, which we will call 𝑔₁ and 𝑔₂, for lack of creativity. For both metrics, we will be interested in minimizing their function values.
Keep in mind that we wouldn’t be able to visualize 𝑔₁ and 𝑔₂ in practice, because they are typically costly to compute.
Both functions are relatively “well-behaved.” Most optimizers should find the individual optimum of each function pretty easily. The lines we drew are called “contour levels,” and loosely speaking, if we could follow them, we would be in excellent shape for finding the optimum values (in blue). The problem is that these metrics compete: the parameters that yield optimum values for 𝑔₁ (e.g., x₁ = 0.5) are very different from those that minimize 𝑔₂ (x₁ = 1).
Combining all metrics into a single objective
A natural approach is to convert these two metrics into a single metric. Specifically, the decision-maker (aka, you) creates a function that balances the two metrics according to the application’s goals. It could be a simple linear combination of the metrics or an elaborate mapping of metrics to a single objective. Still, the idea is simple: we solve one optimization problem instead of multiple. For our problem, we will consider the following function 𝑓:
f(x) = min(g₁, 0.8) + min(g₂, 0.5)
Now, we could try to optimize 𝑓 directly. Below, we show the contour plot of our recently created function 𝑓.
Just looking at the plot above, we can see that our combined function 𝑓 is not as easy to optimize as the individual functions. There is a large portion of the input space where the function values are constant (flat). Additionally, the simple strategy of following the contour levels is not as effective here; we could end up in different locations depending on where we started.
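To make the flat region concrete, here is a minimal sketch of the combined objective. The post does not give formulas for 𝑔₁ and 𝑔₂, so the two quadratics below are hypothetical stand-ins chosen only so that their minima sit at x₁ = 0.5 and x₁ = 1; only the definition of `f` comes from the text.

```python
def g1(x1, x2):
    # Hypothetical stand-in for the first metric; minimum at (0.5, 0.5).
    return 4.0 * ((x1 - 0.5) ** 2 + (x2 - 0.5) ** 2)


def g2(x1, x2):
    # Hypothetical stand-in for the second metric; minimum at (1.0, 0.5).
    return 2.0 * ((x1 - 1.0) ** 2 + (x2 - 0.5) ** 2)


def f(x1, x2):
    # Combined objective from the post: each metric is capped at its threshold.
    return min(g1(x1, x2), 0.8) + min(g2(x1, x2), 0.5)


# Far from both minima, each term sits at its cap, so f takes the same value
# at different points and gives an optimizer nothing to follow.
print(f(0.0, 0.0), f(0.1, 0.05))

# Closer to the minima the caps are inactive and f varies again.
print(f(0.75, 0.5))
```

With these stand-ins, any point where 𝑔₁ ≥ 0.8 and 𝑔₂ ≥ 0.5 maps to the same value of 𝑓, which is exactly the plateau described above.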
Fortunately, SigOpt excels at solving general nonconvex problems, although I would expect any nonconvex optimizer to solve our simple 2D problem. Below we show the outcome of one optimization experiment when we minimize 𝑓.
The white circles show the sampled locations, and the opaque circle is the best parameter configuration. This approach can be very effective if we encode all our problem’s features in our custom objective function. The downside is that it can be challenging to capture everything in a single metric. SigOpt also lets you store metrics, so it is advisable to store the original metrics even if you designed the best function 𝑓 to combine your metrics.
In our example, our function 𝑓 had some minimum operators that effectively worked as constraints to our optimization problem. If we were to examine the original metrics using the samples we acquired during the previous optimization example, we would have the following plot:
where we show the constraints for the function values g₁ = 0.8 and 𝑔₂ = 0.5 in olive. Notice that our second metric 𝑔₂ has very little influence over the optimum location (solid white dot) and the optimum has a 𝑔₂ value above 0.5. That’s probably not what we desired when we decided to take the min(𝑔₂,0.5), but the optimizer did what we asked: found the minimum of 𝑓. Next, let’s see how we can effectively encode constraints to overcome this issue.
Using metric constraints
As we said earlier, terms like min(g₂, 0.5) or min(g₁, 0.8) implicitly define constraints over the metric values. We can explicitly let SigOpt know that we are dealing with two distinct metrics and say: “hey, SigOpt, optimize metric one; and, for the second metric, I only care about values below this value.” Unfortunately, SigOpt is not using GPT-3 yet, so you can’t actually just say that; you have to check the documentation (here) to see the correct commands. Assuming you have done so, here is the outcome for the experiment with metric constraints:
We can see that both metrics significantly impact the optimum location, and our solution satisfies our constraint. Moreover, we didn’t need to develop an elaborate combination of metrics!
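As a rough illustration of what “optimize metric one, constrain metric two” means, the sketch below writes the two metrics down as plain dictionaries and picks the best observation that satisfies the constraint. The field names (`objective`, `strategy`, `threshold`) follow my recollection of SigOpt’s metric constraint documentation and should be checked there; the `best_feasible` helper and the observation values are made up for illustration and are not SigOpt’s implementation or real results.

```python
# Metric definitions in the shape I recall from SigOpt's docs (an assumption).
metrics = [
    {"name": "g1", "objective": "minimize", "strategy": "optimize"},
    {"name": "g2", "objective": "minimize", "strategy": "constraint", "threshold": 0.5},
]

# Made-up observations, purely illustrative.
observations = [
    {"label": "A", "values": {"g1": 0.05, "g2": 0.72}},  # best g1, violates g2 <= 0.5
    {"label": "B", "values": {"g1": 0.21, "g2": 0.48}},
    {"label": "C", "values": {"g1": 0.40, "g2": 0.30}},
]


def best_feasible(observations, metrics):
    """Return the best observation on the optimized metric among feasible ones."""
    optimized = next(m for m in metrics if m["strategy"] == "optimize")
    constraints = [m for m in metrics if m["strategy"] == "constraint"]

    def satisfies(obs, metric):
        value = obs["values"][metric["name"]]
        if metric["objective"] == "minimize":
            return value <= metric["threshold"]
        return value >= metric["threshold"]

    feasible = [o for o in observations if all(satisfies(o, m) for m in constraints)]
    if not feasible:
        return None  # nothing meets the constraints yet
    sign = 1.0 if optimized["objective"] == "minimize" else -1.0
    return min(feasible, key=lambda o: sign * o["values"][optimized["name"]])


print(best_feasible(observations, metrics)["label"])
```

Unlike the capped sum 𝑓, the threshold here is a hard requirement: observation A has the lowest 𝑔₁ but is never reported as the best because it violates the 𝑔₂ constraint.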
What if I don’t have constraints? (multimetric experiment)
In that case, you use a multimetric experiment instead! Multimetric experiments aim to find parameter configurations that “optimize both metrics 𝑔₁ and 𝑔₂.” More precisely, SigOpt will try to find the Pareto-efficient frontier between both metrics. It looks like we are on our way to finding an excellent Pareto frontier for our example:
If we were to get more observations, SigOpt would keep searching for Pareto points (the orange points above). Plotting the same experiment using the same parameter plots as we did before results in the following figure:
In parameter space, the optimal set of parameter configurations for this problem is a line that trades off good values of 𝑔₁ for values of 𝑔₂. Check out the Multimetric experiments documentation to learn more about this experiment type.
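For readers who have not met Pareto efficiency before, here is a small sketch of how the non-dominated points can be picked out of a set of observed (𝑔₁, 𝑔₂) pairs when both metrics are minimized. The sample points are invented for illustration, and this brute-force filter is only a definition check, not how SigOpt searches for the frontier.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (all metrics minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (g1, g2) values, purely illustrative.
points = [(0.1, 0.9), (0.2, 0.5), (0.3, 0.6), (0.5, 0.2), (0.6, 0.4), (0.9, 0.1)]

for p in pareto_front(points):
    print(p)
```

Here (0.3, 0.6) is dropped because (0.2, 0.5) is better on both metrics, and (0.6, 0.4) is dropped because of (0.5, 0.2); the remaining points each trade one metric for the other.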
What if I only have constraints? (all-constraint experiment)
Instead of searching for this line of parameter configurations that trades off 𝑔₁ and 𝑔₂ values, we can ask SigOpt to search for all values that satisfy our metric conditions, for example, performance above some threshold and energy consumption below some level. Our efficient sample experiment for an all-constraint problem will look like the following:
From SigOpt’s perspective, all constraint metrics are active throughout the experiment. That’s why we also show the constraint for 𝑔₂ in the left figure (metric 𝑔₁), using a transparent olive line. This experiment formulation is very useful in design problems where we search for parameter configurations that outperform a previously known design. The goal here is to seek diverse points that satisfy all metric constraints! We give more technical details for all-constraint experiments in this ICML paper or in SigOpt documentation.
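To picture the target of an all-constraint search, the sketch below marks which points of a coarse grid satisfy both thresholds from the running example (𝑔₁ ≤ 0.8 and 𝑔₂ ≤ 0.5). As before, the two quadratics and the unit-square domain are hypothetical stand-ins, since the post does not define them, and a real experiment would never get to evaluate a full grid of costly metrics.

```python
def g1(x1, x2):
    # Hypothetical stand-in for the first metric; minimum at (0.5, 0.5).
    return 4.0 * ((x1 - 0.5) ** 2 + (x2 - 0.5) ** 2)


def g2(x1, x2):
    # Hypothetical stand-in for the second metric; minimum at (1.0, 0.5).
    return 2.0 * ((x1 - 1.0) ** 2 + (x2 - 0.5) ** 2)


def is_feasible(x1, x2, g1_max=0.8, g2_max=0.5):
    # A point counts only if it meets every metric constraint at once.
    return g1(x1, x2) <= g1_max and g2(x1, x2) <= g2_max


# Coarse grid over an assumed [0, 1] x [0, 1] domain.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
feasible_points = [p for p in grid if is_feasible(*p)]

print(len(feasible_points), "of", len(grid), "grid points satisfy both constraints")
```

Every point in `feasible_points` is an equally valid answer here, which is why diversity among the returned configurations matters more than any single best value.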