speeding up constrained optimization (2024)


Sargondjani

2012-03-28 06:17:12 UTC

Hi,
I have been struggling for a long time to make my optimization faster. I have to optimize over two variables (both > 0) for a list of data points, and that for many periods. I use fmincon, which can only return a scalar, so I use a for loop over the data points, which slows things down...

Isn't there another way to do this? Is there a way to vectorize it, or is parallel computation as good as it gets?

Any help will be greatly appreciated! :-)

Bruno Luong

2012-03-28 07:20:17 UTC

Post by Sargondjani
I use fmincon, which can only return a scalar

Untrue.

Post by Sargondjani
Isn't there another way to do this? Is there a way to vectorize it, or is parallel computation as good as it gets?

You should describe the problem you want to solve in more detail.

Bruno

Sargondjani

2012-03-28 09:17:13 UTC

datapoints (ix) in grid: X(1:n)
periods (it): 1:T
two statuses: iz=1 or iz=2
For each grid point, period, and status I store the optimal value: V (dimension: n,T,2)

The value is calculated something like this:

f_V=@(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;

This value has to be maximized, for which I use fmincon. I loop:

for it=1:T
    for iz=1:2
        for ix=1:length(X)
            % optimize:
            [K(it,ix,iz), E(it,ix,iz)] = fmincon(-f_V, ...)
        end
    end
end

So what do I do? (Should I pass the grid X as a vector of parameters?)
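For what it's worth, here is a minimal sketch of one such inner call, with the two variables packed into a single vector so fmincon can optimize them together. The start value and the simple lower bounds are illustrative assumptions, not from the post:

obj = @(z) -f_V(z(1), z(2));              % fmincon minimizes, so negate the value
z0  = [X(ix); 1];                         % assumed start value for [K; E]
lb  = [0; 0];                             % K >= 0, E >= 0
z   = fmincon(obj, z0, [], [], [], [], lb, [], []);
K(it,ix,iz) = z(1);
E(it,ix,iz) = z(2);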

Matt J

2012-03-28 13:09:18 UTC

Post by Sargondjani
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;

First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.

Second, you should describe your problem more exactly. The version above simplifies to something trivial,

f_V=@(K,E) f(X(ix)-K) + interp1(X, V(:,it-1,1), K) - E

The dependence on E is entirely through the final term and doesn't interact with the data at all. So setting E to Emin, its constrained minimum value, is always optimal, and the problem reduces further to the one-parameter problem

f_V=@(K) f(X(ix)-K) + interp1(X, V(:,it-1,1), K)
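To make that concrete, here is a minimal sketch of the reduced one-parameter problem with a smooth interpolant, so the objective handed to fmincon is differentiable. The start value and the lower bound of 0 on K are assumptions for illustration:

f_V  = @(K) f(X(ix)-K) + interp1(X, V(:,it-1,1), K, 'spline');
K0   = X(ix);                                                 % assumed start value
Kopt = fmincon(@(K) -f_V(K), K0, [], [], [], [], 0, [], []);  % K >= 0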

Matt J

2012-03-28 13:15:25 UTC

Post by Sargondjani
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;

=======

You should also describe the functions p() and f(), so we might recommend how to accelerate those.

Sargondjani

2012-03-28 15:00:29 UTC

Post by Matt J
First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.

Good point. I use 'pchip', so I should be fine. (I also tested with linear interpolation and it gives almost exactly the same solutions, because my problem is well behaved: V is monotonically increasing in K, and the gradient is decreasing.)

Post by Matt J
Second, you should describe your problem more exactly. The version above simplifies to something trivial,

There was a typo in the second line. It should have been V(:,it-1,2) (not 1), so:
f_V=@(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it-1, 2 ), K) ;

But you are right that E does not enter f(), so I can make it a one-dimensional problem in K if I calculate the optimal E (given K); that already helps a lot.

I still wonder if you could help more, so I describe the problem more fully in the next message.

Sargondjani

2012-03-28 15:39:21 UTC

A more thorough description follows (with adjusted notation):

The problem is about maximizing a value function.
Choice variables: E and K (both constrained >= 0)
datapoints (ix) in grid: X(1:n)
periods (it): 1:T
two statuses: iz=1 or iz=2
I(1) = w;  I(2) = repl*w   (with repl < 1)

f(K) = -exp( -a*(X(ix)+I(iz)+K) )
p(E) = 1 - exp( -r*E )

The value to be maximized is V (dimensions: n, T+1, 2), with V(:,T+1,:) = zeros.

f_V=@(K,E) f(K) - E + p(E) * interp1(X, V(:,it+1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it+1,2), K) ;

For the interp1 I actually use 'pchip' (via the piecewise-polynomial form) and a log transformation: -exp( interp1(X, log(-V), K, 'pchip') ),

and then I loop over it, iz, and ix:

for it=T:-1:1
    for iz=1:2
        for ix=1:length(X)
            fmincon(-f_V, ...)
        end
    end
end

Since I can calculate the optimal E as a function of K, I can get rid of the for loop over ix by optimizing over K only. I am trying to do that with lsqnonlin (not fsolve, because of the constraint), but I wonder how you would do it...

Any hints for accelerating/improving this are very welcome! :-)

Also, I would still like to know whether it is possible to vectorize this with fmincon (even for one variable, say K), because I know I will want to do this some day for more variables, for instance when E also enters the f(K) function.

Many thanks in advance!!
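For reference, here is a minimal self-contained sketch of the backward induction just described, with one fmincon call per (it, iz, ix) triplet. The parameter values, grid, bounds, and start values are illustrative assumptions, not taken from the thread, and the log transformation of V is omitted for brevity:

% illustrative parameters (assumptions)
a = 2; r = 0.5; w = 1; repl = 0.6;
n = 20; T = 10;
X = linspace(0.1, 5, n)';              % grid of data points
I = [w, repl*w];                       % income in the two statuses
V = zeros(n, T+1, 2);                  % terminal condition V(:,T+1,:) = 0
K = zeros(n, T, 2);  E = zeros(n, T, 2);
opts = optimset('Display', 'off');

for it = T:-1:1
    for iz = 1:2
        % continuation values, smoothly interpolated in K
        W1 = @(Kv) interp1(X, V(:,it+1,1), Kv, 'pchip');
        W2 = @(Kv) interp1(X, V(:,it+1,2), Kv, 'pchip');
        for ix = 1:n
            f  = @(Kv) -exp(-a*(X(ix) + I(iz) + Kv));
            p  = @(Ev) 1 - exp(-r*Ev);
            fV = @(z) f(z(1)) - z(2) + p(z(2)).*W1(z(1)) + (1 - p(z(2))).*W2(z(1));
            z0 = [X(ix); 1];                       % assumed start value for [K; E]
            lb = [X(1); 0];                        % keep K on the grid, E >= 0
            ub = [X(end); Inf];
            z  = fmincon(@(z) -fV(z), z0, [],[],[],[], lb, ub, [], opts);
            K(ix,it,iz) = z(1);  E(ix,it,iz) = z(2);
            V(ix,it,iz) = fV(z);                   % store the maximized value
        end
    end
end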

Matt J

2012-03-28 18:37:12 UTC

Post by Sargondjani
Since I can calculate the optimal E as a function of K, I can get rid of the for loop over ix by optimizing over K only. I am trying to do that with lsqnonlin (not fsolve, because of the constraint), but I wonder how you would do it...

==============

Since it's a 1D problem, why not just solve it by hand? With K fixed, the function you are minimizing (the negative of your objective) reduces to the form

f(E) = A + E - B*p(E) - C*(1-p(E))

where B and C are the two interpolated continuation values. With p(E) = 1 - exp(-r*E), setting f'(E) = 1 - (B-C)*r*exp(-r*E) = 0 gives the unconstrained minimizer

Eunc = (1/r)*log((B-C)*r)

Now you just have to check the constraint boundaries (easy to find in 1D) to see whether any of them gives f(E) < f(Eunc).

Matt J

2012-03-28 18:42:12 UTC

Post by Matt J
Since it's a 1D problem, why not just solve it by hand? With K fixed, the function you are minimizing reduces to the form
f(E) = A + E - B*p(E) - C*(1-p(E))
The unconstrained minimizer is
Eunc = (1/r)*log((B-C)*r)
Now you just have to check the constraint boundaries (easy to find in 1D) to see whether any of them gives f(E) < f(Eunc).

======

You also have to check whether Eunc satisfies the constraints...

Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable. You should probably use fminbnd to minimize it.
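Putting these two posts together, here is a sketch of eliminating E in closed form and then minimizing over K alone with fminbnd. W1 and W2 stand for the two pchip-interpolated continuation values, f and r are as defined earlier in the thread, and using the grid end points as bounds on K is an assumption:

p     = @(E) 1 - exp(-r*E);
E_unc = @(K) (1/r) * log( max((W1(K) - W2(K))*r, realmin) );   % stationary point in E
E_con = @(K) max(0, E_unc(K));                                 % enforce the bound E >= 0
fV_K  = @(K) f(K) - E_con(K) + p(E_con(K)).*W1(K) + (1 - p(E_con(K))).*W2(K);

Kbest = fminbnd(@(K) -fV_K(K), X(1), X(end));   % maximize by minimizing the negative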

Sargondjani

2012-03-28 21:56:12 UTC

I didn't even see that I could use an analytical solution... Anyway, I would still prefer to get E(K) by numerical approximation, because I might change the shape of p(E) later on (I would prefer p'(0) = +inf).

Post by Matt J
You also have to check whether Eunc satisfies the constraints...

My only constraint is the lower bound (E >= 0), and my problem is so well behaved that I can just set E = max(0, Eunc) without any problem (I'm pretty sure about this).

Post by Matt J
Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable.

I want to calculate E(K) (I call it 'E_opt') at all grid points X (the K's are values within the grid X), then do pp_E = interp1(X, E_opt, 'pchip', 'pp') and substitute E out of the equations with ppval(pp_E, K). (Or do you have a better idea?)

Then there shouldn't be any problem with non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?
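A minimal sketch of that substitution, assuming E_opt has already been tabulated at every grid point (for instance with the closed form in the earlier sketch) and that f, p, W1, W2 are defined as before:

pp_E  = interp1(X, E_opt, 'pchip', 'pp');   % piecewise-polynomial form of E(K)
E_fun = @(K) ppval(pp_E, K);

% objective in K only, with E substituted out
fV_K  = @(K) f(K) - E_fun(K) + p(E_fun(K)).*W1(K) + (1 - p(E_fun(K))).*W2(K);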

Post by Matt J
You should probably use fminbnd to minimize it.

As for the optimization with fminbnd: I want to include the lower bound on K, because I know it will be binding in some cases. You don't think lsqnonlin is a good idea?

But using fmincon on the sum of f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right? (I don't see yet whether summing f1, f2, et cetera is efficient, i.e. whether it fully utilizes the similarities in the gradients.)

Matt, thank you so much!! Your help is really greatly appreciated! This got me so much further... :-)

Sargondjani

2012-03-28 22:29:11 UTC

Post by Sargondjani
As for the optimization with fminbnd: I want to include the lower bound on K, because I know it will be binding in some cases. You don't think lsqnonlin is a good idea?

Sorry, I wasn't clear here: I meant lsqnonlin for the gradient. The problem is so well behaved that (I am pretty sure) there is only one solution in 0 <= K < inf, which is either at the boundary or where the gradient = 0 (the gradient moves away from 0 everywhere else).

Either way, lsqnonlin should be able to find it... although I have to admit that using fmincon on f1 + f2, etc. is a lot safer, but I have never tried that.

Matt J

2012-03-29 14:11:17 UTC

Post by Sargondjani

Post by Matt J
Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable.

I want to calculate E(K) (I call it 'E_opt') at all grid points X (the K's are values within the grid X), then do pp_E = interp1(X, E_opt, 'pchip', 'pp') and substitute E out of the equations with ppval(pp_E, K). (Or do you have a better idea?)
Then there shouldn't be any problem with non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?

==============

Yes, it will be differentiable, but as a trade-off, the E(K) will only be optimal at the discrete set of K's that you use for the interpolation. However, if you think this leads to a reasonable approximation of your original objective function, then there's nothing obviously illegitimate about it.

Post by Sargondjani
Sorry, I wasn't clear here: I meant lsqnonlin for the gradient. The problem is so well behaved that (I am pretty sure) there is only one solution in 0 <= K < inf, which is either at the boundary or where the gradient = 0 (the gradient moves away from 0 everywhere else).

============

To me, it seems like overkill to be using lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function, there are only 2 choices for the optimal direction of descent (left or right) and you only need to compute the gradient to find it. FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, and so it tries to compute both gradients and Hessians (or other related quantities, depending on the algorithm used), in order to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.

Post by Sargondjani
But using fmincon on the sum of f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right?

===============

If "true vectorization" means that you can pass as input a batch of optimization problems and initial points, then no. But I would expect the summing method to be comparably fast.

Post by Sargondjani
(I don't see yet whether summing f1, f2, et cetera is efficient, i.e. whether it fully utilizes the similarities in the gradients)
=================

Summing the functions will not let you exploit similarities between the different optimization problems, and I already commented earlier that this might be a disadvantage.

Sargondjani

2012-03-29 20:27:22 UTC

Post by Matt J
To me, it seems like overkill to be using lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function, there are only 2 choices for the optimal direction of descent (left or right) and you only need to compute the gradient to find it. FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, and so it tries to compute both gradients and Hessians (or other related quantities, depending on the algorithm used), in order to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.

Well, with fminbnd I would still have to use a for loop over all data points, right?

If I give lsqnonlin the gradient (as its residual function) I can vectorize over all data points, and if I set 'JacobPattern' to speye(number of rows in X), then MATLAB understands that all the variables are independent... This is very fast.
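A sketch of that idea: solve the stacked first-order conditions for all grid points at once, with the sparsity pattern telling lsqnonlin that the Jacobian is diagonal. gradFV is a hypothetical helper (not defined in the thread) that returns the n-vector of derivatives d f_V/dK, one component per grid point:

n    = numel(X);
opts = optimset('JacobPattern', speye(n), 'Display', 'off');
K0   = X(:);                         % start values, e.g. last period's solution
lb   = zeros(n, 1);                  % K >= 0
Kall = lsqnonlin(@(K) gradFV(K), K0, lb, [], opts);
% note: where the bound K = 0 binds, the residual need not reach zero;
% the poster relies on the problem being well behaved there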

Matt J

2012-03-29 20:44:12 UTC

Post by Sargondjani
Well, with fminbnd I would still have to use a for loop over all data points, right?

======

Yes, but we're not convinced yet that this is sub-optimal.

Sargondjani

2012-03-30 09:47:19 UTC

Post by Matt J

Post by Sargondjani
Well, with fminbnd I would still have to use a for loop over all data points, right?

======
Yes, but we're not convinced yet that this is sub-optimal.

I tried it, but fmincon on the sum f(1) + ... + f(ix), with the analytical gradient and 'HessPattern' set to speye(#rows in K), is about 4 times faster (with 20 grid points). There are 5-10 function evaluations per period with fmincon, so apparently that is not too many...
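A sketch of that setup; sumFV is a hypothetical helper (not from the thread) that must return the summed scalar objective and, as a second output, its gradient with respect to the vector of K's:

n    = numel(X);
opts = optimset('Algorithm', 'trust-region-reflective', 'GradObj', 'on', ...
                'HessPattern', speye(n), 'Display', 'off');
K0   = X(:);                        % start values, e.g. the previous period's solution
lb   = zeros(n, 1);                 % K >= 0
Kall = fmincon(@sumFV, K0, [], [], [], [], lb, [], [], opts);

% schematically:
% function [F, g] = sumFV(K)
%     F = sum_i ( -f_V(K(i), E_opt(K(i))) );        % scalar: summed negated values
%     g(i) = d( -f_V(K(i), E_opt(K(i))) ) / dK(i);  % gradient, one entry per grid point
% end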

Anyway, I'm totally happy!! Thanks a lot, Matt...

Matt J

2012-03-30 17:56:12 UTC

Post by Sargondjani
I tried it, but fmincon on the sum f(1) + ... + f(ix), with the analytical gradient and 'HessPattern' set to speye(#rows in K), is about 4 times faster (with 20 grid points). There are 5-10 function evaluations per period with fmincon, so apparently that is not too many...
Anyway, I'm totally happy!! Thanks a lot, Matt...

==============

OK, then. That's a revealing comparison, too.

Sargondjani

2012-03-31 09:29:12 UTC

If you know of an example where fminbnd would be faster, I would like to know. :-)

Matt J

2012-03-31 18:32:12 UTC

Post by Sargondjani
If you know of an example where fminbnd would be faster, I would like to know. :-)

===============

I don't have the Optimization Toolbox, so I wouldn't really be able to run a comparison. However, I would imagine that if fminbnd were C-coded like the rest of the Opt Toolbox functions, rather than M-coded, it ought to beat them. It only makes sense that a dedicated 1D solver would beat an n-D solver for 1D problems.

Matt J

2012-03-28 15:22:18 UTC

Post by Sargondjani
for it=1:T
    for iz=1:2
        for ix=1:length(X)
            % optimize:
            [K(it,ix,iz), E(it,ix,iz)] = fmincon(-f_V, ...)
        end
    end
end

===================

One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.
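A tiny sketch of that warm-starting pattern; zprev and the bound values are assumptions, and f_V stands for the current triplet's objective:

zprev = [X(1); 1];                                   % assumed guess for the very first problem
for it = 1:T
    for iz = 1:2
        for ix = 1:length(X)
            % reuse the previous solution as the starting point
            zprev = fmincon(@(z) -f_V(z(1), z(2)), zprev, [],[],[],[], [0;0], [], []);
            K(it,ix,iz) = zprev(1);  E(it,ix,iz) = zprev(2);
        end
    end
end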

Sargondjani

2012-03-28 15:40:30 UTC

Post by Matt J
One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.

You mean the starting values, right? ...Yes, that helps a lot! :-)

Matt J

2012-03-28 18:22:12 UTC

Post by Sargondjani

Post by Matt J
One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.

You mean the starting values, right? ...Yes, that helps a lot! :-)

==============

If so, that might be a better option than vectorizing. However, you can vectorize by summing the objective functions that you're currently generating in each pass through the loop:

Ftotal = f_V(E1,K1,it1,iz1,ix1) + f_V(E2,K2,it2,iz2,ix2) + f_V(E3,K3,it3,iz3,ix3) + ...
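A sketch of how such a summed objective might be assembled, with the per-point unknowns packed into one long vector; the packing convention and the helper fV_point (evaluating the objective for one grid point) are assumptions for illustration:

% z packs the per-point variables: z = [K(1);E(1);K(2);E(2);...]
Ftotal = @(z) -sum( arrayfun(@(ix) fV_point(z(2*ix-1), z(2*ix), ix), 1:numel(X)) );

z0   = repmat([X(1); 1], numel(X), 1);          % assumed start values
lb   = zeros(2*numel(X), 1);                    % K >= 0 and E >= 0 for every point
zopt = fmincon(Ftotal, z0, [], [], [], [], lb, [], []);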

FAQs

What is the problem of constrained optimization?

Constrained optimization problems are problems in which a function f is to be minimized or maximized subject to constraints. Here f is called the objective function and each constraint is a Boolean-valued formula.

How do you optimize constraints?

Constraint optimization can be solved by branch-and-bound algorithms. These are backtracking algorithms storing the cost of the best solution found during execution and using it to avoid part of the search.

What is the Lagrange multiplier rule?

The Lagrange multiplier theorem states that at any local maximum (or minimum) of the function evaluated under the equality constraints, if constraint qualification applies, then the gradient of the function (at that point) can be expressed as a linear combination of the gradients of the constraints.

What is the formula for constrained optimization?

The general technique for optimizing a function f=f(x,y) subject to a constraint g(x,y)=c is to solve the system ∇f=λ∇g and g(x,y)=c for x, y, and λ. We then evaluate the function f at each point (x,y) that results from a solution to the system in order to find the optimum values of f subject to the constraint.
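As a concrete worked instance of that recipe (the function and constraint here are chosen purely for illustration): maximize f(x,y) = x*y subject to g(x,y) = x + y = 2. The system ∇f = λ∇g gives y = λ and x = λ, so x = y, and the constraint then forces x = y = 1 with f = 1. A quick numerical cross-check with fmincon:

% maximize x*y subject to x + y = 2 (minimize the negative)
obj = @(z) -z(1)*z(2);
Aeq = [1 1];  beq = 2;
z   = fmincon(obj, [0.5; 1.5], [], [], Aeq, beq);   % returns approximately [1; 1]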

What are the factors affecting optimization?

Optimization involves systematically varying factors like concentration, temperature, and polymer grade to determine their effects on responses such as dissolution time, hardness, and reproducibility.

What are the three common elements of an optimization problem?

Every optimization problem has three components: an objective function, decision variables, and constraints. When one talks about formulating an optimization problem, it means translating a “real-world” problem into the mathematical equations and variables which comprise these three components.

What are the common techniques of constrained optimization?

The most commonly used mathematical technique for constrained optimization involves the Lagrange multiplier and Lagrange function to solve the problem, followed by checking the second-order conditions using the bordered Hessian.

How to solve constraint problems?

Constraint programming problems are solved by combining a systematic search process with inference techniques. The systematic search enumerates possible variable-value combinations, constructing a search tree whose root represents the problem to be solved.

What are the 3 basic constraints?

These three constraints are:
  • Cost: the project budget, which serves as the financial constraint on a project.
  • Scope: the activities necessary to achieve the project's goals.
  • Time: the schedule within which the project will be completed.

Why do Lagrange multipliers fail?

The Lagrange-multiplier method can fail when ∇g = 0 at the constrained optimum; in the cited example, ∇g = 0 at the point (x, y) = (0, 1) where f attains its minimum on g = 0. As a result, the curve g(x, y) = 0 is not smooth there, with no well-defined normal vector at that point.

What is the disadvantage of Lagrange multipliers?

If the function is discontinuous, the Lagrange calculation becomes complicated. In addition, if the function is non-monotonic or nonconvex, the optimization may be difficult because there can be multiple solutions or folds in the function's surface. These are situations in which using Lagrange multipliers is tricky.

Why do we often use Lagrange multipliers?

The Lagrange multiplier technique lets you find the maximum or minimum of a multivariable function f when there is some constraint on the input values you are allowed to use, of the form g = c. Here g is another multivariable function with the same input space as f, and c is some constant.

What is the COBYLA method?

The name COBYLA is an acronym for Constrained Optimization by Linear Approximation. It is an iterative method for derivative-free constrained optimization. The method maintains and updates linear approximations to both the objective function and to each constraint.

What is the Lagrangian multiplier for constrained optimization?

Consider the problem: minimize (or maximize) f(x) subject to the constraint g(x) = 0. A point x is said to be a local maximum of f subject to the constraint g = 0 if there exists r > 0 such that f(x) ≥ f(x′) for all x′ with |x′ − x| < r and g(x′) = 0.

What are the three primary components of a constrained optimization model?

Constrained optimization models have three major components: decision variables, objective function, and constraints.

What are constrained and unconstrained optimization problems?

Unconstrained simply means that the choice variable can take on any value; there are no restrictions. Constrained means that the choice variable can only take on certain values within a larger range.

Which problem is an optimization problem?

An optimization problem is a problem in which one wishes to optimize (i.e., maximize or minimize) an optimization function f(x) subject to certain constraints C(x).

What is a chance-constrained optimization problem?

Chance-constrained optimization is an approach to solving optimization problems under uncertainty in which the uncertainty also appears in the inequality constraints. It requires a formulation of how to restrict, in a meaningful way, values and processes described by random variables.

What are the disadvantages of optimization models?

Optimization algorithms may require significant computational resources and can be time-consuming, especially when dealing with time-series data. Furthermore, their effectiveness depends on the accuracy of the sensors and the assumptions made about the signal-noise model.
