Discussion:

speeding up constrained optimization


Sargondjani

2012-03-28 06:17:12 UTC


Hi,

I have been struggling for a long time to get my optimization to run faster. I have to optimize over two variables (both > 0) for a list of datapoints, and that for many periods. I use fmincon, whose objective must return a scalar, so I loop over the datapoints with a for loop, which slows things down...

Isn't there another way to do this? Isn't there a way to vectorize it? Or is parallel computation as good as it gets?

Any help will be greatly appreciated!! :-)

Bruno Luong

2012-03-28 07:20:17 UTC


Post by Sargondjani

hi,

I use fmincon, which can only return a scalar,

Untrue.

Post by Sargondjani

Isn't there another way to do this? Isn't there a way to vectorize it? Or is parallel computation as good as it gets?

You should describe the problem you want to solve in more detail.

Bruno

Sargondjani

2012-03-28 09:17:13 UTC


datapoints (ix) in grid: X(1:n)

periods (it): 1:T

two statuses: iz=1 or iz=2

For each gridpoint, period, and status I store the optimal value: V (dimensions: n, T, 2)

The value is calculated something like:

f_V = @(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
             (1-p(E)) * interp1(X, V(:,it-1,1), K);

This value has to be maximized, for which I use fmincon. I loop:

for it=1:T
    for iz=1:2
        for ix=1:length(X)
            [K(it,ix,iz), E(it,ix,iz)] = fmincon(-f_V, ...)   % optimize
        end
    end
end

So what do I do? (Should I pass the grid X as a vector of parameters?)

Matt J

2012-03-28 13:09:18 UTC


Post by Sargondjani

datapoints (ix) in grid: X(1:n)

periods (it): 1:T;

two status: iz=1 or iz=2

for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)

(1-p(E)) * interp1(X, V(:,it-1,1), K) ;

First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.
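For example (a minimal sketch, reusing the X, V, it, and K names from the post):

```matlab
% Default linear interpolation is only C^0: it has kinks at the grid
% points, so the objective's gradient jumps there. 'spline' (or 'pchip')
% gives a smooth interpolant that fmincon's gradient-based algorithms
% can work with.
Vq_linear = interp1(X, V(:,it-1,1), K);            % piecewise linear
Vq_spline = interp1(X, V(:,it-1,1), K, 'spline');  % C^2 cubic spline
```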

Second, you should describe your problem more exactly. As written, the version above simplifies to something trivial:

f_V = @(K,E) f(X(ix)-K) + interp1(X, V(:,it-1,1), K) - E

The dependence on E is entirely through the final term and doesn't interact with the data at all. If Emin is E's constrained minimum value, then E = Emin is always optimal, and the problem reduces further to the one-parameter problem

f_V = @(K) f(X(ix)-K) + interp1(X, V(:,it-1,1), K)

Matt J

2012-03-28 13:15:25 UTC


Post by Sargondjani

datapoints (ix) in grid: X(1:n)

periods (it): 1:T;

two status: iz=1 or iz=2

for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)

(1-p(E)) * interp1(X, V(:,it-1,1), K) ;

=======

You should also describe the functions p() and f(), so we can recommend how to accelerate those.

Sargondjani

2012-03-28 15:00:29 UTC


Post by Matt J

First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.

- Good point. I use 'pchip', so I should be fine. (I also tested with linear interpolation, and it gives almost exactly the same solutions, because my problem is well behaved: V is monotonically increasing in K, and its gradient is decreasing.)

Post by Matt J

Second, you should describe your problem more exactly. The version above simplifies to something trivial,

- There was a typo in the second line: it should have been V(:,it-1,2) (not 1), so:

f_V = @(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
             (1-p(E)) * interp1(X, V(:,it-1,2), K);

But you are right that E only appears in the final term, so I can make it a one-dimensional problem by calculating the optimal E (given K). That already helps a lot.

I still wonder if you could help more, so I describe the problem more fully in the next message.

Sargondjani

2012-03-28 15:39:21 UTC


More thorough description is as follows (with adjusted notation):

The problem is maximizing a value function.

Choice variables: E and K (both constrained >= 0)

datapoints (ix) in grid: X(1:n)

periods (it): 1:T;

two statuses: iz=1 or iz=2

I(1)=w; I(2)=repl*w

(with repl<1)

f(K) = -exp( -a*(X(ix)+I(iz)+K) )

p(E) = 1 - exp (-r*E)

The value to be maximized is V (dimensions: n, T+1, 2), with the terminal condition V(:,T+1,:) = 0

f_V=@(K,E) f(K) - E + p(E) * interp1(X, V(:,it+1,1), K) + ...

(1-p(E)) * interp1(X, V(:,it+1,2), K) ;

For the interp1 I actually use 'pchip' (in piecewise-polynomial form) and a log transformation: -exp(interp1(X, log(-V), K))
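A minimal sketch of that transformation (it assumes V < 0 everywhere, which holds here since V is built from -exp(...), so log(-V) is well defined):

```matlab
% Build the piecewise polynomial once per (it,iz) slice, then evaluate
% it as often as needed; ppval is vectorized over K.
pp = interp1(X, log(-V(:,it+1,iz)), 'pchip', 'pp');
Vq = -exp(ppval(pp, K));   % back-transform to the original scale
```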

and then loop over it, iz, and ix:

for it=T:-1:1
    for iz=1:2
        for ix=1:length(X)
            fmincon(-f_V, ...)
        end
    end
end

Since I can calculate the optimal E as a function of K, I can get rid of the for loop over ix by optimizing over K only. I am trying to do that with lsqnonlin (not fsolve, because of the constraint), but I wonder how you would do it...

Any hints for accelerating/improving are very welcome! :-)

Also, I would still like to know whether it is possible to vectorize this with fmincon (even for one variable, say K), because I know I will some day want to do this for more variables, for instance when E also enters the f(K) function.

many thanks in advance!!

Matt J

2012-03-28 18:37:12 UTC


Post by Sargondjani

Since I can calculate the optimal E as a function of K, I can get rid of the for loop over ix by optimizing over K only. I am trying to do that with lsqnonlin (not fsolve, because of the constraint), but I wonder how you would do it...

==============

Since it's a 1D problem, why not just solve it by hand? With K fixed, your function (negated, for minimization) reduces to the form

f(E) = A + E - B*p(E) - C*(1-p(E))

With p(E) = 1 - exp(-r*E), the unconstrained minimizer is

Eunc = (1/r)*log((B-C)*r)

Now you just have to check the constraint boundaries (easy to find in 1D) to see if any boundary point gives f(E) < f(Eunc).
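In MATLAB this is a two-liner (a sketch: A, B, C are the K-dependent constants named above, B > C is assumed so the log argument is positive, and the only boundary here is E >= 0):

```matlab
% Stationary point of the 1-D problem in E, then projection onto E >= 0.
% The simple clamp is valid only if the objective is unimodal in E.
Eunc  = (1/r) * log((B - C)*r);   % real only when (B - C)*r > 0
E_opt = max(0, Eunc);             % project onto the constraint boundary
```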

Matt J

2012-03-28 18:42:12 UTC


Post by Matt J

Since it's a 1D problem, why not just solve it by hand? With K fixed, your function (negated, for minimization) reduces to the form

f(E) = A + E - B*p(E) - C*(1-p(E))

With p(E) = 1 - exp(-r*E), the unconstrained minimizer is

Eunc = (1/r)*log((B-C)*r)

Now you just have to check the constraint boundaries (easy to find in 1D) to see if any boundary point gives f(E) < f(Eunc).

======

You also have to check whether Eunc satisfies the constraints...

Bear in mind also that if you eliminate E this way, the resulting function of K, f(K, E(K)), might be non-differentiable. You should probably use fminbnd to minimize it.

Sargondjani

2012-03-28 21:56:12 UTC


I didn't even see that I could use an analytical solution... Anyway, I would still prefer to get E(K) by numerical approximation, because I might change the shape of p(E) later on (I would prefer p'(0) = +inf).

Post by Matt J

You also have to check whether Eunc satisfies the constraints...

My only constraint is the lower boundary (E >= 0), and my problem is so well behaved that I can just set E = max(0, Eunc) without any problem (I'm pretty sure about this).

Post by Matt J

Bear in mind also that if you eliminate E this way, the resulting function of K, f(K, E(K)), might be non-differentiable.

I want to calculate E(K) (I call it 'E_opt') for all grid points X (the K's are values within the grid X), then do pp_E = interp1(X, E_opt, 'pchip', 'pp') and substitute E out of the equations with ppval(pp_E, K). (Or do you have a better idea?)

Then there shouldn't be any problem with non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?
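A sketch of that substitution, assuming E_opt holds the tabulated optimal E at each grid point and f_V is the two-argument objective from earlier:

```matlab
% Fit a smooth pchip interpolant to the tabulated optimal E, then
% substitute it into the objective to get a 1-D problem in K.
pp_E = interp1(X, E_opt, 'pchip', 'pp');
f_V1 = @(K) f_V(K, ppval(pp_E, K));   % differentiable objective in K only
```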

Post by Matt J

You should probably use fminbnd to minimize it.

As for the optimization with fminbnd: I want to include the lower boundary for K, because I know it will be optimal in some cases. You don't think lsqnonlin is a good idea?

But using fmincon by adding all the f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right? (I don't see yet whether adding f1, f2, et cetera is efficient, i.e. whether it fully utilizes the similarities in the gradients.)

Matt, thank you so much!! Your help is really greatly appreciated! This got me so much further... :-)

Sargondjani

2012-03-28 22:29:11 UTC


Post by Sargondjani

As for the optimization with 'fminbnd': I want to include the lower boundary for K because I know that will be optimal in some cases. You dont think 'lsqnonlin' is a good idea???

Sorry, I wasn't clear here: I meant lsqnonlin for the gradient. The problem is so well behaved that (I am pretty sure) there is only one solution in 0 <= K < inf, which is either at the boundary or where the gradient is 0 (the gradient moves away from 0 everywhere else).

Either way, lsqnonlin should be able to find it... although I have to admit that using fmincon on f1 + f2 et cetera is a lot safer, but I have never tried that.

Matt J

2012-03-29 14:11:17 UTC


Post by Sargondjani

Post by Matt J

Bear in mind also that if you eliminate E this way, the resulting function of K, f(K, E(K)), might be non-differentiable.

I want to calculate E(K) (I call it 'E_opt') for all grid points X (the K's are values within the grid X), then do pp_E = interp1(X, E_opt, 'pchip', 'pp') and substitute E out of the equations with ppval(pp_E, K). (Or do you have a better idea?)

Then there shouldn't be any problem with non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?

==============

Yes, it will be differentiable, but as a trade-off E(K) will only be exactly optimal at the discrete set of K's you use for the interpolation. However, if you think this gives a reasonable approximation of your original objective function, then there's nothing obviously illegitimate about it.

Post by Sargondjani

Sorry, I wasn't clear here: I meant lsqnonlin for the gradient. The problem is so well behaved that (I am pretty sure) there is only one solution in 0 <= K < inf, which is either at the boundary or where the gradient is 0 (the gradient moves away from 0 everywhere else).

============

To me, it seems like overkill to use lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function there are only two choices for the direction of descent (left or right), and you only need the sign of the gradient to pick one; FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, so it computes gradients and Hessian-related quantities (depending on the algorithm used) to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.
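A call per grid point might look like this (a sketch; Kmax is a hypothetical finite upper bound, which fminbnd requires, and f_V1 is the one-variable objective after E has been eliminated):

```matlab
% fminbnd minimizes over a closed interval, so negate the value
% function and bracket K in [0, Kmax].
[K_opt, fval] = fminbnd(@(K) -f_V1(K), 0, Kmax);
```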

Post by Sargondjani

But using fmincon by adding all the f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right??

===============

If "true vectorization" means that you can pass as input a batch of optimization problems and initial points, then no. But I would expect the summing method to be comparably fast.

(I don't see yet whether adding f1, f2, et cetera is efficient, i.e. whether it fully utilizes the similarities in the gradients.)

=================

Summing the functions will not let you exploit similarities between the different optimization problems, and I already commented earlier that this might be a disadvantage.

Sargondjani

2012-03-29 20:27:22 UTC


Post by Matt J

To me, it seems like overkill to be using lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function, there are only 2 choices for the optimal direction of descent (left or right) and you only need to compute the gradient to find it. FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, and so it tries to compute both gradients and Hessians (or other related quantities, depending on the algorithm used), in order to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.

Well, with fminbnd I would still have to use a for loop over all datapoints, right?

If I supply the gradient to lsqnonlin, I can vectorize over all datapoints, and if I set 'JacobPattern' to speye(number of rows of X), MATLAB understands that all the variables are independent... This is very fast.
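Sketched out (grad_fV is a hypothetical function returning the objective's gradient at all n grid points at once, so each component is an independent residual):

```matlab
% With JacobPattern = speye(n), lsqnonlin knows the Jacobian is diagonal
% (the n one-dimensional problems are independent) and never forms or
% finite-differences a dense n-by-n matrix. lb = zeros enforces K >= 0.
n    = numel(X);
opts = optimset('JacobPattern', speye(n));
K0   = X(:);                                   % e.g. start at the grid itself
Kopt = lsqnonlin(grad_fV, K0, zeros(n,1), [], opts);
```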

Matt J

2012-03-29 20:44:12 UTC


Post by Sargondjani

Well, with fminbnd i would still have to use a for-loop over all datapoints, right?

======

Yes, but we're not convinced yet that this is sub-optimal.

Sargondjani

2012-03-30 09:47:19 UTC


Post by Matt J

Post by Sargondjani

Well, with fminbnd i would still have to use a for-loop over all datapoints, right?

======

Yes, but we're not convinced yet that this is sub-optimal.

I tried it, but fmincon using sum( f(1)...f(ix) ) with the analytical gradient and 'HessPattern' set to speye(#rows of K) is about 4 times faster (with 20 grid points). There are 5-10 function evaluations per period with fmincon, so apparently that is not too many...
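The setup described here might look like the following sketch (fsum is a hypothetical function that returns the summed objective and, as a second output, its analytical gradient; HessPattern = speye(n) declares the Hessian diagonal because the per-point problems share no variables):

```matlab
% One fmincon call now solves all n grid points at once; the lower
% bound of zeros enforces K >= 0 at every grid point.
n    = numel(X);
opts = optimset('GradObj', 'on', 'HessPattern', speye(n));
Kopt = fmincon(@fsum, K0, [], [], [], [], zeros(n,1), [], [], opts);
```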

Anyway, I'm totally happy!! Thanks a lot, Matt...

Matt J

2012-03-30 17:56:12 UTC


Post by Sargondjani

I tried it, but fmincon using sum( f(1)...f(ix) ) with the analytical gradient and 'HessPattern' set to speye(#rows of K) is about 4 times faster (with 20 grid points). There are 5-10 function evaluations per period with fmincon, so apparently that is not too many...

Anyway, I'm totally happy!! Thanks a lot, Matt...

==============

OK, then. That's a revealing comparison, too.

Sargondjani

2012-03-31 09:29:12 UTC


If you know an example where fminbnd would be faster, I would like to know :-)

Matt J

2012-03-31 18:32:12 UTC


Post by Sargondjani

If you know an example where fminbnd would be faster, I would like to know :-)

===============

I don't have the Optimization Toolbox, so I can't really run a comparison. However, I would imagine that if fminbnd were C-coded like the rest of the Optimization Toolbox functions, rather than M-coded, it ought to beat them: it only makes sense that a dedicated 1D solver would beat an n-D solver on 1D problems.

Matt J

2012-03-28 15:22:18 UTC


Post by Sargondjani

for it=1:T
    for iz=1:2
        for ix=1:length(X)
            [K(it,ix,iz), E(it,ix,iz)] = fmincon(-f_V, ...)   % optimize
        end
    end
end

===================

One more general point. If the solutions do not change greatly with each increment of your triple loop, you should take advantage of that: use the solution obtained for the previous (it,iz,ix) triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations FMINCON must perform.
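Schematically (a sketch of the warm-start idea on the loop from the original post; lb = [0;0] encodes the positivity constraints, and the first x0 is any reasonable guess such as the hypothetical K_guess, E_guess below):

```matlab
x0 = [K_guess; E_guess];          % initial guess for the very first triplet
lb = [0; 0];                      % both variables constrained >= 0
for it = 1:T
    for iz = 1:2
        for ix = 1:length(X)
            x = fmincon(@(x) -f_V(x(1), x(2)), x0, [], [], [], [], lb, []);
            K(it,ix,iz) = x(1);
            E(it,ix,iz) = x(2);
            x0 = x;               % warm-start the next, similar problem
        end
    end
end
```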

Sargondjani

2012-03-28 15:40:30 UTC


Post by Matt J

One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.

You mean the starting values, right? ...Yes, that helps a lot! :-)

Matt J

2012-03-28 18:22:12 UTC


Post by Sargondjani

Post by Matt J

One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.

you mean the starting values, right?? ...yes, that helps a lot! :-)

==============

If so, that might be a better option than vectorizing. However, you can vectorize by summing the objective functions that you're currently generating in each pass through the loop:

Ftotal= f_V(E1,K1, it1,iz1,ix1) + f_V(E2,K2, it2,iz2,ix2) + f_V(E3,K3, it3,iz3,ix3) +...
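Concretely, one way to build such a summed objective (a sketch: fV_i is a hypothetical cell array holding the per-point objective handles, and z stacks the (K,E) pairs so z(2j-1), z(2j) belong to problem j; K0 and E0 are hypothetical starting values):

```matlab
% Because the terms share no variables, minimizing the sum solves every
% individual problem simultaneously in a single fmincon call.
m      = numel(fV_i);
Ftotal = @(z) sum(arrayfun(@(j) fV_i{j}(z(2*j-1), z(2*j)), 1:m));
z0     = repmat([K0; E0], m, 1);   % stacked initial guesses
```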