Discussion:
speeding up constrained optimization
(too old to reply)
Sargondjani
2012-03-28 06:17:12 UTC
Permalink
hi,
i have been struggling for a long time to get my optimization faster. i have to optimize for two variables (both >0) for a list of datapoints (and that for many periods). i use fmincon, which can only return a scalar, so i use a for loop (to loop over the datapoints), which slows things down...
isnt there another way to do this??? isnt there a way to vectorize this? or is using the parallel computation as good as it gets?
any help will be greatly appreciated!! :-)
Bruno Luong
2012-03-28 07:20:17 UTC
Permalink
Post by Sargondjani
hi,
. i use fmincon, which can only return a scalar,
Untrue.
Post by Sargondjani
isnt there another way to do this??? isnt there a way to vectorize this? or is using the parallel computation as good as it gets?
You should describe the problem you want to solve in more detail.
Bruno
Sargondjani
2012-03-28 09:17:13 UTC
Permalink
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)
the value is calculated something like:
f_V=@(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;
this value has to be maximized, for which i use fmincon. i loop:
for it=1:T;
for iz=1:2;
for ix=1:length(X);
optimize: [K(it,ix,iz),E(it,ix,iz)]=fmincon(-f_V,...)
end
end
end
so what do i do? (should i pass the grid X as a vector of parameters???)
Matt J
2012-03-28 13:09:18 UTC
Permalink
Post by Sargondjani
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;
First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.
Second, you should describe your problem more exactly. The version above simplifies to something trivial,
f_V=@(K,E) f(X(ix)-K) + interp1(X, V(:,it-1,1), K) - E
The dependence on E is entirely through the final term and doesn't interact with the data at all. If Emin is its constrained minimum value, this will minimize the function all the time and the problem reduces further to the 1 parameter problem
f_V=@(K) f(X(ix)-K) + interp1(X, V(:,it-1,1), K)
Matt J
2012-03-28 13:15:25 UTC
Permalink
Post by Sargondjani
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
for each gridpoint, period, status I store the optimal value: V (dimension: n,T,2)
(1-p(E)) * interp1(X, V(:,it-1,1), K) ;
=======
You should also describe the function's p() and f() so we might recommend how to accelerate those.
Sargondjani
2012-03-28 15:00:29 UTC
Permalink
Post by Matt J
First problem. Your function is not differentiable if INTERP1 is using its default linear interpolation. Since FMINCON only handles differentiable problems, you should consider using the 'spline' option instead.
- good point. i use 'pchip' so i should be fine (anyway, i also tested with linear interpolation and it gives almost exactly the same solutions because my problem is well-behaved: V is monotonically increasing in K, and the gradient is decreasing)
Post by Matt J
Second, you should describe your problem more exactly. The version above simplifies to something trivial,
-There was a typo in the second line. It should have been V(:,it-1,2) (not 1), so:
f_V=@(K,E) f(X(ix)-K) - E + p(E) * interp1(X, V(:,it-1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it-1, 2 ), K) ;
But you are right that E is only in the final term so i can make it a 1 dim. problem if i calculate the optimal E (given K), so that already helps alot.
I still wonder if you could help more, so i describe the problem more fully in next message
Sargondjani
2012-03-28 15:39:21 UTC
Permalink
More thorough description is as follows (with adjusted notation):
The problem is about maximizing the value function.
Chose variables: E and K (both constr. >=0)
datapoints (ix) in grid: X(1:n)
periods (it): 1:T;
two status: iz=1 or iz=2
I(1)=w; I(2)=repl*w
(with repl<1)
f(K) = -exp ( -a* (X(ix)+I(iz)+K) )
p(E) = 1 - exp (-r*E)
The value to be maximized is V (dimension: n,T+1,2), with V(:,T+1,:)=zeros
f_V=@(K,E) f(K) - E + p(E) * interp1(X, V(:,it+1,1), K) + ...
(1-p(E)) * interp1(X, V(:,it+1,2), K) ;
For the interp1 i actually use 'pchip' (and piecewise polynomial) and log transformation: -exp(interp1(X,log(-V),K)
and then loop over it, iz, and ix
for it=T:-1:1;
for iz=1:2;
for ix=1:length(X)
fmincon(-f_V,...)
end
end
end
Since I can calculate opt. E as a function of K, I can get rid of the for-loop for ix, by optimizing for K only. I am trying to do that with lqsnonlin (not fsolve because of the constraint), but I wonder how you would do that...
Any hints for accelerating/ improving are very welcome! :-)
Also I would still like to know if it would be possible to vectorize this thing using fmincon (even for 1 variable, say K, because i know i will want to do this some day for more variables, for instance because E will also be in the f(K) function)
many thanks in advance!!
Matt J
2012-03-28 18:37:12 UTC
Permalink
Post by Sargondjani
Since I can calculate opt. E as a function of K, I can get rid of the for-loop for ix, by optimizing for K only. I am trying to do that with lqsnonlin (not fsolve because of the constraint), but I wonder how you would do that...
==============
Since it's a 1D problem, why not just solve it by hand? With K fixed, your function reduces to the form
f(E) = A-B*p(E)+C*(1-p(E))
The unconstrained minimizer is
Eunc=(1/r)*log((B-C)*r)
Now you just have to check the constraint boundaries (easy to find in 1D?) to see if
any f(E)<f(Eunc).
Matt J
2012-03-28 18:42:12 UTC
Permalink
Post by Matt J
Since it's a 1D problem, why not just solve it by hand? With K fixed, your function reduces to the form
f(E) = A-B*p(E)+C*(1-p(E))
The unconstrained minimizer is
Eunc=(1/r)*log((B-C)*r)
Now you just have to check the constraint boundaries (easy to find in 1D?) to see if
any f(E)<f(Eunc).
======
You also have to check whether Eunc satisfies the constraints...
Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable. You should probably use fminbnd to minimize it.
Sargondjani
2012-03-28 21:56:12 UTC
Permalink
I didnt even see i could use an analytical solution... anyway, i would still prefer to get E(K) using numerical approximation, because i might change the shape of p(E) later on (i would prefer p'(0)=+inf)
Post by Matt J
You also have to check whether Eunc satisfies the constraints...
My only constraint is the lower boundary (E>=0) and my problem is so well-behaved that i can just put E=max(0,Eunc) without any problem (im pretty sure about this)
Post by Matt J
Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable.
I want to calculate E(K) (i call it 'E_opt') for all grid points X (K's are values within the grid X), and then do pp_E=interp1(X,E_opt,'pchip','pp') and then substitute E out of the equations with: ppval(pp_E,K). (or you have a better idea?)
Then there shouldnt be any problem with the non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?
Post by Matt J
You should probably use fminbnd to minimize it.
As for the optimization with 'fminbnd': I want to include the lower boundary for K because I know that will be optimal in some cases. You dont think 'lsqnonlin' is a good idea???
But using fmincon by adding all the f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right?? (i dont see yet if adding f1, f2 et cetera is efficient, ie. if it fully utilizes the similarities in the gradients)
Matt, thank you so much!! Your help is really greatly appreciated! This got me so much further... :-)
Sargondjani
2012-03-28 22:29:11 UTC
Permalink
Post by Sargondjani
As for the optimization with 'fminbnd': I want to include the lower boundary for K because I know that will be optimal in some cases. You dont think 'lsqnonlin' is a good idea???
Sorry, i wasnt clear here: I meant lsqnonlin for the gradient. The problem is so well-behaved that (im am pretty sure) there is only one solution in 0=<K<inf, which is either at the boundary or where the gradient = 0 (the gradient will move away from 0 everywhere else).
Either way, lsqnonlin should be able to find it... although I have to admit using fmincon for f1 + f2 et. cet. is alot safer but i have never tried that
Matt J
2012-03-29 14:11:17 UTC
Permalink
Post by Sargondjani
Post by Matt J
Bear in mind also that if you eliminate E this way, the resulting function of K
f(K,E(K)) might be non-differentiable.
I want to calculate E(K) (i call it 'E_opt') for all grid points X (K's are values within the grid X), and then do pp_E=interp1(X,E_opt,'pchip','pp') and then substitute E out of the equations with: ppval(pp_E,K). (or you have a better idea?)
Then there shouldnt be any problem with the non-differentiability, because E(K) resulting from interp1 with 'pchip' is always differentiable, right?
==============
Yes, it will be differentiable, but as a trade-off, the E(K) will only be optimal at the discrete set of K's that you use for the interpolation. However, if you think this leads to a reasonable approximation of your original objective function, then there's nothing obviously illegitimate about it.
Post by Sargondjani
Sorry, i wasnt clear here: I meant lsqnonlin for the gradient. The problem is so well-behaved that (im am pretty sure) there is only one solution in 0=<K<inf, which is either at the boundary or where the gradient = 0 (the gradient will move away from 0 everywhere else).
============
To me, it seems like overkill to be using lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function, there are only 2 choices for the optimal direction of descent (left or right) and you only need to compute the gradient to find it. FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, and so it tries to compute both gradients and Hessians (or other related quantities, depending on the algorithm used), in order to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.
Post by Sargondjani
But using fmincon by adding all the f1, f2, etc. is actually a good idea too. So there is no 'true vectorization' possible for fmincon, right??
===============
If "true vectorization" means that you can pass as input a batch of optimization problems and initial points, then no. But I would expect the summing method to be comparably fast.
(i dont see yet if adding f1, f2 et cetera is efficient, ie. if it fully utilizes the similarities in the gradients)
=================
Summing the functions will not let you exploit similarities between the different optimization problems, and I already commented earlier that this might be a disadvantage.
Sargondjani
2012-03-29 20:27:22 UTC
Permalink
Post by Matt J
To me, it seems like overkill to be using lsqnonlin to minimize a 1D function, and I would expect it to be slower as well. With a 1D function, there are only 2 choices for the optimal direction of descent (left or right) and you only need to compute the gradient to find it. FMINBND is smart enough to know this. Conversely, lsqnonlin has n-dimensional functions in mind, and so it tries to compute both gradients and Hessians (or other related quantities, depending on the algorithm used), in order to find the best path of descent. I guess there's no harm, though, in trying both ways and comparing.
Well, with fminbnd i would still have to use a for-loop over all datapoints, right?
If i put the gradient in lsqnonlin I can vectorize (for all datapoints), and if i set: 'JacobPattern', speye(number of rows in X) then Matlab will understand that all the variables are independent... This is very fast
Matt J
2012-03-29 20:44:12 UTC
Permalink
Post by Sargondjani
Well, with fminbnd i would still have to use a for-loop over all datapoints, right?
======
Yes, but we're not convinced yet that this is sub-optimal.
Sargondjani
2012-03-30 09:47:19 UTC
Permalink
Post by Matt J
Post by Sargondjani
Well, with fminbnd i would still have to use a for-loop over all datapoints, right?
======
Yes, but we're not convinced yet that this is sub-optimal.
I tried it, but fmincon using sum( f(1)....f(ix) ) with the analytical gradient and 'HessPattern' set to speye(#rows in K) is about 4 times faster (with 20 grid points). There are 5-10 functions evaluations in each period with fmincon, so apperently this is not too much...
anyway, im totally happy!! thanks alot matt...
Matt J
2012-03-30 17:56:12 UTC
Permalink
Post by Sargondjani
I tried it, but fmincon using sum( f(1)....f(ix) ) with the analytical gradient and 'HessPattern' set to speye(#rows in K) is about 4 times faster (with 20 grid points). There are 5-10 functions evaluations in each period with fmincon, so apperently this is not too much...
anyway, im totally happy!! thanks alot matt...
==============
OK, then. That's a revealing comparison, too.
Sargondjani
2012-03-31 09:29:12 UTC
Permalink
If you know an example where fminbnd would be faster, i would like to know :-)
Matt J
2012-03-31 18:32:12 UTC
Permalink
Post by Sargondjani
If you know an example where fminbnd would be faster, i would like to know :-)
===============
I don't have the Optimization Toolbox, so I wouldn't really be able to run a comparison. However, I would imagine that if fminbnd were C-coded like the rest of the Opt Toolbox functions, rather than M-coded, it ought to beat them. It only makes sense that a dedicated 1D solver would beat an n-D solver for 1D problems.
Matt J
2012-03-28 15:22:18 UTC
Permalink
Post by Sargondjani
for it=1:T;
for iz=1:2;
for ix=1:length(X);
optimize: [K(it,ix,iz),E(it,ix,iz)]=fmincon(-f_V,...)
end
end
end
===================
One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.
Sargondjani
2012-03-28 15:40:30 UTC
Permalink
Post by Matt J
One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.
you mean the starting values, right?? ...yes, that helps a lot! :-)
Matt J
2012-03-28 18:22:12 UTC
Permalink
Post by Sargondjani
Post by Matt J
One more general point. If the solutions will not change greatly with each increment of your triple loop, you should take advantage of that: Use the solution obtained for the previous it,iz,ix triplet to initialize FMINCON for the next triplet. This will cut down the number of iterations that FMINCON must perform.
you mean the starting values, right?? ...yes, that helps a lot! :-)
==============
If so, that might be a better option than to vectorize. However, you can vectorize by summing the objective functions that you're currently generating in each pass through the loop
Ftotal= f_V(E1,K1, it1,iz1,ix1) + f_V(E2,K2, it2,iz2,ix2) + f_V(E3,K3, it3,iz3,ix3) +...