Optimization of simulation program result w.r.t. program parameters

Rubén Roa-Ureta rroa at udec.cl
Fri Aug 8 17:37:01 CDT 2008


Rubén Roa-Ureta wrote:
> Hi:
> I need to optimize the final output (a real number) of a long program 
> w.r.t. three parameters that determine how the program works. I have 
> prepared the smallest toy example I could think of below. My purpose is 
> to compare the program's prediction of the final number of individuals 
> in a cohort (as a result of the parameters that determine egg 
> development and egg mortality in the early life) with the same number 
> obtained from another sampling-model process, to make these two numbers 
> as close as possible.
> I have found information to optimize a function (a mathematical 
> expression), for example with samin and bfgsmin, but I am not sure if 
> this is the right strategy to optimize the output of a long program. In 
> my program, there are a few places where random numbers are called and 
> used to create variety among the individuals.
> The toy example is the minimum implementation of an individual-based 
> model; in this case I have 1000 super-individuals and each one starts 
> containing the same starting number of individuals but they differ 
> because they have a different potential of temperature-dependent growth 
> and wind-dependent mortality.
> Shall I treat this problem as a function optimization problem, turning 
> my program into a function to be called by the optimizer?
> Toy example follows. Real program takes about 3 minutes to finish on a 
> Windows Vista Core Duo at 1.73 Ghz and 1 GB RAM.
> Thanks in advance
> Ruben
>
> %par.par, text file with five real numbers
> par = load('par.par');
> N0 = par(1); % start number of eggs, very large, ca. 10^12
> b = par(2); % development parameter 1
> c = par(3); % development parameter 2
> d = par(4); % egg mortality parameter
> e = par(5); % juvenile mortality parameter
> f = par(6); % day when egg turn to juvenile
> g = par(7); % final day of simulation
> Nt = par(8); % number of individuals in final day, known from another 
> model-sampling process
> %data, text file with two numeric columns and n rows, one column for 
> temperature, one for wind speed.
> data = load('data.dat');
> temp = data(:,1);
> wind = data(:,2);
> ind = 100000; number of super-individuals
> winter = zeros(ind,6); %matrix containing the information about 
> super-individuals, overwritten at each time step.
> %Column 1: Start number;
> %Column 2: Temp-dependent development;
> %Column 3: Surviving numbers;
> for i=1:ind
>   winter(i,1) = a/ind; % starting number of individuals. Does not change 
> with time
>   winter(i,2) = random growth potential, uniform number between 0.5 and 1.5;
>   winter(i,3) = random susceptibility to wind-related mortality, normal 
> between 0.75 and 1.25;
>   winter(i,4) = development(a,b)*winter(i,2); turns to 1 when 
> development of egg into juvenile is completed
>   winter(i,5) = day when egg turns into juvenile; when winter(i,4) turns 
> to 1
>   winter(i,6) = number of individuals, at first increase due to eggs 
> transforming into juveniles (but controlled by wind-dependent egg       
> mortality), then decrease due to egg and juvenile tomortality
> end
> for day=1:g
>   winter(i,4) = degree of development dependent on daily temperature an 
> individual growth potential
>   winter(i,5) = written with index 'day'  winter (i,4) turns to 1
>   winter(i,6) = gets updated with temperature-dependent transformation 
> of eggs into juveniles and wind-
>   dependent mortality of eggs, until day f, and later, with juvenile 
> mortality rate (independent of wind)
> end
> compute number of individuals in g, final day, Nt-program.
> compute squared difference between Nt-program and Nt observed from 
> another model-sampling process
> optimize squared difference w.r.t. parameters b,c,d
>   

Answering my own question, I created a grid with several reasonable 
values for the 3 parameters of interest, implemented as a three-level 
nested loop that contained my program, and I let it run for several 
hours. Also, since there are random numbers in my program, I iterated 
over another internal loop for 100 iterations. Later I plotted the 
frequency distribution of the final output (number of surviving 
individuals) checking for what values of the three parameters the median 
of the 100 iterations coincided with the value of surviving individuals 
known from the other sampling-modeling process. This allowed me to 
visually select the parameter values that minimized the difference 
between my program prediction of the surviving numbers and the other 
process estimation of the same number. A brute force approach, since the 
function optimization approach didn't seem adequate for a problem like this.
Ruben


More information about the Help-octave mailing list