Writing 'help' functions as m-files

Søren Hauberg soren at hauberg.org
Tue Feb 10 02:58:57 CST 2009


On Mon, 09 Feb 2009 at 22:58 -0500, John W. Eaton wrote:
> OK, I looked at the current gen_doc_cache and I have a few comments

Thanks for taking the time to look into this. I apologize for not being
more active here.

>   * I noticed that
> 
>       gen_doc_cache (".")
> 
>     doesn't work because
> 
>       idx = find (p == pathsep ());
> 
>     returns an empty matrix if P has only one element.

See below.
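
For reference, here is a minimal sketch of the failure and of one possible way around it. It is an illustration only, not the code from the attached changeset (which avoids the problem by handling one directory per call). The sentinel trick and the variable 'dirs' are my own assumptions.

```octave
## Sketch only: reproduces the splitting loop on a one-element path.
p = ".";                         # path with a single element
idx = find (p == pathsep ());    # empty, since P contains no separator

## Assumed fix: treat the end of the string as one more separator, so the
## last (or only) element is visited as well.
idx = [idx, length(p) + 1];

dirs = {};
prev_idx = 1;
for n = 1:length (idx)
  dirs{end+1} = p(prev_idx:idx(n)-1);
  prev_idx = idx(n) + 1;
endfor
```

Without the sentinel the loop body never runs for ".", which matches the behaviour reported above.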

>   * Since all the data are character strings, using a binary format
>     doesn't save much space.  Compression would help, so maybe using
>     -text -z as options for save would be better than just using
>     binary.

I've changed this. I didn't do this at first, since I wasn't sure what
would happen if Octave was built without support for compression. Will
the call to 'save' fail, or will it simply skip the compression?
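
If it turns out that 'save' fails in that case, a fallback along these lines might be enough. This is only a sketch: it assumes that 'save' raises an error when compression is unavailable, which is exactly what I have not verified, and the cache contents and file name are made up for the example.

```octave
## Sketch only: try a compressed text save, fall back to plain text.
cache = {"sin", "Compute the sine."};   # made-up stand-in for a real cache
out_file = tempname ();                 # temporary file, for illustration

try
  save ("-text", "-z", out_file, "cache");
catch
  ## Assumption: 'save' raises an error when compression is unavailable.
  save ("-text", out_file, "cache");
end_try_catch

loaded = load (out_file);               # 'load' detects the format itself
unlink (out_file);                      # clean up the temporary file
```

Either way the reader side stays the same, since 'load' does not need to be told whether the file was compressed.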

>   * I think I'd prefer a simple name like DOC.gz in each directory
>     instead of help_cache.mat.  Also, generally I think we try to
>     prefer using - instead of _ for file names in Octave, unless they
>     are .m files, which have to have names that are also valid
>     symbol names in the scripting language.

OK.

>   * Running the function takes some time, so I think it would be best
>     to run it at build time.  It looks like most of the files will not
>     need to change when Octave is built, so it probably also makes
>     sense to distribute these files with the tar.gz files.

I agree. The only problem I see is what happens if we distribute
compressed caches and the user's Octave isn't linked against 'zlib' (or
whatever we use).

>   * What is the slow part?  Running makeinfo?  If so, then I don't see
>     much we can do about that.  Or is it extracting the first sentence
>     of the doc string?

I haven't done any profiling, but I'm guessing it's the many calls to
'makeinfo'. The algorithm for generating caches is in many ways similar
to the 'lookfor' code. When I moved 'lookfor' from C++ to an m-file, I
didn't really see any significant change in speed, so from that I
concluded (a bit hastily) that the slow part was 'makeinfo'.
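
To check that guess rather than assume it, the two stages could be timed separately, roughly as sketched here. Both stage functions are cheap, hypothetical stand-ins (a regexp instead of a real 'makeinfo' call, and a naive first-sentence cut), so the numbers from this sketch say nothing about the real code. Only the timing pattern is the point.

```octave
## Sketch only: time two stages separately over a few doc strings.
## Both stage functions are hypothetical stand-ins, not the real code.
docs = {"Return the sine of @var{x}.  See also cos.", ...
        "Plot @var{y} versus @var{x}.  See also line."};
format_stage   = @(t) regexprep (t, '@var\{(\w+)\}', '$1');  # stands in for makeinfo
sentence_stage = @(t) t(1:index (t, "."));                   # first sentence

t_format = 0;
t_sentence = 0;
for i = 1:numel (docs)
  id = tic ();
  plain = format_stage (docs{i});
  t_format += toc (id);

  id = tic ();
  first = sentence_stage (plain);
  t_sentence += toc (id);
endfor

printf ("formatting: %g s, first sentence: %g s\n", t_format, t_sentence);
```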

> I propose making gen_doc_cache take two arguments.  The first argument
> names the output file.  The second names the directory to work on.  If
> only one argument is given, generate the DOC.gz file for keywords,
> operators, etc.

The attached changeset does this. The change also fixes the bug you
mentioned when doing 'gen_doc_cache (".")'.

[snip]
> Does this sound OK?  If you agree, I can do most of this work.

I think your suggestion makes sense.

One thing to consider is where to put the cache for builtin stuff
(operators, keywords, ...). With the attached patch this is stored in
the file

  fullfile (octave_config_info.datadir, "DOC-builtin.gz");

(see 'lookfor.m' at line 58). Is that the right location/name?

Søren
-------------- next part --------------
# HG changeset patch
# User Soren Hauberg <hauberg at gmail.com>
# Date 1234255735 -3600
# Node ID 9fe0cfe7e1f023638a9ccb1ef8e65cd62b58f06b
# Parent  822cab55ca85fc75e7bf8c83879692154161b75b
Simplify documentation cache generation to only handle one directory per call to 'gen_doc_cache'

diff -r 822cab55ca85 -r 9fe0cfe7e1f0 scripts/ChangeLog
--- a/scripts/ChangeLog	Mon Feb 09 16:53:12 2009 -0500
+++ b/scripts/ChangeLog	Tue Feb 10 09:48:55 2009 +0100
@@ -1,3 +1,10 @@
+2009-02-10  Soren Hauberg  <hauberg at gmail.com>
+
+	* help/gen_doc_cache.m: Change API so we only handle one directory per
+	call to this function.
+
+	* help/lookfor.m: Change cache name from 'help_cache.mat' to 'DOC.gz'
+
 2009-02-09  John W. Eaton  <jwe at octave.org>
 
 	* miscellaneous/Makefile.in (SOURCES): Include __xzip__.m in the list.
diff -r 822cab55ca85 -r 9fe0cfe7e1f0 scripts/help/gen_doc_cache.m
--- a/scripts/help/gen_doc_cache.m	Mon Feb 09 16:53:12 2009 -0500
+++ b/scripts/help/gen_doc_cache.m	Tue Feb 10 09:48:55 2009 +0100
@@ -15,37 +15,36 @@
 ## <http://www.gnu.org/licenses/>.
 
 ## -*- texinfo -*-
-## @deftypefn {Function File} gen_doc_cache ()
-## @deftypefnx{Function File} gen_doc_cache (@var{directory})
+## @deftypefn {Function File} gen_doc_cache (@var{out_file}, @var{directory})
 ## Generate documentation caches for all functions in a given directory.
 ##
 ## A documentation cache is generated for all functions in @var{directory}. The
-## resulting cache is saved in the file @code{help_cache.mat} in @var{directory}.
+## resulting cache is saved in the file @var{out_file}.
 ## The cache is used to speed up @code{lookfor}.
-## If no directory is given, all directories in the current path is traversed.
+##
+## If no directory is given (or it is the empty matrix), a cache for builtin
+## operators, etc. is generated.
 ##
 ## @seealso{lookfor, path}
 ## @end deftypefn
 
-function gen_doc_cache (p = path ())
-  if (!ischar (p))
+function gen_doc_cache (out_file = "DOC.gz", directory = [])
+  ## Check input
+  if (!ischar (out_file))
     print_usage ();
   endif
   
-  ## Generate caches for all directories in path
-  idx = find (p == pathsep ());
-  prev_idx = 1;
-  for n = 1:length (idx)
-    f = p (prev_idx:idx (n)-1);
-    gen_doc_cache_in_dir (f);
-    prev_idx = idx (n) + 1;
-  endfor
-    
-  ## Generate cache for keywords, operators, and builtins if we're handling the
-  ## entire path
-  if (nargin == 0)
-    gen_builtin_cache ();
+  ## Generate cache
+  if (isempty (directory))
+    cache = gen_builtin_cache ();
+  elseif (ischar (directory))
+    cache = gen_doc_cache_in_dir (directory);
+  else
+    error ("gen_doc_cache: second input argument must be a string");
   endif
+  
+  ## Save cache
+  save ("-text", "-z", out_file, "cache");
 endfunction
 
 function [text, first_sentence, status] = handle_function (f, text, format)
@@ -102,7 +101,7 @@
   endfor
 endfunction
 
-function gen_doc_cache_in_dir (directory)
+function cache = gen_doc_cache_in_dir (directory)
   ## If 'directory' is not in the current path, add it so we search it
   dir_in_path = false;
   p = path ();
@@ -125,27 +124,17 @@
   list = __list_functions__ (directory);
   cache = create_cache (list);
   
-  ## Write the cache
-  fn = fullfile (directory, "help_cache.mat");
-  save ("-binary", fn, "cache"); # FIXME: Should we zip it ?
-  
   if (!dir_in_path)
     rmpath (directory);
   endif
 endfunction
 
-function gen_builtin_cache ()
+function cache = gen_builtin_cache ()
   operators = __operators__ ();
   keywords = __keywords__ ();
   builtins = __builtins__ ();
   list = {operators{:}, keywords{:}, builtins{:}};
 
   cache = create_cache (list);
-  
-  ## Write the cache
-  ## FIXME: Where should we store this cache?
-  ## FIXME: if we change it -- update 'lookfor'
-  fn = fullfile (octave_config_info.datadir, "builtin_cache.mat"); 
-  save ("-binary", fn, "cache"); # FIXME: Should we zip it ?
 endfunction
 
diff -r 822cab55ca85 -r 9fe0cfe7e1f0 scripts/help/lookfor.m
--- a/scripts/help/lookfor.m	Mon Feb 09 16:53:12 2009 -0500
+++ b/scripts/help/lookfor.m	Tue Feb 10 09:48:55 2009 +0100
@@ -55,7 +55,7 @@
   str = lower (str);
 
   ## Search operators, keywords, and built-ins
-  cache_file = fullfile (octave_config_info.datadir, "builtin_cache.mat");
+  cache_file = fullfile (octave_config_info.datadir, "DOC-builtin.gz");
   if (exist (cache_file, "file"))
     [fun, help_text] = search_cache (str, cache_file, search_type);
   else
@@ -68,7 +68,7 @@
   prev_idx = 1;
   for n = 1:length (idx)
     f = p (prev_idx:idx (n)-1);
-    cache_file = fullfile (f, "help_cache.mat");
+    cache_file = fullfile (f, "DOC.gz");
     if (exist (cache_file, "file"))
       ## We have a cache. Read it and search it!
       [funs, hts] = search_cache (str, cache_file, search_type);

