behavior of regexp() function

Daniel J Sebald daniel.sebald at ieee.org
Thu Jan 1 00:34:17 CST 2009


Below are some results from regexp() that seem questionable given what the documentation says (or perhaps I'm misunderstanding it).  Say I want to pull the substrings from a tab-separated data file.  Let

octave:6> a = sprintf('20\t50\tcelcius\t80')
a = 20  50      celcius 80
octave:7> b = sprintf('20\t50\t\t80')
b = 20  50              80

be some sample lines that might come from a data file.  String a has at least one character between each pair of tabs; b has a case where there are zero characters between two tabs.  For regexp, the character class [^\t] means any character other than a tab.  The metacharacter + means match one or more times.  Here are the results for a and b processed with these metacharacters:

octave:8> regexp(a, '[^\t]+', 'match')
ans =

{
  [1,1] = 20
  [1,2] = 50
  [1,3] = celcius
  [1,4] = 80
}

Looks good.

octave:9> regexp(b, '[^\t]+', 'match')
ans =

{
  [1,1] = 20
  [1,2] = 50
  [1,3] = 80
}

I'll go along with that result too.  There are zero characters between the second and third tab and + requires at least one match.
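
As a quick sanity check on that reading, the sketch below compares the number of fields implied by the tabs with the number of matches that + returns.  The variable names n_tabs, n_fields and n_matches are only illustrative and do not come from the session above.

```octave
% Sketch: compare the field count implied by the tabs with the
% number of matches returned by the + pattern.
b = sprintf('20\t50\t\t80');

n_tabs    = numel(strfind(b, sprintf('\t')));    % tab characters in b
n_fields  = n_tabs + 1;                           % fields in a tab-separated line
n_matches = numel(regexp(b, '[^\t]+', 'match'));  % non-empty runs found by +

printf('fields: %d, matches: %d\n', n_fields, n_matches);
```

For b this should report one more field than matches, which is consistent with + skipping the empty field between the second and third tab.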

Now, according to the documentation, * is similar to + in concept, except that it matches _zero_ or more times.  Here are the results for a and b processed with that metacharacter:

octave:10> regexp(a, '[^\t]*', 'match')
ans =

{
  [1,1] = 20
}

Doesn't look correct.  I'm thinking this should be pretty much the same result as with metacharacter +, i.e.,

[1,1] = 20
[1,2] = 50
[1,3] = celcius
[1,4] = 80

because + means one or more matches, and "one or more" is a subset of "zero or more".  Next result:

octave:11> regexp(b, '[^\t]*', 'match')
ans =

{
  [1,1] = 20
}

Same as previous, but the way I see it, this case should result in

[1,1] = 20
[1,2] = 50
[1,3] = []
[1,4] = 80

where the third empty string comes from the fact there are zero characters between two tabs, i.e., "zero or more".
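
The shape of that expected result can be sketched by splitting on the tab itself instead of matching the fields.  This assumes a regexp() that accepts the 'split' option, which is not shown anywhere in the session above, so it only illustrates the output I have in mind and says nothing about how * itself ought to behave.

```octave
% Sketch, assuming regexp() supports the 'split' option:
% split on each tab so that empty fields are kept.
b = sprintf('20\t50\t\t80');

fields = regexp(b, '\t', 'split');   % text between successive tabs

% Report how many fields came back and whether the third one is empty.
printf('%d fields, third field empty: %d\n', numel(fields), isempty(fields{3}));
```

If the option is available, b should come back as four fields with an empty third one, matching the list above.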

Am I correctly understanding what "zero or more" means?

Dan

