AWK Language Programming - Built-in Functions
Built-in functions are functions that are always available for
your awk program to call.
This chapter defines all the built-in
functions in awk; some of them are mentioned in other sections,
but they are summarized here for your convenience.
(You can also define
new functions yourself; see the section on user-defined functions.)
Calling Built-in: How to call built-in functions.
Numeric Functions: Functions that work with numbers, including
int, sin and rand.
String Functions: Functions for string manipulation, such as
split, match, and sprintf.
I/O Functions: Functions for files and shell commands.
Time Functions: Functions for dealing with time stamps.
Calling Built-in Functions
To call a built-in function, write the name of the function followed
by arguments in parentheses.
For example, `atan2(y + z, 1)'
is a call to the function
atan2, with two arguments.
Whitespace is ignored between the built-in function name and the
open-parenthesis, but we recommend that you avoid using whitespace there.
User-defined functions do not permit whitespace in this way, and
you will find it easier to avoid mistakes by following a simple
convention which always works: no whitespace after a function name.
Each built-in function accepts a certain number of arguments.
In some cases, arguments can be omitted. The defaults for omitted
arguments vary from function to function and are described under the
individual functions.
In some awk implementations, extra
arguments given to built-in functions are ignored.
However, in gawk,
it is a fatal error to give extra arguments to a built-in function.
When a function is called, expressions that create the function's actual
parameters are evaluated completely before the function
call is performed.
For example, in the code fragment:
i = 4
j = sqrt(i++)
the variable i is set to five before sqrt is called
with a value of four for its actual parameter.
The order of evaluation of the expressions used for the function's
parameters is undefined.
Thus, you should not write programs that
assume that parameters are evaluated from left to right or from
right to left.
For example,
i = 5
j = atan2(i++, i *= 2)
If the order of evaluation is left to right, then i first becomes
six, and then 12, and atan2 is called with the two arguments five
and 12 (the value of i++ is the value i had before the increment).
But if the order of evaluation is right to left, i
first becomes 10, and then 11, and atan2 is called with the
two arguments 10 and 10.
Numeric Built-in Functions
Here is a full list of built-in functions that work with numbers.
Optional parameters are enclosed in square brackets ("[" and "]").
int(x)
This produces the nearest integer
to x, located between x and zero,
truncated toward zero.
For example, int(3) is three, int(3.9) is three, int(-3.9)
is -3, and int(-3) is -3 as well.
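As a quick check of the truncation rule, the following sketch prints int for a few of the values mentioned above. The last line shows a common rounding idiom, adding 0.5 before truncating; that idiom and the variable x are illustrations added here, and the idiom only rounds correctly for non-negative values.

```shell
awk 'BEGIN {
    # int truncates toward zero, for positive and negative values alike
    print int(3.9), int(-3.9), int(-3)
    # adding 0.5 before truncating rounds a non-negative value to nearest
    x = 2.6
    print int(x + 0.5)
}'
```

The first line of output is `3 -3 -3` and the second is `3`.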
sqrt(x)
This gives you the positive square root of x.
It reports an error
if x is negative.
Thus, sqrt(4) is two.
exp(x)
This gives you the exponential of x (e ^ x), or reports
an error if x is out of range.
The range of values x can have
depends on your machine's floating point representation.
log(x)
This gives you the natural logarithm of x, if x is positive;
otherwise, it reports an error.
sin(x)
This gives you the sine of x, with x in radians.
cos(x)
This gives you the cosine of x, with x in radians.
atan2(y, x)
This gives you the arctangent of y / x in radians.
rand()
This gives you a random number.
The values of rand are
uniformly-distributed between zero and one.
The value is never one, although it can be zero.
Often you want random integers instead.
Here is a user-defined function
you can use to obtain a random non-negative integer
less than n:
function randint(n) {
    return int(n * rand())
}
The multiplication produces a random real number
greater than or equal to zero and less than n.
We then make it an integer
(using int) between zero
and n - 1, inclusive.
Here is an example where a similar function
is used to produce
random integers between one and n.
This program
prints a new random number
for each input record.
awk '
# Function to roll a simulated die.
function roll(n) { return 1 + int(rand() * n) }

# Roll 3 six-sided dice and
# print total number of points.
{
    printf("%d points\n",
           roll(6)+roll(6)+roll(6))
}'
Caution: In most awk implementations, including gawk,
rand starts generating numbers from the same
starting number, or seed, each time you run awk.
Thus, a program will generate the same results each time you run it.
The numbers are random within one awk run, but predictable
from run to run.
This is convenient for debugging, but if you want
a program to do different things each time it is used, you must change
the seed to a value that will be different in each run.
To do this,
use srand.
srand([x])
srand sets the starting point, or seed,
for generating random numbers to the value x.
Each seed value leads to a particular sequence of random numbers.
Thus, if you set the seed to the same value a second time, you will get
the same sequence of random numbers again.
If you omit the argument x, as in srand(), then the current
date and time of day are used for a seed.
This is the way to get random
numbers that are truly unpredictable.
The return value of srand is the previous seed.
This makes it
easy to keep track of the seeds for use in consistently reproducing
sequences of random numbers.
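The following sketch illustrates both points: reusing a seed restarts the same sequence, and srand hands back the seed that was previously in effect. The seed 42 and the variable names are arbitrary choices made for this illustration.

```shell
awk 'BEGIN {
    srand(42)              # 42 is an arbitrary seed chosen for this sketch
    first = rand()
    previous = srand(42)   # returns the seed that was in effect before this call
    second = rand()
    # the same seed restarts the same sequence
    if (first == second)
        print "same sequence, previous seed was " previous
}'
```

This should print `same sequence, previous seed was 42`.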
Built-in Functions for String Manipulation
The functions in this section look at or change the text of one or more
strings.
Optional parameters are enclosed in square brackets ("[" and "]").
index(in, find)
This searches the string in for the first occurrence of the string
find, and returns the position in characters where that occurrence
begins in the string in.
For example:
$ awk 'BEGIN { print index("peanut", "an") }'
-| 3
If find is not found, index returns zero.
(Remember that string
indices in awk start at one.)
length([string])
This gives you the number
of characters in string.
If string is a number, the length of the digit string
representing that number
is returned.
For example, length("abcde") is five.
By contrast, length(15 * 35) works out to three.
How? Well, 15 * 35 =
525, and 525 is then converted to the string
"525", which has
three characters.
If no argument is supplied, length returns the length of $0.
In older versions of awk, you could call the length function
without any parentheses.
Doing so is marked as "deprecated" in the POSIX standard.
This means that while you can do this in your
programs, it is a feature that can eventually be removed from a future
version of the standard.
Therefore, for maximal portability of your
awk programs, you should always supply the parentheses.
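The three cases described above (a string argument, a numeric argument, and no argument) can be sketched as follows. The input line `hello world` is just sample data for this illustration.

```shell
awk 'BEGIN {
    print length("abcde")    # a string argument
    print length(15 * 35)    # the number 525 is converted to the string "525" first
}'
# with no argument, length() reports the length of $0
echo "hello world" | awk '{ print length() }'
```

The three commands print 5, 3 and 11 respectively.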
match(string, regexp)
The match function searches the string, string, for the
longest, leftmost substring matched by the regular expression,
regexp.
It returns the character position, or index, of
where that substring begins (one, if it starts at the beginning of
string).
If no match is found, it returns zero.
The match function sets the built-in variable RSTART to
the index.
It also sets the built-in variable RLENGTH to the
length in characters of the matched substring.
If no match is found,
RSTART is set to zero, and RLENGTH to -1.
For example:
awk '{
    if ($1 == "FIND")
        regex = $2
    else {
        where = match($0, regex)
        if (where != 0)
            print "Match of", regex, "found at", \
                  where, "in", $0
    }
}'
This program looks for lines that match the regular expression stored in
the variable regex.
This regular expression can be changed.
If the first word on a line is `FIND', regex is changed to be the
second word on that line.
Therefore, given:
FIND ru+n
My program runs
but not very quickly
FIND Melvin
This line is property of Reality Engineering Co.
Melvin was here.
awk prints:
Match of ru+n found at 12 in My program runs
Match of Melvin found at 1 in Melvin was here.
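RSTART and RLENGTH are most useful together with substr, since they give exactly the position and length needed to extract the matched text. Here is a minimal sketch using the first sample line from above; the variable name s is chosen for this illustration only.

```shell
awk 'BEGIN {
    s = "My program runs"
    if (match(s, /ru+n/))
        # RSTART and RLENGTH describe the match, so substr can extract it
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}'
```

This prints `12 3 run`, which agrees with the position reported by the program above.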
split(string, array [, fieldsep])
This divides string
into pieces separated by fieldsep,
and stores the pieces in array.
The first piece is stored in
array[1], the second piece in array[2], and so
forth.
The string value of the third argument, fieldsep, is
a regexp describing where to split string
(much as FS can
be a regexp describing where to split input records).
If the fieldsep is omitted, the value of FS is used.
split returns the number
of elements created.
The split function splits strings into pieces in a
manner similar to the way input lines are split into fields.
For example:
split("cul-de-sac", a, "-")
splits the string
`cul-de-sac' into three fields using `-' as the
separator.
It sets the contents of the array a as follows:
a[1] = "cul"
a[2] = "de"
a[3] = "sac"
The value returned by this call to split is three.
As with input field-splitting, when the value of fieldsep is
" ", leading and trailing whitespace
is ignored, and the elements
are separated by runs of whitespace.
Also as with input field-splitting, if fieldsep is the null string, each
individual character in the string
is split into its own array element.
(This is a gawk-specific extension.)
Recent implementations of awk, including gawk, allow
the third argument to be a regexp
constant (/abc/), as well as a string (d.c.).
The POSIX standard allows this as well.
Before splitting the string, split deletes any previously existing
elements in the array array (d.c.).
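Because split returns the number of elements it created, that value is the natural upper bound for a loop over the array. The sketch below walks the `cul-de-sac' pieces and then shows the " " separator ignoring leading and trailing whitespace. The array names and the second sample string are assumptions made for this illustration.

```shell
awk 'BEGIN {
    n = split("cul-de-sac", a, "-")
    for (i = 1; i <= n; i++)
        print i, a[i]
    # with " " as the separator, leading and trailing whitespace is ignored
    m = split("  alpha   beta gamma ", words, " ")
    print m, words[1], words[m]
}'
```

The loop prints the three pieces with their indices, and the last line prints `3 alpha gamma`.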
sprintf(format, expression1,...)
This returns (without printing) the string
that printf would
have printed out with the same arguments
(see the section on the printf statement).
For example:
sprintf("pi = %.2f (approx.)", 22/7)
returns the string
"pi = 3.14 (approx.)".
sub(regexp, replacement [, target])
The sub function alters the value of target.
It searches this value, which is treated as a string, for the
leftmost longest substring matched by the regular expression, regexp,
extending this match as far as possible.
Then the entire string is
changed by replacing the matched text with replacement.
The modified string
becomes the new value of target.
This function is peculiar because target is not simply
used to compute a value, and not just any expression will do: it
must be a variable, field
or array element, so that sub can
store a modified value there.
If this argument is omitted, then the
default is to use and alter $0.
For example:
str = "water, water, everywhere"
sub(/at/, "ith", str)
sets str to "wither, water, everywhere", by replacing the
leftmost, longest occurrence of `at' with `ith'.
The sub function returns the number
of substitutions made (either
one or zero).
If the special character `&' appears in replacement, it
stands for the precise substring that was matched by regexp.
(If the regexp can match more than one string, then this precise substring
may vary.)
For example:
awk '{ sub(/candidate/, "& and his wife"); print }'
changes the first occurrence of `candidate' to `candidate
and his wife' on each input line.
Here is another example:
awk 'BEGIN {
    str = "daabaaa"
    sub(/a+/, "c&c", str)
    print str
}'
-| dcaacbaaa
This shows how `&' can represent a non-constant string, and also
illustrates the "leftmost, longest" rule in regexp matching.
The effect of this special character (`&') can be turned off by putting a
backslash before it in the string.
As usual, to insert one backslash in
the string, you must write two backslashes.
Therefore, write `\\&' in a string
constant to include a literal `&' in the replacement.
For example, here is how to replace the first `|' on each line with
an `&':
awk '{ sub(/\|/, "\\&"); print }'
Note: As mentioned above, the third argument to sub must
be a variable, field
or array reference.
Some versions of awk allow the third argument to
be an expression which is not an lvalue.
In such a case, sub
would still search for the pattern
and return zero or one, but the result of
the substitution (if any) would be thrown away because there is no place
to put it.
Such versions of awk accept expressions like
this:
sub(/USA/, "United States", "the USA and Canada")
This is considered erroneous in gawk.
gsub(regexp, replacement [, target])
This is similar to the sub function, except gsub replaces
all of the longest, leftmost, non-overlapping matching
substrings it can find.
The `g' in gsub stands for
"global," which means replace everywhere.
For example:
awk '{ gsub(/Britain/, "United Kingdom"); print }'
replaces all occurrences of the string
`Britain' with `United
Kingdom' for all input records.
The gsub function returns the number
of substitutions made.
If the variable to be searched and altered, target, is
omitted, then the entire input record, $0, is used.
As in sub, the characters `&' and `\' are special,
and the third argument must be an lvalue.
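The return value of gsub is handy when you want to know how many replacements were made. Here is a small sketch; the input line is sample data invented for this illustration, and it includes `Brittany' to show that only exact matches of the regexp are replaced.

```shell
echo "Britain, Great Britain, and Brittany" |
awk '{
    n = gsub(/Britain/, "United Kingdom")   # target omitted, so $0 is altered
    print n ": " $0
}'
```

This prints `2: United Kingdom, Great United Kingdom, and Brittany`.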
gensub(regexp, replacement, how [, target])
gensub is a general substitution function.
Like sub and
gsub, it searches the target string
target for matches of
the regular expression regexp.
Unlike sub and
gsub, the modified string
is returned as the result of the
function, and the original target string
is not changed.
If how is a string beginning with `g' or `G', then it
replaces all matches of regexp
with replacement.
Otherwise, how is a number
indicating which match of regexp
to replace. If no target is supplied, $0 is used instead.
gensub provides an additional feature that is not available
in sub or gsub: the ability to specify components of
a regexp in the replacement text.
This is done by using parentheses in the regexp
to mark the components, and then specifying `\n'
in the replacement text, where n is a digit from one to nine.
For example:
$ gawk '
> BEGIN {
>     a = "abc def"
>     b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
>     print b
> }'
-| def abc
As described above for sub, you must type two backslashes in order
to get one into the string.
In the replacement text, the sequence `\0' represents the entire
matched text, as does the character `&'.
This example shows how you can use the third argument to control
which match of the regexp
should be changed.
$ echo a b c a b c |
> gawk '{ print gensub(/a/, "AA", 2) }'
-| a b c AA b c
In this case, $0 is used as the default target string.
gensub returns the new string
as its result, which is
passed directly to print for printing.
If the how argument is a string
that does not begin with `g' or
`G', or if it is a number
that is less than zero, only one
substitution is performed.
gensub is a gawk extension; it is not available
in compatibility mode (see the section on command line options).
substr(string, start [, length])
This returns a length-character-long substring of string,
starting at character number start.
The first character of a string
is character number one.
For example,
substr("washington", 5, 3) returns "ing".
If length is not present, this function
returns the whole suffix of string
that begins at character number start.
For example,
substr("washington", 5) returns "ington".
The whole suffix is also returned
if length is greater than the number
of characters remaining
in the string, counting from character number start.
tolower(string)
This returns a copy of string, with each upper-case character
in the string replaced with its corresponding lower-case character.
Non-alphabetic characters are left unchanged.
For example,
tolower("MiXeD cAsE 123") returns "mixed case 123".
toupper(string)
This returns a copy of string, with each lower-case character
in the string replaced with its corresponding upper-case character.
Non-alphabetic characters are left unchanged.
For example,
toupper("MiXeD cAsE 123") returns "MIXED CASE 123".
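One common use of these two functions is comparing text without regard to case: convert one side to a single case before the comparison. The sketch below counts input lines whose first field is some capitalization of `error:'. The sample input and the variable name count are made up for this illustration.

```shell
printf 'Error: disk full\nok\nERROR: timeout\n' |
awk 'tolower($1) == "error:" { count++ }
     END { print count + 0 }'
```

This prints 2, because the first and third lines match once their first field is lower-cased.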
More About `\' and `&' with sub, gsub and gensub
When using sub, gsub or gensub, and trying to get literal
backslashes and ampersands into the replacement text, you need to remember
that there are several levels of escape processing going on.
First, there is the lexical level, which is when awk reads
your program, and builds an internal copy of your program that can
be executed.
Then there is the run-time level, when awk actually scans the
replacement string
to determine what to generate.
At both levels, awk looks for a defined set of characters that
can come after a backslash.
At the lexical level, it looks for the
escape sequences listed in the section on escape sequences.
Thus, for every `\' that awk will process at the run-time
level, you type two `\'s at the lexical level.
When a character that is not valid for an escape sequence follows the
`\', Unix awk and gawk both simply remove the initial
`\', and put the following character into the string. Thus, for
example, "a\qb" is treated as "aqb".
At the run-time level, the various functions handle sequences of
`\' and `&' differently.
The situation is (sadly) somewhat complex.
Historically, the sub and gsub functions treated the two
character sequence `\&' specially; this sequence was replaced in
the generated text with a single `&'.
Any other `\' within
the replacement string
that did not precede an `&' was passed
through unchanged.
To illustrate with a table:
This table shows both the lexical level processing, where
an odd number of backslashes becomes an even number
at the run time level,
and the run-time processing done by sub.
(For the sake of simplicity, the rest of the tables below only show the
case of even numbers of `\'s entered at the lexical level.)
The problem with the historical approach is that there is no way to get
a literal `\' followed by the matched text.
The POSIX standard attempted to fix this problem. The standard
says that sub and gsub look for either a `\' or an `&'
after the `\'. If either one follows a `\', that character is
output literally.
The interpretation of `\' and `&' then becomes
like this:
This would appear to solve the problem.
Unfortunately, the phrasing of the standard is unusual. It
says, in effect, that `\' turns off the special meaning of any
following character, but that for anything other than `\' and `&',
such special meaning is undefined.
This wording leads to two problems.
1. Backslashes must now be doubled in the replacement string, breaking
historical awk programs.
2. To make sure that an awk program is portable, every character
in the replacement string
must be preceded with a
backslash.
The POSIX standard is under revision.
Because of the above problems, proposed text for the revised standard
reverts to rules that correspond more closely to the original existing
practice. The proposed rules have special cases that make it possible
to produce a `\' preceding the matched text.
In a nutshell, at the run-time level, there are now three special sequences
of characters, `\\\&', `\\&' and `\&', whereas historically,
there was only one.
However, as in the historical case, any `\' that
is not part of one of these three sequences is not special, and appears
in the output literally.
gawk 3.0 follows these proposed POSIX
rules for sub and gsub.
Whether these proposed rules will actually become codified into the
standard is unknown at this point. Subsequent gawk releases will
track the standard and implement whatever the final version specifies;
this book will be updated as well.
The rules for gensub are considerably simpler. At the run-time
level, whenever gawk sees a `\', if the following character
is a digit, then the text that matched the corresponding parenthesized
subexpression is placed in the generated output.
Otherwise,
no matter what the character after the `\' is, that character will
appear in the generated text, and the `\' will not.
Because of the complexity of the lexical and run-time level processing,
and the special cases for sub and gsub,
we recommend the use of gawk and gensub for when you have
to do substitutions.
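To see the two levels at work, here is a small sketch that applies sub with three different replacement strings to the same text. The expected results stated below assume an awk that follows the proposed rules described above, as gawk does; an awk that follows only the historical rules may give a different third line. The sample string "abc" is chosen for this illustration.

```shell
awk 'BEGIN {
    s = "abc"; sub(/b/, "[&]", s);    print s   # & stands for the matched text
    s = "abc"; sub(/b/, "\\&", s);    print s   # typed as \\&, seen by sub as \&
    s = "abc"; sub(/b/, "\\\\&", s);  print s   # typed as \\\\&, seen by sub as \\&
}'
```

With gawk this prints `a[b]c`, then `a&c`, then `a\bc`: the matched text, a literal `&', and a literal `\' followed by the matched text.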
Built-in Functions for Input/Output
The following functions are related to Input/Output (I/O).
Optional parameters are enclosed in square brackets ("[" and "]").
close(filename)
Close the file filename, for input or output.
The argument may
alternatively be a shell command that was used for redirecting to or
from a pipe; then the pipe is closed.
See the section on closing input and output files and pipes
for more information.
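A typical reason to call close is to read back a file that the same program has just written. The sketch below writes one line, closes the file so the data is actually written out, and then reads it again. The scratch file from mktemp and the variable names out and line are assumptions made for this illustration.

```shell
tmpfile=$(mktemp)    # scratch file for this sketch
awk -v out="$tmpfile" 'BEGIN {
    print "first line" > out
    close(out)                  # finish writing so the data reaches the file
    while ((getline line < out) > 0)
        print "read back: " line
    close(out)                  # close again after reading
}'
rm -f "$tmpfile"
```

This prints `read back: first line`. Without the first close, the file could still be open for output, with its data sitting in a buffer, when getline tries to read it.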
fflush([filename])
Flush any buffered output associated with filename, which is either a
file opened for writing, or a shell command for redirecting output to
a pipe.
Many utility programs will buffer their output; they save information
to be written to a disk file or terminal in memory, until there is enough
for it to be worthwhile to send the data to the output device.
This is often more efficient than writing
every little bit of information as soon as it is ready.
However, sometimes
it is necessary to force a program to flush its buffers; that is,
write the information to its destination, even if a buffer is not full.
This is the purpose of the fflush function; gawk too
buffers its output, and the fflush function
can be used to force
gawk to flush its buffers.
fflush is a recent (1994) addition to the Bell Labs research
version of awk; it is not part of the POSIX
standard, and will
not be available if `--posix' has been specified on the command
line (see the section on command line options).
gawk extends the fflush function
in two ways.
The first
is to allow no argument at all. In this case, the buffer for the
standard output is flushed.
The second way is to allow the null string
("") as the argument. In this case, the buffers for
all open output files and pipes are flushed.
fflush returns zero if the buffer was successfully flushed,
and nonzero otherwise.
system(command)
The system function
allows the user to execute operating system commands
and then return to the awk program.
The system function
executes the command given by the string command.
It returns, as
its value, the status returned by the command that was executed.
For example, if the following fragment of code is put in your awk
program:
END {
    system("date | mail -s 'awk run done' root")
}
the system administrator will be sent mail when the awk program
finishes processing input and begins its end-of-input processing.
Note that redirecting print or printf into a pipe is often
enough to accomplish your task.
However, if your awk
program is interactive, system is useful for cranking up large
self-contained programs, such as a shell or an editor.
Some operating systems cannot implement the system function.
system causes a fatal error if it is not supported.
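Since system returns the status of the command, you can test it directly in a condition. The following sketch assumes a Unix-like shell where `test -d /` succeeds and `exit 3` ends the command with status 3; both commands are chosen only for this illustration.

```shell
awk 'BEGIN {
    if (system("test -d /") == 0)      # status 0 means the command succeeded
        print "root directory exists"
    status = system("exit 3")          # the shell run by system exits with status 3
    print "status: " status
}'
```

On such a system this prints `root directory exists` followed by `status: 3`.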
Controlling Output Buffering with system
The fflush function
provides explicit control over output buffering for
individual files and pipes.
However, its use is not portable to many other
awk implementations.
An alternative method to flush output
buffers is by calling system with a null string
as its argument:
system("")   # flush output
gawk treats this use of the system function
as a special
case, and is smart enough not to run a shell (or other command
interpreter) with the empty command.
Therefore, with gawk, this
idiom is not only useful, it is efficient.
While this method should work
with other awk implementations, it will not necessarily avoid
starting an unnecessary shell.
(Other implementations may only
flush the buffer associated with the standard output, and not necessarily
all buffered output.)
If you think about what a programmer expects, it makes sense that
system should flush any pending output.
The following program:
print "first print"
system("echo system echo")
print "second print"
must print
first print
system echo
second print
and not
system echo
first print
second print
If awk did not flush its buffers before calling system, the
latter (undesirable) output is what you would see.
Functions for Dealing with Time Stamps
A common use for awk programs is the processing of log files
containing time stamp information, indicating when a
particular log record was written.
Many programs log their time stamp
in the form returned by the time system call, which is the
number of seconds since a particular epoch.
On POSIX systems, it is the number
of seconds since Midnight, January 1, 1970, UTC.
In order to make it easier to process such log files, and to produce
useful reports, gawk provides two functions for working with time
stamps.
Both of these are gawk extensions; they are not specified
in the POSIX standard, nor are they in any other known version
of awk.
Optional parameters are enclosed in square brackets ("[" and "]").
systime()
This function returns the current time as the number
of seconds since
the system epoch.
On POSIX systems, this is the number
of seconds
since Midnight, January 1, 1970, UTC.
It may be a different number on
other systems.
strftime([format [, timestamp]])
This function returns a string.
It is similar to the function of the
same name in ANSI C.
The time specified by timestamp is used to
produce a string, based on the contents of the format string.
The timestamp is in the same format
as the value returned by the systime function.
If no timestamp argument is supplied,
gawk will use the current time of day as the time stamp.
If no format argument is supplied, strftime uses
"%a %b %d %H:%M:%S %Z %Y".
This format string produces
output (almost) equivalent to that of the date utility.
(Versions of gawk prior to 3.0 require the format
argument.)
The systime function
allows you to compare a time stamp from a
log file with the current time of day.
In particular, it is easy to
determine how long ago a particular record was logged.
It also allows
you to produce log records using the "seconds since the epoch" format.
The strftime function
allows you to easily turn a time stamp
into human-readable information.
It is similar in nature to the sprintf function
(see the description of sprintf among the string functions),
in that it copies non-format specification characters verbatim to the
returned string, while substituting date and time values for
format specifications in the format string.
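Here is a minimal sketch that uses both functions together. It assumes an awk that provides these extensions (gawk, or another implementation that has adopted them) and a system whose epoch is January 1, 1970, UTC. The time stamp 86400 is a made-up log value, one day after the epoch, and setting TZ=UTC keeps the printed date independent of your local time zone.

```shell
# TZ=UTC makes the formatted output independent of the local time zone
TZ=UTC awk 'BEGIN {
    stamp = 86400                 # hypothetical log time stamp: one day after the epoch
    print strftime("%Y-%m-%d %H:%M:%S", stamp)
    age = systime() - stamp       # seconds elapsed since the record was written
    print (age > 0) ? "record is in the past" : "record is in the future"
}'
```

Under those assumptions the first line printed is `1970-01-02 00:00:00`, and the second reports that the record is in the past.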
strftime is guaranteed by the ANSI C
standard to support
the following date format
specifications:
%a
The locale's abbreviated weekday name.
%A
The locale's full weekday name.
%b
The locale's abbreviated month name.
%B
The locale's full month name.
%c
The locale's "appropriate" date and time representation.
%d
The day of the month as a decimal number (01--31).
%H
The hour (24-hour clock) as a decimal number (00--23).
%I
The hour (12-hour clock) as a decimal number (01--12).
%j
The day of the year as a decimal number (001--366).
%m
The month as a decimal number (01--12).
%M
The minute as a decimal number (00--59).
%p
The locale's equivalent of the AM/PM designations associated
with a 12-hour clock.
%S
The second as a decimal number.
%U
The week number of the year (the first Sunday as the first day of week one)
as a decimal number (00--53).
%w
The weekday as a decimal number (0--6).
Sunday is day zero.
%W
The week number of the year (the first Monday as the first day of week one)
as a decimal number (00--53).
%x
The locale's "appropriate" date representation.
%X
The locale's "appropriate" time representation.
%y
The year without century as a decimal number (00--99).
%Y
The year with century as a decimal number
(e.g., 1995).
%Z
The time zone name or abbreviation, or no characters if
no time zone is determinable.
%%
A literal `%'.
If a conversion specifier is not one of the above, the behavior is
undefined.
Informally, a locale is the geographic place in which a program
is meant to run.
For example, a common way to abbreviate the date
September 4, 1991 in the United States would be "9/4/91".
In many countries in Europe, however, it would be abbreviated "4.9.91".
Thus, the `%x' specification in a "US" locale might produce
`9/4/91', while in a "EUROPE" locale, it might produce
`4.9.91'.
The ANSI C standard defines a default "C"
locale, which is an environment that is typical of what most C
programmers
are used to.
A public-domain C
version of strftime is supplied with gawk
for systems that are not yet fully ANSI-compliant.
If that version is
used to compile gawk (see the section on installing gawk),
then the following additional format
specifications are available:
%D
Equivalent to specifying `%m/%d/%y'.
%e
The day of the month, padded with a space
if it is only one digit.
%h
Equivalent to `%b', above.
%n
A newline character (ASCII LF).
%r
Equivalent to specifying `%I:%M:%S %p'.
%R
Equivalent to specifying `%H:%M'.
%T
Equivalent to specifying `%H:%M:%S'.
%t
A TAB character.
%k
The hour (24-hour clock) as a decimal number (0--23).
Single digit numbers are padded with a space.
%l
The hour (12-hour clock) as a decimal number (1--12).
Single digit numbers are padded with a space.
%C
The century, as a number
between 00 and 99.
%u
The weekday as a decimal number
[1 (Monday)--7].
%V
The week number of the year (the first Monday as the first
day of week one) as a decimal number (01--53).
The method for determining the week number
is as specified by ISO 8601
(to wit: if the week containing January 1 has four or more days in the
new year, then it is week one, otherwise it is week 53 of the previous year
and the next week is week one).
%G
The year with century of the ISO week number, as a decimal number.
For example, January 1, 1993, is in week 53 of 1992. Thus, the year
of its ISO week
is 1992, even though its year is 1993.
Similarly, December 31, 1973, is in week 1 of 1974. Thus, the year
of its ISO week
is 1974, even though its year is 1973.
%g
The year without century of the ISO week number, as a decimal
number (00--99).
%Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI
%Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
These are "alternate representations" for the specifications
that use only the second letter (`%c', `%C', and so on).
They are recognized, but their normal representations are
used.
(These facilitate compliance with the POSIX
date utility.)
%v
The date in VMS format
(e.g., 20-JUN-1991).
%z
The timezone offset in a +HHMM format
(e.g., the format
necessary to
produce RFC-822/RFC-1036 date headers).
This example is an awk implementation of the POSIX
date utility.
Normally, the date utility prints the
current date and time of day in a well known format.
However, if you
provide an argument to it that begins with a `+', date
will copy non-format specifier characters to the standard output, and
will interpret the current time according to the format
specifiers in the string.
For example:
$ date '+Today is %A, %B %d, %Y.'
-| Today is Thursday, July 11, 1991.
Here is the gawk version of the date utility.
It has a shell "wrapper", to handle the `-u' option,
which requires that date run as if the time zone
was set to UTC.
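#! /bin/sh
#
# date -- approximate the P1003.2 'date' command
case $1 in
-u) TZ=UTC0     # run as if the time zone were UTC
    export TZ
    shift ;;
esac
gawk 'BEGIN {
    format = "%a %b %d %H:%M:%S %Z %Y"
    exitval = 0
    if (ARGC > 2)
        exitval = 1
    else if (ARGC == 2) {
        format = ARGV[1]
        if (format ~ /^\+/)
            format = substr(format, 2)   # remove leading +
    }
    print strftime(format)
    exit exitval
}' "$@"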