Sunday 26 September 2010

Using macros

In the latest version of pyxplot, macros have been introduced. They work as they do in gnuplot, except that one doesn't have to enable them beforehand. (Quite frankly, I have never understood why macros are disabled by default in gnuplot.) So, in the simplest case, we can use a macro to abbreviate a couple of things, as here:

mac = "with lines lw 2 lt 1"
plot sin(x) @mac
In this example, the contents of the string variable mac are substituted literally into the command line, so our plot is, in effect, equivalent to
plot sin(x) with lines lw 2 lt 1
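Conceptually, a macro is plain text substitution carried out before the command is parsed. As a rough cross-check in Python (this is not pyxplot's own code), the hypothetical function expand_macros below replaces each @name with the string stored under that name. The rule that macro names consist of letters, digits and underscores is an assumption made for the sketch.

```python
import re


def expand_macros(line, variables):
    """Replace every @name in line with the string stored in variables[name].

    Hypothetical illustration of literal macro substitution; macro names are
    assumed to consist of letters, digits and underscores.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("undefined macro: " + name)
        return str(variables[name])

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", substitute, line)


variables = {"mac": "with lines lw 2 lt 1"}
print(expand_macros("plot sin(x) @mac", variables))
# plot sin(x) with lines lw 2 lt 1
```

The point is that the substitution knows nothing about plotting: whatever text the variable holds is pasted in, and only then is the command interpreted.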
At this point, I should perhaps mention that this is not the only way of abbreviating plot options. One can also use the set style command as follows:
set style 2 points pointtype 3
plot sin(x) with style 2

Now, back to macros! Do you remember the data-processing script from last time? We wanted to calculate the average of a function of a particular column. That construct worked as long as we didn't want to combine data from several columns. You might also recall that combining columns was possible only by tabulating the function values (i.e., writing them to a file) and then re-reading them for processing. With macros, we can achieve what we want rather easily. Consider the following subroutine:
subroutine mean(filename, func)
{
        N_data = 0
        sum_x = 0

        # the macro @func is pasted literally after "using"
        foreach datum x in filename using @func
        {
                N_data = N_data + 1
                sum_x = sum_x + x
        }
        if(N_data > 0)
        {
                return sum_x / N_data
        } else
        {
                return NaN
        }
}
The arguments are a filename and a string that is used as a macro. Since the macro is substituted literally into the command line, we can pass an arbitrary expression to our routine. That is, we can say
print mean('data.dat', "sin($1)*exp($2)")
and this will return the average of the product of the sine of the first column and the exponential of the second column in our data file.
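Because the macro ends up in the using clause, the expression may refer to any column. A Python analogue of mean passes a function of the whole row instead of a string. This is only a sketch: the name mean_of, the whitespace-separated format and the skipping of blank lines and lines starting with # are assumptions, and row[0] plays the role of $1, row[1] that of $2.

```python
import math


def mean_of(lines, func):
    """Average func(row) over all data rows; NaN if there are none.

    row is a tuple of floats: row[0] corresponds to $1, row[1] to $2.
    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    n_data = 0
    sum_x = 0.0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        row = tuple(float(v) for v in stripped.split())
        n_data += 1
        sum_x += func(row)
    return sum_x / n_data if n_data > 0 else math.nan


# illustrative rows: sin($1)*exp($2) gives 0 and 1, so the mean is 0.5
data = ["# x y", "0.0 0.0", "1.5707963267948966 0.0"]
print(mean_of(data, lambda r: math.sin(r[0]) * math.exp(r[1])))
```

With a real file, one would pass the open file object instead of the list, e.g. with open('data.dat') as fh: mean_of(fh, ...).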

But that is not all! We can also use macros to emulate arrays that we can manipulate, as long as their elements are numbers. Let us see how this could be done with the following three subroutines:
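subroutine a_vec(a, b, c)
{
        # build a variable name such as "a12" and assign c to it
        tmpstr = "%s%d"%(a,b)
        let @tmpstr = c
}

subroutine c_vec(a, b)
{
        # assigning nothing undefines the variable, i.e. deletes the element
        tmpstr = "%s%d"%(a,b)
        let @tmpstr =
}

subroutine vec(a, b)
{
        # return the value of the variable named, e.g., "a12"
        tmpstr = "%s%d"%(a, b)
        return @tmpstr
}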
The first one, when called as
call a_vec("a", 12, 123)
call a_vec("a", 13, 11) 
will create two variables, a12 and a13, with the values 123 and 11. The subroutine c_vec(a, b) deletes the b-th element of vector a, and vec(a, b) simply returns the b-th element of vector a. So, if we call

print vec("a", 12)*vec("a", 13) 
then 1353 (that is, 123*11) will be returned. You can easily see that a subroutine can now be written to fill up an array from a file, and that, e.g., the scalar product of two vectors can be calculated in a straightforward way. This method works as long as the result of a calculation is a single number. That is, while we can calculate the vector product of two vectors, we cannot return it, simply because the result is a vector, and pyxplot would not know what to return.
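Behind the scenes, a_vec, c_vec and vec treat the variable namespace as a lookup table keyed by generated names such as a12. The Python sketch below makes that explicit with a dictionary standing in for pyxplot's variables, and adds a hypothetical dot helper for the scalar product mentioned above. The helper names mirror the subroutines; the dictionary and dot are assumptions of the sketch.

```python
import math

variables = {}  # stands in for pyxplot's global variable namespace


def a_vec(a, b, c):
    """Set element b of 'vector' a by creating a key such as 'a12'."""
    variables["%s%d" % (a, b)] = c


def c_vec(a, b):
    """Delete element b of vector a (like 'let a12 =' in pyxplot)."""
    variables.pop("%s%d" % (a, b), None)


def vec(a, b):
    """Return element b of vector a."""
    return variables["%s%d" % (a, b)]


def dot(a, b, indices):
    """Scalar product of vectors a and b over the given indices (hypothetical helper)."""
    return math.fsum(vec(a, i) * vec(b, i) for i in indices)


a_vec("a", 12, 123)
a_vec("a", 13, 11)
print(vec("a", 12) * vec("a", 13))  # 1353
```

A vector product, by contrast, would have to write its three components back with a_vec under a new name, rather than return them, which is exactly the limitation described above.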

Saturday 25 September 2010

Pyxplot 0.8.3 released

This is just a short announcement: pyxplot version 0.8.3 has just been released. You can download the source from sourceforge, or from the old web site. Besides a few bug fixes, this version introduces new point types and the concept of macros. I will put up a longer post later today discussing these new features. Bugs should be reported on the bug tracker. For discussions on usage and for feature requests, please visit the project forums.
Cheers,
Zoltán

Thursday 9 September 2010

Pyxplot moved to sourceforge

This is just a short announcement: a new website has been set up for pyxplot on sourceforge, at pyxplot.sourceforge.net. You should find the latest source code there, as well as a discussion forum for, well, discussing problems and questions, and for asking for help. You can also post feature requests and file bug reports there.
Cheers,
Zoltán

Thursday 26 August 2010

A new beginning

As I announced on my older blog, gnuplot-trick.blogspot.com, I will run a parallel one dealing with pyxplot-related matters. Pyxplot is quite similar to gnuplot in its command structure, though there are some subtle differences. However, it has some features that gnuplot doesn't, and these make life rather easy when it comes to data manipulation. If you recall from that old blog of mine, data manipulation was not an easy task in gnuplot, and in some sense, it was almost deemed illegal. The argument of the developers was that gnuplot is a plotting utility, not a data processing unit, so why should it manipulate data at all? I have tried to argue on a number of occasions that plotting and data processing are part of the very same task and cannot be disentangled, but I had the feeling that my pleas fell on deaf ears. Anyway, it seems that some other people have had the same experience and decided to do something about it. Pyxplot is a well-written plotting utility with a number of conveniences that you will certainly appreciate. On this blog, I will discuss a number of methods for producing informative graphs with pyxplot. I hope you will enjoy it!
Cheers,
Zoltán

Data manipulation and statistics with pyxplot

One of the handiest features of pyxplot is the ease with which common tasks can be grouped into procedures and then called as functions. For a start, we will define a function that calculates the average of a column in a file. The general form of a subroutine is
subroutine my_function(argument list)
{
      do whatever we need
      return something, if necessary
}
To calculate the average, we first have to access the data. That is easily done by calling
foreach datum foo in 'bar' using baz

which, instead of plotting, takes the numbers in column 'baz' of file 'bar' and puts them, one at a time, into the variable 'foo'. Once the variable is assigned, we can do whatever we want with it. Our average-calculating routine would then look like this:
subroutine ave(filename, column)
{
        sum_x = 0
        N = 0
        foreach datum x in filename using $(column) 
        {
                N = N + 1
                sum_x = sum_x + x
        }
        if(N > 0) 
        {
                return sum_x / N
        } else 
        {
                return NaN
        }
}
First, we initialise two variables, sum_x and N; then we read the values in column 'column' of 'filename' into x, one record at a time. For each record, we increment N by one, so that we know how many records there are in the file, and add x to sum_x. At the end, depending on the number of records, we return either the average or NaN, to indicate that the file contained no data.
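For comparison, here is the same accumulation in Python. It is a sketch that assumes the data have already been read into a list of rows (tuples of floats); the 1-based column argument mimics $(column), so column 2 means row[1].

```python
import math


def ave(rows, column):
    """Mean of the given 1-based column; NaN if there are no rows."""
    n = 0
    sum_x = 0.0
    for row in rows:
        n += 1
        sum_x += row[column - 1]  # column 2 -> row[1], like $2
    return sum_x / n if n > 0 else math.nan


rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]  # illustrative data
print(ave(rows, 2))  # 5.0
```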

Having defined the function, we can call it from the command line as
x = ave('data1.dat', 2)
print x
which will print out the average of the second column in 'data1.dat'. A function which calculates the standard deviation could be defined as follows.

subroutine std(filename, column)
{
        sum_x = 0
        sum_sq = 0
        N = 0
        foreach datum x in filename using $(column) 
        {
                N = N + 1
                sum_x = sum_x + x
                sum_sq = sum_sq + x * x
        }
        if(N > 0) 
        {
                sum_x = sum_x / N
                return sqrt(sum_sq / N - sum_x * sum_x)
        } else 
        {
                return NaN
        }
}
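The std routine relies on the one-pass identity sigma = sqrt(<x^2> - <x>^2), where <...> denotes the average over the N records, so it returns the population standard deviation (dividing by N, not N-1). A Python version can be checked against statistics.pstdev from the standard library. The max(..., 0.0) guard is an addition of this sketch: when the mean is large compared with the spread, rounding can make the difference slightly negative, and a running (Welford-type) update would be a more robust alternative.

```python
import math
import statistics


def std(values):
    """Population standard deviation via sqrt(<x^2> - <x>^2); NaN if empty."""
    n = 0
    sum_x = 0.0
    sum_sq = 0.0
    for x in values:
        n += 1
        sum_x += x
        sum_sq += x * x
    if n == 0:
        return math.nan
    mean = sum_x / n
    # guard against a tiny negative variance caused by rounding
    return math.sqrt(max(sum_sq / n - mean * mean, 0.0))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std(data))                # 2.0
print(statistics.pstdev(data))  # 2.0, the library's population std
```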
This works fine when we want to calculate the statistics of a column in a data file. But what if we also want to modify the data first? For instance, we might need not the average of the data per se, but the average of the sine of the data points. What should we do then?

Well, the trivial solution is to hard-wire the expression in the average routine that we defined above. However, this does not lend itself to very flexible use, does it? So, is there a way to supply the data file, the column that we want to process, and the expression that we want to apply? Sure there is; in fact, it is rather simple. All we have to do is add a string argument to our subroutine, and evaluate the string to yield a function that we can apply to the data points. Thus, our new version of the average subroutine reads as follows:
subroutine ave_e(filename, column, expr)
{
        str_temp = "f(x) = %s"%(expr)
        exec str_temp
        sum_x = 0
        N = 0
 
        foreach datum x in filename using $(column)
        {
                N = N + 1
                sum_x = sum_x + f(x)
        }
        if(N > 0)
        {
                return sum_x / N
        } else
        {
                return NaN
        }
}
The first two arguments are as above, and the third one is a string holding the expression. In the subroutine, we first create a new string, which is nothing but the function definition. Note that the sprintf function as such does not exist in pyxplot, but we have
str_temp = "f(x) = %s"%(expr)
instead. Apart from the slightly different syntax, it behaves like sprintf in gnuplot. Also note that string concatenation does not work as in gnuplot, where a dot between two string variables joins them. Instead, to append one string to another, one can write
a = "foo"
b = "bar"
c = "%s%s"%(a,b)
Now, back to our average subroutine! At this point, we have a string that contains the definition of our new function. In gnuplot, we would just use eval to turn this into an actual definition; in pyxplot, we use exec instead. The result is the same: our string has been promoted to the rank of a full-fledged function, which we can call in the usual way. In the body of the foreach loop, we add f(x) instead of x, and we are done. So, if we wanted to calculate the average of sin(x*tan(x)+cos(x)) over the 2nd data column of file 'foo', we would call our subroutine as
print ave_e('foo', 2, "sin(x*tan(x)+cos(x))")
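The exec trick has a direct counterpart in Python: compile the expression string once, then evaluate it for every data point. In this sketch, ave_e mirrors the pyxplot subroutine but takes a list of values rather than a file; exposing only the math module's names to the expression is an assumption made for illustration (and evaluating untrusted strings this way is still not safe).

```python
import math


def ave_e(values, expr):
    """Average of expr (written in terms of x, e.g. "sin(x)") over values; NaN if empty.

    Mirrors the pyxplot idea of turning a string into f(x) with exec.
    Only names from the math module are visible to the expression.
    """
    namespace = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}
    code = compile(expr, "<expr>", "eval")

    def f(x):
        return eval(code, {"__builtins__": {}}, dict(namespace, x=x))

    values = list(values)
    if not values:
        return math.nan
    return sum(f(x) for x in values) / len(values)


print(ave_e([0.0, 1.0, 2.0], "x * x"))                  # (0 + 1 + 4) / 3
print(ave_e([0.0, 0.5, 1.0], "sin(x*tan(x)+cos(x))"))
```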
And finally, what happens when we have to calculate the average of some quantity that depends on two or more columns? In that case, we can print the manipulated data to a file and apply the average function to the new data set. In gnuplot, we would simply plot the data after issuing set table, so that the values go to a file instead of a graph. In pyxplot, we use the tabulate command to indicate that we do not want to plot, but to send the output to a data file. Thus, if we need the average of sin($1)*cos($2), we can do the following:
set output 'bar.dat'
tabulate 'foo.dat' using (sin($1)*cos($2))
print ave('bar.dat', 1)
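The detour through an intermediate file can be mimicked in Python as a cross-check. This is only a sketch: tabulate and ave_file are hypothetical helpers (not pyxplot commands) that write one value per line and read back a 1-based column, and the three rows are illustrative. Writing with %.17g preserves every double exactly, so the file-based average agrees with the direct computation.

```python
import math
import os
import tempfile


def tabulate(rows, func, path):
    """Write func(row) for each row to path, one value per line."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("%.17g\n" % func(row))


def ave_file(path, column):
    """Average of a 1-based column of a whitespace-separated file; NaN if empty."""
    values = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if fields:
                values.append(float(fields[column - 1]))
    return sum(values) / len(values) if values else math.nan


def product(row):
    """sin($1)*cos($2) for one row."""
    return math.sin(row[0]) * math.cos(row[1])


rows = [(0.0, 0.0), (math.pi / 2, 0.0), (1.0, 2.0)]
path = os.path.join(tempfile.mkdtemp(), "bar.dat")
tabulate(rows, product, path)
print(ave_file(path, 1))
print(sum(product(r) for r in rows) / len(rows))  # same value, no file needed
```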
I hope that today I could convince you that data can very easily be accessed and manipulated in pyxplot. Next time I will discuss "real" plotting.
So long,
Zoltán