Re: counting words

From: Markus Mottl (mottl@miss.wu-wien.ac.at)
Date: Wed Jan 20 1999 - 03:01:30 MET


From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199901200201.DAA17202@miss.wu-wien.ac.at>
Subject: Re: counting words
To: tm1@cise.npl.co.uk (Toby Moth)
Date: Wed, 20 Jan 1999 03:01:30 +0100 (MET)
In-Reply-To: <199901191647.QAA06463@squall.cise.npl.co.uk> from "Toby Moth" at Jan 19, 99 04:47:19 pm

Hello,

> Here are some results just to give a flavour of relative speeds on an Ultra 5.
>
> [tm1]$ ls -l tm.dv
> -rw-rw-r-- 1 tm1 this 452849 Jan 13 13:06 tm.dv
> [tm1]$ time cat tm.dv | perl ~/lang/perl/wc.pl > pcount
>
> real 0m28.604s PERL
> user 0m28.260s
> sys 0m0.100s
> [tm1]$ time cat tm.dv | ~/lang/c/wc > ccount
>
> real 0m2.708s C++
> user 0m2.260s
> sys 0m0.180s
> [tm1]$ time cat tm.dv | ~/lang/caml/ywc > mcount
>
> real 0m0.999s OCAML
> user 0m0.720s
> sys 0m0.080s

Since comparing the performance of compilers is one of my hobbies ;-),
I have tried to follow this "contest" and have conducted a more thorough
test:

I implemented the solution to the problem with "flex" in C++, both with
maps and hash tables; then I wrote the PERL-version with hashes, too
(*very* short!) and then I took a look at the performance of the native
code compiler and interpreter of OCAML. The result of all runs had an
equal amount of characters (though, not in the same order, because hashes
don't sort). Thus, the implementation seems to be correct.

The test was conducted under Linux (PII 400Mhz, 128MB RAM), the
C++-compiler is "egcs-2.91.57" (optimizations: -O2 -fomit-frame-pointer).
The test case was the concatenated text of all the Linux HOWTO's on my
machine (about 7.3MB).

One of the problems with the C++-version (besides an internal compiler
error when trying to code the solution ;-) is that the number of buckets
in the hash table cannot be fixed at a low level. If the table is
created without parameters, it starts out with 193 buckets and increases
this number to 196613! To make up for this, I conducted the test under
OCAML with 101 buckets (as used by Albert) and the maximum found in C++
(196613). In another test I let C++ start out with this high level -
I think that we have a really fair comparison now.

Another problem with C++ was that finding the right use of templates was
everything but easy - the compiler not only sometimes emitted screen-fulls
of expanded templates as error message, but I'd say that the C++-code
looks really ill now! (I hope it's not my fault...)
By the way: I didn't use the STL-"string"-class for efficiency reasons...

Anyway, the speed result is very interesting indeed (in order from best to
worst, three successive tests each):

ocamlopt & ocamllex (196613 buckets):

4.270u 0.150s 0:04.42 100.0% 0+0k 0+0io 542pf+0w
4.250u 0.180s 0:04.66 95.0% 0+0k 0+0io 542pf+0w
4.240u 0.180s 0:04.42 100.0% 0+0k 0+0io 542pf+0w

ocaml (interpreted) & ocamllex (196613 buckets):

9.080u 0.180s 0:09.25 100.1% 0+0k 0+0io 539pf+0w
9.120u 0.150s 0:09.26 100.1% 0+0k 0+0io 539pf+0w
8.990u 0.270s 0:09.45 97.9% 0+0k 0+0io 539pf+0w

ocamlopt & ocamllex (101 buckets):

11.160u 0.270s 0:11.43 100.0% 0+0k 0+0io 542pf+0w
11.200u 0.240s 0:11.44 100.0% 0+0k 0+0io 542pf+0w
11.250u 0.200s 0:11.44 100.0% 0+0k 0+0io 542pf+0w

PERL:

11.340u 0.240s 0:11.58 100.0% 0+0k 0+0io 643pf+0w
11.380u 0.200s 0:11.58 100.0% 0+0k 0+0io 643pf+0w
11.350u 0.230s 0:11.58 100.0% 0+0k 0+0io 643pf+0w

C++ & flex (hash_map, 196613 buckets):

11.560u 0.340s 0:11.90 100.0% 0+0k 0+0io 552pf+0w
11.620u 0.280s 0:11.90 100.0% 0+0k 0+0io 552pf+0w
11.570u 0.320s 0:11.89 100.0% 0+0k 0+0io 552pf+0w

C++ & flex (hash_map, 193++ buckets):

11.780u 0.260s 0:12.04 100.0% 0+0k 0+0io 552pf+0w
11.720u 0.310s 0:12.03 100.0% 0+0k 0+0io 552pf+0w
11.870u 0.160s 0:12.03 100.0% 0+0k 0+0io 552pf+0w

C++ & flex (map):

14.530u 0.290s 0:14.82 100.0% 0+0k 0+0io 551pf+0w
14.540u 0.250s 0:14.80 99.9% 0+0k 0+0io 551pf+0w
14.630u 0.180s 0:14.81 100.0% 0+0k 0+0io 551pf+0w

ocaml (interpreted) & ocamllex (101 buckets):
20.450u 0.130s 0:20.58 100.0% 0+0k 0+0io 541pf+0w
20.300u 0.270s 0:20.57 100.0% 0+0k 0+0io 539pf+0w
20.450u 0.150s 0:20.59 100.0% 0+0k 0+0io 539pf+0w

The funny thing is that the OCAML-interpreter beats off all other
competitors (not the OCAML native code compiler, of course)!
Please tell this your local C++-guru :-)
Even PERL beats C++ without problem.

OCAML translated to native code is *way* faster than C++ (far more
than twice!)

The slowest competitor was the OCAML-interpreter with the unfair bucket
count of 101 - I just added it to show a result for a not unreasonable
number of buckets (196613 is probably a bit exaggerated...)

Now for the source code (shortest first):

PERL (yes, it *can* be this short!):
---------------------------------------------------------------------------
while(<>) { for (/[^ \t\n\r]+/g) { ++$table{$_}; } }
while (($key, $value) = each %table) { printf "%4d\t\%s\n", $value, $key; }
---------------------------------------------------------------------------

ocamllex-file (note: 196613 buckets):
---------------------------------------------------------------------------
{
let hash = Hashtbl.create 101

let hash_add string =
  try
    let v = Hashtbl.find hash string in v:= !v + 1
  with
    Not_found -> Hashtbl.add hash string (ref 1)
}

rule read = parse
  [' ' '\n' '\r' '\t']+ { read lexbuf }
| [^' ' '\n' '\r' '\t']+ { hash_add ( Lexing.lexeme lexbuf ); read lexbuf }
| eof { Hashtbl.iter ( fun k v -> Printf.printf "%4d\t%s\n" !v k ) hash }

{ let _ = read_rec ( Lexing.from_channel stdin ) 0 }
---------------------------------------------------------------------------

flex and C++ with hash (note: 196613 buckets):
Don't forget to use option "-+" to get C++-code!
---------------------------------------------------------------------------
%option noyywrap

  #include <stdio.h>
  #include <string.h>
  #include <hash_map>

  struct eqstr
  {
    bool operator()(const char* s1, const char* s2) const
    {
      return strcmp(s1, s2) == 0;
    }
  };

  typedef hash_map<const char*, int, hash<const char *>, eqstr > mymap;
  mymap vec(196613);

%%

[^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext);
              ++vec[foo]; }
[ \t\n\r]+ ;

%%

int main()
{
  yyFlexLexer MyScanner;
  MyScanner.yylex();
  for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p)
    printf ("%4d\t%s\n", p->second, p->first);
}
---------------------------------------------------------------------------

flex and C++ with map:
Don't forget to use option "-+" to get C++-code!
---------------------------------------------------------------------------
%option noyywrap

  #include <stdio.h>
  #include <string>
  #include <map>

  struct less_str
  {
    bool operator()(const char* s1, const char* s2) const
    {
      return strcmp(s1, s2) < 0;
    }
  };

  typedef map<const char*, int, less_str> mymap;
  mymap vec;

%%

[^ \t\n\r]+ { char *foo = new char[yyleng + 1]; strcpy(foo, yytext);
              ++vec[foo]; }
[ \t\n\r]+ ;

%%

int main()
{
  yyFlexLexer MyScanner;
  MyScanner.yylex();
  for (mymap::const_iterator p = vec.begin(); p != vec.end(); ++p)
    printf ("%4d\t%s\n", p->second, p->first);
}
---------------------------------------------------------------------------

So much about C++ and efficiency ;-)

Interesting to see that PERL did perform very well on this test. Its
brevity is unmatched! One shouldn't forget that its hash functions
are highly optimized, its regular expression matching capabilities,
as well. Though, it is definitely not a language for implementing data
structures, as we have seen in Albert's example.

I think that this test is really significant. I cannot imagine how to
make C++ so much faster that it approaches OCAML - without low-level
coding hash tables and pattern matching.

This test showed me, what a nice life I have since I use OCAML instead of
C++ (not only in terms of efficiency...)

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:18 MET