Processing large data: counting tokens

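Continuing the efficiency measurements. This time the task is counting how many times a character occurs.
The target data is the same as last time: 3 million lines, about 600 MB.
For each variant, only the relevant part inside the loop is shown.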
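For context, here is a minimal sketch of the kind of loop these fragments sit in. The post does not show its harness, so the in-memory sample data, the variable names and the use of Time::HiRes for timing are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Assumption: a tiny in-memory string stands in for the real 600 MB file.
my $data = "a b c\nd e\nf\n";
open my $fh, '<', \$data or die "open failed: $!";

my $total = 0;
my $start = time;
while (<$fh>) {
    chomp;
    my $count = tr/ //;    # one of the fragments goes here
    $total += $count;
}
my $elapsed = time - $start;
close $fh;

printf "spaces: %d (%.6f s)\n", $total, $elapsed;
```

Swapping the marked line for another fragment and re-running gives a rough per-variant timing.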

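 # Substitution: replaces each space with itself and returns the number
 # of replacements ('' rather than 0 when nothing matches).
 $count = $_ =~ s/ / /g;

 # index() loop: search again from just past the previous hit.
 $count = 0;
 $pos = -1;
 while(($pos = index($_,' ',$pos + 1)) >= 0){
   $count++;
 }

 # Transliteration: returns the number of matching characters.
 $count = $_ =~ tr/ / /;

 # split into an array, then count fields minus one.
 # Note: split drops trailing empty fields, so this undercounts
 # if the line ends with spaces.
 @list = split(/ /,$_);
 $count = scalar(@list) -1;

 # split in scalar context: field count without storing an array.
 $count = split(/ /,$_);
 $count--;

 # Two-character token ", ": substitution.
 $count = $_ =~ s/, /, /g;

 # Two-character token ", ": index() loop, stepping by the token length.
 $count = 0;
 $pos = -2;
 while(($pos = index($_,', ',$pos + 2)) >= 0){
   $count++;
 }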

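For a single character, tr appears to be the fastest.
It cannot be used for tokens of two or more consecutive characters, so in that case
counting with a regular expression is probably the safe choice.
As noted last time, split is heavy, but storing the result in an array is
heavier still, so avoid assigning to an array unless you actually need it.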
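To reproduce this kind of comparison on a small scale, something like the following could be used. It is only a sketch. The sample line, the sub names and the `bench` command-line switch are made up for illustration, and the core Benchmark module is swapped in for whatever timing method the measurements above used. It also includes the non-modifying `() = /.../g` counting idiom as an alternative to `s///g`. Results will differ by Perl version and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line: 50 fields joined by ", ",
# which gives 99 spaces and 49 ", " tokens.
my $line = join(', ', ('field value') x 50);

# Single-character token: count spaces.
sub count_tr    { my $n = ($line =~ tr/ //); return $n }
sub count_regex { my $n = () = $line =~ / /g; return $n }
sub count_index {
    my ($n, $pos) = (0, -1);
    $n++ while ($pos = index($line, ' ', $pos + 1)) >= 0;
    return $n;
}
# A limit of -1 keeps trailing empty fields so the count stays exact.
sub count_split       { my $n = split(/ /, $line, -1); return $n - 1 }
sub count_split_array { my @list = split(/ /, $line, -1); return scalar(@list) - 1 }

# Two-character token ", ": tr cannot express this.
sub count2_regex { my $n = () = $line =~ /, /g; return $n }
sub count2_index {
    my ($n, $pos) = (0, -2);
    $n++ while ($pos = index($line, ', ', $pos + 2)) >= 0;
    return $n;
}

printf "spaces: %d, comma-space tokens: %d\n", count_tr(), count2_regex();

# The timing run takes several seconds, so it only runs on request:
#   perl count_bench.pl bench
if (@ARGV && $ARGV[0] eq 'bench') {
    cmpthese(-1, {
        tr          => \&count_tr,
        regex       => \&count_regex,
        index       => \&count_index,
        split       => \&count_split,
        split_array => \&count_split_array,
    });
}
```

The subs read a file-scoped `$line` rather than `$_` so that each variant can be checked on its own before timing.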