@bynect
Created June 28, 2022 17:17
Count ASCII character occurrences in any UTF-8 file
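#include <algorithm>
#include <cctype>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <utility>

int main(int argc, char **argv)
{
    // Use the environment's locale so that std::wifstream decodes the input
    // (UTF-8 under a UTF-8 locale) into wide characters.
    std::locale::global(std::locale(""));

    // One (character, count) pair per code 0..126; DEL (127) is not counted.
    std::pair<char, std::uint64_t> stats[127] = {};
    for (int i = 0; i < 127; i++) stats[i].first = char(i);

    for (int i = 1; i < argc; i++)
    {
        std::wifstream wf(argv[i]);
        if (!wf)
        {
            std::cerr << "cannot open " << argv[i] << std::endl;
            continue;
        }

        for (wchar_t c; wf.get(c); )
        {
            // wchar_t may be a signed type, so check the lower bound too.
            if (c >= 0 && c < 127) stats[c].second++;
        }
    }

    // Most frequent characters first.
    std::sort(std::begin(stats), std::end(stats), [](const auto &a, const auto &b)
    {
        return a.second > b.second;
    });

    for (int i = 0; i < 127; i++)
    {
        char c = stats[i].first;
        // std::isprint expects a value representable as unsigned char.
        if (std::isprint(static_cast<unsigned char>(c))) std::cout << c;
        else std::cout << int(c);
        std::cout << ": " << stats[i].second << std::endl;
    }
    return 0;
}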