shoco is a fast compressor for short strings in C and JavaScript.
It is easy to use. The default compression model is optimized for English words, but you can generate your own model from your specific input data.
A quick comparison of shoco, gzip, and xz, from the README:
compressor | compression time | decompression time | compressed size |
---|---|---|---|
shoco | 0.070s | 0.010s | 3,393,975 |
gzip | 0.470s | 0.048s | 1,476,083 |
xz | 3.300s | 0.148s | 1,229,980 |
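To make the trade-off in the table concrete, here is a small sketch that computes the ratios between the rows. The numbers are copied from the table above; the struct and function names (`result`, `slowdown`, `size_penalty`, `print_tradeoff`) are made up for this illustration and are not part of shoco.

```c
#include <stdio.h>

/* One row of the comparison table above. */
struct result {
    const char *name;
    double compress_s;     /* compression time in seconds */
    double decompress_s;   /* decompression time in seconds */
    long compressed_size;  /* compressed size as listed in the table */
};

static const struct result results[] = {
    {"shoco", 0.070, 0.010, 3393975L},
    {"gzip",  0.470, 0.048, 1476083L},
    {"xz",    3.300, 0.148, 1229980L},
};

/* How many times longer `other` takes to compress than `base`. */
double slowdown(const struct result *base, const struct result *other) {
    return other->compress_s / base->compress_s;
}

/* How many times larger the output of `base` is than the output of `other`. */
double size_penalty(const struct result *base, const struct result *other) {
    return (double)base->compressed_size / (double)other->compressed_size;
}

/* Print shoco's speed advantage and size cost against each other row. */
void print_tradeoff(void) {
    const struct result *shoco = &results[0];
    for (size_t i = 1; i < sizeof results / sizeof results[0]; i++) {
        printf("%s: %.1fx slower to compress, shoco output %.1fx larger\n",
               results[i].name,
               slowdown(shoco, &results[i]),
               size_penalty(shoco, &results[i]));
    }
}
```

By these figures shoco compresses roughly 6.7 times faster than gzip and 47 times faster than xz, while its output is about 2.3 and 2.8 times larger respectively.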
It is primarily a C implementation. The API consists of two functions:
size_t shoco_compress(const char * in, size_t len, char * out, size_t bufsize);
size_t shoco_decompress(const char * in, size_t len, char * out, size_t bufsize);
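As a rough sketch of how a caller might use these two functions, the helper below does a compress/decompress round trip and checks the result. To keep it compilable without the library, it takes the codec as function pointers with the same shape as the prototypes above. `identity_codec` is a hypothetical stand-in that just copies bytes; in real use you would include shoco.h and pass `shoco_compress` and `shoco_decompress` instead. It also assumes that each function returns the number of bytes written and that a value larger than `bufsize` signals a too-small buffer; that is my reading of the shoco documentation, so verify it before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the shoco_compress / shoco_decompress prototypes. */
typedef size_t (*codec_fn)(const char *in, size_t len, char *out, size_t bufsize);

/* Hypothetical stand-in codec (plain copy) so the sketch builds without shoco.
   Assumption: returns bytes written, or bufsize + 1 if the output does not fit. */
size_t identity_codec(const char *in, size_t len, char *out, size_t bufsize) {
    if (len > bufsize) return bufsize + 1;
    memcpy(out, in, len);
    return len;
}

/* Compress s, decompress the result, and compare it with the input.
   Returns 1 on a clean round trip, 0 on overflow or mismatch. */
int roundtrip_ok(codec_fn compress, codec_fn decompress, const char *s) {
    char packed[256];
    char unpacked[256];
    size_t n = strlen(s);

    size_t packed_len = compress(s, n, packed, sizeof packed);
    if (packed_len > sizeof packed) return 0;   /* assumed overflow signal */

    size_t out_len = decompress(packed, packed_len, unpacked, sizeof unpacked);
    if (out_len > sizeof unpacked) return 0;    /* assumed overflow signal */

    /* Compare lengths and bytes rather than relying on a null terminator. */
    return out_len == n && memcmp(s, unpacked, n) == 0;
}
```

With the real library linked in, the call would look like `roundtrip_ok(shoco_compress, shoco_decompress, "hello world")`. The fixed 256-byte buffers fit the short strings shoco targets; longer input is reported as a failure rather than truncated.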
A JavaScript implementation, generated with emscripten, is also included (you can see it in action on the project's well-documented website):
compressed = shoco.compress(input_string);
output_string = shoco.decompress(compressed);
It also comes with a command-line tool of the same name, shoco:
$ shoco -h
compresses or decompresses your (presumably short) data.
usage: shoco {c(ompress),d(ecompress)} <file-to-(de)compress> <output-filename>
$ shoco compress file-to-compress.txt compressed-file.shoco
$ shoco decompress compressed-file.shoco decompressed-file.txt
There is also a test_input program for performance testing:
$ time ./test_input < /usr/share/dict/words
Number of compressed strings: 234937, average compression ratio: 34%
real 0m0.116s
user 0m0.114s
sys 0m0.001s
shoco is written by Christian Schramm and released under the MIT License. Another library for compressing small strings, SMAZ, is also worth checking out.