# fastcompare

Use `make build` to compile.

## usage

`fastcompare file1 file2`

## about

`fastcompare` is designed to compare the line-based content (order doesn't matter) of very large ASCII files.
For the 1st file, it generates a `crc32` hash for each line, so it is more memory efficient to pass the smaller file as the 1st file. This has no effect on the speed.
It then iterates over the 2nd file, builds a temporary `crc32` hash for each line and does a binary search in the sorted hash array.
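
The idea described above can be sketched in a few lines of Python. This is only an illustration of the hash, sort and binary-search approach, not the actual `fastcompare` source; the function names `line_hashes` and `lines_missing_from` are made up for this example, and the "files" are small in-memory lists.

```python
import zlib
from bisect import bisect_left


def line_hashes(lines):
    """Return the sorted crc32 hashes of the given lines (bytes)."""
    return sorted(zlib.crc32(line) for line in lines)


def lines_missing_from(hashes, lines):
    """Return the lines whose crc32 hash is not in the sorted `hashes` list."""
    missing = []
    for line in lines:
        h = zlib.crc32(line)        # temporary hash for this line
        i = bisect_left(hashes, h)  # binary search in the hash array
        if i == len(hashes) or hashes[i] != h:
            missing.append(line)
    return missing


# Hypothetical stand-ins for the lines of the 1st and 2nd file.
first = [b"alpha", b"gamma", b"theta"]
second = [b"gamma", b"delta", b"alpha"]

hashes = line_hashes(first)
print(lines_missing_from(hashes, second))  # -> [b'delta']
```

Only the 32-bit hashes of the 1st file are kept, which is why passing the smaller file first needs less memory.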

## caution

still under construction

todo:

* use struct array to carry line index after sorting
* use optional other hashing algorithms to lower the risk of collisions
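
To get a feeling for the collision risk mentioned in the todo list, here is a rough estimate. It assumes that hashes are uniformly distributed, which is only an approximation for `crc32`; the helper `false_match_probability` is hypothetical and not part of `fastcompare`. With `n` distinct hashes stored, a line that is not in the 1st file still matches by chance with a probability of about `n / 2^32`.

```python
def false_match_probability(stored_hashes, hash_bits=32):
    """Chance that an unrelated line hits one of `stored_hashes` distinct hashes."""
    return min(1.0, stored_hashes / 2 ** hash_bits)


for n in (1_000_000, 100_000_000):
    print(n, false_match_probability(n))
```

Under the same assumption, a wider hash (for example `hash_bits=64`) makes this chance far smaller.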

## restrictions

* duplicate lines from the 1st file can be marked as "not included in" the 2nd file (only when the 2nd file doesn't contain the same number of copies of that line).
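
One way such a behaviour can arise is sketched below. It assumes that every line of the 2nd file "consumes" at most one entry of the hash array, which is a guess and not confirmed by the `fastcompare` source. The function `unmatched_first_file_lines` is hypothetical, and unlike the real tool it keeps the lines themselves in memory so the result can be printed.

```python
import zlib
from bisect import bisect_left


def unmatched_first_file_lines(first, second):
    """Return lines of `first` left over when each line of `second` consumes one entry."""
    entries = sorted((zlib.crc32(line), line) for line in first)
    keys = [h for h, _ in entries]
    used = [False] * len(entries)
    for line in second:
        h = zlib.crc32(line)
        i = bisect_left(keys, h)
        # Skip entries with the same hash that an earlier line already consumed.
        while i < len(keys) and keys[i] == h and used[i]:
            i += 1
        if i < len(keys) and keys[i] == h:
            used[i] = True
    return [line for (_, line), u in zip(entries, used) if not u]


# "x" appears twice in the 1st file but only once in the 2nd file.
print(unmatched_first_file_lines([b"x", b"x", b"y"], [b"x", b"y"]))  # -> [b'x']
```

When the 2nd file contains the line the same number of times, nothing is left over and the duplicate is not reported.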