Trinh, Nam H
2020-10-26
2020-10-26
2020-10-25
https://hdl.handle.net/11299/216732

A text preprocessing algorithm was developed that improved the compression ratio of a standard compression tool, bzip2. During preprocessing, characters in the original text files were replaced by a small set of special characters so that the Burrows-Wheeler Transform stage of bzip2 became more effective. These special characters made the words less recognizable, but the words remained recoverable by exploiting the semantic relations between words in the text. Recovery was carried out with a static English dictionary and a pretrained static neural network, Word2vec. Experiments showed that this method increased the compression ratio by an average of 2.9% for text files over 100 KB.

en
word2vec, bzip2, text preprocessor, compression
Improving compression ratio on human-readable text by masking predictable characters
Report
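
The masking idea in the abstract can be illustrated with a minimal Python sketch. It is not the report's algorithm, and every specific choice here is an assumption: the mask symbol `*`, the rule of masking vowels, the toy dictionary, and the helper names `mask`, `build_index`, `preprocess` and `restore` are all hypothetical. The sketch also leaves out Word2vec entirely. Where the report resolves ambiguous words through semantic context, this version simply declines to mask any word whose masked form matches more than one dictionary entry, so that recovery needs only a dictionary lookup.

```python
import bz2
import re

MASK = "*"             # hypothetical mask symbol; the report does not name its symbols
VOWELS = set("aeiou")  # assumption: vowels stand in for "predictable" characters


def mask(word):
    """Replace every vowel in a lowercase word with the mask symbol."""
    return "".join(MASK if c in VOWELS else c for c in word)


def build_index(dictionary):
    """Map each masked form to its word, keeping only unambiguous forms."""
    groups = {}
    for word in dictionary:
        groups.setdefault(mask(word), set()).add(word)
    return {m: next(iter(ws)) for m, ws in groups.items() if len(ws) == 1}


def preprocess(text, index):
    """Mask lowercase words that the index can recover; leave the rest as-is."""
    if MASK in text:
        raise ValueError("input already contains the mask symbol")
    out = []
    for tok in re.findall(r"[a-z]+|[^a-z]+", text):
        masked = mask(tok)
        out.append(masked if index.get(masked) == tok else tok)
    return "".join(out)


def restore(text, index):
    """Undo preprocess() by looking each masked token up in the index."""
    tokens = re.findall(r"[a-z*]+|[^a-z*]+", text)
    return "".join(index.get(tok, tok) for tok in tokens)


# Toy dictionary and sample text, both made up for illustration.
dictionary = ["compression", "ratio", "text", "file", "word", "bat", "bet"]
index = build_index(dictionary)
sample = "text compression ratio of a text file, word by word\n" * 200

masked = preprocess(sample, index)
assert restore(masked, index) == sample  # the round trip is lossless

# Compare bzip2 output sizes; no particular outcome is implied for this toy input.
print("plain :", len(bz2.compress(sample.encode("utf-8"))))
print("masked:", len(bz2.compress(masked.encode("utf-8"))))
```

Note that `bat` and `bet` both mask to `b*t`, so the sketch leaves them untouched. Handling such collisions from context is the role the abstract assigns to Word2vec.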