Mike Linksvayer

Mike Linksvayer at

Now, the algorithm?
Well... it's, you know, a lot of special cases.  I can't publish the code right now because it infringes copyrights.  For example, there's a conditional that says "if (text == '<full text of this particular NYT Op-Ed>') { return 'a certain compressed string' } else if (text == 'full text of this other editorial') { return 'this other compressed string' } ..." and so on.

But I just realized: I can replace those full-text comparisons with hash comparisons and then the code will be releasable!  So that's great.  Thanks for helping me think that through.

The other problem is that it's a *lot* of special cases.  But this is a general theoretical property of a certain kind of compression program: you can achieve spectacular compression ratios on input if you're willing to sacrifice on the side of the source code size of the compression program itself :-).

So I guess I'll release it as soon as GitHub installs a few more yottabytes of storage.

Let me know.

Karl Fogel at 2018-09-09T06:39:42Z

clacke@libranet.de likes this.

Well... it's a lot of special cases.  Right now it's unreleasable because it infringes copyrights.  For example, there's a conditional in the code that says "if (text == 'full text of this particular NYTimes editorial') { return 'this particular compressed string'; } else if (text == 'full text of this other editorial') { return 'this other compressed string'; } ..." etc, etc.

But I just realized that I could replace all those full-text comparisons with hash comparisons!  Then there would be no need to have the full texts in the code, and I could release the program.  Thanks for helping me think that through.
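Roughly, the hash-comparison version might look like the sketch below. This is only an illustration in Python, not the actual program: the names `KNOWN_TEXTS`, `SPECIAL_CASES`, `digest`, and `compress` are made up for the example, the placeholder strings stand in for the real texts, and the fallback to `zlib` for everything else is an assumption.

```python
import hashlib
import zlib

# Hypothetical placeholders for the special-cased texts (not the real ones).
KNOWN_TEXTS = [
    "full text of this particular editorial",
    "full text of this other editorial",
]


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Built once, offline. A released program would ship only these digests
# and their one-byte codes, never the texts themselves.
SPECIAL_CASES = {digest(t): bytes([i]) for i, t in enumerate(KNOWN_TEXTS)}


def compress(text: str) -> bytes:
    """Emit a two-byte code for a known text, else fall back to zlib."""
    code = SPECIAL_CASES.get(digest(text))
    if code is not None:
        return b"\x01" + code  # flag byte 1: special case
    return b"\x00" + zlib.compress(text.encode("utf-8"))  # flag byte 0: ordinary
```

With this sketch, `compress(KNOWN_TEXTS[0])` comes out as two bytes, and any other input goes through ordinary zlib with a one-byte flag in front. The catch is that hashes don't run backwards, so whoever decompresses would still need the original texts from somewhere.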

The other problem is that it's a *lot* of special cases.  I mean, this is a general theoretical property of a certain kind of compression program.  You can achieve spectacular compression ratios if you're willing to sacrifice on the side of the source code size of the compression program.

So I guess I'll release it when GitHub installs a few more yottabytes of storage?

Let me know.


Karl Fogel at 2018-09-09T06:46:52Z

Ah, a lookup table. But that's not a new compression algorithm. :)

Mike Linksvayer at 2018-09-09T16:52:05Z