Discussion:
[Algorithms] sparse bitset compression
Richard Fabian
2014-01-14 12:37:46 UTC
Permalink
The web is full of different solutions for compressing things, but some of
you have probably already done this and found a compression scheme with a
really good ratio that fits what I'm asking about.

I've got a load of entities (4-7k) that live on a grid (1024x1024), and I
need to persist them to a savegame which gets sent across the network to
keep the save safe. I need to compress it. I'm compressing the entities in
the world already; ones that have been affected cost a few bytes each, but
for the ones that don't have any changes, all I'm left with is the coords.
There are only a few that need the fatter compression, but at the numbers
I'm looking at, those "unchanged entity" coords add up to a lot of data when
stored as 20-bit structs. This is too much for sending over the network
regularly as part of a save.

I've had some ideas on how to compress the data, but they might be crap. I
don't know.

I can't easily regenerate the set, so I thought someone who wanted to
compress consumables might know the name of a good solution for sparse
bitset compression. The bits are clumped around different areas of the grid,
so I feel that something that leverages spatial coherence might do well.

Any leads would be highly appreciated.
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell
Oscar Forth
2014-01-14 13:11:05 UTC
Permalink
Richard Fabian
2014-01-14 16:52:06 UTC
Permalink
That sounds good. Something like the JPEG VLC for the actual distances
between entries, storing deltas along a space-filling curve, should work
well then.
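
Roughly the shape I have in mind, as a sketch only (curve_index below is a
row-major placeholder for a real space-filling curve, and the byte varint is
a stand-in for whichever VLC ends up being used):

// Sketch: sort occupied cells by their position along some 1D ordering,
// then store the gaps between consecutive positions with a variable-length code.
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

static uint32_t curve_index(uint32_t x, uint32_t y) {
    return y * 1024u + x;                     // row-major placeholder
}

static void put_varint(std::vector<uint8_t>& out, uint32_t v) {
    while (v >= 0x80) { out.push_back(uint8_t(v) | 0x80); v >>= 7; }
    out.push_back(uint8_t(v));
}

// coords: (x, y) of each unchanged entity.
std::vector<uint8_t> encode_gaps(const std::vector<std::pair<uint32_t, uint32_t>>& coords) {
    std::vector<uint32_t> idx;
    idx.reserve(coords.size());
    for (const auto& c : coords) idx.push_back(curve_index(c.first, c.second));
    std::sort(idx.begin(), idx.end());

    std::vector<uint8_t> out;
    uint32_t prev = 0;
    for (uint32_t i : idx) {                  // store gaps, not absolute positions
        put_varint(out, i - prev);
        prev = i;
    }
    return out;
}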
Post by Oscar Forth
Just a random thought, so apologies if I'm talking out of my arse, but ...
couldn't you just use something like the bit plane compression from Wavelet
Difference Reduction (WDR)?
A lot depends on your data, I appreciate, so this may not work out as any
significant saving ...
http://trueharmoniccolours.co.uk/Blog/?p=55
Obviously that's in the context of a full wavelet image compressor, so you
will have to just grab the part I explain in the link above.
Also apologies for the code formatting on my blog ... I'll fix it one day ;)
Oscar
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell
Oscar Forth
2014-01-14 17:01:07 UTC
Permalink
Pretty much.

The space filling curve might not be necessary, but you may be able to use
it to keep the gaps smaller between entries that are spatially local to each
other.
Chris Birke
2014-01-14 19:11:31 UTC
Permalink
You might also look into Morton order encoding, as it is a simple and efficient locality-preserving way to do lossless n-dimensional tree structures - this is of course best for applications like map data. Your mileage may vary.

-Chris

Sent from my iPhone 5
Richard Fabian
2014-01-14 19:40:34 UTC
Permalink
I was going to plug in the Morton-coded version. Code from ryg's blog:

http://fgiesen.wordpress.com/2009/12/13/decoding-morton-codes/
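
For reference, the usual bit-twiddling form of that (roughly the same
part1by1/compact1by1 construction the post walks through) looks like this:

// Morton (Z-order) encode/decode; 16-bit x/y is plenty for a 1024x1024 grid.
#include <cstdint>

static uint32_t part1by1(uint32_t x) {      // spread the low 16 bits apart
    x &= 0x0000ffff;
    x = (x ^ (x << 8)) & 0x00ff00ff;
    x = (x ^ (x << 4)) & 0x0f0f0f0f;
    x = (x ^ (x << 2)) & 0x33333333;
    x = (x ^ (x << 1)) & 0x55555555;
    return x;
}

static uint32_t compact1by1(uint32_t x) {   // inverse of part1by1
    x &= 0x55555555;
    x = (x ^ (x >> 1)) & 0x33333333;
    x = (x ^ (x >> 2)) & 0x0f0f0f0f;
    x = (x ^ (x >> 4)) & 0x00ff00ff;
    x = (x ^ (x >> 8)) & 0x0000ffff;
    return x;
}

uint32_t morton_encode(uint32_t x, uint32_t y) { return (part1by1(y) << 1) | part1by1(x); }
uint32_t morton_x(uint32_t code) { return compact1by1(code); }
uint32_t morton_y(uint32_t code) { return compact1by1(code >> 1); }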
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell
Jon Watte
2014-01-15 21:21:22 UTC
Permalink
I would question this assumption:

This is too much for sending over the network regularly as part of a save.


How often do you send the savegame? 7k entities, times 20 bits, equals less
than 18 kB of data.
By comparison, the amount of cookie and image data involved in a single
mouseover-and-click event in a web browser these days is probably about
that same size.
Are you saying that your save data must be more lightweight than a mouse
click? Where does this limitation come from? Are you targeting a modem
where a 5 second background upload is not acceptable?

If I had to compress the data you talk about, I might want to look into
some implicit representation, like a quad tree with filled/not nodes.
Something like:
"Does the current sub-area have entities? If not, store 0, and terminate.
Else store 1. If the size of the sub-area is greater than one, subdivide,
and recurse for each sub-quadrant."
Depending on how clustered the entities are, this may compress well or
poorly (but with a max 7% fill rate, it ought to at least compress
somewhat.)
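
A minimal sketch of that recursion (occupied() is a stand-in for however the
source bitset is queried, and the full re-scan per node is naive, but it
shows the idea):

// One bit per visited node: 1 = "this square contains at least one entity".
// Subdivide while the square is bigger than one cell; a set 1x1 leaf needs
// no coordinates, since its position is implied by the path taken to reach it.
#include <vector>

extern bool occupied(int x, int y);   // assumed query into the source bitset

static bool any_in(int x0, int y0, int size) {
    for (int y = y0; y < y0 + size; ++y)          // naive scan; a real version
        for (int x = x0; x < x0 + size; ++x)      // would walk the entity list
            if (occupied(x, y)) return true;
    return false;
}

void encode_quad(int x0, int y0, int size, std::vector<bool>& bits) {
    if (!any_in(x0, y0, size)) { bits.push_back(false); return; }
    bits.push_back(true);
    if (size == 1) return;
    int h = size / 2;
    encode_quad(x0,     y0,     h, bits);         // recurse into the four
    encode_quad(x0 + h, y0,     h, bits);         // sub-quadrants
    encode_quad(x0,     y0 + h, h, bits);
    encode_quad(x0 + h, y0 + h, h, bits);
}

// e.g. std::vector<bool> bits; encode_quad(0, 0, 1024, bits);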

Apply gzip on top of any encoding you come up with for possible bonus gains.

Sincerely,

Jon Watte

--
"I pledge allegiance to the flag of the United States of America, and to
the republic for which it stands, one nation indivisible, with liberty and
justice for all."
~ Adopted by U.S. Congress, June 22, 1942
Richard Fabian
2014-01-17 10:14:40 UTC
Permalink
This is too much for sending over the network regularly as part of a save.
Well, we've got our play-through logs showing how much data was being sent
per day per user, and this part of the save was the bit being updated the
most. Over a day of play it was adding up to about 5 MB I think (can't
remember precisely now), and this made it into the top three things we
needed to compress better to reduce the client bandwidth cost (mobile
carrier data limits and all that).
If I had to compress the data you talk about, I might want to look into
some implicit representation, like a quad tree with filled/not nodes.
"Does the current sub-area have entities? If not, store 0, and terminate.
Else store 1. If the size of the sub-area is greater than one, subdivide,
and recurse for each sub-quadrant."
Depending on how clustered the entities are, this may compress well or
poorly (but with a max 7% fill rate, it ought to at least compress
somewhat.)
I had thought about a quad tree, but I think I chose not to go that way
because my previous experience with them was that they're not quite as
efficient in practice as they should be. I can't remember why I think this,
but I remember working through some code and thinking something along the
lines of "Oh, yeah. Of course", but that might have been because the data
involved wasn't quite as clumpy.
Apply gzip on top of any encoding you come up with for possible bonus gains.
That's part of the network layer, so no point in doing it explicitly.
Thanks, I might try the quad tree idea again.
Alex Walters
2014-01-17 18:04:46 UTC
Permalink
It's on the fringe of being useful, but one thing you could look at to
reduce your data size is Exponential-Golomb coding
(http://en.wikipedia.org/wiki/Exponential-Golomb_coding); it's used in H.264
entropy encoding among other things.

Instead of writing a full n-bit value for every number you store, it
produces a bit stream of codes, using far fewer bits for small numbers -
take a look at the wiki page, it's pretty simple to see what's going on when
you look at the example. It can reduce the amount of data you have to move,
and from what I remember from when I worked with video, it produces a much
more predictable (and compressible) stream of bits for the next compression
scheme along (variable-length encoding, or arithmetic encoding in the case
of H.264) - I've not looked into the details, but it could probably improve
the compression of the stream you get through gzip.
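
For concreteness, the order-0 code for a non-negative n is just: write n+1
in binary, preceded by one fewer zeros than its bit length. A quick sketch
(mine, not lifted from the H.264 spec):

// Order-0 Exponential-Golomb: 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, ...
#include <cstdint>
#include <vector>

void exp_golomb_encode(uint32_t n, std::vector<bool>& bits) {
    uint32_t v = n + 1;
    int len = 0;
    for (uint32_t t = v; t != 0; t >>= 1) ++len;              // bit length of v
    for (int i = 0; i < len - 1; ++i) bits.push_back(false);  // prefix zeros
    for (int i = len - 1; i >= 0; --i) bits.push_back((v >> i) & 1);
}

uint32_t exp_golomb_decode(const std::vector<bool>& bits, size_t& pos) {
    int zeros = 0;
    while (!bits[pos]) { ++zeros; ++pos; }                    // count prefix zeros
    uint32_t v = 0;
    for (int i = 0; i <= zeros; ++i) v = (v << 1) | (bits[pos++] ? 1u : 0u);
    return v - 1;
}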

Regards,

Alex
Samuel Moll
2014-01-17 19:11:36 UTC
Permalink
Another idea that might be useful is to think of your positions dataset as
a black-and-white 1024x1024 bitmap (this gets harder if two entities are
allowed to occupy the same position) and compress it with some sort of
run-length encoding, i.e. you go through each pixel line from left to right
and, instead of saving 0s (for empty space) and 1s (for occupied pixels),
you save how many 0s there are until the next 1.
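
Something like this, as a sketch (the one-byte-per-cell bitmap is just for
clarity):

// Turn the occupancy bitmap into a list of zero-run lengths:
// "how many empty cells until the next occupied one".
#include <cstdint>
#include <vector>

std::vector<uint32_t> runs_of_zeros(const std::vector<uint8_t>& bitmap) {
    std::vector<uint32_t> runs;
    uint32_t gap = 0;
    for (uint8_t cell : bitmap) {
        if (cell) { runs.push_back(gap); gap = 0; }   // emit the gap before this 1
        else      { ++gap; }
    }
    return runs;   // feed these to exp-Golomb / Golomb-Rice / gzip, etc.
}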
Richard Fabian
2014-01-18 00:10:02 UTC
Permalink
Hah, that was funny. I looked at the link, then realised it was the same as
what I'm doing, except I pack with ones instead of zeros:
0 -> 0
100 -> 1
101 -> 2
11000 -> 3
etc.
I'm pretty sure I read about this in a JPEG compression book more than 10
years ago.
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell
Robin Green
2014-01-21 08:08:10 UTC
Permalink
Here's a modern run-length version of Golomb-Rice encoding designed for
compressing general data with a generalized Gaussian distribution. It works
best if you can show that, in most cases, there are small differences
between adjacent data (but handily you get to define what "adjacent" means
for your stream). Encode a single starting value at full bandwidth, then
send a stream of differences:

https://research.microsoft.com/pubs/102069/malvar_dcc06.pdf
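
(The differences can be negative, of course; the usual companion trick - my
note here, not something specific to the paper - is a zigzag map so that
small magnitudes of either sign stay small before the Golomb/Rice stage.)

#include <cstdint>

// Zigzag map: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
uint32_t zigzag_encode(int32_t v) { return (uint32_t(v) << 1) ^ uint32_t(v >> 31); }
int32_t  zigzag_decode(uint32_t v) { return int32_t(v >> 1) ^ -int32_t(v & 1); }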

As a bonus, if you're encoding depth images, you can remap the values to
leave zero as an out-of-band value:

http://www.charlesneedham.com/pubs/153971/depthcode-final.pdf

Yes, I know that wasn't the question you asked but at least it's an
algorithm. :-)

- Robin Green
Jon Watte
2014-01-31 21:32:02 UTC
Permalink
Encode a single starting value at full bandwidth then send a stream of differences:
Which is equivalent to a wavelet "lift" transform :-)

Back to the original question: If you have a previous save game, is there
lots of similarity in the next save? If so, can you save a delta instead?
If not, what solution did you end up choosing?

Sincerely,

Jon Watte

--
"I find that the harder I work, the more luck I seem to have." -- Thomas
Jefferson
Richard Fabian
2014-01-31 23:27:26 UTC
Permalink
There was a lot of similarity, but with no processing at the destination, we
decided that the Golomb-Rice algorithm was good enough. It dropped our data
by 70%, and that was enough of a saving for this one area.
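
For anyone finding this later: Rice coding of a gap n with parameter k is
just the quotient n >> k in unary followed by the low k bits. A rough,
from-memory sketch of the general shape (not the exact shipped code; the
choice of k depends on the data):

// Rice code, parameter k: write (n >> k) as unary (that many 1s, then a 0),
// then the low k bits of n verbatim.  Works well when gaps cluster near 2^k.
#include <cstdint>
#include <vector>

void rice_encode(uint32_t n, unsigned k, std::vector<bool>& bits) {
    uint32_t q = n >> k;
    for (uint32_t i = 0; i < q; ++i) bits.push_back(true);    // unary quotient
    bits.push_back(false);                                     // terminator
    for (int i = int(k) - 1; i >= 0; --i) bits.push_back((n >> i) & 1);  // remainder
}

uint32_t rice_decode(const std::vector<bool>& bits, size_t& pos, unsigned k) {
    uint32_t q = 0;
    while (bits[pos++]) ++q;                                   // read unary quotient
    uint32_t r = 0;
    for (unsigned i = 0; i < k; ++i) r = (r << 1) | (bits[pos++] ? 1u : 0u);
    return (q << k) | r;
}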
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell
Colt McAnlis
2014-03-08 15:13:15 UTC
Permalink
Why not just use industry-standard arithmetic coding (AC)?

If you calculate the estimated entropy for that data set
(http://planetcalc.com/2476/) you end up with about H=0.06 for 7k set bits
in a 1024x1024 stream, and H=0.01 for 1k set bits.
That being the entropy (minimum bits per symbol), it should yield roughly
70k and 10k bits after compression, respectively. Or, with AC and those
sparse values, you can get around 94% and 99% compression (if my
early-morning math is right).
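
A throwaway check of that arithmetic (just the binary-entropy formula; it
lands around 0.058 and 0.011 bits per cell, so the numbers above are in the
right ballpark):

// Binary entropy of a bitmap with 'ones' set bits out of 'total' cells,
// and the implied lower bound on the compressed size.
#include <cmath>
#include <cstdio>

int main() {
    const double total = 1024.0 * 1024.0;
    const double counts[] = {1000.0, 7000.0};
    for (double ones : counts) {
        double p = ones / total;
        double h = -(p * std::log2(p) + (1.0 - p) * std::log2(1.0 - p));  // bits/cell
        std::printf("%5.0f set bits: H = %.3f bits/cell, ~%.0fk bits total\n",
                    ones, h, h * total / 1000.0);
    }
    return 0;
}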

AC is a well-known algorithm, with easy examples to find on the web. And as
far as serialization is concerned, it's pretty straightforward to just encode
your bits, then decode; no extra special data structures are needed.

~Main
--
==
Colt "MainRoach" McAnlis
Graphics Programmer
Richard Fabian
2014-03-09 16:40:20 UTC
Permalink
AC (and now Finite State Entropy,
https://github.com/Cyan4973/FiniteStateEntropy) is really good, but carries
the weight of spending more time than we had on the task. Golomb coding the
distances between set bits was a great, simple-to-implement solution for us.
Another issue was compression time: it needed to be quick, and AC is not
known to be fast.

In future, I hope to use FSE for almost all my lossless coders.
--
fabs();
"The fact that an opinion has been widely held is no evidence whatever that
it is not utterly absurd." - Bertrand Russell