Lars Wirzenius

Lars Wirzenius at

http://obnam.org/format-green-albatross/ is my current best idea for the next generation Obnam repository format. Review is welcome.

Wow, ditching the b-trees is a bold move.

Will this make Obnam less sensitive to roundtrip latency? It sounds like you can stream data out into a bag and then do essentially one metadata update at the end.

The bag.1 format seems possibly suboptimal: if only part of a bag needs to be accessed for a restore, you'd need to download the whole thing. An offset allowing random access might be better?
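To make the random access idea concrete, here is a minimal sketch of a bag with an offset table at the front. The layout is purely hypothetical and is not the bag.1 format: a blob count, then one (offset, length) pair per blob, then the blob data. The names `pack_bag` and `read_blob` are made up for illustration.

```python
import io
import struct

# Hypothetical layout, NOT the real bag.1 format:
#   4-byte big-endian blob count,
#   one (offset, length) pair of 8-byte big-endian integers per blob,
#   then the blob bytes back to back.
HEADER = struct.Struct(">I")
ENTRY = struct.Struct(">QQ")


def pack_bag(blobs):
    """Serialise a list of byte strings into one indexed bag."""
    offset = HEADER.size + ENTRY.size * len(blobs)  # where blob data starts
    index = []
    for blob in blobs:
        index.append(ENTRY.pack(offset, len(blob)))
        offset += len(blob)
    return HEADER.pack(len(blobs)) + b"".join(index) + b"".join(blobs)


def read_blob(f, i):
    """Read blob number i from a seekable file.

    Only the header, one index entry, and the blob itself are read.
    """
    f.seek(0)
    (count,) = HEADER.unpack(f.read(HEADER.size))
    if not 0 <= i < count:
        raise IndexError(i)
    f.seek(HEADER.size + ENTRY.size * i)
    offset, length = ENTRY.unpack(f.read(ENTRY.size))
    f.seek(offset)
    return f.read(length)


bag = pack_bag([b"alpha", b"bravo!", b"c"])
assert read_blob(io.BytesIO(bag), 1) == b"bravo!"
```

Against a remote repository, each seek-and-read would presumably become a ranged read, so the saving in bytes transferred has to be weighed against the extra roundtrips.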

I suspect this will also make pruning old generations more expensive, since some bags will need to be rewritten with parts removed.

joeyh at 2015-04-20T15:58:24Z

The files storing B-tree nodes are not cacheable, since they get updated in place. Or rather, they are cacheable, but cache management is difficult and potentially quite expensive. The new design aims to never update files in place and so would be much, much more cacheable, avoiding a lot of roundtrips, I hope.
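A rough sketch of why write-once files are easy to cache: if a file is never modified after it is written, a local copy never needs revalidating, so every repeated read costs no roundtrip. The class and the `fetch` callable below are hypothetical and are not Obnam code.

```python
class WriteOnceCache:
    """Cache for remote files that are assumed never to change once written."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: file name -> bytes (one roundtrip)
        self._files = {}

    def get(self, name):
        # No freshness check is needed, because the file cannot have changed.
        if name not in self._files:
            self._files[name] = self._fetch(name)
        return self._files[name]


roundtrips = []


def fake_fetch(name):
    """Stand-in for a remote read; records each roundtrip."""
    roundtrips.append(name)
    return b"contents of " + name.encode()


cache = WriteOnceCache(fake_fetch)
cache.get("bag-0001")
cache.get("bag-0001")
assert roundtrips == ["bag-0001"]  # the second read was served locally
```

With files that are updated in place, `get` would have to ask the server whether the cached copy is still current, which costs a roundtrip even on a cache hit.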

I expect bags to be somewhat small (perhaps as small as 64 k, but probably more like 1 meg), so that the overhead of wasted space and downloading too much is kept reasonable. I don't plan on rewriting bags when data is removed, normally, but there might be a "packing" function or option to force that.

I hadn't thought about random access to a bag file. I shall ponder on this.

Lars Wirzenius at 2015-04-20T16:08:44Z

Incidentally, the obnam-dev mailing list is the best place for this discussion, to keep it in one place, but it's not strictly a requirement.

Lars Wirzenius at 2015-04-20T16:09:29Z