One of the first things that a user of umoci may notice is that certain operations can be quite expensive. Notably, unpack and repack operations require scanning through either each layer archive of an image or the filesystem. Both operations require quite a bit of disk IO, and can take a while. Fedora images are known to be quite large, and can take several seconds to operate on.
% time umoci unpack --image fedora:26 bundle
umoci unpack --image fedora:26 bundle 8.43s user 1.68s system 105% cpu 9.562 total
% time umoci repack --image fedora:26-old bundle
umoci repack --image fedora:26-old bundle 3.62s user 0.43s system 115% cpu 3.520 total
% find bundle/rootfs -type f -exec touch {} \;
% time umoci repack --image fedora:26-new bundle
umoci repack --image fedora:26-new bundle 32.03s user 4.50s system 112% cpu 32.559 total
While it is not currently possible to optimise or parallelise the above operations individually (due to the structure of the layer archives), it is possible to optimise your workflows in certain situations. These workflow tips effectively revolve around reducing the number of extractions that are performed.
--refresh-bundle
A very common workflow when building a series of layers in an image is that, because you want to place different files in different layers, you end up doing something like the following:
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci unpack --image image_build_XYZ:wip bundle_b
% ./some_build_process_2 ./bundle_b
% umoci repack --image image_build_XYZ:wip bundle_b
% umoci unpack --image image_build_XYZ:wip bundle_c
% ./some_build_process_3 ./bundle_c
% umoci repack --image image_build_XYZ:wip bundle_c
% umoci tag --image image_build_XYZ:wip final
The above usage, while correct, is not very efficient. Each layer that is created requires us to do an unpack of the entire image_build_XYZ:wip image before we can do anything. Note that the root filesystem contained in bundle_a after we’ve made our changes is effectively the same as the root filesystem that we extract into bundle_b, and since we already have bundle_a we don’t have to extract it again. So reusing bundle_a is probably going to be more efficient. However, you cannot just do this the “intuitive way”:
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% ./some_build_process_2 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
This doesn’t work because the metadata stored in bundle_a includes information about which image the bundle was based on (this is used when creating the modified image metadata), so every repack computes its changes relative to that original image rather than the image produced by the previous repack. Thus, the above usage will not result in multiple layers being created, and it is roughly equivalent to the following:
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% ./some_build_process_2 ./bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
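To see why only one layer comes out, here is a toy Python model of the behaviour described above. It is not umoci’s implementation: the Bundle class, the unpack/repack helpers and the dict-based layers are hypothetical simplifications (no deletions, no image metadata). The one assumption taken from the text is that the bundle remembers the image it was unpacked from, so each repack diffs against that original state and stacks the result on the original image.

```python
# Toy model (hypothetical, not umoci's code) of repacking a bundle
# WITHOUT --refresh-bundle. Layers are dicts of path -> content;
# deletions and image metadata are ignored for simplicity.
from dataclasses import dataclass
from typing import Dict, Tuple

Layer = Dict[str, str]
Image = Tuple[Layer, ...]


def flatten(image: Image) -> Dict[str, str]:
    """Extract layers in order; later layers overwrite earlier files."""
    rootfs: Dict[str, str] = {}
    for layer in image:
        rootfs.update(layer)
    return rootfs


@dataclass
class Bundle:
    base: Image                 # image recorded at unpack time
    snapshot: Dict[str, str]    # rootfs state recorded at unpack time
    rootfs: Dict[str, str]      # the working root filesystem


def unpack(image: Image) -> Bundle:
    state = flatten(image)
    return Bundle(base=image, snapshot=dict(state), rootfs=dict(state))


def repack(bundle: Bundle) -> Image:
    # The diff is always taken against the state recorded at unpack time,
    # and stacked on top of the image recorded at unpack time.
    diff = {p: c for p, c in bundle.rootfs.items() if bundle.snapshot.get(p) != c}
    return bundle.base + (diff,)


tags = {"wip": ({"/etc/os-release": "fedora"},)}
bundle = unpack(tags["wip"])
for step in (1, 2, 3):
    bundle.rootfs[f"/opt/step{step}"] = f"built by step {step}"  # stand-in build step
    tags["wip"] = repack(bundle)

print(len(tags["wip"]))         # 2: the base layer plus one combined layer
print(sorted(tags["wip"][-1]))  # ['/opt/step1', '/opt/step2', '/opt/step3']
```

Each repack replaces the previous repack’s layer instead of stacking on top of it, so three build steps still yield a single new layer.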
Do not despair, however: there is a flag just for you! With --refresh-bundle it is possible to perform the above operations without needing any extra unpack operations.
% umoci unpack --image image_build_XYZ:wip bundle_a
% ./some_build_process_1 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% ./some_build_process_2 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% ./some_build_process_3 ./bundle_a
% umoci repack --refresh-bundle --image image_build_XYZ:wip bundle_a
% umoci tag --image image_build_XYZ:wip final
Internally, --refresh-bundle modifies the few metadata files inside bundle_a so that future repack invocations modify the new image created by the previous repack operation, rather than basing it on the original unpacked image. Therefore the cost of --refresh-bundle is constant, and is much smaller than the cost of doing additional unpack operations.
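A minimal sketch of the idea, using a toy Python model rather than umoci’s real code: the Bundle class, the repack helper and its refresh_bundle parameter are hypothetical stand-ins, and layers are plain dicts with no deletions or metadata. The assumption, taken from the paragraph above, is that refreshing only rewrites the bundle’s record of which image it is based on (and the state to diff against), so the next repack stacks on the image the previous repack produced.

```python
# Toy model (hypothetical, not umoci's code) of repacking WITH
# --refresh-bundle. Simplifications: dict layers, no deletions,
# no image metadata.
from dataclasses import dataclass
from typing import Dict, Tuple

Layer = Dict[str, str]
Image = Tuple[Layer, ...]


@dataclass
class Bundle:
    base: Image                 # image the next repack builds on
    snapshot: Dict[str, str]    # rootfs state the next diff is taken against
    rootfs: Dict[str, str]      # the working root filesystem


def unpack(image: Image) -> Bundle:
    state: Dict[str, str] = {}
    for layer in image:         # the expensive part: a full extraction
        state.update(layer)
    return Bundle(base=image, snapshot=dict(state), rootfs=dict(state))


def repack(bundle: Bundle, refresh_bundle: bool = False) -> Image:
    diff = {p: c for p, c in bundle.rootfs.items() if bundle.snapshot.get(p) != c}
    new_image = bundle.base + (diff,)
    if refresh_bundle:
        # Only the bundle's bookkeeping is updated; nothing is re-extracted.
        bundle.base = new_image
        bundle.snapshot = dict(bundle.rootfs)
    return new_image


tags = {"wip": ({"/etc/os-release": "fedora"},)}
bundle = unpack(tags["wip"])    # the only unpack in the whole workflow
for step in (1, 2, 3):
    bundle.rootfs[f"/opt/step{step}"] = f"built by step {step}"  # stand-in build step
    tags["wip"] = repack(bundle, refresh_bundle=True)

print(len(tags["wip"]))                           # 4: base layer plus one per step
print([sorted(layer) for layer in tags["wip"][1:]])
# [['/opt/step1'], ['/opt/step2'], ['/opt/step3']]
```

Because each refreshed repack diffs only against the previous repack’s state, every build step ends up in its own layer while the bundle is extracted just once.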
umoci insert
Sometimes all you want to do is add some files to an image (or remove some files) and nothing else, and in those cases doing an umoci unpack and umoci repack cycle is also quite expensive. This is especially true when you consider that OCIv1 images are backed by tar archives, and the delta layer being generated is just going to be a tar archive of the files you are adding. The most basic usage of umoci insert is to just specify what files you want added, and what you want them to be called in the image (we don’t have any magical rsync semantics; we just copy the given source as-is to whatever path you tell us).
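Because an insert layer is essentially a tar archive of the new files placed at the target path, the idea can be sketched with Python’s standard tarfile module. This is an illustration, not umoci’s implementation: make_insert_layer is a hypothetical helper, and the archive is left uncompressed for simplicity, whereas image layers are typically compressed.

```python
# Hypothetical sketch of building an "insert"-style layer: the source is
# copied as-is into a tar archive under the requested target path.
import io
import os
import tarfile
import tempfile


def make_insert_layer(source: str, target: str) -> bytes:
    """Build an uncompressed layer tar holding `source` at `target`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Layer paths are stored relative to the root, so drop the leading "/".
        # Directories are added recursively; there is no rsync-style merging.
        tar.add(source, arcname=target.lstrip("/"))
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    confdir = os.path.join(tmp, "myconfigdir")
    os.makedirs(confdir)
    with open(os.path.join(confdir, "main.conf"), "w") as f:
        f.write("verbose = true\n")
    layer = make_insert_layer(confdir, "/etc/binary.d")

with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
    names = tar.getnames()
print(names)  # ['etc/binary.d', 'etc/binary.d/main.conf']
```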
Note that, unlike most other umoci commands, umoci insert will overwrite the image you give it. By contrast, the --image flag of umoci repack refers to the target image, not the source image (the source image is already known, because umoci unpack saves that information).
This behaviour may change in the future, but it’s not clear what an obvious interface for this change would be (older versions of umoci had separate --src and --dst flags, but they were unwieldy and so were removed in favour of the --image style).
Also note that each umoci insert invocation creates a separate layer.
% umoci insert --image myimg:foo mybinary /usr/bin/release-binary
% umoci insert --image myimg:foo myconfigdir /etc/binary.d
If the target file already exists in previous layers, the new layer will overwrite any older versions of the inserted files when the image is extracted.
You can also remove a file (or directory) from an image by using the --whiteout option, which creates a new layer with a “whiteout” entry for the path you give it. If the file doesn’t already exist, the behaviour depends on the extraction tool used; umoci unpack will ignore whiteouts for non-existent files when extracting.
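In the OCI image-spec layer format, a whiteout is represented as an empty file named .wh. followed by the removed file’s name, placed in the same directory. The sketch below builds such a layer with Python’s tarfile module and models the extraction side, including ignoring whiteouts for paths that don’t exist. The helper names are hypothetical and this is not umoci’s code.

```python
# Sketch of an OCI whiteout layer. The ".wh." naming convention comes from
# the OCI image-spec layer format; the helper names are hypothetical.
import io
import posixpath
import tarfile
from typing import Set

WHITEOUT_PREFIX = ".wh."


def whiteout_name(path: str) -> str:
    """Map /usr/bin/old-binary to usr/bin/.wh.old-binary."""
    parent, base = posixpath.split(path.strip("/"))
    return posixpath.join(parent, WHITEOUT_PREFIX + base)


def make_whiteout_layer(path: str) -> bytes:
    """Build an uncompressed layer tar holding a single empty whiteout file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo(whiteout_name(path)))  # size 0, regular file
    return buf.getvalue()


def apply_whiteout(rootfs: Set[str], entry: str) -> None:
    """Extraction side: drop the hidden path and everything below it.
    A whiteout for a path that doesn't exist simply removes nothing."""
    parent, base = posixpath.split(entry)
    victim = posixpath.join(parent, base[len(WHITEOUT_PREFIX):])
    for path in [p for p in rootfs if p == victim or p.startswith(victim + "/")]:
        rootfs.discard(path)


layer = make_whiteout_layer("/usr/bin/old-binary")
with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
    entries = tar.getnames()
print(entries)  # ['usr/bin/.wh.old-binary']

rootfs = {"usr/bin/old-binary", "usr/bin/sh"}
for entry in entries:
    apply_whiteout(rootfs, entry)
apply_whiteout(rootfs, whiteout_name("/opt/missing"))  # no-op for a missing path
print(sorted(rootfs))  # ['usr/bin/sh']
```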
Do not use this to remove secrets from an image. Since umoci insert operates by creating a new layer, older layers will still contain a copy of the secret you are trying to remove. If you want to prevent things from being included in an image in the first place, take a look at umoci repack --mask-path (which causes changes to the given paths to be excluded from the new layer) or umoci config --config.volumes (which is automatically treated as a masked path by umoci repack).
% umoci insert --image myimg:foo --whiteout /usr/bin/old-binary
% umoci insert --image myimg:foo --whiteout /etc/old-config.d
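The warning about secrets follows from how layers are stored: a whiteout only hides a file at extraction time, while the older layer’s archive is left untouched. A small sketch with Python’s tarfile module shows this. The make_layer helper and the etc/api-token path are hypothetical, and the whiteout entry follows the OCI image-spec .wh. naming.

```python
# Hypothetical illustration: a newer whiteout layer does not change the
# bytes of the older layer that holds the secret.
import io
import tarfile
from typing import Dict


def make_layer(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed layer tar from a path -> content mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


layers = [
    make_layer({"etc/api-token": b"hunter2"}),  # older layer containing the secret
    make_layer({"etc/.wh.api-token": b""}),     # newer layer whiting it out
]

# Anyone with access to the image's layer blobs can still read the older layer.
with tarfile.open(fileobj=io.BytesIO(layers[0])) as tar:
    leaked = tar.extractfile("etc/api-token").read()
print(leaked)  # b'hunter2'
```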
Finally, there is one more important thing to know about umoci insert: how directory insertion is handled. By default, umoci insert just creates a new layer with the contents of the directory. When unpacked, this results in any existing contents of that directory (from older layers) being merged with the new layer’s contents. You can imagine this as though you extracted your new directory on top of the previous layers' cumulative directory state.
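The merge behaviour can be pictured with a toy Python model in which each layer maps file paths to contents and extraction applies layers in order. The extract helper and the file names are hypothetical; real extraction also deals with directory entries, permissions and whiteouts, which are left out here.

```python
# Toy model (hypothetical, not umoci's code): each layer maps file paths
# to contents, and extraction applies layers in order.
from typing import Dict, List


def extract(layers: List[Dict[str, str]]) -> Dict[str, str]:
    rootfs: Dict[str, str] = {}
    for layer in layers:
        # New paths are added and existing paths are overwritten; nothing
        # already in the directory is removed, so directories merge.
        rootfs.update(layer)
    return rootfs


older = {"etc/binary.d/a.conf": "old a", "etc/binary.d/b.conf": "old b"}
inserted = {"etc/binary.d/b.conf": "new b", "etc/binary.d/c.conf": "new c"}

print(extract([older, inserted]))
# {'etc/binary.d/a.conf': 'old a', 'etc/binary.d/b.conf': 'new b', 'etc/binary.d/c.conf': 'new c'}
```

The older a.conf survives, b.conf is overwritten by the newer layer, and c.conf is added alongside them.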
But what if you want to entirely replace the contents of a directory? That’s the reason why we have --opaque: it allows you to effectively blank out any pre-existing contents of the directory and replace them entirely with the new directory. If the target was not a directory in previous layers, or the source is not a directory, then the behaviour will depend on the tool used for extraction; umoci unpack will just ignore the meaningless opaque whiteout entry.
% umoci insert --image myimg:foo --opaque myetcdir /etc
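In the OCI image-spec layer format, an opaque whiteout is a file named .wh..wh..opq inside the directory. It hides everything lower layers put in that directory, and the layer’s own files are then applied on top. The sketch below models this with dicts of file paths to contents. The apply_layer helper is hypothetical, only tracks regular files (not directory entries or ownership), and is not umoci’s extraction code.

```python
# Toy model (hypothetical, not umoci's code) of extracting a layer that
# contains an opaque whiteout. Layers map file paths to contents.
import posixpath
from typing import Dict

OPAQUE = ".wh..wh..opq"  # opaque whiteout marker from the OCI image-spec


def apply_layer(rootfs: Dict[str, str], layer: Dict[str, str]) -> None:
    # First, let every opaque marker hide what lower layers put in its directory.
    for name in layer:
        parent, base = posixpath.split(name)
        if base == OPAQUE:
            for path in [p for p in rootfs if p.startswith(parent + "/")]:
                del rootfs[path]
    # Then add this layer's own files (the marker itself is not a real file).
    for name, content in layer.items():
        if posixpath.basename(name) != OPAQUE:
            rootfs[name] = content


rootfs = {"etc/hosts": "old hosts", "etc/passwd": "old passwd", "usr/bin/sh": "sh"}
# In this model, roughly what an opaque insert of a new /etc would contribute.
opaque_layer = {"etc/" + OPAQUE: "", "etc/hosts": "new hosts"}
apply_layer(rootfs, opaque_layer)
print(rootfs)  # {'usr/bin/sh': 'sh', 'etc/hosts': 'new hosts'}
```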
The same caveat about umoci insert --whiteout applies here, as older layers will still contain the files that were removed by the opaque whiteout.
It should be noted that this is currently the only way that umoci will create an “opaque whiteout”. This means that if you need to replace an entire directory wholesale, the layer created by umoci insert --opaque is far more efficient than the one produced by an umoci unpack and umoci repack cycle (even if you ignore the CPU-time benefits).
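A rough way to see the size difference is to count tar entries. The sketch below rests on an assumption, not on a description of umoci’s diff algorithm: a diff-based layer needs one .wh.<name> whiteout per removed file plus every file in the new directory, whereas an opaque layer needs a single .wh..wh..opq marker plus the new files. Both helper functions and the file names are hypothetical.

```python
# Hypothetical entry-count comparison; assumes every file in the new
# directory is new or changed, so both layers must include all of them.
import posixpath
from typing import List, Set

OPAQUE = ".wh..wh..opq"


def per_file_whiteout_entries(old: Set[str], new: Set[str]) -> List[str]:
    removed = sorted(old - new)
    whiteouts = [posixpath.join(posixpath.dirname(p), ".wh." + posixpath.basename(p))
                 for p in removed]
    return whiteouts + sorted(new)


def opaque_entries(directory: str, new: Set[str]) -> List[str]:
    return [posixpath.join(directory, OPAQUE)] + sorted(new)


old = {f"etc/app.d/{i:03}.conf" for i in range(500)}
new = {"etc/app.d/only.conf"}
print(len(per_file_whiteout_entries(old, new)))  # 501
print(len(opaque_entries("etc/app.d", new)))     # 2
```

Under this assumption the per-file approach grows with the size of the old directory, while the opaque layer only grows with the new contents.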
Currently, umoci insert only allows one operation per layer, though this is mostly a UX restriction. This may change in the future, at which point umoci insert will be far more generally usable and efficient in terms of the number of layers generated.