It turns out that checking for the last mount time of an ext4 file
system isn't a very reliable way to check whether the file system was
properly unmounted.
When creating that test in the first place (88530e02b6),
I was reluctant to inspect the file system when the VM is down and was
searching for a way to check for a clean unmount *after* the file system
was mounted again to make sure we don't need to create a 512 MB raw
image on the host.
Fortunately however, when converting from qcow2, qemu-img actually
writes a sparse file, so for most file systems (that is, file systems
supporting sparse files) this shouldn't waste a lot of disk space.
So when investigating the flakiness, I found that whenever the test is
failing, the unmount of /test-x-initrd-mount was done *before* the final
step during which systemd remounts+unmounts all the remaining file
systems.
I haven't investigated why this is the case, but the test is a
regression test for https://github.com/NixOS/nixpkgs/issues/35268, which
actually didn't unmount the file system *at* *all*, so really all we
need to take care here is whether the unmount has happened and not
*how*.
To make sure that checking the filesystem state is enough for this, I
temporarily replaced the $machine->shutdown call with $machine->crash
and verified that the file system state is "not clean".
Signed-off-by: aszlig <aszlig@nix.build>
Fixes: https://github.com/NixOS/nixpkgs/issues/67555
* nixos/acme: Fix ordering of cert requests
When subsequent certificates would be added, they would
not wake up nginx correctly due to target units only being triggered
once. We now added more fine-grained systemd dependencies to make sure
nginx always is aware of new certificates and doesn't restart too early
resulting in a crash.
Furthermore, the acme module has been refactored. Mostly to get
rid of the deprecated PermissionStartOnly systemd options which were
deprecated. Below is a summary of changes made.
* Use SERVICE_RESULT to determine status
This was added in systemd v232. we don't have to keep track
of the EXITCODE ourselves anymore.
* Add regression test for requesting mutliple domains
* Deprecate 'directory' option
We now use systemd's StateDirectory option to manage
create and permissions of the acme state directory.
* The webroot is created using a systemd.tmpfiles.rules rule
instead of the preStart script.
* Depend on certs directly
By getting rid of the target units, we make sure ordering
is correct in the case that you add new certs after already
having deployed some.
Reason it broke before: acme-certificates.target would
be in active state, and if you then add a new cert, it
would still be active and hence nginx would restart
without even requesting a new cert. Not good! We
make the dependencies more fine-grained now. this should fix that
* Remove activationDelay option
It complicated the code a lot, and is rather arbitrary. What if
your activation script takes more than activationDelay seconds?
Instead, one should use systemd dependencies to make sure some
action happens before setting the certificate live.
e.g. If you want to wait until your cert is published in DNS DANE /
TLSA, you could create a unit that blocks until it appears in DNS:
```
RequiredBy=acme-${cert}.service
After=acme-${cert}.service
ExecStart=publish-wait-for-dns-script
```
* nginx: expose generated config and allow nginx reloads
Fixes: https://github.com/NixOS/nixpkgs/issues/15906
Another try was done, but not yet merged in https://github.com/NixOS/nixpkgs/pull/24476
This add 2 new features: ability to review generated Nginx config
(and NixOS has sophisticated generation!) and reloading
of nginx on config changes. This preserves nginx restart on package
updates.
I've modified nginx test to use this new feature and check reload/restart
behavior.
* rename to enableReload
* add sleep(1) in ETag test (race condition) and rewrite rebuild-switch using `nesting.clone`