This is analogous to #70447 and #76487.
These are all needed to attach a container to the default bridge
network, without which the final line of the following script fails with
the error for each respective kernel module listed below.
```sh
lxc storage create foo dir
lxc launch -s foo ubuntu:trusty bar
lxc network attach lxdbr0 bar
```
veth
----
> Error: Failed to start device 'lxdbr0': Failed to create the veth interfaces vethefbc3cd6 and vetha4abbcbc: Failed to run: ip link add dev vethefbc3cd6 type veth peer name vetha4abbcbc: RTNETLINK answers: Operation not supported
iptable_mangle
--------------
> lvl=eror msg="Failed to bring up network" err="Failed to list ipv4 rules for LXD network lxdbr0 (table mangle)" name=lxdbr0
xt_comment
----------
> lvl=error msg="Failed to bring up network" err="Failed to run: iptables -w -t filter -I INPUT -i lxdbr0 -p udp --dport 67 -j ACCEPT -m comment --comment generated for LXD network lxdbr0: iptables v1.8.4 (legacy): Couldn't load match `comment':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information." name=lxdbr0
xt_MASQUERADE
-------------
> vl=eror msg="Failed to bring up network" err="Failed to run: iptables -w -t nat -I POSTROUTING -s 10.0.107.0/24 ! -d 10.0.107.0/24 -j MASQUERADE -m comment --comment generated for LXD network lxdbr0: iptables v1.8.4 (legacy): Couldn't load target `MASQUERADE':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information." name=lxdbr0
systemd-nspawn can react to SIGTERM and send a shutdown signal to the container
init process. use that instead of going through dbus and machined to request
nspawn sending the signal, since during host shutdown machined or dbus may have
gone away by the point a container unit is stopped.
to solve the issue that a container that is still starting cannot be stopped
cleanly we must also handle this signal in containerInit/stage-2.
At the moment, it's not possible to override the libvirtd package used
without supplying a nixpkgs overlay. Adding a package option makes
libvirtd more consistent and allows enabling e.g. ceph and iSCSI support
more easily.
Xen is now enabled unconditionally on kernels that support it, so the
xen_dom0 feature doesn't do anything. The isXen attribute will now
produce a deprecation warning and unconditionally return true.
Passing in a custom value for isXen is no longer supported.
This dependency has been added in 65eae4d, when NixOS switched to
systemd, as a substitute for the previous udevtrigger and hasn't been
touched since. It's probably unneeded as the upstream unit[1] doesn't
do it and I haven't found any mention of any problem in NixOS or the
upstream issue trackers.
[1]: https://gitlab.com/libvirt/libvirt/-/blob/master/src/remote/libvirtd.service.in
Related to #85746 which addresses documentation issue,
digging deeper for a reason why this was disabled
was simply because it wasn't working which is not the case anymore.
- Actually use the zfsSupport option
- Add documentation URI to lxd.service
- Add lxd.socket to enable socket activatation
- Add proper dependencies and remove systemd-udev-settle from lxd.service
- Set up /var/lib/lxc/rootfs using systemd.tmpfiles
- Configure safe start and shutdown of lxd.service
- Configure restart on failures of lxd.service
Launching a container with a private network requires creating a
dedicated networking interface for it; name of that interface is derived
from the container name itself - e.g. a container named `foo` gets
attached to an interface named `ve-foo`.
An interface name can span up to IFNAMSIZ characters, which means that a
container name must contain at most IFNAMSIZ - 3 - 1 = 11 characters;
it's a limit that we validate using a build-time assertion.
This limit has been upgraded with Linux 5.8, as it allows for an
interface to contain a so-called altname, which can be much longer,
while remaining treated as a first-class citizen.
Since altnames have been supported natively by systemd for a while now,
due diligence on our side ends with dropping the name-assertion on newer
kernels.
This commit closes#38509.
systemd/systemd#14467systemd/systemd#17220https://lwn.net/Articles/794289/
Ensure that the HyperV keyboard driver is available in the early
stages of the boot process. This allows the user to enter a disk
encryption passphrase or repair a boot problem in an interactive
shell.
We now set the hooks dir correctly if the OCI hook is enabled. CRI-O
supports this specific hook from v1.20.0.
Signed-off-by: Sascha Grunert <mail@saschagrunert.de>
Reintroduce the `fetch-ssh-keys` service so that GCE images that work
with NixOps can once again be built. Also, reformat the code a bit.
The service was removed in 88570538b3,
likely due to a comment saying it should be removed. It was still
needed for images to work with NixOps, however, and probably needed to
be replaced or rewritten rather than removed.
The socketActivation option was removed, but later on socket activation
was added back without the option to disable it. The description now reflects
that socket activation is used unconditionally in the current setup.
Since the introduction of option `containers.<name>.pkgs`, the
`nixpkgs.*` options (including `nixpkgs.pkgs`, `nixpkgs.config`, ...) were always
ignored in container configs, which broke existing containers.
This was due to `containers.<name>.pkgs` having two separate effects:
(1) It sets the source for the modules that are used to evaluate the container.
(2) It sets the `pkgs` arg (`_module.args.pkgs`) that is used inside the container
modules.
This happens even when the default value of `containers.<name>.pkgs` is unchanged, in which
case the container `pkgs` arg is set to the pkgs of the host system.
Previously, the `pkgs` arg was determined by the `containers.<name>.config.nixpkgs.*` options.
This commit reverts the breaking change (2) while adding a backwards-compatible way to achieve (1).
It removes option `pkgs` and adds option `nixpkgs` which implements (1).
Existing users of `pkgs` are informed by an error message to use option
`nixpkgs` or to achieve only (2) by setting option `containers.<name>.config.nixpkgs.pkgs`.
It's been 8.5 years since NixOS used mingetty, but the option was
never renamed (despite the file definining the module being renamed in
9f5051b76c ("Rename mingetty module to agetty")).
I've chosen to rename it to services.getty here, rather than
services.agetty, because getty is implemantation-neutral and also the
name of the unit that is generated.
... build-vm-with-bootloader" for EFI systems
This reverts commit 20257280d9, reversing
changes made to 926a1b2094.
It broke nixosTests.installer.simpleUefiSystemdBoot
and right now channel is lagging behing for two weeks.
`nixos-rebuild build-vm-with-bootloader` currently fails with the
default NixOS EFI configuration:
$ cat >configuration.nix <<EOF
{
fileSystems."/".device = "/dev/sda1";
boot.loader.systemd-boot.enable = true;
boot.loader.efi.canTouchEfiVariables = true;
}
EOF
$ nixos-rebuild build-vm-with-bootloader -I nixos-config=$PWD/configuration.nix -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/nixos-20.09.tar.gz
[...]
insmod: ERROR: could not insert module /nix/store/1ibmgfr13r8b6xyn4f0wj115819f359c-linux-5.4.83/lib/modules/5.4.83/kernel/fs/efivarfs/efivarfs.ko.xz: No such device
mount: /sys/firmware/efi/efivars: mount point does not exist.
[ 1.908328] reboot: Power down
builder for '/nix/store/dx2ycclyknvibrskwmii42sgyalagjxa-nixos-boot-disk.drv' failed with exit code 32
[...]
Fix it by setting virtualisation.useEFIBoot = true in qemu-vm.nix, when
efi is needed.
And remove the now unneeded configuration in
./nixos/tests/systemd-boot.nix, since it's handled globally.
Before:
* release-20.03: successful build, unsuccessful run
* release-20.09 (and master): unsuccessful build
After:
* Successful build and run.
Fixes https://github.com/NixOS/nixpkgs/issues/107255
Fixes that `containers.<name>.extraVeths.<name>` configuration was not
always applied.
When configuring `containers.<name>.extraVeths.<name>` and not
configuring one of `containers.<name>.localAddress`, `.localAddress6`,
`.hostAddress`, `.hostAddress6` or `.hostBridge` the veth was created,
but otherwise no configuration (i.e. no ip) was applied.
nixos-container always configures the primary veth (when `.localAddress`
or `.hostAddress` is set) to be the containers default gateway, so
this fix is required to create a veth in containers that use a different
default gateway.
To test this patch configure the following container and check if the
addresses are applied:
```
containers.testveth = {
extraVeths.testveth = {
hostAddress = "192.168.13.2";
localAddress = "192.168.13.1";
};
config = {...}:{};
};
```
The metadata fetcher scripts run each time an instance starts, and it
is not safe to assume that responses from the instance metadata
service (IMDS) will be as they were on first boot.
Example: an EC2 instance can have its user data changed while
the instance is stopped. When the instance is restarted, we want to
see the new user data applied.
According to Freenode's ##AWS, the metadata server can sometimes
take a few moments to get its shoes on, and the very first boot
of a machine can see failed requests for a few moments.
AWS's metadata service has two versions. Version 1 allowed plain HTTP
requests to get metadata. However, this was frequently abused when a
user could trick an AWS-hosted server in to proxying requests to the
metadata service. Since the metadata service is frequently used to
generate AWS access keys, this is pretty gnarly. Version two is
identical except it requires the caller to request a token and provide
it on each request.
Today, starting a NixOS AMI in EC2 where the metadata service is
configured to only allow v2 requests fails: the user's SSH key is not
placed, and configuration provided by the user-data is not applied.
The server is useless. This patch addresses that.
Note the dependency on curl is not a joyful one, and it expand the
initrd by 30M. However, see the added comment for more information
about why this is needed. Note the idea of using `echo` and `nc` are
laughable. Don't do that.
See https://www.redhat.com/sysadmin/fedora-31-control-group-v2 for
details on why this is desirable, and how it impacts containers.
Users that need to keep using the old cgroup hierarchy can re-enable it
by setting `systemd.unifiedCgroupHierarchy` to `false`.
Well-known candidates not supporting that hierarchy, like docker and
hidepid=… will disable it automatically.
Fixes#73800
This reverts commit fb6d63f3fd.
I really hope this finally fixes#99236: evaluation on Hydra.
This time I really did check basically the same commit on Hydra:
https://hydra.nixos.org/eval/1618011
Right now I don't have energy to find what exactly is wrong in the
commit, and it doesn't seem important in comparison to nixos-unstable
channel being stuck on a commit over one week old.
This adds the pinns path to the configuration let CRI-O start properly.
We also change the configuration to the new drop-in syntax.
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
The toplevel derivations of systems that have `networking.hostName`
set to `""` (because they want their hostname to be set by DHCP) used
to be all named
`nixos-system-unnamed-${config.system.nixos.label}`.
This makes them hard to distinguish.
A similar problem existed in NixOS tests where `vmName` is used in the
`testScript` to refer to the VM. It defaulted to the
`networking.hostName` which when set to `""` won't allow you to refer
to the machine from the `testScript`.
This commit makes the `system.name` configurable. It still defaults to:
```
if config.networking.hostName == ""
then "unnamed"
else config.networking.hostName;
```
but in case `networking.hostName` needs to be to `""` the
`system.name` can be set to a distinguishable name.
This is analogous to #70447.
With security.lockKernelModules=true, docker commands result in the following
error without at least loading veth:
$ docker run hello-world
/nix/store/mr50kaan2vs4gc40ymwncb2vci25aq7z-docker-19.03.2/libexec/docker/docker: Error response from daemon: failed to create endpoint epic_kare on network bridge: failed to add the host (veth8b381f3) <=> sandbox (veth348e197) pair interfaces: operation not supported.
ERRO[0003] error waiting for container: context canceled
This is required by (among others) Podman to run containers in rootless mode.
Other distributions such as Fedora and Ubuntu already set up these mappings.
The scheme with a start UID/GID offset starting at 100000 and increasing in 65536 increments is copied from Fedora.
Per upstream:
> libvirtd-tcp.socket - the unit file corresponding to the TCP 16509
> port for non-TLS remote access. This socket should not be configured
> to start on boot until the administrator has configured a suitable
> authentication mechanism.
Without this, systemd-boot does not add an EFI boot entry for itself.
The reason it worked before this fix is because it would fall back to
the default installed \EFI\BOOT\BOOTX64.EFI
boot.loader.grub.device` was hardcoded to `bootDevice`, which is
wrong, because that's the device for `/`, and with `useBootLoader`
the boot loader is not on that device.
This bug probably came into existence because of bad naming;
`virtualisation.bootDevice` has description
"The disk to be used for the root filesystem", which is very confusing;
it should be `.rootDevice` then!
Unfortunately, the description is right and the attribute name is wrong,
so it is not easy to change this without deprecation.
This commit ensures that even if you use `useBootLoader` and
`diskInterface == "scsi"`, the created VM can boot through, and can run
`nixos-rebuild afterwards.
It also adds extra commentary to explain what's going on in this module
in general in relation to `useBootLoader`.
VMSGVA is recommended by virtualbox for Linux clients.
Compared to VBoxVGA and VBoxSVGA it also supports 3D acceleration.
Adding the driver makes nixos work with all three supported graphics card
types.
xchg is advertised as a bidirectional exchange dir, but file content
transfer from host to VM fails due to caching:
If a file is read in the VM and then modified on the host, subsequent
re-reads in the VM can yield old, cached data.
This is caused by the use of 9p's cache=loose mode that is explicitly
meant for read-only mounts.
9p doesn't provide any suitable cache modes, so fix this by disabling
caching.
Also, remove a now unnecessary sync in the test driver.
- Update the default pause image
- Set the cgroup manager to systemd
- Enable `manage_ns_lifecycle` instead of the deprecated
`manage_network_ns_lifecycle` option
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
This follows upstreams change in documentation. While the `[DHCP]`
section might still work it is undocumented and we should probably not
be using it anymore. Users can just upgrade to the new option without
much hassle.
I had to create a bit of custom module deprecation code since the usual
approach doesn't support wildcards in the path.