Rant : SystemD is too complex for Linux distributors
I could say too complex for Linux distributions, but it looks more like a lack of understanding of how SystemD works on the distributors’ side, combined with unit files getting installed without asking, which leads to blocking issues.
The story
This server became inaccessible around the beginning of September,
I guess. I checked my blog during August and it was still up !
Then holidays ended, I went back to work and… I only checked my blog
again at the start of October.
Note that I receive emails from the hosting company I use if the
servers become “inaccessible” from their point of view.
But they only check from their local network.
So, I tried to reach https://miouyouyou.fr … “Connection refused”….
Uh-oh…
Ok, let’s start a SSH shell ! “Connection refused”.
!!? What’s happening !? Was my server hijacked !? What the fuck !
I then tried to use my provider (Scaleway) “Web console” : Nothing.
(Spoiler alert : Turns out that Scaleway web console sucks)
Ugh… Okay… I got locked out from my server ? Is that some kind of “HA-HA ! You forgot to secure this part of your server ! pwned you, m0r0n !” ?
Let’s try a nmap -sS -v on my server !
Discovered open port 22/tcp
Wait, what !? What’s running on port 22 !? The SSH server is supposed to run on another port !
… I changed my client ~/.ssh/config to use port 22 instead of the
“configured” port. Then I retried to get a SSH Shell and…
Got a shell on my server !
… ?
Alright… ps auxww … nothing unusual…
Checked dmesg, checked journalctl … Nothing unusual !
Threw a tcpdump not port 22 for kicks… nothing unusual…
Maybe the machine wasn’t hacked ? I’m smelling some botched system update now…
… Let’s see if something happens if we put the system back together for the moment… I mean, I’m just hosting a static blog whose content is available on a public git repository, which I can redeploy at any moment so, if it blows, I’ll order a new server unit.
docker container ls … Ok, the containers are down…
iptables -L … The firewall was reset ?
Fine, ran my script to re-establish the firewall, deleted all the
containers, updated the docker images, deployed the new instances with
docker-compose.
AAAND, then, I tried to reach https://miouyouyou.fr
Success !
I got my blog back !
Updated the SSL certificates, the OCSP staples : Alright !
Then I tried to create a script to automate the blog updates
as much as possible, and discovered that there are some big differences
between the Hugo 0.46 on my machine and the Hugo 0.56 on the server,
which fucked up my templates real bad.
After a few hours, checking each Hugo release note, to understand
which update fucked up the templates, I pinpointed the issue,
updated the templates and Voila : My blog is restored !
Now, I can focus on the real issue !
WHY THE FUCK IS MY SSH SERVER LISTENING ON PORT 22 !
Checked /etc/ssh/sshd_config … It’s clearly written to listen
on another port.
I don’t get it… Either sshd is executed with specific instructions to not listen on the configured port or there’s a rogue ssh server !
root# ps auxww | grep ssh
root 2119 0.0 0.3 14272 7572 ? Ss 18:29 0:00 sshd: root@pts/0
… ? sshd: root@pts/0 … ? Whaaat ?
Maybe there’s an environment variable fucking with the sshd server ?
root# ps auxwwe | grep sshd
root 2119 0.0 0.3 14272 7572 ? Ss 18:29 0:01 sshd: root@pts/0 =
root# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:sshd(8)
man:sshd_config(5)
DEAD !!!? WHO’S LAUNCHING THIS SERVER THEN ?
… Why don’t I see any real sshd server with ps auxwww ? Here,
it’s listing my connection as a server… ??? Whut ?
I can’t see anything related to sshd “as-is”.
Maybe it’s executed from the initrd file ? Hmm… This server is
certainly booted from the network so I’m not going to access the initrd
easily… (Well, turns out that I could actually, but didn’t know that
back then).
But even then, I should see it in the list…
Wait, if I run the ssh service with systemctl start ssh … and then
try to connect on the good port… Success !
WHHAAAAT !?
Then WHY !? Why did systemd refuse to execute this service on startup !?
… Maybe it’s not executed on startup ?
How do I check the services ran on startup again ?
root# systemctl -t service --state=active
UNIT LOAD ACTIVE SUB DESCRIPTION
blk-availability.service loaded active exited Availability of block devices
containerd.service loaded active running containerd container runtime
cron.service loaded active running Regular background program processing daemon
dbus.service loaded active running D-Bus System Message Bus
docker.service loaded active running Docker Application Container Engine
exim4.service loaded active running LSB: exim Mail Transport Agent
getty@tty1.service loaded active running Getty on tty1
getty@ttyAMA0.service loaded active running Getty on ttyAMA0
haveged.service loaded active running Entropy daemon using the HAVEGE algorithm
kmod-static-nodes.service loaded active exited Create list of required static device nodes for the current kernel
lvm2-monitor.service loaded active exited Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
ntp.service loaded active running Network Time Service
polkit.service loaded active running Authorization Manager
rsyslog.service loaded active running System Logging Service
serial-getty@ttyAMA0.service loaded active running Serial Getty on ttyAMA0
ssh@0-10.X.Y.Z:22-A.B.C.D:39924.service loaded active running OpenBSD Secure Shell server per-connection daemon
sysstat.service loaded active exited Resets System Activity Data Collector
systemd-journal-flush.service loaded active exited Flush Journal to Persistent Storage
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running Login Service
systemd-modules-load.service loaded active exited Load Kernel Modules
systemd-networkd-wait-online.service loaded active exited Wait for Network to be Configured
systemd-networkd.service loaded active running Network Service
systemd-random-seed.service loaded active exited Load/Save Random Seed
systemd-remount-fs.service loaded active exited Remount Root and Kernel File Systems
systemd-resolved.service loaded active running Network Name Resolution
systemd-sysctl.service loaded active exited Apply Kernel Variables
systemd-sysusers.service loaded active exited Create System Users
systemd-tmpfiles-setup-dev.service loaded active exited Create Static Device Nodes in /dev
systemd-tmpfiles-setup.service loaded active exited Create Volatile Files and Directories
systemd-udev-trigger.service loaded active exited udev Coldplug all Devices
systemd-udevd.service loaded active running udev Kernel Device Manager
systemd-update-utmp.service loaded active exited Update UTMP about System Boot/Shutdown
systemd-user-sessions.service loaded active exited Permit User Sessions
ufw.service loaded active exited Uncomplicated firewall
unattended-upgrades.service loaded active running Unattended Upgrades Shutdown
user-runtime-dir@0.service loaded active exited User Runtime Directory /run/user/0
user@0.service loaded active running User Manager for UID 0
The IPs were replaced by 10.X.Y.Z and A.B.C.D in this copy.
… What the fuck ? There’s a service executed for my IP, but there’s no ssh.service running the server ?
This shit is insane ! What the fuck is wrong with SystemD !?
No wait… maybe I’m blaming SystemD while it’s actually System-V generating issues.
root# grep sshd /etc/* -r
/etc/default/ssh:# Options to pass to sshd
/etc/init.d/ssh:# Provides: sshd
/etc/init.d/ssh:test -x /usr/sbin/sshd || exit 0
/etc/init.d/ssh:( /usr/sbin/sshd -\? 2>&1 | grep -q OpenSSH ) 2>/dev/null || exit 0
/etc/init.d/ssh: # forget it if we're trying to start, and /etc/ssh/sshd_not_to_be_run exists
/etc/init.d/ssh: if [ -e /etc/ssh/sshd_not_to_be_run ]; then
/etc/init.d/ssh: log_action_msg "OpenBSD Secure Shell server not in use (/etc/ssh/sshd_not_to_be_run)" || true
/etc/init.d/ssh: if [ ! -d /run/sshd ]; then
/etc/init.d/ssh: mkdir /run/sshd
/etc/init.d/ssh: chmod 0755 /run/sshd
/etc/init.d/ssh: if [ ! -e /etc/ssh/sshd_not_to_be_run ]; then
/etc/init.d/ssh: /usr/sbin/sshd $SSHD_OPTS -t || exit 1
/etc/init.d/ssh: log_daemon_msg "Starting OpenBSD Secure Shell server" "sshd" || true
/etc/init.d/ssh: if start-stop-daemon --start --quiet --oknodo --chuid 0:0 --pidfile /run/sshd.pid --exec /usr/sbin/sshd -- $SSHD_OPTS; then
/etc/init.d/ssh: log_daemon_msg "Stopping OpenBSD Secure Shell server" "sshd" || true
/etc/init.d/ssh: if start-stop-daemon --stop --quiet --oknodo --pidfile /run/sshd.pid --exec /usr/sbin/sshd; then
/etc/init.d/ssh: log_daemon_msg "Reloading OpenBSD Secure Shell server's configuration" "sshd" || true
/etc/init.d/ssh: if start-stop-daemon --stop --signal 1 --quiet --oknodo --pidfile /run/sshd.pid --exec /usr/sbin/sshd; then
/etc/init.d/ssh: log_daemon_msg "Restarting OpenBSD Secure Shell server" "sshd" || true
/etc/init.d/ssh: start-stop-daemon --stop --quiet --oknodo --retry 30 --pidfile /run/sshd.pid --exec /usr/sbin/sshd
/etc/init.d/ssh: if start-stop-daemon --start --quiet --oknodo --chuid 0:0 --pidfile /run/sshd.pid --exec /usr/sbin/sshd -- $SSHD_OPTS; then
/etc/init.d/ssh: log_daemon_msg "Restarting OpenBSD Secure Shell server" "sshd" || true
/etc/init.d/ssh: start-stop-daemon --stop --quiet --retry 30 --pidfile /run/sshd.pid --exec /usr/sbin/sshd || RET="$?"
/etc/init.d/ssh: if start-stop-daemon --start --quiet --oknodo --chuid 0:0 --pidfile /run/sshd.pid --exec /usr/sbin/sshd -- $SSHD_OPTS; then
/etc/init.d/ssh: status_of_proc -p /run/sshd.pid /usr/sbin/sshd sshd && exit 0 || exit $?
grep: /etc/motd: No such file or directory
/etc/pam.d/sshd:# access limits that are hard to express in sshd_config.
/etc/passwd:sshd:x:109:65534::/run/sshd:/usr/sbin/nologin
/etc/passwd-:sshd:x:109:65534::/run/sshd:/usr/sbin/nologin
/etc/shadow:sshd:*:17346:0:99999:7:::
/etc/shadow-:sshd:*:17346:0:99999:7:::
/etc/ssh/sshd_config:# See the sshd_config(5) manpage for details
/etc/ssh/sshd_config:# Use these options to restrict which interfaces/protocols sshd will bind to
/etc/ssh/sshd_config.ucf-dist:# $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $
/etc/ssh/sshd_config.ucf-dist:# This is the sshd server system-wide configuration file. See
/etc/ssh/sshd_config.ucf-dist:# sshd_config(5) for more information.
/etc/ssh/sshd_config.ucf-dist:# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin
/etc/ssh/sshd_config.ucf-dist:# The strategy used for options in the default sshd_config shipped with
/etc/ssh/sshd_config.ucf-dist:#PidFile /var/run/sshd.pid
/etc/systemd/system/scw-generate-ssh-keys.service:Before=sshd.service
/etc/systemd/system/scw-fetch-ssh-keys.service:Before=sshd.service
Ha ! /etc/init.d/ssh ! Maybe it’s that stupid service that’s
generating issues ! … Why is Debian mixing System-V with SystemD ?
Either go System-V or go SystemD… Don’t do both, it’s irritating…
Ok, let’s check if this script is actually run ! Let’s edit it and
add an echo MEOW > /tmp/stoopid after set -e.
root# /etc/init.d/ssh restart
[ ok ] Restarting ssh (via systemctl): ssh.service.
root# cat /tmp/stoopid
MEOW
root# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-10-25 19:48:28 UTC; 13min ago
Docs: man:sshd(8)
man:sshd_config(5)
Process: 3914 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=0/SUCCESS)
Main PID: 3915 (sshd)
Tasks: 1 (limit: 2377)
Memory: 1016.0K
CGroup: /system.slice/ssh.service
└─3915 /usr/sbin/sshd -D
Oct 25 19:48:27 myy-blargh systemd[1]: Starting OpenBSD Secure Shell server...
Oct 25 19:48:28 myy-blargh sshd[3915]: Server listening on 0.0.0.0 port N.
Oct 25 19:48:28 myy-blargh sshd[3915]: Server listening on :: port N.
Oct 25 19:48:28 myy-blargh systemd[1]: Started OpenBSD Secure Shell server.
The actual port was replaced by N in this copy of the logs.
Hmm… If I restart it through the init script, SystemD considers the service running as well… Both systems are cooperating correctly then ?
Alright, let’s reboot and see if the file is present on reboot !
After reboot :
root# cat /tmp/stoopid
cat: /tmp/stoopid: No such file or directory
… This script is not executed ?? … Maybe it’s the initrd after all ?
Turns out that Scaleway mounts the initrd file in /run/initramfs.
root# grep ssh /run/initramfs/*
Binary file /run/initramfs/bin/busybox matches
/run/initramfs/functions:start_sshd() {
/run/initramfs/functions: run mkdir -p /etc/dropbear /root/.ssh
/run/initramfs/functions: run chmod 700 /root/.ssh
/run/initramfs/functions: run sh -ec "scw-metadata --cached | grep 'SSH_PUBLIC_KEYS_.*_KEY' | cut -d'=' -f 2- | tr -d \' > /root/.ssh/authorized_keys"
/run/initramfs/functions: run sh -ec "scw-metadata --cached | grep 'TAGS_.*=AUTHORIZED_KEY' | cut -d'=' -f 3- | sed 's/_/\ /g' >> /root/.ssh/authorized_keys"
/run/initramfs/init:log_begin_msg "Checking metadata for debug sshd (dropbear)"
/run/initramfs/init: log_success_msg "Starting a debug sshd"
/run/initramfs/init: start_sshd
/run/initramfs/init: ewarn "You can connect to your server with 'scw' or 'ssh'"
/run/initramfs/init: ewarn " -- ssh root@${PUBLIC_IP_ADDRESS}"
/run/initramfs/init: ewarn "You can connect to your server with 'ssh'"
/run/initramfs/init: ewarn " -- ssh root@${PUBLIC_IP_ADDRESS}"
/run/initramfs/init:# Ensure sshd is killed if running
Binary file /run/initramfs/lib/aarch64-linux-gnu/libtinfo.so.5.9 matches
Binary file /run/initramfs/usr/sbin/dropbear matches
Binary file /run/initramfs/usr/bin/dropbearkey matches
Ooooh, here you go ! Let’s edit /run/initramfs/init and check how
they “# Ensure sshd is killed if running” :
# Ensure sshd is killed if running
if [ "$(pgrep dropbear)" != "" ]; then
run killall dropbear
fi
Hmm :
root# ps auxww | grep drop
root 4016 0.0 0.0 5796 648 pts/0 S+ 20:07 0:00 grep drop
Ok… Nothing… I don’t get it…
Let’s check for rootkits, just in case.
root# apt install chkrootkit
root# chkrootkit
No problems detected…
I’m tired… I’m tired of this shit, there’s a fucking SSH server running on my machine, I have NO idea who’s spawning it !
Oh wait ! I forgot about lsof !
lsof -t
No, wrong one… Couldn’t they use netstat syntax ?
lsof -i
systemd 1 root 52u IPv6 17371 0t0 TCP *:22 (LISTEN)
O_O … …
…
O_O !
YOU’RE FUCKING KIDDING ME !? SystemD ITSELF IS LISTENING ON PORT 22 !? But the SSH service is dead ! HOW ?
A little search on the internet, got me this gem : https://bbs.archlinux.org/viewtopic.php?id=166582
“So, ive found out that sshd.socket was enabled and this was the cause”
…
root# find /etc -name "ssh*.socket"
/etc/systemd/system/sockets.target.wants/ssh.socket
root# cat /etc/systemd/system/sockets.target.wants/ssh.socket
[Unit]
Description=OpenBSD Secure Shell server socket
Before=ssh.service
Conflicts=ssh.service
ConditionPathExists=!/etc/ssh/sshd_not_to_be_run
[Socket]
ListenStream=22
Accept=yes
[Install]
WantedBy=sockets.target
FUCK YOU ! FUCK THIS SHIT !
I AM DONE WITH SYSTEMD !
The real issue with SystemD
It’s OVERLY COMPLEX ! If you don’t remember all the commands and, most importantly, ALL THE WAYS SYSTEMD CAN START A SERVICE, YOU WILL NOT be able to understand what’s going on.
systemctl status ssh was indicating the status of the
ssh.service file, not the status or presence of ssh.socket.
Understand that I threw a journalctl and searched for SSH and saw
sshd being started and receiving connections as sshd[PID], but not
why, nor how it was started ! And the logs were about sshd[PID],
not systemd[PID] !
So until you understand that SystemD can start services using socket connections (Overkill feature for simple servers), you will NEVER KNOW :
- Why is the SSH service not started
- Why is there a SSH server listening on another port than the one provided in the configuration
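For the record, here is a rough sketch of how I could have spotted this much earlier. It assumes lsof is installed and that its -i output keeps the column layout shown in the story above; the systemd_listeners helper name is made up for this post.

```shell
# Hypothetical helper: read `lsof -nP -i` output on stdin and print the
# addresses that PID 1 (systemd itself) is listening on.
systemd_listeners() {
    awk '$2 == 1 && /\(LISTEN\)/ { print $(NF-1) }'
}

# Usage, as root:  lsof -nP -i | systemd_listeners
# Any hit means a .socket unit owns the port; `systemctl list-sockets`
# then names that unit and the service it activates.
```

If PID 1 shows up as a listener, the thing to inspect is the .socket unit, not the .service one.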
Understand that I had a running server until start of September, the server got rebooted for whatever reasons and THEN, POOF, the SSH server started to listen on port 22.
So, the real issue with SystemD is that it’s TOO COMPLEX FOR DISTRIBUTORS !
It can do a ton of things and maybe, once you understand it
perfectly, you’ll be happy to use all the bells and whistles…
If you need them, of course…
However :
- systemctl status ssh just showed the ssh service as DEAD !
- journalctl had some “SSH connections” entries, but didn’t show why it started an SSH service !
- ps auxww | grep sshd made it look like SSH servers are run “on the fly”.
- MY ACTUAL SSHD CONFIGURATION WAS COMPLETELY IGNORED.
It’s those things combined which make me hate SystemD for server
management.
It’s UNRELIABLE !
If something breaks, I want AS MUCH INFORMATION AS POSSIBLE !
If SystemD starts a sshd server, systemctl status ssh or
systemctl status sshd should show information about it !
I don’t give two shits about the file extension of the unit file
triggering the execution of “sshd” !
That said, the issue here is not only due to SystemD overcomplexity. It’s also the fact that some distributors thought :
“Hey ! Let’s put a ssh.socket in
/etc/systemd/system/sockets.target.wants/, so that the user will
never know why his SSH server starts ignoring /etc/ssh/sshd_config !
This will be so much fun !”
The fact is : even if I remove this .socket file, it might just
come back after an update ! And fuck up my system again !
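A minimal sketch of how the socket unit can be kept from coming back, assuming the Debian unit names ssh.socket and ssh.service seen above. The function name is hypothetical, and I’d run it from a console rather than over SSH, in case the listener drops before the service is up.

```shell
# Hypothetical helper: switch from socket activation back to the plain service.
use_plain_sshd() {
    systemctl disable --now ssh.socket   # stop the port 22 listener now and at boot
    systemctl mask ssh.socket            # link the unit to /dev/null so nothing can start it
    systemctl enable --now ssh.service   # run sshd directly, so sshd_config applies again
}
```

Masking puts a link to /dev/null under /etc/systemd/system, which takes precedence over the packaged unit in /lib/systemd/system, so an update that reinstalls the unit file should not bring the listener back. systemctl unmask ssh.socket undoes it.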
Do you understand the issue here ? I’M LOSING CONTROL OF THE SYSTEM ! Because distributors started to ship with an OVERCOMPLEX INIT, and started using features of this init system without understanding the consequences !
If I put CONFIGURATION DIRECTIVES in a CONFIGURATION FILE, I don’t want them to be OVERRIDDEN BY SOME RANDOM .whatever FILE used by my init system. If I want a different configuration, I’ll either edit the configuration file, or force the daemon to use another configuration file by editing the appropriate init file.
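With socket activation, the listening port comes from ListenStream= in the socket unit, since systemd does the listening and sshd only gets handed the connection. Here is a hedged sketch of a check that would have flagged the disagreement. The port_mismatch name is made up, both file paths are passed as arguments, and it assumes ListenStream= holds a bare port number like in the unit above.

```shell
# Hypothetical check: compare sshd_config's Port with the socket unit's ListenStream.
# $1 = path to sshd_config, $2 = path to the .socket unit file.
port_mismatch() {
    conf_port=$(awk 'tolower($1) == "port" { print $2; exit }' "$1")
    conf_port=${conf_port:-22}   # sshd defaults to 22 when no Port line is set
    unit_port=$(awk -F= '$1 == "ListenStream" { print $2; exit }' "$2")
    if [ "$conf_port" != "$unit_port" ]; then
        echo "sshd_config says $conf_port, socket unit listens on $unit_port"
        return 0
    fi
    return 1
}

# Usage: port_mismatch /etc/ssh/sshd_config /lib/systemd/system/ssh.socket
```

It only reads the first Port and the first ListenStream= line, which is enough for a setup as simple as mine.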
I’d appreciate if distributions using SystemD went with the “least amount of SystemD unit files” on server configurations.
Look at “Clear Linux”, there’s a way to use SystemD while making
things “Lean & Clean”.
Just check the /etc/ folder after installing Clear Linux :
It’s clean !
They don’t add tons of .service, .socket, .mount or
.whatever extension systemd reacts on !
No, they put a clean and lean /etc directory, with only the
strictly necessary files.
And it works !
That said, an Init system so complex does not interest me. I’ll
brush up my systemd-foo on Arch Linux, because I really need
it. But, still, I don’t give a shit about SystemD on my
servers ATM.
If you run tons of microservices, maybe you do.
But me ? Nope. All my main public services run in docker
containers and the only daemons and configurations I care
about on my system are :
- The SSH server
- The firewall
- The Docker containers
I could manage this with a busybox ‘init’ file…
So I’ll start looking for another distro that I can deploy on
my server, and which doesn’t use SystemD.
And if I can’t find one, I’ll look for one using SystemD with
the least amount of unit files.
I’m done with Debian and SystemD.
And I’d like to be done playing detective to understand :
- Why my server is not responding anymore ?
- Why services are executing while ignoring their configurations ?
- How can I avoid traps added by systemd updates ?