The Linux kernel exposes lots of interesting information via the
/proc filesystem. For example,
/proc/net/udp expose information about all tcp and udp sessions on the server.
In the usual Linux kernel style, these files are freely readable by all users on the Linux system. This becomes a bit of a problem on a multi-tenant system like Heroku. Heroku deploy many customers apps on each of their Amazon EC2 servers. So if you create an app skeleton, deploy to Heroku and shell out and you can see all the other apps tcp and udp sessions:
$ heroku run console
Running `console` attached to terminal... up, run.1971
irb(main):001:0> system("netstat | grep ESTAB | head -n 10")
tcp 0 0 22d87d80-deb7-415:33113 ip-10-40-86-97.ec:27018 ESTABLISHED
tcp 0 0 22d87d80-deb7-415:57256 ip-10-60-122:postgresql ESTABLISHED
tcp 0 0 22d87d80-deb7-415:59268 domU-12-31-3:postgresql ESTABLISHED
tcp 0 0 22d87d80-deb7-415:38536 collector6.newrelic:www ESTABLISHED
tcp 0 0 22d87d80-deb7-415:54459 ip-10-189-243-92.:27017 ESTABLISHED
tcp 0 0 22d87d80-deb7-415:50495 ip-10-151-25-11.ec:6242 ESTABLISHED
tcp 0 0 22d87d80-deb7-415:53192 ip-10-114-25:postgresql ESTABLISHED
tcp 0 0 22d87d80-deb7-415:47405 220.127.116.11:https ESTABLISHED
tcp 0 0 22d87d80-deb7-415:39011 collector6.newrelic:www ESTABLISHED
tcp 0 0 22d87d80-deb7-415:52237 ip-10-218-31-147.:42002 ESTABLISHED
If the netstat command (or system method) is not available, you can just read and parse the file in Ruby.
It’s an information leak, so I think it is quite low risk but still a hint that Heroku’s process separation still isn’t as strict as one might hope (I still remember David Chen’s pretty shocking discovery back in 2011)
There are a few ways to fix this kind of problem. Mandatory access control systems, like AppArmor can prevent processes reading these files. The Grsecurity security patches have lots of protections against proc based information leaking too, these files included.
I reported my findings to Heroku’s security team back in September 2012. I had some problems getting timely responses at first but after whining on Twitter (in March 2013) I got lots of attention from them (and an apology for lack of response). Complaining on Twitter is a clearly a powerful tool and to be used only wisely; and maybe sometimes when drunk.
Anyway, it seems they had been working hard on this and just failing to update me; it is quite a major change for them and a low priority bug imo. They’ve fixed it by properly virtualising networking too. It’s mentioned on their change log but it doesn’t go into too much detail. Important point is that it’s fixed.
The Linux Logical Volume Manager (LVM) supports creating snapshots of logical volumes (LV) using the device mapper. Device mapper implements snapshots using a copy on write system, so whenever you write to either the source LV or the new snapshot LV, a copy is made first.
So a write to a normal LV is just a write, but a write to a snapshotted LV (or an LV snapshot) involves reading the original data, writing it elsewhere and then writing some metadata about it all.
This quite obviously impacts performance, and due to device mapper having a very basic implementation, it is particularly bad. My tests show synchronous sequential writes to a snapshotted LV are around 90% slower than writes to a normal LV.
Continue reading LVM snapshot performance
On a busy Linux Netfilter-based firewall, you usually need to up the maximum number of allowed tracked connections (or new connections will be denied and you’ll see log messages from the kernel link this:
nf_conntrack: table full, dropping packet.
More connections will use more RAM, but how much? We don’t want to overcommit, as the connection tracker uses unswappable memory and things will blow up. If we set aside 512MB for connection tracking, how many concurrent connections can we track?
There is some Netfilter documentation on wallfire.org, but it’s quite old. How can we be sure it’s still correct without completely understanding the Netfilter code? Does it account for real life constraints such as page size, or is it just derived from looking at the code? A running Linux kernel gives us all the info we need through it’s
slabinfo proc file.
Continue reading Netfilter Conntrack Memory Usage
I’m doing a talk tonight about virtualizing your storage with LVM on Linux at the West Yorkshire Linux User Group. Sorry about the short notice here (it was announced earlier in the week elsewhere though).
My mate Paul Brook is talking about RAID on Linux too.
Come along for the talk, or the beer, or the socialising – or all three.
Ricardo Correia has been porting Sun’s recently GPLed ZFS to Linux using FUSE. I’ve been playing with it and I’m quite impressed. The FUSE port is alpha quality, so isn’t to be trusted with important data yet – but it’s fun to play with.
ZFS merges the concept of a volume manager and a filesystem. It’s a bit like LVM, with zpools being volume groups and zfs being formatted logical volumes. Zfs “partitions” can change size at any time in any way. It’s also hierarchical, so zfs partitions can have child partitions inheriting their attributes. It also does away with fstab – all mount points are specified as zfs attributes and are automatically mounted when a zpool is brought online.
Continue reading Sun’s ZFS on Linux via FUSE
I upgraded my home gateway firewall to Edgy today in the hope of fixing some SATA problems I’ve been experiencing. The new Edgy kernel might help – we’ll see.
Anyway, it went pretty well. Two runs (?) of
apt-get dist-upgrade -u, a reboot and there I was.
Unfortunately I had two problems with my Openswan IPSEC VPNs. I’m not so sure if these count as bugs. I’ll be investigating further and reporting if so. Anyway, techie details follow…
Continue reading IPSEC VPN problems upgrading to Ubuntu Edgy
I met Stuart Langridge and Jono Bacon from LUG Radio last night as they came to Leeds where Jono talked at my local LUG. Wookey also gave a talk about the latest stuff going on in the Emdebian world.
And of course I also met up with some of the WYLUG regulars (Robert Speed, Jim Jackson, James Holden, et al) and some perhaps not so regular (which would include me btw).
We enjoyed the talks, went to a local Italian place for dinner then to a local pub. Was a great night.
Here I am attempting to connect to a server using lftp wondering why the firewall is blocking the incoming data connections, even though ip_conntrack_ftp has been working for years.
lftp supports tls, and so does the server I’m connecting to. This means the control connection is encrypted, so the netfilter ftp connection tracker can’t peek inside the packets to find out which ports to open up to allow the data connections. DUH. Ftp sucks.
Anyway, the only way I found to disable tls support in lftp is to add the following line to
set ftp:ssl-allow false
It turns out that xorg will use $HOME/xorg.conf if it finds it, rather than the default in /etc/X11/xorg.conf.
I didn’t know this, and didn’t notice that it was telling me this in the logs. I’ve now wasted a bunch of time troubleshooting a font problem on Ubuntu where xorg couldn’t find my fonts and I was working from the WRONG CONFIG FILE. AGH!
Anyway, in other Ubuntu xorg related news, something changed in the latest Breezy upgrades that causes /dev/input/mice not to be created. Xorg then bombs on boot as that’s my core pointer. If you restart udev the device nodes get created fine. Not sure what this is yet.
I’m playing with the grsecurity patches for Linux. Unfortunately 2.6.8 changed in a way that causes major headache for the grsec team, so no planned release date for a new patch. Having some problems with strange enforcements of rlimits, potentially linked to the rlimit auditing code. I’ll hopefully get time to tinker with SELinux too.
My new Fedora installation was playing up with certain web sites resulting in *very* slow download (I could see the words drawing on my screen one by one). A ethereal dump showed a nice big window size, but max 120 byte packets and an ack for each one!
Well it turns out since about kernel 2.6.7, the default tcp_window_scale setting has been 7. The problem is, as was with ECN, there are lots of broken routers out there which break window scaling (they strip the TCP options, which is totally against RFC, and common sense). So the other end doesn’t know you’re scaling, so it’ll think you set (or you think it set) a tiny ikle window size.
Anyway I fixed it for now with a ‘net.ipv4.tcp_default_win_scale = 0’ in my /etc/sysctl.conf, but there is a new kernel patch floating around which seems to be a bit cleverer and will be due in the next kernel.
I’ve been working on my qmail rpms for RedHat ES/AS/Fedora. I’ve even started some documentation. It’s all on my RedHat page.
I’ve also been working on Firestorm, improving the arp decoder and developing my macwatch arpwatch clone. Hopefully this will appear in the latest Firestorm tree soon.
I recently ditched my aging Linux wireless bridge/router/firewall in favour of a little Linksys device that cost no more than 60 pounds, uses considerably less electricity and makes almost no noise. The price is impressive and even the device seems to work ok. One thing it can’t deal with properly at all is the TCP ECN flag. The web admin port just sends a RST. Can you believe a Cisco company would make this mistake? Yes. I can.
Also, I’ve created an rpm2html index of all the RPMs in my downloads tree. Some are old crap I’ve not bothered deleting yet, but there is some stuff in there that will be useful to someone (not just google).
Gianni will be home from Luxembourg soon.