Solaris file system says it’s full, but there’s plenty of free space?
A fairly common problem with Solaris UFS filesystems is one where df shows lots of free space, but you can’t actually write to the filesystem. Having recently been playing with multi-terabyte filesystems, and forcing these sorts of issues for debugging, I thought I’d share some information about the tools you can use and what they can report.
As an example, let’s look at a 2TB filesystem:
[root@gollum:/] # df -kh
Filesystem                                         size  used  avail capacity  Mounted on
/dev/dsk/c9t60060E80141189000001118900001400d0s0   1.9T  532G   1.4T    28%    /fatty
The first thing we can do is not only check the amount of free disk space, but also check inode usage:
df -F ufs -o i
[root@gollum:/] # df -F ufs -o i
Filesystem                                           iused   ifree  %iused  Mounted on
/dev/dsk/c9t60060E80141189000001118900001400d0s0   2096192       0    100%  /fatty
If we have multi-terabyte filesystems, our number of bytes per inode (nbpi) could be set too high if we’re storing lots of small files – in which case it’s very easy to run out of inodes. We can see on this filesystem that we’ve used up all our inodes. Trying to write to it will result in “No space left on device” error messages – which is always good for some head-scratching fun, as we can see that we’ve got 1.4TB of space free.
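If you want to corroborate that iused figure, remember that every file, directory and symlink on the filesystem consumes an inode, so simply counting entries should come out close to it. A quick sanity check (it’ll take a while over a couple of million files):

# count everything on /fatty without crossing mount points;
# each entry consumes one inode
find /fatty -mount | wc -l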
To get an idea of how inodes, block size and things have been specified we need to find out how the filesystem was built:
/usr/sbin/mkfs -m <disk_device>
I’ve wrapped the line here to make it a bit more readable, but here’s the output querying our full multi-terabyte filesystem.
[root@gollum:/] # /usr/sbin/mkfs -m /dev/dsk/c9t60060E80141189000001118900001400d0s0
mkfs -F ufs -o nsect=128,ntrack=48,bsize=8192,fragsize=8192,cgsize=143,free=1,rps=1,nbpi=1161051, \
opt=t,apc=0,gap=0,nrpos=1,maxcontig=128 /dev/dsk/c9t60060E80141189000001118900001400d0s0 4110401456
This shows the command that was passed to mkfs when the filesystem was created, so we can see exactly what parameters were used to build it.
Things we care about here are:
- fragsize – the smallest amount of disk space that can be allocated to a file. If we have loads of files smaller than 8KB, then this should be smaller than 8KB.
- nbpi – number of bytes per inode
- opt – how is filesystem performance being optimised? t means we’re optimising to spend the least time allocating blocks, and s means we’ll be minimising the space fragmentation on the disk
On a multi-terabyte filesystem, nbpi cannot be set to less than 1MB, and fragsize will be forced to equal bsize. So we’d want to optimise for time as opposed to space, as we’ll only ever allocate in 8KB blocks.
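For comparison, on a filesystem under 1TB you get to choose these values at creation time. Here’s a sketch using newfs (the friendlier front end to mkfs) with a made-up device path – -i sets nbpi, -f sets the fragment size and -o sets the optimisation strategy:

# sketch only: hypothetical sub-1TB device. On a multi-terabyte
# filesystem, newfs would force nbpi up to 1MB and the fragment
# size up to bsize, as described above.
newfs -i 8192 -f 1024 -o time /dev/rdsk/c1t0d0s0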
fstyp is the command we can use to do some really low-level querying of a UFS filesystem.
We can invoke it with:
fstyp -v <disk_device>
Make sure you pipe it through more, or redirect the output to a file, because there’s a lot of it. fstyp will report on the statistics of all the cylinder groups for a filesystem, but it’s really just the first section reported from the superblocks that we’re interested in.
[root@gollum:/] # fstyp -v /dev/dsk/c9t60060E80141189000001118900001400d0s0 | more
ufs
magic   decade  format  dynamic time    Fri Dec  5 17:26:27 2008
sblkno  2       cblkno  3       iblkno  4       dblkno  11
sbsize  8192    cgsize  8192    cgoffset 8      cgmask  0xffffffc0
ncg     4679    size    256900091       blocks  256857968
bsize   8192    shift   13      mask    0xffffe000
fsize   8192    shift   13      mask    0xffffe000
frag    1       shift   0       fsbtodb 4
minfree 1%      maxbpg  2048    optim   time
maxcontig 128   rotdelay 0ms    rps     1
csaddr  11      cssize  81920   shift   9       mask    0xfffffe00
ntrak   48      nsect   128     spc     6144    ncyl    669011
cpg     143     bpg     54912   fpg     54912   ipg     448
nindir  2048    inopb   64      nspf    16
nbfree  187148663       ndir    2       nifree  0       nffree  0
cgrotor 462     fmod    0       ronly   0       logbno  23
version 1
fs_reclaim is not set
bsize and fsize show us the block and fragment size, respectively.
nbfree and nffree show us the number of free blocks and fragments, respectively. If nbfree is 0, you’re in trouble – no free blocks means no more writing to the filesystem, regardless of how much free space df claims is still available.
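If you just want those headline counters without wading through the per-cylinder-group statistics, the superblock summary sits at the top of the output, so something like this (assuming, as in the output above, that it falls within the first 30 lines) will pull them out:

fstyp -v /dev/dsk/c9t60060E80141189000001118900001400d0s0 | head -30 | egrep 'nbfree|nifree'

As a cross-check, the inode figures here agree with df: ipg × ncg = 448 × 4679 = 2096192 inodes in total – exactly the iused count df -o i reported – and nifree is 0.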
What usually happens when writing lots of small (ie. < 8KB) files to a filesystem is that the number of free blocks (nbfree) falls to 0, but you’ve still got plenty of fragments left. If block size = fragment size, that’s not an issue – but if fragments are, say, 2KB, then you’re not going to be able to write to the filesystem any more (“file system full” error messages) even though df is showing lots of free disk space.
A big part of tuning your filesystem is knowing what’s going onto it. For multi-terabyte filesystems, you should be placing larger files on there – so setting fragment size equal to block size won’t be wasting space.
If you’ve got lots of smaller files, you’ll need to think about what the average file size is – if it’s less than 8KB, you’ll want to make sure that fragment size is also less than 8KB. Otherwise you’ll be wasting space by allocating 8KB fragments all the time when you could get away with 2KB ones.
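To put a number on that waste, here’s some quick back-of-the-envelope arithmetic (purely illustrative):

# average 2KB file stored in one 8KB fragment: how much is wasted?
echo "(8192 - 2048) * 100 / 8192" | bc
# 75 -- three quarters of every allocation would be dead space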
Anyway, back to the problem at hand – our 2TB filesystem that’s run out of inodes. In this particular case, we’ll need to rebuild the filesystem and allocate more inodes. The question is – how do we work out what the value should be?
This simple shell script will analyse the files from the directory you execute it in, and will come back with the average file size:
#!/bin/sh
# Sum the size column of ls -l output for every file under the
# current directory, then report the total and the mean file size.
find . -type f -exec ls -l {} \; | \
awk 'BEGIN { tsize = 0; fcnt = 0; } \
  { printf("%03d File: %-60s size: %d bytes\n", ++fcnt, $9, $5); \
    tsize += $5; } \
  END { printf("Total size = %d Average file size = %.02f\n", \
    tsize, tsize / fcnt); }'
Running it we can see:
(lots of output)
....
Total size = 2147483647 Average file size = 258286.18
Now, if our average file size is 252KB, then our inode density of 1161051 (roughly 1 inode per 1MB) is going to be hopelessly inadequate. This is borne out by looking again at our df output – we ran out of inodes when the filesystem was only about a quarter full, which matches our average file size being roughly a quarter of the inode density.
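Some more back-of-the-envelope arithmetic makes the shortfall obvious (again, just illustrative, using the average file size the script reported):

# inodes a 2TB filesystem would need if the average file is ~252KB
echo "2 * 1024 * 1024 * 1024 * 1024 / 258286" | bc
# 8513908 -- we'd want roughly 8.5 million inodes, and we have 2096192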
However, at this point, we’re stuffed – we can’t set nbpi to less than 1MB on a Solaris UFS filesystem that’s larger than 1TB. Our only options are:
- chop the filesystem up into smaller ones
- migrate to ZFS
- create bigger files ;-)