F2FS experiences

zackw · 11-15-2020, 11:52 AM

Has anyone tried out F2FS for filesystems on the internal eMMC, and if so, how did it go? I'm curious about both performance and reliability.

I'm using the stable 20.10 release of Manjaro-ARM right now, but I might wind up switching again now that the hardware support has been mainstreamed, so reports from any environment are welcome.

DrYak · 11-15-2020, 07:02 PM

F2FS is not a bad filesystem, performance- and stability-wise (it gets decent results in Phoronix benchmarks).
It's also very gentle on flash (it's log-structured), hence its name: Flash-Friendly File System.
I have been using it a lot for root partitions on several Raspberry Pi projects and never had a problem. (The default out-of-the-box Raspbian doesn't use an initrd, and my favourite filesystem, BTRFS, isn't compiled into its kernel and requires a module, whereas F2FS is available out of the box.)

If you're using the default U-Boot that came with your Pinebook Pro's Manjaro ARM, note that it can't read an F2FS partition: only the EXT4 and FAT drivers are compiled in.
This means your /boot partition needs to be separate and in a format that this U-Boot version understands (e.g. out of the box, your Pinebook Pro uses a FAT32 partition for /boot).
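The point above can be checked on a running system with a small helper. This is a sketch, assuming (as the post says) that the stock U-Boot reads only FAT (vfat) and ext4. `boot_fs_ok` is a hypothetical name, and the filesystem type comes from `findmnt -no FSTYPE /boot`.

```shell
#!/bin/sh
# Does the filesystem holding /boot look readable by the stock U-Boot?
# Assumption taken from the post above: only FAT (vfat) and ext4 are supported.
# Pass the fstype, e.g. from: findmnt -no FSTYPE /boot
# (findmnt prints nothing when /boot is not a separate mount point).
boot_fs_ok() {
  case "$1" in
    vfat|ext4) echo "ok: U-Boot should be able to read $1" ;;
    "")        echo "error: /boot is not a separate partition"; return 1 ;;
    *)         echo "error: U-Boot cannot read $1"; return 1 ;;
  esac
}
```

Usage: `boot_fs_ok "$(findmnt -no FSTYPE /boot)"`.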

Manjaro ARM's Linux kernel has F2FS compiled in, so it should work out of the box. If you recompile your own kernel, make sure to either a) compile the F2FS driver in, or b) change the initrd options to include the F2FS module (in /etc/mkinitcpio.conf, add "MODULES=(f2fs)", then rebuild the initramfs).
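For option b), here is a minimal sketch of the mkinitcpio.conf edit. It assumes the usual single-line `MODULES=(...)` array. `add_f2fs_module` is a hypothetical helper, not part of mkinitcpio, and it takes the config path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Add f2fs to the MODULES=(...) array of an mkinitcpio config, once.
add_f2fs_module() {
  conf=${1:-/etc/mkinitcpio.conf}
  # Nothing to do if f2fs is already listed in the array.
  if grep -Eq '^MODULES=\(([^)]* )?f2fs( [^)]*)?\)' "$conf"; then
    return 0
  fi
  # Append " f2fs" inside the parentheses, then tidy "( f2fs)" into "(f2fs)".
  sed -i -E 's/^MODULES=\(([^)]*)\)/MODULES=(\1 f2fs)/; s/^MODULES=\( /MODULES=(/' "$conf"
}
```

Run it as root against /etc/mkinitcpio.conf. Then rebuild the initramfs images with `mkinitcpio -P` so the module actually ends up in the initrd.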

Henry · 11-16-2020, 08:57 AM

(11-15-2020, 07:02 PM)DrYak Wrote: Manjaro ARM's Linux kernel has F2FS compiled in, so it should work out of the box.

Currently, Manjaro's images use ext4 and the installer does not seem to offer the option to select a filesystem (or so I remember from the last time I used it). What would be the easiest way to get Manjaro with F2FS? (<https://forum.pine64.org/showthread.php?tid=12121&pid=83449#pid83449> suggested a tool to convert an existing filesystem, but I would imagine there should be a more canonical way to do that.)

DrYak · 11-17-2020, 07:51 AM

(11-16-2020, 08:57 AM)Henry Wrote: Currently, Manjaro's images use ext4 and the installer does not seem to offer the option to select a filesystem (or so I remember since last time I used it).

Yes, the installer basically just flashes an already-made partition image and then extends it. The Manjaro installer uses a ready-made EXT4 image and thus can only deploy to an EXT4 partition.

(11-16-2020, 08:57 AM)Henry Wrote: (<https://forum.pine64.org/showthread.php?tid=12121&pid=83449#pid83449> suggested a tool to convert an existing filesystem, but I would imagine there should be a more canonical way to do that.)

Avoid in-place conversion tools at almost any cost.
(The only exception: converting to a newer version of EXTn, where it basically just amounts to turning on a new option introduced in a later version. E.g. EXT2 -> EXT3 conversion is just switching the journal on.)
(Also BTRFS: thanks to its weird CoW structure and flexible layout, it can keep the original EXT2/3 filesystem in place, untouched, and wiggle itself into the free space remaining around it, thus giving you a perfect way to roll back. So it should supposedly be okay, but even there I prefer to avoid it.)

(11-16-2020, 08:57 AM)Henry Wrote: What would be the easiest way to get Manjaro with F2FS?

The same way I converted mine to BTRFS:
Basically "just copy over the files onto new partition(s)".

1. Boot from something else (an SD card or a USB boot stick).
(Not mandatory, but it helps *a lot* if you're not running the system you're moving around.)
2. If you have already used your Manjaro on the eMMC and plan to install F2FS on that same eMMC, make a backup of it first. You can:
- dump it completely, just dd-ing it to an external USB3 hard disk;
- use something a bit more refined like partclone, which copies only the actually used parts to the USB disk.
3. Partition and format your target (eMMC, SD card or NVMe SSD):
- blkdiscard the old partition (optionally: the whole device) so that the wear levelling running on the controller knows the space is free.
- boot partition: 256MiB to 512MiB. It *must* be either EXT4 or FAT32 (as supported by the official U-Boot), and *should be* on the eMMC or SD card, unless you have flashed the special NVMe-aware U-Boot to SPI.
- root partition: most of the remaining space.
- swap partition: a couple of GiB should be enough.
- leave some extra free space.
Try aligning the partitions to at least the flash erase-block boundary. Usually my boot goes from 4MiB (or wherever the original image aligned it) up to just before 256MiB, and my root starts at 256MiB. (On eMMC or SD, where u-boot.itb gets written at sector 16384, i.e. 8MiB, in step 7, the boot partition must start beyond that area; 16MiB is a safe choice.)
4. Mount the destination:
- create /mnt/target
- mount your new root on /mnt/target
- create /mnt/target/boot
- mount your new boot on /mnt/target/boot
5. Mount the source Manjaro:
- create /mnt/source
- loop-mount the image file if you're installing fresh,
- or mount the SD card or eMMC on a USB adapter,
- or mount the external USB HDD holding your backup.
6. Copy:

Code:
rsync -avPSHAX /mnt/source/ /mnt/target
Mind the trailing "/" on "/mnt/source/": without it, you'll get a single "source" sub-directory with everything in it inside your target.
"-a" (archive) copies almost everything; "SHAX" takes care of the rest (Sparse files, Hard links, ACLs and eXtended attributes).
7. Flash the boot loader.
(Mandatory if you ran blkdiscard on the whole flash, or if you switched to a new medium.)
Usually that's something like the following, unless you're booting purely from NVMe, in which case you need to flash a special U-Boot to the SPI flash instead:

Code:
dd if=/boot/idbloader.img of=/dev/mmcblkX seek=64 conv=notrunc
dd if=/boot/u-boot.itb of=/dev/mmcblkX seek=16384 conv=notrunc
8. Check and update boot/extlinux.conf in the new installation
(e.g. change the LABEL, the UUID or whatever you're passing to the Linux kernel as the root device).
Also check the paths, as U-Boot now loads the files from the root of that partition.
9. Unmount everything and try booting the new install.
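The partition layout from step 3 can also be written as an sfdisk script. This is a sketch under a few assumptions: an MBR ("dos") label like the stock image, sizes in 512-byte sectors, and a 4 MiB erase block (the size xyzzy's flashbench numbers further down hint at). /boot starts at 16 MiB instead of 4 MiB so it doesn't overlap u-boot.itb, which step 7 writes at sector 16384. `layout` is a hypothetical helper name.

```shell
#!/bin/sh
# Print an sfdisk script for: /boot, root, swap, and spare space at the end.
# All numbers are 512-byte sectors; 1 MiB = 2048 sectors.
MIB=2048

layout() {
  disk_sectors=$1        # device size, e.g. from: blockdev --getsz /dev/mmcblkX
  swap_mib=${2:-2048}    # "a couple of GiB" of swap
  spare_mib=${3:-1024}   # left unpartitioned at the end of the device
  # Start /boot at 16 MiB, past u-boot.itb (written at sector 16384 = 8 MiB).
  boot_start=$((16 * MIB))
  root_start=$((256 * MIB))
  # Round the device end down to a multiple of 4 MiB (assumed erase block),
  # then subtract the spare area.
  end=$(( disk_sectors / (4 * MIB) * (4 * MIB) - spare_mib * MIB ))
  swap_start=$(( end - swap_mib * MIB ))
  echo "label: dos"
  echo "start=$boot_start, size=$(( root_start - boot_start )), type=c"
  echo "start=$root_start, size=$(( swap_start - root_start )), type=83"
  echo "start=$swap_start, size=$(( swap_mib * MIB )), type=82"
}

# Example (destroys the existing partition table!):
#   layout "$(blockdev --getsz /dev/mmcblkX)" | sudo sfdisk /dev/mmcblkX
#   sudo mkfs.vfat -F 32 /dev/mmcblkXp1
#   sudo mkfs.f2fs /dev/mmcblkXp2
#   sudo mkswap /dev/mmcblkXp3
```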

If the new installation fails to boot, you've most likely botched the newly deployed extlinux.conf. Just reboot using the "something else" from step one, and edit it.
(I had to re-edit mine twice. First I got the paths wrong: it still stupidly referred to "/boot/vmlinuz", so U-Boot fails and the laptop's LED stays orange. Then I got the new partition LABEL= wrong, which I use to identify the root partition: Linux and its initrd boot, but then complain that they can't find the root device and stay stuck waiting for it to appear.)
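Both mistakes described above can be caught before rebooting with a small check. It is sketched here as a hypothetical `check_extlinux` helper, assuming the kernel command line carries a `root=` argument and that KERNEL/LINUX/INITRD/FDT paths should be relative to the boot partition. Compare the printed `root=` value with what `lsblk -o NAME,LABEL,UUID` (or `blkid`) reports for the new root partition.

```shell
#!/bin/sh
# Print every root= argument found in an extlinux.conf, and fail if kernel,
# initrd or FDT paths still point into /boot/ (U-Boot reads them relative
# to the root of the separate boot partition).
check_extlinux() {
  conf=$1
  grep -o 'root=[^ ]*' "$conf"
  if grep -Eiq '(^[[:space:]]*(linux|kernel|initrd|fdt|fdtdir)[[:space:]]+|initrd=)/boot/' "$conf"; then
    echo "WARNING: $conf still refers to /boot/ paths" >&2
    return 1
  fi
}
```

Usage: `check_extlinux /mnt/target/boot/extlinux/extlinux.conf`, adjusting the path to wherever the file lives on your new boot partition.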

User 18618 · 11-17-2020, 07:57 AM

Thanks @DrYak for your illuminating posts :)

zackw · 11-18-2020, 12:59 PM

Thanks everyone for your suggestions and experiences.

FYI Manjaro's installation images currently produce severely misaligned partitions -- someone got decimal and binary megabytes mixed up; not only are the partitions not aligned to a multiple of any plausible erase block size, as they ought to be, they aren't even aligned to a round number of sectors. So, if I did this conversion, I would be wiping out all the partitions and recreating them anyway.

The IEC should never have defined the power-of-1024 suffixes; they just made the confusion even worse.
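To see whether an existing layout is affected, you can check the partition start offsets exported in sysfs against an alignment. Here is a minimal sketch. The 4 MiB default is an assumed erase-block size, and `is_aligned`/`check_partitions` are hypothetical names. As an example of the decimal/binary mix-up: a partition placed at "32 MB" in decimal units starts at byte 32,000,000, i.e. sector 62500, which isn't even a multiple of 4 KiB.

```shell
#!/bin/sh
# sysfs "start" values are always in 512-byte units,
# whatever the device's logical sector size.
is_aligned() {
  start_sectors=$1 align_bytes=$2
  [ $(( start_sectors * 512 % align_bytes )) -eq 0 ]
}

# List every partition with its start sector and alignment status.
check_partitions() {
  align=${1:-4194304}   # assumed erase-block size: 4 MiB
  for f in /sys/class/block/*/start; do
    [ -r "$f" ] || continue
    part=$(basename "$(dirname "$f")")
    start=$(cat "$f")
    if is_aligned "$start" "$align"; then state=ok; else state=MISALIGNED; fi
    printf '%-14s start=%-12s %s\n' "$part" "$start" "$state"
  done
}
```

Usage: `check_partitions` for the 4 MiB default, or `check_partitions 4096` to check only 4 KiB alignment.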

xyzzy · 11-19-2020, 04:24 PM

(11-18-2020, 12:59 PM)zackw Wrote: FYI Manjaro's installation images currently produce severely misaligned partitions

Non-4kB-aligned partitions are usually really bad. What this means really depends on what the secret FTL inside the eMMC does. Maybe the firmware in it is smart enough to detect a "screwed up partition table" and offsets all the LBAs by 1? But most likely it means each write to a 4kB ext4 (or other fs) block involves a read-modify-write cycle on the two flash blocks (of unknown size, 2k? 4k?) that the ext4 block overlaps.
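The read-modify-write argument can be made concrete by counting how many internal flash pages a single 4 KiB filesystem block lands on. The sketch below assumes a 4 KiB page purely for illustration (as noted above, the real size is unknown) and uses a hypothetical `pages_touched` helper.

```shell
#!/bin/sh
# How many flash pages of PAGE bytes does a LEN-byte write at byte OFFSET
# of the device touch?
pages_touched() {
  offset=$1 len=$2 page=$3
  echo $(( (offset + len - 1) / page - offset / page + 1 ))
}

# A 4 KiB block at the start of a partition placed at "32 MB" (decimal):
pages_touched 32000000 4096 4096   # prints 2: it straddles two pages
# The same block in a partition placed at 32 MiB:
pages_touched 33554432 4096 4096   # prints 1: it fits in exactly one page
```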

It would also be nice if they'd take those magic locations for the idbloader and so on and make them partitions. It's entirely possible to add them at the exact spot where the boot loader(s) will look for them. Then no more magic dd offsets need to be used; just go straight to the partition. And there's no mystery blank space at the beginning of the eMMC where critical boot files are kept. And if they get too big to fit, there's an error when trying to flash them, instead of silently overwriting and corrupting the boot file that comes afterward.

And use a GPT partition table, so partition labels can be used. Then idbloader.img goes to "/dev/disk/by-partlabel/idbloader". No more keeping track of partition numbers.

DrYak · 11-25-2020, 01:53 PM

(11-19-2020, 04:24 PM)xyzzy Wrote:
(11-18-2020, 12:59 PM)zackw Wrote: FYI Manjaro's installation images currently produce severely misaligned partitions
I'd fix this even if I wasn't going to change FS type.

It's not limited to Manjaro; I've seen Raspbian images assuming some small alignment too...

(11-19-2020, 04:24 PM)xyzzy Wrote: Non 4kB aligned partitions are usually really bad. {...} But most likely it means each write to a 4kB ext4 (or other fs) block involves a read-modify-write cycle to the two flash blocks (of unknown size, 2k? 4k?) that the ext4 block overlaps.

This 4KiB alignment doesn't matter anymore nowadays.

It used to be important when HDDs with 4KiB sectors appeared (either native 4KiB, or 4KiB internally while emulating 512-byte sectors so as not to disturb the OS), because exactly what you described would happen.

That's not the case anymore with flash (nor with shingled HDDs), because most flash is organised in erase blocks: even if it can write sectors of 512 bytes or 4KiB, it can only write them over free space, and it can't erase anything smaller than 2MiB ~ 8MiB (depending on the FTL and on the number of bits per cell: SLC/MLC/TLC, etc.).

Overwriting in place any content smaller than that size will always trigger a read-modify-write cycle. (With the small detail that the erase block that gets "read" and "erased" and the erase block that actually gets "written with the modified version" might be two different blocks, in order to rotate which blocks get erased and spread the wear.)

That's why log-structured filesystems (like F2FS) and copy-on-write filesystems (BTRFS, ZFS, BCacheFS) are better for flash:
They *never* modify/overwrite in place (which would invariably trigger a read-modify-write). By definition, they *always, exclusively* append writes (which can be pure writes, by allocating fresh, already-erased blocks from the wear-levelling pool).
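Here is a deliberately crude model of the argument above. It compares the bytes programmed when every small rewrite forces a full read-modify-write of its erase block with the bytes programmed when the same writes are simply appended into fresh blocks. `program_bytes` is a hypothetical name. Real eMMC FTLs remap at page granularity and garbage-collect in the background, so real-world write amplification usually falls somewhere between these two extremes.

```shell
#!/bin/sh
# Toy model, not a real FTL: bytes physically programmed to flash for
# N small writes of WRITE bytes each, with ERASE-byte erase blocks.
program_bytes() {
  n=$1 write=$2 erase=$3
  # In place: every small rewrite costs a full read-modify-write of its block.
  inplace=$(( n * erase ))
  # Append-only: the writes are packed into fresh, already-erased blocks.
  appended=$(( (n * write + erase - 1) / erase * erase ))
  echo "in-place=$inplace appended=$appended"
}

# 1024 rewrites of 4 KiB with 4 MiB erase blocks:
program_bytes 1024 4096 4194304   # prints in-place=4294967296 appended=4194304
```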

(11-19-2020, 04:24 PM)xyzzy Wrote: What this means really depends on what the secret FTL inside the eMMC does. Maybe the firmware in it is smart enough to detect "screwed up partition table" and it offsets all the LBAs by 1?

I wouldn't put too much hope on such a small / cheap chip :-)

(11-19-2020, 04:24 PM)xyzzy Wrote: It would also be nice if they'd take those magic locations for the idbloader and so on and make them partitions. It's entirely possible to add them at the exact spot where the boot loader(s) will look for them. Then no more magic dd offsets need to be used, just go straight to the partition.

GPT partition tables allow exactly that, using BIOS boot partitions (type GUID 21686148-6449-6E6F-744E-656564454649).
I have formatted my NVMe this way.

(11-19-2020, 04:24 PM)xyzzy Wrote: And use a gpt table, so partition labels can be used. Then idbloader.img goes to "/dev/disk/by-partlabel/idbloader". No more keeping track of partition numbers.

Mine is called uboot.
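Combining xyzzy's suggestion with DrYak's reply, a GPT layout with named boot-loader partitions could be described as an sfdisk script like the one below. This is an untested sketch. The idbloader and uboot regions follow the sector 64 and sector 16384 offsets used by the dd commands earlier in the thread. Their sizes (7104 and 8192 sectors) are assumptions modelled on Rockchip's usual reserved areas, and `boot_area_layout` is a hypothetical helper.

```shell
#!/bin/sh
# GPT layout giving the Rockchip boot-loader areas their own named partitions.
# All numbers are 512-byte sectors.
BIOS_BOOT=21686148-6449-6E6F-744E-656564454649   # BIOS boot partition type GUID

boot_area_layout() {
  echo "label: gpt"
  echo "first-lba: 64"   # allow a partition to start as early as sector 64
  # idbloader.img is read from sector 64; the region ends at sector 7167.
  echo "start=64, size=7104, type=$BIOS_BOOT, name=idbloader"
  # u-boot.itb is read from sector 16384 (8 MiB).
  echo "start=16384, size=8192, type=$BIOS_BOOT, name=uboot"
  # Regular partitions: 256 MiB /boot at 16 MiB, root takes the rest.
  echo "start=32768, size=524288, name=boot"
  echo "start=557056, name=root"
}
```

Piping this into `sfdisk /dev/mmcblkX` rewrites the partition table, so only do it on a device you're about to repartition anyway. The boot loaders could then be written with `dd if=/boot/idbloader.img of=/dev/disk/by-partlabel/idbloader` and `dd if=/boot/u-boot.itb of=/dev/disk/by-partlabel/uboot`. If several attached devices carry the same partition labels, the by-partlabel symlink points at only one of them.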

xyzzy · 11-25-2020, 02:27 PM

(11-25-2020, 01:53 PM)DrYak Wrote:
(11-19-2020, 04:24 PM)xyzzy Wrote: Non 4kB aligned partitions are usually really bad. {...} But most likely it means each write to a 4kB ext4 (or other fs) block involves a read-modify-write cycle to the two flash blocks (of unknown size, 2k? 4k?) that the ext4 block overlaps.

This 4kiB doesn't matter any more nowadays.

That's not what I see when I test. I did read-testing with flashbench:

align 16777216 pre 282µs       on 395µs        post 280µs      diff 114µs
align 8388608   pre 282µs       on 403µs        post 285µs      diff 120µs
align 4194304   pre 231µs       on 350µs        post 281µs      diff 94µs
align 2097152   pre 230µs       on 263µs        post 231µs      diff 32.5µs
align 1048576   pre 231µs       on 255µs        post 229µs      diff 24.5µs
align 524288    pre 231µs       on 255µs        post 233µs      diff 22.8µs
align 262144    pre 228µs       on 245µs        post 228µs      diff 16.9µs
align 131072    pre 230µs       on 254µs        post 233µs      diff 22.9µs
align 65536     pre 231µs       on 261µs        post 228µs      diff 31.1µs
align 32768     pre 230µs       on 247µs        post 232µs      diff 16.4µs
align 16384     pre 233µs       on 249µs        post 233µs      diff 15.6µs
align 8192      pre 232µs       on 250µs        post 233µs      diff 16.7µs
align 4096      pre 232µs       on 253µs        post 234µs      diff 20.2µs
align 2048      pre 233µs       on 234µs        post 233µs      diff 416ns

The important number is the last column, which is the difference between reading before or after a boundary vs. on the boundary. The extra cost of a read that crosses a 4kB boundary (20.2µs) is about 50x that of a read crossing a 2kB boundary (416ns). There's also a significant change at 4 MB, which is probably the erase block size.
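To reproduce that ratio from the table, you can normalise the "diff" column (which mixes µs and ns) to nanoseconds. Here is a small awk sketch, assuming the line format shown above. `flashbench_diffs` is a hypothetical name; its input would come from flashbench's alignment test, e.g. `flashbench -a /dev/mmcblkX`.

```shell
#!/bin/sh
# Print "<align> <diff_ns>" for each "align" line of flashbench output.
# Units other than ns and ms are assumed to be µs.
flashbench_diffs() {
  awk '$1 == "align" {
    v = $NF
    if (v ~ /ns$/)      ns = v + 0
    else if (v ~ /ms$/) ns = (v + 0) * 1000000
    else                ns = (v + 0) * 1000
    printf "%s %.0f\n", $2, ns
  }'
}
```

For the last two rows this gives 20200 ns against 416 ns, a ratio of about 49, consistent with the "about 50x" above.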

I also did a test using fio of random 4kB IO on the ext4 partition in its original unaligned location and after I moved the partition to align it. Unaligned, it got 1701 IOPS read and 568 IOPS write. After alignment, it was 3549 IOPS read and 1168 IOPS write.
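The exact fio options used for this test aren't given, so the wrapper below is only a plausible reconstruction: random 4 KiB I/O, with direct I/O via libaio, against a test file on the partition being measured. `randio_4k` and the FIO override are hypothetical conveniences; setting FIO=echo prints the command line instead of running it.

```shell
#!/bin/sh
# Random 4 KiB I/O against a test file on the filesystem being measured.
# Set FIO=echo to print the command line instead of running fio.
randio_4k() {
  target=$1   # e.g. a file on the mounted partition under test
  mode=$2     # randread or randwrite
  ${FIO:-fio} --name="$mode-4k" --filename="$target" --size=1G \
    --rw="$mode" --bs=4k --direct=1 --ioengine=libaio --iodepth=32 \
    --runtime=30 --time_based --group_reporting
}
```

For reference, the numbers above work out to roughly a 2x gain from alignment: 3549/1701 ≈ 2.09 for reads and 1168/568 ≈ 2.06 for writes.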

DrYak · (This post was last modified: 11-25-2020, 04:50 PM by DrYak.)

Sorry, my bad, I should have written it more explicitly:
4KiB alignment doesn't matter that much nowadays with regard to write amplification (read-modify-write cycles).

(I.e. R-M-Ws will always happen when overwriting in place a sub-part of an erase block.)

Of course, it would still matter to at least align the 4KiB blocks of an ext partition with the 4KiB sectors, for read performance (it's faster to read a single sector than to read 2 sectors and merge them).

(And given that erase blocks are a superset of this sector layer, aligning to erase blocks to alleviate R-M-Ws will also get you aligned to sector boundaries anyway.)
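The last point follows from divisibility. An erase block of 4 MiB (8192 sectors) is a whole multiple of a 4 KiB sector (8 sectors), so any start rounded up to an erase-block boundary is automatically sector-aligned. Here is a minimal sketch, with a hypothetical `align_up` helper and 4 MiB as an assumed erase-block size.

```shell
#!/bin/sh
# Round a start sector up to the next multiple of ALIGN sectors.
align_up() {
  sector=$1 align=$2
  echo $(( (sector + align - 1) / align * align ))
}

start=$(align_up 62500 8192)   # 8192 sectors = 4 MiB (assumed erase block)
echo "$start"                  # prints 65536
[ $(( start % 8 )) -eq 0 ] && echo "also 4 KiB (8-sector) aligned"
```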
