
From:	Andres Freund <andres-AT-anarazel.de>
To:	Andreas Dilger <adilger-AT-dilger.ca>
Subject:	Re: fsync() errors is unsafe and risks data loss
Date:	Wed, 11 Apr 2018 19:17:52 -0700
Message-ID:	<20180412021752.2wykkutkmzh4ikbf@alap3.anarazel.de>
Cc:	20180410184356.GD3563-AT-thunk.org, "Theodore Y. Ts'o" <tytso-AT-mit.edu>, Ext4 Developers List <linux-ext4-AT-vger.kernel.org>, Linux FS Devel <linux-fsdevel-AT-vger.kernel.org>, Jeff Layton <jlayton-AT-redhat.com>, "Joshua D. Drake" <jd-AT-commandprompt.com>

Hi,

On 2018-04-11 15:52:44 -0600, Andreas Dilger wrote:

It's not just postgres. dpkg (which underlies apt on Debian-derived distros),
to take an example I just guessed at random, does too:
 /* We want to guarantee the extracted files are on the disk, so that the
 * subsequent renames to the info database do not end up with old or zero
 * length files in case of a system crash. As neither dpkg-deb nor tar do
 * explicit fsync()s, we have to do them here.
 * XXX: This could be avoided by switching to an internal tar extractor. */
 dir_sync_contents(cidir);

(and in a bunch of other places too)
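The pattern that dpkg comment describes (make extracted files durable before renaming them into place) can be sketched in C as below. This is a minimal sketch assuming a POSIX system; `write_file_durably` is a hypothetical helper, not dpkg's actual `dir_sync_contents()`. The point it illustrates is that every `fsync()` result is checked, because an unchecked failure is exactly what lets an installer carry on with corrupted data.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `tmp_path`, force them to
 * stable storage, atomically rename over `final_path`, then fsync the
 * containing directory so the rename itself is durable.
 * Returns 0 on success, -1 on any failure (errno is set). */
int write_file_durably(const char *dir_path, const char *tmp_path,
                       const char *final_path, const void *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* A failed fsync() means the data may never reach the disk: report
     * it so the caller can abort instead of carrying on. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    if (rename(tmp_path, final_path) != 0)
        return -1;

    /* Make the directory entry change durable as well. */
    int dfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) != 0) {
        int saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }
    return close(dfd);
}
```

Note that this does one fsync() per file, which is precisely the cost that becomes prohibitive when thousands of files are extracted.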

Especially on ext3, but also on newer filesystems, it's entirely
infeasible performance-wise to fsync() every single file individually;
performance becomes downright atrocious if you do that.

I think there are some legitimate arguments that a database should use
direct IO (more on that in my reply to David), but claiming that all
sorts of random utilities need to use DIO, with their own buffering
etc., is just insane.



Except that they won't notice that they got a failure, at least in the
dpkg case, and will happily continue installing corrupted data.



Yeah, I agree that wouldn't be sane. As far as I understand the dpkg
code (from all of 10 minutes of reading it), that would also be
unnecessary. It can abort the installation, but only if it detects the
error, which isn't happening.



And that's *horrible*. If I cp a file, writeback fails in the
background, and I then cat that file before restarting, I should be able
to see that the write failed, instead of getting something bogus back.

Or, even more extreme: you untar/zip/git clone a directory, then do a
sync, and you don't know whether anything actually succeeded.
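With a plain sync there is literally nothing to check: sync(2) is declared as `void sync(void)`. A minimal Linux-only sketch of the closest checkable alternative follows, assuming glibc's `syncfs()` wrapper; `sync_filesystem_checked` is a hypothetical helper name. On kernels before 5.8, syncfs() only fails for a bad descriptor, so even this check would not catch the writeback failures discussed here; from 5.8 on it can also report writeback errors on that filesystem.

```c
#define _GNU_SOURCE /* needed for syncfs() on glibc */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush the filesystem containing `path_on_fs` and
 * report failure. Unlike sync(2), syncfs(2) returns an int, so the
 * caller at least has something to check. Returns 0 on success. */
int sync_filesystem_checked(const char *path_on_fs)
{
    int fd = open(path_on_fs, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = syncfs(fd);
    int close_rc = close(fd);
    return (rc == 0 && close_rc == 0) ? 0 : -1;
}
```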



The data in the file is also corrupt. Having to unmount, or delete the
file, to reset the state that says it can't safely be assumed to be on
disk isn't insane.



Except that postgres uses multiple processes, and works on a lot of
architectures. If we started to fsync all opened files on process exit,
our users would *lynch* us. We'd need a complicated scheme that sends
file descriptors across sockets between processes, then deduplicates
them on the receiving side, somehow figuring out which is the oldest
file descriptor (handling clock drift safely).
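On Unix, handing an open file descriptor to another process is done with `SCM_RIGHTS` ancillary data on a Unix-domain socket. A minimal sketch of just that transport step follows; `send_fd` and `recv_fd` are hypothetical helper names, and the deduplication and "oldest descriptor" bookkeeping described above are not shown.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send `fd` over the Unix-domain socket `sock`.
 * One dummy data byte accompanies the ancillary data, since at least
 * one byte of regular data must be sent with it. Returns 0 on success. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                        /* guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor refers to the same open file description as the sender's, which is what would let a receiving process keep the "oldest" one around.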

Note that it'd be perfectly fine for us that the buffer contents have
been "thrown away", as long as we get notified that the fsync failed. We
could just do WAL replay and restore the contents (just as we do after
crashes and/or for replication).
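Relying on WAL replay implies treating an fsync() failure as fatal rather than retrying it: on Linux at the time of this thread, a retried fsync() could report success after the failed pages had already been marked clean. A minimal sketch of that policy follows; `fsync_or_die` is a hypothetical helper, and the actual crash-and-replay machinery is not shown.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical policy helper: never retry a failed fsync(). A second
 * fsync() may succeed even though the lost writes never reached disk,
 * so the safe response is to stop and rebuild state from the
 * write-ahead log on restart. Returns 0 on success; aborts on failure. */
int fsync_or_die(int fd, const char *what)
{
    if (fsync(fd) == 0)
        return 0;
    fprintf(stderr, "fsync of %s failed: %s; aborting so recovery "
                    "can replay the log\n", what, strerror(errno));
    abort(); /* does not return */
}
```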



There's already a per-process cache of open files.



Well, I'm making that argument because several people argued that
throwing away buffer contents in this case is the only way to avoid
causing OOMs, and that doing so is incompatible with reporting errors.
It clearly isn't...
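Why dropping the pages and reporting the error are compatible can be shown with a toy model: keep a small per-file error counter that outlives the discarded pages, and have each open descriptor remember the counter value it has already been told about. This is a hypothetical userspace simulation loosely inspired by the kernel's errseq_t idea, not the kernel's implementation; all type and function names are made up.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of one cached file: dirty pages plus an error sequence. */
struct model_file {
    int dirty_pages;    /* pages not yet written back */
    unsigned err_seq;   /* bumped on every writeback failure */
    int last_err;       /* errno of the most recent failure */
};

/* Toy model of an open file description. */
struct model_fd {
    struct model_file *file;
    unsigned seen_seq;  /* err_seq value already reported to this fd */
};

/* "Open": start from the current sequence, so only failures that
 * happen after the open are reported to this descriptor. */
struct model_fd model_open(struct model_file *f)
{
    struct model_fd d = { f, f->err_seq };
    return d;
}

/* Simulated background writeback. On failure the pages are dropped
 * anyway (memory is freed, no OOM risk) but the failure is recorded. */
void model_writeback(struct model_file *f, bool io_ok)
{
    if (!io_ok) {
        f->err_seq++;
        f->last_err = EIO;
    }
    f->dirty_pages = 0;
}

/* Simulated fsync: flush what is left, then report any failure this
 * descriptor has not seen yet, even though those pages are long gone. */
int model_fsync(struct model_fd *d)
{
    model_writeback(d->file, true);
    if (d->seen_seq != d->file->err_seq) {
        d->seen_seq = d->file->err_seq;
        return -d->file->last_err;
    }
    return 0;
}
```

In this model a descriptor opened after the failure never sees it, which is the gap that matters for a multi-process design where the process calling fsync may open the file only after another process wrote it.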



Sure.



I don't think this is all that PG-specific, as explained above.


Greetings,

Andres Freund

URL: https://lwn.net/Articles/752108/
