True fsync() in Linux (on IDE)
Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Thu, 18 Mar 2004 02:20:11 +0100
Subject: True fsync() in Linux (on IDE)
Hello,

I'm wondering whether there is any way in Linux to do a proper fsync(), one
that makes sure data is written to the disk.

Currently, on IDE devices, fsync() only flushes data to the drive cache,
which is not enough for the ACID guarantees a database server must give.

One solution is simply to disable the drive write cache, but that seems to
slow down performance way too much.

I would also be happy enough with a global kernel option that enables a
drive cache flush on fsync :)

Mac OS X also has this "optimization", but at least it provides an
alternative flush method for Database Servers:

fcntl(fd, F_FULLFSYNC, NULL)

can be used instead of fsync() to get true fsync() behavior.
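
A minimal sketch of how an application might wrap this, assuming it wants the stronger flush where the platform offers one and plain fsync() everywhere else. The helper name full_sync() is hypothetical and not from this thread; only fcntl(), F_FULLFSYNC and fsync() are real interfaces.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * full_sync() is a hypothetical helper, not something from the thread.
 * Where F_FULLFSYNC is defined (Mac OS X), ask the drive to flush its
 * cache; otherwise, or if that request fails, fall back to plain fsync().
 * Returns 0 on success and -1 on error, like fsync().
 */
static int full_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
#endif
    return fsync(fd);
}
```

On Linux the macro is not defined, so this compiles down to a plain fsync() call and gives whatever guarantee the kernel's fsync() gives.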

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <ax...@suse.de>
Date: Thu, 18 Mar 2004 08:00:21 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Wed, Mar 17 2004, Peter Zaitsev wrote:
> Hello,

> I'm wondering whether there is any way in Linux to do a proper fsync(), one
> that makes sure data is written to the disk.

> Currently, on IDE devices, fsync() only flushes data to the drive cache,
> which is not enough for the ACID guarantees a database server must give.

> One solution is simply to disable the drive write cache, but that seems to
> slow down performance way too much.

Chris and I have working real fsync() with the barrier patches. I'll
clean it up and post a patch for vanilla 2.6.5-rc today.

--
Jens Axboe

Matthias Andree  
Newsgroups: linux.kernel
From: Matthias Andree <matthias.and...@gmx.de>
Date: Thu, 18 Mar 2004 12:40:09 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 18 Mar 2004, Jens Axboe wrote:
> Chris and I have working real fsync() with the barrier patches. I'll
> clean it up and post a patch for vanilla 2.6.5-rc today.

This is good news.

The barrier stuff is long overdue^UI'm looking forward to this.

I'm using the term "TCQ" liberally although it may be inexact for older
(parallel) ATA generations:

All these ATA fsync() vs. write cache issues have been open for much too
long - no reproaches, but it's a pity we haven't been able to have data
consistency for data bases and fast bulk writes (that need the write
cache without TCQ) in the same drive for so long. I have seen Linux
introduce TCQ for PATA early in 2.5, then drop it again. Similarly,
FreeBSD ventured into TCQ for ATA but appears to have dropped it again
as well.

May I ask that the information whether a particular driver (file system,
hardware) supports write barriers be exposed in a standard way, for
instance in the Kconfig help lines?

If I recall correctly from earlier patches, the barrier stuff is 1.
command model (ATA vs.  SCSI) specific and 2. driver and hardware
specific and 3. requires that the file system knows how to use this
properly.

Given that file systems have certain write ordering requirements if they
are to be recoverable after a crash, I suspect Linux has _not_ been able
to guarantee on-disk consistency at any time for years, which means
that a crash at the wrong moment can kill the file system itself if the
drive has reordered writes - only ext3 without write cache seems to
behave better in this respect (data=ordered).

I would like to have a document that shows which file systems, which
chipset drivers for PATA, which chipset drivers for ATA, and which low-level
SCSI host adaptor drivers support write barriers. We
will probably also need to check whether intermediate layers such as md and
dm-mod propagate such information.

Given the necessary information, I can hack together an HTML document to
provide this information; this offer has, however, not seen any response
in the past. I am not acquainted with the drivers and need
information from the kernel hackers. Without such support, such a
documentation effort is doomed.

BTW, I should very much like to be able to trace the low-level write
information that goes out to the device, possibly including the payload
- something like tcpdump for the ATA or SCSI commands that are sent to
the driver. Is such a facility available?

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <ax...@suse.de>
Date: Thu, 18 Mar 2004 13:00:19 +0100
Subject: Re: True fsync() in Linux (on IDE)

(btw - maybe you don't like to be cc'ed on kernel posts, but I do. it's
lkml etiquette to do so, and it makes sure that I see your mail.
otherwise I might not, especially true for bigger threads. so please, cc
people. thanks)

That's because PATA TCQ sucks :-)

> May I ask that the information whether a particular driver (file system,
> hardware) supports write barriers be exposed in a standard way, for
> instance in the Kconfig help lines?

Since reiser is the first implementation of it, it gets to choose how
this works. Currently that's done by giving -o barrier=flush (=ordered
used to exist as well, it will probably return - right now we just
played with IDE).

> If I recall correctly from earlier patches, the barrier stuff is 1.
> command model (ATA vs.  SCSI) specific and 2. driver and hardware
> specific and 3. requires that the file system knows how to use this
> properly.

Yes.

> Given that file systems have certain write ordering requirements if they
> are to be recoverable after a crash, I suspect Linux has _not_ been able
> to guarantee on-disk consistency at any time for years, which means
> that a crash at the wrong moment can kill the file system itself if the
> drive has reordered writes - only ext3 without write cache seems to
> behave better in this respect (data=ordered).

> I would like to have a document that shows which file systems, which
> chipset drivers for PATA, which chipset drivers for ATA, and which low-level
> SCSI host adaptor drivers support write barriers. We
> will probably also need to check whether intermediate layers such as md and
> dm-mod propagate such information.

Only PATA core needs to support it, not the chipset drivers. md and dm
aren't difficult to implement now that unplug/congestion already
iterates the device list and I added a blkdev_issue_flush() command.

> Given the necessary information, I can hack together an HTML document to
> provide this information; this offer has, however, not seen any response
> in the past. I am not acquainted with the drivers and need
> information from the kernel hackers. Without such support, such a
> documentation effort is doomed.

Usual approach - just start writing, it's a lot easier to get
corrections (people seem to be several times more willing to point out
your errors than give you recommendations for something you haven't
started yet).

> BTW, I should very much like to be able to trace the low-level write
> information that goes out to the device, possibly including the payload
> - something like tcpdump for the ATA or SCSI commands that are sent to
> the driver. Is such a facility available?

No.

--
Jens Axboe

Discussion subject changed to "(no subject)" by Daniel Czarnecki
Daniel Czarnecki
Newsgroups: linux.kernel
From: Daniel Czarnecki <dan...@zoltak.com>
Date: Thu, 18 Mar 2004 13:00:20 +0100
Subject: (no subject)
unsubscribe
Discussion subject changed to "True fsync() in Linux (on IDE)" by Matthias Andree
Matthias Andree
Newsgroups: linux.kernel
From: Matthias Andree <matthias.and...@gmx.de>
Date: Thu, 18 Mar 2004 13:30:11 +0100
Subject: Re: True fsync() in Linux (on IDE)
Jens Axboe wrote on 2004-03-18:

> > All these ATA fsync() vs. write cache issues have been open for much too
> > long - no reproaches, but it's a pity we haven't been able to have data
> > consistency for data bases and fast bulk writes (that need the write
> > cache without TCQ) in the same drive for so long. I have seen Linux
> > introduce TCQ for PATA early in 2.5, then drop it again. Similarly,
> > FreeBSD ventured into TCQ for ATA but appears to have dropped it again
> > as well.

> That's because PATA TCQ sucks :-)

True. Few drives support it, and many of these you would not want to run
in production...

> > May I ask that the information whether a particular driver (file system,
> > hardware) supports write barriers be exposed in a standard way, for
> > instance in the Kconfig help lines?

> Since reiser is the first implementation of it, it gets to choose how
> this works. Currently that's done by giving -o barrier=flush (=ordered
> used to exist as well, it will probably return - right now we just
> played with IDE).

This looks as though this was not the default and required the user to
know what he's doing. Would it be possible to choose a sane default
(like flush for ATA or ordered for SCSI when the underlying driver
supports ordered tags) and leave the user just the chance to override
this?

> Only PATA core needs to support it, not the chipset drivers. md and dm

Hum, I know the older Promise chips were blacklisted for PATA TCQ in
FreeBSD. Might "ordered" cause situations where similar things happen to
Linux?  How about SCSI/libata? Is the situation the same there?

> aren't difficult to implement now that unplug/congestion already
> iterates the device list and I added a blkdev_issue_flush() command.

So this would - for SCSI - be an sd issue rather than a driver issue as
well?

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <ax...@suse.de>
Date: Thu, 18 Mar 2004 13:40:12 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, Mar 18 2004, Matthias Andree wrote:
> > > All these ATA fsync() vs. write cache issues have been open for much too
> > > long - no reproaches, but it's a pity we haven't been able to have data
> > > consistency for data bases and fast bulk writes (that need the write
> > > cache without TCQ) in the same drive for so long. I have seen Linux
> > > introduce TCQ for PATA early in 2.5, then drop it again. Similarly,
> > > FreeBSD ventured into TCQ for ATA but appears to have dropped it again
> > > as well.

> > That's because PATA TCQ sucks :-)

> True. Few drives support it, and many of these you would not want to run
> in production...

Plus, the spec is broken.

> > > May I ask that the information whether a particular driver (file system,
> > > hardware) supports write barriers be exposed in a standard way, for
> > > instance in the Kconfig help lines?

> > Since reiser is the first implementation of it, it gets to choose how
> > this works. Currently that's done by giving -o barrier=flush (=ordered
> > used to exist as well, it will probably return - right now we just
> > played with IDE).

> This looks as though this was not the default and required the user to
> know what he's doing. Would it be possible to choose a sane default
> (like flush for ATA or ordered for SCSI when the underlying driver
> supports ordered tags) and leave the user just the chance to override
> this?

When things have matured, it might not be a bad idea to default to using
barriers.

> > Only PATA core needs to support it, not the chipset drivers. md and dm

> Hum, I know the older Promise chips were blacklisted for PATA TCQ in
> FreeBSD. Might "ordered" cause situations where similar things happen to
> Linux?  How about SCSI/libata? Is the situation the same there?

Don't confuse TCQ and barriers; they have nothing to do with each other for
IDE. I can't imagine any chipset having problems with a synchronize
cache command.

> > aren't difficult to implement now that unplug/congestion already
> > iterates the device list and I added a blkdev_issue_flush() command.

> So this would - for SCSI - be an sd issue rather than a driver issue as
> well?

No, for scsi it's a low level driver issue. IDE chipset 'drivers' really
aren't anything but setup stuff, and maybe a few hooks to deal with dma.
All the action is in the ide core.

--
Jens Axboe

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Thu, 18 Mar 2004 21:00:13 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Wed, 2004-03-17 at 22:47, Jens Axboe wrote:
> > One solution is simply to disable the drive write cache, but that seems to
> > slow down performance way too much.

> Chris and I have working real fsync() with the barrier patches. I'll
> clean it up and post a patch for vanilla 2.6.5-rc today.

Good to hear. How is it going to work from the user's point of view?
Will fsync simply work properly again, or will it need some special handling?

Also, what is the state of fsync() in 2.6 nowadays?

I've done some tests on a 3ware RAID array, and 2.6 appears to behave
differently from the 2.4 kernels I tested previously.

I have a simple test that does single-page writes to a file, each followed
by fsync(). The first run covers the case where the file grows with each
write; the second run writes to existing file space.

On 2.4 I get something like 40 sec per 1000 fsyncs for a new file, and
0.6 sec for an existing file.

With 2.6.3, both the existing-file and new-file cases complete in less
than 1 second.
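
A rough sketch of how such a measurement could be taken, assuming a POSIX system with clock_gettime(). The helper name time_fsync_loop() and the 4096-byte page size are assumptions made for illustration, not details of the test described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` pages to `path`, calling fsync()
 * after each write, and return the elapsed wall-clock seconds, or -1.0
 * on error. The file is opened without O_TRUNC, so the first run on a
 * fresh file grows it and a second run overwrites existing file space.
 */
static double time_fsync_loop(const char *path, int count)
{
    static char page[4096];
    struct timespec start, end;
    int fd = open(path, O_RDWR | O_CREAT, 0666);

    if (fd < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        if (write(fd, page, sizeof(page)) != (ssize_t)sizeof(page) ||
            fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Dividing the count by the returned time gives a rate: 1000 fsyncs in 40 seconds is 25 fsyncs per second.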

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <ax...@suse.de>
Date: Thu, 18 Mar 2004 21:00:17 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, Mar 18 2004, Peter Zaitsev wrote:
> On Wed, 2004-03-17 at 22:47, Jens Axboe wrote:

> > > One solution is simply to disable the drive write cache, but that seems to
> > > slow down performance way too much.

> > Chris and I have working real fsync() with the barrier patches. I'll
> > clean it up and post a patch for vanilla 2.6.5-rc today.

> Good to hear. How is it going to work from the user's point of view?
> Will fsync simply work properly again, or will it need some special handling?

It's just going to work :)

> Also, what is the state of fsync() in 2.6 nowadays?

> I've done some tests on a 3ware RAID array, and 2.6 appears to behave
> differently from the 2.4 kernels I tested previously.

> I have a simple test that does single-page writes to a file, each followed
> by fsync(). The first run covers the case where the file grows with each
> write; the second run writes to existing file space.

> On 2.4 I get something like 40 sec per 1000 fsyncs for a new file, and
> 0.6 sec for an existing file.

> With 2.6.3, both the existing-file and new-file cases complete in less
> than 1 second.

I believe some missed set_page_writeback() calls caused fsync() to never
really wait on anything, pretty broken... IIRC, it's fixed in latest
-mm, or maybe it's just pending for next release.

--
Jens Axboe

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Thu, 18 Mar 2004 21:20:07 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 2004-03-18 at 14:47, Jens Axboe wrote:
> > With 2.6.3, both the existing-file and new-file cases complete in less
> > than 1 second.

> I believe some missed set_page_writeback() calls caused fsync() to never
> really wait on anything, pretty broken... IIRC, it's fixed in latest
> -mm, or maybe it's just pending for next release.

This should have only been broken in -mm.  Which kernels exactly are you
comparing?  Maybe the 3ware array defaults to different writecache
settings under 2.6?

-chris

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Thu, 18 Mar 2004 21:30:48 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 2004-03-18 at 12:11, Chris Mason wrote:
> > I believe some missed set_page_writeback() calls caused fsync() to never
> > really wait on anything, pretty broken... IIRC, it's fixed in latest
> > -mm, or maybe it's just pending for next release.

> This should have only been broken in -mm.  Which kernels exactly are you
> comparing?  Maybe the 3ware array defaults to different writecache
> settings under 2.6?

I'm testing the RH AS 3.0 kernel; however, I see the same behavior on my
SuSE 8.2 workstation.

I'm using the 2.6.3 kernel for the tests now (not the latest, I know),
with the ext3 file system.

The 3ware array uses the writeback cache setting in both cases.

Here is the test program I was using:

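#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

/* One page-sized, page-aligned buffer, written on every iteration. */
char buffer[4096] __attribute__((__aligned__(4096)));

int main(void)
{
  int fd, rc;
  int i;

  buffer[0] = (char)getpid();
  /* No O_TRUNC: a second run overwrites the existing file space. */
  fd = open("write", O_RDWR | O_CREAT, 0666);
  if (fd == -1) {
    printf("Error at open: %d\n", errno);
    return 1;
  }
  for (i = 0; i < 1000; i++) {
    rc = write(fd, buffer, sizeof(buffer));
    printf(".");
    fflush(stdout);
    if (rc < 0) {
      printf("Error code: %d\n", errno);
      return 1;
    }
    /* Sync this page before issuing the next write. */
    if (fsync(fd) < 0) {
      printf("Error at fsync: %d\n", errno);
      return 1;
    }
  }
  close(fd);
  return 0;
}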

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Thu, 18 Mar 2004 21:40:10 +0100
Subject: Re: True fsync() in Linux (on IDE)
On Thu, 2004-03-18 at 15:17, Peter Zaitsev wrote:
> On Thu, 2004-03-18 at 12:11, Chris Mason wrote:

> > > I believe some missed set_page_writeback() calls caused fsync() to never
> > > really wait on anything, pretty broken... IIRC, it's fixed in latest
> > > -mm, or maybe it's just pending for next release.

> > This should have only been broken in -mm.  Which kernels exactly are you
> > comparing?  Maybe the 3ware array defaults to different writecache
> > settings under 2.6?

> I'm testing the RH AS 3.0 kernel; however, I see the same behavior on my
> SuSE 8.2 workstation.

Some suse 8.2 kernels had write barriers for IDE, some did not.  If
you're running any kind of recent suse kernel, you're doing cache
flushes on fsync with ext3.

Not sure if RH has ever carried the patches or not.  Easy enough to test
for on suse, just look for blk_queue_ordered in the System.map.
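
A small sketch of that check done programmatically, assuming the usual System.map layout of one "address type name" entry per line. The helper name map_has_symbol() is hypothetical, and the location of System.map on a given system (often somewhere under /boot) is also an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `symbol` appears as a symbol name in a
 * System.map-style file, 0 if it does not, and -1 if the file cannot be
 * opened. Only exact name matches count.
 */
static int map_has_symbol(const char *map_path, const char *symbol)
{
    char line[512];
    char name[256];
    int found = 0;
    FILE *fp = fopen(map_path, "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), fp) != NULL) {
        /* Skip the address and the one-letter type, keep the name. */
        if (sscanf(line, "%*s %*s %255s", name) == 1 &&
            strcmp(name, symbol) == 0)
            found = 1;
    }
    fclose(fp);
    return found;
}
```

Calling map_has_symbol() with the path to the running kernel's System.map and "blk_queue_ordered" would then report whether that symbol is present.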

> I'm using the 2.6.3 kernel for the tests now (not the latest, I know),
> with the ext3 file system.

> The 3ware array uses the writeback cache setting in both cases.

Then it sounds like your 2.4 is doing flushes.  I'd expect this test to
run very quickly without them.

-chris

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Thu, 18 Mar 2004 21:50:11 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 2004-03-18 at 12:33, Chris Mason wrote:
> Some suse 8.2 kernels had write barriers for IDE, some did not.  If
> you're running any kind of recent suse kernel, you're doing cache
> flushes on fsync with ext3.

I have this kernel:

Linux abyss 2.4.20-4GB #1 Sat Feb 7 02:07:16 UTC 2004 i686 unknown
unknown GNU/Linux

I believe it is a reasonably recent one of Hubert's kernels.

The thing is, performance differs depending on whether the file grows.
If it does, we get some 25 fsync/sec. If we're writing to an existing
file, we get some 1600 fsync/sec.

In the latter case the cache is surely not flushed.

> > I'm using the 2.6.3 kernel for the tests now (not the latest, I know),
> > with the ext3 file system.

> > The 3ware array uses the writeback cache setting in both cases.

> Then it sounds like your 2.4 is doing flushes.  I'd expect this test to
> run very quickly without them.

2.4 flushes in one case but not in the other. 2.6 does not do it in either
case.

I was also surprised to see that this simple test case performs so
differently with the default and "deadline" IO schedulers: 1.6 vs 0.5 sec
per 1000 fsyncs.

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Thu, 18 Mar 2004 22:10:11 +0100
Subject: Re: True fsync() in Linux (on IDE)

Hmmm, is it reiser?  For both 2.4 reiserfs and ext3, the flush happens
when you commit.  ext3 always commits on fsync and reiser only commits
when you've changed metadata.

Thanks to Jens, the 2.6 barrier patch has a nice clean way to allow
barriers on fsync, O_SYNC, O_DIRECT, etc, so we can make IDE drives much
safer than the 2.4 code did.  
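
For readers unfamiliar with the O_SYNC path mentioned here, a minimal user-space sketch follows. The helper name write_synchronously() is hypothetical. Whether the drive's own cache is flushed on such a write depends on the kernel, file system, and drive, which is exactly what this thread is about.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one buffer through a descriptor opened with
 * O_SYNC, so that write() itself waits for the data to be synced instead
 * of relying on a separate fsync() call.
 * Returns 0 on success and -1 on any error or short write.
 */
static int write_synchronously(const char *path, const void *buf, size_t len)
{
    int rc;
    ssize_t n;
    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);

    if (fd < 0)
        return -1;
    n = write(fd, buf, len);
    rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```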

I had a patch to make fsync always generate the barriers in 2.4, but it
was tricky since it had to figure out the last buffer it was going to
write before it wrote it.  The 2.6 code is much better.

> 2.4 flushes in one case but not in the other. 2.6 does not do it in either
> case.

> I was also surprised to see that this simple test case performs so
> differently with the default and "deadline" IO schedulers: 1.6 vs 0.5 sec
> per 1000 fsyncs.

Not sure on that one, both cases are generating tons of unplugs, the
drive is just responding insanely fast.

-chris

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Thu, 18 Mar 2004 22:20:17 +0100
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 2004-03-18 at 13:02, Chris Mason wrote:
> > In the latter case the cache is surely not flushed.

> Hmmm, is it reiser?  For both 2.4 reiserfs and ext3, the flush happens
> when you commit.  ext3 always commits on fsync and reiser only commits
> when you've changed metadata.

Oh, yes. This is Reiser; I did not think it was an FS issue.
Now I know to stay away from ReiserFS.

> Thanks to Jens, the 2.6 barrier patch has a nice clean way to allow
> barriers on fsync, O_SYNC, O_DIRECT, etc, so we can make IDE drives much
> safer than the 2.4 code did.  

Great.

> > I was also surprised to see that this simple test case performs so
> > differently with the default and "deadline" IO schedulers: 1.6 vs 0.5 sec
> > per 1000 fsyncs.

> Not sure on that one, both cases are generating tons of unplugs, the
> drive is just responding insanely fast.

Well, why would it be slow if it has write cache off?

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Thu, 18 Mar 2004 22:30:16 +0100
Local: Thu 18 Mar 2004 18:30
Subject: Re: True fsync() in Linux (on IDE)

On Thu, 2004-03-18 at 16:09, Peter Zaitsev wrote:
> On Thu, 2004-03-18 at 13:02, Chris Mason wrote:

> > > In the former case cache is surely not flushed.

> > Hmmm, is it reiser?  For both 2.4 reiserfs and ext3, the flush happens
> > when you commit.  ext3 always commits on fsync and reiser only commits
> > when you've changed metadata.

> Oh, yes. This is Reiser; I did not think it was an FS issue.
> I'll know to stay away from ReiserFS now.

For reiserfs data=ordered should be enough to trigger the needed
commits.  If not, data=journal.  Note that neither fs does barriers for
O_SYNC, so we're just not perfect in 2.4.
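
From an application's point of view, the two paths being distinguished here (an explicit fsync() versus a file opened with O_SYNC) look roughly like the sketch below. The function names are hypothetical, and the code only shows which system calls are involved. Whether either path actually reaches the platter depends on the kernel, filesystem and drive settings discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Path 1 (hypothetical name): buffered write, then an explicit fsync(). */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Path 2 (hypothetical name): O_SYNC asks that each write() return only
 * once the kernel considers the data written; no fsync() call is made. */
int write_with_o_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Both functions return 0 on success and -1 on failure, so the same data can be pushed through each path and the results compared on a given kernel and mount configuration.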

-chris

Hans Reiser  
Newsgroups: linux.kernel
From: Hans Reiser <rei...@namesys.com>
Date: Fri, 19 Mar 2004 09:10:09 +0100
Local: Fri 19 Mar 2004 05:10
Subject: Re: True fsync() in Linux (on IDE)

You are not listening to Peter.  As I understand it from what Peter says
and your words, your implementation is wrong, and makes fsync
meaningless.  If so, then you need to fix it.  fsync should not be
meaningless even for metadata only journaling.  This is a serious bug
that needs immediate correction, if Peter and I understand it correctly
from your words.

--
Hans

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Fri, 19 Mar 2004 15:00:17 +0100
Local: Fri 19 Mar 2004 11:00
Subject: Re: True fsync() in Linux (on IDE)

I am listening to Peter. Jens and I have spent a significant amount of
time on this code.  We can go back and spend many more hours testing and
debugging the 2.4 changes, or we can go forward with a very nice
solution in 2.6.

I'm planning on going forward with 2.6.

-chris

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Fri, 19 Mar 2004 20:40:08 +0100
Local: Fri 19 Mar 2004 16:40
Subject: Re: True fsync() in Linux (on IDE)

On Fri, 2004-03-19 at 05:52, Chris Mason wrote:
> I am listening to Peter, Jens and I have spent a significant amount of
> time on this code.  We can go back and spend many more hours testing and
> debugging the 2.4 changes, or we can go forward with a very nice
> solution in 2.6.

> I'm planning on going forward with 2.6

Chris, Hans,

It is great to hear this is going to be fixed in 2.6; however, it is
quite a pity we have a real mess with this in the 2.4 series.

Summarizing what I've heard so far, it looks like it depends on:

- Whether it is fsync/O_SYNC or O_DIRECT (which a user would expect to
  have the same effect in this respect).
- The kernel version. Some vendors have some fixes, while others do not
  have them.
- The hardware: whether it has the write cache on or off.
- The type of write (whether it changes metadata or not).
- Finally, the file system and even the journal mount options.

Just curious: does Asynchronous IO at least have the same behavior as
standard IO?

All of this makes it extremely hard to explain what users need in
order to get durability for their changes while preserving performance.

Furthermore, as it was broken for years, I expect we'll have people who
developed things with fast fsync() in mind and who will start screaming
once we have real fsync().

(see my mail about Apple actually disabling cache flush on fsync() for
this reason)
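
The Mac OS X call mentioned at the start of this thread, fcntl(fd, F_FULLFSYNC, NULL), can be wrapped so that an application asks for the strongest flush the platform offers. The sketch below is only an illustration: the name `full_sync` is hypothetical, and falling back to plain fsync() when F_FULLFSYNC is unavailable or fails is an assumption of this example, not something the thread prescribes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: request the strongest flush this platform exposes.
 * Returns 0 on success, -1 on failure. */
int full_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* Mac OS X: ask the drive to flush its write cache as well. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Assumption: if the request fails, fall through to plain fsync(). */
#endif
    return fsync(fd);
}
```

On systems that do not define F_FULLFSYNC this compiles down to a plain fsync(), so the durability it gives there is exactly whatever fsync() gives on that kernel and filesystem.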

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Hans Reiser  
Newsgroups: linux.kernel
From: Hans Reiser <rei...@namesys.com>
Date: Fri, 19 Mar 2004 20:40:11 +0100
Local: Fri 19 Mar 2004 16:40
Subject: Re: True fsync() in Linux (on IDE)

>I am listening to Peter, Jens and I have spent a significant amount of
>time on this code.

but you need to get it right.

>We can go back and spend many more hours testing and
>debugging the 2.4 changes, or we can go forward with a very nice
>solution in 2.6.

>I'm planning on going forward with 2.6

This is a very important patch that you have created, but you haven't
articulated what happens in the following scenario (Peter I am making up
something without knowing your internals, please feel encouraged to help
me on this).

mysql fsync()'s a file, which it thinks guarantees that all of a mysql
transaction has reached disk.  The disk write caches it.  You let fsync
return.  It is not on disk.  mysql performs its mysql commit, and writes
a mysql commit record which reaches disk, but not all of the transaction
is on disk.  The system crashes.  mysql plays the log.  mysql has
internal corruption.  User  calls Peter.  Peter asks, what do you expect
when you use a piece of shit like reiserfs?  User doesn't care about our
internal squabbling and goes back to using windows which does proper
commits.

Or, random application fsyncs, expects that it means that data has
reached disk, and tells user to perform real world actions dependent on
the data being on disk, but it is not.

I hope I am totally off-base and not understanding you....  Please help
me here.

>-chris

--
Hans

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Fri, 19 Mar 2004 21:00:20 +0100
Local: Fri 19 Mar 2004 17:00
Subject: Re: True fsync() in Linux (on IDE)

On Fri, 2004-03-19 at 14:36, Hans Reiser wrote:
> I hope I am totally off-base and not understanding you....  Please help
> me here.

Let's look at the actual scope of the problem:

filesystem metadata
filesystem data (fsync, O_SYNC, O_DIRECT)
block device data (fsync, O_SYNC, O_DIRECT)

Multiply the cases above by each filesystem, and also by md and
device mapper, since the barriers need to aggregate down to all the
drives.

In other words, just fixing fsync in 2.4 is not enough, and there is
still considerable development needed in 2.6.  Maybe after all the 2.6
changes are done and accepted we can consider backporting parts of it to
2.4.

-chris

Hans Reiser  
Newsgroups: linux.kernel
From: Hans Reiser <rei...@namesys.com>
Date: Fri, 19 Mar 2004 21:10:13 +0100
Local: Fri 19 Mar 2004 17:10
Subject: Re: True fsync() in Linux (on IDE)

In 2.6 does fsync always insert a write barrier when the metadata
journaling option is set for reiserfs?

--
Hans

Peter Zaitsev  
Newsgroups: linux.kernel
From: Peter Zaitsev <pe...@mysql.com>
Date: Fri, 19 Mar 2004 21:10:16 +0100
Local: Fri 19 Mar 2004 17:10
Subject: Re: True fsync() in Linux (on IDE)

On Fri, 2004-03-19 at 11:36, Hans Reiser wrote:
> mysql fsync()'s a file, which it thinks guarantees that all of a mysql
> transaction has reached disk.  The disk write caches it.  You let fsync
> return.  It is not on disk.  mysql performs its mysql commit, and writes
> a mysql commit record which reaches disk, but not all of the transaction
> is on disk.  The system crashes.  mysql plays the log.  mysql has
> internal corruption.  User  calls Peter.  Peter asks, what do you expect
> when you use a piece of shit like reiserfs?  User doesn't care about our
> internal squabbling and goes back to using windows which does proper
> commits.

This is right.

We had some unexplained data corruptions in InnoDB which can be
explained by broken fsync(), but in most cases the scenario is less
gloomy.  Users just do not see some of the last committed transactions
if they test durability by shutting off the power, which, however, is
already not good enough for critical applications.

However, this is due to an extra precaution InnoDB takes. It uses a
"double write buffer", which basically means each page is first written
to a small page-based log file, and only afterwards written to its
proper place on the disk.  We have to do this even with a proper fsync()
implementation, as there is still the possibility of a crash in the
middle of fsync (or a synchronous write), which would result in a
partial page write.  Think, for example, about the case when a page
crosses a stripe boundary on RAID.

If the file system guaranteed atomicity of write() calls (synchronous
would be enough), we could disable it and get a good performance gain.

--
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Fri, 19 Mar 2004 21:20:13 +0100
Local: Fri 19 Mar 2004 17:20
Subject: Re: True fsync() in Linux (on IDE)

Yes, fsync is done in the 2.6 patches.  O_SYNC, O_DIRECT and others are
not yet.  The important part right now is to get the IDE core bits
reviewed and all the FS guys to agree on how we want to use them.

It's much cleaner in 2.6, the filesystem can just request a flush after
the last data buffer goes down the pipe.

-chris

Chris Mason  
Newsgroups: linux.kernel
From: Chris Mason <ma...@suse.com>
Date: Fri, 19 Mar 2004 21:30:21 +0100
Local: Fri 19 Mar 2004 17:30
Subject: Re: True fsync() in Linux (on IDE)

It is indeed.

> Summarizing what I've heard so far, it looks like it depends on:

> - Whether it is fsync/O_SYNC or O_DIRECT (which a user would expect to
>   have the same effect in this respect).
> - The kernel version. Some vendors have some fixes, while others do not
>   have them.
> - The hardware: whether it has the write cache on or off.
> - The type of write (whether it changes metadata or not).
> - Finally, the file system and even the journal mount options.

All of the above is correct.

> Just curious does at least Asynchronous IO have the same behavior as
> standard IO ?

For the suse patch, yes.  If it triggers a commit, you get a cache
flush.

> All of this makes it extremely hard to explain what users need in
> order to get durability for their changes while preserving performance.

> Furthermore, as it was broken for years, I expect we'll have people who
> developed things with fast fsync() in mind and who will start screaming
> once we have real fsync().

> (see my mail about Apple actually disabling cache flush on fsync() for
> this reason)

These are all difficult issues.  I wish I had easier answers for you;
hopefully we can get it all nailed down in 2.6 for starters.

-chris
