Two channels sharing a file descriptor fail to seek properly due to buffering #5106

Closed
vicuna opened this issue Jul 18, 2010 · 5 comments

@vicuna

vicuna commented Jul 18, 2010

Original bug ID: 5106
Reporter: Hawk777
Status: acknowledged (set by @damiendoligez on 2011-05-20T16:15:02Z)
Resolution: open
Priority: normal
Severity: minor
Version: 3.10.2
Target version: later
Category: otherlibs
Monitored by: @ygrek

Bug description

It's possible to get two separate channels (an in_channel and an out_channel) to share the same operating-system-level file descriptor: open the file with Unix.openfile in O_RDWR mode, then call both Unix.in_channel_of_descr and Unix.out_channel_of_descr on the result. I did this because it seemed like the intuitive way to open a file for reading and writing at the same time.

I expected the two channels to end up sharing a single file pointer, but I also expected them to work correctly as long as I was careful to always seek a channel before using it.

Unfortunately, this is not the case, because the standard library buffers in_channel and skips the OS-level lseek() call whenever a seek can be satisfied within the buffer. Suppose you (1) do a bit of reading with the in_channel, (2) seek a long way away with the out_channel and do some writing, and then (3) seek the in_channel back close to where you were reading earlier. The call to seek_in skips the lseek() syscall because the target is inside the buffer, but as you keep reading you run off the end of the buffer, and the refill then reads from wherever the last write left the shared file pointer, because no lseek() was done!
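
The failure can be illustrated without a real file. The following is a minimal sketch, assuming a simplified model of the buffering described above rather than the actual standard-library code: `file` stands in for the OS-level file and its single shared pointer, and `in_chan`, `seek_in`, `input_char`, `os_read` and `os_write` are hypothetical stand-ins written only for this illustration. With a 4-byte buffer over `abcdefghijklmnop`, it replays steps (1) to (3).

```ocaml
(* Toy model of the report, NOT the real stdlib code. Assumption: seek_in
   skips lseek() when the target is inside the buffer, and a refill reads
   from wherever the shared OS file pointer happens to be. *)

(* The OS-level file: its contents plus the one pointer both channels share. *)
type file = { data : Bytes.t; mutable pos : int }

let os_read f buf =
  let n = max 0 (min (Bytes.length buf) (Bytes.length f.data - f.pos)) in
  Bytes.blit f.data f.pos buf 0 n;
  f.pos <- f.pos + n;
  n

let os_write f s =
  Bytes.blit_string s 0 f.data f.pos (String.length s);
  f.pos <- f.pos + String.length s

(* A buffered in_channel; [offset] is the file offset the channel believes
   matches the end of the valid bytes in [buf]. *)
type in_chan = {
  file : file;
  buf : Bytes.t;
  mutable len : int;   (* number of valid bytes in buf *)
  mutable curr : int;  (* index of the next byte to return *)
  mutable offset : int;
}

(* Refill does no lseek(): it trusts that file.pos = ic.offset. *)
let refill ic =
  let n = os_read ic.file ic.buf in
  ic.offset <- ic.offset + n;
  ic.len <- n;
  ic.curr <- 0

let input_char ic =
  if ic.curr >= ic.len then refill ic;
  if ic.curr >= ic.len then raise End_of_file;
  ic.curr <- ic.curr + 1;
  Bytes.get ic.buf (ic.curr - 1)

let seek_in ic dest =
  let buf_start = ic.offset - ic.len in
  if dest >= buf_start && dest <= ic.offset then
    ic.curr <- dest - buf_start            (* fast path: no lseek() *)
  else begin
    ic.file.pos <- dest;                   (* the lseek() *)
    ic.offset <- dest;
    ic.len <- 0;
    ic.curr <- 0
  end

(* Steps (1)-(3) from the report; returns the byte read at file offset 4. *)
let demo () =
  let f = { data = Bytes.of_string "abcdefghijklmnop"; pos = 0 } in
  let ic = { file = f; buf = Bytes.create 4; len = 0; curr = 0; offset = 0 } in
  ignore (input_char ic); ignore (input_char ic);  (* (1) buffer now holds "abcd" *)
  f.pos <- 12; os_write f "XY";                    (* (2) out_channel seeks and writes *)
  seek_in ic 2;                                    (* (3) satisfied in the buffer *)
  ignore (input_char ic); ignore (input_char ic);  (* "cd" come from the buffer *)
  input_char ic                                    (* refill reads at offset 14 *)

let () = Printf.printf "expected 'e', got %C\n" (demo ())
```

Running it prints `expected 'e', got 'o'`: after the in-buffer seek, the refill starts at offset 14, just past the bytes the writer put at offset 12.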

@vicuna
Author

vicuna commented Sep 19, 2012

Comment author: @damiendoligez

A possible fix would be to flush the buffer at each seek operation, but how much slower will other programs get?
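
In the same kind of toy model (hypothetical names, not the real runtime), this fix amounts to making `seek_in` always move the shared pointer and throw the buffered bytes away. A minimal sketch, under that assumption:

```ocaml
(* Hypothetical variant of a toy model, NOT the real stdlib: seek_in
   always calls lseek() and discards the buffer. *)
type file = { data : Bytes.t; mutable pos : int }  (* contents + shared pointer *)

let os_read f buf =
  let n = max 0 (min (Bytes.length buf) (Bytes.length f.data - f.pos)) in
  Bytes.blit f.data f.pos buf 0 n;
  f.pos <- f.pos + n;
  n

let os_write f s =
  Bytes.blit_string s 0 f.data f.pos (String.length s);
  f.pos <- f.pos + String.length s

type in_chan = { file : file; buf : Bytes.t; mutable len : int; mutable curr : int }

let input_char ic =
  if ic.curr >= ic.len then begin
    (* refill from wherever the shared pointer is, with no lseek() *)
    ic.len <- os_read ic.file ic.buf;
    ic.curr <- 0
  end;
  if ic.curr >= ic.len then raise End_of_file;
  ic.curr <- ic.curr + 1;
  Bytes.get ic.buf (ic.curr - 1)

(* Every seek goes to the OS and drops the buffered bytes. *)
let seek_in ic dest =
  ic.file.pos <- dest;
  ic.len <- 0;
  ic.curr <- 0

(* Same steps as in the report; returns the byte read at file offset 4. *)
let demo () =
  let f = { data = Bytes.of_string "abcdefghijklmnop"; pos = 0 } in
  let ic = { file = f; buf = Bytes.create 4; len = 0; curr = 0 } in
  ignore (input_char ic); ignore (input_char ic);  (* read "ab" *)
  f.pos <- 12; os_write f "XY";                    (* out_channel writes far away *)
  seek_in ic 2;                                    (* lseek() + discard *)
  ignore (input_char ic); ignore (input_char ic);  (* "cd", re-read from the file *)
  input_char ic

let () = Printf.printf "got %C\n" (demo ())
```

Replaying the report's steps, the byte at offset 4 now comes back as `'e'`. The price is that every `seek_in`, even a one-byte step backwards, becomes a syscall plus a full buffer refill, which is the slowdown the comment asks about.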

@vicuna
Author

vicuna commented Jul 31, 2014

Comment author: @damiendoligez

A better solution is to call lseek even if the seek is within the buffer, but you can still get the wrong data when the buffers overlap.
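
A sketch of this variant in the same toy model (hypothetical names, not the real runtime): in-buffer seeks still reuse the buffered bytes, but `seek_in` also moves the shared pointer back to the end of the buffered data. The hypothetical `demo at` reads three bytes after seeking back to offset 2, with the writer having stored `XY` at offset `at`.

```ocaml
(* Toy model, NOT the real stdlib: in-buffer seeks are served from the
   buffer but still call lseek() to resynchronise the shared pointer. *)
type file = { data : Bytes.t; mutable pos : int }

let os_read f buf =
  let n = max 0 (min (Bytes.length buf) (Bytes.length f.data - f.pos)) in
  Bytes.blit f.data f.pos buf 0 n;
  f.pos <- f.pos + n;
  n

let os_write f s =
  Bytes.blit_string s 0 f.data f.pos (String.length s);
  f.pos <- f.pos + String.length s

type in_chan = {
  file : file;
  buf : Bytes.t;
  mutable len : int;     (* number of valid bytes in buf *)
  mutable curr : int;    (* index of the next byte to return *)
  mutable offset : int;  (* file offset of the end of the buffered bytes *)
}

let input_char ic =
  if ic.curr >= ic.len then begin
    let n = os_read ic.file ic.buf in
    ic.offset <- ic.offset + n;
    ic.len <- n;
    ic.curr <- 0
  end;
  if ic.curr >= ic.len then raise End_of_file;
  ic.curr <- ic.curr + 1;
  Bytes.get ic.buf (ic.curr - 1)

let seek_in ic dest =
  let buf_start = ic.offset - ic.len in
  if dest >= buf_start && dest <= ic.offset then begin
    ic.curr <- dest - buf_start;
    ic.file.pos <- ic.offset  (* the extra lseek(): next refill starts at ic.offset *)
  end else begin
    ic.file.pos <- dest;
    ic.offset <- dest;
    ic.len <- 0;
    ic.curr <- 0
  end

(* Read 3 bytes after seeking back to offset 2; the writer put "XY" at [at]. *)
let demo at =
  let f = { data = Bytes.of_string "abcdefghijklmnop"; pos = 0 } in
  let ic = { file = f; buf = Bytes.create 4; len = 0; curr = 0; offset = 0 } in
  ignore (input_char ic); ignore (input_char ic);
  f.pos <- at; os_write f "XY";
  seek_in ic 2;
  let a = input_char ic in
  let b = input_char ic in
  let c = input_char ic in
  Printf.sprintf "%c%c%c" a b c

let () =
  Printf.printf "write far away: %s\n" (demo 12);
  Printf.printf "write inside buffer: %s\n" (demo 2)
```

`demo 12` returns `"cde"`, which is correct, but `demo 2` also returns `"cde"` even though the file now holds `XYe` at offsets 2 to 4: the bytes `cd` were served from a buffer that the writer's data overlapped.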

@vicuna
Author

vicuna commented May 6, 2015

Comment author: @mshinwell

Damien and I talked about this. It seems that a proper solution to this problem may be fairly complicated, probably involving tracking of all buffers pointing to a given file descriptor, and maybe involving some management to prevent them overlapping.
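
One way to picture that tracking, as a hypothetical sketch in a toy model (no real OCaml API works this way): the descriptor keeps a list of its input buffers, every refill repositions the shared pointer first, and every write drops the readers' buffered bytes while preserving their logical positions. The writer is unbuffered here to keep the sketch short; a buffered out_channel would add the further problem of pending bytes that readers cannot see yet.

```ocaml
(* Hypothetical sketch, a toy model only: a descriptor that tracks every
   input buffer attached to it and invalidates them on each write. *)
type in_chan = {
  buf : Bytes.t;
  mutable len : int;     (* number of valid bytes in buf *)
  mutable curr : int;    (* index of the next byte to return *)
  mutable offset : int;  (* file offset of the end of the buffered bytes *)
}

type fd = {
  data : Bytes.t;
  mutable pos : int;               (* the single shared OS file pointer *)
  mutable readers : in_chan list;  (* every input buffer on this descriptor *)
}

let attach_reader fd size =
  let ic = { buf = Bytes.create size; len = 0; curr = 0; offset = fd.pos } in
  fd.readers <- ic :: fd.readers;
  ic

let input_char fd ic =
  if ic.curr >= ic.len then begin
    fd.pos <- ic.offset;  (* lseek(): another channel may have moved the pointer *)
    let n = max 0 (min (Bytes.length ic.buf) (Bytes.length fd.data - fd.pos)) in
    Bytes.blit fd.data fd.pos ic.buf 0 n;
    fd.pos <- fd.pos + n;
    ic.offset <- ic.offset + n;
    ic.len <- n;
    ic.curr <- 0
  end;
  if ic.curr >= ic.len then raise End_of_file;
  ic.curr <- ic.curr + 1;
  Bytes.get ic.buf (ic.curr - 1)

let seek_in ic dest =
  let buf_start = ic.offset - ic.len in
  if dest >= buf_start && dest <= ic.offset then ic.curr <- dest - buf_start
  else begin
    ic.offset <- dest;
    ic.len <- 0;
    ic.curr <- 0
  end

(* Unbuffered write at [at]; afterwards every reader drops its buffer but
   keeps its logical position, so overlapping bytes get re-read. *)
let write fd at s =
  Bytes.blit_string s 0 fd.data at (String.length s);
  fd.pos <- at + String.length s;
  List.iter
    (fun ic ->
      ic.offset <- ic.offset - (ic.len - ic.curr);
      ic.len <- 0;
      ic.curr <- 0)
    fd.readers

(* Read 3 bytes after seeking back to offset 2; the writer put "XY" at [at]. *)
let demo at =
  let fd = { data = Bytes.of_string "abcdefghijklmnop"; pos = 0; readers = [] } in
  let ic = attach_reader fd 4 in
  ignore (input_char fd ic); ignore (input_char fd ic);
  write fd at "XY";
  seek_in ic 2;
  let a = input_char fd ic in
  let b = input_char fd ic in
  let c = input_char fd ic in
  Printf.sprintf "%c%c%c" a b c

let () = Printf.printf "%s %s\n" (demo 12) (demo 2)
```

It prints `cde XYe`: a write that overlaps a reader's buffer now forces that reader to re-read the affected bytes.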

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 15, 2020
@xavierleroy
Contributor

I don't think this use (two buffers on the same file descriptor) will ever be fully supported by the current implementation of buffering. We might switch to using C's <stdio.h> buffered I/O at some point in the future, but even then multiple buffers on the same file descriptor are not guaranteed to be synchronized. Let me close this as "won't fix".
