New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Marshal.to_channel does not clean the channel internal buffers when it fails #7238
Comments
Comment author: @gasche Thanks for the very detailed analysis. Do you have an idea of whether it is possible to fix this by just changing the caml_{really_,}putblock implementations, or maybe even a patch proposal? |
Comment author: pfonseca Ideally, it would be desirable to ensure atomicity of the marshaling operation, even in the presence of errors. Either marshaling should succeed -- the entire object gets written to the channel -- or it fails -- nothing gets written to the channel buffers and nothing gets sent. I'm not sure that it would be possible to solve the problem at the caml_{really_,}putblock level. The reason is that, because multiple calls are made for each object marshaled, ensuring marshaling atomicity would require somehow reverting the effects of the calls of previous caml_{really_,}putblock invocations. This would be difficult to achieve in the general case because, at least in some contexts, caml_putblock could have already written the data to the channel (e.g., file) which would be hard to revert. This problem seems nontrivial to solve and I don't have a patch for it. |
Comment author: @gasche So I looked a bit more into the issue, but I know very little about these low-level details so I may be confused.
|
Comment author: @diml Wouldn't it be more reliable to marshal to a temporary buffer and then handle sending data manually? You get one extra memory copy, but on the other hand this copy is probably a lot faster than the marshaling itself, so I don't think it's worth trying to avoid it in the end. If you'd prefer to marshal to bigarrays instead of OCaml strings, there is a module [Bigstring_marshal] in core_kernel: https://github.com/janestreet/core_kernel/blob/master/src/bigstring_marshal.mli You could easily extract what you want from this module if you didn't want to depend on core_kernel |
Comment author: @diml And actually you can avoid the extra copy if you use file descriptors directly instead of going through output channels |
Comment author: @gasche dim: This seems an excellent workaround for this case, but don't you think that it is a bug that the code does not currently work? If we provide the ability to use UDP sockets, I think we should make sure that writing to them actually works, even when seen as out_channel. |
Comment author: @diml Well, I would say it is the correct solution rather than a workaround. Marshaling to an out_channel that is backed by an UDP socket seems dodgy anyway; there is no guarantee about how data will be split. For instance what should be the expected behavior if the marshaled data end up spanning over multiple packets? If one packet is lost you'll get a segfault on the other side. At best we can indeed try to ensure that marshaled data always fit in a single packet and always fail otherwise. But if that's hard to do in a portable way, I tend to think we should just document that it is not supported. |
Comment author: @gasche I suppose that the marshalling format has a length field that would let you know that some packets were dropped. But, in any case, wouldn't the problem also occur with plain old output_string/print_{string,endline}? Marshalling is arguably an edge case, but not being able to use print_* at all because the buffer is too large for its own good seems a bit drastic. |
Comment author: @diml Indeed. I reread you previous message, basically you shouldn't need to do a discovery to find the max packet size, it should be 65507 (the maximum theoretical size of a UDP packet). I added a patch to do this. pfonseca, could you try and see if it's enough for your use case? |
Comment author: @gasche dim: it is unclear to me whether this limit is portable or specific to Linux systems. See for example https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/472149 One compromise would be to move this retry-logic in the loop (while (n > 1)), and set n = min(n/2, MAX_UDP_SIZE) This should work as your proposed patch in the common case, yet handle situations where the system requires smaller messages (with a small degradation in performance). P.S.: there is a small typo in your patch, (errno = EMSGSIZE) instead of (errno == EMSGSIZE). |
Comment author: @diml True, it doesn't even seem to work on OSX. The more I think about this, the more I think we shouldn't try to add magic and give the programmer the fake impression that it works. If you are marshalling through UDP, you must really make sure that the message fits in one UDP packet, as packets can be lost or arrive in a different order (using the length in the marshal header is not enough). If you want to send bigger values you need some logic to ensure you properly reconstruct the marshaled buffer first. For other output functions, at the end of the day we don't know whether the programmer expect the data to be sent as one packet or not, and if you are using UDP, there is a good chance you expect exactly that. So the current error seems right to me; it indicates you need to think about what you are doing. However, it's indeed annoying that we are leaving half a marshaled message in the output buffer, if possible we should try to discard it before raising the error. |
Comment author: @xavierleroy Re-reading this discussion, I'm on dim's side: UDP is for transmitting packets, out_channel/in_channel are abstractions for streams, and the two models (stream vs packets) don't mix at all. It is much safer indeed to marshal to a string, then "packetize" the string explicitly, following whatever conventions your UDP protocol has for splitting large messages into packets. This said, we may have the problem described here (the marshaler not cleaning its internal buffers) in other error cases, e.g. when a write to disk fails because there is no space left. This still needs to be investigated. |
Comment author: @gasche
Should we not fail when trying to convert an UDP socket to a out_channel then? Because currently we pretend to be able to do it and it works most of the time, and eats user data only sometimes... If we think the feature is disfunctional by design, should we not disable it? |
Comment author: @xavierleroy
Quite possibly. It would take a bit of work, though. Under Unix we could use getsockopt(SO_TYPE) to detect sockets that are not SOCK_STREAM. Under Win32 we might be able to do the same, or, alternatively, we could use flags from 'struct filedescr'. |
Comment author: @gasche I was reminded of this bug again this morning as I read pfonseca's paper who mentions it (as one of the bugs found against formally verified distributed algorithm implementations): https://homes.cs.washington.edu/~pfonseca/papers/eurosys2017-dsbugs.pdf |
Comment author: @gasche The paper writes:
I think this is correct and that we should fail on UDP->out_channel conversion as discussed, so that people realize the issue when testing. |
Comment author: @xavierleroy
See proposal at #1825 |
Comment author: @xavierleroy The check on file descriptor -> buffered I/O channel is now in trunk, will be in release 4.08.0. Re-reading the discussion for this issue, I don't think other fixes are needed. It is true that output_value can fail in the middle of writing its data to an out_channel, e.g. if the disk is full, but I don't see any sensible way to make output_value more atomic w.r.t. such I/O errors. |
Original bug ID: 7238
Reporter: pfonseca
Status: resolved (set by @xavierleroy on 2018-06-14T09:04:24Z)
Resolution: fixed
Priority: normal
Severity: minor
Fixed in version: 4.08.0+dev/beta1/beta2
Category: standard library
Monitored by: @gasche @diml @oandrieu
Bug description
The semantic of Marshal.to_channel can lead to subtle bugs, particularly, when used with channels that are UDP sockets. The problem is due to the fact that to_channel fails when the message size is bigger than the UDP limit but, when it fails, it leaves part of the message (the prefix) in the internal data structures of the channel.
Leaving the prefix of the message in the buffer is a problem because subsequent calls to to_channel will append messages to the buffer that will be sent. In practice, this causes corruption of messages and prevents correct unmarshalling on the receiver side.
Internally, “Marshal.to_channel” works in the following way:
a) While marshaling the components of message, to_channel makes several calls to the function caml_putblock [1] (through the sequence of invocations: to_channel [3] -> caml_output_value [2] -> caml_output_val [2] -> caml_really_putblock [1] -> caml_putblock [1])
b) On each invocation caml_putblock appends the input data to an internal buffer (channel->buff).
c) This buffer is flushed when (1) flush is explicitly called on the channel or (2) inside caml_putblock when new contents cannot be appended to the buffer because it would reach its maximum capacity. Flushing the buffer consists in calling the write system call on the socket file descriptor. The size of the internal buffer is larger than the UDP packet limit
d) When caml_putblock tries to flush the internal buffer (in a case where it is filled beyond the UDP limit) the call to the write system call fails (as expected) and it fails with the error EMSGSIZE. Upon such error, an exception (UnixError) is thrown, which will be handled by an exception handler but crucially two things happen (1) the internal buffer is not cleared (i.e., it retains the prefix of the message) and (2) nobody tries to resend the suffix of the message that was discarded.
Given the above, it is hard to correctly use the to_channel function on UDP sockets without explicitly checking before invoking to_channel that the message size fits within the content limit of UDP packets. To complicate matters further, the message size is not obvious to the callee given that the message sent will include marshaling metadata, not just user data.
Chapar [4] is an example of an application that is affected by this problem.
References:
[1] https://github.com/ocaml/ocaml/blob/trunk/byterun/io.c
[2] https://github.com/ocaml/ocaml/blob/trunk/byterun/extern.c
[3] https://github.com/ocaml/ocaml/blob/trunk/stdlib/marshal.ml
[4] coq-community/chapar#2
File attachments
The text was updated successfully, but these errors were encountered: