Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot close properly a file with memory mapped blocks (Bigarray.map_file) #6567

Closed
vicuna opened this issue Sep 18, 2014 · 18 comments
Closed
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Sep 18, 2014

Original bug ID: 6567
Reporter: Christoph Bauer
Assigned to: @xavierleroy
Status: assigned (set by @xavierleroy on 2017-10-19T13:27:11Z)
Resolution: open
Priority: normal
Severity: minor
Platform: Windows MSVC
OS: Windows
OS Version: 7
Version: 4.01.0
Target version: 4.07.0+dev/beta2/rc1/rc2
Category: otherlibs
Monitored by: @hcarty gerd

Bug description

My program maps several blocks of a file into memory with Bigarray.Array1.map_file for writing.
Finally the file handle is closed with Unix.close.

On local files system everything works as expected. But if the file is on a network drive,
the resulting file is after the close still locked and it contains just zeros.

My workaround is, to call Gc.full_major() before the close. Then all remaining bigarrays are
finalized (the finalizer calls caml_ba_unmap_file(), see bigarray_stubs.c).

Calling Gc.full_major () is somewhat dissatisfying. But I’m not sure how it could be solved more cleanly.
I would expect that Unix.close should unmap all remaining views of the file. But Unix.close knows nothing
about the bigarray mapping.

Maybe a special Bigarray.unmap is the solution.

File attachments

@vicuna
Copy link
Author

vicuna commented Sep 18, 2014

Comment author: Christoph Bauer

My current solution is: I defined a flush function:

external flush : ('a, 'b, 'c) Bigarray.Array1.t -> unit = "ml_flush_bigarray"

It is implemented like this:

#ifdef WIN32

static void ml_flush_mapped_bigarray(void * addr, uintnat len)
{
SYSTEM_INFO sysinfo;
uintnat delta;

GetSystemInfo(&sysinfo);
delta = (uintnat) addr % sysinfo.dwAllocationGranularity;

FlushViewOfFile((void )((uintnat)addr - delta),
0 /
zero means: the file is flushed from the base address to the end of the mapping. */
);
}

#else

static void ml_flush_mapped_bigarray(void * addr, uintnat len)
{

#if defined(_POSIX_SYNCHRONIZED_IO)
msync(addr, len, MS_SYNC);
#endif
}

#endif

CAMLprim value ml_flush_bigarray( value v )
{
CAMLparam1(v);
struct caml_ba_array * b = Caml_ba_array_val(v);

if( b->flags & CAML_BA_MANAGED_MASK ) {
if( b->proxy == NULL )
ml_flush_mapped_bigarray(b->data, caml_ba_byte_size(b));
else {
ml_flush_mapped_bigarray(b->proxy->data, b->proxy->size);
}
}

CAMLreturn( Val_unit );
}

@vicuna
Copy link
Author

vicuna commented Sep 18, 2014

Comment author: Christoph Bauer

If needed I could prepare a patch against the OCaml sources.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2015

Comment author: @damiendoligez

Christoph, what does your flush function do? As far as I can tell, it writes the data out to the file, but does not close it, so the file will still be locked, right?

@vicuna
Copy link
Author

vicuna commented Apr 28, 2015

Comment author: Christoph Bauer

I my case I must do both:flush the file and close the file handle with Unix.close. Then everything works. You can rename the file it contains the content you expected. Just closing the file is not enough.

@vicuna
Copy link
Author

vicuna commented Jul 1, 2015

Comment author: gerd

This issue isn't restricted to Windows. On Unix, you need the msync() call (as in the patch), which in particular writes NFS-mounted files out. Actually, POSIX doesn't specify that modifications of mapped regions are immediately visible in file operations at all (not even for local files, although current OS guarantee this).

However, I suggest to use the MSYNC_SYNC flag, because the programmer normally wants a guarantee that the synchronization is complete at some point.

Of course, there is no locking problem on Unix. It's just a synch problem, i.e. file reads could see old data.

@vicuna
Copy link
Author

vicuna commented Jul 1, 2015

Comment author: Christoph Bauer

I agree with the MSYNC_SYNC flag. I edited the patch above.

@vicuna
Copy link
Author

vicuna commented Dec 1, 2015

Comment author: @alainfrisch

The Windows version is not synchronous, isn't it? MSDN says one should call FlushFileBuffers to get something closer to MS_SYNC (for local files at least), but we need to know the file handle.

Can you also comment on the locking part for network files? Is FlushViewOfFile followed by close enough / required so that the server releases the lock on the file? Do you have some references to document this behavior? It's somehow surprising that closing is not enough...

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: Christoph Bauer

The patch solved our problems (file locks or files full of zeros) reliable.

The problem, which was mentioned in the MSDN documentation is, that the file is not written to disk on the remote system, but the data could be still is the cache. Also for Linux you cannot sync reliable over the network AFAIK. I would say: forget it.

Unfortunately I have longer access to the windows setup, where I could reproduce the problem.

@vicuna
Copy link
Author

vicuna commented Dec 9, 2015

Comment author: @alainfrisch

So the function would be synchronous on Unix, waiting for the data to on disk; while on Windows, it would be asynchronous, with no guarantee of completion (even for local files?). If we go this direction, this should be properly documented.

In order to move forward on this one, can you prepare a pull request, probably adding the flush function everywhere where there is a map_file function in Bigarray?

@vicuna
Copy link
Author

vicuna commented Dec 9, 2015

Comment author: @xavierleroy

Two remarks and one question on the patch:

  • "if( b->flags & CAML_BA_MANAGED_MASK )" should be "if( b->flags & CAML_BA_MAPPED_FILE ) "
    (no need to do anything if the bigarray is not mapped from a file).

  • The Unix implementation should also round addr and len to whole pages.

  • Shouldn't system errors be reported as a Sys_error exception?

@vicuna
Copy link
Author

vicuna commented Jan 3, 2016

Comment author: Christoph Bauer

I made a patch against the sources of 4.02.3 (bigarray_flush2.patch). The new flush function gets now a second [sync] parameter. It raises an exception on MS Windows when sync is true. The async (sync == false) call should work on all systems.

Also the three remarks from 0015089 are implemented (correct flag, rounding on unix, Sys_error). Thanks for the hints!

Patch can be applied with:
.../ocaml-4.02.3$ patch -p1 < bigarray_flush2.patch

Unfortunately I cannot test this patch against in the original setup, because I have no longer access to it.

@vicuna
Copy link
Author

vicuna commented Apr 17, 2016

Comment author: @gasche

Xavier or Alain, do you think we could take a decision on this patch?

Given that how late we are in the 4.03 release cycle, I think it's too late to integrate this in the release branch. I would, however, be happy to merge it in trunk if you have no qualms against it -- it's a more satisfying solution than just postponing.

@vicuna
Copy link
Author

vicuna commented Apr 21, 2016

Comment author: @damiendoligez

I'm in favour of merging into trunk.

@vicuna
Copy link
Author

vicuna commented Mar 15, 2017

Comment author: @damiendoligez

What has changed for this PR now that #1077 ( #1077 ) is merged?

@vicuna
Copy link
Author

vicuna commented Oct 15, 2017

Comment author: @xavierleroy

What has changed for this PR now that #1077 is merged?

  • The need for an explicit "sync" operation is still unaddressed.
  • The API should probably change: just like "map_file" is in module Unix now, and no longer in module Bigarray, "flush_file" or "flush_mapped_file" should be in module Unix.
  • The rest of the patch should be easy to reuse.

@vicuna
Copy link
Author

vicuna commented Oct 16, 2017

Comment author: @xavierleroy

See pull request at #1432

@vicuna vicuna added this to the 4.07.0 milestone Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 11, 2020
@xavierleroy
Copy link
Contributor

The explicit "flush" operation was implemented in PR #1432, then got such a barrage of questions from reviewers (hi @alainfrisch) that I rage-closed it. Anyone who cares about this issue is welcome to start again from where I left.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants