
bigarray lacks a free-like call. #6962

Closed
vicuna opened this issue Aug 18, 2015 · 10 comments

Comments

@vicuna

vicuna commented Aug 18, 2015

Original bug ID: 6962
Reporter: toolslive
Status: acknowledged (set by @xavierleroy on 2015-08-18T17:16:08Z)
Resolution: open
Priority: normal
Severity: feature
Target version: undecided
Category: standard library
Related to: #4108 #7676
Monitored by: domsj @braibant @diml @ygrek @jmeber @hcarty

Bug description

When you use bigarrays for IO, you find that after you're done using them, they stay around for quite a while before being garbage collected. The part on the OCaml heap is small, but the part on the C heap can be quite large, causing processes to grow without bound.

The simplest solution is to add a call similar to the one Jane Street provides in its Bigstring library:

"""
val unsafe_destroy : t -> unit

unsafe_destroy bstr destroys the bigstring by deallocating its associated data or, if memory-mapped, unmapping the corresponding file, and setting all dimensions to zero. This effectively frees the associated memory or address-space resources instantaneously. This feature helps working around a bug in the current OCaml runtime, which does not correctly estimate how aggressively to reclaim such resources.

This operation is safe unless you have passed the bigstring to another thread that is performing operations on it at the same time. Access to the bigstring after this operation will yield array bounds exceptions.
"""

@vicuna
Author

vicuna commented Aug 18, 2015

Comment author: @xavierleroy

The solution you propose was implemented in 2012 (see #4108), then rolled back because it prevents valuable optimizations (Common Subexpression Elimination) on bounds checks for bigarray accesses.

As a matter of fact, the unsafe_destroy function from Bigstring is itself potentially unsafe with respect to CSE optimization as performed in the OCaml trunk.
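To make the concern concrete, here is a sketch of the kind of code where it would matter, assuming (as the comment implies) that the optimizer may treat a bigarray's dimensions as immutable and reuse the bound loaded for one access in a later bounds check. This is illustrative only: it does not reproduce a miscompilation, and `read_twice` is a hypothetical function.

```ocaml
open Bigarray

(* Shows the access pattern at risk; run as-is it simply returns 2 * a.{0}. *)
let read_twice (a : (float, float64_elt, c_layout) Array1.t) (f : unit -> unit) =
  let x = a.{0} in  (* bounds check compares 0 against Array1.dim a *)
  f ();             (* if [f] zeroed a's dimensions in place, as unsafe_destroy does ... *)
  x +. a.{0}        (* ... a reused (CSE'd) bound would let this access through *)
```

If the second check re-read the dimension, a destroyed array would raise here; if the compiler is allowed to reuse the first bound, the second access would read freed memory instead.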

@vicuna
Author

vicuna commented Aug 19, 2015

Comment author: toolslive

One idea would be a way to give the GC a hint such as
"never promote this value to the major heap".

Another idea is the ability to turn off this CSE for bigarrays.

It's not that we really want/need that call,
we just want a workable solution for the resource problem.
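One workaround that needs no runtime change, and that is not proposed in this thread, is to allocate IO buffers once and recycle them instead of dropping them and waiting for the GC. A minimal sketch, assuming OCaml 4.08 or later (for `Fun.protect`), fixed-size char buffers and single-threaded use; `pool`, `create_pool`, `acquire`, `release` and `with_buffer` are hypothetical names:

```ocaml
open Bigarray

type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical fixed-size buffer pool: buffers are reused instead of being
   dropped and left for the GC to finalize. *)
type pool = { size : int; mutable free : buf list }

let create_pool size = { size; free = [] }

(* Reuse a free buffer if there is one, otherwise allocate a new one. *)
let acquire p =
  match p.free with
  | b :: rest -> p.free <- rest; b
  | [] -> Array1.create char c_layout p.size

(* Only take back buffers of the pool's size. *)
let release p b =
  if Array1.dim b = p.size then p.free <- b :: p.free

(* Scoped use: the buffer goes back to the pool even if [f] raises.
   The caller must not keep [b] after [f] returns. *)
let with_buffer p f =
  let b = acquire p in
  Fun.protect ~finally:(fun () -> release p b) (fun () -> f b)
```

With this design the number of C-heap blocks ever allocated is bounded by the peak number of buffers in use at once, at the cost of trusting callers not to hold on to a buffer after `with_buffer` returns.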

@vicuna
Author

vicuna commented Aug 20, 2015

Comment author: domsj

Another idea would be not needing bigarrays in the first place ...
If you could, for example, pin a value so that it is temporarily not moved by the garbage collector, then one could also use Bytes.t to perform operations on such values in detached threads (e.g. compression/encryption/IO).

@vicuna
Author

vicuna commented Aug 25, 2015

Comment author: @alainfrisch

Quoting @xavierleroy: "then rolled back because it prevents valuable optimizations (Common Subexpression Elimination) on bounds checks for bigarray accesses."

I guess this is not the case today, but it should also be possible to optimize the indirection on the "data" field of bigarrays.

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Mar 24, 2021
@gasche gasche removed the Stale label Mar 24, 2021
@gasche
Member

gasche commented Mar 24, 2021

Deallocatable bigarrays are still requested often enough that I don't think this issue is stale. Has there been any progress on separate, user-defined libraries that meet this need, bring real advantages, and see use in practice?

@xavierleroy
Contributor

xavierleroy commented Mar 24, 2021

The code was there at some point: 5ca6abd

The issue is not whether to have it in a separate user-defined library. The issue is that since 2012 we have been unable to determine whether it is safe with respect to current optimizations (CSE, Flambda) and to future optimizations we may want to have one day.

This demonstrates rather clearly that nobody cares. Closing this report.

@gasche
Member

gasche commented Mar 24, 2021

For the record, I don't agree:

  • bigarrays are many different things to many people. Some people want to use them for numeric computations, but they are also often used as "easy-to-access external memory". The former use case wants great CSE; the latter cares more about predictable resource disposal. People with the latter use case (Mirage, for example? cc @hannesm) could have switched by now to a different module (than bigarray) that implements disposal, and they could tell us about that. (Maybe that would give ideas for how to improve the stdlib, or it could solve the issue by directing people to those other libraries instead.) Maybe they ended up doing nothing because the benefits of an imperfect stdlib-integrated solution outweigh the effort of switching.

  • I remember hearing people complain about bigarray lacking a "free" a couple of years ago, well after this issue was created. The fact that people with this need failed to discuss it here does not demonstrate that they don't care; it may also be that this issue was obscure enough, and we didn't try hard enough to attract discussion here.

So: I think that some people care, but I don't know how to get them involved in the conversation. They are free to post here if they find the issue, whether it's open or closed.

@dbuenzli
Contributor

I did some tricks like this a long time ago when mapping GPU memory to CPU via glMapBuffer.

The code is here, but it's in the attic of this library. The idea was to map GPU memory to a linear bigarray and invalidate it once the GPU buffer has been unmapped (and the pointer is no longer valid). I have no idea whether what is done there can cause problems.

In any case, it seems that in the current incarnation of the library I rely on programmer discipline (what a good idea :-), but the GL backend implementation still mentions that the zeroing trick should be considered.

As this example shows, there may be many ways to get a pointer to a chunk of memory and relinquish it, so I'm not sure the suggestion was a good idea (unless done via a user-defined callback).

However I would welcome a clarification whether such tricks as done above are allowed or not and/or what is allowed exactly. Personally I'm fine doing these dirty things in C.
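The user-defined callback idea could look roughly like the following sketch. A hypothetical `lease` record pairs a bigarray view with whatever function relinquishes the underlying memory (an unmap call, a free, and so on), and every access goes through an option so nothing is read after disposal. It assumes the raw view never escapes the lease, which is exactly the programmer discipline mentioned above, and it does not answer whether zeroing dimensions from C is allowed. All names (`lease`, `get`, `dispose`, `Released`) are illustrative.

```ocaml
open Bigarray

type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical lease: a view on memory owned by someone else, plus the
   callback that hands that memory back (e.g. an unmap or free stub). *)
type lease = { mutable view : buf option; release : unit -> unit }

exception Released

let lease view ~release = { view = Some view; release }

(* Every access re-reads [view], so a disposed lease is never dereferenced. *)
let get l i =
  match l.view with
  | Some v -> v.{i}
  | None -> raise Released

(* Idempotent: the release callback runs at most once. *)
let dispose l =
  match l.view with
  | None -> ()
  | Some _ ->
    l.view <- None;
    l.release ()
```

Safety here comes from the option check, not from the bigarray's dimensions, so it sidesteps the CSE question at the cost of one extra indirection per access.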

@dbuenzli
Contributor

Quoting my earlier comment: "However I would welcome a clarification whether such tricks as done above are allowed or not and/or what is allowed exactly."

FWIW I'm still interested in knowing that and I think it would be good to mention it here.

It may seem like a niche area, but C libraries that temporarily lease you a pointer to work on are not unheard of (e.g. sound or graphics libraries).
