Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

something is quadratic in pattern matching size #7242

Closed
vicuna opened this issue Apr 25, 2016 · 22 comments
Closed

something is quadratic in pattern matching size #7242

vicuna opened this issue Apr 25, 2016 · 22 comments

Comments

@vicuna
Copy link

vicuna commented Apr 25, 2016

Original bug ID: 7242
Reporter: @sliquister
Status: acknowledged (set by @gasche on 2016-04-26T00:53:49Z)
Resolution: open
Priority: low
Severity: minor
Version: 4.02.3
Target version: later
Category: middle end (typedtree to clambda)
Monitored by: @hcarty

Bug description

If I make a synthetic sum type and pattern match 10x bigger, the compiler takes ~100x longer to compile it. That makes for slow compilation times.
In my actual problem, the sum type is "only" 3500 constructors, but the patterns are more complicated and halving the number of constructors reduces the compilation time from 52s to 13s.

$ /tmp/blowup 6000

real 0m1.019s
user 0m0.989s
sys 0m0.026s

$ /tmp/blowup 60000

real 1m52.447s
user 1m52.021s
sys 0m0.227s

Here is some snapshot from perf (it's pretty much the same profile the whole duration):
28.74% camlParmatch__le_pat_1938
23.62% compare_val
16.50% camlParmatch__const_compare_1133
10.27% caml_page_table_lookup
6.61% camlList__exists_1136
4.17% caml_compare
3.66% camlParmatch__le_pats_1939
3.30% caml_c_call
1.48% caml_apply2
1.19% camlParmatch__fun_4365

Steps to reproduce

$ cat /tmp/blowup
#!/bin/bash

(
echo "type t ="
for ((i = 0; i < $1; i++)) do
echo "| A$i"
done
echo "let of_string = function"
for ((i = 0; i < $1; i++)) do
echo "| $i -> A$i"
done
echo "| _ -> assert false"
) > a.ml

time ocamlc.opt -i > /dev/null -c a.ml

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @sliquister

Well here is useful information: if I add -w -a on the command line, the compilation times are .16s at size 6000 and 1.5s at size 60_000.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @gasche

I would guess that the guilty function is (get_mins le ps),

https://github.com/ocaml/ocaml/blob/502e4f9/typing/parmatch.ml#L1500-L1507

Its current implementation is clearly quadratic, it is used for warnings only, and I guess you could implement it better, but it's not completely obvious to me how to do so. What it does is compute the list of minimal elements (those for which there is no lesser element) among a list of elements, for a partial order. Someone better brushed-up on partial order algorithmics could probably suggest a more clever implementation -- currently we check, for all patterns, whether any of the other patterns is smaller.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @gasche

Thinking about it a bit more, I'm not so sure there is an efficient generic implementation of get_mins. However, we could use the fact that the comparison function used is always le_pats, inclusion ordering between patterns, which gives us much more structure that an abstract partial order.

In particular, one could think of using term indexing structures; but we have or-patterns which could make this sensibly more delicate in the general case.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @sliquister

Warning 8 is the problem here in the small test case. Although turning that off only removes ~14s of the compilation time in my initial example. The compilation is still super slow, and when I look at a strack trace, the compiler is doing the following:

#0 0x0000000000567f1a in camlParmatch__const_compare_1133 ()
#1 0x000000000056d634 in camlParmatch__le_pat_1938 ()
#2 0x000000000056d584 in camlParmatch__le_pats_1939 ()
#3 0x0000000000603381 in camlList__exists_1136 ()
#4 0x0000000000566e6e in camlParmatch__select_rec_1976 ()
#5 0x000000000056d9d4 in camlParmatch__get_mins_1973 ()
#6 0x00000000005c2d52 in camlMatching__split_ex_1784 ()
#7 0x00000000005cada2 in camlMatching__split_precompile_1887 ()
#8 0x00000000005d2085 in camlMatching__compile_match_2975 ()
#9 0x00000000005d1741 in camlMatching__do_compile_matching_2977 ()
#10 0x00000000005d2101 in camlMatching__compile_match_2975 ()
#11 0x00000000005c6954 in camlMatching__c_rec_2860 ()
#12 0x00000000005d0c93 in camlMatching__compile_test_2889 ()
#13 0x00000000005d1511 in camlMatching__comp_match_handlers_2939 ()
#14 0x00000000005d2101 in camlMatching__compile_match_2975 ()
#15 0x00000000005d293a in camlMatching__compile_matching_3073 ()
#16 0x00000000005da552 in camlTranslcore__transl_function_1445 ()
#17 0x00000000005daecc in camlTranslcore__transl_exp0_1436 ()
#18 0x00000000005d5c6b in camlTranslobj__oo_wrap_1140 ()
#19 0x00000000005dae3d in camlTranslcore__transl_exp0_1436 ()
#20 0x00000000005d7032 in camlTranslcore__transl_1691 ()
#21 0x00000000005e7c47 in camlTranslmod__transl_store_1532 ()
#22 0x00000000005e7bab in camlTranslmod__transl_store_1532 ()
#23 0x00000000005eb103 in camlTranslmod__transl_store_structure_1527 ()
#24 0x00000000005d5865 in camlTranslobj__transl_store_label_init_1126 ()
#25 0x00000000005eb424 in camlTranslmod__transl_store_implementation_1648 ()
#26 0x00000000004a48a4 in camlOptcompile__comp_1048 ()
#27 0x00000000004a4e2c in camlOptcompile__implementation_1040 ()
#28 0x000000000044a5fb in camlOptmain__process_implementation_file_1011 ()
#29 0x0000000000624c2d in camlArg__treat_action_1096 ()
#30 0x0000000000625c1f in camlArg__parse_argv_dynamic_1074 ()
#31 0x0000000000625e71 in camlArg__parse_1128 ()
#32 0x000000000044b87e in camlOptmain__main_1484 ()
#33 0x000000000044cdf4 in camlOptmain__entry ()
#34 0x0000000000447249 in caml_program ()
#35 0x00000000006497d6 in caml_start_program ()
#36 0x000000000064b0f5 in __libc_csu_init ()
#37 0x00000035b721ed5d in __libc_start_main () from /lib64/libc.so.6
#38 0x0000000000446929 in _start ()

Just like for emitting warnings, the compiler calls Parmatch.get_mins, which could be quadratic, doing List.exists at each step on the rest of the list.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @sliquister

gasche, the comparison is not always le_pat, if you look in matching.ml, there's also le_pats and le_ctx.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @gasche

Yes, but le_ctx is "just" doing le_pat on its two elements. Developing an indexing structure over patterns would be a hassle (and I'm not sure how or-patterns would be handled), but I think it would suffice for both.

Note that it might be easier to have a partial solution that covers the example you care about. For example, if discriminating on the head constructor would suffice in your case (and you use no or-patterns), it should be noticeably easier.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2016

Comment author: @Octachron

For generic posets, there is at least a straightforward algorithm that achieves
O( number_of_elements * maximal length of antichains) rather than O (number_of_elements^2): see http://arxiv.org/abs/0707.1532 and https://github.com/Octachron/ocaml/blob/posets_for_parmatch/typing/parmatch.ml#L1501. I am not sure how much this would help.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @alainfrisch

I am not sure how much this would help.

It would not help for this particular example, since the entire set is an antichain. That said, it would be a good idea in general. It would also be good idea to optimize the equality test for constructor tags.

if discriminating on the head constructor would suffice in your case

This is the case.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @alainfrisch

Does anyone understand why there are two nested calls to select_rec in get_mins?

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @Octachron

Because select_rec only eliminates elements that are greater than any subsequent element, and reverse the list. For instance (adding a < parameter)

# select_rec (<) [] [ 1; 2; 5];;
- : int list = [5; 2; 1]

Two nested calls are thus needed to obtain a list whose elements are not greater than any subsequent and precedent element.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @maranget

May I, with due precaution, draw the attention to one of my articles "Warnings for pattern-matching" that describes the principles behind OCaml PM warnings.
In particular, see the "Measures" section.
http://moscova.inria.fr/~maranget/papers/warn/warn009.html#toc13

As far as I understand, the "I" experiment is similar to the blowup script. Two series of data are plotted :

  • Opt: compilation time with safeguards, that include get_mins
  • Std: compilation time without safeguards.

Both plots demonstrate quadratic behaviour... So reducing the complexity of get_mins may not be sufficient.

I have no solution for getting rid of the quadratic behaviour of PM warnings. Even more, someone may, someday, come here with an exponential behaviour.

Finally, here is a non-solution : as the code obviously is generated, you can
put the burden on the generator not to produce useless clauses or non-exhaustive matches and disable ocaml PM warnings.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @sliquister

Maranget: it's not just the warning code that is quadratic, it's also the pattern matching compilation, and obviously that can't be turned off. get_mins is the reason (or one reason perhaps) why compilation without warning is quadratic behavior.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @alainfrisch

It seems that get_mins accounts for less than half the total time even for N = 5000.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @maranget

Ok sliquister, thanks.

I am almost sure that get_mins is not the only responsible for PM compilation being quadratic. Another suspect is the data structure for the divide phase of the algorithm. This structure is an association list, cf. the
function add (about L1190 in bytecomp/matching.ml)

https://github.com/ocaml/ocaml/blob/trunk/bytecomp/matching.ml#L1190

That said, can you confirm that you consider compilation times of ".16s at size 6000 and 1.5s at size 60_000" to be prohibitive? Or did I miss something again?

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @sliquister

Let me try to clarify:

My actual source file with 3500 constructors takes 1min to build or so (a bit less without the warnings), so it's not the end of the world, and I didn't know it was sort of expected to have non linear complexity in the pattern matching part of the compiler. It's also the second time I see this problem, so it figured it'd be nice if the compiler was better behaved. I had "fixed" that other case by removing the [@@deriving of_sexp] creating the big pattern match, but I don't know if I can do that this time.

Now, in the synthetic case, .16s at size 6_000, 1.5s at size 60_000 is only when building with ocamlc.opt -i -w -a, ie no warnings and no compilation.
Without the -i, it's 1.6s at size 6_000 and 2m52s at size 60_000.

@vicuna
Copy link
Author

vicuna commented Apr 26, 2016

Comment author: @maranget

Thank you for the clarification. My first try would be with a better data structure for divide and see if this is worth the trouble.

As to warnings I do not think we can make it better. The reason is given at the beginning of reference "Warning for Pattern-Mathcing above" :
Assume that checking the usefulness of a clause is linear in size, then performing it for each clause in PM has a quadratic behaviour...

@vicuna
Copy link
Author

vicuna commented May 3, 2016

Comment author: @alainfrisch

As to warnings I do not think we can make it better.

The current algorithm might be quadratric, but I don't see how this justify that one cannot do better. If one could represent in a compact symbolic form the set of all values matched by previous clause, one might check a specific pattern in almost constant time. It doesn't seem difficult to derive an algorithm to compile and check all warnings, all in almost linear time, for the case reported here (a long "enum" type with one pattern per constructor).

@vicuna
Copy link
Author

vicuna commented Feb 18, 2017

Comment author: @xavierleroy

Ins't pattern-matching exhaustiveness an NP-complete problem? If so, @Frisch is being very optimistic indeed...

At any rate, I find the reported compilation times quite reasonable for a use of pattern-matching that is not (reasonable). (If you have 3500 constructors in a single datatype, you need to re-think how you model your problem in terms of types.) I move to resolve/suspend this issue.

@vicuna
Copy link
Author

vicuna commented Mar 10, 2017

Comment author: @mshinwell

I would like to disagree. Large numbers of variant constructors arise easily when, for example, binding large external C APIs into OCaml code. I think it's increasingly important as the use of the language becomes more widespread that people don't run into corner cases where the complexity of algorithms in the compiler becomes a limiting factor. One minute to compile any source file on a modern machine is a very long time. Not even the example in #7456 takes that long (about 30 seconds), and that is enormous.

@vicuna
Copy link
Author

vicuna commented Mar 10, 2017

Comment author: @xavierleroy

First, we need to grow a sense of priorities and carefully pick our fights against NP-complete problems.

Second, I stand by my remark that 3500 constructors in a single data type is a modeling error, and you shouldn't end up with that even when binding to "large external C APIs".

@github-actions
Copy link

github-actions bot commented May 9, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 9, 2020
@maranget
Copy link
Contributor

maranget commented May 9, 2020

I ran the experiment three years later, the behaviour is still quadratic, with one minute of CPU time for N=27000. I am in favor of closing.
exp01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants