This is a report of what I did to have "make doc-stage-1" complete
successfully.

A previous report is at:
https://ao2.it/tmp/lilypond-guile2/NOTES.txt

Check out the lilypond master branch:

  git clone git://git.sv.gnu.org/lilypond.git lilypond.git
  cd lilypond.git

Create a new test branch:

  git checkout -b guile-v2-local-tests

Do we still need the changes in the dev/guilev2 and dev/guilev21 branches?
Cherry-pick them:

  git cherry-pick 1001e63b2d54cc970374f5be98f7f31e26554c69
  git cherry-pick 122525fbd9f95521b80487fe7567f3d38871ebc5

Or equivalently apply patches 0001 and 0002 from
https://ao2.it/tmp/lilypond-guile2

Apply patch 0003:
https://ao2.it/tmp/lilypond-guile2/0003-Update-changes-from-commit-122525f-Keep-GUILEv2-from.patch

Apply patch 0004:
https://ao2.it/tmp/lilypond-guile2/0004-Fix-the-GUILE-autoconf-variable-substitution-with-gu.patch

Run ./autogen.sh enabling guile2, then build:

  ./autogen.sh --enable-guile2 --enable-debugging --disable-checking --disable-optimising --prefix=$PWD/build
  make

Just to be sure, reset the locale when running lilypond (I had
LANG=it_IT.utf8), otherwise the eps files might end up containing numbers
formatted with a comma as the decimal separator, and that makes the
conversion from eps to pdf fail with a message like this:

-------------------------------------------------------------------------------
$ gs -q -dNOSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200 -sDEVICE=pdfwrite -sOutputFile=/home/ao2/Proj/debian/Src/lilypond/lilypond.git/out/lybook-db/94/lily-32d78681.pdf -c.setpdfwrite -f/home/ao2/Proj/debian/Src/lilypond/lilypond.git/out/lybook-db/94/lily-32d78681.eps
Error: /undefined in 7,3977
Operand stack:
Execution stack:
   %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop --nostringval-- 1966 1 3 %oparray_pop 1852 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
   --dict:1207/1684(ro)(G)-- --dict:0/20(G)-- --dict:113/200(L)--
Current allocation mode is local
Last OS error: No such file or directory
Current file position is 8634
GPL Ghostscript 9.19: Unrecoverable error, exit code 1
-------------------------------------------------------------------------------

I noticed the above when running lilypond directly from the command line,
but just to be safe and avoid that particular issue, reset the locale also
when calling "make":

  LANG=C make doc-stage-1

The build starts but at some point it halts with a Segmentation Fault:

-------------------------------------------------------------------------------
Processing `/home/ao2/Proj/debian/Src/lilypond/lilypond.git/out/lybook-db/57/lily-2c7e8231.ly'
Parsing...
Renaming input to: `markup-cyclic-reference.ly'
...
Writing /home/ao2/Proj/debian/Src/lilypond/lilypond.git/out/lybook-db/57/lily-2c7e8231-systems.count...Errore di segmentazione
-------------------------------------------------------------------------------

This can be reproduced with:

  $ LANG=C ./out/bin/lilypond input/regression/markup-cyclic-reference.ly

Even by looking at the backtrace from gdb:
https://ao2.it/tmp/lilypond-guile2/gdb_backtrace_SIGSEGV_markup-cyclic-reference.ly.log
I could not find the root cause. However, from a higher level I observed
that the issue happens when a cyclic markup command is called _twice_.
In fact, a patch like the following works around the segfault:

-------------------------------------------------------------------------------
diff --git a/input/regression/markup-cyclic-reference.ly b/input/regression/markup-cyclic-reference.ly
index 82bfe06..b94023d 100644
--- a/input/regression/markup-cyclic-reference.ly
+++ b/input/regression/markup-cyclic-reference.ly
@@ -23,4 +23,4 @@ not crash LilyPond with an endless loop"
 \markup { \cycle "a" }
-\markup { \cycleI "a" }
+%\markup { \cycleI "a" }
-------------------------------------------------------------------------------

Since the segfault was triggered by the _second_ cyclic markup command, I
suspected that some unsafe state was left behind when the max cycle depth
was reached the first time. I looked where the max depth was used, and ended
up at lily/text-interface.cc, specifically at
Text_interface::interpret_markup().

After reading the manual about the Dynamic Wind functionality
https://www.gnu.org/software/guile/manual/html_node/Dynamic-Wind.html
I noticed that in one of the exit paths the dynamic extent was not ended
properly.

Fixing that seems to fix the segfault, see:
https://ao2.it/tmp/lilypond-guile2/0005-Fix-ending-the-dynamic-extent-in-Text_interface-inte.patch

A small patch is also needed for markup-cyclic-reference.ly:
https://ao2.it/tmp/lilypond-guile2/0006-Fix-the-expected-warning-with-guile-2-in-markup-cycl.patch

After this, "make doc-stage-1" goes on with no more guile errors on the
console, but at some point one lilypond invocation gets stuck: after more
than one hour it had still not finished.
If I attach gdb to the stuck process and print the backtrace I get this:

-------------------------------------------------------------------------------
(gdb) bt
#0  0x00002abb762e2e13 in st_resize_port (pt=0x55b39540b4e0, new_size=0) at strports.c:111
#1  0x00002abb762e2f8d in st_flush (port=0x55b392370210) at strports.c:145
#2  0x00002abb762e30bb in st_write (port=0x55b392370210, data=0x2abb76342f7a, size=1) at strports.c:172
#3  0x00002abb762a6333 in scm_lfwrite (ptr=0x2abb76342f7a "(", size=1, port=0x55b392370210) at ports.c:1581
#4  0x00002abb7627059f in scm_puts (s=0x2abb76342f7a "(", port=0x55b392370210) at ../libguile/inline.h:132
#5  0x00002abb762ace1f in scm_iprlist (hdr=0x2abb76342f7a "(", exp=0x55b392a7d8e0, tlr=41, port=0x55b392370210, pstate=0x55b3923da780) at print.c:1348
#6  0x00002abb762ab01b in iprin1 (exp=0x55b392a7d8e0, port=0x55b392370210, pstate=0x55b3923da780) at print.c:617
#7  0x00002abb762aabae in scm_iprin1 (exp=0x55b392a7d8e0, port=0x55b392370210, pstate=0x55b3923da780) at print.c:546
#8  0x00002abb762abc5d in scm_prin1 (exp=0x55b392a7d8e0, port=0x55b392370210, writingp=1) at print.c:848
#9  0x00002abb762ad30d in scm_write (obj=0x55b392a7d8e0, port=0x55b392370210) at print.c:1461
#10 0x00002abb762f64f3 in vm_regular_engine (vm=0x55b391b80de0, program=0x55b391b71080, argv=0x7fff5a4f3a30, nargs=2) at vm-i-system.c:858
#11 0x00002abb76316f08 in scm_c_vm_run (vm=0x55b391b80de0, program=0x55b391b71080, argv=0x7fff5a4f3a20, nargs=2) at vm.c:761
#12 0x00002abb762464bb in scm_call_2 (proc=0x55b391b71080, arg1=0x55b392a7d8e0, arg2=0x55b392370210) at eval.c:493
#13 0x000055b38f6c6a14 in ly_scm_write_string[abi:cxx11](scm_unused_struct*) (s=0x55b392a7d8e0) at lily-guile.cc:59
#14 0x000055b38f6c8002 in print_scm_val[abi:cxx11](scm_unused_struct*) (val=0x55b392a7d8e0) at lily-guile.cc:392
#15 0x000055b38f6c8769 in type_check_assignment (sym=0x55b391dcf420, val=0x55b392a7d8e0, type_symbol=0x55b392540640) at lily-guile.cc:444
#16 0x000055b38f62a9f8 in Grob::internal_get_property_data (this=0x55b39474ea80, sym=0x55b391dcf420) at grob-property.cc:151
#17 0x000055b38f62add7 in Grob::internal_get_pure_property (this=0x55b39474ea80, sym=0x55b391dcf420, start=0, end=2147483647) at grob-property.cc:194
#18 0x000055b38f62aec4 in Grob::internal_get_maybe_pure_property (this=0x55b39474ea80, sym=0x55b391dcf420, pure=true, start=0, end=2147483647) at grob-property.cc:212
...
#88 ...
-------------------------------------------------------------------------------

The full backtrace is also in
https://ao2.it/tmp/lilypond-guile2/gdb_backtrace_lilypond_stuck_in_a_loop.log

The problem can be reproduced with this script:
https://ao2.it/tmp/lilypond-guile2/reproduce_lockup.sh

By following the gdb backtrace and adding some printfs
(https://ao2.it/tmp/lilypond-guile2/__ly_scm_write_string_LOCKUP_PRINTFs.diff),
I verified that scm_call_2() never returns when it is reached (through
ly_scm_write_string()) from print_scm_val() in type_check_assignment().

I don't know the root cause, but I was able to work around the issue and get
on with the compilation, see:
https://ao2.it/tmp/lilypond-guile2/0007-XXX-Avoid-the-lockup-in-ly_scm_write_string.patch

This makes the warnings look like the following:

  warning: type check for `staff-staff-spacing' failed; value `PLACEHOLDER' must be of type `list'

However, even with this issue worked around, lilypond still gives errors
when non-ASCII characters are used in macro names. The issue can be
reproduced with:
https://ao2.it/tmp/lilypond-guile2/catalan_example.ly

Maybe a regression test about this issue could be added in
input/regression.

A workaround is to only use ASCII characters in macro names:
https://ao2.it/tmp/lilypond-guile2/0008-XXX-avoid-errors-with-non-ASCII-characters-in-macros.patch

After that last patch "make doc-stage-1" and "make doc" complete
successfully.


MISC MINOR ISSUES.

Some things I noticed during my journey:

1. lily/main.cc contains a conditional section guarded by GUILE2, but this
   code is never compiled in because the build system defines GUILEV2 (with
   the 'V'). Can that section be removed? Compilation seems to have worked
   even without it.

2. I don't remember the exact setup to reproduce the issue, but one time I
   spotted a deprecation message:

     "`scm_take_str' is deprecated. Use scm_take_locale_stringn instead."

   And I see that there is one instance of "scm_take_str" in the code.

3. There are indications about using "string->utf16" instead of
   "ly:encode-string-for-pdf" with guile-2.0 in scm/framework-ps.scm and
   lily/pdf-scheme.cc.

   This made me notice that in ly_encode_string_for_pdf() memory obtained
   from g_convert() is passed to scm_take_str(), which will eventually call
   free() on it. However, g_convert() may have obtained that memory through
   glib allocator functions _different_ from malloc(), in which case calling
   free() on it may not be appropriate.