@moewew
Created April 5, 2019 19:21
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[style=authoryear-comp]{biblatex}
\addbibresource{biblatex-examples.bib}
\renewbibmacro*{cite}{%
<\the\value{overallcitecount}/\the\value{overallcitetotal}>
[\the\value{citecount}/\the\value{citetotal}]}
\begin{document}
% leaking counts with \cite (probably the smuggling?)
A \cite{sigfridsson,worman,nussbaum}
B \cite{sigfridsson}
C \cites{sigfridsson,sigfridsson}
D \cites{sigfridsson}{worman}
E \cites{sigfridsson,sigfridsson}
RESET \setcounter{overallcitecount}{0}\setcounter{overallcitetotal}{0}
% comparison with \cites
A \cites{sigfridsson}{worman}{geer}
B \cite{sigfridsson}
C \cite{sigfridsson,worman}
D \cite{sigfridsson}
RESET \setcounter{overallcitecount}{0}\setcounter{overallcitetotal}{0}
% \forcsvlist over the list of keys does not drop duplicates, hence
% may give inflated values
A \cites{sigfridsson,worman,sigfridsson}
\end{document}
@PhelypeOleinik

Hello moewe,

I'm sorry for the epic delay to answer this. I've been busy with my dissertation project.

Continuing our discussion, I had some debugging time (more than I'd like to admit, but it was fun :) looking at the internals of BibLaTeX and managed to get your test file working properly :D

The leaking counter problem was, as you said, the smuggling. Well, not the smuggling, but the smuggler :-)
It worked fine for \cites because the multicite commands have two extra grouping levels which the normal citation commands don't. Thus, the smuggling for a normal citation command was making a (sort of) global assignment to the overallcitecount. More precisely, the last \endgroup you see in the definition of \blx@defcitecmd@v, around which the smuggling of overallcitecount happens, is a bottom level \endgroup for \cite, but not for \cites. I made the smuggling conditional, so it should work now.
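
The grouping effect described above can be reproduced outside biblatex. The following is a minimal sketch, not the actual biblatex code: `\demo@count`, `\demo@smuggle`, `\demoshallow` and `\demodeep` are hypothetical names, and the count register merely stands in for `overallcitecount`. It shows that a value smuggled past one `\endgroup` survives when that `\endgroup` is the bottom-level one, but is discarded when a further group still surrounds it.

```latex
\documentclass{article}
\makeatletter
% Hypothetical register standing in for overallcitecount.
\newcount\demo@count
% Carry the current (local) value of \demo@count across exactly one \endgroup:
% \the\demo@count is expanded first, then \endgroup runs, then the value
% is assigned again, locally, at the enclosing level.
\newcommand*{\demo@smuggle}{%
  \expandafter\endgroup\expandafter\demo@count\the\demo@count\relax}
% One grouping level, as assumed here for a normal citation command.
\newcommand*{\demoshallow}{%
  \begingroup
    \demo@count=7\relax
  \demo@smuggle}
% One extra grouping level, as assumed here for a multicite command.
\newcommand*{\demodeep}{%
  \begingroup
    \begingroup
      \demo@count=7\relax
    \demo@smuggle
  \endgroup}
\newcommand*{\showdemo}{\the\demo@count}
\makeatother
\begin{document}
\demodeep    After deep: \showdemo      % the outer \endgroup restores 0

\demoshallow After shallow: \showdemo   % 7 persists: the value has leaked
\end{document}
```

In this sketch the smuggling code is identical in both commands; only the surrounding group decides whether the assignment behaves like a global one, which is why making the smuggling conditional on the command type is one way out.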

The sorting problem is not that easy, unfortunately. As you said the multicite commands work in two passes, and the sorting is done only in the second pass, after the multicitetotal is already known, which is too late to count the overallcitetotal. Of course the overallcitetotal could be “estimated” (as I was doing in the previous version) then “corrected” at each citation group, after the sorting was done, but I think this would make it pretty pointless.

So I made what here in Brazil we would call a “gambiarra” :) (actually I made two, so you can choose).
The top-level macro responsible for setting up, then sorting and compressing the citation list is \blx@citeloop. Since most of the assignments in this macro are local, I used it to sort the citation list while counting the multicitetotal, so we can count the overallcitetotal as well in one go. The counting seems to be correct and there are apparently no side effects. At least, I compiled my dissertation project with this modified version of biblatex.sty and the produced PDF is identical to the one made with the biblatex.sty I didn't change :)

All in all, the code seems to work. The downside is the redundancy of sorting each citation list twice, but the sorting code doesn't seem to be that slow.

So here's the modified BibLaTeX code: https://pastebin.com/rDduuqf4
and the test document: https://pastebin.com/tf4QM8qg


I have another idea which might spare us from the sorting redundancy, but requires more intrusive changes to BibLaTeX (though nothing that drastic, I think). When you use a multicite command (you probably know this already, but) BibLaTeX gets more or less this on the input stream:

\begingroup
  \a@bunch@of@initialisation@macros
  % Here \blx@multiparse reads all the citation groups and inserts them in a `\blx@tempe'
  % macro, which then expands to something like this:
    \blxmcites {2}{}{}%
    \blxmciteicmd {1}{cite}{}{}{sigfridsson,sigfridsson}{}%
    \setunit {\multicitedelim }%
    \blxmciteicmd {2}{cite}{}{}{worman,sigfridsson}{}%
    % and more, one for each citation group...
    \blxendmcites
\endgroup

The \blx@tempe macro is extended with each citation group in the \blx@multicite@add macro (in which I added the extra sorting to handle the overallcitetotal). With a little more modification to \blx@multicite@add we could make the \blx@tempe macro (and consequently all further citation processing) use the already sorted citation list. This would basically move the sorting for multicite commands to an earlier stage than for other citation commands. To me this doesn't seem to have any further effect, but I'm not 100% sure.

If you would like to move forward with this overallcitetotal thing I can implement the idea above, just say the word.


For now this is it.

Hope you're well.

Best, Phelype

@moewew
Author

moewew commented May 18, 2019

Thank you very much for looking into this again. And sorry for not replying sooner.

The smuggling business still feels a bit weird to me especially given the different levels of grouping that we have to care about (as you found out yourself). So global variables still feel a bit safer to me, but that would mean initialising those variables properly, which is probably also going to be a hassle. ...

Of the two options for the sorting thingy I think I prefer option 2. Option 1 is very clever and I'd probably prefer it for an answer on TeX.SX, but if this goes into the package itself it feels wrong to manipulate the macros in such a way. It might be worth thinking about splitting the bits of the original loop macro that we need off into a new macro and use that as a basis for the normal cite and the multicite loop. That way we could avoid the code duplication by externalising the shared code to a macro that is used in both situations. (Come to think of it that might in the end even look similar to option 1.)

The point about sorting multicites before things even start is very relevant and in fact I also thought that it might be worth trying to tackle overallcitecount and sorting-multicites (plk/biblatex#214) together. Unfortunately, the last time I tried to look into sorting multicites there were too many details that made my head spin, so I gave up. The main issue is that each 'multicite group' may contain multiple citations (\cites[34]{sigfridsson,worman}[cf.][45]{nussbaum,geer}) and it is not clear to me how this should be sorted: Should we sort only the groups or the entire list of keys over all groups? How should a group with several items sort (by first item in the unsorted list, by first item when sorted; how would we break ties ...)? Plus, we need to keep track of the pre- and postnotes.

@PhelypeOleinik

PhelypeOleinik commented May 18, 2019

Don't worry about the time, I'm doing this for procrastination fun, so there's no need to hurry :)

About the smuggling, I think it's mostly safe, the issue was that I completely overlooked normal citation commands in my first attempt. But I think switching to a global counter is relatively easy. The initialisation of the counter is done in non-global mode anyhow, so I would just need to make the assignments global. The only additional thing that would be necessary (I think) is a cleanup at the end of the citation command to reset the counter to zero (or some special value, say -1, to indicate an “uninitialised” value). On the other hand one might want the value of overallcitetotal after the citation command for whatever reason, so perhaps the cleanup should be in the beginning of the citation command. I'm open to suggestions here.
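
The "global counter, reset at the beginning" variant can be sketched in a few lines. This is an illustration under stated assumptions, not biblatex code: `demototal`, `\democountkey` and `\democite` are hypothetical names, and the loop uses etoolbox's `\forcsvlist`, which (as noted in the test file) does not drop duplicate keys.

```latex
\documentclass{article}
\usepackage{etoolbox}
% Hypothetical stand-in for overallcitetotal.
\newcounter{demototal}
% Count one key; \stepcounter assigns globally.
\newcommand*{\democountkey}[1]{\stepcounter{demototal}}
% Hypothetical citation command: the counter is reset on entry, not on exit,
% so its value remains readable after the command has finished.
\newcommand*{\democite}[1]{%
  \setcounter{demototal}{0}%
  \begingroup
    \forcsvlist{\democountkey}{#1}%
    [\thedemototal\ keys]%
  \endgroup}
\begin{document}
\democite{a,b,c} Last total: \thedemototal.

\democite{d} Last total: \thedemototal.
\end{document}
```

Because the reset happens at the start of each call, the second call does not inherit the total of the first, and no grouping level has to be tracked.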

About the sorting, if we were to change more of BibLaTeX's code it would be ideal to have a dedicated sorting macro which would do exclusively that, so it could be safely used anywhere without clever misuse of macros :) I'd need to understand exactly what each part of \blx@citeloop does to split the code without breaking anything. Now that we're at it: does BibLaTeX have a set of tests which I can use to check how much code I have broken?

I'm not sure I understood correctly what you meant with the sorting of multicite commands. If I did, I think that the sorting should remain as it is: each group sorted individually (i.e., \cites{b,a}{b,b} would be \cites{a,b}{b}) and the groups left in input order (i.e., \cites{b}{a} would remain that way). I think that sorting over all groups is problematic, first because of the ambiguity of \cites{a,b}{b}: would it be \cites{a}{b} or \cites{a,b}? This would get worse with pre- and postnotes because the citations could get misplaced relative to the notes. Furthermore, from what I understand the whole purpose of the multicite commands is for the user to separate the citations into logical groups, so I think this should remain. What could be done, however, is to issue a warning in cases like \cites{a,b}{b} saying that b is duplicated. If things are done this way the pre- and postnotes shouldn't be a problem at all when sorting the multicite groups (I might have overlooked something, though. I do that often :).
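
The per-group behaviour proposed above (sort and de-duplicate inside each group, keep the groups in input order, warn about keys repeated across groups) can be sketched with expl3. Everything named `demo` here is hypothetical, and sorting alphabetically by key is only a stand-in: biblatex sorts by the order Biber provides, not by key. The sketch assumes a LaTeX kernel from 2021 or later.

```latex
\documentclass{article}
% Assumes a 2021 or later kernel (expl3 preloaded, \str_compare:nNnTF available).
\ExplSyntaxOn
\clist_new:N \l_demo_group_clist % keys of the group being processed
\seq_new:N   \l_demo_seen_seq    % keys already used by earlier groups
\msg_new:nnn { demo } { duplicate }
  { Key~'#1'~already~appeared~in~an~earlier~group. }
% Sort one group alphabetically by key and drop repeats inside it,
% then warn about keys that an earlier group already used.
\cs_new_protected:Npn \demo_group:n #1
  {
    \clist_set:Nn \l_demo_group_clist {#1}
    \clist_remove_duplicates:N \l_demo_group_clist
    \clist_sort:Nn \l_demo_group_clist
      {
        \str_compare:nNnTF {##1} > {##2}
          { \sort_return_swapped: }
          { \sort_return_same: }
      }
    \clist_map_inline:Nn \l_demo_group_clist
      {
        \seq_if_in:NnTF \l_demo_seen_seq {##1}
          { \msg_warning:nnn { demo } { duplicate } {##1} }
          { \seq_put_right:Nn \l_demo_seen_seq {##1} }
      }
    [ \clist_use:Nn \l_demo_group_clist { ,~ } ]
  }
% Two groups, processed in input order; the groups themselves are not sorted.
\NewDocumentCommand \demogroups { m m }
  {
    \seq_clear:N \l_demo_seen_seq
    \demo_group:n {#1} ~ \demo_group:n {#2}
  }
\ExplSyntaxOff
\begin{document}
\demogroups{b,a}{b,b}
\end{document}
```

With the input `{b,a}{b,b}` the first group is reordered, the second collapses to a single key, and the log receives a warning for the key shared by both groups. Pre- and postnotes are not modelled here; since they would stay attached to their group, this scheme would not move them.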

I'll leave you my mail if you want to reach me there: moc.liamg ⟨at⟩ 1o.h.ehp (reversed :)

P.S.: I subscribed to this Gist to get notifications :)

@moewew
Author

moewew commented May 19, 2019

There is precedent for using global variables and (re-)initialising them at the beginning of the relevant commands (at least on a .cbx level, see for example https://github.com/plk/biblatex/blob/master/tex/latex/biblatex/cbx/authoryear-icomp.cbx), so that would not be unnatural.

Unfortunately, we only have a limited test suite written in Perl: https://github.com/plk/biblatex/tree/dev/obuild. I usually run it as follows from the root of the git folder:

./obuild/build.sh install 3.13 ~/texmf
./obuild/build.sh test
./obuild/build.sh testoutput

You'll need a matching version of Biber in your PATH. If you start from current dev that would be the dev version from sourceforge (https://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/development/binaries/). If you start from master Biber 2.12 should be fine.

The test suite only runs on the files in https://github.com/plk/biblatex/tree/dev/doc/latex/biblatex/examples and thus only tests a very limited subset of biblatex's functionality. Joseph worked on l3build tests a while ago (https://github.com/plk/biblatex/tree/l3build-tests), but since most things we'd want to test are typesetting-related and not of the pure function evaluation type that proved difficult.

Sorting multicites is a feature request that comes up from time to time. People see that \cite{sigfridsson,worman,nussbaum,geer} gets sorted if they turn on sortcites=true and then also want \cites{sigfridsson}{worman}{nussbaum}{geer} to give the same result (plk/biblatex#214 is a bit low on details, but https://tex.stackexchange.com/q/65809/35864 has an MWE that shows what I mean). I thought that if we are digging into the implementation of the multicite commands anyway and have to look at sorting, we might as well see if something like this is feasible. But there are quite a few conceptual issues here.
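
For reference, a small document that sets the two commands side by side. This is a sketch written for this thread, not the MWE from the linked question; the `numeric` style is an arbitrary choice for the illustration.

```latex
\documentclass{article}
\usepackage[style=numeric, sortcites=true]{biblatex}
\addbibresource{biblatex-examples.bib}
\begin{document}
% One key list: sortcites=true sorts the keys before printing.
\cite{sigfridsson,worman,nussbaum,geer}

% Four groups with one key each: the groups stay in input order.
\cites{sigfridsson}{worman}{nussbaum}{geer}

\printbibliography
\end{document}
```

The feature request amounts to asking that the second command print the same order as the first.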

@PhelypeOleinik

Okay, I'll make the counter global with initialisation at the beginning of the citation command.

The test suite is something to start with, I'll use it. Random thought: testing typesetting is tricky, but I think that the output of \loggingoutput might be something to start with. Given the same settings, the boxes TeX makes should remain the same. Plus, with pdfTeX and LuaTeX it is possible to have reproducible PDFs (https://tex.stackexchange.com/q/440270/134574), so that might be used as well. I'll learn how to use l3build to do something about that.
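
A possible starting point for such a test file, for pdfTeX only, is sketched below. The three `\pdf...` primitives are pdfTeX-specific and remove data that normally differs from run to run; LuaTeX uses different names for these settings, so this preamble is not meant to be compiled with LuaLaTeX. Whether this is sufficient for byte-identical output in every setup is an assumption, not something verified here.

```latex
\documentclass{article}
% pdfTeX only: omit the parts of the PDF that change between otherwise equal runs.
\pdfinfoomitdate=1       % no creation and modification dates in the info dictionary
\pdfsuppressptexinfo=-1  % no PTEX.* entries
\pdftrailerid{}          % no time-dependent ID in the trailer
% Write the contents of every shipped-out box to the log file.
\loggingoutput
\begin{document}
Text whose boxes should be identical from run to run.
\end{document}
```

Two runs with the same settings can then be compared either through the box dumps in the log or through the PDF files themselves.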

I think that sorting the multicites will be awkward, but it can be done. However, I believe that it should not be activated by sortcites. I think that a new option (a good name would be IKnowThatThisIsn'tThePurposeOfMulticitesButI'llDoThisAnyway ;-) should be added to make that work. Certainly possible, given enough code :)

I'm working on it. Right now I'm trying to understand how the sorting algorithm works so I can start messing around with it. As soon as I have something new I'll send it to you.
