@tarleb
Created May 22, 2020 06:28
Filter to highlight some authors in the bibliography
local List = require 'pandoc.List'
local utils = require 'pandoc.utils'
local stringify = utils.stringify

-- Returns a function that wraps matching authors of a reference in Strong.
function highlighter(given_name_pattern, family_name_pattern)
  local highlight_author = function (author)
    local given = author.given and stringify(author.given)
    local family = author.family and stringify(author.family)
    -- Both names must match their respective Lua pattern.
    if given and given:match(given_name_pattern) and
       family and family:match(family_name_pattern) then
      author.given = {pandoc.Strong(setmetatable(author.given, nil))}
      author.family = {pandoc.Strong(setmetatable(author.family, nil))}
    end
    return author
  end
  return function(reference)
    if reference.author and reference.author.map then
      reference.author = reference.author:map(highlight_author)
    end
    return reference
  end
end

function Pandoc (doc)
  local meta = doc.meta
  -- Convert the bibliography to YAML with pandoc-citeproc.
  local fh = io.popen(
    "pandoc-citeproc --bib2yaml "
    .. stringify(meta.bibliography)
  )
  if io.type(fh) ~= 'file' then return end
  local bibyaml = fh:read('*a')
  fh:close()
  local references = pandoc.read(bibyaml).meta.references
  -- Replace the bibliography file with the modified inline references.
  meta.bibliography = nil
  meta.references = references:map(
    highlighter(stringify(meta['given-name-pattern']),
                stringify(meta['family-name-pattern']))
  )
  -- Let pandoc-citeproc process the citations with the new references.
  return utils.run_json_filter(
    pandoc.Pandoc(doc.blocks, meta),
    'pandoc-citeproc'
  )
end
@tarleb
Author

tarleb commented May 22, 2020

Usage

Define bibliography and family/given name patterns in metadata:

---
bibliography: my-articles.bib
given-name-pattern: Jane
family-name-pattern: Doe
---
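
Note that the filter passes both values to Lua's string `match`, so they behave as Lua patterns rather than literal strings. The following is a minimal sketch in plain Lua (no pandoc needed) of what that implies. The `matches` helper is hypothetical and exists only for illustration, and how special characters survive pandoc's parsing of the metadata block is not covered here.

```lua
-- Hypothetical helper mirroring the check in the filter:
-- both the given and the family name must match their pattern.
local function matches(given, family, given_pattern, family_pattern)
  return given:match(given_pattern) ~= nil
    and family:match(family_pattern) ~= nil
end

-- Unanchored patterns match anywhere in the name.
print(matches("Jane", "Doe", "Jane", "Doe"))              -- true
print(matches("Mary Jane", "Doering", "Jane", "Doe"))     -- true (substring match)

-- Anchoring with ^ and $ requires the whole name to match.
print(matches("Mary Jane", "Doering", "^Jane$", "^Doe$")) -- false
print(matches("Jane", "Doe", "^Jane$", "^Doe$"))          -- true

-- A dot is a wildcard; escape it with % to match a literal period.
print(matches("JxM", "Doe", "^J.M", "Doe"))               -- true
print(matches("JxM", "Doe", "^J%.M", "Doe"))              -- false
```

So a pattern such as Doe would also highlight an author named Doering, which may or may not be what you want.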

@gnpan

gnpan commented Feb 6, 2021

An alternative I use, as suggested by EBkysko and BP in pandoc-discuss, will highlight e.g. Smith, J. in all references (pandoc.Underline or pandoc.Emph can be used instead of pandoc.Strong). Maybe you could modify it so that surname and initials are defined in the metadata, although it's easy to just replace them in the filter:

local highlight_author_filter = {
  Para = function(el)
    if el.t == "Para" then
      for k,_ in ipairs(el.content) do
        -- "%." matches a literal period; a bare "." would match any character.
        -- The nil checks guard against "Smith," near the end of the paragraph.
        if el.content[k].t == "Str" and el.content[k].text == "Smith,"
        and el.content[k+1] and el.content[k+1].t == "Space"
        and el.content[k+2] and el.content[k+2].t == "Str"
        and el.content[k+2].text:find("^J%.") then
          local _,e = el.content[k+2].text:find("^J%.")
          local rest = el.content[k+2].text:sub(e+1)
          el.content[k] = pandoc.Strong { pandoc.Str("Smith, J.") }
          el.content[k+1] = pandoc.Str(rest)
          table.remove(el.content, k+2)
        end
      end
    end
    return el
  end
}

function Div (div)
  if 'refs' == div.identifier then
    return pandoc.walk_block(div, highlight_author_filter)
  end
  return nil
end

Notes:

  1. The above works if you use an author-date format csl. If you want to use a numeric format csl (e.g. ieee.csl or nature.csl) you will need to substitute Span for Para in the filter, i.e.:
  Span = function(el)
    if el.t == "Span" then
  2. If you also want to use the multiple-bibliographies lua filter, it should go before the author highlight filter. And 'refs' should be 'refs_biblio1' or 'refs_biblio2' etc, depending on how you have defined them:
function Div (div)
  if div.identifier == 'refs' or div.identifier == 'refs_biblio1' or div.identifier == 'refs_biblio2' then

For pdf output, you will also need to add -V csl-refs in the pandoc command if you use a numeric format csl.

  3. The filter highlights Smith, J. if formatted in this order by the csl. Some csl will use this format for the first author and then switch to J. Smith for the rest, so you will have to adjust the filter accordingly, adding an extra if el.content[k].t == "Str"…etc. Converting to .json first will help to check the correct formatting in the AST.
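
As for taking the surname and initials as parameters instead of hard-coding them, here is a rough sketch of how the matching logic could be factored out. It runs on plain Lua tables shaped like the inlines in the filter (`t` and `text` fields), so it can be tried without pandoc. The names `highlight_name`, `wrap` and `make_str` are made up for this example and are not part of pandoc's API.

```lua
-- Sketch only: `highlight_name`, `wrap` and `make_str` are hypothetical names.
-- `inlines` is a list of tables such as {t = "Str", text = "Smith,"}.
local function highlight_name(inlines, family, initials, wrap, make_str)
  make_str = make_str or function(s) return {t = "Str", text = s} end
  local k = 1
  while k + 2 <= #inlines do
    local a, b, c = inlines[k], inlines[k + 1], inlines[k + 2]
    -- Plain prefix comparison, so periods need no pattern escaping.
    if a.t == "Str" and a.text == family .. ","
       and b.t == "Space"
       and c.t == "Str" and c.text:sub(1, #initials) == initials then
      local rest = c.text:sub(#initials + 1)
      inlines[k] = wrap(family .. ", " .. initials)
      table.remove(inlines, k + 2)
      if rest ~= "" then
        inlines[k + 1] = make_str(rest)   -- keep trailing punctuation
      else
        table.remove(inlines, k + 1)      -- nothing left after the initials
      end
    end
    k = k + 1
  end
  return inlines
end

-- Usage with stand-in data for "Smith, J.L., Doe, A."
local inlines = {
  {t = "Str", text = "Smith,"}, {t = "Space"},
  {t = "Str", text = "J.L.,"}, {t = "Space"},
  {t = "Str", text = "Doe,"}, {t = "Space"},
  {t = "Str", text = "A."},
}
local function wrap(text) return {t = "Strong", text = text} end
highlight_name(inlines, "Smith", "J.L.", wrap)
print(#inlines, inlines[1].t, inlines[1].text, inlines[2].text)
```

Inside a real filter one would presumably pass el.content as `inlines`, a function returning pandoc.Strong { pandoc.Str(text) } as `wrap`, and pandoc.Str as `make_str`, with the two strings read from the metadata; I have not tested that wiring. Because the check is a prefix comparison, "J." also matches "J.L.", so pass the longest form of the initials you want highlighted.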

@tarleb
Author

tarleb commented Feb 6, 2021

The advantage of @gnpan's approach is that it also works with newer pandoc versions which have the citeproc converter built-in.

@hanlonmt

I've gotten @gnpan's approach to work well, but I'm having trouble trying to capitalize two initials (i.e. Smith, J.L.). I've tried a bunch of things, but nothing is working well. Any tips?

@gnpan

gnpan commented Jun 17, 2021

Do you mean that you have both Smith, J. and Smith, J.L.? In this case, add a second block, e.g.:

local highlight_author_filter = {
  Para = function(el)
    if el.t == "Para" then
      for k,_ in ipairs(el.content) do
        -- Check the longer form first. "%." matches a literal period;
        -- a bare "." would match any character.
        if el.content[k].t == "Str" and el.content[k].text == "Smith,"
        and el.content[k+1] and el.content[k+1].t == "Space"
        and el.content[k+2] and el.content[k+2].t == "Str"
        and el.content[k+2].text:find("^J%.L%.") then
          local _,e = el.content[k+2].text:find("^J%.L%.")
          local rest = el.content[k+2].text:sub(e+1)
          el.content[k] = pandoc.Strong { pandoc.Str("Smith, J.L.") }
          el.content[k+1] = pandoc.Str(rest)
          table.remove(el.content, k+2)
        end

        -- Skipped after a match above, because el.content[k] is then a Strong.
        if el.content[k].t == "Str" and el.content[k].text == "Smith,"
        and el.content[k+1] and el.content[k+1].t == "Space"
        and el.content[k+2] and el.content[k+2].t == "Str"
        and el.content[k+2].text:find("^J%.") then
          local _,e = el.content[k+2].text:find("^J%.")
          local rest = el.content[k+2].text:sub(e+1)
          el.content[k] = pandoc.Strong { pandoc.Str("Smith, J.") }
          el.content[k+1] = pandoc.Str(rest)
          table.remove(el.content, k+2)
        end
      end
    end
    return el
  end
}

@hanlonmt

Thanks @gnpan. I tried making the same mods that you suggested previously and tried your code, but got these errors.

Error running filter highlight_author_filter.lua:

PandocLuaError "Error during function call: highlight_author_filter.lua:23: attempt to perform arithmetic on a nil value (local 'e')\nstack traceback:\n\thighlight_author_filter.lua:23: in function <highlight_author_filter.lua:6>\n\t[C]: in ?\n\t[C]: in field 'walk_block'\n\thighlight_author_filter.lua:41: in function 'Div'"
stack traceback:
	[C]: in field 'walk_block'
	highlight_author_filter.lua:41: in function 'Div'
Error: pandoc document conversion failed with error 83
Execution halted

I ended up switching to a citation style without periods between initials and got it to work fine, so I'm sticking with that.

@gnpan

gnpan commented Jun 17, 2021

It may be worth running pandoc -t json first and checking the resulting file to see how the initials and periods are shown in the AST.
Can't help with the error, maybe Albert has some ideas.
George

@tarleb
Author

tarleb commented Jun 17, 2021

Sounds like the find patterns are inconsistent: the one in the if condition probably returns a truthy value, but the latter one doesn't match.
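
To illustrate the kind of mismatch meant here (a guess, since the modified file was not posted), a small plain-Lua sketch: a lenient pattern in the condition succeeds, a stricter one afterwards returns nil, and the arithmetic on `e` then fails. The `split_initials` helper is hypothetical, shown only as one way to call find once and reuse its result.

```lua
-- An unescaped "." matches any character, so this lenient check succeeds
-- even on text with no period between the initials ...
local text = "JL,"
print(text:find("^J."))        -- 1  2

-- ... while a stricter pattern used afterwards finds nothing.
local _, e = text:find("^J%.L%.")
print(e)                       -- nil

-- `text:sub(e + 1)` would now fail with
-- "attempt to perform arithmetic on a nil value (local 'e')".
-- Calling find once and reusing its result avoids the mismatch:
local function split_initials(s, pattern)
  local _, last = s:find(pattern)
  if not last then return nil end
  return s:sub(1, last), s:sub(last + 1)
end

print(split_initials("J.L.,", "^J%.L%."))  -- J.L.   ,
print(split_initials("JL,", "^J%.L%."))    -- nil
```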

@kjayhan

kjayhan commented Sep 13, 2022

Any updates on this? Including [highlight-author.lua] in the _output.yml throws the following error:

Error running filter lua/highlight-author.lua:
PandocLuaError "Cannot get Attr from TypeNil"
stack traceback:
lua/highlight-author.lua:32: in function 'Pandoc'
Error: pandoc document conversion failed with error 83
Execution halted

Meanwhile, the "local highlight_author_filter = {
Para = function(el) ..." approach does not do the highlighting trick for me.

Thanks in advance.

@gnpan

gnpan commented Sep 13, 2022

The highlight-author.lua filter only works with old versions of pandoc that use pandoc-citeproc. The other filter should work, but you have to be careful about how your name and initials are output by the csl you are using. Try converting to a .json file first to see how your csl treats initials, commas, periods, etc and make sure that you copy the strings correctly into the filter. I did a quick search in Scopus for Ayhan and created a short bibliography to test (as .json file, can also be .bib) - not sure if this is you, but you can adjust accordingly. I include below a text with two references (save as test.txt), the lua filter modified with your name (save as test.lua), the bibliography (save as biblio.json) and a csl file that I use to create a cv with my name underlined (so it puts all the details in the citation, not in a bibliography at the end - save as CiteOnly.csl). The command:

pandoc --citeproc --bibliography=biblio.json --csl=CiteOnly.csl -L test.lua -o test.pdf test.txt

should give you the correct result with the name underlined - or use -o test.html if you don't want pdf. Then try with your csl file; if it doesn't work, convert to .json (pandoc -t json etc.), check how your initials are formatted and adjust the lua filter.

Files:
test.txt

@Ayhan2022872

@Varpahovskis202252

test.lua

function Inline (el)
  if el.t == "Cite" then
    for k,_ in ipairs(el.content) do
      -- "%." matches a literal period; a bare "." would match any character.
      -- The nil checks guard against "Ayhan," near the end of the citation.
      if el.content[k].t == "Str" and el.content[k].text == "Ayhan,"
      and el.content[k+1] and el.content[k+1].t == "Space"
      and el.content[k+2] and el.content[k+2].t == "Str"
      and el.content[k+2].text:find("^K%.J%.") then
        local _,e = el.content[k+2].text:find("^K%.J%.")
        local rest = el.content[k+2].text:sub(e+1)
        el.content[k] = pandoc.Underline { pandoc.Str("Ayhan, K.J.") }
        el.content[k+1] = pandoc.Str(rest)
        table.remove(el.content, k+2)
      end
    end
  end
  return el
end

biblio.json

[
	{
		"id": "Ayhan2022872",
		"type": "article-journal",
		"container-title": "Journal of Asian and African Studies",
		"DOI": "10.1177/00219096211035800",
		"issue": "4",
		"note": "tex.document_type: Article\ntex.source: Scopus",
		"page": "872-893",
		"title": "Exploring global korea scholarship as a public diplomacy tool",
		"URL": "https://www.scopus.com/inward/record.uri?eid=2-s2.0-85112764468&doi=10.1177%2f00219096211035800&partnerID=40&md5=b0445b994fdf178b3635cd521b10abe9",
		"volume": "57",
		"author": [
			{
				"family": "Ayhan",
				"given": "K.J."
			},
			{
				"family": "Gouda",
				"given": "M."
			},
			{
				"family": "Lee",
				"given": "H."
			}
		],
		"issued": {
			"date-parts": [
				[
					"2022"
				]
			]
		}
	},
	{
		"id": "Varpahovskis202252",
		"type": "article-journal",
		"container-title": "Place Branding and Public Diplomacy",
		"DOI": "10.1057/s41254-020-00177-0",
		"issue": "2",
		"note": "tex.document_type: Article\ntex.source: Scopus",
		"page": "52-64",
		"title": "Impact of country image on relationship maintenance: a case study of Korean Government Scholarship Program alumni",
		"URL": "https://www.scopus.com/inward/record.uri?eid=2-s2.0-85089368484&doi=10.1057%2fs41254-020-00177-0&partnerID=40&md5=5f1f27dc8b88674ee4d38560f1d2eab9",
		"volume": "18",
		"author": [
			{
				"family": "Varpahovskis",
				"given": "E."
			},
			{
				"family": "Ayhan",
				"given": "K.J."
			}
		],
		"issued": {
			"date-parts": [
				[
					"2022"
				]
			]
		}
	}
]

CiteOnly.csl

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" class="in-text" default-locale="en-US" demote-non-dropping-particle="sort-only" page-range-format="expanded">
  <info>
    <title>CiteOnly</title>
  </info>
  <macro name="author-short">
    <names variable="author">
      <name form="short" and="text"/>
    </names>
  </macro>
  <macro name="author-count">
    <names variable="author">
      <name form="count"/>
    </names>
  </macro>
  <macro name="author">
    <names variable="author">
      <name name-as-sort-order="all" initialize-with="." and="text" delimiter-precedes-last="always"/>
    </names>
  </macro>
  <macro name="issued">
    <date variable="issued">
      <date-part name="year"/>
    </date>
  </macro>
  <macro name="publisher">
    <group prefix="(" delimiter=": " suffix=")">
      <text variable="publisher-place"/>
      <text variable="publisher"/>
    </group>
  </macro>
  <macro name="editor">
    <names variable="editor">
      <name initialize-with="." and="text" delimiter-precedes-last="always"/>
      <label form="short" prefix=", "/>
    </names>
  </macro>
  <citation et-al-min="21" et-al-use-first="20">
    <sort>
      <key macro="author-short" names-min="1" names-use-first="1"/>
      <key macro="author-count" names-min="3" names-use-first="3"/>
      <key macro="author" names-min="3" names-use-first="1"/>
      <key macro="issued"/>
      <key variable="title"/>
    </sort>
    <layout suffix=".">
      <group delimiter=" ">
        <text macro="author"/>
        <text macro="issued" prefix="(" suffix=")."/>
        <choose>
          <if type="article article-magazine article-newspaper article-journal review" match="any">
            <text variable="title" suffix="."/>
            <text variable="container-title" form="short" text-case="title" font-weight="bold"/>
            <group delimiter=", ">
              <text variable="volume" font-style="italic"/>
              <text variable="page"/>
            </group>
          </if>
          <else-if type="chapter paper-conference" match="any">
            <text variable="title" suffix="."/>
            <text variable="container-title" prefix="In " suffix="," text-case="title"/>
            <text macro="editor"/>
            <text macro="publisher" suffix=","/>
            <label variable="page" form="short"/>
            <text variable="page"/>
          </else-if>
          <else-if type="thesis">
            <text variable="title" suffix="."/>
            <text variable="genre" suffix="."/>
            <text variable="publisher"/>
          </else-if>
          <else>
            <text variable="title"/>
            <text macro="publisher"/>
          </else>
        </choose>
      </group>
    </layout>
  </citation>
</style>

@kjayhan

kjayhan commented Sep 22, 2022

Thank you very much for the very detailed instructions. I should probably have mentioned that I am a markdown novice, and this is my first time touching a lua file. I tried adding test.lua to my folder and to _output.yml here, but it didn't produce any result either (I probably need to change something more).
