@EliahKagan
Last active January 23, 2023 05:44
CodeFenceCleanup - scripts for finding posts with broken code fences
# Cached Python modules
__pycache__/
# Output files
out[0-9]
# This branch has been rebased to not track .csv files. To keep it that way...
*.csv

This gist is mirrored as a full (non-gist) repo, which may be easier to work with.


CodeFenceCleanup - scripts for finding posts with broken code fences

Written in 2020 by Eliah Kagan <degeneracypressure@gmail.com>.

To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.

These are scripts for finding broken code fences.

(This is intended to supersede that gist, which is less complete.)

This is based on another repository (from which that gist is also copied). This is rebased to omit .csv files that were obtained from SEDE. Those files could probably be redistributed, but under the appropriate CC licenses for user-contributed content on Stack Exchange, not under this public domain dedication. To my recollection, that repository was never published.

I believe I did the rebase correctly, but in case some commit has a .csv file, I am not claiming authorship of its contents, nor attempting to offer it under CC0.


The only script here that is currently being used is filter-fences.

To use it, give it CSV data on standard input. For example:

./filter-fences <QueryResults.csv >outfile

Previous versions of filter-fences hard-coded the input filename and did not read from standard input. The current version reads only from standard input.
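For readers without csvkit, the pipeline can be approximated in Python. This is only a rough sketch, not how the gist itself does it. It reuses the regular expression from filter-fences, and it assumes the CSV has `Text` and `Post Link` columns and that a decoded `Post Link` value is either a bare post ID or a JSON-like object containing `"id": <number>`. The names `BROKEN_FENCE`, `POST_ID`, `revision_urls`, and `SAMPLE` are hypothetical, and the sample rows are made up.

```python
import csv
import io
import re

FENCE = '`' * 3  # Built this way only to keep literal fences out of the sample.

# Same pattern as the csvgrep -r operand in filter-fences.
BROKEN_FENCE = re.compile(r'''(?mx)
    ^[ \t]*(([`~])\2{2,})(?!`)(?!.*\1)  # Opening line of a code fence.
    [a-z]*                              # Lowercase letters, maybe.
    (?:[^a-z\s]                         # Either (a) non-whitespace non-lowercase, OR
      |[ \t]+\S)                        # (b) blanks followed by non-whitespace.
''')

# Assumption: once the csv module has decoded the field, "Post Link" is
# either a bare numeric ID or an object that contains "id": <number>.
POST_ID = re.compile(r'(?:"id": |^)(\d+)')

# Hypothetical sample data, not real SEDE output.
SAMPLE = (
    'Post Link,Text\n'
    f'101,"{FENCE}python\nprint(1)\n{FENCE}"\n'
    f'102,"{FENCE}print(1)\n{FENCE}"\n'
    f'"{{""id"": 103, ""title"": ""Example""}}","{FENCE}sh ls -l\n{FENCE}"\n'
)


def revision_urls(csv_file):
    """Yield a revisions URL for each row whose Text may have a broken fence."""
    for row in csv.DictReader(csv_file):
        if BROKEN_FENCE.search(row['Text']):
            match = POST_ID.search(row['Post Link'])
            if match:
                yield f'https://askubuntu.com/posts/{match.group(1)}/revisions'


if __name__ == '__main__':
    for url in revision_urls(io.StringIO(SAMPLE)):
        print(url)
```

In the sample, post 101 has a well-formed fence and is skipped, while posts 102 and 103 have text on the opening fence line and are reported. To filter a real export, pass an open file to `revision_urls` instead of the sample, ideally one opened with `newline=''` as the `csv` module documentation recommends.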

Another script, expand.py, was once useful, but that approach to generating T-SQL is no longer being used. Instead, a more general (and much nicer) query is used, and its results are filtered (way!) down by filter-fences.
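To make the old approach concrete, here is a small sketch of the kind of expansion expand.py performs. It restates the script's string replacements but returns the lines rather than printing them. The name `expand_lines` and the `FENCE` constant are hypothetical, and only repetition counts 0 and 1 are shown, whereas the script itself uses 0 through 8.

```python
FENCE = '`' * 3  # Built this way only to keep a literal fence out of the template.


def expand_lines(text, pattern, replacement, min_count, max_count):
    r"""Return one expansion per repetition count, rewriting \n and \N."""
    return [
        text.replace(pattern, replacement * count)
            .replace(r'\n', "' + CHAR(10) + '")
            .replace(r'\N', "' + CHAR(13) + CHAR(10) + '")
        for count in range(min_count, max_count + 1)
    ]


if __name__ == '__main__':
    # Same template as in expand.py, assembled around FENCE.
    template = r"OR ph.Text LIKE '%\n" + FENCE + r"{}[^a-z\N]%'"
    for line in expand_lines(template, r'{}', r'[^\N]', 0, 1):
        print(line)
```

Each output line is one `OR ph.Text LIKE ...` clause. Every `\n` in the template becomes a concatenated `CHAR(10)`, every `\N` becomes `CHAR(13) + CHAR(10)`, and the `{}` placeholder is replaced by that many copies of the bracket expression.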

The remaining script, search.py, is just a bit of scratchwork.

CC0 1.0 Universal
Statement of Purpose
The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator and
subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").
Certain owners wish to permanently relinquish those rights to a Work for the
purpose of contributing to a commons of creative, cultural and scientific
works ("Commons") that the public can reliably and without fear of later
claims of infringement build upon, modify, incorporate in other works, reuse
and redistribute as freely as possible in any form whatsoever and for any
purposes, including without limitation commercial purposes. These owners may
contribute to the Commons to promote the ideal of a free culture and the
further production of creative, cultural and scientific works, or to gain
reputation or greater distribution for their Work in part through the use and
efforts of others.
For these and/or other purposes and motivations, and without any expectation
of additional consideration or compensation, the person associating CC0 with a
Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
and publicly distribute the Work under its terms, with knowledge of his or her
Copyright and Related Rights in the Work and the meaning and intended legal
effect of CC0 on those rights.
1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not limited
to, the following:
i. the right to reproduce, adapt, distribute, perform, display, communicate,
and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or likeness
depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data in
a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation thereof,
including any amended or successor version of such directive); and
vii. other similar, equivalent or corresponding rights throughout the world
based on applicable law or treaty, and any national implementations thereof.
2. Waiver. To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
the Waiver for the benefit of each member of the public at large and to the
detriment of Affirmer's heirs and successors, fully intending that such Waiver
shall not be subject to revocation, rescission, cancellation, termination, or
any other legal or equitable action to disrupt the quiet enjoyment of the Work
by the public as contemplated by Affirmer's express Statement of Purpose.
3. Public License Fallback. Should any part of the Waiver for any reason be
judged legally invalid or ineffective under applicable law, then the Waiver
shall be preserved to the maximum extent permitted taking into account
Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
is so judged Affirmer hereby grants to each affected person a royalty-free,
non transferable, non sublicensable, non exclusive, irrevocable and
unconditional license to exercise Affirmer's Copyright and Related Rights in
the Work (i) in all territories worldwide, (ii) for the maximum duration
provided by applicable law or treaty (including future time extensions), (iii)
in any current or future medium and for any number of copies, and (iv) for any
purpose whatsoever, including without limitation commercial, advertising or
promotional purposes (the "License"). The License shall be deemed effective as
of the date CC0 was applied by Affirmer to the Work. Should any part of the
License for any reason be judged legally invalid or ineffective under
applicable law, such partial invalidity or ineffectiveness shall not
invalidate the remainder of the License, and in such case Affirmer hereby
affirms that he or she will not (i) exercise any of his or her remaining
Copyright and Related Rights in the Work or (ii) assert any associated claims
and causes of action with respect to the Work, in either case contrary to
Affirmer's express Statement of Purpose.
4. Limitations and Disclaimers.
a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or warranties
of any kind concerning the Work, express, implied, statutory or otherwise,
including without limitation warranties of title, merchantability, fitness
for a particular purpose, non infringement, or the absence of latent or
other defects, accuracy, or the present or absence of errors, whether or not
discoverable, all to the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without limitation
any person's Copyright and Related Rights in the Work. Further, Affirmer
disclaims responsibility for obtaining any necessary consents, permissions
or other rights required for any use of the Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to this
CC0 or use of the Work.
For more information, please see
<http://creativecommons.org/publicdomain/zero/1.0/>
#!/usr/bin/env python3
"""Expand (what's intended to be) a T-SQL expression for searching."""


def expand(text, pattern, replacement, min_count, max_count):
    r"""Expands \n and a single replacement with repetitions."""
    # Print one expression per repetition count, from min_count to max_count.
    for count in range(min_count, max_count + 1):
        print(text.replace(pattern, replacement * count)
                  .replace(r'\n', "' + CHAR(10) + '")
                  .replace(r'\N', "' + CHAR(13) + CHAR(10) + '"))


if __name__ == '__main__':
    expand(text=r"OR ph.Text LIKE '%\n```{}[^a-z\N]%'",
           pattern=r'{}',
           replacement=r'[^\N]',
           min_count=0,
           max_count=8)
#!/bin/sh
# filter-fences - Uses csvgrep to find posts that may have broken code fences.

# shellcheck disable=SC2016  # The operand to csvgrep -r is meant literally.
csvgrep -c Text -r '(?mx)  # Match ^ on each line. ("." does not match "\n".)
    ^[ \t]*(([`~])\2{2,})(?!`)(?!.*\1)  # Opening line of a code fence.
    [a-z]*                              # Lowercase letters, maybe.
    (?:[^a-z\s]                         # Either (a) non-whitespace non-lowercase, OR
      |[ \t]+\S)                        # (b) blanks followed by non-whitespace.
' |
    csvcut -c 'Post Link' |
    grep -oP '(?:""id"": |^)\K\d+' |  # Support both "Post Link" variants.
    sed 's@.*@https://askubuntu.com/posts/&/revisions@'
#!/usr/bin/env python3
"""Searches posts in CSV format for signs of malformed code fences."""

import csv

with open('code-fences-with-lost-text-on-the-first-line.csv', newline='') as f:
    reader = csv.reader(f)
    for _ in range(2):
        print(next(reader)[-1])