@twneale
Created June 5, 2014 17:23
US Code thread
usc25.xml:
- The xml for /us/usc/t25/s450l (Contract or grant specifications) contains a model agreement with its own internal
Thom Neale <twneale@gmail.com>
Apr 13
to Katherine
Hi Katherine, please forgive the earlier email fragment; I accidentally sent it before I was done typing.
Thom Neale <twneale@gmail.com>
Apr 13
to Katherine
Hi Katherine,
Are the USLM identifiers (described on page 40 of the schema user manual http://uscode.house.gov/download/resources/USLM-User-Guide.pdf) designed to be unique? I found a number of them that aren't (in the attached file). I researched two of these to understand why they're happening:
The first was /us/usc/t25/s450l (Contract or grant specifications). This element contains an embedded model agreement with its own internal divisions. Those internal divisions have inaccurate identifiers; all of them begin with /us/usc/t25/s1, which is clearly a bug, since the model agreement is a part of /us/usc/t25/s450l.
The second was Title 21, section 812. This section lays out several schedules of controlled substances in subsection (c). In this case, the schedules also have their own internal paragraph numbering schemes, which are unrelated to the structure of section 812. Because each schedule's paragraph numbering scheme starts anew at (a), the paragraph (a)'s of all the schedules share the same identifier. Moreover, the identifier for schedule 1, paragraph (a) is /us/usc/t21/s812/a, which suggests paragraph (a) is a child of section 812, when in reality it's a child of 812(c).
Last question--is there a bug tracker where it would be more convenient for me to report issues like this? Thank you for your time,
Thom Neale
Attachment: identifiers.txt
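[Editorial note, not part of the original email.] The thread does not say how the attached list of duplicate identifiers was produced. A minimal sketch of one way to find duplicates follows, assuming only that each USLM element carries its identifier in a plain `identifier` attribute. The sample XML is a hypothetical, heavily simplified stand-in for 21 USC 812, and `duplicate_identifiers` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified stand-in for a USLM section. The element names and
# nesting are illustrative only and are not copied from usc21.xml.
SAMPLE = """
<section identifier="/us/usc/t21/s812">
  <subsection identifier="/us/usc/t21/s812/c">
    <paragraph identifier="/us/usc/t21/s812/a"/>
    <paragraph identifier="/us/usc/t21/s812/b"/>
    <paragraph identifier="/us/usc/t21/s812/a"/>
  </subsection>
</section>
""".strip()


def duplicate_identifiers(root):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter(
        el.get("identifier") for el in root.iter() if el.get("identifier")
    )
    return {ident: n for ident, n in counts.items() if n > 1}


root = ET.fromstring(SAMPLE)
dups = duplicate_identifiers(root)
for ident, n in sorted(dups.items()):
    # One line per duplicated identifier: occurrence count, then the identifier.
    print(f"{n}\t{ident}")
```

For a real title file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. Because the function reads only the unqualified `identifier` attribute, it does not need to know the element namespace.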
Lane, Katherine <Katherine.Lane@mail.house.gov>
Apr 16
to me
Mr. Neale:
Thank you for your question on the USLM identifiers. We are looking into the issue and will get back to you soon with a detailed response. Our apologies for any inconvenience this may have caused you.
We appreciate your emails. They keep us on our toes and help us identify and resolve problems with our website.
Thanks, again.
Katherine Lane
Assistant Counsel
Office of the Law Revision Counsel
U.S. House of Representatives
(202) 226-9053
From: Thom Neale [mailto:twneale@gmail.com]
Sent: Sunday, April 13, 2014 9:46 PM
To: Lane, Katherine
Subject: Re: US Code release point 113-88 issues
Thom Neale <twneale@gmail.com>
Apr 16
to Katherine
It's my pleasure to help test out this otherwise uniquely high-quality dataset; finding one or two unresolved edge cases is a small inconvenience, if any. Thank you for your reply,
Thom
Lane, Katherine <Katherine.Lane@mail.house.gov>
May 2
to me
Mr. Neale:
The USLM identifier issue is taking longer than expected. We are working with our contractor and will provide a detailed answer to your question as soon as we can. It may be several more weeks. If your question is urgent, please let us know and we will see if we can move it up on the priority project list.
For now, emails either to me or to uscode@mail.house.gov are the best way to contact us with questions or comments about the website or the U.S. Code.
Thank you for your patience.
Katherine Lane
From: Thom Neale [mailto:twneale@gmail.com]
Sent: Wednesday, April 16, 2014 1:46 PM
Lane, Katherine
Jun 3 (2 days ago)
to me
Thom:
The identifier attribute is designed to be unique, but uniqueness cannot be guaranteed at this time. The identifier attribute is meant to reflect the numbering that exists in the text. Most of the time the identifier attribute is unique, but there are some duplicates. This generally occurs where the section text structure is non-traditional. For instance, if two subsections (a) have been enacted in a section 1234, they will both get the identifier /us/usc/tXX/s1234/a. When a user or program asks for /us/usc/tXX/s1234/a, the user or program will get two results.
In your examples, the identifiers should not have been duplicates. The reason for these duplicates is that the conversion is not yet handling non-traditional structures within section text. Your examples contain an "insertion" of "external content" into the text of the section, making the structure non-traditional. As you point out, the model agreement embedded in 25 USC 450l (c) and the schedules of controlled substances inserted in 21 USC 812(c) have their own structures independent of the section text.
The id attribute, on the other hand, is unique and the schema enforces the uniqueness.
The US Code in XML is being created by converting free-form text with GPO photocomposition codes (locators) into XML, not by an XML editing environment. Because of this, there are some limitations with regard to its content. We are working on creating a native US Code in XML editing environment.
We do not currently have a publicly available bug-tracker. Please continue to email us with any questions or comments. We value your insight.
Thank you for your patience. I hope this answers your questions.
Katherine Lane
Assistant Counsel
Office of the Law Revision Counsel
U.S. House of Representatives
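[Editorial note, not part of the original email.] The distinction drawn above between the `identifier` attribute (may repeat) and the `id` attribute (unique) suggests that a consumer might index the two differently. The sketch below is only an illustration under that assumption: the sample XML, the `id` values, and the `build_indexes` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample mirroring the "two subsections (a)" case from the reply.
# The id values are made up for illustration.
SAMPLE = """
<section id="id-001" identifier="/us/usc/tXX/s1234">
  <subsection id="id-002" identifier="/us/usc/tXX/s1234/a"/>
  <subsection id="id-003" identifier="/us/usc/tXX/s1234/a"/>
  <subsection id="id-004" identifier="/us/usc/tXX/s1234/b"/>
</section>
""".strip()


def build_indexes(root):
    """Index elements by identifier (one-to-many) and by id (one-to-one)."""
    by_identifier = defaultdict(list)
    by_id = {}
    for el in root.iter():
        ident = el.get("identifier")
        if ident:
            # Identifiers may repeat, so every match is kept.
            by_identifier[ident].append(el)
        el_id = el.get("id")
        if el_id:
            if el_id in by_id:
                raise ValueError(f"duplicate id attribute: {el_id}")
            by_id[el_id] = el
    return by_identifier, by_id


root = ET.fromstring(SAMPLE)
by_identifier, by_id = build_indexes(root)

# A lookup by identifier can return more than one element.
matches = by_identifier["/us/usc/tXX/s1234/a"]
print(len(matches), [el.get("id") for el in matches])

# A lookup by id returns exactly one element.
print(by_id["id-003"].get("identifier"))
```

Treating identifier lookups as lists keeps duplicates visible to the caller rather than silently discarding all but one match.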