Improving the Robustness of Unicode Support in Rakudo on MoarVM

Name

Samantha McVey (samcv)
Where can we contact you?

...
Synopsis

Implement the Unicode Collation Algorithm and improve the speed and spec
conformance of the text normalizer. Improve test coverage for the Unicode specs
and document where we comply, or fail to comply, with the Unicode standard.
Benefits to Perl 6 Development

As Perl 6 starts to take off, it is increasingly important to provide robust
Unicode support. Perl 6 already provides some of the best Unicode support on many
levels compared to other programming languages. The goal of this project is to
make Perl 6's Unicode support production ready.
Deliverables/Project Details

General


Document any deficits in our Unicode coverage found in the course of work on
this project. This is very important given the vastness of the Unicode
standard. Each deficit should have tests written for it, unless it cannot be
tested or input is needed from the rest of the Perl 6 team. In either case, it
will be documented in my reports for current and future developers of Perl 6
to reference.


Tests will be written to cover all of the relevant Unicode 9.0 test files, and
the existing tests will be made more robust in how they check grapheme breaking.


Unicode Names


Hangul Syllables and other Unicode names need to be programmatically determined
when generating the Unicode database.


Add support for Unicode 1 names. (This is lowest on the priority list, as
I have already implemented Name Aliases.)
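
For Hangul Syllables, the name can be computed arithmetically from the codepoint instead of being stored, following the Hangul syllable decomposition described in the Unicode Standard. The sketch below illustrates the idea in Python (chosen only for illustration, not the language of the generator); the function name `hangul_syllable_name` is hypothetical, and the jamo short-name tables are written out by hand and assumed to match Jamo.txt.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE = 0xAC00                 # first precomposed Hangul syllable
V_COUNT = 21                    # number of vowel jamo
T_COUNT = 28                    # number of trailing jamo (including "none")
N_COUNT = V_COUNT * T_COUNT     # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT          # 11172 syllables in total

# Jamo short names (assumed to match Jamo.txt).
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(codepoint):
    """Derive the name of a precomposed Hangul syllable from its codepoint."""
    s_index = codepoint - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    # Split the index into leading consonant, vowel and trailing consonant.
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return "HANGUL SYLLABLE " + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]


print(hangul_syllable_name(0xD55C))  # HANGUL SYLLABLE HAN
```

Computing names this way means the 11,172 syllable names do not each need an entry in the generated database.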


Unicode Collation Algorithm


Fully implement the Unicode Collation Algorithm, at least for
language-independent sorting.


Assess what is needed to support language- and country-specific collation.


Text Normalization


Improve the performance of the text normalizer and also allow the normalizer
to save state across multiple characters to properly support Grapheme Breaking
for all of Unicode 9.0 and beyond.

Unicode Database Generation


The script used to generate the Unicode database will be made deterministic,
so that it produces the same output file on every run.
Currently about half of the file changes between runs even when no changes
have been made to the script.


Rewrite the Perl 5 script used to generate the Unicode database in Perl 6.
This ties in with the previous item: since a rewrite is needed anyway, it
should be done in Perl 6 to make the script more maintainable.

Work toward the rewrite has been occurring here.


Implement all relevant remaining Unicode properties from Unicode 9.0. This
includes the properties needed to support the deliverables listed above.


Try to reduce the memory footprint of the Unicode database. Currently
the unicode.o binary file created is 4.1MB. I hope to cut that in half.
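
The determinism requirement above can be illustrated with a small sketch. It assumes that at least part of the churn comes from emitting tables in hash iteration order, which is a common cause of unstable generated files but is not confirmed for this script. The function name `emit_property_table` and the data layout are hypothetical, and Python is used only for illustration.

```python
def emit_property_table(values):
    """Render a name -> value mapping as C-style table rows in a stable order."""
    # Sorting by name makes the output independent of the order in which
    # entries happened to be inserted into (or iterated out of) the mapping.
    rows = ['    { "%s", %d },' % (name, value)
            for name, value in sorted(values.items())]
    return "\n".join(rows)


# The same data inserted in a different order yields identical output.
first = emit_property_table({"Lowercase": 2, "Alphabetic": 1})
second = emit_property_table({"Alphabetic": 1, "Lowercase": 2})
print(first == second)  # True
```

With a fixed emission order, a diff of the generated file shows only changes caused by the script or by the Unicode data.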


Project Schedule

1 1/2 months
When can you begin work?

Can begin work as soon as possible.
Report Schedule

Reports will be posted on my blog at https://cry.nu, which will be syndicated at pl6anet.org.
How frequently will these updates be made?

Reports will be submitted every week.
Public Repository

Code will be stored in the MoarVM, NQP and Rakudo repositories, although work in progress may
happen on my own public forks before being merged into these repositories. Change logs will
be viewable on github.com.
Grant Deliverables ownership/copyright and License Information

Same as MoarVM/Rakudo/NQP (Artistic 2.0)
Who and/or which organization will have ownership (copyright) of the grant deliverables?

The Perl Foundation will have copyright of the deliverables.
Bio

Although I am a fairly recent addition to the Perl 6 core developers, I have been very busy in the few months since I joined.
I have written two Perl 6 modules, IRC::TextColor and URL::Find, and I am the lead developer of the Perl 6
syntax highlighter for Atom/GitHub as well as for docs.perl6.org. I converted the site from
the old Pygments highlighter to the new highlighter.
My contributions to Perl 6 have been focused on Unicode support, with changes throughout
Rakudo, NQP and MoarVM.
The work I have already done on improving Unicode support in Perl 6
shows that I am capable of completing this project and am the best person for this grant.
In addition, I have already started work on rewriting the Unicode database
generation and on shrinking the data that must be loaded on startup.
Unicode work already done within the last few months

Tests:


Fixed several errata in roast related to our Unicode support, many of which
had been present for a long time.
Added a test based on GraphemeBreakTest.txt from Unicode, and brought many
other tests up to Unicode 9.0.
Reworked other tests for compliance.

MoarVM:


Implemented part of the Unicode Collation Algorithm.
Added support for named codepoint sequences, which include the Named Sequences,
Emoji Sequences and Emoji ZWJ Sequences (for example, "\c[woman gesturing OK]").


Implemented Unicode Name Aliases when looking up codepoints by name.
Implemented the 'Extend' Grapheme_Cluster_Break property, which was new in Unicode 9.0.
We previously had no support for this property.
Implemented many other Grapheme_Cluster_Break fixes and added support for
most Emoji sequences.
Improved the speed of radix by 50% for non-ASCII decimal digits.
Improved the speed of text normalization, making slurping a Unicode-heavy text
file 14% faster.
Added a multitude of properties to our Unicode database.

Rakudo:


Added support for a large number of Unicode properties, handling Bool/Str/Int
return types for uniprop.
Implemented the uniprops method in Rakudo.

Country of Residence

USA
Nationality

USA
Amount Requested

$50 USD / hour * 100 hours a month = $5,000 a month.
Total is $7,500 for 1.5 months.
Okay to publish proposal?

Yes
Suggestions for Grant Manager

moritz