Java Byte Code to Parrot Byte code Proposal for GSoC 2011

JVMtoParrot.txt
Ben Batha
Email: bbatha@u.rochester.edu
Google ID: applied under: elektronjunge
Preferred contact: bhbatha
Other contact info: bbatha on irc.parrot.org #parrot
 
Your Project's Title: Java Byte Code to Parrot
 
Abstract
The focus of this project is to build a translator from Java bytecode to a Parrot-friendly target (PIR).
The translator will parse JVM code, using a translation table where possible and AST-based code generation
where the table falls short. The resulting compiler should pair with an existing Java standard library for
new code development and run previously written JVM applications.
 
 
Benefits to the Parrot VM and Open Source Community
 
 
This project would be a boon to both the Parrot community and the larger open source community. The JVM is
becoming a target for many languages due to the popularity of Java. Unfortunately, the JVM is largely a target
for Java development and is limited for other languages. For instance, the JVM does not support dynamic typing,
which makes porting many scripting languages to the JVM difficult. Parrot can fill the same role as the
JVM in many cases, but without the weight of Java and Oracle to drag it down. Doing a direct translation
of Java bytecode would guarantee compatibility with existing Java applications and would include support for
existing JVM languages like Scala and Groovy "for free." This foundation, though not optimal for Java ports,
allows for simple porting of old code and guarantees a robust solution that can keep up with the development
of new JVM languages.
 
Deliverables
 
A library with a table and functions for conversion from Java bytecode to PIR.
A tool for .class file conversion. This tool has several pieces which are described in detail in the Project
Details section.
A tool for .jar file conversion. This may or may not be completed due to time constraints.
A test suite with example libraries and programs that have been translated, using JUnit and Rosella where
appropriate.
Complete documentation, written using Javadoc and included HTML pages, with a particular focus on how to
extend the project so that future improvements can be made.
Build infrastructure for major platforms. Development focus will be Mac OS X and Linux.
 
 
Project Details
 
The project will consist of three pieces.
 
1. A table based translator -- this will be broken into several pieces
a) expression converter -- converts assignments, register allocations, operations, etc.
b) method converter -- converts calls and definitions
c) class converter -- converts class definitions and object creation
d) constant pool converter -- converts the Java constant pool to Parrot compatible types and un-mangles
UTF-8 strings.
2. Class and Jar file conversion tools
3. Test and build suite
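
As an illustration of the string handling in item 1(d): class files store constant pool strings in a
"modified UTF-8" form, where U+0000 is written as the two bytes 0xC0 0x80 and characters outside the Basic
Multilingual Plane are written as two three-byte surrogates. The Python sketch below decodes that form. The
function name is hypothetical and the code is only a minimal sketch of one possible approach, not the
proposal's planned implementation.

```python
def decode_modified_utf8(data):
    """Decode the modified UTF-8 used in class file constant pools.

    Hypothetical helper for illustration: it first recovers UTF-16 code
    units, then lets Python's UTF-16 decoder join any surrogate pairs.
    """
    units = []
    i = 0
    n = len(data)

    def continuation(index):
        # Every byte after the first must look like 10xxxxxx.
        if index >= n or data[index] & 0xC0 != 0x80:
            raise ValueError("bad continuation byte at offset %d" % index)
        return data[index] & 0x3F

    while i < n:
        first = data[i]
        if first & 0x80 == 0:
            # One-byte form; a raw zero byte never appears in this encoding.
            if first == 0:
                raise ValueError("raw NUL byte at offset %d" % i)
            units.append(first)
            i += 1
        elif first & 0xE0 == 0xC0:
            # Two-byte form, which also covers U+0000 as 0xC0 0x80.
            units.append(((first & 0x1F) << 6) | continuation(i + 1))
            i += 2
        elif first & 0xF0 == 0xE0:
            # Three-byte form, used for the BMP and for each surrogate half.
            units.append(((first & 0x0F) << 12)
                         | (continuation(i + 1) << 6)
                         | continuation(i + 2))
            i += 3
        else:
            raise ValueError("four-byte forms are not used in this encoding")

    raw = b"".join(unit.to_bytes(2, "big") for unit in units)
    return raw.decode("utf-16-be")
```

For example, `decode_modified_utf8(b"A\xc0\x80B")` yields the three-character string `"A\x00B"`, which a
standard strict UTF-8 decoder would reject.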
 
The table based translator will be implemented as a library designed so that access to PIR statements is
simple, so that future projects like a CLI converter or a Java bytecode to PBC library do not have to rewrite code.
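
A rough Python sketch of what table-driven translation could look like is given here. The opcode values come
from the JVM specification, but the table layout, the `translate` function, and the `localN` naming are
assumptions made for this example. It handles only a few operand-free integer instructions and simulates the
JVM operand stack with fresh PIR integer registers; declarations for the `localN` variables are omitted.

```python
import itertools

# Hypothetical table: JVM opcode -> (kind, argument).
TABLE = {
    0x60: ("binop", "+"),    # iadd
    0x64: ("binop", "-"),    # isub
    0x68: ("binop", "*"),    # imul
    0xAC: ("ireturn", None),
}
for n in range(6):
    TABLE[0x03 + n] = ("const", n)   # iconst_0 .. iconst_5
for n in range(4):
    TABLE[0x1A + n] = ("load", n)    # iload_0 .. iload_3
    TABLE[0x3B + n] = ("store", n)   # istore_0 .. istore_3


def translate(code):
    """Walk operand-free bytecode linearly and return PIR-like lines."""
    registers = ("$I%d" % i for i in itertools.count())
    stack = []   # PIR registers standing in for the JVM operand stack
    lines = []
    for opcode in code:
        if opcode not in TABLE:
            raise NotImplementedError("opcode 0x%02x" % opcode)
        kind, arg = TABLE[opcode]
        if kind == "const":
            reg = next(registers)
            lines.append("%s = %d" % (reg, arg))
            stack.append(reg)
        elif kind == "load":
            reg = next(registers)
            lines.append("%s = local%d" % (reg, arg))
            stack.append(reg)
        elif kind == "store":
            lines.append("local%d = %s" % (arg, stack.pop()))
        elif kind == "binop":
            rhs = stack.pop()
            lhs = stack.pop()
            reg = next(registers)
            lines.append("%s = %s %s %s" % (reg, lhs, arg, rhs))
            stack.append(reg)
        elif kind == "ireturn":
            lines.append(".return (%s)" % stack.pop())
    return lines
```

Calling `translate(bytes([0x1A, 0x1B, 0x05, 0x68, 0x60, 0xAC]))`, the bytecode for returning `a + b * 2`
from a static method, produces six lines ending in `.return ($I4)`. Real bytecode also contains instructions
with operands of varying width and branch targets, which this sketch does not attempt to handle.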
 
The class file converter will use the library to find appropriate translations for the files it reads and
output the translation. The jar file converter will use the class file converter on the contained classes and
resolve the classpath needed to compile the jar, along with the jar's manifest, into Parrot equivalents.
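
The first step of both tools can be sketched in Python as follows. A class file begins with the magic number
0xCAFEBABE, followed by the minor version, the major version (50 for Java 6, 51 for Java 7), and the constant
pool count, and a jar is a zip archive of such files. The function names and the returned dictionary shape
are hypothetical and chosen only for this example.

```python
import io
import struct
import zipfile


def read_class_header(data):
    """Return version and constant pool count from the start of a class file."""
    if len(data) < 10:
        raise ValueError("truncated class file")
    # u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count
    magic, minor, major, cp_count = struct.unpack(">IHHH", data[:10])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return {"minor": minor, "major": major, "constant_pool_count": cp_count}


def class_headers_in_jar(jar_bytes):
    """Map each .class entry in an in-memory jar to its parsed header."""
    headers = {}
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                headers[name] = read_class_header(jar.read(name))
    return headers
```

A converter built this way could reject unsupported class file versions before any translation is attempted.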
 
The test and build suite will have tests for a variety of conversions from Java bytecode. These will be based
on how the project works and will be chosen at a later date. There will be tests for each completed feature
of the converter. The build system will build on Unix-based systems with Parrot and Java
and will integrate with the tests.
 
<Problem Resolution>
1. Plan ahead to foresee potential problems
2. Interface with Parrot and Java compiler communities for assistance
3. Try different implementation of problematic code sections
4. Revert to backup plan
 
<Backup plan>
If it seems impossible to do a direct translation via a table, I will need to develop a compiler to parse
Java bytecode and generate an AST and generate code from there. Tentatively, this can be done using PAST and
other Parrot tools. Criteria for making this decision will be based on the number of incompatible features and
the scope of these features. The scope of these problems will be determined by how other translations are
 
Project Schedule
March 27 - April 8:
Finish proposal. Get familiar with Java bytecode specification and code bases. Begin to get familiar with
Parrot. Discuss proposal with mentors.
 
April 8 - May 23:
Begin familiarizing myself with Parrot internals and the Java bytecode specification. Determine the best route
for the development of a parser. Determine the viability of each of the Java standards (Java 7 versus Java 6).
Discuss these with my mentor and the Parrot community. Find contacts in the Java community to discuss with as well.
Begin to find places where direct translation is impossible.
 
May 23: Begin work.
May 30: Write file reading framework. Begin expression translation.
Milestone: Reads Java bytecode. Translates basic expressions. Document this.
 
June 6: Finish expression translation. Control flow translation.
Milestone: Finish up work from previous week. Expression and control flow translation. Tests to go with
this. Documentation matches completed work.
 
June 13: Potential absence for the work week. I should be around on the weekends to make up for any
major setback to the project. In the worst case I will be a week behind. If there is no absence, I can
begin work on more advanced Java features sooner, or have more time for debugging, etc.
 
June 20: Begin working on method translation.
Milestone: Translation of basic methods.
 
June 27: Method translation of more complex cases (polymorphism, etc.). Begin work on class translation.
Milestone: Translation of complex methods. Documentation for method translator. Test suite includes
tests of method translation.
 
July 4: Continue work on translation of classes.
Milestone: Simple class translation (no inheritance, interfaces, static, abstract, etc.).
 
 
July 11: Interim review. Finish class translation for more complex cases.
Milestone: Basic code should convert properly. Test suite for classes and previous functions.
Documentation to this point.
 
July 18: Begin work on conversion of the constant pool. Finish remaining work on class translation,
documentation, and tests.
Milestone: Classes complete.
 
July 25: Continue work on converting the constant pool. Begin finding an algorithm to de-mangle UTF-8
strings.
Milestone: non-UTF-8 strings convert properly. Tests covering strings of different types.
 
August 1: Begin work on Jar file conversion.
Milestone: Proper conversion of UTF-8 strings, update tests and documentation to reflect changes.
 
August 8: Make a build system around jar file conversion. Find or create test code to test beyond
JUnit tests. Finish any incomplete sections of the project.
Milestone: Finish jar file conversion.
 
August 15: FEATURE FREEZE. Finish debugging and continue to find and create test cases.
Milestone: Complete build and test system. Complete documentation.
 
August 22: Final submission
Milestone: Complete GSoC project.
 
References and Likely Mentors
???
 
IRC Discussions:
Discussed with Whiteknight
 
License
All Parrot projects should use the Artistic 2.0 license.
 
Bio
I am a sophomore computer science student at the University of Rochester. I have taken the following
computer science classes:
Programming Language Design and Implementation, which covered the design of compiler front ends and the
basics of optimization. In this class I wrote a compiler for C.
Computation and Formal Systems
Data Structures and Algorithms
Artificial Intelligence
Discrete Math
Linear Algebra & Differential Equations
Calculus I & II
 
I have experience with Java, C/C++, Python, Ruby, FORTRAN, and a host of small scientific languages, and I
am familiar with Linux/Unix. While I do not have familiarity with open source development, I do use a variety
of pieces of open source software and have participated in private sector software development. I have
been responsible for projects of this size in my previous employment at Los Alamos
National Laboratory, where over three summers I completed three projects:
1) Rewrote an old hydrodynamic code from FORTRAN77 to FORTRAN95.
2) Developed a graphing tool for use with a rad-hydro code.
3) Developed a graphing tool and scripting interface for use with a different rad-hydro code.
 
 
Eligibility
I am an eligible student who is a U.S. citizen and has the documentation to prove it.

Do we need anything as heavy as a real "parser" to read bytecode? It seems like once we have a tool for reading bytecode and finding the code segments, we would be able to walk the bytecode in a linear way to do table-based translations.

PBC is too hard a target to use for this project; we don't have a good PBC generation tool besides Parrot itself (creating a good library to do that would be the same size as a GSoC project, and might even make a good project idea in its own right). What is your target language going to be? PIR? Something higher like winxed/NQP? Explain your decision.

On June 13 you mention that potential absence. Explain how that will affect your schedule.

Break up some of the deliverables into smaller milestones: How long is it going to take you to read the constant pool and translate that? The class definitions?

You mention you might fall back from a table-based translator to a PAST-based translator. When will you make that decision? What will be your criteria for deciding? The earlier, the better because this is a major change in goals. You should probably provide two timelines since the two projects will be very different.

Also, is this translator going to work on .class files? .jar files?

Java Class files do weird mangling of UTF-8 strings in the constant pool. Is un-mangling these string literals something that is part of your goals for the summer, or do we leave that for a later time (I ask because this could turn out to be non-trivial, and we want to know what your priorities are in case time gets tight).

See my most recent comments on https://gist.github.com/889902. Those comments are relevant to this proposal as well.

Hi, is the code ready? Can we see the code somewhere? Or, if it is not ready, when can we see it?

This project wasn't accepted to GSOC. I had another job so I did no work on it.

Hmm... too sad. Anyway, the project did look promising to me. Well, best of luck to you, and thanks for your reply.
