@mosser
Created September 11, 2018 20:11
PFE @Polytech, 2018

Projets de Fin d'Études (Final-Year Projects, PFE), 2018

  • Contact: Sébastien Mosser
  • Research group: ACE is a subgroup of SPARKS, I3S (CNRS / Université Côte d'Azur).

Applying Software Composition to Agile Backlogs

Summary

Agile projects rely on Product Backlog Items (PBIs, e.g., user stories) and iterations (e.g., sprints) associated with backlogs to drive project implementation. Building on recent results obtained at Utrecht University on natural language processing applied to user stories, this project proposes to define a compositional model that supports the development team (e.g., product owner, developers) when interacting with backlogs and PBIs.

Project Description

This project is an exploratory attempt to apply software composition mechanisms to the notion of an agile backlog. The key point here is to model the classical actions performed on a backlog as composition operators: (i) addition of a new Product Backlog Item (PBI, e.g., a user story), (ii) extraction of a sprint backlog from a product backlog, and (iii) withdrawal of a PBI from a backlog. Based on these operators, the idea is to identify the properties associated with them (impact of a PBI withdrawal, effort associated with the addition of a story) and provide feedback to product owners and development teams.

We consider for this project a product backlog as a set of user stories, each modelled as (i) a persona, (ii) a means, and (iii) an end. It is possible to use Natural Language Processing methods and tools (e.g., VisualNarrator) to extract a domain model from a set of stories. A domain model identifies the main concepts associated with the product, and the features associated with these concepts. For example, in the story "As a visitor, I can create an account so that the website remembers me", the associated domain model identifies two entities, "visitor" and "account", and an action named "create" going from Visitor to Account.
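
The extracted domain model for this example can be pictured with a minimal sketch; the `Entity`, `Action`, and `DomainModel` types below are illustrative assumptions, not VisualNarrator's actual output format:

```java
import java.util.List;

public class DomainModelSketch {
    // Illustrative types, not VisualNarrator's actual API.
    record Entity(String name) {}
    record Action(String name, Entity source, Entity target) {}
    record DomainModel(List<Entity> entities, List<Action> actions) {}

    /** The domain model such a tool could extract from
     *  "As a visitor, I can create an account so that the website remembers me". */
    static DomainModel extract() {
        Entity visitor = new Entity("Visitor");
        Entity account = new Entity("Account");
        return new DomainModel(List.of(visitor, account),
                List.of(new Action("create", visitor, account)));
    }
}
```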

When creating a backlog by adding stories one by one, we are basically relying on an informal composition operator (select: Story x Backlog -> Backlog). Selecting a story into a sprint backlog also removes it from the product backlog (remove: Story x Backlog -> Backlog). When decomposing an epic into a set of stories, we are also using a (de)composition operator (slice: Epic -> Story*).
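
These informal signatures can be sketched as pure functions over immutable backlogs; the `Story`, `Epic`, and `Backlog` types are assumptions made for illustration, not part of any existing tool:

```java
import java.util.*;

public class BacklogOps {
    record Story(String text) {}
    record Epic(String text, List<Story> stories) {}

    record Backlog(Set<Story> items) {
        // select: Story x Backlog -> Backlog (adds the story)
        Backlog select(Story s) {
            Set<Story> next = new HashSet<>(items);
            next.add(s);
            return new Backlog(next);
        }
        // remove: Story x Backlog -> Backlog (withdraws the story)
        Backlog remove(Story s) {
            Set<Story> next = new HashSet<>(items);
            next.remove(s);
            return new Backlog(next);
        }
    }

    // slice: Epic -> Story* (decomposition of an epic into stories)
    static List<Story> slice(Epic e) {
        return e.stories();
    }
}
```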

The objective of this project is to formalise and implement such operators, focusing on the impact of the compositions on the domain model. The resulting model can then be audited (e.g., in terms of structural coverage, functional coverage, business value, or development effort).

Expected Skills

  • Affinities with agile vocabulary
  • Capability to formalise at an abstract level a concrete problem
  • Good knowledge of object-oriented programming and graphs

Requirements

We will integrate the VisualNarrator tool to automate the NLP part of this project. The toolchain will store stories and conceptual models in a graph database (Neo4j). The composition operators will operate at the graph level (e.g., using the Cypher language) to properly implement the formalised operators. We will provide a set of publicly available product backlogs (> 15). We will also provide access to the Mjølnnir server to analyse the Agile and Innovation projects defined by previous students.
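
As an illustration of operating at the graph level, the withdrawal operator could translate into Cypher queries along these lines; the `Story`/`Entity` labels and the `MENTIONS` relationship are a hypothetical schema, not a fixed design:

```cypher
// Withdraw a story from the backlog graph.
MATCH (s:Story {id: $storyId})
DETACH DELETE s;

// Garbage-collect entities that no remaining story mentions.
MATCH (e:Entity)
WHERE NOT (e)<-[:MENTIONS]-(:Story)
DETACH DELETE e;
```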

Expected results

  • An integrated toolchain
  • Composition operators available through a CLI to interact with a backlog
  • Storage and reasoning based on graphs
  • An object-oriented implementation

Make Git-merge great again

Summary

Git is the most widely used version control system. Unfortunately, its default merge algorithm relies on text (c)hunks and does not take into account the Abstract Syntax Tree (AST) of the underlying language. The main advantage is that anything stored as text can be versioned. The obvious drawback is that many of the conflicts that happen when merging source code could be avoided if the merge algorithm worked at the AST level.

Project Description

This project extends Florian Lehmann's internship, which provided a framework to identify merge conflicts on top of the Java AST and proposed rewriting rules to solve such conflicts. He also gathered thousands of merge conflicts available on GitHub to perform experiments. In this project, we will (i) extend the conflict detection rule sets and (ii) code rewriting rules to automatically solve such conflicts. To identify the conflicts that need to be addressed, we will analyse the experimental corpus gathered by Florian and identify the most frequent ones.

For example, consider a code base with a class C containing an attribute a. The first developer (named left) uses a in her own code. The second developer (named right) refactors a into myPrettyAttribute. From a Git point of view, there is no conflict here, as the two modifications were made at different places. However, the final code does not compile. A merge algorithm working on the AST could compute a symbol table, identify the issue, and automatically rewrite references to a into references to myPrettyAttribute.
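
A toy sketch of this scenario (not Florian's actual framework): a class is reduced to a single field name plus the references to it, and the merge detects the rename performed on one branch and rewrites the other branch's references accordingly:

```java
import java.util.List;
import java.util.stream.Collectors;

public class RenameMerge {
    // Deliberately tiny "AST": one field and the references to it.
    record ClassAst(String fieldName, List<String> references) {}

    static ClassAst merge(ClassAst base, ClassAst left, ClassAst right) {
        // Detect the rename on the right-hand branch by comparing to the base.
        String mergedField = right.fieldName().equals(base.fieldName())
                ? left.fieldName() : right.fieldName();
        // Rewrite every reference (including left's new ones) to the final name.
        List<String> refs = left.references().stream()
                .map(r -> r.equals(base.fieldName()) ? mergedField : r)
                .collect(Collectors.toList());
        return new ClassAst(mergedField, refs);
    }
}
```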

From a language/compilation point of view, the project will be co-supervised by Erick Gallesio. We will also interact with Xavier Blanc's team in Bordeaux. It might be possible to extend this work into an internship.

Expected Skills

  • good knowledge of the Java programming language
  • affinity with code generation, reflection and metaprogramming
  • Good understanding of the Git model

Requirements

The merge algorithm will be transparently integrated into Git, which will trigger its textual merge by default and, when possible, the enhanced one. A classification of regular merge conflicts will be defined, based on the experimental corpus gathered by Florian. The most frequent rules will be implemented in the framework and validated against the corpus examples.
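
For the transparent integration, Git's standard custom merge driver mechanism fits: declare a driver and route Java files to it via `.gitattributes`. The `ast-merge` executable name is hypothetical; `%O`, `%A`, `%B` are Git's placeholders for the ancestor, current, and other versions of the file:

```
# .gitattributes — route Java files to the custom driver
*.java merge=astmerge

# .git/config (or ~/.gitconfig) — declare the driver
[merge "astmerge"]
	name = AST-aware Java merge
	driver = ast-merge %O %A %B
```

A non-zero exit status from the driver marks the file as conflicted; the driver itself can fall back to `git merge-file` for a plain textual merge when the AST approach does not apply.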

Expected results

  • an enhanced ruleset to detect and resolve merge conflicts by leveraging the AST
  • an experiment to assess the benefits of using such a technique

Composing Code refactoring

Summary

Software code is often subject to refactoring. This is the essence of test-driven development: (i) write a test, (ii) write code, (iii) make the test green, (iv) refactor. Refactoring is often performed manually, or with support from the IDE. But the information that a given refactoring happened is lost, leading to the infamous "I told you so" versioning anti-pattern, where two alternative designs are constantly added and removed in a given system. We propose in this project to formalise the classical code refactorings and to remember their applications, supporting developers at two levels: (i) making it easy to apply a refactoring to a given piece of code and (ii) reasoning on the sequence of refactorings made in a given project.

Project Description

We consider here the refactorings defined by Fowler in his seminal book. In the project, students will provide a framework where an expert can implement a refactoring rule. Then, according to the available catalogue, a developer will be able to select a rule, configure it, and automatically apply it to her source code. The system will then version this application so that the whole story remains available to support reasoning.

For example, a developer d asks the system to refactor a class C into C' (applying a renaming rule). The system will automatically apply the modification to the source code. It will also warn the user that this class was already named C' two months ago. The idea is to reason on the sequence of refactorings to provide better support for end users. In addition, it can also help conflict resolution when performing code merges (by computing the canonical sequence of refactorings to apply when refactorings were performed in different branches).
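
The "I told you so" detection can be sketched as a log of rename applications checked for reverts; the `Rename` record and the log representation below are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RefactoringLog {
    record Rename(String from, String to) {}

    private final List<Rename> history = new ArrayList<>();

    /** Records a rename and returns a warning if this rename reverts a
     *  previous one — the "I told you so" anti-pattern. */
    Optional<String> apply(Rename r) {
        boolean revert = history.stream()
                .anyMatch(h -> h.from().equals(r.to()) && h.to().equals(r.from()));
        history.add(r);
        return revert
                ? Optional.of("this class was already named " + r.to() + " earlier")
                : Optional.empty();
    }
}
```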

It might be possible to extend this work into an internship.

Expected Skills

  • good knowledge of the Java programming language
  • affinity with code generation, reflection and metaprogramming
  • logical reasoning

Requirements

The refactoring rules will be implemented in Java (e.g., using Spoon), and the project will cover well-defined rules extracted from Fowler's book. The user will interact with the rules through a CLI as a minimal viable product, with a possible integration into a classical IDE such as IntelliJ or Eclipse. The project will use Git as the version control system to support refactoring versioning.

Expected results

  • a framework to express refactoring rules
  • experiments showing how the rules can be applied to Java code
  • reasoning on sequences of refactoring rules to warn users or support better merges

Composition of machine learning algorithms in Weka

Summary

Machine Learning (ML) and artificial intelligence are everywhere. Several libraries exist to support software developers who use ML techniques. However, from a newcomer's point of view, it is extremely complicated to select the right approach. How should the data be preprocessed? Which algorithm should be selected? This project proposes to develop reverse-engineering methods to help newcomers use an ML library.

Project Description

We will consider here the Weka library, which contains preprocessing and ML algorithms. The idea is to automatically extract from the Weka JAR file a model representation associated with each algorithm defined in the library: what kind of algorithm it is, its properties, and so on. This information will be inferred (as automatically as possible) by a static analysis of the source code of the library.
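
The extraction idea can be sketched with plain reflection. The two stub classes below stand in for Weka algorithms; a real analyser would instead iterate over the entries of `weka.jar` (e.g., with `java.util.jar.JarFile`) and load the actual Weka classes:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class AlgorithmScanner {
    // Stand-ins for Weka's marker types; the real analysis would use
    // Weka's own interfaces instead.
    interface Classifier {}
    interface Filter {}
    static class J48Stub implements Classifier {}
    static class NormalizeStub implements Filter {}

    /** Very small "model" of an algorithm: its name mapped to its kind. */
    static Map<String, String> describe(Class<?>... classes) {
        return Arrays.stream(classes).collect(Collectors.toMap(
                Class::getSimpleName,
                c -> Classifier.class.isAssignableFrom(c) ? "classifier"
                   : Filter.class.isAssignableFrom(c) ? "preprocessing"
                   : "unknown"));
    }
}
```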

Then, from a user's point of view, it will be possible to define Java code that assembles these algorithms into a workflow, for example using a small language created in the project. Based on the information extracted from the library, the project will define a compositional model that supports the user to (i) validate their workflow and (ii) identify properties of the global workflow based on the properties associated with each separate part of the workflow.
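
Validation of an assembled workflow might look like the following sketch, under the (assumed) simplification that a valid pipeline is zero or more preprocessing steps followed by exactly one classifier; the `Step` and `Kind` types are illustrative, not a Weka API:

```java
import java.util.List;

public class WorkflowValidator {
    enum Kind { PREPROCESSING, CLASSIFIER }
    record Step(String name, Kind kind) {}

    static boolean isValid(List<Step> workflow) {
        if (workflow.isEmpty()) return false;
        // Every step but the last must preprocess the data...
        for (int i = 0; i < workflow.size() - 1; i++)
            if (workflow.get(i).kind() != Kind.PREPROCESSING) return false;
        // ...and the last step must be the classifier.
        return workflow.get(workflow.size() - 1).kind() == Kind.CLASSIFIER;
    }
}
```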

It might be possible to extend this work into an internship.

Expected Skills

  • No ML skills are required, but being familiar with ML might help;
  • A strong taste for software engineering and compilation;
  • Not being afraid of static code analysis

Requirements

As input, the project will take the Weka library as a JAR file. It will automatically feed a compositional model used to support the user who assembles Weka artefacts for a given purpose.

Expected results

  • A static analyser for the Weka library
  • An experiment applying this analyser to several versions of Weka
  • A reasoning engine to automatically infer properties of workflows defined using Weka algorithms

Composition of Terrain Generation algorithms

Summary

Procedural map generation is a technique massively used by the video game industry to provide a better user experience for gamers. Think, for example, of the Diablo game: each time you enter a dungeon, its map is totally regenerated into a brand new one that is equivalent in terms of difficulty. In this project, we will leverage classical generation techniques to provide a compositional model supporting the process.

Project Description

Red Blob Games is an awesome resource with respect to procedural map generation. In this project, we will focus on the island generation mechanism fully described in this blog. An alternative implementation of this technique was used as the foundation of the Island game used at Polytech.

The goal of this project is to identify all the variation points that exist in the presented technique: polygon distribution, elevation techniques, biome repartition, river generation, place naming, and so on. The idea is to identify each possibility and implement it as a composable artefact. Thus, one will be able to assemble an island generator for a given context by composing elementary bricks.

The key point here is to guarantee properties on the assembled generator. For example, one generator will be dedicated to archipelagos and guarantee by construction that the generated map contains at least 3 islands. Another one will be dedicated to crater lake islands and guarantee a low island altitude with an inner lake. Another one will focus on atoll definition.
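
One simple strategy for such guarantees can be sketched as a wrapper that composes a generation brick with a property and regenerates until the property holds; the `IslandMap` representation and the regeneration loop are assumptions for illustration, not the project's required design:

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class IslandGenerator {
    // Deliberately minimal map representation.
    record IslandMap(int islandCount, double maxAltitude) {}

    /** Wraps a generation brick so that every produced map satisfies the
     *  property, by regenerating until it holds. */
    static Supplier<IslandMap> withGuarantee(Supplier<IslandMap> brick,
                                             Predicate<IslandMap> property) {
        return () -> {
            IslandMap m;
            do { m = brick.get(); } while (!property.test(m));
            return m;
        };
    }

    // An archipelago generator guarantees at least 3 islands.
    static final Predicate<IslandMap> ARCHIPELAGO = m -> m.islandCount() >= 3;
}
```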

Expected Skills

  • Good object-oriented knowledge (programming and modelling);
  • A taste for geometry (even though the project will not investigate the maths used as foundations for terrain generation)

Requirements

The project will focus on island generation only. As an end user, one will be able to assemble elementary map generation bricks using a dedicated language to create her own generator. The produced generator will be executable and will output maps. The maps will be stored as JSON files using the ISLAND file format, and a 3D visualisation will be provided to view the result in a web browser.

Expected results

  • A configurable and composable map generator;
  • A visualiser available in a web browser;
  • An analysis of the extension points associated with map generation techniques
