StevenWong12/(Draft Proposal) Implement Incremental Re-Parsing in SwiftParser.md

## (Draft Proposal) Implement Incremental Re-Parsing in SwiftParser.md

      
    Raw
  

              (Draft Proposal) Implement Incremental Re-Parsing in SwiftParser.md
            
          
    Overview

Currently, SwiftParser is not able to parse incrementally. Considering the performance, instead of reparsing the entire file every time, we could speed up the parsing procedure with the information provided by old AST, which is an ability provided by the old parser.
In the old implementation, this feature was only applied to CodeBlockList and MemberDeclList. In the new implementation, It would be nice to apply node reusing to other syntax collections e.g. exprList, functionParameterList, etc.
As mentioned in Alex's Proposal, it would be good to allow more fine-grained reusable nodes via expanding the definition of nodeAtCursorCanBeReused  to allow reusing nodes such as FunctionDeclSyntax where only identifier is changed.
To sum up, I would like to propose the following ideas as this year's GSoC project:

Port the incremental re-parsing feature to SwiftParser
Expand the old design from only reusing CodeBlockList and MemberDeclList to all syntax collection type
Allow more fine-grained resuable nodes

Implementation

Enable incremental re-parsing:

Add an optional IncrementalParseTransition as a new member to the Parser class.
Add a member function loadCurrentSyntaxNodeFromCache to Parser as a wrapper for the  lookUP function and to advance cursor.


When users call Parser.parse to parse a code snippet, they could construct an instance of IncrementalParseTransition with the AST of last parse and the modified position and pass it to enable incremental parsing.

Provide early exit mechanism: Call loadCurrentSyntaxNodeFromCache at parsing functions as an early exit mechanism to speed up parsing.
Expand the definition of nodeAtCursorCanBeReused: Add new rules to determine whether nodes can be reused.
Testing

Correctness

In terms of correctness of incremental re-parsing, it is convenient to make use of SourceEdit class to create multi bogus modifications. Since SyntaxProtocol conforms to  CustomDebugStringConvertible, we could just compare description string of the old AST and the new AST to determine whether the file is correctly parsed.
Then, we could check the ByteSourceRange and Syntax  pair in IncrementalParseReusedNodeCollector to determine whether the reusable nodes conform to expectations.
Performance

To test the parsing performance at different scales, I plan to make use of SwiftSyntaxBuilder to generate a branch of testcases of different sizes and view the parsing performance on these cases.
Demo


Add incremental parsing ability to Parser

// Parser.swift
class Parser {
    ...
    public let parseTransition: IncrementalParseTransition?
    ...

    public init(_ input: String, maximumNestingLevel: Int? = nil, parseTransition: IncrementalParseTransition? = nil) {
        ...
        self.parseTransition = parseTransition
    }
}

// for Incremental Parse
extension Parser {
    func loadCurrentSyntaxNodeFromCache(for kind: SyntaxKind) -> Syntax? {
    guard let parseTransition = self.parseTransition else { return nil }

    var lookUpHelper = IncrementalParseLookup(transition: parseTransition)
    let currentOffset = self.lexemes.getOffsetToStart(self.currentToken)
    if let node = lookUpHelper.lookUp(offset, kind: kind),
       let nextToken = lexemes.advance(by: node.byteSize) {
            self.currentToken = nextToken
            return node
        }
        return nil
    }			  
}
// LexemeSequence.swift
extension Lexer {
    public struct LexemeSequence: IteratorProtocol, Sequence, CustomDebugStringConvertible {
    ...
        public func getOffsetToStart(_ token: Lexer.Lexeme) -> Int {
            return self.sourceBufferStart.distance(to: token.cursor)
        }

        public mutating func advance(by offset: Int) -> Lexer.Lexeme? {
            assert(offset > 0)
            var consumedOffset = 0
            var result: Lexer.Lexeme?

            while consumedOffset < offset {

                consumedOffset += self.nextToken.byteLength
                result = self.advance()
                if self.cursor.isAtEndOfFile {
                    break
                }

            }
            return result
        }
    ...
}

// Parser+Entry.swift
extension Parser {
    public static func parse(source: String, parseTransition: IncrementalParseTransition? = nil) -> SourceFileSyntax {
        var parser = Parser(source, parseTransition: parseTransition)
        return SourceFileSyntax.parse(from: &parser)
    }


    public static func parse(source: UnsafeBufferPointer<UInt8>, maximumNestingLevel: Int? = nil, parseTransition: IncrementalParseTransition? = nil) -> SourceFileSyntax {
        var parser = Parser(source, maximumNestingLevel: maximumNestingLevel, parseTransition: parseTransition)
        return SourceFileSyntax.parse(from: &parser)
    }
    ...
}

Provide early return for parsing (Same for other syntax collection)

// Declaration.swift
@_spi(RawSyntax)
public mutating func parseFuncDeclaration(_ attrs: DeclAttributes,
_ handle: RecoveryConsumptionHandle) -> RawFunctionDeclSyntax {
    if let syntaxNode = self.loadCurrentSyntaxNodeFromCache(for: .functionDecl),
    // TODO: an unnecessary Syntax->RawSyntax cast, maybe a more elegant way?
      let returnNode = RawFunctionDeclSyntax(syntaxNode.raw) {
        return returnNode
    }
    ...
}

Add new rules to judge whether cursor.node can be reused

// IncrementalParseTransition.swift
fileprivate func nodeAtCursorCanBeReused(prevPosition: AbsolutePosition,kind: SyntaxKind) -> Bool {
    ...
    for edit in edits.edits {
        ...
        if kind.hasIdentifier, edit.intersectsIdentifierRange() {
            continue
        }
        ...
    }
    ...
}

Correctness test

...
let oldTree = try! SyntaxParser.parse(source: source)
let sourceEdit = SourceEdit(range: ByteSourceRange(offset: offset, length: length), replacementLength: replacementLength)
let reusedNodeCollector = IncrementalParseReusedNodeCollector()
let transition = IncrementalParseTransition(previousTree: oldTree, edits: ConcurrentEdits(sourceEdit), reusedNodeDelegate: reusedNodeCollector)
let newTree = try! SyntaxParser.parse(source: newSource, parseTransition: transition)
...

// correctness for incremental parse
XCTAssertEqual("\(newTree)", newSource)

// correctness for reused nodes
for (range, node) in rangeAndNodes {
    checkRange(range)
    checkNode(node)
}

Performance test

...
let fibonacci = try FunctionDeclSyntax("func fibonacci(_ n: Int) -> Int") {
    StmtSyntax("if n <= 1 { return n }")
    StmtSyntax("return fibonacci(n - 1) + self.fibonacci(n - 2)")
}

let oldTree = SourceFileSyntax {
    for i in 0...1000 {
        fibonacci
    }
}

let source = "\(oldTree)"
let newTree = try! SyntaxParser.parse(source: source, parseTransition: transition)
XCTAssertEqual("\(oldTree)", "\(newTree)")

...