Currently, SwiftParser is not able to parse incrementally. Considering the performance, instead of reparsing the entire file every time, we could speed up the parsing procedure with the information provided by old AST, which is an ability provided by the old parser.
In the old implementation, this feature was only applied to CodeBlockList
and MemberDeclList
. In the new implementation, It would be nice to apply node reusing to other syntax collections e.g. exprList, functionParameterList, etc.
As mentioned in Alex's Proposal, it would be good to allow more fine-grained reusable nodes via expanding the definition of nodeAtCursorCanBeReused
to allow reusing nodes such as FunctionDeclSyntax
where only identifier is changed.
To sum up, I would like to propose the following ideas as this year's GSoC project:
- Port the incremental re-parsing feature to SwiftParser
- Expand the old design from only reusing
CodeBlockList
andMemberDeclList
to all syntax collection type - Allow more fine-grained resuable nodes
Enable incremental re-parsing:
- Add an optional
IncrementalParseTransition
as a new member to the Parser class. - Add a member function
loadCurrentSyntaxNodeFromCache
to Parser as a wrapper for thelookUP
function and to advance cursor.
When users call
Parser.parse
to parse a code snippet, they could construct an instance ofIncrementalParseTransition
with the AST of last parse and the modified position and pass it to enable incremental parsing.
Provide early exit mechanism: Call loadCurrentSyntaxNodeFromCache
at parsing functions as an early exit mechanism to speed up parsing.
Expand the definition of nodeAtCursorCanBeReused
: Add new rules to determine whether nodes can be reused.
In terms of correctness of incremental re-parsing, it is convenient to make use of SourceEdit
class to create multi bogus modifications. Since SyntaxProtocol
conforms to CustomDebugStringConvertible
, we could just compare description string of the old AST and the new AST to determine whether the file is correctly parsed.
Then, we could check the ByteSourceRange
and Syntax
pair in IncrementalParseReusedNodeCollector
to determine whether the reusable nodes conform to expectations.
To test the parsing performance at different scales, I plan to make use of SwiftSyntaxBuilder
to generate a branch of testcases of different sizes and view the parsing performance on these cases.
- Add incremental parsing ability to Parser
// Parser.swift
class Parser {
...
public let parseTransition: IncrementalParseTransition?
...
public init(_ input: String, maximumNestingLevel: Int? = nil, parseTransition: IncrementalParseTransition? = nil) {
...
self.parseTransition = parseTransition
}
}
// for Incremental Parse
extension Parser {
func loadCurrentSyntaxNodeFromCache(for kind: SyntaxKind) -> Syntax? {
guard let parseTransition = self.parseTransition else { return nil }
var lookUpHelper = IncrementalParseLookup(transition: parseTransition)
let currentOffset = self.lexemes.getOffsetToStart(self.currentToken)
if let node = lookUpHelper.lookUp(offset, kind: kind),
let nextToken = lexemes.advance(by: node.byteSize) {
self.currentToken = nextToken
return node
}
return nil
}
}
// LexemeSequence.swift
extension Lexer {
public struct LexemeSequence: IteratorProtocol, Sequence, CustomDebugStringConvertible {
...
public func getOffsetToStart(_ token: Lexer.Lexeme) -> Int {
return self.sourceBufferStart.distance(to: token.cursor)
}
public mutating func advance(by offset: Int) -> Lexer.Lexeme? {
assert(offset > 0)
var consumedOffset = 0
var result: Lexer.Lexeme?
while consumedOffset < offset {
consumedOffset += self.nextToken.byteLength
result = self.advance()
if self.cursor.isAtEndOfFile {
break
}
}
return result
}
...
}
// Parser+Entry.swift
extension Parser {
public static func parse(source: String, parseTransition: IncrementalParseTransition? = nil) -> SourceFileSyntax {
var parser = Parser(source, parseTransition: parseTransition)
return SourceFileSyntax.parse(from: &parser)
}
public static func parse(source: UnsafeBufferPointer<UInt8>, maximumNestingLevel: Int? = nil, parseTransition: IncrementalParseTransition? = nil) -> SourceFileSyntax {
var parser = Parser(source, maximumNestingLevel: maximumNestingLevel, parseTransition: parseTransition)
return SourceFileSyntax.parse(from: &parser)
}
...
}
- Provide early return for parsing (Same for other syntax collection)
// Declaration.swift
@_spi(RawSyntax)
public mutating func parseFuncDeclaration(_ attrs: DeclAttributes,
_ handle: RecoveryConsumptionHandle) -> RawFunctionDeclSyntax {
if let syntaxNode = self.loadCurrentSyntaxNodeFromCache(for: .functionDecl),
// TODO: an unnecessary Syntax->RawSyntax cast, maybe a more elegant way?
let returnNode = RawFunctionDeclSyntax(syntaxNode.raw) {
return returnNode
}
...
}
- Add new rules to judge whether
cursor.node
can be reused
// IncrementalParseTransition.swift
fileprivate func nodeAtCursorCanBeReused(prevPosition: AbsolutePosition,kind: SyntaxKind) -> Bool {
...
for edit in edits.edits {
...
if kind.hasIdentifier, edit.intersectsIdentifierRange() {
continue
}
...
}
...
}
- Correctness test
...
let oldTree = try! SyntaxParser.parse(source: source)
let sourceEdit = SourceEdit(range: ByteSourceRange(offset: offset, length: length), replacementLength: replacementLength)
let reusedNodeCollector = IncrementalParseReusedNodeCollector()
let transition = IncrementalParseTransition(previousTree: oldTree, edits: ConcurrentEdits(sourceEdit), reusedNodeDelegate: reusedNodeCollector)
let newTree = try! SyntaxParser.parse(source: newSource, parseTransition: transition)
...
// correctness for incremental parse
XCTAssertEqual("\(newTree)", newSource)
// correctness for reused nodes
for (range, node) in rangeAndNodes {
checkRange(range)
checkNode(node)
}
- Performance test
...
let fibonacci = try FunctionDeclSyntax("func fibonacci(_ n: Int) -> Int") {
StmtSyntax("if n <= 1 { return n }")
StmtSyntax("return fibonacci(n - 1) + self.fibonacci(n - 2)")
}
let oldTree = SourceFileSyntax {
for i in 0...1000 {
fibonacci
}
}
let source = "\(oldTree)"
let newTree = try! SyntaxParser.parse(source: source, parseTransition: transition)
XCTAssertEqual("\(oldTree)", "\(newTree)")
...