tanishiking/tasty-semanticdb.md

## tasty-semanticdb.md

      
    The future of semanticDB, TASTy, and the overlap that they have #73


ckipp01

on Aug 22
Maintainer
So I don't have a specific point to bring up for this topic, but lately on Discord the usage of SemanticDB and TASTy has come up in multiple different conversations. We currently have an ecosystem that heavily relies on SemanticDB for tooling, but TASTy is also starting to be used in various places. There are also other tools/formats, like SCIP, where SemanticDB is used to produce the SCIP output.
Especially to outsiders, this web gets pretty confusing. Why do we need both? Can't X work for both use cases? I think it may be a good idea to have a session where we discuss the pros and cons of both and potentially look at tooling solutions that only require one of these, to avoid the situation where we have tools relying on both SemanticDB and TASTy.

tanishiking

on Sep 7
Collaborator
We previously discussed a bit about the overlap between SemanticDB and TASTy. scalameta/scalameta#2403 (comment)
My take: SemanticDB still has some advantages over TASTy, and migrating tools to TASTy will take some work, but I would like to see tools move from SemanticDB towards TASTy.
This is because:

- The Scala 3 compiler currently generates both TASTy and SemanticDB, which is somewhat duplicative. If tools used TASTy instead of SemanticDB, we could skip SemanticDB generation, which would make Scala compilation slightly faster.
- As @ckipp01 mentioned, the situation where the Scala 3 compiler recommends TASTy while many tools rely on SemanticDB can be confusing for new tooling developers.
- TASTy basically has more information than SemanticDB. SemanticDB is more like a summary of the semantic analysis: some information is dropped when translating the Scala compiler's internal data structures into SemanticDB.

As one of the SemanticDB developers, I will try to compare TASTy and SemanticDB (although my knowledge of TASTy is limited).
Comparison of SemanticDB and TASTy

Type Information

Both TASTy and SemanticDB export type information for symbols. However, SemanticDB encodes Scala types into its own schema, and some information is lost during this conversion. For example, SemanticDB types do not have parent types.
As a result, I have recommended that developers of scalafix rules use WartRemover, which uses TASTy (for Scala 3), instead of SemanticDB. I believe WartRemover can access more information about Scala types, which allows rule authors to write more precise and accurate rules.

example
https://discord.com/channels/632642981228314653/632683673417678860/1042109346781335552
Will Sargent — 11/15/2022 5:10 PM
is there a way in scalafix to see if a particular type is in scope, from whereever?  So if a parent class, lexical scope, or local variable has type Foo then Scalafix can check for that?
the doc.tree will cover everything in the same class, but won't capture subclassing relationships, I think
https://scalacenter.github.io/scalafix/docs/developers/symbol-information.html#lookup-class-parents
I think it's probably better to just not guess

tanishiking — 11/15/2022 5:28 PM
For that purpose, I would recommend https://github.com/wartremover/wartremover and access Scala compiler APIs,
Maybe we can list all parent symbols with the way as you mentioned (as long as the symbols are defined in your workspace), but I think there's no way to check what is imported by import xxx._ from scalafix (SemanticDB) (we can guess that, but it doesn't work in some cases).

bjaglin — 11/15/2022 5:33 PM
you can also access presentation compiler APIs from Scalafix rules, but there is a lot of boilerplate and your rule won't be portable as it needs to be run with the scala version your target files are build against (ExplicitResultTypes and https://github.com/scala/scala-rewrites are using that trick)

Will Sargent — 11/15/2022 8:03 PM
Yeah, I think inferring an inscope logger would be more confusing than useful

Will Sargent — 11/16/2022 6:28 PM
I published it 🙂 https://github.com/tersesystems/echopraxia-scalafix
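
To make the type-information point in this section a little more concrete, here is a minimal sketch of reading class parents from TASTy with the Scala 3 TASTy inspector. It assumes the `scala3-tasty-inspector` artifact is on the classpath and that paths to compiled `.tasty` files are passed on the command line; the names `ParentPrinter` and `printParents` are made up for this illustration and are not part of any tool mentioned in the thread.

```scala
// Assumed build setting (not from the original discussion):
//   libraryDependencies += "org.scala-lang" %% "scala3-tasty-inspector" % scalaVersion.value
import scala.quoted.*
import scala.tasty.inspector.*

// Prints every class definition found in the inspected TASTy files
// together with its base classes (all transitive parents).
class ParentPrinter extends Inspector:
  def inspect(using Quotes)(tastys: List[Tasty[quotes.type]]): Unit =
    import quotes.reflect.*

    object Traverser extends TreeTraverser:
      override def traverseTree(tree: Tree)(owner: Symbol): Unit =
        tree match
          case cd: ClassDef =>
            // baseClasses lists the class itself first, so drop the head.
            val parents = cd.symbol.typeRef.baseClasses.drop(1)
            println(s"${cd.symbol.fullName} <: ${parents.map(_.fullName).mkString(", ")}")
          case _ => ()
        // Keep descending so nested classes are visited as well.
        super.traverseTree(tree)(owner)

    for tasty <- tastys do
      Traverser.traverseTree(tasty.ast)(Symbol.spliceOwner)

// Hypothetical entry point: pass one or more paths to .tasty files.
@main def printParents(tastyFiles: String*): Unit =
  TastyInspector.inspectTastyFiles(tastyFiles.toList)(new ParentPrinter)
```

Because the inspector works on the typed trees and types stored in TASTy, subclass relationships are available for anything whose `.tasty` file can be loaded, which is the kind of question raised in the chat above.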


Synthetic Trees

Both TASTy and SemanticDB contain information about synthetically constructed trees.
However, the information that SemanticDB provides about trees is not exhaustive. The SemanticDB tree data structure is designed to be generic and flexible, and it has only a limited set of tree kinds (https://scalameta.org/docs/semanticdb/specification.html#tree). Not all the trees after typer are available.
(The synthetic trees in SemanticDB are used to show implicit arguments and classes in Metals, IIUC.)
These trees are limited in number, but they are convenient for Metals because they let Metals show implicit applications without traversing the entire typed tree. However, I think the fact that they are not exhaustive makes them unreliable.
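
As a rough illustration of how small the synthetic tree vocabulary is, the sketch below models the tree kinds listed in the SemanticDB specification as plain Scala case classes. This is a simplified model, not the generated protobuf classes: types are reduced to strings, and the synthetic for `List(2, 1).sorted` is a hypothetical, simplified shape rather than actual compiler output.

```scala
// Simplified stand-in for SemanticDB's Range message.
final case class SourceRange(startLine: Int, startCharacter: Int, endLine: Int, endCharacter: Int)

// The tree kinds named in the SemanticDB specification (types simplified to String).
sealed trait Tree
final case class ApplyTree(function: Tree, arguments: List[Tree]) extends Tree
final case class FunctionTree(parameters: List[IdTree], body: Tree) extends Tree
final case class IdTree(symbol: String) extends Tree
final case class LiteralTree(constant: Any) extends Tree
final case class MacroExpansionTree(beforeExpansion: Tree, tpe: String) extends Tree
final case class OriginalTree(range: SourceRange) extends Tree
final case class SelectTree(qualifier: Tree, id: IdTree) extends Tree
final case class TypeApplyTree(function: Tree, typeArguments: List[String]) extends Tree

// Collects the symbols of arguments that a synthetic ApplyTree adds
// around a piece of original source code (for example implicit arguments).
def insertedArguments(tree: Tree): List[String] = tree match
  case ApplyTree(_: OriginalTree, args) => args.collect { case IdTree(sym) => sym }
  case _                                => Nil

@main def syntheticDemo(): Unit =
  // Hypothetical synthetic for `List(2, 1).sorted` on line 0, columns 0 to 17:
  // the original code wrapped in an application of the implicit Ordering.
  val synthetic = ApplyTree(
    OriginalTree(SourceRange(0, 0, 0, 17)),
    List(IdTree("scala/math/Ordering.Int."))
  )
  insertedArguments(synthetic).foreach(println)
```

A tool that only wants to decorate implicit arguments can work from a structure like this directly, but anything outside these few tree kinds cannot be represented, which is the limitation described above.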
Design Principle

The original design principle of SemanticDB was to lower the learning curve for tooling engineers, so that they could develop devtools without having to have deep knowledge of the compiler internals.
Prior to SemanticDB, tooling developers who wanted to traverse the tree and types of a program had to have some knowledge of the compiler internals. SemanticDB makes this process easier by providing a more abstract representation of the program's structure. https://scalameta.org/talks/2018-06-20-HowWeBuiltToolsThatScaleToMillionsOfLoc.pdf
However, I think accessing type information and synthetic trees in SemanticDB still requires some knowledge of the compiler internals. This is because the SemanticDB representation of types is not always the same as the representation used by the compiler. Additionally, the way that synthetic trees are created in SemanticDB is not always obvious.
Overall, I think SemanticDB has made it easier for tooling engineers to develop devtools in some respects, especially around symbol occurrences. For type information and synthetic trees, however, I believe it is still not that easy, and it can be confusing because of the gap between SemanticDB and the compiler internals.
Symbol Occurrences

I totally agree with this comment from @smarter:

Generating semanticdb directly from the compiler is probably still useful for things which are expensive to recompute from tasty like symbol occurences

scalameta/scalameta#2403 (comment)
However, if TASTy could embed the same information, or if we could develop a tool that generates symbol occurrences from TASTy outside the compilation process, that would be enough.
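
For readers unfamiliar with symbol occurrences, the following sketch shows why a precomputed list is cheap to query. It is a simplified model of the `SymbolOccurrence` message from the SemanticDB specification (a range, a symbol string, and a role), not the generated protobuf classes, and the occurrences in the demo are written by hand for a hypothetical three-line file.

```scala
// Simplified stand-ins for SemanticDB's Range and SymbolOccurrence messages.
enum Role:
  case Reference, Definition

final case class SourceRange(startLine: Int, startCharacter: Int, endLine: Int, endCharacter: Int)
final case class SymbolOccurrence(range: SourceRange, symbol: String, role: Role)

// "Find references" is a filter over the precomputed occurrences;
// no typed tree has to be loaded or traversed.
def references(occurrences: Seq[SymbolOccurrence], symbol: String): Seq[SourceRange] =
  occurrences.collect {
    case SymbolOccurrence(range, `symbol`, Role.Reference) => range
  }

@main def occurrencesDemo(): Unit =
  // Hand-written occurrences for this hypothetical source file:
  //   object Main {        // line 0
  //     val x = 1          // line 1
  //     println(x + x)     // line 2
  //   }
  val x = "_empty_/Main.x."
  val occurrences = Seq(
    SymbolOccurrence(SourceRange(1, 6, 1, 7), x, Role.Definition),
    SymbolOccurrence(SourceRange(2, 10, 2, 11), x, Role.Reference),
    SymbolOccurrence(SourceRange(2, 14, 2, 15), x, Role.Reference)
  )
  references(occurrences, x).foreach(println)
```

Recomputing the same list from TASTy would mean walking every typed tree and resolving each identifier to its symbol, which is the cost the quoted comment refers to.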
What's holding toolings back from using TASTy?


Scala 2 doesn't emit TASTy

To support various versions of Scala, we can't use TASTy for tooling at this moment.


Moving from SemanticDB to TASTy is simply a lot of work for tooling developers.

bishabosha

on Sep 7
Maintainer
Also, we currently need to add more position information to TASTy to remain compatible (e.g. there are currently no positions for end markers).
SethTisue

on Sep 12
Collaborator
What's holding toolings back from using TASTy?
Scala 2 doesn't emit TASTy
also Java