https://www.alternetsoft.com/blog/code-parsing-explained

Code Parsing Explained

Explains various approaches to syntax and semantic analysis for C#, Visual Basic, JavaScript, TypeScript, Python and other programming languages.
12 Apr 2021

Syntax parsing is the process of analyzing code to understand its structure and meaning. This allows AlterNET Studio’s code editor to provide advanced features, such as syntax highlighting, auto-completion, code formatting and code outlining, significantly enhancing the coding experience.

In AlterNET Studio, syntax parsing is carried out by specialized SyntaxParser components, which supply the structural information that these code editing features rely on.

Here’s a breakdown of the different parsing methods used in AlterNET Studio:

Generic Parsers

These are the simplest parsers, mainly used for basic syntax highlighting of text. They rely on rules and regular expressions to identify different elements in the code (e.g., keywords, identifiers, comments).

For more information on generic parsers, including creating your own syntax scheme, refer to the Code Editor user guide on our documentation page: Creating Your Own Syntax Scheme.
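To make the rule-based approach concrete, here is a minimal Python sketch of a generic highlighter. The rule table, token class names and keyword list are hypothetical and far simpler than a real syntax scheme. The point is only that each token class is described by a regular expression and that rules are tried in priority order.

```python
import re

# Hypothetical rule set: each rule pairs a token class with a regular expression.
# Order matters: when several rules could match at one position, the first wins.
RULES = [
    ("comment", r"//[^\n]*"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("keyword", r"\b(?:if|else|while|return|var)\b"),
    ("number", r"\b\d+(?:\.\d+)?\b"),
    ("identifier", r"\b[A-Za-z_]\w*\b"),
]

# Combine the rules into one pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def highlight(line: str) -> list[tuple[str, str]]:
    """Return (token_class, text) pairs; unmatched characters stay unstyled."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(line)]


if __name__ == "__main__":
    for kind, text in highlight('var total = 42; // "answer"'):
        print(f"{kind:<10} {text}")
```

Because the comment and string rules come first, a "//" inside a string literal is treated as part of the string rather than as the start of a comment.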

TextMate grammar support

In AlterNET Studio version 9, we’ve upgraded our Generic parser engine to support TextMate language grammars, which power syntax highlighting in Visual Studio Code. This enhancement enables features beyond syntax highlighting, such as automatic brace matching and indentation-based code folding.

By adopting TextMate grammars, we’ve made it possible to use all syntax schemes developed for Visual Studio Code directly within our Code Editor. This significantly expands the range of available syntax highlighting styles and customization options.
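Indentation-based folding, one of the extra features mentioned above, can be derived from leading whitespace alone. The following Python sketch assumes that a fold starts at any line followed by a more deeply indented line and ends just before indentation returns to that level. It illustrates the idea and is not AlterNET Studio’s algorithm.

```python
def fold_regions(lines: list[str], tab_size: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) zero-based line ranges for indentation-based folds.

    A fold starts at a non-blank line whose next non-blank line is indented
    deeper, and ends at the last line before indentation returns to the
    starting level or less. Blank lines are ignored when measuring indentation.
    """

    def indent(text: str) -> int:
        expanded = text.expandtabs(tab_size)
        return len(expanded) - len(expanded.lstrip(" "))

    # (line index, indentation width) for every non-blank line.
    levels = [(i, indent(t)) for i, t in enumerate(lines) if t.strip()]
    regions = []
    for k, (start, level) in enumerate(levels):
        if k + 1 < len(levels) and levels[k + 1][1] > level:
            end = start
            for j, width in levels[k + 1:]:
                if width <= level:
                    break
                end = j
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    source = [
        "def f(x):",
        "    if x:",
        "        return 1",
        "",
        "    return 0",
        "print(f(1))",
    ]
    print(fold_regions(source))  # [(0, 4), (1, 2)]
```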

Read our blog for more information about the TextMate parser.

Advanced Parsers

While similar to generic parsers in their use of finite-state automata for lexical analysis, advanced parsers employ hard-coded routines for improved performance.
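For contrast with the regular-expression rules of generic parsers, here is a Python sketch of a hand-coded finite-state scanner. Each state (identifier, number, string) is an explicit branch, and the transitions are written as ordinary code rather than compiled from a rule table. The token kinds and keyword set are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    IDENT = auto()
    NUMBER = auto()
    STRING = auto()


KEYWORDS = {"if", "else", "while", "return", "var"}  # hypothetical keyword set


def scan(text: str) -> list[tuple[str, str]]:
    """Hand-written lexer: every state and transition is explicit code."""
    tokens: list[tuple[str, str]] = []
    state, start, i = State.START, 0, 0
    while i <= len(text):
        at_end = i == len(text)
        ch = "" if at_end else text[i]
        if state is State.START:
            if at_end:
                break
            start = i
            if ch.isalpha() or ch == "_":
                state = State.IDENT
            elif ch.isdigit():
                state = State.NUMBER
            elif ch == '"':
                state = State.STRING
            elif not ch.isspace():
                tokens.append(("punct", ch))
            i += 1
        elif state is State.IDENT:
            if not at_end and (ch.isalnum() or ch == "_"):
                i += 1
            else:
                word = text[start:i]
                tokens.append(("keyword" if word in KEYWORDS else "identifier", word))
                state = State.START  # re-read ch in START without consuming it
        elif state is State.NUMBER:
            if not at_end and ch.isdigit():
                i += 1
            else:
                tokens.append(("number", text[start:i]))
                state = State.START
        else:  # State.STRING
            if at_end:
                tokens.append(("error", text[start:i]))  # unterminated string
                state = State.START
            else:
                i += 1
                if ch == '"':
                    tokens.append(("string", text[start:i]))
                    state = State.START
    return tokens
```

Reporting an unterminated string as an "error" token is one way a hard-coded scanner can feed error highlighting without a separate pass.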

Parsing for various languages

We’ve developed advanced parsers for a wide range of programming languages, including C#, Visual Basic, Python, Java, JavaScript, SQL, XML, and HTML. These parsers go beyond basic syntax highlighting by analyzing the code’s structure to build an Abstract Syntax Tree (AST). This AST representation enables features like code outlining, syntax guidelines, smart formatting, and visual feedback on syntax errors.
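AlterNET Studio’s parsers are proprietary, so as an illustration of how an AST supports code outlining and syntax-error feedback, the sketch below uses Python’s standard ast module instead (end_lineno requires Python 3.8 or later). Each outline entry lists a class or function with its line span, which is the information an outlining view needs.

```python
import ast
from typing import Optional


def outline(source: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, name, first_line, last_line) for every class and function."""
    items = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            items.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef):
            items.append(("class", node.name, node.lineno, node.end_lineno))
    # ast.walk is breadth-first, so sort by starting line for display order.
    return sorted(items, key=lambda item: item[2])


def syntax_error_location(source: str) -> Optional[tuple[int, int, str]]:
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return err.lineno, err.offset, err.msg
    return None


if __name__ == "__main__":
    code = "class Greeter:\n    def hello(self):\n        return 'hi'\n"
    print(outline(code))  # [('class', 'Greeter', 1, 3), ('function', 'hello', 2, 3)]
```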

Semantic Analysis

Features like IntelliSense (code completion) and finding declarations and references require additional semantic information about the symbols in the text.

For instance, if the code contains a variable declaration like var myString = "text", semantic analysis determines that myString is a variable of type string and links it to the string type symbol, which holds all of that type’s declared methods. This information is then used for tasks like code completion, for example when the user types "myString." in the editor.
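The following Python sketch mirrors the myString example with a Python assignment (my_string = "text"). It stands in for a real symbol table by using the runtime type of the literal and dir() to list that type’s members, and it only handles variables assigned directly from literals. Both simplifications are assumptions made for brevity.

```python
import ast


def infer_variable_types(source: str) -> dict[str, type]:
    """Map each variable assigned directly from a literal to that literal's type."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = type(node.value.value)
    return types


def complete_member(source: str, variable: str) -> list[str]:
    """Suggest public members of the variable's inferred type, as after 'variable.'."""
    var_type = infer_variable_types(source).get(variable)
    if var_type is None:
        return []  # unknown symbol: nothing to suggest
    return sorted(name for name in dir(var_type) if not name.startswith("_"))


if __name__ == "__main__":
    code = 'my_string = "text"'
    print(complete_member(code, "my_string")[:5])
```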

Advanced Code Completion

Some advanced parsers, like those for C# and Visual Basic, support a more sophisticated form of code completion. They can resolve semantic information within a specific scope, such as a statement block or expression, as the user types special characters like “.” or “(”. This provides more contextually relevant suggestions.
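Before any semantic lookup, the parser has to recognize the trigger character and isolate the expression in front of it. The Python sketch below does this with a simple textual heuristic (a regular expression over the text before the caret). A Roslyn-style parser would instead locate the expression in the syntax tree and resolve it semantically. The function name and return values are hypothetical.

```python
import re
from typing import Optional

# An identifier or dotted name (e.g. "Console.WriteLine") ending at the caret.
EXPR_BEFORE = re.compile(r"[A-Za-z_][\w.]*$")


def completion_context(text: str, caret: int) -> Optional[tuple[str, str]]:
    """Classify a completion request at the caret, or return None.

    Returns ("members", expr) right after ".", or ("signature", expr) right
    after "(", where expr is the expression to resolve.
    """
    if caret == 0:
        return None
    trigger = text[caret - 1]
    if trigger not in ".(":
        return None
    # endpos makes "$" match just before the trigger character.
    match = EXPR_BEFORE.search(text, 0, caret - 1)
    if match is None:
        return None
    kind = "members" if trigger == "." else "signature"
    return kind, match.group()
```

Requiring the expression to start with a letter or underscore keeps a decimal point in a number such as 1.5 from triggering member completion.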

Python and IronPython proprietary implementation

For Python and IronPython, we’ve developed our own semantic analysis implementation. Unlike some parsers that focus on a partial scope, our approach builds a semantic model of the entire text displayed in the editor, including imported files. This comprehensive understanding enables more accurate and context-aware code completion.

This semantic analysis implementation was inspired by the Microsoft Code Analysis ("Roslyn") API, which we discuss in more detail later in this article.
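As a rough illustration of a semantic model that spans imported files, the Python sketch below collects top-level names from the current module and from the modules it imports. The in-memory WORKSPACE dictionary is hypothetical and stands in for files on disk, and only plain import and from ... import statements that refer to workspace modules are handled.

```python
import ast

# Hypothetical in-memory workspace: module name -> source text.
WORKSPACE = {
    "helpers": "def slugify(text):\n    return text.lower()\n\nVERSION = '1.0'\n",
    "main": "import helpers\nfrom helpers import slugify\n\ndef run():\n    pass\n",
}


def top_level_symbols(source: str) -> set[str]:
    """Names defined at module level: functions, classes and simple assignments."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names


def build_model(module: str, workspace: dict[str, str]) -> dict[str, set[str]]:
    """Build a small semantic model for `module`.

    Key "" maps to names usable directly in the module; each imported module
    name maps to the members offered after "<module>.".
    """
    source = workspace[module]
    model = {"": top_level_symbols(source)}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in workspace:  # only flat workspace modules
                    bound = alias.asname or alias.name
                    model[""].add(bound)
                    model[bound] = top_level_symbols(workspace[alias.name])
        elif isinstance(node, ast.ImportFrom) and node.module in workspace:
            exported = top_level_symbols(workspace[node.module])
            for alias in node.names:
                if alias.name in exported:
                    model[""].add(alias.asname or alias.name)
    return model
```

With this model, completion after "helpers." can offer slugify and VERSION even though neither is defined in main itself.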

XML code completion implementation

Code completion in the XML parser is implemented by identifying the corresponding XmlSchema type at the current cursor position and suggesting possible input values for elements and attributes. Read our blog for more information about XML code completion.
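The two lookups described above, finding the element at the cursor and asking the schema what it allows there, can be sketched in Python as follows. A small hand-written dictionary replaces a real XmlSchema here, and tags are matched with regular expressions rather than a full XML parser. The schema contents and function names are hypothetical.

```python
import re

# Hypothetical, heavily simplified schema: element -> allowed children and attributes.
# A real implementation would derive this from the XmlSchema type at the cursor.
ROOT_ELEMENTS = ["configuration"]
SCHEMA = {
    "configuration": {"children": ["appSettings", "connectionStrings"], "attributes": []},
    "appSettings": {"children": ["add"], "attributes": []},
    "connectionStrings": {"children": ["add"], "attributes": []},
    "add": {"children": [], "attributes": ["key", "value"]},
}

# Start, end or self-closing tag: groups are ("/" or ""), name, ("/" or "").
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")
# An unfinished start tag at the end of the text, e.g. '<add key="a" '.
OPEN_START_TAG = re.compile(r"<([A-Za-z_][\w.-]*)\s+[^<>]*$")


def enclosing_element(text: str):
    """Return the innermost element still open at the end of text, or None."""
    stack = []
    for closing, name, self_closing in TAG.findall(text):
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
        elif not self_closing:
            stack.append(name)
    return stack[-1] if stack else None


def suggest(text: str, caret: int) -> list[str]:
    """Suggest attribute or child element names for the caret position."""
    before = text[:caret]
    start_tag = OPEN_START_TAG.search(before)
    if start_tag:  # typing inside a start tag: offer its attributes
        return SCHEMA.get(start_tag.group(1), {}).get("attributes", [])
    if before.endswith("<"):  # a new tag is starting: offer allowed children
        parent = enclosing_element(before[:-1])
        if parent is None:
            return ROOT_ELEMENTS
        return SCHEMA.get(parent, {}).get("children", [])
    return []
```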

C# and Visual Basic Roslyn parsers

While we strive to provide comprehensive language support with our own parsers, there’s a clear benefit to utilizing the same methods used by leading development tools like Visual Studio and Visual Studio Code.

This is where Microsoft’s open-source project, the .NET Compiler Platform ("Roslyn"), comes in. Roslyn offers open-source C# and Visual Basic compilers with a powerful code analysis API. By integrating this API in our next-generation C# and Visual Basic parsers, we achieve the same level of parsing accuracy and functionality as those found in native tools.

Leveraging industry-standard APIs for C# and Visual Basic

The Roslyn API encompasses a wide range of capabilities, including syntax highlighting, error diagnostics, building ASTs, a code completion service, finding declarations and references, and much more.

This comprehensive approach ensures the most accurate and feature-rich parsing experience for C# and Visual Basic code within AlterNET Studio.

Unlocking Advanced Functionality

While Roslyn provides a comprehensive set of features, some of them, such as signature help, code fixes, and code refactoring, are implemented internally and are not exposed through its public API, so external applications cannot access them directly.

To overcome these limitations, we’re utilizing Reflection. This technique allows us to dynamically examine and interact with the internal structure of an assembly at runtime. Reflection has proven effective in implementing signature help tooltips and shows promise for unlocking advanced features like code fixes and code refactoring in our C# and Visual Basic parsers.

TypeScript/JavaScript parser

For our TypeScript/JavaScript parsers, we leverage the Microsoft TypeScript API, which closely resembles the Roslyn API used by our C# and Visual Basic parsers. This API provides a rich set of features that are essential for advanced code editing. Many of the APIs we need for features like code completion, smart formatting, code fixes, and refactoring are directly accessible and already integrated into our Code Editor.

LangServer protocol-based parsers

The Language Server Protocol (LSP) serves as a communication bridge between a development tool (the client) and a language-specific intelligence provider (the server). This protocol enables the integration of features like auto-complete, go-to-definition, and find all references within the development tool.

While most tools implement only a subset of the LSP specification, our parsers make use of it to deliver a coding experience comparable to native tools.
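On the wire, the Language Server Protocol exchanges JSON-RPC 2.0 messages, each preceded by a Content-Length header giving the size of the UTF-8 body in bytes and a blank line. The Python sketch below frames and parses such messages and builds a textDocument/completion request. The file URI and position are made-up values, and the sketch does not cover transport (stdio or sockets).

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload as an LSP message: header, blank line, UTF-8 body."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> tuple[dict, bytes]:
    """Parse one framed message from data; return (payload, remaining bytes)."""
    header_end = data.index(b"\r\n\r\n")
    length = None
    for line in data[:header_end].decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    start = header_end + 4
    body = data[start:start + length]
    if len(body) < length:
        raise ValueError("incomplete message body")
    return json.loads(body.decode("utf-8")), data[start + length:]


# A textDocument/completion request; line and character are zero-based.
completion_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py"},  # hypothetical file
        "position": {"line": 3, "character": 10},
    },
}
```

Returning the remaining bytes lets a client decode several messages that arrive in one read from the server’s output stream.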

Server installation options

To use LSP-based parsers, a language server must be installed on the target machine. We offer two options to accommodate different preferences:

  • Existing Server: If a language server is already installed, our parser can leverage it directly.

  • Embedded Server: For users without a pre-installed server, we provide a version of the parser that embeds all necessary language server-related files. This option ensures a seamless setup experience.
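The choice between an existing and an embedded server can be as simple as checking the PATH first and falling back to a bundled copy. In the Python sketch below, the embedded directory layout is an assumption, and a real setup would also need to launch the server (for example over stdio) and handle platform-specific executable names such as a .exe suffix on Windows.

```python
import shutil
from pathlib import Path


def resolve_server_command(executable: str, embedded_dir: Path) -> list[str]:
    """Prefer an installed language server; otherwise fall back to a bundled copy.

    `executable` is the server's command name (for example "pylsp" for
    python-lsp-server); `embedded_dir` is a hypothetical folder shipped with
    the application that contains the embedded server files.
    """
    installed = shutil.which(executable)  # existing server on PATH
    if installed is not None:
        return [installed]
    bundled = embedded_dir / executable  # embedded server
    if bundled.exists():
        return [str(bundled)]
    raise FileNotFoundError(f"no {executable!r} on PATH or in {embedded_dir}")
```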

Expanding language support

We currently offer LSP-based parsers for C/C++, Python, Lua, PowerShell, XML, Java, and R. These parsers are ready to use within AlterNET Studio. We’re also actively exploring Language Server Protocol support for other languages, which will further expand the range of languages that can benefit from advanced code editing features.