Lexer (Lexer) - NetBeans API Javadoc (Current Development Version)

org.netbeans.modules.lexer/2 1.19.0 1

org.netbeans.spi.lexer
Interface Lexer<T extends TokenId>

public interface Lexer<T extends TokenId>

Lexer reads input characters from LexerInput and groups them into tokens.
The lexer delegates token creation to TokenFactory.createToken(TokenId). The token factory instance should be passed to the lexer in its constructor.

The lexer must be able to express its internal lexing state at token boundaries, and it must be able to restart lexing from such a state.
It is expected that if the input characters following the restart point do not change, the lexer will return the same tokens regardless of whether it was restarted at the restart point or run from the beginning of the input as a batch lexer.

Testing of the lexers:
A newly written lexer can be tested in several ways. The simplest is to test batch lexing first (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerBatchTest in the lexer module tests).
Then the "incremental" behavior of the new lexer can be tested (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerIncTest).
Finally, the lexer can be tested by random tests that randomly insert and remove characters from the document (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerRandomTest).
Once these tests pass, the lexer can be considered stable.

Method Summary

Token<T> nextToken()
          Return a token based on characters of the input and possibly additional input properties.

void release()
          Infrastructure calls this method when it no longer needs this lexer for lexing so it becomes unused.

Object state()
          Called by the lexer infrastructure to obtain the lexer's current state once the lexer has recognized and returned a token.

Method Detail

nextToken

Token<T> nextToken()

Return a token based on characters of the input and possibly additional input properties.
Characters can be read using the LexerInput.read() method. Once the lexer knows that it has read enough characters to recognize a token, it calls TokenFactory.createToken(TokenId) to obtain a Token instance and then returns it.

Note: Lexer must *not* return any other Token instances than those obtained from the TokenFactory.

The lexer is required to tokenize all the characters (except EOF) provided by the LexerInput before returning null from this method. Failing to do so is treated as a malfunction of the lexer.

Returns:
the token recognized by the lexer, or null if there are no more characters (available in the input) to be tokenized.
Return TokenFactory.SKIP_TOKEN if the token should be skipped because of a token filter.
Throws:
IllegalStateException - if the token instance created by the lexer was not created by the methods of TokenFactory (there is a common superclass for those token implementations).
IllegalStateException - if this method returns null but not all the characters of the lexer input were tokenized.
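
The read-then-return loop described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the NetBeans API: Input, WordLexer, Kind and tokenize are hypothetical stand-ins (Input only loosely mimics the read/backup style of LexerInput), and tokens are plain strings rather than Token instances obtained from a TokenFactory. It shows a nextToken() that consumes every character of the input before it returns null.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; none of these are NetBeans classes.
final class NextTokenSketch {
    enum Kind { WORD, WHITESPACE, OTHER }

    /** Hypothetical character source offering read() and backup(). */
    static final class Input {
        static final int EOF = -1;
        private final String text;
        private int pos;

        Input(String text) { this.text = text; }

        /** Returns the next character, or EOF once the text is exhausted. */
        int read() { return pos < text.length() ? text.charAt(pos++) : EOF; }

        /** Un-reads the given number of characters. */
        void backup(int count) { pos -= count; }

        int position() { return pos; }
    }

    static final class WordLexer {
        private final String source;
        private final Input in;

        WordLexer(String source) {
            this.source = source;
            this.in = new Input(source);
        }

        /** Returns "KIND:text" for the next token, or null when no characters remain. */
        String nextToken() {
            int start = in.position();
            int c = in.read();
            if (c == Input.EOF) {
                return null; // every character has been tokenized
            }
            Kind kind = classify(c);
            // Extend the token while the following characters are of the same kind.
            while ((c = in.read()) != Input.EOF) {
                if (classify(c) != kind) {
                    in.backup(1); // the character belongs to the next token
                    break;
                }
            }
            return kind + ":" + source.substring(start, in.position());
        }

        private static Kind classify(int c) {
            if (Character.isLetterOrDigit(c)) return Kind.WORD;
            if (Character.isWhitespace(c)) return Kind.WHITESPACE;
            return Kind.OTHER;
        }
    }

    /** Runs the lexer in batch mode and collects all tokens. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        WordLexer lexer = new WordLexer(source);
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

In a real implementation the string returned here would instead be a Token created through the TokenFactory, as required by the note above.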

state

Object state()

This method is called by the lexer infrastructure to obtain the lexer's current state once the lexer has recognized and returned a token.
In a mutable environment this method is called after each recognized token, and its result is paired (together with the token's lookahead) with the token for later use, when the lexer needs to be restarted at the token boundary.

If the lexer is in no extra state (it is in the default state) it should return null. Most lexers stay in the default state at all times.
If possible, non-default lexer states should be expressed as small non-negative integers. The LexerInput.integerState(int) can be used for convenience.
There is an optimization that shrinks the storage cost of small java.lang.Integer values to single bytes.

The returned value should not be tied to this particular lexer instance in any way. Another lexer instance may be restarted from this state later.

Returns:
a valid state object, or null if the lexer is in the default state.
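
As a rough illustration of the state contract, the sketch below uses a hypothetical QuoteLexer that is not part of the NetBeans API. It assumes a toy language in which a double quote toggles between plain text and string content, reports the non-default state as the small Integer 1, and reports the default state as null. The helper restartMatchesBatch (also an assumption made for this example) checks that a fresh lexer instance restarted at any token boundary with the saved state yields the same remaining tokens as a batch run.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; none of these are NetBeans classes.
final class StateSketch {
    /** Hypothetical non-default state, expressed as a small non-negative integer. */
    static final Integer IN_STRING = 1;

    static final class QuoteLexer {
        private final String text;
        private int pos;
        private boolean inString;

        /** A null state means the default state. */
        QuoteLexer(String text, int startOffset, Object state) {
            this.text = text;
            this.pos = startOffset;
            this.inString = IN_STRING.equals(state);
        }

        /** Returns "KIND:text" for the next token, or null at the end of the input. */
        String nextToken() {
            if (pos >= text.length()) {
                return null;
            }
            int start = pos;
            if (text.charAt(pos) == '"') {
                pos++;
                inString = !inString; // a quote toggles the string mode
                return "QUOTE:\"";
            }
            while (pos < text.length() && text.charAt(pos) != '"') {
                pos++;
            }
            return (inString ? "STRING:" : "TEXT:") + text.substring(start, pos);
        }

        /** The state is independent of this instance: null or a shared small Integer. */
        Object state() { return inString ? IN_STRING : null; }

        int offset() { return pos; }
    }

    /** Lexes the text from the given offset, starting in the given state. */
    static List<String> lexFrom(String text, int offset, Object state) {
        List<String> tokens = new ArrayList<>();
        QuoteLexer lexer = new QuoteLexer(text, offset, state);
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    /** Checks that restarting at every token boundary reproduces the batch tokens. */
    static boolean restartMatchesBatch(String text) {
        List<String> batch = lexFrom(text, 0, null);
        QuoteLexer lexer = new QuoteLexer(text, 0, null);
        for (int i = 0; i < batch.size(); i++) {
            lexer.nextToken();
            List<String> resumed = lexFrom(text, lexer.offset(), lexer.state());
            if (!resumed.equals(batch.subList(i + 1, batch.size()))) {
                return false;
            }
        }
        return true;
    }
}
```

Restarting inside the quoted region without the saved state would classify the string content as plain text, which is why the state has to be stored alongside each token.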

release

void release()

Infrastructure calls this method when it no longer needs this lexer for lexing so it becomes unused.
If lexer instances are cached and reused later then this method should first release all the references that might cause memory leaks and then add this unused lexer to the cache.
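
One possible shape for such a cache is sketched below. PooledLexer, obtain, holdsInput and cachedCount are hypothetical names invented for this example and are not part of the NetBeans API. The sketch only illustrates the order suggested above: release() first clears the references that could keep a document reachable, and only then puts the instance back into the cache.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for illustration only; this is not a NetBeans class.
final class ReleaseSketch {
    static final class PooledLexer {
        private static final Deque<PooledLexer> CACHE = new ArrayDeque<>();

        private CharSequence input; // could pin a large document in memory
        private int pos;

        private PooledLexer() {}

        /** Reuses a cached instance if one is available, otherwise creates a new one. */
        static PooledLexer obtain(CharSequence input) {
            PooledLexer lexer;
            synchronized (CACHE) {
                lexer = CACHE.pollFirst();
            }
            if (lexer == null) {
                lexer = new PooledLexer();
            }
            lexer.input = input;
            lexer.pos = 0;
            return lexer;
        }

        /** Drops references first, then returns this instance to the cache. */
        void release() {
            input = null;
            pos = 0;
            synchronized (CACHE) {
                CACHE.addFirst(this);
            }
        }

        boolean holdsInput() { return input != null; }

        static int cachedCount() {
            synchronized (CACHE) {
                return CACHE.size();
            }
        }
    }
}
```

The synchronization here is a conservative choice for the sketch; whether it is needed depends on how the surrounding infrastructure creates and releases lexers.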