Lexer - NetBeans Architecture Questions - NetBeans API Javadoc (Current Development Version)

NetBeans Architecture Answers for Lexer module


Interfaces table

Group of java interfaces

• LexerAPI — Exported, Official

• EditorUtilitiesAPI — Imported, Friend (.../overview-summary.html)

  The module is needed for compilation. The module is used during runtime. Specification version 1.15 is required.

• UtilitiesAPI — Imported, Official (../org-openide-util/overview-summary.html)

  The module is needed for compilation. The module is used during runtime. Specification version 6.4 is required.

Group of logger interfaces

• org.netbeans.lib.lexer.TokenHierarchyOperation — Exported, Friend

  The FINE level lists lexer changes made to tokens at both the root level and the embedded levels of the token hierarchy after each document modification.
  The FINER level will in addition check the whole token hierarchy for internal consistency after each modification.

Group of property interfaces

• netbeans.debug.lexer.test — Exported, Private

  System property that enables a deep compare of token lists: the framework generates and maintains lookahead and state information even for batch-lexed inputs. Additional checks verify the correctness of the framework and of the SPI implementation classes in use (for example, when a flyweight token is created the text passed to the token factory is compared with the text in the lexer input).


General Information

    Question (arch-what): What is this project good for?

    Answer: The Lexer module provides token lists for various text inputs. Token lists can either be flat, or they can form tree token hierarchies if any language embedding is present.

    Question (arch-overall): Describe the overall architecture.

    Answer: The lexer module defines LexerAPI, providing access to sequences of tokens for various input sources.
    The API entry point is the TokenHierarchy class, whose static methods provide an instance for a given input source.

    Input Sources

    TokenHierarchy can be created for immutable input sources (CharSequence or java.io.Reader) or for mutable input sources (typically javax.swing.text.Document).
    For mutable input sources the lexer framework updates the tokens in the token hierarchy automatically as the underlying text input changes. The tokens of the hierarchy always reflect the text of the input at the given time.

    TokenSequence and Token

    TokenHierarchy.tokenSequence() allows iteration over a list of Token instances.
    A token carries a token identification TokenId (returned by Token.id()) and a text (aka token body) represented as a CharSequence (returned by Token.text()).
    TokenUtilities contains many useful methods for operating on a token's text, such as TokenUtilities.equals(CharSequence text1, CharSequence text2), TokenUtilities.startsWith(CharSequence text, CharSequence prefix), etc.
    It is also possible to obtain a debug form of the token's text (with special characters replaced by escapes) via TokenUtilities.debugText(CharSequence text).
    A typical token also carries the offset of its occurrence in the input text.
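
    For illustration, a minimal sketch of iterating tokens and inspecting their text (the creation of the token hierarchy is shown in the use cases below):
        TokenSequence ts = tokenHierarchy.tokenSequence();
        while (ts.moveNext()) {
            Token t = ts.token();
            // Compare the token's text without creating String instances
            if (TokenUtilities.equals(t.text(), "while")) { ... }
            // Print the token's text with special characters escaped
            System.out.println(TokenUtilities.debugText(t.text()));
        }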

    Flyweight Tokens

    As there are many token occurrences where the token text is the same for all or many occurrences (e.g. java keywords, operators or a single-space whitespace), memory consumption can be decreased considerably by allowing the creation of flyweight token instances, i.e. just one token instance is used for all the token's occurrences in all inputs.
    Flyweight tokens can be recognized by Token.isFlyweight().
    Flyweight tokens do not carry a valid offset (their internal offset is -1).
    Therefore TokenSequence is used for iteration through the tokens (instead of a regular iterator), and it provides TokenSequence.offset(), which returns the proper offset even when positioned over a flyweight token.
    When holding a reference to a token instance, its offset can also be determined by Token.offset(TokenHierarchy tokenHierarchy). The tokenHierarchy parameter is usually null, which returns the offset for the "live" token hierarchy (a snapshot token hierarchy may be passed as well).
    For flyweight tokens Token.offset(TokenHierarchy tokenHierarchy) returns -1; for regular tokens it returns the same value as TokenSequence.offset().

    There may be applications where the use of flyweight tokens could be problematic. For example, if a parser wants to store token instances in parse tree nodes to determine the nodes' boundaries, the flyweight tokens would always return offset -1, so the positions of the parse tree nodes could not generally be determined from the tokens alone.
    Therefore a token can be de-flyweighted by using TokenSequence.offsetToken(), which checks the current token and, if it is flyweight, replaces it with a non-flyweight token instance with a valid offset and the same properties as the original flyweight token.
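
    For illustration, a minimal sketch of de-flyweighting during iteration (the parse-tree calls are hypothetical):
        while (ts.moveNext()) {
            // Replace a possibly flyweight token by an equivalent token with a valid offset
            Token t = ts.offsetToken();
            parseTreeNode.addToken(t); // hypothetical parser code retaining the token
        }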

    TokenId and Language

    A token is identified by its id, represented by the TokenId interface. Token ids for a language are typically implemented as java enums (extensions of Enum), but this is not mandatory.
    All token ids for the given language are described by a Language.
    Each token id may belong to one or more token categories, which make it easier to operate on tokens of the same type (e.g. keywords or operators).
    Each token id may define its primary category via TokenId.primaryCategory(), and LanguageHierarchy.createTokenCategories() may provide additional categories for the token ids of the given language.
    Each language description has a mandatory mime-type specification, Language.mimeType().
    Although somewhat unrelated information, the mime-type brings many benefits because it allows the language to be accompanied by an arbitrary sort of settings (e.g. syntax coloring information).
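
    For example, a client might query the categories of a language (a sketch; the category-query methods are those of the Language class, and JavaLanguage is the example language used elsewhere in this document):
        Language language = JavaLanguage.description();
        // Names of all token categories defined for the language
        Set categories = language.tokenCategories();
        // All token ids belonging to the "keyword" category
        Collection keywords = language.tokenCategoryMembers("keyword");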

    LanguageHierarchy, Lexer, LexerInput and TokenFactory

    SPI providers wishing to provide a Language first need to define its SPI counterpart, LanguageHierarchy. It mainly needs to define the token ids in LanguageHierarchy.createTokenIds() and the lexer in LanguageHierarchy.createLexer(LexerRestartInfo info), as shown in the SPI use cases below.
    Lexer reads characters from LexerInput and breaks the text into tokens.
    Tokens are produced by using the methods of TokenFactory.
    As per-token memory consumption is critical, Token does not have any counterpart in the SPI. The framework, however, prevents instantiation of any token classes other than those contained in the lexer module's implementation.

    Language Embedding

    With language embedding the flat list of tokens becomes in fact a tree-like hierarchy represented by the TokenHierarchy class. Each token can potentially be broken into a sequence of embedded tokens.
    The TokenSequence.embedded() method can be called to obtain the embedded tokens (when positioned on the branch token).
    There are two ways of specifying what language is embedded in a token. The language can either be specified explicitly (hardcoded) in the LanguageHierarchy.embedding() method or there can be a LanguageProvider registered in the default Lookup, which will create a Language for the embedded language.
    There is no limit on the depth of a language hierarchy and there can be as many embedded languages as needed.
    In SPI the language embedding is represented by LanguageEmbedding.
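
    A minimal sketch of a statically specified embedding (the token id and the embedded language are assumptions for illustration):
    @Override
    protected LanguageEmbedding<?> embedding(Token<MyTokenId> token,
    LanguagePath languagePath, InputAttributes inputAttributes) {
        if (token.id() == MyTokenId.STRING_LITERAL) { // hypothetical token id
            // Skip the opening and closing quote characters of the token's text;
            // "false" means embedded sections of neighboring tokens are not joined
            return LanguageEmbedding.create(StringLanguage.description(), 1, 1, false);
        }
        return null; // no embedding for other tokens
    }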

    Token Hierarchy Snapshots

    A token hierarchy allows a snapshot of itself to be created at any given time by using TokenHierarchy.createSnapshot().
    Subsequent modifications to the "live" token hierarchy will not affect the tokens of the snapshot.
    Snapshot creation is cheap because the snapshot initially shares all the tokens with the live hierarchy. As modifications occur, the snapshot maintains the initial and ending areas (where the tokens are still shared between the snapshot and the live hierarchy); in the middle it retains the tokens originally present in the live token hierarchy.

    Question (arch-usecases): Describe the main use cases of the new API. Who will use it under what circumstances? What kind of code would typically need to be written to use the module?

    Answer:

    API Usecases

    Obtaining a token hierarchy for various inputs.

    The TokenHierarchy is the entry point into the Lexer API; it represents the given input in terms of tokens.
        String text = "public void m() { }";
        TokenHierarchy hi = TokenHierarchy.create(text, JavaLanguage.description());
    

    A token hierarchy for a swing document must be accessed under the document's read (or write) lock.
        document.readLock();
        try {
            TokenHierarchy hi = TokenHierarchy.get(document);
            ... // explore tokens etc.
        } finally {
            document.readUnlock();
        }
    

    Obtaining and iterating a token sequence over a particular swing document from a given offset.

    The tokens cover the whole document, and it is possible to iterate either forward or backward.
    Each token can contain a language embedding that can also be explored by the token sequence. The language embedding covers the whole text of the token (a few characters may be skipped at the beginning and end of the branch token).
        document.readLock();
        try {
            TokenHierarchy hi = TokenHierarchy.get(document);
            TokenSequence ts = hi.tokenSequence();
            // If necessary move ts to the requested offset
            ts.move(offset);
            while (ts.moveNext()) {
                Token t = ts.token();
                if (t.id() == ...) { ... }
                if (TokenUtilities.equals(t.text(), "mytext")) { ... }
                if (ts.offset() == ...) { ... }
    
                // Possibly retrieve embedded token sequence
                TokenSequence embedded = ts.embedded();
                if (embedded != null) { // Token has a valid language embedding
                    ...
                }
            }
        } finally {
            document.readUnlock();
        }
    

    Typical clients:
    • Editor's painting code doing syntax coloring: org.netbeans.modules.lexer.editorbridge.LexerLayer in the lexer/editorbridge module.
    • Brace matching code searching for matching brace in forward/backward direction.
    • Code completion's quick check whether caret is located inside comment token.
    • Parser constructing a parse tree iterating through the tokens in forward direction.

    Using language path of the token sequence

    For the given token sequence the client may check whether it is a top-level token sequence in the token hierarchy or whether it is embedded, at which level it is embedded, and what the parent languages are.
        TokenSequence ts = ...
        LanguagePath lp = ts.languagePath();
        if (lp.size() > 1) { ... } // This is embedded token sequence
        if (lp.topLanguage() == JavaLanguage.description()) { ... } // top-level language of the token hierarchy
        String mimePath = lp.mimePath();
        Object settingValue = someSettings.getSetting(mimePath, settingName); // hypothetical settings lookup
    

    Creating a token hierarchy snapshot for a token hierarchy over a mutable input

    Token hierarchy snapshots allow a snapshot of the token hierarchy to be created at a given time, with a guarantee that it will not be affected by subsequent modifications of the input text.
    A token snapshot is represented as a TokenHierarchy instance, so creating a token sequence etc. works the same way as for a regular token hierarchy.
        private TokenHierarchy snapshot;
        
        document.readLock();
        try {
            TokenHierarchy hi = TokenHierarchy.get(document);
            snapshot = hi.createSnapshot();
            // Possible later modifications will not affect tokens of the snapshot
        } finally {
            document.readUnlock();
        }
    
        ...
    
        document.readLock();
        try {
            TokenSequence ts = snapshot.tokenSequence();
            ...
        } finally {
            document.readUnlock();
        }
    

    Typical clients:
    • Parser constructing a parse tree. The parser may retain "last good snapshot" for the edited file - a snapshot when it was possible to parse the document without errors.

    Extra information about the input

    The InputAttributes class may carry extra information about the text input on which the token hierarchy is being created. For example there can be information about the version of the language that the input represents and the lexer may be written to recognize multiple versions of the language. It should suffice to do the versioning through a simple integer:
    public class MyLexer implements Lexer<MyTokenId> {
        
        private final int version;
        
        ...
        
        public MyLexer(LexerInput input, TokenFactory<MyTokenId> tokenFactory, Object state,
        LanguagePath languagePath, InputAttributes inputAttributes) {
            ...
            
            Integer ver = (inputAttributes != null)
                    ? (Integer)inputAttributes.getValue(languagePath, "version")
                    : null;
            this.version = (ver != null) ? ver.intValue() : 1; // Use version 1 if not specified explicitly
        }
        
        public Token<MyTokenId> nextToken() {
            ...
            if (recognizedAssertKeyword) { // pseudocode: the "assert" keyword text was just read
                return (version >= 4) // "assert" recognized as a keyword since version 4
                        ? keyword(MyTokenId.ASSERT)
                        : identifier();
            }
            ...
        }
        ...
    }
    
    The client will then use the following code:
        InputAttributes attrs = new InputAttributes();
        // The "true" means global value i.e. for any occurrence of the MyLanguage including embeddings
        attrs.setValue(MyLanguage.description(), "version", Integer.valueOf(3), true);
        TokenHierarchy hi = TokenHierarchy.create(text, false, MyLanguage.description(), null, attrs);
        ...
    

    Filtering out unnecessary tokens

    Filtering is only possible for immutable inputs (e.g. String or Reader).
        Set<MyTokenId> skipIds = EnumSet.of(MyTokenId.COMMENT, MyTokenId.WHITESPACE);
        TokenHierarchy tokenHierarchy = TokenHierarchy.create(inputText, false,
            MyLanguage.description(), skipIds, null);
        ...
    

    Typical clients:
    • Parser constructing a parse tree. It is not interested in the comment and whitespace tokens so these tokens do not need to be constructed at all.

    SPI Usecases

    Providing language description and lexer.

    Token ids should be defined as enums. For example, org.netbeans.lib.lexer.test.simple.SimpleTokenId can be copied, or see the following example from org.netbeans.modules.lexer.editorbridge.calc.lang.CalcTokenId.
    The static language() method returns the Language describing the token ids.
    public enum CalcTokenId implements TokenId {
    
        WHITESPACE(null, "whitespace"),
        SL_COMMENT(null, "comment"),
        ML_COMMENT(null, "comment"),
        E("e", "keyword"),
        PI("pi", "keyword"),
        IDENTIFIER(null, null),
        INT_LITERAL(null, "number"),
        FLOAT_LITERAL(null, "number"),
        PLUS("+", "operator"),
        MINUS("-", "operator"),
        STAR("*", "operator"),
        SLASH("/", "operator"),
        LPAREN("(", "separator"),
        RPAREN(")", "separator"),
        ERROR(null, "error"),
        ML_COMMENT_INCOMPLETE(null, "comment");
    
    
        private final String fixedText;
    
        private final String primaryCategory;
    
        private CalcTokenId(String fixedText, String primaryCategory) {
            this.fixedText = fixedText;
            this.primaryCategory = primaryCategory;
        }
        
        public String fixedText() {
            return fixedText;
        }
    
        public String primaryCategory() {
            return primaryCategory;
        }
    
        private static final Language<CalcTokenId> language = new LanguageHierarchy<CalcTokenId>() {
            @Override
            protected Collection<CalcTokenId> createTokenIds() {
                return EnumSet.allOf(CalcTokenId.class);
            }
            
            @Override
            protected Map<String,Collection<CalcTokenId>> createTokenCategories() {
                Map<String,Collection<CalcTokenId>> cats = new HashMap<String,Collection<CalcTokenId>>();
    
                // Incomplete literals 
                cats.put("incomplete", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));
                // Additional literals being a lexical error
                cats.put("error", EnumSet.of(CalcTokenId.ML_COMMENT_INCOMPLETE));
                
                return cats;
            }
    
            @Override
            protected Lexer<CalcTokenId> createLexer(LexerRestartInfo<CalcTokenId> info) {
                return new CalcLexer(info);
            }
    
            @Override
            protected String mimeType() {
                return "text/x-calc";
            }
            
        }.language();
    
        public static final Language<CalcTokenId> language() {
            return language;
        }
    
    }
    
    Note that the underlying LanguageHierarchy extension does not need to be published.
    Lexer example:
    public final class CalcLexer implements Lexer<CalcTokenId> {
    
        private static final int EOF = LexerInput.EOF;
    
        private static final Map<String,CalcTokenId> keywords = new HashMap<String,CalcTokenId>();
        static {
            keywords.put(CalcTokenId.E.fixedText(), CalcTokenId.E);
            keywords.put(CalcTokenId.PI.fixedText(), CalcTokenId.PI);
        }
        
        private LexerInput input;
        
        private TokenFactory<CalcTokenId> tokenFactory;
    
        CalcLexer(LexerRestartInfo<CalcTokenId> info) {
            this.input = info.input();
            this.tokenFactory = info.tokenFactory();
            assert (info.state() == null); // passed argument always null
        }
        
        public Token<CalcTokenId> nextToken() {
            while (true) {
                int ch = input.read();
                switch (ch) {
                    case '+':
                        return token(CalcTokenId.PLUS);
    
                    case '-':
                        return token(CalcTokenId.MINUS);
    
                    case '*':
                        return token(CalcTokenId.STAR);
    
                    case '/':
                        switch (input.read()) {
                            case '/': // in single-line comment
                                while (true)
                                    switch (input.read()) {
                                        case '\r': input.consumeNewline(); // fall through
                                        case '\n':
                                        case EOF:
                                            return token(CalcTokenId.SL_COMMENT);
                                    }
                            case '*': // in multi-line comment
                                while (true) {
                                    ch = input.read();
                                    while (ch == '*') {
                                        ch = input.read();
                                        if (ch == '/')
                                            return token(CalcTokenId.ML_COMMENT);
                                        else if (ch == EOF)
                                            return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
                                    }
                                    if (ch == EOF)
                                        return token(CalcTokenId.ML_COMMENT_INCOMPLETE);
                                }
                        }
                        input.backup(1);
                        return token(CalcTokenId.SLASH);
    
                    case '(':
                        return token(CalcTokenId.LPAREN);
    
                    case ')':
                        return token(CalcTokenId.RPAREN);
    
                    case '0': case '1': case '2': case '3': case '4':
                    case '5': case '6': case '7': case '8': case '9':
                    case '.':
                        return finishIntOrFloatLiteral(ch);
    
                    case EOF:
                        return null;
    
                    default:
                        if (Character.isWhitespace((char)ch)) {
                            ch = input.read();
                            while (ch != EOF && Character.isWhitespace((char)ch)) {
                                ch = input.read();
                            }
                            input.backup(1);
                            return token(CalcTokenId.WHITESPACE);
                        }
    
                        if (Character.isLetter((char)ch)) { // identifier or keyword
                            while (true) {
                                if (ch == EOF || !Character.isLetter((char)ch)) {
                                    input.backup(1); // backup the extra char (or EOF)
                                    // Check for keywords
                                    CalcTokenId id = keywords.get(input.readText());
                                    if (id == null) {
                                        id = CalcTokenId.IDENTIFIER;
                                    }
                                    return token(id);
                                }
                                ch = input.read(); // read next char
                            }
                        }
    
                        return token(CalcTokenId.ERROR);
                }
            }
        }
    
        public Object state() {
            return null;
        }
    
        private Token<CalcTokenId> finishIntOrFloatLiteral(int ch) {
            boolean floatLiteral = false;
            boolean inExponent = false;
            while (true) {
                switch (ch) {
                    case '.':
                        if (floatLiteral) {
                            return token(CalcTokenId.FLOAT_LITERAL);
                        } else {
                            floatLiteral = true;
                        }
                        break;
                    case '0': case '1': case '2': case '3': case '4':
                    case '5': case '6': case '7': case '8': case '9':
                        break;
                    case 'e': case 'E': // exponent part
                        if (inExponent) {
                            return token(CalcTokenId.FLOAT_LITERAL);
                        } else {
                            floatLiteral = true;
                            inExponent = true;
                        }
                        break;
                    default:
                        input.backup(1);
                        return token(floatLiteral ? CalcTokenId.FLOAT_LITERAL
                                : CalcTokenId.INT_LITERAL);
                }
                ch = input.read();
            }
        }
        
        private Token<CalcTokenId> token(CalcTokenId id) {
            return (id.fixedText() != null)
                ? tokenFactory.getFlyweightToken(id, id.fixedText())
                : tokenFactory.createToken(id);
        }
    
    }
    

    The classes containing token ids and the language description should be part of an API. The lexer should only be part of the implementation.

    Providing language embedding.

    The embedding may be provided statically in LanguageHierarchy.embedding(); see e.g. org.netbeans.lib.lexer.test.simple.SimpleLanguage.

    Or it may be provided dynamically through the xml layer, by using a file named after the token id with an ".instance" suffix, located in the "Editors/language-mime-type/embed" folder. The file should instantiate the language description of the embedded language.
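
    A LanguageProvider found in the default Lookup may look as follows (a minimal sketch; the class and mime-type names are assumptions):
    public final class MyLanguageProvider extends LanguageProvider {

        public Language<?> findLanguage(String mimeType) {
            // Create the language for the mime type served by this provider
            return "text/x-my".equals(mimeType) ? MyLanguage.description() : null;
        }

        public LanguageEmbedding<?> findLanguageEmbedding(Token<?> token,
        LanguagePath languagePath, InputAttributes inputAttributes) {
            return null; // this provider does not create dynamic embeddings
        }
    }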

    Question (arch-time): What are the time estimates of the work?

    Answer: The present implementation is stable, but there are a few missing implementations and other things to be considered:
    • Dynamic language embedding binding through xml layer.
    • CharPreprocessor servicing and tests.
    • TokenHierarchy snapshots correctness (will provide random unit tests; one unit test is currently failing).
    • Token hierarchy for Reader.
    • TokenFactory.createBranchToken() impl.
    • Providing JavaCC and Antlr support.
    • Support for token positions (may add API).

    Question (arch-quality): How will the quality of your code be tested and how are future regressions going to be prevented?

    Answer: The lexer module is completely unit-testable.
    Besides tests of its own correctness, it also contains support for testing the correctness of lexers from SPI providers by using the org.netbeans.lib.lexer.test.TestRandomModify class.
    The main testing method for lexer correctness is a token-by-token comparison of the incrementally updated token sequence with a batch-lexed token sequence for the same input.

    Question (arch-where): Where one can find sources for your module?

    Answer:

    The sources for the module are in NetBeans CVS in lexer directory.


Project and platform dependencies

    Question (dep-nb): What other NetBeans projects and modules does this one depend on?

    Answer:

    These modules are required in project.xml:

    • EditorUtilitiesAPI - The module is needed for compilation. The module is used during runtime. Specification version 1.15 is required.
    • UtilitiesAPI - The module is needed for compilation. The module is used during runtime. Specification version 6.4 is required.

    Question (dep-non-nb): What other projects outside NetBeans does this one depend on?

    Answer: No other projects.

    Question (dep-platform): On which platforms does your module run? Does it run in the same way on each?

    Answer: All platforms.

    Question (dep-jre): Which version of JRE do you need (1.2, 1.3, 1.4, etc.)?

    Answer: JDK1.4 and higher can be used.

    Question (dep-jrejdk): Do you require the JDK or is the JRE enough?

    Answer: JRE is sufficient.

Deployment

    Question (deploy-jar): Do you deploy just module JAR file(s) or other files as well?

    Answer: No additional files.

    Question (deploy-nbm): Can you deploy an NBM via the Update Center?

    Answer: Yes.

    Question (deploy-shared): Do you need to be installed in the shared location only, or in the user directory only, or can your module be installed anywhere?

    Answer: Anywhere.

    Question (deploy-packages): Are packages of your module made inaccessible by not declaring them public?

    Answer: Yes, where appropriate.

    Question (deploy-dependencies): What do other modules need to do to declare a dependency on this one, in addition to or instead of the normal module dependency declaration (e.g. tokens to require)?

    Answer:
    OpenIDE-Module-Module-Dependencies: org.netbeans.modules.lexer/2 > @SPECIFICATION-VERSION@
    

Compatibility with environment

    Question (compat-i18n): Is your module correctly internationalized?

    Answer: Yes.

    Question (compat-standards): Does the module implement or define any standards? Is the implementation exact or does it deviate somehow?

    Answer: Compatible with standards.

    Question (compat-version): Can your module coexist with earlier and future versions of itself? Can you correctly read all old settings? Will future versions be able to read your current settings? Can you read or politely ignore settings stored by a future version?

    Answer: Yes.

    Question (compat-deprecation): How the introduction of your project influences functionality provided by previous version of the product?

    Answer:

    The current API completely replaces the original one; therefore the major version of the module was increased from 1 to 2.
    There are no plans to deprecate any part of the present API, and it should be evolved in a compatible way.


Access to resources

    Question (resources-file): Does your module use java.io.File directly?

    Answer: No.

    Question (resources-layer): Does your module provide own layer? Does it create any files or folders in it? What it is trying to communicate by that and with which components?

    Answer: No.

    Question (resources-read): Does your module read any resources from layers? For what purpose?

    Answer: No.

    Question (resources-mask): Does your module mask/hide/override any resources provided by other modules in their layers?

    Answer: No.

    Question (resources-preferences): Does your module use preferences via the Preferences API? Does your module use NbPreferences or regular JDK Preferences? Does it read, write, or both? Does it share preferences with other modules? If so, then why?

    Answer:

    No.


Lookup of components


Execution Environment

    Question (exec-property): Is execution of your code influenced by any environment or Java system (System.getProperty) property? On a similar note, is there something interesting that you pass to java.util.logging.Logger? Or do you observe what others log?

    Answer:
    • org.netbeans.lib.lexer.TokenHierarchyOperation — the FINE level lists lexer changes made to tokens at both the root level and the embedded levels of the token hierarchy after each document modification. The FINER level will in addition check the whole token hierarchy for internal consistency after each modification.
    • netbeans.debug.lexer.test — system property that enables a deep compare of token lists: the framework generates and maintains lookahead and state information even for batch-lexed inputs. Additional checks verify the correctness of the framework and of the SPI implementation classes in use (for example, when a flyweight token is created the text passed to the token factory is compared with the text in the lexer input).
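
    For instance, a test might enable these diagnostics as follows (a sketch using the standard JDK logging and system-property APIs):
        // Log lexer changes and check hierarchy consistency after each modification
        java.util.logging.Logger.getLogger("org.netbeans.lib.lexer.TokenHierarchyOperation")
                .setLevel(java.util.logging.Level.FINER);
        // Turn on the deep compare of token lists described above
        System.setProperty("netbeans.debug.lexer.test", "true");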

    Question (exec-component): Is execution of your code influenced by any (string) property of any of your components?

    Answer: No.

    Question (exec-ant-tasks): Do you define or register any ant tasks that other can use?

    Answer: No.

    Question (exec-classloader): Does your code create its own class loader(s)?

    Answer: No.

    Question (exec-reflection): Does your code use Java Reflection to execute other code?

    Answer: No.

    Question (exec-privateaccess): Are you aware of any other parts of the system calling some of your methods by reflection?

    Answer: No.

    Question (exec-process): Do you execute an external process from your module? How do you ensure that the result is the same on different platforms? Do you parse output? Do you depend on result code?

    Answer: No.

    Question (exec-introspection): Does your module use any kind of runtime type information (instanceof, work with java.lang.Class, etc.)?

    Answer: No.

    Question (exec-threading): What threading models, if any, does your module adhere to? How the project behaves with respect to threading?

    Answer: Use of token hierarchies for mutable input sources must adhere to the locking mechanisms of the input sources themselves.
    For example, accessing the token hierarchy of a swing document requires read/write locking of the document prior to accessing the token hierarchy.

    Question (security-policy): Does your functionality require modifications to the standard policy file?

    Answer: No.

    Question (security-grant): Does your code grant additional rights to some other code?

    Answer: No.

Format of files and protocols

    Question (format-types): Which protocols and file formats (if any) does your module read or write on disk, or transmit or receive over the network? Do you generate an ant build script? Can it be edited and modified?

    Answer: No files read or written to the disk.

    Question (format-dnd): Which protocols (if any) does your code understand during Drag & Drop?

    Answer: No D&D.

    Question (format-clipboard): Which data flavors (if any) does your code read from or insert into the clipboard (by accessing the clipboard or by calling methods on java.awt.datatransfer.Transferable)?

    Answer: No clipboard support.

Performance and Scalability

    Question (perf-startup): Does your module run any code on startup?

    Answer: No.

    Question (perf-exit): Does your module run any code on exit?

    Answer: No.

    Question (perf-scale): Which external criteria influence the performance of your program (size of file in editor, number of files in menu, in source directory, etc.) and how well your code scales?

    Answer: On a typical machine the framework is able to produce about 370,000 tokens of a text input with 1 million characters in less than 0.5 second.

    Question (perf-limit): Are there any hard-coded or practical limits in the number or size of elements your code can handle?

    Answer: No practical limits.

    Question (perf-mem): How much memory does your component consume? Estimate with a relation to the number of windows, etc.

    Answer: Memory consumption is critical for created tokens because there can be thousands of tokens per typical document. Thus there are several basic token types:
    • DefaultToken: 24 bytes
    • StringToken: 32 bytes (but only used for flyweight tokens)
    • PrepToken: 32 bytes plus text storage size (but only used for tokens where character preprocessing was necessary)

    Question (perf-wakeup): Does any piece of your code wake up periodically and do something even when the system is otherwise idle (no user interaction)?

    Answer: No.

    Question (perf-progress): Does your module execute any long-running tasks?

    Answer: All tasks should be granular. Both batch and incremental lexing are done lazily as clients ask for tokens.
    The only potentially long-running task is relexing a very long portion of a document, e.g. if someone typed '/*' at the beginning of a java document without any other comments, the whole document would turn into an unclosed comment.
    This typically is not a problem because the very long token does not need to be lexed repeatedly (the original support, without permanent tokens, had to lex the token upon each request).
    The lexer framework further improves the situation by introducing token validation, which attempts to validate the token by checking whether the typed character can really affect the token or whether it is only necessary to adjust the original token's length.

    Question (perf-huge_dialogs): Does your module contain any dialogs or wizards with a large number of GUI controls such as combo boxes, lists, trees, or text areas?

    Answer: No.

    Question (perf-menus): Does your module use dynamically updated context menus, or context-sensitive actions with complicated and slow enablement logic?

    Answer: No.

    Question (perf-spi): How the performance of the plugged in code will be enforced?

    Answer: Token change listener implementations should be written to execute quickly. For complex tasks they should reschedule their work into another thread.
