Diffstat (limited to 'lua/lexers/lexer.lua')

 -rw-r--r--  lua/lexers/lexer.lua | 2322
 1 file changed, 1251 insertions, 1071 deletions
Copyright 2006-2022 Mitchell. See LICENSE.

Lexes Scintilla documents and source code with Lua and LPeg.

### Writing Lua Lexers

Lexers highlight the syntax of source code. Scintilla (the editing component behind
[Textadept][] and [SciTE][]) traditionally uses static, compiled C++ lexers which are
notoriously difficult to create and/or extend. On the other hand, Lua makes it easy to
rapidly create new lexers, extend existing ones, and embed lexers within one another. Lua
lexers tend to be more readable than C++ lexers too.

Lexers are Parsing Expression Grammars, or PEGs, composed with the Lua [LPeg library][]. The
following table comes from the LPeg documentation and summarizes all you need to know about
constructing basic LPeg patterns. This module provides convenience functions for creating
and working with other more advanced patterns and concepts.

Operator             | Description
---------------------|------------
`lpeg.P(string)`     | Matches `string` literally.
`lpeg.P(`_`n`_`)`    | Matches exactly _`n`_ characters.
`lpeg.S(string)`     | Matches any character in set `string`.
`lpeg.R("`_`xy`_`")` | Matches any character in the range `x` to `y`.
`patt^`_`n`_         | Matches at least _`n`_ repetitions of `patt`.
`patt^-`_`n`_        | Matches at most _`n`_ repetitions of `patt`.
`patt1 * patt2`      | Matches `patt1` followed by `patt2`.
`patt1 + patt2`      | Matches `patt1` or `patt2` (ordered choice).
`patt1 - patt2`      | Matches `patt1` if `patt2` does not also match.
`-patt`              | Equivalent to `("" - patt)`.
`#patt`              | Matches `patt` but consumes no input.
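To see how these operators compose, here is a small sketch that builds a word pattern
equivalent to this module's predefined `lexer.word` from the primitives above:

    local lpeg = require('lpeg')
    local R = lpeg.R

    -- A word: a letter or underscore, followed by any number of alphanumerics
    -- or underscores (equivalent to this module's predefined lexer.word).
    local alpha = R('az', 'AZ')
    local word = (alpha + '_') * (alpha + R('09') + '_')^0

    -- A pattern's match() method returns the position just past the match, or
    -- nil if there is no match.
    print(word:match('foo_bar1')) --> 9
    print(word:match('1foo')) --> nil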
The first part of this document deals with rapidly constructing a simple lexer. The next part
deals with more advanced techniques, such as custom coloring and embedding lexers within one
another. Following that is a discussion about code folding, or being able to tell Scintilla
which code blocks are "foldable" (temporarily hideable from view). After that are instructions
on how to use Lua lexers with the aforementioned Textadept and SciTE editors. Finally there
are comments on lexer performance and limitations.

[LPeg library]: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html
[Textadept]: https://orbitalquark.github.io/textadept
[SciTE]: https://scintilla.org/SciTE.html

### Lexer Basics

The *lexers/* directory contains all lexers, including your new one. Before attempting to
write one from scratch though, first determine if your programming language is similar to
any of the 100+ languages supported. If so, you may be able to copy and modify that lexer,
saving some time and effort. The filename of your lexer should be the name of your programming
language in lower case followed by a *.lua* extension. For example, a new Lua lexer has the
name *lua.lua*.

Note: Try to refrain from using one-character language names like "c", "d", or "r". For
example, Scintillua uses "ansi_c", "dmd", and "rstats", respectively.

#### New Lexer Template

There is a *lexers/template.txt* file that contains a simple template for a new lexer. Feel
free to use it, replacing the '?'s with the name of your lexer. Consider this snippet from
the template:

    -- ? LPeg lexer.

    local lexer = require('lexer')
    local token, word_match = lexer.token, lexer.word_match
    local P, S = lpeg.P, lpeg.S

    local lex = lexer.new('?')

    -- Whitespace.
    local ws = token(lexer.WHITESPACE, lexer.space^1)
    lex:add_rule('whitespace', ws)

    [...]

    return lex
The first three lines of code simply define often used convenience variables. The fourth and
last lines [define](#lexer.new) and return the lexer object Scintilla uses; they are very
important and must be part of every lexer. The fifth line defines something called a "token",
an essential building block of lexers. You will learn about tokens shortly. The sixth line
defines a lexer grammar rule, which you will learn about later, as well as token styles. (Be
aware that it is common practice to combine these two lines for short rules.) Note, however,
the `local` prefix in front of variables, which is needed so as not to affect Lua's global
environment. All in all, this is a minimal, working lexer that you can build on.
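As an illustration of how little is needed beyond the template, here is a sketch of a
complete lexer for a hypothetical mini-language with keywords, identifiers, strings, and
line comments. The language and its name are made up; the functions used are all described
in the sections that follow:

    -- mini LPeg lexer (a made-up language, for illustration only).
    local lexer = require('lexer')
    local token, word_match = lexer.token, lexer.word_match

    local lex = lexer.new('mini')

    -- Whitespace.
    lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
    -- Keywords; this rule comes before the identifier rule so that keywords
    -- are not matched as identifiers.
    lex:add_rule('keyword', token(lexer.KEYWORD, word_match('if else while end')))
    -- Identifiers.
    lex:add_rule('identifier', token(lexer.IDENTIFIER, lexer.word))
    -- Double-quoted strings.
    lex:add_rule('string', token(lexer.STRING, lexer.range('"')))
    -- Comments run from '#' to the end of the line.
    lex:add_rule('comment', token(lexer.COMMENT, lexer.to_eol('#')))

    return lex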
#### Tokens

Take a moment to think about your programming language's structure. What kind of key
elements does it have? In the template shown earlier, one predefined element all languages
have is whitespace. Your language probably also has elements like comments, strings, and
keywords. Lexers refer to these elements as "tokens". Tokens are the fundamental "building
blocks" of lexers. Lexers break down source code into tokens for coloring, which results
in the syntax highlighting familiar to you. It is up to you how specific your lexer is
when it comes to tokens. Perhaps only distinguishing between keywords and identifiers is
necessary, or maybe recognizing constants and built-in functions, methods, or libraries is
desirable. The Lua lexer, for example, defines 11 tokens: whitespace, keywords, built-in
functions, constants, built-in libraries, identifiers, strings, comments, numbers, labels,
and operators. Even though constants, built-in functions, and built-in libraries are subsets
of identifiers, Lua programmers find it helpful for the lexer to distinguish between them
all. It is perfectly acceptable to just recognize keywords and identifiers.

In a lexer, tokens consist of a token name and an LPeg pattern that matches a sequence of
characters recognized as an instance of that token. Create tokens using the [`lexer.token()`]()
function. Let us examine the "whitespace" token defined in the template shown earlier:

    local ws = token(lexer.WHITESPACE, lexer.space^1)

At first glance, the first argument does not appear to be a string name and the second
argument does not appear to be an LPeg pattern. Perhaps you expected something like:

    local ws = token('whitespace', S('\t\v\f\n\r ')^1)

The `lexer` module actually provides a convenient list of common token names and common LPeg
patterns for you to use. Token names include [`lexer.DEFAULT`](), [`lexer.WHITESPACE`](),
[`lexer.COMMENT`](), [`lexer.STRING`](), [`lexer.NUMBER`](), [`lexer.KEYWORD`](),
[`lexer.IDENTIFIER`](), [`lexer.OPERATOR`](), [`lexer.ERROR`](), [`lexer.PREPROCESSOR`](),
[`lexer.CONSTANT`](), [`lexer.VARIABLE`](), [`lexer.FUNCTION`](), [`lexer.CLASS`](),
[`lexer.TYPE`](), [`lexer.LABEL`](), [`lexer.REGEX`](), and [`lexer.EMBEDDED`](). Patterns
include [`lexer.any`](), [`lexer.alpha`](), [`lexer.digit`](), [`lexer.alnum`](),
[`lexer.lower`](), [`lexer.upper`](), [`lexer.xdigit`](), [`lexer.graph`](), [`lexer.print`](),
[`lexer.punct`](), [`lexer.space`](), [`lexer.newline`](), [`lexer.nonnewline`](),
[`lexer.dec_num`](), [`lexer.hex_num`](), [`lexer.oct_num`](), [`lexer.integer`](),
[`lexer.float`](), [`lexer.number`](), and [`lexer.word`](). You may use your own token names
if none of the above fit your language, but an advantage to using predefined token names is
that your lexer's tokens will inherit the universal syntax highlighting color theme used by
your text editor.
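When your language has an element none of the predefined names fit, pair a custom name with
the predefined patterns. For example, a sketch of a Java-style annotation token (the
'annotation' name is custom and illustrative, so it would need a style assigned, as described
in the Token Styles section):

    local annotation = token('annotation', '@' * lexer.word)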
##### Example Tokens

So, how might you define other tokens like keywords, comments, and strings? Here are some
examples.

**Keywords**

Instead of matching _n_ keywords with _n_ `P('keyword_`_`n`_`')` ordered choices, use another
convenience function: [`lexer.word_match()`](). It is much easier and more efficient to
write word matches like:

    local keyword = token(lexer.KEYWORD, lexer.word_match{
      'keyword_1', 'keyword_2', ..., 'keyword_n'
    })

    local case_insensitive_keyword = token(lexer.KEYWORD, lexer.word_match({
      'KEYWORD_1', 'keyword_2', ..., 'KEYword_n'
    }, true))

    local hyphened_keyword = token(lexer.KEYWORD, lexer.word_match{
      'keyword-1', 'keyword-2', ..., 'keyword-n'
    })

For short keyword lists, you can use a single string of words. For example:

    local keyword = token(lexer.KEYWORD, lexer.word_match('key_1 key_2 ... key_n'))

**Comments**

Line-style comments with one or more prefix characters are easy to express with LPeg:

    local shell_comment = token(lexer.COMMENT, lexer.to_eol('#'))
    local c_line_comment = token(lexer.COMMENT, lexer.to_eol('//', true))

The comments above start with a '#' or "//" and go to the end of the line. The second comment
recognizes the next line also as a comment if the current line ends with a '\' escape character.
-- --- C-style "block" comments with a start and end delimiter are also easy to --- express: +-- C-style "block" comments with a start and end delimiter are also easy to express: -- --- local c_comment = token(l.COMMENT, '/*' * (l.any - '*/')^0 * P('*/')^-1) +-- local c_comment = token(lexer.COMMENT, lexer.range('/*', '*/')) -- --- This comment starts with a "/\*" sequence and contains anything up to and --- including an ending "\*/" sequence. The ending "\*/" is optional so the lexer --- can recognize unfinished comments as comments and highlight them properly. +-- This comment starts with a "/\*" sequence and contains anything up to and including an ending +-- "\*/" sequence. The ending "\*/" is optional so the lexer can recognize unfinished comments +-- as comments and highlight them properly. -- -- **Strings** -- --- It is tempting to think that a string is not much different from the block --- comment shown above in that both have start and end delimiters: +-- Most programming languages allow escape sequences in strings such that a sequence like +-- "\\"" in a double-quoted string indicates that the '"' is not the end of the +-- string. [`lexer.range()`]() handles escapes inherently. -- --- local dq_str = '"' * (l.any - '"')^0 * P('"')^-1 --- local sq_str = "'" * (l.any - "'")^0 * P("'")^-1 --- local simple_string = token(l.STRING, dq_str + sq_str) +-- local dq_str = lexer.range('"') +-- local sq_str = lexer.range("'") +-- local string = token(lexer.STRING, dq_str + sq_str) -- --- However, most programming languages allow escape sequences in strings such --- that a sequence like "\\"" in a double-quoted string indicates that the --- '"' is not the end of the string. The above token incorrectly matches --- such a string. Instead, use the [`lexer.delimited_range()`]() convenience --- function. +-- In this case, the lexer treats '\' as an escape character in a string sequence. -- --- local dq_str = l.delimited_range('"') --- local sq_str = l.delimited_range("'") --- local string = token(l.STRING, dq_str + sq_str) +-- **Numbers** -- --- In this case, the lexer treats '\' as an escape character in a string --- sequence. +-- Most programming languages have the same format for integer and float tokens, so it might +-- be as simple as using a predefined LPeg pattern: -- --- **Keywords** +-- local number = token(lexer.NUMBER, lexer.number) -- --- Instead of matching _n_ keywords with _n_ `P('keyword_`_`n`_`')` ordered --- choices, use another convenience function: [`lexer.word_match()`](). It is --- much easier and more efficient to write word matches like: +-- However, some languages allow postfix characters on integers. -- --- local keyword = token(l.KEYWORD, l.word_match{ --- 'keyword_1', 'keyword_2', ..., 'keyword_n' --- }) +-- local integer = P('-')^-1 * (lexer.dec_num * S('lL')^-1) +-- local number = token(lexer.NUMBER, lexer.float + lexer.hex_num + integer) -- --- local case_insensitive_keyword = token(l.KEYWORD, l.word_match({ --- 'KEYWORD_1', 'keyword_2', ..., 'KEYword_n' --- }, nil, true)) +-- Your language may need other tweaks, but it is up to you how fine-grained you want your +-- highlighting to be. After all, you are not writing a compiler or interpreter! -- --- local hyphened_keyword = token(l.KEYWORD, l.word_match({ --- 'keyword-1', 'keyword-2', ..., 'keyword-n' --- }, '-')) +-- #### Rules -- --- By default, characters considered to be in keywords are in the set of --- alphanumeric characters and underscores. 
#### Rules

Programming languages have grammars, which specify valid token structure. For example,
comments usually cannot appear within a string. Grammars consist of rules, which are simply
combinations of tokens. Recall from the lexer template the [`lexer.add_rule()`]() call,
which adds a rule to the lexer's grammar:

    lex:add_rule('whitespace', ws)

Each rule has an associated name, but rule names are completely arbitrary and serve only to
identify and distinguish between different rules. Rule order is important: if text does not
match the first rule added to the grammar, the lexer tries to match the second rule added, and
so on. Right now this lexer simply matches whitespace tokens under a rule named "whitespace".

To illustrate the importance of rule order, here is an example of a simplified Lua lexer:

    lex:add_rule('whitespace', token(lexer.WHITESPACE, ...))
    lex:add_rule('keyword', token(lexer.KEYWORD, ...))
    lex:add_rule('identifier', token(lexer.IDENTIFIER, ...))
    lex:add_rule('string', token(lexer.STRING, ...))
    lex:add_rule('comment', token(lexer.COMMENT, ...))
    lex:add_rule('number', token(lexer.NUMBER, ...))
    lex:add_rule('label', token(lexer.LABEL, ...))
    lex:add_rule('operator', token(lexer.OPERATOR, ...))

Note how identifiers come after keywords. In Lua, as with most programming languages,
the characters allowed in keywords and identifiers are in the same set (alphanumerics
plus underscores). If the lexer added the "identifier" rule before the "keyword" rule,
all keywords would match identifiers and thus incorrectly highlight as identifiers instead
of keywords. The same idea applies to function, constant, etc. tokens that you may want to
distinguish between: their rules should come before identifiers.

So what about text that does not match any rules? For example in Lua, the '!' character is
meaningless outside a string or comment. Normally the lexer skips over such text. If instead
you want to highlight these "syntax errors", add an additional end rule:

    lex:add_rule('whitespace', ws)
    ...
    lex:add_rule('error', token(lexer.ERROR, lexer.any))

This identifies and highlights any character not matched by an existing rule as a `lexer.ERROR`
token.

Even though the rules defined in the examples above contain a single token, rules may
consist of multiple tokens.
For example, a rule for an HTML tag could consist of a tag token
followed by an arbitrary number of attribute tokens, allowing the lexer to highlight all
tokens separately. That rule might look something like this:

    lex:add_rule('tag', tag_start * (ws * attributes)^0 * tag_end^-1)
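The names `tag_start`, `ws`, `attributes`, and `tag_end` above are placeholders for tokens
the lexer would have to define itself. A rough sketch of what they might look like
(illustrative only, not the actual HTML lexer's definitions; the custom 'tag' and 'attribute'
token names would need styles assigned, as described later):

    local ws = token(lexer.WHITESPACE, lexer.space^1)
    -- The opening delimiter plus element name, e.g. '<div'.
    local tag_start = token('tag', '<' * lexer.word)
    -- An optional '/' plus the closing '>'.
    local tag_end = token('tag', P('/')^-1 * '>')
    -- A single name="value" attribute, highlighted as three separate tokens.
    local attributes = token('attribute', lexer.word) * token(lexer.OPERATOR, '=') *
      token(lexer.STRING, lexer.range('"'))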
Note however that lexers with complex rules like these are more prone to lose track of their
state, especially if they span multiple lines.

#### Summary

Lexers primarily consist of tokens and grammar rules. At your disposal are a number of
convenience patterns and functions for rapidly creating a lexer. If you choose to use
predefined token names for your tokens, you do not have to define how the lexer highlights
them. The tokens will inherit the default syntax highlighting color theme your editor uses.

### Advanced Techniques

#### Styles and Styling

The most basic form of syntax highlighting is assigning different colors to different tokens.
Instead of highlighting with just colors, Scintilla allows for more rich highlighting,
or "styling", with different fonts, font sizes, font attributes, and foreground and background
colors, just to name a few. The unit of this rich highlighting is called a "style". Styles
are simply Lua tables of properties. By default, lexers associate predefined token names like
`lexer.WHITESPACE`, `lexer.COMMENT`, `lexer.STRING`, etc. with particular styles as part
of a universal color theme. These predefined styles are contained in [`lexer.styles`](),
and you may define your own styles. See that table's documentation for more information. As
with token names, LPeg patterns, and styles, there is a set of predefined color names,
but they vary depending on the current color theme in use. Therefore, it is generally not
a good idea to manually define colors within styles in your lexer since they might not fit
into a user's chosen color theme. Try to refrain from even using predefined colors in a
style because that color may be theme-specific. Instead, the best practice is to either use
predefined styles or derive new color-agnostic styles from predefined ones. For example, Lua
"longstring" tokens use the existing `lexer.styles.string` style instead of defining a new one.

##### Example Styles

Defining styles is pretty straightforward. An empty style that inherits the default theme
settings is simply an empty table:

    local style_nothing = {}

A similar style but with a bold font face looks like this:

    local style_bold = {bold = true}

You can derive new styles from predefined ones without having to rewrite them. This operation
leaves the old style unchanged. For example, if you had a "static variable" token whose
style you wanted to base off of `lexer.styles.variable`, it would probably look like:

    local style_static_var = lexer.styles.variable .. {italics = true}

The color theme files in the *lexers/themes/* folder give more examples of style definitions.

#### Token Styles

Lexers use the [`lexer.add_style()`]() function to assign styles to particular tokens. Recall
the token definition and rule from the lexer template:

    local ws = token(lexer.WHITESPACE, lexer.space^1)
    lex:add_rule('whitespace', ws)

Why is a style not assigned to the `lexer.WHITESPACE` token? As mentioned earlier, lexers
automatically associate tokens that use predefined token names with a particular style. Only
tokens with custom token names need manual style associations.
As an example, consider a custom whitespace token:

    local ws = token('custom_whitespace', lexer.space^1)

Assigning a style to this token looks like:

    lex:add_style('custom_whitespace', lexer.styles.whitespace)

Do not confuse token names with rule names. They are completely different entities. In the
example above, the lexer associates the "custom_whitespace" token with the existing style
for `lexer.WHITESPACE` tokens. If instead you prefer to color the background of whitespace
a shade of grey, it might look like:

    lex:add_style('custom_whitespace', lexer.styles.whitespace .. {back = lexer.colors.grey})

Remember to refrain from assigning specific colors in styles, but in this case, all user
color themes probably define `colors.grey`.

#### Line Lexers

By default, lexers match the arbitrary chunks of text passed to them by Scintilla. These
chunks may be a full document, only the visible part of a document, or even just portions
of lines. Some lexers need to match whole lines. For example, a lexer for the output of a
file "diff" needs to know if the line started with a '+' or '-' and then style the entire
line accordingly. To indicate that your lexer matches by line, create the lexer with an
extra parameter:

    local lex = lexer.new('?', {lex_by_line = true})

Now the input text for the lexer is a single line at a time. Keep in mind that line lexers
do not have the ability to look ahead at subsequent lines.
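Continuing the "diff" example, here is a sketch of such a lexer. The 'addition' and
'deletion' token names are custom and illustrative, so they receive styles explicitly, and
the sketch assumes the user's theme defines `colors.green` and `colors.red`:

    local lex = lexer.new('mydiff', {lex_by_line = true})
    -- With lex_by_line enabled, each chunk is exactly one line, so a leading
    -- '+' or '-' is guaranteed to be at the start of a line.
    lex:add_rule('addition', token('addition', '+' * lexer.nonnewline^0))
    lex:add_rule('deletion', token('deletion', '-' * lexer.nonnewline^0))
    lex:add_style('addition', {fore = lexer.colors.green})
    lex:add_style('deletion', {fore = lexer.colors.red})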
#### Embedded Lexers

Lexers embed within one another very easily, requiring minimal effort. In the following
sections, the lexer being embedded is called the "child" lexer and the lexer a child is
being embedded in is called the "parent". For example, consider an HTML lexer and a CSS
lexer. Either lexer stands alone for styling their respective HTML and CSS files. However, CSS
can be embedded inside HTML. In this specific case, the CSS lexer is the "child" lexer with
the HTML lexer being the "parent". Now consider an HTML lexer and a PHP lexer. This sounds
a lot like the case with CSS, but there is a subtle difference: PHP _embeds itself into_
HTML while CSS is _embedded in_ HTML. This fundamental difference results in two types of
embedded lexers: a parent lexer that embeds other child lexers in it (like HTML embedding CSS),
and a child lexer that embeds itself into a parent lexer (like PHP embedding itself in HTML).

##### Parent Lexer

Before embedding a child lexer into a parent lexer, the parent lexer needs to load the child
lexer. This is done with the [`lexer.load()`]() function. For example, loading the CSS lexer
within the HTML lexer looks like:

    local css = lexer.load('css')

The next part of the embedding process is telling the parent lexer when to switch over
to the child lexer and when to switch back. The lexer refers to these indications as the
"start rule" and "end rule", respectively; both are just LPeg patterns.
Continuing with the
HTML/CSS example, the transition from HTML to CSS is when the lexer encounters a "style"
tag with a "type" attribute whose value is "text/css":

    local css_tag = P('<style') * P(function(input, index)
      if input:find('^[^>]+type="text/css"', index) then return index end
    end)

This pattern looks for the beginning of a "style" tag and searches its attribute list for
the text "`type="text/css"`". (In this simplified example, the Lua pattern does not consider
whitespace around the '=' nor does it consider that using single quotes is valid.) If there
is a match, the functional pattern returns a value instead of `nil`. In this case, the value
returned does not matter because we ultimately want to style the "style" tag as an HTML tag,
so the actual start rule looks like this:

    local css_start_rule = #css_tag * tag

Now that the parent knows when to switch to the child, it needs to know when to switch
back. In the case of HTML/CSS, the switch back occurs when the lexer encounters an ending
"style" tag, though the lexer should still style the tag as an HTML tag:

    local css_end_rule = #P('</style>') * tag

Once the parent loads the child lexer and defines the child's start and end rules, it embeds
the child with the [`lexer.embed()`]() function:

    lex:embed(css, css_start_rule, css_end_rule)

##### Child Lexer

The process for instructing a child lexer to embed itself into a parent is very similar to
embedding a child into a parent: first, load the parent lexer into the child lexer with the
[`lexer.load()`]() function and then create start and end rules for the child lexer. However,
in this case, call [`lexer.embed()`]() with switched arguments.
For example, in the PHP lexer:

    local html = lexer.load('html')
    local php_start_rule = token('php_tag', '<?php ')
    local php_end_rule = token('php_tag', '?>')
    lex:add_style('php_tag', lexer.styles.embedded)
    html:embed(lex, php_start_rule, php_end_rule)

#### Lexers with Complex State

A vast majority of lexers are not stateful and can operate on any chunk of text in a
document. However, there may be rare cases where a lexer does need to keep track of some
sort of persistent state. Rather than using `lpeg.P` function patterns that set state
variables, it is recommended to make use of Scintilla's built-in, per-line state integers via
[`lexer.line_state`](). It was designed to accommodate up to 32 bit flags for tracking state.
[`lexer.line_from_position()`]() will return the line for any position given to an `lpeg.P`
function pattern. (Any positions derived from that position argument will also work.)
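As a minimal sketch of the idea, assuming a lexer that needs to remember that some construct
opened on a line (the flag value and the construct itself are hypothetical, and the sketch
assumes `lexer.line_state` is indexable by line number):

    local P = lpeg.P

    local IN_CONSTRUCT = 1 -- hypothetical flag; up to 32 bits are available

    -- Record the flag for the line containing the matched position so that a
    -- later lexing pass starting on a following line can query it.
    local set_state = P(function(input, index)
      lexer.line_state[lexer.line_from_position(index)] = IN_CONSTRUCT
      return index
    end)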
Writing stateful lexers is beyond the scope of this document.

### Code Folding

When reading source code, it is occasionally helpful to temporarily hide blocks of code like
functions, classes, comments, etc. This is the concept of "folding". In the Textadept and
SciTE editors for example, little indicators in the editor margins appear next to code that
can be folded at places called "fold points". When the user clicks an indicator, the editor
hides the code associated with the indicator until the user clicks the indicator again. The
lexer specifies these fold points and what code exactly to fold.

The fold points for most languages occur on keywords or character sequences. Examples of
fold keywords are "if" and "end" in Lua and examples of fold character sequences are '{',
'}', "/\*", and "\*/" in C for code block and comment delimiters, respectively. However,
these fold points cannot occur just anywhere. For example, lexers should not recognize fold
keywords that appear within strings or comments. The [`lexer.add_fold_point()`]() function
allows you to conveniently define fold points with such granularity. For example, consider C:

    lex:add_fold_point(lexer.OPERATOR, '{', '}')
    lex:add_fold_point(lexer.COMMENT, '/*', '*/')

The first assignment states that any '{' or '}' that the lexer recognizes as a `lexer.OPERATOR`
token is a fold point. Likewise, the second assignment states that any "/\*" or "\*/" that
the lexer recognizes as part of a `lexer.COMMENT` token is a fold point. The lexer does
not consider any occurrences of these characters outside their defined tokens (such as in
a string) as fold points. How do you specify fold keywords? Here is an example for Lua:

    lex:add_fold_point(lexer.KEYWORD, 'if', 'end')
    lex:add_fold_point(lexer.KEYWORD, 'do', 'end')
    lex:add_fold_point(lexer.KEYWORD, 'function', 'end')
    lex:add_fold_point(lexer.KEYWORD, 'repeat', 'until')

If your lexer has case-insensitive keywords as fold points, simply add a
`case_insensitive_fold_points = true` option to [`lexer.new()`](), and specify keywords in
lower case.
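For example, a sketch for a language with case-insensitive keywords:

    local lex = lexer.new('?', {case_insensitive_fold_points = true})
    lex:add_fold_point(lexer.KEYWORD, 'begin', 'end') -- matches BEGIN/End/etc. too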
If your lexer needs to do some additional processing in order to determine if a token is
a fold point, pass a function that returns an integer to `lex:add_fold_point()`. Returning
`1` indicates the token is a beginning fold point and returning `-1` indicates the token is
an ending fold point. Returning `0` indicates the token is not a fold point. For example:

    local function fold_strange_token(text, pos, line, s, symbol)
      if ... then
        return 1 -- beginning fold point
      elseif ... then
        return -1 -- ending fold point
      end
      return 0
    end

    lex:add_fold_point('strange_token', '|', fold_strange_token)

Any time the lexer encounters a '|' that is a "strange_token", it calls the `fold_strange_token`
function to determine if '|' is a fold point. The lexer calls these functions with the
following arguments: the text to identify fold points in, the beginning position of the
current line in the text to fold, the current line's text, the position in the current line
the fold point text starts at, and the fold point text itself.

#### Fold by Indentation

Some languages have significant whitespace and/or no delimiters that indicate fold points. If
your lexer falls into this category and you would like to mark fold points based on changes
in indentation, create the lexer with a `fold_by_indentation = true` option:

    local lex = lexer.new('?', {fold_by_indentation = true})

### Using Lexers

**Textadept**

Put your lexer in your *~/.textadept/lexers/* directory so you do not overwrite it when
upgrading Textadept. Also, lexers in this directory override default lexers. Thus, Textadept
loads a user *lua* lexer instead of the default *lua* lexer. This is convenient for tweaking
a default lexer to your liking. Then add a [file type](#textadept.file_types) for your lexer
if necessary.

**SciTE**

Create a *.properties* file for your lexer and `import` it in either your *SciTEUser.properties*
or *SciTEGlobal.properties*. The contents of the *.properties* file should contain:

    file.patterns.[lexer_name]=[file_patterns]
    lexer.$(file.patterns.[lexer_name])=[lexer_name]

where `[lexer_name]` is the name of your lexer (minus the *.lua* extension) and
`[file_patterns]` is a set of file extensions to use your lexer for.
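For example, a hypothetical *mini.properties* for the "mini" lexer sketched earlier might
contain (the file extensions are made up):

    file.patterns.mini=*.mini;*.mn
    lexer.$(file.patterns.mini)=mini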
Please note that Lua lexers ignore any styling information in *.properties* files. Your
theme file in the *lexers/themes/* directory contains styling information.

### Migrating Legacy Lexers

Legacy lexers are of the form:

    local l = require('lexer')
    local token, word_match = l.token, l.word_match
    local P, R, S = lpeg.P, lpeg.R, lpeg.S

    local M = {_NAME = '?'}

    [... token and pattern definitions ...]

    M._rules = {
      {'rule', pattern},
      [...]
    }

    M._tokenstyles = {
      ['token'] = 'style',
      [...]
    }

    M._foldsymbols = {
      _patterns = {...},
      ['token'] = {['start'] = 1, ['end'] = -1},
      [...]
    }

    return M

While Scintillua will handle such legacy lexers just fine without any changes, it is
recommended that you migrate yours. The migration process is fairly straightforward:

1. Replace all instances of `l` with `lexer`, as it's better practice and results in less
   confusion.
2. Replace `local M = {_NAME = '?'}` with `local lex = lexer.new('?')`, where `?` is the
   name of your legacy lexer. At the end of the lexer, change `return M` to `return lex`.
3. Instead of defining rules towards the end of your lexer, define your rules as you define
   your tokens and patterns using [`lex:add_rule()`](#lexer.add_rule).
4. Similarly, any custom token names should have their styles immediately defined using
   [`lex:add_style()`](#lexer.add_style).
5. Optionally convert any table arguments passed to [`lexer.word_match()`]() to a
   space-separated string of words.
6. Replace any calls to `lexer.embed(M, child, ...)` and `lexer.embed(parent, M, ...)` with
   [`lex:embed`](#lexer.embed)`(child, ...)` and `parent:embed(lex, ...)`, respectively.
7. Define fold points with simple calls to [`lex:add_fold_point()`](#lexer.add_fold_point). No
   need to mess with Lua patterns anymore.
8. Any legacy lexer options such as `M._FOLDBYINDENTATION`, `M._LEXBYLINE`, `M._lexer`,
   etc. should be added as table options to [`lexer.new()`]().
9. Any external lexer rule fetching and/or modifications via `lexer._RULES` should be changed
   to use [`lexer.get_rule()`]() and [`lexer.modify_rule()`](), as sketched below.
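As a sketch of step 9, a lexer that extends another lexer's keyword rule might do something
like the following (the rule name and added words are illustrative):

    -- Fetch the existing rule and prepend extra keywords to it.
    local extra = token(lexer.KEYWORD, word_match('extra_keyword_1 extra_keyword_2'))
    lex:modify_rule('keyword', extra + lex:get_rule('keyword'))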
Also, lexers in this directory --- override default lexers. Thus, Textadept loads a user *lua* lexer instead of --- the default *lua* lexer. This is convenient for tweaking a default lexer to --- your liking. Then add a [file type][] for your lexer if necessary. +-- M._foldsymbols = { +-- _patterns = {'[{}]'}, +-- [l.OPERATOR] = {['{'] = 1, ['}'] = -1} +-- } -- --- [file type]: _M.textadept.file_types.html +-- return M -- --- ### SciTE +-- Following the migration steps would yield: -- --- Create a *.properties* file for your lexer and `import` it in either your --- *SciTEUser.properties* or *SciTEGlobal.properties*. The contents of the --- *.properties* file should contain: +-- local lexer = require('lexer') +-- local token, word_match = lexer.token, lexer.word_match +-- local P, S = lpeg.P, lpeg.S -- --- file.patterns.[lexer_name]=[file_patterns] --- lexer.$(file.patterns.[lexer_name])=[lexer_name] +-- local lex = lexer.new('legacy') -- --- where `[lexer_name]` is the name of your lexer (minus the *.lua* extension) --- and `[file_patterns]` is a set of file extensions to use your lexer for. +-- lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1)) +-- lex:add_rule('keyword', token(lexer.KEYWORD, word_match('foo bar baz'))) +-- lex:add_rule('custom', token('custom', 'quux')) +-- lex:add_style('custom', lexer.styles.keyword .. {bold = true}) +-- lex:add_rule('identifier', token(lexer.IDENTIFIER, lexer.word)) +-- lex:add_rule('string', token(lexer.STRING, lexer.range('"'))) +-- lex:add_rule('comment', token(lexer.COMMENT, lexer.to_eol('#'))) +-- lex:add_rule('number', token(lexer.NUMBER, lexer.number)) +-- lex:add_rule('operator', token(lexer.OPERATOR, S('+-*/%^=<>,.()[]{}'))) -- --- Please note that Lua lexers ignore any styling information in *.properties* --- files. Your theme file in the *lexers/themes/* directory contains styling --- information. +-- lex:add_fold_point(lexer.OPERATOR, '{', '}') -- --- ## Considerations +-- return lex -- --- ### Performance +-- ### Considerations -- --- There might be some slight overhead when initializing a lexer, but loading a --- file from disk into Scintilla is usually more expensive. On modern computer --- systems, I see no difference in speed between LPeg lexers and Scintilla's C++ --- ones. Optimize lexers for speed by re-arranging rules in the `_rules` table --- so that the most common rules match first. Do keep in mind that order matters --- for similar rules. +-- #### Performance -- --- ### Limitations +-- There might be some slight overhead when initializing a lexer, but loading a file from disk +-- into Scintilla is usually more expensive. On modern computer systems, I see no difference in +-- speed between Lua lexers and Scintilla's C++ ones. Optimize lexers for speed by re-arranging +-- `lexer.add_rule()` calls so that the most common rules match first. Do keep in mind that +-- order matters for similar rules. -- --- Embedded preprocessor languages like PHP cannot completely embed in their --- parent languages in that the parent's tokens do not support start and end --- rules. This mostly goes unnoticed, but code like +-- In some cases, folding may be far more expensive than lexing, particularly in lexers with a +-- lot of potential fold points. If your lexer is exhibiting signs of slowness, try disabling +-- folding in your text editor first. 
If that speeds things up, you can try reducing the number +-- of fold points you added, overriding `lexer.fold()` with your own implementation, or simply +-- eliminating folding support from your lexer. -- --- <div id="<?php echo $id; ?>"> +-- #### Limitations -- --- or +-- Embedded preprocessor languages like PHP cannot completely embed in their parent languages +-- in that the parent's tokens do not support start and end rules. This mostly goes unnoticed, +-- but code like -- --- <div <?php if ($odd) { echo 'class="odd"'; } ?>> +-- <div id="<?php echo $id; ?>"> -- -- will not style correctly. -- --- ### Troubleshooting +-- #### Troubleshooting -- --- Errors in lexers can be tricky to debug. Lexers print Lua errors to --- `io.stderr` and `_G.print()` statements to `io.stdout`. Running your editor --- from a terminal is the easiest way to see errors as they occur. +-- Errors in lexers can be tricky to debug. Lexers print Lua errors to `io.stderr` and `_G.print()` +-- statements to `io.stdout`. Running your editor from a terminal is the easiest way to see +-- errors as they occur. -- --- ### Risks +-- #### Risks -- --- Poorly written lexers have the ability to crash Scintilla (and thus its --- containing application), so unsaved data might be lost. However, I have only --- observed these crashes in early lexer development, when syntax errors or --- pattern errors are present. Once the lexer actually starts styling text --- (either correctly or incorrectly, it does not matter), I have not observed +-- Poorly written lexers have the ability to crash Scintilla (and thus its containing application), +-- so unsaved data might be lost. However, I have only observed these crashes in early lexer +-- development, when syntax errors or pattern errors are present. Once the lexer actually starts +-- styling text (either correctly or incorrectly, it does not matter), I have not observed -- any crashes. -- --- ### Acknowledgements +-- #### Acknowledgements -- --- Thanks to Peter Odding for his [lexer post][] on the Lua mailing list --- that inspired me, and thanks to Roberto Ierusalimschy for LPeg. +-- Thanks to Peter Odding for his [lexer post][] on the Lua mailing list that provided inspiration, +-- and thanks to Roberto Ierusalimschy for LPeg. -- -- [lexer post]: http://lua-users.org/lists/lua-l/2007-04/msg00116.html --- @field LEXERPATH (string) --- The path used to search for a lexer to load. --- Identical in format to Lua's `package.path` string. --- The default value is `package.path`. -- @field DEFAULT (string) -- The token name for default tokens. -- @field WHITESPACE (string) @@ -740,58 +700,6 @@ local M = {} -- The token name for label tokens. -- @field REGEX (string) -- The token name for regex tokens. --- @field STYLE_CLASS (string) --- The style typically used for class definitions. --- @field STYLE_COMMENT (string) --- The style typically used for code comments. --- @field STYLE_CONSTANT (string) --- The style typically used for constants. --- @field STYLE_ERROR (string) --- The style typically used for erroneous syntax. --- @field STYLE_FUNCTION (string) --- The style typically used for function definitions. --- @field STYLE_KEYWORD (string) --- The style typically used for language keywords. --- @field STYLE_LABEL (string) --- The style typically used for labels. --- @field STYLE_NUMBER (string) --- The style typically used for numbers. --- @field STYLE_OPERATOR (string) --- The style typically used for operators. 
--- @field STYLE_REGEX (string) --- The style typically used for regular expression strings. --- @field STYLE_STRING (string) --- The style typically used for strings. --- @field STYLE_PREPROCESSOR (string) --- The style typically used for preprocessor statements. --- @field STYLE_TYPE (string) --- The style typically used for static types. --- @field STYLE_VARIABLE (string) --- The style typically used for variables. --- @field STYLE_WHITESPACE (string) --- The style typically used for whitespace. --- @field STYLE_EMBEDDED (string) --- The style typically used for embedded code. --- @field STYLE_IDENTIFIER (string) --- The style typically used for identifier words. --- @field STYLE_DEFAULT (string) --- The style all styles are based off of. --- @field STYLE_LINENUMBER (string) --- The style used for all margins except fold margins. --- @field STYLE_BRACELIGHT (string) --- The style used for highlighted brace characters. --- @field STYLE_BRACEBAD (string) --- The style used for unmatched brace characters. --- @field STYLE_CONTROLCHAR (string) --- The style used for control characters. --- Color attributes are ignored. --- @field STYLE_INDENTGUIDE (string) --- The style used for indentation guides. --- @field STYLE_CALLTIP (string) --- The style used by call tips if [`buffer.call_tip_use_style`]() is set. --- Only the font name, size, and color attributes are used. --- @field STYLE_FOLDDISPLAYTEXT (string) --- The style used for fold display text. -- @field any (pattern) -- A pattern that matches any single character. -- @field ascii (pattern) @@ -803,8 +711,7 @@ local M = {} -- @field digit (pattern) -- A pattern that matches any digit ('0'-'9'). -- @field alnum (pattern) --- A pattern that matches any alphanumeric character ('A'-'Z', 'a'-'z', --- '0'-'9'). +-- A pattern that matches any alphanumeric character ('A'-'Z', 'a'-'z', '0'-'9'). -- @field lower (pattern) -- A pattern that matches any lower case character ('a'-'z'). -- @field upper (pattern) @@ -818,18 +725,14 @@ local M = {} -- @field print (pattern) -- A pattern that matches any printable character (' ' to '~'). -- @field punct (pattern) --- A pattern that matches any punctuation character ('!' to '/', ':' to '@', --- '[' to ''', '{' to '~'). +-- A pattern that matches any punctuation character ('!' to '/', ':' to '@', '[' to ''', +-- '{' to '~'). -- @field space (pattern) --- A pattern that matches any whitespace character ('\t', '\v', '\f', '\n', --- '\r', space). +-- A pattern that matches any whitespace character ('\t', '\v', '\f', '\n', '\r', space). -- @field newline (pattern) --- A pattern that matches any set of end of line characters. +-- A pattern that matches a sequence of end of line characters. -- @field nonnewline (pattern) -- A pattern that matches any single, non-newline character. --- @field nonnewline_esc (pattern) --- A pattern that matches any single, non-newline character or any set of end --- of line characters escaped with '\'. -- @field dec_num (pattern) -- A pattern that matches a decimal number. -- @field hex_num (pattern) @@ -840,9 +743,12 @@ local M = {} -- A pattern that matches either a decimal, hexadecimal, or octal number. -- @field float (pattern) -- A pattern that matches a floating point number. +-- @field number (pattern) +-- A pattern that matches a typical number, either a floating point, decimal, hexadecimal, +-- or octal number. -- @field word (pattern) --- A pattern that matches a typical word. 
Words begin with a letter or --- underscore and consist of alphanumeric and underscore characters. +-- A pattern that matches a typical word. Words begin with a letter or underscore and consist +-- of alphanumeric and underscore characters. -- @field FOLD_BASE (number) -- The initial (root) fold level. -- @field FOLD_BLANK (number) @@ -850,9 +756,8 @@ local M = {} -- @field FOLD_HEADER (number) -- Flag indicating the line is fold point. -- @field fold_level (table, Read-only) --- Table of fold level bit-masks for line numbers starting from zero. --- Fold level masks are composed of an integer level combined with any of the --- following bits: +-- Table of fold level bit-masks for line numbers starting from 1. +-- Fold level masks are composed of an integer level combined with any of the following bits: -- -- * `lexer.FOLD_BASE` -- The initial fold level. @@ -861,87 +766,328 @@ local M = {} -- * `lexer.FOLD_HEADER` -- The line is a header, or fold point. -- @field indent_amount (table, Read-only) --- Table of indentation amounts in character columns, for line numbers --- starting from zero. +-- Table of indentation amounts in character columns, for line numbers starting from 1. -- @field line_state (table) --- Table of integer line states for line numbers starting from zero. +-- Table of integer line states for line numbers starting from 1. -- Line states can be used by lexers for keeping track of persistent states. -- @field property (table) -- Map of key-value string pairs. -- @field property_expanded (table, Read-only) --- Map of key-value string pairs with `$()` and `%()` variable replacement --- performed in values. +-- Map of key-value string pairs with `$()` and `%()` variable replacement performed in values. -- @field property_int (table, Read-only) --- Map of key-value pairs with values interpreted as numbers, or `0` if not --- found. +-- Map of key-value pairs with values interpreted as numbers, or `0` if not found. -- @field style_at (table, Read-only) -- Table of style names at positions in the buffer starting from 1. +-- @field folding (boolean) +-- Whether or not folding is enabled for the lexers that support it. +-- This option is disabled by default. +-- This is an alias for `lexer.property['fold'] = '1|0'`. +-- @field fold_on_zero_sum_lines (boolean) +-- Whether or not to mark as a fold point lines that contain both an ending and starting fold +-- point. For example, `} else {` would be marked as a fold point. +-- This option is disabled by default. This is an alias for +-- `lexer.property['fold.on.zero.sum.lines'] = '1|0'`. +-- @field fold_compact (boolean) +-- Whether or not blank lines after an ending fold point are included in that +-- fold. +-- This option is disabled by default. +-- This is an alias for `lexer.property['fold.compact'] = '1|0'`. +-- @field fold_by_indentation (boolean) +-- Whether or not to fold based on indentation level if a lexer does not have +-- a folder. +-- Some lexers automatically enable this option. It is disabled by default. +-- This is an alias for `lexer.property['fold.by.indentation'] = '1|0'`. +-- @field fold_line_groups (boolean) +-- Whether or not to fold multiple, consecutive line groups (such as line comments and import +-- statements) and only show the top line. +-- This option is disabled by default. +-- This is an alias for `lexer.property['fold.line.groups'] = '1|0'`. module('lexer')]=] +if not require then + -- Substitute for Lua's require() function, which does not require the package module to + -- be loaded. 
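+ -- It simply resolves the name 'lexer' to this module itself and looks any other module
+ -- name up in the global namespace.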
+ -- Note: all modules must be in the global namespace, which is the case in LexerLPeg's default + -- Lua State. + function require(name) return name == 'lexer' and M or _G[name] end +end + +local print = function(...) + local args = table.pack(...) + local msg = {} + for i = 1, args.n do + msg[#msg + 1] = tostring(args[i]) + end + vis:info(table.concat(msg, ' ')) +end + lpeg = require('lpeg') local lpeg_P, lpeg_R, lpeg_S, lpeg_V = lpeg.P, lpeg.R, lpeg.S, lpeg.V local lpeg_Ct, lpeg_Cc, lpeg_Cp = lpeg.Ct, lpeg.Cc, lpeg.Cp -local lpeg_Cmt, lpeg_C, lpeg_Carg = lpeg.Cmt, lpeg.C, lpeg.Carg +local lpeg_Cmt, lpeg_C = lpeg.Cmt, lpeg.C local lpeg_match = lpeg.match -M.LEXERPATH = package.path +-- Searches for the given *name* in the given *path*. +-- This is a safe implementation of Lua 5.2's `package.searchpath()` function that does not +-- require the package module to be loaded. +local function searchpath(name, path) + local tried = {} + for part in path:gmatch('[^;]+') do + local filename = part:gsub('%?', name) + local ok, errmsg = loadfile(filename) + if ok or not errmsg:find('cannot open') then return filename end + tried[#tried + 1] = string.format("no file '%s'", filename) + end + return nil, table.concat(tried, '\n') +end --- Table of loaded lexers. -M.lexers = {} +--- +-- Map of color name strings to color values in `0xBBGGRR` or `"#RRGGBB"` format. +-- Note: for applications running within a terminal emulator, only 16 color values are recognized, +-- regardless of how many colors a user's terminal actually supports. (A terminal emulator's +-- settings determines how to actually display these recognized color values, which may end up +-- being mapped to a completely different color set.) In order to use the light variant of a +-- color, some terminals require a style's `bold` attribute must be set along with that normal +-- color. Recognized color values are black (0x000000), red (0x000080), green (0x008000), yellow +-- (0x008080), blue (0x800000), magenta (0x800080), cyan (0x808000), white (0xC0C0C0), light black +-- (0x404040), light red (0x0000FF), light green (0x00FF00), light yellow (0x00FFFF), light blue +-- (0xFF0000), light magenta (0xFF00FF), light cyan (0xFFFF00), and light white (0xFFFFFF). +-- @name colors +-- @class table +M.colors = setmetatable({}, { + __index = function(_, name) + local color = M.property['color.' .. name] + return tonumber(color) or color + end, __newindex = function(_, name, color) M.property['color.' .. name] = color end +}) --- Keep track of the last parent lexer loaded. This lexer's rules are used for --- proxy lexers (those that load parent and child lexers to embed) that do not --- declare a parent lexer. -local parent_lexer +-- A style object that distills into a property string that can be read by the LPeg lexer. +local style_obj = {} +style_obj.__index = style_obj -if not package.searchpath then - -- Searches for the given *name* in the given *path*. - -- This is an implementation of Lua 5.2's `package.searchpath()` function for - -- Lua 5.1. - function package.searchpath(name, path) - local tried = {} - for part in path:gmatch('[^;]+') do - local filename = part:gsub('%?', name) - local f = io.open(filename, 'r') - if f then f:close() return filename end - tried[#tried + 1] = ("no file '%s'"):format(filename) +-- Create a style object from a style name, property table, or legacy style string. 
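+-- For example (illustrative results only; the order of fields serialized from a table may vary):
+--   style_obj.new('keyword')                       --> '$(style.keyword)'
+--   style_obj.new({fore = '#FF0000', bold = true}) --> 'fore:#FF0000,bold'
+--   style_obj.new('fore:#FF0000,bold')             --> 'fore:#FF0000,bold' (passed through as-is)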
+function style_obj.new(name_or_props) + local prop_string = tostring(name_or_props) + if type(name_or_props) == 'string' and name_or_props:find('^[%w_]+$') then + prop_string = string.format('$(style.%s)', name_or_props) + elseif type(name_or_props) == 'table' then + local settings = {} + for k, v in pairs(name_or_props) do + settings[#settings + 1] = type(v) ~= 'boolean' and string.format('%s:%s', k, v) or + string.format('%s%s', v and '' or 'not', k) end - return nil, table.concat(tried, '\n') + prop_string = table.concat(settings, ',') end + return setmetatable({prop_string = prop_string}, style_obj) end --- Adds a rule to a lexer's current ordered list of rules. +-- Returns a new style based on this one with the properties defined in the given table or +-- legacy style string. +function style_obj.__concat(self, props) + if type(props) == 'table' then props = tostring(style_obj.new(props)) end + return setmetatable({prop_string = string.format('%s,%s', self.prop_string, props)}, style_obj) +end + +-- Returns this style object as property string for use with the LPeg lexer. +function style_obj.__tostring(self) return self.prop_string end + +--- +-- Map of style names to style definition tables. +-- +-- Style names consist of the following default names as well as the token names defined by lexers. +-- +-- * `default`: The default style all others are based on. +-- * `line_number`: The line number margin style. +-- * `control_char`: The style of control character blocks. +-- * `indent_guide`: The style of indentation guides. +-- * `call_tip`: The style of call tip text. Only the `font`, `size`, `fore`, and `back` style +-- definition fields are supported. +-- * `fold_display_text`: The style of text displayed next to folded lines. +-- * `class`, `comment`, `constant`, `embedded`, `error`, `function`, `identifier`, `keyword`, +-- `label`, `number`, `operator`, `preprocessor`, `regex`, `string`, `type`, `variable`, +-- `whitespace`: Some token names used by lexers. Some lexers may define more token names, +-- so this list is not exhaustive. +-- * *`lang`*`_whitespace`: A special style for whitespace tokens in lexer name *lang*. It +-- inherits from `whitespace`, and is used in place of it for all lexers. +-- +-- Style definition tables may contain the following fields: +-- +-- * `font`: String font name. +-- * `size`: Integer font size. +-- * `bold`: Whether or not the font face is bold. The default value is `false`. +-- * `weight`: Integer weight or boldness of a font, between 1 and 999. +-- * `italics`: Whether or not the font face is italic. The default value is `false`. +-- * `underlined`: Whether or not the font face is underlined. The default value is `false`. +-- * `fore`: Font face foreground color in `0xBBGGRR` or `"#RRGGBB"` format. +-- * `back`: Font face background color in `0xBBGGRR` or `"#RRGGBB"` format. +-- * `eolfilled`: Whether or not the background color extends to the end of the line. The +-- default value is `false`. +-- * `case`: Font case: `'u'` for upper, `'l'` for lower, and `'m'` for normal, mixed case. The +-- default value is `'m'`. +-- * `visible`: Whether or not the text is visible. The default value is `true`. +-- * `changeable`: Whether the text is changeable instead of read-only. The default value is +-- `true`. 
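+--
+-- As an illustrative sketch, a theme or lexer may derive new styles from existing ones via
+-- this table (assuming the active theme defines a `color.grey` property):
+--
+-- lexer.styles.embedded = lexer.styles.default .. {back = lexer.colors.grey}
+-- lexer.styles.my_lang_whitespace = lexer.styles.whitespace .. {eolfilled = true}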
+-- @class table +-- @name styles +M.styles = setmetatable({}, { + __index = function(_, name) return style_obj.new(name) end, __newindex = function(_, name, style) + if getmetatable(style) ~= style_obj then style = style_obj.new(style) end + M.property['style.' .. name] = tostring(style) + end +}) + +-- Default styles. +local default = { + 'nothing', 'whitespace', 'comment', 'string', 'number', 'keyword', 'identifier', 'operator', + 'error', 'preprocessor', 'constant', 'variable', 'function', 'class', 'type', 'label', 'regex', + 'embedded' +} +for _, name in ipairs(default) do + M[name:upper()] = name + M['STYLE_' .. name:upper()] = style_obj.new(name) -- backward compatibility +end +-- Predefined styles. +local predefined = { + 'default', 'line_number', 'brace_light', 'brace_bad', 'control_char', 'indent_guide', 'call_tip', + 'fold_display_text' +} +for _, name in ipairs(predefined) do + M[name:upper()] = name + M['STYLE_' .. name:upper()] = style_obj.new(name) -- backward compatibility +end + +--- +-- Adds pattern *rule* identified by string *id* to the ordered list of rules for lexer *lexer*. -- @param lexer The lexer to add the given rule to. --- @param name The name associated with this rule. It is used for other lexers --- to access this particular rule from the lexer's `_RULES` table. It does not --- have to be the same as the name passed to `token`. +-- @param id The id associated with this rule. It does not have to be the same as the name +-- passed to `token()`. -- @param rule The LPeg pattern of the rule. -local function add_rule(lexer, id, rule) +-- @see modify_rule +-- @name add_rule +function M.add_rule(lexer, id, rule) + if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent if not lexer._RULES then lexer._RULES = {} - -- Contains an ordered list (by numerical index) of rule names. This is used - -- in conjunction with lexer._RULES for building _TOKENRULE. + -- Contains an ordered list (by numerical index) of rule names. This is used in conjunction + -- with lexer._RULES for building _TOKENRULE. lexer._RULEORDER = {} end lexer._RULES[id] = rule lexer._RULEORDER[#lexer._RULEORDER + 1] = id + lexer:build_grammar() end --- Adds a new Scintilla style to Scintilla. --- @param lexer The lexer to add the given style to. --- @param token_name The name of the token associated with this style. --- @param style A Scintilla style created from `style()`. --- @see style -local function add_style(lexer, token_name, style) +--- +-- Replaces in lexer *lexer* the existing rule identified by string *id* with pattern *rule*. +-- @param lexer The lexer to modify. +-- @param id The id associated with this rule. +-- @param rule The LPeg pattern of the rule. +-- @name modify_rule +function M.modify_rule(lexer, id, rule) + if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent + lexer._RULES[id] = rule + lexer:build_grammar() +end + +--- +-- Returns the rule identified by string *id*. +-- @param lexer The lexer to fetch a rule from. +-- @param id The id of the rule to fetch. +-- @return pattern +-- @name get_rule +function M.get_rule(lexer, id) + if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent + return lexer._RULES[id] +end + +--- +-- Associates string *token_name* in lexer *lexer* with style table *style*. +-- *style* may have the following fields: +-- +-- * `font`: String font name. +-- * `size`: Integer font size. +-- * `bold`: Whether or not the font face is bold. The default value is `false`. 
+-- * `weight`: Integer weight or boldness of a font, between 1 and 999.
+-- * `italics`: Whether or not the font face is italic. The default value is `false`.
+-- * `underlined`: Whether or not the font face is underlined. The default value is `false`.
+-- * `fore`: Font face foreground color in `0xBBGGRR` or `"#RRGGBB"` format.
+-- * `back`: Font face background color in `0xBBGGRR` or `"#RRGGBB"` format.
+-- * `eolfilled`: Whether or not the background color extends to the end of the line. The
+-- default value is `false`.
+-- * `case`: Font case, `'u'` for upper, `'l'` for lower, and `'m'` for normal, mixed case. The
+-- default value is `'m'`.
+-- * `visible`: Whether or not the text is visible. The default value is `true`.
+-- * `changeable`: Whether the text is changeable instead of read-only. The default value is
+-- `true`.
+--
+-- Field values may also contain "$(property.name)" expansions for properties defined in Scintilla,
+-- theme files, etc.
+-- @param lexer The lexer to add a style to.
+-- @param token_name The name of the token to associate with the style.
+-- @param style A style table, style object, or legacy style string for Scintilla.
+-- @usage lex:add_style('longstring', lexer.styles.string)
+-- @usage lex:add_style('deprecated_func', lexer.styles['function'] .. {italics = true})
+-- @usage lex:add_style('visible_ws', lexer.styles.whitespace .. {back = lexer.colors.grey})
+-- @name add_style
+function M.add_style(lexer, token_name, style)
+ local num_styles = lexer._numstyles
+ if num_styles == 33 then num_styles = num_styles + 8 end -- skip predefined
+ if num_styles >= 256 then print('Too many styles defined (256 MAX)') end
+ lexer._TOKENSTYLES[token_name], lexer._numstyles = num_styles, num_styles + 1
+ if type(style) == 'table' and not getmetatable(style) then style = style_obj.new(style) end
+ lexer._EXTRASTYLES[token_name] = tostring(style)
+ -- If the lexer is a proxy or a child that embedded itself, copy this style to the parent lexer.
+ if lexer._lexer then lexer._lexer:add_style(token_name, style) end
+end
+
+---
+-- Adds to lexer *lexer* a fold point whose beginning and end tokens are string *token_name*
+-- tokens with string content *start_symbol* and *end_symbol*, respectively.
+-- In the event that *start_symbol* may or may not be a fold point depending on context, and that
+-- additional processing is required, *end_symbol* may be a function that ultimately returns
+-- `1` (indicating a beginning fold point), `-1` (indicating an ending fold point), or `0`
+-- (indicating no fold point). That function is passed the following arguments:
+--
+-- * `text`: The text being processed for fold points.
+-- * `pos`: The position in *text* of the beginning of the line currently being processed.
+-- * `line`: The text of the line currently being processed.
+-- * `s`: The position of *start_symbol* in *line*.
+-- * `symbol`: *start_symbol* itself.
+-- @param lexer The lexer to add a fold point to.
+-- @param token_name The token name of text that indicates a fold point.
+-- @param start_symbol The text that indicates the beginning of a fold point.
+-- @param end_symbol Either the text that indicates the end of a fold point, or a function that
+-- returns whether or not *start_symbol* is a beginning fold point (1), an ending fold point
+-- (-1), or not a fold point at all (0).
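+-- For instance, the `fold_strange_token` function from the example above would be registered
+-- via `lex:add_fold_point('strange_token', '|', fold_strange_token)`.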
+-- @usage lex:add_fold_point(lexer.OPERATOR, '{', '}')
+-- @usage lex:add_fold_point(lexer.KEYWORD, 'if', 'end')
+-- @usage lex:add_fold_point(lexer.COMMENT, lexer.fold_consecutive_lines('#'))
+-- @usage lex:add_fold_point('custom', '|', function(text, pos, line, s, symbol) ... end)
+-- @name add_fold_point
+function M.add_fold_point(lexer, token_name, start_symbol, end_symbol)
+ if not lexer._FOLDPOINTS then lexer._FOLDPOINTS = {_SYMBOLS = {}} end
+ local symbols = lexer._FOLDPOINTS._SYMBOLS
+ if not lexer._FOLDPOINTS[token_name] then lexer._FOLDPOINTS[token_name] = {} end
+ if lexer._CASEINSENSITIVEFOLDPOINTS then
+ start_symbol = start_symbol:lower()
+ if type(end_symbol) == 'string' then end_symbol = end_symbol:lower() end
+ end
+ if type(end_symbol) == 'string' then
+ if not symbols[end_symbol] then symbols[#symbols + 1], symbols[end_symbol] = end_symbol, true end
+ lexer._FOLDPOINTS[token_name][start_symbol] = 1
+ lexer._FOLDPOINTS[token_name][end_symbol] = -1
+ else
+ lexer._FOLDPOINTS[token_name][start_symbol] = end_symbol -- function or int
+ end
+ if not symbols[start_symbol] then
+ symbols[#symbols + 1], symbols[start_symbol] = start_symbol, true
+ end
+ -- If the lexer is a proxy or a child that embedded itself, copy this fold point to the
+ -- parent lexer.
+ if lexer._lexer then lexer._lexer:add_fold_point(token_name, start_symbol, end_symbol) end
+end

-- (Re)constructs `lexer._TOKENRULE`.
--- @param parent The parent lexer.
local function join_tokens(lexer)
 local patterns, order = lexer._RULES, lexer._RULEORDER
 local token_rule = patterns[order[1]]
@@ -950,218 +1096,118 @@ local function join_tokens(lexer)
 return lexer._TOKENRULE
end

--- Adds a given lexer and any of its embedded lexers to a given grammar.
--- @param grammar The grammar to add the lexer to.
--- @param lexer The lexer to add.
-local function add_lexer(grammar, lexer, token_rule)
- local token_rule = join_tokens(lexer)
- local lexer_name = lexer._NAME
- for i = 1, #lexer._CHILDREN do
- local child = lexer._CHILDREN[i]
- if child._CHILDREN then add_lexer(grammar, child) end
- local child_name = child._NAME
- local rules = child._EMBEDDEDRULES[lexer_name]
- local rules_token_rule = grammar['__'..child_name] or rules.token_rule
- grammar[child_name] = (-rules.end_rule * rules_token_rule)^0 *
- rules.end_rule^-1 * lpeg_V(lexer_name)
- local embedded_child = '_'..child_name
- grammar[embedded_child] = rules.start_rule * (-rules.end_rule *
- rules_token_rule)^0 * rules.end_rule^-1
- token_rule = lpeg_V(embedded_child) + token_rule
- end
- grammar['__'..lexer_name] = token_rule -- can contain embedded lexer rules
- grammar[lexer_name] = token_rule^0
-end
+-- Metatable for Scintillua grammars.
+-- These grammars are just tables ultimately passed to `lpeg.P()`.
+local grammar_mt = {
+ __index = {
+ -- Adds lexer *lexer* and any of its embedded lexers to this grammar.
+ -- @param lexer The lexer to add.
+ add_lexer = function(self, lexer)
+ local lexer_name = lexer._PARENTNAME or lexer._NAME
+ local token_rule = lexer:join_tokens()
+ for _, child in ipairs(lexer._CHILDREN) do
+ if child._CHILDREN then self:add_lexer(child) end
+ local rules = child._EMBEDDEDRULES[lexer_name]
+ local rules_token_rule = self['__' .. child._NAME] or rules.token_rule
+ self[child._NAME] = (-rules.end_rule * rules_token_rule)^0 * rules.end_rule^-1 *
+ lpeg_V(lexer_name)
+ local embedded_child = '_' ..
child._NAME + self[embedded_child] = rules.start_rule * (-rules.end_rule * rules_token_rule)^0 * + rules.end_rule^-1 + token_rule = lpeg_V(embedded_child) + token_rule + end + self['__' .. lexer_name] = token_rule -- can contain embedded lexer rules + self[lexer_name] = token_rule^0 + end + } +} -- (Re)constructs `lexer._GRAMMAR`. --- @param lexer The parent lexer. --- @param initial_rule The name of the rule to start lexing with. The default --- value is `lexer._NAME`. Multilang lexers use this to start with a child --- rule if necessary. +-- @param initial_rule The name of the rule to start lexing with. The default value is +-- `lexer._NAME`. Multilang lexers use this to start with a child rule if necessary. local function build_grammar(lexer, initial_rule) - local children = lexer._CHILDREN - if children then - local lexer_name = lexer._NAME - if not initial_rule then initial_rule = lexer_name end - local grammar = {initial_rule} - add_lexer(grammar, lexer) + if not lexer._RULES then return end + if lexer._CHILDREN then + if not initial_rule then initial_rule = lexer._NAME end + local grammar = setmetatable({initial_rule}, grammar_mt) + grammar:add_lexer(lexer) lexer._INITIALRULE = initial_rule lexer._GRAMMAR = lpeg_Ct(lpeg_P(grammar)) else - local function tmout(_, _, t1, redrawtime_max, flag) - if not redrawtime_max or os.clock() - t1 < redrawtime_max then return true end - if flag then flag.timedout = true end - end - local tokens = join_tokens(lexer) - -- every 500 tokens (approx. a screenful), check whether we have exceeded the timeout - lexer._GRAMMAR = lpeg_Ct((tokens * tokens^-500 * lpeg_Cmt(lpeg_Carg(1) * lpeg_Carg(2) * lpeg_Carg(3), tmout))^0) - end -end - -local string_upper = string.upper --- Default styles. -local default = { - 'nothing', 'whitespace', 'comment', 'string', 'number', 'keyword', - 'identifier', 'operator', 'error', 'preprocessor', 'constant', 'variable', - 'function', 'class', 'type', 'label', 'regex', 'embedded' -} -for i = 1, #default do - local name, upper_name = default[i], string_upper(default[i]) - M[upper_name] = name - if not M['STYLE_'..upper_name] then - M['STYLE_'..upper_name] = '' - end -end --- Predefined styles. -local predefined = { - 'default', 'linenumber', 'bracelight', 'bracebad', 'controlchar', - 'indentguide', 'calltip', 'folddisplaytext' -} -for i = 1, #predefined do - local name, upper_name = predefined[i], string_upper(predefined[i]) - M[upper_name] = name - if not M['STYLE_'..upper_name] then - M['STYLE_'..upper_name] = '' + lexer._GRAMMAR = lpeg_Ct(lexer:join_tokens()^0) end end --- --- Initializes or loads and returns the lexer of string name *name*. --- Scintilla calls this function in order to load a lexer. Parent lexers also --- call this function in order to load child lexers and vice-versa. The user --- calls this function in order to load a lexer when using Scintillua as a Lua --- library. --- @param name The name of the lexing language. --- @param alt_name The alternate name of the lexing language. This is useful for --- embedding the same child lexer with multiple sets of start and end tokens. --- @param cache Flag indicating whether or not to load lexers from the cache. --- This should only be `true` when initially loading a lexer (e.g. not from --- within another lexer for embedding purposes). --- The default value is `false`. 
--- @return lexer object --- @name load -function M.load(name, alt_name, cache) - if cache and M.lexers[alt_name or name] then return M.lexers[alt_name or name] end - parent_lexer = nil -- reset - - -- When using Scintillua as a stand-alone module, the `property` and - -- `property_int` tables do not exist (they are not useful). Create them to - -- prevent errors from occurring. - if not M.property then - M.property, M.property_int = {}, setmetatable({}, { - __index = function(t, k) return tonumber(M.property[k]) or 0 end, - __newindex = function() error('read-only property') end - }) - end - - -- Load the language lexer with its rules, styles, etc. - M.WHITESPACE = (alt_name or name)..'_whitespace' - local lexer_file, error = package.searchpath('lexers/'..name, M.LEXERPATH) - local ok, lexer = pcall(dofile, lexer_file or '') - if not ok then - return nil - end - if alt_name then lexer._NAME = alt_name end - - -- Create the initial maps for token names to style numbers and styles. - local token_styles = {} - for i = 1, #default do token_styles[default[i]] = i - 1 end - for i = 1, #predefined do token_styles[predefined[i]] = i + 31 end - lexer._TOKENSTYLES, lexer._numstyles = token_styles, #default - lexer._EXTRASTYLES = {} - - -- If the lexer is a proxy (loads parent and child lexers to embed) and does - -- not declare a parent, try and find one and use its rules. - if not lexer._rules and not lexer._lexer then lexer._lexer = parent_lexer end - - -- If the lexer is a proxy or a child that embedded itself, add its rules and - -- styles to the parent lexer. Then set the parent to be the main lexer. - if lexer._lexer then - local l, _r, _s = lexer._lexer, lexer._rules, lexer._tokenstyles - if not l._tokenstyles then l._tokenstyles = {} end - if _r then - for i = 1, #_r do - -- Prevent rule id clashes. - l._rules[#l._rules + 1] = {lexer._NAME..'_'.._r[i][1], _r[i][2]} - end - end - if _s then - for token, style in pairs(_s) do l._tokenstyles[token] = style end - end - lexer = l - end - - -- Add the lexer's styles and build its grammar. - if lexer._rules then - if lexer._tokenstyles then - for token, style in pairs(lexer._tokenstyles) do - add_style(lexer, token, style) +-- Embeds child lexer *child* in parent lexer *lexer* using patterns *start_rule* and *end_rule*, +-- which signal the beginning and end of the embedded lexer, respectively. +-- @param lexer The parent lexer. +-- @param child The child lexer. +-- @param start_rule The pattern that signals the beginning of the embedded lexer. +-- @param end_rule The pattern that signals the end of the embedded lexer. +-- @usage html:embed(css, css_start_rule, css_end_rule) +-- @usage html:embed(lex, php_start_rule, php_end_rule) -- from php lexer +-- @name embed +function M.embed(lexer, child, start_rule, end_rule) + if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent + -- Add child rules. + if not child._EMBEDDEDRULES then child._EMBEDDEDRULES = {} end + if not child._RULES then error('Cannot embed lexer with no rules') end + child._EMBEDDEDRULES[lexer._NAME] = { + start_rule = start_rule, token_rule = child:join_tokens(), end_rule = end_rule + } + if not lexer._CHILDREN then lexer._CHILDREN = {} end + local children = lexer._CHILDREN + children[#children + 1] = child + -- Add child styles. + for token, style in pairs(child._EXTRASTYLES) do lexer:add_style(token, style) end + -- Add child fold symbols. 
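+ -- These are copied up to the parent so that folding keeps working inside embedded regions.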
+ if child._FOLDPOINTS then + for token_name, symbols in pairs(child._FOLDPOINTS) do + if token_name ~= '_SYMBOLS' then + for symbol, v in pairs(symbols) do lexer:add_fold_point(token_name, symbol, v) end end end - for i = 1, #lexer._rules do - add_rule(lexer, lexer._rules[i][1], lexer._rules[i][2]) - end - build_grammar(lexer) - end - -- Add the lexer's unique whitespace style. - add_style(lexer, lexer._NAME..'_whitespace', M.STYLE_WHITESPACE) - - -- Process the lexer's fold symbols. - if lexer._foldsymbols and lexer._foldsymbols._patterns then - local patterns = lexer._foldsymbols._patterns - for i = 1, #patterns do patterns[i] = '()('..patterns[i]..')' end end - - lexer.lex, lexer.fold = M.lex, M.fold - M.lexers[alt_name or name] = lexer - return lexer + lexer:build_grammar() + child._lexer = lexer -- use parent's tokens if child is embedding itself end --- --- Lexes a chunk of text *text* (that has an initial style number of --- *init_style*) with lexer *lexer*. --- If *lexer* has a `_LEXBYLINE` flag set, the text is lexed one line at a time. --- Otherwise the text is lexed as a whole. --- @param lexer The lexer object to lex with. +-- Lexes a chunk of text *text* (that has an initial style number of *init_style*) using lexer +-- *lexer*, returning a table of token names and positions. +-- @param lexer The lexer to lex text with. -- @param text The text in the buffer to lex. --- @param init_style The current style. Multiple-language lexers use this to --- determine which language to start lexing in. --- @param redrawtime_max Stop lexing after that many seconds and set the second return value (timedout) to true. --- @param init Start lexing from this offset in *text* (default is 1). +-- @param init_style The current style. Multiple-language lexers use this to determine which +-- language to start lexing in. -- @return table of token names and positions. --- @return whether the lexing timed out. -- @name lex -function M.lex(lexer, text, init_style, redrawtime_max, init) +function M.lex(lexer, text, init_style) if not lexer._GRAMMAR then return {M.DEFAULT, #text + 1} end if not lexer._LEXBYLINE then - -- For multilang lexers, build a new grammar whose initial_rule is the - -- current language. + -- For multilang lexers, build a new grammar whose initial_rule is the current language. if lexer._CHILDREN then for style, style_num in pairs(lexer._TOKENSTYLES) do if style_num == init_style then - local lexer_name = style:match('^(.+)_whitespace') or lexer._NAME - if lexer._INITIALRULE ~= lexer_name then - build_grammar(lexer, lexer_name) - end + local lexer_name = style:match('^(.+)_whitespace') or lexer._PARENTNAME or lexer._NAME + if lexer._INITIALRULE ~= lexer_name then lexer:build_grammar(lexer_name) end break end end end - local flag = {} - return lpeg_match(lexer._GRAMMAR, text, init, os.clock(), redrawtime_max, flag), flag.timedout + return lpeg_match(lexer._GRAMMAR, text) else - local tokens = {} local function append(tokens, line_tokens, offset) for i = 1, #line_tokens, 2 do tokens[#tokens + 1] = line_tokens[i] tokens[#tokens + 1] = line_tokens[i + 1] + offset end end + local tokens = {} local offset = 0 local grammar = lexer._GRAMMAR - local flag = {} for line in text:gmatch('[^\r\n]*\r?\n?') do - local line_tokens = lpeg_match(grammar, line, init, os.clock(), redrawtime_max, flag) + local line_tokens = lpeg_match(grammar, line) if line_tokens then append(tokens, line_tokens, offset) end offset = offset + #line -- Use the default style to the end of the line if none was specified. 
@@ -1169,75 +1215,90 @@ function M.lex(lexer, text, init_style, redrawtime_max, init) tokens[#tokens + 1], tokens[#tokens + 2] = 'default', offset + 1 end end - return tokens, flag.timedout + return tokens end end --- --- Determines fold points in a chunk of text *text* with lexer *lexer*. --- *text* starts at position *start_pos* on line number *start_line* with a --- beginning fold level of *start_level* in the buffer. If *lexer* has a `_fold` --- function or a `_foldsymbols` table, that field is used to perform folding. --- Otherwise, if *lexer* has a `_FOLDBYINDENTATION` field set, or if a --- `fold.by.indentation` property is set, folding by indentation is done. --- @param lexer The lexer object to fold with. +-- Determines fold points in a chunk of text *text* using lexer *lexer*, returning a table of +-- fold levels associated with line numbers. +-- *text* starts at position *start_pos* on line number *start_line* with a beginning fold +-- level of *start_level* in the buffer. +-- @param lexer The lexer to fold text with. -- @param text The text in the buffer to fold. --- @param start_pos The position in the buffer *text* starts at, starting at --- zero. --- @param start_line The line number *text* starts on. +-- @param start_pos The position in the buffer *text* starts at, counting from 1. +-- @param start_line The line number *text* starts on, counting from 1. -- @param start_level The fold level *text* starts on. --- @return table of fold levels. +-- @return table of fold levels associated with line numbers. -- @name fold function M.fold(lexer, text, start_pos, start_line, start_level) local folds = {} if text == '' then return folds end local fold = M.property_int['fold'] > 0 local FOLD_BASE = M.FOLD_BASE - local FOLD_HEADER, FOLD_BLANK = M.FOLD_HEADER, M.FOLD_BLANK - if fold and lexer._fold then - return lexer._fold(text, start_pos, start_line, start_level) - elseif fold and lexer._foldsymbols then + local FOLD_HEADER, FOLD_BLANK = M.FOLD_HEADER, M.FOLD_BLANK + if fold and lexer._FOLDPOINTS then local lines = {} - for p, l in (text..'\n'):gmatch('()(.-)\r?\n') do - lines[#lines + 1] = {p, l} - end + for p, l in (text .. 
'\n'):gmatch('()(.-)\r?\n') do lines[#lines + 1] = {p, l} end local fold_zero_sum_lines = M.property_int['fold.on.zero.sum.lines'] > 0 - local fold_symbols = lexer._foldsymbols - local fold_symbols_patterns = fold_symbols._patterns - local fold_symbols_case_insensitive = fold_symbols._case_insensitive + local fold_compact = M.property_int['fold.compact'] > 0 + local fold_points = lexer._FOLDPOINTS + local fold_point_symbols = fold_points._SYMBOLS local style_at, fold_level = M.style_at, M.fold_level local line_num, prev_level = start_line, start_level local current_level = prev_level - for i = 1, #lines do - local pos, line = lines[i][1], lines[i][2] + for _, captures in ipairs(lines) do + local pos, line = captures[1], captures[2] if line ~= '' then - if fold_symbols_case_insensitive then line = line:lower() end + if lexer._CASEINSENSITIVEFOLDPOINTS then line = line:lower() end + local ranges = {} + local function is_valid_range(s, e) + if not s or not e then return false end + for i = 1, #ranges - 1, 2 do + local range_s, range_e = ranges[i], ranges[i + 1] + if s >= range_s and s <= range_e or e >= range_s and e <= range_e then + return false + end + end + ranges[#ranges + 1] = s + ranges[#ranges + 1] = e + return true + end local level_decreased = false - for j = 1, #fold_symbols_patterns do - for s, match in line:gmatch(fold_symbols_patterns[j]) do - local symbols = fold_symbols[style_at[start_pos + pos + s - 1]] - local l = symbols and symbols[match] - if type(l) == 'function' then l = l(text, pos, line, s, match) end - if type(l) == 'number' then - current_level = current_level + l - if l < 0 and current_level < prev_level then - -- Potential zero-sum line. If the level were to go back up on - -- the same line, the line may be marked as a fold header. - level_decreased = true + for _, symbol in ipairs(fold_point_symbols) do + local word = not symbol:find('[^%w_]') + local s, e = line:find(symbol, 1, true) + while is_valid_range(s, e) do + -- if not word or line:find('^%f[%w_]' .. symbol .. '%f[^%w_]', s) then + local word_before = s > 1 and line:find('^[%w_]', s - 1) + local word_after = line:find('^[%w_]', e + 1) + if not word or not (word_before or word_after) then + local symbols = fold_points[style_at[start_pos + pos - 1 + s - 1]] + local level = symbols and symbols[symbol] + if type(level) == 'function' then + level = level(text, pos, line, s, symbol) + end + if type(level) == 'number' then + current_level = current_level + level + if level < 0 and current_level < prev_level then + -- Potential zero-sum line. If the level were to go back up on the same line, + -- the line may be marked as a fold header. + level_decreased = true + end end end + s, e = line:find(symbol, s + 1, true) end end folds[line_num] = prev_level if current_level > prev_level then folds[line_num] = prev_level + FOLD_HEADER - elseif level_decreased and current_level == prev_level and - fold_zero_sum_lines then + elseif level_decreased and current_level == prev_level and fold_zero_sum_lines then if line_num > start_line then folds[line_num] = prev_level - 1 + FOLD_HEADER else -- Typing within a zero-sum line. 
- local level = fold_level[line_num - 1] - 1
+ local level = fold_level[line_num] - 1
if level > FOLD_HEADER then level = level - FOLD_HEADER end
if level > FOLD_BLANK then level = level - FOLD_BLANK end
folds[line_num] = level + FOLD_HEADER
@@ -1247,33 +1308,29 @@ function M.fold(lexer, text, start_pos, start_line, start_level)
if current_level < FOLD_BASE then current_level = FOLD_BASE end
prev_level = current_level
else
- folds[line_num] = prev_level + FOLD_BLANK
+ folds[line_num] = prev_level + (fold_compact and FOLD_BLANK or 0)
end
line_num = line_num + 1
end
- elseif fold and (lexer._FOLDBYINDENTATION or
- M.property_int['fold.by.indentation'] > 0) then
+ elseif fold and (lexer._FOLDBYINDENTATION or M.property_int['fold.by.indentation'] > 0) then
-- Indentation based folding.
-- Calculate indentation per line.
local indentation = {}
- for indent, line in (text..'\n'):gmatch('([\t ]*)([^\r\n]*)\r?\n') do
+ for indent, line in (text .. '\n'):gmatch('([\t ]*)([^\r\n]*)\r?\n') do
indentation[#indentation + 1] = line ~= '' and #indent
end
- -- Find the first non-blank line before start_line. If the current line is
- -- indented, make that previous line a header and update the levels of any
- -- blank lines inbetween. If the current line is blank, match the level of
- -- the previous non-blank line.
+ -- Find the first non-blank line before start_line. If the current line is indented, make
+ -- that previous line a header and update the levels of any blank lines in between. If the
+ -- current line is blank, match the level of the previous non-blank line.
local current_level = start_level
- for i = start_line - 1, 0, -1 do
+ for i = start_line, 1, -1 do
local level = M.fold_level[i]
if level >= FOLD_HEADER then level = level - FOLD_HEADER end
if level < FOLD_BLANK then
local indent = M.indent_amount[i]
if indentation[1] and indentation[1] > indent then
folds[i] = FOLD_BASE + indent + FOLD_HEADER
- for j = i + 1, start_line - 1 do
- folds[j] = start_level + FOLD_BLANK
- end
+ for j = i + 1, start_line - 1 do folds[j] = start_level + FOLD_BLANK end
elseif not indentation[1] then
current_level = FOLD_BASE + indent
end
@@ -1309,91 +1366,295 @@ function M.fold(lexer, text, start_pos, start_line, start_level)
 return folds
end

+---
+-- Creates and returns a new lexer with the given name.
+-- @param name The lexer's name.
+-- @param opts Table of lexer options. Options currently supported:
+-- * `lex_by_line`: Whether or not the lexer only processes whole lines of text (instead of
+-- arbitrary chunks of text) at a time. Line lexers cannot look ahead to subsequent lines.
+-- The default value is `false`.
+-- * `fold_by_indentation`: Whether or not the lexer defines no fold points of its own and fold
+-- points should instead be calculated based on changes in line indentation. The default value
+-- is `false`.
+-- * `case_insensitive_fold_points`: Whether or not fold points added via
+-- `lexer.add_fold_point()` ignore case. The default value is `false`.
+-- * `inherit`: Lexer to inherit from. The default value is `nil`.
+-- @usage lexer.new('rhtml', {inherit = lexer.load('html')})
+-- @name new
+function M.new(name, opts)
+ local lexer = {
+ _NAME = assert(name, 'lexer name expected'), _LEXBYLINE = opts and opts['lex_by_line'],
+ _FOLDBYINDENTATION = opts and opts['fold_by_indentation'],
+ _CASEINSENSITIVEFOLDPOINTS = opts and opts['case_insensitive_fold_points'],
+ _lexer = opts and opts['inherit']
+ }
+
+ -- Create the initial maps for token names to style numbers and styles.
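+ -- Default token styles occupy numbers 1-18 and predefined styles occupy numbers 33-40;
+ -- add_style() hands out the numbers in between and beyond (up to the 256 style maximum).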
+ local token_styles = {}
+ for i = 1, #default do token_styles[default[i]] = i end
+ for i = 1, #predefined do token_styles[predefined[i]] = i + 32 end
+ lexer._TOKENSTYLES, lexer._numstyles = token_styles, #default + 1
+ lexer._EXTRASTYLES = {}
+
+ return setmetatable(lexer, {
+ __index = {
+ add_rule = M.add_rule, modify_rule = M.modify_rule, get_rule = M.get_rule,
+ add_style = M.add_style, add_fold_point = M.add_fold_point, join_tokens = join_tokens,
+ build_grammar = build_grammar, embed = M.embed, lex = M.lex, fold = M.fold
+ }
+ })
+end
+
+-- Legacy support for older lexers.
+-- Processes the `lex._rules`, `lex._tokenstyles`, and `lex._foldsymbols` tables. Since legacy
+-- lexers may be processed up to twice, ensure their default styles and rules are not processed
+-- more than once.
+local function process_legacy_lexer(lexer)
+ local function warn(msg) --[[io.stderr:write(msg, "\n")]]end
+ if not lexer._LEGACY then
+ lexer._LEGACY = true
+ warn("lexers as tables are deprecated; use 'lexer.new()'")
+ local token_styles = {}
+ for i = 1, #default do token_styles[default[i]] = i end
+ for i = 1, #predefined do token_styles[predefined[i]] = i + 32 end
+ lexer._TOKENSTYLES, lexer._numstyles = token_styles, #default + 1
+ lexer._EXTRASTYLES = {}
+ setmetatable(lexer, getmetatable(M.new('')))
+ if lexer._rules then
+ warn("lexer '_rules' table is deprecated; use 'add_rule()'")
+ for _, rule in ipairs(lexer._rules) do lexer:add_rule(rule[1], rule[2]) end
+ end
+ end
+ if lexer._tokenstyles then
+ warn("lexer '_tokenstyles' table is deprecated; use 'add_style()'")
+ for token, style in pairs(lexer._tokenstyles) do
+ -- If this legacy lexer is being processed a second time, only add styles added since
+ -- the first processing.
+ if not lexer._TOKENSTYLES[token] then lexer:add_style(token, style) end
+ end
+ end
+ if lexer._foldsymbols then
+ warn("lexer '_foldsymbols' table is deprecated; use 'add_fold_point()'")
+ for token_name, symbols in pairs(lexer._foldsymbols) do
+ if type(symbols) == 'table' and token_name ~= '_patterns' then
+ for symbol, v in pairs(symbols) do lexer:add_fold_point(token_name, symbol, v) end
+ end
+ end
+ if lexer._foldsymbols._case_insensitive then lexer._CASEINSENSITIVEFOLDPOINTS = true end
+ elseif lexer._fold then
+ lexer.fold = function(self, ...) return lexer._fold(...) end
+ end
+end
+
+local lexers = {} -- cache of loaded lexers
+---
+-- Initializes or loads and returns the lexer of string name *name*.
+-- Scintilla calls this function in order to load a lexer. Parent lexers also call this function
+-- in order to load child lexers and vice-versa. The user calls this function in order to load
+-- a lexer when using Scintillua as a Lua library.
+-- @param name The name of the lexing language.
+-- @param alt_name The alternate name of the lexing language. This is useful for embedding the
+-- same child lexer with multiple sets of start and end tokens.
+-- @param cache Flag indicating whether or not to load lexers from the cache. This should only
+-- be `true` when initially loading a lexer (e.g. not from within another lexer for embedding
+-- purposes). The default value is `false`.
+-- @return lexer object
+-- @name load
+function M.load(name, alt_name, cache)
+ if cache and lexers[alt_name or name] then return lexers[alt_name or name] end
+
+ -- When using Scintillua as a stand-alone module, the `property`, `property_int`, and
+ -- `property_expanded` tables do not exist (they are not useful). Create them in order to
+ -- prevent errors from occurring.
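+ -- (When Scintillua runs embedded in Scintilla via LexLPeg, the host provides these tables.)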
+ if not M.property then + M.property = setmetatable({['lexer.lpeg.home'] = package.path:gsub('/%?%.lua', '')}, { + __index = function() return '' end, + __newindex = function(t, k, v) rawset(t, k, tostring(v)) end + }) + M.property_int = setmetatable({}, { + __index = function(t, k) return tonumber(M.property[k]) or 0 end, + __newindex = function() error('read-only property') end + }) + M.property_expanded = setmetatable({}, { + __index = function(t, key) + return M.property[key]:gsub('[$%%](%b())', function(key) return t[key:sub(2, -2)] end) + end, __newindex = function() error('read-only property') end + }) + end + + -- Load the language lexer with its rules, styles, etc. + -- However, replace the default `WHITESPACE` style name with a unique whitespace style name + -- (and then automatically add it afterwards), since embedded lexing relies on these unique + -- whitespace style names. Note that loading embedded lexers changes `WHITESPACE` again, + -- so when adding it later, do not reference the potentially incorrect value. + M.WHITESPACE = (alt_name or name) .. '_whitespace' + local path = M.property['lexer.lpeg.home']:gsub(';', '/?.lua;') .. '/?.lua' + local lexer = dofile(assert(searchpath('lexers/'..name, path))) + assert(lexer, string.format("'%s.lua' did not return a lexer", name)) + if alt_name then lexer._NAME = alt_name end + if not getmetatable(lexer) or lexer._LEGACY then + -- A legacy lexer may need to be processed a second time in order to pick up any `_tokenstyles` + -- or `_foldsymbols` added after `lexer.embed_lexer()`. + process_legacy_lexer(lexer) + if lexer._lexer and lexer._lexer._LEGACY then + process_legacy_lexer(lexer._lexer) -- mainly for `_foldsymbols` edits + end + end + lexer:add_style((alt_name or name) .. '_whitespace', M.styles.whitespace) + + -- If the lexer is a proxy or a child that embedded itself, set the parent to be the main + -- lexer. Keep a reference to the old parent name since embedded child rules reference and + -- use that name. + if lexer._lexer then + lexer = lexer._lexer + lexer._PARENTNAME, lexer._NAME = lexer._NAME, alt_name or name + end + + if cache then lexers[alt_name or name] = lexer end + return lexer +end + -- The following are utility functions lexers will have access to. -- Common patterns. M.any = lpeg_P(1) -M.ascii = lpeg_R('\000\127') -M.extend = lpeg_R('\000\255') M.alpha = lpeg_R('AZ', 'az') M.digit = lpeg_R('09') M.alnum = lpeg_R('AZ', 'az', '09') M.lower = lpeg_R('az') M.upper = lpeg_R('AZ') M.xdigit = lpeg_R('09', 'AF', 'af') -M.cntrl = lpeg_R('\000\031') M.graph = lpeg_R('!~') -M.print = lpeg_R(' ~') M.punct = lpeg_R('!/', ':@', '[\'', '{~') M.space = lpeg_S('\t\v\f\n\r ') -M.newline = lpeg_S('\r\n\f')^1 +M.newline = lpeg_P('\r')^-1 * '\n' M.nonnewline = 1 - M.newline -M.nonnewline_esc = 1 - (M.newline + '\\') + '\\' * M.any M.dec_num = M.digit^1 M.hex_num = '0' * lpeg_S('xX') * M.xdigit^1 M.oct_num = '0' * lpeg_R('07')^1 M.integer = lpeg_S('+-')^-1 * (M.hex_num + M.oct_num + M.dec_num) M.float = lpeg_S('+-')^-1 * - ((M.digit^0 * '.' * M.digit^1 + M.digit^1 * '.' * M.digit^0) * - (lpeg_S('eE') * lpeg_S('+-')^-1 * M.digit^1)^-1 + - (M.digit^1 * lpeg_S('eE') * lpeg_S('+-')^-1 * M.digit^1)) + ((M.digit^0 * '.' * M.digit^1 + M.digit^1 * '.' * M.digit^0 * -lpeg_P('.')) * + (lpeg_S('eE') * lpeg_S('+-')^-1 * M.digit^1)^-1 + + (M.digit^1 * lpeg_S('eE') * lpeg_S('+-')^-1 * M.digit^1)) +M.number = M.float + M.integer M.word = (M.alpha + '_') * (M.alnum + '_')^0 +-- Deprecated. 
+M.nonnewline_esc = 1 - (M.newline + '\\') + '\\' * M.any
+M.ascii = lpeg_R('\000\127')
+M.extend = lpeg_R('\000\255')
+M.cntrl = lpeg_R('\000\031')
+M.print = lpeg_R(' ~')
+
 ---
--- Creates and returns a token pattern with token name *name* and pattern
--- *patt*.
--- If *name* is not a predefined token name, its style must be defined in the
--- lexer's `_tokenstyles` table.
--- @param name The name of token. If this name is not a predefined token name,
--- then a style needs to be assiciated with it in the lexer's `_tokenstyles`
--- table.
+-- Creates and returns a token pattern with token name *name* and pattern *patt*.
+-- If *name* is not a predefined token name, its style must be defined via `lexer.add_style()`.
+-- @param name The name of the token. If this name is not a predefined token name, then a style
+-- needs to be associated with it via `lexer.add_style()`.
 -- @param patt The LPeg pattern associated with the token.
 -- @return pattern
--- @usage local ws = token(l.WHITESPACE, l.space^1)
--- @usage local annotation = token('annotation', '@' * l.word)
+-- @usage local ws = token(lexer.WHITESPACE, lexer.space^1)
+-- @usage local annotation = token('annotation', '@' * lexer.word)
 -- @name token
 function M.token(name, patt) return lpeg_Cc(name) * patt * lpeg_Cp() end
 
 ---
--- Creates and returns a pattern that matches a range of text bounded by
--- *chars* characters.
--- This is a convenience function for matching more complicated delimited ranges
--- like strings with escape characters and balanced parentheses. *single_line*
--- indicates whether or not the range must be on a single line, *no_escape*
--- indicates whether or not to ignore '\' as an escape character, and *balanced*
--- indicates whether or not to handle balanced ranges like parentheses and
--- requires *chars* to be composed of two characters.
+-- Creates and returns a pattern that matches from string or pattern *prefix* until the end of
+-- the line.
+-- *escape* indicates whether the end of the line can be escaped with a '\' character.
+-- @param prefix String or pattern prefix to start matching at.
+-- @param escape Optional flag indicating whether or not newlines can be escaped by a '\'
+-- character. The default value is `false`.
+-- @return pattern
+-- @usage local line_comment = lexer.to_eol('//')
+-- @usage local line_comment = lexer.to_eol(S('#;'))
+-- @name to_eol
+function M.to_eol(prefix, escape)
+ return prefix * (not escape and M.nonnewline or M.nonnewline_esc)^0
+end
+
+---
+-- Creates and returns a pattern that matches a range of text bounded by strings or patterns *s*
+-- and *e*.
+-- This is a convenience function for matching more complicated ranges like strings with escape
+-- characters, balanced parentheses, and block comments (nested or not). *e* is optional and
+-- defaults to *s*. *single_line* indicates whether or not the range must be on a single line;
+-- *escapes* indicates whether or not to allow '\' as an escape character; and *balanced*
+-- indicates whether or not to handle balanced ranges like parentheses, and requires *s* and *e*
+-- to be different.
+-- @param s String or pattern start of a range.
+-- @param e Optional string or pattern end of a range. The default value is *s*.
+-- @param single_line Optional flag indicating whether or not the range must be on a single
+-- line. The default value is `false`.
+-- @param escapes Optional flag indicating whether or not the range end may be escaped by a '\'
+-- character.
The default value is `false` unless *s* and *e* are identical, single-character +-- strings. In that case, the default value is `true`. +-- @param balanced Optional flag indicating whether or not to match a balanced range, like the +-- "%b" Lua pattern. This flag only applies if *s* and *e* are different. +-- @return pattern +-- @usage local dq_str_escapes = lexer.range('"') +-- @usage local dq_str_noescapes = lexer.range('"', false, false) +-- @usage local unbalanced_parens = lexer.range('(', ')') +-- @usage local balanced_parens = lexer.range('(', ')', false, false, true) +-- @name range +function M.range(s, e, single_line, escapes, balanced) + if type(e) ~= 'string' and type(e) ~= 'userdata' then + e, single_line, escapes, balanced = s, e, single_line, escapes + end + local any = M.any - e + if single_line then any = any - '\n' end + if balanced then any = any - s end + if escapes == nil then + -- Only allow escapes by default for ranges with identical, single-character string delimiters. + escapes = type(s) == 'string' and #s == 1 and s == e + end + if escapes then any = any - '\\' + '\\' * M.any end + if balanced and s ~= e then + return lpeg_P{s * (any + lpeg_V(1))^0 * lpeg_P(e)^-1} + else + return s * any^0 * lpeg_P(e)^-1 + end +end + +-- Deprecated function. Use `lexer.range()` instead. +-- Creates and returns a pattern that matches a range of text bounded by *chars* characters. +-- This is a convenience function for matching more complicated delimited ranges like strings +-- with escape characters and balanced parentheses. *single_line* indicates whether or not the +-- range must be on a single line, *no_escape* indicates whether or not to ignore '\' as an +-- escape character, and *balanced* indicates whether or not to handle balanced ranges like +-- parentheses and requires *chars* to be composed of two characters. -- @param chars The character(s) that bound the matched range. --- @param single_line Optional flag indicating whether or not the range must be --- on a single line. --- @param no_escape Optional flag indicating whether or not the range end --- character may be escaped by a '\\' character. --- @param balanced Optional flag indicating whether or not to match a balanced --- range, like the "%b" Lua pattern. This flag only applies if *chars* --- consists of two different characters (e.g. "()"). +-- @param single_line Optional flag indicating whether or not the range must be on a single line. +-- @param no_escape Optional flag indicating whether or not the range end character may be +-- escaped by a '\\' character. +-- @param balanced Optional flag indicating whether or not to match a balanced range, like the +-- "%b" Lua pattern. This flag only applies if *chars* consists of two different characters +-- (e.g. "()"). 
-- @return pattern --- @usage local dq_str_escapes = l.delimited_range('"') --- @usage local dq_str_noescapes = l.delimited_range('"', false, true) --- @usage local unbalanced_parens = l.delimited_range('()') --- @usage local balanced_parens = l.delimited_range('()', false, false, true) --- @see nested_pair +-- @usage local dq_str_escapes = lexer.delimited_range('"') +-- @usage local dq_str_noescapes = lexer.delimited_range('"', false, true) +-- @usage local unbalanced_parens = lexer.delimited_range('()') +-- @usage local balanced_parens = lexer.delimited_range('()', false, false, true) +-- @see range -- @name delimited_range function M.delimited_range(chars, single_line, no_escape, balanced) + print("lexer.delimited_range() is deprecated, use lexer.range()") local s = chars:sub(1, 1) local e = #chars == 2 and chars:sub(2, 2) or s local range local b = balanced and s or '' local n = single_line and '\n' or '' if no_escape then - local invalid = lpeg_S(e..n..b) + local invalid = lpeg_S(e .. n .. b) range = M.any - invalid else - local invalid = lpeg_S(e..n..b) + '\\' + local invalid = lpeg_S(e .. n .. b) + '\\' range = M.any - invalid + '\\' * M.any end if balanced and s ~= e then @@ -1404,12 +1665,10 @@ function M.delimited_range(chars, single_line, no_escape, balanced) end --- --- Creates and returns a pattern that matches pattern *patt* only at the --- beginning of a line. +-- Creates and returns a pattern that matches pattern *patt* only at the beginning of a line. -- @param patt The LPeg pattern to match on the beginning of a line. -- @return pattern --- @usage local preproc = token(l.PREPROCESSOR, l.starts_line('#') * --- l.nonnewline^0) +-- @usage local preproc = token(lexer.PREPROCESSOR, lexer.starts_line(lexer.to_eol('#'))) -- @name starts_line function M.starts_line(patt) return lpeg_Cmt(lpeg_C(patt), function(input, index, match, ...) @@ -1421,15 +1680,14 @@ function M.starts_line(patt) end --- --- Creates and returns a pattern that verifies that string set *s* contains the --- first non-whitespace character behind the current match position. +-- Creates and returns a pattern that verifies the first non-whitespace character behind the +-- current match position is in string set *s*. -- @param s String character set like one passed to `lpeg.S()`. -- @return pattern --- @usage local regex = l.last_char_includes('+-*!%^&|=,([{') * --- l.delimited_range('/') +-- @usage local regex = lexer.last_char_includes('+-*!%^&|=,([{') * lexer.range('/') -- @name last_char_includes function M.last_char_includes(s) - s = '['..s:gsub('[-%%%[]', '%%%1')..']' + s = string.format('[%s]', s:gsub('[-%%%[]', '%%%1')) return lpeg_P(function(input, index) if index == 1 then return index end local i = index @@ -1438,109 +1696,77 @@ function M.last_char_includes(s) end) end ---- --- Returns a pattern that matches a balanced range of text that starts with --- string *start_chars* and ends with string *end_chars*. --- With single-character delimiters, this function is identical to --- `delimited_range(start_chars..end_chars, false, true, true)`. +-- Deprecated function. Use `lexer.range()` instead. +-- Returns a pattern that matches a balanced range of text that starts with string *start_chars* +-- and ends with string *end_chars*. +-- With single-character delimiters, this function is identical to `delimited_range(start_chars .. +-- end_chars, false, true, true)`. -- @param start_chars The string starting a nested sequence. -- @param end_chars The string ending a nested sequence. 
-- @return pattern --- @usage local nested_comment = l.nested_pair('/*', '*/') --- @see delimited_range +-- @usage local nested_comment = lexer.nested_pair('/*', '*/') +-- @see range -- @name nested_pair function M.nested_pair(start_chars, end_chars) + print("lexer.nested_pair() is deprecated, use lexer.range()") local s, e = start_chars, lpeg_P(end_chars)^-1 return lpeg_P{s * (M.any - s - end_chars + lpeg_V(1))^0 * e} end --- --- Creates and returns a pattern that matches any single word in list *words*. --- Words consist of alphanumeric and underscore characters, as well as the --- characters in string set *word_chars*. *case_insensitive* indicates whether --- or not to ignore case when matching words. --- This is a convenience function for simplifying a set of ordered choice word --- patterns. --- @param words A table of words. --- @param word_chars Optional string of additional characters considered to be --- part of a word. By default, word characters are alphanumerics and --- underscores ("%w_" in Lua). This parameter may be `nil` or the empty string --- in order to indicate no additional word characters. --- @param case_insensitive Optional boolean flag indicating whether or not the --- word match is case-insensitive. The default is `false`. +-- Creates and returns a pattern that matches any single word in list or string *words*. +-- *case_insensitive* indicates whether or not to ignore case when matching words. +-- This is a convenience function for simplifying a set of ordered choice word patterns. +-- @param word_list A list of words or a string list of words separated by spaces. +-- @param case_insensitive Optional boolean flag indicating whether or not the word match is +-- case-insensitive. The default value is `false`. +-- @param word_chars Unused legacy parameter. -- @return pattern --- @usage local keyword = token(l.KEYWORD, word_match{'foo', 'bar', 'baz'}) --- @usage local keyword = token(l.KEYWORD, word_match({'foo-bar', 'foo-baz', --- 'bar-foo', 'bar-baz', 'baz-foo', 'baz-bar'}, '-', true)) +-- @usage local keyword = token(lexer.KEYWORD, word_match{'foo', 'bar', 'baz'}) +-- @usage local keyword = token(lexer.KEYWORD, word_match({'foo-bar', 'foo-baz', 'bar-foo', +-- 'bar-baz', 'baz-foo', 'baz-bar'}, true)) +-- @usage local keyword = token(lexer.KEYWORD, word_match('foo bar baz')) -- @name word_match -function M.word_match(words, word_chars, case_insensitive) - local word_list = {} - for i = 1, #words do - word_list[case_insensitive and words[i]:lower() or words[i]] = true +function M.word_match(word_list, case_insensitive, word_chars) + if type(case_insensitive) == 'string' or type(word_chars) == 'boolean' then + -- Legacy `word_match(word_list, word_chars, case_insensitive)` form. + word_chars, case_insensitive = case_insensitive, word_chars + elseif type(word_list) == 'string' then + local words = word_list -- space-separated list of words + word_list = {} + for word in words:gsub('%-%-[^\n]+', ''):gmatch('%S+') do word_list[#word_list + 1] = word end + end + if not word_chars then word_chars = '' end + for _, word in ipairs(word_list) do + word_list[case_insensitive and word:lower() or word] = true + for char in word:gmatch('[^%w_%s]') do + if not word_chars:find(char, 1, true) then word_chars = word_chars .. 
char end + end end local chars = M.alnum + '_' - if word_chars then chars = chars + lpeg_S(word_chars) end + if word_chars ~= '' then chars = chars + lpeg_S(word_chars) end return lpeg_Cmt(chars^1, function(input, index, word) if case_insensitive then word = word:lower() end return word_list[word] and index or nil end) end ---- --- Embeds child lexer *child* in parent lexer *parent* using patterns --- *start_rule* and *end_rule*, which signal the beginning and end of the --- embedded lexer, respectively. +-- Deprecated legacy function. Use `parent:embed()` instead. +-- Embeds child lexer *child* in parent lexer *parent* using patterns *start_rule* and *end_rule*, +-- which signal the beginning and end of the embedded lexer, respectively. -- @param parent The parent lexer. -- @param child The child lexer. --- @param start_rule The pattern that signals the beginning of the embedded --- lexer. +-- @param start_rule The pattern that signals the beginning of the embedded lexer. -- @param end_rule The pattern that signals the end of the embedded lexer. --- @usage l.embed_lexer(M, css, css_start_rule, css_end_rule) --- @usage l.embed_lexer(html, M, php_start_rule, php_end_rule) --- @usage l.embed_lexer(html, ruby, ruby_start_rule, ruby_end_rule) +-- @usage lexer.embed_lexer(M, css, css_start_rule, css_end_rule) +-- @usage lexer.embed_lexer(html, M, php_start_rule, php_end_rule) +-- @usage lexer.embed_lexer(html, ruby, ruby_start_rule, ruby_end_rule) +-- @see embed -- @name embed_lexer function M.embed_lexer(parent, child, start_rule, end_rule) - -- Add child rules. - if not child._EMBEDDEDRULES then child._EMBEDDEDRULES = {} end - if not child._RULES then -- creating a child lexer to be embedded - if not child._rules then error('Cannot embed language with no rules') end - for i = 1, #child._rules do - add_rule(child, child._rules[i][1], child._rules[i][2]) - end - end - child._EMBEDDEDRULES[parent._NAME] = { - ['start_rule'] = start_rule, - token_rule = join_tokens(child), - ['end_rule'] = end_rule - } - if not parent._CHILDREN then parent._CHILDREN = {} end - local children = parent._CHILDREN - children[#children + 1] = child - -- Add child styles. - if not parent._tokenstyles then parent._tokenstyles = {} end - local tokenstyles = parent._tokenstyles - tokenstyles[child._NAME..'_whitespace'] = M.STYLE_WHITESPACE - if child._tokenstyles then - for token, style in pairs(child._tokenstyles) do - tokenstyles[token] = style - end - end - -- Add child fold symbols. - if not parent._foldsymbols then parent._foldsymbols = {} end - if child._foldsymbols then - for token, symbols in pairs(child._foldsymbols) do - if not parent._foldsymbols[token] then parent._foldsymbols[token] = {} end - for k, v in pairs(symbols) do - if type(k) == 'number' then - parent._foldsymbols[token][#parent._foldsymbols[token] + 1] = v - elseif not parent._foldsymbols[token][k] then - parent._foldsymbols[token][k] = v - end - end - end - end - child._lexer = parent -- use parent's tokens if child is embedding itself - parent_lexer = parent -- use parent's tokens if the calling lexer is a proxy + if not getmetatable(parent) then process_legacy_lexer(parent) end + if not getmetatable(child) then process_legacy_lexer(child) end + parent:embed(child, start_rule, end_rule) end -- Determines if the previous line is a comment. 
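
The deprecated `delimited_range()`, `nested_pair()`, legacy `word_match()`, and `embed_lexer()` entry points above all forward to the newer `range()`, `to_eol()`, `word_match()`, and `lexer:embed()` API. For orientation, here is a minimal sketch of a lexer written directly against that newer API; the 'demo' language, its keyword list, and its comment and string syntax are invented for illustration and are not part of this module:

-- A minimal sketch of the modern API that the deprecated helpers forward to.
-- The 'demo' language and all of its rules are hypothetical.
local lexer = require('lexer')
local token, word_match = lexer.token, lexer.word_match

local lex = lexer.new('demo')

-- Whitespace first; embedded lexing relies on per-lexer whitespace tokens.
lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))

-- word_match() now accepts a space-separated string; `true` means case-insensitive.
lex:add_rule('keyword', token(lexer.KEYWORD, word_match('if else while', true)))

-- to_eol() and range() replace nonnewline^0 loops, delimited_range(), and nested_pair().
lex:add_rule('comment', token(lexer.COMMENT, lexer.to_eol('//') + lexer.range('/*', '*/')))
lex:add_rule('string', token(lexer.STRING, lexer.range('"')))
lex:add_rule('number', token(lexer.NUMBER, lexer.number))

-- Fold points, including consecutive line comments via the helper defined just below.
lex:add_fold_point(lexer.COMMENT, '/*', '*/')
lex:add_fold_point(lexer.COMMENT, lexer.fold_consecutive_lines('//'))

return lex
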
@@ -1584,16 +1810,17 @@ local function next_line_is_comment(prefix, text, pos, line, s) end --- --- Returns a fold function (to be used within the lexer's `_foldsymbols` table) --- that folds consecutive line comments that start with string *prefix*. --- @param prefix The prefix string defining a line comment. --- @usage [l.COMMENT] = {['--'] = l.fold_line_comments('--')} --- @usage [l.COMMENT] = {['//'] = l.fold_line_comments('//')} --- @name fold_line_comments -function M.fold_line_comments(prefix) +-- Returns for `lexer.add_fold_point()` the parameters needed to fold consecutive lines that +-- start with string *prefix*. +-- @param prefix The prefix string (e.g. a line comment). +-- @usage lex:add_fold_point(lexer.COMMENT, lexer.fold_consecutive_lines('--')) +-- @usage lex:add_fold_point(lexer.COMMENT, lexer.fold_consecutive_lines('//')) +-- @usage lex:add_fold_point(lexer.KEYWORD, lexer.fold_consecutive_lines('import')) +-- @name fold_consecutive_lines +function M.fold_consecutive_lines(prefix) local property_int = M.property_int - return function(text, pos, line, s) - if property_int['fold.line.comments'] == 0 then return 0 end + return prefix, function(text, pos, line, s) + if property_int['fold.line.groups'] == 0 then return 0 end if s > 1 and line:match('^%s*()') < s then return 0 end local prev_line_comment = prev_line_is_comment(prefix, text, pos, line, s) local next_line_comment = next_line_is_comment(prefix, text, pos, line, s) @@ -1603,73 +1830,26 @@ function M.fold_line_comments(prefix) end end -M.property_expanded = setmetatable({}, { - -- Returns the string property value associated with string property *key*, - -- replacing any "$()" and "%()" expressions with the values of their keys. - __index = function(t, key) - return M.property[key]:gsub('[$%%]%b()', function(key) - return t[key:sub(3, -2)] - end) - end, - __newindex = function() error('read-only property') end -}) +-- Deprecated legacy function. Use `lexer.fold_consecutive_lines()` instead. +-- Returns a fold function (to be passed to `lexer.add_fold_point()`) that folds consecutive +-- line comments that start with string *prefix*. +-- @param prefix The prefix string defining a line comment. +-- @usage lex:add_fold_point(lexer.COMMENT, '--', lexer.fold_line_comments('--')) +-- @usage lex:add_fold_point(lexer.COMMENT, '//', lexer.fold_line_comments('//')) +-- @name fold_line_comments +function M.fold_line_comments(prefix) + print('lexer.fold_line_comments() is deprecated, use lexer.fold_consecutive_lines()') + return select(2, M.fold_consecutive_lines(prefix)) +end --[[ The functions and fields below were defined in C. --- --- Returns the line number of the line that contains position *pos*, which +-- Returns the line number (starting from 1) of the line that contains position *pos*, which -- starts from 1. -- @param pos The position to get the line number of. -- @return number local function line_from_position(pos) end - ---- --- Individual fields for a lexer instance. --- @field _NAME The string name of the lexer. --- @field _rules An ordered list of rules for a lexer grammar. --- Each rule is a table containing an arbitrary rule name and the LPeg pattern --- associated with the rule. The order of rules is important, as rules are --- matched sequentially. --- Child lexers should not use this table to access and/or modify their --- parent's rules and vice-versa. Use the `_RULES` table instead. --- @field _tokenstyles A map of non-predefined token names to styles. --- Remember to use token names, not rule names. 
It is recommended to use
--- predefined styles or color-agnostic styles derived from predefined styles
--- to ensure compatibility with user color themes.
--- @field _foldsymbols A table of recognized fold points for the lexer.
--- Keys are token names with table values defining fold points. Those table
--- values have string keys of keywords or characters that indicate a fold
--- point whose values are integers. A value of `1` indicates a beginning fold
--- point and a value of `-1` indicates an ending fold point. Values can also
--- be functions that return `1`, `-1`, or `0` (indicating no fold point) for
--- keys which need additional processing.
--- There is also a required `_patterns` key whose value is a table containing
--- Lua pattern strings that match all fold points (the string keys contained
--- in token name table values). When the lexer encounters text that matches
--- one of those patterns, the matched text is looked up in its token's table
--- to determine whether or not it is a fold point.
--- There is also an optional `_case_insensitive` option that indicates whether
--- or not fold point keys are case-insensitive. If `true`, fold point keys
--- should be in lower case.
--- @field _fold If this function exists in the lexer, it is called for folding
--- the document instead of using `_foldsymbols` or indentation.
--- @field _lexer The parent lexer object whose rules should be used. This field
--- is only necessary to disambiguate a proxy lexer that loaded parent and
--- child lexers for embedding and ended up having multiple parents loaded.
--- @field _RULES A map of rule name keys with their associated LPeg pattern
--- values for the lexer.
--- This is constructed from the lexer's `_rules` table and accessible to other
--- lexers for embedded lexer applications like modifying parent or child
--- rules.
--- @field _LEXBYLINE Indicates the lexer can only process one whole line of text
--- (instead of an arbitrary chunk of text) at a time.
--- The default value is `false`. Line lexers cannot look ahead to subsequent
--- lines.
--- @field _FOLDBYINDENTATION Declares the lexer does not define fold points and
--- that fold points should be calculated based on changes in indentation.
--- @class table
--- @name lexer
-local lexer
 ]]
 
 return M
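
Since `load()` above also serves as the entry point when Scintillua is used as a stand-alone Lua library, a short usage sketch may help. It assumes `require('lexer')` can find this module, that a 'lua' lexer is installed under `lexer.lpeg.home`, and that `lex()` returns a flat list of alternating token names and positions (an assumption based on how `M.lex` is exposed in the metatable above, not shown in this hunk):

-- A minimal stand-alone sketch; module and lexer paths are assumed.
local lexer = require('lexer')
local lua = lexer.load('lua')
local tokens = lua:lex('local x = 1 -- comment')
for i = 1, #tokens, 2 do
  print(tokens[i], tokens[i + 1]) -- token name, position just past the token
end
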
