aboutsummaryrefslogtreecommitdiff
path: root/README
diff options
context:
space:
mode:
Diffstat (limited to 'README')
-rw-r--r--README597
1 files changed, 0 insertions, 597 deletions
diff --git a/README b/README
deleted file mode 100644
index d5c6e93..0000000
--- a/README
+++ /dev/null
@@ -1,597 +0,0 @@
-Why another text editor?
-========================
-
-It all started when I was recently reading the excellent
-[Project Oberon](http://www.inf.ethz.ch/personal/wirth/ProjectOberon/),
-where in chapter 5 a data structure for managing text is introduced.
-I found this rather appealing and wanted to see how it works in practice.
-
-After some time I decided that besides just having fun hacking around I
-might as well build something which could (at least in the long run)
-replace my current editor of choice: vim.
-
-This should be accomplished by a reasonable amount of clean (your mileage
-may vary), modern and legacy free C code. Certainly not an old,
-[500'000 lines long](https://www.openhub.net/p/vim) #ifdef cluttered
-mess which tries to run on all broken systems ever envisioned by mankind.
-
-Admittedly vim has a lot of functionally, most of which I don't use. I
-therefore set out with the following main goals:
-
- - Unicode aware
-
- - binary clean
-
- - handle arbitrary files (this includes large ones, think >100M SQL-dumps)
-
- - unlimited undo/redo support
-
- - syntax highlighting
-
- - regex search (and replace)
-
- - multiple file/window support
-
-The goal could thus be summarized as "80% of vim's features (in other
-words the useful ones) implemented in roughly 1% of the code".
-
-Finally and most importantly it is fun! Writing a text editor presents
-some interesting challenges and design decisions, some of which are
-explained below.
-
-Text management using a piece table/chain
-=========================================
-
-The core of this editor is a persistent data structure called a piece
-table which supports all modifications in `O(m)`, where `m` is the number
-of non-consecutive editing operations. This bound could be further
-improved to `O(log m)` by use of a balanced search tree, however the
-additional complexity doesn't seem to be worth it, for now.
-
-The actual data is stored in buffers which are strictly append only.
-There exist two types of buffers, one fixed-sized holding the original
-file content and multiple append-only ones storing the modifications.
-
-A text, i.e. a sequence of bytes, is represented as a double linked
-list of pieces each with a pointer into a buffer and an associated
-length. Pieces are never deleted but instead always kept around for
-redo/undo support. A span is a range of pieces, consisting of a start
-and end piece. Changes to the text are always performed by swapping
-out an existing, possibly empty, span with a new one.
-
-An empty document is represented by two special sentinel pieces which
-always exist:
-
- /-+ --> +-\
- | | | |
- \-+ <-- +-/
- #1 #2
-
-Loading a file from disk is as simple as mmap(2)-ing it into a buffer,
-creating a corresponding piece and adding it to the double linked list.
-Hence loading a file is a constant time operation i.e. independent of
-the actual file size (assuming the operating system uses demand paging).
-
- /-+ --> +-----------------+ --> +-\
- | | | I am an editor! | | |
- \-+ <-- +-----------------+ <-- +-/
- #1 #3 #2
-
-Insert
-------
-
-Inserting a junk of data amounts to appending the new content to a
-modification buffer. Followed by the creation of new pieces. An insertion
-in the middle of an existing piece requires the creation of 3 new pieces.
-Two of them hold references to the text before respectively after the
-insertion point. While the third one points to the newly added text.
-
- /-+ --> +---------------+ --> +----------------+ --> +--+ --> +-\
- | | | I am an editor| |which sucks less| |! | | |
- \-+ <-- +---------------+ <-- +----------------+ <-- +--+ <-- +-/
- #1 #4 #5 #6 #2
-
- modification buffer content: "which sucks less"
-
-During this insertion operation the old span [3,3] has been replaced
-by the new span [4,6]. Notice that the pieces in the old span were not
-changed, therefore still point to their predecessors/successors, and can
-thus be swapped back in.
-
-If the insertion point happens to be at a piece boundary, the old span
-is empty, and the new span only consists of the newly allocated piece.
-
-Delete
-------
-
-Similarly a delete operation splits the pieces at appropriate places.
-
- /-+ --> +-----+ --> +--+ --> +-\
- | | | I am| |! | | |
- \-+ <-- +-----+ <-- +--+ <-- +-/
- #1 #7 #6 #2
-
-Where the old span [4,5] got replaced by the new span [7,7]. The underlying
-buffers remain unchanged.
-
-Cache
------
-
-Notice that the common case of appending text to a given piece is fast
-since, the new data is simply appended to the buffer and the piece length
-is increased accordingly. In order to keep the number of pieces down,
-the least recently edited piece is cached and changes to it are done
-in place (this is the only time buffers are modified in a non-append
-only way). As a consequence they can not be undone.
-
-Undo/redo
----------
-
-Since the buffers are append only and the spans/pieces are never destroyed
-undo/redo functionality is implemented by swapping the required spans/pieces
-back in.
-
-As illustrated above, each change to the text is recorded by an old and
-a new span. An action consists of multiple changes which logically belong
-to each other and should thus also be reverted together. For example
-a search and replace operation is one action with possibly many changes
-all over the text. Actions are kept in an undo respectively redo stack.
-
-A state of the text can be marked by means of a snapshotting operation.
-The undo/redo functionality operates on such marked states and switches
-back and forth between them.
-
-The history is currently linear, no undo / history tree is implemented.
-
-Properties
-----------
-
-The main advantage of the piece chain as described above is that all
-operations are performed independent of the file size but instead linear
-in the number of pieces i.e. editing operations. The original file buffer
-never changes which means the `mmap(2)` can be performed read only which
-makes optimal use of the operating system's virtual memory / paging system.
-
-The maximum editable file size is limited by the amount of memory a process
-is allowed to map into its virtual address space, this shouldn't be a problem
-in practice. The whole process assumes that the file can be used as is.
-In particular the editor assumes all input and the file itself is encoded
-as UTF-8. Supporting other encodings would require conversion using `iconv(3)`
-or similar upon loading and saving the document, which defeats the whole
-purpose.
-
-Similarly the editor has to cope with the fact that lines can be terminated
-either by `\n` or `\r\n`. There is no conversion to a line based structure in
-place. Instead the whole text is exposed as a sequence of bytes. All
-addressing happens by means of zero based byte offsets from the start of
-the file.
-
-The main disadvantage of the piece chain data structure is that the text
-is not stored contiguous in memory which makes seeking around somewhat
-harder. This also implies that standard library calls like the `regex(3)`
-functions can not be used as is. However this is the case for all but
-the most simple data structures used in text editors.
-
-Screen Drawing
-==============
-
-The current code takes a rather simple approach to screen drawing. It
-basically only remembers the starting position of the area being shown.
-Then fetches a "screen full" of bytes and outputs one character at a
-time until the end of the window is reached. A consequence of this
-approach is that lines are always wrapped and horizontal scrolling is
-not supported.
-
-No efforts are made to reduce the terminal output. This task is delegated
-to the underlying curses library which already performs a kind of double
-buffering. The window is always redrawn completely even if only a single
-character changes. It turns out this is actually necessary if one wants
-to support multiline syntax highlighting.
-
-While outputting the individual characters a cell matrix is populated
-where each entry stores the length in bytes of the character displayed
-at this particular cell. For characters spanning multiple columns the
-length is always stored in the leftmost cell. As an example a tab has a
-length of 1 byte followed by up to 7 cells with a length of zero.
-Similarly a `\r\n` line ending occupies only one screen cell but has a
-length of 2.
-
-This matrix is actually stored per line inside a double linked list of
-structures representing screen lines. For each line we keep track of
-the length in bytes of the underlying text, the display width of all
-characters part of the line, and the logical line number.
-
-All cursor positioning is always performed in bytes from the start of
-the file and works by traversing the double linked list of screen lines
-until the correct line is found. Then the cell array is consulted to
-move to the correct column.
-
-Syntax-Highlighting
--------------------
-
-The editor takes a similar regex-based approach to syntax highlighting
-than sandy and reuses its syntax definitions but always applies them to
-a "screen full" of text thus enabling multiline coloring.
-
-Window-Management
------------------
-
-It is possible to open multiple windows via the `:split/:vsplit/:open`
-commands or by passing multiple files on the command line.
-
-In principle it would be nice to follow a similar client/server approach
-as sam/samterm i.e. having the main editor as a server and each window
-as a separate client process with communication over a unix domain socket.
-
-That way window management would be taken care of by dwm or dvtm and the
-different client processes would still share common cut/paste registers
-etc.
-
-However at the moment I don't want to open that can of worms and instead
-settled for a single process architecture.
-
-Search and replace
-------------------
-
-Currently the editor copies the whole text to a contiguous memory block
-and then uses the standard regex functions from libc. Clearly this is not
-a satisfactory solution for large files and kind of defeats the whole
-effort spent on the piece table.
-
-The long term solution is to write our own regular expression engine or
-modify an existing one to make use of the iterator API. This would allow
-efficient search without having to double memory consumption.
-
-The used regex engine should use a non-backtracking algorithm. Useful
-resources include:
-
- - [Russ Cox's regex pag](http://swtch.com/~rsc/regexp/)
- - [TRE](https://github.com/laurikari/tre) as
- [used by musl](http://git.musl-libc.org/cgit/musl/tree/src/regex)
- which uses a parallel [TNFA matcher](http://laurikari.net/ville/spire2000-tnfa.ps)
- - [Plan9's regex library](http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/)
- which has its root in Rob Pike's sam text editor
- - [RE2](https://github.com/google/re2) C++ regex library
-
-Command-Prompt
---------------
-
-The editor needs some form of command prompt to get user input
-(think `:`, `/`, `?` in vim).
-
-At first I wanted to implement this in terms of an external process,
-similar to the way it is done in sandy with communication back to the
-editor via a named pipe.
-
-At some point I thought it would be possible to provide all editor commands
-as shell scripts in a given directory, then set $PATH accordingly and run
-the shell. This would give us readline editing, tab completion, history and
-Unicode support for free. But unfortunately it won't work due to quoting
-issues and other conflicts of special symbols with different meanings.
-
-Later it occurred to me that the editor prompt could just be treated as
-special 1 line file. That is all the main editor functionality is reused
-with a slightly different set of key bindings.
-
-This approach also has the added benefit of further testing the main editor
-component (in particular corner cases like editing at the end of the file).
-
-Editor Frontends
-================
-
-The editor core is written in a library like fashion which should make
-it possible to write multiple frontends with possibly different user
-interfaces/paradigms. The frontend to run is selected based on the
-executable name.
-
-The default, and currently only, interface is a vim clone called vis.
-
-Key binding modes
------------------
-
-The available key bindings for the different modes are arranged in a
-hierarchical way in config.h (there is also an ascii tree giving an
-overview in that file). Each mode can specify a parent mode which is
-used to look up a key binding if it is not found in the current mode.
-This reduces redundancy for keys which have the same meaning in
-different modes.
-
-Each mode can also provide hooks which are executed upon entering/leaving
-the mode and when there was an unmatched key.
-
-vis a vim like frontend
------------------------
-
-The following section gives a quick overview over various vim features
-and their current support in vis.
-
-### Operators
-
- d (delete)
- c (change)
- y (yank)
- p (put)
- > (shift-right)
- < (shift-left),
- J (join)
- ~ (swap case)
- gu (make lowercase)
- gU (make uppercase)
-
-### Movements
-
- h (char left)
- l (char right)
- j (line down)
- k (line up)
- gj (display line down)
- gk (display line up)
- 0 (start of line)
- ^ (first non-blank of line)
- g_ (last non-blank of line)
- $ (end of line)
- % (match bracket)
- b (previous start of a word)
- B (previous start of a WORD)
- w (next start of a word)
- W (next start of a WORD)
- e (next end of a word)
- E (next end of a WORD)
- ge (previous end of a word)
- gE (previous end of a WORD)
- { (previous paragraph)
- } (next paragraph)
- ( (previous sentence)
- ) (next sentence)
- gg (begin of file)
- G (goto line or end of file)
- | (goto column)
- n (repeat last search forward)
- N (repeat last search backwards)
- f{char} (to next occurrence of char to the right)
- t{char} (till before next occurrence of char to the right)
- F{char} (to next occurrence of char to the left)
- T{char} (till before next occurrence of char to the left)
- /{text} (to next match of text in forward direction)
- ?{text} (to next match of text in backward direction)
-
- An empty line is currently neither a word nor a WORD.
-
- The semantics of a paragraph and a sentence is also not always 100%
- the same as in vim.
-
- Some of these commands do not work as in vim when prefixed with a
- digit i.e. a multiplier. As an example in vim `3$` moves to the end
- of the 3rd line down. However vis treats it as a move to the end of
- current line which is repeated 3 times where the last two have no
- effect.
-
- In general there are still a lot of improvements to be made in the
- case movements are forced to be line or character wise. Also some of
- them should be inclusive in some context and exclusive in others.
- At the moment they always behave the same.
-
-### Text objects
-
- All of the following text objects are implemented in an inner variant
- (prefixed with 'i') and a normal variant (prefixed with 'a'):
-
- w word
- W WORD
- s sentence
- p paragraph
- [,], (,), {,}, <,>, ", ', ` block enclosed by these symbols
-
- For sentence and paragraph there is no difference between the
- inner and normal variants.
-
-### Modes
-
- At the moment there exists a more or less functional insert, replace
- and visual mode (in both line and character wise variants).
-
-### Marks
-
- [a-z] general purpose marks
- < start of the last selected visual area in current buffer
- > end of the last selected visual area in current buffer
-
- No marks across files are supported. Marks are not preserved over
- editing sessions.
-
-### Registers
-
- Only the 26 lower case registers `[a-z]` and 1 additional default register
- is supported.
-
-### Undo/Redo and Repeat
-
- The text is currently snapshoted whenever an operator is completed as
- well as when insert or replace mode is left. Additionally a snapshot
- is also taken if in insert or replace mode a certain idle time elapses.
-
- Another idea is to snapshot based on the distance between two consecutive
- editing operations (as they are likely unrelated and thus should be
- individually reversible).
-
- The repeat command `.` works for all operators and is able to repeat
- the last insertion or replacement.
-
-### Macros
-
-`[a-z]` are recoginized macro names, `q` starts a recording, `@` plays it back.
-`@@` refers to the least recently recorded macro.
-
-### Command line prompt
-
- At the `:`-command prompt only the following commands are recognized:
-
- :nnn go to line nnn
- :bdelete close all windows which display the same file as the current one
- :edit replace current file with a new one or reload it from disk
- :open open a new window
- :qall close all windows, exit editor
- :quit close currently focused window
- :read insert content of another file at current cursor position
- :split split window horizontally
- :vsplit split window vertically
- :new open an empty window, arrange horizontally
- :vnew open an empty window, arrange vertically
- :wq write changes then close window
- :xit like :wq but write only when changes have been made
- :write write current buffer content to file
- :saveas save file under another name
- :set set the options below
-
- tabwidth [1-8]
-
- set display width of a tab and number of spaces to use if
- expandtab is enabled
-
- expandtab (yes|no)
-
- whether typed in tabs should be expanded to tabwidth spaces
-
- autoindent (yes|no)
-
- replicate spaces and tabs at the beginning of the line when
- starting a new line.
-
- number (yes|no)
- relativenumber (yes|no)
-
- whether absolute or relative line numbers are printed alongside
- the file content
-
- syntax name
-
- use syntax definition given (e.g. "c") or disable syntax
- highlighting if no such definition exists (e.g :set syntax off)
-
- Each command can be prefixed with a range made up of a start and
- an end position as in start,end. Valid position specifiers are:
-
- . start of the current line
- +n and -n start of the line relative to the current line
- 'm position of mark m
- /pattern/ first match after current position
-
- If only a start position without a command is given then the cursor
- is moved to that position. Additionally the following ranges are
- predefined:
-
- % the whole file, equivalent to 1,$
- * the current selection, equivalent to '<,'>
-
- The substitute command is recognized but not yet implemented. The `!`
- command to filter text through an external program is also planned.
-
- History support, tab completion and wildcard expansion are other
- worthwhile features. However implementing them inside the editor
- feels wrong.
-
-### Tab <-> Space and Line endings \n vs \r\n
-
- Tabs can optionally be expaned to a configurable number of spaces.
- The first line ending in the file determines what will be inserted
- upon a line break (defaults to \n).
-
-### Jump list and change list
-
- A per window, file local jump list (navigate with `CTRL+O` and `CTRL+I`)
- and change list (navigate with `g;` and `g,`) is supported. The jump
- list is implemented as a fixed sized ring buffer.
-
-### Mouse support
-
- The mouse is currently not used at all.
-
-### Other features
-
- Things which I would like to add in the long term are:
-
- + code completion: this should be done as an external process. I will
- have to take a look at the tools from the llvm / clang project. Maybe
- dvtm's terminal emulation support could be reused to display an
- slmenu inside the editor at the cursor position?
-
- + something similar to vim's quick fix functionality
-
- Things I might add
-
- + runtime configurable key bindings
- + visual block mode / multiple selections
- + text folding
-
- Stuff which vim does which I don't use and have no plans to add:
-
- - GUIs (neither x11, motif, gtk, win32 ...)
- - plugins (certainly not vimscript, if anything it should be lua based)
- - right-to-left text
- - tabs (as in multiple workspaces)
- - ex mode
-
-How to help?
-------------
-
-At this point it might be best to fetch the code, edit some scratch file,
-notice an odd behavior or missing functionality, write and submit a patch
-for it, then iterate.
-
-WARNING: There are probably still some bugs left which could corrupt your
- unsaved changes. Use at your own risk. At this point I suggest to
- only edit non-critical files which are under version control and
- thus easily recoverable!
-
-A quick overview over the code structure to get you started:
-
- File(s) | Description
- ------------------- | -----------------------------------------------------
- `text.[ch]` | low level text / marks / {un,re}do / piece table implementation
- `text-motions.[ch]` | movement functions take a file position and return a new one
- `text-objects.[ch]` | functions take a file position and return a file range
- `vis.c` | vi(m) specific editor frontend, program entry point
- `editor.[ch]` | editor window management
- `window.[ch]` | ui-independent viewport, syntax highlighting, cursor placement
- `ui.h` | abstract interface as implemented by user interface
- `ui-curses.h` | a terminal / curses based user interface implementation
- `buffer.[ch]` | dynamically growing buffer used for registers and macros
- `ring-buffer.[ch]` | fixed size ring buffer used for the jump list
- `map.[ch]` | crit-bit tree based map supporting unique prefix lookups and ordered iteration. used to implement `:`-commands.
- `config.def.h` | definition of key bindings, commands, syntax highlighting
-
-Hope this gets the interested people started.
-
-TODO
-----
-
-Here is an incomplete list of TODO items and/or ideas for further work
-in no particular order:
-
- * Review and cleanup the existing implementation (e.g. selection handling)
- - Eliminate global state and expose vis frontend as "library"
- * Implement `:!` using a proper (libuv based?) mainloop
- * Implement `:substitute`
- * Bugfix: editing the same file in multiple windows can cause "corruption"
- * Review/Implement cindent mode #33
- * Add history support to `:`-prompt
- * Implement wordwrap (i.e `gq` and `:set textwidth`) using `fmt(1)` ?
- * Overhaul key bindings to support runtime configuration / streamline config.def.h
- * Implement/review/merge history undo tree
- * Implement a regex engine which works with the iterator API
- * Write [unit test](http://ccodearchive.net/info/tap.html) for the low
- level `text_*` interface
- * Improve syntax highlighting, investigate whether already existing
- syntax definitions from other editors could be reused
- * Optimize `text_delete` in case of consecutive delete operations
- * Add a RPC interface, experiment with a client/server architecture and
- delegate window management to dwm/dvtm
-
-Feel free to ask questions if something is unclear! There are still a lot
-of bugs left to fix, but by now I'm fairly sure that the general concept
-should work.
-
-As always, comments and patches welcome!
-
-Cheers,
-Marc