From 3d6583383bffc75694635fadca2bc89cd5f980ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Andr=C3=A9=20Tanner?= <mat@brain-dump.org>
Date: Sat, 11 Apr 2015 19:58:05 +0200
Subject: Rename README -> README.md

---
 README | 597 -----------------------------------------------------------------
 1 file changed, 597 deletions(-)
 delete mode 100644 README

(limited to 'README')

diff --git a/README b/README
deleted file mode 100644
index d5c6e93..0000000
--- a/README
+++ /dev/null
@@ -1,597 +0,0 @@
-Why another text editor?
-========================
-
-It all started when I was recently reading the excellent
-[Project Oberon](http://www.inf.ethz.ch/personal/wirth/ProjectOberon/),
-where in chapter 5 a data structure for managing text is introduced.
-I found this rather appealing and wanted to see how it works in practice.
-
-After some time I decided that besides just having fun hacking around I
-might as well build something which could (at least in the long run)
-replace my current editor of choice: vim.
-
-This should be accomplished by a reasonable amount of clean (your mileage
-may vary), modern and legacy free C code. Certainly not an old,
-[500'000 lines long](https://www.openhub.net/p/vim) #ifdef cluttered
-mess which tries to run on all broken systems ever envisioned by mankind.
-
-Admittedly vim has a lot of functionally, most of which I don't use. I
-therefore set out with the following main goals:
-
- - Unicode aware
-
- - binary clean
-
- - handle arbitrary files (this includes large ones, think >100M SQL-dumps)
-
- - unlimited undo/redo support
-
- - syntax highlighting
-
- - regex search (and replace)
-
- - multiple file/window support
-
-The goal could thus be summarized as "80% of vim's features (in other
-words the useful ones) implemented in roughly 1% of the code".
-
-Finally and most importantly it is fun! Writing a text editor presents
-some interesting challenges and design decisions, some of which are
-explained below.
-
-Text management using a piece table/chain
-=========================================
-
-The core of this editor is a persistent data structure called a piece
-table which supports all modifications in `O(m)`, where `m` is the number
-of non-consecutive editing operations. This bound could be further
-improved to `O(log m)` by use of a balanced search tree, however the
-additional complexity doesn't seem to be worth it, for now.
-
-The actual data is stored in buffers which are strictly append only.
-There exist two types of buffers, one fixed-sized holding the original
-file content and multiple append-only ones storing the modifications.
-
-A text, i.e. a sequence of bytes, is represented as a double linked
-list of pieces each with a pointer into a buffer and an associated
-length. Pieces are never deleted but instead always kept around for
-redo/undo support. A span is a range of pieces, consisting of a start
-and end piece. Changes to the text are always performed by swapping
-out an existing, possibly empty, span with a new one.
-
-An empty document is represented by two special sentinel pieces which
-always exist:
-
-    /-+ --> +-\
-    | |     | |
-    \-+ <-- +-/
-     #1     #2
-
-Loading a file from disk is as simple as mmap(2)-ing it into a buffer,
-creating a corresponding piece and adding it to the double linked list.
-Hence loading a file is a constant time operation i.e. independent of
-the actual file size (assuming the operating system uses demand paging).
-
-    /-+ --> +-----------------+ --> +-\
-    | |     | I am an editor! |     | |
-    \-+ <-- +-----------------+ <-- +-/
-     #1             #3              #2
-
-Insert
-------
-
-Inserting a junk of data amounts to appending the new content to a
-modification buffer. Followed by the creation of new pieces. An insertion
-in the middle of an existing piece requires the creation of 3 new pieces.
-Two of them hold references to the text before respectively after the
-insertion point. While the third one points to the newly added text.
-
-    /-+ --> +---------------+ --> +----------------+ --> +--+ --> +-\
-    | |     | I am an editor|     |which sucks less|     |! |     | |
-    \-+ <-- +---------------+ <-- +----------------+ <-- +--+ <-- +-/
-     #1            #4                   #5                #6      #2
-
-           modification buffer content: "which sucks less"
-
-During this insertion operation the old span [3,3] has been replaced
-by the new span [4,6]. Notice that the pieces in the old span were not
-changed, therefore still point to their predecessors/successors, and can
-thus be swapped back in.
-
-If the insertion point happens to be at a piece boundary, the old span
-is empty, and the new span only consists of the newly allocated piece.
-
-Delete
-------
-
-Similarly a delete operation splits the pieces at appropriate places.
-
-    /-+ --> +-----+ --> +--+ --> +-\
-    | |     | I am|     |! |     | |
-    \-+ <-- +-----+ <-- +--+ <-- +-/
-     #1       #7         #6      #2
-
-Where the old span [4,5] got replaced by the new span [7,7]. The underlying
-buffers remain unchanged.
-
-Cache
------
-
-Notice that the common case of appending text to a given piece is fast
-since, the new data is simply appended to the buffer and the piece length
-is increased accordingly. In order to keep the number of pieces down,
-the least recently edited piece is cached and changes to it are done
-in place (this is the only time buffers are modified in a non-append
-only way). As a consequence they can not be undone.
-
-Undo/redo
----------
-
-Since the buffers are append only and the spans/pieces are never destroyed
-undo/redo functionality is implemented by swapping the required spans/pieces
-back in.
-
-As illustrated above, each change to the text is recorded by an old and
-a new span. An action consists of multiple changes which logically belong
-to each other and should thus also be reverted together. For example
-a search and replace operation is one action with possibly many changes
-all over the text. Actions are kept in an undo respectively redo stack.
-
-A state of the text can be marked by means of a snapshotting operation.
-The undo/redo functionality operates on such marked states and switches
-back and forth between them.
-
-The history is currently linear, no undo / history tree is implemented.
-
-Properties
-----------
-
-The main advantage of the piece chain as described above is that all
-operations are performed independent of the file size but instead linear
-in the number of pieces i.e. editing operations. The original file buffer
-never changes which means the `mmap(2)` can be performed read only which
-makes optimal use of the operating system's virtual memory / paging system.
-
-The maximum editable file size is limited by the amount of memory a process
-is allowed to map into its virtual address space, this shouldn't be a problem
-in practice. The whole process assumes that the file can be used as is.
-In particular the editor assumes all input and the file itself is encoded
-as UTF-8. Supporting other encodings would require conversion using `iconv(3)`
-or similar upon loading and saving the document, which defeats the whole
-purpose.
-
-Similarly the editor has to cope with the fact that lines can be terminated
-either by `\n` or `\r\n`. There is no conversion to a line based structure in
-place. Instead the whole text is exposed as a sequence of bytes. All
-addressing happens by means of zero based byte offsets from the start of
-the file.
-
-The main disadvantage of the piece chain data structure is that the text
-is not stored contiguous in memory which makes seeking around somewhat
-harder. This also implies that standard library calls like the `regex(3)`
-functions can not be used as is. However this is the case for all but
-the most simple data structures used in text editors.
-
-Screen Drawing
-==============
-
-The current code takes a rather simple approach to screen drawing. It
-basically only remembers the starting position of the area being shown.
-Then fetches a "screen full" of bytes and outputs one character at a
-time until the end of the window is reached. A consequence of this
-approach is that lines are always wrapped and horizontal scrolling is
-not supported.
-
-No efforts are made to reduce the terminal output. This task is delegated
-to the underlying curses library which already performs a kind of double
-buffering. The window is always redrawn completely even if only a single
-character changes. It turns out this is actually necessary if one wants
-to support multiline syntax highlighting.
-
-While outputting the individual characters a cell matrix is populated
-where each entry stores the length in bytes of the character displayed
-at this particular cell. For characters spanning multiple columns the
-length is always stored in the leftmost cell. As an example a tab has a
-length of 1 byte followed by up to 7 cells with a length of zero.
-Similarly a `\r\n` line ending occupies only one screen cell but has a
-length of 2.
-
-This matrix is actually stored per line inside a double linked list of
-structures representing screen lines. For each line we keep track of
-the length in bytes of the underlying text, the display width of all
-characters part of the line, and the logical line number.
-
-All cursor positioning is always performed in bytes from the start of
-the file and works by traversing the double linked list of screen lines
-until the correct line is found. Then the cell array is consulted to
-move to the correct column.
-
-Syntax-Highlighting
--------------------
-
-The editor takes a similar regex-based approach to syntax highlighting
-than sandy and reuses its syntax definitions but always applies them to
-a "screen full" of text thus enabling multiline coloring.
-
-Window-Management
------------------
-
-It is possible to open multiple windows via the `:split/:vsplit/:open`
-commands or by passing multiple files on the command line.
-
-In principle it would be nice to follow a similar client/server approach
-as sam/samterm i.e. having the main editor as a server and each window
-as a separate client process with communication over a unix domain socket.
-
-That way window management would be taken care of by dwm or dvtm and the
-different client processes would still share common cut/paste registers
-etc.
-
-However at the moment I don't want to open that can of worms and instead
-settled for a single process architecture.
-
-Search and replace
-------------------
-
-Currently the editor copies the whole text to a contiguous memory block
-and then uses the standard regex functions from libc. Clearly this is not
-a satisfactory solution for large files and kind of defeats the whole
-effort spent on the piece table.
-
-The long term solution is to write our own regular expression engine or
-modify an existing one to make use of the iterator API. This would allow
-efficient search without having to double memory consumption.
-
-The used regex engine should use a non-backtracking algorithm. Useful
-resources include:
-
- - [Russ Cox's regex pag](http://swtch.com/~rsc/regexp/)
- - [TRE](https://github.com/laurikari/tre) as
-   [used by musl](http://git.musl-libc.org/cgit/musl/tree/src/regex)
-   which uses a parallel [TNFA matcher](http://laurikari.net/ville/spire2000-tnfa.ps)
- - [Plan9's regex library](http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/)
-   which has its root in Rob Pike's sam text editor
- - [RE2](https://github.com/google/re2) C++ regex library
-
-Command-Prompt
---------------
-
-The editor needs some form of command prompt to get user input
-(think `:`, `/`, `?` in vim).
-
-At first I wanted to implement this in terms of an external process,
-similar to the way it is done in sandy with communication back to the
-editor via a named pipe.
-
-At some point I thought it would be possible to provide all editor commands
-as shell scripts in a given directory, then set $PATH accordingly and run
-the shell. This would give us readline editing, tab completion, history and
-Unicode support for free. But unfortunately it won't work due to quoting
-issues and other conflicts of special symbols with different meanings.
-
-Later it occurred to me that the editor prompt could just be treated as
-special 1 line file. That is all the main editor functionality is reused
-with a slightly different set of key bindings.
-
-This approach also has the added benefit of further testing the main editor
-component (in particular corner cases like editing at the end of the file).
-
-Editor Frontends
-================
-
-The editor core is written in a library like fashion which should make
-it possible to write multiple frontends with possibly different user
-interfaces/paradigms. The frontend to run is selected based on the
-executable name.
-
-The default, and currently only, interface is a vim clone called vis.
-
-Key binding modes
------------------
-
-The available key bindings for the different modes are arranged in a
-hierarchical way in config.h (there is also an ascii tree giving an
-overview in that file). Each mode can specify a parent mode which is
-used to look up a key binding if it is not found in the current mode.
-This reduces redundancy for keys which have the same meaning in
-different modes.
-
-Each mode can also provide hooks which are executed upon entering/leaving
-the mode and when there was an unmatched key.
-
-vis a vim like frontend
------------------------
-
-The following section gives a quick overview over various vim features
-and their current support in vis.
-
-###  Operators
-
-    d   (delete)
-    c   (change)
-    y   (yank)
-    p   (put)
-    >   (shift-right)
-    <   (shift-left),
-    J   (join)
-    ~   (swap case)
-    gu  (make lowercase)
-    gU  (make uppercase)
-
-### Movements
-
-    h        (char left)
-    l        (char right)
-    j        (line down)
-    k        (line up)
-    gj       (display line down)
-    gk       (display line up)
-    0        (start of line)
-    ^        (first non-blank of line)
-    g_       (last non-blank of line)
-    $        (end of line)
-    %        (match bracket)
-    b        (previous start of a word)
-    B        (previous start of a WORD)
-    w        (next start of a word)
-    W        (next start of a WORD)
-    e        (next end of a word)
-    E        (next end of a WORD)
-    ge       (previous end of a word)
-    gE       (previous end of a WORD)
-    {        (previous paragraph)
-    }        (next paragraph)
-    (        (previous sentence)
-    )        (next sentence)
-    gg       (begin of file)
-    G        (goto line or end of file)
-    |        (goto column)
-    n        (repeat last search forward)
-    N        (repeat last search backwards)
-    f{char}  (to next occurrence of char to the right)
-    t{char}  (till before next occurrence of char to the right)
-    F{char}  (to next occurrence of char to the left)
-    T{char}  (till before next occurrence of char to the left)
-    /{text}  (to next match of text in forward direction)
-    ?{text}  (to next match of text in backward direction)
-
-  An empty line is currently neither a word nor a WORD.
-
-  The semantics of a paragraph and a sentence is also not always 100%
-  the same as in vim.
-
-  Some of these commands do not work as in vim when prefixed with a
-  digit i.e. a multiplier. As an example in vim `3$` moves to the end
-  of the 3rd line down. However vis treats it as a move to the end of
-  current line which is repeated 3 times where the last two have no
-  effect.
-
-  In general there are still a lot of improvements to be made in the
-  case movements are forced to be line or character wise. Also some of
-  them should be inclusive in some context and exclusive in others.
-  At the moment they always behave the same.
-
-### Text objects
-
-  All of the following text objects are implemented in an inner variant
-  (prefixed with 'i') and a normal variant (prefixed with 'a'):
-
-    w  word
-    W  WORD
-    s  sentence
-    p  paragraph
-    [,], (,), {,}, <,>, ", ', `         block enclosed by these symbols
-
-  For sentence and paragraph there is no difference between the
-  inner and normal variants.
-
-### Modes
-
-  At the moment there exists a more or less functional insert, replace
-  and visual mode (in both line and character wise variants).
-
-### Marks
-
-    [a-z] general purpose marks
-    <     start of the last selected visual area in current buffer
-    >     end of the last selected visual area in current buffer
-
-  No marks across files are supported. Marks are not preserved over
-  editing sessions.
-
-### Registers
-
-  Only the 26 lower case registers `[a-z]` and 1 additional default register
-  is supported.
-
-### Undo/Redo and Repeat
-
-  The text is currently snapshoted whenever an operator is completed as
-  well as when insert or replace mode is left. Additionally a snapshot
-  is also taken if in insert or replace mode a certain idle time elapses.
-
-  Another idea is to snapshot based on the distance between two consecutive
-  editing operations (as they are likely unrelated and thus should be
-  individually reversible).
-
-  The repeat command `.` works for all operators and is able to repeat
-  the last insertion or replacement.
-
-### Macros
-
-`[a-z]` are recoginized macro names, `q` starts a recording, `@` plays it back.
-`@@` refers to the least recently recorded macro.
-
-### Command line prompt
-
-  At the `:`-command prompt only the following commands are recognized:
-
-    :nnn     go to line nnn
-    :bdelete close all windows which display the same file as the current one
-    :edit    replace current file with a new one or reload it from disk
-    :open    open a new window
-    :qall    close all windows, exit editor
-    :quit    close currently focused window
-    :read    insert content of another file at current cursor position
-    :split   split window horizontally
-    :vsplit  split window vertically
-    :new     open an empty window, arrange horizontally
-    :vnew    open an empty window, arrange vertically
-    :wq      write changes then close window
-    :xit     like :wq but write only when changes have been made
-    :write   write current buffer content to file
-    :saveas  save file under another name
-    :set     set the options below
-
-     tabwidth   [1-8]
-
-       set display width of a tab and number of spaces to use if
-       expandtab is enabled
-
-     expandtab  (yes|no)
-
-       whether typed in tabs should be expanded to tabwidth spaces
-
-     autoindent (yes|no)
-
-       replicate spaces and tabs at the beginning of the line when
-       starting a new line.
-
-     number         (yes|no)
-     relativenumber (yes|no)
-
-       whether absolute or relative line numbers are printed alongside
-       the file content
-
-     syntax      name
-
-       use syntax definition given (e.g. "c") or disable syntax
-       highlighting if no such definition exists (e.g :set syntax off)
-
-  Each command can be prefixed with a range made up of a start and
-  an end position as in start,end. Valid position specifiers are:
-
-   .          start of the current line
-   +n and -n  start of the line relative to the current line
-   'm         position of mark m
-   /pattern/  first match after current position
-
-  If only a start position without a command is given then the cursor
-  is moved to that position. Additionally the following ranges are
-  predefined:
-
-    %          the whole file, equivalent to 1,$
-    *          the current selection, equivalent to '<,'>
-
-  The substitute command is recognized but not yet implemented. The `!`
-  command to filter text through an external program is also planned.
-
-  History support, tab completion and wildcard expansion are other
-  worthwhile features. However implementing them inside the editor
-  feels wrong.
-
-### Tab <-> Space and Line endings \n vs \r\n
-
-  Tabs can optionally be expaned to a configurable number of spaces.
-  The first line ending in the file determines what will be inserted
-  upon a line break (defaults to \n).
-
-### Jump list and change list
-
-  A per window, file local jump list (navigate with `CTRL+O` and `CTRL+I`)
-  and change list (navigate with `g;` and `g,`) is supported. The jump
-  list is implemented as a fixed sized ring buffer.
-
-### Mouse support
-
-  The mouse is currently not used at all.
-
-### Other features
-
- Things which I would like to add in the long term are:
-
-   + code completion: this should be done as an external process. I will
-     have to take a look at the tools from the llvm / clang project. Maybe
-     dvtm's terminal emulation support could be reused to display an
-     slmenu inside the editor at the cursor position?
-
-   + something similar to vim's quick fix functionality
-
- Things I might add
-
-   + runtime configurable key bindings
-   + visual block mode / multiple selections
-   + text folding
-
- Stuff which vim does which I don't use and have no plans to add:
-
-   - GUIs (neither x11, motif, gtk, win32 ...)
-   - plugins (certainly not vimscript, if anything it should be lua based)
-   - right-to-left text
-   - tabs (as in multiple workspaces)
-   - ex mode
-
-How to help?
-------------
-
-At this point it might be best to fetch the code, edit some scratch file,
-notice an odd behavior or missing functionality, write and submit a patch
-for it, then iterate.
-
-WARNING: There are probably still some bugs left which could corrupt your
-         unsaved changes. Use at your own risk. At this point I suggest to
-         only edit non-critical files which are under version control and
-         thus easily recoverable!
-
-A quick overview over the code structure to get you started:
-
- File(s)             | Description
- ------------------- | -----------------------------------------------------
- `text.[ch]`         | low level text / marks / {un,re}do / piece table implementation
- `text-motions.[ch]` | movement functions take a file position and return a new one
- `text-objects.[ch]` | functions take a file position and return a file range
- `vis.c`             | vi(m) specific editor frontend, program entry point
- `editor.[ch]`       | editor window management
- `window.[ch]`       | ui-independent viewport, syntax highlighting, cursor placement
- `ui.h`              | abstract interface as implemented by user interface
- `ui-curses.h`       | a terminal / curses based user interface implementation
- `buffer.[ch]`       | dynamically growing buffer used for registers and macros
- `ring-buffer.[ch]`  | fixed size ring buffer used for the jump list
- `map.[ch]`          | crit-bit tree based map supporting unique prefix lookups and ordered iteration. used to implement `:`-commands.
- `config.def.h`      | definition of key bindings, commands, syntax highlighting
-
-Hope this gets the interested people started.
-
-TODO
-----
-
-Here is an incomplete list of TODO items and/or ideas for further work
-in no particular order:
-
- * Review and cleanup the existing implementation (e.g. selection handling)
-    - Eliminate global state and expose vis frontend as "library"
- * Implement `:!` using a proper (libuv based?) mainloop
- * Implement `:substitute`
- * Bugfix: editing the same file in multiple windows can cause "corruption"
- * Review/Implement cindent mode #33
- * Add history support to `:`-prompt
- * Implement wordwrap (i.e `gq` and `:set textwidth`) using `fmt(1)` ?
- * Overhaul key bindings to support runtime configuration / streamline config.def.h
- * Implement/review/merge history undo tree
- * Implement a regex engine which works with the iterator API
- * Write [unit test](http://ccodearchive.net/info/tap.html) for the low
-   level `text_*` interface
- * Improve syntax highlighting, investigate whether already existing
-   syntax definitions from other editors could be reused
- * Optimize `text_delete` in case of consecutive delete operations
- * Add a RPC interface, experiment with a client/server architecture and
-   delegate window management to dwm/dvtm
-
-Feel free to ask questions if something is unclear! There are still a lot
-of bugs left to fix, but by now I'm fairly sure that the general concept
-should work.
-
-As always, comments and patches welcome!
-
-Cheers,
-Marc
-- 
cgit v1.2.3