doc: overhaul of the readmes

* README-hacking.md (Working from the Repository): Make it first to
make it easier to find the instructions to build from the repo.
(Implementation Notes): New.
* README: Provide more links.
Akim Demaille
2020-06-28 14:35:55 +02:00
parent e0b0a67b86
commit 160df55c56
3 changed files with 205 additions and 190 deletions


@@ -5,185 +5,6 @@ Everything related to the development of Bison is on Savannah:
http://savannah.gnu.org/projects/bison/.
Administrivia
=============
## If you incorporate a change from somebody on the net:
First, if it is a large change, you must make sure they have signed the
appropriate paperwork. Second, be sure to add their name and email address
to THANKS.
## If a change fixes a test, mention the test in the commit message.
## Bug reports
If somebody reports a new bug, mention his name in the commit message and in
the test case you write. Put him into THANKS.
The correct response to most actual bugs is to write a new test case which
demonstrates the bug. Then fix the bug, re-run the test suite, and check
everything in.
Hacking
=======
## Visible Changes
Visible changes, which include serious bug fixes, must be mentioned in NEWS.
## Translations
Only user-visible strings are to be translated: error messages, bits of the
.output file, etc. This excludes impossible error messages (comparable to
assert/abort), and all the --trace output which is meant for the maintainers
only.
## Vocabulary
- "nonterminal", not "variable" or "non-terminal" or "non terminal".
Abbreviated as "nterm".
- "shift/reduce" and "reduce/reduce", not "shift-reduce" or "shift reduce",
etc.
## Syntax highlighting
It's quite nice to be in C++ mode when editing lalr1.cc, for instance.
However, tools such as Emacs will be fooled by the fact that braces and
parens do not nest, as in `[[}]]`. As a consequence, you might be misled
by the editor's visual pairing of parens. The m4-mode is safer.
Unfortunately, the m4-mode is also fooled by `#`, which it sees as the start
of a comment: it stops pairing the parens/brackets that are inside...
## Coding Style
Do not add horizontal tab characters to any file in Bison's repository
except where required. For example, do not use tabs to format C code.
However, make files, ChangeLog, and some regular expressions require tabs.
Also, test cases might need to contain tabs to check that Bison properly
processes tabs in its input.
Prefer "res" as the name of the local variable that will be "return"ed by
the function.
### Bison
Follow the GNU Coding Standards.
Don't reinvent the wheel: we use gnulib, which features many components.
Actually, Bison has legacy code that we should replace with gnulib modules
(e.g., many ad hoc implementations of lists).
#### Includes
The `#include` directives follow an order:
- first section for *.c files is `<config.h>`. Don't include it in header
files
- then, for *.c files, the corresponding *.h file
- then possibly the `"system.h"` header
- then the system headers.
Consider headers from `lib/` like system headers (i.e., `#include
<verify.h>`, not `#include "verify.h"`).
- then headers from src/ with double quotes (`#include "getargs.h"`).
Keep headers sorted alphabetically in each section.
See also the [Header
files](https://www.gnu.org/software/gnulib/manual/html_node/Header-files.html)
and the [Implementation
files](https://www.gnu.org/software/gnulib/manual/html_node/Implementation-files.html#Implementation-files)
nodes of the gnulib documentation.
Some source files are in the build tree (e.g., `src/scan-gram.c` made from
`src/scan-gram.l`). For them to find the headers from `src/`, we actually
use `#include "src/getargs.h"` instead of `#include "getargs.h"`---that
saves us from additional `-I` flags.
### Skeletons
We try to use the "typical" coding style for each language.
#### CPP
We indent the CPP directives this way:
```
#if FOO
# if BAR
#  define BAZ
# endif
#endif
```
Don't indent with leading spaces in the skeletons (it's OK in the grammar
files though, e.g., in `%code {...}` blocks).
On occasion, use `cppi -c` to see where we stand. We don't aim at full
correctness: depending on `-d`, some bits can be in the *.c file, or the *.h
file within the double-inclusion cpp-guards. In that case, favor the case
of the *.h file, but don't waste time on this.
Don't hesitate to leave a comment on the `#endif` (e.g., `#endif /* FOO
*/`), especially for long blocks.
There is no consistency on `! defined` vs. `!defined`. The day gnulib
decides, we'll follow them.
#### C/C++
Follow the GNU Coding Standards.
The `glr.c` skeleton was implemented with `camelCase`. We are migrating it
to `snake_case`. Because we are standardizing the code, it is currently
inconsistent.
Use `YYFOO` and `yyfoo` for entities that are exposed to the user. They are
part of our contract with the users with respect to backward compatibility.
Use `YY_FOO` and `yy_foo` for private matters. Users should not use them, so
we are free to change them without fear of backward compatibility issues.
Use `*_t` for types, especially for `yy*_t` in which case we shouldn't worry
about the C standard introducing such a name.
#### C++
Follow the C++ Core Guidelines
(http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines). The Google
ones may be interesting too
(https://google.github.io/styleguide/cppguide.html).
Our enumerators, such as the kinds (symbol and token kinds), should be lower
case, but it was too late to follow that track for token kinds, and symbol
kind enumerators are made to be consistent with them.
Use `*_type` for type aliases. Use `foo_get()` and `foo_set(v)` for
accessors, or simply `foo()` and `foo(v)`.
Use the `yy` prefix for private stuff, but there's no need for it in the
public API. The `yy` prefix is already taken care of via the namespace.
#### Java
We follow https://www.oracle.com/technetwork/java/codeconventions-150003.pdf
and https://google.github.io/styleguide/javaguide.html. Unfortunately, at
some point the GNU coding style was applied to the Java code, which was a
mistake. So we should, for instance, stop putting a space before the opening
parenthesis of function calls. Because we are standardizing the code, it is
currently inconsistent.
Use a 2-space indentation (Google) rather than 4 (Oracle).
Don't use the "yy" prefix for public members: "getExpectedTokens", not
"yyexpectedTokens" or "yygetExpectedTokens".
## Commit Messages
Imitate the style we use. Use `git log` to get sources of inspiration.
If the changes have a small impact on Bison's generated parser, embed these
changes in the commit itself. If the impact is large, first push all the
changes except those about src/parse-gram.[ch], and then another commit
named "regen" which is only about them.
## Debugging
Bison supports tracing of its various steps, via the `--trace` option.
Since it is not meant for the end user, it is not displayed by `bison
--help`, nor is it documented in the manual. Instead, run `bison
--trace=help`.
## Documentation
Use `@option` for options and options with their argument if they have no
space (e.g., `@option{-Dfoo=bar}`). However, use `@samp` elsewhere (e.g.,
`@samp{-I foo}`).
Working from the Repository
===========================
@@ -357,6 +178,196 @@ version, compile bison, then force it to recreate the files:
$ make -C _build
Administrivia
=============
## If you incorporate a change from somebody on the net:
First, if it is a large change, you must make sure they have signed the
appropriate paperwork. Second, be sure to add their name and email address
to THANKS.
## If a change fixes a test, mention the test in the commit message.
## Bug reports
If somebody reports a new bug, mention his name in the commit message and in
the test case you write. Put him into THANKS.
The correct response to most actual bugs is to write a new test case which
demonstrates the bug. Then fix the bug, re-run the test suite, and check
everything in.
Hacking
=======
## Visible Changes
Visible changes, which include serious bug fixes, must be mentioned in NEWS.
## Translations
Only user-visible strings are to be translated: error messages, bits of the
.output file, etc. This excludes impossible error messages (comparable to
assert/abort), and all the --trace output which is meant for the maintainers
only.
## Vocabulary
- "nonterminal", not "variable" or "non-terminal" or "non terminal".
Abbreviated as "nterm".
- "shift/reduce" and "reduce/reduce", not "shift-reduce" or "shift reduce",
etc.
## Syntax Highlighting
It's quite nice to be in C++ mode when editing lalr1.cc, for instance.
However, tools such as Emacs will be fooled by the fact that braces and
parens do not nest, as in `[[}]]`. As a consequence, you might be misled
by the editor's visual pairing of parens. The m4-mode is safer.
Unfortunately, the m4-mode is also fooled by `#`, which it sees as the start
of a comment: it stops pairing the parens/brackets that are inside...
## Implementation Notes
There are several places with interesting details about the implementation:
- [Understanding C parsers generated by GNU
Bison](https://www.cs.uic.edu/~spopuri/cparser.html) by Satya Kiran Popuri,
is a wonderful piece of work that explains the implementation of Bison,
- [src/gram.h](src/gram.h) documents the way the grammar is represented
- [src/tables.h](src/tables.h) documents the generated tables
- [data/README.md](data/README.md) contains details about the m4 implementation
## Coding Style
Do not add horizontal tab characters to any file in Bison's repository
except where required. For example, do not use tabs to format C code.
However, make files, ChangeLog, and some regular expressions require tabs.
Also, test cases might need to contain tabs to check that Bison properly
processes tabs in its input.
Prefer `res` as the name of the local variable that will be "return"ed by
the function.
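As a minimal sketch of the `res` convention (the helper `count_nonzero` is
hypothetical and not taken from Bison's sources), the value to be returned
is accumulated in a local named `res`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: count the nonzero
   entries among the N ints of VALS.  */
static size_t
count_nonzero (const int *vals, size_t n)
{
  /* The value that will be returned lives in `res'.  */
  size_t res = 0;
  assert (vals || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (vals[i])
      ++res;
  return res;
}
```

Using the same name everywhere makes the returned value easy to spot when
reading or debugging a function.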
### Bison
Follow the GNU Coding Standards.
Don't reinvent the wheel: we use gnulib, which features many components.
Actually, Bison has legacy code that we should replace with gnulib modules
(e.g., many ad hoc implementations of lists).
#### Includes
The `#include` directives follow an order:
- first section for *.c files is `<config.h>`. Don't include it in header
files
- then, for *.c files, the corresponding *.h file
- then possibly the `"system.h"` header
- then the system headers.
Consider headers from `lib/` like system headers (i.e., `#include
<verify.h>`, not `#include "verify.h"`).
- then headers from src/ with double quotes (`#include "getargs.h"`).
Keep headers sorted alphabetically in each section.
See also the [Header
files](https://www.gnu.org/software/gnulib/manual/html_node/Header-files.html)
and the [Implementation
files](https://www.gnu.org/software/gnulib/manual/html_node/Implementation-files.html#Implementation-files)
nodes of the gnulib documentation.
Some source files are in the build tree (e.g., `src/scan-gram.c` made from
`src/scan-gram.l`). For them to find the headers from `src/`, we actually
use `#include "src/getargs.h"` instead of `#include "getargs.h"`---that
saves us from additional `-I` flags.
### Skeletons
We try to use the "typical" coding style for each language.
#### CPP
We indent the CPP directives this way:
```
#if FOO
# if BAR
#  define BAZ
# endif
#endif
```
Don't indent with leading spaces in the skeletons (it's OK in the grammar
files though, e.g., in `%code {...}` blocks).
On occasion, use `cppi -c` to see where we stand. We don't aim at full
correctness: depending on `-d`, some bits can be in the *.c file, or the *.h
file within the double-inclusion cpp-guards. In that case, favor the case
of the *.h file, but don't waste time on this.
Don't hesitate to leave a comment on the `#endif` (e.g., `#endif /* FOO
*/`), especially for long blocks.
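As a sketch of how these conventions combine (all the `EXAMPLE_*` macros
and `example_level` are hypothetical names, not part of any skeleton), the
`#` stays in the first column, each nesting level adds one space after it,
and the closing `#endif` names the condition it ends:

```c
#include <assert.h>

/* Hypothetical macros, for illustration only.  */
#ifndef EXAMPLE_TRACE
# define EXAMPLE_TRACE 1
#endif

#if EXAMPLE_TRACE
# if defined EXAMPLE_VERBOSE
#  define EXAMPLE_LEVEL 2
# else
#  define EXAMPLE_LEVEL 1
# endif /* defined EXAMPLE_VERBOSE */
#else
# define EXAMPLE_LEVEL 0
#endif /* EXAMPLE_TRACE */

/* Report the level selected by the directives above.  */
static int
example_level (void)
{
  int res = EXAMPLE_LEVEL;
  assert (0 <= res && res <= 2);
  return res;
}
```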
There is no consistency on `! defined` vs. `!defined`. The day gnulib
decides, we'll follow them.
#### C/C++
Follow the GNU Coding Standards.
The `glr.c` skeleton was implemented with `camelCase`. We are migrating it
to `snake_case`. Because we are gradually standardizing the code, it is
currently inconsistent.
Use `YYFOO` and `yyfoo` for entities that are exposed to the user. They are
part of our contract with the users with respect to backward compatibility.
Use `YY_FOO` and `yy_foo` for private matters. Users should not use them, so
we are free to change them without fear of backward compatibility issues.
Use `*_t` for types, especially for `yy*_t` in which case we shouldn't worry
about the C standard introducing such a name.
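The naming rules above could look as follows in practice. This is only a
sketch: every identifier in it (`YYEXAMPLE_MAX`, `yyexample_t`,
`YY_EXAMPLE_MIN`, `yy_example_clamp`, `yyexample_normalize`) is made up
for illustration and is not an actual Bison symbol.

```c
#include <assert.h>

/* All the names below are hypothetical.  */

/* Public: users may rely on these, so they are subject to backward
   compatibility.  */
#define YYEXAMPLE_MAX 100
typedef int yyexample_t;

/* Private: the underscore after the prefix marks these as internal,
   so they can change at any time.  */
#define YY_EXAMPLE_MIN(A, B) ((A) < (B) ? (A) : (B))

static yyexample_t
yy_example_clamp (yyexample_t v)
{
  yyexample_t res = YY_EXAMPLE_MIN (v, YYEXAMPLE_MAX);
  if (res < 0)
    res = 0;
  assert (0 <= res && res <= YYEXAMPLE_MAX);
  return res;
}

/* Public entry point, built on top of the private helper.  */
yyexample_t
yyexample_normalize (yyexample_t v)
{
  return yy_example_clamp (v);
}
```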
#### C++
Follow the [C++ Core
Guidelines](http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines).
The [Google ones](https://google.github.io/styleguide/cppguide.html) may be
interesting too.
Our enumerators, such as the kinds (symbol and token kinds), should be lower
case, but it was too late to follow that track for token kinds, and symbol
kind enumerators are made to be consistent with them.
Use `*_type` for type aliases. Use `foo_get()` and `foo_set(v)` for
accessors, or simply `foo()` and `foo(v)`.
Use the `yy` prefix for private stuff, but there's no need for it in the
public API. The `yy` prefix is already taken care of via the namespace.
#### Java
We follow the [Java Code
Conventions](https://www.oracle.com/technetwork/java/codeconventions-150003.pdf)
and the [Google Java Style
Guide](https://google.github.io/styleguide/javaguide.html). Unfortunately,
at some point the GNU coding style was applied to the Java code, which was a
mistake. So we should, for instance, stop putting a space before the opening
parenthesis of function calls. Because we are standardizing the code, it is
currently inconsistent.
Use a 2-space indentation (Google) rather than 4 (Oracle).
Don't use the "yy" prefix for public members: "getExpectedTokens", not
"yyexpectedTokens" or "yygetExpectedTokens".
## Commit Messages
Imitate the style we use. Use `git log` to get sources of inspiration.
If the changes have a small impact on Bison's generated parser, embed these
changes in the commit itself. If the impact is large, first push all the
changes except those about src/parse-gram.[ch], and then another commit
named "regen" which is only about them.
## Debugging
Bison supports tracing of its various steps, via the `--trace` option.
Since it is not meant for the end user, it is not displayed by `bison
--help`, nor is it documented in the manual. Instead, run `bison
--trace=help`.
## Documentation
Use `@option` for options and options with their argument if they have no
space (e.g., `@option{-Dfoo=bar}`). However, use `@samp` elsewhere (e.g.,
`@samp{-I foo}`).
Test Suite
==========
@@ -366,9 +377,9 @@ examples, and the main test suite.
### The Examples
In examples/, there are a number of ready-to-use examples (see
examples/README.md). These examples have small test suites run by `make
check`. The test results are in local `*.log` files (e.g.,
`$build/examples/c/calc/calc.log`).
[examples/README.md](examples/README.md)). These examples have small test
suites run by `make check`. The test results are in local `*.log` files
(e.g., `$build/examples/c/calc/calc.log`).
### The Main Test Suite
The main test suite, in tests/, is written on top of GNU Autotest, which is
@@ -548,7 +559,8 @@ re-run the tests, run:
Release Procedure
=================
See README-release.
See the [README-release file](README-release), created when the package is
bootstrapped.
<!--