Abstract
My thoughts and notes on what I would like a new programming language to look like.
The unique selling point of the language is:
Automatic memory reclamation without garbage collection.
Other selling points of the language are:
- Simple and elegant. (So that it is suitable for academia.)
- Expressive. (So that it is suitable for experienced programmers.)
- Consistent. (So that it is attractive to developer teams.)
- Guiding. (So that it promotes best practices.)
- Fast. (So that it is suitable for high performance computing.)
- Lean. (So that it is suitable for resource-constrained computing.)
This is a work in progress; it is bound to be heavily amended as time passes, especially if I try some new language, like Kotlin or Rust.
Summary of language characteristics
(Useful pre-reading: About these papers)
The main goals of the language are achieved via the following characteristics:
-
For simplicity and elegance:
- Scoping by indentation instead of curly braces. (Similar to Python.)
- Keyword-rich syntax which avoids cryptic abbreviations and symbols.
- Clear distinction between what is a statement and what is an expression.
- Automatic memory management.
-
For expressiveness:
- Lightweight properties, user-defined operators, generics, etc.
- Type inference whenever possible.
- Explicit nullability of reference types.
- Full support for functional programming.
- Full support for imperative programming without functional Nazism.
- Async-transparency.
-
For consistency:
- Extensive, mandatory, and in many cases non-suppressible, code inspections.
- Whenever possible, only one way of expressing any given thing.
- Extensive and strict formatting rules ensure all code looks the same.
- Reformatability spares developers from having to type code in a particular way.
-
For performance:
- Strongly typed.
- Primitive value types correspond to machine words.
- Intermediate-code-based, Just-In-Time compiled.
- Fibers. (By means of async-transparency.)
-
For leanness:
- Reference counting instead of garbage collection.
- Minimalistic mandatory runtime library.
- Separate and optional standard library.
-
For a list of shortcomings of other languages, which this language intends to fix, see:
Language characteristics in detail
-
Supports reference types and value types, as C# does.
-
The `null` value is valid only with explicitly nullable reference types.
- As in C# 8.0 with `#nullable enable`:
  - A non-nullable reference can be used when a nullable reference is expected.
  - A nullable reference cannot be used when a non-nullable reference is expected, unless:
    - the compiler knows, via data-flow analysis, that the value is not null.
      - For example, by means of an if-statement which precludes null.
    - the value is explicitly cast to non-null. (As with the “null-forgiving” or “dammit” operator in C#.)
- However, unlike C#:
  - The non-null cast is also an assertion against null, so it does not just circumvent the nullability checks of the compiler; it acts as an if-statement which precludes null.
    - Thus, a non-nullable reference can never accidentally hold null.
  - It is illegal to apply the non-null cast to a reference that is already non-nullable.
  - It is illegal to assign the result of the non-null cast to a nullable reference.
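The difference from C# can be sketched in Python, which has no compile-time nullability checks of its own. The helper name `not_null` and the lookup function are hypothetical; `not_null` stands in for the non-null cast described above, which both narrows the type and fails at runtime if the value turns out to be null.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def not_null(value: Optional[T]) -> T:
    """Hypothetical stand-in for the non-null cast: narrows and asserts."""
    if value is None:
        # Unlike the C# null-forgiving operator, the cast itself fails here,
        # so null can never leak into a non-nullable reference.
        raise ValueError("non-null cast applied to null")
    return value


def find_user(name: str) -> Optional[str]:
    # Illustrative lookup that may or may not find something.
    users = {"alice": "Alice Smith"}
    return users.get(name)


full_name: str = not_null(find_user("alice"))  # passes the assertion
```

Calling `not_null(find_user("bob"))` raises immediately at the cast, rather than letting a null travel further into code that assumes non-null.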
-
Compiles into an intermediate code format. There are two possibilities:
- LLVM.
- A new intermediate code format called ObjectCode, which is either interpreted or further compiled into machine code by a Just-In-Time (JIT) compiler.
- Functionally, ObjectCode is a stack machine language, just as JVM ByteCode is.
- ObjectCode is expressed as a hierarchical data structure.
- A binary ObjectCode file is the result of serializing that data structure into a binary stream. Serialization into a text stream should also be possible.
- ObjectCode is not trying hard to look like machine language, the way JVM ByteCode does. For example:
- Instructions have no alternative short-form versions that accomplish the same thing but with fewer bytes.
- There are no instructions for operations between `Integer`, `Real`, `Boolean`, etc.; instead, these operations are available as methods exposed by those value types.
- Very few instructions have knowledge of any particular data type:
  - Boolean operations have knowledge of the `boolean` type. (So that the compiler can apply short-circuit evaluation and branching.)
  - The `throw` instruction has knowledge of the `Exception` type.
  - The `switch` instruction has knowledge of the `integer` type.
- No unnecessary JVM gimmicks like bytecode verification, stack verification, etc.
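To make the ObjectCode idea more concrete, here is a toy stack machine in Python. It is only a sketch under assumptions: the instruction names `Push` and `Invoke`, the `Integer` class, and the `run` function are all invented for illustration and are not a proposed format. The point it shows is that the instruction set knows nothing about integers; arithmetic is reached through a generic method invocation on the value type.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Integer:
    """Illustrative value type; arithmetic is exposed as ordinary methods."""
    value: int

    def Plus(self, other: "Integer") -> "Integer":
        return Integer(self.value + other.value)

    def Times(self, other: "Integer") -> "Integer":
        return Integer(self.value * other.value)


@dataclass(frozen=True)
class Push:
    """Push a constant onto the operand stack."""
    constant: Any


@dataclass(frozen=True)
class Invoke:
    """Pop the arguments and the receiver, call a method by name, push the result."""
    method: str
    argument_count: int


Instruction = Union[Push, Invoke]


def run(code: List[Instruction]) -> Any:
    stack: List[Any] = []
    for instruction in code:
        if isinstance(instruction, Push):
            stack.append(instruction.constant)
        else:
            # Split off the arguments; the receiver sits just below them.
            split = len(stack) - instruction.argument_count
            arguments = stack[split:]
            del stack[split:]
            receiver = stack.pop()
            stack.append(getattr(receiver, instruction.method)(*arguments))
    return stack.pop()


# (2 + 3) * 4, with no integer-specific instruction anywhere in the code.
program = [
    Push(Integer(2)), Push(Integer(3)), Invoke("Plus", 1),
    Push(Integer(4)), Invoke("Times", 1),
]
result = run(program)
```

Because the program is just a list of plain data objects, serializing it to a binary or a text stream is a separate concern from executing it.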
-
Executable code is packaged into modules which correspond to C# assemblies.
- So, no myriads of class files floating around.
- Each class in a module has its own timestamp.
- When a module is being made, unchanged classes are copied verbatim from the old module instead of being recompiled, thus retaining their timestamps.
-
For the benefit of benchmarking, the runtime environment can be programmatically instructed to JIT everything at once so that nothing gets interpreted from that moment on.
-
Async-transparency and fibers.
- Looks and feels synchronous, but works asynchronously under the hood.
- A function can be declared as `async`; this signifies that the function works asynchronously, but nothing else changes:
  - When invoking: you call it and obtain its result just as with any other function.
  - When implementing: you just return a result, just like any other function.
- When an async function is invoked, the compiler does not emit a direct invocation of the function; instead, it invokes a special `InvokeAsync` function of the runtime, which accepts the function to be invoked as a parameter, and returns the result returned by the function. So, it looks as if the runtime will invoke the target function, block-waiting for it to complete, and return the result. However, the runtime does the following instead:
- starts the asynchronous operation,
- obtains a promise under the hood,
- sets aside the promise and the current stack,
- proceeds to do other stuff.
- When the promise is satisfied, the runtime:
- gets the return value from the promise,
- switches back to that stack,
- continues execution from there.
- Note: something called “_hyperscript” already purports to support async-transparency; I do not know whether they switch stacks or pass promises/futures under the hood all over the place. See https://hyperscript.org/docs/#async
- Note: this is related to OpenJDK JEP 425: Virtual Threads. See https://openjdk.org/jeps/425
- An abstraction of an `EventDriver` is provided, which encapsulates an event-driven system. A `ConcreteEventDriver` is provided, which is a default (“reference”) implementation of an event-driven system.
- The `EventDriver` does not contain a `post` method; instead, it exposes an `Injector` interface, which does. So, code that only needs to `post` only needs to have access to an `Injector`, not to the whole `EventDriver`.
- Threads and thread-pools exist for interfacing with legacy systems; the preferred way of working is with fibers and fiber-pools.
- Each fiber-pool has its own event-driver.
- TODO: describe exactly what a fiber is.
- TODO: describe how a fiber exposes a proxy for invocation from other fibers and how the proxy asserts that everything passed back and forth is either thread-safe or immutable.
- Note that in multi-threaded execution models purity is of very limited usefulness because it does not prevent reading mutable state, so it does not avoid race conditions. However, this language makes use of fibers instead of threads, so there can be no race conditions, so purity becomes useful.
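The park-and-resume behaviour of the runtime described above can be approximated in Python with generators. This is a rough sketch, not the proposed mechanism: Python cannot switch stacks, so each fiber is a generator and the `yield` marks the spot where the proposed language would switch stacks invisibly. The `Runtime` class and `worker` function are hypothetical, and the "asynchronous operation" is simply evaluated on the spot, standing in for a promise that is satisfied later.

```python
from collections import deque
from typing import Any, Callable, Deque, Generator, List, Tuple

# A fiber is modelled as a generator that hands operations to the runtime.
Fiber = Generator[Callable[[], Any], Any, None]


class Runtime:
    """Toy scheduler: parks a fiber on each async call and resumes it later."""

    def __init__(self) -> None:
        self.ready: Deque[Tuple[Fiber, Any]] = deque()
        self.log: List[str] = []

    def spawn(self, fiber: Fiber) -> None:
        self.ready.append((fiber, None))

    def run(self) -> None:
        while self.ready:
            fiber, value = self.ready.popleft()
            try:
                # Resume the fiber, delivering the result of its last call.
                operation = fiber.send(value)
            except StopIteration:
                continue
            # Perform the operation and set the fiber aside; other fibers
            # get to run before this one sees the result.
            self.ready.append((fiber, operation()))


def worker(runtime: Runtime, name: str) -> Fiber:
    runtime.log.append(f"{name}: start")
    answer = yield lambda: len(name)  # reads like a call that returns a result
    runtime.log.append(f"{name}: got {answer}")


runtime = Runtime()
runtime.spawn(worker(runtime, "ab"))
runtime.spawn(worker(runtime, "xyz"))
runtime.run()
```

After the run, `runtime.log` shows both fibers starting before either receives its result, which is the interleaving that the stack-switching runtime would produce without any visible `yield` in user code.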
-
Support for functional programming.
- For example:
- Lambdas.
- Tuples.
- Everything is read-only by default.
- The keyword `mutable` must be used to denote something which may vary. (Scala’s `var` and `val` are too cryptic and too similar; mutability must stand out like a sore thumb.)
- So, the syntax for declaring a mutable local integer is: `mutable local x: integer`
- It is an error to declare something as `mutable` and forget to ever mutate it.
- All interfaces are pure by default.
- A special keyword `impure` must be used to denote an interface which is allowed to contain impure methods.
- It is an error to declare an interface as `impure` and forget to include any impure methods in it.
- Methods can be either pure or impure, and this has severe implications on what they may and may not do.
- Most language constructs like `if`, `for`, `while`, `switch`, etc. have both functional and imperative forms.
  - The functional forms must be pure; the imperative forms can be impure.
  - The functional forms may not use flow-control keywords that would affect enclosing scopes; in other words:
    - A functional construct may not use the `return` keyword to exit the current function.
    - A functional construct may not use the `break` or `continue` keywords to exit or repeat an enclosing loop.
  - The functional forms make use of the `yield` keyword to produce values. So, the functional `if` statement is `if( x ) yield 5; else yield 6;`
  - Functional loops evaluate to `Enumerable`, and each execution of `yield` produces a new element.
- The standard library offers various monads like `Optional`, `Try`, and other common functional goodies.
- The standard collections support fluent constructs.
- The functional constructs are like Scala’s collections, which means that they are somewhat like C#’s LINQ and not like Java’s collection streams.
- There is no support in the standard collections for parallelization.
- No such thing as the `ref` or `out` parameters of C#.
- However:
- No functional Nazism.
- No obstacles to having mutable state, other than having to use an extra keyword here and there.
- A proper `for` loop.
  - Even the functional version of the `for` loop is a first-class language construct, not yet another higher-order function.
  - Thus, when single-stepping through code, you do not have to remember to use step-into instead of step-over in order to skip the header of the loop and reach the body of the loop.
- Proper `break` and `continue` keywords.
- Freedom to re-assign parameters.
  - Thus making the original value inaccessible.
  - To allow this, the `mutable` keyword must be added to the parameter.
  - The `mutable` keyword on a parameter has no meaning for the caller of the method, and therefore does not become part of the method prototype.
- Everything that can be accomplished functionally can also be accomplished imperatively.
- No functional gimmicks.
- The expression evaluated last within a function does not magically become the return value of the function without a `return` statement; `return` statements cannot simply be omitted. Same for `yield` statements.
- No copy-on-mutation collections.
- No such thing as Scala’s `Unit`. Two approaches are possible:
  - We maintain a clear distinction between functions and procedures, in which case `Unit` is unnecessary just as `void` is unnecessary.
  - Everything is a function, but instead of `Unit` we stick to good old familiar `void`, which now becomes an actual data type of which there exists only one instance.
    - Normally, the instance of void should never need to be accessed (and therefore might not even be accessible), because it is implied when necessary. For example, the statement `return` is equivalent to `return void.instance`.
- The compiler makes a very clear distinction between statements and expressions.
- A block scope consists of statements.
- Statements and expressions are not interchangeable:
- A statement may contain expressions, but an expression may not contain statements.
- An expression cannot appear in place of a statement.
- A statement cannot appear in place of an expression. (With the possible exception of throwing an exception.)
- Most languages allow invoking a function and ignoring its return value; we put an end to that abhorrent malpractice.
- When a statement is expected, and we use something which yields a value, that value must be dealt with, in order to be left with a statement and not an expression.
- The language might provide a mechanism for ignoring a value, (perhaps a cast to void?) but this can also be accomplished by invoking a void-returning method which accepts one parameter and just ignores it.
- Assignment is a statement, and it requires the use of the `let` keyword, as in `let a = 5;` unless a field or local is being declared and initialized at once, in which case the `let` keyword is omitted, as in `local a = 5;` This has some drawbacks and some benefits:
  - Drawback: We cannot initialize multiple variables in one go, as in `let a = b = c = 5;` because everything after the first `=` must be an expression. That’s inconsequential, perhaps even arguably a benefit.
  - Drawback: We cannot assign and compare in one go, as in `if( ( let a = f() ) > 5 )...` because assignment is a statement, so it cannot be used inside an expression. That’s inconsequential, perhaps even arguably a benefit.
  - Benefit: since the compiler can always tell whether it is compiling a statement or an expression, it can treat certain things differently depending on whether they appear in a statement or an expression. Namely, the equals sign can now be used either in a statement, as the assignment operator, or in an expression, as the equality check operator.
    - Thus, after so many decades, we can finally say good-bye to the inelegant double-equals (`==`) legacy of C, and start using the single equals sign for equality comparison, as it was always meant to be.
    - The inequality operator can either stay as `!=` or become `<>`.
- The prefix and postfix increment operators are problematic because they are expressions with side-effects (they both mutate an existing value and yield a new value), so we might disallow them, and require the use of the long form instead: `let x = x + 1;`
  - If we keep them, then they will certainly only be allowed in expressions.
  - (You could make it a statement with `(void) x++;` but why would you?)
- We keep `static` as in Java and avoid Scala’s inelegant companion objects.
staticas in Java and avoid Scala’s inelegant companion objects. - There is no support in the standard collections for parallelization.
- When declaring a lambda, the keyword `function` must be used.
- When declaring a tuple, the keyword `tuple` must be used.
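The functional forms of `for` and `if` described above have rough analogues in Python, which can serve as an illustration of the intended semantics. This is only an approximation: Python generators are lazy, and whether the proposed `Enumerable` is lazy is an assumption here; the function names are invented for the example.

```python
from typing import Iterable, Iterator


def squares_of_evens(numbers: Iterable[int]) -> Iterator[int]:
    # Functional `for`: each `yield` contributes one element to the result,
    # and the loop as a whole evaluates to an enumerable sequence.
    for n in numbers:
        if n % 2 == 0:
            yield n * n


def sign_label(x: int) -> str:
    # Functional `if`: both branches produce a value, in the spirit of
    # `if( x ) yield 5; else yield 6;` from the text.
    return "negative" if x < 0 else "non-negative"


evens_squared = list(squares_of_evens([1, 2, 3, 4]))
```

Note that neither form exits the enclosing function or loop from inside the construct, which matches the restriction on `return`, `break`, and `continue` in functional constructs.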
-
Everything is private by default, unless explicitly given a higher visibility.
- Therefore, the language does not have a keyword to indicate that something is private.
- Note that this also applies to interface methods: if you want an interface method to be public, you have to declare it as public, otherwise it stays private and may only be invoked from other methods of the same interface.
-
Everything is non-inheritable by default, unless explicitly declared as inheritable.
- (Except for interfaces, which are by definition inheritable.)
- Therefore, the language does not have a keyword to indicate that something is non-inheritable (sealed in C#, final in Java.)
- Note that this also applies to interface methods: if you want an interface method to be overridable, you have to declare it as overridable.
- This makes certain other rules unnecessary, for example we do not have to stipulate that it is an error to explicitly declare a method as non-overridable in a class which has already been declared as non-inheritable.
- It is an error to declare something as inheritable and fail to ever inherit from it.
- This is enforceable because inheritance is confined within a module, so all members of an inheritance hierarchy are known during the compilation of the module.
-
Emphasis on purity.
- There are two ways we can go about this, and which way we will go is yet to be decided.
- Procedures and functions
- A method can be either a procedure or a function.
- A procedure:
- Does not return anything.
- Is impure. (Must have at least one side-effect.)
- Can indicate failure only by means of throwing an exception.
- A function:
- Returns something.
- Can indicate failure either by throwing an exception or by returning a `Try` monad.
- Is pure. (Must have no side-effects.)
- Experimental idea: the keyword `method` can be used to denote a higher-order method which is either a procedure or a function depending on whether its parameter is a procedure or a function.
  - It must have a parameter declared as `method` instead of the more specific `procedure` or `function`.
  - It may have additional parameters that are explicitly `function` or `procedure`.
  - It must treat its parameter method as a function, meaning that when it invokes that method, it must obtain a return value from it.
  - It can be coded as a function, meaning that it can return that value.
  - From the point of view of the caller, it behaves either as a procedure or as a function depending on whether the caller passes a procedure or a function to its method parameter.
  - The caller may actually pass yet another method to it, in which case the caller is in turn a method instead of a procedure or function.
  - Such a construct would eliminate the need to declare both a function and a procedure for each higher-order operation, and at the same time avoid the inelegance of `Unit`.
- Pure and impure methods
- All methods are functions.
- Methods that have nothing to return must be declared to return `void`, which is equivalent to Scala’s `Unit` in the sense that it is an actual data type of which there exists only one instance.
  - Thus, void-returning and non-void-returning functions can be treated in exactly the same way in all situations. For example:
    - From within a `void` function we can use the `return` keyword to return the result of invoking another `void` function.
  - This in turn means that a single higher-order function can operate both on void-returning and non-void-returning functions.
- Impure methods must be explicitly marked with the `impure` keyword.
- An impure method may return either void or non-void.
- A pure method must return non-void. (It would not make sense to return void, because it cannot perform any side-effects, so its sole reason for existence is to return something.)
- In all cases:
  - A pure method / function:
    - May not assign to any field of `this`.
    - May not invoke any impure methods / procedures on any of its parameters, including `this`.
    - May not escape an impure interface of any of its parameters, including `this`.
      - It is okay to escape pure interfaces, since there will be no side-effects.
    - May still declare and manipulate mutable locals, including the ability to escape mutable locals or impure interfaces thereof.
    - It would be nice to be able to say that a pure method / function can never throw an exception; however, we cannot do that, because even a pure method / function can, for example, accidentally divide by zero.
- Mechanisms are provided whereby purity checks can be suppressed when necessary, in order to allow for functions which, although formally pure, may under the hood modify caches, update statistics, perform diagnostic I/O, etc.
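The claim that a single higher-order function can serve both void-returning and non-void-returning functions, once `void` is a real type with one instance, can be illustrated in Python. The `Void` class, the `timed` wrapper, and the two sample operations are hypothetical names chosen for this sketch; Python's built-in `None` already plays a similar role, but an explicit class makes the idea visible.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class Void:
    """Hypothetical `void` data type with exactly one instance."""
    _instance = None

    def __new__(cls) -> "Void":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


def timed(operation: Callable[[], T]) -> Tuple[T, float]:
    """One higher-order function for both kinds of operation."""
    started = time.perf_counter()
    result = operation()
    return result, time.perf_counter() - started


log: List[str] = []


def record() -> Void:
    # "void-returning": exists only for its side effect.
    log.append("called")
    return Void()


def compute() -> int:
    # "non-void-returning": exists only for its result.
    return 6 * 7


answer, _ = timed(compute)
nothing, _ = timed(record)
```

Without a value to return from `record`, `timed` would need a second overload for procedures, which is the duplication the text aims to avoid.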
-
Emphasis on readability, at the expense of terseness when necessary.
- Typing is not one of the major problems faced by our profession; unreadable code is.
- The language should be suitable for universities to teach, so unlike Scala, it needs to have a low entry barrier.
- All language keywords are fully spelled out and avoid unnecessary technicalities.
  - No inelegant abbreviations like `fun`, `def`, `mut`, etc.
  - A function is denoted by `function`. (Duh!)
  - A field is denoted by `field`. (Duh!)
  - A mutable field is denoted by `mutable field`. (Duh!)
  - A local is denoted by `local`. (Duh!)
  - A mutable local is denoted by `mutable local`. (Duh!)
  - The Boolean type is `boolean`, not `bool`.
  - The Integer type is `integer`, not `int`.
    - Nobody will ever have to type `i`, `n`, `t`, `e`, `g`, `e`, `r`, because any halfway decent code editor will give you `integer` if you just type `i`, hit `Ctrl+Space` to open up auto-completion, and then `Enter` to pick the first suggestion.
  - The Long Integer type is `long integer`, not `long`.
  - The 64-bit IEEE floating point type is `real`, not `double`.
  - The 32-bit IEEE floating point type is `short real`, not `float`.
- In general, the language aims to reduce the number of parentheses.
  - Expressions may not be parenthesized, only sub-expressions may.
    - So, the popular construct `return (result)` is not just redundant; it is actually a compiler error.
- In general, the language favors words over punctuation, so:
  - Inheritance by means of `extends` and `implements` keywords as in Java instead of the `:` character of C#.
  - Fully spelled out `for each a in b do` like C# instead of the `for( a : b )` of Java.
  - Boolean operators are words, like Pascal and Python and unlike the C family.
    - i.e. the operators are `and`, `or`, and `not` instead of `&&`, `||`, and `!`.
  - The compiler handles boolean operators, applying operator precedence and short-circuit evaluation.
  - The compiler maps all other operators to method calls (observing operator precedence rules), as follows:
    - `a + b` maps to `a.Plus( b )`.
    - `a - b` maps to `a.Minus( b )`.
    - `a * b` maps to `a.Times( b )`.
    - `a / b` maps to `a.Per( b )`.
    - `a % b` maps to `a.Modulo( b )`.
    - `a ^ b` maps to `a.Power( b )`.
    - `a = b` maps to `a.Equals( b )`.
    - `a < b` maps to `a.Below( b )`.
    - `a > b` maps to `a.Above( b )`.
    - `a != b` maps to `not a.Equals( b )`. (*)
    - `a <= b` maps to `not a.Above( b )`. (*)
    - `a >= b` maps to `not a.Below( b )`. (*)
    - `-a` maps to `a.Negative`.
    - `~a` maps to `a.TwosComplement`.
    - `++a` maps to `a.PreIncrement()`.
    - `a++` maps to `a.PostIncrement()`.
  - So, when we code `a + b`, this will only compile if the type of `a` has a function called `Plus` with a parameter of the type of `b`.
  - (*) These negations are meant to save us from having to have negative forms of the functions; I think they are okay; it remains to be seen if there are situations where this will not work. NaN comes to mind as a possible pitfall, but then again a comparison against NaN should perhaps throw an exception.
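The NaN concern raised in the footnote can be checked with a small Python sketch. The `Real` class below is an illustrative stand-in that exposes only the "positive" comparison methods named in the mapping; the helper functions spell out what the compiler would emit for the derived operators. Python floats follow IEEE 754 comparison rules, which makes the discrepancy visible.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Real:
    """Illustrative value type exposing only the 'positive' comparisons."""
    value: float

    def Equals(self, other: "Real") -> bool:
        return self.value == other.value

    def Below(self, other: "Real") -> bool:
        return self.value < other.value

    def Above(self, other: "Real") -> bool:
        return self.value > other.value


def less_or_equal(a: Real, b: Real) -> bool:
    # What the compiler would emit for `a <= b`: not a.Above( b )
    return not a.Above(b)


def not_equal(a: Real, b: Real) -> bool:
    # What the compiler would emit for `a != b`: not a.Equals( b )
    return not a.Equals(b)


one, two, nan = Real(1.0), Real(2.0), Real(math.nan)

ordinary = less_or_equal(one, two)     # agrees with 1.0 <= 2.0
derived_nan = less_or_equal(nan, one)  # True, because nan > 1.0 is False
ieee_nan = nan.value <= one.value      # False under IEEE 754 comparison rules
```

So for ordinary values the derived operators agree with the direct ones, but with NaN the negated form of `<=` yields true where IEEE 754 yields false. This supports the suggestion that comparisons against NaN would need special treatment, such as throwing an exception.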
-
Preference towards having only one way for any given thing.
- When multiple ways of accomplishing the same thing are conceivable, the language design tries, when possible, and when it makes sense, to make a specific choice and prohibit all other ways. For example:
  - When it is unnecessary to qualify an instance member with `this`, it is an error to qualify it.
  - When it is unnecessary to qualify a static member with the class name, it is an error to qualify it.
  - When the body of the “then” part of an `if` statement never falls through (because it ends with either a `return` or a `throw` statement) it is an error to use the `else` keyword.
-
Encapsulation:
- A nested scope has access to private members of the enclosing scope.
- The enclosing scope never has access to private members of nested scopes.
- Note that this corrects the insanity of Java which allows an enclosing class to have access to private members of nested classes. (Duh!?)
- When a source file declares a namespace as public, only the classes in that source file are exported.
- This stipulation is necessary since multiple source files may declare a namespace, but only some of those source files might declare the namespace as public.
- A module may expose interfaces, enums, records (value types), and classes. However, when a module exposes a class, what actually gets exposed is only the interface of that class, not the class itself. In other words, the language will never expose across modules the constructor of a class, nor its protected methods. This has some very interesting implications:
- All classes participating in an inheritance hierarchy must be defined within a single module: One cannot extend a class defined in another module.
- All classes participating in an inheritance hierarchy are known during the compilation of the module that contains the hierarchy.
- This allows for certain useful optimizations.
- The creation of a new instance of a class defined in another module cannot be accomplished by invoking a constructor; it can only be accomplished via a factory method.
-
Memory management: Reference counting instead of garbage-collection.
- The memory model looks a lot like the memory model of Java and C#:
- The heap consists of big chunks of memory that are allocated from the operating system at once. The runtime does its own memory management within these chunks, for efficiency.
- Objects are actually pointers to objects that live on the heap.
- Pointers cannot be manipulated as they can in C++.
- Value types live either in local storage or as members of other types.
- When necessary, value types can be treated as reference types by means of boxing.
- Pointers are implemented as smart (shared) pointers, so that:
- There is no need for garbage collection.
- There is no need for each object to have its own lock.
- There is no need for finalization.
- There are no preposterous situations like object resurrection.
- There are fewer sources of randomness and non-determinism in the memory layout and in the responsiveness of the code.
- Destruction is assured and immediate the moment an object ceases to be referenced.
- Destruction involves real destructors as in C++.
- While a destructor executes, all objects referenced by the object being destructed are guaranteed to still be present and alive. (Unlike garbage-collected languages, where finalizers have to cope with the fact that some of the referenced objects may have already been collected.)
- The reference count is accommodated in the object itself, so smart pointers can be appreciably more lightweight than in C++.
- The runtime may choose to implement smart pointers using double indirection, so as to be able to perform memory defragmentation.
- Addressing the pitfalls of reference counting:
  - Reference counting suffers from two pitfalls:
    - Long reference chains:
      - May result in stack overflow when disposed.
    - Circular references:
      - Result in memory leaks.
  - We address these pitfalls as follows:
    - Long reference chains:
      - We solve this by making destructors deliberately fail if they are ever re-entered, so that we can detect the deallocation of even the smallest chain that consists of only two nodes. The programmer can then modify their code to do one of the following:
        - Manually perform the destruction of the chain in a way that avoids recursion.
        - Refactor things so that the objects are kept in a collection instead of forming an ad-hoc chain.
        - Explicitly unlink and destroy the chain using the `delete chain` keyword, which works in a non-recursive way.
    - Circular references:
      - A debug-time-only mark-sweep checker that runs on its own thread detects leaked cyclic object graphs and warns the programmer about them. (It does not attempt to fix anything.) The programmer can then modify their code to do one of the following:
        - Break any cycles in the graph before unlinking it.
        - Explicitly unlink and destroy the cyclic graph using the `delete cyclic` keyword, which gracefully handles cyclic object graphs.
  - These means of addressing the pitfalls of reference counting are not perfect, so some extra maintenance will sometimes be required. For example, we might think that we are properly handling all cyclic object graphs, but as a result of a change somewhere, we may now discover that we have a new cyclic object graph, which we must deal with. Still, the extra trouble is expected to be rare, and it is expected to be very well worth all the trouble we save by not having to have a garbage collector.
- Of interest: https://verdagon.dev/blog/hybrid-generational-memory
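The handling of long reference chains can be sketched with manual reference counts in Python. Everything here is an assumption made for illustration: the `Node` class, the `release` and `delete_chain` functions, and the `DestructorReentered` exception are invented names, and Python's own memory management is ignored. The sketch shows a destructor failing when it is re-entered, even for a chain of two nodes, and a non-recursive alternative in the spirit of `delete chain`.

```python
from typing import List, Optional


class DestructorReentered(Exception):
    """Raised when destroying one object triggers destroying another."""


class Node:
    """Manually reference-counted list node."""
    destroying = False          # True while any destructor is running
    destroyed: List[str] = []   # names, in order of destruction

    def __init__(self, name: str, next_node: Optional["Node"] = None) -> None:
        self.name = name
        self.count = 0
        self.next = next_node
        if next_node is not None:
            next_node.count += 1


def release(node: Node) -> None:
    """Drop one reference; destroy the node when the count reaches zero."""
    node.count -= 1
    if node.count > 0:
        return
    if Node.destroying:
        # A destructor is releasing its successor: this is a chain.
        raise DestructorReentered(node.name)
    Node.destroying = True
    try:
        Node.destroyed.append(node.name)
        if node.next is not None:
            release(node.next)
    finally:
        Node.destroying = False


def delete_chain(head: Node) -> None:
    """Sketch of `delete chain`: unlink each node, then destroy it, in a loop."""
    node: Optional[Node] = head
    while node is not None and node.count == 1:
        successor = node.next
        node.next = None   # the loop now owns the reference to the successor
        release(node)      # count 1 -> 0; the destructor has nothing to release
        node = successor
    if node is not None:
        release(node)      # shared node: only drop our own reference


def build_chain(names: List[str]) -> Node:
    head: Optional[Node] = None
    for name in reversed(names):
        head = Node(name, head)
    assert head is not None
    head.count += 1        # the caller's own reference to the head
    return head
```

Calling `release(build_chain(["a", "b"]))` raises `DestructorReentered`, while `delete_chain(build_chain(["x", "y", "z"]))` destroys all three nodes with constant stack depth.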
-
Constructor syntax like Scala.
- Constructor parameters in the class header.
- Constructor code in the class body. (With the additional restriction that it must all appear up-front.)
- Additional constructors by means of static factory methods.
- Any constructor parameters that are referenced by methods automatically become fields so that we do not have to declare extra fields and initialize them from the parameters.
-
Strong distinction between release runs and debug runs.
- (But not necessarily different builds; optimization is a JIT concern.)
-
Externally supplied constant values.
- A special type of constant can be defined, whose value is not specified in the source code, and must instead be supplied later:
- During compilation, by means of a special parameter to the compiler, or
- At runtime, by means of a special parameter to the launcher.
- These constants are better than C-style “manifest constants” and C#-style “defined symbols” because they are well defined, strongly typed, mandatory, and obey normal static immutable field rules. This means that:
- It is possible to know the set of all external constants that must be defined in order to compile and run something.
- An attempt to compile or run something without supplying all external constant values will always result in an error.
- An attempt to supply an external constant value for a non-existent external constant will always result in an error.
- Each externally supplied constant value must be of the correct type expected by the constant declared in the code.
- When using external constants for conditional compilation, the code paths that are not selected will result in no code being generated, but must still pass compilation, so there is no danger of code rot.
- With some help from the loader we can write tests that exercise code under different values for runtime-supplied external constants.
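The checks that make external constants "well defined, strongly typed, and mandatory" can be sketched as a small validation step in Python. The function name `bind_external_constants`, the exception class, and the constant names used with it are hypothetical; the sketch only mirrors the three error conditions listed above (missing value, unknown constant, wrong type).

```python
from typing import Any, Dict


class ExternalConstantError(Exception):
    """Raised when supplied values do not match the declared constants."""


def bind_external_constants(declared: Dict[str, type],
                            supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Check supplied values against the declarations found in the code."""
    missing = sorted(declared.keys() - supplied.keys())
    if missing:
        raise ExternalConstantError(f"not supplied: {missing}")
    unknown = sorted(supplied.keys() - declared.keys())
    if unknown:
        raise ExternalConstantError(f"not declared: {unknown}")
    for name, expected in declared.items():
        # Exact type match: a string "8" is not accepted for an integer.
        if type(supplied[name]) is not expected:
            raise ExternalConstantError(f"{name}: expected {expected.__name__}")
    return dict(supplied)


declared = {"MaxConnections": int, "EnableTracing": bool}
bound = bind_external_constants(
    declared, {"MaxConnections": 8, "EnableTracing": False})
```

The same function could be run once by the compiler and once by the launcher, depending on when each constant is supplied.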
-
Integer types:
- Fixed Integer types with explicitly defined sizes, as per C#.
- Flex integer types whose size is determined by the runtime according to what is most efficient for the underlying hardware architecture.
- Each flex integer has a “Guaranteed Width”, which is the minimum width that this integer is guaranteed to have on any hardware architecture. These widths are:
  - 8 bits for `tiny integer`
  - 16 bits for `short integer`
  - 32 bits for `integer`
  - 64 bits for `long integer`
- On debug runs, the runtime checks all operations on flex integers, and if there is an overflow past the guaranteed width, a runtime exception is thrown. Thus, we ensure consistent flex integer behavior on any architecture.
- This corrects the narrow-mindedness of C# where `int` has been defined to be exactly 32 bits long, even on architectures with a larger machine word size. (Which is pretty much all major architectures today, now that 64-bit is the norm.)
-
Full set of signed and unsigned integers as per C#, both for the fixed and flex flavors.
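A minimal sketch of the debug-run check on flex integers, written in Python (whose integers never overflow, so the guaranteed width has to be simulated). The names `GUARANTEED_WIDTH_BITS`, `FlexOverflowError`, `checked`, and `checked_add` are assumptions made for this example only.

```python
# Guaranteed widths as listed above; the dictionary itself is illustrative.
GUARANTEED_WIDTH_BITS = {
    "tiny integer": 8,
    "short integer": 16,
    "integer": 32,
    "long integer": 64,
}


class FlexOverflowError(ArithmeticError):
    """Raised when a result does not fit in the guaranteed width (hypothetical)."""


def checked(type_name, value, signed=True):
    """Return value unchanged if it fits in the guaranteed width, else raise."""
    bits = GUARANTEED_WIDTH_BITS[type_name]
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    if not low <= value <= high:
        raise FlexOverflowError(f"{value} exceeds the guaranteed width of {type_name}")
    return value


def checked_add(type_name, a, b, signed=True):
    """Add two flex integers the way a debug run would: checking the result."""
    return checked(type_name, a + b, signed)
```

The point of checking against the guaranteed width, rather than the actual machine width, is that a program which passes on a 64-bit machine also passes on a narrower one.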
-
Exceptions
- Lightweight exceptions that are inexpensive to throw and to catch.
- No such thing as the “checked” exceptions of Java.
- No extra baggage:
- The base Exception class does not even have a “message”, let alone a “localized message”.
- The ToString() method of the base Exception class:
- Is not overridable.
- Yields a string consisting of the class name of the exception followed by the name and the string representation of the value of each one of its fields, obtained using reflection.
- If you want an exception to result in a human-readable error message that you can actually show to an end user, you have to accomplish this entirely by yourself. (Please make sure to do this in the end-user’s native language, which, statistically speaking, is unlikely to be English.)
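The ToString() behavior described above might look something like this when approximated in Python. `BaseError` and `FileNotFound` are hypothetical names; Python cannot prevent `__str__` from being overridden, so that part of the design is not modeled here.

```python
class BaseError(Exception):
    """Hypothetical base exception: no message, only whatever fields subclasses add."""

    def __str__(self):
        # Reflection: list every instance field by name and value.
        fields = ", ".join(f"{name}={value!r}" for name, value in vars(self).items())
        return f"{type(self).__name__}({fields})"


class FileNotFound(BaseError):
    """Example exception carrying data rather than a human-readable message."""

    def __init__(self, path, attempts):
        super().__init__()
        self.path = path
        self.attempts = attempts
```

A string like this is meant for logs and debuggers; turning it into something an end user can read remains the job of the application.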
-
Standard Collections Model
- The standard language runtime provides the following:
- An assortment of unmodifiable collection interfaces: Enumerable, Collection, List, Map, etc.
- Enumerable exposes a property for accessing the current element, and separate methods for checking whether there exist more elements and for advancing to the next element, as in C#.
- A Collection is an Enumerable with a length and the ability to check whether it contains a certain element, as in Java.
- Map is also a collection of Map.Entry.
- This is as in C#, where a Dictionary is a collection of KeyValuePair.
- This is unlike Java, where Map is not a collection, and in order to obtain the collection of entries you must invoke Map.entrySet().
- Factory methods create immutable collection classes implementing the unmodifiable collection interfaces.
- An assortment of “rigid” (i.e. mutable, but structurally immutable) interfaces which extend the unmodifiable interfaces adding methods to replace existing items but no methods to add or remove items: RigidEnumerable, RigidCollection, RigidList, RigidMap, etc.
- An assortment of mutable collection interfaces which extend the rigid interfaces adding add/remove/clear methods: MutableEnumerable, MutableCollection, MutableList, MutableMap, Queue, Stack, etc.
- A MutableCollections factory exposing methods that create mutable collection classes implementing the mutable collection interfaces.
- The Values collection of a mutable map returns a RigidCollection of map values, so that:
- You can replace an element in this collection, which will have the side-effect of associating an existing key with a new value.
- You cannot add an element to this collection, which makes sense because you have no means of specifying the key that should map to the newly inserted value.
- The method for adding an item to a collection is called ‘Add’, not ‘Push’.
- For consistency, even the Stack collection exposes an Add method, not a Push method.
- Collaboration between the language runtime and collections:
- The for-each loop operates on Enumerable.
- The loop variable can be reassigned, causing the current element of the Enumerable to be replaced with a new value. In this case, the for-each loop requires a RigidEnumerable.
- A special keyword allows removing the current item, in which case the for-each loop requires a MutableEnumerable.
- Since we have proper destructors, there is no need for special handling of disposable enumerators. (Something which C# provides, but Java lacks.)
- An array literal evaluates to an instance of RigidList, so the language is free from arrays, like Scala.
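To make the three tiers of the collections model concrete, here is a rough Python sketch of the List branch only. The interface names come from the text above; the method names (`length`, `get`, `replace`, `add`, `remove_at`) and the `ArrayList` class are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class List(ABC):
    """Unmodifiable tier: read-only access."""

    @abstractmethod
    def length(self): ...

    @abstractmethod
    def get(self, index): ...


class RigidList(List):
    """Rigid tier: items can be replaced, but none can be added or removed."""

    @abstractmethod
    def replace(self, index, item): ...


class MutableList(RigidList):
    """Mutable tier: adds structural modification."""

    @abstractmethod
    def add(self, item): ...

    @abstractmethod
    def remove_at(self, index): ...


class ArrayList(MutableList):
    """Hypothetical concrete class, as a factory might return."""

    def __init__(self, items=()):
        self._items = list(items)

    def length(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]

    def replace(self, index, item):
        self._items[index] = item

    def add(self, item):
        self._items.append(item)

    def remove_at(self, index):
        del self._items[index]


def double_all(items: RigidList) -> None:
    """Needs only the rigid tier: it replaces elements but never changes the length."""
    for index in range(items.length()):
        items.replace(index, items.get(index) * 2)
```

A function such as `double_all` documents in its signature exactly how much power it needs over the collection, which is the main benefit of splitting the interfaces this way.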
-
Heavy promotion of assertions and plenty of built-in extra error-checking on debug runs, such as:
- Arithmetic checking
- An exception is thrown when any of the following occurs:
- Division by zero.
- Fixed integral type overflow. (This can be selectively suppressed on an individual expression basis as with the “unchecked” keyword of C#.)
- Flex integer guaranteed width overflow.
- (Possibly) Operations on NaNs.
- Throwing Switches
- If the switch data type is exhaustively switchable (e.g. boolean):
- It is an error if not all cases are covered and no default case is provided.
- It is an error if all cases are covered and a default case is provided.
- If the switch data type is not exhaustively switchable (e.g. integer):
- If no default case is provided, an implicit default case is supplied by the compiler which throws an exception.
- This plays nicely with code coverage: no more uncoverable assertions in unreachable default clauses.
- If you want a switch statement with default case fall-through on a non-exhaustively switchable type, add an empty default case. (Duh!)
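The “throwing switch” on a non-exhaustively switchable type can be imitated in Python along these lines. The helper `switch` and the exception `UnhandledCaseError` are hypothetical; the compile-time rules for exhaustively switchable types cannot be shown this way and are left out.

```python
class UnhandledCaseError(Exception):
    """Raised by the implicit default case (hypothetical name)."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


_NO_DEFAULT = object()  # sentinel meaning "the programmer wrote no default case"


def switch(value, cases, default=_NO_DEFAULT):
    """Run the handler for value; with no match and no default, throw."""
    handler = cases.get(value)
    if handler is not None:
        return handler()
    if default is _NO_DEFAULT:
        # This is the default case the compiler would supply implicitly.
        raise UnhandledCaseError(value)
    return default()
```

Passing an explicit do-nothing default, as in `switch(code, cases, default=lambda: None)`, corresponds to adding an empty default case in order to get fall-through.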
-
Big on warnings and errors.
- Most things traditionally thought of as warnings are errors.
- Most checks of the kind that IntelliJ IDEA calls “inspections” are built into the language as warnings, many of them even as errors.
- Selective warning suppression only; no bulk suppression.
- Warning suppression is possible only on the individual statement where the problem occurs, never on a larger scope.
- Warnings always cause compilation to fail.
- It is as if a “treat warnings as errors” option is always on and cannot be turned off.
- The difference between warnings and errors is not that you can ignore warnings and proceed to run; the difference is that a warning can be suppressed, whereas an error cannot.
- Furthermore, the language designates a message as a warning or an error based not on its severity, but instead on whether the programmer can reasonably be required to fix it or not.
- If it is reasonable to require the programmer to fix it, then the programmer better fix it, so there is no need to be able to suppress it, so it is an error.
- If it is unreasonable to require the programmer to fix it, then the programmer should be able to suppress it, so it is a warning.
- For example:
- If you have an unused import statement, you can very easily remove that import statement, so it is reasonable to require you to fix it. Therefore, the “unused import” message is an error.
- If you have marked something as deprecated, and yet you must still make use of it in a couple of places until the day that it gets completely removed, then you have no way of fixing this problem, therefore you must be allowed to suppress it, therefore the “use of deprecated symbol” message is a warning. You will, however, have to explicitly suppress that warning on each and every usage of that symbol.
- A warning suppression on a statement that does not actually produce a warning is an error.
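The interplay between warnings and suppressions on a single statement boils down to two set differences. A small Python sketch, where `check_statement` and the warning identifiers are made up for the example:

```python
def check_statement(warnings, suppressions):
    """Return the compilation failures for one statement.

    warnings:     set of warning ids the statement produces
    suppressions: set of warning ids suppressed on that same statement
    """
    # A warning that is not suppressed fails the build.
    failures = [f"unsuppressed warning: {w}" for w in sorted(warnings - suppressions)]
    # A suppression with nothing to suppress fails the build as well.
    failures += [f"unnecessary suppression: {s}" for s in sorted(suppressions - warnings)]
    return failures
```

Compilation succeeds only when the list is empty for every statement, which keeps stale suppressions from accumulating after the underlying problem has been fixed.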
-
Syntax:
- Line-oriented, with scoping dictated by indentation (roughly as in Python) instead of curly braces.
- Since it is very difficult (if not impossible) to express indentation rules in a formal grammar, this is handled by the tokenizer:
- When the indentation increases, the tokenizer emits a hidden scope-start token.
- When the indentation decreases, the tokenizer emits a hidden scope-end token.
- The tokenizer also handles line breaking and line joining, so that the parser ends up parsing a C-style language.
- There are two types of statements: simple and compound.
- A simple statement occupies a single line; it may contain expressions, but it may not contain any nested scopes.
- A compound statement begins with a simple statement as a header, and is followed by a dependent scope.
- A scope contains statements, which may in turn be either simple or compound.
- Some constructs that normally correspond to compound statements (e.g. the if statement) also come in “expression form”.
- The for loop does not have an expression form, due to the extra complexity of the multiple statements that it contains; however, the for-each loop does come in expression form.
- Normally, each simple statement must be on a separate line.
- To allow joining multiple simple statements in one line, a special line-joining punctuation is used, which is the semicolon.
- Therefore, the semicolon is illegal at the end of a line.
- Normally, an entire simple statement must be contained within a single line; in other words, a simple statement may not span multiple lines.
- To allow splitting a simple statement into multiple lines, a special line-splitting construct is used. This construct is to be determined:
- It may be a backslash at the end of the line that is being split into the next one.
- It may be double the amount of indentation on the next line, signifying that it belongs to the previous one.
- It may be both of the above.
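The tokenizer behavior described in this section, where changes of indentation become hidden scope-start and scope-end tokens, can be sketched in a few lines of Python. The function name `tokenize_scopes`, the token names, and the restriction that indentation grows by one tab at a time are assumptions of this sketch; line splitting and line joining are not handled.

```python
def tokenize_scopes(source):
    """Turn tab-indented lines into a flat token list with hidden scope tokens."""
    depth = 0
    tokens = []
    for line in source.split("\n"):
        if line == "":
            continue  # blank lines carry no indentation information
        stripped = line.lstrip("\t")
        new_depth = len(line) - len(stripped)
        if new_depth > depth + 1:
            raise SyntaxError("indentation may only increase by one level at a time")
        if new_depth > depth:
            tokens.append(("scope-start", ""))
        for _ in range(depth - new_depth):
            tokens.append(("scope-end", ""))
        depth = new_depth
        tokens.append(("line", stripped))
    # Close any scopes still open at the end of the file.
    tokens.extend(("scope-end", "") for _ in range(depth))
    return tokens
```

With the scope tokens in place, the parser sees the same shape it would see in a curly-brace language, so its grammar needs no notion of indentation.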
-
Formatting:
- The code formatting style of the language is thoroughly and unambiguously defined by an extensive set of rules.
- Some degree of freedom is allowed, but even that is unambiguously controlled by special punctuation that exists specifically for that purpose.
- This means that the formatting of a source file is thoroughly, accurately, and deterministically predictable from the language formatting rules and the punctuation present within the file.
- This in turn allows code editors that can:
- at any moment reflow an entire source file to its proper format, or even:
- continuously reflow code, as it is being typed, to its proper format.
- This in turn allows a compiler which imposes strict enforcement of the formatting rules, so that the slightest deviation, even by a single space, is a compiler error.
- This brings us to the following paradox:
- Even though the formatting rules are extremely detailed,
- And even though the enforcement of the formatting rules is draconian,
- The programmer never has to worry about code formatting, because it is being taken care of automatically.
- The benefit of all this is that all code by all programmers will always have the exact same formatting, and yet no programmer will ever have to be bothered with having to type code in a specific way.
- (It will also make the language parser slightly faster.)
- Some indicative highlights of the formatting rules:
- Tabs for indentation
- The tab character denotes indentation, and may only appear at the beginning of a line; it is prohibited anywhere else.
- Only the tab character may be used to denote indentation; the use of anything else to denote indentation, including the space character, is an error.
- It is an error to have indentation in a line which is otherwise blank.
- The language defines where a space may and may not appear.
- When a space is expected, exactly one space must be given. (For example, right after a comma.)
- When zero spaces are expected, exactly zero spaces must be given. (For example, right before a comma.)
- Note that this prevents tabular code formatting, which is the practice of inserting spaces to column-align similar parts of consecutive statements.
- That is okay, because tabular code formatting is a bad idea anyway, since it is a source of needless git merge conflicts.
- In any case, if some folks really need tabular code formatting, they can achieve it via spacing comments ( /* */ ).
- The language strictly defines when and how blank lines may be used. For example:
- There must never be two consecutive blank lines anywhere, at all, under any circumstances, for any reason, ever.
- There must always be exactly one blank line before a block comment. (Even a single-line block comment.)
- If you want a comment without a blank line, then use a line comment instead of a block comment.
- There must never be a blank line anywhere else, including:
- Between method definitions.
- This allows us to define whole groups of single-line methods without wasting a lot of screen real estate.
- If you want blank lines between method definitions, add a block comment before each method definition; thus, a blank line will be mandatory before the block comment.
- Between lines of code.
- Most programmers have the habit of using blank lines within method bodies, to separate logical groups of lines of code. This is bad practice, because only the programmer who wrote the code knows why those lines form a separate group and why that group should stand out from the rest.
- If you have multiple conceptually distinct groups of lines of code within a single method, then either:
- Add block comments explaining what each group does, (in which case a blank line before the block comment is mandatory,) or
- Move each group into a separate function, and give the function a descriptive name.
- The language supports functions nested within functions, so you can do this without polluting the namespace of the class.
- The language uses no curly braces, so you will not be wasting a lot of screen real estate in doing so.
- Between class definitions.
- This allows us to define whole groups of single-line classes without wasting a lot of screen real estate. Admittedly, single-line classes are rare, so let’s just say that this rule exists just for consistency.
- Special formatting punctuation allows overriding language default formatting rules on a case per case basis. For example:
- A “line splitter” is a special punctuation character which allows splitting a construct into multiple lines when the language formatting rules would have normally required that construct to be all in one line.
- For example: the language formatting rule for expressions is that an expression must fit in one line; so, if an expression needs to span multiple lines, a line splitter must be used to indicate precisely at which point the expression is to break into the next line.
- The use of a line splitter in a place where it is not required is an error.
- The “line joiner” is a special punctuation character which allows a construct to appear all in one line when the language formatting rules would have normally required that construct to be split into multiple lines.
- For example: the language formatting rule for methods is that the body of the method must be on a separate line from the prototype. So, if a very short method needs to fit entirely in one line, a line joiner can be used to allow this.
- The use of a line joiner in a place where it is not required is an error.
- Capitalization
- The language is case sensitive, and capitalization matters a lot more than in other languages.
- Identifier casing must be one of the following:
- lowercase
- SentenceCase
- kebab-case
- SentenceKebab-case
- Note that kebab-case is possible because the language mandates spacing around operators, so there is no possibility to confuse an identifier containing a dash with the dash operator between two identifiers.
- The following are expressly disallowed:
- The dash as first or last character of an identifier.
- camelCase.
- UPPERCASE and SCREAMING-KEBAB-CASE.
- Two or more consecutive capital letters.
- For an explanation why, see the following section about spell-checking.
- Separate capital letters with dashes; for example, XSpacing is not allowed, but X-Spacing is fine.
- Do not use acronyms; use either:
- fully spelled out words, e.g. “GraphicalUserInterfaceStyle”, or
- words that replace acronyms, e.g. “GuiStyle”.
- Underscores and all forms of snake_case. (Though an underscore alone might act as a special identifier, or special punctuation, to be determined.)
- This is because we support kebab-case, and snake_case does not look sufficiently different from kebab-case.
- Kebab-case is preferable to snake_case because on most keyboards the dash is slightly easier to produce than the underscore, since it does not require Shift.
- Some capitalization rules apply to language constructs and are enforced by the compiler:
- All names of types and namespaces must start with an uppercase letter.
- All public and protected member names must start with an uppercase letter.
- All private members must start with a lowercase letter.
- All local and parameter names must start with a lowercase letter.
- Spell Checking
- The language comes together with a spell-checking dictionary, the contents of which are part of the language specification.
- A module can have a supplemental user-defined spell-checking dictionary file which:
- Is meant to be committed to source control
- Is meant to undergo code review just as any other source file.
- The compiler spell-checks source code and issues a warning if it encounters any unrecognized words.
- Specifically, the compiler will issue a warning when any of the following fails to pass spell-check:
- Any part of an identifier.
- A word inside a string literal.
- A word in a comment, unless it is markup referring to an identifier.
- For the purpose of spell-checking, identifiers are broken into parts based on SentenceCase and kebab-case boundaries, as well as boundaries between letters and digits. This means that:
- “CryptoGraphy” will not pass spell-check unless “graphy” has been added to the spell-checker. (It shouldn’t; it is not an English word; use “Cryptography” instead.)
- “Mousepointer” will not pass spell-check unless mousepointer has been added to the spell-checker. (It shouldn’t; it is not an English word; use “MousePointer” instead.)
- A warning for a misspelled identifier is issued only at the point of definition and not on each occurrence of the identifier, so that:
- You only see the warning once, not five hundred times.
- There is no warning at all for identifiers that you have no control over, due to them being defined in external modules. In other words, a module does not have to duplicate the spelling dictionaries of external modules, nor does a module have to ship with its spelling dictionary.
- Two or more consecutive capital letters are disallowed because:
- Each individual capital letter acts as a word delimiter, so it constitutes a word by itself.
- To allow for single-letter variables, every individual letter passes spell-check.
- So, a word made of capital letters circumvents the spell-checker.
- The language does not allow circumventing the spell-checker.
- (One day someone will inevitably submit a feature request for some means of disabling the spell checker; the answer they will receive is that they do not have to use this language; there are so many other languages to choose from.)
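The way identifiers are broken into parts for spell-checking can be sketched with one regular expression. This Python example is only an approximation: `identifier_parts` and `misspelled_parts` are hypothetical names, the dictionary is a plain set of lowercase words, and skipping digit runs is an assumption.

```python
import re

# A part is a capital followed by lowercase letters, a lowercase run, or a digit run.
# Dashes match nothing, so they act purely as separators.
_PART = re.compile(r"[A-Z][a-z]*|[a-z]+|[0-9]+")


def identifier_parts(identifier):
    """Split on SentenceCase, kebab-case, and letter/digit boundaries."""
    return [part.lower() for part in _PART.findall(identifier)]


def misspelled_parts(identifier, dictionary):
    """Return the parts that fail spell-check; single letters always pass."""
    return [
        part
        for part in identifier_parts(identifier)
        if not part.isdigit() and len(part) > 1 and part not in dictionary
    ]
```

Under this splitting, an identifier such as “GUI” breaks into three single letters, each of which passes on its own; that is exactly the loophole which the ban on consecutive capital letters closes.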
-
Comments
- Special formatting within comments is achievable with the use of Markdown as opposed to HTML or any ad-hoc syntax.
- This special formatting is available in all comments, not just doc-comments.
- Some extensions to markdown are necessary in order to specify relationships between code.
- For example, when defining a link, one can omit the part within the parentheses, in which case the part within the square brackets is expected to be a resolvable symbol, and the resulting link points to that symbol.
- The syntax for specifying the symbol requires no gimmicks like the hash-sign which is needed in Java’s doc-comments to separate the type name from the member name.
- If the symbol is not fully qualified then there must be an import statement for that symbol somewhere within the source file.
- The use of a symbol in a comment is enough to prevent the corresponding import statement from being flagged by the compiler as unused.
- Possibly: allow the comment that describes a parameter to be placed with the parameter itself.
-
Inheritance
- A class may extend only one other class but implement any number of interfaces.
- The only difference between a class and an interface is that an interface cannot have fields, a constructor, or a destructor; in all other respects, classes and interfaces are equivalent, meaning that an interface can have static, public, protected, and private methods.
- By default, a class cannot be extended unless it is marked as extensible.
- By default, a method of a class cannot be overridden/extended unless it is marked as follows:
- If it is abstract, it must be marked as abstract.
- If it is overridable, it must be marked as overridable. (Duh!)
- This corrects Java’s exuberance of allowing any method to be overridden unless declared “final”, and C#’s unwarranted technicalism of calling such methods “virtual”.
- If it is overridable with the provision that overriding methods must invoke the base method, it must be marked as extensible.
- Methods that override other methods must be marked as follows:
- A class method which implements an abstract method must be marked with implements base.
- A class method which overrides an overridable method must be marked with overrides base.
- A class method which extends an extensible method must be marked with extends base. Within the extending method:
- The base method must be invoked exactly once.
- (This can become a bit complicated with alternative execution paths, so we might want to mandate that there must be only one possible execution path at the point where the base method is invoked.)
- The invocation of the base method can be simplified:
- The name of the method can be replaced with base.
- The parameters can be omitted.
- In this case the base method is invoked with the values that the parameters have at the moment of the invocation, allowing the extending method to alter the values of the parameters before invoking base.
- A class method which implements a method of an interface must be marked with implements X.
- X is the interface-qualified-method-name of the method being implemented. The implementing method name may differ from the implemented method name (as long as the parameter list matches) and it will be accessible via both names.
- X can also be a comma-separated list of interface-qualified-function-names, if the method implements multiple interface methods of different interfaces. In this case, the method will be accessible via any of the names.
- Note that this corrects the stupidity of C# where no special marking is necessary for a class method that implements an interface method.
- Note that C# provides a syntax for optionally specifying that a class method implements a method of a particular interface, but makes the implementing methods inaccessible, which renders the feature unusable.
- Note that a class method may both override a superclass method and implement interface methods by adding both overrides and implements.
- A class method does not automatically become overridable or extensible by virtue of implementing, overriding, or extending another method; it must in turn be marked as overridable or extensible if that is the intention.
-
Built-in Intertwine.
-
Built-in Domain-Oriented Programming features.
- Alternatively, look into Scala’s implicit parameter lists.
-
Built-in support for testing.
- Bundled Testana, see Testana: A better way of running tests.
- Somewhat different testing semantics than JUnit:
- The test class does not get re-instantiated prior to invoking each test method.
- No ‘before’ method: use the test class constructor for this.
- No ‘after’ method: use the test class destructor for this.
- Use of the exact same assertion facility for test code as for production code.
- No other test facility gimmicks like “expect”, “assume”, etc.: write the darn thing in code.
- Test methods are always executed in the order in which they appear in the source file.
- When a test class is derived from another test class, the test methods of the base class are always executed before the test methods of the derived class.
- To enable separate testing of debug runs and release runs, assertions are always enabled for the testing code, but for the code-under-test they can be either enabled or disabled.
- Even though all source files that constitute a module are compiled into a single binary file, (as per C# assemblies,) each class within that binary file comes with its own timestamp, to accommodate tools like Testana.
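The ordering and instantiation rules above can be approximated by a tiny test runner in Python. Everything here is a sketch: `collect_test_methods`, `run_tests`, the `test_` naming prefix, and the sample classes are assumptions, and since Python has no deterministic destructors the 'after' role of the destructor is not modeled.

```python
def collect_test_methods(test_class):
    """Base-class tests first; within each class, order of appearance in the source."""
    names = []
    for cls in reversed(test_class.__mro__):
        # vars() preserves the order in which members were defined in the class body.
        for name, member in vars(cls).items():
            if name.startswith("test_") and callable(member) and name not in names:
                names.append(name)
    return names


def run_tests(test_class):
    """Instantiate the test class once and run every test method on that instance."""
    instance = test_class()  # the constructor plays the role of 'before'
    executed = []
    for name in collect_test_methods(test_class):
        getattr(instance, name)()
        executed.append(name)
    return executed


class BaseTests:
    def __init__(self):
        self.counter = 0

    def test_increment(self):
        self.counter += 1
        assert self.counter == 1

    def test_again(self):
        # State survives between tests because the instance is not re-created.
        self.counter += 1
        assert self.counter == 2


class DerivedTests(BaseTests):
    def test_derived(self):
        assert self.counter == 2
```

Note that the tests use the plain `assert` statement, the same assertion facility that production code would use.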
-
(Possibly) Explicit distinction between logic classes and data classes.
- (Possibly) Built-in versioned externalization of data classes.
- (Possibly) Built-in data-modelling framework for the data classes.
-
Built-in internationalization features (i.e. Unicode strings and culture-aware operations) but also full support for ANSI strings and culture-neutral operations.
-
Lightweight properties, exactly like in C#, with additional compiler support for obtaining a property as a separate entity and manipulating it independently of the object that it belongs to. (Probably a value type containing a reference to the object that owns the property and a reference to the reflection object that represents the property.)
-
NO compiler support for events.
-
Time Coordinate Data Type
- Internally represented as a 64-bit IEEE floating point number of days since some epoch, allowing for:
- low-precision coordinates billions of years away from the epoch
- femtosecond precision coordinates near the epoch.
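How the precision of such a time coordinate varies with distance from the epoch can be estimated from the spacing of adjacent 64-bit floating point values. A minimal Python sketch (requires Python 3.9 or later for `math.ulp`); the function name `resolution_seconds` is made up for the example.

```python
import math

SECONDS_PER_DAY = 86_400.0


def resolution_seconds(days_from_epoch):
    """Spacing between adjacent representable time coordinates, in seconds."""
    return math.ulp(days_from_epoch) * SECONDS_PER_DAY


if __name__ == "__main__":
    # A fraction of a second, one day, one century, and a billion years from the epoch.
    for days in (1e-6, 1.0, 36_525.0, 365.25e9):
        print(f"{days:>16.6g} days from epoch: {resolution_seconds(days):.3g} s")
```

By this estimate the resolution is a few seconds at a billion years from the epoch and well below a nanosecond at one day from it, while femtosecond resolution appears to hold only within a few seconds of the epoch itself.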
-
A special static “this” keyword (this class?) that you can use to refer to the current type in a static context without having to code the name of the type as you have to do in Java and in C#.
-
Proper method literals and field literals
- No compromise like the nameof() of C#.
- For example:
- Method m = method someMethod; assigns to m the reflection method object of someMethod but causes a compiler error if someMethod has overloads.
- Method m = method someMethod(int); assigns to m the reflection method object of a specific overload of someMethod.
- Method m = this method; assigns to m the reflection method object of the method that is currently being compiled.
- Field f = field someField; assigns to f the reflection field object of someField.
-
Source intrinsics
- A source-line intrinsic similar to the __LINE__ macro of C and C++ or the [CallerLineNumber] attribute of C#.
- A source-file intrinsic similar to the __FILE__ macro of C and C++ or the [CallerFilePath] attribute of C#. Note that the source filename yielded by this intrinsic is relative to the root of the source tree, not absolute.
- A source-root intrinsic which yields the absolute path to the root of the source tree.
-
Namespaces, mostly as seen in C#, but with some differences.
- You cannot just import a namespace and make everything in it accessible; instead, you have to do one of the following:
- Import a specific name from a namespace; (like a Java import statement without a wildcard;) then, you can use that one name without qualification.
- Import an entire namespace, but with a namespace alias, like in XML; then, you can use any name from that namespace, but each name will have to be qualified with the chosen alias.
- Compiler-enforced conformity between directory names and namespace names, as in Java, and unlike C#. (Or, as in C# with ReSharper.)
- Compiler-enforced conformity between source file names and class names, as in Java and unlike C#, with a couple of differences:
- The file name of a source file may match a namespace defined in that file.
- The file name of a source file may match the base-most class defined in the file, but the file may also contain additional classes derived from it.
-
System functionality injection
- System functionality is available strictly via interfaces.
- The main entry point function of a program declares in its argument list each system interface that it intends to use. Yes, this can become unwieldy; and yet that’s how it is going to be.
- When about to run an application, the runtime uses reflection to discover which interfaces are needed by the entry point, and passes each one of them to it.
- From that moment on, user code makes sure to propagate system interfaces to all application code that needs them.
- This means that no system functionality is provided statically. For example:
- The data type for expressing time coordinates does not include a static method for obtaining the current time, as in most other languages. Instead, there is a SystemClock interface which provides this functionality, and this interface must be obtained via main() and propagated to all places that need to use it.
- Similarly, if you want to open a file, you cannot just instantiate a file class; you have to obtain the FileSystem interface, and ask it to open a File for you.
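The injection mechanism described above, where the runtime inspects the entry point and hands it the system interfaces it asks for, might be approximated in Python as follows. `SystemClock` is named in the text; `launch`, `RealClock`, `FixedClock`, the `now` method, and the provider table are assumptions of this sketch.

```python
import inspect
import time
from abc import ABC, abstractmethod


class SystemClock(ABC):
    """System interface for obtaining the current time."""

    @abstractmethod
    def now(self) -> float: ...


class RealClock(SystemClock):
    def now(self) -> float:
        return time.time()


class FixedClock(SystemClock):
    """A stand-in that tests can supply instead of the real clock."""

    def __init__(self, value: float):
        self._value = value

    def now(self) -> float:
        return self._value


def launch(entry_point, providers):
    """Discover by reflection which interfaces the entry point declares, and pass them."""
    arguments = {}
    for name, parameter in inspect.signature(entry_point).parameters.items():
        interface = parameter.annotation
        if interface not in providers:
            raise TypeError(f"no system interface available for parameter {name!r}")
        arguments[name] = providers[interface]
    return entry_point(**arguments)


def main(clock: SystemClock):
    # User code receives the clock here and must propagate it wherever it is needed.
    return clock.now()
```

A production launcher would call `launch(main, {SystemClock: RealClock()})`, while a test can call `launch(main, {SystemClock: FixedClock(42.0)})` and get fully deterministic behavior, which is much of the appeal of having no static system functionality.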
-
Runtime environment:
- DirectoryPath and FilePath interfaces that encapsulate file-system pathnames and filenames, so that one rarely needs to engage in string manipulation with paths.
- No such thing as a “current directory”; all paths are absolute. When a path is constructed from a string, the absolute path is immediately computed, and the computation may take into account whatever the host system considers to be the “current directory” of the process. (So, by obtaining the DirectoryPath of “.” one can discover the current directory, but one has no way of changing it, and the runtime environment will never change it.)
Cover image: “Coding Software Running On A Computer Monitor” by Scopio from NounProject.com