Art of Programming

Think of “machine learning”: by learning from different data, the machine picks up the essential patterns in the data. In the same way, experiencing various programming languages, software libraries, and frameworks allows me to grow in my programming skills and strengthens my philosophy of programming.

Programming is a combination of science and art. As science, code can be replicated and experimented with, and the same code always produces the same result. As art, source code can be written according to an individual’s thinking, expression, and emotions.

I prefer to publish this as a “page” rather than a blog post, because as my knowledge and experience grow, the philosophy and concepts here will change little by little.

Preface

To be strong in programming, you need to adopt 逍遥 (xiāo yáo), a free and unfettered state of mind.

Data structure and algorithm

This is the first thing I learnt during my bachelor’s degree (in Cognitive Science). My favourite lecturer taught us about algorithms, writing pseudocode, and writing comments in the source code. This was the best thing, because by writing an algorithm we verbalise our thoughts into words. Whatever we can express in words can be written in a programming language. (But this doesn’t mean it will work, because whether it works depends on your logic!) This confirms my belief: that is why the world was created by the Word!

Then, the data structure. It is one of the most important parts of the UNIX philosophy. As Pike (n.d.) puts it,

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

That is why designing the data structure is so important: it has a long-term effect. A well-designed data structure allows a stable and mature format, e.g. the HTML DOM tree.

Object oriented

As seen in C++, object-oriented programming is one of the most powerful and influential programming paradigms. Almost all modern languages support it, even the multi-paradigm ones. PHP, Java, C#, and Python can all be written in an object-oriented style. JavaScript is a prototype-based language, yet object-oriented concepts can still be applied. And it is not only languages: a C library like GObject allows you to create objects in C too.

The most powerful features of object orientation are inheritance and polymorphism. As a result, you can develop your own project rapidly using an MVC framework: the framework provides the base classes, and we only need to implement the specific features the project requires.

Object orientation treats everything as an object. You have a vehicle; it is an object. More specifically, a car is an object in the vehicle category. A vehicle has a passenger capacity, which is an attribute (property) of the object. It can move and it can stop; these are its methods. Whatever a vehicle can do, your car, which inherits from vehicle, can also do. And it can do more specific things as well.
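The vehicle and car description can be sketched in a few lines of Python. This is only an illustration: the class names, the default capacity of 5, and the open_boot method are my own assumptions, not part of any framework.

```python
class Vehicle:
    """A general vehicle with a passenger capacity."""

    def __init__(self, capacity):
        self.capacity = capacity  # attribute (property) of the object
        self.moving = False

    def move(self):
        self.moving = True

    def stop(self):
        self.moving = False


class Car(Vehicle):
    """A car is a more specific kind of vehicle."""

    def __init__(self, capacity=5):  # 5 seats is an assumed default
        super().__init__(capacity)
        self.boot_open = False

    def open_boot(self):
        # Something only a car does (hypothetical method for illustration).
        self.boot_open = True


car = Car()
car.move()       # inherited from Vehicle
car.open_boot()  # specific to Car
```

The car never defines move or stop itself; it gets them from Vehicle and only adds what is specific to a car.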

Therefore, object orientation is a concept that lets you view the world in a more logical way, so that you can implement it in a program.

Encapsulation

Related to OOP, encapsulation hides some information from developers to avoid unnecessary manipulation of an object’s properties. Exposing everything to developers brings no benefit, because not all functions or methods are needed. In the same spirit, we sometimes just need to know how a function is used, not how it works. For example, we never need to know how printf is implemented.

Abstraction

Abstraction is useful because it provides a higher-level interface, so that developers can make low-level changes without affecting the existing application. But wrapping every low-level function in an abstract interface is simply infeasible.

So, when we use an object, we can rely on common sense to invoke its methods, without needing to understand what happens inside them. E.g., with user.savePost(), we easily understand that a user saves a post (what the post is depends on the context of the code).

Comments

Previously I used a lot of comments to explain the code. But this is not the right way, as it gives developers too much unnecessary information to read. The better solution is to practise “Clean Code” (Martin, 2009). And with current tooling, documentation can be written in README.md. The old way of commenting is only good for teaching, to explain the source code to students.

But some comments, like special notes and TODOs, are still necessary. And if you want to use Doxygen to generate documentation, you still need to write the documentation comments. Any other extra comments are strongly discouraged.

Recursive function

Whether you like it or not, recursive functions are so useful that they can solve a problem in two steps: the base step and the recursive step. But it depends on the problem; not every problem suits a recursive function. The most prominent case is tree-based problems, such as pathfinding, though recursion by itself does not make an NP-hard problem tractable. You need it all the more with the rise of functional programming today.

By using it, you can reduce your use of do-while, for, and while loops.
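As a small sketch of the base step and the recursive step, here is a function that measures the depth of a tree. The nested-dictionary layout with a "children" key is an assumption I made for the example.

```python
def depth(node):
    """Return the depth of a tree stored as nested dicts with a 'children' list."""
    children = node.get("children", [])
    if not children:
        # Base step: a leaf has depth 1.
        return 1
    # Recursive step: one level plus the deepest subtree.
    return 1 + max(depth(child) for child in children)


# A tiny HTML-like tree (made-up data for illustration).
tree = {"name": "html", "children": [
    {"name": "head", "children": [{"name": "title"}]},
    {"name": "body", "children": [
        {"name": "p", "children": [{"name": "em"}]},
    ]},
]}

print(depth(tree))  # 4: html > body > p > em
```

No explicit loop walks the levels of the tree; the function simply calls itself on each child.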

Tree

The tree is my favourite data structure, just like the Tree of Life. A scene graph is a tree, HTML is a tree, XML is a tree, a programming language is parsed into a parse tree, and pathfinding can be represented as a search tree.

Pointer, linked list

The pointer is a wondrous concept. With pointers, you can create your own linked list without a library. Trees are built on pointers, JavaScript objects behave like pointers (references), and mutable values behave like pointers.

If you know pointers, you can even create a circular linked list. With pointers, you can also create callback functions in C.
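Python has no raw pointers, but its object references behave in a similar way, so a circular linked list can still be sketched without any library. The Node class and the take helper below are hypothetical names for this illustration.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, value):
        self.value = value
        self.next = None  # reference ("pointer") to the next node


# Link three nodes, then point the last one back to the first.
a, b, c = Node("a"), Node("b"), Node("c")
a.next = b
b.next = c
c.next = a  # this makes the list circular


def take(start, count):
    """Collect `count` values by following the next references."""
    values = []
    node = start
    for _ in range(count):
        values.append(node.value)
        node = node.next
    return values


print(take(a, 5))  # ['a', 'b', 'c', 'a', 'b']
```

Because the last node refers back to the first, following the references never reaches an end; the caller decides how far to walk.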

Function

Write functions where necessary, but not beyond necessity. Write functions to reduce the length of each function, so that the reader can read it within a single page. A longer function overloads the reader, who cannot hold that much information while also processing, understanding, and reasoning about it.

Too many functions produce too many function calls (a deep call stack) and dependencies. Debugging then becomes a depth-first search for the developer.

Balance is important! That is why encapsulation plays an important role.

State machine

A state machine like OpenGL stores the current state of the machine. Everything is stored as state. We just use functions to get the current state and perform further calculations.

Functional programming

In contrast to imperative programming, functional programming (FP) is a declarative programming paradigm. FP has risen with today’s technology because it can eliminate side effects, that is, changes to the machine state. This means a function given the same input will always produce the same output, as a mathematical function does.
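To make the idea of side effects concrete, here is a minimal sketch that contrasts a function that changes outside state with one that does not. Both function names are made up for the example.

```python
counter = 0  # state that lives outside the functions


def impure_next():
    """Has a side effect: it changes the global counter on every call."""
    global counter
    counter += 1
    return counter


def pure_next(value):
    """No side effect: the result depends only on the argument."""
    return value + 1


first = impure_next()
second = impure_next()       # same call, different result
same_a = pure_next(10)
same_b = pure_next(10)       # same input, same output
```

Calling impure_next twice gives two different answers because hidden state changed in between, while pure_next can be called any number of times, in any order, with the same result.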

A language like JavaScript supports functional programming. Most modern languages are adopting FP features, including C++11 with its lambda functions.

FP is a good fit for big data and parallel programming, because functions without side effects can run independently of each other.

Concurrent programming

Performing tasks one by one is slower than performing multiple tasks simultaneously. That is what we need multithreading for. But data may need to be shared among the threads; thus, mutexes (mutual exclusion) and condition variables are important to avoid race conditions.
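A minimal sketch of protecting shared data with a mutex, using Python's standard threading module. The Counter class, the run function, and the numbers of workers and iterations are assumptions chosen only for illustration.

```python
import threading


class Counter:
    """A counter shared among threads and guarded by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run this block.
        with self._lock:
            self.value += 1


def run(workers=4, per_worker=1000):
    """Start several threads that all increment the same counter."""
    counter = Counter()

    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every thread has finished
    return counter.value


print(run())  # 4000
```

The lock turns the read, add, and write of the increment into one step that threads cannot interleave, so no update is lost.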

Furthermore, when data comes from the network, we sometimes cannot guarantee when we will receive it. This is where futures and promises come in.
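Here is a small sketch of a future with Python's concurrent.futures module. The fetch function only sleeps to stand in for a slow network call; it, the delay, and the payload are assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(delay, payload):
    """Stand-in for a network request that answers after `delay` seconds."""
    time.sleep(delay)
    return payload


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a future: a placeholder for the result.
    future = pool.submit(fetch, 0.05, "response")
    # ... the program is free to do other work here ...
    result = future.result()  # blocks only now, until the data has arrived

print(result)  # response
```

The future lets the code say what should happen with the data without knowing when the data will actually arrive.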

Do one thing and do it well

DOTADIW is part of the UNIX philosophy, and it is the idea I like the most. Instead of complicating the solution, focus on one thing and make sure it works. Though a program like BusyBox does various things, it actually consists of various simple tools. Therefore, inside-out is better than outside-in: do the planning from the top down; implement the solution from the bottom up.

Microservices

Similarly, the microservices architecture (MSA, aka microservices) is a way of designing the solution. Everything is small, and each service has its own data storage or database, its own logic, and its own functions to be called.

Isolation

One more thing: isolation. Isolation is important because each component should not need to know how the others work, or call their internal methods. Each component only needs to know the inputs and outputs, and how to communicate with the others.

KISS principle

Keep it simple, stupid! Yes, just keep everything as simple as possible. Do not complicate the problem, and do not complicate your solution either.

My summary of KISS: everything should be small. Functions should be small, objects should be small, and even tests should be small. If something is big, it is complex, and our brain cannot remember and digest it. If it is big, just break it into smaller and smaller pieces, until every piece is small and does its task well. And it is just one task. Thus, the rule of thumb is very human and easy.

RTFM

Yes. Read the f*cking manual. This is a self-learning habit. Some people like to ask questions without thinking or preparing. They do not go through the documentation or manual, read enough examples, or do more testing. They just drop their questions and wait for others to solve the problem for them. Even if they do want to ask, they should first trim the problem down to be as small as possible, and then ask the question. If you don’t RTFM, people will tell you to RTFM!

String and text

Graphical is not my way. HTML is a string, XML is a string, JSON is a string, the HTTP protocol involves strings, and commands are strings. As the UNIX philosophy says, “Write programs to handle text streams, because that is a universal interface” (Salus, n.d.). Similarly, the Tcl language treats every operation as a command. Moreover, your source code is text. It is text that carries the semantics; it is text that has the grammar.

Regular expression and parser

Regular expressions are a common feature among programming languages. The most common regular expression syntax comes from Perl; a similar syntax is used by JavaScript and, through its ECMAScript grammar, even C++11. However, one needs to know that a regular expression is not a parser. To parse a file that has a syntax, it needs to be parsed into a parse tree. Regular expressions are suitable for regular languages only. Formats like XML or JSON allow arbitrary nesting, and a regular expression cannot handle that recursion. That is why XML parsers and JSON parsers are what we actually need when we are doing the parsing.
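The difference can be seen in a small Python sketch. The JSON text and the pattern are made up for the example; the pattern is a deliberately naive attempt to find objects with a regular expression.

```python
import json
import re

# Nested JSON (sample data for illustration).
text = '{"user": {"name": "Ann", "tags": ["a", {"deep": true}]}}'

# A naive pattern: an opening brace, anything that is not a brace, a closing brace.
flat_object = re.compile(r"\{[^{}]*\}")
matches = flat_object.findall(text)
print(matches)  # ['{"deep": true}']  (only the innermost object is found)

# A real parser builds the whole nested structure.
data = json.loads(text)
print(data["user"]["tags"][1]["deep"])  # True
```

The pattern can only see objects that contain no other object, while the parser recovers every level of nesting.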

Logging

With strings and text as a universal interface, logging is important because it can give developers sufficient information to diagnose and debug. By using logs appropriately, developers can analyse the state of the system through the log and find bugs easily, without thousands of rounds of trial and error.
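A minimal sketch with Python's standard logging module. The logger name "orders" and the divide function are hypothetical, and the log is captured in memory only to keep the example self-contained; a real program would more likely log to a file or to standard error.

```python
import io
import logging

stream = io.StringIO()  # in-memory log target, used here instead of a file
logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # keep these messages out of the root logger


def divide(a, b):
    """Divide a by b, recording what happened in the log."""
    logger.debug("divide called with a=%s b=%s", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        logger.error("division by zero, a=%s", a)
        return None


divide(6, 3)
divide(1, 0)
log_text = stream.getvalue()
print(log_text)
```

Reading the log afterwards shows which calls were made and which one failed, without re-running the program under a debugger.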

Pipeline

A pipeline is a chain of data-processing steps that turns an input into an output. This is a very interesting solution, and it simplifies the process. For example, UNIX frequently uses pipelines to turn an input into a specific output, like

cat myfile.txt | sort | gzip > sorted.txt.gz

Similarly, technologies like DirectShow and GStreamer also use a pipeline design, in which the data from the input source passes through filters to produce the target output.

Therefore, instead of using object-oriented inheritance, we can also process the data in a linear way.
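The same linear style can be sketched in Python by chaining small functions, loosely mirroring the read, sort, and compress stages of the shell command. The function names and the sample text are assumptions for illustration.

```python
import gzip


def read_lines(text):
    """Source stage: produce the lines one by one."""
    for line in text.splitlines():
        yield line


def sort_lines(lines):
    """Filter stage: emit the lines in sorted order."""
    yield from sorted(lines)


def compress(lines):
    """Sink stage: join the lines and gzip them into bytes."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


source = "pear\napple\nmango"
packed = compress(sort_lines(read_lines(source)))

print(gzip.decompress(packed).decode("utf-8"))
```

Each stage knows nothing about the others; it only takes what comes in and passes its result on.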

GUI

Though graphical is not my way, a GUI is still important for simplifying my work. You cannot use your mouse to write text. But neither can you easily use your keyboard to perform a drag-and-drop.

So, besides learning HTML as a web UI, learning a widget toolkit like Qt or GTK+ lets you understand GUIs beyond the web.

Error handling and exception handling

Our code may not cover all the possibilities, but it should cover as many as possible, so that it can cope when something goes wrong. We need to take these possibilities into account and decide how to handle them: pop up an error message, log it, or deal with it!

TDD

Test-driven development (TDD) and behaviour-driven development (BDD) are modern development methods, especially in web development. In dynamically typed languages like JavaScript, PHP, Ruby, Python, and Perl, runtime errors sometimes occur because an unexpected type of input is passed to a function. Therefore, to make the product more stable, proper testing is needed.

TDD is a development method, not a programming paradigm. We gather the system requirements, then write the test cases with the expected results, and only then start implementing the code. First, run the tests before the feature exists to make sure they fail; this identifies the problem. Next, write the code that makes them pass. Finally, refactor it.
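A minimal sketch of this cycle with Python's unittest module. The slugify function is a hypothetical requirement I invented for the example: in TDD the two tests would be written first and seen to fail, and the function body written afterwards to make them pass.

```python
import unittest


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug (hypothetical feature)."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    """These tests describe the requirement and would be written before slugify."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Art of Programming"), "art-of-programming")

    def test_collapses_extra_spaces(self):
        self.assertEqual(slugify("  Keep   it simple "), "keep-it-simple")


# Run the suite programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once both tests pass, the body of slugify can be refactored freely, and the suite is simply run again to confirm nothing broke.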

This differs from the conventional development method. Conventionally, we implement the solution and then test it. But this can make testing difficult, because the code was not written with testing in mind!

Therefore, the code needs to be designed for testing, so that, with the tests in place, your solution is stable and robust against errors.

Furthermore, because a web solution has a large scope, involving users, authentication, logging, email, admin, etc., solving one issue may produce another. That is why running a test suite lets you find errors before production. Otherwise, delivering buggy products will bring you non-technical problems.

Refactoring

Refactoring should be paired with tests, because only with sufficient test cases can we be confident that refactoring does not break existing functionality.

MVC

In web development, Model-View-Controller (MVC) makes development much easier. Object-oriented frameworks let you inherit from base classes to set up the routes and controllers. The controller works as an API that interacts with the model and the view. You then design your solution and database structure in the model. With an ORM (object-relational mapping), you need not understand the low-level data structure or use SQL directly. This lets you port your model to any database in theory (though in practice not every database implements the same standard). Finally, you present your solution through the view.

The most interesting part of MVC is the separation of model and view. This allows the designer and the software developer to work separately.

Similarly, MVVM (model-view-viewmodel) is a more recent architecture associated with many client-side JavaScript frameworks, such as AngularJS and Backbone. The view model (VM) processes the model for the view.

This is not limited to web development: widget toolkits like GTK+ and Qt also use a model and view design, especially for the tree widget. The view is updated according to the model, which removes the need to synchronise the data and the output by hand throughout the application.

When dealing with the view, we must know about the template engine. A template should contain minimal logic. All the logic should happen in the model or the controller.
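As a small sketch of keeping logic out of the template, here is an example with Python's built-in string.Template, which only substitutes values. The render_inbox function plays the role of the controller; it and the message text are assumptions for illustration.

```python
from string import Template

# The template only has placeholders; it makes no decisions.
page = Template("Hello, $name! You have $count new $noun.")


def render_inbox(name, messages):
    """Controller side: do the counting and the plural decision here."""
    count = len(messages)
    noun = "message" if count == 1 else "messages"
    return page.substitute(name=name, count=count, noun=noun)


print(render_inbox("Ann", ["hi"]))  # Hello, Ann! You have 1 new message.
```

The designer can change the wording of the template without touching the counting logic, and the developer can change the logic without touching the template.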

Vector, matrix, and tensor

Matrices are essential to computer graphics; vectors are essential to game physics; tensors are essential to machine learning.

Game development

I believe most males like to play computer games. Those who play computer games must have their dream games, and so developing a game becomes a dream. Yeah, game development requires you to learn physics, computer graphics, multimedia, AI, etc.

Random

In game development and AI, randomness is a very powerful factor. A random generator can produce a uniform distribution. We cannot simply guess the outcome, because every event has the same probability.

It is used in machine learning for weight initialization, and in genetic algorithms for selection, crossover, and mutation. It is used in various games.
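A short sketch with Python's random module shows both ideas: dice rolls that come out roughly uniform, and a mutation step of the kind used in a genetic algorithm. The seed, the number of rolls, and the mutate function are assumptions chosen for the example.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed (arbitrary choice) so runs are repeatable

# Roll a six-sided die many times; every face has the same probability.
rolls = [rng.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)
print(sorted(counts.items()))  # each face appears roughly 1000 times


def mutate(genes, rate, rng):
    """Flip each bit of a gene list with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in genes]


child = mutate([0, 1, 0, 1, 1, 0], 0.2, rng)
```

No single roll or flipped bit can be predicted, yet over many trials the counts settle near the same value for every face.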

I strongly believe that it is the crucial factor in artificial creativity.

GPGPU

If you need fast computation for advanced computing such as machine learning or AI research, then you need GPGPU. The GPU is not just for gaming; it can be used for general-purpose computing!

VCS (version control system)

Just as a text editor or an IDE is a requirement for development, a VCS is also a must. The most popular one is Git. There are also Bazaar, Mercurial, SVN, etc. The best thing about a VCS is that it tracks code changes. By using a VCS, we need not back up our code in another file or copy it somewhere else; the VCS tracks all the changes. So remove all unnecessary code to keep the code clean and easy to read, so that it is easy to maintain.

Technical debt

Technical debt is something like code you put there and never cleaned up. It reduces your coding performance, because the code is difficult to read and maintain: it carries too much unnecessary information, whether comments, unused functions, or duplicated code. If we do not clean up our code or practise “Clean Code”, we accumulate more and more unwanted code. That is why we should clean up the code to reduce the debt; any change is kept by the VCS (version control system) anyway.

Summary

Software development is not just writing code; it involves the whole development process, from planning to disposal. Development should be planned by priority, in small increments through iterations. Therefore, a first design of the solution should be created quickly as the priority, then improved through iterations.

On the other hand, our code should have these properties: extensible, testable, functional, readable, reusable, hackable, and reasonable.

Extensible is the concept from OOP: classes can be inherited and made more specific.

Testable means the code should be designed for unit testing.

Functional means reducing unnecessary access to data outside the scope, especially the global scope. However, this characteristic contradicts OOP design, where the properties are accessible by any method.

Readable is important. Just name the variables properly and reduce the use of abbreviations. But for a small generic function, short variable names are acceptable.

Reusable means the functions and the objects can be reused.

Hackable means it is possible to inject functions or variables.

Reasonable is the most important. Function names should be reasonable, to avoid confusion and ambiguity.