C++ future


Recently updating my hobby project Med, memory editor for Linux, still under heavy development with various bugs.

In this project, I use several C++1x features (compiled with C++14 standard). Most recent notable feature is multi-threading scanning. In memory scanning, scan through the accessible memory blocks sequentially is slow. Therefore, I need to scan the memory blocks in parallel. To implement this, I have to create multiple threads to scan through the memory blocks.

How many threads I need? I make it into variable n, default to 4. Meaning, when scanning is started, the n threads will start scanning asynchronously. When one of the thread finish scanning, the next (n+1) thread will start scanning the next (n+1) memory block, until the end.

I design the solution top-down, and implement it bottom-up. In order to design the solution for the requirement above, I created a ThreadManager (header file here). So the ThreadManager basically will let me queue the tasks that I am going to launch in parallel with n threads. After queuing all the tasks, I just need to start, then they will run in parallel as multi-threading. This is what ThreadManager is doing. If mutex is needed, it is the task need to handle with, not the ThreadManager to handle. ThreadManager just make sure the tasks are run in parallel.

This is the simple test that uses the ThreadManager.

Technically, there are several important C++ standard libraries used, vector, functional, future, mutex, and condition_variable. Vector is an STL that allows me to store a list of items as vector (just like array or list).

Since C++11, it supports lambda expression. Then using functional, I can use std::function template to create any function object.

std::function<void()> fn = []() {
  for (int i = 0; i < 4; i++) {
    this_thread::sleep_for(chrono::milliseconds(300));
    cout << "thread1: " << i << endl;
  }
};

The code above initialize a variable fn which stores an anonymous function. Previously, this can be done using callback function, which makes the code difficult to manage. By using std::function and std::vector, I can store all the anonymous functions to the vector.

Future¬†is a very interesting library. If we are familiar with JavaScript promise or C# async, then it is similar to these (futures and promises). Whenever a task is start, it will return a future. Because we don’t know when the task will be ended. We can also do something like using a loop to check the condition of a task whether is ended, but this will be over complicated. Because future will let you handle what should be done when the task is ended.

Using future, I need not to create thread directly (though it is called ThreadManager). I can use async function to run the callback function asynchronously. It is the async that returns future. And this async function allows lambda expression as function argument. Great C++11.

C++11 supports mutex (mutual exclusion) and condition variable. Mutex can prevent race condition. When we are using multi-threading, most of the time the threads are using some shared resource. Read the empty data may crash the program. Therefore, we need to make sure when reading or writing, the resource is not accessible by other threads. This can be done by locking the mutex, so that other threads cannot continue. Then after the operation, unlock the mutex, and the other threads can lock the mutex and continue. Hence, only a single thread can access the resource.

Condition variable is used together with mutex. We can use condition variable to wait if a condition is fulfilled. When a wait is performed, the mutex will be locked (by unique lock). As a result, the thread will be blocked and cannot continue. The thread will wait until the condition variable notifies the thread to perform a condition check. If it is fulfilled, then the mutex will be unlocked and the thread will continue.

In ThreadManager, my previous code uses a loop to check the condition, if the condition doesn’t allows to run the next thread, then it will sleep a while and check again. This method is wasting the CPU resources. Because it keeps checking the condition. By using condition variable and mutex, I can just stop the thread, until it is notified to continue.

Yeah. Modern C++ is cool!

Advertisements

Academic people should git and TeX


Mr Torvalds created two amazing things: Linux and Git. Former is an OS kernel; latter is a version control system. Unluckily, none is prevailing in Malaysia.

When I was a lecturer, creating a new programme with various courses is truly exhaustive. The worst case is recording the changes of the documents for the government agency’s accreditation. If you are systematic, you will backup the files. But backing up the files does not tell you what are the changes you had made. Unless you create another note for each changes you made. But that will be double works. If you say you can use Microsoft Word’s feature to compare the documents and see the changes, it is totally impractical if the two documents are big and there have a vast changes.

What is the best solution? In practice, you need to ask your boss to step down and change all your colleagues ūüėČ, because your boss doesn’t understand your solution, he and your colleagues will treat you as idiot.

In the condition above, the best solution is using TeX and Git. TeX allows you to create your document in plain text. Plain text is so important for Git. Git allows you to keep track the changes you have made, and you can see the difference of the changes line by line in text format.

Git allows you to work collaboratively with your team members to work on same project by preventing the conflicts. Preventing conflicts doesn’t mean there will have no conflict, but you will detect the conflicts earlier and need to resolve the conflicts manually.

TeX, unlike WYSIWYG application software, such as Microsoft Word or LibreOffice Writer (I don’t think Malaysians use LibreOffice), we create the document using markup language and setup the paper style through some complex TeX statements. Though the setup may be exhaustive and TeX has a steep learning curve, the results can sustain for long-term. The document style can be re-used for the whole institution, especially if the students are provided with the thesis format in TeX form. Moreover, the TeX skill is useful to publish papers to the journals and conferences. You can easily port your content to another TeX style such as IEEE conference paper style.

Sadly, most people take easy learning tools to do the complex tasks, yet feel proud and not open minded to learn useful skills.

An academic institution should offer training to the staff, lecturers, and students to learn TeX. It would be even better to offer Git training, Linux and command-line. Open source software can reduce students financial burden, avoid pirated software, and prevent virus infection.

However these are not implemented in most academic institutions. As a result, the users are spending hours to edit the document style, generating the table of contents, or even preparing table of contents manually. That is sad to create table of contents manually. Any change on the page, you will have to edit the table of contents. However, if you are using TeX, you will focus on the content, instead of the styling.

Lastly, because the culture focuses on the outlook such as styling of the document, instead of the quality of the contents, that is why implementing Git and TeX is just an unrealistic approach. Great Microsoft Word, you are a legend.

PHP programming


PHP was a great programming language in web development. It surpasses the VBscript for ASP and Perl for CGI. It is favoured because of the syntax based C and C++. It supports procedural programming paradigm and object-oriented paradigm. A lot of functions resemble C functions such as printf, fprintf, sprintf, fopen, etc. Similarly, it can work directly to the C library such as expat, zlib, libxml2, etc. A lot of great content management systems (CMS) are written in PHP, such as Drupal, WordPress, Joomla, etc.

However, a lot of new programming language emerges and surpassing it.

Taken from http://skillprogramming.com/top-rated/php-best-practices-1234

Array is passed by value

Because PHP syntax is very similar to C and C++, it can use “&” reference operator and pass the parameter by reference in a function. But this will be very different from other languages such Python and JavaScript. Python and JavaScript function parameters are passed by value for all primitive data types, such as integer, float, string, and boolean; complex data type like object and array are passed by reference, meaning they are mutable, including Date object in JavaScript.

function change_array($arr) {
    $arr[0] = 100;
}

function change_object($obj) {
    $obj->value = 100;
}

function change_many_objects($arr) {
    $arr[0]->value = 100;
}

function change_object_array($obj) {
    $obj->array[0] = 100;
}

class MyObj {
    var $value;
    var $array;
}

function main() {
    $arr = [1, 2, 3];
    $obj = new MyObj();
    $obj->value = 10;

    change_array($arr);
    change_object($obj);

    echo $arr[0], "\n"; // still 1, not changing
    echo $obj->value, "\n"; // changed to 100

    $arr_obj = [ new MyObj(), new MyObj(), new MyObj() ];
    $arr_obj[0]->value = 10;
    change_many_objects($arr_obj);
    echo $arr_obj[0]->value, "\n"; // changed to 100

    $obj_arr = new MyObj();
    $obj_arr->array = [1, 2, 3];
    change_object_array($obj_arr);
    echo $obj_arr->array[0], "\n"; // changed to 100

    $obj_a = new MyObj();
    $obj_a->value = 10;
    $obj_b = $obj_a;
    $obj_b->value = 20;
    echo $obj_a->value, "\n"; // 20
    echo $obj_b->value, "\n"; // 20

    $obj_c = &$obj_a;
    $obj_c->value = 30;
    echo $obj_a->value, "\n"; // 30
    echo $obj_b->value, "\n"; // 30
    echo $obj_c->value, "\n"; // 30
}

main();

In the example above, the function change_array() will not modify the array that being passed, this is because it is passed by value. Unless we use the “&” reference operator.

The function change_object() will change the object that being passed.

One of the key-points of PHP 5 OOP that is often mentioned is that “objects are passed by references by default”. This is not completely true. […]

(from PHP manual)

So, basically, the function parameters are passed by value, even though it is an array. But the object will be dealt differently. We can treat it as a pointer, if you are familiar with C++ “new” operator. In C++, “new” operator will create an instance and return a pointer to the instance. If we understand this concept, then this is how it works in PHP (according to what I know).

Consequently, the function change_many_objects() though the argument is for an array, and an array is passed into it, but the function changes the value of the object within the array. This is because the array stores the pointer to the object instances. The function does change the instance is pointed by the “pointers” stored in the array.

In summary, PHP deals array as value, which is different from Python, JavaScript, and even C and C++. However, PHP deals object as pointer, that is why object is mutable when it is passed to a function.

Other limitations

PHP was created before the technology of RESTful API. PHP focuses on the GET and POST, but not like PUT and DELETE. The HTTP methods are server dependent. As a result, the HTTP server such as Apache requires some extra configurations for PHP to work. Unlike Node and Ruby on Rails, Node itself has a HTTP module; Ruby on Rails has WEBrick HTTP server.

Comparing to the language like Python, Node with JavaScript, Ruby, Lua, it lacks of REPL (read-eval-print loop). Interactive shell is different from REPL. With REPL, the function to print the result in the console is omitted. REPL will print the result whenever the function returns value.

Closure


In JavaScript, we can create closure like this,

var foo = (() => {
  let x = 0;
  return () => {
    return x++;
  }
})();

for (var i = 0; i < 10; i++) {
  var x = foo();
  console.log(x);
}

But translating above code into C++, it cannot work as expected for the variable, unless the variable is outside the lambda function.

// In a function
int x; // variable here
auto foo = [&x]() {
  x = 0;
  return [&x]() {
    return x++;
  };
}();
for (int i = 0; i < 10; i++) {
  int x = foo();
  cout << x << endl;
}

This is because C++ variable can be only accessed in the function scope. After the function return the value, the variable is not accessible anymore.

Similarly, Python can also return the function, but it cannot work as closure like JavaScript. However, Python can create non-primitive variable such as object or list so that the variable will be accessed by reference.

def _foo():
    x = [0]
    def increment():
        x[0] += 1
        return x[0]
    return increment

foo = _foo()
for i in range(10):
    x = foo()
    print(x)

Prayer Clock GTK3


My first open source project, Prayer Clock, I moved from SourceForge to GitHub recently. Yeah! Everyone should git!!!

And today I just made some changes, and updated to GTK3.

With GTK 3, I removed the title bar. But not yet successfully moving the menu bar to the icon like Evince or Nautilus.

I plan to convert the right hand panel to WebKitGtk. But this will not be the priority yet.

CSV to something


Just revived my old script into a project in GitHub, csv_to_something. An old simple script that was created to manage some student data. Because the students data were collected through Google Forms, so I convert it to CSV, then from CSV to SQLite. So that I can use the SQL to query whatever data I need.

Using SQLite software such as sqliteman and sqlitebrowser, I can create any new table, grouping, sorting etc.

Then recently, I need some JSON data for the table. So, I revive the script to convert CSV to JSON format. With the functional programming characteristic in JavaScript, manipulating the data becomes more interesting. Using Node or any JavaScript interpreter, we can perform map and reduce. Other useful functions like find and filter can get what I want. Example,

// Read the JSON data as object
let students;
fs.readFile('./table.json', (err, content) => {
  if (err) throw err;
  students = JSON.parse(content);
});
let average = students.reduce((reduced, item) => item.score, 0) / data.length;
let grades = students.map(student => {
  if (student.score >= 90) return 'A';
  else if (student.score < 90 && student.score >= 70) return 'B';
  else return 'Something else';
});

Why SQLite and JSON? Because they are extremely lightweight.

Monty Hall problem and frog riddle


Monty Hall paradox

Probability topic is the fundamental concept of the statistics. And machine learning is closely related to statistics. That is why, understand the probability very important if you are doing research, statistics, and machine learning.

Monty Hall is a very interesting problem. It says, if you are given 3 doors to choose. One of them contains a car (which you want), the other two are goats (which you don’t want). After you made your choice, before opening the door, the host will open the door that you didn’t choose yet contains the goat (he knows which door has the goat). Now, if you are given an opportunity to change your choice to another door (which you didn’t choose earlier), are you going to change?

In the first glance, you will feel that whatever you choose, the probability is always 1/3. However, the conditional probability tells you that, if you always make the switch after the host opened the door that has a goat, your probability to win the car will increase to 2/3. What??

In order to prove this, I wrote a Python script.

#!/usr/bin/env python
# This is simulating Monty Hall Paradox

import random


def monty(switch):
    # random shuffle
    doors = [0, 0, 1]  # one of the door contains the car
    random.shuffle(doors)

    openDoor = None

    # choose the first door (not open)
    # if the first door is 1, randomly open the other
    if doors[0] == 1:
        # open the door
        openDoor = random.randint(1, 2)
    else:  # open the door that contains goat
        if doors[1] == 1:
            openDoor = 2
        else:
            openDoor = 1

    # now open the last door
    if not switch:
        return doors[0]
    else:
        if openDoor == 2:
            return doors[1]
        else:
            return doors[2]


def main():
    total = 10000
    car = 0
    for i in range(total):
        car += monty(True)

    print("Always switch the door. Total: {}, car: {}. P = {}".format(total, car, car / total))

    car = 0
    for i in range(total):
        car += monty(False)

    print("No switch the door. Total: {}, car: {}. P = {}".format(total, car, car / total))


main()

Run the code, you will always get the probabilty close to 0.6667 if you always switch the door.

Always switch the door. Total: 10000, car: 6625. P = 0.6625
No switch the door. Total: 10000, car: 3309. P = 0.3309

Frog riddle

Recently I just watched a Youtube about frog riddle.

It also mentions about the conditional probability. Interestingly, quite a lot of comments mentioned that the author is wrong.

In order to prove that the author is correct, I wrote another Python script.

#!/usr/bin/env python

import random

# Frog 0 for female, 1 for male


def create_frog():
    return random.randint(0, 1)


def has_croak(pairs):  # also male
    return 1 in pairs


def has_female(frogs):
    return 0 in frogs


def choose_without_croak(choose_two):
    frogs = [create_frog() for i in range(3)]
    # first frog at the right side
    # second and third at the left side

    if choose_two:
        return has_female(frogs[1:])  # choose two frogs

    return has_female(frogs[0:1])


def main():
    total = 10000
    correct = 0
    for i in range(total):
        correct += choose_without_croak(True)
    print('Just choose two frogs. Total: {}, correct: {}. P = {}'.format(total, correct, correct / total))

    correct = 0
    for i in range(total):
        correct += choose_without_croak(False)
    print('Just choose one frog. Total: {}, correct: {}. P = {}'.format(total, correct, correct / total))


# The exact question is,
# "What is the probability of the frogs in the pair has female,
# given that one of them is male?"
def exact_calculation():
    total = 10000
    croak = 0
    correct = 0
    for i in range(total):
        frogs = [create_frog() for i in range(3)]
        if has_croak(frogs[1:]):
            croak += 1
            if has_female(frogs[1:]):
                correct += 1
    print('Total croak: {}, correct: {}. P = {}'.format(croak, correct, correct / croak))


main()
exact_calculation()

Running the script, you will get

Just choose two frogs. Total: 10000, correct: 7498. P = 0.7498
Just choose one frog. Total: 10000, correct: 4974. P = 0.4974
Total croak: 7474, correct: 4998. P = 0.6687182231736687

Based on the result, if you choose two frogs, the probability of survive is close to 0.75. If you choose one frog, the probability is 0.5.

Now, the tricky part is the probability 0.67 mentioned in¬†the video. The question should be “What is the probability of the frogs in the pair has female,¬†given that one of them is male?”

So, based on the question, my similuation needs to get the total count of the male (that has croak), and within these pairs, count the female frogs.

To convert this into mathematical expression,

P(\text{female frog}) = 0.5

P(\text{at least one male frog}) = 0.75

P(\text{female frog} | \text{at least one male frog}) = \frac{0.5}{0.75} = 0.6667

Then, based on the simulation and calculation, you will get the 0.6667.