Monty Hall problem and frog riddle

Monty Hall paradox

Probability is a fundamental concept of statistics, and machine learning is closely related to statistics. That is why understanding probability is very important if you are doing research, statistics, or machine learning.

The Monty Hall problem is a very interesting one. You are given 3 doors to choose from. One of them hides a car (which you want), the other two hide goats (which you don’t want). After you have made your choice, and before your door is opened, the host opens one of the doors that you didn’t choose and that hides a goat (he knows where the goats are). Now, if you are given the opportunity to change your choice to the other closed door, are you going to change?

At first glance, you will feel that whatever you choose, the probability is always 1/3. However, conditional probability tells you that if you always switch after the host has opened a door with a goat, your probability of winning the car increases to 2/3. What??

To check this, I wrote a Python script.

#!/usr/bin/env python
# This is simulating Monty Hall Paradox

import random


def monty(switch):
    # random shuffle
    doors = [0, 0, 1]  # one of the door contains the car
    random.shuffle(doors)

    openDoor = None

    # choose the first door (not open)
    # if the first door is 1, randomly open the other
    if doors[0] == 1:
        # open the door
        openDoor = random.randint(1, 2)
    else:  # open the door that contains goat
        if doors[1] == 1:
            openDoor = 2
        else:
            openDoor = 1

    # now open the last door
    if not switch:
        return doors[0]
    else:
        if openDoor == 2:
            return doors[1]
        else:
            return doors[2]


def main():
    total = 10000
    car = 0
    for i in range(total):
        car += monty(True)

    print("Always switch the door. Total: {}, car: {}. P = {}".format(total, car, car / total))

    car = 0
    for i in range(total):
        car += monty(False)

    print("No switch the door. Total: {}, car: {}. P = {}".format(total, car, car / total))


main()

Run the code, and you will get a probability close to 0.6667 if you always switch the door.

Always switch the door. Total: 10000, car: 6625. P = 0.6625
No switch the door. Total: 10000, car: 3309. P = 0.3309
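The simulation above can be cross-checked with an exact calculation. The following is only a sketch of mine, not part of the original script: `exact_win_probability` is a made-up name, and it assumes the host chooses uniformly at random when he has two goat doors available. It enumerates every equally likely case with exact fractions.

```python
from fractions import Fraction
from itertools import product


def exact_win_probability(switch):
    """Exact chance of winning the car, enumerating every equally likely case."""
    wins = Fraction(0)
    cases = list(product(range(3), range(3)))  # (car position, first pick)
    for car, pick in cases:
        # Doors the host may open: neither the player's pick nor the car.
        openable = [d for d in range(3) if d != pick and d != car]
        for opened in openable:
            # Assumption: the host picks uniformly among the doors he may open.
            weight = Fraction(1, len(cases) * len(openable))
            if switch:
                final = next(d for d in range(3) if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                wins += weight
    return wins


print(exact_win_probability(True))   # 2/3
print(exact_win_probability(False))  # 1/3
```

Switching wins in exactly the cases where the first pick was a goat, which happens 2 times out of 3.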

Frog riddle

Recently I watched a YouTube video about the frog riddle.

It is also about conditional probability. Interestingly, quite a lot of comments claim that the author is wrong.

To check that the author is correct, I wrote another Python script.

#!/usr/bin/env python

import random

# Frog 0 for female, 1 for male


def create_frog():
    return random.randint(0, 1)


def has_croak(pairs):  # also male
    return 1 in pairs


def has_female(frogs):
    return 0 in frogs


def choose_without_croak(choose_two):
    frogs = [create_frog() for i in range(3)]
    # first frog at the right side
    # second and third at the left side

    if choose_two:
        return has_female(frogs[1:])  # choose two frogs

    return has_female(frogs[0:1])


def main():
    total = 10000
    correct = 0
    for i in range(total):
        correct += choose_without_croak(True)
    print('Just choose two frogs. Total: {}, correct: {}. P = {}'.format(total, correct, correct / total))

    correct = 0
    for i in range(total):
        correct += choose_without_croak(False)
    print('Just choose one frog. Total: {}, correct: {}. P = {}'.format(total, correct, correct / total))


# The exact question is,
# "What is the probability of the frogs in the pair has female,
# given that one of them is male?"
def exact_calculation():
    total = 10000
    croak = 0
    correct = 0
    for i in range(total):
        frogs = [create_frog() for i in range(3)]
        if has_croak(frogs[1:]):
            croak += 1
            if has_female(frogs[1:]):
                correct += 1
    print('Total croak: {}, correct: {}. P = {}'.format(croak, correct, correct / croak))


main()
exact_calculation()

Running the script, you will get

Just choose two frogs. Total: 10000, correct: 7498. P = 0.7498
Just choose one frog. Total: 10000, correct: 4974. P = 0.4974
Total croak: 7474, correct: 4998. P = 0.6687182231736687

Based on the result, if you choose the two frogs, the probability of survival is close to 0.75. If you choose the single frog, the probability is 0.5.

Now, the tricky part is the probability of 0.67 mentioned in the video. The question should be “What is the probability that the pair contains a female, given that one of them is male?”

So, based on the question, my simulation needs to count the pairs that contain a male (the ones that croak) and, within these pairs, count those that also contain a female.

To express this mathematically (the numerator is the probability that the pair contains both a female and a male),

P(\text{female and male in the pair}) = 0.5

P(\text{at least one male frog}) = 0.75

P(\text{female frog} | \text{at least one male frog}) = \frac{0.5}{0.75} = 0.6667

So the simulation and the calculation agree on 0.6667.
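The same numbers can be obtained without random sampling by listing the four equally likely pairs. This is a small sketch of mine; the variable names are made up, and it keeps the 0/1 encoding of the script above.

```python
from fractions import Fraction
from itertools import product

FEMALE, MALE = 0, 1  # same encoding as the simulation script

pairs = list(product((FEMALE, MALE), repeat=2))    # 4 equally likely pairs
with_male = [p for p in pairs if MALE in p]        # a croak was heard
with_both = [p for p in with_male if FEMALE in p]  # ... and a female is present

p_male = Fraction(len(with_male), len(pairs))
p_both = Fraction(len(with_both), len(pairs))
print(p_male, p_both, p_both / p_male)  # 3/4 1/2 2/3
```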

Switching display/monitor/screen in Linux

I am using Openbox (a window manager), and the laptop’s Fn+F8 (or whatever combination with Fn) doesn’t work properly on Linux, because the combination is detected as Super+p (aka Win+p). As a result, I wrote a Perl script to solve the display/monitor/screen switching issue on my laptop.

#!/usr/bin/perl

# This script requires xrandr, and several bash script created by arandr

use strict;
use warnings;

# Edit these global variables based on your setting
my $primary = 'eDP1';
my $secondary = 'HDMI1';

my %scripts = (default => 'default.sh',
               external_only => 'large_only.sh',
               clone => 'clone.sh',
               dual => 'dual.sh');
my $script_path = '~/.screenlayout';
# End edit

sub get_xrandr {
    return `xrandr`;
}

sub is_active {
    my ($monitor) = @_;
    for my $i (0 .. (scalar @$monitor - 1)) {
        my $line = $monitor->[$i];
        if ($line =~ /\*/) {
            return 1;
        }
    }
    return 0;
}

sub is_left {
    my ($monitor) = @_;
    my $line = $monitor->[0];
    if ($line =~ /\d+x\d+\+(\d+)\+\d+/) {
        if ($1 > 0) {
            return 0;
        }
    }
    return 1;
}

sub is_default {
    my ($primary, $secondary) = @_;
    return &is_active($primary) && !&is_active($secondary);
}

sub is_external_only {
    my ($primary, $secondary) = @_;
    return !&is_active($primary) && &is_active($secondary);
}

sub is_clone {
    my ($primary, $secondary) = @_;
    return &is_active($primary) && &is_active($secondary) &&
        &is_left($primary) && &is_left($secondary);
}

sub is_dual {
    my ($primary, $secondary) = @_;
    return &is_active($primary) && &is_active($secondary) &&
        &is_left($primary) && !&is_left($secondary);
}

sub get_monitor_style {
    my ($primary, $secondary) = &get_monitor_details;
    if (&is_default($primary, $secondary)) {
        return 'default';
    }
    elsif (&is_clone($primary, $secondary)) {
        return 'clone';
    }
    elsif (&is_dual($primary, $secondary)) {
        return 'dual';
    }
    elsif (&is_external_only($primary, $secondary)) {
        return 'external_only';
    }
    return 'unknown';
}

sub set_monitor_style {
    my ($style) = @_;
    my $script =  join('/', $script_path, $scripts{$style});
    my $cmd = "sh $script";
    `$cmd`;
}

sub switch_next_monitor_style {
    my $current_style = &get_monitor_style;
    if ($current_style eq 'default') {
        &set_monitor_style('external_only');
    }
    elsif ($current_style eq 'external_only') {
        &set_monitor_style('dual');
    }
    elsif ($current_style eq 'dual') {
        &set_monitor_style('clone');
    }
    elsif ($current_style eq 'clone') {
        &set_monitor_style('default');
    }
    else {
        print STDERR "Unknown monitor style\n";
    }
}

sub switch_prev_monitor_style {
    my $current_style = &get_monitor_style;
    if ($current_style eq 'default') {
        &set_monitor_style('clone');
    }
    elsif ($current_style eq 'external_only') {
        &set_monitor_style('default');
    }
    elsif ($current_style eq 'dual') {
        &set_monitor_style('external_only');
    }
    elsif ($current_style eq 'clone') {
        &set_monitor_style('dual');
    }
    else {
        print STDERR "Unknown monitor style\n";
    }
}

sub switch_monitor_style {
    my ($prev) = @_;
    if ($prev) {
        &switch_prev_monitor_style;
    }
    else {
        &switch_next_monitor_style;
    }
}

sub get_monitor_details {
    my $xrandr = &get_xrandr;
    my @lines = split(/\n/, $xrandr);

    my @primary_lines;
    my @secondary_lines;
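    my $current_block = '';  # initialised so "eq" below never sees undef
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        if ($i == 0) {
            next;  # skip the "Screen 0: ..." line (not "continue")
        }
        if ($line =~ /^\Q$primary\E\b/) {
            $current_block = 'primary';
        }
        elsif ($line =~ /^\Q$secondary\E\b/) {
            $current_block = 'secondary';
        }
        elsif ($line =~ /^\S/) {
            $current_block = '';  # header line of some other output
        }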
        if ($current_block eq 'primary') {
            push @primary_lines, $line;
        }
        elsif ($current_block eq 'secondary') {
            push @secondary_lines, $line;
        }
    }
    return (\@primary_lines, \@secondary_lines);
}

sub main {
    my ($prev) = @_;
    &switch_monitor_style($prev);
}

&main(@ARGV);

The script requires the “xrandr” command. Furthermore, you need the actual monitor-switching shell scripts, which can be created with ARandR. An example of such a script:

#!/bin/sh
xrandr --output HDMI1 --primary --mode 1920x1080 --pos 0x0 --rotate normal --output VIRTUAL1 --off --output eDP1 --off

So, my Perl script detects the existing screen setup: laptop only (“default”), external only (“external_only”), laptop with the external monitor on the right side (“dual”), or both monitors sharing the same screen (“clone”). Therefore, we need to create four shell scripts with ARandR for these settings.
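For readers who prefer Python, the detection idea can be sketched as below. This is not a port of the Perl script, only an illustration of the same two checks: an output is active if one of its mode lines carries the "*" marker, and it sits at the left if the x offset in its geometry is 0. The function names and the `SAMPLE` text are hypothetical; the sample is hand-written to resemble xrandr output, not captured from my laptop.

```python
import re


def parse_outputs(xrandr_text):
    """Group xrandr lines by output name (header lines start at column 0)."""
    blocks = {}
    current = None
    for line in xrandr_text.splitlines()[1:]:  # skip the "Screen 0" line
        if line and not line[0].isspace():
            current = line.split()[0]
            blocks[current] = []
        if current is not None:
            blocks[current].append(line)
    return blocks


def is_active(block):
    # xrandr marks the mode currently in use with "*".
    return any("*" in line for line in block)


def x_offset(block):
    # Header geometry looks like 1920x1080+1366+0; the first "+N" is the x offset.
    match = re.search(r"\d+x\d+\+(\d+)\+\d+", block[0])
    return int(match.group(1)) if match else 0


def monitor_style(xrandr_text, primary="eDP1", secondary="HDMI1"):
    blocks = parse_outputs(xrandr_text)
    first, second = blocks.get(primary, []), blocks.get(secondary, [])
    if is_active(first) and not is_active(second):
        return "default"
    if is_active(second) and not is_active(first):
        return "external_only"
    if is_active(first) and is_active(second) and x_offset(first) == 0:
        return "clone" if x_offset(second) == 0 else "dual"
    return "unknown"


# Hand-written sample (hypothetical values), laptop left and HDMI to its right.
SAMPLE = """\
Screen 0: minimum 8 x 8, current 3286 x 1080, maximum 32767 x 32767
eDP1 connected primary 1366x768+0+0 (normal left inverted right x axis y axis) 310mm x 170mm
   1366x768      60.00*+
HDMI1 connected 1920x1080+1366+0 (normal left inverted right x axis y axis) 510mm x 290mm
   1920x1080     60.00*+  50.00
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
"""
print(monitor_style(SAMPLE))  # dual
```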

To invoke the script,

perl /path/to/monitor_switch.pl

This will switch the screen to the “next” setting, in this order: default -> external_only -> dual -> clone -> default.

In order to switch quickly between default and external_only, I extended the script with an argument.

perl /path/to/monitor_switch.pl prev

When an argument (any argument) is passed, the monitor setup switches in the reverse order: default -> clone -> dual -> external_only -> default. This way, we can switch between default and external_only easily.

Next, just bind the script to your preferred key combination (aka hotkey or shortcut), and you can switch the screen with it.
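The two orders are really one cycle walked in opposite directions. The Perl script spells this out with if/elsif chains; a more compact way to express the same idea, shown here as a hypothetical Python sketch (`STYLES` and `next_style` are names I made up), is a list plus modular arithmetic.

```python
STYLES = ["default", "external_only", "dual", "clone"]


def next_style(current, reverse=False):
    """Return the layout that follows `current` in the cycle."""
    step = -1 if reverse else 1
    index = STYLES.index(current)  # raises ValueError for an unknown style
    return STYLES[(index + step) % len(STYLES)]


print(next_style("default"))                      # external_only
print(next_style("default", reverse=True))        # clone
print(next_style("external_only", reverse=True))  # default
```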

Yeay!

P/S: The reason I wrote this script is that when my screen is shown on the external monitor only and its power is cut, the display doesn’t switch back to the laptop automatically. That means I cannot see anything to change my display settings. Before the script was written, I blindly used the terminal: Ctrl+R, type the keyword, and press Enter to switch back. But this is extremely impractical.

ROC, AUC, WTF?

These few days I spent all my time trying to understand the ROC (receiver operating characteristic) curve. In machine learning, ROC is a very common way to evaluate prediction performance. The AUC (area under the curve) of the ROC summarises the prediction performance of a classifier in a single number.

If you wish to learn more, these two links are the best resources: here and here.

I searched through tutorials and Q&A sites on how to plot the ROC curve and calculate the AUC. The answers were telling me about “cut-off”, “threshold”, or some other weird terms. And some answers were telling me to use an R package to plot the graph. WTF? I knew all of these things. My question was, “How can I plot the ROC curve with my classifier?!”

So, there was a real gap between what I knew and the problem I faced. The gap was that the classifier I created could not be used to plot the ROC curve directly, because it is a discrete classifier. That is, it just predicts a label, then compares it with the actual label to see whether the prediction is right or wrong. Though I can refine this result into true positives, false positives, true negatives, and false negatives, it can only produce one point in the ROC space.
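To make the "one point" concrete, here is a minimal sketch. The function name `roc_point` and the toy labels are made up for illustration; labels are 0/1 with 1 as the positive class.

```python
def roc_point(actual, predicted):
    """Return (FPR, TPR) for 0/1 labels, where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)


# Toy data: 4 positives and 4 negatives, one mistake on each side.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
print(roc_point(actual, predicted))  # (0.25, 0.75)
```

However many samples are fed in, a fixed set of predicted labels collapses to this single (FPR, TPR) pair.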

In order to plot the curve, I need a threshold that can be manipulated to produce a sequence of TPRs (true positive rates) and FPRs (false positive rates). With a discrete classifier, I can never produce such a sequence. That was my real problem.

In order to solve this, the only solution is to transform my classifier into a ranking or scoring classifier. This is the crucial part that solves my problem. Now, the question is, what is the score here? I believe all classifiers, even discrete ones, involve some calculation to get some value, and only then predict whether the input should be labelled as true or false. So, that calculated value is the score! Even if your classifier does a random guess, the random value can be used as the score.

As a result, I need to discard the discrete classification when plotting the ROC curve. But the discrete classification is still used during training. So, for a given threshold, any input that produces a score greater than the threshold is labelled as positive. And from here, I just need to check whether it is a true positive or a false positive. By continuously adjusting the threshold, I can get a sequence of TPRs and FPRs. Yay. No more discrete classification, and the ROC curve is produced.
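The threshold sweep can be sketched in a few lines of plain Python. This is an illustration under my own assumptions, not the code of my classifier: `roc_curve` and `auc` are hypothetical helper names, the scores are toy values, a sample counts as positive when its score is greater than or equal to the threshold (so every distinct score can serve as a threshold), and the area is computed with the trapezoidal rule.

```python
def roc_curve(actual, scores):
    """Return the (FPR, TPR) points obtained by sweeping the threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: actual labels (1 = positive) and the score given to each sample.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_curve(actual, scores)
print(curve)       # starts at (0.0, 0.0) and ends at (1.0, 1.0)
print(auc(curve))  # about 0.889 (8/9)
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area of 8/9.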

P/S: Maybe this concept is just as simple as ABC, and that is why nobody cares to mention it online.

AAC file re-visit


In my previous post, I wrote about AAC and the ID3 tag. And I mentioned that

I have an AAC audio file (technically M4A) […]

I used Audacious previously, then changed to DeadBeef. The main reason I changed was that Audacious kept failing to play AAC audio files. What’s wrong? FFplay can play them, SMPlayer can play them, DeadBeef can play them, Clementine can play them, but Audacious cannot. Audacious has an AAC plugin, so it should support the AAC format.

But if I play the AAC file with Audacious, I will get this error,

Unknown playback error (check the console for detailed error information)
ERROR aac.cc:373 [play]: No valid frame header found.

I searched online and found that it is a file extension issue. So, what is it?

Using “exiftool”, I found that the MIME type is video/mp4. That is the fix!!

I renamed the AAC file to MP4. Now Audacious can play the file well.
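My guess is that the AAC plugin expects a raw AAC (ADTS) stream, while my file was really an MP4 container with the wrong extension. Without exiftool, this can be checked roughly by looking at the first bytes of the file. The sketch below is a heuristic of mine, not what Audacious or exiftool actually do; `sniff_audio_container` is a made-up name. It relies on the facts that an MP4/M4A file carries "ftyp" at bytes 4 to 7, an ID3v2 tag starts with "ID3", and an ADTS frame starts with the 12-bit sync word 0xFFF followed by layer bits 00.

```python
def sniff_audio_container(header):
    """Guess the container from the first bytes of a file (rough heuristic)."""
    if header[4:8] == b"ftyp":
        return "mp4"    # ISO base media file: MP4, M4A, ...
    if header[:3] == b"ID3":
        return "id3v2"  # an ID3v2 tag sits in front of the audio data
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xF6) == 0xF0:
        return "adts"   # raw AAC stream: sync word 0xFFF, layer bits 00
    return "unknown"


print(sniff_audio_container(b"\x00\x00\x00\x20ftypM4A "))  # mp4
print(sniff_audio_container(b"\xff\xf1\x50\x80"))          # adts
```

For a real file, the header can be read with `open(path, "rb").read(16)`.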

Now, when the file had the AAC extension, in order to edit the audio metadata such as author, title, and album, I used Kid3 to edit the APE tag instead of ID3v2, so that FFplay could play it. ID3v2 is actually for the MP3 format. If ID3v2 is used on an audio file with the AAC extension, FFmpeg cannot convert it, and FFplay and SMPlayer both fail to play the audio file properly.

But after I renamed the file to MP4, there was an issue. The MP4 metadata (I think it is also ID3v2) cannot be written using Kid3, due to the existence of the APE tag. (Kid3 doesn’t mention anything about this; I solved it by trial and error.)

  1. So, I need to name the file as AAC first and remove the APE tag using Kid3.
  2. Then, I rename it to MP4 and use Kid3 to edit the ID3v2 tag.

In conclusion, I am back to Audacious as my primary audio player. It is the best!

(Why not DeadBeef? DeadBeef lacks the feature to copy-paste songs from one playlist to another.)

AAC audio file and ID3 tag


I just found that if I have an AAC audio file (technically M4A) and I add the ID3 tag 2 (aka ID3v2), then the audio file will fail to be converted by ffmpeg.

It can still be converted to mp3 by either

  • using the DeadBeef audio player, or
  • removing the ID3v2 tag and then converting

So, how to add metadata like the ID3 tag? Use Kid3 and add Tag 3 (aka the APE tag). This will not affect how ffmpeg reads the file.

Dell Vostro 5459 hibernation


In the previous post (1 year ago), I mentioned the hibernation issue. I believed that it was related to the NVidia graphics card. The related forum thread can be found here.

But these few days, I noticed that whenever I shut down the laptop, it shows the systemd messages. Previously, if I suspended my laptop, resumed it, and then shut down, it showed only a blank black screen until the power went off. I believe that the graphics card issue has been fixed by a recent update.

I am now using linux-lts 4.9.13-1 and nvidia-dkms 378.13-2

A brief comparison of GTK+ and Qt


I used to like the C language, because it is the basics of programming, it is portable, and it is low-level. Writing a program in C is just like showing off your advanced programming skills: how you manage the memory, how you manage the pointers, how you create a linked list. However, in terms of development efficiency, C++ is much more powerful, because of its object orientation and syntax.

Because I like the C language, I chose GTK+ over Qt a long time ago. Not only that, I am also fond of GTK+ desktop environments like GNOME, Xfce4, LXDE, and Cinnamon, but not MATE. I feel that KDE is heavyweight.

One of my personal projects, Med, which was written with GTK+, had some issues with multi-threading. I believe that my engine part does not have any problem, so the suspicious part is GTK+. The program crashes without symptoms, and the crash is difficult to reproduce. Therefore, I decided to re-write the UI with Qt. If Qt also produces the same problem, it means my algorithm is problematic.

In my opinion, GTK+ is straightforward, because of its procedural paradigm. Therefore, it is easy to learn and implement. The UI can be designed with Glade. But I feel that it still lacks something, such as the button click callback function. Besides that, though there is gtkmm (the C++ interface of GTK+), a library like WebKitGTK+ does not have documentation for gtkmm (I believe it can still work). In summary, developing with GTK+ is slower, just as writing code in C is.

On the other hand, Qt is more complicated. To use it, it is better to use a qmake project file to generate the Makefile, or to use CMake, so that we can include the necessary headers and link with the necessary libraries. And we also need to use some macros. This makes me feel that the code is very Qt-oriented, not just plain C++. However, in terms of design, I feel that Qt Designer is much easier. Note that GTK+ and Qt layouts use different concepts.

Though GTK+ and Qt are both object-oriented, GTK+ does it at the library level while Qt does it at the language level. It is easier to write inherited classes in Qt, and to make changes at the language level.

In conclusion, Qt is better than GTK+ for development, just as C++ is better than C.