Rename files according to date

I recently wrote a Perl script that renames the files in a directory according to their date, in the format “YYYYMMDD ##”, where “##” is a running number.

Rationale

I often download photos using mobile apps like Weibo or Twitter, but the file names they produce are almost random. This made it hard for me to organize the photos on my computer.

Artists (or celebrities) usually share their photos as a set, so when I download these photos, the files should have their mtime (modification time) in the correct order.

I don’t need the file name to carry time precision like “HH:MM:SS”, though. The date followed by a running number is enough, and it is shorter.

Although a file browser can sort files by time, having to change the sort order just to browse the images is inconvenient. Furthermore, mtime can change, which defeats the purpose of sorting by it.

Lastly, a randomized file name is meaningless to me. Renaming the files according to the date is much more useful, in my opinion.

 

Script


#!/usr/bin/perl
use strict;
use warnings;
use POSIX 'strftime';
use File::Basename;

# Strip leading and trailing whitespace.
sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

# Return the entries of a directory, excluding "." and "..".
sub get_files_from_directory {
    my $path = $_[0];
    opendir my $dir, $path or die "Cannot open directory: $!";
    my @files = readdir($dir);
    @files = grep(!/^\.$|^\.\.$/, @files);
    closedir $dir;
    return @files;
}

# Group the file paths by the date (YYYYMMDD) of their mtime.
sub group_files_by_date {
    my ($dir, @files) = @_;
    my %groups = ();
    foreach my $file (@files) {
        my $timestamp = (stat "$dir/$file")[9];
        my $date = strftime("%Y%m%d", localtime($timestamp));
        push @{$groups{$date}}, "$dir/$file";    # the list is autovivified
    }
    return %groups;
}

# Within each date group, sort the files by mtime, oldest first.
sub sort_files_in_groups {
    my (%groups) = @_;
    foreach my $key (sort keys %groups) {
        my $files = $groups{$key};
        @{$groups{$key}} = sort { (stat $a)[9] <=> (stat $b)[9] } @{$files};
    }
    return %groups;
}

# Build "<dir><YYYYMMDD> <NN><ext>"; $dir already ends with a slash.
sub build_new_name {
    my ($date, $num, $dir, $ext) = @_;
    my $zero_num = sprintf("%02d", $num);
    return "$dir${date} ${zero_num}${ext}";
}

# Map every old path to its new path.
sub name_pairs {
    my (%groups) = @_;
    my %pairs;
    foreach my $key (keys %groups) {
        my $files = $groups{$key};
        for (my $i = 0; $i < scalar @{$files}; $i++) {
            my $file = $files->[$i];
            my ($basename, $dir, $ext) = fileparse($file, qr/\.[^.]*/);
            my $new_name = &build_new_name($key, $i + 1, $dir, $ext);
            $pairs{$file} = $new_name;
        }
    }
    return %pairs;
}

sub print_pairs {
    my (%pairs) = @_;
    foreach my $key (sort keys %pairs) {
        print "$key\t->\t$pairs{$key}\n";
    }
}

sub rename_files {
    my (%pairs) = @_;
    foreach my $key (keys %pairs) {
        rename $key, $pairs{$key} or warn "Cannot rename $key: $!\n";
    }
}

sub save_log {
    my (%pairs) = @_;
    my $file = 'rename_files_to_date.log';
    open(my $fh, '>', $file) or die "Cannot open $file: $!";
    foreach my $key (sort keys %pairs) {
        print $fh "$key\t->\t$pairs{$key}\n";
    }
    close $fh;
}

sub main {
    my @argv = @_;
    my $dir = $argv[0];
    die "Usage: $0 <directory>\n" unless defined $dir;
    my @files = &get_files_from_directory($dir);
    my %groups = &group_files_by_date($dir, @files);
    %groups = &sort_files_in_groups(%groups);
    my %pairs = &name_pairs(%groups);
    &print_pairs(%pairs);
    print "\nWARNING: Rename is irreversible. Recommend to make a backup.\n",
        "Confirm rename files? (y/N) ";
    my $choice = <STDIN>;
    print $choice;
    exit 0 unless trim($choice) eq 'y';
    &rename_files(%pairs);
    &save_log(%pairs);
    print "Done. Log file also saved.\n";
}

&main(@ARGV);

 

Why Perl?

In my opinion, Perl is less famous than Python these days. But I prefer Perl because it is available on most Linux distributions. For example, Perl is a base package of Arch Linux, so once I have installed Arch Linux, I can run Perl scripts immediately.

Python is great, but backward incompatibility sometimes causes issues that force me to maintain the script. With Perl, maintaining the script takes less effort.

 

WARNING! And usage

As the script mentioned,

Rename is irreversible. Recommend to make a backup.

The usage is,

./rename_files_to_date.pl ./target_dir

Where target_dir is the directory that contains the files you want to rename. It will not rename the files recursively.

PLEASE USE THE SCRIPT AT YOUR OWN RISK!!!

After the files are renamed, a log file is created, in case you want to revert the file names. (You have to do that manually, though.)

Inheritance and composition

With modern JavaScript's ES6 syntax and the rising popularity of libraries like ReactJS, functional programming is becoming more and more common. When using React, a common practice is to use composition instead of inheritance.

Because I started learning programming when OOP was the prevailing paradigm, I was trained to solve problems using OOP concepts like polymorphism, inheritance, encapsulation, etc.

I think JS is the most interesting programming language in modern technology. It supports both server-side and client-side development. With ES6, it supports OOP keywords like class as well as FP (functional programming) syntax like the fat arrow (=>).

In OOP, the most common techniques are inheritance and polymorphism. The following is an example of inheritance in JS,

class Shape {
  constructor(w, h) {
    this.width = w;
    this.height = h;
  }
  area() {
    return this.width * this.height;
  }
}

class Rectangle extends Shape {
  constructor(w, h) {
    super(w, h);
  }
}

class Triangle extends Shape {
  constructor(w, h) {
    super(w, h);
  }
  area() {
    return super.area() / 2;
  }
}

function main() {
  const rectangle = new Rectangle(4, 5);
  const triangle = new Triangle(4, 5);
  console.log('Rectangle area: ', rectangle.area());
  console.log('Triangle area: ', triangle.area());
}

main();

The shape area calculation can be rewritten to use composition instead of inheritance, as follows,

class Rectangle {
  constructor(w, h) {
    this.width = w;
    this.height = h;
  }
  area() {
    return this.width * this.height;
  }
}

class Triangle {
  constructor(w, h) {
    this.width = w;
    this.height = h;
  }
  area() {
    const rect = new Rectangle(this.width, this.height);
    return rect.area() / 2;
  }
}

Here, Rectangle and Triangle do not inherit from Shape. Instead, Triangle uses Rectangle to calculate its area. This is object composition, and it is the same idea as composition in React.

Furthermore, one of the greatest features of JS is the closure. It allows React to pass a function carrying specific logic as a parameter to a generic component. Thus, the generic component can be designed without prior knowledge of the business/application logic. The result is similar to method overriding in OOP.

Moreover, the object composition can be rewritten as function composition in FP style.

const rectangleArea = (w, h) => w * h; // In math, f(x,y) = x * y
const halving = (area) => area / 2; // In math, g(x) = x / 2
const triangleArea = (w, h) => halving(rectangleArea(w, h)); // In math, h(x,y) = g(f(x,y)) = f(x,y) / 2

function main() {
  console.log('Rectangle area: ', rectangleArea(4, 5));
  console.log('Triangle area: ', triangleArea(4, 5));
}

main();

Coin Flip Conundrum

I watched this video,

Very interesting.

So, I managed to verify it with a simulation script.


#!/usr/bin/env python
import random

# 0 = Head, 1 = Tail


def start_flipping_coin(target, saved):
    return flip_until(target, 0, 0, saved)


def flip_coin():
    return random.randint(0, 1)


# This is the crucial part: a mismatch doesn't mean you just go back one step,
# because the coin is assumed to be tossed consecutively.
# So, if the target is {H,T} or {T,H}, a wrong second toss may still count
# as a correct first toss, and you continue from the second toss.
# If the target size is 3, it will be totally different.
def step_back(target, step, coin):
    if target[0] != target[1]:
        if step == 1 and target[0] == coin:
            return 1
        else:
            return 0
    else:
        if step == 0 or target[step] == coin:
            return step
        return step_back(target, step - 1, coin)


def flip_until(target, step, count, saved):
    if step >= len(target):
        return count
    coin = flip_coin()
    count += 1
    if saved is not None:
        saved.append(coin)
    if target[step] == coin:
        step += 1
    else:
        step = step_back(target, step, coin)
    return flip_until(target, step, count, saved)


def main():
    target1 = [0, 1]
    target2 = [0, 0]
    total_turns = 10000
    tossed1 = 0
    coins1 = []
    for i in range(total_turns):
        tossed1 += start_flipping_coin(target1, coins1)
        if coins1[-2:] != target1:
            print("Error1!")  # This is a check to make sure the coin set is correct
        coins1 = []
    tossed2 = 0
    coins2 = []
    for i in range(total_turns):
        tossed2 += start_flipping_coin(target2, coins2)
        if coins2[-2:] != target2:
            print("Error2!")
        coins2 = []
    print("Result:")
    print("Target: {} average steps = {}".format(target1, tossed1 / total_turns))
    print("Target: {} average steps = {}".format(target2, tossed2 / total_turns))


main()


And I get some result like this,

Result:
Target: [0, 1] average steps = 3.987
Target: [0, 0] average steps = 5.9505
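
The simulated averages can be checked against exact values. The following is only a minimal sketch, assuming a fair coin: the expected number of flips until a pattern first appears is obtained by adding 2^k for every length k at which the first k symbols of the pattern equal its last k symbols. The function name expected_flips is made up for this illustration and is not part of the script above.

```python
def expected_flips(target):
    """Exact expected number of fair-coin flips until `target` first appears.

    Overlap rule: add 2**k for every k where the first k symbols of the
    target are the same as its last k symbols.
    """
    n = len(target)
    return sum(2 ** k for k in range(1, n + 1) if target[:k] == target[-k:])


print(expected_flips([0, 1]))  # 4
print(expected_flips([0, 0]))  # 6
```

The values 4 and 6 agree with the simulated 3.987 and 5.9505. The target [0, 0] overlaps with itself at length 1, which is what makes it take longer on average.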

P/S: I wrote a more robust coin flip script, which accepts a target sequence of any length. [here]

C++ future

Recently I have been updating my hobby project Med, a memory editor for Linux. It is still under heavy development and has various bugs.

In this project, I use several C++1x features (compiled with the C++14 standard). The most recent notable feature is multi-threaded scanning. Scanning through the accessible memory blocks sequentially is slow, so I need to scan the memory blocks in parallel. To implement this, I create multiple threads to scan through the memory blocks.

How many threads do I need? I made it a variable n, defaulting to 4. When scanning starts, n threads start scanning asynchronously. When one of the threads finishes, the next thread starts scanning the next memory block, and so on until the end.

I designed the solution top-down and implemented it bottom-up. To meet the requirement above, I created a ThreadManager (header file here). The ThreadManager lets me queue the tasks that I am going to launch in parallel on n threads. After queuing all the tasks, I just need to start it, and the tasks run in parallel. That is all ThreadManager does. If a mutex is needed, it is the task that has to handle it, not the ThreadManager. ThreadManager just makes sure the tasks run in parallel.
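
The idea of queuing tasks and running at most n of them at a time can be sketched in a few lines of Python. This is only a rough analogue using the standard concurrent.futures module, not the actual ThreadManager code; scan_block and run_all are hypothetical names, and the "scan" is a stand-in computation.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_block(block_id):
    # Hypothetical stand-in for scanning one memory block.
    return block_id * block_id


def run_all(tasks, max_threads=4):
    """Run queued zero-argument tasks, at most `max_threads` at a time."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # submit() returns a future immediately; the pool starts the next
        # queued task whenever one of its worker threads becomes free.
        futures = [pool.submit(task) for task in tasks]
        # result() blocks until that particular task has finished.
        return [future.result() for future in futures]


# Queue ten tasks, then run them four at a time.
tasks = [lambda b=b: scan_block(b) for b in range(10)]
results = run_all(tasks, max_threads=4)
print(results)
```

As in the description above, the pool only decides when each task runs. Any locking around shared data would still be the task's own job.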

This is the simple test that uses the ThreadManager.

Technically, several important C++ standard library headers are used: vector, functional, future, mutex, and condition_variable. std::vector is an STL container that lets me store a list of items (just like an array or list).

C++ has supported lambda expressions since C++11. With functional, I can use the std::function template to hold any callable object.

std::function<void()> fn = []() {
  for (int i = 0; i < 4; i++) {
    this_thread::sleep_for(chrono::milliseconds(300));
    cout << "thread1: " << i << endl;
  }
};

The code above initializes a variable fn that stores an anonymous function. Previously, this had to be done with callback functions (function pointers), which makes the code difficult to manage. By using std::function and std::vector, I can store all the anonymous functions in a vector.

Future is a very interesting library. If you are familiar with JavaScript promises or C# async, it is similar to these (futures and promises). Whenever a task is started, it returns a future, because we don’t know when the task will end. We could instead use a loop to check whether the task has ended, but that would be overcomplicated. A future lets you wait for the task and collect its result once it has ended.

Using futures, I need not create threads directly (even though it is called ThreadManager). I can use the std::async function to run the callback asynchronously. It is async that returns the future, and it accepts a lambda expression as its function argument. Great C++11.

C++11 supports the mutex (mutual exclusion) and the condition variable. A mutex can prevent race conditions. In multi-threaded code, the threads usually share some resource, and reading empty or half-written data may crash the program. Therefore, we need to make sure that while one thread is reading or writing, the resource is not accessible to other threads. This is done by locking the mutex, so that other threads cannot continue. After the operation, the thread unlocks the mutex, and another thread can lock it and continue. Hence, only a single thread can access the resource at a time.
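
The lock-operate-unlock pattern looks the same in most languages. Here is a minimal sketch in Python, where threading.Lock plays the role of std::mutex; the Counter class and the hammer function are invented for the illustration.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can be inside this block; the lock is
        # released automatically when the block is left.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```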

A condition variable is used together with a mutex. We can use a condition variable to wait until a condition is fulfilled. The thread first locks the mutex (with a unique lock) and then waits; while it is waiting, the mutex is released and the thread is blocked. The thread stays blocked until the condition variable notifies it, at which point it re-acquires the mutex and checks the condition. If the condition is fulfilled, the thread continues; otherwise it goes back to waiting.

In ThreadManager, my previous code used a loop to check the condition: if the condition did not allow the next thread to run, it slept for a while and checked again. This method wastes CPU resources, because it keeps checking the condition. By using a condition variable and a mutex, I can just block the thread until it is notified to continue.
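
Waiting without polling can be sketched as follows. This is an assumption-laden Python illustration rather than the ThreadManager implementation: threading.Condition stands in for std::condition_variable plus its mutex, and the Slots class, its limit, and the worker function are hypothetical.

```python
import threading


class Slots:
    """Allow at most `limit` holders at once, without a sleep-and-check loop."""

    def __init__(self, limit):
        self._limit = limit
        self._in_use = 0
        self._cond = threading.Condition()  # comes with its own lock

    def acquire(self):
        with self._cond:
            # wait_for() releases the lock while the thread is blocked and
            # re-acquires it before checking the predicate again.
            self._cond.wait_for(lambda: self._in_use < self._limit)
            self._in_use += 1

    def release(self):
        with self._cond:
            self._in_use -= 1
            self._cond.notify()  # wake one waiting thread to re-check


slots = Slots(limit=2)
stats = {"running": 0, "peak": 0}
stats_lock = threading.Lock()


def worker():
    slots.acquire()
    try:
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        # ... the real work would go here ...
        with stats_lock:
            stats["running"] -= 1
    finally:
        slots.release()


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats["peak"] <= 2)  # True
```

A blocked thread consumes no CPU time while it waits, which is the gain over the sleep-and-check loop.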

Yeah. Modern C++ is cool!

PHP programming

PHP was a great programming language for web development. It surpassed VBScript for ASP and Perl for CGI. It is favoured because its syntax is based on C and C++. It supports both the procedural and the object-oriented paradigm. A lot of its functions resemble C functions, such as printf, fprintf, sprintf, fopen, etc. Similarly, it can work directly with C libraries such as expat, zlib, libxml2, etc. A lot of great content management systems (CMS) are written in PHP, such as Drupal, WordPress, Joomla, etc.

However, a lot of new programming languages have emerged and are surpassing it.

Taken from http://skillprogramming.com/top-rated/php-best-practices-1234

Array is passed by value

Because PHP syntax is very similar to C and C++, it can use the “&” reference operator to pass a parameter by reference to a function. But its default behaviour is very different from other languages such as Python and JavaScript. In Python and JavaScript, a function receives a reference to the same object the caller holds. Primitive values such as integers, floats, strings, and booleans are immutable, so they behave as if passed by value; complex data types like objects and arrays are mutable, so the function can change them, including a Date object in JavaScript.

function change_array($arr) {
    $arr[0] = 100;
}

function change_object($obj) {
    $obj->value = 100;
}

function change_many_objects($arr) {
    $arr[0]->value = 100;
}

function change_object_array($obj) {
    $obj->array[0] = 100;
}

class MyObj {
    var $value;
    var $array;
}

function main() {
    $arr = [1, 2, 3];
    $obj = new MyObj();
    $obj->value = 10;

    change_array($arr);
    change_object($obj);

    echo $arr[0], "\n"; // still 1, not changing
    echo $obj->value, "\n"; // changed to 100

    $arr_obj = [ new MyObj(), new MyObj(), new MyObj() ];
    $arr_obj[0]->value = 10;
    change_many_objects($arr_obj);
    echo $arr_obj[0]->value, "\n"; // changed to 100

    $obj_arr = new MyObj();
    $obj_arr->array = [1, 2, 3];
    change_object_array($obj_arr);
    echo $obj_arr->array[0], "\n"; // changed to 100

    $obj_a = new MyObj();
    $obj_a->value = 10;
    $obj_b = $obj_a;
    $obj_b->value = 20;
    echo $obj_a->value, "\n"; // 20
    echo $obj_b->value, "\n"; // 20

    $obj_c = &$obj_a;
    $obj_c->value = 30;
    echo $obj_a->value, "\n"; // 30
    echo $obj_b->value, "\n"; // 30
    echo $obj_c->value, "\n"; // 30
}

main();

In the example above, the function change_array() does not modify the array that is passed in, because the array is passed by value, unless we use the “&” reference operator.

The function change_object() does change the object that is passed in.

One of the key-points of PHP 5 OOP that is often mentioned is that “objects are passed by references by default”. This is not completely true. […]

(from PHP manual)

So, basically, function parameters are passed by value, even when the parameter is an array. Objects are dealt with differently, though. We can treat an object variable as a pointer, if you are familiar with the C++ “new” operator. In C++, the “new” operator creates an instance and returns a pointer to it. If we understand this concept, then this is how it works in PHP (as far as I know).

Consequently, although change_many_objects() receives a copy of the array, it still changes the value of the object within the array. This is because the array stores “pointers” to the object instances, and the function changes the instance that the copied “pointer” points to.

In summary, PHP treats an array as a value, which is different from Python, JavaScript, and even C and C++ (where a raw array argument decays to a pointer). However, PHP treats an object as a pointer, which is why an object is mutable when it is passed to a function.
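
For contrast, here is how the same experiment behaves in Python. It is a small sketch with made-up function names; the explicit copy in change_copy only roughly imitates what PHP does automatically for an array argument.

```python
def change_list(items):
    items[0] = 100        # mutates the caller's list


def rebind_list(items):
    items = [100]         # rebinds the local name only; the caller is unaffected


def change_copy(items):
    items = list(items)   # explicit shallow copy, roughly PHP's array behaviour
    items[0] = 100
    return items


a = [1, 2, 3]
change_list(a)
print(a)  # [100, 2, 3]

b = [1, 2, 3]
rebind_list(b)
print(b)  # [1, 2, 3]

c = [1, 2, 3]
d = change_copy(c)
print(c, d)  # [1, 2, 3] [100, 2, 3]
```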

Other limitations

PHP was created before RESTful APIs became common. PHP focuses on GET and POST, not on methods like PUT and DELETE. Support for the HTTP methods depends on the server, so an HTTP server such as Apache requires some extra configuration for PHP to work. Node and Ruby on Rails are different: Node itself has an HTTP module, and Ruby on Rails comes with the WEBrick HTTP server.

Compared with languages like Python, JavaScript (with Node), Ruby, and Lua, PHP lacks a REPL (read-eval-print loop). An interactive shell is different from a REPL: with a REPL, you do not need to call a print function to see a result in the console, because the REPL prints the value of whatever you evaluate.

Closure

In JavaScript, we can create closure like this,

var foo = (() => {
  let x = 0;
  return () => {
    return x++;
  }
})();

for (var i = 0; i < 10; i++) {
  var x = foo();
  console.log(x);
}

But when the above code is translated into C++ with reference captures, it does not work as expected unless the variable lives outside the lambda function.

// In a function
int x; // variable here
auto foo = [&x]() {
  x = 0;
  return [&x]() {
    return x++;
  };
}();
for (int i = 0; i < 10; i++) {
  int x = foo();
  cout << x << endl;
}

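This is because a local variable in C++ lives only as long as its enclosing function scope. After the function returns, a reference captured by the lambda would dangle, so the variable is not accessible anymore. (Capturing by value in a mutable lambda instead gives the lambda its own copy.)

Similarly, Python can also return a function, and the inner function can read a variable of the enclosing function. Assigning to that variable, however, requires the nonlocal statement (Python 3). Without it, one workaround is to keep the state in a non-primitive value such as an object or a list, so that it is modified in place rather than reassigned, as in the following example.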

def _foo():
    x = [0]
    def increment():
        x[0] += 1
        return x[0]
    return increment

foo = _foo()
for i in range(10):
    x = foo()
    print(x)
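
With Python 3, the nonlocal statement makes the list workaround unnecessary. The following is a minimal sketch (make_counter is just an illustrative name) that mirrors the JavaScript version more closely, including returning the value before incrementing it.

```python
def make_counter():
    x = 0

    def increment():
        nonlocal x       # assign to x in the enclosing scope (Python 3 only)
        current = x
        x += 1
        return current   # value before the increment, like x++ in JavaScript

    return increment


foo = make_counter()
values = [foo() for _ in range(10)]
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each call to make_counter() creates a separate x, so two counters do not interfere with each other.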

CSV to something

I just revived an old script of mine as a project on GitHub, csv_to_something. It is an old, simple script that was created to manage some student data. Because the student data were collected through Google Forms, I converted them to CSV, and then from CSV to SQLite, so that I can use SQL to query whatever data I need.

Using SQLite software such as sqliteman and sqlitebrowser, I can create new tables and do grouping, sorting, etc.
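
The CSV-to-SQLite step can be sketched with Python's standard csv and sqlite3 modules. This is not the csv_to_something script itself, only an illustration of the idea: the function name, the table name students, and the sample data are assumptions, every column is stored as TEXT, and the header names are trusted rather than sanitized.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text, conn, table="students"):
    """Load CSV text (with a header row) into a new table of TEXT columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join('"{}" TEXT'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    # The remaining rows of the reader are inserted one by one.
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()


sample = "name,score\nAlice,92\nBob,75\n"
conn = sqlite3.connect(":memory:")  # in-memory database for the demo
csv_to_sqlite(sample, conn)
rows = conn.execute(
    "SELECT name FROM students WHERE CAST(score AS INTEGER) >= 90"
).fetchall()
print(rows)  # [('Alice',)]
```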

Then recently, I needed some JSON data for the table. So, I revived the script to convert CSV to JSON format. With the functional programming characteristics of JavaScript, manipulating the data becomes more interesting. Using Node or any JavaScript interpreter, we can perform map and reduce. Other useful functions like find and filter get me what I want. For example,

const fs = require('fs');

// Read the JSON data as an array of objects (synchronously, so that
// `students` is ready before it is used below)
const students = JSON.parse(fs.readFileSync('./table.json', 'utf8'));

// Sum the scores, then divide by the number of students
const average = students.reduce((sum, item) => sum + item.score, 0) / students.length;

const grades = students.map(student => {
  if (student.score >= 90) return 'A';
  else if (student.score >= 70) return 'B';
  else return 'Something else';
});

Why SQLite and JSON? Because they are extremely lightweight.

A brief comparison of GTK+ and Qt

I used to like the C language, because it is the foundation of programming, it is portable, and it is low-level. Writing a program in C is like showing off your advanced programming skills: how you manage memory, how you manage pointers, how you build a linked list. However, in terms of development efficiency, C++ is much more powerful, because of its object orientation and its syntax.

Because I like C, I chose GTK+ over Qt a long time ago. Not only that, I am also fond of GTK+ desktop environments like GNOME, Xfce4, LXDE, and Cinnamon, though not MATE. I feel that KDE is heavyweight.

One of my personal projects, Med, which was written with GTK+, had some issues with multi-threading. I believe that the engine part does not have any problem, which leaves GTK+ as the suspect. The program crashes without any symptoms, and the crash is difficult to reproduce. Therefore, I decided to rewrite the UI with Qt. If Qt also produces the same problem, it means my algorithm is the problem.

In my opinion, GTK+ is straightforward because of its procedural paradigm, so it is easy to learn and implement. The UI can be designed with Glade. But I feel that it still lacks something, such as a button click callback function. Besides that, though there is gtkmm (the C++ interface of GTK+), a library like WebKitGTK+ does not have documentation for gtkmm (I believe it still works). In summary, development with GTK+ is slower, just as writing code in C is.

On the other hand, Qt is more complicated. To use it, it is better to use a qmake project file to generate the Makefile, or to use CMake, so that we can include the necessary headers and link with the necessary libraries. We also need to use some macros. This makes me feel that the code is very Qt-oriented, not just plain C++. However, in terms of design, I feel that Qt Designer is much easier. Note that GTK+ and Qt layouts use different concepts.

Though GTK+ and Qt are both object-oriented, GTK+ is object-oriented at the library level, while Qt is object-oriented at the language level. It is easier to write inherited classes in Qt and to make changes at the language level.

In conclusion, Qt is better than GTK+ for development, just as C++ is better than C.

Heading, anchor, and bookmarking

Sometimes I read online articles that are long pages with an outline at the beginning. The outline entries are hyperlinks to the subtopic headings. Technically, when you click an outline hyperlink, your browser jumps to the “anchor” and the URL gets a hash (#) fragment appended. This is useful for bookmarking: you can share a URL that targets the topic with someone else, or revisit your bookmark.

The headings themselves, however, are usually not clickable. Thus, I wrote a script as a bookmarklet to solve this issue, so that I can click the headings while I am reading the article. You can create the bookmark with the following URL:

javascript:var elems=document.querySelectorAll("h1[id],h2[id],h3[id],h4[id],h5[id],h6[id],h1>*[id],h2>*[id],h3>*[id],h4>*[id],h5>*[id],h6>*[id],a[name]:not([href])");var array=Array.from(elems);array.forEach(function(elem){elem.style.cursor="pointer";elem.setAttribute("title","anchor:"+(elem.id?elem.id:elem.name));elem.innerHTML+=" *";elem.addEventListener("click",function(el){location.hash=el.target.id?el.target.id:el.target.name})});

(Yes, it starts with javascript: instead of http://. Sadly, WordPress.com doesn’t allow creating a bookmarklet link in a post. Moreover, I was actually using ⚓ instead of the asterisk *, but WordPress.com automatically converts it to an emoji, so I cannot use it in the script above.)

You can try it on any Wikipedia page. It also works on Stack Overflow, so that you can share the URL of a particular answer with others.

Lecturer, researcher, hobbyist, and software developer

I am a cognitive science student. That is why I learnt AI, computational linguistics, machine learning, expert systems, etc.

When I was a researcher on Augmented Reality, I applied my computing skills. After that, I became a lecturer in a non-computer-related field and spent my time programming as a hobbyist. Later I became a computer science lecturer, yet still had to do my programming as a hobbyist. Now, as a software developer, I know the differences between these roles: lecturer, researcher, hobbyist, and true software developer.

A lecturer is considered good at programming as long as they know how to bluff.

For a researcher, being good at programming is not important at all, because the focus is the research. As a researcher, you do not need to write programs or scripts. Though these may help, they are not required. As long as your research methodology is correct, you have the core element of the research.

As a hobbyist, you can learn any programming language and write any program. You can write code that you yourself consider elegant. You can write any powerful function and use a simple but accurate solution to solve your own problem. You can share your code and publish your project. You can have very strong knowledge, write wondrous algorithms, and solve problems effectively and efficiently. BUT you still lack industrial experience.

As an actual software developer working in industry, what you need is to produce stable products. Yes. Stable. To produce stable products, you need to have them fully tested. A hired tester can only perform black-box testing; they test only from an end user's perspective. The problem is that such bugs are found after you have already written code that depends on the buggy code. So it is best to test earlier, during development. For a developer, writing test cases becomes exhausting. That is where Test-Driven Development comes in. A stable product is what the customers actually want. They paid money for the product, or more specifically, they hired you to develop a product for them. That is why your products must be stable. Unlike commercial products, government projects usually have no clear objective. As long as you deliver a prototype, you get the money. That is why the product standards are very different, and so is the quality of the developers.

As a lecturer, you talk.

As a researcher, you do experiments.

As a hobbyist, you learn and write.

As an actual software developer, you write and test.