Firefox batch download extension: DownloadSelected


In my previous post, I mentioned DownThemAll on Firefox 56. The latest Firefox release is now version 61, but DownThemAll has still not been updated to support Firefox Quantum. Staying on an old, unsupported version like Firefox 56 is not a good idea, because it will receive no security updates. Firefox ESR 52 is slower than Firefox 56, as I mentioned before.

Since I have had some spare time recently, I spent a few days writing a Firefox extension, DownloadSelected, to solve my problem.

DownloadSelected demo

The screenshot above shows a list of URLs, which I generated with a Greasemonkey script. I then highlight the links, right-click, and choose DownloadSelected. I have not written any UI elements to indicate download progress, but the progress is printed to the console. Once the files are downloaded, they are archived into a zip file and a Save As dialog is shown. The source code can be found here.

This is not a replacement for or an alternative to DownThemAll, but it solves my fundamental problem. The main features are:

  1. Bulk download
  2. Download only selected text
  3. Downloaded filenames are based on HTML text instead of URLs
  4. Quantum and Google Chrome compatible

 


How to solve C/C++ memory leaking (in Linux)?


My hobby project Med is written in C++. Much of its implementation relies on dynamic memory allocation and instantiation, because complex data is impractical to pass by value (much like objects and arrays in JavaScript, which are passed by reference). Since C++ doesn’t have garbage collection, it is easy for the developer to forget to free/delete dynamically allocated memory properly.

In Med, the program reads memory from another process. That means it needs to make a copy of the scanned memory, which involves allocating memory dynamically (using the new operator). When the program filters the memory to narrow down the target result, it needs to take a new copy of the memory with the updated values and compare it with the previous copy. Entries that do not match need to be discarded (free/delete); entries that match need to replace the old ones (which also means freeing/deleting the old ones, because the new ones are dynamically allocated).

Though this is easy to express in words, writing the algorithm tends to introduce human mistakes, such as using the wrong variable name. As a result, we may access memory that was never dynamically allocated, or free the same memory twice. Either can cause a segmentation fault, and the C++ compiler will not tell you where the error comes from. Debugging is hellish in this case. So I used several approaches to track down memory leaks and the related errors.

Adopt TDD

I have covered this in my previous post. Smaller functions are easier to test. Just make sure your functions or methods are testable.

Valgrind (Linux only)

Compile your application with debugging information (g++ with the -g option). Then run your test suite under valgrind.

valgrind --leak-check=yes ./testSnapshot

It will tell you which function accesses non-allocated memory, how many bytes were lost, and so on.

Valgrind is super useful for checking memory leaks.

Memory Manager

Because of some mistakes in my code, memory was being freed twice, which caused the error. At first I failed to find the cause (though I eventually did).

Therefore, I wrote a memory manager to make sure memory will not be freed more than once. However, this is actually not a good solution for avoiding memory leaks. It only prevents the program from freeing the same memory twice.

ByteManager::ByteManager() {}

ByteManager::~ByteManager() {
  clear();
}

// Singleton
ByteManager& ByteManager::getInstance() {
  static ByteManager instance;
  return instance;
}

Byte* ByteManager::newByte(int size) {
  Byte* byte = new Byte[size];
  recordedBytes[byte] = byte;
  return byte;
}

void ByteManager::deleteByte(Byte* byte) {
  auto found = recordedBytes.find(byte);
  if (found != recordedBytes.end()) {
    delete[] recordedBytes[byte];
    recordedBytes.erase(found);
  }
}

void ByteManager::clear() {
  for (auto it = recordedBytes.begin(); it != recordedBytes.end(); ++it) {
    delete[] it->second;
  }
  recordedBytes.clear();
}

map<Byte*, Byte*> ByteManager::getRecordedBytes() {
  return recordedBytes;
}

Then I refactored the code so that every use of the new and delete operators is replaced by the newByte() and deleteByte() methods. A std::map stores all the allocated memory.
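
To illustrate how such a guard behaves, here is a minimal self-contained sketch. It is not the ByteManager from Med: the class name TrackedBytes, the Byte typedef, the use of std::set instead of std::map, and the bool return value of release() are all assumptions made for this illustration.

```cpp
#include <cstddef>
#include <set>

typedef unsigned char Byte;  // assumption: Byte is a one-byte type

// Hypothetical, simplified tracker (not the ByteManager from Med).
class TrackedBytes {
public:
  TrackedBytes() = default;
  // Copying would make two trackers own the same blocks, so forbid it.
  TrackedBytes(const TrackedBytes&) = delete;
  TrackedBytes& operator=(const TrackedBytes&) = delete;
  ~TrackedBytes() { clear(); }

  // Allocates a block and remembers its address.
  Byte* allocate(std::size_t size) {
    Byte* block = new Byte[size];
    live.insert(block);
    return block;
  }

  // Returns true if the block was tracked and has now been freed,
  // false if it is unknown or was already freed (nothing is deleted).
  bool release(Byte* block) {
    std::set<Byte*>::iterator found = live.find(block);
    if (found == live.end()) return false;
    live.erase(found);
    delete[] block;
    return true;
  }

  // Frees every block that is still tracked.
  void clear() {
    for (Byte* block : live) delete[] block;
    live.clear();
  }

  std::size_t liveCount() const { return live.size(); }

private:
  std::set<Byte*> live;  // addresses of blocks not yet freed
};
```

Calling release() twice on the same pointer frees the block only once; the second call simply returns false. Note that a guard like this hides the double free rather than fixing its cause, and it cannot tell a stale pointer apart from a new block that happens to reuse the same address.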

Smart pointer

C++11 introduced the smart pointers shared_ptr and unique_ptr. Before C++11, auto_ptr played a role similar to unique_ptr, but it has been deprecated since C++11 and was removed in C++17. With a smart pointer, we can use the new operator to instantiate an object and need not delete it ourselves, because it is deleted automatically once nothing refers to it any more. For example,

typedef std::shared_ptr<SnapshotScan> SnapshotScanPtr;

// some where else
SnapshotScanPtr matched = SnapshotScanPtr(new SnapshotScan(curr.getAddress() + i + currOffset, scanType));

So the program instantiates a new SnapshotScan object owned by a shared_ptr, which is then stored in a std::vector. When the last reference to it is gone, for example when it is removed from the std::vector, the object is deleted automatically.

In my opinion, this is a better solution than the Memory Manager above. However, my existing project is hard to refactor to use smart pointers.
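
The lifetime rule described above can be seen in a small self-contained sketch. The Scan struct here is a hypothetical stand-in for SnapshotScan that only counts how many instances are alive, and liveAfterClear() is an invented helper name.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for SnapshotScan; it only counts live instances.
struct Scan {
  static int alive;
  long address;
  explicit Scan(long addr) : address(addr) { ++alive; }
  ~Scan() { --alive; }
};
int Scan::alive = 0;

typedef std::shared_ptr<Scan> ScanPtr;

// Returns how many Scan objects are still alive after the vector is emptied.
int liveAfterClear() {
  std::vector<ScanPtr> scans;
  scans.push_back(std::make_shared<Scan>(0x1000));
  scans.push_back(std::make_shared<Scan>(0x2000));

  ScanPtr kept = scans[0];  // a second owner of the first object
  scans.clear();            // the second object is destroyed here;
                            // the first survives because `kept` still owns it
  return Scan::alive;
}                           // `kept` goes out of scope: the first object is destroyed
```

There is no delete anywhere in the sketch. std::make_shared is used instead of passing a new expression to the shared_ptr constructor; both work, but make_shared avoids a separate allocation for the reference count.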

C++ Unit Test and Dependency Injection


TDD (test-driven development) is widely adopted in modern development, such as web development, because it allows developers to test a solution thoroughly in order to produce a more stable product.

Higher-level programming languages like JavaScript and Ruby allow developers to easily mock functions and data to test the target specification. However, a programming language like C++ was not designed with TDD in mind, and mocking functions is more complex.

In order to adopt TDD, we need to keep each function as small as possible, so that it can be easily mocked. As a result, the class needs to be designed to be testable, and the methods we are going to test need to be publicly accessible.

Unlike JavaScript, which is a prototype-based programming language, C++ is class-based. In JavaScript, an object can easily be mocked by overriding its methods at runtime. In order to mock a C++ class, I have to use inheritance and override the public virtual methods. Here is an example from my project Med.

class SnapshotScanTester : public SnapshotScan {
public:
  SnapshotScanTester() : SnapshotScan() {
    scannedValue = NULL;
  }
  virtual Bytes* getValueAsNewBytes(long pid, ScanType scanType) {
    Byte* data = bm.newByte(8);
    memset(data, 0, 8);
    data[0] = 20;
    Bytes* bytes = new Bytes(data, 8);
    return bytes;
  }
};

In order to mock the class SnapshotScan, I need to make the getValueAsNewBytes method public and virtual. Then I use the CxxTest framework to test with SnapshotScanTester.
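
The same override-in-a-subclass idea can be shown in a stripped-down, self-contained form. The Scanner and ScannerTester classes below are hypothetical and are not part of Med; the sketch uses no test framework, so it does not show CxxTest itself.

```cpp
// Hypothetical class: in real code readValue() would read another
// process's memory, which is hard to arrange inside a unit test.
class Scanner {
public:
  virtual ~Scanner() {}

  // Public and virtual so that a test subclass can replace it.
  virtual int readValue(long /*pid*/) { return -1; }

  // The logic under test; it only depends on readValue().
  bool matches(long pid, int expected) {
    return readValue(pid) == expected;
  }
};

// Test double: overrides the hard-to-test part with a fixed value.
class ScannerTester : public Scanner {
public:
  virtual int readValue(long /*pid*/) { return 20; }
};
```

A test can then create a ScannerTester and check matches() against the known value 20 without touching any real process.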

Dependency Injection

I learnt the term Dependency Injection when I was developing an AngularJS project. With dependency injection, we can create services and inject them into the client (object).

The Med project is complex. Its main feature is scanning the memory of another process, and mocking another process is impractical in a test. To solve this, I took the code that takes a PID (process ID) as a parameter and moved it into a service. For example,

class SnapshotScanService {
public:
  SnapshotScanService();
  virtual ~SnapshotScanService();

  virtual bool compareScan(SnapshotScan* scan, long pid, const ScanParser::OpType& opType, const ScanType& scanType);

  virtual void updateScannedValue(SnapshotScan* scan, long pid, const ScanType& scanType);
};

Before the refactoring, my code was doing something like

snapshotScan.compareScan(pid, opType, scanType);

With this code refactored into a service, I can mock the service to produce any value for testing.

In order to inject the service, I wrote a constructor that accepts the service,

class Snapshot {
public:
  Snapshot(SnapshotScanService* service = NULL);
  // ...
};

If no service is provided to the constructor, a default service is instantiated. Therefore, if I want to mock the service, I just create a new class derived from SnapshotScanService. For example,

class SnapshotScanServiceTester : public SnapshotScanService {
public:
  SnapshotScanServiceTester() {}

  virtual bool compareScan(SnapshotScan*, long, const ScanParser::OpType&, const ScanType&) {
    return true;
  }
  virtual void updateScannedValue(SnapshotScan* scan, long pid, const ScanType& scanType) {
    Bytes* currentBytes = Bytes::create(20);
    currentBytes->getData()[0] = 60;
    scan->freeScannedValue();
    scan->setScannedValue(currentBytes);
  }
};

Then in the test,

    SnapshotScanService* service = new SnapshotScanServiceTester();
    SnapshotTester* snapshot = new SnapshotTester(service);
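
The constructor-injection pattern above, including the fallback to a default service, can be sketched end to end as follows. All names here (ValueService, FixedValueService, Watcher) are hypothetical, and holding the service in a std::unique_ptr is my assumption about ownership, not necessarily how Med does it.

```cpp
#include <memory>

// Hypothetical service: the real one would read the memory of process `pid`.
class ValueService {
public:
  virtual ~ValueService() {}
  virtual int currentValue(long /*pid*/) { return -1; }
};

// Mock service for tests: always reports the value it was built with.
class FixedValueService : public ValueService {
public:
  explicit FixedValueService(int v) : value(v) {}
  virtual int currentValue(long /*pid*/) { return value; }
private:
  int value;
};

// Client class: receives its service through the constructor.
class Watcher {
public:
  // Takes ownership of `injected`; when none is given,
  // a default ValueService is created instead.
  explicit Watcher(ValueService* injected = nullptr)
      : service(injected ? injected : new ValueService()) {}

  // Logic under test: compares the service's value with a previous one.
  bool hasChanged(long pid, int previous) {
    return service->currentValue(pid) != previous;
  }

private:
  std::unique_ptr<ValueService> service;  // freed automatically
};
```

In a test, `Watcher w(new FixedValueService(60));` makes hasChanged() fully predictable, while production code can simply write `Watcher w;` and get the default service.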

So, by using dependency injection, I can finally test my class properly to make sure it functions reliably. Without unit tests, I would need to run my code thousands of times to create different situations. I am no genius, and I don’t think my code can function properly without proper tests.

Complexity and simplicity


When we are developing a solution or a system, we tend to choose a simple solution, because a simple solution is generally better than a complex one. However, most of the time we choose a simple solution inappropriately, and this gradually causes more trouble as the system grows.

The complexity of a solution should depend on the complexity of the problem itself, not the other way round. For example, we cannot create an operating system with a single programming statement, nor with just a single source file. Because an operating system is very complex (managing devices, memory, processes, etc.), no simple solution can fulfil the requirements.

That is why global variables are discouraged most of the time: they become difficult to manage as your source code grows. However, if the problem is simple and global variables can solve it efficiently, then the approach is acceptable.

The human mind is limited. We cannot process too much information at once. Hence, if a source file contains a lot of global variables (or, similarly, a function has too many parameters), we cannot process the information well, because it is too complex. And when a function is too long, with hundreds of lines of statements, we cannot remember what happened at the beginning of the function. However, if we organize the variables and parameters properly, we can process the source code much better.

The UNIX philosophy, “Do One Thing and Do It Well” (DOTADIW), which microservices also follow, is how we ought to design our solutions. We simplify the solution, not the problem, because the problem cannot be changed. As a result, a very complex problem needs a lot of simple solutions or services to be built.

In nature, a life form like a human is complex, which is why we have multiple systems such as the digestive system, respiratory system, circulatory system, etc., each focusing on one task. A lower life form like an amoeba, however, is very simple. We cannot expect the biological system of an amoeba to work for a human. Likewise, a large organization needs a much more complex management system (not in terms of software) than a small organization. You cannot expect the CEO of a large organization to be in contact with thousands of employees every day, but in a small organization the CEO can be in contact with everyone in the team.

Therefore, if a problem or a system requirement is complex, we can only “divide and conquer”: break the main problem down into sub-problems, then solve each sub-problem with a smaller and simpler solution.

Pyramid, tree, or pipeline

When a community grows, it ends up becoming a pyramid-like hierarchy. When a file folder grows, it ends up becoming a tree structure. If the data flow is linear, then a pipeline is the appropriate solution. Therefore, as the system grows, information needs to be passed from unit to unit. This is an inefficient way to convey a message, but it is efficient to manage.

(But in reality, a pyramid hierarchy is troublesome, because humans are full of flaws and corruption.)

Pure function

Interestingly, I learnt from ReactJS that developing with pure functions makes the code much simpler to manage, because all the inputs of a function are immutable, or read-only. That means you will not create side effects in the parent component or the caller. As with microservices, we just need to focus on the functionality of each component.
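
The idea is not specific to ReactJS. As a rough sketch in C++ (the language of the earlier examples), a pure function can take its input by const reference and return a new value, in contrast to a function that modifies the caller's data. The function names doubled() and doubleInPlace() are invented for this illustration.

```cpp
#include <vector>

// Pure: the input is read-only (const reference) and the result is a
// new vector, so the caller's data is never modified.
std::vector<int> doubled(const std::vector<int>& values) {
  std::vector<int> result;
  result.reserve(values.size());
  for (int v : values) result.push_back(v * 2);
  return result;
}

// Impure counterpart, for contrast: it changes the caller's vector,
// which is a side effect the caller has to keep in mind.
void doubleInPlace(std::vector<int>& values) {
  for (int& v : values) v *= 2;
}
```

With doubled(), the caller can reason about the function from its signature alone; with doubleInPlace(), every call site has to remember that its vector will be different afterwards.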