2013-04-30

Thomasbrötchen


This morning Kerstin asked me: "Can you go to the bakery and buy 10 Franzbrötchen?"
I answered: "OK. But what are they called again?"
"Franzbrötchen, like the name, Franz."
"OK."

I walked down the street, repeating the name over and over so as not to forget it.
"Franzbrötchen, Franz--bröt--chen, one, two, three. Franzbrötchen, Franz--bröt--chen, one, two, three..."
Around the corner I met a friend.
"Hello Hitoshi. Happy Easter."
"Hello Thomas. Happy Easter."

I walked on.

"Thomasbrötchen, Thomas--bröt--chen, one, two, three. Thomas--brötchen."
At the bakery I said to the saleswoman: "Hello, good day. I would like 10 Thomasbrötchen, please."
"Pardon?"
"Thomasbrötchen, 10 pieces."
She asked her husband.
"Franz, what are Thomasbrötchen?"
"Ah, right," I said, "Franzbrötchen, 10 pieces, please."

(There is a similar old Japanese folk tale, Dango-dokkosiho.)

2013-04-26

Math objects on programming (2)

In the last article, I showed a simple enumeration generator, but I needed one more piece of flexibility. In my job I use GPUs. A GPU is a fast processing unit, but its available memory is an order of magnitude smaller than that of a decent workstation. (E.g., a GPU has 2GB to 6GB of memory, while a decent workstation can have 24GB to 256GB of main memory.) In the pseudo code examples here, the unit of memory size is GB; 64GB or 512GB is too much for a state-of-the-art GPU in 2013, and 512GB is too much even for the best workstation. Therefore, our product uses a cluster: many workstations cooperate on a single job. Almost every customer wants to see how our product scales with the number of workstations, because they have their own needs and want to know how many workstations and GPUs their task requires. Therefore, we need to demonstrate how the performance changes depending on the number of nodes. Here one more parameter, the number of nodes, is added as follows:

 data_size_list = [ 5, 64, 512, ]
 screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
 node_count_list = [ 
   1, 2, 4, 8, 16, 32, 64, ]

However, most customers immediately understand that if the memory cannot hold the data, the system becomes extremely slow. Therefore, the performance graph only needs the node counts that can actually hold the input data. When we need to generate such parameter combinations, the simple idea is to filter out the unnecessary data points. Here is a for-loop implementation example.

comb_list = []
for d in data_size_list:
  for s in screen_resolution_list:
    for n in node_count_list:
      # FILTER: skip the cases where the
      # nodes cannot hold the data.
      # Assume one node can handle
      # 64GB, but not more.
      if (n * 64 < d):
        continue
      item_list = [
        d, s, n,
      ]
      comb_list.append(item_list)

Then I wanted to apply the same idea to the direct product implementation. I had already written the for-loop version, so this filtering idea seemed simple. But I figured out that I would need a filter function for each input parameter set. I actually started implementing such code, but it looked way too complicated: most of the filter functions do nothing in this case. Moreover, when I extended the functionality, I found each change too complex to handle. I believe the approach would work, but the implementation is too complex for me. It is hard to explain this ``too complex'' feeling, but it amounts to: ``Why does this simple problem need such a complex implementation?'' I usually have this feeling when I haven't understood the problem itself well.

I re-thought the problem, tried to formalize it, and asked myself: what is this problem? The number of nodes I need depends on the data size. Why did I use a filter function? That is a generate-and-test algorithm. But do I really need the test function after generating the combination candidates? At that point, I figured it out: this is just a map from data size to node counts, so I don't need a test function at all. I can have the map as an input. The mathematical object I should use here is a `map' instead of a `filter'. My final implementation was the following.

import copy

data_size_list = [ 5, 64, 512, ]
screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
data_size_node_count_map = {
  5: [1, 2, 4, 8, 16, 32, 64],
  64: [8, 16, 32, 64],
  512: [32, 64],
}
all_list = [
  data_size_list,
  screen_resolution_list,
]
comb_list = []
c_tuple = [''] * len(all_list) 

def make_tuple(idx):
  if(idx >= len(all_list)):
    comb_list.append(
        copy.deepcopy(c_tuple))
  else:
    for i in range(len(all_list[idx])):
      c_tuple[idx] = all_list[idx][i]
      make_tuple(idx + 1)

def gen_combination_list():
    make_tuple(0)
    for i in comb_list:
        gen_with_node(i)

The gen_with_node() function generates the final combination lists by looking up the data_size_node_count_map dictionary. This is more efficient, since there is no generate-and-test (which implies generating candidates only to discard them), and simpler, since the input data doesn't need filter functors. I sometimes think that some people's code misuses mathematical objects, and I sometimes criticize code from people who have high implementation ability but give little consideration to the mathematical objects.
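
The gen_with_node() body is not shown in this article; the following is only a minimal sketch of what it could look like given the description above, reusing print_comb() from part (1) as a hypothetical output function.

def gen_with_node(comb):
  # comb is one [data_size, screen_resolution] pair
  d, s = comb
  # look up the node counts that can hold this data size,
  # instead of generating all counts and filtering them
  for n in data_size_node_count_map[d]:
    print_comb(d, s, n)  # output one final combination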

I used to respect these people for their high implementation ability, and I used to try to write such code myself. However, I often found that I could write a simpler program if I thoroughly thought through the mathematical meaning of the program. At first I was surprised that my programs were often comparable in speed, and sometimes faster. Moreover, my code tended to be simpler and to have fewer bugs. This was a surprising discovery, and since then I have tended to study the mathematical aspect of a program more.

This time, I mistook which mathematical object to use in my program. I should review myself first.

By the way, why is it usually better to think about the mathematical objects in a program? Are there any reasons? I have two hypothetical reasons. One is that we can use all the mathematical research results, which have more than a thousand years of history behind them. Many geniuses thought through many problems; they classified the problems and found many patterns to solve them. I am not surprised that these geniuses' ideas are better than my ideas, which are usually made up in five minutes. Actually, I cannot remember a single case where my own idea was better than a combination of these geniuses' ideas. The other reason is that many programming languages support these mathematical objects. Many language designers also found mathematical objects useful for solving problems; therefore, these objects are often supported, and when a language supports a mathematical object, the support is usually efficient and simple. In this particular problem, I used a direct product and a map. For instance, a map is supported in Python as `dict', Lisp has `mapcar', Java has Map in java.util, and the C++ STL has the `map' template class.

This was a simple problem, but it turned out to be interesting to me.

References

[1] Hisao Tamaki, ``Introduction to probability theory for the information scientist (in Japanese)'', Saiensu, ISBN 4-7819-1012-2, 2002

Implementation example code

http://sundayresearch.de/hitoshi/sundayresearch/programming/Python/20130323_math_object/enum_code_20130323.zip

2013-04-25

Python PIL experiment (an image comparison tool) continued


PIL and numpy

When I ran this program on my data files, I found that the processing time was around 6 seconds and the memory consumption was 230MB for a 1024x1024 image. When I processed images with a resolution of 3840x2160, it took 263 seconds and consumed 2.3GB of memory. The difference between these resolutions is only about eight times the number of pixels, but the processing time increased more than 40 times. In my program I only use three buffers for processing, so my first estimate of the minimal program size was 10MB for the 1024x1024 resolution and 72MB for the 3840x2160 resolution (assuming 3 bytes per RGB pixel: 3 buffers x 1024 x 1024 x 3 bytes is about 9MB, and 3 x 3840 x 2160 x 3 bytes is about 71MB). However, `top' reported a memory size around 30 times larger than that.

When I profiled the program, most of the time was consumed by the tuple construction (RGB values) and the abs function. Therefore, I tried to use numpy to vectorize this code. The table below shows the result. My test environment is an Intel Core i7-2720 2.20GHz Linux machine (Kubuntu 12.10, kernel 3.5.0-27), Python 2.7.

+-----------+--------------------------------------+
| image res |   1024x1024      |    3840x2160      |
+-----------+-------+----------+--------+----------+
|           |  mem  | time     |  mem   | time     |
+-----------+-------+----------+--------+----------+
| native    | 230MB | 6.0  sec | 2300MB | 263  sec | 
| numpy     | 110MB | 0.21 sec | 320MB  | 1.18 sec |
+-----------+-------+----------+--------+----------+

The performance improved by a factor of 30 and up to 200. The memory consumption was reduced to 50% and even to 15%. Actually, my first numpy implementation was only about twice as fast, which disappointed me. After profiling, I found that the sum function spent most of the time. I used sum to count the non-zero elements in the array. This sum is the Python built-in function, and it can access numpy's array, but I expect it accesses each element and returns it to the Python environment one by one. When I replaced this sum with numpy.sum, the numpy.sum executed in almost no time, and I achieved the 200 times better performance. This is pretty much like Matlab programming. (numpy is something like a Python port of Matlab; I mean it is similar not only in syntax, but also in how to get the performance.)
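
As an illustration of this pitfall, here is a minimal self-contained sketch (this is not the actual ImgCompNumpy.py; the arrays are made-up stand-ins for two decoded RGB images):

import numpy

img_a = numpy.random.randint(0, 256, (1024, 1024, 3))
img_b = numpy.random.randint(0, 256, (1024, 1024, 3))

# vectorized per-pixel absolute difference, no Python-level loop
diff = numpy.abs(img_a - img_b)
# a mask of the pixels that differ in any channel
changed = diff.sum(axis=2) != 0

# slow: the built-in sum crosses the Python/numpy boundary
# for every element
# count = sum(sum(changed))

# fast: numpy.sum stays inside numpy's compiled code
count = numpy.sum(changed)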

ImgCompNumpy.py code

2013-04-24

Python PIL experiment (an image comparison tool)


Abstract:
Writing an image comparison tool with Python PIL.

Python PIL module

Python Imaging Library (PIL) is a useful Python module for processing image files. This time I had a situation in which:

  • I have files in different image file formats,
  • but the contents must be the same.

For example, I wrote an image generation tool and I want to test it. I compress the reference images, but my program produces images in a non-compressed image file format. I could use the convert tool (ImageMagick), though this time I just wanted to try writing a new tool. You can find my image comparison tool here.
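
A minimal sketch of such a comparison with PIL might look like the following (the file names are hypothetical; the real tool is the one linked above):

from PIL import Image

# load both images and normalize them to the same mode, so that
# the same picture in PNG and BMP compares equal
img_a = Image.open('reference.png').convert('RGB')
img_b = Image.open('generated.bmp').convert('RGB')

if img_a.size != img_b.size:
  print 'different image size'
else:
  # count the pixels whose RGB values differ
  diff_count = sum(1 for p, q in
                   zip(img_a.getdata(), img_b.getdata())
                   if p != q)
  print '%d differing pixels' % diff_count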

2013-04-22

Math objects on programming (1)

Abstract:

Using mathematical objects often makes a program simpler. This time I had such an experience, and I write it down here.

Mathematical object and programming

When testing a program, we often need to generate combinations of input parameter sets. One of the easiest methods to generate such combinations is nested loops.

In this article, I use pseudo code based on the Python language. I will provide the real implementation of the program in the appendix.

For example, we have following two parameter sets:

 data_size_list = [ 5, 64, 512, ]
 screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]

The following program can generate the combination of them:

  for d in data_size_list:
    for s in screen_resolution_list:
      print_comb(d, s) # output

This method is simple and straightforward; however, it is less flexible in some cases. For example, we may not know which sets are necessary to generate a combination at the time the program is written. To overcome this problem, we can use an algorithm based on the direct product to generate the combinations [1]. Let's assume there are \(k\) sets and we want to know all the combinations of these \(k\) sets' elements. We can define such a combination as follows:

  1. \(k = 0\), this means 0 sets. The direct product result is one empty list ([]).
  2. \(k \geq 1\), \begin{eqnarray*} A_1 \times \cdots \times A_k &=&\left\{(a,t)| a \in A_1, t \in A_2 \times \cdots \times A_k \right\}\end{eqnarray*}

The second condition means that if we have an element combination list from \(k-1\) sets (\(A_2 \times \cdots \times A_k\)), then we can add one more element \(a\) from the remaining set \(A_1\) to generate an element combination list from all \(k\) sets.

An implementation example is the following:

import copy

data_size_list = [ 5, 64, 512, ]
screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
all_list = [
  data_size_list,
  screen_resolution_list,
]
comb_list = []
c_tuple = [''] * len(all_list) 

def make_tuple(idx):
  if(idx >= len(all_list)):
    comb_list.append(
        copy.deepcopy(c_tuple))
  else:
    for i in range(len(all_list[idx])):
      c_tuple[idx] = all_list[idx][i]
      make_tuple(idx + 1)

def gen_combination_list():
    make_tuple(0)
    for i in comb_list:
        print i

The make_tuple() function is the implementation of the direct product.
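
(As an aside, which is not in the original article: Python's standard library also provides this mathematical object directly as itertools.product, available since Python 2.6.)

import itertools

# the direct product of all input sets,
# equivalent to calling make_tuple(0)
for t in itertools.product(*all_list):
  print list(t)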

The direct product is a mathematical object, and using it generalized the process of generating combinations. In the direct product code, the number of nested for-loops depends on the input data; in other words, the nesting depth is defined by the input data. We often change the test cases depending on the situation. The for-loop implementation needs a code change every time, while this direct product implementation doesn't, as the sketch below illustrates. This example may be too simple for you to see the necessity; however, my case needed this flexibility and it paid off.
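
For instance, adding a hypothetical third parameter set only touches the input data; make_tuple() itself stays unchanged:

# a hypothetical third parameter set
node_count_list = [ 1, 2, 4, 8, ]
all_list = [
  data_size_list,
  screen_resolution_list,
  node_count_list,
]
c_tuple = [''] * len(all_list)  # resize the working tuple
comb_list = []

gen_combination_list()  # now prints 3 * 2 * 4 = 24 tuples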

Next time, I will explain how I mistook the mathematical object in this program.

2013-04-04

My solution for the Google Drive hang at "One moment please"

Today I installed Google Drive in my Windows 7 environment to share files with my Linux machines. After signing in, the application window said "processing," and then it hung. There was a button saying "you must enable javascript". I pushed it, and then "One moment please..." appeared. After 5 minutes, I exited the program and tried again. It seems some security setting causes this problem.

My solution: set https://accounts.google.com as a trusted site.
Procedure:

  • Open the Control Panel
  • Go to Network and Internet
  • Go to Internet Options
  • Open the Security tab
  • Click Trusted sites
  • Click the "Sites" button
  • Copy & paste https://accounts.google.com into "Add this website to the zone" and click the Add button

Now it worked for me. But even after I removed this site again, it still worked. That puzzled me a bit...