2013-12-22

Here is an advertisement on the floor of the U-Bahn station Kurfürstendamm.

Advertisement on the floor of U-Bhf Kurfürstendamm 1

Advertisement on the floor of U-Bhf Kurfürstendamm 2

2013-12-07

2013-10-05

Pyramid (exponential) Power (2)

Here are the answers to my math questions.

A1. 

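Assume the two levels below Hitoshi are complete. Hitoshi then gets 25% of the purchases of the 40^2 people two levels below him, passes 10% of that up, and pays for his own purchase: 40^2 * 1000 * 0.25 * 0.9 - 1000 = 359000 Euro every month. Hitoshi is rich!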

A2.

In principle, Daniel gets the same. But wait, Daniel needs the 6th level of the pyramid. To complete the 6th level, we need:

    1 + 40 + 40^2 + 40^3 + 40^4 + 40^5 = 105025641 (around 100 million) > 2.5 million

There are not enough people in Berlin for Daniel. He loses his 1000 Euro every month, and Hitoshi loses Daniel as a friend. ;_;  Actually, the German population is around 80 million, so even if all the people in Germany joined, Daniel could not get the expected full payment. In fact, the next answer shows that Daniel has absolutely no chance of getting any Euro.

A3. 

Hitoshi needs a total of 5 levels.

    1 + 40 + 40^2 + 40^3 + 40^4 = 2625641 > 2.5 million

So Hitoshi's level needs a few more people than live in Berlin. Only if everyone in Berlin is convinced, and even babies and children pay 1000 Euro every month, can Hitoshi be rich. (Which is basically impossible.) This also means Daniel has no chance of getting any money.
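
The arithmetic of the three answers can be checked with a small Python sketch. The names (`BRANCHING`, `people_up_to`, `monthly_income`) are made up for this illustration, and it assumes the founder is level 1, the co-founders are level 2, and Hitoshi joins at level 3.

```python
# Hypothetical names; the numbers are taken from the question text.
BRANCHING = 40      # each member recruits 40 people
PURCHASE = 1000     # Euro every member pays per month
COMMISSION = 0.25   # share of the purchases two levels below
PASS_UP = 0.10      # share of the commission paid to the level above


def people_up_to(level: int) -> int:
    """Number of members when levels 1..level are completely filled."""
    return sum(BRANCHING ** k for k in range(level))


def monthly_income() -> float:
    """Income of a member whose two lower levels are complete."""
    gross = BRANCHING ** 2 * PURCHASE * COMMISSION
    return gross * (1 - PASS_UP) - PURCHASE


print(round(monthly_income()))  # A1: Hitoshi's income per month
print(people_up_to(5))          # A3: people needed for Hitoshi (level 3)
print(people_up_to(6))          # A2: people needed for Daniel (level 4)
```

Each additional level multiplies the number of required people by about 40, which is why Daniel, only one level below Hitoshi, already needs more people than live in Germany.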

A pyramid system is basically a monarchy. The king gets the money, and it is true that someone is the king, so you can always be shown proof that someone gets the money. But I think the question is whether ``you'' can be a king or not. Well, good luck with that. I think if Hitoshi sees more than 30 people already involved in this system, he has basically no chance in Berlin; he is at the slave level in that case.

Essay question: Why do some pyramid systems try to expand to foreign countries?

A picture taken near my company.

This place changes quickly....

Pyramid (exponential) Power (1)

While walking down a street, I was thinking about creating an exponent exercise for my math class. I sometimes just get an idea when I walk down a street. My main purpose is to show how fast an exponential can grow.

Hitoshi has no money, so he wants to join a pyramid system. The system asks him to buy something for 1,000 Euro every month from the company. But if he recruits 40 people for the level below him, and each of them recruits 40 people in turn, he gets a commission of 25% of the purchases of the people two levels below him. When he gets the money, he also needs to pay 10% of it to the level above him. The system is quite new in Berlin, so there are only the founder (level 1) and the co-founders (level 2), 41 people in total. Assume Berlin has 2.5 million people.

  1. How much money can Hitoshi get every month? Assume Hitoshi could even convince everyone in Berlin.
  2. Hitoshi asks his friend Daniel to join as one of his next-level people. How much money can Daniel get?
  3. How many people in Berlin must be convinced so that Hitoshi gets his money?

The answer will be in the next post.

2013-09-06

Ich suche einen Mann, der mich am 5.9 um 0:40 Uhr ...

A strange man was sitting at an S-Bahn station. His note said, "Ich suche nach einem Mann, der mich am 5.9 um 0:40 Uhr auf dem Ku'damm tätlich angegriffen hat und dann weggelaufen ist. Er rief, dass er alle Ausländer, Chinesen und Japaner hasse. Wenn er wirklich dir Zukunft von Deutschland (und der Welt) retten will, dann kann ich ihm einen Job anbieten. Aber nur für kurze Zeit. Sehe sundayresearch.eu".

My translation is "I am looking for a man who (physically) attacked me on Sept. 5th at 0:40 on Ku'damm and then ran away. He shouted that he hates all foreigners, Chinese, and Japanese. If he really wants to save the future of Germany (and the world), then I can offer him a job. But only for a short time. See sundayresearch.eu."

A man with an eyepatch.

His odd note.

2013-09-04

Passing command line options that have white spaces to a bash script: $* and $@

I always try to simplify the command line parser implementation, regardless of the programming language: C++, bash, ... To do that, I restrict the command line options to one regular form. All my arguments should have the form

    '-arg_key value'

Even when I want to specify a file name, my program needs an argument key, e.g., '-in_file input_filename'. This way, each command line argument has a key, so I can put all the command line options into a map. This makes the command line parser simple, and I usually don't need a getopt library. I also try to keep the command line interface small, which means fewer command line options, and use a config file that contains 'key = value' lines. This has two advantages: negative values are supported without being confused with command line options, and test cases are easy to reproduce.
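
As an illustration of the "all options go into a map" idea, here is a minimal Python sketch. The function name `parse_args` and the option names in the usage line are hypothetical and not taken from my real programs.

```python
def parse_args(argv: list[str]) -> dict[str, str]:
    """Collect strictly alternating '-arg_key value' pairs into a dict."""
    if len(argv) % 2 != 0:
        raise ValueError("every key needs exactly one value")
    options = {}
    for key, value in zip(argv[0::2], argv[1::2]):
        if not key.startswith("-"):
            raise ValueError(f"expected an option key, got {key!r}")
        options[key[1:]] = value  # strip the leading '-'
    return options


# Hypothetical option names, for illustration only.
print(parse_args(["-in_file", "input.txt", "-offset", "-10"]))
```

Because keys and values strictly alternate in this sketch, a value such as -10 is taken by position and is not mistaken for a key.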

However, this method still has a problem when a command line option value includes white space. If I could, I would use only config files, but in practice that is not always a solution.

Let me show you such command line options. For instance, I want to pass a vector to my command as follows.

    command -eye_position '0 0 -10' -up_vector '0 1 0'

If I run an executable by hand from a shell, this is fine, but I usually want to run a command from a shell script, for example for automated tests. Here is an implementation example, test_1.sh:
  
-- test_1.sh --
echo "call with two args, but the second one has spaces."
echo "./test_2.sh args0 'args1_1 args1_2 args1_3'"
./test_2.sh args0 'args1_1 args1_2 args1_3'
-- test_1.sh --
The script test_2.sh just shows how the command line options are passed.
-- test_2.sh --
echo "show the args with \$*."

for i in $*
do
    echo " $i"
done

echo "show the args with quoted \"\$*\"."

for i in "$*"
do
    echo " $i"
done

echo "show the args with \$@."

for i in $@
do
    echo " $i"
done

echo "show the args with quoted \"\$@\"."

for i in "$@"
do
    echo " $i"
done
-- test_2.sh --
What I want to get in test_2.sh is two arguments, as follows:
  • $1: args0
  • $2: args1_1 args1_2 args1_3
Please note that the second argument contains white spaces. The execution result of the script is as follows.
-- result --
call with two args, but the second one has spaces.
./test_2.sh args0 'args1_1 args1_2 args1_3'
show the args with $*.
 args0
 args1_1
 args1_2
 args1_3
show the args with quoted "$*".
 args0 args1_1 args1_2 args1_3
show the args with $@.
 args0
 args1_1
 args1_2
 args1_3
show the args with quoted "$@".
 args0
 args1_1 args1_2 args1_3
-- result --
If I use unquoted $* or $@, the white space splits the second argument, which makes four arguments. To prevent this, I quote the expansion in the script, but "$*" then becomes only one argument. "$@" is the one I wanted. This is the difference between $* and $@ in bash. It is a detail, and I think it is not necessary to know it. However, if you know it, I hope you are a bit happier than before.
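
The three behaviours can also be mimicked outside of bash. The following Python sketch is only an analogy, not how bash is implemented: it uses `shlex.split` to play the role of the calling shell, and the variable names are made up for this illustration.

```python
import shlex

# The calling shell turns this line into two arguments.
command_line = "args0 'args1_1 args1_2 args1_3'"
args = shlex.split(command_line)

unquoted = " ".join(args).split()  # like $* or $@: split again on white space
quoted_star = [" ".join(args)]     # like "$*": everything joined into one word
quoted_at = list(args)             # like "$@": each argument kept intact

print(len(unquoted), len(quoted_star), len(quoted_at))
```

The three counts correspond to the four, one, and two arguments seen in the result listing above.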

Myron Krueger's Amazing demo in 1988.

The video below is Myron Krueger's demo from 1988. It is one of the origins of virtual reality and interactive art. It includes a two-finger 'zoom in' gesture and moving a figure around by hand or finger gestures, which are common these days on smartphones and tablets (see around 4:30).

http://youtu.be/dmmxVA5xhuo?t=4m30s

But there is a somewhat strange side story. Last year (2012), Apple won a lawsuit against Samsung and was awarded 1 billion dollars. One of Apple's inventions in that case is this zoom-in effect with fingers.

http://www.ndtv.com/article/world/samsung-apple-lawsuit-penalty-of-1-billion-dollar-slashed-in-half-337277

As far as I have read, Apple invented this ``technology'' in 2005.

Even more strangely, someone invented this technology in 1993, and he won a lawsuit against Apple.
http://easthamptonstar.com/News/2013829/Springs%E2%80%99-Own-Beats-Apple

Myron Krueger's demo is magnificent in many ways. The demo itself is great. It also shows that these lawsuits have no meaning, and it is a case showing that the patent system isn't working.

2013-09-03

I just found that, for some reason, Google Translate doesn't like the number 8 and prefers 6.


2013-07-20

Malala Yousafzai: UN speech Subtitled (English, Japanese)

In October 2012, Malala Yousafzai was shot by the Taliban, who oppose her campaign for children's right to education. Here is her full speech on her 16th birthday, with subtitles in English and Japanese. Please use a browser that supports HTML5 and JavaScript (Firefox, Chrome, IE10, ...) to watch the video. If you have a problem watching the video, this page is another place to watch it.





2013-07-09

Election of the House of Councilors

Today I visited the Japanese embassy to vote in the Saninsen (the House of Councilors election). It seems the LDP will win hugely. The question is not whether they get over half, but whether they get 2/3 of the parliament. After the vote, they will restart more than 8 nuclear reactors. This time I felt a sign of the decline of Japan. It is not about the nuclear reactors, but about the mentality of the country.

Once, when the Japanese faced a disaster, people cooperated, invented new technology, and rebuilt the city, even a better one. Disasters were hard challenges, but the Japanese eventually handled each of them and used them to develop themselves to the next level. Sometimes the Japanese made an opportunity out of a disaster. This time they have not even been able to rebuild the towns in Fukushima, and they have no concrete plan for new energy. Even if I put myself in the position of using nuclear reactors, the plutonium thermal use plan has been badly delayed, there is no breeder reactor technology yet, there is no sign of practical fusion reactor technology, and there is no future for the light water reactor. Now we depend on the legacy of the past. Where did the strength of Japan go? When did the Japanese start behaving like a dying person who doesn't realize that death is coming and just extends life by a few more years? When? Since when have we not seen the future?

I say to myself that Japan is just a country; it is a small thing compared to the whole world. Even so, I don't want to see it perish. I didn't know that it is sad to see the death of a country which once was great. Especially the country I grew up in.

2013-06-26

Why does he need to run.


Edward Snowden is on the run. I wondered why, because the president admitted that what he said was correct. Why does he need to run?

The following is my thought simulation. You might find it disturbing, since there is no human factor in my simulation. That is why I usually don't show such things. But my motivation is that I don't want to see him harmed.

``There is no value in killing him,'' I thought. If someone or some organization killed him, what kind of effect would be expected?

I found two types of organizations which could see value in killing him.

One is the government, to avoid a domino effect. The government doesn't want to see other employees follow him. But this is obvious: if the government killed him, the world would blame the government.

The other is an organization that is against the government. People would suspect the first motivation. So someone who is against the government kills him and claims, ``The government did it. You see that the government is evil.''

Now the value of his life becomes more and more important. Even I can imagine these scenarios, so the government is again a possible player: the government kills him, tells my second story, and blames its opponents.

I now understand why he must run. But if he died now, the government would be the suspect. So I suggest to the government: you must protect him and keep him alive. Otherwise, any political opponent could take advantage.

2013-05-26

I believe that education is the only possible solution to make the world better. Would you like to join me? (In English)

This is an English translation of the last post.


Hi onigiri people,

How are you? Long time no see.

I am still thinking about how I can help the tsunami victims in Tohoku, although for most people this is an old story. If that is the case for you, you can skip the rest of my letter. You might have found a new challenging mission.

If you still care, welcome. I finally got an idea.

I read an article about children who were tsunami victims and have trouble catching up with their classes at school. I cannot stop thinking about how I can help them.

I believe that education is the only hope for solving real long-term problems. But how can I help? A few months ago, I found people who have a similar idea and have implemented a system to help. It's free, but they need volunteer help.

It's called Khan Academy: "Khan Academy is an organization on a mission. We're a not-for-profit with the goal of changing education for the better by providing a free world-class education for anyone anywhere." I find this website fantastic. I am taking some biology and mathematics courses on this site. Unfortunately, it is only in English. But we can translate the site as volunteers.

The site has millions of users. Wouldn't it be great if Japanese or German children could also use it? For me it is just sad if children cannot use the system only because of the language.

This work is steady, not a one-time party. But it has a long-term impact on the world. I only work 30 minutes a day, though I continue every day. If some of you join, the goal will be nearer.

The founder Salman Khan's talk, introduced by Bill Gates.
http://www.ted.com/talks/salman_khan_let_s_use_video_to_reinvent_education.html

An interview with teachers who use Khan Academy.
https://www.khanacademy.org/coach-res/KA-in-the-classroom/classroom-vision/v/why-use-ka

Here is how to join the translation (Japanese, German, ...).
  1. Visit https://crowdin.net/join and sign up for a Crowdin account.
  2. Under "Edit Profile" select your "Assistance Languages."
  3. Visit https://crowdin.net/project/khanacademy to select a language.
  4. Click the green "translate" button on the next page.
  5. Start translating!

I would be so happy if you just thought about the possibility of changing the world for the better.

Thanks a lot.

I believe that education is the only possible solution to make the world better. Would you like to join me?

This is a letter I sent today to my old friends. We once worked together for the Tohoku tsunami victims. Many thanks to RM for checking my German.


Hallo Onigiri Leute,

Wie geht es Euch? Lange nichts von Euch gehört.

Obwohl für die meisten eine alte Geschichte ist, denke ich noch immer darüber nach, wie ich den Tsunami-Opfer in Tohoku helfen kann. Wenn Du damit abgeschlossen hast und eine neue, herausfordernde Aufgabe gefunden hast, kannst du den Rest meiner Email überspringen.

Wenn Dich die Frage immer noch beschäftigt: herzlich Willkommen,  Ich habe eine Idee!

Vor mehr als einem Jahr habe ich einen Artikel über Kinder, die Tsunami Opfer geworden sind gelesen. Die Kinder haben das Problem, den Unterrichtsstoff in der Schule aufzuholden. Seitdem mache ich mir Gedanken darüber, wie ich ihnen helfen kann.

Ich glaube, dass allein Bildung langfristige Probleme lösen kann. Aber wie kann ich ihnen helfen? Vor ein paar Monaten habe ich Menschen gefunden, die eine ähnliche Idee verfolgen und die das notwendige System haben, Bildung zu verbrieten. Das alles ist kostenlos, aber sie brauchen Freiwilligenarbeit.

Das Projekt heißt 'Kahn Academy': "Khan Academy ist eine Organisation auf einer Mission. Wir sind eine Non-Profit Organisation. Unsere Ziel ist es, die Veränderung durch Schaffung einer freien Weltklasse-Ausbildung für alle, überall, ermöglichen wird."  Ich habe diese Webseite fantastisch gefunden. Jetzt nehme ich Biologie und Mathematik Unterricht auf dieser Website. Leider ist es nur auf Englisch. Aber wir können als Freiwillige auf Teile dieser Website übersetzen.

Mehr als Millionen Leute benutzen diese Website. Ist es nicht toll, wenn japanische oder deutsche Kinder dies auch tun könnten? Für mich ist es traurig, wenn Kinder diese Website nicht besuchen können, weil das System nur Englisch ist.

Die Arbeit ist nicht keine einmalige Sache, sondern beständig und hat so einen langfristigen Einfluss auf die ganze Welt. Ich arbeite nur 30 Minuten pro Tag für die Kahn Academy, dafür allerdings das jeden Tag. Wenn einige von euch dazukommen, wird das Ziel näher sein.

Hier der Vortrag des Gründer, Salman Kahn. Eingeführt von Bill Gates.
http://www.ted.com/talks/salman_khan_let_s_use_video_to_reinvent_education.html

Ein Interview von Lehrern, die den Kahn academy verwenden.
https://www.khanacademy.org/coach-res/KA-in-the-classroom/classroom-vision/v/why-use-ka

So kann man selbst Übersetzen (Japanisch, Deutsch, ...) und Beiträge damit Sprechern anderer Sprachen zugängig machen:
 1. Besuche https://crowdin.net/join und melden Sie sich für ein Konto Crowdin.
 2. Unter "Edit Profile" wähle "Assistance Languages."
 3. Besuche https://crowdin.net/project/khanacademy um eine Sprache auszuwählen.
 4. Klicke auf den grünen "translate" Knopf auf der nächsten Seite.
 5. Übersetzen!

Ich wäre glücklich, wenn Du über die Möglichkeit nachdächtest, die Welt zu verbessern.

Vielen Dank.

----

2013-05-22

Vision Summit 2013 (30.8-2.9)

Today I heard about the following summit. I have already registered.
If anyone comes to Berlin to join the conference, see you then!

http://www.visionsummit.org/events/308-192013.html


2013-05-21

Fascist hunter

I have a friend who is a writer. He is as unsuccessful as Kilgore Trout, but not at all famous like Kilgore Trout. The title of today's story is ``Fascist hunter.''


Umaya is a fascist hunter. A fascist hunter is a bounty hunter who catches people who are called ``real fascists.'' It is difficult to find a real fascist, since usually they are no different from normal citizens. Each country's government has trouble finding them and puts quite a high reward on them. The real fascists seem well organized. In Umaya's world, people need to pay a tax for being alive. If someone cannot pay the tax, that person is suspended, although everyone has the right to be alive at least one year out of every five. Umaya failed in his business once, and he was in a cold sleep machine for four years.

Umaya needed some money quickly. The hunter job is high risk, but also high return. Every hunter must have a think-support-unit put in his brain, because it is believed that the real fascists have a good brainwashing method. Many hunters never returned, and sometimes they are found among the real fascists. This think-support-unit whispers to the brain that ``the most important human rights are a person's freedom and the right to personal well-being,'' and the brain feels it. This is believed to work against the brainwashing of the real fascists.

Sometimes the real fascists are hard to distinguish from normal citizens. But this time there are victims. One is a senior manager of a huge energy company and the other is the chief researcher of a multinational agricultural biotechnology corporation.

Umaya is finally able to make contact with a doctor called Leuko. Umaya suspects that Leuko is a member of the real fascists. Umaya hides his identity and pretends to be interested in the real fascists. However, Leuko and his friends never show any proof that they are members of the real fascists. Leuko frequently visits public schools and voluntarily gives classes. Umaya once visits his class, since he suspects there is some kind of brainwashing in it. The children show interest in the class. Leuko teaches a biology class, including what a cell is and so on. He explains that when cells have grown too much, they restrict themselves so as not to grow anymore. Sometimes cells that are not so healthy commit suicide, which is called apoptosis. These suicidal cells make more room for the healthy cells. If every cell eats as it wants and grows as it wants, the healthy cells cannot survive anymore. Such cells are called cancer cells. Umaya cannot find any brainwashing activity in these classes.

Umaya tries to find a way to prove that Leuko is one of the real fascists. But one day he gets in too deep and is captured by the real fascists. His think-support-unit is removed. But Umaya is never brainwashed.

Leuko explains the activity of their organization. They are a non-profit company. Their mission is saving the human species. They believe that a person's freedom and the existence of humankind sometimes contradict each other. He gives an example: one person consumes energy for his own comfort, but if everyone consumes that amount of energy, the children of humankind have no energy left. The person's happiness endangers the existence of humankind. What is the difference between cancer cells and these people who claim freedom and well-being? Fascism started at the DNA level. It developed through DNA, cell, individual, family, tribe, nation, company, .... A person's well-being is a fascism too; it is just that its range of application is small.

They are the REAL fascists. Ordinary fascists care only about a specific group and ignore others. The real fascists care only about humankind as a whole. Therefore, they are against personal freedom and well-being; these are secondary for them. Old fascists think only of a very small group, like one country. For Leuko, they are not real fascists. The real fascists consider humankind regardless of countries, races, and religions. Leuko says the problem is when fascists think only of a small group. The extreme is a single person. The fascist who respects the freedom of a person is the worst fascist, worse than the fascists of the World War II era.

Leuko's organization is entering the next level of fascism. His fascism cares only about humankind, which is not global enough. The new wave of fascism cares about the earth, not just humankind. All living creatures, all life must be respected in the third level of fascism. One day, fascism will consider the whole universe, and it will be complete. Without his think-support-unit, Umaya doesn't know whether Leuko's fascism is evil or not.

``Welcome to the real fascism.''



I told Billy that the idea is interesting. Fascism is considered an absolute evil; sometimes it is taboo even to think about it. I think this ``without-thinking'' is dangerous. I found this aspect interesting.

But this story has an ending similar to Kilgore Trout's story ``Welcome to the doghouse''. I suggested that maybe he could remove or change the ending. He confessed that after he read ``Welcome to the doghouse'', he could not resist writing this ending. But he agreed that maybe he can remove the ending when he publishes the story.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

2013-05-05

Unselected man


I have a friend who is a writer. He is as unsuccessful as Kilgore Trout, but not at all famous like Kilgore Trout. I got permission from him to write one of his plots here. The title is ``Unselected man''.

Billy is a successful businessman. He is also a charming person. But he has never succeeded with a woman. Actually, his mother also kept some distance from him.
Success in business didn't bring him joy. He tried to find what would bring him joy: traveling around the world, trying out many hobbies... When he came back to his town, he was only interested in teaching children. He could not find anything other than education that can make the world better.
He quit his job. He put all his money into establishing a small school. He chased his dream and tried to find a partner who might share his life. But with no success. Whenever he had feelings for a woman, the woman gave the same answer: ``You are nice. I like you. But, it doesn't feel right.''
One day, he found a story about a polar bear in Berlin's zoo. Knut was a polar bear who was abandoned by his mother when he was born. Later he lived together with a few female bears, but they never mated. He died, and people found out that he had a brain disease. One hypothesis is that this may be the reason he was abandoned and never mated. The mother and the female bears just felt it.
Billy thought of one thing. He went to a hospital, and they found an abnormality in his brain. He thought that this was the reason all women felt something was not right with him. His dream was to have children and teach them about the world, but he believed it would never happen. One day, he sealed his room with black tape and died.
The selection in the title is natural selection: the man unselected by natural selection. Maybe I could not summarize his story well, but the main character, Billy, suffered from living without being accepted. His brain problem was very special in the story: any woman could feel it. That crystallized into the words, ``It doesn't feel right.'' He tried to live for children; however, he was not selected by nature. I felt that this is interesting as a Gedankenexperiment, but not as a story.

I asked Billy (the author), ``Is being alive sometimes not good?'' He answered, ``If one suffers because of living, I would say no. I understand some depression patients who didn't feel sad, but just didn't feel joy or interest in the world. That is hard. How can I tell them to live?'' I was surprised, since I know he suffered from such a problem. I tried to continue, ``I think a life cannot be taken away by a person.'' But I could not say any more. He said, ``I agree this ending is not good. It is not really published yet, and I wish I could change the ending. But I cannot see any other ending so far.'' ``It's too hard to live if a person is never accepted. I can easily say you should live if it is just words, but I don't know what the end of this story is.''
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Memory leak


I have a friend who is a writer. But he has only sold 20 books so far. He is as unsuccessful as Kilgore Trout, but not at all famous like Kilgore Trout. I got permission from him to write his plot here. Otherwise, nobody would know him at all. The title of the story is ``Memory leak''.
The main character of the story, ``Fred'', is a researcher of ``resurrection''. It is generally called ``resurrection'', but he found that it is more like sharing memory with people of the past. Fred found that there are resurrected people who have the memories of more than two people, though in many cases the contents of the memory are corrupted.
Fred realized that he shared memory with his six-year-old daughter. Memory sharing is not necessarily with a person of the past. One day, he lost his wife and his daughter in a space travel accident. He shared a piece of memory of his daughter's last moment. That is physically nonsense, and maybe not true. But he has a hypothesis about this phenomenon: this world is not a physical world. This world is a simulation. There is some kind of bug in the simulation, which is observed as a memory leak. He was in sorrow. Other people thought he had gone crazy. But he tries to exploit this bug to get back his wife and his daughter.
There is a lot of SF about virtual worlds, including how magic works and how a time machine works, but in his story, resurrection is caused by a simulation bug. I found that interesting.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

2013-04-30

Thomasbrötchen


This morning Kerstin asked me: ``Can you go to the bakery and buy 10 Franzbrötchen?''
I answered: ``OK. But what are they called again?''
``Franzbrötchen, like the name, Franz.''
``OK.''

I walked down the street, saying the name again and again so that I would not forget it.
``Franzbrötchen, Franz--bröt--chen, one, two, three. Franzbrötchen, Franz--bröt--chen, one, two, three...''
Around the corner I met a friend.
``Hello Hitoshi. Happy Easter.''
``Hello Thomas. Happy Easter.''

I walked on.

``Thomasbrötchen, Thomas--bröt--chen, one, two, three. Thomas--brötchen.''
In the bakery I said to the saleswoman: ``Hello, good day. I would like 10 Thomasbrötchen, please.''
``Pardon?''
``Thomasbrötchen, 10 of them.''
She asked her husband.
``Franz, what are Thomasbrötchen?''
``Oh, right,'' I said, ``Franzbrötchen, 10 of them, please.''
Franzbrötchen
(There is a similar old Japanese folk tale, Dango-dokkosiho.)

2013-04-26

Math objects on programming (2)

In the last article, I showed a simple enumeration generator. But I needed a bit more flexibility. In my job, I use GPUs. A GPU is a fast processing unit, but its available memory is one order of magnitude smaller than that of a decent workstation. (E.g., a GPU has 2GB to 6GB of memory, while a decent workstation can have 24GB to 256GB of main memory.) In this pseudo code example, the unit of memory size is GB. 64GB or 512GB is too much for the state-of-the-art GPUs of 2013, and 512GB is too much even for the best workstation. Therefore, our product uses a cluster, in which many workstations cooperate to do a single job. Almost every customer wants to see how our product scales with the number of workstations, because they have their own needs and want to know how many workstations and how many GPUs are needed for their task. Therefore, we need to demonstrate how the performance changes depending on the number of nodes. Here one more parameter, the number of nodes, is added as follows.

 data_size_list = [ 5, 64, 512, ]
 screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
 node_count_list = [ 
   1, 2, 4, 8, 16, 32, 64, ]

However, most customers immediately understand that if the memory cannot hold the data, the system becomes extremely slow. Therefore, the performance graph only needs the node counts that are large enough to hold the input data. When we need to generate such parameter combinations, the simple idea is to filter out the unnecessary data points. Here is a for-loop implementation example.

comb_list = []
for d in data_size_list:
  for s in screen_resolution_list:
    for n in node_count_list:
      # FILTER: filter some cases.
      # assume one node can handle 
      # 64G, but not more
      if (n * 64 < d):
        continue
      item_list = [
        d, s, n,
      ]
      comb_list.append(item_list)

Then I thought that I would like to apply the same idea to the direct product. I had actually started from the for-loop implementation, so this filtering idea seemed simple. But I figured out that I would need a filter function for each input parameter set. I started implementing such code, but I felt it looked way too complicated. Most of the filter functions do nothing in this case. Moreover, when I extended the functionality, I found the change too complex to handle. I believe it would work, but the implementation is too complex for me. It is hard to explain my ``too complex'' feeling. It is this feeling: ``Why does this simple problem need such a complex implementation?'' I usually have this feeling when I have not understood the problem itself well.
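
The generate-and-test idea on a direct product can be sketched as follows. This is only an illustration, not the code I abandoned: it uses `itertools.product` from the Python standard library, and the names `fits` and `GB_PER_NODE` are hypothetical. The 64GB-per-node limit is the assumption from the comment in the for-loop example.

```python
import itertools

data_size_list = [5, 64, 512]
screen_resolution_list = ["2560x1440", "3840x2160"]
node_count_list = [1, 2, 4, 8, 16, 32, 64]

GB_PER_NODE = 64  # assumption: one node can handle 64GB, but not more


def fits(data_size, resolution, node_count):
    """Test function: can node_count nodes hold the data?"""
    return node_count * GB_PER_NODE >= data_size


# Generate every candidate first, then throw away the ones that fail the test.
candidates = list(itertools.product(
    data_size_list, screen_resolution_list, node_count_list))
comb_list = [list(c) for c in candidates if fits(*c)]

print(len(candidates), len(comb_list))
```

Some of the generated candidates are discarded again, and with one such test function per parameter set this approach quickly becomes hard to follow.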

I re-thought the problem, tried to formalize it, and asked myself: what is this problem? The number of nodes I need depends on the data size. Why did I use a filter function? This is a generate-and-test algorithm. But do I really need the test function after generating the combination candidates? At that point, I figured it out. This is just a map from data size to the number of nodes; I don't need the test function. I can have it as an input. The mathematical object I should use here is a `map' instead of a `filter'. My final implementation was the following.

import copy

data_size_list = [ 5, 64, 512, ]
screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
# map: data size -> node counts to measure
data_size_node_count_map = {
  5: [1, 2, 4, 8, 16, 32, 64],
  64: [8, 16, 32, 64],
  512: [32, 64],
}
all_list = [
  data_size_list,
  screen_resolution_list,
]
comb_list = []
c_tuple = [''] * len(all_list) 

# recursively enumerate the direct product of all_list
def make_tuple(idx):
  if(idx >= len(all_list)):
    comb_list.append(
        copy.deepcopy(c_tuple))
  else:
    for i in range(len(all_list[idx])):
      c_tuple[idx] = all_list[idx][i]
      make_tuple(idx + 1)

def gen_combination_list():
    make_tuple(0)
    # gen_with_node() is not part of this listing
    for i in comb_list:
        gen_with_node(i)

The gen_with_node() function generates the combination lists by using data_size_node_count_map. This is more efficient, since there is no generate-and-test (which includes discarding), and simpler, since the input data doesn't need filter functors. I sometimes think that some people's code misuses mathematical objects. I sometimes criticize code from people who have high implementation ability but give less consideration to the mathematical objects.
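
Since gen_with_node() is not listed above, here is one possible, hypothetical version as a self-contained sketch. It is an assumption about what such a function could look like, not the original code, and for brevity it builds the direct product with `itertools.product` instead of the recursive make_tuple().

```python
import itertools

data_size_list = [5, 64, 512]
screen_resolution_list = ["2560x1440", "3840x2160"]
data_size_node_count_map = {
    5: [1, 2, 4, 8, 16, 32, 64],
    64: [8, 16, 32, 64],
    512: [32, 64],
}


def gen_with_node(base_tuple):
    """Hypothetical: expand (data_size, resolution) with its node counts."""
    data_size = base_tuple[0]
    return [list(base_tuple) + [n]
            for n in data_size_node_count_map[data_size]]


def gen_combination_list():
    """Direct product of the two lists, then a map lookup per tuple."""
    result = []
    for base in itertools.product(data_size_list, screen_resolution_list):
        result.extend(gen_with_node(base))
    return result


comb_list = gen_combination_list()
print(len(comb_list))
print(comb_list[0])
```

Every generated combination is used; nothing is generated and then discarded.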

I used to respect these people because of their high implementation ability, and I used to try to write such code. However, I often found that I was able to write a simple program if I thoroughly thought through the mathematical meaning of the program. At first I was surprised that my program often had comparable speed or was sometimes faster than theirs. Moreover, my code tended to be simple and to have fewer bugs. This was a surprising discovery, and since then I tend to study the mathematical aspects of a program more.

This time, I mistook which mathematical object to use in my program. I should review myself first, before criticizing others.

By the way, why is it usually better to think about mathematical objects in a program? Are there any reasons? I have two hypotheses. One is that we can use all the results of mathematical research, which has a thousand years of history. Many geniuses thought through many problems. They classified the problems and found many patterns to solve them. I am not surprised that these geniuses' ideas are better than my ideas, which are usually made up in five minutes. Actually, I cannot remember a case where my own idea was better than the combination of these geniuses' ideas. The other reason is that many programming languages support these mathematical objects. Many language designers also found that mathematical objects are useful for solving problems, so these objects are often supported. When the language supports a mathematical object, using it is usually efficient and simple. In this particular problem, I use a direct product and a map. For instance, a map is supported in Python as `dict', Lisp has `mapcar', Java has Map in java.util, and the C++ STL has the `map' class template.

This was a simple problem, but it turned out to be interesting to me.

References

[1] Hisao Tamaki, ``Introduction to probability theory for the information scientist (in Japanese)'', Saiensu, ISBN4-7819-1012-2, 2002

Implementation example code

http://sundayresearch.de/hitoshi/sundayresearch/programming/Python/20130323_math_object/enum_code_20130323.zip

2013-04-25

Python PIL experiment (an image comparison tool) continued


PIL and numpy

When I ran this program on my data files, the processing time was around 6 seconds and the memory consumption was 230MB for a 1024x1024 image. When I processed images with a resolution of 3840x2160, it took 263 seconds and consumed 2.3GB of memory. The difference between these resolutions is only about eight times in the number of pixels, but the processing time increased more than 40 times. My program uses only three buffers for processing, so my first estimate of the minimal memory size was 10MB for the 1024x1024 resolution and 72MB for the 3840x2160 resolution. However, `top' reported up to 30 times more memory.
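The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 8-bit RGB data (three bytes per pixel) and three full-size buffers; the function name estimate_bytes is only for illustration.

```python
def estimate_bytes(width, height, channels=3, buffers=3):
    # Assumption: one byte per channel per pixel, for each working buffer.
    return width * height * channels * buffers

MIB = 1024.0 * 1024.0
small = estimate_bytes(1024, 1024)
large = estimate_bytes(3840, 2160)
pixel_ratio = (3840 * 2160) / float(1024 * 1024)

print(small / MIB, large / MIB, pixel_ratio)
```

The results are roughly 9 MiB and 71 MiB with a pixel ratio just under 8, which matches the rounded figures in the text.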

When I profiled the program, most of the time was consumed by the tuple construction (RGB values) and the abs function. Therefore, I tried to use numpy to vectorize this code. The table below shows the result. My test environment was an Intel Core i7-2720 2.20GHz with Linux (Kubuntu 12.10, kernel 3.5.0-27) and Python 2.7.

+-----------+--------------------------------------+
| image res |   1024x1024      |    3840x2160      |
+-----------+-------+----------+--------+----------+
|           |  mem  | time     |  mem   | time     |
+-----------+-------+----------+--------+----------+
| native    | 230MB | 6.0  sec | 2300MB | 263  sec | 
| numpy     | 110MB | 0.21 sec | 320MB  | 1.18 sec |
+-----------+-------+----------+--------+----------+

The performance improved by a factor of about 30 up to about 200. The memory consumption was reduced to between 50% and 15% of the original. Actually, my first numpy implementation was only twice as fast, so I was disappointed. After profiling, I found that the sum function spent most of the time. I used the sum function to count the non-zero elements in the array. This sum is Python's built-in function, and it can access a numpy array. However, I suspect the built-in sum accesses each element and returns it to the Python environment one at a time. When I replaced this sum with numpy.sum, numpy.sum took almost no time, and I achieved the 200 times better performance. This is pretty much like MATLAB programming. (numpy feels like a Python counterpart of MATLAB: it is similar not only in syntax, but also in how to get performance.)
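The difference between the two sum functions can be tried on a toy case. This is a minimal sketch, assuming two 8-bit RGB images already available as numpy arrays of shape (height, width, 3); the array contents are synthetic and the variable names are illustrative, not taken from ImgCompNumpy.py.

```python
import numpy as np

# Two small synthetic RGB images (assumed 8-bit, height x width x 3).
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = a.copy()
b[1, 2] = (10, 0, 0)   # change one pixel
b[3, 0] = (0, 5, 7)    # change another pixel

# Vectorized absolute difference; widen to int16 first so that
# the uint8 subtraction does not wrap around.
diff = np.abs(a.astype(np.int16) - b.astype(np.int16))

# A pixel differs if any of its channels differs.
changed = diff.any(axis=2)

slow_count = sum(changed.ravel())   # built-in sum: iterates element by element in Python
fast_count = int(np.sum(changed))   # numpy.sum: the loop runs inside numpy

print(slow_count, fast_count)
```

Both calls return the same count; only the place where the loop runs is different, and on large arrays that is where the time goes.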

ImgCompNumpy.py code

2013-04-24

Python PIL experiment (an image comparison tool)


Abstract:
Writing an image comparison tool with Python PIL.

Python PIL module

The Python Imaging Library (PIL) is a useful Python module for processing image files. This time I have the following situation:

  • I have files in different image file formats,
  • but the contents must be the same.

For example, I wrote an image generation tool and I want to test it. I compress the reference images, but my program produces images in a non-compressed file format. I could use the convert (ImageMagick) tool, but this time I just wanted to try a new tool. You can find my image comparison tool here.

2013-04-22

Math objects on programming (1)

Abstract:

Using mathematical objects often makes a program simpler. This time I had such an experience, so I write it down here.

Mathematical object and programming

In a program test, we often need to generate combinations of input parameter sets. One of the easiest methods to generate the combinations is to use nested loops.

In this article, I use pseudo code based on the Python language. I will provide the real implementation of the program in the appendix.

For example, we have the following two parameter sets:

 data_size_list = [ 5, 64, 512, ]
 screen_resolution_list = [ 
   '2560x1440', '3840x2160', ].

The following program can generate the combination of them:

  for d in data_size_list:
    for s in screen_resolution_list:
      print_comb(d, s) # output

This method is simple and straightforward, but it is less flexible in some cases. For example, we may not know which sets are needed to generate the combinations when the program is written. To overcome this problem, we can use the direct product to generate the combinations [1]. Let's assume there are \(k\) sets and we want to know all the combinations of the elements of these \(k\) sets. We can define such combinations as follows:

  1. \(k = 0\), this means 0 sets. The direct product result is one empty list ([]).
  2. \(k \geq 1\), \begin{eqnarray*} A_1 \times \cdots \times A_k &=&\left\{(a,t)| a \in A_1, t \in A_2 \times \cdots \times A_k \right\}\end{eqnarray*}

The second condition means that if we have the list of element combinations from the \(k-1\) sets \(A_2, \ldots, A_k\), then we can prepend one more element \(a\) from \(A_1\) to each of them to generate the list of element combinations from the \(k\) sets.

An implementation example is the following:

import copy

data_size_list = [ 5, 64, 512, ]
screen_resolution_list = [ 
   '2560x1440', '3840x2160', ]
all_list = [
  data_size_list,
  screen_resolution_list,
]
comb_list = []
c_tuple = [''] * len(all_list) 

def make_tuple(idx):
  if(idx >= len(all_list)):
    comb_list.append(
        copy.deepcopy(c_tuple))
  else:
    for i in range(len(all_list[idx])):
      c_tuple[idx] = all_list[idx][i]
      make_tuple(idx + 1)

def gen_combination_list():
    make_tuple(0)
    for i in comb_list:
        print(i)

The make_tuple() function is the implementation of the direct product.
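The recursive definition of the direct product can also be written almost literally. This is a minimal sketch, not the code from the example archive; the function name direct_product is an assumption, and the standard library function itertools.product is used only as a cross-check.

```python
import itertools

def direct_product(sets):
    """Return A_1 x ... x A_k as a list of tuples, following the recursive definition."""
    if not sets:                      # k = 0: one empty combination
        return [()]
    rest = direct_product(sets[1:])   # A_2 x ... x A_k
    # Prepend each element a of A_1 to each combination t of the rest.
    return [(a,) + t for a in sets[0] for t in rest]

all_list = [
    [5, 64, 512],
    ['2560x1440', '3840x2160'],
]
combos = direct_product(all_list)
print(len(combos))
print(combos == list(itertools.product(*all_list)))
```

Adding a third parameter set only means appending one more list to all_list; the function itself stays unchanged.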

The direct product is a mathematical object. With it, I generalized the process of generating combinations. In the direct product code, the depth of the loop nesting depends on the input data; that is, the number of nested loops is defined by the input data. We often change the test cases depending on the situation. The for-loop implementation requires a code change every time, whereas the direct product implementation doesn't. This example is so simple that you might not see the necessity, but my case needed the flexibility and this paid off.

Next time, I will explain how I mistook the mathematical object in this program.

2013-04-04

My solution of Google drive hang up at "One moment please"

Today I installed Google Drive in my Windows 7 environment to share files with my Linux machines. After I signed in, the application window said "processing," and then it hung. There was a button labeled "you must enable javascript". I pushed it and got "One moment please..." After 5 minutes, I exited the program and tried it again. It seems some security setting causes this problem.

My solution: set https://accounts.google.com as a trusted site.
Procedure:

  • Open the control panel
  • Go to Network and Internet
  • Go to Internet Options
  • Open Security Tab
  • Click Trusted sites
  • Click the "Sites" button
  • Copy & paste https://accounts.google.com to "Add this website to the zone" and click the Add button
Now it worked for me. But when I removed this site again, it still worked. That puzzled me a bit...

2013-03-29

Hasenschule: A girl who invented a matrix.


I have fun teaching C. Six months ago, she had problems with one-digit addition and subtraction, and she often cried in my class. But now she can add and subtract three-digit numbers.

One day, she was solving a question shown in Figure 1.
Figure 1. The question.
I expected the answer as shown in Figure 2. If she could do it, I would be happy.
Figure 2. Expected answer.
While I looked at the other students, she worked on the problem. When I walked by her, what I saw was, my god, a matrix! (Figure 3)
Figure 3. A matrix is invented.
I asked her, ``Wait a moment! Did you do this alone?'' She answered, ``Yes, I did. Is it not correct?''

It doesn't matter. The correctness of the calculation doesn't matter. I was astonished that she organized the answer like this. I could see two vectors in the original question, but both sets of numbers are written in the horizontal direction. She rearranged one of the sets of horizontal numbers into the vertical direction and put the operator `+' at the top left. Then she filled in the matrix.

This is a reasonable notation. Figure 4 shows how much repetition there is in Figure 2. Each number is repeated three times, and `+', `-', and `=' are repeated nine times.
Figure 4. Unnecessary duplication.
In the upper matrix in Figure 3, the operator `+' shows up only once, at the top left, because all the operators are `+'. But she did not remove it, since the bottom matrix uses a different operator, `-'. She didn't write `=' at all. This is great: if she had removed the `+' or `-' operator, we could not see which operator is used. But `=' is always used, so removing it causes no misunderstanding. She wrote this in a minimal and sufficient way.

Moreover, there is no repeated number in Figure 3. Again she wrote minimal and sufficient information. People may make mistakes when they copy anything by hand. Mathematical notation has been developed for a long time, and I think the matrix form is one of the most compact and sophisticated forms. If you use spreadsheet software (e.g., Excel), you know a spreadsheet is a powerful notation. It is much easier than the form in Figure 2.

She might have seen this form before. But even so, she knows how to use it and when to use it. It seemed so natural to her to organize this calculation in this way. I was moved. I usually don't give a gold point, but this time I felt the gold point was for this. I told her she did great. She might not understand this, since it is just natural to her. I hope one day she will understand more deeply what she did today.

One thing I am afraid of is that someone will think this is too advanced for her and say that the answer should be like Figure 2.

Today, I was surprised and moved. I was happy to see children do mathematics so freely.

The PDF version is here.

2013-03-09

Hasenschule: Was bedeutet das? Bitte erklären das mir. What does it mean? Please explain me that. (3)

Case A.
A. was studying geometry. At that time, a Rechtschreibung (spelling) teacher, Ms M, was watching her. The question was how many cross points (Schnittpunkte) the three lines (Geraden) in the figure have. In the question's figure, the cross points are emphasized, but she could not answer the question. Ms M asked me to help her.

As usual, I asked her (A.), ``Could you please explain to me what a line is? (Bitte erklären mir was ist Gerade.)'' She answered, ``A line is a line. (Gerade ist Gerade.)'' Well, that's true, but it carries no information.

``How did the school teach you? Does a line have an end, or does a line have no end?'' ``A line has no end.'' I see, so I knew she had learned the difference between a line, a half line (ray), and a segment at her school. ``OK, then what is a Schnittpunkt?'' I actually didn't know what a Schnittpunkt is. She answered, ``I don't know.'' So we asked another teacher what a Schnittpunkt is. It is a cross point of two lines. This actually solved her problem. I think I am a better teacher when I teach in German: since my German is not good, I first need to find out what the problem is, so I ask my student the meaning of the question. I found that many students just don't know what the question means, and that is why many teachers fail to teach. If a student doesn't know the meaning of the question, teaching the answer has no meaning.

Now back to the problem: the question was clear. Then she correctly classified all the cases of 0, 1, 2, and 3 Schnittpunkte.

I asked her what her mother tongue is. She uses Spanish and English at home. I should remember to always check first whether the question is clear or not. I hope that soon she can figure out by herself what she didn't understand.

Today, I asked my students, ``What does this mean?'' Some students think I don't understand math questions. Actually, they are right: I often don't understand the problem. In Hasenschule, when a student gets better, she/he usually gets better at both German and math.

A few days ago, coincidentally, only A. took the math course, but another teacher was there. So I just watched how she learned from the other teacher. She was learning calendar math. I saw the word ``Schaltjahr'', whose meaning I didn't know. So I asked her what a ``Schaltjahr'' is. She explained it to me well. (Schaltjahr is a leap year in English.) I asked her why it is called ``Schaltjahr''. She didn't know, and the other teacher didn't know either. By the way, why a leap year is called a leap year is an interesting question. The Japanese word for leap year is ``閏年''; the character shows `a king is behind the gate', since on that day the king doesn't do official work. My next question was why such a strange year exists. However, the other teacher seemed to want to continue teaching how to calculate the days, so I didn't have a chance to tell that story. Maybe another time.

Hasenschule: Was bedeutet das? Bitte erklären das mir. What does it mean? Please explain me that. (2)

Case S.

S. was solving a multiplication problem. One piece of black bread costs 2.9 Euro. How much does each Anzahl (quantity) cost: 2, 4, 6, and 8? Figure 1 shows the problem.
Figure 1. Case S. question.
She answered the first question, the Anzahl (quantity) 2 case, as 5.8 Euro. (As shown in the figure, some European countries including Germany use the comma as the decimal point. In this text I use the period as the decimal point.) However, for the next question, the quantity 4 case, she calculated 2.9 x 5.8. I asked her why she did that. (In Figure 2, you can see the trace of that.)

She believed she should do that and she explained something. However, I didn't understand it, so I said, ``I don't understand your explanation.'' It turned out she also didn't know why she did that. So I drew Figure 2, and then I explained that if we have four pieces of bread, 4 x 2.9 would be the answer.
Figure 2. How to calculate the price?
First she fixed my figure by putting a shadow on the left side of the bread, so that it looks more realistic.

However, she said she didn't know what to do for the quantity six case. I was puzzled why she couldn't. I asked her to write down the question as a normal sentence, since she might not know what the question is. Indeed, she didn't know what the question is. I told her that the question is that I want to buy four pieces of 2.9 Euro bread, as seen in Figure 2. She understood what this means, but she didn't understand why this is relevant to the problem.

I asked her about all the related words. ``What is black bread?'' She knew it. 2.9 Euro? OK. ``What is quantity? (Was ist die Anzahl?)'' It took a while, but she answered, ``I don't know. (Ich weiss nicht.)'' I see! I told her, ``I think this means `how many', but actually I also don't know this German word, so let's ask another teacher.'' My guess was correct, and she said, ``Ah, you mean how many pieces (Wie viel Stück).''

Then she solved the 6 and 8 cases easily. I always have fun finding what they don't understand. They usually don't know themselves what they don't understand. I asked her what her mother tongue is. She talks with her father in German and with her mother in Turkish. However, I don't see much of a problem in that case.

When I was a high school student, the students were divided into a literature course and a science course. Japanese and English were important in the literature course, and mathematics and science were important in the science course. I didn't understand this classification, because I needed Japanese and English to learn mathematics and science. Without Japanese and English I cannot learn any mathematics or science. I cannot think anything without language. So my favorite classes were Japanese and mathematics. I am not sure whether this classification still exists in Japan. I am more confident now that language is essential. I learned the word Anzahl while teaching math with S. last week.

By the way, in Japan the price of four pieces of bread should be calculated as 2.9 x 4, and 4 x 2.9 is sometimes marked as incorrect. (Asahi.com's article) In Japanese, ``I bought bread, four pieces (パンを4つ買いました)'' is the natural word order, so maybe this is reflected. But in English or in German, four pieces of bread (vier Stück) is also natural. In Japanese, we can also say it in the same order (4つのパンを買いました). I think this excessive restriction does harm later, for two reasons: 1. later, when a student learns algebra, a constant factor multiplying x is written ax instead of xa, so 2.9 x 4 becomes wrong without any reason explained; 2. ax is the international standard in math, and in this global age, teaching that the international standard is wrong does not sound like a good idea. For these two reasons, I think we should not mark 4 x 2.9 as wrong.

Hasenschule: Was bedeutet das? Bitte erklären das mir. What does it mean? Please explain me that. (1)

When my students ask me a question, I usually answer the following:
``Was bedeutet das? Bitte erklären das mir. (What does it mean? Could you please explain that to me?)''
I continue as:
``Mathe ist eine Sprache. Es gibt eine Bedeutung. (Math is a language. There is usually some meaning.)''
When I ask my students to explain the meaning of the question, they sometimes answer, ``You are the teacher, you explain it to me.'' Well, that's true. But I want to know whether they understand the question. I also want to teach them how to explain something. Therefore, I ask them, ``What does the question mean?'', ``Is it true?'', ``Please explain why.''

Sometimes students cried, saying, ``You didn't teach me the answer.'' or ``You didn't help me. Help me, please.'' I was thinking, ``The answer is not so important. I want you to learn how to learn by yourself. This is practice. I hope that soon you won't need my help. I want you to practice solving new problems. In the future, you will confront a totally new problem that no human has ever met, and you will need to solve it. I want to help you prepare for that time, because I cannot help you then.'' However, I am also still learning how to solve a new problem. I cannot really teach it, since I still don't know it well. Therefore, I said, ``Please don't cry. You can understand if you think slowly; take as much time as you need. If you cannot do it today, there is tomorrow. The answer is not so important. Understanding is the most important.'' By the way, in German the word ``correct'' is ``richtig'' and ``important'' is ``wichtig''. I cannot say ``Richtig ist nicht so wichtig.'' well, since ``r'' and ``w'' are difficult for Japanese speakers to pronounce. This makes them laugh.

Five months ago, one student was always crying and her grade was always 5. Last month she got a 2 in math. I am so happy for her. (`1' is the highest grade in Germany.)

In the following articles, I would like to tell two stories.

2013-02-25

Web hosting story: Could you google your company's advertisement?


This is a story of web hosting on an Internet provider. Unfortunately, it didn't work in my case.

I use Alice since you can quit it within a few months instead of being bound to a two-year contract. I signed up for ``comfort'', which has better service on the same hardware for an extra 15 Euro/month. I liked it since I never needed to wait for the service operator, and the quality of the operators was good. That was around 2009. After that, O2 bought Alice.

This time, I could not get the ftp connection to work. I could connect to the server, but the connection only lasted a few seconds. It was unstable. I changed the client software, I looked in the forum, I changed the operating system, I changed to another connection point... I spent around eight hours over three days.

I called the service, but it turned out the operator didn't know what ftp was. Another operator took over, but he also seemed to have no clue about what I was asking. He told me that the night shift could not handle the request and that they would call back in the morning before 10. There was no call back, so I called them again.

Again the operator seemed to have no idea about ftp. I asked the operator, ``Koennen Sie googlen ``O2 ftp homepage domain''? Dann Sie sehen Ihre Werbung. Das funktioniert nicht.'' (Could you please search Google for ``O2 ftp homepage domain''? Then your company's advertisement about FTP shows up. That doesn't work.) Finally, the operator found someone who knew about their FTP service. The answer was that the system was temporarily down, please be patient and try again later. I asked when it would be up again, and she answered maybe in half an hour. By the way, this is O2's page about ftp.

http://hilfe.o2online.de/t5/o2online-Login/o2-Mail-DSL-Festnetz-Homepage-und-Domain/ta-p/273952

Later that day I tried again, but it was the same. I felt stupid; I paid 15 Euro/month extra to avoid this kind of problem. I gave up and bought Strato's web hosting service. It was around 50 Euro/year, and it worked immediately as expected. Actually it is much better, since I have an ssh connection now. Paying this 50 Euro to Strato saves me money.

I found that a cheaper service is actually expensive. Once I get into trouble, so much time is wasted, and the trouble happens again and again. Think about the time. In 8 hours, I can read a book, watch a movie, or do volunteer work. What you can save is usually just 10 Euro/month. That's not saving money.

For example, suppose I buy a cheaper monitor. Maybe a good monitor is 200 Euro more expensive than the cheap one. But if I tire easily or my eyesight gets worse, 200 Euro is nothing. I bought a 150 Euro keyboard. If I got tendinitis, 150 Euro would be nothing. Maybe I have become old; I have started to pay for quality.

Old Japanese sayings about this are ``Nothing is more expensive than free.'' and ``If you buy cheap, you just lose your money.'' I find them quite true.

Does your company do outsourcing like this? In many cases, that's more expensive. People don't count people's time, which is actually expensive. Before your IT was outsourced, a problem was solved in a day. Now your IT is somewhere outside the country, and setting up your computer takes a week if you are lucky. The company thinks this saves money, but there is a hidden cost. The person who asked for the fix cannot work for a week. That is usually much more expensive than what the cheaper, lower-quality job saves.

http://3d.xkcd.com/806/

2013-01-03

Authors in a Markov matrix Part 2 (11): Appendix


Appendix A: Unicode and Python 2.7.x


This time I developed python programs with python 2.7.3. Handling Unicode was needed to process web pages, not only Japanese and German web pages but also English pages, because some English authors' names have accented characters. In the early development stage, I was bothered by UnicodeDecodeError and UnicodeEncodeError exceptions. Here I explain what they are, why they are raised, and how to handle them.

How does Unicode encode characters?


As far as I understand, Unicode uses two maps to encode characters, although this depends on how you look at the coding system. I hadn't known this until I worked on this research. My understanding was that there are many kinds of Unicode, like UTF-8, UTF-16, and UTF-32. But this was a misunderstanding. Unicode is a coding system that defines how characters are represented as numbers, and UTF-8 is one way to encode those numbers as bytes. UTF-8 is one of the mapping methods, or transformation formats; UTF-8 itself is not Unicode (the Universal Character Set). This is cumbersome.

  • Unicode: a map from number to character
  • UTF-X:   a map from Unicode numbers (code points) to a specific byte sequence

Unicode itself defines a map from numbers to character descriptions. This is a bijective map. For example, 0x0061 'a'; LATIN SMALL LETTER A is an entry of the map. In this example, the number 0x0061 is called the ``code point,'' and the description of this number is `` 'a'; LATIN SMALL LETTER A.'' Using a map from the description to a font, we can see the letter `a'. The shape drawn for the character is called a glyph. Since Unicode's map is a bijection, we can also say that `a' is mapped to the code point 0x0061.

This mapping between descriptions and code points is Unicode. A character is represented by a number. But this code point map is usually not used directly. Here ``usually'' means that Unicode encoded text usually doesn't store the code points themselves. In most cases, the code point text is converted to UTF-X (UCS (Universal Character Set) Transformation Format X). There are several UTFs, for example UTF-8 and UTF-16 with endian information. In most cases, Unicode encoded text is converted to one of the UTFs, and then the UTF binary is saved to your disk. This conversion is so common that I misunderstood that there are many Unicodes, which I thought odd, since Uni means ``one.'' The concept of Unicode as I understood it was that you can use all characters no matter which language you are writing in, and you can even mix characters of any languages. If there were several kinds of Unicode, this coding system would not make any sense. This was a wrong understanding. Unicode itself is one map, but when you use this coding system, there are several formats. These formats are UTF-X. Why is this so complicated? Keeping all the characters in the world needs some space. If someone switches from ASCII to Unicode stored as raw 32-bit code points, the file size suddenly becomes four times larger. This is a dilemma: you want a big character set, but you don't want to make your files larger. To solve this dilemma, the second mapping, UTF-X, was introduced.
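The two maps can be looked at separately in a few lines. This is a minimal sketch using only the standard library; the choice of the characters `a' and `ä' and the little-endian codec names are assumptions made for illustration (the -le variants are used so that no byte order mark is counted).

```python
import unicodedata

ch = u'\u00e4'  # the character 'ä', written as an escape to avoid source encoding issues

# First map (Unicode): character <-> code point, with a description.
code_point = ord(ch)
name_of_a = unicodedata.name(u'a')

# Second map (UTF-X): code point -> bytes. Count the bytes for each format.
formats = ('utf-8', 'utf-16-le', 'utf-32-le')
sizes_umlaut = [len(ch.encode(f)) for f in formats]
sizes_ascii = [len(u'a'.encode(f)) for f in formats]

print(hex(code_point), name_of_a)
print(sizes_umlaut, sizes_ascii)
```

The byte counts show the dilemma: in UTF-32 even the letter `a' takes four bytes, while UTF-8 keeps ASCII characters at one byte and spends more only on the others.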

I learned this from [3].

Python 2.7.x's Unicode representation


Python 2.7.x has two built-in data types for representing strings: the unicode data type and the str data type. Both data types can hold printable strings; however, str is more suitable for ASCII characters, even though it can hold any 8-bit binary data. Each data type has an encoding or decoding method to convert to the other [2]. However, this conversion sometimes causes a problem when you print unicode data. Figure 8 shows the relationship between the str type, the unicode type, and the encode and decode methods.

Figure 8: Relationship between Unicode type and 8-bit str type in Python 2.7.x.

The unicode type of Python 2.7.x has a method encode() and the str type has a method decode(). We can convert each type to the other via these methods. However, some codecs cannot be applied to specific byte sequences, since some byte sequences are not valid in some encodings. For example, the `ascii' codec doesn't allow a byte whose 8th bit is set. When the error handling method is `strict' (the default), the encoding or decoding method may raise an exception. UnicodeEncodeError can be raised by the encoding method, and UnicodeDecodeError can be raised by the decoding method. This is a bit cumbersome. Let me show you some examples.

First we define a unicode type string.
uc = u'Wächter'
print type(uc)
-> <type 'unicode'>

Let's encode this with the `utf-8' encoding to a str type.
s = uc.encode('utf-8', 'ignore')
print type(s)
-> <type 'str'>

Python 2.7's print statement writes str data, not unicode data; therefore, unicode data given to the print statement is first encoded. When the output encoding is ascii (for example, when the output is redirected to a file or the locale is not set), this fails:
print uc
-> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
uc has a character that is invalid in ascii, therefore an exception is raised. Please note that the error is an encoding error.

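However, the encoding method has an option, `ignore', to skip characters that cannot be encoded instead of raising an error. With the `utf-8' encoding every character can be encoded, so nothing is dropped here.
print uc.encode('utf-8', 'ignore')
-> Wächter
If your terminal accepts the utf-8 encoding, you can see the unicode character.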

There is a more complicated case; for example, encoded str type data is implicitly decoded back, as in the following:
print u'{0}'.format(uc.encode('utf-8', 'ignore'))
-> UnicodeDecodeError: 'ascii'
   codec can't decode byte 0xc3 in
   position 1: ordinal not in range(128)
Here, the format method receives str type data, but this is the unicode type's format method, so it accepts only unicode data. uc.encode generates str type data, which doesn't fit the unicode.format method; therefore, an implicit decode with the default `ascii' codec is performed to generate unicode data before the formatting is done. This decode cannot handle the non-ASCII byte 0xc3, so the UnicodeDecodeError exception is raised. I was puzzled why this is not an encode error, since it seems only the encode method is called. But actually there is another, hidden decode call in this code. To avoid this, we can encode after the format method is called, as in the following:
print u'{0}'.format(uc).encode('utf-8', 'ignore')
-> Wächter
This is a subtle issue, but we cannot ignore it when we write code that handles unicode.
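Both exceptions can also be reproduced without relying on any implicit conversion. This is a minimal sketch that names the codec explicitly in every call, so it should behave the same way in Python 2.7 and Python 3; the variable names are only for illustration.

```python
uc = u'W\xe4chter'         # text: 'Wächter' written with an escape
raw = uc.encode('utf-8')   # bytes: the 'ä' becomes the two bytes 0xc3 0xa4

errors = []
try:
    uc.encode('ascii')     # text -> bytes with a codec that has no 'ä'
except UnicodeEncodeError:
    errors.append('encode')
try:
    raw.decode('ascii')    # bytes -> text with the wrong codec
except UnicodeDecodeError:
    errors.append('decode')

roundtrip = raw.decode('utf-8')        # decode with the codec used for encoding
lossy = uc.encode('ascii', 'ignore')   # 'ignore' silently drops the 'ä'

print(errors, roundtrip == uc, lossy)
```

A common way to stay out of trouble is to decode bytes once at the input, keep unicode data inside the program, and encode once at the output.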


Appendix B: Contribution to Wikipedia


We found some mistakes in Wikipedia's author lists as a side effect of this experiment, and we have contributed updates to the lists in Wikipedia.

We needed to generate an adjacency matrix and perform eigenanalysis on the matrix. We require the independence of the eigenvectors in this analysis. However, it is almost impossible to have such a good matrix in our problem setting, because a few problems are hard to avoid: a page that has no link to any other author, a page that has no reference link from others, and mistaken duplicate links to the root page. The PageRank algorithm expects this kind of singularity in the adjacency matrix and gives us a solution to this issue. In our problem setting, we can easily detect the last issue, the link duplication of the root page.
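How PageRank copes with such pages can be sketched with a small power iteration. This is not the program used in the experiment, only a minimal illustration; the damping factor 0.85, the function name pagerank, and the four-page toy graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a link dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = dict((p, 1.0 / n) for p in pages)
    for _ in range(iterations):
        # The rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {}
        for p in pages:
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            # The (1 - damping) / n term gives every page a small base rank,
            # so a page without incoming links does not drop to zero.
            new_rank[p] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

# Hypothetical toy graph: 'C' links to nobody, and nobody links to 'D'.
links = {'A': ['B', 'C'], 'B': ['C'], 'C': [], 'D': ['A']}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

The two correction terms keep the total rank at one even though the plain link matrix of this graph has an all-zero row and an all-zero column.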

I am happy to contribute to Wikipedia.


References

[2] Brené Brown, The power of vulnerability,
http://www.ted.com/talks/brene_brown_on_vulnerability.html

[3] Python documentation 2.7.3, Unicode HOWTO, http://docs.python.org/2/howto/unicode.html


Authors in a Markov matrix Part 2 (10) Experimental results: Which author do people find most inspiring?


Conclusion


To find out which author people find most inspiring, we used the link structure of Wikipedia. First we extracted the link structure of Wikipedia and created the adjacency matrix, then we applied an eigenanalysis method, which is also called PageRank, to answer the first question. We showed the results for German, English, and Japanese authors. We also compared the same category (authors) between different data sources, i.e., different language Wikipedias. We can see interesting similarities and also differences. Personally, one of the authors was surprised that Winston Churchill and Isaac Newton have high ranking scores. He didn't know that Winston Churchill won the Nobel Prize in Literature.


Computational literature


Recently, I have been using a mathematical or information-scientific approach to understand literature and languages. This approach has a huge limitation, but on the other hand, it gives me some measurable values. Brené Brown said in her TED talk [2], ``Maybe stories are just data with a soul.'' Maybe so. And I think the soul can cast a vague shadow on the data. I agree that we cannot reconstruct the soul now. However, although reading a book is just an act of reading a sequence of symbols (a sequence of data), I know my soul can be moved by that act. I want to see a footprint of the soul in the data. This article is one such attempt. I don't know what to call this approach, so I tentatively call it ``Computational literature'' until I find a better name.

Future work


We summarize the future work:


  • Is there any bias caused by the Wikipedia writers? (Ditger v A.)
  • How can we avoid the category problem? How can we automate the data collection?
  • Apply other graph analysis methods. We only applied the eigenanalysis (PageRank) in this article.
  • We saw that the adjacency matrix has some special properties (e.g., it is not full rank). We can look deeper into the graph structure using tools from graph theory.
  • It would be interesting to apply this method to authors of other languages.
  • This method is not limited to authors. We can apply it to other areas, for example, actors, musicians, politicians, mathematicians, and so on.


This was a relatively large project for Sunday research; it took almost half a year. But it was fun.


Acknowledgments


I thank all the friends who gave me many useful comments at lunch time. Thanks to Andy K. for checking part of my English in part 1. I thank Rebecca M., who first asked me the question. This project would not exist if she hadn't asked it.

2013-01-02

Authors in a Markov matrix Part 2 (9) Experimental results: Which author do people find most inspiring?

This time, we continue the discussion of the results.

No link found problem


We have the impression that a fair number of links to Japanese authors have no target page in German Wikipedia. We didn't count them exactly, but we looked into several pages while debugging the program. In a typical case, a page that mentions 良寛 (Ryōkan) links to Ryokan, or a mention of Sōseki links to Soseki, and so on. Diacritical marks such as the macron are often omitted, and this is why no page is found for the link.
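One possible way to recover such links is to compare names after removing the diacritical marks. The following is a hedged sketch in Python 3 using the standard `unicodedata` module; we did not do this in the experiment, and the helper names `strip_diacritics`, `build_lookup`, and `resolve_link` are hypothetical.

```python
import unicodedata


def strip_diacritics(name):
    """Remove combining marks, e.g. the macron in 'Ryōkan'."""
    # NFD splits a character such as 'ō' into 'o' plus a combining macron.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lookup(page_titles):
    """Map each simplified title to the original page title."""
    lookup = {}
    for title in page_titles:
        # If two titles collide after simplification, keep the first one.
        lookup.setdefault(strip_diacritics(title).casefold(), title)
    return lookup


def resolve_link(target, lookup):
    """Return the page title matching a link target, or None if not found."""
    return lookup.get(strip_diacritics(target).casefold())


# Hypothetical page titles and a link that omits the macron.
titles = ["Ryōkan", "Natsume Sōseki"]
lookup = build_lookup(titles)
found = resolve_link("Ryokan", lookup)
```

This only handles characters that decompose into a base letter plus combining marks. Letters such as ß or ø have no such decomposition and would need an extra mapping table, and two different authors whose names collapse to the same simplified form would need manual checking.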

Cross reference between Wikipedia


It was relatively easy to make a cross reference list between the English and German Wikipedia results, since these Wikipedias write the author names in the same way, i.e., using the Latin character set. However, Japanese Wikipedia uses Japanese characters for the author names. For example, Lewis Carroll is ルイス・キャロル in Japanese Wikipedia. Japanese Wikipedia also has the names in Latin characters, but the Wiki page keys are all in Japanese. To make a cross reference table, we need a map from the names written in Japanese to the names written in Latin characters. We could not find an easy way to build it this time, therefore there is no cross reference between the English and Japanese results, or between the German and Japanese results. This is also future work.

Correlating with other data


We had some discussions with our friends about these results, and they asked some interesting questions. They were especially interested in correlating the results with other data:

  • Correlating Nobel Prize winners with the PageRank results.
  • Is there any correlation between Wikipedia's writers and the PageRank results? For example, if a few specific Wikipedia writers are actively writing the articles, do they bias the PageRank results?

Johann Wolfgang von Goethe is 10th in Japanese Wiki


Johann Wolfgang von Goethe ranks 10th in Japanese Wikipedia. This is lower than we expected. However, only 31 Japanese Wikipedia pages could be used to construct a valid graph. This number is so small that a slightly different link structure may change the result. Incidentally, the top-ranked German writer in Japanese Wikipedia is Gerhart Hauptmann.

This was a long article, but now we are close to the end. Next time, I will present the conclusion of this theme.

Authors in a Markov matrix Part 2 (8) Experimental results: Which author do people find most inspiring?


Wikipedia's Category problem

The category problem is this: we expect a specific category to contain certain authors, but the actual Wikipedia category doesn't contain them. This causes missing data. We found three interesting cases, described in the following subsections. We did no additional processing for this problem. For example, ``Shakespeare does not exist as an English writer in the Japanese Wikipedia.'' Since we did nothing about this, Shakespeare does not appear in our rank table of English authors for Japanese Wikipedia.

We tried to obtain the data as automatically as possible, since this is just our Sunday hobby research project. We didn't spend much time on fine-tuning for these problems. But the results are not intuitive (e.g., Shakespeare is not an English author in Japanese Wikipedia), so automatically filling this gap between Wikipedia's classification and our intuition is future work.

No Shakespeare in the Japanese Wikipedia result

Shakespeare ranks first in both German Wikipedia and English Wikipedia. However, the Japanese Wikipedia result doesn't have Shakespeare. In fact, Japanese Wikipedia has a category called ``Shakespeare,'' and it is at the same level as the English authors category. The following categories sit at that level in Japanese Wikipedia (as of 2012-11-19), so the authors in them are not classified as English authors. Figure 7 shows this page.

Figure 7: The category of English authors page in ja.wikipedia.org as of 2012-11-19.

  • English authors (which has an item: The list of English authors)
  • H. G. Wells
  • William Shakespeare
  • George Bernard Shaw
  • Lord Byron
  • William Blake
  • Oscar Wilde

These authors and the category ``English authors'' are at the same level in the category hierarchy, therefore Wells, Shakespeare, Shaw, Byron, Blake, and Wilde don't exist in the list of English authors. This is a property of Japanese Wikipedia only; the other language Wikipedias don't have this problem. The problem was that we assumed the list of English authors would contain Shakespeare and these other authors. We thought this assumption was reasonable when we started this research.

No Shiki Masaoka in the Japanese Wikipedia result

Shiki Masaoka doesn't exist in the Japanese Wikipedia result. Shiki is under the Japanese 歌人 俳人 (Kajin Haijin) category and not in the Japanese authors category. Therefore, Japanese Kajin Haijin are not listed in this research. We found this when we compared the different Wikipedias after getting the first results. This is a good example of how effective the comparison between different Wikipedias is.

Not available in other Wikipedia problem

Some Wikipedias categorize authors by the language they wrote in instead of the country they lived in. For instance, German Wikipedia has ``the list of British authors,'' but English Wikipedia only has ``The list of English writers.'' This list contains the authors who wrote their books in English, therefore it also includes authors from America, Australia, and other English-speaking countries. As a result, the comparison between different language Wikipedias is not well defined.

There is another factor that makes the comparison difficult. The size of the list of authors depends heavily on the language of the Wikipedia. For instance, the list of German authors in German Wikipedia has 5975 entries. On the other hand, the list of German authors in Japanese Wikipedia has only 136 entries.

Tables 7 and 8 show the comparison of the PageRank results between German Wikipedia and English Wikipedia. There are n.a. (Not Available in other Wikipedia) entries in both tables. These entries show the problem. Table 8 has 16 n.a. entries out of 40. This means these authors are listed in German Wikipedia as British writers, but they are not listed in English Wikipedia as English writers.

We would like to continue with some other interesting issues in the next article.
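A comparison table with n.a. entries can be built from two ranked lists as in the following minimal sketch. The function name `compare_ranks` and the placeholder author names are hypothetical and are not taken from Tables 7 and 8; the sketch also assumes that both lists spell the names identically.

```python
def compare_ranks(ranked_a, ranked_b, missing="n.a."):
    """Pair each entry of ranked_a with its rank in ranked_b.

    Both arguments are lists of names ordered from the best rank.
    A name that does not appear in ranked_b gets the marker 'n.a.'.
    """
    position_b = {name: i for i, name in enumerate(ranked_b, start=1)}
    rows = []
    for i, name in enumerate(ranked_a, start=1):
        rows.append((i, name, position_b.get(name, missing)))
    return rows


# Placeholder names only; these are not real results.
list_a = ["author_1", "author_2", "author_3", "author_4"]
list_b = ["author_2", "author_1", "author_4"]
rows = compare_ranks(list_a, list_b)
not_available = sum(1 for _, _, rank_b in rows if rank_b == "n.a.")
```

Counting the rows marked n.a. gives the kind of number quoted above for Table 8.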