
Showing posts from February, 2012

gcc 4.5.x or higher: undefined reference link problem when a shared library refers to another shared library

This goes deep into gcc details, but some developers might be interested. Recently I switched to gcc/g++ 4.6.x, and then I experienced a linking problem with my C++ programs: suddenly symbols were missing, and even some system symbols were reported as missing (dlopen, ostream operators, ...). For example:

  libutil.so: undefined reference to `dlopen'
  libutil.so: undefined reference to `dlclose'
  libutil.so: undefined reference to `dlerror'
  libutil.so: undefined reference to `dlsym'

I tried several things, such as checking libdl.so and manually adding linker options, but nothing helped. Finally I found the http://wiki.debian.org/ToolChain/DSOLinking page. gcc 4.5.x (or higher) passes the --as-needed linker option by default, which gives you missing symbols when your program links a shared library that implicitly links another shared library. For package creation this new default removes unnecessary dependencies, so it makes sense. However, this is a difficult problem. Solving this problem…
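To make the failure concrete, here is a minimal sketch. The library name libutil.so, the function load_plugin(), and the build commands are my illustrative assumptions, not the actual code from the project:

  // util.cpp -- a shared library that calls dlopen() internally.
  #include <dlfcn.h>
  #include <cstdio>

  extern "C" void load_plugin(const char* path) {
      void* handle = dlopen(path, RTLD_NOW);  // needs libdl at link time
      if (!handle) {
          std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
          return;
      }
      dlclose(handle);
  }

  // main.cpp -- uses libutil.so but never calls dlopen() itself:
  //   extern "C" void load_plugin(const char*);
  //   int main() { load_plugin("./plugin.so"); return 0; }
  //
  // Reproducing the error: build libutil.so without an explicit -ldl,
  // then link the main program:
  //   g++ -shared -fPIC -o libutil.so util.cpp
  //   g++ -o main main.cpp -L. -lutil
  // Older toolchains resolved dlopen from the indirect libdl dependency;
  // with the newer defaults the final link fails with
  //   libutil.so: undefined reference to `dlopen'
  //
  // Proper fix: declare the dependency where it is actually used:
  //   g++ -shared -fPIC -o libutil.so util.cpp -ldl
  // Workaround: restore the old behavior at the final link:
  //   g++ -Wl,--no-as-needed -o main main.cpp -L. -lutil -ldl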

Integer division: is int(-1/2) 0 or -1?

My friend Christian told me a story about whether int(-1/2) is 0 or -1. He found the problem while explaining a binary search program: computing the lower bound when the entry is not in the array.

  mid = (left+right)/2

When left = -1 and right = 0, mid is 0 in C, C++, and Java, but -1 in Python and Ruby. I was surprised that such simple-looking code depends on the language. The difference is the rounding method: C, C++, Java, and elisp round toward zero (truncate), so the result is 0; Python and Ruby round down (floor), so the result is -1. The modulo operation likewise depends on the language when negative values are involved: both conventions keep the identity x == (x/y)*y + x%y, but in Python and Ruby the remainder x%y takes the sign of y, while in C, C++, and Java it takes the sign of x. One more note: when I looked up binary search on the web, I learned that this mid computation has an overflow problem, so it should be

  mid = low + (high - low)/2;

References
http://en.wikipedia.org/wiki/Modulo_operation
http://en.wikipedia.org/wiki/Rounding
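A small test program makes the difference visible; this is a sketch I wrote for this post, and the overflow numbers are illustrative:

  // div_demo.cpp -- integer division and modulo for negative operands.
  // Build: g++ div_demo.cpp -o div_demo && ./div_demo
  #include <cstdio>

  int main() {
      int left = -1, right = 0;
      int mid = (left + right) / 2;
      // C, C++, and Java truncate toward zero: (-1)/2 == 0, -1 % 2 == -1.
      std::printf("mid = %d, -1 %% 2 = %d\n", mid, -1 % 2);
      // Python and Ruby floor instead: -1 // 2 == -1 and -1 % 2 == 1.
      // Both conventions keep the identity x == (x/y)*y + x%y.

      // The classic overflow problem: low + high exceeds INT_MAX.
      int low = 2000000000, high = 2100000000;
      int safe_mid = low + (high - low) / 2;  // fine: 2050000000
      std::printf("safe mid = %d\n", safe_mid);
      return 0;
  }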

Future work: Computational literacy

I don't know what to call this kind of approach to natural languages; I might say ``computational literacy'' or something like that. As far as I know, there is some research with a similar approach. For instance, there are spam filters that use an entropy-based approach, and some people use statistical methods to identify the author of a document. In a science fiction novel, Asimov wrote a scene in which a politician talks a lot, but through information analysis people find out there is no information at all in the talk (The Foundation Series). We could extend the presented method in a more systematic way. For example, we could analyze famous, widely available books, e.g., the Bible, some of Shakespeare's works, IKEA's catalogs, and so on. Also, the translation of the Bible has been altered over the course of history, and I would like to see the history of the information in it. If you know anything about research along these lines, please put it in the comments. Appendix 1: person + tree =…

Can we measure the complexity of natural language by an entropy based compression method? (6)

Conclusion: When we write an article in different languages, the length of the document differs even though the contents are the same. But if we compress these files with an entropy-based compression algorithm, they become almost the same size, whether written in German, with its complex grammatical structure, or in Japanese, with its completely different character system. From this observation I have a hypothesis: ``The complexity of natural languages is more or less the same.'' Of course, this article tested only one document and only three different languages, so this cannot be any proof of the hypothesis. But still, I am interested in the result. We need more experiments, but I already have some ideas for applications if this hypothesis stands. Comparison of news articles: assume the news source is in English and it is translated into Japanese. If we compress both articles and the compressed sizes differ by more than 50%, I would suspect the quality of the translation. Some…

Can we measure the complexity of natural language by an entropy based compression method? (5)

Entropy of a document: When I talked with Joerg, I recalled my time as a bachelor student. At that time I could not write a paper in English directly, so I first wrote a manuscript in Japanese and then translated it into English. The sizes of the two TeX files differed; however, when I compressed these files, I realized the compressed file sizes were similar. I found it interesting, but I did not think further about it. It was around 1996, so I think I used the ``compress'' program. At the Gruenkohl party I recalled this story again, and I realized that I have translated a few articles into three different languages, for example, Haruki Murakami's Catalunya Prize speech of 2011-06-11. Figure 1 shows the compression results for the same contents in different languages and different encoding schemes. Figure 1. The compressed sizes of documents with the same contents in three languages. Even though the original document size depends on the encoding method, the compressed sizes become…
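For readers who want to repeat the measurement, here is a minimal sketch using zlib. It is not the exact tool I used (around 1996 it was ``compress''; today gzip would also do), and the file names are illustrative:

  // compsize.cpp -- print raw and zlib-compressed sizes of text files.
  // Build: g++ compsize.cpp -o compsize -lz
  // Usage: ./compsize speech_en.txt speech_de.txt speech_ja.txt
  #include <zlib.h>
  #include <cstdio>
  #include <fstream>
  #include <sstream>
  #include <string>
  #include <vector>

  // Size of the zlib-compressed representation of data.
  static unsigned long compressed_size(const std::string& data) {
      uLongf out_len = compressBound(data.size());
      std::vector<Bytef> out(out_len);
      compress(out.data(), &out_len,
               reinterpret_cast<const Bytef*>(data.data()), data.size());
      return out_len;
  }

  int main(int argc, char** argv) {
      for (int i = 1; i < argc; ++i) {
          std::ifstream in(argv[i], std::ios::binary);
          std::ostringstream buf;
          buf << in.rdbuf();
          const std::string text = buf.str();
          std::printf("%s: raw %lu bytes, compressed %lu bytes\n", argv[i],
                      (unsigned long)text.size(), compressed_size(text));
      }
      return 0;
  }

Comparing the last column across translations of the same text is the whole experiment: if the hypothesis holds, those numbers should be close even when the raw sizes are not.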

Can we measure the complexity of natural language by an entropy based compression method? (4)

Size of books depends on languages: My friend (and my teacher) Joerg once asked me how large Japanese translated books are compared with other languages. Japanese uses Kanji (Chinese characters). Because one Kanji can encode what takes several Latin characters, he inferred that Japanese translated books are smaller or thinner than the originals. For example, the word ``mountain'' is the single character ``山'' in Japanese. But Kanji usually needs a higher resolution than Latin characters. I answered that the books seem thinner than the originals. I have several of Shakespeare's books, and I assume these translations are as accurate as possible. Some friends who visited my place were impressed by how small Japanese books are. But there are some other factors: a Japanese book might be made of thinner paper, the characters might be relatively smaller, and so on. This is an interesting point, but we must consider many parameters.

Can we measure the complexity of natural language by an entropy based compression method? (3)

Complexity of language: I recalled two ideas while we were talking about the differences between languages: the complexity of a language, and how the size of books depends on the language. My friend (and my teacher) Alexander has a hypothesis: the complexity of all natural languages is more or less the same. By complexity he means the total complexity of a language; it includes the size of the vocabulary, the grammatical structure, the writing system (complexity of characters), the pronunciation, everything. He told us that any language has some difficult aspects, but at the same time some simple aspects too. If we could average all the aspects of each language and compare them, the complexity of natural languages might be almost the same. I have the same impression about language complexity as Alexander. Each language I have learned has some difficult parts and also some simple parts. I also think the complexity of natural language depends on the capacity of the human brain. Because any children…

Can we measure the complexity of natural language by an entropy based compression method? (2)

Gruenkohl Party: On January 20th, 2012, we had a Gruenkohl party at Daniel's place. The people who gathered were from Holland, Germany, the US, Canada, and Japan. At such an international party we often talk about our own languages and compare their properties. For example, one person told us how complex the Chinese pronunciation system is, and that according to his experience in a Chinese course it is almost impossible to learn. German's noun gender and article system is also a popular topic. A friend pointed out to me that Japanese has a special counting system: how we count objects depends on what we count. For example, how to count people and how to count sheets of paper are different. I explained that it is as if we always used measure words, as English says ``two pieces of paper'' and ``three pairs of jeans''; Japanese uses this kind of counting all the time. I have often heard that many languages are very difficult to learn. However, I suspect a language may not be as complex as it sounds, because people tend to pick the most difficult aspect of a language…

Can we measure the complexity of natural language by an entropy based compression method? (1)

Many of my friends come from other countries, and we often talk about our mother tongues. The discussion turns to which language is difficult, or what unique properties each language has. German has a complex grammar system, Japanese has complex characters and a unique counting system, and English has a huge vocabulary. I wonder: ``What is the complexity of a natural language?'' and ``Can we measure it?'' Together with my friends, I translated one Japanese text into English and German. Then we applied an entropy-based compression method to the translations to see how much information each text has. This might tell us which language is complex in the sense of entropy. Namely, I try to measure: ``If the contents are the same, how much does the information entropy differ depending on the language?'' I will write a few articles on this topic.