
Authors in a Markov matrix Part 2 (7) Experimental results: Which author do people find most inspiring?


We have seen the eigenanalysis results for the authors in Wikipedia. I found it interesting just to go through the result tables and think about why each person is there. For instance, I was surprised that Winston Churchill ranks high in English literature. But a few of my friends pointed out to me that he is the only person who was both a Nobel laureate in literature and a prime minister of Britain. In this and the next few articles, I would like to discuss the results.

Discussion

Matrix rank

Table 3 shows that the matrix is not full rank even after we removed the sink pages and the pages that have only outgoing links. This suggests that the pages form several groups: pages inside a group are connected by links, but there are no links between the groups. It would be interesting to analyze these groups, but this is left as future work.
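One way to look for such groups is to ignore the link direction and collect the pages that can reach each other. The following is only a minimal sketch under that assumption. The function name `link_groups` and the toy page names are hypothetical and are not taken from the Wikipedia data.

```python
from collections import defaultdict


def link_groups(links):
    """Return groups of pages that are connected by links (direction ignored)."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk that collects every page reachable from `start`.
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            page = stack.pop()
            group.append(page)
            for nxt in neighbours[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Hypothetical toy data: A, B, C link to each other, D and E link to each other,
# and there is no link between the two groups.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E"), ("E", "D")]
groups = link_groups(links)
print(groups)
```

On this toy data the function reports two groups. On the real link data, the number and size of the groups would be the starting point for the future work mentioned above.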

Japanese Wikipedia template bias


Our first PageRank result for the Japanese Wikipedia surprised us, because Sōseki Natume, Ryūnosuke Akutagawa, Yukio Mishima, and Ōgai Mori all ranked below 100th. The German and English Wikipedia results have some similarity, but there seemed to be no similarity between the Japanese Wikipedia result and the other two. When we looked into the result, we first noticed that the high-ranked authors are all recent authors; specifically, they were all active after 1930. At first we thought that recent authors are more actively edited and updated by the Wikipedia writers. Then we found that all the Akutagawa award winners have a high rank. The Akutagawa award is a prestigious award, but we could not understand why Akutagawa himself is so far behind these winners.

Finally, we found out that all the Akutagawa award winners have mutual links, as shown in Figure 5. Every award winner gets incoming links from all the other winners, and this raises these winners' PageRank. We consider this an artificial bias, since our assumption is that a Wikipedia writer makes a link when the writer thinks there is a relationship. These award links, however, come from the Wikipedia editing template for Japanese authors. We removed these mutual award links; the result is shown in Table 12.

Figure 5: Award winner cross link bias problem.
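The effect of such mutual links can be seen on a tiny example. The following is a minimal sketch of PageRank by power iteration, not the code we used for the experiments. The page names (`C` for a classic author, `R` for a page that exchanges links with `C`, `W1` to `W4` for award winners), the damping factor 0.85, and the iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on a list of (source, target) links."""
    pages = sorted({page for link in links for page in link})
    out = {page: [] for page in pages}
    for src, dst in links:
        out[src].append(dst)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            # A page without outgoing links spreads its rank over all pages.
            targets = out[src] or pages
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        rank = new
    return rank


winners = ["W1", "W2", "W3", "W4"]

# Hypothetical "organic" links written by editors.
organic = [("R", "C"), ("C", "R")] + [(w, "C") for w in winners]

# Template links: every winner links to every other winner.
navbox = [(a, b) for a in winners for b in winners if a != b]

without_navbox = pagerank(organic)
with_navbox = pagerank(organic + navbox)

for page in ["C", "W1"]:
    print(page, round(without_navbox[page], 3), round(with_navbox[page], 3))
```

In this toy graph the winners receive no organic incoming links at all, yet adding the template links more than doubles each winner's PageRank and lowers the rank of `C`. The real graph is much larger, so this only illustrates the direction of the bias, not its size.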

For readers who are interested in this Akutagawa-award mutual link effect, we show the PageRank result that includes the Akutagawa-award mutual links in Table 13 (Note 1). With this bias, ranks 1 to 101 are all filled with Akutagawa-award winners, and the first non-winner, Mishima Yukio, finally shows up at rank 102.

After post-processing, only the following eight Akutagawa-award winners are in the top 40: 大江健三郎 (Ōe Kenzaburō), 松本清張 (Matumoto Seichō), 吉行淳之介 (Yoshiyuki Jyunnosuke), 開高健 (Kaikō Takeshi), 丸谷才一 (Maruya Saiichi), 古井由吉 (Furui Yoshikichi), 石原慎太郎 (Ishihara Shintarō), 安岡章太郎 (Yasuoka Shōtarō).

Figure 6 shows the adjacency matrices with post-processing (top), without post-processing (middle), and the difference between the two (bottom). The middle figure shows a regular pattern, which is caused by the mutual award links. The difference makes this regularity clear, although it is not completely regular, since several other awards also have mutual-link biases (e.g., the Mainichi Genjyutu award).

Figure 6: Adjacency matrices of Japanese authors in ja.wikipedia.org. Top: Navbox bias removed. Middle: no post-processing. Bottom: difference (middle - top).
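The difference (middle - top) can be sketched on a toy example. This is only an illustration with hypothetical page names (`C`, `R`, `W1` to `W3`), not the real data; it shows why mutual award links appear as a regular block in the difference matrix.

```python
def adjacency(pages, links):
    """Build a 0/1 adjacency matrix; row = source page, column = target page."""
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for src, dst in links:
        matrix[index[src]][index[dst]] = 1
    return matrix


pages = ["C", "R", "W1", "W2", "W3"]
winners = ["W1", "W2", "W3"]

# Hypothetical links written by editors, and links added by an award template.
organic = [("R", "C"), ("C", "R")] + [(w, "C") for w in winners]
navbox = [(a, b) for a in winners for b in winners if a != b]

top = adjacency(pages, organic)              # template links removed
middle = adjacency(pages, organic + navbox)  # no post-processing

# Element-wise difference (middle - top): only the template links remain.
difference = [
    [m - t for m, t in zip(middle_row, top_row)]
    for middle_row, top_row in zip(middle, top)
]

for row in difference:
    print(row)
```

The rows of the winners contain a block of ones with a zero diagonal, and all other rows are zero. With several overlapping awards, several such blocks would overlap, which is consistent with the not completely regular pattern in the bottom figure.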

Table 13: Japanese author rank result with Navbox. We think this Navbox causes a bias.

(Note 1): In Table 13, 赤瀬川原平 (Akasegawa Genpei) won the prize under his pen name 尾辻克彦 (Otuji Katuhiko).

In the next article, I would like to discuss a Category problem.
