2017-10-14

A fake truth by the web: a Google Translate glitch

I make basic mathematics learning materials and publish them on YouTube. Recently I got a strange comment.

``Why the title of this video is ``basic division''? It is about subtraction.''

I made this video in Japanese and its title was in Japanese (基本のひき算, ``basic subtraction''), but the comment was in English.

I was puzzled, since ``ひき算'' means ``subtraction'', not ``division''. I found out that the commenter had used Google Translate: when I input ``ひき算'' into Google Translate, it indeed translated it as ``Division''.

Japanese is written with Kanji, Hiragana, and Katakana, and Hiragana is a phonetic script. My basic subtraction video is for first- and second-grade students, so I used the Hiragana (phonetic) spelling. If I input the Kanji notation (引き算) into Google Translate, it is correctly translated as ``subtraction''. If I write the word entirely in Hiragana (ひきざん), Google Translate outputs ``Hikiman'', which I cannot make sense of.

Figure 1: Japanese ひき算 (subtraction) is mistranslated as ``Division'' by Google Translate (2017-10-14).

But I had a problem: people trust Google Translate. I am just a random native Japanese speaker. If I said Google Translate was wrong, who would believe me?

I showed that ``ひき算'' (subtraction) is a phonetic spelling of the Kanji ``引き算'', as an online Japanese dictionary confirms.

Figure 2: A Japanese-Japanese online dictionary shows that the phonetic notation ひき算 is equivalent to the Kanji notation 引き算.

I also showed a Japanese-English online dictionary result, which gives ``subtraction'' as the meaning of ``ひき算''.

It is hard to convince people that Google Translate is sometimes wrong.

This may be related to the ``reality-based community'' or Stephen Colbert's ``truthiness''. We often don't know where a ``truth'' comes from, so we always need to be careful. Fortunately, being careful helps to some extent. We also need to keep learning every day.

None of this is new nowadays, but I would like to record this ``fake truth by the web'' experience. I also wonder: would it be good for Google Translate to learn one day that ``ひき算'' means ``subtraction''? Or is this mistake a useful warning that what we trust is not so solid, in which case I would rather it stayed?

2017-07-10

How to use Boost sha1 with Python hashlib

I needed a SHA-1 digest from both C++ code and Python code, and the two results had to match. Here is a code snippet that produces matching results. It avoids a potential problem: the Boost digest comes back as an array of 32-bit words, and a word that begins with 0s loses those leading zeros when printed in plain hex. This doesn't matter if you stick to one implementation, but if you need to match two worlds, C++ and Python, this code might be useful.
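To make the leading-zero problem concrete, here is a small Python sketch that imitates the C++ output loop. It assumes that Boost's get_digest returns the standard SHA-1 state as five big-endian 32-bit words (H0 to H4), and rebuilds those words from hashlib with struct.unpack. The helper names digest_words, hex_unpadded, and hex_padded are hypothetical.

```python
import hashlib
import struct


def digest_words(mes: str) -> tuple:
    """SHA-1 digest of mes as five 32-bit words (big-endian, H0..H4)."""
    raw = hashlib.sha1(mes.encode()).digest()  # 20 raw bytes
    return struct.unpack(">5I", raw)


def hex_unpadded(words) -> str:
    # Like sstr << std::hex << w without setw/setfill: leading zeros are lost.
    return "".join(format(w, "x") for w in words)


def hex_padded(words) -> str:
    # Like sstr << std::setfill('0') << std::setw(8) << std::hex << w.
    return "".join(format(w, "08x") for w in words)


if __name__ == "__main__":
    for i in range(10):
        mes = "message %d" % i
        words = digest_words(mes)
        expected = hashlib.sha1(mes.encode()).hexdigest()
        # The padded form always matches; the unpadded form fails whenever
        # some word starts with a 0 hex digit.
        print(mes, hex_padded(words) == expected, hex_unpadded(words) == expected)
```

Without padding, any word below 0x10000000 prints fewer than 8 hex digits, so the joined string is shorter than 40 characters and no longer matches hexdigest(). The C++ function below avoids this with std::setfill('0') and std::setw(8).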

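#include <boost/uuid/detail/sha1.hpp>

#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

/// get sha1 digest as a std::string
///
/// Assumes a Boost version whose sha1::get_digest() fills five 32-bit
/// unsigned ints; newer Boost releases changed digest_type to 20 bytes.
///
/// \param[in] mes message to be hashed
/// \return    digest string (40 lowercase hex characters)
std::string get_sha1_digest(const std::string& mes)
{
    boost::uuids::detail::sha1 sha1;
    sha1.process_bytes(mes.c_str(), mes.size());

    const std::size_t DIGEST_SIZE = 5;  // 5 x 32-bit words = 160 bits
    unsigned int sha1_hash[DIGEST_SIZE];
    sha1.get_digest(sha1_hash);

    std::stringstream sstr;
    for (std::size_t i = 0; i < DIGEST_SIZE; ++i)
    {
        // Pad every word to 8 hex digits so that leading zeros are kept.
        sstr << std::setfill('0') << std::setw(8) << std::hex << sha1_hash[i];
    }

    return sstr.str();
}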

This function's output matches that of the following Python code.

import hashlib

def get_sha1_digest(mes):
    """Return the SHA-1 hex digest of the string mes (encoded as UTF-8)."""
    sha1_obj = hashlib.sha1(mes.encode())
    return sha1_obj.hexdigest()
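One way to check that the C++ and Python versions really agree is to compare both against well-known SHA-1 test vectors, such as the empty string and ``abc'' (the FIPS 180 example). The sketch below also assumes the C++ side may hash data that is not UTF-8 text: a std::string is just raw bytes, while Python's str.encode() produces UTF-8, so the hypothetical helper sha1_hex accepts bytes as well as str.

```python
import hashlib


def sha1_hex(mes) -> str:
    """SHA-1 hex digest of a str (encoded as UTF-8) or of raw bytes."""
    if isinstance(mes, str):
        data = mes.encode("utf-8")
    else:
        data = bytes(mes)
    return hashlib.sha1(data).hexdigest()


# Well-known SHA-1 test vectors ("abc" is the FIPS 180 example); the C++
# get_sha1_digest should return the same strings for the same input bytes.
TEST_VECTORS = {
    "": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
}

if __name__ == "__main__":
    for mes, expected in TEST_VECTORS.items():
        assert sha1_hex(mes) == expected           # str input
        assert sha1_hex(mes.encode()) == expected  # bytes input
    print("all test vectors match")
```

Running the same inputs through the C++ function and comparing the strings is a cheap regression check, for example after a Boost upgrade.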