Japanese csv database

By Michael Gakuran | | Design & Web | 8 Comments |

Things have been pretty busy on Knowing Nothing recently. Finally, after many trials and tribulations, I succeeded in adding a sidebar widget which allows you to display quotes. It’s the ‘Quotes Collection’ plugin from Srini’s WordPress Laboratory. You can see it in action, modified slightly by me, displaying a random word from a rather swollen database of over 7800 that are listed as required for the Japanese Language Proficiency Test Level 1. Since it took me so long and was painfully tedious and frustrating, I want to document it for other people, should they encounter a similar problem. So today’s entry is very technical and uninteresting unless you are trying to deal with csv files or have an interest in phpmyadmin!

It was no small task to get it functioning, I can tell you! The problem wasn’t with the plugin, but what happens when the plugin takes the words from the database in phpmyadmin, an administration console for managing databases like the ones needed to power forums or blogs like mine. Let me clarify: There are two ways to add words (or ‘quotes’). You can either insert them manually, one by one, through the admin interface in WordPress, or you import a database – a csv (comma separated values) file – containing the items you want to display.

Of course, I wasn’t going to enter 7830 words in manually, so I decided to create a csv file and upload it through phpmyadmin. I used an excel file containing a list of the words for JLPT 1 that some kind soul created and uploaded somewhere (I forget where I downloaded it as it was some time ago).

You can download the JLPT 1 list of of words here.

I copied and pasted the list into notepad and set the ‘find and replace’ function to find all spaces and other characters and replace them with the relevant characters so that the csv file would function correctly. To cut a long, boring story short, I had to get the structure to something like this:

“挨拶”,”マイク”,”Ah!;Oh!;Alas!”,”yes”
“爆弾”,”マイケル”,”ohho”,”yes”
.
.
and so on

The first entry in Japanese, separated by a comma represents the ‘quote’ column. The second entry in Japanese, the ‘author’. The third is the ‘source’ and the fourth is the ‘visible’ column. This is how a csv file works – I tell phpmyadmin how to separate the list of terms. In the case above I used commas to separate each field and quotation marks to enclose each item. Each new entry is on a separate line.

When I uploaded this to phpmyadmin, it worked. But when the WordPress plugin tried to take the database entries and display them on my website, they appeared scrambled. For some reason, even though the Japanese text displayed correctly in phpmyadmin, it looked like a mess of characters on my front page. But I noticed that, when I input an entry through the admin interface in wordpress, it displayed correctly on my website, but became jumbled up in phpmyadmin. Take a look at the screen shots below:

phpmyad2

The screen shot above taken in phpmyadmin shows the Quotes Collection database. Entry 21 shows the entry I uploaded via a csv file – it displays correctly in phpmyadmin. Entry 20 is an entry I added via the admin interface for the Quotes Collection plugin via WordPress. Notice entry 20 appears scrambled in this phpmyadmin screen shot but, in the screen shot below and on my main blog page, it appeared correctly showing the Japanese text.

phpmyad3

The screenshot above is from inside wordpress in the Quotes Collection admin panel. Notice entry 21 that I added via the csv file in phpmyadmin is now scrambled. But entry 20 that I added normally though the wordpress admin interface displays correctly. Also notice that this is only affecting Japanese characters – none of the romanised parts are affected.

So what could I do? I had to upload all 7000 some quotes and in order to do so, it seemed that I needed to put the scrambled characters into the database so, conversely and rather strangely, they would render correctly on my blog. And that’s just what I did. I copied and pasted all the entries in massive chunks into the Quotes Collection admin panel and ‘added’ them as single quotes. This resulted in huge ‘quotes’ of messy, scrambled text. Then, I copied the newly scrambled text back into excel, separating the columns again and adding the relevant ‘spacing’ characters. The reason I needed to manually add the spacing characters in excel is that excel cannot export csv files containing Japanese characters.

My scrambled excel file is here to download, for reference.

And so:

/嗚呼/|/ああ/|/Ah!; Oh!; Alas!/|/yes/
/相/|/あい/|/together; mutually; fellow/|/yes/
/愛/|/あい/|/love/|/yes/

You see the scrambled characters instead of Japanese text. The / enclose each item and the | is the equivalent of a comma. I did this because some of the words had commas in them and so, upon importing the database, some entries would become corrupted. Using / and | as unique characters ensured this did not happen.

So after doing that for all the quotes (it took me about 12 huge ‘chunks’ to convert all the entries to jibberish), and adding the spacing characters in excel, I copied the entire lot back into notepad. This time, I had jibberish for Japanese characters, but all correctly formatted in the csv style. All that was left to do was to use find and replace to remove all the spaces between the ‘columns’ and I ended up with the following csv file for the JLPT 1 wordlist:

Download JLPT1 csv wordlist

*Phew!* What a lot of tedious work! But with this new file, once uploaded, it worked correctly. As you can see in the screen shot below, the entries in my database in phpmyadmin were scrambled, but displayed correctly in the admin console in wordpress and, most importantly, on my blog.

phpmyadmin1

Unfortunately, the database is so big that it has seemingly broken the admin interface in wordpress, so I cannot get a screen shot showing the entries displaying correctly. However, the widget showing the Japanese words on the sidebar should be testament enough!

Good luck to anyone who wants to attempt the same thing and feel free to use the files I created to save yourself the hassle of creating a csv file!

8 comments on “Japanese csv database
  1. Gakuranman says:

    Haha. If only I had the time!

  2. Rukishou says:

    You know, since you're not getting paid and will probably be hated for it for no reason, you should actually make this a plugin for WordPress. Do it now! D:<

  3. Mike says:

    Hi Mosey. Thanks for your comments! It was indeed tedious, and all for the belief that I can actually study when writing rubbish on my journal :P. I think it was an encoding problem in the transfer between phpmyadmin and the wordpress admin interface. I don’t really know enough to explain why, but I managed to solve it this time. I hope nothing goes wrong in the future however!

    I hope it comes to good use and I’ll be happy if it helps you and others!

  4. Mike says:

    Hi Mosey. Thanks for your comments! It was indeed tedious, and all for the belief that I can actually study when writing rubbish on my journal :P. I think it was an encoding problem in the transfer between phpmyadmin and the wordpress admin interface. I don’t really know enough to explain why, but I managed to solve it this time. I hope nothing goes wrong in the future however!

    I hope it comes to good use and I’ll be happy if it helps you and others!

  5. Mosey says:

    Followed the trail to your website from the Support Forums on WordPress ;) Was it an encoding problem after all? I have to admit the finished product does look very sleek! The process does sound extremely tedious though~

  6. Mosey says:

    Followed the trail to your website from the Support Forums on WordPress ;) Was it an encoding problem after all? I have to admit the finished product does look very sleek! The process does sound extremely tedious though~

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Get the Gakuranman Newsletter!

Greetings, fellow Adventurer!

For a limited time, subscribe free and get:

Just enter your name and email below:

Subscribe!