URL: https://qiita.com/R_Linux/items/5558ea17a434e2577bcf

Convert UTF-16 entity references to UTF-8 #R - Qiita



@R_Linux

Posted at 2015-11-03

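When I accessed the National Diet Library's SPARQL database, the Japanese text came back as UTF-16 entity references (hexadecimal references of the form &#x590F;).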
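Each reference is simply the character's Unicode code point in hexadecimal, wrapped in &#x and ;. For characters in the Basic Multilingual Plane, like the kanji here, that number is also the character's single UTF-16 code unit. Going in the reverse direction in base R is a handy way to check what a decoded value should look like. A minimal sketch; to_hex_refs() is a hypothetical helper, not part of any package:

```r
# Hypothetical helper (not from any package): encode a string as
# hexadecimal numeric character references, one per code point.
to_hex_refs <- function(s) {
  cps <- utf8ToInt(s)                            # code points as integers
  paste0(sprintf("&#x%X;", cps), collapse = "")  # e.g. 22799 -> "&#x590F;"
}

to_hex_refs("夏目")
# [1] "&#x590F;&#x76EE;"
```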


There was no way around it, so I converted them.

> library(stringi); library(gsubfn)
> stri_unescape_unicode(gsubfn("&#x|;", list("&#x" = "\\u", ";" = ""), "&#x590F;&#x76EE;"))
[1] "夏目"
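The gsubfn call works by rewriting every "&#x" into "\u" and deleting every ";", which turns the references into the \uXXXX escape syntax that stri_unescape_unicode() then decodes. To inspect that intermediate string, here is a base-R gsub() that performs an equivalent substitution. It is a stand-in for the gsubfn step, not the original code, and it uses a regular expression with a capture group instead of a lookup list:

```r
# Sample input: the hexadecimal references for 夏目 (U+590F, U+76EE).
x <- "&#x590F;&#x76EE;"

# Equivalent of the gsubfn step: "&#x" -> "\u" and ";" -> "".
# In the replacement, "\\\\" is one literal backslash and "\\1" is the hex digits.
escaped <- gsub("&#x([0-9A-Fa-f]+);", "\\\\u\\1", x)

cat(escaped, "\n")
# prints: \u590F\u76EE
```

Passing escaped to stri_unescape_unicode() is the step that produces "夏目".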

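Later, a user with a locked (private) account showed me the following method. It does not need the extra gsubfn package, so this one is the better choice.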

> stri_unescape_unicode(stri_replace_all_fixed("&#x590F;&#x76EE;", c("&#x", ";"), c("\\u", ""), vectorize_all = FALSE))
[1] "夏目"
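This version still relies on stringi. If you would rather avoid packages altogether, base R can decode the references directly: find each &#x...; with gregexpr(), convert its hex digits with strtoi(), and turn the resulting code point into a character with intToUtf8(). A minimal sketch, assuming only hexadecimal references (decimal &#NNNN; forms are left untouched) and one full code point per reference; decode_hex_refs() is a hypothetical name:

```r
# Hypothetical helper: decode &#xHHHH; references using only base R.
# Assumes each reference holds one complete Unicode code point.
decode_hex_refs <- function(s) {
  m <- gregexpr("&#x[0-9A-Fa-f]+;", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(refs) {
    hex <- gsub("^&#x|;$", "", refs)         # keep only the hex digits
    vapply(strtoi(hex, 16L), intToUtf8, "")  # code point -> character
  })
  s
}

decode_hex_refs("&#x590F;&#x76EE;")
# [1] "夏目"
```

Because it converts the number directly instead of going through a \uXXXX escape, this also handles references above U+FFFF such as &#x20BB7;, which a four-digit \u escape cannot express. A surrogate pair written as two separate references would not be combined by this sketch.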

Incidentally, when I specified JSON as the output format, it just returned plain UTF-8. Give me a break.
