Punycode
Background
While working on an RSS feed recently (Next.js does not have an official RSS parser plugin), I realised that the generated links containing Chinese characters failed validation. URLs with raw UTF-8 encoded Chinese characters were not considered valid.
What I've Learned
Punycode is an encoding that represents Unicode characters using only ASCII, which is a smaller, restricted character set. It is used to encode internationalized domain names (IDN), that is, the host part of a URL rather than its path.
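To make the distinction concrete, here is a minimal sketch, assuming a runtime that provides the WHATWG URL global (Node.js or a browser). The address is a made-up example and not one of my posts. It shows that the host name ends up as Punycode while the path is percent-encoded.

```javascript
// Assumption: the WHATWG URL global is available (Node.js or a browser).
// The address is a made-up example used only for illustration.
const parsed = new URL("https://中文.com/文章/");

// The host is an internationalized domain name, so it is serialized as Punycode.
const host = parsed.hostname; // "xn--fiq228c.com"

// The path is not a domain name, so its UTF-8 bytes are percent-encoded instead.
const pathname = parsed.pathname; // "/%E6%96%87%E7%AB%A0/"

console.log(host, pathname);
```

So the two encodings are not interchangeable: each one applies to a different part of the URL.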
How to recreate the problem?
Punycode can encode a Unicode string into ASCII symbols, but applying it to a URL path does not give a working link, because paths are expected to be percent-encoded. For example, a Chinese blog link like https://hk.news.yahoo.com/65個指明地方納入強制檢測公告-004959684.html
was converted into https://hk.news.yahoo.com/xn--65-q43c84c9a92hp8grphq2u14i5uhd3aq94bo6lkm8c-004959684.html
instead of https://hk.news.yahoo.com/65%E5%80%8B%E6%8C%87%E6%98%8E%E5%9C%B0%E6%96%B9%E7%B4%8D%E5%85%A5%E5%BC%B7%E5%88%B6%E6%AA%A2%E6%B8%AC%E5%85%AC%E5%91%8A-004959684.html
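The expected form can be reproduced with the built-in encodeURI function. This is only a quick check of the example link, not the feed code itself.

```javascript
// The example link, with raw Chinese characters in the path.
const original =
  "https://hk.news.yahoo.com/65個指明地方納入強制檢測公告-004959684.html";

// encodeURI leaves URL delimiters such as ":" and "/" alone and
// percent-encodes the UTF-8 bytes of every non-ASCII character.
const encoded = encodeURI(original);
console.log(encoded);

// decodeURI reverses the transformation, so no information is lost.
const roundTrip = decodeURI(encoded);
console.log(roundTrip === original);
```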
Action Items
The following code does not work as expected. punycode.encode(url) converted only the non-ASCII part of the link, and into Punycode, instead of producing a percent-encoded URI string.
fs.readdirSync(blogPostDir)
  .map((fileName) => {
    // we need the full path of the file to be able to read it
    const fullPath = path.join(blogPostDir, fileName);
    // read the file so we can grab the front matter
    const file = fs.readFileSync(fullPath, "utf8");
    // for the RSS feed we don't need the html, we
    // just want the attributes
    const { data: frontmatter, content } = matter(file);
    const excerpt = getExcerpt(content, 800);
    const url = CONFIG.URL + `/posts/` + fileName.replace(".md", "") + `/`;
    // console.log(excerpt);
    // I want access to the fileName later on so we save it to our object
    return { ...frontmatter, fileName, excerpt, url };
  })
  // sort the items by date in descending order, feel free
  // to customize this as needed to sort your RSS items properly
  .filter((post) => post.draft === false)
  .sort((a, b) => +new Date(b.date) - +new Date(a.date))
  // loop over each blog post and add it to our RSS feed
  .forEach(({ title, date, tags, category, fileName, excerpt, url }) => {
    // title, description, and date are properties of my front matter
    // attributes. Yours might be different depending on your data structure
    feed.item({
      title,
      description: excerpt,
      url: punycode.encode(url),
      author: CONFIG.AUTHOR_NAME,
      categories: tags,
      date,
    });
  });
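
One possible fix, sketched under the assumption that the feed only needs a valid ASCII link, is to percent-encode the slug when building the URL and drop the Punycode call. buildPostUrl is a hypothetical helper that is not in my code yet, and the base URL and file name below are placeholders.

```javascript
// Hypothetical helper (not part of the original script): builds a post URL
// whose slug is percent-encoded.
function buildPostUrl(baseUrl, fileName) {
  // Strip the ".md" extension to get the slug.
  const slug = fileName.replace(/\.md$/, "");
  // encodeURIComponent percent-encodes non-ASCII characters as UTF-8 bytes,
  // and also reserved characters such as "/" or "?".
  return `${baseUrl}/posts/${encodeURIComponent(slug)}/`;
}

// Placeholder base URL and a made-up file name.
const postUrl = buildPostUrl("https://example.com", "中文.md");
console.log(postUrl); // https://example.com/posts/%E4%B8%AD%E6%96%87/
```

In the feed loop, the item would then use something like url: buildPostUrl(CONFIG.URL, fileName) in place of punycode.encode(url). I have not verified this against every RSS validator.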