Background
Messing with RSS feed recently (as NextJS do not have any official RSS parser plugin) , I realised Chinese URL links generated were not validated. UTF-8 encoded Chinese characters were not considered as valid.
What I've Learned
Punycode is a special encoding used to convert Unicode characters to ASCII, which is a smaller, restricted character set. Punycode is used to encode internationalized domain names (IDN).
https://www.punycoder.com/
How to recreate the problem?
Punycode is available to encode strings of Unicode to ASCII symbols, however, the returned URL is not working as expected. For example, a Chinese blog link like https://hk.news.yahoo.com/65個指明地方納入強制檢測公告-004959684.html
was converted into https://hk.news.yahoo.com/xn--65-q43c84c9a92hp8grphq2u14i5uhd3aq94bo6lkm8c-004959684.html
instead of https://hk.news.yahoo.com/65%E5%80%8B%E6%8C%87%E6%98%8E%E5%9C%B0%E6%96%B9%E7%B4%8D%E5%85%A5%E5%BC%B7%E5%88%B6%E6%AA%A2%E6%B8%AC%E5%85%AC%E5%91%8A-004959684.html
Action Items
Following code does not work as expected. Only the variable was converted instead of the whole URI string.
fs.readdirSync(blogPostDir)
.map((fileName) => {
// we need the full path of the file to be able to read it
const fullPath = path.join(blogPostDir, fileName);
// read the file so we can grab the front matter
const file = fs.readFileSync(fullPath, "utf8");
// for the RSS feed we don't need the html, we
// just want the attributes
const { data: frontmatter, content } = matter(file);
const excerpt = getExcerpt(content, 800);
const url = CONFIG.URL + `/posts/` + fileName.replace(".md", "") + `/`;
// console.log(excerpt);
// I want access to the fileName later on so we save it to our object
return { ...frontmatter, fileName, excerpt, url };
})
// sort the items by date in descending order, feel free
// to customize this as needed to sort your RSS items properly
.filter((post) => post.draft === false)
.sort((a, b) => +new Date(b.date) - +new Date(a.date))
// loop over each blog post and add it to our RSS feed
.forEach(({ title, date, tags, category, fileName, excerpt, url }) => {
// title, description, and date are properties of my front matter
// attributes. Yours might be different depending on your data structure
feed.item({
title,
description: excerpt,
url: punycode.encode(url),
author: CONFIG.AUTHOR_NAME,
categories: tags,
date,
});
});