WordPress malformed UTF-8 URLs: how to fix them

I struggled for hours with URLs in my RSS feed that were not fully URL-encoded. They made the feed unreadable for every app I tried, including the FeedBurner extended compatibility tool, which in my experience works only when it feels like it. I had no time and needed sleep, so instead of fighting Google FeedBurner or some unknown WordPress plugin that was causing the trouble, I decided to fix it by hacking the feed-generating template in WordPress.

Here is the original output I got:

http://zubolekarbg.com/лечение-и-превенция/%d0%bf%d1%80%d0%b8-%d0%b1%d1%80%d0%b5%d0%bc%d0%b5%d0%bd%d0%bd%d0%be%d1%81%d1%82/#comments

As you can see, it mixes untouched raw UTF-8 characters with URL-encoded ones. I first need to bring everything into one encoding. To do that, we URL-decode everything and then CAREFULLY rawurlencode it piece by piece.
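The decode-then-encode idea can be tried outside WordPress first. Below is a minimal sketch, assuming an absolute URL with no query string or fragment; the function name reencode_path_segments and the example.com host are made up for illustration and are not part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress function): re-encode every path
// segment of an absolute URL so raw and encoded parts end up uniform.
function reencode_path_segments($url) {
    // Splitting on "/" keeps the separators themselves out of the encoder.
    $pieces = explode('/', $url);
    $count  = count($pieces);

    // Pieces 0-2 are the scheme ("http:"), an empty string and the host.
    for ($i = 3; $i < $count; $i++) {
        // Decode first, otherwise "%d0%bf" would become "%25d0%25bf".
        $pieces[$i] = rawurlencode(rawurldecode($pieces[$i]));
    }
    return implode('/', $pieces);
}

// The same three Cyrillic letters, once as raw UTF-8 bytes and once
// already percent-encoded, to mimic the mixed URL from the feed.
$mixed = "http://example.com/\xD0\xBF\xD1\x80\xD0\xB8/%d0%bf%d1%80%d0%b8/";

echo reencode_path_segments($mixed), "\n";
// prints http://example.com/%D0%BF%D1%80%D0%B8/%D0%BF%D1%80%D0%B8/
```

Both segments come out identical, and in uppercase hex, because rawurlencode() always emits uppercase digits.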

Here is the original WordPress (WP 3.5.x) function that generates the <link>, found in wp-includes/feed.php:

/**
 * Display the permalink to the post for use in feeds.
 *
 * @package WordPress
 * @subpackage Feed
 * @since 2.3.0
 * @uses apply_filters() Call 'the_permalink_rss' on the post permalink
 */
function the_permalink_rss() {
    echo esc_url( apply_filters('the_permalink_rss',  get_permalink() ));
}

I found out that pretty much all the functions generating links end with echo esc_url( );, so I decided to write some universal code to improve them all.

Here is the new version of the_permalink_rss():

/**
 * Display the permalink to the post for use in feeds.
 *
 * @package WordPress
 * @subpackage Feed
 * @since 2.3.0
 * @uses apply_filters() Call 'the_permalink_rss' on the post permalink
 */
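function the_permalink_rss() {
    $url = get_permalink();

    // Split on "/" so the separators themselves are never encoded.
    $pieces = explode('/', $url);
    $count  = count($pieces);

    // Pieces 0-2 are "http:", "" and the host name; leave them untouched.
    // Note: a "?query" or "#fragment" piece would be encoded as well,
    // so this assumes pretty permalinks made of plain path segments.
    for ($i = 3; $i < $count; $i++) {
        // Decode first so already-encoded pieces are not encoded twice,
        // then encode so every piece ends up percent-encoded.
        $pieces[$i] = rawurlencode(rawurldecode($pieces[$i]));
    }
    $url = implode('/', $pieces);

    echo esc_url( apply_filters('the_permalink_rss', $url ));
}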

As you can see, I keep the original behaviour and only make sure that everything is DEcoded first and ENcoded afterwards.
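Since the same echo esc_url( ); pattern closes most of the link functions, the re-encoding step could also live in one shared helper. The sketch below swaps in a different technique from the core edit above: it attaches the helper to the 'the_permalink_rss' filter (the tag named in the docblock) from a plugin or the theme's functions.php, so a WordPress update does not overwrite it. The name feed_reencode_url is hypothetical, and the handling of "?" and "#" is my own assumption about what a universal version needs, for example for links ending in #comments.

```php
<?php
// Hypothetical helper; the name is an assumption, not a WordPress function.
function feed_reencode_url($url) {
    // Set aside "?query" and "#fragment" so "?", "=", "&" and "#" survive.
    $tail = '';
    $cut  = strcspn($url, '?#'); // offset of the first "?" or "#", if any
    if ($cut < strlen($url)) {
        $tail = substr($url, $cut);
        $url  = substr($url, 0, $cut);
    }

    $pieces = explode('/', $url);
    $count  = count($pieces);

    // Pieces 0-2 are the scheme, an empty string and the host.
    for ($i = 3; $i < $count; $i++) {
        $pieces[$i] = rawurlencode(rawurldecode($pieces[$i]));
    }

    // The tail is re-attached unchanged; it is NOT re-encoded here.
    return implode('/', $pieces) . $tail;
}

// Register the helper only when running inside WordPress, so this file
// can also be loaded on its own for testing.
if (function_exists('add_filter')) {
    add_filter('the_permalink_rss', 'feed_reencode_url');
}
```

With this in place, feed_reencode_url('http://example.com/a b/#comments') gives http://example.com/a%20b/#comments. Raw non-ASCII characters inside the query string or fragment would still pass through untouched, so this is a starting point rather than a complete fix.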

To put it simply:

URL before (output by some WordPress portfolio plugin) (NOT VALID):
http://zubolekarbg.com/лечение-и-превенция/%d0%bf%d1%80%d0%b8-%d0%b1%d1%80%d0%b5%d0%bc%d0%b5%d0%bd%d0%bd%d0%be%d1%81%d1%82/#comments

URL after DEcoding (NOT VALID):
http://zubolekarbg.com/лечение-и-превенция/при-бременност/#comments

URL after ENcoding, as written to the RSS 2.0 XML feed (VALID):
http://zubolekarbg.com/%D0%BB%D0%B5%D1%87%D0%B5%D0%BD%D0%B8%D0%B5-%D0%B8-%D0%BF%D1%80%D0%B5%D0%B2%D0%B5%D0%BD%D1%86%D0%B8%D1%8F/%D0%BF%D1%80%D0%B8-%D0%B1%D1%80%D0%B5%D0%BC%D0%B5%D0%BD%D0%BD%D0%BE%D1%81%D1%82/#comments
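To tell the NOT VALID and VALID forms apart quickly, a rough check for leftover raw bytes is enough. This is only a sketch: url_is_ascii_only is a hypothetical name, the example.com URLs are stand-ins for the real ones, and the function tests a single property (printable ASCII only) rather than validating the URL fully.

```php
<?php
// Hypothetical check: true when the URL consists of printable ASCII only,
// meaning no raw UTF-8 bytes were left un-encoded. Not a full URL validator.
function url_is_ascii_only($url) {
    return preg_match('/^[\x21-\x7E]+\z/', $url) === 1;
}

// Mixed form: one segment as raw UTF-8 bytes, one already percent-encoded.
$before = "http://example.com/\xD0\xBF\xD1\x80\xD0\xB8/%d0%bf%d1%80%d0%b8/";
// Fully percent-encoded form.
$after  = "http://example.com/%D0%BF%D1%80%D0%B8/%D0%BF%D1%80%D0%B8/";

var_dump(url_is_ascii_only($before)); // bool(false)
var_dump(url_is_ascii_only($after));  // bool(true)
```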

 

