Convert WordPress Into Markdown Files With YAML Front Matter (PHP Script)


Now that we’re ready to start extracting our WordPress content into a format our static site generator can use, the big question becomes:

What format do we want to be able to extract our WordPress data into?

There are plenty of choices available, but I like having YAML front matter at the top of my posts and pages (to control HOW the content will be embedded), which is also what many of the popular static site generators prefer. I also want the flexibility to write my own custom header information, which a YAML-like header on my posts and pages provides.

I’d like the content itself to be Markdown-based, even though the data we extract will be a mix of HTML and Markdown (unless you’ve specifically written <p> tags for every paragraph break in your content). I’d prefer to continue writing in Markdown and then process my content into HTML for uploading to my static site.

Therefore, the way I have constructed my output is a little like this:

---
title: This is the title of the page
date: 2012-10-02 14:30
author: Ryan
tags:
 - hello
 - world
 - blog
excerpt: This is a wonderful post about title tags on pages
template: post
redirects:
- /90/this-is-the-title-of-the-page/index.html
---
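
A header like this can be generated from a plain PHP array. The following is a minimal sketch rather than a full YAML writer: the function names `build_front_matter` and `yaml_scalar` are hypothetical, and it only handles scalar values and flat lists, which is all the header above needs.

```php
<?php
// Hypothetical helper: turns an associative array into a YAML front-matter block.
// Assumption: values are either scalars or flat lists of scalars.
function build_front_matter(array $fields): string
{
    $lines = ['---'];
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            // A list becomes "key:" followed by one " - item" line per entry.
            $lines[] = $key . ':';
            foreach ($value as $item) {
                $lines[] = ' - ' . yaml_scalar((string) $item);
            }
        } else {
            $lines[] = $key . ': ' . yaml_scalar((string) $value);
        }
    }
    $lines[] = '---';
    return implode("\n", $lines) . "\n";
}

// Quote a value only when plain YAML could misread it (for example a
// colon followed by a space, a leading dash or quote, or a trailing space).
// Multi-line values are not handled in this sketch.
function yaml_scalar(string $value): string
{
    if ($value === '' || preg_match('/^[\s\-?:\[\]{}#&*!|>\'"%@`]|:\s|\s#|\s$/', $value)) {
        return '"' . addcslashes($value, "\"\\") . '"';
    }
    return $value;
}

// Example usage with the same fields as the header above.
echo build_front_matter([
    'title'     => 'This is the title of the page',
    'date'      => '2012-10-02 14:30',
    'author'    => 'Ryan',
    'tags'      => ['hello', 'world', 'blog'],
    'template'  => 'post',
    'redirects' => ['/90/this-is-the-title-of-the-page/index.html'],
]);
```

Quoting only when needed keeps the output readable while still protecting titles that contain a colon.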

The design of the YAML header in my output has been driven by a Node package, YAML Front Matter to JSON, which transforms my entire page into JSON.
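
To illustrate the idea of turning a whole page into JSON, here is a rough PHP sketch. It is not the Node package and does not claim to match its output format: the function name `page_to_json` and the `content` key are assumptions, and the parser only understands `key: value` lines and simple `- item` lists.

```php
<?php
// Hypothetical helper: splits a page into front matter and body, then
// returns both as a single JSON document.
function page_to_json(string $page): string
{
    // Expect "---", header lines, "---", then the body.
    // Anything else is treated as body only.
    if (!preg_match('/\A---\R(.*?)\R---\R?(.*)\z/s', $page, $m)) {
        return json_encode(['content' => $page]);
    }

    $data = [];
    $currentKey = null;
    foreach (preg_split('/\R/', $m[1]) as $line) {
        $isItem = preg_match('/^\s*-\s+(.*)$/', $line, $item);
        if ($isItem && $currentKey !== null && is_array($data[$currentKey])) {
            // "- value" lines extend the list opened by the previous "key:" line.
            $data[$currentKey][] = trim($item[1]);
        } elseif (preg_match('/^([A-Za-z0-9_]+):\s*(.*)$/', $line, $pair)) {
            $currentKey = $pair[1];
            // An empty value means a list follows on the next lines.
            $data[$currentKey] = $pair[2] === '' ? [] : trim($pair[2]);
        }
    }

    $data['content'] = $m[2];
    return json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
}

// Example usage.
echo page_to_json("---\ntitle: Hello\ntags:\n - hello\n - world\n---\nThis is the content.\n");
```

For real-world YAML (nested maps, quoted strings, multi-line values) a proper YAML parser would be the safer choice.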

The remainder of the page would then contain the content, like:

---
YAML header (as above)
---
This is the content of the blog post. 

When we extract it from the WordPress database we'll notice that there are 
<a href="/">anchor tags</a> and image tags and div tags (etc.) in our document, 
but no paragraph tags. If you don't like Markdown, you could change my PHP 
script below so that it changes line breaks into <br /> or <p></p> tags.
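
If you would rather have paragraph tags than Markdown, the conversion could look something like the sketch below. The function name `breaks_to_paragraphs` is hypothetical, and it assumes that blank lines separate paragraphs and that single line breaks should become <br /> tags. It does not try to skip block-level HTML such as div tags, which WordPress's own wpautop() function handles more carefully.

```php
<?php
// Hypothetical helper: wraps blocks separated by blank lines in <p> tags and
// turns the remaining single line breaks into <br />.
function breaks_to_paragraphs(string $content): string
{
    // Normalise Windows and old Mac line endings so the split is predictable.
    $content = str_replace(["\r\n", "\r"], "\n", trim($content));
    if ($content === '') {
        return '';
    }

    $html = [];
    foreach (preg_split('/\n\s*\n/', $content) as $block) {
        // nl2br() is built into PHP: it inserts <br /> before each newline.
        $html[] = '<p>' . nl2br(trim($block)) . '</p>';
    }
    return implode("\n\n", $html);
}

// Example usage.
echo breaks_to_paragraphs("First line\nsecond line\n\nNext paragraph");
```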

Now that we have an idea of what we’d like to output from all the content we’ve written in our WordPress install, let’s run a PHP script from our server’s terminal. This assumes you have that type of access; if you only have FTP access you could possibly still run it, just be sure to edit the elements within the code and then run the file.

Anyway, here’s the script with detailed comments on how you can extract WordPress content and convert it to YAML front-matter with Markdown content:
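
At its core, the job is a loop over rows from the posts table that writes one file per post. The block below is only a minimal sketch of that loop under stated assumptions, not the full script: the function name `export_posts` is hypothetical, each row is assumed to carry the `wp_posts` columns `post_title`, `post_name`, `post_date`, `post_type` and `post_content`, and using `post_type` as the template value is my own guess at a sensible mapping.

```php
<?php
// Hypothetical sketch of the export loop.
// Each $row mirrors columns from WordPress's wp_posts table.
function export_posts(array $rows, string $outDir): array
{
    if (!is_dir($outDir) && !mkdir($outDir, 0755, true)) {
        throw new RuntimeException("Cannot create $outDir");
    }

    $written = [];
    foreach ($rows as $row) {
        // json_encode() gives a double-quoted string, which YAML also accepts,
        // so titles containing colons or quotes stay valid.
        $title = json_encode($row['post_title'], JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
        $header = "---\n"
            . 'title: ' . $title . "\n"
            . 'date: ' . substr($row['post_date'], 0, 16) . "\n" // drop the seconds
            . 'template: ' . $row['post_type'] . "\n"
            . "---\n";

        // post_name is the WordPress slug, used here as the file name.
        $path = rtrim($outDir, '/') . '/' . $row['post_name'] . '.md';
        file_put_contents($path, $header . $row['post_content'] . "\n");
        $written[] = $path;
    }
    return $written;
}

// Assumed usage: fetch published posts with PDO, then export them.
// The connection details and the default "wp_" table prefix are placeholders.
// $pdo  = new PDO('mysql:host=localhost;dbname=wordpress', 'user', 'password');
// $rows = $pdo->query(
//     "SELECT post_title, post_name, post_date, post_type, post_content
//      FROM wp_posts
//      WHERE post_status = 'publish' AND post_type IN ('post', 'page')"
// )->fetchAll(PDO::FETCH_ASSOC);
// export_posts($rows, __DIR__ . '/content');
```

Keeping the database query separate from the file writing makes it easy to try the export on a couple of hand-written rows before pointing it at a live database.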

Ryan

Author of scripteverything.com, Ryan has been dabbling in code since the late '90s when he cut his teeth by exploring VBA in Excel when trying to do something more. Having his eyes opened with the potential of automating repetitive tasks, he expanded to Python and then moved over to scripting languages such as HTML, CSS, Javascript and PHP. When he is not behind a screen, Ryan enjoys a good bush walk with the family during the cooler months, and going with them to the beach during the warmer months.
