Migrating from Wordpress to Contentful is Far Easier than it May Seem
by Stephen Osentoski

Here at Avatria, we are constantly looking to evolve the technologies we support. As we’ve seen an increasing demand for headless solutions across our implementations, we’ve had to become experts in a number of different platforms that fit within a composable architecture. Enter Contentful, a headless CMS platform that allows for a ton of flexibility and freedom in building a scalable content model for your implementation needs. We’ve implemented Contentful as the CMS platform on a number of projects, and are excited to continue this partnership as headless requirements become more and more common.

We recently embarked on a total redesign of our internal site, avatria.com. Given our growing expertise in Contentful and the necessity of building out a number of site components, we decided to use the opportunity to migrate all of our website content from Wordpress into Contentful. Though there were a few minor hiccups along the way, the overall process was extremely simple. If you’re interested in migrating your website from Wordpress to Contentful, this article will outline the steps you need to take to ensure your migration process is as smooth as possible.

Wordpress Content Extraction

Wordpress's export functionality offers a quick way to export any content type from your Wordpress site. The steps below will walk through how to extract this data from your Wordpress implementation.

  1. Log in to wp-admin

  2. Navigate to Tools -> Export

  3. Select either “Export all” or choose what to export from the list of content types (radio buttons), then click “Download Export File”

The result of these steps will be an XML file that will hold a series of objects from your Wordpress implementation. Below is a snippet from the XML file, where each item represents a blog post (or other content type, if you export more than just blog posts).

    <item>
        <title><![CDATA[Mitigating the Causes and Effects of Scope Creep]]></title>
        <link>https://www.avatria.com/news-and-insights/mitigating-the-causes-and-effects-of-scope-creep/</link>
        <pubDate>Fri, 20 May 2016 14:27:57 +0000</pubDate>
        <dc:creator><![CDATA[anando.naqui]]></dc:creator>
        <guid isPermaLink="false">https://www.avatria.com/?p=792</guid>
        <description></description>
        <content:encoded><![CDATA[Even well-executed eCommerce implementations that are launched on time and budget fail to meet client expectations. Often times no single group (client, consultant or third party) is at fault, and the root cause can be traced back to either differences in the understanding of the body of work to be implemented or changes to that scope.
...
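Before modeling, it can help to check what the export actually contains. The sketch below is not part of our migration script: it uses Python's standard-library xml.etree.ElementTree (rather than the BeautifulSoup approach used later) to count items by their wp:post_type element. The function name count_post_types and the sample string are illustrative, and the namespace URI is an assumption that should be checked against the xmlns:wp declaration at the top of your own export file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: verify this URI against the xmlns:wp attribute in your export.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}


def count_post_types(xml_text):
    """Count <item> elements in a Wordpress export by their wp:post_type."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for item in root.iter("item"):
        post_type = item.findtext(
            "wp:post_type", default="unknown", namespaces=WP_NS
        )
        counts[post_type] += 1
    return counts


# Tiny hand-written sample standing in for a real export file.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>Post A</title><wp:post_type>post</wp:post_type></item>
    <item><title>Post B</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
  </channel>
</rss>"""

print(count_post_types(SAMPLE))
```

A quick count like this shows whether the file holds only blog posts or also pages, attachments, and other types that you may want to filter out before migrating.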

Contentful Modeling

The goal for this part of the exercise is to determine how to convert these XML objects from Wordpress into your new content model within Contentful. If possible, we recommend going through this process once you have a general sense of the design of your website, as it will help ensure that you develop a content model that supports all components and component types. Because Contentful is extremely flexible, this is primarily a data modeling exercise, in which you should ask the following questions:

  1. What are the key fields from the Wordpress objects that we want to have in our Contentful content model moving forward? What can we exclude?

    • Examples - title, image, date, author name, author image, slug, rich content

  2. What are fields that we can set to default for each of the objects?

    • Featured Article

  3. What are fields that can be used by references to the article?

    • Think of a tile that shows the article & a snippet of the article, providing a link to the article itself

    • In the case of a listing page that shows these tiles for different blog posts, it would be useful to have a summary, a link, tags for filters, etc.

  4. What other features do you think you could use these objects in?

  5. What needs to be manually authored and cannot be easily migrated?

    • For us, this was images, for reasons we'll discuss below

The better you can answer these questions, the more comprehensive and well fitting your data model will be to your intended functionality. Of course there will always be new features / new enhancements to implement in the future, but with a bit of forward thinking now, you'll have less work in the future.
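One lightweight way to record the answers to these questions is a small planning table kept alongside the migration script. The sketch below is purely illustrative: the source element names come from the export snippet above and the target field IDs match those used later in this article, but FIELD_MAP, DEFAULTS, MANUAL, the featuredArticle field ID, and the unmapped helper are all hypothetical names.

```python
# Hypothetical planning table: Wordpress export element -> Contentful field ID.
FIELD_MAP = {
    "title": "title",
    "pubDate": "date",
    "dc:creator": "authorName",
    "link": "slug",
}

# Fields with no Wordpress source that receive a default value (illustrative).
DEFAULTS = {"featuredArticle": False}

# Content that will be authored by hand after the scripted import.
MANUAL = ["image", "authorImage"]


def unmapped(source_fields):
    """Return export fields that the plan does not map to any Contentful field."""
    return sorted(f for f in source_fields if f not in FIELD_MAP)


# Fields listed here are candidates to either map or consciously exclude.
print(unmapped(["title", "link", "pubDate", "dc:creator", "guid", "description"]))
```

Listing the unmapped fields makes the "what can we exclude?" decision explicit instead of leaving it implicit in the script.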

For our purposes, we answered the questions above and came up with the following data model within Contentful for our blog and blog posts.

[Figure: Migrate Data Model]

Note that this is just one part of our broader Contentful data model. Since blog posts are the only thing we needed to migrate from Wordpress, it is the only piece that we will be discussing in this post. We use the Content Page content model to support a number of different types of pages, including our blog posts.

Now that our content model is defined, we can populate it through a script that converts the Wordpress data into our Contentful models.

Migration Script

Contentful offers two main APIs for interacting with data within their platform:

  • Content Delivery API — a read-only API for retrieving content

  • Content Management API — create, edit, manage, and publish content

Since the migration needs to create entries rather than just read them, the Content Management API was the one we needed for this exercise. Contentful has a number of different client libraries in various languages, which we reviewed on their platforms page. For the purpose of our migration, we used the Python SDK client library due to comfort with Python and the quick nature of this integration.

Below is a walkthrough of the script we developed.

Step 1: Read XML File

The script starts with reading in the XML file using the library BeautifulSoup. This allows us to easily search through the file and grab elements that we need in an easy format.

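from bs4 import BeautifulSoup

# Read the Wordpress export file
with open('avatria.WordPress.2023-12-13.xml', 'r', encoding='utf-8') as f:
  data = f.read()

# The "xml" feature requires the lxml package to be installed
soupData = BeautifulSoup(data, features="xml")

# Read in all the items from Wordpress
items = soupData.find_all('item')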

At this point, all of our blog post items are stored in a list named items.

Now, we are able to parse through each item, extracting the information from each item as we see fit:

for count, item in enumerate(items, start=1):

    print("\nCreating article #", count)

    # Transform the objects into the format needed for the Post call to CFul

    title = item.find('title').get_text()
    slug = item.find('link').get_text().rsplit("/")[-2]
    date = item.find('pubDate').get_text()
    author = item.find('dc:creator').get_text().split('.')
    body = item.find('content:encoded').get_text()
    body = body.replace("&nbsp;", " ")
    body = body.replace("&amp;", "&")

Note that for some of the fields, such as the tags or the body, a simple get_text() is not enough. Some of this is due to the encoding applied to the content, while other cases depend on how data is stored in the source system vs. how you want it stored in Contentful.
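The later snippets reference date_string, formatted_author, and formatted_tags. Here is one possible way such values might be derived, as a minimal sketch using only the standard library. The helper names are hypothetical, the ISO 8601 output assumes your Contentful date field accepts that format, and the tag helper assumes tags appear in the export as category elements with domain="post_tag".

```python
import html
from email.utils import parsedate_to_datetime


def to_date_string(pub_date):
    """Convert an RFC 822 date, like the export's pubDate, to ISO 8601."""
    return parsedate_to_datetime(pub_date).isoformat()


def format_author(creator):
    """Turn a 'first.last' login into a display name."""
    return " ".join(part.capitalize() for part in creator.split("."))


def format_tags(categories):
    """Keep the text of (domain, text) pairs whose domain is 'post_tag'.

    With BeautifulSoup, the pairs could be built as:
    [(c.get("domain"), c.get_text()) for c in item.find_all("category")]
    """
    return [text for domain, text in categories if domain == "post_tag"]


def clean_body(body):
    """Decode HTML entities, then turn non-breaking spaces into plain spaces."""
    return html.unescape(body).replace("\xa0", " ")


date_string = to_date_string("Fri, 20 May 2016 14:27:57 +0000")
formatted_author = format_author("anando.naqui")
formatted_tags = format_tags([("category", "Insights"), ("post_tag", "Scope")])
```

Using html.unescape covers every named entity in one pass, which may be easier to maintain than adding a replace call per entity as new ones turn up in the content.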

For example, the blog post link object, obtained via item.find('link').get_text() as shown above, would come back with the following value:

https://www.avatria.com/news-and-insights/appointment-service-design-for-ecommerce-applications/

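In our implementation, all we need is the url suffix of news-and-insights/appointment-service-design-for-ecommerce-applications/. The slug itself is the final path segment (appointment-service-design-for-ecommerce-applications), which we extract here; the category segment (news-and-insights) is extracted separately in Step 2, and the two are combined to form the link. Thus, we apply the following transformation to this field: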

slug = item.find('link').get_text().rsplit("/")[-2]
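Note that rsplit("/")[-2] relies on the link ending in a trailing slash. As an alternative, the sketch below uses the standard library's urllib.parse and works with or without the trailing slash. The split_post_url helper is a hypothetical name, and it assumes every post URL has at least a category segment and a slug segment.

```python
from urllib.parse import urlparse


def split_post_url(url):
    """Return (category, slug) from a post URL, ignoring a trailing slash."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected at least two path segments in {url!r}")
    return segments[-2], segments[-1]


category, slug = split_post_url(
    "https://www.avatria.com/news-and-insights/"
    "appointment-service-design-for-ecommerce-applications/"
)
link = f"{category}/{slug}"
print(link)
```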

In determining what additional transformations you need, it's worth experimenting with a manually created object to determine the behavior of each object with the front-end design.

Remember that the more conversion you can do as a part of this step, the less you’ll need to do manually in the future, and the lower your risk of errors in live content. The amount of effort worth investing in this conversion process likely depends on the number of resulting items. If you have 10 resulting objects, a manual update to a field in each is quite a bit less effort than if you have 100 resulting objects.

Step 2: Setting up our Contentful API Calls

Note that in our content model above, we reference a number of nested items: (1) link, (2) article, (3) page metadata, (4) related insights. All of these objects will live within our Content Page that will ultimately serve as our blog post object.

Before we do this, we need to create our connection to Contentful via the Contentful Management Python SDK:

from contentful_management import Client

client = Client('<CMA_TOKEN>')
space_id = '<SPACE_ID>'
environment_id = '<ENVIRONMENT_ID>'

You can find these values in the following locations:

  • CMA_TOKEN - From the Contentful dashboard, navigate to Settings > CMA Tokens, generating one for this exercise.

  • SPACE_ID - Check the URL while on the Contentful dashboard. It should look something like app.contentful.com/spaces/<space_id>/environments/...

  • ENVIRONMENT_ID - Select the hamburger menu above the Home button. The environment ID is what the environment alias is pointing to. For example, it should look something similar to main > main-source, where main-source would be your environment ID.

With this, we can use client to execute calls to Contentful to create entries moving forward.
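Since a CMA token grants write access, you may prefer not to hardcode it in the script. The sketch below reads the three values from environment variables instead. The function name and the variable names (CONTENTFUL_CMA_TOKEN and so on) are hypothetical choices, not something the SDK requires.

```python
import os


def load_contentful_settings(env=os.environ):
    """Read connection settings from environment variables.

    The variable names below are hypothetical; pick whatever suits your setup.
    """
    names = {
        "token": "CONTENTFUL_CMA_TOKEN",
        "space_id": "CONTENTFUL_SPACE_ID",
        "environment_id": "CONTENTFUL_ENVIRONMENT_ID",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Usage with the SDK (requires the contentful_management package):
#   from contentful_management import Client
#   settings = load_contentful_settings()
#   client = Client(settings["token"])
```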

Because the Content Page references each of the nested objects listed above, we need to create those objects first so that their entry IDs are available when we create the page.

First we define a link object, which will be used for linking to the post from different pages throughout our site.

Object #1 - Link

For a number of these attributes, we default the values to the title to allow for easier searching later on, as well as to allow us to differentiate between each Link item within our Contentful data.

#1 - Create a link

category = item.find('link').get_text().rsplit("/")[-3]
link = category + '/' + slug

print('Creating link for ', link)

link_content_type_id = 'link'
link_attributes = {
    'content_type_id': link_content_type_id,
    'fields': {
        'internalName': {
            'en-US': title
        },
        'text': {
            'en-US': title
        },
        'url': {
            'en-US': link
        },
        'targetEnabled': {
            'en-US': False
        },
        'target': {
            'en-US': '_blank'
        },
        'relEnabled': {
            'en-US': False
        }, 
    }
}

We then call the client.entries method with our space_id and environment_id to create a link entry within our Contentful instance, passing along the attributes that we set above.

link_entry = client.entries(space_id, environment_id).create(
    None,
    link_attributes
)

link_entry.publish()

print('Finished link for ', link)

We save the entry id attribute from the resulting object created within Contentful to reference later on within the page creation call.

link_id = link_entry.id
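The saved ID is used wherever one entry references another, always wrapped in the same sys structure that appears in the snippets that follow. A small helper can cut down on the repetition; entry_link is a hypothetical name of our own, and 'abc123' is a placeholder ID.

```python
def entry_link(entry_id):
    """Build the reference structure used to point one entry at another."""
    return {"sys": {"type": "Link", "linkType": "Entry", "id": entry_id}}


# 'abc123' is a placeholder ID for illustration only.
article_link_field = {"en-US": entry_link("abc123")}
print(article_link_field)
```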

Object #2 - Article

Following a similar process to the above, we need to create an insightArticle object, which includes key aspects like the title, date, author, and a reference to the link object we just created in the previous step:

#2 - Create article

print('Inserting article with name: ', title)

entry_attributes = {
    'content_type_id': 'insightArticle',
    'fields': {
        'title': {
            'en-US': title
        },
        'date': {
            'en-US': date_string
        },
        'authorName':  {
            'en-US': formatted_author
        },
        'articleLink': {
            'en-US': {
                'sys': {
                    'type': 'Link',
                    'linkType': 'Entry',
                    'id': link_id
                }
            }
        }
    }
}

Once these attributes are set, we can call the client library to create the entry via the Content Management API:

article_entry = client.entries(space_id, environment_id).create(
    None,
    entry_attributes
)

article_id = article_entry.id

Note that we don’t publish this item, because these article objects are used on a separate blog listing page that customers use to navigate to individual posts. Since the post itself has not been published yet, we create this object but leave it in Draft status so that the tile does not appear on the listing page.

Object #3 - Page Metadata

Creating a metadata object for each page allows you to group certain attributes like keywords, page ID, page description, etc. for each page without having to bloat the actual content within the page attributes themselves.

Thus, we populate this with some general attributes that will fit most of our blog posts, while including the title as the id attribute within this object.

#3 - Create page metadata object

print('Creating metadata in Contentful with title: ', title)

meta_description = 'Avatria.com | ' + title
meta_keywords = ", ".join(formatted_tags)

metadata_attributes = {
    'content_type_id': 'pageMetadata',
    'fields': {
        'id': {
            'en-US': title
        },
        'description': {
            'en-US': meta_description
        },
        'applicationName':  {
            'en-US': 'Avatria.com'
        },
        'author': {
            'en-US': 'Avatria, Inc.'
        },
        'keywords': {
            'en-US': meta_keywords
        }
    }
}

Similar to before, we call the client.entries method to create this object, then store the ID for this metadata object for later use:

metadata_entry = client.entries(space_id, environment_id).create(
    None,
    metadata_attributes
)

metadata_entry.publish()
print('Created metadata in Contentful with title: ', title)

metadata_id = metadata_entry.id

Object #4 - Related Insights

One of the components on our blog posts is a small carousel linking to other related blog posts. For the purpose of this migration, we want to set up a new related insights object (content type insightsContainer) that will hold these references. Note that we won’t populate the references themselves, as the related posts are new functionality and can't be imported from Wordpress. Creating these references will be a content exercise later on. However, setting up the objects now will make this content exercise a lot easier in the future.

Below is the code we use to set the attributes and create this entry within Contentful, saving the resulting entry ID within insights_id for use later on.

#4 - Create a default RELATED INSIGHTS Object

print('Creating related insights in Contentful with internal title: ', title)

insights_headline = 'RELATED INSIGHTS'
insights_internal = insights_headline + ' - ' + title

insights_attributes = {
    'content_type_id': 'insightsContainer',
    'fields': {
        'headline': {
            'en-US': insights_headline
        },
        'internalName': {
            'en-US': insights_internal
        },
        'articles': {
            'en-US': []
        }
    }
}

insights_entry = client.entries(space_id, environment_id).create(
    None,
    insights_attributes
)
print('Created related insights in Contentful with internal title: ', title)

insights_id = insights_entry.id

Now that we have all of our nested items created, we can finalize the creation of our Content Page.

We first set up some default objects that will be included on every blog post:

#5 - Create page

print('Creating page for title: ', title)

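default_theme_id = '<THEME_OBJECT_ID>'
# The transition ID is referenced in the next snippet, so it needs a value too
default_transition_id = '<TRANSITION_OBJECT_ID>'
default_cta_id = '<CTA_OBJECT_ID>'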

Then we set the page attributes based on a combination of our default entry references & the entry references from the previous steps:

page_internal_title = 'Insights Article - ' + title
page_theme = {'type': 'Link', 'linkType': 'Entry', 'id': default_theme_id}
page_slug = link
page_title = title

page_article = {'sys':{'type': 'Link', 'linkType': 'Entry', 'id': article_id}}
page_transition = {'sys':{'type': 'Link', 'linkType': 'Entry', 'id': default_transition_id}}
page_insights = {'sys':{'type': 'Link', 'linkType': 'Entry', 'id': insights_id}}
page_cta = {'sys':{'type': 'Link', 'linkType': 'Entry', 'id': default_cta_id}}

page_components = [page_article, page_transition, page_insights, page_cta]

Now we can set our page attributes and submit the request to Contentful to create our blog post page entry.

page_attributes = {
    'content_type_id': 'siteStorefrontContentPage',
    'fields': {
        'internalName': {
            'en-US': page_internal_title
        },
        'theme': {
            'en-US': {
                'sys': page_theme
            }
        },
        'slug':  {
            'en-US': page_slug
        },
        'title': {
            'en-US': page_title
        },
        'pageMetadata': {
            'en-US': {
                'sys': {
                    'type': 'Link',
                    'linkType': 'Entry',
                    'id': metadata_id
                }
            }
        },
        'components': {
            'en-US': page_components
        }
    }
}

page_entry = client.entries(space_id, environment_id).create(
    None,
    page_attributes
)

page_entry.pageMetadata = metadata_entry
page_entry.save()

print('Created page for title: ', title)

We decided not to automatically publish this item for a couple of reasons:

  • We wanted to ensure the content looked appropriate before making it live on our new site.

  • There were various content authoring steps (mainly image-based) that our script did not cover and that we needed to complete before publishing.

Below is what a resulting blog post entry looks like in practice with the data model above. Shown is the Content Page, which contains all of the objects necessary to render our blog post.

[Images: migration-1, migration-2]

Migration Caveats

All of the steps listed above were very specific to both (a) what we were exporting from Wordpress and (b) what we were importing into Contentful.

Your own migration should follow a similar overall structure, but the data model, the individual fields, and the way each field is populated will likely differ.

Given that, there are some things to consider if your implementation looks somewhat similar to ours:

  1. Have an idea of what you’re going to migrate, then try it for one entry.

    • There are going to be issues. Before implementing the entire process, try the script first on each embedded object and iterate. What’s easy? What’s really difficult? Verify amongst the team as you progress with this single entry.

  2. The structure of entry attributes is a bit wonky within the Contentful Python SDK.

    • This may be something that is similar in other SDKs, but I found that it’s necessary to mark every field with localization in order to import appropriately via the client.entries(…) call.

      • This included booleans, urls, etc. This wasn’t explicitly clear in the documentation and took a bit of trial / error & clever googling to find the solution.

  3. Determine the balance of “manual” vs. “scripted” for your implementation.

    • For example, we decided to manage the migration of image assets manually. Given the relatively small number of posts, and the necessity of steps that would be difficult to automate (such as resizing images for the new design), we determined that the payoff was not worth the effort.

    • There may be similar content types that require manual authoring, and that’s okay. It’s about the balance of your team capabilities, timelines of go-live, etc.
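For the second caveat above, wrapping every field value in a locale key by hand gets verbose quickly. A small helper can do the wrapping in one place. This is a minimal sketch assuming a single en-US locale; localize is a hypothetical name and the field values are placeholders.

```python
def localize(fields, locale="en-US"):
    """Wrap each plain field value in a {locale: value} dict."""
    return {name: {locale: value} for name, value in fields.items()}


link_attributes = {
    "content_type_id": "link",
    "fields": localize({
        "internalName": "Example title",
        "url": "news-and-insights/example-title",
        "targetEnabled": False,  # booleans need the locale wrapper as well
    }),
}
print(link_attributes["fields"]["targetEnabled"])
```

The resulting dictionary has the same shape as the attribute dictionaries written out by hand earlier in this article, so it can be passed to the same create call.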

Takeaways

As a result of this migration process, our content model has become more dynamic and is flexible enough to support additional features we come up with for avatria.com. Building our content model within Contentful also gives us access to our content library through the Contentful API, which makes it much easier to reuse that content in other internal marketing material in the future.

If you’re interested in learning more about Avatria or our abilities in CMS-related implementations, feel free to reach out to us!
