Migrate Google Sites to Jekyll

Dusty Candland | |

Export

Install https://github.com/famzah/google-sites-backup

google-sites-backup/run.sh gdata-python-client/ google-sites-backup/

Convert to Markdown

Install reverse_markdown

cd into the exported proj

find . -iname "*.html" -exec echo "tidy -q -omit -b -i -c {} | reverse_markdown > {}.md" \; | sed s/\.html\.md/\.md/ > fix.sh
chmod +x fix.sh
./fix.sh
find . -iname "*.html" | xargs rm

Cleanup

Add frontmatter

find . -iname "*.md" -exec perl -0777 -i -pe 's/<head>.*<\/head>//igs' {} \;
find . -iname "*.md" -exec perl -0777 -i -pe 's/^# (.*)$/---\nlayout: page\ntitle: $1\n---/m' {} \;

Clean up left over extras, spaces, extra header lines, and

find . -iname "*.md" | xargs -t -I {} sed -i'' 's/Â//g' {}
find . -iname "*.md" -exec perl -0777 -i -pe 's/^[\s\|]*$//gm' {} \;
find . -iname "*.md" -exec perl -0777 -i -pe 's/^.*?---/---/ms' {} \;
find . -iname "*.md" -exec perl -i -pe 's/^ ([^ ].*)$/$1/g' {} \;

Remove absolute links

ack --ignore-dir=_site -l "sites.google.com\/a\/roximity.com\/wiki" | xargs perl -i -pe "s/https:\/\/sites\.google\.com\/a\/roximity\.com\/wiki//g"

Fix resource links

ack --ignore-dir=_site -l "\/_\/rsrc\/\d*\/" | xargs perl -i -pe "s/\/_\/rsrc\/\d*\///g"

Rename %20 to underscores in file names.

for i in `find . -name "*%20*"`; do mv -v $i `echo $i | sed 's/%20/_/g'` ; done

Still had to do a fair amount of clean up from the converted markdown.

Plugins

These make the stucture and navigation match the google sites somewhat.

Related Pages

Lots of our page had files as downloads. I like the idea of putting downloads in a sub directory and having them auto populate on the page. Also some of our navigation is based on pages in a matching directory. This plugin populates a sub_pages collection and a downloads collection. The view renders those collections

module AssociateRelatedPages
class Generator < Jekyll::Generator
def generate(site)
page_lookup = site.pages.reduce({}) { |lookup, page| lookup["/" + page.path] = page; lookup; }

site.pages.each do |page|
subdir = File.join(site.source, page.dir, page.basename)
if File.exist?(subdir) and File.directory?(subdir)
entries = Dir.entries(subdir)

page.data["sub_pages"] = entries.select{ |e|
e =~ /\.md$/
}.map{ |e|
page_lookup[File.join(page.dir, page.basename, e)]
}

page.data["downloads"] = entries.reject{ |e|
e == "." || e == ".." || e =~ /\.md$/ ||
File.directory?(File.join(subdir, e))
}.map{ |e|
download = File.join(subdir, e)
stat = File::Stat.new(download)
{
"title" => e,
"url" => File.join(page.basename, e),
"size" => stat.size
}
}
end
end
end
end
end
{% if page.sub_pages.size > 0 %}
<ul>
{% for page in page.sub_pages %}
<li>
<a href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a>
</li>
{% endfor %}
</ul>
{% endif %}
{% if page.downloads.size > 0 %}
<div class="post-downloads">
<h2>Downloads</h2>
<ul>
{% for download in page.downloads %}
<li>
<a href="{{ download.url | prepend: site.baseurl }}">{{ download.title }} ({{ download.size }}b)</a>
</li>
{% endfor %}
</ul>
</div>
{% endif %}

Navigation

The navigation on the google site was mostly based on sub directories. This creates a nav collection used to build the navigation.

module HierarchicalNavigation
class Generator < Jekyll::Generator
#{dev: { page: Page, sub: [] }}

def generate(site)
nav = {}
site.pages.sort_by(&:dir).each do |page|
dirs = page.dir.split('/')
dir = dirs[1] || ''

if dirs.count <= 2
if page.basename == 'index'
nav[dir] ||= {'page' => nil, 'sub' => []}
nav[dir]['page'] = page
else
nav[dir] ||= {'page' => nil, 'sub' => []}
nav[dir]['sub'] << page
end
end
end

site.data['nav'] = nav.values
end
end
end
<ul>
{% for nav in site.data['nav'] %}
{% if nav.page.title %}
<li class="{% if page.url contains nav.page.url %}active{% endif %}">
<a class="page-link" href="{{ nav.page.url | prepend: site.baseurl }}">{{ nav.page.title }}</a>
{% if page.url contains nav.page.dir %}
<ul>
{% for sub in nav.sub %}
{% if sub.title %}
{% capture sub_dir %}{{ sub.url | remove: ".html" | append: "/" }}{% endcapture %}
<li class="{% if page.url contains sub.url or page.dir == sub_dir %}active{% endif %}">
<a class="page-link" href="{{ sub.url | prepend: site.baseurl }}">{{ sub.title }}</a>
</li>
{% endif %}
{% endfor %}
</ul>
{% endif %}
</li>
{% endif %}
{% endfor %}
</ul>

Webmentions

These are webmentions via the IndieWeb and webmention.io. Mention this post from your site: