• Migrate Google Sites to Jekyll


    Install https://github.com/famzah/google-sites-backup

    google-sites-backup/run.sh gdata-python-client/ google-sites-backup/

    Convert to Markdown

    Install reverse_markdown

    cd into the exported project

    find . -iname "*.html" -exec echo "tidy -q -omit -b -i -c {} | reverse_markdown > {}.md" \; | sed 's/\.html\.md/.md/' > fix.sh
    chmod +x fix.sh
    ./fix.sh
    find . -iname "*.html" | xargs rm


    Add frontmatter

    find . -iname "*.md" -exec perl -0777 -i -pe 's/<head>.*<\/head>//igs' {} \;
    find . -iname "*.md" -exec perl -0777 -i -pe 's/^# (.*)$/---\nlayout: page\ntitle: $1\n---/m' {} \;

    Clean up leftover extras: stray Â characters, lines holding only whitespace or pipes, extra header lines, and leading spaces.

    find . -iname "*.md" | xargs -t -I {} sed -i'' 's/Â//g' {}
    find . -iname "*.md" -exec perl -0777 -i -pe 's/^[\s\|]*$//gm' {} \;
    find . -iname "*.md" -exec perl -0777 -i -pe 's/^.*?---/---/ms' {} \;
    find . -iname "*.md" -exec perl -i -pe 's/^ ([^ ].*)$/$1/g' {} \;

    Remove absolute links

    ack --ignore-dir=_site -l "sites.google.com\/a\/roximity.com\/wiki" | xargs perl -i -pe "s/https:\/\/sites\.google\.com\/a\/roximity\.com\/wiki//g"

    Fix resource links

    ack --ignore-dir=_site -l "\/_\/rsrc\/\d*\/" | xargs perl -i -pe "s/\/_\/rsrc\/\d*\///g"

    Rename %20 to underscores in file names.

    for i in `find . -name "*%20*"`; do mv -v $i `echo $i | sed 's/%20/_/g'` ; done
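
    A backtick for loop splits on whitespace and can rename a directory before the files inside it. A more defensive sketch, assuming bash and a find that supports -depth and -print0 (the function name rename_percent20 is made up for illustration), renames the deepest entries first and touches only the last path component:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace "%20" with "_" in every file and directory
# name under the given root (defaults to the current directory).
rename_percent20() {
  local root="${1:-.}" path dir base
  # -depth lists a directory's contents before the directory itself,
  # so a renamed parent never invalidates a path still waiting in the queue.
  find "$root" -depth -name '*%20*' -print0 |
    while IFS= read -r -d '' path; do
      dir=$(dirname "$path")
      base=$(basename "$path")
      # Rewrite only the final component; parent directories get their own turn.
      mv -v "$path" "$dir/$(printf '%s' "$base" | sed 's/%20/_/g')"
    done
}

# Usage: rename_percent20 .
```

    Note that mv overwrites an existing target, so it is worth checking for name collisions before running this on a real export.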

    I still had to do a fair amount of manual cleanup on the converted Markdown.


    These plugins make the structure and navigation roughly match the Google Sites layout.

    Lots of our pages had files as downloads. I like the idea of putting downloads in a subdirectory and having them auto-populate on the page. Also, some of our navigation is based on pages in a matching directory. This plugin populates a sub_pages collection and a downloads collection, and the view renders those collections.

    module AssociateRelatedPages
      class Generator < Jekyll::Generator
        def generate(site)
          # Index every page by its site-relative path, e.g. "/dev/setup.md".
          page_lookup = site.pages.reduce({}) { |lookup, page| lookup["/" + page.path] = page; lookup; }
          site.pages.each do |page|
            # A page "foo.md" owns the sibling directory "foo/", if there is one.
            subdir = File.join(site.source, page.dir, page.basename)
            if File.exist?(subdir) and File.directory?(subdir)
              entries = Dir.entries(subdir)
              # Markdown files in that directory become the page's sub pages.
              page.data["sub_pages"] = entries.select{ |e|
                e =~ /\.md$/
              }.map{ |e|
                page_lookup[File.join(page.dir, page.basename, e)]
              }
              # Every other regular file becomes a download.
              page.data["downloads"] = entries.reject{ |e|
                e == "." || e == ".." || e =~ /\.md$/ ||
                  File.directory?(File.join(subdir, e))
              }.map{ |e|
                download = File.join(subdir, e)
                stat = File::Stat.new(download)
                {
                  "title" => e,
                  "url" => File.join(page.basename, e),
                  "size" => stat.size
                }
              }
            end
          end
        end
      end
    end

    {% if page.sub_pages.size > 0 %}
      {% for page in page.sub_pages %}
          <a href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a>
      {% endfor %}
    {% endif %}
    {% if page.downloads.size > 0 %}
      <div class="post-downloads">
        {% for download in page.downloads %}
            <a href="{{ download.url | prepend: site.baseurl }}">{{ download.title }} ({{ download.size }}b)</a>
        {% endfor %}
      </div>
    {% endif %}

    The navigation on the Google site was mostly based on subdirectories. This plugin creates a nav collection that the template uses to build the navigation.

    module HierarchicalNavigation
      class Generator < Jekyll::Generator
        #{dev: { page: Page, sub: [] }}
        def generate(site)
          nav = {}
          site.pages.sort_by(&:dir).each do |page|
            dirs = page.dir.split('/')
            dir = dirs[1] || ''
            # Only the root and first-level directories feed the navigation.
            if dirs.count <= 2
              if page.basename == 'index'
                # The index page is the section's top-level entry.
                nav[dir] ||= {'page' => nil, 'sub' => []}
                nav[dir]['page'] = page
              else
                # Any other page in the directory is a sub entry.
                nav[dir] ||= {'page' => nil, 'sub' => []}
                nav[dir]['sub'] << page
              end
            end
          end
          site.data['nav'] = nav.values
        end
      end
    end

    {% for nav in site.data['nav'] %}
      {% if nav.page.title %}
      <li class="{% if page.url contains nav.page.url %}active{% endif %}">
        <a class="page-link" href="{{ nav.page.url | prepend: site.baseurl }}">{{ nav.page.title }}</a>
        {% if page.url contains nav.page.dir %}
          {% for sub in nav.sub %}
            {% if sub.title %}
              {% capture sub_dir %}{{ sub.url | remove: ".html" | append: "/" }}{% endcapture %}
              <li class="{% if page.url contains sub.url or page.dir == sub_dir %}active{% endif %}">
                <a class="page-link" href="{{ sub.url | prepend: site.baseurl }}">{{ sub.title }}</a>
              </li>
            {% endif %}
          {% endfor %}
        {% endif %}
      </li>
      {% endif %}
    {% endfor %}
  • Spark RDD to CSV with headers

    We have some Spark jobs whose results we want stored as CSV with headers so they can be used directly. Saving the data as CSV is pretty straightforward: just map the values into CSV lines.

    The trouble starts when you want that data in one file. FileUtil.copyMerge is the key for that. It takes all the files in a directory, like those output by saveAsTextFile, and merges them into one file.

    Great, now we just need a header line. My first attempt was to union an RDD containing the header with the output RDD. This works sometimes, if you get lucky. Since union just smashes everything together, more often than not the CSV has the header row somewhere in the middle of the results.

    No problem! I’ll just prepend the header after the copyMerge. Nope: Hadoop file systems are generally write-once. You can get append to work, but prepending is still not a great option.

    The solution was to write the header as a file BEFORE the copyMerge using a name that puts it first in the resulting CSV! Here’s what we ended up using:

    (ns roximity.spark.output
      (:require [sparkling.conf :as conf]
                [sparkling.core :as spark]
                [sparkling.destructuring :as de]
                [clojure.data.csv :as csv]
                [clojure.java.io :as io]
                [clojure.string :as string])
      (:import [org.apache.hadoop.fs FileUtil FileSystem Path]))

    (defn- csv-row
      "Render one sequence of values as a single CSV line, without the trailing newline."
      [values]
      (let [writer (java.io.StringWriter.)]
        (csv/write-csv writer [values])
        (string/trimr (.toString writer))))

    (defn save-csv
      "Convert to CSV and save at URL.csv. URL should be a directory.
       Headers should be a vector of keywords that match the map in a tuple value,
       and should be in the order you want the data written out in."
      [url headers sc rdd]
      (let [header (str (csv-row (map name headers)) "\n")
            file url
            dest (str file ".csv")
            conf (org.apache.hadoop.conf.Configuration.)
            srcFs (FileSystem/get (java.net.URI/create file) conf)]
        ;; Clear out any previous output.
        (FileUtil/fullyDelete (io/as-file file))
        (FileUtil/fullyDelete (io/as-file dest))
        ;; Write the rows as a single part file.
        (->> rdd
          (spark/map (de/value-fn (fn [value]
            (let [values (map value headers)]
              (csv-row values)))))
          (spark/coalesce 1 true)
          (#(.saveAsTextFile % file)))
        ;; Write the header next to the part file, under a name that sorts first.
        (with-open [out-file (io/writer (.create srcFs (Path. (str file "/_header"))))]
          (.write out-file header))
        ;; Merge the directory into one CSV and delete the source directory.
        (FileUtil/copyMerge srcFs (Path. file) srcFs (Path. dest) true conf nil)
        (.close srcFs)))

    This works for local files and S3, and it should work for HDFS. Since we’re using S3 and the results are not huge, we use (coalesce 1 true) so that only one part file is written to S3. Without that, we had issues with too many requests. We could probably use a higher number and find a happy medium, but we just use 1.
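
    The header trick depends on the merge concatenating files in name order, where _header sorts ahead of part-00000. The sketch below is a local stand-in for that behavior, not Hadoop's implementation: a hypothetical merge_sorted function, assuming bash and a find that supports -maxdepth, concatenates a directory's files in byte-wise name order.

```shell
#!/usr/bin/env bash
# Hypothetical local stand-in for a name-ordered merge:
# $1 = directory holding the part files, $2 = destination file outside it.
merge_sorted() {
  local src="$1" dest="$2" f
  : > "$dest"   # start from an empty destination
  # LC_ALL=C gives a plain byte-wise sort, so "_" (0x5F) sorts before "p" (0x70).
  find "$src" -maxdepth 1 -type f | LC_ALL=C sort |
    while IFS= read -r f; do
      cat "$f" >> "$dest"
    done
}

# Usage: merge_sorted results results.csv
```

    An empty _SUCCESS marker also sorts ahead of _header in this ordering, but it contributes no bytes, so the header still ends up as the first line.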

  • Why ROXIMITY Selected MongoDB – Compose.io

    When we initially started development of ROXIMITY, I decided to go with MongoDB. There were three reasons for this choice: geospatial support, redundancy and scalability, and a lack of schema. If you are thinking about MongoDB, these are all still valid reasons for considering it, and our experience should aid your decision making.

  • Capistrano 2 rolling task

    This took a long time to track down, but it allows rolling deploys or tasks with Capistrano 2.x.

    task :rolling, roles: :web do
      find_servers_for_task(current_task).each do |s|
        # Narrow the :web role to this one server so anything run next targets only it.
        puts roles[:web].clear()
        server s.host, :web
        puts "Deploying to #{s.host}..."
      end
    end

    Ref: https://groups.google.com/forum/#!topic/capistrano/H-tizsMN2Tk

  • AngularJS dynamic filter

    Use this to pass a filter name as a string from a controller (or anywhere else) and apply it dynamically. It uses dependency injection to look up the filter and apply it.

    In the template

    {{row.label | date | dynamic:nameFilter }}

    In the controller

    $scope.nameFilter = 'mycustomerfilter';

    The filter

    app.filter('dynamic', ['$injector', function($injector) {
      return function(text, filter) {
        var result = text;
        if (filter) {
          // AngularJS registers a filter named "foo" with the injector as "fooFilter".
          var f = $injector.get(filter + "Filter");
          if (f) { result = f(text); }
        }
        return result;
      };
    }]);
  • Run GIT GC in all subdirectories

    Find the release directories and run git gc in each.

    find ./ -maxdepth 1 -type d -exec git --git-dir={}/.git --work-tree={} gc \;
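
    The find one-liner also visits ./ itself and any subdirectory that is not a repository. A slightly more selective sketch, assuming bash and a git new enough to support -C (the function names git_dirs and gc_all are made up for illustration), checks for a .git directory first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each immediate subdirectory of $1 (default: .)
# that contains a .git directory.
git_dirs() {
  local root="${1:-.}" dir
  for dir in "$root"/*/; do
    [ -d "${dir}.git" ] && printf '%s\n' "${dir%/}"
  done
  return 0
}

# Hypothetical helper: run git gc in each repository that git_dirs finds.
gc_all() {
  local dir
  git_dirs "$1" | while IFS= read -r dir; do
    echo "Running git gc in $dir"
    git -C "$dir" gc --quiet
  done
}

# Usage: gc_all /path/to/releases
```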
  • Semi Forward Foot Controls for Sportsters ’86 and ’03

    These controls place your feet in between the stock mid position and forward controls.