Rake Package to Create Zipfile

Small Rakefile to package a WordPress plugin into a zip file that can be installed by uploading.

Rake::PackageTask requires file tasks that describe how to build the files. Since we don't actually need to build anything, we just define what the files are.

require 'rake'
require 'rake/packagetask'

file 'README.txt'
file 'admin/**'
file 'includes/**'
file 'languages/**'
file 'public/**'
file 'index.php'
file 'LICENSE.txt'
file 'remote-api.php'
file 'uninstall.php'

Rake::PackageTask.new("remote-api", :noversion) do |p|
  p.need_zip = true
  p.package_files.include(
    "admin/**",
    "includes/**",
    "languages/**",
    "public/**",
    "index.php",
    "LICENSE.txt",
    "README.txt",
    "remote-api.php",
    "uninstall.php")
end

Then, to create the zip, just run rake package. It'll create the file under pkg, using whatever name you gave the package.
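
Running it looks something like this (a sketch; with :noversion the zip takes the bare package name):

rake package
# => pkg/remote-api.zip (plus a pkg/remote-api staging directory), ready to upload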


Clojure Duct setup with MongoDB, Foundation, and Buddy - Part 2

Zurb Foundation && SASS

Add Zurb Foundation, or whatever webjars you want to use, to the :dependencies in project.clj.

 [org.webjars/foundation "6.2.0"]
 [org.webjars/font-awesome "4.6.2"]

Setup a SASS file

I put these in src/sass/. You can import from webjars like this.

@import 'foundation/scss/foundation';
@import 'foundation/scss/util/mixins';
@import 'font-awesome/scss/font-awesome';

Watching SASS files in dev

We need to add SASS support. I checked out sass4clj; there is a lein plugin, but it didn't play well with figwheel. I did end up using some ideas from that plugin and the sass4clj project to integrate with figwheel.

I started with this SASS Watcher, which was a good starting point but didn't load the webjars. So the next step is to replace it with sass4clj, which does reference webjars.

In dev.clj, require these:

[sass4clj.core :refer [sass-compile-to-file]]
[watchtower.core :refer :all]

Next, create a new component to watch the SASS files and recompile on changes. A lot of this came from the lein-sass4clj project.

(defn- main-file? [file]
  (and (or (.endsWith (.getName file) ".scss")
           (.endsWith (.getName file) ".sass") )
       (not (.startsWith (.getName file) "_"))))

(defn- find-main-files [source-paths]
  (mapcat (fn [source-path]
            (let [file (io/file source-path)]
              (->> (file-seq file)
                   (filter main-file?)
                   (map (fn [x] [(.getPath x) (.toString (.relativize (.toURI file) (.toURI x)))])))))
          source-paths))

(defn watch-sass
  "Watch input-dir for SASS/SCSS changes and compile them into output-dir."
  [input-dir output-dir options]
  (prn (format "Watching: %s -> %s" input-dir output-dir))
  (let [source-paths (vec (find-main-files [input-dir]))
        sass-fn 
        (fn compile-sass [& _]
          (doseq [[path relative-path] source-paths
                  :let [output-rel-path (clojure.string/replace relative-path #"\.(sass|scss)$" ".css")
                        output-path     (.getPath (io/file output-dir output-rel-path))]]
            (println (format "Compiling {sass}... %s -> %s" relative-path output-rel-path))
            (sass4clj.core/sass-compile-to-file
              path
              output-path
              (-> options
                  (update-in [:output-style] (fn [x] (if x (keyword x))))
                  (update-in [:verbosity] (fn [x] (or x 1)))))))
        ]
    (watchtower.core/watcher
       [input-dir]
       (watchtower.core/rate 100)
       (watchtower.core/file-filter watchtower.core/ignore-dotfiles)
       (watchtower.core/file-filter (watchtower.core/extensions :scss :sass))
       (watchtower.core/on-change sass-fn))
    )
  )

(defrecord SassWatcher [input-dir output-dir options]
  component/Lifecycle
  (start [this]
    (prn "Starting SassWatcher Component.")
    (if (not (:sass-watcher-process this))
      (do
        (println "Figwheel: Starting SASS watch process:" input-dir output-dir)
        (assoc this :sass-watcher-process (watch-sass input-dir output-dir options))
        )
      this))
  (stop [this]
    (when-let [process (:sass-watcher-process this)]
      (println "Figwheel: Stopping SASS watch process")
      (future-cancel process))
    this))

Next, set up a config for compilation and add the component to the dev system.

(def sass-config
  {:input-dir "src/sass" ;location of the sass/scss files
   :output-dir "resources/nspkt/ui/public/css"
   :options 
   {:source-map true
    ;:output-style :nested, :compact, :expanded and :compressed
    ;:verbosity 1, 2
    }
   })

(defn new-system []
  (into (system/new-system config)
        {:figwheel (figwheel/server (:figwheel config))
         :sass (map->SassWatcher sass-config)}
        ))

Now we're watching the files and updating on change. I think figwheel should pick up those changes and push a reload, but something doesn't seem to be working there.


Clojure Duct setup with MongoDB, Foundation, and Buddy

Setup a new duct site

Why Duct? Well, it's a great starting point using most of what I want: Compojure, Ring, Component, ClojureScript, and 12-factor methodology.

lein new duct nspkt.ui +cljs +example +heroku +site
cd nspkt.ui && lein setup

Ok, we need some other stuff, like MongoDB, a CSS framework (Foundation), and authentication.

MongoDB Setup

I like Monger, so let’s add that. In project.clj, add the dependency.

[com.novemberain/monger "3.0.2"]

We need to add a connection string to the env for Monger. This took a minute to figure out.

In the project.clj file, under :profiles > :project/dev > :env, add :url. This will write the values to .lein-env.

{:port "3000", :url "mongodb://localhost:27017/nspkt"}

Then we need to update config.clj to grab the value, like so.

(def environ
  {
   :http {:port (some-> env :port Integer.)}
   :db {:url (some-> env :url)}
   })

And add a component for the system.

(ns nspkt.ui.component.mongodb
  (:require [com.stuartsierra.component :as component]
            [monger.core :as mg]
            )
  )

(defrecord MongoDb [url]
  component/Lifecycle
  (start [this]
    (let [{:keys [conn db]} (mg/connect-via-uri (:url this))]
      (assoc this :conn conn :db db)
      )
    )

  (stop [this]
    (if-let [conn (:conn this)]
      (do
        (mg/disconnect conn)
        (dissoc this :conn :db)
        )
      this
      )
    )
  )

(defn db-component [options]
  (map->MongoDb options)
  )

Next, add the component to the system and have the example endpoint depend on it. Don't forget to add it to the :require. In system.clj:

...
(-> (component/system-map
     :app  (handler-component (:app config))
     :http (jetty-server (:http config))
     :db   (db-component (:db config))
     :example (endpoint-component example-endpoint))
    (component/system-using
     {:http [:app]
      :app  [:example]
      :example [:db]}))
...

And use the component in the example endpoint, endpoint/example.clj:

(ns nspkt.ui.endpoint.example
  (:require [compojure.core :refer :all]
            [monger.collection :as mc]
            [clojure.java.io :as io]))

(defn example-endpoint [{:keys [db] :as config}]
  (context "/example" []
    (GET "/" []
      (prn-str 
        (mc/find-maps (-> db :db) "reports")
        )
      )))

Great! Let's make sure everything is working. We need to run lein deps and start the REPL again.

If you run into trouble, it's sometimes easier to see what's going on by running lein run.
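
Something like the following (the exact REPL helpers depend on the duct template version; a sketch):

lein deps
lein repl
# then, in the REPL (reloaded workflow):
user=> (dev)
dev=> (go)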

I added a test record in MongoDB just to verify everything works.

It's not pretty, but it's pulling stuff out of the DB! Now let's add a CSS framework to help things look a little better.


Migrate Google Sites to Jekyll

Export

Install https://github.com/famzah/google-sites-backup

google-sites-backup/run.sh gdata-python-client/ google-sites-backup/

Convert to Markdown

Install reverse_markdown

cd into the exported project.

find . -iname "*.html" -exec echo "tidy -q -omit -b -i -c {} | reverse_markdown > {}.md" \; | sed s/\.html\.md/\.md/ > fix.sh
chmod +x fix.sh
./fix.sh
find . -iname "*.html" | xargs rm

Cleanup

Add frontmatter

find . -iname "*.md" -exec perl -0777 -i -pe 's/<head>.*<\/head>//igs' {} \;
find . -iname "*.md" -exec perl -0777 -i -pe 's/^# (.*)$/---\nlayout: page\ntitle: $1\n---/m' {} \;

Clean up leftover extras, spaces, and extra header lines.

find . -iname "*.md" | xargs -t -I {} sed -i'' 's/Â//g' {}
find . -iname "*.md" -exec perl -0777 -i -pe 's/^[\s\|]*$//gm' {} \;
find . -iname "*.md" -exec perl -0777 -i -pe 's/^.*?---/---/ms' {} \;
find . -iname "*.md" -exec perl -i -pe 's/^ ([^ ].*)$/$1/g' {} \;

Remove absolute links

ack --ignore-dir=_site -l "sites.google.com\/a\/roximity.com\/wiki" | xargs perl -i -pe "s/https:\/\/sites\.google\.com\/a\/roximity\.com\/wiki//g"

Fix resource links

ack --ignore-dir=_site -l "\/_\/rsrc\/\d*\/" | xargs perl -i -pe "s/\/_\/rsrc\/\d*\///g"

Rename %20 to underscores in file names.

for i in `find . -name "*%20*"`; do mv -v $i `echo $i | sed 's/%20/_/g'` ; done

Still had to do a fair amount of clean up from the converted markdown.

Plugins

These make the structure and navigation match the Google Sites version somewhat.

Lots of our pages had files as downloads. I like the idea of putting downloads in a subdirectory and having them auto-populate on the page. Also, some of our navigation is based on pages in a matching directory. This plugin populates a sub_pages collection and a downloads collection; the view renders those collections.

module AssociateRelatedPages
  class Generator < Jekyll::Generator
    def generate(site)
      page_lookup = site.pages.reduce({}) { |lookup, page| lookup["/" + page.path] = page; lookup; }

      site.pages.each do |page|
        subdir = File.join(site.source, page.dir, page.basename)
        if File.exist?(subdir) and File.directory?(subdir)
          entries = Dir.entries(subdir)

          page.data["sub_pages"] = entries.select{ |e|
            e =~ /\.md$/
          }.map{ |e|
            page_lookup[File.join(page.dir, page.basename, e)]
          }

          page.data["downloads"] = entries.reject{ |e|
            e == "." || e == ".." || e =~ /\.md$/ ||
              File.directory?(File.join(subdir, e))
          }.map{ |e|
            download = File.join(subdir, e)
            stat = File::Stat.new(download)
            {
              "title" => e,
              "url" => File.join(page.basename, e),
              "size" => stat.size
            }
          }
        end
      end
    end
  end
end
The Liquid template that renders those collections:

{% if page.sub_pages.size > 0 %}
  <ul>
  {% for page in page.sub_pages %}
    <li>
      <a href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a>
    </li>
  {% endfor %}
  </ul>
{% endif %}
{% if page.downloads.size > 0 %}
  <div class="post-downloads">
    <h2>Downloads</h2>
    <ul>
    {% for download in page.downloads %}
      <li>
        <a href="{{ download.url | prepend: site.baseurl }}">{{ download.title }} ({{ download.size }}b)</a>
      </li>
    {% endfor %}
    </ul>
  </div>
{% endif %}

The navigation on the Google site was mostly based on subdirectories. This creates a nav collection used to build the navigation.

module HierarchicalNavigation
  class Generator < Jekyll::Generator
    #{dev: { page: Page, sub: [] }}

    def generate(site)
      nav = {}
      site.pages.sort_by(&:dir).each do |page|
        dirs = page.dir.split('/')
        dir = dirs[1] || ''

        if dirs.count <= 2
          if page.basename == 'index'
            nav[dir] ||= {'page' => nil, 'sub' => []}
            nav[dir]['page'] = page
          else
            nav[dir] ||= {'page' => nil, 'sub' => []}
            nav[dir]['sub'] << page
          end
        end
      end

      site.data['nav'] = nav.values
    end
  end
end
And the navigation template:

<ul>
{% for nav in site.data['nav'] %}
  {% if nav.page.title %}
  <li class="{% if page.url contains nav.page.url %}active{% endif %}">
    <a class="page-link" href="{{ nav.page.url | prepend: site.baseurl }}">{{ nav.page.title }}</a>
    {% if page.url contains nav.page.dir %}
      <ul>
      {% for sub in nav.sub %}
        {% if sub.title %}
          {% capture sub_dir %}{{ sub.url | remove: ".html" | append: "/" }}{% endcapture %}
          <li class="{% if page.url contains sub.url or page.dir ==  sub_dir %}active{% endif %}">
            <a class="page-link" href="{{ sub.url | prepend: site.baseurl }}">{{ sub.title }}</a>
          </li>
        {% endif %}
      {% endfor %}
      </ul>
    {% endif %}
  </li>
  {% endif %}
{% endfor %}
</ul>

Spark RDD to CSV with headers

We have some Spark jobs whose results we want stored as a CSV with headers so they can be used directly. Saving the data as CSV is pretty straightforward: just map the values into CSV lines.

The trouble starts when you want that data in one file. FileUtil.copyMerge is the key for that. It takes all the files in a directory, like those output by saveAsTextFile, and merges them into one file.

Great, now we just need a header line. My first attempt was to union an RDD with the header and the output RDD. This works sometimes, if you get lucky. Since union just smashes everything together, more often than not the CSV has the header row somewhere in the middle of the results.

No problem! I'll just prepend the header after the copyMerge. Nope; Hadoop is generally write-only. You can get append to work, but it's still not a great option.

The solution was to write the header as a file BEFORE the copyMerge using a name that puts it first in the resulting CSV! Here’s what we ended up using:

(ns roximity.spark.output
  (:require [sparkling.conf :as conf]
            [sparkling.core :as spark]
            [sparkling.destructuring :as de]
            [clojure.data.csv :as csv]
            [clojure.java.io :as io]
    )
  (:import [org.apache.hadoop.fs FileUtil FileSystem Path]
    )
  )

(defn- csv-row
  [values]
  (let [writer (java.io.StringWriter.)]
    (clojure.data.csv/write-csv writer [values])
    (clojure.string/trimr (.toString writer))
    )
  )

(defn save-csv
  "Convert to CSV and save at URL.csv. URL should be a directory.
   Headers should be a vector of keywords that match the map in a tuple value.
   and should be in the order you want the data writen out in."
  [url headers sc rdd]
  (let [
      header (str (csv-row (map name headers)) "\n")
      file url
      dest (str file ".csv")
      conf (org.apache.hadoop.conf.Configuration.)
      srcFs (FileSystem/get (java.net.URI/create file) conf)
    ]
    (FileUtil/fullyDelete (io/as-file file))
    (FileUtil/fullyDelete (io/as-file dest))
    (->> rdd
      (spark/map (de/value-fn (fn [value]
        (let [values (map value headers)]
          (csv-row values)
          )
        )))
      (spark/coalesce 1 true)
      (#(.saveAsTextFile % file))
      )
    (with-open [out-file (io/writer (.create srcFs (Path. (str file "/_header"))))]
      (.write out-file header)
      )
    (FileUtil/copyMerge srcFs (Path. file) srcFs (Path. dest) true conf nil)
    (.close srcFs)
    )
  )

This works for local files and S3, and it should work for HDFS. Since we're using S3 and the results are not huge, we use (coalesce 1 true) so that only one part file is written to S3; without that we had issues with too many requests. We could probably use a higher number and find a happy medium, but we just use 1.
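
Calling it looks something like this (the S3 path, header keywords, and RDD name are hypothetical; the RDD is a pair RDD whose values are maps):

(save-csv "s3n://my-bucket/reports" [:id :lat :lng] sc report-tuples)
;; writes s3n://my-bucket/reports.csv with an id,lat,lng header row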


Why ROXIMITY Selected MongoDB – Compose.io

When we initially started development of ROXIMITY, I decided to go with MongoDB. There were three reasons for this choice: Geospatial support, redundancy and scalability and a lack of schema. If you are thinking about MongoDB, these are still all valid reasons for considering it and our experience should aid your decision making. read more


AngularJS dynamic filter

Use this to dynamically pass a filter name from a controller or somewhere else as a string. It will use DI to look up the filter and apply it.

In the template

{{ row.label | date | dynamic:nameFilter }}

In the controller

$scope.nameFilter = 'mycustomerfilter';

And the filter definition:

app.filter('dynamic', ['$injector', function($injector) {
  return function(text, filter) {
    var result = text;
    if (filter) {
      var f = $injector.get(filter + "Filter");
      if (f) { result = f(text); }
    }
    return result;
  }
}]);

Run GIT GC in all subdirectories

Find the release directories and run git gc in each.

find ./ -maxdepth 1 -type d -exec git --git-dir={}/.git --work-tree={} gc \;

Short MongoDB Fields with Mongoid

Need shorter field names in MongoDB, but still want readable names in code?

# Mongoid Timestamps
include Mongoid::Timestamps::Short

# fields
field :aid, as: :application_id, type: String

# one to one
embeds_one :address, :store_as => :ad

# collections
field :aid, as: :application_id
belongs_to :application, foreign_key: :aid
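
With the aliases in place, code keeps using the readable names while MongoDB stores the short ones. A sketch, assuming a hypothetical Report model with the fields above:

report = Report.new(application_id: "abc123")
report.application_id # => "abc123"
report.as_document    # => {"aid" => "abc123", ...} — the short key is what's persisted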

On Writing – Stephen King

Part biography and part writing advice. Both were interesting.

The advice that jumped out is: Do The Work. For writing, that means writing and reading, which translates well for most things into doing and learning/observing.

On Writing


Rails Routes used in an Isolated Engine

The Problem

I have a Rails application and want to add a blog engine to it, in this case the blogit engine. Things work well until the layout is rendered. It uses the parent application's layout, which is what I want, but because it's an isolated engine, it doesn't have access to any of the parent application's helpers, including URL helpers.

My Solution

If there’s a better way, please post in the comments.

In the engine's config block, I open its application helper and add a method_missing definition. It checks the main_app helper, which is added when the engine is mounted, for a matching helper method and, if found, uses it.

/config/initializers/blogit.rb

...
module Blogit
  module ApplicationHelper
    def method_missing method, *args, &block
      puts "LOOKING FOR ROUTES #{method}"
      if method.to_s.end_with?('_path') or method.to_s.end_with?('_url')
        if main_app.respond_to?(method)
          main_app.send(method, *args)
        else
          super
        end
      else
        super
      end
    end

    def respond_to?(method, include_private = false)
      if method.to_s.end_with?('_path') or method.to_s.end_with?('_url')
        if main_app.respond_to?(method)
          true
        else
          super
        end
      else
        super
      end
    end
  end
end
...

As a side note, the engine layout can be specified in /app/views/layouts/blogit/application.html.haml.

Gotchas

root_path and root_url are defined for the engine, so those still need to be handled differently. This issue could happen for any routes that overlap.

(main_app||self).root_path

And the other way, if you want to link to the engine's root_path (blogit is the engine_name):

(blogit||self).root_path

Rails Blog Engines

A list of mountable blog engines for Rails.

Blogit

Blogit Engine

Blogit is a flexible blogging solution for Rails apps. It:

  • Is Rack based;
  • Is a complete MVC solution based on Rails engines;
  • Aims to work right out of the box but remain fully customisable.

JABE

JABE Blog Engine

JABE is a bare bones blogging engine that is installed as a gem. It will grow as its needs do.

This version is for Rails 3.1+

Squeaky

Squeaky Blog Engine

Squeaky is a simple mountable Rails engine for making a squeaky clean blog.

Hitchens

Hitchens Blog Engine

  • Mountable blog engine for Rails 3.1+
  • Design/style agnostic – just a blog backend
  • Inspired by radar/forem
  • MIT-LICENSE.

Kublog

Kublog Blog Engine

Kublog is a simple yet complete way to have a Product Blog that integrates with your apps user base. It includes social sharing, atom feeds and moderated comments.

Built for Rails 3.1, Kublog is a complete stack, fully configurable solution.

  • Publish posts with the most basic and simple wysiwyg
  • Attach multiple images to your content
  • Share your posts on your Product’s Twitter Page and Facebook Fan Page
  • E-mail personalized versions of your posts to all your users
  • Optional background processing with Delayed Job
  • Moderated comments from apps users, apps admins, and visitors
  • Atom feed for main blog and individual post categories

Basic Rails Sitemap Setup

A basic sitemap generator for building a sitemap on the fly. I'd like a better way, but since it's hosted on Heroku and the map isn't too big yet, this should work for now.

Add a controller action to handle building the XML

# /app/controllers/static_controller.rb
def sitemap
  generator = ::SitemapGenerator.new('http://fixmycarin.com') do |map|
    map.add '', priority: 0.9, changefreq: :daily
    map.add '/faq', priority: 0.7, changefreq: :weekly
    map.add '/places', priority: 0.7, changefreq: :weekly
    map.add '/contact', priority: 0.7, changefreq: :weekly

    places = Place.all
    places.each { |u| map.add(place_path(u), lastmod: u.updated_at, changefreq: :weekly) }
  end
  render :xml => generator.generate_sitemap
end

Add a route to point to the above action

#/config/routes.rb
get 'sitemap.xml' => "static#sitemap"

Add a link in the robots.txt file

Sitemap: http://fixmycarin.com/sitemap.xml

Add the class that creates the XML. I put it in app/lib or app/services. It was adapted from this gist:

# from: https://gist.github.com/288069
require 'builder'

class SitemapGenerator
  def initialize url
    @url = url
    yield(self) if block_given?
  end

  def add url, options = {}
    (@pages_to_visit ||= []) << options.merge(url: url)
  end

  def generate_sitemap
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str, :indent => 2)

    xml.instruct!
    xml.urlset(:xmlns=>'http://www.sitemaps.org/schemas/sitemap/0.9') {
      @pages_to_visit.each do |hash|
        unless @url == hash[:url]
          xml.url {
            xml.loc(@url + hash[:url])
            xml.lastmod(hash[:lastmod].utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")) if hash.include? :lastmod
            xml.priority(hash[:priority]) if hash.include? :priority
            xml.changefreq(hash[:changefreq].to_s) if hash.include? :changefreq
           }
        end
      end
    }

    return xml_str
  end

  # Notify popular search engines of the updated sitemap.xml
  def update_search_engines
    sitemap_uri = @url + 'sitemap.xml'
    escaped_sitemap_uri = CGI.escape(sitemap_uri)
    Rails.logger.info "Notifying Google"
    res = Net::HTTP.get_response('www.google.com', '/webmasters/tools/ping?sitemap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Yahoo"
    res = Net::HTTP.get_response('search.yahooapis.com', '/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Bing"
    res = Net::HTTP.get_response('www.bing.com', '/webmaster/ping.aspx?siteMap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Ask"
    res = Net::HTTP.get_response('submissions.ask.com', '/ping?sitemap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
  end
end

Open Up Localhost

In many cases it might be easier/better to just use LocalTunnel.

In this case, I didn’t want the external URL changing all the time, and I already have a Linode server setup that I can use. Here are some notes on setting it up.

Assumes a Linux server setup with SSH and Apache.

Setup the proxy

Enable the proxy modules on the server.

a2enmod proxy
a2enmod proxy_http

Edit mods-available/proxy.conf to enable the reverse proxy for requests.

ProxyRequests Off

<Proxy *>
Order deny,allow
Allow from all
</Proxy>

Add a new virtual site to be the reverse proxy. Place a file like the following in the sites-available Apache directory. The localhost port doesn’t matter too much, just needs to not be in use.

<VirtualHost *:80>
     ServerAdmin dustt
     ServerName virtual.red27.net
     SetEnv proxy-initial-not-pooled 1
     ProxyPass / http://localhost:8001
     ProxyPassReverse / http://localhost:8001
     ProxyPreserveHost On
</VirtualHost>

Enable the virtual site and restart Apache.

a2ensite virtual.red27.net

On the client

You will need to SSH into the server and reverse forward the packets back to the local server.

ssh -nNT -R 8001:localhost:3000 user@virtual.red27.net

This tunnel will need to be reset if the local server errors out. Removing the -n argument may help notify you if something goes wrong.


MongoDB 2012 Notes

File and Data Structures

  • Key names are stored in each BSON doc, so keep key names short.
  • Data files are preallocated, doubling in size each time.
  • Files are accessed using memory-mapped files at the OS level.
  • fsync’d every 60 seconds.
  • Use 64-bit systems to support the memory-mapped files.

Journaling (in 1.8, default in 2.0+)

  • Write-ahead log
  • Ops written to journal before memory mapped regions
  • Journal flushed every 100ms or 100 MB written
  • db.getLastError({j:true}) to force a journal flush
  • /journal subdirectory
  • 1 GB files, rotated (only really need stuff that has not been fsync’d)
  • some slowdown on high-throughput systems; can be symlinked to other drives.
  • on by default on 64-bit systems.

When to use

  • If running a single node
  • In a Replica Set – on at least 1 node
  • For large data sets.

Fragmentation

  • files get fragmented over time if doc sizes change or docs are deleted
  • collections that have a lot of resizes get a padding factor to help reduce fragmentation.
  • the free list needs improvement
    • 2.0 reduced scanning to a reasonable amount
    • 2.2 will change it

Compaction

  • 2.0+ compact command

    • only needs 2 GB extra space
    • offline operation (another good reason to use replica sets)
  • safe mode: waits for a round trip via getLastError; with this call you can specify how safe you want the data to be.

  • drop collection doesn’t free the data file; dropDatabase does. Sometimes it makes sense to create and drop databases.

Index and Query Evaluation

@mschireson

  • indexes are lists of values associated with documents
  • stored in a btree
  • required for geo queries and unique constraints
  • ascending/descending really only matter on compound indexes.
  • null == null for unique indexes; you can drop duplicates on create.
  • creating an index is blocking unless {background: true}; still try to do it off-peak
  • when dropping an index, you need to use the same document that created it.
  • the $where operator doesn’t use the indexes
  • Regexps starting with /^ will use an index
  • Indexes are used for updates and deletes
  • Compound indexes can be used to query for the first field and sort on the second
  • Only uses one index at a time
  • Limited index uses:
    • $ne uses the index, but doesn’t help performance much
    • $not
    • $where
    • $mod index only limits to numbers
    • Range queries only help some.

GEO

  • created using “2d”
  • $near

    • sorted nearest to farthest
  • $within

    • $within: {$polygon: …}
    • can be in compound queries

Sparse Indexes

  • only store values w/ the indexed field, results won’t have documents w/ null in that field.
  • can be sparse & unique

Covering Indexes

  • contains all the fields in the query and the result, so no db lookup is needed (example below)
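
For example (mongo shell; the collection and index are hypothetical):

db.users.ensureIndex({email: 1})
// filter and projection are both in the index, so no document fetch:
db.users.find({email: "a@example.com"}, {_id: 0, email: 1})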

Limits and Trade offs

  • max of 64 indexes per collection
  • can slow down inserts and updates
  • compound index can be more valuable and handle multiple queries
  • You can force an index or full scan
  • use sort if you really want sorted data
  • db.c.find(…).explain() => see what’s going on.
  • db.setProfilingLevel() – record slow queries.
  • Indexes work best when they fit in RAM

Replica Set

  • One is always the primary; the others are secondary.
  • Chosen by election
  • Automatic fail-over and recover
  • Reads can be from primary or secondary
  • Writes will always go to primary
  • Replica Sets are 2+ nodes, at least 3 is better
  • When failed nodes come back, they recover by getting the missed updates, then rejoin as secondary nodes
  • Setup

    mongod --replSet

    cfg = { _id:, members: [{_id:0, host:''}] }

    use admin

    rs.initiate(cfg)

  • The rs object has replica set commands; they need to be issued on the current primary

  • rs.status()
  • Strong Consistency is only available when reading from primary
  • Reads on the secondary machines will be eventually consistent
  • Durability Options (set by driver)

    • fire and forget

      • won’t know about failures due to unique constraints, disk full, or anything else.
    • wait for error (recommended)

    • wait for journal sync
    • wait for fsync (slow)
    • wait for replication (really slow)
  • Can give nodes priorities, which will help ensure a specific machine is primary.

    • 0 priorities will never be primary
    • when a higher priority machine is back online, it will force an election.
  • Can have a slave delay

  • Tag replica set members with properties, which can be specified when waiting for replication.
  • Arbiter Members

    • Don’t have data
    • vote in elections
    • used to break a tie
  • Hidden Member

    • not seen by the clients
  • Data for replication is stored in the oplog, a capped collection; all secondaries have an oplog too.

Scaling

  • Vertical Scaling is limited
  • Horizontal scaling is cheaper; you can scale wider rather than higher
  • Vertical scaling can be a single point of failure and can be hard to back up/maintain.

  • Replica Sets are one type

    • Can scale reads, but not writes; eventual consistency is the biggest downside.
    • Replication can overwhelm the secondaries, reducing performance anyway
  • Why Shard?

    • Distribute the write load
    • Keep the working set in RAM by using multiple machines to act like one big virtual machine
    • Consistent reads
    • Preserve functionality: with range-based partitioning, most (all?) of the query operators are available.
  • Sharding design goals

    • scale linearly
    • increase capacity with no downtime
    • transparent to the application / clients
    • low administration to add capacity
    • no joins or transactions
    • BigTable / PNUTS inspired; read the PNUTS paper
  • Basics

    • Choose how you partition data
    • Convert from a single replica set to sharding with no downtime
    • Full feature set
    • Fully consistent by default
    • You pick a shard key, which is used to move ranges of data to a shard (example below)
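
Enabling sharding looks something like this (mongo shell; the database, collection, and key are hypothetical):

sh.enableSharding("mydb")
sh.shardCollection("mydb.events", {user_id: 1})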

Architecture

  • Shard – each shard is its own replica set for automated failover
  • Config Servers – store the metadata about which shard holds each range of data

    • Not a replica set; writes to the config servers are done with a transaction by the mongod/mongos
  • Mongos – uses the config servers to know what shard to use for the data/query

    • Client talks to the mongos servers
    • chunk = collection minkey, maxkey, shard
      • chunks are logical, not physical
      • chunks are 64 MB; once a chunk hits that size it splits, and chunks may be migrated to other shards.

shard keys

  • they are immutable.
  • Choose a key
    • _id? It’s incremental, so all writes go to one shard
    • hash? It’s random, which partitions well but isn’t great for queries
    • user_id? kinda random, useful for lookups, all data for user_id X will be on one shard

      • However you can’t split on this if one user is a really heavy user.
    • user_id + md5(x)? this is the best option.

Other notes

  • Add capacity well before it’s needed, at least before 70% operating capacity; this allows the data to migrate over time
  • Understand working set in RAM
  • Machine too small and admin overhead goes up
  • Machine too big and sharding doesn’t happen smoothly

MMS – MongoDB Monitoring Service

MMS


Include a Module Based on Rails Environment

Not sure if this is the best way to do things, but I want to include modules into a class based on the Rails environment.

First, I used the Rails config store (Configurator).

module MyApp
  class Application < Rails::Application
    # ...
    config.provider = :NullProvider
  end
end

The above config can be overridden in an environment config file. NullProvider would just be a no-op implementation.
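
An environment config file can then swap in the real implementation; a sketch, where RealProvider is a hypothetical module:

# config/environments/production.rb
MyApp::Application.configure do
  config.provider = :RealProvider
end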

Using this is a combination of send, accessing the config value, and getting the class from the symbol.

class MyClass
  MyClass.__send__(:include, Kernel.const_get(::MyApp::Application.config.provider))
end

Put that wherever your include would normally go.

Would love to know if this is something crazy or a good way to do things.


Send Commands From VIM to Tmux Pane

I've started using Tmux with VIM as my primary workflow. I installed the tslime.vim plugin, which sends highlighted text to another pane, but I wanted to send ad-hoc commands, like :!. I added this function to the tslime.vim file; I'm sure there is a better way, but that's what I did for now.

function! To_Tmux()
  let b:text = input("tmux:", "", "custom,")
  call Send_to_Tmux(b:text . "\\r")
endfunction

cmap tt :call To_Tmux()<CR>

This allows me to type :tt and then any needed command, like rake. The tslime plugin will ask for a pane number, then send commands to that pane from then on. The selected pane can be changed with <C-c>v.

my forked tslime


iTerm2, Tmux, and Vim Setup

Setup

  • Install MacVim
  • Install ITerm2
  • Install Tmux: brew install tmux
  • Install Tmux Patch for copy to OSX: tmux-MacOSX-pasteboard
  • Change shell to zsh: chsh -s /bin/zsh && sudo chsh -s /bin/zsh username

Tmux Config

This sets some nice things for tmux. Of note, changing the default prefix to Ctrl+a and setting up the copy patch.

~/.tmux.conf

set -g default-terminal "screen-256color"
setw -g mode-mouse on
set -g prefix C-a

setw -g mode-keys vi

bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R

bind-key -r C-h select-window -t :-
bind-key -r C-l select-window -t :+

# copy to osx
set-option -g default-command "reattach-to-user-namespace -l zsh"
bind ^y run-shell "reattach-to-user-namespace -l zsh -c 'tmux showb | pbcopy'"

# quick pane cycling
unbind ^a
bind ^a select-pane -t :.+

VIM

Had to add

syn on

to my .vimrc to get syntax highlighting in console mode.


Twitter and Facebook Popup Windows

I often find myself needing to add Twitter and Facebook share buttons, usually not using the default widget icons. To do that, you need some functions to call window.open and some jQuery to tie the links to those functions. These functions also encode the passed parameters.

var tweetWindow = function(url, text) {
  window.open( "http://twitter.com/share?url=" + 
    encodeURIComponent(url) + "&text=" + 
    encodeURIComponent(text) + "&count=none/", 
    "tweet", "height=300,width=550,resizable=1" ) 
}

var faceWindow = function(url, title) {
  window.open( "http://www.facebook.com/sharer.php?u=" + 
    encodeURIComponent(url) + "&t=" + 
    encodeURIComponent(title), 
    "facebook", "height=300,width=550,resizable=1" ) 
}

jQuery to enhance the link. This may or may not be used with the above.

$(".twitter").click(function(e) {
    e.preventDefault();
    var href = $(e.target).attr('href');
    window.open(href, "tweet", "height=300,width=550,resizable=1") 
});

The link, using target="_blank" so it works without JavaScript:

<a href="https://twitter.com/share?text=SOME%20TEXT&via=candland" 
    class="twitter" 
    target="_blank">Twitter Link</a>

Some Rails Tips and Tricks

After doing a lot of refactoring today I wanted to note a few useful things I’ve picked up.

Load the Rails environment for rake

If you need to run a rake task that needs access to the Rails models, etc., add the :environment dependency.

task :my_task => :environment do 
end

Testing Delayed Job

Add the following to tell Delayed Job to run without delays.

Delayed::Worker.delay_jobs = false

Or the following to work off queued jobs and check the success and failure counts in the returned array.

Delayed::Worker.new.work_off.should == [1, 0]

Check the Job Count with one of the following

Delayed::Job.count.should == 1

# OR

lambda do
  # code that should schedule a delayed job
end.should change(Delayed::Job, :count).by(1)

ActiveRecord add_column and update

In a migration, sometimes you want to add a new column, then update it to a new value.

def self.up
  add_column :users, :plan_id, :integer
  User.update_all("plan_id = 1")
end

FactoryGirl Callbacks

When setting up complex relationships with FactoryGirl, callbacks can help.

FactoryGirl Callbacks

Factory.define :user do |user|
  user.name "Dusty Candland"
  user.email "testing@testing.com"
  user.phone "303-333-3333"
  user.password "foobar"
  user.password_confirmation "foobar"
end

Factory.define :user_with_subscription, :parent => :user do |user|
  user.after_create{|u| Factory(:user_subscription, :user => u)}
end

Factory.define :user_subscription do |sub|
  sub.first_name 'beta'
  sub.last_name 'test'
  sub.number '4111111111111111'
  sub.expiration_year 2012
  sub.expiration_month 1
  sub.zip_code '90210'
  sub.price 2.50
  sub.group_limit 1
  sub.message_limit 100
  sub.association :plan
  sub.user_id 1
end
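
Then a spec can build the whole graph in one call (a sketch; the has_one association name on User is assumed):

user = Factory(:user_with_subscription)
user.user_subscription.should_not be_nil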
