• Node.JS Modules

  • MongoDB 2012 Notes

    File and Data Structures

    • Key names are stored in each BSON document, so keep key names short.
    • Data files are preallocated, doubling in size each time.
    • Files are accessed using memory-mapped files at the OS level.
    • Data is fsync’d every 60 seconds.
    • Use a 64-bit system to support large memory-mapped files.

    Journaling (introduced in 1.8, default in 2.0+)

    • Write-ahead log
    • Ops are written to the journal before the memory-mapped regions
    • Journal is flushed every 100ms or every 100 MB written
    • db.getLastError({j: true}) forces a journal flush
    • lives in the /journal subdirectory
    • 1 GB files, rotated (only data that hasn’t been fsync’d needs to be kept)
    • some slowdown on high-throughput systems; the journal can be symlinked to another drive
    • on by default on 64-bit systems
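
    A minimal mongo shell sketch (2012-era API) of forcing a journal flush; the collection name is hypothetical:

      db.events.insert({type: "click", at: new Date()});
      // block until the write has been written to the journal
      db.runCommand({getlasterror: 1, j: true});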

    When to use

    • On a single node, journaling is the only durability option
    • In a replica set, run it on at least one node
    • For large data sets


    Fragmentation

    • files get fragmented over time if document sizes change or documents are deleted
    • collections with a lot of resizes get a padding factor to help reduce fragmentation
    • the free list needs improvement
      • 2.0 reduced scanning to a reasonable amount
      • 2.2 will change it again


    • 2.0+ compact command

      • only needs 2 GB of extra space
      • offline operation (another good reason to run a replica set)
    • safe mode: waits for a round trip using getLastError; with this call you can specify how safe you want the data to be

    • dropping a collection doesn’t free the data file; dropDatabase does. Sometimes it makes sense to create and drop databases.
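
    A hedged sketch of the compact command and safe mode notes above; the collection name and the w/wtimeout values are made up:

      // offline compaction of a collection (2.0+)
      db.runCommand({compact: "events"});
      // safe mode: wait until the write reaches 2 members, or 500ms
      db.runCommand({getlasterror: 1, w: 2, wtimeout: 500});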

    Index and Query Evaluation


    • indexes are lists of values associated with documents
    • stored in a B-tree
    • required for geo queries and unique constraints
    • ascending/descending really only matters on compound indexes
    • null == null for unique indexes; you can drop duplicates on create
    • creating an index blocks, unless {background: true}; still try to do it off peak
    • when dropping an index, you need to use the same spec document used when it was created
    • the $where operator doesn’t use indexes
    • Regexps starting with /^ will use an index
    • Indexes are used for updates and deletes
    • Compound indexes can be used to query on the first field and sort on the second
    • Only one index is used at a time
    • Limited index uses:
      • $ne uses the index, but doesn’t help performance much
      • $not
      • $where
      • $mod: the index only limits the scan to numbers
      • Range queries only help some
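
    A small sketch of the notes above using the 2012-era ensureIndex API; collection and field names are hypothetical:

      // non-blocking build; drop duplicates on a unique index
      db.users.ensureIndex({email: 1}, {background: true, unique: true, dropDups: true});
      // dropping requires the same spec document used at creation
      db.users.dropIndex({email: 1});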


    Geo Indexes

    • created using “2d”
    • $near

      • sorted nearest to farthest
    • $within

      • $within: {$polygon: …}
    • can be part of compound indexes
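
    A sketch of the geo operators, assuming a hypothetical places collection with a loc field:

      db.places.ensureIndex({loc: "2d"});
      // sorted nearest to farthest
      db.places.find({loc: {$near: [-105.0, 39.7]}}).limit(10);
      // bounds queries; $polygon queries follow the same pattern
      db.places.find({loc: {$within: {$box: [[-106, 39], [-104, 40]]}}});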

    Sparse Indexes

    • only stores entries for documents that have the indexed field; results won’t include documents with null in that field
    • can be sparse & unique
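
    A one-line sketch: sparse plus unique means documents missing the field are skipped, while present values must still be unique. The field name is made up:

      db.users.ensureIndex({twitter_handle: 1}, {sparse: true, unique: true});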

    Covering Indexes

    • contains all fields used in the query and returned in the results, so no document lookup is needed
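
    A sketch of a covered query; _id must be excluded so the index alone can answer it. Names are hypothetical:

      db.users.ensureIndex({email: 1, name: 1});
      // both the filter and the projection are covered by the index
      db.users.find({email: "x@example.com"}, {_id: 0, email: 1, name: 1});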

    Limits and Trade offs

    • max of 64 indexes per collection
    • can slow down inserts and updates
    • a compound index can be more valuable and handle multiple queries
    • You can force a specific index or a full scan
    • use sort if you really want sorted data
    • db.c.find(…).explain() => see what’s going on
    • db.setProfilingLevel() – record slow queries
    • Indexes work best when they fit in RAM
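
    A sketch of the tuning helpers mentioned above; the 100ms threshold is an example value:

      // inspect the chosen plan and scanned counts
      db.users.find({email: /^dusty/}).explain();
      // force a specific index, or a full scan with $natural
      db.users.find({email: "x"}).hint({email: 1});
      db.users.find({email: "x"}).hint({$natural: 1});
      // profile queries slower than 100ms
      db.setProfilingLevel(1, 100);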

    Replica Set

    • One node is always the primary; the others are secondaries
    • Chosen by election
    • Automatic fail-over and recovery
    • Reads can go to the primary or a secondary
    • Writes always go to the primary
    • Replica Sets are 2+ nodes; at least 3 is better
    • When a failed node comes back, it recovers by fetching the missed updates, then rejoins as a secondary
    • Setup

      mongod --replSet <name>

      cfg = { _id: "<name>", members: [{ _id: 0, host: "<host:port>" }] }

      use admin
      rs.initiate(cfg)

    • the rs object has the replica set commands; they need to be issued on the current primary

    • rs.status()
    • Strong Consistency is only available when reading from primary
    • Reads on the secondary machines will be eventually consistent
    • Durability Options (set by driver) – see the sketch after this list

      • fire and forget

        • you won’t know about failures due to unique constraints, a full disk, or anything else
      • wait for error (recommended)

      • wait for journal sync
      • wait for fsync (slow)
      • wait for replication (really slow)
    • Nodes can be given priorities, which helps ensure a specific machine is primary

      • priority-0 nodes will never be primary
      • when a higher-priority machine comes back online, it will force an election
    • Nodes can have a slave delay

    • Nodes can be tagged with properties, which can be specified when waiting for replication
    • Arbiter Members

      • Don’t have data
      • vote in elections
      • used to break a tie
    • Hidden Members

      • not seen by the clients
    • Data for replication is stored in the oplog, a capped collection; all secondaries have an oplog too
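
    A hedged sketch of that durability ladder via getLastError (2012-era API); the w and wtimeout values are examples:

      db.runCommand({getlasterror: 1});                       // wait for error
      db.runCommand({getlasterror: 1, j: true});              // wait for journal sync
      db.runCommand({getlasterror: 1, fsync: true});          // wait for fsync (slow)
      db.runCommand({getlasterror: 1, w: 2, wtimeout: 500});  // wait for replication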


    Scaling

    • Vertical scaling is limited
    • Horizontal scaling is cheaper and can scale wider rather than taller
    • A vertical system can be a single point of failure, and can be hard to back up / maintain

    • Replica Sets are one type of horizontal scaling

      • Can scale reads, but not writes; eventual consistency is the biggest downside
      • Replication can overwhelm the secondaries, reducing performance anyway
    • Why Shard?

      • Distribute the write load
      • Keep the working set in RAM; multiple machines act like one big virtual machine
      • Consistent reads
      • Preserve functionality; with range-based partitioning most (all?) of the query operators are available
    • Sharding design goals

      • scale linearly
      • increase capacity with no downtime
      • transparent to the application / clients
      • low administration to add capacity
      • no joins or transactions
      • BigTable / PNUTS inspired; read the PNUTS paper
    • Basics

      • Choose how you partition data
      • Convert from a single replica set to sharding with no downtime
      • Full feature set
      • Fully consistent by default
      • You pick a shard key, which is used to move ranges of data to a shard


    • Shard – each shard is its own replica set for automated fail-over
    • Config Servers – store the metadata about which shard holds each partition of the data

      • Not a replica set; writes to the config servers are done with a transaction by the mongod / mongos
    • Mongos – uses the config servers to know which shard to use for the data/query

      • Clients talk to the mongos servers
      • chunk = collection, minkey, maxkey, shard
        • chunks are logical, not physical
        • chunks are 64 MB; once a chunk hits that size it splits, and chunks may be migrated to balance the shards
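
    A sketch of wiring up a sharded cluster from a mongos (2.0-era sh helpers); all names are hypothetical:

      sh.addShard("rs0/host1:27017");
      sh.enableSharding("mydb");
      // range-partition the collection by the shard key
      sh.shardCollection("mydb.users", {user_id: 1});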

    shard keys

    • they are immutable
    • Choose a key
      • _id? it’s incremental, so all writes go to one shard
      • hash? random, which partitions well, but not great for queries
      • user_id? kinda random, useful for lookups; all data for user_id X will be on one shard

        • however, you can’t split on this if one user is a really heavy user
      • user_id + md5(x)? this is often the best option
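
    A sketch of that last option as a compound shard key; database, collection, and field names are made up:

      // groups a user’s data together while still allowing splits within a user
      sh.shardCollection("mydb.messages", {user_id: 1, md5: 1});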

    Other notes

    • Add capacity well before it’s needed, at least before 70% of operating capacity; this allows the data to migrate over time
    • Understand working set in RAM
    • Machine too small and admin overhead goes up
    • Machine too big and sharding doesn’t happen smoothly

    MMS – MongoDB Monitoring Service


  • Include a Module Based on Rails Environment

    Not sure if this is the best way to do things, but I want to include modules into a class based on the Rails environment.

    First I used the rails config store, Configurator.

    module MyApp
      class Application < Rails::Application
        # ...
        config.provider = :NullProvider
      end
    end

    The above config can be overridden in the per-environment config files. NullProvider would just be a no-op implementation.

    Now, using this is a combination of send, accessing the config value, and getting the class from the symbol.

    class MyClass
      MyClass.__send__(:include, Kernel.const_get(::MyApp::Application.config.provider))
    end

    Put that wherever your include would normally go.

    Would love to know if this is something crazy or a good way to do things.


  • Send Commands From VIM to Tmux Pane

    I’ve started using Tmux w/ VIM as my primary workflow. I installed the tslime.vim plugin, which will send highlighted text to another pane, but I wanted to send ad-hoc commands, like :!. I added this function to the tslime.vim file; I’m sure there is a better way, but that’s what I did for now.

    function! To_Tmux()
      let b:text = input("tmux:", "", "custom,")
      call Send_to_Tmux(b:text . "\r")
    endfunction

    cmap tt :call To_Tmux()<CR>

    This will allow me to type :tt and then any needed command, like rake. The tslime plugin will ask which pane number, then send commands to that pane from then on. The selected pane can be changed with <C-c>v.

    my forked tslime

  • ITerm2 TMux and Vim Setup


    • Install MacVim
    • Install ITerm2
    • Install Tmux: brew install tmux
    • Install Tmux Patch for copy to OSX: tmux-MacOSX-pasteboard
    • Change shell to zsh chsh -s /bin/zsh && sudo chsh -s /bin/zsh username

    Tmux Config

    This sets some nice things for tmux. Of note, changing the default prefix to Ctrl+a and setting up the copy patch.


    set -g default-terminal "screen-256color"
    setw -g mode-mouse on
    set -g prefix C-a
    setw -g mode-keys vi
    bind h select-pane -L
    bind j select-pane -D
    bind k select-pane -U
    bind l select-pane -R
    bind-key -r C-h select-window -t :-
    bind-key -r C-l select-window -t :+
    # copy to osx
    set-option -g default-command "reattach-to-user-namespace -l zsh"
    bind ^y run-shell "reattach-to-user-namespace -l zsh -c 'tmux showb | pbcopy'"
    # quick pane cycling
    unbind ^a
    bind ^a select-pane -t :.+


    Had to add

    syn on

    to my .vimrc to get syntax highlighting in console mode.


  • Twitter and Facebook Popup Windows

    I often find myself needing to add Twitter and Facebook share buttons, usually not using the default widget icons. To do that you need some functions that call window.open and some jQuery to tie the links to those functions. These functions also encode the passed parameters.

    var tweetWindow = function(url, text) {
      window.open( "http://twitter.com/share?url=" +
        encodeURIComponent(url) + "&text=" +
        encodeURIComponent(text) + "&count=none/",
        "tweet", "height=300,width=550,resizable=1" );
    };

    var faceWindow = function(url, title) {
      window.open( "http://www.facebook.com/sharer.php?u=" +
        encodeURIComponent(url) + "&t=" +
        encodeURIComponent(title),
        "facebook", "height=300,width=550,resizable=1" );
    };

    jQuery to enhance the link. This may or may not be used with the above.

    $(".twitter").click(function(e) {
        var href = $(e.target).attr('href');
        window.open(href, "tweet", "height=300,width=550,resizable=1") 

    The link, using target="_blank" so it still works without JavaScript:

    <a href="https://twitter.com/share?text=SOME%20TEXT&via=candland" 
        target="_blank">Twitter Link</a>
  • Some Rails Tips and Tricks

    After doing a lot of refactoring today I wanted to note a few useful things I’ve picked up.

    load the rails environment for rake

    If you need to run a rake task that needs access to the Rails models, etc., add the :environment dependency:

    task :my_task => :environment do
      # the Rails environment, including models, is available here
    end

    Testing Delayed Job

    Add the following to tell Delayed Job to run without delays.

    Delayed::Worker.delay_jobs = false

    Or use the following to run queued jobs and check the counts of successes and failures in the returned array.

    Delayed::Worker.new.work_off.should == [1, 0]

    Check the Job Count with one of the following

    Delayed::Job.count.should == 1
    # OR
    lambda do
      # code that should schedule a delayed job
    end.should change(Delayed::Job, :count).by(1)

    ActiveRecord add_column and update

    In a migration, sometimes you want to add a new column, then update it to a new value.

    def self.up
      add_column :users, :plan_id, :integer
      User.reset_column_information # make the model see the new column
      User.update_all("plan_id = 1")
    end

    FactoryGirl Callbacks

    When setting up complex relationships with FactoryGirl, callbacks can help.

    Factory.define :user do |user|
      user.name "Dusty Candland"
      user.email "testing@testing.com"
      user.phone "303-333-3333"
      user.password "foobar"
      user.password_confirmation "foobar"
    end

    Factory.define :user_with_subscription, :parent => :user do |user|
      user.after_create { |u| Factory(:user_subscription, :user => u) }
    end

    Factory.define :user_subscription do |sub|
      sub.first_name 'beta'
      sub.last_name 'test'
      sub.number '4111111111111111'
      sub.expiration_year 2012
      sub.expiration_month 1
      sub.zip_code '90210'
      sub.price 2.50
      sub.group_limit 1
      sub.message_limit 100
      sub.association :plan
      sub.user_id 1
    end