Short MongoDB Fields with Mongoid

Need shorter field names in MongoDB, but still want readable names in your code?

# Mongoid timestamps, stored as c_at / u_at
include Mongoid::Timestamps::Short

# fields: stored as "aid", accessed as application_id
field :aid, as: :application_id, type: String

# one to one: embedded document stored under the "ad" key
embeds_one :address, store_as: :ad

# collections: keep the foreign key short too
field :aid, as: :application_id
belongs_to :application, foreign_key: :aid
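
To illustrate, a minimal sketch of a model using these options (class name and values are made up); the code uses the long names while MongoDB stores the short keys:

class Click
  include Mongoid::Document
  include Mongoid::Timestamps::Short # created_at/updated_at stored as c_at/u_at

  field :aid, as: :application_id, type: String
end

click = Click.create!(application_id: 'abc123')
click.application_id # => "abc123"
# Stored in MongoDB as { "aid" : "abc123", "c_at" : ..., "u_at" : ... }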

On Writing – Stephen King

Part biography and part writing advice. Both were interesting.

The advice that jumped out is: do the work. For writing, that means writing and reading, which translates well to most things as doing and learning/observing.

On Writing

Rails Blog Engines

A list of mountable blog engines for Rails.

Blogit

Blogit Engine

Blogit is a flexible blogging solution for Rails apps. It:

  • Is Rack based;
  • Is a complete MVC solution based on Rails engines;
  • Aims to work right out of the box but remain fully customisable.

JABE

JABE Blog Engine

JABE is a bare bones blogging engine that is installed as a gem. It will grow as its needs do.

This version is for Rails 3.1+

Squeaky

Squeaky Blog Engine

Squeaky is a simple mountable Rails engine for making a squeaky clean blog.

Hitchens

Hitchens Blog Engine

  • Mountable blog engine for Rails 3.1+
  • Design/style agnostic – just a blog backend
  • Inspired by radar/forem
  • MIT-LICENSE.

Kublog

Kublog Blog Engine

Kublog is a simple yet complete way to have a product blog that integrates with your app's user base. It includes social sharing, Atom feeds, and moderated comments.

Built for Rails 3.1, Kublog is a complete-stack, fully configurable solution.

  • Publish posts with a basic, simple WYSIWYG editor
  • Attach multiple images to your content
  • Share your posts on your Product’s Twitter Page and Facebook Fan Page
  • E-mail personalized versions of your posts to all your users
  • Optional background processing with Delayed Job
  • Moderated comments from the app's users, admins, and visitors
  • Atom feed for main blog and individual post categories

Rails Routes used in an Isolated Engine

The Problem

I have a Rails application and want to add a blog engine to it, in this case the Blogit engine. Things work well until the layout is rendered. It uses the parent application's layout, which is what I want, but because it's an isolated engine, it doesn't have access to any of the parent application's helpers, including URL helpers.

My Solution

If there’s a better way, please post in the comments.

In an initializer, I reopen the engine's ApplicationHelper and add a method_missing definition. It checks the main_app route proxy, which is available once the engine is mounted, for a matching URL helper and uses it if found.

/config/initializers/blogit.rb

...
module Blogit
  module ApplicationHelper
    # Fall back to the parent application's URL helpers
    def method_missing(method, *args, &block)
      if url_helper?(method) && main_app.respond_to?(method)
        main_app.send(method, *args)
      else
        super
      end
    end

    def respond_to?(method, include_private = false)
      (url_helper?(method) && main_app.respond_to?(method)) || super
    end

    private

    def url_helper?(method)
      method.to_s.end_with?('_path', '_url')
    end
  end
end
...

As a side note, the engine's layout can be overridden in /app/views/layouts/blogit/application.html.haml.

Gotchas

root_path and root_url are defined for the engine, so those still need to be handled differently. This issue can happen for any routes that overlap.

(main_app||self).root_path

And the other way around, if you want to link to the engine's root_path (blogit is the engine_name):

(blogit||self).root_path

Basic Rails Sitemap Setup

A basic sitemap generator for building a sitemap on the fly. I'd like a better way, but since it's hosted on Heroku and the map isn't too big yet, this should work for now.

Add a controller action to build the XML

# /app/controllers/static_controller.rb
def sitemap
  xml = ::SitemapGenerator.create!('http://fixmycarin.com') do |map|
    map.add '', priority: 0.9, changefreq: :daily
    map.add '/faq', priority: 0.7, changefreq: :weekly
    map.add '/places', priority: 0.7, changefreq: :weekly
    map.add '/contact', priority: 0.7, changefreq: :weekly

    # One entry per place, with its last-modified time
    Place.all.each { |p| map.add(place_path(p), lastmod: p.updated_at, changefreq: :weekly) }
  end
  render xml: xml
end

Add a route to point to the above action

# /config/routes.rb
get 'sitemap.xml' => 'static#sitemap'

Add a line to the public/robots.txt file

Sitemap: http://fixmycarin.com/sitemap.xml

Add the class that creates the XML. I put it in app/lib or app/services. This was adapted from this gist:

# from: https://gist.github.com/288069
require 'builder'
require 'cgi'
require 'net/http'

class SitemapGenerator
  # Build the XML in one call: SitemapGenerator.create!(url) { |map| ... }
  def self.create!(url, &block)
    new(url, &block).generate_sitemap
  end

  def initialize(url)
    @url = url
    yield(self) if block_given?
  end

  def add(url, options = {})
    (@pages_to_visit ||= []) << options.merge(url: url)
  end

  def generate_sitemap
    xml_str = ""
    xml = Builder::XmlMarkup.new(target: xml_str, indent: 2)

    xml.instruct!
    xml.urlset(xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9') {
      @pages_to_visit.each do |hash|
        unless @url == hash[:url]
          xml.url {
            xml.loc(@url + hash[:url])
            xml.lastmod(hash[:lastmod].utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")) if hash.include? :lastmod
            xml.priority(hash[:priority]) if hash.include? :priority
            xml.changefreq(hash[:changefreq].to_s) if hash.include? :changefreq
          }
        end
      end
    }

    xml_str
  end

  # Notify popular search engines of the updated sitemap.xml
  def update_search_engines
    sitemap_uri = @url + '/sitemap.xml'
    escaped_sitemap_uri = CGI.escape(sitemap_uri)
    Rails.logger.info "Notifying Google"
    res = Net::HTTP.get_response('www.google.com', '/webmasters/tools/ping?sitemap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Yahoo"
    res = Net::HTTP.get_response('search.yahooapis.com', '/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Bing"
    res = Net::HTTP.get_response('www.bing.com', '/webmaster/ping.aspx?siteMap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
    Rails.logger.info "Notifying Ask"
    res = Net::HTTP.get_response('submissions.ask.com', '/ping?sitemap=' + escaped_sitemap_uri)
    Rails.logger.info res.class
  end
end

Open Up Localhost

In many cases it might be easier/better to just use LocalTunnel.

In this case, I didn't want the external URL changing all the time, and I already have a Linode server set up that I can use. Here are some notes on setting it up.

This assumes a Linux server set up with SSH and Apache.

Setup the proxy

Enable the proxy modules on the server.

a2enmod proxy
a2enmod proxy_http

Edit mods-available/proxy.conf to enable the reverse proxy for requests.

ProxyRequests Off

<Proxy *>
Order deny,allow
Allow from all
</Proxy>

Add a new virtual site to be the reverse proxy. Place a file like the following in the sites-available Apache directory. The localhost port doesn't matter too much; it just needs to be unused.

<VirtualHost *:80>
     ServerAdmin dustt
     ServerName virtual.red27.net
     SetEnv proxy-initial-not-pooled 1
     ProxyPass / http://localhost:8001
     ProxyPassReverse / http://localhost:8001
     ProxyPreserveHost On
</VirtualHost>

Enable the virtual site and restart Apache.

a2ensite virtual.red27.net
service apache2 restart

On the client

You will need to SSH into the server and reverse-forward the packets back to the local server.

ssh -nNT -R 8001:localhost:3000 user@virtual.red27.net

This tunnel needs to be reset if the local server errors out. Removing the -n argument may help you notice when something goes wrong.
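
If resetting the tunnel by hand gets old, one option (assuming autossh is installed on the client) is to let it supervise the same command:

# -M 0 turns off autossh's monitor port and relies on SSH keepalives instead
autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -nNT -R 8001:localhost:3000 user@virtual.red27.net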

MongoDB 2012 Notes

File and Data Structures

  • Key names are stored in every BSON document, so keep key names short.
  • Data files are preallocated, doubling in size each time.
  • Files are accessed using memory-mapped files at the OS level.
  • fsync'd every 60 seconds.
  • Use 64-bit systems to support the memory-mapped files.

Journalling – in 1.8, default in 2.0+

  • Write-ahead log
  • Ops are written to the journal before the memory-mapped regions
  • Journal is flushed every 100 ms or after 100 MB written
  • db.getLastError({j:true}) to force a journal flush
  • /journal subdirectory
  • 1 GB files, rotated (only really need what has not been fsync'd)
  • some slowdown on high-throughput systems; can be symlinked to other drives
  • on by default on 64-bit systems

When to use

  • Single node setups
  • Replica sets – at least 1 node
  • Large data sets

Fragmentation

  • files get fragmented over time as document sizes change or documents are deleted
  • collections with a lot of resizes get a padding factor to help reduce fragmentation
  • the free list needs improvement
    • 2.0 reduced scanning to a reasonable amount
    • 2.2 will change it

Compaction

  • 2.0+ compact command (see the sketch after this list)
    • only needs 2 GB extra space
    • offline operation (another good reason to use replica sets)
  • safe mode: waits for a round trip via getLastError; with this call you can specify how safe you want the data to be
  • drop collection doesn't free the data file; dropDatabase does. Sometimes it makes sense to create and drop databases.
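
For reference, a hedged sketch of issuing the compact command from the era's Ruby driver (database and collection names are made up):

require 'mongo'

# mongo-ruby-driver 1.x style connection
db = Mongo::Connection.new('localhost', 27017).db('mydb')

# Compacts a single collection in place; it blocks the node,
# so run it offline or on one replica set member at a time
db.command(compact: 'places')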

Index and Query Evaluation

@mschireson

  • indexes are lists of values associated with documents
  • stored in a btree
  • required for geo queries and unique constraints
  • ascending/descending really only matter on compound indexes.
  • null == null for unique indexes; you can drop duplicates on create.
  • creating an index is blocking unless {background: true}; still, try to do it off-peak
  • when dropping an index, you need to use the same document it was created with.
  • the $where operator doesn’t use the indexes
  • Regexp’s starting with /^ will use an index
  • Indexes are used for updates and deletes
  • Compound indexes can be used to query on the first field and sort on the second (see the sketch after this list)
  • Only uses one index at a time
  • Limited index uses:
    • $ne uses the index, but doesn’t help performance much
    • $not
    • $where
    • $mod only uses the index to limit to numbers
    • Range queries only help some.
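
For example, a Mongoid 3-style sketch of "query on the first field, sort on the second" (model and field names are made up):

class Post
  include Mongoid::Document
  field :author_id, type: Integer
  field :published_at, type: Time

  # Compound index: equality match on author_id, sort on published_at
  index({ author_id: 1, published_at: -1 }, { background: true })
end

# Both the filter and the sort are served by the index above
Post.where(author_id: 42).order_by(published_at: -1)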

GEO

  • created using "2d" indexes (see the sketch after this list)
  • $near
    • sorted nearest to farthest
  • $within
    • $within {$polygon: …}
  • can be in compound queries
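
A minimal Mongoid-style sketch of a "2d" index and a $near query (model name and coordinates are made up):

class Venue
  include Mongoid::Document
  field :location, type: Array # [longitude, latitude]

  index location: '2d'
end

# $near results come back sorted nearest to farthest
Venue.where(location: { '$near' => [-122.41, 37.77] })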

Sparse Indexes

  • only stores entries for documents that have the indexed field; query results won't include documents with null in that field
  • can be sparse & unique

Covering Indexes

  • contain all the fields in both the query and the results, so no document lookup is needed
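
A sketch of a covered query using the era's Ruby driver (collection and field names are made up); note that _id must be excluded from the projection, or the server still fetches the document:

require 'mongo'

users = Mongo::Connection.new.db('mydb').collection('users')
users.create_index('email')

# Query and projection both restricted to indexed fields: answered from the index
users.find({ 'email' => 'a@example.com' }, fields: { 'email' => 1, '_id' => 0 })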

Limits and Trade offs

  • max of 64 indexes per collection
  • can slow down inserts and updates
  • a compound index can be more valuable and handle multiple queries
  • you can force an index or a full scan
  • use sort if you really want sorted data
  • db.c.find(…).explain() => see what's going on
  • db.setProfilingLevel() – record slow queries
  • indexes work best when they fit in RAM

Replica Set

  • One node is always the primary; the others are secondaries.
  • Chosen by election
  • Automatic fail-over and recover
  • Reads can be from primary or secondary
  • Writes will always go to primary
  • Replica Sets are 2+ nodes, at least 3 is better
  • When failed nodes come back, they recover by applying the missed updates, then rejoin as secondaries
  • Setup:

    mongod --replSet <setname>
    cfg = { _id: '<setname>', members: [{ _id: 0, host: '<host:port>' }] }
    use admin
    rs.initiate(cfg)

  • The rs object has the replica set commands; they need to be issued on the current primary

  • rs.status()
  • Strong Consistency is only available when reading from primary
  • Reads on the secondary machines will be eventually consistent
  • Durability options (set by the driver)
    • fire and forget
      • won't know about failures due to unique constraints, disk full, or anything else
    • wait for error (recommended)
    • wait for journal sync
    • wait for fsync (slow)
    • wait for replication (really slow)
  • Can give nodes priorities, which helps ensure a specific machine is primary.
    • priority-0 nodes will never be primary
    • when a higher-priority machine comes back online, it will force an election
  • Can have a slave delay
  • Members can be tagged with properties, which can be specified when waiting for replication.
  • Arbiter members
    • don't hold data
    • vote in elections
    • used to break ties
  • Hidden members
    • not seen by clients
  • Data for replication is stored in the oplog, a capped collection; all secondaries have an oplog too.

Scaling

  • Vertical scaling is limited
  • Horizontal scaling is cheaper and can scale wider rather than higher
  • Vertical scaling can be a single point of failure and hard to back up/maintain
  • Replica sets are one approach
    • can scale reads, but not writes; eventual consistency is the biggest downside
    • replication can overwhelm the secondaries, reducing performance anyway
  • Why Shard?

    • Distribute the write load
    • Keep the working set in RAM by using multiple machines to act like one big virtual machine
    • Consistent reads
    • Preserve functionality: with range-based partitioning, most (all?) of the query operators are available
  • Sharding design goals

    • scale linearly
    • increase capacity with no downtime
    • transparent to the application / clients
    • low administration to add capacity
    • no joins or transactions
    • BigTable / PNUTS inspired; read the PNUTS paper
  • Basics

    • Choose how you partition data
    • Convert from a single replica set to sharding with no downtime
    • Full feature set
    • Fully consistent by default
    • You pick a shard key, which is used to move ranges of data to a shard

Architecture

  • Shard – each shard is its own replica set for automated failover
  • Config servers – store the metadata about which shard holds each partition of the data
    • not a replica set; writes to the config servers are done with a transaction by the mongod/mongos
  • Mongos – uses the config servers to know which shard to use for the data/query
    • clients talk to the mongos servers
    • chunk = (collection, minkey, maxkey, shard)
      • chunks are logical, not physical
      • chunks are 64 MB; once a chunk hits that size it splits, and chunks are migrated between shards as needed

Shard keys

  • they are immutable
  • Choose a key
    • _id? it's incremental, so all writes go to one shard
    • hash? random, partitions well, but not great for queries
    • user_id? somewhat random and useful for lookups; all data for user_id X will be on one shard
      • however, you can't split on this if one user is a really heavy user
    • user_id + md5(x)? often the best option

Other notes

  • Add capacity well before it's needed, at least before 70% of operating capacity, so data can migrate over time
  • Understand working set in RAM
  • Machine too small and admin overhead goes up
  • Machine too big and sharding doesn’t happen smoothly

MMS – MongoDB Monitoring Service

MMS

Include a Module Based on Rails Environment

I'm not sure this is the best way to do things, but I want to include modules in a class based on the Rails environment.

First, I used the Rails config store, Configurator.

module MyApp
  class Application < Rails::Application
    
    # ...

    config.provider = :NullProvider
  end
end

The above config can be overridden in each environment's config file; NullProvider would just be a no-op implementation.
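
For example, a production environment file might swap in a real provider (SmsProvider is a hypothetical name), while NullProvider stubs out the interface:

# /config/environments/production.rb
MyApp::Application.configure do
  config.provider = :SmsProvider
end

# A no-op implementation; the method here is illustrative
module NullProvider
  def deliver(*); end
end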

Now, using this combines send, reading the config value, and resolving the class from the symbol.

class MyClass
  # Resolve the configured symbol to a constant and include it
  MyClass.__send__(:include, Kernel.const_get(::MyApp::Application.config.provider))
end

Put that wherever your include would normally go.

Would love to know if this is something crazy or a good way to do things.
