MongoDB 2012 Notes

Dusty Candland | Wed Feb 01 2012 | mongo, mongodb, nosql

File and Data Structures

files get fragmented over time if docs sizes change or deletes
collections that have a lot of resizes get padding factor to help reduce fragmentation.
need to improve free list
- 2.0 reduced scanning to reasonable amount
- 2.2 will change

2.0+ compact command
- only needs 2 GB extra space
- off line operation (another good reason with replica sets.)
safemode: waits for a round trip from using getLastError, with this call you can specify how safe you want the data to be.
drop collection doesn’t free the data file, dropDatabase does. Sometimes it makes sense to create and drop databases.

@mschireson

indexes are lists of values associated with documents
stored in a btree
required for geo queries and unique constraints
assending/descending really only matter on compound indexes.
null == null for unique indexes, you can drop duplicates on create.
create index is blocking, unless {background: true}, still should try to do off peak
when dropping an index, you need to use the same document when created.
the $where operator doesn’t use the indexes
Regexp’s starting with /^ will use an index
Indexes are used for updates and deletes
Compound indexes can be used to query for the first field and sort on the second
Only uses one index at a time
Limited index uses:
- $ne uses the index, but doesn’t help performance much
- $not
- $where
- $mod index only limits to numbers
- Range queries only help some.

only store values w/ the indexed field, results won’t have documents w/ null in that field.
can be sparse & unique

One is always the primary others are secondary.
Chosen by election
Automatic fail-over and recover
Reads can be from primary or secondary
Writes will always go to primary
Replica Sets are 2+ nodes, at least 3 is better
When a failed nodes come back, they recover by getting the missed updates, then join as a secondary node
Setup

mongod —replSet

cfg = { _id:, members: [{_id:0, host:‘’}] }

use admin

rs.initiate(cfg)
rs objects has replica set commands, needs to be issued on the current primary
rs.status()
Strong Consistency is only available when reading from primary
Reads on the secondary machines will be eventually consistent
Durability Options (set by driver)
- fire and forget
  - won’t know about failures due to unique constraints, disk full, or anything else.
- wait for error recommended
- wait for journal sync
- wait for fsync (slow)
- wait for replication (really slow)
Can give nodes priorities, which will help ensure a specific machine is primary.
- 0 priorities will never be primary
- when a higher priority machine is back online, it will force an election.
Can have a slave delay
Tag replica sets with properties and can specify when waiting.
Arbiters Member
- Don’t have data
- vote in elections
- used to break a tie
Hidden Member
- not seen by the clients
Data is stored for replication in an oplog capped collection, all secondaries have an oplog too.

Vertical Scaling is limited
Horizontal scaling is cheaper, can scale wider then higher
Vertical can be a single point a failure, can be hard to backup/maintain.
Replica Sets are one type
- Can scale reads, but now writes, eventual consistency is the biggest downside.
- Replication can overwhelm the secondaries, reducing performance anyway
Why Shard?
- Distribute the write load
- Keep working set in RAM, by using multiple machines act like one big virtual machine
- Consistent reads
- Preserve functionality, by range based portioning most(all?) the query operators are available.
Sharding design goals
- scale linearly
- increase capacity with no downtime
- transparent to the application / clients
- low administration to add capacity
- no joins or transactions
- BigTable / PNUTS inspired read the PNUTS paper
Basics
- Choose how you partition data
- Convert from a single replica set to sharding with no downtime
- Full feature set
- Fully consistent by default
- You pick a shard key, which is used to move ranges of data to a shard

Shard – each shard is it’s own replica set for automated fail over
Config Servers – store the meta data about where the partitions of data is, which shard
- Not a replica set, writes to the config server is done with a transaction by the mongod / mognos
Mongos – uses the config servers to know what shard to use for the data/query
- Client talks to the mongos servers
- chunk = collection minkey, maxkey, shard
  - chunks are logical, not physical
  - chunk is 64MB, once you hit this point a new split happens, a new shard is created and data is moved.

Want to add capacity way before it’s needed, at least before 70% operation capacity, this allows the data to migrate over time
Understand working set in RAM
Machine too small and admin overhead goes up
Machine too big and sharding doesn’t happen smoothly

These are webmentions via the IndieWeb and webmention.io. Mention this post from your site: