SlideShare a Scribd company logo
1 of 39
Download to read offline
Pierre-Louis Gottfrois
Bastien Murzeau
Apéro Ruby Bordeaux, 8 novembre 2011
• Brève introduction


• Cas pratique


• Map / Reduce
Qu’est ce que mongoDB ?


 mongoDB est une base de donnée
        de type NoSQL,
          sans schéma
       document-oriented
sans-schéma

• Très utile en développements
  ‘agiles’ (itérations, rapidité de modifications,
  flexibilité pour les développeurs)

• Supporte des fonctionnalités qui seraient, en
  BDDs relationnelles :
 • quasi-impossible (stockage d’éléments non finis, ex. tags)

 • trop complexes pour ce qu’elles sont (migrations)
document-oriented
• mongoDB stocke des documents, pas de
  rows

 • les documents sont stockés sous forme de
   JSON; binary JSON

• la syntaxe de requêtage est aussi fournie que
  SQL

• le mécanisme de documents ‘embedded’
  résout bon nombre de problèmes rencontrés
document-oriented

• Les documents sont stockés dans une
 collection, en RoR = model


• une partie des ces données sont indexées
 pour optimiser les performances


• un document n’est pas une poubelle !
stockage de données
        volumineuses
• mongoDB (et autres NoSQL) sont plus
 performantes pour la scalabilité horizontale
 • ajout de serveurs pour augmenter la capacité
   de stockage («sharding»)
 • garantissant ainsi une meilleur disponibilité
 • load-balancing optimisé entre les nodes
 • augmentation transparente pour l’application
Cas pratique
• ORM devient ODM, la gem de référence mongoid
  • ou : mongoMapper, DataMapper
• Création d’une application a base de NoSQL MongoDB
  • rails new nosql
  • edition du Gemfile
    •   gem ‘mongoid’

    •   gem ‘bson_ext’

  • bundle install
  • rails generate mongoid:config
Cas pratique
• edition du config/application.rb
  • #require 'rails/all'
  • require "action_controller/railtie"
  • require "action_mailer/railtie"
  • require "active_resource/railtie"
  • require "rails/test_unit/railtie"
Cas pratique
class Subject
  include Mongoid::Document
  include Mongoid::Timestamps

  has_many :scores,     :as => :scorable, :dependent => :delete, :autosave => true
  has_many :requests,   :dependent => :delete
  belongs_to :author,   :class_name => 'User'




    class Conversation
      include Mongoid::Document
      include Mongoid::Timestamps


      field :public,            :type => Boolean, :default => false

      has_many :scores,         :as => :scorable, :dependent => :delete
      has_and_belongs_to_many   :subjects
      belongs_to :timeline
      embeds_many :messages
Map Reduce
Example


                               A “ticket” collection




{                       {                       {                       {
    “id” : 1,               “id” : 2,               “id” : 3,               “id” : 4,
    “day” : 20111017,       “day” : 20111017,       “day” : 20111017,       “day” : 20111017,
    “checkout” : 100        “checkout” : 42         “checkout” : 215        “checkout” : 73
}                       }                       }                       }
Problematic

• We want to
 • Calculate the ‘checkout’ sum of each object in our
    ticket’s collection

 • Be able to distribute this operation over the network
 • Be fast!
• We don’t want to
 • Go over all objects again when an update is made
Map : emit(checkout)

    The ‘map’ function emit (select) every checkout value
               of each object in our collection


          100                      42                     215                      73



{                       {                       {                       {
    “id” : 1,               “id” : 2,               “id” : 3,               “id” : 4,
    “day” : 20111017,       “day” : 20111017,       “day” : 20111017,       “day” : 20111017,
    “checkout” : 100        “checkout” : 42         “checkout” : 215        “checkout” : 73
}                       }                       }                       }
Reduce : sum(checkout)
                                                  430




                        142                                                 288




          100                        42                       215                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 215          “checkout” : 73
}                         }                         }                        }
Reduce function

 The ‘reduce’ function apply the algorithmic logic
 for each key/value received from ‘map’ function

This function has to be ‘idempotent’ to be called
      recursively or in a distributed system

reduce(k, A, B) == reduce(k, B, A)
reduce(k, A, B) == reduce(k, reduce(A, B))
Inherently Distributed
                                                  430




                        142                                                 288




          100                        42                       215                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 215          “checkout” : 73
}                         }                         }                        }
Distributed
Since ‘map’ function emits objects to be reduced
and ‘reduce’ function processes for each emitted
   objects independently, it can be distributed
            through multiple workers.




         map                     reduce
Logaritmic Update

For the same reason, when updating an object, we
    don’t have to reprocess for each obejcts.

   We can call ‘map’ function only on updated
                     objects.
Logaritmic Update
                                                  430




                        142                                                 288




          100                        42                       215                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 210          “checkout” : 73
}                         }                         }                        }
Logaritmic Update
                                                  430




                        142                                                 288




          100                        42                       210                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 210          “checkout” : 73
}                         }                         }                        }
Logaritmic Update
                                                  430




                        142                                                 283




          100                        42                       210                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 210          “checkout” : 73
}                         }                         }                        }
Logarithmic Update
                                                  425




                        142                                                 283




          100                        42                       210                        73



{                         {                         {                        {
    “id” : 1,                 “id” : 2,                 “id” : 3,                 “id” : 4,
    “day” : 20111017,         “day” : 20111017,         “day” : 20111017,         “day” : 20111017,
    “checkout” : 100          “checkout” : 42           “checkout” : 210          “checkout” : 73
}                         }                         }                        }
Let’s do some code!
$> mongo

>   db.tickets.save({   "_id":   1,   "day":   20111017,   "checkout":   100 })
>   db.tickets.save({   "_id":   2,   "day":   20111017,   "checkout":   42 })
>   db.tickets.save({   "_id":   3,   "day":   20111017,   "checkout":   215 })
>   db.tickets.save({   "_id":   4,   "day":   20111017,   "checkout":   73 })

> db.tickets.count()
4

> db.tickets.find()
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
...

> db.tickets.find({ "_id": 1 })
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
> var map = function() {
... emit(null, this.checkout)
}

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
Temporary Collection
> sumOfCheckouts = db.tickets.mapReduce(map, reduce)
{
  "result" : "tmp.mr.mapreduce_123456789_4",
  "timeMills" : 8,
  "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
  "ok" : 1
}

> db.getCollectionNames()
[
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db[sumOfCheckouts.result].find()
{ "_id" : null, "value" : 430 }
Persistent Collection
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.getCollectionNames()
[
  "sumOfCheckouts",
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db.sumOfCheckouts.find()
{ "_id" : null, "value" : 430 }

> db.sumOfCheckouts.findOne().value
430
Reduce by Date
> var map = function() {
... emit(this.date, this.checkout)
}

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.sumOfCheckouts.find()
{ "_id" : 20111017, "value" : 430 }
What we can do
Scored Subjects per
        User
Subject   User   Score
   1       1       2
   1       1       2
   1       2       2
   2       1       2
   2       2      10
   2       2       5
Scored Subjects per
   User (reduced)
Subject   User   Score

  1        1      4

  1        2      2

  2        1      2

  2        2      15
$> mongo

>   db.scores.save({   "_id":   1,   "subject_id":   1,   "user_id":   1,   "score":   2 })
>   db.scores.save({   "_id":   2,   "subject_id":   1,   "user_id":   1,   "score":   2 })
>   db.scores.save({   "_id":   3,   "subject_id":   1,   "user_id":   2,   "score":   2 })
>   db.scores.save({   "_id":   4,   "subject_id":   2,   "user_id":   1,   "score":   2 })
>   db.scores.save({   "_id":   5,   "subject_id":   2,   "user_id":   2,   "score":   10 })
>   db.scores.save({   "_id":   6,   "subject_id":   2,   "user_id":   2,   "score":   5 })

> db.scores.count()
6

> db.scores.find()
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
...

> db.scores.find({ "_id": 1 })
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
> var map = function() {
... emit([this.user_id, this.subject_id].join("-"), {subject_id:this.subject_id,
... user_id:this.user_id, score:this.score});
}

> var reduce = function(key, values) {
... var result = {user_id:"", subject_id:"", score:0};
... values.forEach(function (value) {result.score += value.score;result.user_id =
... value.user_id;result.subject_id = value.subject_id;});
... return result
}
ReducedScores
                         Collection
> db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })

> db.getCollectionNames()
[
  "reduced_scores",
  "scores"
]

>   db.reduced_scores.find()
{   "_id" : "1-1", "value" :   {   "user_id"   :   1,   "subject_id"   :   1,   "score"   :   4 } }
{   "_id" : "1-2", "value" :   {   "user_id"   :   1,   "subject_id"   :   2,   "score"   :   2 } }
{   "_id" : "2-1", "value" :   {   "user_id"   :   2,   "subject_id"   :   1,   "score"   :   2 } }
{   "_id" : "2-2", "value" :   {   "user_id"   :   2,   "subject_id"   :   2,   "score"   :   15 } }

> db.reduced_scores.findOne().score
4
Dealing with Rails Query

ruby-1.9.2-p180 :007 > ReducedScores.first
 => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'),
"subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>

ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
 => 2

ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
 => 4.0

ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
 => 2.0
Questions ?

More Related Content

Viewers also liked

LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014
LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014
LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014bndmr
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB
 
sshGate - RMLL 2011
sshGate - RMLL 2011sshGate - RMLL 2011
sshGate - RMLL 2011Tauop
 
MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB
 
Automatisez votre gestion de MongoDB avec MMS
Automatisez votre gestion de MongoDB avec MMSAutomatisez votre gestion de MongoDB avec MMS
Automatisez votre gestion de MongoDB avec MMSMongoDB
 
Le monitoring à l'heure de DevOps et Big Data
Le monitoring à l'heure de DevOps et Big DataLe monitoring à l'heure de DevOps et Big Data
Le monitoring à l'heure de DevOps et Big DataClaude Falguiere
 
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDB
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDBPlus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDB
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDBMongoDB
 
L\'authentification forte : Concept et Technologies
L\'authentification forte : Concept et TechnologiesL\'authentification forte : Concept et Technologies
L\'authentification forte : Concept et TechnologiesIbrahima FALL
 
Supervision de réseau informatique - Nagios
Supervision de réseau informatique - NagiosSupervision de réseau informatique - Nagios
Supervision de réseau informatique - NagiosAziz Rgd
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementMohamed hedi Abidi
 
Rapport de stage nagios
Rapport de stage nagiosRapport de stage nagios
Rapport de stage nagioshindif
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
 
Installer et configurer NAGIOS sous linux
Installer et configurer NAGIOS sous linuxInstaller et configurer NAGIOS sous linux
Installer et configurer NAGIOS sous linuxZakariyaa AIT ELMOUDEN
 
Présentation de ElasticSearch / Digital apéro du 12/11/2014
Présentation de ElasticSearch / Digital apéro du 12/11/2014Présentation de ElasticSearch / Digital apéro du 12/11/2014
Présentation de ElasticSearch / Digital apéro du 12/11/2014Silicon Comté
 
Tirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearchTirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearchSéven Le Mesle
 

Viewers also liked (17)

LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014
LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014
LORD : un outil d'aide au codage des maladies - JFIM - 13 juin 2014
 
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ...
 
sshGate - RMLL 2011
sshGate - RMLL 2011sshGate - RMLL 2011
sshGate - RMLL 2011
 
MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment Checklist
 
Automatisez votre gestion de MongoDB avec MMS
Automatisez votre gestion de MongoDB avec MMSAutomatisez votre gestion de MongoDB avec MMS
Automatisez votre gestion de MongoDB avec MMS
 
Le monitoring à l'heure de DevOps et Big Data
Le monitoring à l'heure de DevOps et Big DataLe monitoring à l'heure de DevOps et Big Data
Le monitoring à l'heure de DevOps et Big Data
 
Supervision
SupervisionSupervision
Supervision
 
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDB
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDBPlus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDB
Plus de flexibilité et de scalabilité chez Bouygues Télécom grâce à MongoDB
 
L\'authentification forte : Concept et Technologies
L\'authentification forte : Concept et TechnologiesL\'authentification forte : Concept et Technologies
L\'authentification forte : Concept et Technologies
 
Supervision de réseau informatique - Nagios
Supervision de réseau informatique - NagiosSupervision de réseau informatique - Nagios
Supervision de réseau informatique - Nagios
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et Développement
 
Rapport de stage nagios
Rapport de stage nagiosRapport de stage nagios
Rapport de stage nagios
 
PKI par la Pratique
PKI par la PratiquePKI par la Pratique
PKI par la Pratique
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
Installer et configurer NAGIOS sous linux
Installer et configurer NAGIOS sous linuxInstaller et configurer NAGIOS sous linux
Installer et configurer NAGIOS sous linux
 
Présentation de ElasticSearch / Digital apéro du 12/11/2014
Présentation de ElasticSearch / Digital apéro du 12/11/2014Présentation de ElasticSearch / Digital apéro du 12/11/2014
Présentation de ElasticSearch / Digital apéro du 12/11/2014
 
Tirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearchTirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearch
 

Similar to Apéro RubyBdx - MongoDB - 8-11-2011

Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarMongoDB
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for AnalyticsMongoDB
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 
You will learn RxJS in 2017
You will learn RxJS in 2017You will learn RxJS in 2017
You will learn RxJS in 2017名辰 洪
 
What's new in GeoServer 2.2
What's new in GeoServer 2.2What's new in GeoServer 2.2
What's new in GeoServer 2.2GeoSolutions
 
The Art Of Readable Code
The Art Of Readable CodeThe Art Of Readable Code
The Art Of Readable CodeBaidu, Inc.
 
IT Days - Parse huge JSON files in a streaming way.pptx
IT Days - Parse huge JSON files in a streaming way.pptxIT Days - Parse huge JSON files in a streaming way.pptx
IT Days - Parse huge JSON files in a streaming way.pptxAndrei Negruti
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
 
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...Databricks
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJSKyung Yeol Kim
 
How to Hack a Road Trip with a Webcam, a GSP and Some Fun with Node
How to Hack a Road Trip  with a Webcam, a GSP and Some Fun with NodeHow to Hack a Road Trip  with a Webcam, a GSP and Some Fun with Node
How to Hack a Road Trip with a Webcam, a GSP and Some Fun with Nodepdeschen
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
D3.js - A picture is worth a thousand words
D3.js - A picture is worth a thousand wordsD3.js - A picture is worth a thousand words
D3.js - A picture is worth a thousand wordsApptension
 
Browsers with Wings
Browsers with WingsBrowsers with Wings
Browsers with WingsRemy Sharp
 
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSONFun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSONTomomi Imura
 

Similar to Apéro RubyBdx - MongoDB - 8-11-2011 (20)

Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregatio...
 
Search@airbnb
Search@airbnbSearch@airbnb
Search@airbnb
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB Webinar
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for Analytics
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
Advancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGISAdvancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGIS
 
You will learn RxJS in 2017
You will learn RxJS in 2017You will learn RxJS in 2017
You will learn RxJS in 2017
 
What's new in GeoServer 2.2
What's new in GeoServer 2.2What's new in GeoServer 2.2
What's new in GeoServer 2.2
 
The Art Of Readable Code
The Art Of Readable CodeThe Art Of Readable Code
The Art Of Readable Code
 
IT Days - Parse huge JSON files in a streaming way.pptx
IT Days - Parse huge JSON files in a streaming way.pptxIT Days - Parse huge JSON files in a streaming way.pptx
IT Days - Parse huge JSON files in a streaming way.pptx
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJS
 
How to Hack a Road Trip with a Webcam, a GSP and Some Fun with Node
How to Hack a Road Trip  with a Webcam, a GSP and Some Fun with NodeHow to Hack a Road Trip  with a Webcam, a GSP and Some Fun with Node
How to Hack a Road Trip with a Webcam, a GSP and Some Fun with Node
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
D3.js - A picture is worth a thousand words
D3.js - A picture is worth a thousand wordsD3.js - A picture is worth a thousand words
D3.js - A picture is worth a thousand words
 
Browsers with Wings
Browsers with WingsBrowsers with Wings
Browsers with Wings
 
R and cpp
R and cppR and cpp
R and cpp
 
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSONFun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON
 

Recently uploaded

Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)codyslingerland1
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1DianaGray10
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingMAGNIntelligence
 

Recently uploaded (20)

Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced Computing
 

Apéro RubyBdx - MongoDB - 8-11-2011

  • 1. Pierre-Louis Gottfrois Bastien Murzeau Apéro Ruby Bordeaux, 8 novembre 2011
  • 2. • Brève introduction • Cas pratique • Map / Reduce
  • 3. Qu’est ce que mongoDB ? mongoDB est une base de donnée de type NoSQL, sans schéma document-oriented
  • 4. sans-schéma • Très utile en développements ‘agiles’ (itérations, rapidité de modifications, flexibilité pour les développeurs) • Supporte des fonctionnalités qui seraient, en BDDs relationnelles : • quasi-impossible (stockage d’éléments non finis, ex. tags) • trop complexes pour ce qu’elles sont (migrations)
  • 5. document-oriented • mongoDB stocke des documents, pas de rows • les documents sont stockés sous forme de JSON; binary JSON • la syntaxe de requêtage est aussi fournie que SQL • le mécanisme de documents ‘embedded’ résout bon nombre de problèmes rencontrés
  • 6. document-oriented • Les documents sont stockés dans une collection, en RoR = model • une partie des ces données sont indexées pour optimiser les performances • un document n’est pas une poubelle !
  • 7. stockage de données volumineuses • mongoDB (et autres NoSQL) sont plus performantes pour la scalabilité horizontale • ajout de serveurs pour augmenter la capacité de stockage («sharding») • garantissant ainsi une meilleur disponibilité • load-balancing optimisé entre les nodes • augmentation transparente pour l’application
  • 8. Cas pratique • ORM devient ODM, la gem de référence mongoid • ou : mongoMapper, DataMapper • Création d’une application a base de NoSQL MongoDB • rails new nosql • edition du Gemfile • gem ‘mongoid’ • gem ‘bson_ext’ • bundle install • rails generate mongoid:config
  • 9. Cas pratique • edition du config/application.rb • #require 'rails/all' • require "action_controller/railtie" • require "action_mailer/railtie" • require "active_resource/railtie" • require "rails/test_unit/railtie"
  • 10. Cas pratique class Subject include Mongoid::Document include Mongoid::Timestamps has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true has_many :requests, :dependent => :delete belongs_to :author, :class_name => 'User' class Conversation include Mongoid::Document include Mongoid::Timestamps field :public, :type => Boolean, :default => false has_many :scores, :as => :scorable, :dependent => :delete has_and_belongs_to_many :subjects belongs_to :timeline embeds_many :messages
  • 12. Example A “ticket” collection { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 215 “checkout” : 73 } } } }
  • 13. Problematic • We want to • Calculate the ‘checkout’ sum of each object in our ticket’s collection • Be able to distribute this operation over the network • Be fast! • We don’t want to • Go over all objects again when an update is made
  • 14. Map : emit(checkout) The ‘map’ function emit (select) every checkout value of each object in our collection 100 42 215 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 215 “checkout” : 73 } } } }
  • 15. Reduce : sum(checkout) 430 142 288 100 42 215 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 215 “checkout” : 73 } } } }
  • 16. Reduce function The ‘reduce’ function apply the algorithmic logic for each key/value received from ‘map’ function This function has to be ‘idempotent’ to be called recursively or in a distributed system reduce(k, A, B) == reduce(k, B, A) reduce(k, A, B) == reduce(k, reduce(A, B))
  • 17. Inherently Distributed 430 142 288 100 42 215 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 215 “checkout” : 73 } } } }
  • 18. Distributed Since ‘map’ function emits objects to be reduced and ‘reduce’ function processes for each emitted objects independently, it can be distributed through multiple workers. map reduce
  • 19. Logaritmic Update For the same reason, when updating an object, we don’t have to reprocess for each obejcts. We can call ‘map’ function only on updated objects.
  • 20. Logaritmic Update 430 142 288 100 42 215 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 210 “checkout” : 73 } } } }
  • 21. Logaritmic Update 430 142 288 100 42 210 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 210 “checkout” : 73 } } } }
  • 22. Logaritmic Update 430 142 283 100 42 210 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 210 “checkout” : 73 } } } }
  • 23. Logarithmic Update 425 142 283 100 42 210 73 { { { { “id” : 1, “id” : 2, “id” : 3, “id” : 4, “day” : 20111017, “day” : 20111017, “day” : 20111017, “day” : 20111017, “checkout” : 100 “checkout” : 42 “checkout” : 210 “checkout” : 73 } } } }
  • 25. $> mongo > db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 }) > db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 }) > db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 }) > db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 }) > db.tickets.count() 4 > db.tickets.find() { "_id" : 1, "day" : 20111017, "checkout" : 100 } ... > db.tickets.find({ "_id": 1 }) { "_id" : 1, "day" : 20111017, "checkout" : 100 }
  • 26. > var map = function() { ... emit(null, this.checkout) } > var reduce = function(key, values) { ... var sum = 0 ... for (var index in values) sum += values[index] ... return sum }
  • 27. Temporary Collection > sumOfCheckouts = db.tickets.mapReduce(map, reduce) { "result" : "tmp.mr.mapreduce_123456789_4", "timeMills" : 8, "counts" : { "input" : 4, "emit" : 4, "output" : 1 }, "ok" : 1 } > db.getCollectionNames() [ "tickets", "tmp.mr.mapreduce_123456789_4" ] > db[sumOfCheckouts.result].find() { "_id" : null, "value" : 430 }
  • 28. Persistent Collection > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" }) > db.getCollectionNames() [ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ] > db.sumOfCheckouts.find() { "_id" : null, "value" : 430 } > db.sumOfCheckouts.findOne().value 430
  • 30. > var map = function() { ... emit(this.date, this.checkout) } > var reduce = function(key, values) { ... var sum = 0 ... for (var index in values) sum += values[index] ... return sum }
  • 31. > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" }) > db.sumOfCheckouts.find() { "_id" : 20111017, "value" : 430 }
  • 33. Scored Subjects per User Subject User Score 1 1 2 1 1 2 1 2 2 2 1 2 2 2 10 2 2 5
  • 34. Scored Subjects per User (reduced) Subject User Score 1 1 4 1 2 2 2 1 2 2 2 15
  • 35. $> mongo > db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }) > db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 }) > db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 }) > db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 }) > db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 }) > db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 }) > db.scores.count() 6 > db.scores.find() { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 } ... > db.scores.find({ "_id": 1 }) { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
  • 36. > var map = function() { ... emit([this.user_id, this.subject_id].join("-"), {subject_id:this.subject_id, ... user_id:this.user_id, score:this.score}); } > var reduce = function(key, values) { ... var result = {user_id:"", subject_id:"", score:0}; ... values.forEach(function (value) {result.score += value.score;result.user_id = ... value.user_id;result.subject_id = value.subject_id;}); ... return result }
  • 37. ReducedScores Collection > db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" }) > db.getCollectionNames() [ "reduced_scores", "scores" ] > db.reduced_scores.find() { "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } } { "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } } { "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } } { "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } } > db.reduced_scores.findOne().score 4
  • 38. Dealing with Rails Query ruby-1.9.2-p180 :007 > ReducedScores.first => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'), "subject_id"=>BSON::ObjectId('...'), "score"=>4.0}> ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count => 2 ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score'] => 4.0 ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score'] => 2.0