Pierre-Louis Gottfrois
Bastien Murzeau
Apéro Ruby Bordeaux, November 8, 2011
• Brief introduction

• Case study

• Map / Reduce
What is MongoDB?

MongoDB is a NoSQL database: schemaless and document-oriented.
Schemaless

• Very useful for ‘agile’ development (iterations, quick changes,
  flexibility for developers)

• Supports features that, in relational databases, would be:
 • nearly impossible (storing open-ended sets of items, e.g. tags)
 • too complex for what they are (migrations)
Document-oriented
• MongoDB stores documents, not rows

 • documents are stored as JSON, more precisely BSON (binary JSON)

• the query syntax is as rich as SQL

• the ‘embedded’ document mechanism solves many commonly
  encountered problems
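To make the idea of an embedded document concrete, here is a minimal sketch in plain JavaScript. The document shape and field names (`public`, `messages`, `author`, `body`) are assumptions chosen for illustration, not a schema taken from the talk.

```javascript
// Hypothetical document: a conversation that embeds its messages.
// Field names are illustrative, not taken from a real schema.
const conversation = {
  _id: 1,
  public: false,
  messages: [
    { author: "alice", body: "Hello" },
    { author: "bob", body: "Hi there" }
  ]
};

// Embedded documents travel with their parent: a single read returns
// everything, with no join against a separate "messages" table.
const authors = conversation.messages.map(function (m) { return m.author; });
console.log(authors.join(", ")); // prints "alice, bob"
```

In a relational design the messages would usually live in their own table with a foreign key; here they are simply part of the parent document.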
Document-oriented

• Documents are stored in a collection; in RoR, a collection maps to a model

• part of this data is indexed to improve performance

• a document is not a trash can!
Storing large volumes of data
• MongoDB (and other NoSQL databases) perform better at horizontal scaling
 • servers are added to increase storage capacity (‘sharding’)
 • which also ensures better availability
 • optimized load balancing between nodes
 • growth is transparent to the application
Case study
• ORM becomes ODM; the reference gem is mongoid
  • alternatives: MongoMapper, DataMapper
• Creating an application backed by MongoDB (NoSQL)
  • rails new nosql
  • edit the Gemfile
    •   gem 'mongoid'
    •   gem 'bson_ext'
  • bundle install
  • rails generate mongoid:config
Case study
• edit config/application.rb: comment out rails/all and require the
  individual railties instead, leaving out Active Record
  • # require 'rails/all'
  • require "action_controller/railtie"
  • require "action_mailer/railtie"
  • require "active_resource/railtie"
  • require "rails/test_unit/railtie"
Case study
class Subject
  include Mongoid::Document
  include Mongoid::Timestamps

  # Relations to other collections (referenced, not embedded)
  has_many :scores,     :as => :scorable, :dependent => :delete, :autosave => true
  has_many :requests,   :dependent => :delete
  belongs_to :author,   :class_name => 'User'
end

class Conversation
  include Mongoid::Document
  include Mongoid::Timestamps

  field :public,            :type => Boolean, :default => false

  has_many :scores,         :as => :scorable, :dependent => :delete
  has_and_belongs_to_many   :subjects
  belongs_to :timeline
  # Messages are stored inside the conversation document itself
  embeds_many :messages
end
Map Reduce
Example


A "ticket" collection

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Problem statement

• We want to
 • Compute the sum of ‘checkout’ over all objects in our ticket collection
 • Be able to distribute this operation over the network
 • Be fast!
• We don’t want to
 • Go over all objects again when an update is made
Map : emit(checkout)

The ‘map’ function emits (selects) the checkout value of each object in our collection.

{ "id": 1, "day": 20111017, "checkout": 100 }   ->  100
{ "id": 2, "day": 20111017, "checkout": 42 }    ->  42
{ "id": 3, "day": 20111017, "checkout": 215 }   ->  215
{ "id": 4, "day": 20111017, "checkout": 73 }    ->  73
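The map step can be tried without a database. The sketch below is plain JavaScript with a stand-in `emit` that only records what it receives; MongoDB supplies the real `emit` inside a map function, so this harness is an assumption made for illustration.

```javascript
// The four tickets from the slide.
var tickets = [
  { id: 1, day: 20111017, checkout: 100 },
  { id: 2, day: 20111017, checkout: 42 },
  { id: 3, day: 20111017, checkout: 215 },
  { id: 4, day: 20111017, checkout: 73 }
];

// Stand-in for the emit() that MongoDB provides inside a map function:
// here it simply records each (key, value) pair.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function is called once per document, with `this` bound to it.
function map() {
  emit(null, this.checkout);
}

tickets.forEach(function (ticket) {
  map.call(ticket);
});

var emittedValues = emitted.map(function (pair) { return pair.value; });
console.log(emittedValues); // [ 100, 42, 215, 73 ]
```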
Reduce : sum(checkout)

430
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 288
    ├── 215   { "id": 3, "day": 20111017, "checkout": 215 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
Reduce function

The ‘reduce’ function applies the aggregation logic to the values
received from the ‘map’ function for each key.

To be called recursively or in a distributed system, this function has
to be ‘idempotent’ and must not depend on the order of its inputs:

reduce(k, [A, B]) == reduce(k, [B, A])
reduce(k, [A, B]) == reduce(k, [reduce(k, [A, B])])
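These properties can be checked on a concrete reduce function. The following is a small sketch, assuming a sum reduce that receives its values as an array (as MongoDB passes them); the sample values `A`, `B` and `C` are arbitrary.

```javascript
// A reduce that sums the values emitted for one key.
function reduce(key, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return sum;
}

var A = 100, B = 42, C = 215;

// The order of the values does not matter.
var commutative = reduce(null, [A, B]) === reduce(null, [B, A]);

// Reducing an already reduced result changes nothing.
var idempotent = reduce(null, [reduce(null, [A, B])]) === reduce(null, [A, B]);

// A partial result can be mixed with raw values (re-reduce).
var reReducible = reduce(null, [C, reduce(null, [A, B])]) === reduce(null, [C, A, B]);

console.log(commutative, idempotent, reReducible); // true true true
```

A reduce that computed an average directly would fail the third check, which is why averages are usually carried as a sum and a count.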
Inherently Distributed

430
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 288
    ├── 215   { "id": 3, "day": 20111017, "checkout": 215 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
Distributed
Since the ‘map’ function emits the objects to be reduced, and the
‘reduce’ function processes each group of emitted objects independently,
the work can be distributed across multiple workers.




         map                     reduce
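As a rough illustration of that distribution, the sketch below splits the emitted values between two hypothetical workers and then reduces their partial results. The worker split is an assumption for the example; it runs in a single process and only mimics the data flow.

```javascript
var checkouts = [100, 42, 215, 73]; // values emitted by 'map'

function reduce(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Hypothetical split: each "worker" receives a slice of the emitted values.
var workerA = reduce(null, checkouts.slice(0, 2)); // 100 + 42
var workerB = reduce(null, checkouts.slice(2));    // 215 + 73

// The partial results are themselves reduced to get the final value.
var total = reduce(null, [workerA, workerB]);
console.log(workerA, workerB, total); // 142 288 430
```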
Logarithmic Update

For the same reason, when an object is updated, we
don’t have to reprocess every object.

We can call the ‘map’ function on the updated objects only.
Logarithmic Update

430
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 288
    ├── 215   { "id": 3, "day": 20111017, "checkout": 210 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
Logarithmic Update

430
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 288
    ├── 210   { "id": 3, "day": 20111017, "checkout": 210 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
Logarithmic Update

430
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 283
    ├── 210   { "id": 3, "day": 20111017, "checkout": 210 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
Logarithmic Update

425
├── 142
│   ├── 100   { "id": 1, "day": 20111017, "checkout": 100 }
│   └── 42    { "id": 2, "day": 20111017, "checkout": 42 }
└── 283
    ├── 210   { "id": 3, "day": 20111017, "checkout": 210 }
    └── 73    { "id": 4, "day": 20111017, "checkout": 73 }
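The update sequence shown in these slides can be reproduced with a small tree of partial sums. This is only a sketch of the idea, not a description of how MongoDB stores intermediate results; the array layout and the helper names `buildTree` and `updateLeaf` are assumptions made for the example.

```javascript
// Partial sums stored as a complete binary tree in an array:
// index 1 is the root, node i has children 2i and 2i+1, leaves start at n.
function buildTree(values) {
  var n = values.length; // assumed to be a power of two in this sketch
  var tree = new Array(2 * n).fill(0);
  for (var i = 0; i < n; i++) tree[n + i] = values[i];
  for (var j = n - 1; j >= 1; j--) tree[j] = tree[2 * j] + tree[2 * j + 1];
  return tree;
}

// Replace one leaf, then recompute only its ancestors.
// Returns the number of partial sums that were recomputed.
function updateLeaf(tree, index, value) {
  var n = tree.length / 2;
  var node = n + index;
  tree[node] = value;
  var recomputed = 0;
  for (node = Math.floor(node / 2); node >= 1; node = Math.floor(node / 2)) {
    tree[node] = tree[2 * node] + tree[2 * node + 1];
    recomputed++;
  }
  return recomputed;
}

var tree = buildTree([100, 42, 215, 73]);
console.log(tree[1], tree[2], tree[3]); // 430 142 288

var recomputed = updateLeaf(tree, 2, 210); // ticket 3: 215 -> 210
console.log(tree[1], tree[2], tree[3], recomputed); // 425 142 283 2
```

With four tickets only two partial sums are recomputed (288 becomes 283, then 430 becomes 425); the left branch (142) is never touched. The number of recomputed nodes grows with the height of the tree, hence "logarithmic".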
Let’s do some code!
$> mongo

> db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
> db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
> db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
> db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })

> db.tickets.count()
4

> db.tickets.find()
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
...

> db.tickets.find({ "_id": 1 })
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
> var map = function() {
... emit(null, this.checkout)
}

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
Temporary Collection
> sumOfCheckouts = db.tickets.mapReduce(map, reduce)
{
  "result" : "tmp.mr.mapreduce_123456789_4",
  "timeMillis" : 8,
  "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
  "ok" : 1
}

> db.getCollectionNames()
[
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db[sumOfCheckouts.result].find()
{ "_id" : null, "value" : 430 }
Persistent Collection
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.getCollectionNames()
[
  "sumOfCheckouts",
  "tickets",
  "tmp.mr.mapreduce_123456789_4"
]

> db.sumOfCheckouts.find()
{ "_id" : null, "value" : 430 }

> db.sumOfCheckouts.findOne().value
430
Reduce by Date
> var map = function() {
... emit(this.day, this.checkout)
}

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
}
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.sumOfCheckouts.find()
{ "_id" : 20111017, "value" : 430 }
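With a single day in the collection, grouping by key is hard to see. The sketch below emulates the two phases locally in plain JavaScript, keyed on the `day` field of the tickets. The two tickets dated 20111018 and the `emit`/`groups` harness are assumptions added for illustration; they are not part of the original data or of the MongoDB API.

```javascript
// Tickets 5 and 6 fall on a hypothetical second day (20111018); they are
// added only so that grouping is visible and are not in the original data.
var tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 10 },
  { _id: 6, day: 20111018, checkout: 20 }
];

// Map phase: a stand-in emit() groups the emitted values by key.
var groups = {};
function emit(key, value) {
  (groups[key] = groups[key] || []).push(value);
}
function map() {
  emit(this.day, this.checkout);
}
tickets.forEach(function (ticket) {
  map.call(ticket);
});

// Reduce phase: one call per distinct key.
function reduce(key, values) {
  var sum = 0;
  for (var index in values) sum += values[index];
  return sum;
}
var sumOfCheckouts = Object.keys(groups).map(function (key) {
  return { _id: Number(key), value: reduce(key, groups[key]) };
});
console.log(JSON.stringify(sumOfCheckouts));
// [{"_id":20111017,"value":430},{"_id":20111018,"value":30}]
```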
What we can do
Scored Subjects per User
Subject   User   Score
   1       1       2
   1       1       2
   1       2       2
   2       1       2
   2       2      10
   2       2       5
Scored Subjects per User (reduced)
Subject   User   Score
   1       1       4
   1       2       2
   2       1       2
   2       2      15
$> mongo

> db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
> db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
> db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })

> db.scores.count()
6

> db.scores.find()
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
...

> db.scores.find({ "_id": 1 })
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
> var map = function() {
... emit([this.user_id, this.subject_id].join("-"), {subject_id:this.subject_id,
... user_id:this.user_id, score:this.score});
}

> var reduce = function(key, values) {
... var result = {user_id:"", subject_id:"", score:0};
... values.forEach(function (value) {result.score += value.score;result.user_id =
... value.user_id;result.subject_id = value.subject_id;});
... return result
}
ReducedScores Collection
> db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })

> db.getCollectionNames()
[
  "reduced_scores",
  "scores"
]

> db.reduced_scores.find()
{ "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
{ "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
{ "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
{ "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }

> db.reduced_scores.findOne().value.score
4
Dealing with Rails Query

ruby-1.9.2-p180 :007 > ReducedScores.first
 => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'),
"subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>

ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
 => 2

ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
 => 4.0

ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
 => 2.0
Questions?

 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Apéro RubyBdx - MongoDB - 8-11-2011

  • 1. Pierre-Louis Gottfrois, Bastien Murzeau. Apéro Ruby Bordeaux, 8 November 2011
  • 2. Brief introduction; practical case; Map / Reduce
  • 3. What is mongoDB? mongoDB is a schemaless, document-oriented NoSQL database.
  • 4. Schemaless
      • Very useful in 'agile' development (iterations, fast changes, flexibility for developers)
      • Supports features that, in relational databases, would be:
          • nearly impossible (storing open-ended elements, e.g. tags)
          • too complex for what they are (migrations)
  • 5. Document-oriented
      • mongoDB stores documents, not rows
          • documents are stored as JSON, more precisely binary JSON (BSON)
      • the query syntax is as rich as SQL
      • the 'embedded' documents mechanism solves many of the problems usually encountered
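To make the 'embedded' documents idea concrete, here is a minimal sketch in plain JavaScript. The `conversation` document and its field names are hypothetical; they only illustrate child records living inside their parent instead of in a separate table.

```javascript
// Hypothetical "conversation" document: its messages are embedded in the
// parent instead of being stored in a separate table joined by a foreign key.
const conversation = {
  _id: 1,
  public: false,
  messages: [
    { author: "alice", body: "Hello" },
    { author: "bob", body: "Hi there" }
  ]
};

// One read returns the parent together with all of its embedded children.
const authors = conversation.messages.map(function (m) { return m.author; });
console.log(authors); // [ 'alice', 'bob' ]
```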
  • 6. Document-oriented
      • Documents are stored in a collection (in RoR terms, a model)
      • Some of this data is indexed to optimise performance
      • A document is not a dumping ground!
  • 7. Storing large volumes of data
      • mongoDB (and other NoSQL databases) perform better at horizontal scaling
          • servers are added to increase storage capacity ("sharding")
          • which also ensures better availability
          • optimised load balancing between nodes
          • growth is transparent to the application
  • 8. Practical case
      • ORM becomes ODM; the reference gem is mongoid
          • alternatives: mongoMapper, DataMapper
      • Creating an application based on NoSQL MongoDB
          • rails new nosql
          • edit the Gemfile
              • gem 'mongoid'
              • gem 'bson_ext'
          • bundle install
          • rails generate mongoid:config
  • 9. Practical case
      • edit config/application.rb
          • #require 'rails/all'
          • require "action_controller/railtie"
          • require "action_mailer/railtie"
          • require "active_resource/railtie"
          • require "rails/test_unit/railtie"
  • 10. Practical case
      class Subject
        include Mongoid::Document
        include Mongoid::Timestamps
        has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true
        has_many :requests, :dependent => :delete
        belongs_to :author, :class_name => 'User'
      end
      class Conversation
        include Mongoid::Document
        include Mongoid::Timestamps
        field :public, :type => Boolean, :default => false
        has_many :scores, :as => :scorable, :dependent => :delete
        has_and_belongs_to_many :subjects
        belongs_to :timeline
        embeds_many :messages
      end
  • 12. Example: a "ticket" collection
      { "id" : 1, "day" : 20111017, "checkout" : 100 }
      { "id" : 2, "day" : 20111017, "checkout" : 42 }
      { "id" : 3, "day" : 20111017, "checkout" : 215 }
      { "id" : 4, "day" : 20111017, "checkout" : 73 }
  • 13. Problem
      • We want to
          • calculate the sum of the 'checkout' values of all objects in our ticket collection
          • be able to distribute this operation over the network
          • be fast!
      • We don't want to
          • go over all objects again when an update is made
  • 14. Map: emit(checkout). The 'map' function emits (selects) the checkout value of each object in our collection: 100, 42, 215, 73.
  • 15. Reduce: sum(checkout). The emitted values are summed pairwise up a tree: 100 + 42 = 142 and 215 + 73 = 288, then 142 + 288 = 430.
  • 16. Reduce function. The 'reduce' function applies the algorithmic logic to each key and the values received for it from the 'map' function. It has to be commutative and idempotent so that it can be called repeatedly on its own output or in a distributed system:
      reduce(k, [A, B]) == reduce(k, [B, A])
      reduce(k, [A, B]) == reduce(k, [reduce(k, [A, B])])
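The properties required of 'reduce' can be checked by hand on the ticket values. This is a minimal sketch in plain JavaScript, run outside MongoDB; `reduceSum` is an illustrative name, not something from the slides.

```javascript
// A reduce function in the shape the mongo shell expects: (key, values) -> value.
function reduceSum(key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
}

const checkouts = [100, 42, 215, 73];

// Commutative: the order of the values does not matter.
const forward = reduceSum(null, checkouts);
const backward = reduceSum(null, checkouts.slice().reverse());

// Idempotent: reducing an already reduced value changes nothing.
const reducedTwice = reduceSum(null, [reduceSum(null, checkouts)]);

// Re-reduce: partial results can be combined into the same total.
const left = reduceSum(null, [100, 42]);   // 142
const right = reduceSum(null, [215, 73]);  // 288
const combined = reduceSum(null, [left, right]);

console.log(forward, backward, reducedTwice, combined); // 430 430 430 430
```

A function that fails one of these checks (for example one that returns an average instead of a sum) can give different results depending on how the values happen to be batched.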
  • 17. Inherently distributed: the partial sums 142 (= 100 + 42) and 288 (= 215 + 73) can be computed independently, then combined into 430.
  • 18. Distributed. Since the 'map' function emits the objects to be reduced and the 'reduce' function processes each group of emitted objects independently, the work can be distributed across multiple workers (map, then reduce).
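The distribution argument can be simulated in a single process. The sketch below assumes a hypothetical `partition` helper that deals documents out to workers round-robin; each "worker" maps and reduces its own chunk, and a final reduce combines the partial results. It illustrates the idea only and says nothing about how MongoDB schedules work across shards.

```javascript
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 }
];

// Map step: one emitted value per document (its checkout amount).
function mapCheckout(doc) {
  return doc.checkout;
}

// Reduce step: sum of the emitted values.
function reduceSum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical helper: deal the items out to `n` chunks, one per worker.
function partition(items, n) {
  const chunks = Array.from({ length: n }, function () { return []; });
  items.forEach(function (item, i) { chunks[i % n].push(item); });
  return chunks;
}

// Each worker reduces its own chunk; the partial sums are reduced again.
function distributedSum(docs, workerCount) {
  const partials = partition(docs, workerCount).map(function (chunk) {
    return reduceSum(null, chunk.map(mapCheckout));
  });
  return reduceSum(null, partials);
}

console.log(distributedSum(tickets, 2)); // 430
```

The total does not depend on the number of workers, which is exactly what the commutativity and re-reduce properties of the reduce function guarantee.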
  • 19. Logarithmic update. For the same reason, when an object is updated we don't have to reprocess every object. We can call the 'map' function on the updated object only.
  • 20. Logarithmic update: the checkout of ticket 3 changes from 215 to 210. The tree still holds the old values: 430 / 142, 288 / 100, 42, 215, 73.
  • 21. Logarithmic update: 'map' re-emits the updated value. Tree: 430 / 142, 288 / 100, 42, 210, 73.
  • 22. Logarithmic update: the parent partial sum is re-reduced (210 + 73 = 283). Tree: 430 / 142, 283 / 100, 42, 210, 73.
  • 23. Logarithmic update: the root is re-reduced (142 + 283 = 425). Tree: 425 / 142, 283 / 100, 42, 210, 73. Only one node per level of the tree was recomputed.
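The update sequence on these slides can be reproduced with a small tree of partial sums. This is a sketch of the diagram only, assuming a complete binary tree stored in a flat array and a power-of-two number of leaves. `buildSumTree` and `updateLeaf` are illustrative names, and nothing here claims to mirror MongoDB's internal storage.

```javascript
// Build a binary tree of partial sums in a flat array: node i has children
// 2i and 2i + 1, and the leaves occupy indices n .. 2n - 1.
// Assumes values.length is a power of two, to keep the sketch short.
function buildSumTree(values) {
  const n = values.length;
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = values[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Replace one leaf and re-reduce only its ancestors. Returns the number of
// partial sums that had to be recomputed.
function updateLeaf(tree, index, newValue) {
  const n = tree.length / 2;
  let i = n + index;
  tree[i] = newValue;
  let recomputed = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    recomputed++;
  }
  return recomputed;
}

const tree = buildSumTree([100, 42, 215, 73]);
console.log(tree[1], tree[2], tree[3]); // 430 142 288

// Ticket 3 (leaf index 2) changes from 215 to 210.
const steps = updateLeaf(tree, 2, 210);
console.log(tree[1], tree[2], tree[3], steps); // 425 142 283 2
```

With four leaves, two partial sums are recomputed (283, then 425), which is log2(4). The untouched branch (142) is reused as is.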
  • 25. $> mongo
      > db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
      > db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
      > db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
      > db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })
      > db.tickets.count()
      4
      > db.tickets.find()
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
      ...
      > db.tickets.find({ "_id": 1 })
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
  • 26. > var map = function() {
      ... emit(null, this.checkout) }
      > var reduce = function(key, values) {
      ... var sum = 0
      ... for (var index in values) sum += values[index]
      ... return sum }
  • 27. Temporary Collection
      > sumOfCheckouts = db.tickets.mapReduce(map, reduce)
      { "result" : "tmp.mr.mapreduce_123456789_4", "timeMillis" : 8, "counts" : { "input" : 4, "emit" : 4, "output" : 1 }, "ok" : 1 }
      > db.getCollectionNames()
      [ "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db[sumOfCheckouts.result].find()
      { "_id" : null, "value" : 430 }
  • 28. Persistent Collection
      > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.getCollectionNames()
      [ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db.sumOfCheckouts.find()
      { "_id" : null, "value" : 430 }
      > db.sumOfCheckouts.findOne().value
      430
  • 30. > var map = function() {
      ... emit(this.day, this.checkout) }
      > var reduce = function(key, values) {
      ... var sum = 0
      ... for (var index in values) sum += values[index]
      ... return sum }
  • 31. > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.sumOfCheckouts.find()
      { "_id" : 20111017, "value" : 430 }
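To see what grouping by key does without a running server, here is a minimal in-memory stand-in for mapReduce in plain JavaScript. It is an illustration, not MongoDB's implementation: `emit` is passed to the map function as an argument (in the mongo shell it is a global), and the fifth ticket, dated 20111018, is hypothetical and added only so that a second key appears in the output.

```javascript
// Minimal in-memory stand-in for mapReduce: group emitted values by key,
// then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in the mongo shell, `this` is the current document inside map.
  for (const doc of docs) map.call(doc, emit);

  const out = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key that has a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 10 } // hypothetical extra ticket
];

const mapByDay = function (emit) { emit(this.day, this.checkout); };
const reduceSum = function (key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
};

const sumOfCheckouts = mapReduce(tickets, mapByDay, reduceSum);
console.log(sumOfCheckouts);
// [ { _id: 20111017, value: 430 }, { _id: 20111018, value: 10 } ]
```

Emitting `this.day` as the key instead of `null` is the only change needed to go from one global total to one total per day.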
  • 33. Scored Subjects per User
      Subject  User  Score
      1        1     2
      1        1     2
      1        2     2
      2        1     2
      2        2     10
      2        2     5
  • 34. Scored Subjects per User (reduced)
      Subject  User  Score
      1        1     4
      1        2     2
      2        1     2
      2        2     15
  • 35. $> mongo
      > db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
      > db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
      > db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })
      > db.scores.count()
      6
      > db.scores.find()
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
      ...
      > db.scores.find({ "_id": 1 })
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
  • 36. > var map = function() {
      ... emit([this.user_id, this.subject_id].join("-"),
      ...      {subject_id: this.subject_id, user_id: this.user_id, score: this.score}); }
      > var reduce = function(key, values) {
      ... var result = {user_id: "", subject_id: "", score: 0};
      ... values.forEach(function (value) {
      ...   result.score += value.score;
      ...   result.user_id = value.user_id;
      ...   result.subject_id = value.subject_id; });
      ... return result }
  • 37. ReducedScores Collection
      > db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })
      > db.getCollectionNames()
      [ "reduced_scores", "scores" ]
      > db.reduced_scores.find()
      { "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
      { "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
      { "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
      { "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }
      > db.reduced_scores.findOne().value.score
      4
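One detail of this reduce function is worth spelling out: it returns an object with the same shape as the values emitted by map, so its output can be fed back into it. The sketch below runs the same reduce logic in plain JavaScript on the values for key "2-2". The third score `c` is hypothetical and is there only to exercise the re-reduce path.

```javascript
// Same logic as the reduce function on the slide.
const reduceScores = function (key, values) {
  const result = { user_id: "", subject_id: "", score: 0 };
  values.forEach(function (value) {
    result.score += value.score;
    result.user_id = value.user_id;
    result.subject_id = value.subject_id;
  });
  return result;
};

// Values emitted for key "2-2" (user 2, subject 2) in the scores collection.
const a = { subject_id: 2, user_id: 2, score: 10 };
const b = { subject_id: 2, user_id: 2, score: 5 };
// Hypothetical third score, not part of the original data set.
const c = { subject_id: 2, user_id: 2, score: 1 };

const fromSlides = reduceScores("2-2", [a, b]);
const allAtOnce = reduceScores("2-2", [a, b, c]);
const inTwoPasses = reduceScores("2-2", [reduceScores("2-2", [a, b]), c]);

console.log(fromSlides.score, allAtOnce.score, inTwoPasses.score); // 15 16 16
```

If reduce returned a bare number while map emitted objects, the second pass would read `value.score` on a number and the sum would break.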
  • 38. Dealing with Rails queries
      ruby-1.9.2-p180 :007 > ReducedScores.first
       => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'), "subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>
      ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
       => 2
      ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
       => 4.0
      ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
       => 2.0