Sample Training Dataset

On this page

Collections
sample_training.companies
sample_training.grades
sample_training.inspections
sample_airbnb.listingsAndReviews
sample_training.routes
sample_training.trips
sample_training.zips

The sample_training database contains a set of realistic data used in MongoDB Private Training Offerings. This dataset is based on public available data sources such as:

These realistic datasets are used by our students to explore MongoDB's functionality across our private training labs and exercises.

To learn how to load the sample data provided by Atlas into your cluster, see Load Sample Data.

Collections

The sample_training database contains the following collections:

Collection Name	Description
companies	Contains a list of Crunchbase Data company information.
grades	Contains student grade information on a given class, including scores on different assessments.
inspections	Contains a list of New York City business inspections, including whether the business failed or passed the inspection.
posts	Contains randomized US Senate speeches organized as blog posts with randomly generated comments.
routes	Contains information of airline routes, with source and destination airports, the service airline and the type of airplane. This collection is used in labs that explore the $graphLookup aggregation stage.
trips	Contains New York City Citibike Data trips data. This data is useful to explore the $graphLookup aggregation stage and showcase Geospatial Queries .
zips	Contains United States general cities postal/zip code data.

`sample_training.companies`

This collection contains information on companies listed on Crunchbase. It has a variety of information such as the company website and/or blog websites about the company, funding rounds, and known individuals associated with the company.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
  "_id": {
      "$oid": "52cdef7c4bab8bd675298291"
  },
  "acquisition": null,
  "acquisitions": [],
  "alias_list": null,
  "blog_feed_url": "http://mobiance.wordpress.com/feed/",
  "blog_url": "http://mobiance.wordpress.com/",
  "category_code": "web",
  "competitions": [],
  "created_at": "Tue Feb 12 17:31:58 UTC 2008",
  "crunchbase_url": "http://www.crunchbase.com/company/mobiance",
  "deadpooled_day": null,
  "deadpooled_month": null,
  "deadpooled_url": null,
  "deadpooled_year": null,
  "description": null,
  "email_address": "info@mobiance.com",
  "external_links": [],
  "founded_day": {
      "$numberInt": "1"
  },
  "founded_month": {
      "$numberInt": "10"
  },
  "founded_year": {
      "$numberInt": "2004"
  },
  "funding_rounds": [],
  "homepage_url": "http://www.mobiance.com",
  "image": {
      "attribution": null,
      "available_sizes": [
          [
              [
                  {
                      "$numberInt": "150"
                  },
                  {
                      "$numberInt": "43"
                  }
              ],
              "assets/images/resized/0001/1859/11859v1-max-150x150.png"
          ],
          [
              [
                  {
                      "$numberInt": "208"
                  },
                  {
                      "$numberInt": "60"
                  }
              ],
              "assets/images/resized/0001/1859/11859v1-max-250x250.png"
          ],
          [
              [
                  {
                      "$numberInt": "208"
                  },
                  {
                      "$numberInt": "60"
                  }
              ],
              "assets/images/resized/0001/1859/11859v1-max-450x450.png"
          ]
      ]
  },
  "investments": [],
  "ipo": null,
  "milestones": [],
  "name": "Mobiance",
  "number_of_employees": {
      "$numberInt": "5"
  },
  "offices": [
      {
          "address1": "BC-3, Atrium Business Center,",
          "address2": "Coles Road, Frazer Town,",
          "city": "Bangalore",
          "country_code": "IND",
          "description": null,
          "latitude": null,
          "longitude": null,
          "state_code": null,
          "zip_code": "560005"
      }
  ],
  "overview": "<p>Mobiance provides the technology to track cell phones ...",
  "partners": [],
  "permalink": "mobiance",
  "phone_number": "+91-80- 41264756",
  "products": [],
  "providerships": [],
  "relationships": [
      {
          "is_past": true,
          "person": {
              "first_name": "Ritesh",
              "last_name": "Ambastha",
              "permalink": "ritesh-ambastha"
          },
          "title": "Product Manager"
      }
  ],
  "screenshots": [],
  "tag_list": null,
  "total_money_raised": "$0",
  "twitter_username": null,
  "updated_at": "Thu Dec 01 07:37:10 UTC 2011",
  "video_embeds": []
}

`sample_training.grades`

This collection has randomly generated student grades. Each document contains a class_id that identifies the class and a student_id that identifies the student. All student class exam scores are stored in the scores array, which contains subdocuments with two fields representing the type of assessment and the student score for that assessment.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
    "_id": {
        "$oid": "56d5f7eb604eb380b0d8d8fa"
    },
    "class_id": {
        "$numberDouble": "173"
    },
    "scores": [
        {
            "score": {
                "$numberDouble": "19.81430597438296"
            },
            "type": "exam"
        },
        {
            "score": {
                "$numberDouble": "16.851404299968642"
            },
            "type": "quiz"
        },
        {
            "score": {
                "$numberDouble": "60.108751761488186"
            },
            "type": "homework"
        },
        {
            "score": {
                "$numberDouble": "22.886167083915776"
            },
            "type": "homework"
        }
    ],
    "student_id": {
        "$numberDouble": "4"
    }
}

`sample_training.inspections`

The inspections collection was taken from the NYC OpenData dataset. Each inspections document contains information about:

The inspected business name, sector and address,
Inspection id, result, date and certificate number.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
   "_id": {
     "$oid": "56d61033a378eccde8a8357e"
   },
   "address": {
       "city": "LAWRENCE",
       "number": 1,
       "street": "BAY BLVD",
       "zip": 11559
   },
   "business_name": "SPRAGUE OPERATING RESOURCES LLC.",
   "certificate_number": 3019422,
   "date": "Mar  3 2015",
   "id": "11247-2015-ENFO",
   "result": "Fail",
   "sector": "Fuel Oil Dealer - 814"
}

`sample_airbnb.listingsAndReviews`

The posts collection is a set of randomly generated blog posts created using US Senate speeches as the seed for the document body field. On each document you will find:

Information on the blog posts like body text, author, permalink, date and title,
Randomly generated list of tags,
Randomly generated list of comment subdocuments.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
    "_id": {
      "$oid": "50ab0f8bbcf1bfe2536dc3f9"
    },
    "author": "machine",
    "body": "Amendment I\n<p>Congress shall make no law respecting ...  ",
    "comments": [
        {
            "author": "Santiago Dollins",
            "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...",
            "email": "HvizfYVx@pKvLaagH.com"
        },
        {
            "author": "Jaclyn Morado",
            "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...",
            "email": "WpOUCpdD@hccdxJvT.com"
        }
        ...
    ],
    "date": {
      "$date": {
        "$numberLong": "1332804016000"
      }
    },
    "permalink": "aRjNnLZkJkTyspAIoRGe",
    "tags": [
        "watchmaker",
        "santa",
        "xylophone",
        "math",
        "handsaw",
        "dream",
        "undershirt",
        "dolphin",
        "tanker",
        "action"
    ],
    "title": "Bill of Rights"
}

`sample_training.routes`

The routes collection data was sourced from the Open Flights data. The documents of this collection have information on airline routes between airports.

Each document contains information about:

Airline data in subdocument containing the name, alias, unique identifier and the IATA airline code,
The source and destination airports, identified their IATA airport code,
Route codeshare and the number of stops.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

 {
     "_id": {
       "$oid": "56e9b39b732b6122f877fa5c"
     },
     "airline": {
         "alias": "2G",
         "iata": "CRG",
         "id": 1654,
         "name": "Cargoitalia"
     },
     "airplane": "A81",
     "codeshare": "",
     "dst_airport": "OVB",
     "src_airport": "BTK",
     "stops": 0
 }

`sample_training.trips`

The trips collection contains bike trips data from the New York City Citibike service. The documents are composed of:

Bicycle unique identifier,
Trip start and stop time and date,
Trip start and end stations names and geospatial location,
User information such as gender, year of birth and service type (Customer or Subscriber).

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
    "_id": {
      "$oid": "572bb8222b288919b68abf82"
    },
    "bikeid": 14785,
    "birth year": 1977,
    "end station id": 433,
    "end station location": {
        "coordinates": [
            -73.98057249,
            40.72955361
        ],
        "type": "Point"
    },
    "end station name": "E 13 St & Avenue A",
    "gender": 1,
    "start station id": 518,
    "start station location": {
        "coordinates": [
            -73.9734419,
            40.74780373
        ],
        "type": "Point"
    },
    "start station name": "E 39 St & 2 Ave",
    "start time": {
      "$date": {
        "$numberLong": "1332804016000"
      }
    },
    "stop time": {
      "$date": {
        "$numberLong": "1352114016000"
      }
    },
    "tripduration": 812,
    "usertype": "Subscriber"
}

`sample_training.zips`

The zips collection contains information of US cities and their area postal/zip code. Documents contain information on the city name, area zip code, city center geo coordinates (latitude and longitude), state and population.

This dataset is used to explore 2d Index creation and queries.

Indexes

This collection contains the following indexes:

Name	Index	Description
`_id_`	`{ "_id": 1 }`	Primary key index on the `_id` field.

Sample Document

{
  "_id": {
    "$oid": "5c8eccc1caa187d17ca6ed29"
  },
  "city": "CLEVELAND",
  "loc": {
      "x": 86.559355,
      "y": 33.992106
  },
  "pop": 2369,
  "state": "AL",
  "zip": "35049"
}

Back

Supply Store

Weather