Sample Training Dataset
The sample_training database contains a set of realistic data used in MongoDB Private Training Offerings.
This dataset is based on publicly available data sources, including Crunchbase, NYC OpenData, Open Flights, and New York City Citibike.
Our students use these realistic datasets to explore MongoDB's functionality across our private training labs and exercises.
To learn how to load the sample data provided by Atlas into your cluster, see Load Sample Data.
Collections
The sample_training database contains the following collections:
Collection Name | Description |
---|---|
companies | Contains a list of Crunchbase company information. |
grades | Contains student grade information on a given class, including scores on different assessments. |
inspections | Contains a list of New York City business inspections, including whether the business failed or passed the inspection. |
posts | Contains randomized US Senate speeches organized as blog posts with randomly generated comments. |
routes | Contains information on airline routes, with source and destination airports, the service airline and the type of airplane. This collection is used in labs that explore the $graphLookup aggregation stage. |
trips | Contains New York City Citibike trip data. This data is useful to explore the $graphLookup aggregation stage and showcase Geospatial Queries. |
zips | Contains United States general cities postal/zip code data. |
sample_training.companies
This collection contains information on companies listed on Crunchbase. Each document includes details such as the company's website and blog URLs, funding rounds, and known individuals associated with the company.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "52cdef7c4bab8bd675298291" }, "acquisition": null, "acquisitions": [], "alias_list": null, "blog_feed_url": "http://mobiance.wordpress.com/feed/", "blog_url": "http://mobiance.wordpress.com/", "category_code": "web", "competitions": [], "created_at": "Tue Feb 12 17:31:58 UTC 2008", "crunchbase_url": "http://www.crunchbase.com/company/mobiance", "deadpooled_day": null, "deadpooled_month": null, "deadpooled_url": null, "deadpooled_year": null, "description": null, "email_address": "info@mobiance.com", "external_links": [], "founded_day": { "$numberInt": "1" }, "founded_month": { "$numberInt": "10" }, "founded_year": { "$numberInt": "2004" }, "funding_rounds": [], "homepage_url": "http://www.mobiance.com", "image": { "attribution": null, "available_sizes": [ [ [ { "$numberInt": "150" }, { "$numberInt": "43" } ], "assets/images/resized/0001/1859/11859v1-max-150x150.png" ], [ [ { "$numberInt": "208" }, { "$numberInt": "60" } ], "assets/images/resized/0001/1859/11859v1-max-250x250.png" ], [ [ { "$numberInt": "208" }, { "$numberInt": "60" } ], "assets/images/resized/0001/1859/11859v1-max-450x450.png" ] ] }, "investments": [], "ipo": null, "milestones": [], "name": "Mobiance", "number_of_employees": { "$numberInt": "5" }, "offices": [ { "address1": "BC-3, Atrium Business Center,", "address2": "Coles Road, Frazer Town,", "city": "Bangalore", "country_code": "IND", "description": null, "latitude": null, "longitude": null, "state_code": null, "zip_code": "560005" } ], "overview": "<p>Mobiance provides the technology to track cell phones ...", "partners": [], "permalink": "mobiance", "phone_number": "+91-80- 41264756", "products": [], "providerships": [], "relationships": [ { "is_past": true, "person": { "first_name": "Ritesh", "last_name": "Ambastha", "permalink": "ritesh-ambastha" }, "title": "Product Manager" } ], "screenshots": [], "tag_list": null, "total_money_raised": "$0", "twitter_username": null, "updated_at": "Thu Dec 01 07:37:10 UTC 2011", 
"video_embeds": [] }
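The sample document above is shown in canonical Extended JSON, where numbers and ObjectIds are wrapped in type markers such as $numberInt and $oid. As a minimal sketch that uses only the Python standard library, the wrappers can be collapsed into plain values while parsing. The excerpt and the helper name `unwrap_extended_json` are illustrative assumptions, and the helper handles only the wrappers that appear in this excerpt; PyMongo's `bson.json_util` module is the fuller option when that driver is available.

```python
import json

# Excerpt of the sample document above, kept in canonical Extended JSON form.
RAW = """
{
  "_id": { "$oid": "52cdef7c4bab8bd675298291" },
  "name": "Mobiance",
  "founded_year": { "$numberInt": "2004" },
  "number_of_employees": { "$numberInt": "5" },
  "funding_rounds": []
}
"""


def unwrap_extended_json(obj):
    """object_hook for json.loads: collapse single-key type wrappers.

    Hypothetical helper for illustration; it covers numeric wrappers and $oid only.
    """
    if len(obj) == 1:
        ((key, value),) = obj.items()
        if key in ("$numberInt", "$numberLong"):
            return int(value)
        if key == "$numberDouble":
            return float(value)
        if key == "$oid":
            return value  # keep the ObjectId as its 24-character hex string
    return obj


company = json.loads(RAW, object_hook=unwrap_extended_json)
print(company["name"], company["founded_year"], company["number_of_employees"])
```

Because `json.loads` calls the hook on the innermost objects first, nested wrappers are unwrapped before their parents are visited.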
sample_training.grades
This collection has randomly generated student grades.
Each document contains a class_id that identifies the class and a student_id that identifies the student.
All of the student's scores for the class are stored in the scores array, which contains subdocuments with two fields: the type of assessment and the student's score for that assessment.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "56d5f7eb604eb380b0d8d8fa" }, "class_id": { "$numberDouble": "173" }, "scores": [ { "score": { "$numberDouble": "19.81430597438296" }, "type": "exam" }, { "score": { "$numberDouble": "16.851404299968642" }, "type": "quiz" }, { "score": { "$numberDouble": "60.108751761488186" }, "type": "homework" }, { "score": { "$numberDouble": "22.886167083915776" }, "type": "homework" } ], "student_id": { "$numberDouble": "4" } }
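To illustrate how the scores array might be consumed, the sketch below averages the sample document's scores per assessment type in plain Python. It assumes the Extended JSON $numberDouble wrappers have already been converted to floats; the function name `average_by_type` is hypothetical.

```python
from collections import defaultdict

# The sample document above, with $numberDouble wrappers converted to floats.
grade = {
    "class_id": 173.0,
    "student_id": 4.0,
    "scores": [
        {"type": "exam", "score": 19.81430597438296},
        {"type": "quiz", "score": 16.851404299968642},
        {"type": "homework", "score": 60.108751761488186},
        {"type": "homework", "score": 22.886167083915776},
    ],
}


def average_by_type(scores):
    """Group score subdocuments by their type field and average each group."""
    grouped = defaultdict(list)
    for entry in scores:
        grouped[entry["type"]].append(entry["score"])
    return {kind: sum(values) / len(values) for kind, values in grouped.items()}


averages = average_by_type(grade["scores"])
for kind, value in averages.items():
    print(f"{kind}: {value:.2f}")
```

The same grouping could be expressed server-side with an aggregation pipeline; the local version is shown here only because it needs no running cluster.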
sample_training.inspections
The inspections collection was taken from the NYC OpenData dataset.
Each inspections document contains information about:
The inspected business name, sector and address,
Inspection id, result, date and certificate number.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "56d61033a378eccde8a8357e" }, "address": { "city": "LAWRENCE", "number": 1, "street": "BAY BLVD", "zip": 11559 }, "business_name": "SPRAGUE OPERATING RESOURCES LLC.", "certificate_number": 3019422, "date": "Mar 3 2015", "id": "11247-2015-ENFO", "result": "Fail", "sector": "Fuel Oil Dealer - 814" }
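Note that the date field in the sample document is a string rather than a BSON date, so it needs parsing before any date comparison. The following sketch assumes every document uses the same "Mar 3 2015" layout as the sample and that the default C/English locale is active; `parse_inspection_date` is a hypothetical helper.

```python
from datetime import datetime

# Subset of the sample document above.
inspection = {
    "id": "11247-2015-ENFO",
    "date": "Mar 3 2015",
    "result": "Fail",
}


def parse_inspection_date(text):
    """Parse an abbreviated-month date string such as 'Mar 3 2015'.

    Assumes the format seen in the sample document; %b depends on the locale.
    """
    return datetime.strptime(text, "%b %d %Y").date()


inspected_on = parse_inspection_date(inspection["date"])
failed = inspection["result"] == "Fail"
print(inspected_on.isoformat(), "failed" if failed else "did not fail")
```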
sample_training.posts
The posts collection is a set of randomly generated blog posts created using US Senate speeches as the seed for the document body field.
Each document contains:
Information on the blog posts like body text, author, permalink, date and title,
Randomly generated list of tags,
Randomly generated list of comment subdocuments.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "50ab0f8bbcf1bfe2536dc3f9" }, "author": "machine", "body": "Amendment I\n<p>Congress shall make no law respecting ... ", "comments": [ { "author": "Santiago Dollins", "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...", "email": "HvizfYVx@pKvLaagH.com" }, { "author": "Jaclyn Morado", "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...", "email": "WpOUCpdD@hccdxJvT.com" } ... ], "date": { "$date": { "$numberLong": "1332804016000" } }, "permalink": "aRjNnLZkJkTyspAIoRGe", "tags": [ "watchmaker", "santa", "xylophone", "math", "handsaw", "dream", "undershirt", "dolphin", "tanker", "action" ], "title": "Bill of Rights" }
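The date field in the sample document is stored in canonical Extended JSON as milliseconds since the Unix epoch. As a small sketch using only the standard library, the value can be converted to a timezone-aware datetime; the helper name `extended_date_to_datetime` is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone

# The date field exactly as it appears in the sample document above.
date_field = {"$date": {"$numberLong": "1332804016000"}}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extended_date_to_datetime(value):
    """Convert a canonical Extended JSON $date wrapper to an aware UTC datetime."""
    millis = int(value["$date"]["$numberLong"])
    # Adding a timedelta keeps millisecond precision without float rounding.
    return EPOCH + timedelta(milliseconds=millis)


posted_at = extended_date_to_datetime(date_field)
print(posted_at.isoformat())
```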
sample_training.routes
The routes collection data was sourced from the Open Flights data.
The documents of this collection have information on airline routes between airports.
Each document contains information about:
Airline data in a subdocument containing the name, alias, unique identifier and the IATA airline code,
The source and destination airports, identified by their IATA airport code,
Route codeshare and the number of stops.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "56e9b39b732b6122f877fa5c" }, "airline": { "alias": "2G", "iata": "CRG", "id": 1654, "name": "Cargoitalia" }, "airplane": "A81", "codeshare": "", "dst_airport": "OVB", "src_airport": "BTK", "stops": 0 }
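Since this collection is used to explore the $graphLookup stage, here is one possible pipeline, sketched as plain Python data so it can be inspected without a cluster. It follows routes onward from each destination airport by joining dst_airport to src_airport. The function name, the output field names `onward_routes` and `hops`, and the default depth are assumptions made for this example.

```python
def connections_pipeline(src_airport, max_depth=1):
    """Build an aggregation pipeline that finds onward routes from an airport.

    max_depth=0 returns only routes leaving the first destination;
    each extra level follows one more connection.
    """
    return [
        # Start from every route that departs the given airport.
        {"$match": {"src_airport": src_airport}},
        {
            "$graphLookup": {
                "from": "routes",
                "startWith": "$dst_airport",
                "connectFromField": "dst_airport",
                "connectToField": "src_airport",
                "as": "onward_routes",  # hypothetical output field name
                "maxDepth": max_depth,
                "depthField": "hops",  # hypothetical depth field name
            }
        },
    ]


pipeline = connections_pipeline("BTK")
# With PyMongo and the sample data loaded, this could be run as:
#   client["sample_training"]["routes"].aggregate(pipeline)
print(pipeline)
```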
sample_training.trips
The trips collection contains bike trip data from the New York City Citibike service.
The documents are composed of:
Bicycle unique identifier,
Trip start and stop time and date,
Trip start and end station names and geospatial location,
User information such as gender, year of birth and service type (Customer or Subscriber).
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "572bb8222b288919b68abf82" }, "bikeid": 14785, "birth year": 1977, "end station id": 433, "end station location": { "coordinates": [ -73.98057249, 40.72955361 ], "type": "Point" }, "end station name": "E 13 St & Avenue A", "gender": 1, "start station id": 518, "start station location": { "coordinates": [ -73.9734419, 40.74780373 ], "type": "Point" }, "start station name": "E 39 St & 2 Ave", "start time": { "$date": { "$numberLong": "1332804016000" } }, "stop time": { "$date": { "$numberLong": "1352114016000" } }, "tripduration": 812, "usertype": "Subscriber" }
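Because the station locations are stored as GeoJSON points, they can be used in geospatial queries. The sketch below builds a $geoWithin filter with $centerSphere, which does not require a geospatial index. The function name and the 0.5 km radius are assumptions for illustration; the filter is constructed as plain Python data and is not executed here.

```python
EARTH_RADIUS_KM = 6378.1  # approximate equatorial radius of the Earth


def trips_starting_near(longitude, latitude, radius_km):
    """Build a filter for trips whose start station lies within radius_km.

    $centerSphere expects [longitude, latitude] and a radius in radians,
    so the distance is divided by the Earth's radius.
    """
    return {
        "start station location": {
            "$geoWithin": {
                "$centerSphere": [
                    [longitude, latitude],
                    radius_km / EARTH_RADIUS_KM,
                ]
            }
        }
    }


# Centre the search on the start station from the sample document above.
query = trips_starting_near(-73.9734419, 40.74780373, radius_km=0.5)
# With PyMongo and the sample data loaded, this could be run as:
#   client["sample_training"]["trips"].find(query)
print(query)
```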
sample_training.zips
The zips collection contains information on US cities and their area postal/zip code.
Documents contain information on the city name, area zip code, city center geo coordinates (latitude and longitude), state and population.
This dataset is used to explore 2d Index creation and queries.
Indexes
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document
{ "_id": { "$oid": "5c8eccc1caa187d17ca6ed29" }, "city": "CLEVELAND", "loc": { "x": 86.559355, "y": 33.992106 }, "pop": 2369, "state": "AL", "zip": "35049" }
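As a rough sketch of the 2d index exploration mentioned above, the snippet below defines an index specification for the loc field and a $geoWithin/$box filter over the legacy x/y coordinate pairs. The box corners and helper names are assumptions for illustration; a local check mirrors what the filter would match for the sample document, assuming the corners are given as lower-left then upper-right.

```python
# Index specification in the (field, type) form accepted by PyMongo's
# create_index; "2d" is the index type for legacy coordinate pairs.
LOC_2D_INDEX = [("loc", "2d")]


def zips_in_box(lower_left, upper_right):
    """Build a filter matching documents whose loc falls inside a rectangle."""
    return {"loc": {"$geoWithin": {"$box": [list(lower_left), list(upper_right)]}}}


def in_box(loc, lower_left, upper_right):
    """Local equivalent of the $box test for a loc subdocument with x and y."""
    return (
        lower_left[0] <= loc["x"] <= upper_right[0]
        and lower_left[1] <= loc["y"] <= upper_right[1]
    )


# The loc value from the sample document above.
sample_loc = {"x": 86.559355, "y": 33.992106}
query = zips_in_box((86.0, 33.0), (87.0, 34.0))
# With PyMongo and the sample data loaded, this could be run as:
#   zips = client["sample_training"]["zips"]
#   zips.create_index(LOC_2D_INDEX)
#   zips.find(query)
print(query, in_box(sample_loc, (86.0, 33.0), (87.0, 34.0)))
```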