Introduction to MongoDB $sample Operator

The $sample operator is an aggregation pipeline stage in MongoDB that randomly selects a specified number of documents from its input and passes them on as output. It can be used to display random documents or to draw samples for testing.

Syntax

The syntax of the $sample operator is as follows:

{
  $sample: {
    size: <positive integer>
  }
}

Here, size is required and specifies the number of documents to randomly select. It must be a positive integer.
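Because size must be a positive integer, it can help to validate the value before building the stage. The following is a minimal sketch in plain JavaScript; sampleStage is a hypothetical helper introduced here for illustration, not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $sample stage
// after checking that size is a positive integer.
function sampleStage(size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer, got: " + size);
  }
  return { $sample: { size: size } };
}

// Build a stage that selects 2 random documents.
const stage = sampleStage(2);
console.log(JSON.stringify(stage));
```

In mongosh, the resulting object could be passed directly inside a pipeline, for example db.users.aggregate([stage]).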

Use Cases

The $sample operator is mainly used in the following scenarios:

  • Randomly selecting documents for display
  • Randomly sampling data sets
  • Testing, for example drawing a random subset of production-like data

Example

Here is an example of using the $sample operator.

Assume there is a users collection containing the following documents:

{ "_id": 1, "name": "Alice", "age": 28 }
{ "_id": 2, "name": "Bob", "age": 35 }
{ "_id": 3, "name": "Charlie", "age": 42 }
{ "_id": 4, "name": "David", "age": 19 }
{ "_id": 5, "name": "Eva", "age": 25 }

The following aggregation pipeline randomly selects 2 documents:

db.users.aggregate([{ $sample: { size: 2 } }])

The output contains two randomly chosen documents, so it varies from run to run. One possible result is:

{ "_id": 2, "name": "Bob", "age": 35 }
{ "_id": 4, "name": "David", "age": 19 }
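$sample can also follow other stages, in which case it samples from their output rather than from the whole collection. The sketch below assumes the users collection shown above and uses an arbitrary age threshold of 25 as the filter. With the sample data, four documents pass the filter and two of them are picked at random.

```javascript
// Pipeline: keep users aged 25 or older, then pick up to 2 of them at random.
const pipeline = [
  { $match: { age: { $gte: 25 } } },
  { $sample: { size: 2 } }
];

// Run only where a mongosh-style `db` object exists.
if (typeof db !== "undefined") {
  db.users.aggregate(pipeline).forEach(doc => printjson(doc));
}
```

David (age 19) is removed by the $match stage, so he never appears in this result.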

Conclusion

The $sample operator randomly selects a specified number of documents in an aggregation pipeline. It is useful for displaying random documents, sampling data sets, and testing. Note that $sample can be expensive on large datasets, because in some cases MongoDB must read every input document and perform a random sort to pick the sample, so weigh the cost against the size of the data being sampled.
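As a rough guide to the performance note, the MongoDB manual describes $sample as using a faster pseudo-random cursor only when it is the first stage of the pipeline, size is less than 5% of the documents in the collection, and the collection holds more than 100 documents. Otherwise it reads all input documents and performs a random sort. The sketch below models those conditions. likelyUsesRandomCursor is a hypothetical helper, and the actual behavior may depend on the server version.

```javascript
// Hypothetical helper (not a MongoDB API): models the documented conditions
// under which $sample can avoid scanning and randomly sorting its input.
function likelyUsesRandomCursor(isFirstStage, size, totalDocs) {
  return isFirstStage && totalDocs > 100 && size < 0.05 * totalDocs;
}

console.log(likelyUsesRandomCursor(true, 10, 1000000));  // true
console.log(likelyUsesRandomCursor(false, 10, 1000000)); // false: a stage precedes $sample
console.log(likelyUsesRandomCursor(true, 2, 5));         // false: collection too small
```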