Introduction to MongoDB $substrBytes Operator

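The $substrBytes operator is a string aggregation operator in MongoDB that extracts a substring from a string. Unlike the $substrCP operator, which counts positions in Unicode code points, $substrBytes counts positions in UTF-8 bytes. For ASCII-only strings the two operators behave identically, because every character occupies exactly one byte. For strings containing multi-byte characters, both the start index and the end of the requested range must fall on character boundaries; otherwise the operation fails with an error, so $substrCP is usually the safer choice for such text.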
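To make the difference between bytes and code points concrete, the following plain JavaScript sketch (run in Node.js or mongosh, not through an aggregation pipeline) counts both for the same string. The sample string is an assumption chosen for illustration.

```javascript
// "Zoë" written with an escape so the ë is the single code point U+00EB.
const text = "Zo\u00eb";

// UTF-8 encoding: Z and o take 1 byte each, ë takes 2 bytes.
const byteCount = new TextEncoder().encode(text).length;

// Spreading a string iterates by code point.
const codePointCount = [...text].length;

console.log(byteCount, codePointCount); // prints: 4 3
```

$substrBytes works in the first unit (bytes), while $substrCP works in the second (code points).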

Syntax

The syntax for the $substrBytes operator is as follows:

{ $substrBytes: [ <string>, <start>, <length> ] }

Where:

  • string: The original string from which to extract the substring.
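  • start: The zero-based byte index at which the substring begins. It must resolve to a non-negative integer and cannot fall in the middle of a multi-byte UTF-8 character.
  • length: The number of bytes to extract. It must resolve to a non-negative integer, and the resulting end position cannot fall in the middle of a multi-byte UTF-8 character.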
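The parameter rules above can be sketched in plain JavaScript. This is only an illustrative emulation, not MongoDB's implementation: substrBytesSketch is a hypothetical helper, it assumes non-negative integer arguments, and returning an empty string when start lies past the end of the string is an assumption of this sketch.

```javascript
// Hypothetical helper that mimics byte-based substring extraction on UTF-8.
function substrBytesSketch(str, start, length) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes of the string
  if (start >= bytes.length) return "";        // assumption: nothing to extract
  const end = Math.min(start + length, bytes.length);

  // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
  const isContinuation = (i) => i < bytes.length && (bytes[i] & 0xc0) === 0x80;

  if (isContinuation(start)) {
    throw new Error("start index is in the middle of a UTF-8 character");
  }
  if (isContinuation(end)) {
    throw new Error("end index is in the middle of a UTF-8 character");
  }
  return new TextDecoder().decode(bytes.subarray(start, end));
}

console.log(substrBytesSketch("John", 0, 2));      // prints: Jo
console.log(substrBytesSketch("Zo\u00eb", 0, 4));  // prints: Zoë
```

Calling substrBytesSketch("Zo\u00eb", 0, 3) throws, because the requested range would end between the two bytes of ë.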

Use Cases

The $substrBytes operator is typically used to extract a fixed-position portion of a string, such as a fixed-width field in a log line or a short prefix of an ASCII identifier. It works best when the byte offsets are known in advance and are guaranteed to fall on character boundaries.
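As a sketch of the log-field use case, the pipeline below pulls two fixed-width fields out of a log line. Everything about the data is an assumption made for illustration: a hypothetical logs collection, a line field that starts with a 20-byte ASCII timestamp, one space, and then a level padded to 5 bytes.

```javascript
// Assumed document shape (hypothetical):
// { line: "2024-05-01T12:00:00Z ERROR disk full" }
const pipeline = [
  {
    $project: {
      // Bytes 0-19: the 20-byte ASCII timestamp.
      timestamp: { $substrBytes: ["$line", 0, 20] },
      // Bytes 21-25: the 5-byte level, after the separating space at byte 20.
      level: { $substrBytes: ["$line", 21, 5] }
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.logs.aggregate(pipeline).forEach((doc) => printjson(doc));
}
```

Because both fields are pure ASCII and fixed-width, the byte offsets always land on character boundaries, which is the situation where $substrBytes is a good fit.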

Examples

Suppose the users collection contains the following documents:

{ "_id": 1, "name": "John", "address": "123 Main St, Anytown, USA" }
{ "_id": 2, "name": "Alice", "address": "456 Second St, Othertown, USA" }

The following example uses the $substrBytes operator to extract the first two bytes from the name field and stores the result in a new field, name_short:

db.users.aggregate([
  {
    $project: {
      name: 1,
      name_short: { $substrBytes: ["$name", 0, 2] }
    }
  }
])

After executing the above aggregation pipeline, the following results will be obtained:

{ "_id": 1, "name": "John", "name_short": "Jo" }
{ "_id": 2, "name": "Alice", "name_short": "Al" }

The following example uses the $substrBytes operator to extract 10 bytes starting at byte index 4 (the fifth byte) of the address field and stores the result in a new field, address_short:

db.users.aggregate([
  {
    $project: {
      address: 1,
      address_short: { $substrBytes: ["$address", 4, 10] }
    }
  }
])

After executing the above aggregation pipeline, the following results will be obtained:

{ "_id": 1, "address": "123 Main St, Anytown, USA", "address_short": "Main St, A" }
{ "_id": 2, "address": "456 Second St, Othertown, USA", "address_short": "Second St," }
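The examples above use ASCII-only values, where bytes and characters coincide. As a hedged sketch of what changes with multi-byte text, assume the users collection also held a hypothetical document { "_id": 3, "name": "Zoë" }, in which ë occupies two bytes. The pipeline below reports both lengths and takes the first three characters with $substrCP.

```javascript
const pipeline = [
  // $strLenBytes and $strLenCP raise an error on null or missing values,
  // so keep only documents whose name is a string.
  { $match: { name: { $type: "string" } } },
  {
    $project: {
      name: 1,
      byte_length: { $strLenBytes: "$name" },    // 4 for "Zoë"
      cp_length: { $strLenCP: "$name" },         // 3 for "Zoë"
      name_short: { $substrCP: ["$name", 0, 3] } // "Zoë"
    }
  }
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.users.aggregate(pipeline).forEach((doc) => printjson(doc));
}
```

Replacing the name_short expression with { $substrBytes: ["$name", 0, 3] } would fail for this document, because the third byte is the first half of ë. Comparing byte_length with cp_length is one way to detect values that contain multi-byte characters.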

Conclusion

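The $substrBytes operator is a string aggregation operator in MongoDB that extracts a substring from a string based on byte positions. Because it counts UTF-8 bytes rather than characters, it is best suited to ASCII data or to offsets known to fall on character boundaries; for general multi-byte text, $substrCP is the safer choice.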