
About Data Bags


Data bags store global variables as JSON data. Data bags are indexed for searching and can be loaded by a cookbook or accessed during a search.

Create a Data Bag

A data bag can be created in two ways: using knife or manually. In general, using knife to create data bags is recommended, but as long as the data bag folders and data bag item JSON files are created correctly, either method is safe and effective.

Create a Data Bag with Knife

knife can be used to create data bags and data bag items when the knife data bag subcommand is run with the create argument. For example:

knife data bag create DATA_BAG_NAME (DATA_BAG_ITEM)

knife can be used to update data bag items using the from file argument:

knife data bag from file BAG_NAME ITEM_NAME.json

As long as a file is in the correct directory structure, knife will be able to find the data bag and data bag item with only the name of the data bag and data bag item. For example:

knife data bag from file BAG_NAME ITEM_NAME.json

will load the following file:

data_bags/BAG_NAME/ITEM_NAME.json

Continuing the example above, if you are in the “admins” directory and make changes to the file charlie.json, then to upload that change to the Chef Infra Server use the following command:

knife data bag from file admins charlie.json

In some cases, such as when knife is not being run from the root directory for the chef-repo, the full path to the data bag item may be required. For example:

knife data bag from file BAG_NAME /path/to/file/ITEM_NAME.json

Manually

One or more data bags and data bag items can be created manually under the data_bags directory in the chef-repo. Any method can be used to create the data bag folders and data bag item JSON files. For example:

mkdir data_bags/admins

would create a data bag folder named “admins”. The equivalent command for using knife is:

knife data bag create admins

A data bag item can be created manually in the same way as the data bag, but by also specifying the file name for the data bag item (this example is using vi, a visual editor for UNIX):

vi data_bags/admins/charlie.json

would create a data bag item named “charlie.json” under the “admins” sub-directory in the data_bags directory of the chef-repo. The equivalent command for using knife is:

knife data bag create admins charlie

Store Data in a Data Bag

When the chef-repo is cloned from GitHub, the following occurs:

  • A directory named data_bags is created.
  • For each data bag, a sub-directory is created that has the same name as the data bag.
  • For each data bag item, a JSON file is created and placed in the appropriate sub-directory.

The data_bags directory can be placed under version control.

When deploying from a private repository using a data bag, use the deploy_key option to ensure the private key is present:

{
  "id": "my_app",
  ... (truncated) ...
  "deploy_key": "ssh_private_key"
}

where ssh_private_key is the same SSH private key as used with a private git repository and the new lines converted to \n.
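The newline conversion described above can be sketched in Ruby. This is a minimal illustration, assuming the item is serialized with the standard json library; the helper name deploy_item and the key text are made up for the example, and only the two fields shown above are included:

```ruby
require 'json'

# Build a data bag item hash holding a deploy key. Only the fields shown
# in the example above are included; everything else is omitted.
def deploy_item(app_id, private_key_text)
  { 'id' => app_id, 'deploy_key' => private_key_text }
end

# Placeholder text for illustration only, not a real key.
FAKE_KEY = "-----BEGIN RSA PRIVATE KEY-----\nMIIEexample\n-----END RSA PRIVATE KEY-----\n"

# JSON.generate escapes every line break as the two characters \n,
# so the multi-line key ends up on a single line of JSON.
puts JSON.generate(deploy_item('my_app', FAKE_KEY))
```

Parsing the generated JSON restores the original multi-line key, so no manual editing of the key text is needed.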

Directory Structure

All data bags are stored in the data_bags directory of the chef-repo. This directory structure is understood by knife so that the full path does not need to be entered when working with data bags from the command line. An example of the data_bags directory structure:

- data_bags
    -  admins
        -  charlie.json
        -  bob.json
        -  tom.json
    -  db_users
        -  charlie.json
        -  bob.json
        -  sarah.json
    -  db_config
        -  small.json
        -  medium.json
        -  large.json

where admins, db_users, and db_config are the names of individual data bags and all of the files that end with .json are the individual data bag items.
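The layout above can be inspected with a few lines of plain Ruby. The following is a sketch only, not part of knife: the helper name list_data_bags is hypothetical, and the example builds a throwaway copy of part of the layout in a temporary directory:

```ruby
require 'fileutils'
require 'tmpdir'

# Map every data bag (sub-directory) to the sorted ids of its items
# (JSON file names without the .json extension).
def list_data_bags(data_bags_dir)
  Dir.children(data_bags_dir).sort.each_with_object({}) do |bag, result|
    bag_dir = File.join(data_bags_dir, bag)
    next unless File.directory?(bag_dir)

    result[bag] = Dir.glob('*.json', base: bag_dir).map { |f| File.basename(f, '.json') }.sort
  end
end

# Build a throwaway copy of part of the layout above and list it.
Dir.mktmpdir do |repo|
  { 'admins' => %w[charlie bob tom], 'db_config' => %w[small medium large] }.each do |bag, items|
    FileUtils.mkdir_p(File.join(repo, 'data_bags', bag))
    items.each { |id| File.write(File.join(repo, 'data_bags', bag, "#{id}.json"), %({"id": "#{id}"})) }
  end
  p list_data_bags(File.join(repo, 'data_bags'))
end
```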

Data Bag Items

A data bag is a container of related data bag items, where each individual data bag item is a JSON file. knife can load a data bag item by specifying the name of the data bag to which the item belongs and then the filename of the data bag item. The only structural requirement of a data bag item is that it must have an id:

{
  /* This is a supported comment style */
  // This style is also supported
  "id": "ITEM_NAME",
  "key": "value"
}

where

  • key and value are the key:value pair for each additional attribute within the data bag item
  • /* ... */ and // ... show two ways to add comments to the data bag item
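The id requirement can be checked before an item is uploaded. The sketch below uses Ruby's standard json library and a hypothetical helper named parse_data_bag_item. Comments are left out of the sample input because comment handling depends on the parser in use:

```ruby
require 'json'

# Parse the text of a data bag item and check the one structural
# requirement named above: an "id" key must be present.
# Returns the parsed hash, or raises ArgumentError.
def parse_data_bag_item(json_text)
  item = JSON.parse(json_text)
  raise ArgumentError, 'data bag item must be a JSON object' unless item.is_a?(Hash)
  raise ArgumentError, 'data bag item must have an id' unless item.key?('id')

  item
end

item = parse_data_bag_item('{"id": "ITEM_NAME", "key": "value"}')
puts item['id']
```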

Encrypt a Data Bag Item

A data bag item may be encrypted using shared secret encryption. This allows each data bag item to store confidential information (such as a database password) or to be managed in a source control system (without plain-text data appearing in revision history). Each data bag item may be encrypted individually; if a data bag contains multiple encrypted data bag items, these data bag items are not required to share the same encryption keys.

Note

Because the contents of encrypted data bag items are not visible to the Chef Infra Server, search queries against data bags with encrypted items will not return any results.

Encryption Versions

The manner by which a data bag item is encrypted depends on the Chef Infra Client version used. See the following:

Infra Client version   Encryption v0   Encryption v1   Encryption v2   Encryption v3
10.x                   R W
11.0+                  R               R W
11.6+                  R D             R D             R W
13.0                   R D             R D             R D             R W

R = read, W = write, D = disable

Version 0

Chef Infra Client 0.10+

  • Uses YAML serialization format to encrypt data bag items
  • Uses Base64 encoding to preserve special characters
  • Uses AES-256-CBC encryption, as defined by the OpenSSL package in the Ruby Standard Library
  • Shared secret encryption; an encrypted file can only be decrypted by a node or a user with the same shared secret
  • Recipes load encrypted data with access to the shared secret in a file on the node or from a URI path
  • Encrypts only data bag item values. Keys are not encrypted and remain searchable
  • Data bag id value is unencrypted for tracking data bag items

Version 1

Chef Infra Client 11.0+

  • Includes all features of version 0
  • Uses JSON serialization format instead of YAML to encrypt data bag items
  • Adds a random initialization vector for each encrypted value to protect against cryptanalysis
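The version 1 behavior listed above (JSON serialization, AES-256-CBC, a random initialization vector per value, Base64 encoding) can be sketched with Ruby's OpenSSL library. This is an illustration under stated assumptions, not the Chef Infra Client implementation: the key derivation (SHA-256 of the shared secret), the wrapper key, and the helper names are all assumptions made for the example.

```ruby
require 'json'
require 'openssl'

V1_CIPHER = 'aes-256-cbc'

# Assumption: a 32-byte AES key is derived by hashing the shared secret.
def v1_key(shared_secret)
  OpenSSL::Digest.new('SHA256').digest(shared_secret)
end

def v1_encrypt(value, shared_secret)
  cipher = OpenSSL::Cipher.new(V1_CIPHER)
  cipher.encrypt
  cipher.key = v1_key(shared_secret)
  iv = cipher.random_iv # fresh random IV for every value
  # Wrap the value so bare strings and numbers serialize as a JSON object.
  plaintext = JSON.generate('wrapped' => value)
  encrypted = cipher.update(plaintext) + cipher.final
  {
    'encrypted_data' => [encrypted].pack('m0'), # Base64 on a single line
    'iv' => [iv].pack('m0'),
    'version' => 1,
    'cipher' => V1_CIPHER,
  }
end

def v1_decrypt(field, shared_secret)
  cipher = OpenSSL::Cipher.new(field['cipher'])
  cipher.decrypt
  cipher.key = v1_key(shared_secret)
  cipher.iv = field['iv'].unpack1('m0')
  decrypted = cipher.update(field['encrypted_data'].unpack1('m0')) + cipher.final
  JSON.parse(decrypted.force_encoding('UTF-8'))['wrapped']
end

field = v1_encrypt('thesecret123', 'my shared secret')
puts v1_decrypt(field, 'my shared secret')
```

Because the initialization vector is regenerated on every call, encrypting the same value twice produces different ciphertext, which is the property the random IV is meant to provide.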

Version 2

Chef Infra Client 11.6+

  • Includes all features of version 1
  • Option to disable versions 0 and 1
  • Adds Encrypt-then-MAC (EtM) protection
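The general Encrypt-then-MAC pattern can be sketched as follows. This is a generic illustration of the technique, not the Chef Infra Client code: the helper names are hypothetical, HMAC-SHA256 is an assumed choice of MAC, and random bytes stand in for the encrypted data and the MAC key.

```ruby
require 'openssl'

# Compute an HMAC-SHA256 tag over the ciphertext, never over the plaintext.
def etm_tag(ciphertext, mac_key)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), mac_key, ciphertext)
end

# Check the tag before any decryption is attempted. OpenSSL.secure_compare
# runs in constant time, so it does not leak how many bytes matched.
def etm_valid?(ciphertext, tag, mac_key)
  OpenSSL.secure_compare(etm_tag(ciphertext, mac_key), tag)
end

# Stand-in values for illustration: random bytes play the role of the
# encrypted data and of a MAC key derived from the shared secret.
ciphertext = OpenSSL::Random.random_bytes(48)
mac_key = OpenSSL::Random.random_bytes(32)
tag = etm_tag(ciphertext, mac_key)

puts etm_valid?(ciphertext, tag, mac_key)
puts etm_valid?(ciphertext + 'x', tag, mac_key)
```

Verifying the tag first means tampered ciphertext is rejected before the cipher ever processes it.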

Version 3

Chef Infra Client 13.0+

  • Option to disable version 0, 1, and 2

Knife Options

knife can encrypt and decrypt data bag items when the knife data bag subcommand is run with the create, edit, from file, or show arguments and the following options:

Option               Description
--secret SECRET      The encryption key that is used for values contained within a data bag item. If secret is not specified, Chef Infra Client looks for a secret at the path specified by the encrypted_data_bag_secret setting in the client.rb file.
--secret-file FILE   The path to the file that contains the encryption key.

Secret Keys

Encrypting a data bag item requires a secret key. A secret key can be created in any number of ways. For example, OpenSSL can be used to generate a random number, which can then be used as the secret key:

openssl rand -base64 512 | tr -d '\r\n' > encrypted_data_bag_secret

where encrypted_data_bag_secret is the name of the file which will contain the secret key. For example, to create a secret key named “my_secret_key”:

openssl rand -base64 512 | tr -d '\r\n' > my_secret_key

The tr command removes all carriage returns and line feeds from the output, so the key is stored on a single line. Doing so avoids key corruption when transferring the file between platforms with different line endings.
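A comparable key can be generated from Ruby with the standard securerandom library. This is a sketch of an alternative, assuming a single-line Base64 string is all that is needed; the helper name generate_secret is hypothetical:

```ruby
require 'securerandom'

# Generate 512 random bytes and Base64-encode them on a single line,
# mirroring `openssl rand -base64 512 | tr -d '\r\n'`.
def generate_secret(byte_count = 512)
  SecureRandom.base64(byte_count)
end

secret = generate_secret
puts secret.length
```

SecureRandom.base64 does not insert line breaks, so no separate step is needed to strip them.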

Encrypt

A data bag item is encrypted using a knife command similar to:

knife data bag create passwords mysql --secret-file /tmp/my_data_bag_key

where “passwords” is the name of the data bag, “mysql” is the name of the data bag item, and “/tmp/my_data_bag_key” is the path to the location in which the file that contains the secret-key is located. knife will ask for user credentials before the encrypted data bag item is saved.

Verify Encryption

When the contents of a data bag item are encrypted, they will not be readable until they are decrypted. Encryption can be verified with a knife command similar to:

knife data bag show passwords mysql

where “passwords” is the name of the data bag and “mysql” is the name of the data bag item. This will return something similar to:

id:   mysql
pass:
  cipher:         aes-256-cbc
  encrypted_data: JZtwXpuq4Hf5ICcepJ1PGQohIyqjNX6JBc2DGpnL2WApzjAUG9SkSdv75TfKSjX4
  iv:             VYY2qx9b4r3j0qZ7+RkKHg==
  version:        1
user:
  cipher:         aes-256-cbc
  encrypted_data: 10BVoNb/plkvkrzVdybPgFFII5GThZ3Op9LNkwVeKpA=
  iv:             uIqKHZ9skJlN2gpJoml6rQ==
  version:        1
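The same check can be scripted. The sketch below is a heuristic based only on the shape of the output above (every value other than id is a hash with cipher, encrypted_data, iv, and version keys); the helper name encrypted_item? is hypothetical and this is not a Chef API:

```ruby
# Keys that every encrypted field carries in the output above.
ENCRYPTED_FIELD_KEYS = %w[cipher encrypted_data iv version].freeze

# True when every value except "id" is a hash with the encrypted-field keys.
def encrypted_item?(item)
  fields = item.reject { |key, _| key == 'id' }.values
  return false if fields.empty?

  fields.all? do |value|
    value.is_a?(Hash) && ENCRYPTED_FIELD_KEYS.all? { |k| value.key?(k) }
  end
end

# The "user" field from the output above, as a Ruby hash.
stored = {
  'id' => 'mysql',
  'user' => {
    'cipher' => 'aes-256-cbc',
    'encrypted_data' => '10BVoNb/plkvkrzVdybPgFFII5GThZ3Op9LNkwVeKpA=',
    'iv' => 'uIqKHZ9skJlN2gpJoml6rQ==',
    'version' => 1,
  },
}

puts encrypted_item?(stored)
puts encrypted_item?('id' => 'mysql', 'user' => 'fred')
```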

Decrypt

An encrypted data bag item is decrypted with a knife command similar to:

knife data bag show --secret-file /tmp/my_data_bag_key passwords mysql

that will return JSON output similar to:

{
  "id": "mysql",
  "pass": "thesecret123",
  "user": "fred"
}

Edit a Data Bag Item

A data bag can be edited in two ways: using knife or by using the Chef management console.

Edit a Data Bag with Knife

Use the edit argument to edit the data contained in a data bag. If encryption is being used, the data bag will be decrypted, the data will be made available in the $EDITOR, and then encrypted again before saving it to the Chef Infra Server.

To edit an item named “charlie” that is contained in a data bag named “admins”, enter:

knife data bag edit admins charlie

to open the $EDITOR. Once opened, you can update the data before saving it to the Chef Infra Server. For example, by changing:

{
   "id": "charlie"
}

to:

{
   "id": "charlie",
   "uid": 1005,
   "gid": "ops",
   "shell": "/bin/zsh",
   "comment": "Crazy Charlie"
}

Use Data Bags

Data bags can be accessed in the following ways: through a search query, from a recipe, or with chef-solo.

Search

Any search for a data bag (or a data bag item) must specify the name of the data bag and then provide the search query string that will be used during the search. For example, to use knife to search within a data bag named “admin_data” across all items, except for the “admin_users” item, enter the following:

knife search admin_data "(NOT id:admin_users)"

Or, to include the same search query in a recipe, use a code block similar to:

search(:admin_data, 'NOT id:admin_users')

It may not be possible to know which data bag items will be needed. It may be necessary to load everything in a data bag (but not know what “everything” is). Using a search query is the ideal way to deal with that ambiguity, yet still ensure that all of the required data is returned. The following examples show how a recipe can use a series of search queries to search within a data bag named “admins”. For example, to find every administrator:

search(:admins, '*:*')

Or to search for an administrator named “charlie”:

search(:admins, 'id:charlie')

Or to search for an administrator with a group identifier of “ops”:

search(:admins, 'gid:ops')

Or to search for an administrator whose name begins with the letter “c”:

search(:admins, 'id:c*')

Data bag items that are returned by a search query can be used as if they were a hash. For example:

charlie = search(:admins, 'id:charlie').first
# => variable 'charlie' is set to the charlie data bag item
charlie['gid']
# => "ops"
charlie['shell']
# => "/bin/zsh"

The following recipe can be used to create a user for each administrator by loading all of the items from the “admins” data bag, looping through each admin in the data bag, and then creating a user resource so that each of those admins exists:

admins = data_bag('admins')

admins.each do |login|
  admin = data_bag_item('admins', login)
  home = "/home/#{login}"

  user(login) do
    uid admin['uid']
    gid admin['gid']
    shell admin['shell']
    comment admin['comment']
    home home
    manage_home true
  end
end

And then the same recipe, modified to load administrators using a search query (and using an array to store the results of the search query):

admins = []

search(:admins, '*:*').each do |admin|
  login = admin['id']

  admins << login

  home = "/home/#{login}"

  user(login) do
    uid admin['uid']
    gid admin['gid']
    shell admin['shell']
    comment admin['comment']
    home home
    manage_home true
  end
end

Environments

Values that are stored in a data bag are global to the organization and are available to any environment. There are two main strategies that can be used to store per-environment data within a data bag: by using a top-level key that corresponds to the environment or by using separate items for each environment.

A data bag that is storing a top-level key for an environment might look something like this:

{
  "id": "some_data_bag_item",
  "production" : {
    // Hash with all your data here
  },
  "testing" : {
    // Hash with all your data here
  }
}

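That data can then be accessed from a recipe using code similar to the following, where BAG_NAME is a placeholder for the name of the data bag that contains the item:

item = data_bag_item('BAG_NAME', 'some_data_bag_item')
item[node.chef_environment]['some_other_key']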

The other approach is to use separate items for each environment. Depending on the amount of data, it may all fit nicely within a single item. If this is the case, then creating different items for each environment may be a simple approach to providing per-environment values within a data bag. However, this approach is more time-consuming and may not scale to very large environments or when the data must be stored in many data bag items.
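The top-level key strategy can be tried outside of a recipe with a plain Ruby hash. In this sketch the db_host values and the helper name value_for are made up for illustration, and the environment argument stands in for node.chef_environment:

```ruby
# A data bag item using one top-level key per environment. The
# 'db_host' values are made up for illustration.
PER_ENV_ITEM = {
  'id' => 'some_data_bag_item',
  'production' => { 'db_host' => 'db.prod.example.com' },
  'testing' => { 'db_host' => 'db.test.example.com' },
}.freeze

# Look up a value for one environment. fetch raises KeyError for an
# unknown environment or key instead of silently returning nil.
def value_for(item, environment, key)
  item.fetch(environment).fetch(key)
end

puts value_for(PER_ENV_ITEM, 'production', 'db_host')
```

Using fetch rather than [] is a design choice for the example: a node in an environment that has no entry in the item fails loudly instead of converging with a nil value.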

Recipes

Data bags can be accessed by a recipe in the following ways:

  • Loaded by name when using the Chef Infra Language. Use this approach when only a single, known data bag item is required.
  • Accessed through the search indexes. Use this approach when more than one data bag item is required or when the contents of a data bag are looped through. The search indexes will bulk-load all of the data bag items, which will result in a lower overhead than if each data bag item were loaded by name.

Load with Chef Infra Language

The Chef Infra Language provides access to data bags and data bag items (including encrypted data bag items) with the following methods:

  • data_bag(bag), where bag is the name of the data bag.
  • data_bag_item('bag_name', 'item', 'secret'), where bag_name is the name of the data bag and item is the name of the data bag item. If 'secret' is not specified, Chef Infra Client will look for a secret at the path specified by the encrypted_data_bag_secret setting in the client.rb file.

The data_bag method returns an array with a key for each of the data bag items that are found in the data bag.

Some examples:

To load the secret from a file:

data_bag_item('bag', 'item', IO.read('secret_file'))

To load a single data bag named admins:

data_bag('admins')

The contents of a data bag item named justin:

data_bag_item('admins', 'justin')

will return something similar to:

# => {'comment'=>'Justin Currie', 'gid'=>1005, 'id'=>'justin', 'uid'=>1005, 'shell'=>'/bin/zsh'}

If the item is encrypted, data_bag_item will automatically decrypt it using the secret passed as the third argument or, if none is passed, the secret file named by the Chef::Config[:encrypted_data_bag_secret] setting, which defaults to /etc/chef/encrypted_data_bag_secret.

Create and edit

Creating and editing the contents of a data bag or a data bag item from a recipe is not recommended. The recommended method of updating a data bag or a data bag item is to use knife and the knife data bag subcommand. If this action must be done from a recipe, please note the following:

  • If two operations concurrently attempt to update the contents of a data bag, the last-written attempt will be the operation to update the contents of the data bag. This situation can lead to data loss, so organizations should take steps to ensure that only one Chef Infra Client is making updates to a data bag at a time.
  • Altering data bags from the node when using the open source Chef Infra Server requires the node’s API client to be granted admin privileges. In most cases, this is not advisable.

Given these risks, take steps to ensure that any subsequent actions are done carefully. The following examples show how a recipe can be used to create and edit the contents of a data bag or a data bag item using the Chef::DataBag and Chef::DataBagItem objects.

To create a data bag from a recipe:

users = Chef::DataBag.new
users.name('users')
users.create

To create a data bag item from a recipe:

sam = {
  'id' => 'sam',
  'Full Name' => 'Sammy',
  'shell' => '/bin/zsh',
}
databag_item = Chef::DataBagItem.new
databag_item.data_bag('users')
databag_item.raw_data = sam
databag_item.save

To edit the contents of a data bag item from a recipe:

sam = data_bag_item('users', 'sam')
sam['Full Name'] = 'Samantha'
sam.save

Create users

Chef Infra Client can create users on systems based on the contents of a data bag. For example, a data bag named “admins” can contain a data bag item for each of the administrators that will manage the various systems that each Chef Infra Client is maintaining. A recipe can load the data bag items and then create user accounts on the target system with code similar to the following:

# Load the keys of the items in the 'admins' data bag
admins = data_bag('admins')

admins.each do |login|
  # This causes a round-trip to the server for each admin in the data bag
  admin = data_bag_item('admins', login)
  homedir = "/home/#{login}"

  # for each admin in the data bag, make a user resource
  # to ensure they exist
  user(login) do
    uid admin['uid']
    gid admin['gid']
    shell admin['shell']
    comment admin['comment']
    home homedir
    manage_home true
  end
end

# Create an "admins" group on the system
# You might use this group in the /etc/sudoers file
# to provide sudo access to the admins
group 'admins' do
  gid '999'
  members admins
end

chef-solo

chef-solo can load data from a data bag as long as the contents of that data bag are accessible from a directory structure that exists on the same machine as chef-solo. The location of this directory is configurable using the data_bag_path option in the solo.rb file. The name of each sub-directory corresponds to a data bag and each JSON file within a sub-directory corresponds to a data bag item. Search is not available in recipes when they are run with chef-solo; use the data_bag() and data_bag_item() functions to access data bags and data bag items.
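The directory-to-data-bag mapping described above can be sketched in plain Ruby. The helpers solo_data_bag and solo_data_bag_item are hypothetical stand-ins that only imitate the lookup by path. They are not the chef-solo implementation, and the data_bag_path here is a temporary directory created for the example:

```ruby
require 'fileutils'
require 'json'
require 'tmpdir'

# Return the item ids in a bag: one per JSON file in the bag's directory.
def solo_data_bag(data_bag_path, bag)
  Dir.glob('*.json', base: File.join(data_bag_path, bag)).map { |f| File.basename(f, '.json') }.sort
end

# Load and parse one item from data_bag_path/bag/item.json.
def solo_data_bag_item(data_bag_path, bag, item)
  JSON.parse(File.read(File.join(data_bag_path, bag, "#{item}.json")))
end

Dir.mktmpdir do |data_bag_path|
  FileUtils.mkdir_p(File.join(data_bag_path, 'admins'))
  File.write(File.join(data_bag_path, 'admins', 'charlie.json'),
             JSON.generate('id' => 'charlie', 'gid' => 'ops'))

  solo_data_bag(data_bag_path, 'admins').each do |id|
    puts solo_data_bag_item(data_bag_path, 'admins', id)['gid']
  end
end
```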

Note

Use the chef-solo-search cookbook library (developed by Chef community member “edelight” and available from GitHub) to add data bag search capabilities to a chef-solo environment: https://github.com/edelight/chef-solo-search.
