@pkuczynski
Last active September 13, 2024 07:56
Read YAML file from Bash script
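parse_yaml.sh

#!/bin/sh
# Print one shell assignment per YAML key that has a value.
# Usage: parse_yaml <file> [prefix]
parse_yaml() {
   local prefix=$2
   # s: optional whitespace, w: key characters, fs: ASCII file separator (octal 034) used as field delimiter
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   # Split each "key: value" line into indent, key and value (the first rule handles double-quoted values)
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F$fs '{
      indent = length($1)/2;   # nesting depth, assuming two spaces per level
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

test.sh

#!/bin/sh

# include parse_yaml function
. parse_yaml.sh

# read yaml file
eval $(parse_yaml zconfig.yml "config_")

# access yaml content
echo $config_development_database

zconfig.yml

development:
  adapter: mysql2
  encoding: utf8
  database: my_database
  username: root
  password:
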
@wadewegner

The following update solves my issue ...

parse_yaml() {
  
  local prefix=$2
  local s='[[:space:]]*'
  local w='[a-zA-Z0-9_]*'
  local fs=$(echo @|tr @ '\034')
  
  sed "h;s/^[^:]*//;x;s/:.*$//;y/-/_/;G;s/\n//" $1 |
  sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
      -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" |
  awk -F$fs '{
    indent = length($1)/2;
    vname[indent] = $2;

    for (i in vname) {if (i > indent) {delete vname[i]}}
    if (length($3) > 0) {
        vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
        printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
    }
  }'
}

... but I've noticed that arrays like the following don't work:

data-plans:
  - ./data/YourCustomObject1__c-plan.json
  - ./data/YourCustomObject2__c-plan.json

@4383

4383 commented Feb 12, 2018

Niet is a tool that helps you extract data from a JSON or YAML file directly in your shell/Bash CLI.

$ pip install niet

Consider a yaml file named project.yaml with the following contents:

project:
   meta:
       name: project-sample

You can use niet like this:

$ PROJECT_NAME=$(niet project.yaml project.meta.name)
$ echo ${PROJECT_NAME}
project-sample

@wadewegner

wadewegner commented Feb 12, 2018

What if it was proj-ect instead of project? That's what I'm trying to solve.

UPDATE: it works!

Here's a test.yaml:

test: "test output"
my-test: "my-test output"
my-array:
  - one
  - two

And the output:

❯ niet test.yaml test
test output
❯ niet test.yaml my-test
my-test output
❯ niet test.yaml my-array
'one' 'two'

Sweet!

@4383

4383 commented Apr 5, 2018

@wadewegner Thanks for the reply!

A new stable release of niet (1.0.0) is available. It fixes some bad behaviors such as list handling (the quotes around items are removed).

You can install/update this by using:

$ pip install -U niet

Consider your example test.yaml:

$ cat test.yaml
# test.yaml
test: "test output"
my-test: "my-test output"
my-array:
  - one
  - two

With the update, your commands now behave as follows:

$ niet test.yaml test
test output
$ niet test.yaml my-test
my-test output
$ niet test.yaml my-array # quotes removed from items
one two
$ for el in $(niet test.yaml my-array); do echo ${el}; done
one
two

niet errors are now handled more cleanly:

$ for el in $(niet test.yaml my-array); do echo ${el}; done
one
two
$ echo $? # exit code is now available so you can handle errors more properly
0
$ niet test.yaml element.not.found
Element not found: element.not.found
$ echo $? # error during execution
1
$ niet fake.yaml test
Yaml file not found! Abort!
$ echo $? # error during execution
1
$ # You can now deal correctly with errors

Advanced usage

$ RESULTS=$(niet test.yaml element.not.found)
$ if [ "$?" = "1" ]; then echo ${RESULTS}; else echo "it works!"; fi
Element not found: element.not.found
$ RESULTS=$(niet test.yaml my-array)
$ if [ "$?" = "1" ]; then echo ${RESULTS}; else echo "it works!"; fi
it works!
$ # implement your own handling of the available results

You can also test with the samples available in the niet source code:

$ git clone https://github.com/gr0und-s3ct0r/niet.git
$ cd niet
$ pipenv shell
$ python setup.py install 
$ niet tests/samples/sample.yaml project.meta.tags
example sample for testing purpose

@jainapatel13

If my .yaml file is in a subdirectory, how do I parse it? My makefile is in the top directory, and my parse_yaml.sh and .yaml files are in a subdirectory.

@v1k0d3n

v1k0d3n commented Apr 29, 2018

i've been using @jasperes repo with some really good success. for a bash-only method of reading, parsing yaml into variables it's been working great! thanks for the work. i'll have to try out niet as well and see how that turns out. at least in my case, i'm obsessively trying to limit other language requirements as my intention is to run this on container linux. my whole project will switch to golang soon anyway, but @jasperes solution was great for laying out the general concepts for my team. i really appreciate this thread! thanks, everyone.

@byronmansfield

This is fantastic! Thank you all who have contributed to this. It was exactly what I was looking for. I think it was mentioned above by @wadewegner, it has issues with hyphenated hash keys such as proj-ect:. wadewegner you said that your first pure bash change worked for you, but had array issues. I was unsuccessful with using your approach. And unfortunately I am not able to introduce other tooling like python packages such as niet into my project. Has anyone been able to get this working for hyphenated keys?

@misterboe

How could I "echo" all given variables?
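
One possible approach, sketched under the assumption that the variables were created with the config_ prefix used in test.sh: filter the shell's variable list by that prefix. The helper name show_vars is made up for illustration, and the two assignments are stand-ins for what eval would define. Running parse_yaml without eval also prints every assignment it would make.

```shell
#!/bin/sh
# Stand-ins for the variables that eval $(parse_yaml zconfig.yml "config_") would define
config_development_adapter="mysql2"
config_development_database="my_database"

# Hypothetical helper: print every shell variable whose name starts with the given prefix
show_vars() {
   set | grep "^$1"
}

show_vars config_
```

The exact quoting of the printed values depends on the shell, so treat the output as something to read rather than to parse.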

@BGMP

BGMP commented Feb 3, 2019

Thank you very much!

@djentlguy

djentlguy commented Mar 13, 2019

I'm trying to use Martin's modified parser from Stack Overflow. Input file:

schemas:
- name: exports
  tables:
  - name: wolverine
    description: sample message here
      may overflow next line
    active_date: 2019-01-07 00:00:00
    columns:
    - name: height
      type: timestamp without time zone
    - name: strength
      type: bigint
      description: sample message here
        may overflow next line
      example: 21352352
    - name: power
      type: bigint
      description: sample message here
        may overflow next line
      example: 10001
  - name: cyclops
    description: sample message here
      may overflow next line
    active_date: 2018-12-15 00:00:00
    columns:
    - name: size
      type: datetime
      description: sample message here
        may overflow next line
      example: 2018-03-03 12:30:00
    - name: power
      type: timestamp without time zone
      description: sample message here
        may overflow next line

Parse function:

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|,$s\]$s\$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" $1 | \
   sed -ne "s|,$s}$s\$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
      if(length($2)== 0){  vname[indent]= ++idx[indent] };
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
      }
   }'
}

Current Output :

1="name: exports"
1_1="name: wolverine"
1_1_description="sample message here"
1_1_active_date="2019-01-07 00:00:00"
1_1_1="name: height"
1_1_1_type="timestamp without time zone"
1_1_2="name: strength"
1_1_2_type="bigint"
1_1_2_description="sample message here"
1_1_2_example="21352352"
1_1_3="name: power"
1_1_3_type="bigint"
1_1_3_description="sample message here"
1_1_3_example="10001"
1_2="name: cyclops"
1_2_description="sample message here"
1_2_active_date="2018-12-15 00:00:00"
1_2_1="name: size"
1_2_1_type="datetime"
1_2_1_description="sample message here"
1_2_1_example="2018-03-03 12:30:00"
1_2_2="name: power"
1_2_2_type="timestamp without time zone"
1_2_2_description="sample message here"

But I need the values as nested CSV instead of synthetic numbered markers. The idea is to express the nested relationship from parent to child elements. Everything at the same level is written comma-delimited, like below:

Wolverine.height,"timestamp without time zone",,
Wolverine.strength,bigint,"sample message here may overflow next line",21352352
Wolverine.power,bigint,"sample message here may overflow next line",10001
Cyclops.size,datetime,"sample message here may overflow next line","2018-03-03 12:30:00"
Cyclops.power,"timestamp without time zone",something,"sample message here may overflow next line",,

How can I reformat it? Can someone help?

@tec82263

Hi,
I tried to use this in Jenkins for job config parsing.
It ran successfully with Python 2.6.6 but not with Python 2.7.5.
Any help is really appreciated.

Ran with Python 2.6.6

  • echo 'Get Job Properties'
    Get Job Properties
  • /usr/bin/python -V
    Python 2.6.6
  • cd /bms/webapps/jenkins/workspace/pro-group/test/yaml_parser
  • . parse_yaml.sh
    ++ parse_yaml access_info.yaml accessconfig_

Ran with Python 2.7.5

  • echo 'Get Job Properties'
    Get Job Properties
  • /usr/bin/python -V
    Python 2.7.5
  • cd /bms/webapps/jenkins/workspace/pro-group/test/yaml_parser
  • . parse_yaml.sh
    /bms/webapps/jenkins/jenkins881879139786658308.sh: line 29: .: parse_yaml.sh: file not found
    Build step 'Execute shell' marked build as failure

@TenzinCando

Thank you for publishing! It helped save a lot of time 👍

@4383

4383 commented Oct 9, 2019

Hello,

niet now supports an eval output format, so you can do things like this:

$ pip install -U niet
$ echo '{"foo-biz": "bar", "fizz": {"buzz": ["zero", "one", "two", "three"]}}' | niet -f eval .
 foo_biz="bar";fizz__buzz=( zero one two three )
$ eval $(echo '{"foo-biz": "bar", "fizz": {"buzz": ["zero", "one", "two", "three"]}}' | niet -f eval .)
$ echo ${foo_biz}
bar
$ echo "${fizz__buzz[@]}"
zero one two three
$ eval $(echo '{"foo-biz": "bar", "fizz": {"buzz": ["zero", "one", "two", "three"]}}' | niet -f eval '"foo-biz"'); echo ${foo_biz}
bar
$ echo '{"foo-biz": "bar", "fizz": {"buzz": ["zero", "one", "two", "three"]}}' | niet -f eval fizz.buzz
fizz_buzz=( zero one two three );
$ eval $(echo '{"foo-biz": "bar", "fizz": {"buzz": ["zero", "one", "two", "three"]}}' | niet -f eval fizz.buzz)
$ for el in "${fizz_buzz[@]}"; do echo "$el"; done
zero
one
two
three

niet works with JSON and YAML input and can also convert between the two formats. Install or update it with pip install -U niet.

@mundrusatish

mundrusatish commented Nov 21, 2019

Thank you. Can we validate that username, as below, is mandatory and has a non-empty value? Right now it passes.

development:
  adapter: mysql2
  encoding: utf8
  database: my_database
  username:

If anyone has tried this, please share. Thanks much.
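
A minimal sketch of one way to check this after eval. parse_yaml only prints keys whose value is non-empty, so a key such as username with nothing after the colon never becomes a variable. The helper name require_var is hypothetical, and the assignments below are stand-ins for what the parser would define for this file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named variable is set and non-empty
require_var() {
   eval "_value=\${$1:-}"
   if [ -z "$_value" ]; then
      echo "missing required setting: $1" >&2
      return 1
   fi
}

# Stand-ins for the variables parse_yaml would define for the YAML above;
# username has no value, so the parser emits nothing for it
config_development_adapter="mysql2"
config_development_database="my_database"

require_var config_development_database && echo "database ok"
require_var config_development_username || echo "username is missing"
```

In a real script the second call would more likely be followed by exit 1 than by a message.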

@bukowa

bukowa commented Jan 26, 2020

@babubalagani

@pkuczynski, this works for me as well for reading keys from a YAML file. Thanks for that. Do you have a script to replace a value after reading it? I have to replace each value that I read.

@inieves

inieves commented Apr 25, 2020

Hi! What is the expected output from running test.sh? Perhaps listing an output file above would be useful.
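
For reference, a sketch of what the gist's files should print, assuming zconfig.yml is indented with two spaces per level. The function body is the one from the gist; the temporary file exists only to keep the example self-contained.

```shell
#!/bin/sh
# parse_yaml as published in the gist
parse_yaml() {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

# Recreate zconfig.yml in a temporary file
yaml_file=$(mktemp)
cat > "$yaml_file" <<'EOF'
development:
  adapter: mysql2
  encoding: utf8
  database: my_database
  username: root
  password:
EOF

# What the parser itself prints (password is empty, so it is skipped):
parse_yaml "$yaml_file" "config_"
# config_development_adapter="mysql2"
# config_development_encoding="utf8"
# config_development_database="my_database"
# config_development_username="root"

# What test.sh prints after eval:
eval $(parse_yaml "$yaml_file" "config_")
echo $config_development_database
# my_database

rm -f "$yaml_file"
```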

@sonisrje

Works great.

@Nawanop-AMNB

nice script!

@niekwit

niekwit commented Feb 14, 2021

Works really well, thank you!

@harryberto

description: sample message here
  may overflow next line

I have a similar issue. How could we store a description value that overflows onto the next line in the same variable? That is, go from

1_2_1_description="sample message here"
TO
1_2_1_description="sample message here may overflow next line"
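
One way this could be approached is a pre-processing pass that folds continuation lines into the line above before the parser sees them. The sketch below is not part of any parser in this thread, and the function name join_continuations is made up. It assumes a continuation line is any non-blank line that neither starts with "-" nor looks like "key:", so a continuation that happens to contain a colon after its first word would be misread as a new key.

```shell
#!/bin/sh
# Hypothetical pre-processing step: fold continuation lines into the line above
join_continuations() {
   awk '
      /^[ \t]*$/ { next }   # ignore blank lines
      # A new entry is a list item (leading dash) or a line starting with key:
      /^[ \t]*-/ || /^[ \t]*[A-Za-z0-9_]+[ \t]*:/ {
         if (have) print buf
         buf = $0; have = 1; next
      }
      {
         # Anything else continues the previous entry
         sub(/^[ \t]+/, "")
         buf = have ? (buf " " $0) : $0
         have = 1
      }
      END { if (have) print buf }
   ' "$@"
}

printf '%s\n' 'description: sample message here' '  may overflow next line' 'example: 10001' |
   join_continuations
# description: sample message here may overflow next line
# example: 10001
```

The folded output can be written to a temporary file and passed to the parse function in place of the original YAML.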

Thank you!

@aahnik

aahnik commented Apr 3, 2021

I think Python is superior in this scripting situation. Why on earth would anyone use a Bash script to do such a thing? Python is generally pre-installed on macOS and Linux.

Python is so much cleaner and more readable.

pip install pyyaml

import yaml

FILENAME = 'your_file.yml'

with open(FILENAME) as file:
    data:dict = yaml.full_load(file)

# data is a python dictionary

@flee2free

The traversal logic is super. Love it!! Sometimes it's just fun to do this crazy stuff with Bash.

@aleon1220

jq for JSON and yq for YAML
https://github.com/mikefarah/yq

@kokosowy

kokosowy commented Sep 8, 2021

Hi all! This is a very interesting thread. Does anybody know how to make it work when a string spans several lines (using the "|" pipe)? For example:

key1: |
   1st line
   2nd line
   3rd line
key2: |
   1st line
   2nd line
   3rd line

@sonjz

sonjz commented Oct 28, 2021

Thanks for this.
I added the following after the awk command as a hack so that $ is not interpolated (application: reading a serverless.yml config into Bash):

 | tr "$" "#" # don't want to interpret $
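
A variation on the same idea, as a sketch only: escape each $ with a backslash instead of replacing it, so the value survives eval unchanged. The name escape_dollars and the sample assignment are made up for illustration. Backticks and backslashes in values are not handled here.

```shell
#!/bin/sh
# Hypothetical filter: backslash-escape every $ so that eval keeps it literal
escape_dollars() {
   sed 's/\$/\\$/g'
}

# A parse_yaml-style assignment whose value contains a serverless-style reference
line='config_bucket="${self:custom.bucket}"'

eval "$(printf '%s\n' "$line" | escape_dollars)"
echo "$config_bucket"
# ${self:custom.bucket}
```

It would sit in the same place as the tr call, at the end of the pipeline after awk.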

@Nigam8972

I can't figure out why it's not running successfully for me. Whenever I execute ./test.sh it gives me this error:
./test.sh
./test.sh: line 4: .: parse_yaml.sh: file not found
./test.sh: line 7: parse_yaml: command not found

@Nigam8972

Has anybody else faced this issue? Please help.

@potes74

potes74 commented Jan 30, 2022

@Nigam8972 try with
. ./parse_yaml.sh
if they are in the same folder (it's what I use)
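
Some background on why this helps: in POSIX sh, the . command looks a file name without a slash up in PATH rather than in the current directory, so ./ makes the location explicit. If the script may also be started from another directory, a sketch like the following resolves the path from the script's own location. The helper name source_sibling is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: source a file located next to the given script path
source_sibling() {
   # $1: path of the calling script (usually "$0"), $2: name of the file to source
   . "$(cd "$(dirname "$1")" && pwd)/$2"
}

# Typical use at the top of test.sh:
#   source_sibling "$0" parse_yaml.sh
```

The same reasoning applies when the files live in a subdirectory: giving . and parse_yaml a path that contains a slash avoids the PATH lookup.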

@ppenguin

ppenguin commented Mar 2, 2023

Cool, just discovered this; of course the more recent suggestions are more robust and feature-rich, but I found myself needing this on constrained (i.e. busybox, sh and no python) systems where going POSIX-sh only is the way to go.

The method here by @wadewegner (or similar) to pre-process problematic keys (but not their values) was the way to go, i.e. by using "advanced" sed hold-space etc.
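
For anyone reading along, here is a step-by-step sketch of what that hold-space expression does. The wrapper name normalize_keys is made up; the sed expression is the one from @wadewegner's comment above.

```shell
#!/bin/sh
# Hypothetical wrapper: turn hyphens into underscores in the part before the first colon only
normalize_keys() {
   # h           copy the line to the hold space
   # s/^[^:]*//  keep only the first colon and everything after it (the value part)
   # x           swap: the pattern space holds the full line again, the hold space the value part
   # s/:.*$//    keep only the key
   # y/-/_/      replace hyphens in the key
   # G; s/\n//   append the value part and remove the joining newline
   sed 'h;s/^[^:]*//;x;s/:.*$//;y/-/_/;G;s/\n//'
}

printf '%s\n' 'my-test: "my-test output"' | normalize_keys
# my_test: "my-test output"
```

Note that a line without any colon, such as a list item, is treated entirely as a key, so its hyphens (including the leading "- ") are rewritten as well. That may be related to the array problem reported earlier in the thread.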
