@benhosmer
Created September 11, 2012 09:24
Parsing YAML into a Python dictionary.
import yaml  # PyYAML

# Read the YAML file into plain Python objects (dicts, lists, strings).
with open('user.yaml') as f:
    dataMap = yaml.safe_load(f)

print("")
print("=-----------=")
print("dataMap is a ", type(dataMap), dataMap)
print("=-----------=")
print("main items are", type(dataMap['main']), dataMap['main'])
print("=-----------=")
print("main is a list, the first item is a dictionary", type(dataMap['main'][0]), dataMap['main'][0])
print("=-----------=")
print("main[0] is a dict, the first entry in its users list is", dataMap['main'][0]['users'][0])
"""
Sample YAML File:
main:
  - users:
      - joe
      - mike
      - sally
  - path:
      - /Users/me/files/
      - text.files
  - search_phrases:
      - Invalid users
      - Failed password
"""
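
As a minimal sketch of the same walk-through under Python 3, the snippet below parses the sample from an in-memory string instead of user.yaml. It assumes PyYAML is installed and that the sample is nested under a top-level main key, as the dataMap['main'] lookups imply. SAMPLE, data_map, first_item and first_user are illustrative names, not part of the gist.

```python
import yaml  # PyYAML (third-party); assumed to be installed

# Assumed layout: the sample nested under a top-level "main" key,
# which is what the dataMap['main'] lookups require.
SAMPLE = """\
main:
  - users:
      - joe
      - mike
      - sally
  - path:
      - /Users/me/files/
      - text.files
  - search_phrases:
      - Invalid users
      - Failed password
"""

# safe_load builds only plain Python objects (dict, list, str, ...).
data_map = yaml.safe_load(SAMPLE)

first_item = data_map["main"][0]      # the one-key dict holding the users list
first_user = first_item["users"][0]   # first entry of that list

print(type(data_map).__name__, type(data_map["main"]).__name__)  # dict list
print(first_user)                                                # joe
```

Each "- key:" entry under main becomes its own one-key dictionary, which is why the lookup goes through a list index before the users key.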

ghost commented Apr 12, 2016

I am facing a problem parsing multiple YAML files in a directory. The for loop runs only for the first file in the directory.

Here is my code:

#!/usr/bin/env python

import os
import yaml
import copy

class iplMatch:
    # This is Match class.
    iMatchCount = 0
    iMatchID = 0
    strSourceFileName = 0
    def __init__(self, strFileName):
        print strFileName
        self.strSourceFileName = strFileName
        stream =  file(strFileName, 'r')
        yamlMatchData = yaml.safe_load(stream)
        self.strMatchVenue = yamlMatchData['info']['venue']
        self.strMatchCity = yamlMatchData['info']['city']
        self.strMatchDate = str(yamlMatchData['info']['dates'][0])
        self.strHomeTeam = yamlMatchData['info']['teams'][0]
        self.strAwayTeam = yamlMatchData['info']['teams'][1]
        self.strWinner = yamlMatchData['info']['outcome']['winner']
        self.strTossWinner = yamlMatchData['info']['toss']['winner']
        self.strTossDecision = yamlMatchData['info']['toss']['decision']
        self.strWinType = yamlMatchData['info']['outcome']['by']
        self.strMoM = yamlMatchData['info']['player_of_match'][0]


    def displayMatchDetails(self):
        # print "Match : ", self.strHomeTeam, " _vs_ ", self.strAwayTeam
        print "" \
              "Venue : ", self.strMatchVenue, ",", self.strMatchCity
        print "" \
              "On : ", self.strMatchDate
        print "" \
              "Toss Won by : ", self.strTossWinner, " Decided to ", self.strTossDecision
        print "" \
              "Won by ", self.strWinner, " by ", self.strWinType.keys()[0], " : ", self.strWinType.values()[0]
        print "" \
              "Man of the Match : ", self.strMoM
        print self.strSourceFileName

    def reset(self):
        self.strMatchVenue = 0
        self.strMatchCity = 0
        self.strMatchDate = 0
        self.strHomeTeam = 0
        self.strAwayTeam = 0
        self.strWinner = 0
        self.strTossWinner = 0
        self.strTossDecision = 0
        self.strWinType = 0
        self.strMoM = 0


#Start Main

# arrayIplMatches = []
root = 'C:/Users/pkya/Documents/DWH/ipl/'
FileList = os.listdir(root)
print FileList

for fn in FileList:
    print "inside for :", fn
    strAbsFilePath = root + fn
    print strAbsFilePath
    if os.path.isfile(strAbsFilePath):
        print "inside if"
        # os.path.join(root, f)
        iplMatchData = iplMatch(strAbsFilePath)
        # arrayIplMatches.append(copy.copy(iplMatchData))
        iplMatchData.displayMatchDetails()
        print "Done Display"
        print iplMatchData
        iplMatchData.reset()
        print "Done Reset"
        print iplMatchData
    break

Note that the specified directory contains 3000 YAML files.

ghost commented Apr 12, 2016

Sample files:

File 1

---
meta:
  data_version: 0.6
  created: 2011-05-06
  revision: 1
info:
  city: Bangalore
  competition: IPL
  dates:
    - 2008-04-18
  match_type: T20
  outcome:
    by:
      runs: 140
    winner: Kolkata Knight Riders
  overs: 20
  player_of_match:
    - BB McCullum
  teams:
    - Royal Challengers Bangalore
    - Kolkata Knight Riders
  toss:
    decision: field
    winner: Royal Challengers Bangalore
  umpires:
    - Asad Rauf
    - RE Koertzen
  venue: M Chinnaswamy Stadium

File 2

---
meta:
  data_version: 0.6
  created: 2011-05-06
  revision: 1
info:
  city: Bangalore
  competition: IPL
  dates:
    - 2008-04-18
  match_type: T20
  outcome:
    by:
      runs: 140
    winner: Kolkata Knight Riders
  overs: 20
  player_of_match:
    - BB McCullum
  teams:
    - Royal Challengers Bangalore
    - Kolkata Knight Riders
  toss:
    decision: field
    winner: Royal Challengers Bangalore
  umpires:
    - Asad Rauf
    - RE Koertzen
  venue: M Chinnaswamy Stadium
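
For readers on Python 3, here is a hedged sketch of pulling the same fields out of one of these files. It assumes PyYAML is installed and uses a trimmed copy of the sample as an in-memory string; MATCH, info, win_type, margin and match_date are illustrative names. One difference from the Python 2 code above: in Python 3, dict.keys() and dict.values() return views that cannot be indexed, so the single entry under by is unpacked with next(iter(...)) instead.

```python
import yaml  # PyYAML (third-party); assumed to be installed

# Trimmed copy of the sample match file, kept in memory for the demo.
MATCH = """\
info:
  city: Bangalore
  dates:
    - 2008-04-18
  outcome:
    by:
      runs: 140
    winner: Kolkata Knight Riders
  player_of_match:
    - BB McCullum
  teams:
    - Royal Challengers Bangalore
    - Kolkata Knight Riders
  toss:
    decision: field
    winner: Royal Challengers Bangalore
  venue: M Chinnaswamy Stadium
"""

info = yaml.safe_load(MATCH)["info"]

# "by" is a one-entry mapping such as {'runs': 140}; take its single pair.
win_type, margin = next(iter(info["outcome"]["by"].items()))

# Unquoted ISO dates are loaded as datetime.date objects, hence str().
match_date = str(info["dates"][0])

print(info["venue"], info["city"], match_date)
print(info["outcome"]["winner"], "won by", margin, win_type)
```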

beng commented Apr 21, 2016

@D-Pkya, you have a break at the end of the for loop body, so the loop ends after the first iteration. Remove the break and the loop will go on to every file in the directory.
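
The fix can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not the poster's code: process_all and handle are hypothetical names, and the demo uses three empty placeholder files in a temporary directory rather than real match data.

```python
import os
import tempfile


def process_all(root, handle):
    """Call handle(path) on every regular file in root; return the count."""
    handled = 0
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isfile(path):
            handle(path)
            handled += 1
        # No break here: the loop moves on to the next directory entry.
    return handled


# Demo on a throwaway directory holding three empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for name in ("match1.yaml", "match2.yaml", "match3.yaml"):
        open(os.path.join(root, name), "w").close()

    seen = []
    total = process_all(root, seen.append)

print(total)                                  # 3
print([os.path.basename(p) for p in seen])    # all three file names
```

With the class from the question, handle could be a small function that builds iplMatch(path) and calls displayMatchDetails() on it. Using os.path.join instead of root + fn also avoids depending on a trailing slash in root.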
