@tombaker
Last active April 11, 2016 15:18
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Open file and read contents as one long string"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"Line 1\\nSecond line\\nThird Example line\\n\""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"file.txt\") do |file|\n",
" contents = file.read\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* File is closed automatically when the block ends\n",
"* Entire contents are returned, as one long string\n",
"* Uses two methods: File.open and IO#read"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read file as one long string (one method!)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"Line 1\\nSecond line\\nThird Example line\\n\""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"contents = File.read(\"file.txt\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* File is _closed_\n",
"* Entire contents are returned as one long string\n",
"* Uses just one method: File.read"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Open file, read contents into array of lines"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[\"Line 1\\n\", \"Second line\\n\", \"Third Example line\\n\"]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"file = File.open(\"file.txt\")\n",
"lines = file.readlines"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[\"Line 1\\n\", \"Second line\\n\", \"Third Example line\\n\"]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"file.txt\").readlines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Here the file is not closed explicitly. With a reference to the object you can check with `file.closed?`; the one-liner `File.open(\"file.txt\").readlines` keeps no reference, so it cannot be closed by hand.\n",
"* As explained on [stackoverflow.com](http://stackoverflow.com/questions/4795447/rubys-file-open-and-the-need-for-f-close), experienced Rubyists either explicitly close their files or, more idiomatically, use the block form of File.open, which automatically closes the file for you.\n",
"* Scripts are a special case. They generally run so briefly and use so few file descriptors that closing them hardly matters, since the operating system closes them anyway when the script exits.\n",
"* I/O streams are also closed automatically when the garbage collector claims them. Strictly speaking, the garbage collector does not close files itself: it runs an object's finalizers before collecting it, and the File class defines a finalizer that closes the file.\n",
"* Once a File object is unreachable you can no longer close it yourself, so until the finalizer runs its file descriptor is effectively leaked.\n",
"* Wasted memory is cheap, but wasted file descriptors are not, so it makes little sense to tie the lifetime of a file descriptor to the lifetime of some chunk of memory. You cannot predict when the garbage collector will run, or whether it will run at all before the program exits; until it does, the finalizer does not run and the file stays open."
]
},
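{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to tell whether a file is still open is to ask the object itself. The following is a minimal sketch, assuming Ruby's standard `IO#closed?` and a scratch file made with `Tempfile`; the variable names (`scratch`, `handle`, and so on) are illustrative and not part of the notebook above.\n",
"\n",
"```ruby\n",
"require 'tempfile'\n",
"\n",
"# Create a scratch file so the example does not depend on file.txt.\n",
"scratch = Tempfile.new('demo')\n",
"scratch.write(\"Line 1\\nSecond line\\n\")\n",
"scratch.close\n",
"\n",
"# Non-block form: the file stays open until we close it ourselves.\n",
"file = File.open(scratch.path)\n",
"lines = file.readlines\n",
"open_after_read = !file.closed?\n",
"file.close\n",
"\n",
"# Block form: File.open closes the file when the block returns.\n",
"# Returning the file object from the block lets us inspect it afterwards.\n",
"handle = File.open(scratch.path) { |f| f.readlines; f }\n",
"closed_after_block = handle.closed?\n",
"\n",
"puts \"non-block form still open after reading: #{open_after_read}\"\n",
"puts \"block form closed after the block: #{closed_after_block}\"\n",
"\n",
"scratch.unlink\n",
"```\n",
"\n",
"The one-liner form keeps no variable pointing at the File object, so there is nothing to call `closed?` or `close` on; that is the practical argument for the block form."
]
},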
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[\"Line 1\\n\", \"Second line\\n\", \"Third Example line\\n\"]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"file.txt\") do |file|\n",
" lines = file.readlines\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* File is closed\n",
"* Entire contents are returned as an array of lines\n",
"* Uses two methods: File.open and IO#readlines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Open file, block reads contents into array of lines then loops over lines"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[\"Line 1\\n\", \"Second line\\n\", \"Third Example line\\n\"]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"file.txt\") do |file|\n",
" contents = file.readlines\n",
" contents.each do |line|\n",
" if line.start_with?(\"c\")\n",
" puts line\n",
" end\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Holds entire contents in memory\n",
"* Uses four methods: File.open, IO#readlines, Array#each, String#start_with?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Open file, block steps thru line by line"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Line 1\n",
"Second line\n",
"Third Example line\n"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"file.txt\") do |file|\n",
" file.each_line do |line|\n",
" if line.start_with?(\"c\")\n",
" puts line\n",
" end\n",
" end\n",
"end"
]
},
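{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Interesting output: nothing is printed by puts here (no line starts with \"c\"). What is displayed is the result of evaluating file.each_line, which returns the File object itself. (See below)"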
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Line 1\n",
"Second line\n",
"3rd Example line\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"File.open(\"example.txt\") do |file|\n",
" file.each_line do |line|\n",
" end\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Line 1\n",
"Second line\n",
"Third Example line\n"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = File.open(\"file.txt\") do |file|\n",
" file.each_line do |line|\n",
" end\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#<File:0x007f871b1264d0>\n"
]
}
],
"source": [
"puts x"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### __Shortcut!__ Step thru file line-by-line (without pulling into memory)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Third Example line\n",
"\n"
]
}
],
"source": [
"File.foreach(\"file.txt\") do |line|\n",
" if line.start_with?(\"T\")\n",
" puts line\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Does not require us to open the file explicitly: File.foreach opens the file, yields each line, and closes it when done."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{\"2016-01-12\"=>[\"Mklist-rules-regex.txt\"], \"2016-01-22\"=>[\"Reading_from_files.ipynb\", \"file.txt\", \"lsfiles.txt\"], \"2016-01-18\"=>[\"Ruby-poster.txt\"], \"2016-01-19\"=>[\"ruby-fileopen-options.txt\"]}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myfiles =\n",
" File.open(\"lsfiles.txt\") do |lsentry|\n",
" lsentry\n",
" .map { |line| { :date => line.split[1], :name => line.split[4] } }\n",
" .group_by { |request| request[:date] }\n",
" .each { |date, names| names.map! { |r| r[:name]}}\n",
" end"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"2016-01-12\"\u001b[0;37m => \u001b[0m[\n",
" \u001b[1;37m[0] \u001b[0m\u001b[0;33m\"Mklist-rules-regex.txt\"\u001b[0m\n",
" ],\n",
" \"2016-01-22\"\u001b[0;37m => \u001b[0m[\n",
" \u001b[1;37m[0] \u001b[0m\u001b[0;33m\"Reading_from_files.ipynb\"\u001b[0m,\n",
" \u001b[1;37m[1] \u001b[0m\u001b[0;33m\"file.txt\"\u001b[0m,\n",
" \u001b[1;37m[2] \u001b[0m\u001b[0;33m\"lsfiles.txt\"\u001b[0m\n",
" ],\n",
" \"2016-01-18\"\u001b[0;37m => \u001b[0m[\n",
" \u001b[1;37m[0] \u001b[0m\u001b[0;33m\"Ruby-poster.txt\"\u001b[0m\n",
" ],\n",
" \"2016-01-19\"\u001b[0;37m => \u001b[0m[\n",
" \u001b[1;37m[0] \u001b[0m\u001b[0;33m\"ruby-fileopen-options.txt\"\u001b[0m\n",
" ]\n",
"}\n"
]
}
],
"source": [
"require 'awesome_print'\n",
"ap myfiles"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Other notes...\n",
"\n",
"----------------------------------------------------------------------\n",
"```\n",
">> IO.foreach(\"/Users/tbaker/Dropbox/ff/mbox\") do |line|\n",
" | puts line if line =~ /oclc.org/\n",
" | end\n",
" <D427D963-89CC-473B-8B07-69A79C7AE6DA@oclc.org> ... ...\n",
"```\n",
"\n",
"Class method IO.foreach iterates over a file one line at a time.\n",
"\n",
"When called with a block, as here, IO.foreach opens and closes the file itself, so the code need not open it.\n",
"\n",
"----------------------------------------------------------------------\n",
"\n",
"    class Dog\n",
"      attr_accessor :name             # getter and setter\n",
"\n",
"      def initialize(name = \"Rover\")  # parameter\n",
"        @name = name                  # instance variable = local variable\n",
"      end\n",
"    end\n",
"\n",
"    dog = Dog.new(\"spot\")            # argument\n",
"\n",
"----------------------------------------------------------------------\n",
"```\n",
">> arr = IO.readlines(\"/Users/tbaker/uu/agenda/calendar\")\n",
"=> [\"= 2016-01-02 Sat 1720-0710+ UA0989 Washington/Dulles/IAD-Frankfurt\\n\"]\n",
"```\n",
"\n",
"Reads the entire file into an array of lines without an explicit File.open. IO.readlines opens and\n",
"closes the file on its own.\n",
"\n",
"----------------------------------------------------------------------\n",
"The best way to use File.open is with a code block, so you need not remember to close the file when done.\n",
"\n",
"The open method\n",
"\n",
"* creates a new File object,\n",
"* passes it to your code block, and\n",
"* closes the file automatically after your code block runs, even if your code raises an exception.\n",
"\n",
"You could rely on the Ruby interpreter's garbage collection to close the file\n",
"once it's no longer being used, but Ruby makes it easy to do things the right\n",
"way.\n"
]
}
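,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The claim that the block form closes the file even when the block raises can be checked directly. This is a small sketch under stated assumptions: it uses a `Tempfile` scratch file rather than any file from the notes above, and the names `scratch` and `leaked` are hypothetical.\n",
"\n",
"```ruby\n",
"require 'tempfile'\n",
"\n",
"# Scratch file so the example is self-contained.\n",
"scratch = Tempfile.new('demo')\n",
"scratch.write(\"Line 1\\n\")\n",
"scratch.close\n",
"\n",
"leaked = nil\n",
"begin\n",
"  File.open(scratch.path) do |file|\n",
"    leaked = file   # keep a reference so we can inspect it later\n",
"    raise 'something went wrong while reading'\n",
"  end\n",
"rescue RuntimeError => e\n",
"  puts \"rescued: #{e.message}\"\n",
"end\n",
"\n",
"# The block was left via an exception, yet the file has been closed.\n",
"puts \"closed despite the exception: #{leaked.closed?}\"\n",
"\n",
"scratch.unlink\n",
"```\n",
"\n",
"With the non-block form, the same effect would need an explicit `begin ... ensure file.close end` around the reading code."
]
}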
],
"metadata": {
"kernelspec": {
"display_name": "Ruby 2.2.3",
"language": "ruby",
"name": "ruby"
},
"language_info": {
"file_extension": ".rb",
"mimetype": "application/x-ruby",
"name": "ruby",
"version": "2.2.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}