@MichelleDalalJian
Created October 7, 2017 14:48
Extracting Data With Regular Expressions Finding Numbers in a Haystack In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers. Data Files We provide two files for this assignment. One is a sample file where we give you the sum for your testi…
import re
hand = open("regex_sum_24962.txt")
x = list()
for line in hand:
    # collect every run of digits found on this line
    y = re.findall('[0-9]+', line)
    x = x + y
sum = 0
for z in x:
    sum = sum + int(z)
print(sum)
@AJMagnus21

I've tried everyone's code on this thread and I either get "0" or "35". What is the f-ing solution to this thing? I've been stuck for a week and a half already.

@Rea-mogetse

No matter what code I use, I get the same sum amount of 34294980...
What am I supposed to do to get the actual number that ends in 695?? :(

`
import re

# Open the file

with open('regex_sum.txt', 'r') as file:
    # Read the contents of the file
    file_contents = file.read()

# Find all integers using regular expression
integers = re.findall(r'\d+', file_contents)

# Convert extracted strings to integers
integers = [int(i) for i in integers]

# Sum up the integers
sum_integers = sum(integers)

# Print the result
print(f"The sum of all integers in the file is: {sum_integers}")

`

@AboufakerAli

Finding Numbers in a Haystack

In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers.

Data Files
We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

Sample data: http://py4e-data.dr-chuck.net/regex_sum_42.txt (There are 90 values with a sum=445833)
Actual data: http://py4e-data.dr-chuck.net/regex_sum_1780792.txt (There are 83 values and the sum ends with 715)
These links open in a new window. Make sure to save the file into the same folder where you will be writing your Python program. Note: Each student will have a distinct data file for the assignment, so only use your own data file for analysis.
Data Format
The file contains much of the text from the introduction of the textbook except that random numbers are inserted throughout the text. Here is a sample of the output you might see:

Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative
7 and rewarding activity. You can write programs for
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem. This book assumes that
everyone needs to know how to program ...
The sum for the sample text above is 27486. The numbers can appear anywhere in the line. There can be any number of numbers in each line (including none).
Handling The Data
The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression '[0-9]+', then convert the extracted strings to integers and sum them.
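
The outline above can be sketched as a small function that returns both the count and the sum, so the count can be compared with the number of values stated next to your data link (90 for the sample file). This is only a minimal sketch, not the course's reference solution: the name count_and_sum is made up for illustration, and it runs on the sample text quoted above rather than on a downloaded file.

```python
import re

def count_and_sum(text):
    """Return (how many integers were found, their sum) for a block of text."""
    numbers = [int(s) for s in re.findall('[0-9]+', text)]
    return len(numbers), sum(numbers)

# The sample text quoted in the assignment description.
sample_text = """Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative
7 and rewarding activity. You can write programs for
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem. This book assumes that
everyone needs to know how to program ...
"""

count, total = count_and_sum(sample_text)
print(count, total)  # prints: 7 27486
```

To try it on a saved data file, pass the file's contents (for example from open(...).read()) instead of sample_text. If the count does not match the one listed for your file, the program is probably reading a different file than you think.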

Turn in Assignment

Enter the sum from the actual data and your Python code below:
Sum:
(ends with 715)

@AreebaYousuf

Try this, it works:
[image not included]

@ChernetAsmamaw

import re
fname = open(r'C:\Users\Lenovo\Desktop\Practice\owl.txt')
lst = list()
total = 0

for line in fname:
    string_num = re.findall("[0-9]+", line)
    if len(string_num) == 0:
        continue
    else:
        for value in string_num:
            total = total + int(value)
print(total)

#The total value printed should be one that ends with 97

@MiguelData2030

Hello,
this is the solution you need to get the total for the numbers in the text.

Code you need:

import re
import urllib.request

url = 'http://py4e-data.dr-chuck.net/regex_sum_1747000.txt'
response = urllib.request.urlopen(url)
data = response.read().decode('utf-8')

total = 0
string_nums = re.findall('[0-9]+', data)
for value in string_nums:
    total += int(value)

print(total)

Solution

@collinsjie

The following code worked. Sum: 373584
import re
import urllib.request

# Read the file from the URL

url = "http://py4e-data.dr-chuck.net/regex_sum_1786009.txt"
response = urllib.request.urlopen(url)
data = response.read().decode()

# Find all numbers in the file using regular expression

numbers = re.findall('[0-9]+', data)

# Convert the extracted strings to integers and calculate the sum

total_sum = sum(int(num) for num in numbers)

print("Sum:", total_sum)

@tahira2k16

import re
hand = open("datafile.txt")
num_list = list()
for line in hand:
    num = re.findall('[0-9]+', line)
    num_list = num_list + num
sum = 0
for x in num_list:
    sum += int(x)
print(sum)

@CalebBaron

Actual data: http://py4e-data.dr-chuck.net/regex_sum_1867584.txt (There are 121 values and the sum ends with 650)
Need help for this question, sum ends with 650

@HolmanCanas

Hey guys, I've been working on the problem and came up with this solution. Hope it runs without a problem :D

#practice file
import re

user=input('Enter a file: ')
try:
    file=open(user)
except:
    print('File cannot be opened')
    exit()

x=[]

sum=0
for line in file:
    line=line.rstrip()
    num=re.findall('[0-9]+', line)
    for n in num:
        sum=sum + int(n)
print(sum)

Actually, I've had some problems with this exercise (I don't know why, but from the beginning it was hard to get MS-DOS to read the file properly). Here is an alternative solution that gives me the exact sum:

import re
file_name=r'-Put Acces Route-'
with open(file_name,'r', encoding='utf-8') as f:
    x=list()
    for line in f:
        y = re.findall('([0-9]+)',line)
        x = x+y
sum=0
for z in x:
    sum = sum + int(z)
print(sum)

I added parentheses to the re.findall expression and it worked perfectly.
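
For what it's worth, a quick check suggests the parentheses are probably not what made the difference: with a single capture group wrapped around the whole pattern, re.findall returns the same list of strings as the ungrouped pattern. A minimal sketch, run on a few of the sample lines from the assignment text (the variable names are just for illustration):

```python
import re

# A few lines taken from the sample in the assignment description.
sample = """Why should you learn to write programs? 7746
12 1929 8827
7 and rewarding activity. You can write programs for
8837 a difficult data analysis problem to having fun to helping 128
"""

plain = re.findall('[0-9]+', sample)      # no capture group
grouped = re.findall('([0-9]+)', sample)  # one group around the whole pattern

print(plain == grouped)              # prints: True
print(sum(int(n) for n in grouped))  # prints: 27486
```

If the two versions gave different sums on your machine, the explicit encoding='utf-8' in open() or a different file path seems a more likely explanation, though that is only a guess.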

@HaseebWar

Just run the code in Colab. Upload your text file (copy the content and save it as a txt file), pass its name to the open function, and you should get your answer.

import re
sum = 0
file = open('regex_sum_97406', 'r')
for line in file:
    numbers = re.findall('[0-9]+', line)
    if not numbers:
        continue
    else:
        for number in numbers:
            sum += int(number)
print(sum)

@neeshu144035

Here is the really working code

import re
m=list()
hand=open('regex_sum_1876332.txt')
for line in hand:
    y=re.findall('[0-9]+',line)
    for i in range(0,len(y)):
        m.append(y[i])

sum=0
for i in range(0,len(m)):
    sum=sum+int(m[i])
print(sum)

@kingpa1919

kingpa1919 commented Apr 7, 2024

[Three screenshots attached, dated 2024-04-07]
This week’s project will focus on extracting meaningful information from a typical log file.

Please use the file redhat.txt for this assignment.

Your assignment is to develop a python script that:

  • Prompts the user for a file to process.
  • Opens the file and iterates through each line, catching and reporting any errors that occur.
  • For each line in the file:
      • If the line identifies a specific worm that was detected, records the unique name of the worm.
      • Keeps track of the number of occurrences of each unique worm.
  • Once all the lines have been processed, produces a prettytable result that includes the name of each unique worm and the number of occurrences that were identified. The prettytable should be sorted by highest occurring worms.
Your script must be commented in detail.
Submit one file logScan.py. In addition, submit a screenshot of successful execution of the script.
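
A rough sketch of the counting part, for anyone tackling this. The layout of redhat.txt is not shown in this thread, so the regular expression and the sample lines below are made-up placeholders that must be adapted to the real file, and the function name count_worms is hypothetical too. The table output is left as a plain print loop, which a prettytable table could replace.

```python
import re
from collections import Counter

# ASSUMPTION: a worm name appears as "Worm.<Name>" somewhere in the line.
# This pattern is a placeholder; adjust it to the real redhat.txt format.
WORM_PATTERN = re.compile(r'Worm\.(\w+)', re.IGNORECASE)

def count_worms(lines, pattern=WORM_PATTERN):
    """Count occurrences of each worm name found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for name in pattern.findall(line):
            counts[name] += 1
    return counts

# Made-up lines, for illustration only (not taken from redhat.txt).
sample_lines = [
    "hypothetical log entry: detected Worm.Alpha on host1",
    "hypothetical log entry: detected Worm.Beta on host2",
    "hypothetical log entry: detected Worm.Alpha on host3",
    "hypothetical log entry: nothing detected",
]

counts = count_worms(sample_lines)
# most_common() orders the names from the highest count to the lowest.
for name, n in counts.most_common():
    print(f"{name:<10} {n}")
```

For a real file, the open() call would go inside a try/except OSError block so that errors are reported, and the open file object can be passed as lines.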

Any help as soon as possible