- Write a function `char_freq()` that takes a string and builds a frequency listing of the characters contained in it. Represent the frequency listing as a Python dictionary. Try it with something like `char_freq("abbabcbdbabdbdbabababcbcbab")`.
- Write a function `char_freq_table()` that takes a file name as argument, builds a frequency listing of the characters contained in the file, and prints a sorted and nicely formatted character frequency table to the screen.
- The third person singular verb form in English is distinguished by the suffix `-s`, which is added to the stem of the infinitive form: `run` -> `runs`. A simple set of rules can be given as follows:
  a. If the verb ends in `y`, remove it and add `ies`
  b. If the verb ends in `o`, `ch`, `s`, `sh`, `x` or `z`, add `es`
  c. By default just add `s`
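
One possible sketch of the first and third exercises, following the rules exactly as stated above. The name `make_3sg_form` is hypothetical (the text does not name the function), and the rules are only a heuristic: a verb such as `play` would come out as `plaies`.

```python
def char_freq(text):
    """Return a dict mapping each character in text to its count."""
    freq = {}
    for char in text:
        freq[char] = freq.get(char, 0) + 1
    return freq


def make_3sg_form(verb):
    """Apply rules a-c above (hypothetical name, heuristic rules)."""
    if verb.endswith('y'):
        # Rule a: drop the final y and add ies
        return verb[:-1] + 'ies'
    if verb.endswith(('o', 'ch', 's', 'sh', 'x', 'z')):
        # Rule b: add es
        return verb + 'es'
    # Rule c: default
    return verb + 's'


print(char_freq("abbabcbdbabdbdbabababcbcbab"))
print(make_3sg_form("try"), make_3sg_form("brush"), make_3sg_form("run"))
```

Passing a tuple to `str.endswith` checks all the suffixes of rule b in one call.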
DBNLti_id DBNLpers_id YearFirstPublished YearEditionPublished Edition Woman Born Died AuthorOrigin DBNLgeb_land_code DBNLgenre DBNLsubgenre Author Title Filename ti_id_set WPAuthor AuthorInCanon2002 TitleInCanon2002 InBasisbibliotheek2008 AuthorDBRDMatches AuthorNLWikipedia2019Matches DBNLSecRefsAuthor DBNLSecRefsTitle holding lending GNTpages
kist001leve01 kist001 1800 1800 1ste druk 0 1758 1841 Woerden proza roman Willem Kist Het leven, gevoelens en zonderlinge reize van den landjonker Govert Hendrik Godefroi van Blankenheim tot den Stronk (2 delen) kist001leve01_01.xml kist001leve01 0 0 0 0 1 19 1 0 0 4
wolf016gesc01 deke001 1802 1802 1ste druk 1 1741 1804 Amstelveen proza roman Aagje Deken Geschrift eener bejaarde vrouw wolf016gesc01_01.xml wolf016gesc01 Aagje Deken 1 0 0 1 21 131 6 0 0 0
stre001char01 stre001 1804 1804 1ste druk 1 1760 1828 Amsterdam proza briefroman Naatje van Streek-Brinkman Charakters en lotgevallen van Adelson, Héloïse en Elius stre001char01_01.xml stre001char01 0 0 0 0 0 13 0 0
# This is a comment
FROM ubuntu:20.04
MAINTAINER Andreas van Cranenburgh <a.w.vancranenburgh@uva.nl>
RUN ln -fs /usr/share/zoneinfo/Europe/Amsterdam /etc/localtime
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    git \
    python3 \
import random
from timeit import timeit
import re
import re2
re_ip = re.compile(br'\d+\.\d+\.\d+\.\d+')
re2_ip = re2.compile(br'\d+\.\d+\.\d+\.\d+')
lines = ['.'.join(str(random.randint(1, 255)) for _ in range(4)).encode('utf8')
         for _ in range(16000)]
import datetime
def addseconds(timestamp, seconds):
    """Take timestamp as string and add seconds to it.
    >>> addseconds('00:01:45,667', 1)
    '00:01:46,667'
    >>> addseconds('00:01:45,667', 0.5)
    '00:01:46,167'
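
The preview above stops inside the docstring, so the original function body is not shown. As a minimal sketch only, and not necessarily the author's implementation, the two documented examples could be satisfied with `datetime.datetime.strptime` and `datetime.timedelta`, assuming timestamps always have the form `HH:MM:SS,mmm`:

```python
import datetime


def addseconds(timestamp, seconds):
    """Add seconds to an 'HH:MM:SS,mmm' timestamp string (sketch)."""
    # %f accepts 1-6 digits, so '667' is read as 667000 microseconds
    parsed = datetime.datetime.strptime(timestamp, '%H:%M:%S,%f')
    shifted = parsed + datetime.timedelta(seconds=seconds)
    # strftime renders %f as 6 digits; drop the last 3 to keep milliseconds
    return shifted.strftime('%H:%M:%S,%f')[:-3]


print(addseconds('00:01:45,667', 1))
print(addseconds('00:01:45,667', 0.5))
```

Under this approach a result past `23:59:59,999` silently wraps around to the next day, which may or may not be what the original intended.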
filename | lang | confidence | read_bytes
---|---|---|---
train/neg/3706_2.txt | en | 81.0 | 1268
train/neg/9466_1.txt | en | 99.0 | 1066
train/neg/6464_2.txt | en | 99.0 | 1248
train/neg/14850_2.txt | en | 99.0 | 1128
train/neg/4674_2.txt | en | 99.0 | 1306
train/neg/7036_1.txt | fy | 68.0 | 997
train/neg/7454_2.txt | en | 63.0 | 688
train/neg/4856_2.txt | en | 99.0 | 1363
train/neg/12096_2.txt | en | 99.0 | 1339
"""Apply polyglot language detection to all .txt files under current directory
(searched recursively), write report in tab-separated file detectedlangs.tsv.
"""
import os
from glob import glob
from polyglot.detect import Detector
from polyglot.detect.base import UnknownLanguage
def main():
- Define a function `max()` that takes two numbers as arguments and returns the largest of them. Use the if-then-else construct available in Python. (It is true that Python has the `max()` function built in, but writing it yourself is nevertheless a good exercise.)
- Define a function `max_of_three()` that takes three numbers as arguments and returns the largest of them.
- Define a function that computes the length of a given list or string. (It is true that Python has the `len()` function built in, but writing it yourself is nevertheless a good exercise.)
- Write a function that takes a character (i.e. a string of length 1) and returns `True` if it is a vowel, `False` otherwise.
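
The exercises above could be approached along these lines; this is one sketch among many. The names `max_of_two`, `length`, and `is_vowel` are hypothetical (the first is used instead of `max` to avoid shadowing the built-in), and treating only `a`, `e`, `i`, `o`, `u` as vowels is an assumption.

```python
def max_of_two(a, b):
    """Return the larger of two numbers using if/else."""
    if a > b:
        return a
    else:
        return b


def max_of_three(a, b, c):
    """Return the largest of three numbers by reusing max_of_two."""
    return max_of_two(a, max_of_two(b, c))


def length(items):
    """Count the elements of a list or string without calling len()."""
    count = 0
    for _ in items:
        count += 1
    return count


def is_vowel(char):
    """Return True if char is a single vowel (assumed: a, e, i, o, u)."""
    return length(char) == 1 and char.lower() in 'aeiou'


print(max_of_three(3, 9, 4), length("hello"), is_vowel("e"), is_vowel("z"))
```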