@aaronj1335
Last active August 29, 2015 14:00
can i make this faster by reducing string copies? are there other things i can do to make this faster?

i'm reading a file of sentences in which every word is tagged with its part of speech. each line is one sentence, and each word is followed by a slash and its part-of-speech tag:

Mr./NNP Vinken/NNP is/VBZ chairman/NN of/IN Elsevier/NNP N.V./NNP ,/, the/DT Dutch/NNP publishing/VBG group/NN ./.
There/EX is/VBZ no/DT asbestos/NN in/IN our/PRP$ products/NNS now/RB ./. ''/''

i wrote the iterator in main.cpp to read a file of these lines and turn each one into a vector of word/part-of-speech pairs. can i improve its performance?
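one possible direction, as a rough sketch rather than a measured answer: the current convert() copies each line into an istringstream, copies each token, makes two substr() temporaries, and then copies both into the pair. the hypothetical helper below (parse_line is an illustrative name, not part of the gist) scans the line buffer directly and builds each string once, in place, inside the vector. it keeps the leading '/' on the tag so the output would still match expected.txt. whether this is noticeably faster is an assumption that needs profiling on real input.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > sentence;

// Hypothetical helper (not in the gist): parse one tagged line into `out`,
// building every word and tag exactly once, straight from the line buffer.
void parse_line(const std::string& line, sentence& out) {
    out.clear();  // clear() keeps the capacity left over from the previous line
    const char* ws = " \t\r\n";
    std::size_t pos = line.find_first_not_of(ws);
    while (pos != std::string::npos) {
        std::size_t end = line.find_first_of(ws, pos);
        if (end == std::string::npos)
            end = line.size();
        // The last '/' inside [pos, end) separates the word from its tag.
        std::size_t slash = line.rfind('/', end - 1);
        if (slash == std::string::npos || slash < pos)
            slash = end;  // untagged token: the tag is left empty
        out.push_back(std::pair<std::string, std::string>());
        out.back().first.assign(line, pos, slash - pos);
        out.back().second.assign(line, slash, end - slash);  // tag keeps its '/'
        pos = line.find_first_not_of(ws, end);
    }
}
```

two smaller things that may also be worth measuring: operator* returns the sentence by value, so every dereference copies the whole vector (operator-> does not), and std::endl flushes the output stream on every line, where "\n" would not.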

Pierre /NNP
Vinken /NNP
, /,
61 /CD
years /NNS
old /JJ
, /,
will /MD
join /VB
the /DT
board /NN
as /IN
a /DT
nonexecutive /JJ
director /NN
Nov. /NNP
29 /CD
. /.
Mr. /NNP
Vinken /NNP
is /VBZ
chairman /NN
of /IN
Elsevier /NNP
N.V. /NNP
, /,
the /DT
Dutch /NNP
publishing /VBG
group /NN
. /.
Rudolph /NNP
Agnew /NNP
, /,
55 /CD
years /NNS
old /JJ
and /CC
former /JJ
chairman /NN
of /IN
Consolidated /NNP
Gold /NNP
Fields /NNP
PLC /NNP
, /,
was /VBD
named /VBN
a /DT
nonexecutive /JJ
director /NN
of /IN
this /DT
British /JJ
industrial /JJ
conglomerate /NN
. /.
A /DT
form /NN
of /IN
asbestos /NN
once /RB
used /VBN
to /TO
make /VB
Kent /NNP
cigarette /NN
filters /NNS
has /VBZ
caused /VBN
a /DT
high /JJ
percentage /NN
of /IN
cancer /NN
deaths /NNS
among /IN
a /DT
group /NN
of /IN
workers /NNS
exposed /VBN
to /TO
it /PRP
more /RBR
than /IN
30 /CD
years /NNS
ago /IN
, /,
researchers /NNS
reported /VBD
. /.
The /DT
asbestos /NN
fiber /NN
, /,
crocidolite /NN
, /,
is /VBZ
unusually /RB
resilient /JJ
once /IN
it /PRP
enters /VBZ
the /DT
lungs /NNS
, /,
with /IN
even /RB
brief /JJ
exposures /NNS
to /TO
it /PRP
causing /VBG
symptoms /NNS
that /WDT
show /VBP
up /IN
decades /NNS
later /JJ
, /,
researchers /NNS
said /VBD
. /.
Lorillard /NNP
Inc. /NNP
, /,
the /DT
unit /NN
of /IN
New /JJ
York-based /JJ
Loews /NNP
Corp. /NNP
that /WDT
makes /VBZ
Kent /NNP
cigarettes /NNS
, /,
stopped /VBD
using /VBG
crocidolite /NN
in /IN
its /PRP$
Micronite /NN
cigarette /NN
filters /NNS
in /IN
1956 /CD
. /.
Although /IN
preliminary /JJ
findings /NNS
were /VBD
reported /VBN
more /RBR
than /IN
a /DT
year /NN
ago /IN
, /,
the /DT
latest /JJS
results /NNS
appear /VBP
in /IN
today /NN
's /POS
New /NNP
England /NNP
Journal /NNP
of /IN
Medicine /NNP
, /,
a /DT
forum /NN
likely /JJ
to /TO
bring /VB
new /JJ
attention /NN
to /TO
the /DT
problem /NN
. /.
A /DT
Lorillard /NNP
spokewoman /NN
said /VBD
, /,
`` /``
This /DT
is /VBZ
an /DT
old /JJ
story /NN
. /.
We /PRP
're /VBP
talking /VBG
about /IN
years /NNS
ago /IN
before /IN
anyone /NN
heard /VBD
of /IN
asbestos /NN
having /VBG
any /DT
questionable /JJ
properties /NNS
. /.
There /EX
is /VBZ
no /DT
asbestos /NN
in /IN
our /PRP$
products /NNS
now /RB
. /.
'' /''
Pierre /NNP
Vinken /NNP
, /,
61 /CD
years /NNS
old /JJ
, /,
will /MD
join /VB
the /DT
board /NN
as /IN
a /DT
nonexecutive /JJ
director /NN
Nov. /NNP
29 /CD
. /.
Mr. /NNP
Vinken /NNP
is /VBZ
chairman /NN
of /IN
Elsevier /NNP
N.V. /NNP
, /,
the /DT
Dutch /NNP
publishing /VBG
group /NN
. /.
Rudolph /NNP
Agnew /NNP
, /,
55 /CD
years /NNS
old /JJ
and /CC
former /JJ
chairman /NN
of /IN
Consolidated /NNP
Gold /NNP
Fields /NNP
PLC /NNP
, /,
was /VBD
named /VBN
a /DT
nonexecutive /JJ
director /NN
of /IN
this /DT
British /JJ
industrial /JJ
conglomerate /NN
. /.
A /DT
form /NN
of /IN
asbestos /NN
once /RB
used /VBN
to /TO
make /VB
Kent /NNP
cigarette /NN
filters /NNS
has /VBZ
caused /VBN
a /DT
high /JJ
percentage /NN
of /IN
cancer /NN
deaths /NNS
among /IN
a /DT
group /NN
of /IN
workers /NNS
exposed /VBN
to /TO
it /PRP
more /RBR
than /IN
30 /CD
years /NNS
ago /IN
, /,
researchers /NNS
reported /VBD
. /.
The /DT
asbestos /NN
fiber /NN
, /,
crocidolite /NN
, /,
is /VBZ
unusually /RB
resilient /JJ
once /IN
it /PRP
enters /VBZ
the /DT
lungs /NNS
, /,
with /IN
even /RB
brief /JJ
exposures /NNS
to /TO
it /PRP
causing /VBG
symptoms /NNS
that /WDT
show /VBP
up /IN
decades /NNS
later /JJ
, /,
researchers /NNS
said /VBD
. /.
Lorillard /NNP
Inc. /NNP
, /,
the /DT
unit /NN
of /IN
New /JJ
York-based /JJ
Loews /NNP
Corp. /NNP
that /WDT
makes /VBZ
Kent /NNP
cigarettes /NNS
, /,
stopped /VBD
using /VBG
crocidolite /NN
in /IN
its /PRP$
Micronite /NN
cigarette /NN
filters /NNS
in /IN
1956 /CD
. /.
Although /IN
preliminary /JJ
findings /NNS
were /VBD
reported /VBN
more /RBR
than /IN
a /DT
year /NN
ago /IN
, /,
the /DT
latest /JJS
results /NNS
appear /VBP
in /IN
today /NN
's /POS
New /NNP
England /NNP
Journal /NNP
of /IN
Medicine /NNP
, /,
a /DT
forum /NN
likely /JJ
to /TO
bring /VB
new /JJ
attention /NN
to /TO
the /DT
problem /NN
. /.
A /DT
Lorillard /NNP
spokewoman /NN
said /VBD
, /,
`` /``
This /DT
is /VBZ
an /DT
old /JJ
story /NN
. /.
We /PRP
're /VBP
talking /VBG
about /IN
years /NNS
ago /IN
before /IN
anyone /NN
heard /VBD
of /IN
asbestos /NN
having /VBG
any /DT
questionable /JJ
properties /NNS
. /.
There /EX
is /VBZ
no /DT
asbestos /NN
in /IN
our /PRP$
products /NNS
now /RB
. /.
'' /''
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
Mr./NNP Vinken/NNP is/VBZ chairman/NN of/IN Elsevier/NNP N.V./NNP ,/, the/DT Dutch/NNP publishing/VBG group/NN ./.
Rudolph/NNP Agnew/NNP ,/, 55/CD years/NNS old/JJ and/CC former/JJ chairman/NN of/IN Consolidated/NNP Gold/NNP Fields/NNP PLC/NNP ,/, was/VBD named/VBN a/DT nonexecutive/JJ director/NN of/IN this/DT British/JJ industrial/JJ conglomerate/NN ./.
A/DT form/NN of/IN asbestos/NN once/RB used/VBN to/TO make/VB Kent/NNP cigarette/NN filters/NNS has/VBZ caused/VBN a/DT high/JJ percentage/NN of/IN cancer/NN deaths/NNS among/IN a/DT group/NN of/IN workers/NNS exposed/VBN to/TO it/PRP more/RBR than/IN 30/CD years/NNS ago/IN ,/, researchers/NNS reported/VBD ./.
The/DT asbestos/NN fiber/NN ,/, crocidolite/NN ,/, is/VBZ unusually/RB resilient/JJ once/IN it/PRP enters/VBZ the/DT lungs/NNS ,/, with/IN even/RB brief/JJ exposures/NNS to/TO it/PRP causing/VBG symptoms/NNS that/WDT show/VBP up/IN decades/NNS later/JJ ,/, researchers/NNS said/VBD ./.
Lorillard/NNP Inc./NNP ,/, the/DT unit/NN of/IN New/JJ York-based/JJ Loews/NNP Corp./NNP that/WDT makes/VBZ Kent/NNP cigarettes/NNS ,/, stopped/VBD using/VBG crocidolite/NN in/IN its/PRP$ Micronite/NN cigarette/NN filters/NNS in/IN 1956/CD ./.
Although/IN preliminary/JJ findings/NNS were/VBD reported/VBN more/RBR than/IN a/DT year/NN ago/IN ,/, the/DT latest/JJS results/NNS appear/VBP in/IN today/NN 's/POS New/NNP England/NNP Journal/NNP of/IN Medicine/NNP ,/, a/DT forum/NN likely/JJ to/TO bring/VB new/JJ attention/NN to/TO the/DT problem/NN ./.
A/DT Lorillard/NNP spokewoman/NN said/VBD ,/, ``/`` This/DT is/VBZ an/DT old/JJ story/NN ./.
We/PRP 're/VBP talking/VBG about/IN years/NNS ago/IN before/IN anyone/NN heard/VBD of/IN asbestos/NN having/VBG any/DT questionable/JJ properties/NNS ./.
There/EX is/VBZ no/DT asbestos/NN in/IN our/PRP$ products/NNS now/RB ./. ''/''
// main.cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
typedef std::vector<std::pair<std::string, std::string> > sentence;
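// Input iterator that yields one parsed sentence per line of the stream.
class my_iterator {
    std::istream* is;  // NULL marks the end-of-stream iterator
    std::string line;
    sentence s;

    // Read and parse the next line; become the end iterator when none is left.
    void advance() {
        if (!std::getline(*is, line)) {
            is = NULL;
            s.clear();
            return;
        }
        convert();
    }

    // Split each whitespace-separated token at its last '/'.
    void convert() {
        std::istringstream iss(line);
        s.clear();
        for (std::istream_iterator<std::string> it(iss), end; it != end; ++it) {
            const std::string& token = *it;
            size_t idx = token.find_last_of('/');
            if (idx == std::string::npos)
                idx = token.size();  // untagged token: the whole token is the word
            std::string word = token.substr(0, idx);
            // The tag keeps its leading '/', which is what expected.txt contains.
            std::string part_of_speech = token.substr(idx);
            s.push_back(std::make_pair(word, part_of_speech));
        }
    }

public:
    // std::iterator is deprecated since C++17, so the traits are spelled out.
    typedef std::input_iterator_tag iterator_category;
    typedef sentence value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const sentence* pointer;
    typedef sentence reference;

    my_iterator() : is(NULL) {}

    my_iterator(std::istream* stream) : is(stream), line() {
        if (is)
            advance();
    }

    my_iterator& operator++() {
        assert(is);
        if (is)
            advance();
        return *this;
    }

    // Post-increment returns the old value; the original version called
    // itself and recursed forever.
    my_iterator operator++(int) {
        my_iterator old(*this);
        ++(*this);
        return old;
    }

    sentence operator*() const {
        return s;
    }

    const sentence* operator->() const {
        return &s;
    }

    bool operator==(const my_iterator& rhs) const {
        return is == rhs.is;
    }

    bool operator!=(const my_iterator& rhs) const {
        return is != rhs.is;
    }
};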
int main() {
    // Print every word/tag pair read from standard input.
    for (my_iterator it(&std::cin), end; it != end; ++it)
        for (sentence::const_iterator si = it->begin(); si != it->end(); ++si)
            std::cout << si->first << " " << si->second << std::endl;

    // Same again, reading from a file.
    {
        std::ifstream is("input.txt");
        for (my_iterator it(&is), end; it != end; ++it)
            for (sentence::const_iterator si = it->begin(); si != it->end(); ++si)
                std::cout << si->first << " " << si->second << std::endl;
    }

    // Copying to sentences.begin() on an empty vector is undefined behavior;
    // assign() sizes the vector from the iterator range instead.
    {
        std::ifstream is("input.txt");
        std::vector<sentence> sentences;
        sentences.assign(my_iterator(&is), my_iterator());
    }

    // std::copy works with a back_inserter, which grows the vector as it goes.
    {
        std::ifstream is("input.txt");
        std::vector<sentence> sentences;
        std::copy(my_iterator(&is), my_iterator(), std::back_inserter(sentences));
    }

    // input.txt holds 10 sentences, one per line.
    {
        std::ifstream is("input.txt");
        assert(std::distance(my_iterator(&is), my_iterator()) == 10);
    }

    return 0;
}
# Makefile
CXX = g++
TARGET = main

all: $(TARGET)

$(TARGET): main.cpp
	$(CXX) -Wall -Werror -o $@ $<

test: all
	@cat input.txt | ./$(TARGET) | diff expected.txt -

clean:
	rm -f $(TARGET)

.PHONY: all test clean