@nirmalyaghosh
nirmalyaghosh / extract_span_start_end_positions.py
Created April 9, 2023 13:07
Extracts the start and end positions of the given spans within the given text. Used as a precursor to converting named entities identified by other processes into spaCy's NER format.
from typing import List

def extract_span_start_end_positions(text: str, spans: List[str]):
    """
    Extract positions of indicated spans from indicated text.
    Adapted from: https://www.programcreek.com/python/?CodeExample=convert+to+spans
    Args:
        text: The string to be searched
        spans: The spans of interest within the string. Can be single or
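A minimal sketch of the technique the description names, assuming the goal is to find the character offsets of every occurrence of each span and then shape them into entity annotations. `find_span_positions`, `to_spacy_training_example` and the `ORG`/`GPE` labels are hypothetical, not the gist's own code; the `(text, {"entities": [(start, end, label)]})` layout is the one used in spaCy's v2-style training examples.

```python
import re
from typing import Dict, List, Tuple


def find_span_positions(text: str, spans: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, span) for every occurrence of each span, sorted by start."""
    positions = []
    for span in spans:
        # re.escape makes the span match literally, even if it contains "." or "(".
        for match in re.finditer(re.escape(span), text):
            positions.append((match.start(), match.end(), span))
    return sorted(positions)


def to_spacy_training_example(text: str, labels: Dict[str, str]):
    """Map each span to its label and return (text, {"entities": [(start, end, label)]})."""
    entities = [(start, end, labels[span])
                for start, end, span in find_span_positions(text, list(labels))]
    return text, {"entities": entities}


text = "Apple opened an office in New York."
print(find_span_positions(text, ["Apple", "New York"]))
# [(0, 5, 'Apple'), (26, 34, 'New York')]
print(to_spacy_training_example(text, {"Apple": "ORG", "New York": "GPE"}))
```

Every occurrence of each span is reported, and overlapping matches (for example "New York" and "York") are not resolved here; spaCy does not accept overlapping entity offsets, so those would need filtering first.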
nirmalyaghosh / scrape_news_summaries.py
Created August 7, 2016 13:20
Scrapes Google to gather news summaries. Written in response to http://stackoverflow.com/q/38769951. It was written quickly to illustrate the idea and can certainly be improved.
from bs4 import BeautifulSoup
import requests
import time
from random import randint

def scrape_news_summaries(s):
    # Based on a notebook posted on Kaggle: http://bit.ly/1VJ8pF9
    time.sleep(randint(0, 2))  # pause briefly so Google is less likely to throttle the requests
    # Let requests build and URL-encode the query string (q=<search term>, tbm=nws for News)
    r = requests.get("http://www.google.co.uk/search", params={"q": s, "tbm": "nws"})
    content = r.text
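A search term containing spaces, `&` or `#` has to be percent-encoded before it goes into the query string, otherwise it can split or truncate the query. This is a minimal sketch using only the standard library; `build_news_search_url` is a hypothetical helper, and the `tbm=nws` parameter is taken from the gist's URL.

```python
from urllib.parse import urlencode


def build_news_search_url(query: str) -> str:
    # urlencode escapes reserved characters ('&' becomes %26, spaces become '+'),
    # so the search term stays inside the q parameter; tbm=nws selects the News tab.
    return "http://www.google.co.uk/search?" + urlencode({"q": query, "tbm": "nws"})


print(build_news_search_url("AT&T earnings"))
# http://www.google.co.uk/search?q=AT%26T+earnings&tbm=nws
```

Passing a `params` dict to `requests.get` achieves the same encoding without building the string by hand.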
nirmalyaghosh / IPAddressGenerator
Created February 8, 2014 15:52
A random but realistic IP address generator. It reads country-specific CIDR address files from the resource directory. The files can be downloaded in CIDR format from http://www.ip2location.com/free/visitor-blocker (each line looks like 222.165.0.0/17). For the purpose of testing, a few CIDR addresses have been selected for a few …
package net.nirmalya.util.ipaddr;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.Scanner;
import org.apache.commons.net.util.SubnetUtils;
import org.slf4j.Logger;
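The generator's description outlines the approach: load a country's CIDR blocks, pick one, then pick an address inside it. As a rough illustration, here is a minimal Python sketch of that idea using the standard `ipaddress` module rather than the gist's Java and Apache Commons Net `SubnetUtils`. The function names, the hypothetical country key `XX`, and the choice to weight blocks by size are assumptions, not the gist's implementation.

```python
import ipaddress
import random
from typing import Dict, Iterable, List, Optional


def load_cidrs(lines: Iterable[str]) -> List[str]:
    # One CIDR block per line (e.g. "222.165.0.0/17"); blank lines are skipped.
    return [line.strip() for line in lines if line.strip()]


def random_ip_in_cidr(cidr: str, rng: random.Random) -> str:
    network = ipaddress.ip_network(cidr, strict=False)
    size = network.num_addresses
    # Avoid the network and broadcast addresses when the block is large enough.
    offset = rng.randint(1, size - 2) if size > 2 else rng.randrange(size)
    return str(network.network_address + offset)


def random_ip_for_country(cidrs_by_country: Dict[str, List[str]], country: str,
                          rng: Optional[random.Random] = None) -> str:
    rng = rng if rng is not None else random.Random()
    cidrs = cidrs_by_country[country]
    # Weight each block by its size so addresses across the country are roughly equally likely.
    weights = [ipaddress.ip_network(c, strict=False).num_addresses for c in cidrs]
    cidr = rng.choices(cidrs, weights=weights, k=1)[0]
    return random_ip_in_cidr(cidr, rng)


# Hypothetical country key "XX" with the sample block from the description.
cidrs_by_country = {"XX": load_cidrs(["222.165.0.0/17", ""])}
print(random_ip_for_country(cidrs_by_country, "XX", random.Random(42)))
```

Weighting by block size keeps a handful of small blocks from being over-represented; picking a block uniformly instead would be simpler but would skew the output toward small networks.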