Sadman Kabir Soumik sksoumik

@sksoumik
sksoumik / tf_image_check.py
Created April 20, 2022 19:27
Find images that TensorFlow rejects because of an unsupported format or corruption, both of which raise InvalidArgumentError
"""
Error:
InvalidArgumentError: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
Find the images using the following code and delete those images from the dataset.
"""
from pathlib import Path
import imghdr
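The preview above is cut off after its imports, so the author's actual check is not shown. As a minimal sketch of the idea, assuming the goal is to flag files whose leading bytes match none of the four formats named in the error message, the following uses only the standard library (`imghdr` was removed in Python 3.13). The names `SIGNATURES`, `sniff_format`, and `find_unsupported_images` are hypothetical, and a header check like this will not catch files that are damaged after their first few bytes.

```python
from pathlib import Path

# Leading "magic" bytes for the four formats named in the error message.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_format(path):
    """Return the detected format name, or None if no signature matches."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for name, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):
            return name
    return None


def find_unsupported_images(data_dir, extensions=(".jpg", ".jpeg", ".png", ".gif", ".bmp")):
    """List files with an image extension whose header matches no known format."""
    bad = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            if sniff_format(path) is None:
                bad.append(path)
    return bad
```

Review the returned paths before deleting anything, since a file can have a valid header and still fail to decode.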
@sksoumik
sksoumik / sleep_request.py
Created February 10, 2022 16:41
Pause for a random interval between requests
import random
import time

from torpy.http.requests import TorRequests


def send_request(url, HEADERS):
    with TorRequests() as tor_requests:
        with tor_requests.get_session() as sess:
            # print the IP address of the proxy
            print(sess.get("http://httpbin.org/ip").json())
            # pause for a random whole number of seconds between 1 and 3
            time.sleep(random.randint(1, 3))
            html_content = sess.get(url, headers=HEADERS, timeout=10).text
            # your scraping code here ..
            print(HEADERS["User-Agent"])
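The random pause in the snippet above can also be factored into a small helper. This is only a sketch: `polite_pause` and its parameters are hypothetical names, and it uses `random.uniform` for a fractional delay rather than the whole-second `random.randint` in the gist. The `sleep` argument exists so the helper can be exercised without actually waiting.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_pause()` before each `sess.get(...)` would reproduce the gist's behavior with a less predictable interval.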
@sksoumik
sksoumik / rotate_user_agent.py
Created February 10, 2022 16:34
Rotate user agent and IP address with each request
import random
from email.header import Header
from wsgiref import headers

import requests
import urllib3
from torpy.http.requests import TorRequests

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# https://developers.whatismybrowser.com/useragents/explore/operating_system_name/linux/
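The rest of this gist is not shown in the preview. One plausible way to rotate the user agent, sketched here under that assumption, is to pick a random string from a list and build the headers dictionary from it on every request. `USER_AGENTS` and `build_headers` are hypothetical names, and the strings are placeholders for illustration, not a maintained list.

```python
import random

# Placeholder Linux user-agent strings; replace with a current list.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
]


def build_headers(user_agents=USER_AGENTS, rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    return {"User-Agent": rng.choice(user_agents)}
```

The result can be passed as the `headers` argument of a session's `get` call, so each request may present a different user agent.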
@sksoumik
sksoumik / rotate_IP.py
Last active February 10, 2022 10:13
Rotate IP address with each request
import requests
import urllib3
from torpy.http.requests import TorRequests

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def send_request(url):
    with TorRequests() as tor_requests:
        with tor_requests.get_session() as sess:
            # print the IP address of the proxy
@sksoumik
sksoumik / bigquery_table.py
Last active December 11, 2021 22:21
Create a BigQuery table from Google Compute Engine
def create_bigquery_table(df, dataset_tablename, gcp_project_name):
    # df: your pandas dataframe
    # dataset_tablename (str): "dataset_name.table_name"
    # gcp_project_name: GCP project ID
    # Requires the pandas-gbq package.
    df.to_gbq(
        destination_table=dataset_tablename,
        project_id=gcp_project_name,
        if_exists="replace",  # one of: "fail", "replace", "append"
    )
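Because the destination is passed as a single `dataset_name.tablename` string, a malformed value only surfaces when the upload is attempted. A small pre-check, sketched below, can catch that earlier. `split_destination` is a hypothetical helper that is not part of pandas or pandas-gbq, and it assumes only the two-part form described in the comments above.

```python
def split_destination(dataset_tablename):
    """Split "dataset_name.tablename" into (dataset, table), or raise ValueError."""
    parts = dataset_tablename.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected 'dataset_name.tablename', got {dataset_tablename!r}"
        )
    return parts[0], parts[1]
```

Calling it at the top of `create_bigquery_table` would reject values such as `"orders"` or `"sales."` before any request is made.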
@sksoumik
sksoumik / main.dart
Created August 22, 2021 16:52
Simple Hello World app in Flutter
import 'package:english_words/english_words.dart';
import 'package:flutter/material.dart';

void main() => runApp(MyApp());

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    final wordPair = WordPair.random();
    return MaterialApp(
      theme: ThemeData(primaryColor: Colors.black),
@sksoumik
sksoumik / resume.json
Last active March 29, 2021 16:07
Resume from JSON
{
  "meta": {
    "theme": "stackoverflow",
    "whatever": {
      "x": "dsdsds",
      "y": [],
      "z": {
        "z1": 1,
        "z2": "2"
      }
@sksoumik
sksoumik / gitpush.sh
Created November 23, 2020 08:11
gitpush automation
eval "$(ssh-agent -s)"
ssh-add
git add .
git commit -m "update message"
git push origin master
@sksoumik
sksoumik / selecting_text_based_on_sententece_length.py
Created September 24, 2020 10:17
Filter pandas rows by text length (word count)
df = df[df['text'].str.split().str.len() > 10]
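To make the behavior of this one-liner concrete, here is a small worked example on made-up data. It assumes a `text` column of whitespace-separated words, as in the gist; the sample rows and the names `word_counts` and `filtered` are illustrative only. Note that the comparison is strict, so a row with exactly 10 words is dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "text": [
            "too short",
            "one two three four five six seven eight nine ten",
            "one two three four five six seven eight nine ten eleven",
        ]
    }
)

# Split each string on whitespace, then count the resulting tokens per row.
word_counts = df["text"].str.split().str.len()

# Keep only rows with more than 10 words.
filtered = df[word_counts > 10]
```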