@sara-02
sara-02 / commoncrawler_sarah.py
Created June 8, 2022 18:51
Testing a slightly modified common crawler.py to accommodate additional parse args.
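One way a script's hard-coded settings might be exposed as command-line arguments is sketched below. This is a minimal illustration, not the gist's actual code: the flag names and defaults are hypothetical, and only the variable name my_number_of_extraction_processes comes from the script's own docstring.

```python
import argparse


def build_parser():
    """Build a parser for crawl settings (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Illustrative options for a news-crawl extraction script."
    )
    # Maps onto my_number_of_extraction_processes from the script's docstring.
    parser.add_argument(
        "--number-of-extraction-processes",
        type=int,
        default=1,
        help="how many worker processes to use for extraction",
    )
    # Hypothetical option: where extracted articles would be written.
    parser.add_argument(
        "--output-dir",
        default="./articles",
        help="directory for the extracted JSON files",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--number-of-extraction-processes", "4"])
my_number_of_extraction_processes = args.number_of_extraction_processes
print(my_number_of_extraction_processes, args.output_dir)
```

Passing an explicit list to parse_args keeps the example testable; in a real script the list would be omitted so that sys.argv is read instead.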
#!/usr/bin/env python
"""
This script downloads WARC files from commoncrawl.org's news crawl and extracts articles from these files. You can
define filter criteria that need to be met (see YOUR CONFIG section), otherwise an article is discarded. Currently, the
script stores the extracted articles in JSON files, but this behaviour can be adapted to your needs in the method
on_valid_article_extracted. To speed up the crawling and extraction process, the script supports multiprocessing. You can
control the number of processes with the parameter my_number_of_extraction_processes.
You can also crawl and extract articles programmatically, i.e., from within
your own code, by using the class CommonCrawlCrawler or the function
sara-02 / trends_csv.ipynb
Last active June 22, 2021 16:43
Scraping tables (of trending hashtags) from dynamically loaded pages via Selenium. https://www.exportdata.io/trends
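Once a dynamically loaded page has been rendered, its tables can be read out of the resulting HTML. The sketch below covers only that parsing step, using the standard library. It assumes the rendered markup is already available as a string (with Selenium this is typically taken from driver.page_source), and the sample table is made up for illustration rather than copied from the site above.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table_rows(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for the rendered page source.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Hashtag</th></tr>
  <tr><td>1</td><td>#example</td></tr>
</table>
"""
print(extract_table_rows(SAMPLE_HTML))
```

The returned list of rows can be passed to csv.writer(...).writerows to produce a CSV file.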
sara-02 / Link_Prediction.ipynb
Created December 20, 2020 16:24
Code snippets for the plots in the link prediction chapter of the social network book.
3.716041911135552
====================
iter = 1
0.0001 news relu 1.0 3.716041911135552
Train on 1024 samples
Epoch 1/5
1024/1024 [==============================] - 10s 10ms/sample - loss: 0.5194
Epoch 2/5
1024/1024 [==============================] - 10s 9ms/sample - loss: 0.3417
Epoch 3/5
3.716041911135552
====================
iter = 1
0.0001 news relu 1.0 3.716041911135552
WARNING:tensorflow:From /home/sarahm/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/sarahm/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
4.89017726799057
Learning Rate 0.0001
========================================
iter= 1
news GRU 1.0 4.89017726799057
Tensor("attention_64/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Tensor("model_128/attention_64/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Train on 1024 samples
Epoch 1/5
1024/1024 [==============================] - 43s 42ms/sample - loss: 0.5893
sara-02 / dynamic_sgd_16.py
Created June 11, 2020 06:11
Varying lr values
4.89017726799057
Learning Rate 0.0001
========================================
iter= 1
news GRU 1.0 4.89017726799057
WARNING:tensorflow:From /home/sarahm/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Tensor("attention/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Tensor("model/attention/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
4.89017726799057
========================================
iter= 1
news GRU 1.0 4.89017726799057
Tensor("attention_41/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Tensor("model_82/attention_41/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Train on 1024 samples
Epoch 1/5
1024/1024 [==============================] - 40s 39ms/sample - loss: 0.3400
Epoch 2/5
4.89017726799057
========================================
iter= 1
news GRU 1.0 4.89017726799057
Tensor("attention_25/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Tensor("model_50/attention_25/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Train on 1024 samples
Epoch 1/5
1024/1024 [==============================] - 38s 37ms/sample - loss: 0.1309
Epoch 2/5
4.89017726799057
========================================
iter= 1
news GRU 1.0 4.89017726799057
Tensor("attention_9/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Tensor("model_18/attention_9/ExpandDims:0", shape=(?, 1, 64), dtype=float32)
Train on 1024 samples
Epoch 1/5
1024/1024 [==============================] - 34s 33ms/sample - loss: 0.1623
Epoch 2/5