import json
import multiprocessing as mp
import re
from collections import defaultdict
from typing import List, Optional, Set
from datasets import load_dataset
from datasketch import MinHash, MinHashLSH, minhash
from dpu_utils.utils.iterators import ThreadedIterator
from tqdm import tqdm
[
"https://github.com/minimaxir/big-list-of-naughty-strings.git",
"https://github.com/shadowsocks/shadowsocks.git",
"https://github.com/littlecodersh/ItChat.git",
"https://github.com/google-research/bert.git",
"https://github.com/0voice/interview_internal_reference.git",
"https://github.com/keon/algorithms.git",
"https://github.com/satwikkansal/wtfpython.git",
"https://github.com/drduh/macOS-Security-and-Privacy-Guide.git",
"https://github.com/google/python-fire.git",
/**
 * @name Use with open() as
 * @description Consider using a context manager
 * @kind problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision medium
 * @id py/use-context-manager
 */
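
The query body is not shown in this preview, so the following is only a sketch of the Python pattern this header appears to target. The helper names `read_without_context_manager` and `read_with_context_manager` are hypothetical and not taken from the gist.

```python
import os
import tempfile


def read_without_context_manager(path: str) -> str:
    # Manual open/close: if read() raises, close() is never reached
    # and the file handle stays open until garbage collection.
    f = open(path, encoding="utf-8")
    data = f.read()
    f.close()
    return data


def read_with_context_manager(path: str) -> str:
    # The with-statement closes the file when the block exits,
    # including when an exception is raised inside it.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Usage: write a small temporary file, then read it back both ways.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
    manual = read_without_context_manager(path)
    managed = read_with_context_manager(path)
```

Both functions return the same text; the difference is only in how reliably the handle is released.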
/**
 * @name Join paths correctly
 * @description use os.path.join or an alternative to correctly join paths.
 * @kind path-problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision medium
 * @tags speed
 * @sub-severity low
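
The preview is cut off before the query itself, so this is a guess at the kind of Python code it is meant to flag. The function names `join_by_concatenation` and `join_with_os_path` are made up for illustration.

```python
import os
import os.path


def join_by_concatenation(base: str, name: str) -> str:
    # Hard-codes "/" and blindly adds a separator even if base already ends with one.
    return base + "/" + name


def join_with_os_path(base: str, name: str) -> str:
    # os.path.join inserts the platform separator only where one is needed.
    return os.path.join(base, name)


# A base directory that already ends with the platform separator.
base_dir = "data" + os.sep
joined = join_with_os_path(base_dir, "train.json")
concatenated = join_by_concatenation(base_dir, "train.json")
```

`pathlib.Path(base) / name` is one of the alternatives the description alludes to.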
/**
 * @name Iterate over the items of a dictionary using `.items()`.
 * @description instead of iterating over the keys of a dictionary and then indexing the dictionary,
 *              use `.items()` to retrieve the key-value pairs.
 * @kind problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision medium
 * @tags speed
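
A small Python sketch of the two loop styles the description contrasts. The dictionary contents and the function names `pairs_by_indexing` and `pairs_by_items` are illustrative assumptions, not part of the gist.

```python
from typing import Dict, List, Tuple


def pairs_by_indexing(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    # Iterates over keys, then performs a second lookup for every value.
    result = []
    for key in counts:
        result.append((key, counts[key]))
    return result


def pairs_by_items(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    # .items() yields each key together with its value, with no extra lookup.
    result = []
    for key, value in counts.items():
        result.append((key, value))
    return result


example = {"py": 3, "json": 1}
indexed = pairs_by_indexing(example)
via_items = pairs_by_items(example)
```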
/**
 * @name Use islice() to slice an iterable.
 * @description instead of converting an iterable to a list and then slicing it
 *              use `islice` for efficiency.
 * @kind problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision high
 * @tags speed
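
To make the description concrete, here is a minimal Python sketch, assuming the flagged pattern is `list(iterable)[:n]`. The helper names `first_n_via_list` and `first_n_via_islice` are hypothetical.

```python
from itertools import count, islice
from typing import Iterable, List


def first_n_via_list(iterable: Iterable[int], n: int) -> List[int]:
    # Materializes every element before slicing; never returns for an endless iterable.
    return list(iterable)[:n]


def first_n_via_islice(iterable: Iterable[int], n: int) -> List[int]:
    # Consumes at most n elements from the iterable.
    return list(islice(iterable, n))


from_range = first_n_via_list(range(10), 3)
from_range_lazy = first_n_via_islice(range(10), 3)
# count() is infinite, so only the islice variant can be used here.
from_infinite = first_n_via_islice(count(), 3)
```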
/**
 * @name Use hash()
 * @description use hash() instead of __hash__
 * @kind problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision high
 * @tags style
 * @sub-severity low
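
The sketch below shows one reason to prefer the built-in, assuming the query flags direct calls such as `obj.__hash__()`. The class `HugeHash` is invented for this example.

```python
class HugeHash:
    """Toy class whose __hash__ returns an integer too large for a machine word."""

    def __hash__(self) -> int:
        return 2 ** 100


obj = HugeHash()

# Calling the dunder directly returns the raw value from the method.
raw = obj.__hash__()

# The built-in hash() is what dicts and sets use; it reduces an oversized
# result to a value that fits the interpreter's hash width.
reduced = hash(obj)

# For ordinary objects the two calls agree within one process.
same_for_str = hash("abc") == "abc".__hash__()
```

Here `raw` and `reduced` differ, so code that calls `__hash__` directly can disagree with how containers actually hash the object.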
/**
 * @name Prefer iglob
 * @description Use iglob instead of glob
 * @kind problem
 * @tags speed
 * @problem.severity recommendation
 * @sub-severity low
 * @precision high
 * @id py/use-iglob
 */
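
As a hedged illustration of the difference this header points at: `glob.glob` returns a complete list, while `glob.iglob` returns an iterator that yields matches as they are consumed. The file names below are made up for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to match against.
    for name in ("a.py", "b.py", "c.txt"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8"):
            pass

    pattern = os.path.join(tmp, "*.py")

    # glob.glob builds the whole result list up front.
    eager = glob.glob(pattern)

    # glob.iglob yields paths one at a time, which avoids holding
    # every match in memory when only a loop over them is needed.
    lazy = glob.iglob(pattern)

    eager_names = sorted(os.path.basename(p) for p in eager)
    lazy_names = sorted(os.path.basename(p) for p in lazy)
```

The speed tag presumably refers to large directory trees, where skipping the intermediate list matters most.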
/**
 * @name Do not check leap year using modulo
 * @description Use the system functions to check for leap years instead of modulo. Very low precision.
 * @kind problem
 * @tags reliability
 *       maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision low
 * @id py/no-leap-check-with-modulo
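
A short Python sketch of why a hand-written modulo check is error-prone, assuming the "system function" the description has in mind is `calendar.isleap`. The function `is_leap_naive` is a hypothetical example of the flagged pattern.

```python
import calendar


def is_leap_naive(year: int) -> bool:
    # Common but incomplete rule: ignores the century exceptions.
    return year % 4 == 0


def is_leap_stdlib(year: int) -> bool:
    # calendar.isleap applies the full Gregorian rule
    # (divisible by 4, except centuries not divisible by 400).
    return calendar.isleap(year)


# 1900 is divisible by 4 but is not a leap year.
naive_1900 = is_leap_naive(1900)
stdlib_1900 = is_leap_stdlib(1900)
stdlib_2000 = is_leap_stdlib(2000)
```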
/**
 * @name Use math.inf or math.nan
 * @description Prefer using math.inf instead of parsing infinity
 *              from a string
 * @kind problem
 * @tags maintainability
 * @problem.severity recommendation
 * @sub-severity low
 * @precision high
 * @id py/no-float-inf
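
For illustration, and assuming the flagged pattern is a call like `float("inf")` or `float("nan")`, the constants in the `math` module give the same values without string parsing:

```python
import math

# String-parsing forms the query presumably flags.
parsed_inf = float("inf")
parsed_nan = float("nan")

# Named constants from the standard library.
const_inf = math.inf
const_nan = math.nan

# Infinity compares equal to itself; NaN never does, so use math.isnan.
inf_matches = parsed_inf == const_inf
both_nan = math.isnan(parsed_nan) and math.isnan(const_nan)
```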