#!/bin/bash
# Fail if any staged Jupyter notebook still contains cell outputs.
# Uses bash because of the [[ ... ]] tests below.
for f in $(git diff --name-only --cached); do
  if [[ $f == *.ipynb ]]; then
    # underscore prints "true" when at least one cell has a non-empty outputs array
    has_content="$(cat "$f" | underscore select '.cells' | underscore flatten --shallow | underscore any 'value?.outputs?.length > 0')"
    if [[ $has_content == true ]]; then
      echo "Notebook $f output cells are not clean!"
      echo "Unstage $f and clean its cell outputs"
      exit 1
    fi
  fi
done
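The same check can be sketched without the `underscore` CLI, assuming the notebook is standard nbformat JSON with a top-level `cells` array. The function name `has_outputs` and the two sample notebooks are hypothetical and are not part of the original hook.

```python
import json


def has_outputs(notebook_json: str) -> bool:
    """Return True if any cell in the notebook JSON has a non-empty outputs list."""
    notebook = json.loads(notebook_json)
    # Markdown cells carry no "outputs" key, hence the .get() lookups.
    return any(cell.get("outputs") for cell in notebook.get("cells", []))


# Hypothetical minimal notebooks: one clean, one with a leftover output.
clean = '{"cells": [{"cell_type": "code", "source": [], "outputs": []}]}'
dirty = '{"cells": [{"cell_type": "code", "source": [], "outputs": [{"output_type": "stream"}]}]}'
print(has_outputs(clean), has_outputs(dirty))
```

A hook built on this would read each staged `.ipynb` file and exit non-zero when `has_outputs` returns True.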
# Originally from the Apache Lucene Persian stop words list
# link: https://github.com/apache/lucene/raw/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/fa/stopwords.txt
# This file was created by Jacques Savoy and is distributed under the BSD license.
# See http://members.unine.ch/jacques.savoy/clef/index.html.
# Also see http://www.opensource.org/licenses/bsd-license.html
# Note: by default this file is used after normalization, so when adding entries,
# replace every occurrence of Arabic 'ي' with Persian 'ی'
انان
نداشته
سراسر
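A minimal sketch of how such a list might be loaded, assuming one entry per line with `#` marking comment lines. The function name `load_stopwords` and the sample lines are hypothetical. The yeh replacement follows the note in the file header.

```python
def load_stopwords(lines):
    """Parse stop-word lines: skip blanks and '#' comments, normalize Arabic yeh."""
    words = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Replace Arabic yeh (U+064A) with Persian yeh (U+06CC).
        words.add(entry.replace("\u064a", "\u06cc"))
    return words


# Hypothetical sample: a comment, a blank line, one entry from the list,
# and one illustrative word that ends in Arabic yeh.
sample = ["# comment", "", "\u0627\u0646\u0627\u0646", "\u0641\u0627\u0631\u0633\u064a"]
stopwords = load_stopwords(sample)
```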
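# Maps the character variants commonly used to write Persian (Arabic letters
# and presentation forms) to their standard Persian letters.
# Converts ASCII and Arabic-Indic digits to Persian digits.
# Converts punctuation marks to their Persian equivalents.
def normalize(text):
    # The character at index i of srcLine is replaced by the character at index i of trgLine.
    srcLine = ',?%1234567890;“”ﭘﺮﺯﻭﺻكىيﻬ٧ﺍﭙﺚﻖﯿﮎﺗﯼيﺴﻯﮥﺻﯾﺸﺿﻔﻐﻴﺞ٦ﻡےكﻩﺟﺜﻥﺰﻟﭻﻰﻣﻉﻳﺪﻤﺒ٤ﺫﺠﻲﺳﻓﺭﺨﮏﺕﻧﺵﮑ١ﮔﻗ٢ﺘﻱﻭﮯ٥ٱﻫﺩ٨ﻏﻦﻠﺺﺼﭘﺖﺏﻕﺲﺷۀﻎﻝﭽﻮﻑﺶﻨﺮﮕﮐﺣ٩٠٣ةﻍﺝﻒﭼﮓﺹﻌﯽﺛﻄڪﺬﻃﻢﻋﺑﺧﻂﺤﺥﻊﺁﻜﻞﺦﻛﺎﺯﻘﺱﻪہﺐى'
    trgLine = '،؟٪۱۲۳۴۵۶۷۸۹۰؛""پرزوصکییه۷اپثقیکتییسیهصیشضفغیج۶میکهجثنزلچیمعیدمب۴ذجیسفرخکتنشک۱گق۲تیویهاهد۸غنلصصپتبقسشهغلچوفشنرگکح۹۰۳هغجفچگصعیثطکذطمعبخطحخعآکلخکازقسههبی'
    repl = str.maketrans(srcLine, trgLine)
    # Assumed final step: the source is cut off after the table is built.
    return text.translate(repl)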
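The table-based approach can be illustrated on a much smaller mapping. This is a sketch, not the original table: `SRC`, `TRG`, and `normalize_small` are hypothetical names, and the mapping covers only seven characters. `str.maketrans` requires both strings to have the same length, which the assertion makes explicit.

```python
# Source characters: Arabic yeh, Arabic kaf, Arabic-Indic digits 1-3, ASCII comma, ASCII question mark.
SRC = "\u064a\u0643\u0661\u0662\u0663,?"
# Targets: Persian yeh, Persian keheh, Persian digits 1-3, Arabic comma, Arabic question mark.
TRG = "\u06cc\u06a9\u06f1\u06f2\u06f3\u060c\u061f"

# str.maketrans(x, y) raises ValueError when the two strings differ in length.
assert len(SRC) == len(TRG)
TABLE = str.maketrans(SRC, TRG)


def normalize_small(text):
    """Replace each character found in SRC with the character at the same index in TRG."""
    return text.translate(TABLE)


# A word written with Arabic kaf and Arabic yeh comes back with the Persian letters.
result = normalize_small("\u0643\u062a\u0627\u0628\u064a")
```

Characters that do not appear in the table pass through `translate` unchanged.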
// Check Iranian National Code Validity - Clojure, C#, Ruby, JavaScript, Python, Scala, Java 8, PHP, C, Go, Swift, Kotlin
// Earlier versions treated codes made of identical digits as invalid, but
// such codes are not invalid: http://www.fardanews.com/fa/news/127747
// Some of the implementations are not fast; for the best speed, base your
// own version on the C or Go implementation.
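The implementations themselves are not shown in this excerpt. The sketch below follows the commonly described checksum rule for the 10-digit code, which is an assumption here and not taken from the source: weight the first nine digits from 10 down to 2, take the sum modulo 11, and compare against the last digit. The function name `is_valid_national_code` is hypothetical.

```python
def is_valid_national_code(code: str) -> bool:
    """Check a 10-digit code against the assumed mod-11 checksum rule."""
    if len(code) != 10 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights 10, 9, ..., 2 applied to the first nine digits.
    weighted = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    remainder = weighted % 11
    # Assumed rule: the check digit equals the remainder when it is below 2,
    # otherwise 11 minus the remainder.
    expected = remainder if remainder < 2 else 11 - remainder
    return digits[9] == expected
```

In line with the note above, this sketch does not reject codes made of identical digits; they are judged by the checksum alone.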
/** |