How to calculate the alignment between BERT and spaCy tokens effectively and robustly

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve many tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. A minimal usage sketch follows, and after it the library and demo site links:
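To make the idea concrete, here is a minimal sketch of what such an alignment looks like. It assumes the Python package is published as `pytokenizations`, is imported as `tokenizations`, and exposes a `get_alignments` function; the token lists and the printed outputs are illustrative, so check the library's README for the exact API.

```python
# Minimal sketch (assumption: package published as `pytokenizations`,
# imported as `tokenizations`, exposing `get_alignments`).
import tokenizations

# The same word tokenized two different ways: BERT-style subwords
# on one side, a spaCy-style single token on the other.
bert_tokens = ["token", "##ization"]
spacy_tokens = ["tokenization"]

# a2b[i] lists the indices in spacy_tokens that bert_tokens[i] maps to;
# b2a is the reverse mapping.
a2b, b2a = tokenizations.get_alignments(bert_tokens, spacy_tokens)

print(a2b)  # expected: [[0], [0]]
print(b2a)  # expected: [[0, 1]]
```

Both directions are returned because the two tokenizations can split the same text at different granularities.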