Pradeep Singh mepsrajput

## mongodb_cheat_sheet.md

      
Created April 22, 2020 10:05, forked from bradtraversy/mongodb_cheat_sheet.md

MongoDB Cheat Sheet

Show All Databases

show dbs

Show Current Database


## big_data.md

      
Last active May 30, 2020 10:26

Big Data

Hadoop Vocabulary

Here is a list of some terms associated with Hadoop. You'll learn more about these terms and how they relate to Spark in the rest of the lesson.

Hadoop - an ecosystem of tools for big data storage and data analysis. Hadoop is an older system than Spark but is still used by many companies. The major difference between Spark and Hadoop is how they use memory. Hadoop writes intermediate results to disk whereas Spark tries to keep data in memory whenever possible. This makes Spark faster for many use cases.
Hadoop MapReduce - a system for processing and analyzing large data sets in parallel.
Hadoop YARN - a resource manager that schedules jobs across a cluster. The manager keeps track of what computer resources are available and then assigns those resources to specific tasks.
Hadoop Distributed File System (HDFS) - a big data storage system that splits data into chunks and stores the chunks across a cluster of computers.
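
The MapReduce and HDFS ideas in the list above can be sketched on a single machine in plain Python. This is only an illustration under stated assumptions, not how Hadoop itself is implemented: the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the sample sentences are made up for this example, and the "chunks" are simply list slices standing in for blocks stored across a cluster.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Split the input into fixed-size chunks, loosely mimicking HDFS blocks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are processed one after another.
    mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "spark keeps data in memory",
        "hadoop writes data to disk",
        "spark and hadoop",
    ]
    print(word_count(sample))
```

Because every map call only sees its own chunk, the map step could run in parallel; only the shuffle needs to bring results for the same key together.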

As Hadoop matured, other tools were developed t

  
## SpaCy.md

      
Created April 5, 2020 13:43

Spacy

1. Processing A Line of Text

# Import the English language class
from spacy.lang.en import English

# Create the nlp object
nlp = English()


## blogscraping.py
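import requests
from bs4 import BeautifulSoup
from csv import writer

# Download the sample blog page
response = requests.get('http://codedemos.com/sampleblog/')

# Parse the returned HTML with Python's built-in parser
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every element whose CSS class is "post-preview"
posts = soup.find_all(class_='post-preview')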
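
The script imports `writer` from `csv` but the preview ends before it is used. A minimal sketch of how scraped records could be written out is shown here, using only the standard library. The record fields (`title`, `link`, `date`), the `write_posts` helper, and the sample data are all hypothetical; the real script would have to extract such values from each element in `posts`.

```python
import io
from csv import writer

# Hypothetical records standing in for values extracted from the scraped posts.
sample_posts = [
    {"title": "First post", "link": "/first", "date": "2020-01-01"},
    {"title": "Hello, world", "link": "/hello", "date": "2020-01-02"},
]


def write_posts(posts, csv_file):
    """Write a header row followed by one row per post to an open text file."""
    csv_writer = writer(csv_file)
    csv_writer.writerow(["Title", "Link", "Date"])
    for post in posts:
        csv_writer.writerow([post["title"], post["link"], post["date"]])


if __name__ == "__main__":
    # An in-memory buffer keeps the example self-contained; a real script
    # would more likely use open("posts.csv", "w", newline="").
    buffer = io.StringIO()
    write_posts(sample_posts, buffer)
    print(buffer.getvalue())
```

The `csv` module quotes fields that contain commas, so a title such as "Hello, world" still ends up in a single column.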

## python.md

      
Created March 25, 2020 11:59

Python Notes and Resources

Dictionary: https://www.geeksforgeeks.org/iterate-over-a-dictionary-in-python/
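
For quick reference, the common ways of iterating over a dictionary can be sketched as follows. The `inventory` data is invented for illustration and is not taken from the linked article.

```python
# Hypothetical example data.
inventory = {"apples": 3, "pears": 5, "plums": 0}

# Iterating over the dictionary itself yields its keys in insertion order.
keys = [name for name in inventory]

# .values() yields only the values.
total = sum(inventory.values())

# .items() yields (key, value) pairs, which can be unpacked in the loop.
lines = [f"{name}: {quantity}" for name, quantity in inventory.items()]

# A dict comprehension builds a filtered copy without mutating the original.
in_stock = {name: qty for name, qty in inventory.items() if qty > 0}

if __name__ == "__main__":
    print(keys)
    print(total)
    print(lines)
    print(in_stock)
```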

  
## NLP.md

      
Last active March 29, 2020 16:13

Natural Language Processing

A high-level standard workflow for any NLP project

Text Document -> Text pre-processing -> Text parsing & Exploratory Data Analysis -> Text Representation & Feature Engineering
-> Modeling and/or Pattern Mining -> Evaluation & Deployment
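
The first stages of this workflow (pre-processing and text representation) can be sketched with the standard library alone. This is a toy illustration, not a recommended production pipeline: the stop-word set is a tiny made-up sample, and the helper names (`preprocess`, `bag_of_words`, `vectorize`) are assumptions introduced here. A real project would more likely rely on a library such as spaCy or NLTK.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to"}


def preprocess(text):
    """Pre-processing: lowercase, tokenize on word characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def bag_of_words(tokens):
    """Representation: count how often each token occurs."""
    return Counter(tokens)


def vectorize(documents):
    """Feature engineering: turn documents into count vectors over a shared vocabulary."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        counts = bag_of_words(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = vectorize(["The cat sat.", "The cat and the dog."])
    print(vocab)
    print(vecs)
```

The resulting count vectors are the kind of input that the later modeling and evaluation stages would consume.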
NLP Uses


Machine Translation
Speech Recognition
Sentiment Analysis


## pyspark.md

      
Last active December 12, 2021 10:43

PySpark Notes

PySpark Sub Packages


pyspark.sql module
pyspark.streaming module
pyspark.ml package
pyspark.mllib package

Important classes of the pyspark.sql package


pyspark.sql.SparkSession: Main entry point for DataFrame and SQL functionality.
pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
pyspark.sql.Column: A column expression in a DataFrame.


## statistics.md

      
Last active September 13, 2020 09:11

Statistics Notes for Data Science and ML

Exploratory data analysis

• anecdotal evidence: Evidence, often personal, that is collected casually rather than by a well-designed study.
• population: A group we are interested in studying. “Population” often refers to a group of people, but the term is used for other subjects, too.
• cross-sectional study: A study that collects data about a population at a particular point in time.
• cycle: In a repeated cross-sectional study, each repetition of the study is called a cycle.

  
## Data_Set_Operations.md

      
Last active November 14, 2021 02:59

My SAS Notes

1. Read Raw Data

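1.1 Reading an ASCII (Text) Data Set

/* Read the text file starting at the second record (FIRSTOBS=2), typically to skip a header row */
DATA TEMP;
   INFILE '/folders/myfolders/World Happiness/practice text dataset.txt' FIRSTOBS=2;
   /* ID starts at column 1; Name occupies columns 5-17; Location is read next as a character value */
   INPUT @1 ID @5 Name $ 5-17 Location $;
RUN;

/* Print the resulting data set */
PROC PRINT DATA = TEMP;
RUN;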
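
For readers who think in Python, the column positions used by the INPUT statement above can be mirrored with plain string slicing. This is a rough analogue only, not a reimplementation of SAS input rules: it assumes the ID fits in columns 1-4 and treats everything after column 17 as the location. The helper names and the sample rows are invented for illustration.

```python
def parse_record(line):
    """Slice one fixed-width line using the column positions from the INPUT statement."""
    return {
        "ID": int(line[0:4]),           # @1 ID: numeric value starting at column 1
        "Name": line[4:17].strip(),     # Name $ 5-17: columns 5 through 17
        "Location": line[17:].strip(),  # Location $: assumed to be the rest of the line
    }


def read_records(lines):
    """Skip the first line, like FIRSTOBS=2, and parse the remaining non-empty lines."""
    return [parse_record(line) for line in lines[1:] if line.strip()]


# Hypothetical sample rows, padded so each field starts in the expected column.
sample_lines = [
    f"{'ID':<4}{'Name':<13}{'Location'}",
    f"{1:<4}{'Asha Verma':<13}{'Delhi'}",
    f"{2:<4}{'Ravi Kumar':<13}{'Pune'}",
]

if __name__ == "__main__":
    for record in read_records(sample_lines):
        print(record)
```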


## ds_python_notes.md

      
Last active January 30, 2020 10:52

Personal Notes of Data Science and ML

1. Important Links

A Medium publication sharing concepts, ideas, and codes.
https://towardsdatascience.com/

Pandas DataFrame - Playing with CSV Files
https://towardsdatascience.com/pandas-dataframe-playing-with-csv-files-944225d19ff

Machine Learning | An Introduction
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0