@yunoooo111
Last active March 7, 2024 03:14
# METADATA
# Metadata is data about data: pieces of information that have meaning in relation to another piece of information and that can be created, managed, stored, and preserved like any other data.
# Six types of metadata:
#• Structural metadata: information that establishes the relationships between objects.
#• Descriptive metadata: information that helps in discovering and identifying a data resource.
#• Preservation metadata: information related to the preservation management of collections and information resources.
#• Administrative metadata: information that is useful in managing resources.
#• Provenance metadata: information on the origins of a data resource.
#• Definitional metadata: a common vocabulary that facilitates a shared understanding of the meaning of the data.
# RESUME 2
# DATA SCIENCE
# Data science is the scientific process of transforming data into insight for making better decisions;
# the goal is to turn data into actionable value.
# Data:
# Data refers to raw facts, figures, and statistics that are collected
# and stored for analysis. It can be in various forms, such as numbers, text, images,
# or any other format that represents information.
# Data Science:
# Data science is a multidisciplinary field that uses scientific methods, processes, algorithms,
# and systems to extract insights and knowledge from structured and unstructured data.
# It involves a combination of skills from statistics, mathematics, computer science,
# and domain-specific knowledge to analyze and interpret complex data sets. The goal of data science is to uncover patterns,
# trends, and valuable insights from data that can be used to inform business decisions, solve problems, or gain a competitive advantage.
# Data Scientist:
# A data scientist is a professional who possesses a combination of skills in statistics, mathematics, programming, and domain expertise.
# Data scientists use their skills to analyze large and complex data sets, develop algorithms, and create predictive models to extract meaningful insights.
# They work on identifying patterns, trends, and correlations in data to help organizations make data-driven decisions.
# Data scientists also play a crucial role in designing and implementing machine learning models, building data pipelines, and communicating findings to non-technical stakeholders.
# Foundational aspects of data science
#• Mathematics:
#• Foundational mathematical concepts, such as functions, relations, assumptions, conclusions, and abstraction, are covered so that they can be used to define and understand many aspects of data manipulation.
#• Other mathematics and statistics courses also have connections to data science, including graphs for social network analysis, matrices for finding themes in relations, and supervised machine learning.
#• Technology:
#• Python knowledge is extended from the prerequisite with more advanced table manipulation functions, extended practice with data cleaning and manipulation tasks, computational notebooks (such as Jupyter), and GitHub for version control and project publishing.
#• Visualization:
#• New types of plots are learned for a wide variety of data types and for what you intend to communicate about them.
#• The general principles that govern when and how to use visualizations are studied.
#• Building and publishing interactive online visualizations (dashboards) is also covered.
#• Communication:
#• Writing comments in code, documentation for code, motivations and interpretations of results in computational notebooks, and technical reports about the results of analyses.
#• Clarity, brevity, and knowing the target audience are prioritized.
{
"cells": [
{
"cell_type": "markdown",
"id": "f41ba097",
"metadata": {},
"source": [
"Exercise 1\n",
"1. Explain the difference between median and middle value?\n",
"Answer: Median is middle number in a set of data when the data arranged in ascending, Middle Value is the middle number when the data is unsorted\n",
"2. Are mean and mode values always the same for unsorted and sorted dataset? Why?\n",
"Answer: Mean and mode values always same even the data unsorted or sorted because mean sum all of the data and divided it with the length of data, and Mode values depends on the amount of data that appears frequently\n",
"3. If range is caculate with last datapoint – first datapoint, should the dataset is sorted first or not? Why?\n",
"Answer: The dataset must sorted first if we want to calculate the range of the dataset, because the unsorted data will make the the range of the dataset incorrect"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "23c9a9b6",
"metadata": {},
"outputs": [],
"source": [
"x =[\n",
" 10, 20, 40, 40, 40, 60, 70,\n",
" 20, 40, 40, 40, 40, 40, 60,\n",
" 10, 20, 30, 40, 50, 60, 70,\n",
" 70, 20, 30, 40, 50, 70, 20\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "90b784f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x= [10, 20, 40, 40, 40, 60, 70, 20, 40, 40, 40, 40, 40, 60, 10, 20, 30, 40, 50, 60, 70, 70, 20, 30, 40, 50, 70, 20]\n",
"N= 28\n",
"[10, 10, 20, 20, 20, 20, 20, 30, 30, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 50, 50, 60, 60, 60, 70, 70, 70, 70]\n"
]
}
],
"source": [
"print('x= ', x)\n",
"print('N= ', len(x))\n",
"y = sorted(x)\n",
"print(y)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "1ce04de7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"range= 60\n"
]
}
],
"source": [
"rng = y[-1] - y[0]\n",
"print('range= ', rng)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "30a137a0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"range= 10\n"
]
}
],
"source": [
"rng = x[-1] - x[0]\n",
"print('range= ', rng)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "8dfe4c46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mid value= 35.0\n"
]
}
],
"source": [
"mid = (x[13] + x[14])/2\n",
"print('mid value= ', mid)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "1e25f513",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40.0\n"
]
}
],
"source": [
"import statistics as stat\n",
"print(stat.median(x))"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "2642849f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"median = 40.0\n"
]
}
],
"source": [
"med = (y[13] + y[14])/2\n",
"print('median = ', med)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "3e8e4ed6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40.0\n"
]
}
],
"source": [
"print(stat.median(y))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "d0d93b7b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mean= 40.714285714285715\n"
]
}
],
"source": [
"sumx= sum(x)\n",
"N = len(x)\n",
"mean = sumx/N\n",
"print('mean= ', mean)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "1139d51c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40.714285714285715\n"
]
}
],
"source": [
"print(stat.mean(x))"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "635d1728",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40.714285714285715\n"
]
}
],
"source": [
"print(stat.mean(y))"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "25b7c488",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mode= 40\n"
]
}
],
"source": [
"print('mode= ', stat.mode(x))"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "d6b56cda",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mode= 40\n"
]
}
],
"source": [
"print('mode= ', stat.mode(y))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
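The three answers in Exercise 1 can be checked in one place. The following is a minimal sketch using the same dataset as the notebook; the helper name `middle_value` and the `summary` dictionary are illustrative and not part of the original notebook. It computes the range as `max(x) - min(x)`, which avoids the need to sort first.

```python
import statistics

# Same dataset as the notebook above
x = [
    10, 20, 40, 40, 40, 60, 70,
    20, 40, 40, 40, 40, 40, 60,
    10, 20, 30, 40, 50, 60, 70,
    70, 20, 30, 40, 50, 70, 20,
]


def middle_value(values):
    """Value at the central position(s) of the list in its current order (hypothetical helper)."""
    n = len(values)
    if n % 2 == 1:
        return values[n // 2]
    # Even length: average the two central positions
    return (values[n // 2 - 1] + values[n // 2]) / 2


y = sorted(x)

summary = {
    # Middle value depends on order; median is the middle value of the sorted data
    "middle_unsorted": middle_value(x),
    "median": statistics.median(x),
    "middle_sorted": middle_value(y),
    # max - min gives the range without sorting
    "range": max(x) - min(x),
    # Mean and mode do not change when this dataset is sorted
    "mean_same": statistics.mean(x) == statistics.mean(y),
    "mode_same": statistics.mode(x) == statistics.mode(y),
}

for name, value in summary.items():
    print(name, "=", value)
```

Here `statistics.median` sorts internally, which is why it returns the same result for `x` and `y`, while the positional middle value only agrees with the median once the list is sorted.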