ror, scala, jetty, erlang, thrift, mongrel, comet server, my-sql, memcached, varnish, kestrel(mq), starling, gizzard, cassandra, hadoop, vertica, munin, nagios, awstats
- Introduce Gizzard of Twitter - Introducing Gizzard, Twitter's new distributed management library (Korean)
- The Architecture Twitter Uses to Deal with 150M Active Users, 300K QPS, a 22 MB/S Firehose, and Send Tweets in Under 5 Seconds
- Scaling Twitter: Making Twitter 10000 Percent Faster
- Scaling Twitter
- Blaine Cook on Scaling Twitter - YouTube
- How Twitter Stores 250 Million Tweets a Day Using MySQL (Korean)
- Twitter on Scala
- DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
- Improving Running Components at Twitter
- QCon London 2009: Upgrading Twitter without service disruptions
php (with hiphop compiler), thrift, java(tomcat, jetty, minor), epoll, erlang, tornado, nodejs, my-sql, memcached, hadoop, hbase, hive, scribe(-hdfs), bigpipe, varnish, haystack, cassandra
- "Building for a Billion Users" - YouTube
- Scaling Out
- Facebook Architecture - Stack Overflow
- Facebook Architecture for 600M users
- Facebook Chat
- Scaling the Messages Application Back End
- Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
centos, sciapache, apache, nginx, (move out of)php, scala(selection), ruby, thrift, my-sql, redis, hbase, memcached, gearman, kafka, kestrel, finagle, varnish, ha-proxy, func, capistrano, puppet, jenkins
aws(ec2, s3), ubuntu, cloudfront, python, pylons, paste, tornado, thrift, comet server, memcached, haproxy, nginx
python, django, tornado, node.js, rabbitmq, nginx, haproxy, varnish, memcached, membase, redis, my-sql, mrjob, hadoop(elastic map reduce)
- Pinterest Architecture Update - 18 Million Visitors, 10x Growth,12 Employees, 410 TB of Data
- Polyglot persistence at Pinterest: Redis, Membase, MySQL • myNoSQL
aws(s3, ebs), cloudfront, ubuntu, django(high-cpu extra-large), gunicorn, fabric, gearman, pyapns, twisted, postgre-sql(quadruple extra-large), mdadm(software raid with ebs), repmgr, pgbouncer, redis, memcached, node2dm, munin, pingdom, pagerduty, sentry
- Instagram Architecture Update: What’s new with Instagram?
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies - Instagram Engineering (Korean)
- Tracking Slow Requests with Dogslow
- Keeping Instagram up with over a million new users in twelve hours - Instagram Engineering
aws(ec2, s3, elb), tornado, scribe, mrjob, node-readability, haproxy, tornado, gae, mapreduce, django(appengine), google-cloud-storage, memcache, redis
aws(ec2, s3, ebs, rds, dynamodb, sdb, sqs, sns, emr, elb, eip, vpc, direct-connect, iam), java(tomcat), mongodb, my-sql, cassandra, hadoop, zookeeper, evcache, asgard, groovy, grails, zuul, priam (and more netflix open source projects)
- Architectural Patterns for High Availability - Netflix
- The Netflix Tech Blog
- 3 shades of latency: How Netflix built a data architecture around timeliness — Tech News and Analysis
- Keeping Movies Running Amid Thunderstorms!
linux(2.6), nginx, uwsgi, aws(s3), dotcloud, mysql, redis, celery
ubuntu(12.04), aws(ec2, s3, elb), nginx, werkzeug, flask, postgre-sql, pgpool, memcached, gevent, celery, rabbitmq, fabric, boto, exceptional, flask-exceptional
rabbitmq, celery, phash
ubuntu(12.04), nhn ncloud, django, apache, mod_wsgi, ms-sql, memcached, fabric, south, wand, rsync, py-bcrypt, python-gcm, apns
gae, "For services with fewer than ten million users, Google App Engine is good enough." (translated from Korean) - @xguru
Cartoon Service - Smartstudy
ubuntu(11.04), nbp (nhn business platform), my-sql(innodb), mongodb, django, fabric, uwsgi, nginx, memcached, cacti, npk, rsync, lftp
aws(ec2, s3, elb, rds, cloudwatch), varnish, php, ergo, my-sql, memcached, mongodb, redis, new-relic, statsd
linux, apache, java, tomcat, postgres-sql, lucene, velocity, memcached, jgroups, hadoop, cacti, nagios, custom[?]
- supporting infra: java, python, ruby, php, perl
nodejs, haproxy, redis, mongodb
ubuntu(10.04), skyscape, akamai, puppet, puppetdb, nginx, unicorn, gunicorn, upstart, haproxy, varnish,
- redirection: perl(manage and test), php(some), nodejs(side-by-side browser)
- applications: ror, sinatra, scala, play(2.0), django, mapit
- databases and other storage: mongodb, my-sql, postgres-sql, elasticsearch, solr, rabbitmq
- monitoring, managing and alerting: statsd, logstash, ganglia, graphite, nagios
- supporting tools: jenkins, new-relic, google-analytics, dyn, ses, font-forge, font-tools, zendesk, jekyll, heroku, clojure
windows server, sql-server, redis
php-fpm, haproxy, activemq, varnish, redis, nginx, my-sql, syslog-ng, symfony2
INFRA, PLATFORMS, FRAMEWORKS
- Amazon Web Services, Cloud Computing: Compute, Storage, Database
- Elastic Load Balancing
- Amazon Elastic Compute Cloud(Amazon EC2)
- ELB, Elastic Load Balancing
- Amazon Simple Storage Service (Amazon S3)
- Amazon RDS, Cloud Relational Database Service: MySQL, Oracle, SQL Server
- Amazon Route 53
- Amazon Elastic MapReduce(Amazon EMR)
- Amazon CloudWatch
- Amazon Simple Email Service (Amazon SES)
- nginx: HTTP and reverse proxy server, as well as a mail proxy server
- Werkzeug: WSGI utility library for Python.
- unbit/uwsgi · GitHub: uWSGI application server container
- dotCloud: Deploy, manage and scale any web app.
- Gunicorn: Python WSGI HTTP Server for UNIX.
- geeknam/python-gcm · GitHub: Python client for Google Cloud Messaging for Android (GCM).
- pylons: project that develops web application framework technology in Python as a collection of related tools, rather than focusing on a single web framework.
- Erlang: programming language used to build massively scalable soft real-time systems with requirements on high availability.
- gevent: coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.
- memcached: in-memory key-value store for small chunks of arbitrary data (strings, objects).
- RabbitMQ: Robust messaging for applications.
- Celery: asynchronous task queue/job queue based on distributed message passing.
- Flask: microframework for Python based on Werkzeug.
- HBase: Hadoop database, a distributed, scalable, big data store.
- Varnish: caching HTTP reverse proxy.
- HAProxy: High Performance TCP/HTTP Load Balancer.
- gearman: job queue system, is used for long running fire and forget type work.
- Kafka: distributed publish-subscribe messaging system
- robey/kestrel · GitHub: simple, distributed message queue system.
- twitter/finagle · GitHub: fault tolerant, protocol-agnostic RPC system.
- defunkt/starling · GitHub: light weight server for reliable distributed message passing
- tornado: Python web framework and asynchronous networking library.
- Func: secure, scriptable remote control framework.
- evan/mongrel · GitHub: small fast HTTP library and server that runs Rails, Camping, Nitro and Iowa apps.
- paste: set of utilities for web development in Python. Paste has been described as "a framework for web frameworks". The package contains Python modules that help in implementing WSGI middleware.
- jetty: Web server and javax.servlet container, plus support for SPDY, Web Sockets, OSGi, JMX, JNDI, JASPI, AJP and many other integrations.
- pyapns: universal Apple Push Notification Service (APNS) provider.
- node2dm: sending push notifications to Google's C2DM push notification server.
- Groovy: dynamic language for the Java platform
- Grails: full stack, web application framework for the JVM
- 99designs/ergo: lightweight php5 library for request/response routing, controllers and http interaction
- beanstalkd: simple, fast work queue, PHP client for beanstalkd queue
- Unicorn: Rack HTTP server for fast clients and Unix
- alphagov/unicornherder: Unicorn Herder: manage daemonized (g)unicorns
- Sinatra: DSL for quickly creating web applications in Ruby with minimal effort
- GAE, Google App Engine: Lets you run web applications on Google's infrastructure. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow. With App Engine, there are no servers to maintain
- SKYSCAPE: The service provider of choice for Assured Cloud Services
- Jekyll: Simple, blog-aware, static sites.
- Heroku: cloud application platform.
- Clojure: dynamic programming language that targets the Java Virtual Machine.
- JGroups: toolkit for reliable messaging.
- PHP-FPM: alternative PHP FastCGI.
- Symfony: High Performance PHP Framework.
DATABASE, STORAGE, DATA-MINING
- PostgreSQL: most advanced open source database.
- repmgr: open source tool suite that helps DBAs and system administrators manage a cluster of PostgreSQL databases.
- pgpool Wiki: middleware that works between PostgreSQL servers and a PostgreSQL database client.
- PgBouncer: lightweight connection pooler for PostgreSQL.
- SQLAlchemy: Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
- South: intelligent schema and data migrations for Django projects.
- twitter/gizzard · GitHub: flexible sharding framework for creating eventually-consistent distributed datastores
- cassandra: used for high velocity writes, and lower velocity reads
- hadoop: process unstructured and large datasets, hundreds of billions of rows.
- vertica: used for analytics and large aggregations and joins so they don't have to write MapReduce jobs. (twitter)
- mrjob: Run MapReduce jobs on Hadoop or Amazon Web Services
- Apache Lucene: high-performance full-text search engine library written in Java.
- Apache Solr: popular, blazing fast open source enterprise search platform from the Apache Lucene™ project
- fatcache: Memcache on SSD.
- google-cloud-storage: Store, access and manage your data on Google’s storage infrastructure. Take advantage of the scale and efficiency we have built over the years.
- haystack: Facebook photo Infrastructure.
- Netflix/EVCache: distributed in-memory data store for the cloud.
- Elasticsearch: Open Source Distributed Real Time Search & Analytics
- SQL-Server: Microsoft SQL Server.
- Doctrine: PHP libraries primarily focused on providing persistence services
DEPLOY, MONITORING, UTILITIES
- Fabric: library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.
- boto/boto · GitHub: python interface to aws
- capistrano/capistrano · GitHub: Remote multi-server automation tool.
- puppetlabs/puppet · GitHub: Server automation framework and application.
- Exceptional: tracks errors in web apps. It reports them in real time and gathers the info you need to fix them fast.
- jzempel/flask-exceptional · GitHub: Exceptional extension for Flask.
- rsync: utility that provides fast incremental file transfer.
- Nagios: The Industry Standard in IT Infrastructure Monitoring
- Munin: networked resource monitoring tool that can help analyze resource trends
- AWStats: powerful and featureful tool that generates advanced web, streaming, ftp or mail server statistics, graphically
- pingdom: Website monitoring. Monitor your server and network uptime and performance for free.
- sentry: realtime error logging and aggregation platform
- pagerduty: SaaS IT on-call schedule management, alerting and incident tracking.
- scribe: server for aggregating log data that's streamed in realtime from clients. It is designed to be scalable and reliable
- Sphinx: tool that makes it easy to create intelligent and beautiful documentation.
- pHash.org: perceptual hash library
- dahlia/wand · GitHub: The ctypes-based simple ImageMagick binding for Python.
- py-bcrypt: strong password hashing for Python.
- mdadm: manage MD devices aka Linux Software RAID.
- twitter/snowflake · GitHub: network service for generating unique ID numbers at high scale with some simple guarantees
- node-readability: Server side readability with node.js
- Netflix/asgard: Web interface for application deployments and cloud management in Amazon Web Services (AWS)
- Netflix/Priam: Co-Process for backup/recovery, Token Management, and Centralized Configuration management for Cassandra.
- Netflix/zuul: edge service that provides dynamic routing, monitoring, resiliency, security, and more.
- Cacti®: The Complete RRDTool-based Graphing Solution.
- LFTP: sophisticated file transfer program.
- lqez/npk: neat package system.
- etsy/statsd: Simple daemon for easy stats aggregation.
- new-relic: Application Performance Management & Monitoring.
- Upstart: event-based replacement for the /sbin/init daemon (Wikipedia).
- MapIt: map postcodes and geographical points to administrative areas
- logstash: open source log management.
- Ganglia: scalable distributed monitoring system for high-performance computing systems such as clusters and Grid.
- Jenkins: extendable open source continuous integration server.
- Graphite: Scalable Realtime Graphing.
- Google-Analytics: Web Analytics & Reporting.
- dyn: Managed DNS | Email Delivery | SMTP | Domain Registration
- FontForge: outline font editor.
- DTL FontTools: Dutch Type Library.
- Zendesk.com: Customer Service Software.
You are managing a web app built on the Django framework. Nginx is used as a reverse proxy, and the app is hosted on an Ubuntu 14.04 server. It uses Celery as a task processor for sending email and downloading files. The app uses Redis as a caching server and PostgreSQL as the main database. For reporting purposes, the app fetches aggregate data from PostgreSQL and stores it in Elasticsearch. Kafka is used for real-time messaging.
a. You are getting an INTERNAL SERVER ERROR while accessing the main page of the app. (a) What might be the possible source of the issue? (b) What procedure will you follow to fix the issue?
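A 500 response usually means the Django process raised an unhandled exception, so the application log and the Nginx error log are the first places to look. A common cause in a stack like this is a backing service the view depends on being unreachable. The sketch below is one possible first-pass triage step, not a complete procedure: it only tests whether each service accepts a TCP connection. The hosts and ports are assumptions (localhost with each service's default port) and the function names are hypothetical; replace them with the values from the real deployment.

```python
import socket
from typing import Dict, Tuple

# Assumed endpoints: localhost and default ports. Adjust to the real deployment.
SERVICES: Dict[str, Tuple[str, int]] = {
    "postgresql": ("127.0.0.1", 5432),
    "redis": ("127.0.0.1", 6379),
    "elasticsearch": ("127.0.0.1", 9200),
    "kafka": ("127.0.0.1", 9092),
}


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{name:15s} {'up' if ok else 'DOWN'}")
```

An open port only shows that something is listening. It does not prove that credentials, migrations, or disk space are in order, so a service reported as up still needs its own logs checked if the traceback points at it.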