Vertical decomposition. Creating cohesive services

One of the biggest misconceptions about services is that a service is an independent deployable unit, i.e., service equals process. With this view, we are defining services according to how components are physically deployed. In our example, since it’s clear that the backend admin runs in its own process/container, we consider it to be a service.

But this definition of a service is wrong. Rather, you need to define your services in terms of business capabilities. The deployment aspect of the system doesn’t have to be correlated with how the system has been divided into logical services. For example, a single service might run in different components/processes, and a single component might contain parts of multiple services. Once you start thinking of services in terms of business capabilities rather than deployment units, a whole world of options opens up.

What are the Admin UI, Admin Backend and Website UI, Website Backend components? They basically act as containers of services. They are maintained by their own teams and their sole purpose is to coordinate between services. These components are business-logic agnostic.

Avoiding Microservice Megadisasters. Unsure about the approach to search and data duplication

Microservices and Rules Engines – a blast from the past

"search engine, not search service" "allows each microservice to put a component into it, and the search engine will run that set of rules" "what we are talking about here is not the whole microservice, but the search component of that service" "that way, the search engine doesn't need in and of itself access to all of that data directly"

Don't build a distributed monolith

"Don't couple systems with binary dependencies"

Alas, this seems to go against the "thinking of services in terms of business capabilities rather than deployment units" principle. If the deployment is intertwined, it seems that there will be binary dependencies.

The Art of the node.js Rescue

The entity service antipattern

Five pieces of advice for new technical leads

The System Design Primer hn

in general, re-organizing the architecture of a system is usually possible - if and only if - the underlying data model is sane.

What is the convention for addressing assets and entities? Is it consistent and useful for informing both security and data routing?

What is the security policy for any specific entity in your system? How can it be modified? How long does it take to propagate that change? How centralized is the authentication?

If a piece of "data" is found, how complex is it to find the origin of this data?

What is the policy/system for enforcing that subsystems have a very narrow capability to mutate information?

More than concentric layers The Software Architecture Chronicles

Managing the Complexity of Microservices Deployments

Designing Microservice Architectures the Right Way slides

To Test A System, You Need A Good Design. Shunt pattern. Two-level test suites?

For a test environment, you can inject an “In-Memory Data Source.” For production, you can use the “HTTP Server Data Source.”
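
A minimal Java sketch of that idea, assuming a hypothetical CustomerDataSource port; the in-memory implementation would be injected in tests and the HTTP-backed one in production:

```java
// The code under test depends only on this port; the composition root picks the implementation.
interface CustomerDataSource {
    String fetchCustomer(String id);
}

// "In-Memory Data Source" for tests: no network, fully deterministic.
class InMemoryCustomerDataSource implements CustomerDataSource {
    private final java.util.Map<String, String> customers = new java.util.HashMap<>();
    InMemoryCustomerDataSource add(String id, String payload) { customers.put(id, payload); return this; }
    public String fetchCustomer(String id) { return customers.get(id); }
}

// "HTTP Server Data Source" for production: delegates to the real backend.
class HttpCustomerDataSource implements CustomerDataSource {
    private final java.net.http.HttpClient client = java.net.http.HttpClient.newHttpClient();
    private final String baseUrl;
    HttpCustomerDataSource(String baseUrl) { this.baseUrl = baseUrl; }
    public String fetchCustomer(String id) {
        try {
            var request = java.net.http.HttpRequest
                    .newBuilder(java.net.URI.create(baseUrl + "/customers/" + id))
                    .build();
            return client.send(request, java.net.http.HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```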

How Contract Tests Improve the Quality of Your Distributed Systems

SOLID Architecture in Slices not Layers

For too long we've lived under the tyranny of n-tier architectures. Building systems with complicated abstractions, needless indirection and more mocks in our tests than a comedy special. But there is a better way - thinking in terms of architectures of vertical slices instead of horizontal layers. Once we embrace slices over layers, we open ourselves to a new, simpler architecture, changing how we build, organize and deploy systems.

Scaling without cross-functional teams

Growing Object-Oriented Software, Guided by Tests Without Mocks Unit testing anti-patterns: Structural Inspection

Test automation without a headache: Five key patterns

What’s your release process like?

ndepend and code analysis

Lessons from Building Static Analysis Tools at Google

Writing Documentation When You Aren't a Technical Writer hn. Semantic linefeeds. guidelines.

automated the checks as much as possible with linters [2].

Age of Invisible Disasters

Conforming container antipattern

microfrontends

Break Up With Your Frontend Monolith - Elisabeth Engel

Compositional UIs - the Microservices Last Mile - Jimmy Bogard

Explicitly Yours

jdepend

Before using JDepend, it is important to understand that "good" design quality metrics are not necessarily indicative of good designs. Likewise, "bad" design quality metrics are not necessarily indicative of bad designs. The design quality metrics produced by JDepend should not be used as yard sticks by which all designs are measured.

Reconstructing thalia.de with self-contained systems

Optimizing for iteration speed

one of the scariest things in software engineering is “inventory” of code that builds up without going into production. It represents deployment risk, but also the risk of building something users don’t want. Not to mention lost user value from not shipping parts of the feature earlier (user value should be thought of as feature value integrated over time, not as the feature value at the end state).

Thinking Architecturally

Unit test your Java architecture tweet

If your primary motivation for building microservices is to enforce modular architectures, think twice. Modularity is solved within the JVM (JPMS, OSGi, JBoss Modules; even multimodule builds get you far), don't pay the price of distributed computing + remote calls just for this.

Majestic Modular Monolith!

SonarJS

Apache Kafka als Backend für Webanwendungen?

How Events Are Reshaping Modern Systems by Jonas Bonér

Serverless

Complex Event Flows in Distributed Systems

Designing Events-first Microservices. Journey to Event Driven – Part 1

Microservices in a Post-Kubernetes Era

In the post-Kubernetes era, using libraries to implement operational networking concerns (such as Hystrix circuit breaking) has been completely overtaken by service mesh technology.

the testing renaissance. tweets about testing.

HANDS-ON INTRO TO KUBERNETES & OPENSHIFT.

Tell Don't Ask. more. How Interfaces Are Refactoring Our Code. The art of embugging. GetterEradicator.

AWS Solution Architect Associate exam.

Hybrid Networking Reference Architectures.

Docs as Code – Architekturdokumentation leicht gemacht.

No More Silos: How to Integrate Your Databases with Apache Kafka and CDC.

Streaming Data Clears the Path for Legacy Systems Modernization.

Integrating legacy and CQRS.

Streaming MySQL tables in real-time to Kafka.

Streaming databases in realtime with MySQL, Debezium, and Kafka. Mentioned in: ¡Larga vida al legacy!.

not being the owner of the data model should be something temporary

it seems that the "secondary system" is read-only at first

in the next pass, somehow, you have to be able to modify the old system... more complex architectures with bidirectionality. Don't use an event bus. Expose services in the new system, and have the legacy system call them. Don't use an event bus (?)

without events, I force the legacy software to know where I've moved the little piece I took away from it.

[to synchronize] we can use events, triggers...

GETTING STARTED WITH DDD WHEN SURROUNDED BY LEGACY SYSTEMS - Eric Evans. bubble context. strategy 1 - bubble context. strategic design. bounded context.

Listen to Yourself: A Design Pattern for Event-Driven Microservices.

For example, you cannot guarantee that a commit to Cassandra and a message delivery to Kafka would be done atomically or not done at all.

Let’s take a common use case: Updating a local NoSQL database and also notifying a legacy system of record about the activity.

However, there is still a concrete problem: How do you guarantee atomic execution of both the NoSQL writes and the publishing of the event to the message broker?

Note: Potential duplicate messages are always a possibility with a message broker so you should design your message handling to be idempotent regardless of the solution you choose.

All your events and database writes must be idempotent to avoid duplicate records.
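
A minimal sketch of an idempotent consumer, assuming each message carries a unique id; the names are illustrative, and in practice the processed-id check would live in a table updated in the same transaction as the business write:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentHandler {
    // Record of message ids we have already applied.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    void handle(String messageId, Runnable businessLogic) {
        // add() returns false if the id was already seen, so redeliveries become no-ops.
        if (!processedIds.add(messageId)) {
            return; // duplicate delivery, already applied
        }
        businessLogic.run();
    }
}
```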

The client isn’t guaranteed to read their own writes immediately.

The transaction log tailing pattern can achieve similar results to those described here. Your transactions will be atomic without resorting to two phase commit. The transaction log tailing pattern has the added benefit of guaranteeing your database is committed before returning a response to the client.

Pattern: Transaction log tailing.

each step of a saga must atomically update the database and publish messages/events. It is not viable to use a distributed transaction that spans the database and the message broker.

How to solve two generals issue between event store and persistence layer?.

Event-Driven Data Management for Microservices.

One way to achieve atomicity is for the application to publish events using a multi-step process involving only local transactions. The trick is to have an EVENT table, which functions as a message queue, in the database that stores the state of the business entities.
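
A rough JDBC sketch of that EVENT-table trick, assuming hypothetical ORDERS and EVENT tables; a separate relay process would poll EVENT and publish its rows to the message broker:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

class CheckoutService {
    private final DataSource dataSource;
    CheckoutService(DataSource dataSource) { this.dataSource = dataSource; }

    void placeOrder(String orderId, String payloadJson) throws Exception {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement insertOrder =
                     conn.prepareStatement("INSERT INTO ORDERS (ID, PAYLOAD) VALUES (?, ?)");
                 PreparedStatement insertEvent =
                     conn.prepareStatement("INSERT INTO EVENT (AGGREGATE_ID, TYPE, PAYLOAD) VALUES (?, ?, ?)")) {
                insertOrder.setString(1, orderId);
                insertOrder.setString(2, payloadJson);
                insertOrder.executeUpdate();
                // The event row commits atomically with the business data in one local
                // transaction; no two-phase commit with the broker is needed.
                insertEvent.setString(1, orderId);
                insertEvent.setString(2, "OrderPlaced");
                insertEvent.setString(3, payloadJson);
                insertEvent.executeUpdate();
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```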

Domain events: simple and reliable solution.

In an event-driven architecture there is also the problem of atomically updating the database and publishing an event.

[my thoughts] are the options: 1 - the "listen to yourself" pattern and 2 - "keeping an internal events table"? Also 3 - "log tailing"?

Paypal talk. slides. Streaming Data Microservices. Oracle Golden Gate.

from the slides: slide 66: XA transactions ensure consistency, give up availability; slide 67: Event Sourcing gives up read-your-writes consistency (is this the "listen to yourself" pattern?); slide 68: Change Data Capture gives read-your-writes + eventual consistency across systems

OLAP engines like Apache Druid, LinkedIn's Pinot

"Use Change Data Capture, rather than XA Transactions or Event Sourcing, for replicating data between data systems where consistency is required, etc., such as financial services... Also use schemas"

logs, not queues!

Unlike queues, consumers don't delete entries; Kafka manages their lifecycles

N Consumers, who start reading where they want

Akka Streams, Kafka Streams - libraries for “data-centric microservices”. Smaller scale, but great flexibility
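
A small sketch of the "consumers start reading where they want" point above, using the plain Kafka consumer API; the topic, partition, and group names are made up:

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReplayingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "reporting-rebuild");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(List.of(partition));
            // Nothing is deleted on read: this consumer rewinds to the start of the log and
            // recomputes its results, independently of any other consumer group's position.
            consumer.seekToBeginning(List.of(partition));
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(record -> System.out.println(record.offset() + ": " + record.value()));
        }
    }
}
```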

Kubernetes: Your Next Application Server. video.

How to extract change data events from MySQL to Kafka using Debezium.

about architectures.

Openshift secrets management

one step forward, two steps back.

Introduction to the Kubernetes Operator Framework

Keynote: Maturing Kubernetes Operators - Rob Szumski.

DEATH OF LOGGING, HEXAGONAL ARCHITECTURES, TECHNOLOGY AND ARCHITECTURES.

Kubernetes The Database.

Introduction to Cloud Storage for Developers.

building a CI / CD bot with Kubernetes

docker secrets

managing env vars in production comes down to doing one of two things: using an environment file that is securely stored and securely retrieved, or retrieving each key from a secure secrets management service like Vault, Keywhiz or CyberArk. The former is easier, as it requires less infrastructure, but requires greater care. The latter requires more infrastructure but handles things like role-based access for each key more easily

Istio Multicluster on OpenShift

Kafka for long-term storage. so. hn. How Pinterest runs Kafka at scale. Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously. Is Kafka a database?. experimentation with event-based systems. The Magical Rebalance Protocol of Apache Kafka. The Death and Rebirth of the Event-Driven Architecture. ETL Is Dead, Long Live Streams. event-based architectures with Kafka and Atom. Complex Event Flows in Distributed Systems. Restoring Confidence in Microservices: Tracing That's More Than Traces.

This is an important requirement for processes that calculate real-time results but need to periodically recalculate results (say when their processing logic changes).

Something to keep in mind as well is that cluster restarts (especially after unclean shutdowns) might take a very long time, as all logs would need to be checked at broker startup. Apart from that I can't think of large reasons not to do this, though I agree that dumping data to S3/HDFS/similar should be the preferred solution

we use Kafka to transport data to our data warehouse, including critical events like impressions, clicks, close-ups, and repins. We also use Kafka to transport visibility metrics for our internal services

Language-oriented software engineering. tweet.

Using ETL Staging Tables

The future of Kubernetes is Virtual Machines hn

Developing applications on OpenShift in an easier way

Mastering Spring framework 5, Part 2: Spring WebFlux

Day Two Kubernetes: Tools for Operability.

kubernetes guideposts 2019. Simple Multi-tenancy with Django Running on OpenShift

KUBERNETES FAILURE STORIES. hn.

Cloud native Java EE on OpenShift Adam Bien.

making the most of Kubernetes clusters

Scaling a Distributed Stream Processor in a Containerized Environment

Kubinception.

kubernetes vs. docker

rethinking legacy and monolithic systems

"how do I propagate state across asynchronous, reactive execution pipelines?". video. Spring Tips: Testing Reactive Code. RxJava vs Reactor. Reactive Spring: Eine Einführung in die reaktive Programmierung. Point-to-Point Messaging Architecture - The Reactive Endgame. building reactive pipelines tweet. reactive DDD. How (not) to use Reactive Streams in Java 9+. Assembly time Subscription time Execution time) 404. construyendo pipelines reactivos slides. Spring Tips: Reactive MySQL with Jasync SQL and R2DBC. reactive streams operators. RxJava by example. reactive jdbc tweet. 5 reasons to use RxJava in your projects. reactive programming - lessons learned. reactive transactions. reactive-revolution course materials. marble diagrams. reactive streams and Kotlin flows. Event Driven with Spring. How to build Reactive Server in 50 minutes. moving from imperative to reactive. reactive programming - lessons learned. more. slides. Five Things About RxJS and Reactive Programming. Going full reactive with Spring Webflux and the new CosmosDB API v3. reactive streams basic concepts. Streaming data as one additional use case for #reactive programming. building reactive pipelines. the value of reactive systems. reactor. Do's and Don'ts: Avoiding First-Time Reactive Programmer Mines.

Learn Openshift operator framework

certified

In big companies, 95% of apps are still old school: firewall -- load balancer -- 5 front ends -- 3 back ends -- two database servers.

"everybody wants to get rid of ELK for logging quite soon".

airhacks tv 59 docker vs. openshift effective web standards

metrics for the masses

Service Catalog and Kubernetes

Kubernetes declarative object configuration model is one of the most interesting features of the orchestrator

12 ways to get smarter about Kubernetes.

Microservices in a Post-Kubernetes Era.

How Kubernetes can break: networking

Automating stateful applications with Kubernetes operators Reaching for the Stars with Ansible Operator

Why are we templating YAML?.

An Incremental Architecture Approach to Building Systems.

Various links about persistence and DDD:

https://tech.transferwise.com/hibernate-and-domain-model-design/ https://stackoverflow.com/questions/10099636/are-persistence-annotations-in-domain-objects-a-bad-practice https://stackoverflow.com/questions/14737652/entity-objects-vs-value-objects-hibernate-and-spring https://stackoverflow.com/questions/31400432/ddd-domain-entities-vo-and-jpa https://stackoverflow.com/questions/2597219/is-it-a-good-idea-to-migrate-business-logic-code-into-our-domain-model https://stackoverflow.com/questions/821276/why-should-i-isolate-my-domain-entities-from-my-presentation-layer https://softwareengineering.stackexchange.com/questions/350067/is-it-good-practice-to-use-entity-objects-as-data-transfer-objects https://softwareengineering.stackexchange.com/questions/378866/understanding-ddd-when-using-an-orm-such-as-hibernate https://softwareengineering.stackexchange.com/questions/171457/what-is-the-point-of-using-dto-data-transfer-objects https://softwareengineering.stackexchange.com/questions/140826/do-orms-enable-the-creation-of-rich-domain-models https://blog.pragmatists.com/refactoring-from-anemic-model-to-ddd-880d3dd3d45f https://enterprisecraftsmanship.com/2016/04/05/having-the-domain-model-separate-from-the-persistence-model/

Custom Implementations for Spring Data Repositories.

Three-Part Architecture of the Next Generation Data Center Inside NetApp.

Conquering the Challenges of Data Preparation for Predictive Maintenance

Java 9: Bessere Domänenmodelle mit Java-9-Modulen.

Links about the strangler pattern https://news.ycombinator.com/item?id=19122973 strangler pattern https://news.ycombinator.com/item?id=19125333 https://paulhammant.com/2013/07/14/legacy-application-strangulation-case-studies/ https://www.michielrook.nl/2016/11/strangler-pattern-practice/ https://trunkbaseddevelopment.com/strangulation/ https://www.leadingagile.com/2018/10/the-urge-to-stranglethe-strangler-pattern/ https://www.martinfowler.com/bliki/StranglerApplication.html https://twitter.com/martinfowler/status/357142664665251841 https://blog.overops.com/strangler-pattern-how-to-keep-sane-with-legacy-monolith-applications/ https://blogs.sap.com/2017/09/25/strangler-applications-monolith-to-microservices/

Links about DTO mappers https://auth0.com/blog/automatically-mapping-dto-to-entity-on-spring-boot-apis/ https://www.baeldung.com/entity-to-and-from-dto-for-a-java-spring-application http://modelmapper.org/ https://medium.com/@hackmajoris/a-generic-dtos-mapping-in-java-11d649b8a486 https://stackoverflow.com/questions/2828403/dto-and-mapper-generation-from-domain-objects https://stackoverflow.com/questions/14523601/bo-dto-mapper-in-java https://stackoverflow.com/questions/15117403/dto-pattern-best-way-to-copy-properties-between-two-objects https://stackoverflow.com/questions/1432764/any-tool-for-java-object-to-object-mapping https://stackoverflow.com/questions/678217/best-practices-for-mapping-dto-to-domain-object https://codereview.stackexchange.com/questions/64731/mapping-interface-between-pojos-and-dtos https://softwareengineering.stackexchange.com/questions/171457/what-is-the-point-of-using-dto-data-transfer-objects https://www.jhipster.tech/using-dtos/ http://appsdeveloperblog.com/java-objects-mapping-with-modelmapper/ http://www.adam-bien.com/roller/abien/entry/creating_dtos_without_mapping_with The Ping class is a JPA entity and JSON-B DTO at the same time: http://www.adam-bien.com/roller/abien/entry/creating_dtos_without_mapping_with DTOs are also motivated by their typesafe nature. Lacking typesafety, JSON-P JsonObjects are not used as DTOs. https://www.credera.com/blog/technology-solutions/mapping-domain-data-transfer-objects-in-spring-boot-with-mapstruct/ https://rmannibucau.wordpress.com/2014/04/07/dto-to-domain-converter-with-java-8-and-cdi/ https://vladmihalcea.com/the-best-way-to-map-a-projection-query-to-a-dto-with-jpa-and-hibernate/ https://github.com/porscheinformatik/anti-mapper jpa hashCode euquals dilemma

What's new in Spring Data

Running your own DBaaS based on your preferred DBs, Kubernetes operators and containerized storage.

Microservices in a Post-Kubernetes Era

In the post-#Kubernetes era, using libraries to implement operational networking concerns (such as Hystrix circuit breaking) has been completely overtaken by service mesh technology.

Netflix Titus, Its Feisty Team, and Daemons.

Scaling a Distributed Stream Processor in a Containerized Environment

Odo.

Is Shared Database in Microservices actually anti-pattern?. hn

The Whys and Hows of Database Streaming

The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Your migrations are bad, and you should feel bad. hn.

An introduction to distributed systems

Paying Technical Debt at Scale - Migrations @Stripe.

Transaction scripts https://dzone.com/articles/transaction-script-pattern https://stackoverflow.com/questions/16139941/transaction-script-is-antipattern https://gunnarpeipman.com/architecture-design-patterns/transaction-script-pattern/ https://learnbycode.wordpress.com/2015/04/12/the-business-logic-layer-transaction-script-pattern/ http://www.servicedesignpatterns.com/webserviceimplementationstyles/transactionscript http://lorenzo-dee.blogspot.com/2014/06/quantifying-domain-model-vs-transaction-script.html http://grahamberrisford.com/AM%202%20Methods%20support/06DesignPatternPairs/Domain%20Driven%20Design%20v.%20Transaction%20script.htm

Automating applications with @kubernetesio operators

Code in the database vs. code in the application. 2. 3. 4. 5. 6. 7 by Lukas Eder. tweet. 8. reddit. 9. 10. 11. mf.

The big myth perpetrated by architects who don’t really understand relational database architecture (me included early in my career) is that the more tables there are, the more complex the design will be.

feature flags at Twitter

Kubernetes commandments

Kafka running on OpenShift4 using Ceph Block Storage

What's next for Kubernetes.

12 Factors for Cloud Native and Openshift.

Openshift on Azure

domain probes hn

data preparation for predictive machine learning

spring high performance batch processing

Installing Openshift 4 from start to finish. Multiple stages within a Kubernetes cluster.

Migrating a Retail Monolith to Microservices: Sebastian Gauder at MicroXchg Berlin. slides.

microservices gone wrong

Idempotency - challenges and solutions over HTTP

reflections on moving to Kubernetes. advanced kubernetes. when to use kubernetes.

Bringing up an OpenShift playground in AWS

Should that be a microservice? hn.

deploy != release. Testing in Production, the safe way. Deploy != Release (Part 1). Deploy != Release (Part 2). Istio Observability with Go, gRPC, and Protocol Buffers-based Microservices. works in staging. Using Blue-Green Deployment to Reduce Downtime and Risk . NoStaging. How to Deploy Software with Envoy. Reactive REST API Using Spring Boot and RxJava.

Mature Microservices and How to Operate Them.

Reconciling Kubernetes and PCI DSS for a Modern and Compliant Payment System.

become a better software architect

Eoin Woods on Democratising Software Architecture at ICSA 2019

Software architecture is still needed because stakeholders are still around, we need to decide on design tradeoffs and we have several cross-cutting concerns in software. In practice, what happens nowadays is having more empowered cross-functional teams and using more lightweight descriptions for architecture than in the past. Architecture diagrams that are difficult to understand and evolve are now replaced by lightweight C4 diagrams and Architecture Decision Records. Static and runtime code analyses combined with informal documentation in the form of a wiki or PowerPoint documents can substitute for complex static documents. Tools like SonarQube for static code analysis, or Jaeger, Zipkin, ELK, Prometheus/Grafana and New Relic for distributed monitoring and tracing of services in production, can give an accurate and real-time view of code and its architecture.

Architecture decision record

Drinking from the stream. slides.

Streaming IoT Data and MQTT Messages to Apache Kafka.

distributed tracing

DDD Ports and Adapters with Onion architecture, what goes where?

What is left inside the Hexagon is the logic to gather external data, call a decision maker and process the result.

Layers, Onions, Ports, Adapters: it's all the same

I've put the UI components (the orange boxes) and the Data Access components (the blue boxes) in the same layer

DDD, Hexagonal, Onion, Clean, CQRS, … How I put it all together

the typical application flow goes from the code in the user interface, through the application core to the infrastructure code, back to the application core and finally deliver a response to the user interface.

while the CLI console and the web server are used to tell our application to do something, the database engine is told by our application to do something

The adapters that tell our application to do something are called Primary or Driving Adapters while the ones that are told by our application to do something are called Secondary or Driven Adapters.

Cockburn on Hexagonal Architecture

The ports and adapters pattern is deliberately written pretending that all ports are fundamentally similar. That pretense is useful at the architectural level. In implementation, ports and adapters show up in two flavors, which I’ll call ‘’primary’’ and ‘’secondary’’, for soon-to-be-obvious reasons. They could be also called ‘’driving’’ adapters and ‘’driven’’ adapters.

good description of ports and adapters

Asymmetry: the Configurable Dependency implementation is different for each side. On the driver side, the application doesn’t know which adapter is driving it. But on the driven side, the application must know which driven adapter it must talk to.
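
An illustrative Java sketch of the primary/secondary split described above, with invented names; the core implements the driving port and only knows the driven port, never the adapter behind it:

```java
// Primary (driving) port: the use case the outside world calls into.
interface PlaceOrderUseCase {
    void placeOrder(String productId, int quantity);
}

// Secondary (driven) port: something the application core needs from the outside.
interface OrderRepository {
    void save(String productId, int quantity);
}

// The application core implements the driving port and is configured with the driven port.
class OrderService implements PlaceOrderUseCase {
    private final OrderRepository repository; // the core knows the port, not the adapter
    OrderService(OrderRepository repository) { this.repository = repository; }
    public void placeOrder(String productId, int quantity) {
        repository.save(productId, quantity);
    }
}

// A primary adapter (e.g. a web controller or a CLI command) drives the core through PlaceOrderUseCase.
// A secondary adapter (e.g. a JDBC or JPA implementation of OrderRepository) is driven by the core.
```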

Isn't this just a layered architecture with a different name and drawn differently?. Onion vs. N-Layered Architecture. hexagonal architecture with spring data. video.

The Evolution of Comcast’s Architecture Guild

Application Integration for Microservices Architectures: A Service Mesh Is Not an ESB

Craftconf architecture talk

Real-time Data Processing using Redis Streams and Apache Spark Structured Streaming

majestic modular monoliths

How Netflix Thinks of DevOps

How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. tweet

cloud native apps

clean code, direct style

code with few possible control flow combinations, direct style (can always trace what connects to what), doesn’t violate grep test, comments explain why.

FRP can help increase code cohesion

Lessons Learned Replacing a DI Framework in a Legacy Codebase

DI composition root. What is a composition root in the context of Dependency Injection. more. more. more. more. more. Clean Composition Roots with Pure Dependency Injection (DI). Are We Just Moving the Coupling?

Microservices, Apache Kafka, and domain-driven design

91 global variables in Excel that were protected by one spin lock. No one could unravel the hairball

A Service Mesh Is Not an ESB

A service mesh is only meant to be used as infrastructure for communicating between services, and developers should not be building any business logic inside the service mesh.

Restoring Confidence in Microservices: Tracing That's More Than Traces

The Potential for Using a Service Mesh for Event-Driven Messaging

Cloud Functions

Aggregating REST and Real-Time Data Sources

Maintainable ETLs. hn

USE THE MOST PRODUCTIVE STACK YOU CAN GET. JAVA'S JOB LISTINGS, JWT, KAFKA, SERVERLESS, STREAMING, JARS IN WARS, THREADS, CODE COVERAGE--63RD AIRHACKS.TV

microfrontends

cloud transactions

the state of Java relational persistence. slides. Spring Data JPA from 0-100 in 60 Minutes

TRANSACTIONS, J2EE, JAVA EE, JAKARTA EE, MICROPROFILE AND QUARKUS

temporal modelling

regarding bad internal technology. HN.

Fast key-value stores: An idea whose time has come and gone. hn

Getting value out of your monad

some best (?) practices

the Challenges of Operationalizing Microservices

Mistakes we made adopting event sourcing

And if you store events with both an event_timestamp and effective_timestamp, you get bi-temporal state for free too. Invaluable when handling a time series of financial events subject to adjustments and corrections. For instance, backdate interest adjustments due to misbooked payments, recalculate a derivatives trade if reported market data was initially incorrect, calculate adjustments to your business end of month P&L after correcting errors from two months ago.
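
A minimal sketch of that bi-temporal shape, with illustrative names; the point is that a correction is just a new event whose effective timestamp lies in the past:

```java
import java.math.BigDecimal;
import java.time.Instant;

record LedgerEvent(
        String accountId,
        String type,                 // e.g. "InterestAdjustment"
        BigDecimal amount,
        Instant eventTimestamp,      // when we learned about it / wrote it down
        Instant effectiveTimestamp   // when it actually applies in the business timeline
) {}

// A backdated correction is a new event: eventTimestamp = now, effectiveTimestamp in the past.
// Replaying by effectiveTimestamp gives the corrected history; replaying by eventTimestamp
// gives "what we knew at the time".
```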

as time goes by - technical challenges of bi-temporal Event Sourcing. same talk. event sourcing with bi-temporal data

The evolution of the Shopify codebase

forging a functional enterprise

Feature Flags and Test-Driven Design: Some Practical Tips

Design Techniques for Building #Streaming Data, Cloud-Native Applications

lots of lambdas. tweet.

How to get along with HATEOAS

Von Service-orientierten Architekturen (SOA) zu DDD und Microservices

When we try to force a service decomposition that isn't really there, we freeze today's technical design into an organizational design

Updating Materialized Views and Caches Using Kafka

the good parts of aws

Event-sourcing at Nordstrom: Part 2

apache kafka tutorials

TEMPORAL MODELLING

envers vs. debezium

Software Architecture Guide

How To Keep the Layers of your Spring App Separate using Integration Tests

MySQL CDC with Apache Kafka and Debezium

Message transformations for change data capture

Modern applications at AWS

Dependency Management and Versioning With a Maven Multi-Module Project

From PHP to transactions - airhacks

perhaps not a good idea

what microservices are

performance matters

Package by feature or by layer

hexagonal architecture in practice more more

intimidated by the sheer breadth of #DDD

DDDTrouble

"Building audit logs with change data capture and stream processing"

you can't have a rollback button. Rolling Forward and other Deployment Myths

Debezium resources

software archeology

CDC, @debezium Streaming and @apachekafka an http://airhacks.fm episode

How many storage devices does a workload require?

Battle of the circuit breakers

Streaming Database Changes with Debezium by Gunnar Morling slides

CQRS

Kafka Streams: Topology and Optimizations

Build your own X

How to sleep at night having a cloud service: common Architecture Do's

"Stop Mapping Stuff in Your Middleware" Logic in the database vs. logic in the application

The dark side of events. Finding your service boundaries. Monolith Decomposition Patterns. the usefulness of pre-allocating ids at the beginning. Event-Driven Microservices, the Sense, the Non-sense and a Way Forward

Kafka stream workshop and slides.

The Configuration Complexity Curse – Don’t Be a YAML Engineer

Step Away From The Database - A step-by-step example of how to introduce Hazelcast into an existing database backed application.

auto-formatting @java source code as part of the build process is a blessing

Have you built applications following #DDDesign principles, using #JPA for persistence?

To Domain Driven Design

Vertical Slices. Out with the Onion, in with Vertical Slices. The Importance of Vertically Slicing Architecture. Vertical Slice Architecture. APLICA VERTICAL SLICE. Why vertical slice architecture is better. Our architecture is a mess! Are you sure?

Ensuring rollback safety during deployments. Dealing with safely rolling forward and rolling back stateful services isn't something people talk about much, if at all. It's the sort of thing that gets hand-waved away.

Java Cloud Native Starter or Kubernetes, OpenShift, istio, Postgres, Clouds, Backend for Frontend, vue.js and MicroProfile

To DTO or not to DTO

Azure for AWS specialists

Our setup of Prometheus and Grafana (as of the end of 2019)

A Thought Experiment: Using the ECS Pattern Outside of Game Engines. cache-friendliness

CSRF, XSS, JWT, REACTIVE DATABASES, TX AND WEBSOCKETS, JSON-B, OPENSHIFT

Practical Change Data Streaming Use Cases with Apache Kafka & Debezium

Qualities of a Highly Effective Architect

Plumbing At Scale

END-TO-END ARGUMENTS IN SYSTEM DESIGN

How I write backends. hn

The Many Faces of Modularity

3 database architectures

Modular Monolithic Architecture

Monoliths are the future

2020 Predictions

The Let It Crash Philosophy Outside Erlang

Scaling to 100k Users. complexity

Data Modernization for Spring-based Microservices

Why did disabling hyperthreading make my server slower?

Modularity does not have to be fancy. It could be as simple as using DDD and intelligent package naming

Testing Microservices: an Overview of 12 Useful Techniques - Part 1

Data-oriented architecture

git-flow vs GitHub flow

This is not the class of software that I had in mind when I wrote the blog post 10 years ago. If your team is doing continuous delivery of software, I would suggest to adopt a much simpler workflow (like GitHub flow) instead of trying to shoehorn git-flow into your team.

If, however, you are building software that is explicitly versioned, or if you need to support multiple versions of your software in the wild, then git-flow may still be as good of a fit to your team as it has been to people in the last 10 years. In that case, please read on.

Branching is a core concept in Git, and the entire GitHub flow is based upon it. There's only one rule: anything in the master branch is always deployable.

Ready for changes with Hexagonal Architecture | Netflix Tech Blog

Re-architecting 2-tier to 3-tier

Builders, fluent builders

Your database as an API

becoming an architect

GOTO 19 "Good Enough" Architecture. Monolith Decomposition Patterns • Sam Newman. Building Resilient Frontend Architecture • Monica Lent

Systems design for Advanced Beginners

humble guide to database schema design

"API-first"

Beyond the Distributed Monolith

Should services always return DTOs, or can they also return domain models?. Service layer returns DTO to controller but need it to return model for other services. Pass DTO to service layer. Map DTOs and Models within the Service Layer due to a missing Business Layer. Entity To DTO Conversion for a Spring REST API. LocalDTO (2004)

Not only can you pass DTO objects to the Service Layer, you should pass DTO objects instead of Business Entities to the Service Layer.

Your service should receive DTOs, map them to business entities and send them to the repository. It should also retrieve business entities from the repository, map them to DTOs and return the DTOs as responses. So your business entities never get out of the business layer, only the DTOs do.
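
A small sketch of that flow (DTO in, DTO out, entities stay inside), with placeholder classes rather than any particular framework:

```java
class CustomerDto { String name; String email; }
class Customer { String name; String email; }   // business entity, never leaves the business layer

interface CustomerRepository {
    Customer save(Customer customer);
}

class CustomerService {
    private final CustomerRepository repository;
    CustomerService(CustomerRepository repository) { this.repository = repository; }

    CustomerDto register(CustomerDto dto) {
        Customer entity = new Customer();          // map DTO -> entity
        entity.name = dto.name;
        entity.email = dto.email;
        Customer saved = repository.save(entity);
        CustomerDto response = new CustomerDto();  // map entity -> DTO for the response
        response.name = saved.name;
        response.email = saved.email;
        return response;
    }
}
```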

Some people argue for them as part of a Service Layer API because they ensure that service layer clients aren't dependent upon an underlying Domain Model. While that may be handy, I don't think it's worth the cost of all of that data mapping. As my contributor Randy Stafford says in P of EAA "Don't underestimate the cost of [using DTOs].... It's significant, and it's painful - perhaps second only to the cost and pain of object-relational mapping".

Let's now look at a service level operation – which will obviously work with the Entity (not the DTO)

```java
@GetMapping
@ResponseBody
public List<PostDto> getPosts(...) {
    //...
    List<Post> posts = postService.getPostsList(page, size, sortDir, sort);
    return posts.stream()
            .map(this::convertToDto)
            .collect(Collectors.toList());
}
```

service layer (old)

Enterprise applications typically require different kinds of interfaces to the data they store and the logic they implement: data loaders, user interfaces, integration gateways, and others. Despite their different purposes, these interfaces often need common interactions with the application to access and manipulate its data and invoke its business logic. The interactions may be complex, involving transactions across multiple resources and the coordination of several responses to an action. Encoding the logic of the interactions separately in each interface causes a lot of duplication.

Presentation Model (old)

[ON DTOS (old)] much conflicting information!

DTOs are only created when their structure significantly differs from that of the entity. In all other cases the entity itself is used. The cases when you don’t want to show some fields (especially when exposing via web services to 3rd parties) exist, but are not that common. This can sometimes be handled via the serialization mechanism – mark them as @JsonIgnore or @XmlTransient for example
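
A tiny sketch of that serialization-level alternative, assuming Jackson and a hypothetical Account entity:

```java
import com.fasterxml.jackson.annotation.JsonIgnore;

public class Account {
    public Long id;
    public String email;

    @JsonIgnore            // never serialized when the entity is returned from a web endpoint
    public String passwordHash;
}
```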

Don’t use the mappers/entity-to-dto constructors in controllers, use them in the service layer. The reason DTOs are used in the first place is that entities may be ORM-bound, and they may not valid outside a session (i.e. outside the service layer).

performance of Java mapping frameworks

Creating large Java applications composed of multiple layers requires using multiple models such as persistence model, domain model or so-called DTOs. Using multiple models for different application layers will require us to provide a way of mapping between beans.

Dozer is a mapping framework that uses recursion to copy data from one object to another. The framework is able not only to copy properties between the beans, but it can also automatically convert between different types.

sooo subsetting and type conversions are an important part of mapping frameworks?
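
Apparently so. A quick sketch with ModelMapper (one of the mappers linked above), using invented Post/PostDto classes, where the mapping copies the matching subset of properties:

```java
import org.modelmapper.ModelMapper;
import org.modelmapper.config.Configuration.AccessLevel;

class Post { public Long id; public String title; public String internalNotes; }
class PostDto { public Long id; public String title; }   // a subset of the entity

class MappingExample {
    public static void main(String[] args) {
        ModelMapper modelMapper = new ModelMapper();
        // match public fields by name (the default configuration looks at getters/setters)
        modelMapper.getConfiguration()
                   .setFieldMatchingEnabled(true)
                   .setFieldAccessLevel(AccessLevel.PUBLIC);

        Post post = new Post();
        post.id = 1L;
        post.title = "Hello";
        post.internalNotes = "not exposed";

        PostDto dto = modelMapper.map(post, PostDto.class); // copies only the matching subset
        System.out.println(dto.id + " " + dto.title);
    }
}
```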

to DTO or not to DTO

DTO : Hipster Ou Dépassé ?

The magic behind the Dependency Injection of Quarkus

Java & SQL - stronger together. overview of the lasagna

Domain Events Versus Change Data Capture

From Batch to Streaming to Both. Internet of Tomatoes: Building a Scalable Cloud Architecture.

newbie architect

Software Architecture for Agile Enterprises

One thing I never liked about ORMs and OGMs: letting the application define the database schema, indexes or constraints.

Haskell arch

Domain event

Event-Driven Architectures for Spring Developers

Adoption of Cloud Native Architecture, Part 2: Stabilization Gaps and Anti-Patterns

Things I Wish I’d Known About CSS

Dividing front end from back end is an antipattern (?)

Domain-Oriented Microservice Architecture

For any event-based system, the message structures it exposes to external consumers are its public interface. Evolve their schemas with the same care and attention to backwards compatibility as your synchronous APIs. CDC breaks encapsulation

How to design a REST API that can “prompt” the client about long-running operations?

Building dashboards for operational visibility

Inside the Hidden World of Legacy IT Systems.

I've tackled legacy systems my entire career, and there is a certain art to untying the knot of dependencies, procedures, and expectations.

It’s painful to see the people who know all the hotkeys and key sequences on an old green terminal suddenly thrust into a world of mouse hunting and clicking. It makes you wonder if new things are really better.

HLint evolution

Under Deconstruction: The State of Shopify’s Monolith

To Microservices and Back Again very good.

Design Microservice Architectures the Right Way (2018)

Event Sourcing You are doing it wrong (2018) See the papers mentioned at the end: the dark side of event sourcing. versioning in an event sourced system.

Moving BBC Online to the cloud

Go in Production – Lessons Learned

Asynchronous Task Scheduling at Dropbox

If you have the opportunity, please do not build it like this. Referring to the architectural diagram, it is going to be much more efficient for the "Frontend" to persist the task data into a durable data store, like they show, but then the Frontend should simply directly call the "Store Consumer" with the task data in an RPC payload. There is no reason in the main execution path why the store consumers should ever need to read from the database, because almost all tasks can cut-through immediately and be retired. Reading from the database should only need to happen due to restarts and retries of tasks that fail to cut through.

Terrible Source code

One of the best, and first, things we did when starting our machine learning platform was to design it using a plugin architecture. There's a lot of scar tissue and horrible experience through our previous ML products we built for enterprise. Namely, it was extremely hard to onboard new developers to work on the product. They had to understand the whole thing in order to contribute.

Not Just Events: Developing Asynchronous Microservices. Creating event-driven microservices: the why, how and what.

Haskell app architecture.

Stored Procedures as a Back End

sagas - Azure reference architectures. sagas for consistency. 2. Not Just Events: Developing Asynchronous Microservices. Battle-tested event-driven patterns for your Microservices archit. Opportunities and Pitfalls of Event-driven Utopia

Clean Architecture Boundaries with Spring Boot and ArchUnit

Clean Architecture with Spring by Tom Hombergs

If All You Have Is a Database, Everything Looks Like a Nail. HN.

Soon, there was an established trend that increased the entropy and intertwining of applications and tables. It became common to have transactional updates across tables for different apps.

Sometimes people stage read-only copies of tables. These are asynchronously updated from the authoritative owning application. Other applications then “own” the read-only copy in their application set of tables.

is only good advice if the tables are application-specific data and you don't do microservices in that stupid braindead way that makes it so that everything from the admin panel to data visualization are their own "applications" with their own databases and doing things that would be even the simplest of queries becomes a project in writing what are effectively bad-performance joins via random http APIs. I.e., have a data model and understand where the most painless boundaries are, don't throw up dozens of DBs for the hell of it.

Look into the patterns of CQRS, event sourcing, flow based programming and materialized views. GraphQL is an interface layer, but you still have to solve for the layer below. API composition only works when the network boundary and services are performance compatible to federate queries. The patterns above can be used to work around the performance concern at a cost of system complexity.

Don't forget the part where the queries are impossible to test, because you can't spin up real instances of all 15 APIs in a test environment, so all the HTTP calls are mocked and the responses are meaningless!

A lot of posters here seem to have been deeply burned from microservices designed along the wrong lines. I mean, sure, it happens. You're going to make mistakes just like you can misjudge how to separate concerns in a set of classes. It shouldn't be an issue to fix it. Maybe some teams focus on pure separation before they have a solid design? Maybe its just a culture of feeling like service boundaries can never change?

There’s a model of software based around shipping events around, and subscriptions between systems. The purposes of separation are at least a couple important, perhaps you know. Each has a DB, often embedded, that is suitable and materialized from the subscriptions and its own; mutated predictably.

Software Design for Flexibility book.

Engineers who participated in originally building a system are often orders of magnitude faster at fixing bugs and building features than engineers who joined later.

logs

But in practice, the accumulation of cold data on a local disk is where this starts to hurt, particularly if that has to serve read traffic which starts from the beginning of time (i.e your queries don't start with a timestamp range).

KSQL transforms do help reduce the depth of the traversal, by building flatter versions of the data set, but you need to repartition the same data on every lookup key you want - so if you had a video game log trace, you'd need multiple materializations for (user), (user,game), (game) etc.

1) Write an event recording a desire to check out. 2) Build a view of checkout decisions, which compares requests against inventory levels and produces checkout results. This is a stateful stream/stream join. 3) Read out the checkout decision to respond to the user, or send them an email, or whatever.

CDC is great and all, too, but there are architectures where ^ makes more sense than sticking a database in front.

Admittedly working up highly available, stateful stream-stream joins which aren't challenging to operate in production is... hard, but getting better.
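
A very rough Kafka Streams sketch of steps 1-3 above (not the commenter's actual design); the topic names and the join logic are invented, and serdes are assumed to be configured in the Streams properties:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import java.time.Duration;

public class CheckoutTopology {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> requests = builder.stream("checkout-requests");   // step 1: desires to check out
        KStream<String, String> inventory = builder.stream("inventory-levels");

        // step 2: a stateful, windowed stream/stream join producing checkout decisions
        KStream<String, String> decisions = requests.join(
                inventory,
                (request, level) -> Integer.parseInt(level) > 0 ? "APPROVED" : "REJECTED",
                JoinWindows.of(Duration.ofMinutes(5)));

        decisions.to("checkout-decisions"); // step 3: read out downstream to respond, email, etc.
        return builder;
    }
}
```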

Unpopular opinion: SQL is better than GraphQL. some good aspects. better for trees and DAGs. level limitation. what about using views?. another (older) comparison

GraphQL is better when what you are requesting is best expressed as a tree (or a "graph", though only the DAG variety). This is not always the case, but it very often is when building API:s for production use.

Of course, you can express tree structures in table form, but it is not very convenient for clients to consume. In particular if your client is rendering nested component views, what you want is very often something hierarchical.

performance is more predictable, exactly because the language is more restricted. You can't just join in all the things, or select billions of rows by accident. The schema dictates what is allowed.

you can't request a tree structure (eg: a menu with submenus) with an unknown number of levels.

You don't have to expose your entire schema, instead expose carefully designed SQL views (so you can refactor your tables without breaking your API)

Lessons Learned from Reviewing 150 Infrastructures

sharing transactions and persistence contexts across module boundaries -- yea or nay?

Data architecture vs backend architecture

are queues overkill?

Big little guide to message queues.

How we rebuilt the Walmart Autocomplete Backend

good checklist

React created roadblocks in our enterprise app. original link. Using react in enterprise contexts. The "seams" link in that last one is interesting as well.

Software engineering topics I changed my mind on

Architecture.md. ADR.

reworking of GHC's errors: a nice architectural choice to avoid cyclic dependencies

The complexity that lives in the GUI

Why microservices: part 5

The Database Inside Your Codebase

You probably don’t need a micro-frontend

Developing microservices with aggregates

Modules, monoliths, and microservices. hn.

Why isn't Godot an ECS-based game engine? . lobsters.

testing quarkus

bbc and serverless

Capturing Every Change From Shopify’s Sharded Monolith

Backpressure in Reactive Systems

not necessarily microservices but something akin to serverless functions running on a managed platform

In praise of --dry-run

Software Architecture Design for Busy Developers

kafka

The pedantic checklist for changing your data model in a web application

database migrations and continuous delivery

Zero-downtime schema migrations in Postgres using views

Notes on streaming large API responses

don't forget structure and then try to remember it

Microservices and Cross-Cutting Concerns

Qualities of a Highly Effective Architect

events, not webhooks

The Database Ruins All Good Ideas

Thinking in Events: From Databases to Distributed Collaboration Software

On the Evilness of Feature Branching

Changes tend to be made higher up in the stack, ultimately the UI, because that has a lower risk of breaking something else. This gets very messy very fast.

How much business logic should be allowed to exist in the controller layer?. How accurate is “Business logic should be in a service, not in a model”?. Why put the business logic in the model?.

requirements

Soliciting requirements is an iterative process, starting at an abstract level and diving down as you iterate. It is a data pull from the stakeholders; so it is about asking a ton of questions, several different ways, and becoming more tactical as you go along.

Solving the double (quintuple) declaration Problem in GraphQL Applications

Domain services (2012). Services in Domain-Driven Design (DDD).

application services which act as a facade. Application services are simple classes which have methods corresponding to use cases in your domain

When a significant process or transformation in the domain is not a natural responsibility of an ENTITY or VALUE OBJECT, add an operation to the model as standalone interface declared as a SERVICE. Define the interface in terms of the language of the model and make sure the operation name is part of the UBIQUITOUS LANGUAGE. Make the SERVICE stateless.
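
A tiny Java sketch of that guidance, with illustrative types: a significant operation that belongs to no single entity becomes a stateless, standalone SERVICE interface named in the ubiquitous language:

```java
import java.math.BigDecimal;

record Account(String id, BigDecimal balance) {}

// Standalone interface declared as a SERVICE; stateless, named after the domain operation.
interface FundsTransferService {
    void transfer(Account from, Account to, BigDecimal amount);
}
```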

Retry long-running message processing in case of processing node failure.

Are Repositories implementations part of my domain? Should repositories have SQL queries?

DDD repositories in application or domain service

What does Unsplash cost in 2019?

Detect potential AWS costs savings

AWS Cost Management

Best Practices Design Patterns: Optimizing Amazon S3 Performance

Automatic Feedback-Directed Optimization for Warehouse-Scale Applications . tweet

Announcing the new pricing plan for AWS Config rules

AWS costs every programmer should know. hn.

reducing computing costs

Hakuna Cloud – Stop cloud servers when they are not in use

BigQuery best practices

understanding data transfer in AWS

AWS facts

multi-cloud

Bank of America's CEO says it's saved $2B per year by building its own cloud

Prevent Unnecessary Expense from Amazon Web Service (AWS) by Demystifying Its Cost Structure

unbundling AWS

Would add that edge compute, running cloud paradigms (code instead of config; automation; management abstractions), partially addresses these limitations for many use cases.

Costly to move the data off, but longer-term ROI for those orgs that are willing to make long-term decisions.

Meanwhile, as edge matures, greenfield apps should be edge-centric, rather than cloud-centric (doesn't mean they won't have cloud components...they will do the processing and storage where it best makes sense).

Once you have significant data on AWS it costs you so much to transfer it you are stuck with them. Their data fees are insane, and so are their storage fees.

The ominous opacity of the AWS bill – a cautionary tale

Cloud bandwidth costs are a rip off

the Amazon premium

IT operation costs traps

How to compete with AWS

comparison of Cloud provider costs

We use Kubernetes and spot instances to reduce EC2 billing

the only type of API services I will ever use

It’s 3 times more expensive to send 1TB of data out of Amazon EC2 than it is to buy a 1TB drive from Amazon.
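A rough back-of-the-envelope check, assuming the long-standing ~$0.09/GB rate for the first egress tier: 1 TB ≈ 1,000 GB × $0.09/GB ≈ $90 to transfer out of EC2, versus roughly $30 to $50 for a 1 TB external drive, which is where a figure of about 3x comes from.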

AWS cost explorer

Hotel California for your data

Mastering AWS Cost Optimization: Real-world technical and operational cost-saving best practices

How to burn the most money with a single click in Azure

spot instances

How we reduced our Google Maps API cost by 94%

pricing calculator for AWS

why Zoom chose Oracle

validate your pricing name

oops

free to send, costly to retrieve

costs saver

A Developer’s Guide to Cloud Costs

AWS budget actions

Is a billion dollars worth of server lying on the ground?. reddit.

If you need a predefined small number of VMs and no other functionality, it would be silly to go with AWS. But on the other hand, if you want a set of servers of a given class spawning on demand, with traffic coming in via load balancers, with integrated certificate and DNS management, with programmable lifecycle hooks, with integrated authn/authz, with full audit logs and account management, with configurable private networking to other services, etc. etc. ... You'll pay more than the price difference for someone to implement all of that from scratch.

5 IT Operations Cost Traps and How to Avoid Them

Taking Control of Confusing Cloud Costs

The Various Billing Philosophies of AWS

create an estimation

GCP Billing Budgets that send Pub/Sub notifications to Functions

Please fix the AWS free tier before somebody gets hurt

huge bills while learning

cloud cost podcast

a black hole of unpredictable spend, according to new report

Is the unit of compute a "machine" or is it a millisecond of CPU and a GB of memory?

CSS-tricks

Three-sided border

CSS z-index Property

Note: z-index only works on positioned elements (position:absolute, position:relative, or position:fixed).

CSS Grid in IE: Debunking Common IE Grid Misconceptions

#5: Columns of Equal Height: Super Simple Two Column Layout

https://stackoverflow.com/questions/3298746/apply-different-css-stylesheet-for-different-parts-of-the-same-web-page

Combining multiple CSS files without conflicting code?

Guidelines for better and faster CSS

Guidelines for Brutalist Web Design HN

css-nesting request to pick up the css-nesting proposal

The problem with CSS pre-processors. What Will Save Us from the Dark Side of CSS Pre-Processors? WHY I'M (STILL) AGAINST SASS & LESS

Modern CSS Explained For Dinosaurs

CSS Utility Classes and "Separation of Concerns"

You might not need a CSS framework

Bootstrap

Transclusion in self-contained systems

At first, that sounds obvious, especially when you pretend that styles and scripts are isolated. Unfortunately they aren’t. However, if you manually provide for the highest possible isolation, for example by preventing collisions of CSS selectors (e.g. using system-specific HTML class prefixes), you can come to grips with this problem.

css isolation - How To Isolate a div from public CSS styles? - CSS isolation: there has got to be a better way - Sandbox local HTML/CSS code snippets inside an iframe (for style guides/pattern libraries) - How to isolate CSS styles to one area?

Some JavaScript libraries provide a noConflict mode. The technique is very simple: when the library initially loads, it keeps a copy of the global variable that sat where it wants to live. When noConflict is called, the library puts the old global variable back where it was.

Restrict CSS applying on a particular div. Reset/remove CSS styles for element only

A Vision for Our Sass

Basic concepts of flexbox

When we describe flexbox as being one dimensional we are describing the fact that flexbox deals with layout in one dimension at a time — either as a row or as a column. This can be contrasted with the two-dimensional model of CSS Grid Layout, which controls columns and rows together.

writing-mode

The writing-mode CSS property defines whether lines of text are laid out horizontally or vertically, as well as the direction in which blocks progress.

Introduction to the CSS basic box model

Layout and the containing block

CSS Border-Image

My Favorite Ways of Centering With CSS

Bootstrap 4.1.2 released hn

9 CSS in JS Libraries You Should Know in 2018

Layoutit – An interactive CSS Grid generator HN

Automatically remove unused css from Bootstrap or other frameworks

Constructing Modern UIs with SVG - Tim G. Thomas

USING CSS GRID WHERE APPROPRIATE

https://www.mikecr.it/ramblings/functional-css/

https://news.ycombinator.com/item?id=18083508

https://lobste.rs/s/fqkodg/defense_functional_css

https://tailwindcss.com/docs/what-is-tailwind/

https://adamwathan.me/css-utility-classes-and-separation-of-concerns/

https://www.reddit.com/r/programming/comments/9j4cab/in_defense_of_functional_css_mike/

https://www.reddit.com/r/css/comments/9imhlv/its_2018_you_shouldnt_be_writing_vanilla_css/

https://itnext.io/what-is-modular-css-659949e23534

http://getbem.com/introduction/

Tailwind: A Utility-First CSS Framework.

CSS Layout cookbook. HN.

Incomplete List of Mistakes in the Design of CSS. hn.

Difference between justify-content vs align-items?. What the flex is the difference between justify-content, align-items, and align-content?!. A Quick Way to Remember the Difference Between justify-content and align-items. Demystifying CSS alignment. justify-items. aligning items in a flex container. reddit. CSS justify-content Property.

Centering in CSS: A Complete Guide.

Keeping CSS short with currentColor.

Difference between justify-content vs align-items?.

The Lowdown on :before and :after in CSS.

absolute vs. relative positioning. more.

You're using wrong. lobsters.

Electron and the Decline of Native Apps.

When To Use The Button Element

animated grid layout

Why is CSS so damn hard?

Houdini and the Paint API

the state of CSS

things about css

Digging Into The Display Property: The Two Values Of Display

css grid & tables

SVG Properties and CSS

the state of css 2019

every layout

CSS Houdini & The Future of Styling by Una Kravets

How to Make Your Website Not Ugly - basic UX for programmers

https://medium.com/@wendersyang/what-the-flex-is-the-difference-between-justify-content-align-items-and-align-content-5fd3694f5259

https://stackoverflow.com/questions/35049262/difference-between-justify-content-vs-align-items

https://css-tricks.com/almanac/properties/a/align-items/ https://developer.mozilla.org/es/docs/Web/CSS/align-items https://www.youtube.com/watch?v=GsSk9zv19AE How To Overlay One Div Over Another Div Using CSS

The z-index will only change stacking order, not the x & y positioning. Relative positioning can be used to offset elements in x y space and create an overlap. But you may have to think carefully how you apply the offsets to keep responsive.

https://css-tricks.com/absolute-relative-fixed-positioining-how-do-they-differ/ https://dzone.com/articles/css-position-relative-vs-position-absolute https://developer.mozilla.org/en-US/docs/Web/CSS/position https://stackoverflow.com/questions/2027657/overlapping-elements-in-css

frontend design, react, and a bridge over the great divide

The CSS background-image property as an anti-pattern. HN

top 10 css mistakes

This Ain’t Disney: A practical guide to CSS transitions and animations. HN

Resilient CSS: 7-part Series. tweet

The Differing Perspectives on CSS-in-JS

Layout Land. resilient CSS

Bert Bos & Håkon Wium Lie | CSS Reset | CSS Day 2017

Pseudo-classes

The progression of CSS layouts

In Search of the Holy Grail (2006) old, obsolete.

5 ways to vertically center with CSS

Using whitespace to make our designs look better

https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/

https://twitter.com/tailwindcss

In Defense of Utility-First CSS hn CSS Utility Classes and "Separation of Concerns" hn

Coping with Flexbox

7 Uses for CSS Custom Properties

flexbox woes

Old CSS, New CSS

Using CSS custom properties (variables)

intrinsic sizing in CSS

resilient CSS

Layout-isolated component

specifity in css - a rebuttal

Facebook's CSS-in-JS Approach. CSS Containment Now a Web Standard.

don't design for mobile

render-blocking JavaScript and CSS

display: 'flex', justifyContent: 'center'

Java 9 on Java EE 8 Using Eclipse and Open Liberty

JAVA EE 8 ON JAVA 9 - FROM INSTALL TO DEPLOYMENT WITH OPENLIBERTY SERVER

Build your own open source Eclipse MicroProfile

Graceful Shutdown Spring Boot Applications

How to bootstrap JPA programmatically without the persistence.xml configuration file

Difference Between BeanFactory and ApplicationContext in Spring

Build your own open source Eclipse MicroProfile

The magic of Spring Data

Understanding Jakarta EE: “Modularity is key to faster release cycles”

Spring Boot – Best Practices

10 Spring Boot Security Best Practices

Deep Dive into JUnit 5 Extension Model

Ten Things You Can Do With GraalVM

Java Performance Puzzlers

Exploring Java 9: The Key Parts

Build a MySQL Spring Boot App Running on WildFly on an Azure VM

BORING ENTERPRISE JAVA

JWT and Scalability, JSON-B Configuration, Bulk Data and JAX-RS, EclipseLink, Hibernate and Schema Validation, designing distributed storage applications, Payara Dockerfile explanation twitter

Consumer Driven Contract with Spring Boot

JDBC in Java, Hibernate, and ORMs: The Ultimate Resource

Helidon

Building an offline app from scratch and with web standards only..

sts 4

kubernetes para desarrolladores java

airhacks

A NOTE ON DATA TRANSFER OBJECTS (DTO)S.

Frameworks for Java application development.

Mastering Spring framework 5, Part 2: Spring WebFlux.

Oracle Code One 2018.

Live-Coding Web Apps (PWAs)—Without Frameworks.

Guide to "Reactive" for Spring MVC Developers.

HOW TO STRUCTURE JAKARTA EE APPLICATIONS FOR PRODUCTIVITY WITHOUT BLOAT.

Designing the infrastructure persistence layer.

Can you have multiple transactions within one Hibernate Session?.

A hibernate session is more or less a database connection and a cache for database objects.

A Session is an inexpensive, non-threadsafe object that should be used once and then discarded for: a single request, a conversation or a single unit of work.

How do I get the connection inside of a Spring transaction?.

The transaction manager is completely orthogonal to data sources. Some transaction managers interact directly with data sources, some interact through an intermediate layer (eg, Hibernate), and some interact through services provided by the container (eg, JTA).

Class HibernateTransactionManager.

PlatformTransactionManager implementation for a single Hibernate SessionFactory. Binds a Hibernate Session from the specified factory to the thread, potentially allowing for one thread-bound Session per factory. SessionFactory.getCurrentSession() is required for Hibernate access code that needs to support this transaction handling mechanism, with the SessionFactory being configured with SpringSessionContext.

Note: To be able to register a DataSource's Connection for plain JDBC code, this instance needs to be aware of the DataSource (setDataSource(javax.sql.DataSource)). The given DataSource should obviously match the one used by the given SessionFactory.

This transaction manager is appropriate for applications that use a single Hibernate SessionFactory for transactional data access, but it also supports direct DataSource access within a transaction (i.e. plain JDBC code working with the same DataSource). This allows for mixing services which access Hibernate and services which use plain JDBC (without being aware of Hibernate)! Application code needs to stick to the same simple Connection lookup pattern as with DataSourceTransactionManager (i.e. DataSourceUtils.getConnection(javax.sql.DataSource) or going through a TransactionAwareDataSourceProxy).

Interface Session.

The main runtime interface between a Java application and Hibernate. This is the central API class abstracting the notion of a persistence service.

The lifecycle of a Session is bounded by the beginning and end of a logical transaction. (Long transactions might span several database transactions.)

It is not intended that implementors be threadsafe. Instead each thread/transaction should obtain its own instance from a SessionFactory.

Class DataSourceUtils.

Is aware of a corresponding Connection bound to the current thread, for example when using DataSourceTransactionManager. Will bind a Connection to the thread if transaction synchronization is active (e.g. if in a JTA transaction).

Hibernate commit() and flush().

flush() will synchronize your database with the current state of object/objects held in the memory but it does not commit the transaction. So, if you get any exception after flush() is called, then the transaction will be rolled back. You can synchronize your database with small chunks of data using flush() instead of committing a large data at once using commit() and face the risk of getting an Out Of Memory Exception.

commit() will make data stored in the database permanent. There is no way you can rollback your transaction once the commit() succeeds.

One common case for explicitly flushing is when you create a new persistent entity and you want it to have an artificial primary key generated and assigned to it, so that you can use it later on in the same transaction. In that case calling flush would result in your entity being given an id.

commit() will make the database commit. When you have a persisted object and you change a value on it, it becomes dirty, and Hibernate needs to flush these changes to your persistence layer. So you should commit, but committing also ends the unit of work (transaction.commit()).

It is usually not recommended to call flush explicitly unless it is necessary. Hibernate usually calls flush automatically at the end of the transaction and we should let it do its work. Now, there are some cases where you might need to explicitly call flush, where a second task depends upon the result of the first persistence task, both being inside the same transaction.

Hibernate sessions and transaction management guidelines.

Hibernate sessions are not thread-safe. Not only does this mean you shouldn’t pass a Hibernate session into a new thread, it also means that because objects you load from a session can be called from (and call back to) their owning session, you must not share Hibernate-managed objects between threads. Once again, try to only pass object IDs, and load the object freshly from the new thread’s own session.

Spring’s transaction management places the Hibernate session in a ThreadLocal variable, accessed via the sessionFactory. All Confluence DAOs use that ThreadLocal. This means that when you create a new thread you no longer have access to the Hibernate session for that thread (a good thing, as above), and you are no longer part of your current transaction.

MicroProfile, the microservice programming model made for Istio.

Spring Boot in a Container.

From Jakarta EE over MicroProfile to Serverless: Interactive Onstage Hacking.

Full Stack Reactive Java con @ProjectReactor!. tweet.

2019 predictions.

Jakarta EE MicroProfile WebStandards, On Stage Hacking (no slides) by Adam Bien

JSON-P: REMOVING A SLOT FROM A JSONOBJECT WITH JSONPATCH.

Pagination and Sorting With Spring Data JPA

tweet

So. The correct way to fix someone else's Spring Boot setup is to just try random annotations until it works, right?

EXCEPTION HTTP STATUS MAPPING WITHOUT MAPPERS

i18n in Java 11, Spring Boot, and JavaScript

searching in a distributed world

OPTIMIZING FOR HUMANS, NOT MACHINES

jee

the Spring Framework Early Days, Languages Post-Java, & Rethinking CI/CD

how fast is spring?

java in the 21 century

webmvc.fn

Spring tips - dinamic views

MULTIPLE CACHE CONFIGURATIONS WITH CAFFEINE AND SPRING BOOT

Caching is key for performance of nearly every application. Distributed caching is sometimes needed, but not always. In many cases a local cache would work just fine and there’s no need for the overhead and complexity of the distributed cache.

Using ConfigMaps to configure MicroProfile / Java EE 8 applications

A Quick Guide to Spring Boot Login Options

Reactive Transactions with Spring

@ComponentScan on a @Service class in #Spring actually works and contributes to the application context

Basic Concepts: @Bean and @Configuration

Build a Spring Boot App With Flyway and Postgres

Spring boot tips. video.

Event Driven Microservices with Axon and Spring Boot

the proxy fairy and the magic of Spring

spring boot internals

I DON'T YOUR DEPENDENCY INJECTION

correspondences

bootiful podcast hateoas

Installing Jenkins, creating S2I build, setting up a CD pipeline, building, deploying and testing a Java EE / Jakarta EE / MicroProfile service (twice) and configuring the readiness probe ...in 7 minutes

Spring Cloud Data Flow

Things that should not appear in Java code in 2019

(about no getters) Most libraries (Jackson, Spring, etc) have supported direct field access for a while.

What's new in Spring Framework 5.x

CODE SHRINKING TECHNIQUES WITH JAKARTA EE AND MICROPROFILE--DEVOXX

live refactoring session

polyglot microservice example using #helidon and #graalVM

web frameworks

From Spring Boot apps to functional Kotlin

Modernize and optimize Spring Boot applications

dynamic CDS archives

Understanding Low Latency JVM GCs - Jean-Philippe BEMPEL

The Lean, Mean... OpenJDK?

JAKARTA EE 8: LINKS AND RESOURCES

JAVA EE IS DEAD

The definite guide to Java agents

Writing controllers

CODE SHRINKING WITH QUARKUS AND PANACHE ORM

Live-Coding Web Apps (PWAs)—Without Frameworks

Shenandoah – ultra-low Pause Time Garbage Collector

How Quarkus brings imperative and reactive programming together

How to Get Productive with Spring Boot

Quarkus and CORS

My view is that a vast majority of applications are fine as monoliths

Well secured and documented REST API with Eclipse Microprofile and Quarkus

Configuring a Main Class in Spring Boot

Back to Shared Deployments

Autoconfigurations In-Depth

Part III: Read Entities - Jakarta EE CRUD API Tutorial

Full-Duplex Scalable Client-Server Communication with WebSockets and Spring Boot (Part I)

Best Performance Practices for Hibernate 5 and Spring Boot 2

create a simple @SpringData #JPA application using @intellijidea

Kubernetes Identity Management: Authentication. Key Features to Consider When Evaluating an Enterprise Kubernetes Solution.

Rancher vs. OKD

Project Calico and the Challenge of Cloud Native Networking

Develop Hundreds of Kubernetes Services at Scale with Airbnb

Re-Imagining Virtualization with Kubernetes and KubeVirt – Part II

Create a nested virtual machine in a Microsoft Azure Linux VM

Using Kubernetes ConfigMap Resources for Dynamic Apps

kubernetes trends

Tutorial: Explore Istio’s Traffic Rules and Telemetry Capabilities.

Upgrading your Cluster with Zero Downtime

masters are updated first, nodes follow

API Gateways and Service Meshes: Opening the Door to Application Modernisation

A lot of the multi-platform/hybrid cloud questions really revolve around what/where your control plane is

Enable Dynatrace OneAgent in Istio service mesh

Istio is a service mesh that supports running distributed microservice architectures. It’s a prominent vehicle that typically runs in Kubernetes to control inter-pod and inter-service traffic from Kubernetes workloads. For this, Istio uses Kubernetes Mutating Admission Webhooks for automatically injecting a sidecar proxy into pods.

Kubernetes and OpenShift Networking Primer

Goodbye AWS: Rolling your own servers with Kubernetes, Part 1. hn

OpenShift 4

Running Kubernetes in Production: A Million Ways to Crash Your Cluster | DevOpsCon 2018

Container Design Patterns for Kubernetes, Part 1

Creating an Effective Developer Experience for @kubernetesio and Cloud-native Apps

Deploying HA PostgreSQL on OpenShift using Portworx

Deploying Docker Containers using an AWS CodePipeline for DevOps

Deploying a Haskell application to AWS Elastic Beanstalk

Docker on AWS - what is a difference between Elastic Beanstalk and ECS?. EKS vs. ECS: orchestrating containers on AWS. ECS Vs. EKS Vs. Fargate.

This shouldn’t have to be said, but do not put your Kubernetes API Server on the public internet.

Docker data science pipeline

A Practical kubernetes Operator using Ansible — an example

The 10 Kubernetes commandments

KubeCon EU 2019 "Securing Cloud Native Communication: From End User to Service"

Expanding the Kubernetes Operator Community

Kubernetes storage on Digital Ocean

AWS woes

the basics of stateful applications in kubernetes

openshift course

The Gorilla Guide to Kubernetes in the Enterprise — Chapter 2:

Powering Flexible Payments in the Cloud with Kubernetes. Reconciling Kubernetes and PCI DSS for a Modern and Compliant Payment System.

How Kubernetes and Configuration Management works

Configuration Best Practices. Labels and Selectors. Using labels effectively.

Openshift 4.1

How Canary Deployments Work, Part 1: Kubernetes, Istio and Linkerd

Pod Evictions based on Taints/Tolerations

cool projects right now

Isolating Linux containers with SDN

Kubernetes basics: Learn how to drive first

How to navigate the Kubernetes learning curve

Migrating From Self-Managed Kubernetes to AWS EKS Using Terraform at Blue Matador

Kubernetes: Long Label Names and UX

Creating a Killer Database Architecture with Kubernetes + MariaDB

Kubernetes Design Principles: Understand the Why

Helm Chart Patterns [I]

Persistent Storage with Kubernetes in Production - Which Solution and Why?

introduction to Kubernetes secrets and configmaps

the ability to leverage a canary rollout of the various control planes. more. testing upgrades.

Kubernetes failure stories

Machine API in Openshift 4

testing and local dev can be tricky at times

rethinking best practices. more recent tweet.

Reddit thread about OKD 4.1

Learn Openshift with Minishift

Kubernetes on CentOS 7 with Firewalld

Name Resolution Issue Due To Cache Inconsistencies In CoreDNS.

AWS's Outposts

version 1.15

future of CRD - schemas

I question, though, whether circuit breaking/timeouts/retries should be externalized (deferred) to the network.

life of a packet through istio

reliable AWS services

Regions provide physical mapping to the real world that allow you to deal with latency, compliance, failure domains, and data locality.

uses of daemonsets

Maintaining big Kubernetes environments with factories

k8s versus openshift thorough comparison

Argo

https://argoproj.github.io/

https://itnext.io/argo-workflow-engine-for-kubernetes-7ae81eda1cc5 https://jaxenter.com/argo-workflow-engine-kubernetes-151694.html https://blog.argoproj.io/introducing-argo-a-container-native-workflow-engine-for-kubernetes-55c0b4b76fac https://fission.io/workflows/ https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/ https://medium.com/@doronsegal/workflow-using-argo-kubernetes-6b45ef3f1614 https://containerjournal.com/topics/container-ecosystems/camunda-brings-workflow-engine-to-kubernetes/ https://www.youtube.com/watch?v=oXPgX7G_eow https://eksworkshop.com/batch/ https://blog.kintohub.com/how-do-we-ditch-jenkins-for-argo-1c0b4df5dab0 Why did we ditch Jenkins for Argo? https://applatix.com/introducing-argo-container-native-workflow-engine-kubernetes/ https://dzone.com/articles/parallel-workflows-on-kubernetes https://workflowengine.io/blog/does-it-make-sense-to-build-your-own-workflow-engine/ https://medium.com/@arik.cohen/lets-make-workflow-engines-fun-again-73c4ad5eb428 list of engines, good resource https://github.com/meirwah/awesome-workflow-engines https://zenaton.com/features/workflow-engine/ https://dzone.com/articles/workflow-management-how-build https://kissflow.com/workflow/workflow-engine-business-rule-engine-difference/ https://blog.bernd-ruecker.com/architecture-options-to-run-a-workflow-engine-6c2419902d91 https://camunda.com/solutions/add-workflow-software/ https://www.infoq.com/news/2009/07/WFEngine/ (2009) https://www.quora.com/What-should-I-use-to-weigh-the-decision-to-use-a-workflow-engine-and-build-workflow-into-our-in-house-application-vs-using-a-third-party-workflow-tool-such-as-Pipefy pachyderm/pachyderm#3345 kubeflow/kubeflow#376 https://siliconangle.com/2018/11/07/kubeflow-shows-promise-standardizing-ai-devops-pipeline/ https://www.kubeflow.org/docs/use-cases/gitops-for-kubeflow/ https://github.com/kubeflow/pipelines https://medium.com/kubeflow/kubeflow-in-2018-a-year-in-perspective-49c273b490f4 https://www.youtube.com/watch?v=zVTNobgvR9M Kuberflow + Argo https://blog.argoproj.io/using-gitops-to-deploy-kubeflow-with-argo-cd-76f6b27807c https://news.ycombinator.com/item?id=18425084 Google's new Kubeflow Pipelines service uses Argo. 
https://www.speechmatics.com/2019/01/argo-learn-all-about-the-kubernetes-workflow-engine/ http://dev.matt.hillsdon.net/2018/03/24/argo-integration-review.html Airflow: the future of data engineering https://news.ycombinator.com/item?id=13761071 https://www.youtube.com/watch?v=oXPgX7G_eow https://www.youtube.com/watch?v=VrsVbuo4ENE Compare to Apache Airflow argoproj/argo-workflows#849 https://github.com/argoproj/data-pipeline https://medium.com/@doronsegal/workflow-using-argo-kubernetes-6b45ef3f1614 https://www.astronomer.io/blog/using-apache-airflow-to-create-data-infrastructure/ https://towardsdatascience.com/data-pipelines-luigi-airflow-everything-you-need-to-know-18dc741449b7 https://stackoverflow.com/questions/57037302/apache-airflow-or-argoproj-for-long-running-and-dags-tasks-on-kubernetes https://medium.com/@dieswaytoofast/kubernetes-workflow-with-argo-74b776b252c1 https://github.com/brigadecore/brigade https://admiralty.io/blog/running-argo-workflows-across-multiple-kubernetes-clusters/ https://azure.microsoft.com/es-es/services/kubernetes-service/ https://www.youtube.com/watch?v=M_rxPPLG8pU https://towardsdatascience.com/data-engineering-basics-of-apache-airflow-build-your-first-pipeline-eefecb7f1bb9 https://www.youtube.com/watch?v=pKLPXA-gnvw https://www.youtube.com/watch?v=6eNiCLanXJY https://www.youtube.com/watch?v=43wHwwZhJMo what is a pipeline and what is a workflow? https://bioinformatics.stackexchange.com/questions/7347/what-is-the-difference-between-a-bioinformatics-pipeline-and-workflow https://www.lightbend.com/blog/how-to-deploy-kubeflow-on-lightbend-platform-openshift-support-components-kubeflow argo in OpenShift https://www.lightbend.com/blog/how-to-deploy-kubeflow-on-lightbend-platform-openshift-support-components-kubeflow https://github.com/argoproj/argo-ui https://fission.io/workflows/ https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/ a comparison https://xunnanxu.github.io/2018/04/13/Workflow-Processing-Engine-Overview-2018-Airflow-vs-Azkaban-vs-Conductor-vs-Oozie-vs-Amazon-Step-Functions/ https://www.youtube.com/watch?v=pKLPXA-gnvw https://www.youtube.com/watch?v=qUwz20v7lcc https://www.youtube.com/watch?v=yXkLuPaLPoE architecture decisions https://blog.bernd-ruecker.com/architecture-options-to-run-a-workflow-engine-6c2419902d91 another comparison https://xunnanxu.github.io/2018/04/13/Workflow-Processing-Engine-Overview-2018-Airflow-vs-Azkaban-vs-Conductor-vs-Oozie-vs-Amazon-Step-Functions/ Kickoff Argo workflows via REST call https://stackoverflow.com/questions/54912490/kickoff-argo-workflows-via-rest-call https://github.com/argoproj/argo/blob/master/docs/rest-api.md#examples https://www.paradigmadigital.com/dev/apache-airflow/ Azure solution? 
https://azure.microsoft.com/en-us/services/logic-apps/ https://notetoself.tech/2018/04/08/logic-apps-x-microsoft-flow-which-one-should-i-choose/ more logic apps https://thenewstack.io/serverless-and-workflows-the-present-and-the-future/ even more logic apps https://xo.xello.com.au/blog/why-use-logic-apps-integration-platform https://kubernetes.io/docs/reference/using-api/api-concepts/ https://stackoverflow.com/questions/tagged/argoproj?tab=Votes https://opendatahub.io/news/2019-04-29/project-road-map-for-2019.html the Ceph data lake https://opendatahub.io/ https://www.youtube.com/watch?v=STh3F2g2gsM https://www.youtube.com/watch?v=eCGx8Y1qcmU https://www.openstack.org/assets/presentation-media/QCT-Lightning-Talk-Building-Big-Data-Analytics-Data-Lake-with-All-flash-Ceph.pdf https://devconfus2019.sched.com/event/RFDN/ml-pipelines-with-kubeflow-argo-and-open-data-hub The Open Data Hub (ODH) is a scalable data lake platform that provides tools such as distributed Spark and Ceph data store. https://bigdata.cioreview.com/cxoinsight/data-lake-building-a-bridge-between-technology-and-business-nid-24733-cid-15.html https://www.alibabacloud.com/help/doc-detail/119725.htm https://es.slideshare.net/inovex/data-science-und-machine-learning-im-kuberneteskosystem https://www.redhat.com/en/blog/why-spark-ceph-part-1-3 https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/ https://go.qct.io/wp-content/uploads/2019/07/Real-time-Analytics-with-All-Flash-Ceph-Data-Lake-Architecture-EN_20180611.pdf Brigade https://brigade.sh/ https://cloudblogs.microsoft.com/opensource/2019/04/01/brigade-kubernetes-serverless-tutorial/ https://www.interline.io/blog/scaling-openstreetmap-data-workflows/ Azure blob storage artifact support

opinion on Argo. opinion on Airflow.

CI / CD pipelines now on digital ocean

Mistake that cost thousands (Kubernetes, GKE)

The fact that configmaps are not bound to a specific replicaset is one of Kubernetes' worst design decisions.

The shipwreck of GKE Cluster Upgrade

A Kubernetes/GKE mistake that cost me thousands of dollars

Kubernetes patterns - declarative deployments

Kubernetes from scratch to AWS with Terraform and Ansible (part 1)

uses of Ceph at CERN

k8s 1.16. hn.

"Managed Kubernetes" really runs the spectrum between "one step above just installing it yourself on a bunch of VMs" and "I spend 1% of my time managing anything below the product." Each cloud provider exists somewhere different on this spectrum, with none of them being in quite the same location, and some of them have multiple different products which exist at different points.

For example: AWS is among the most bare-bones. EKS is just a managed control plane; coming from GKE, you might click "create an cluster" then be very confused how there are no options for, say, instance size, or how many... because you have to do that all yourself. There are tools like eksctl or Rancher which can help with this, but ultimately, you're managing those instances. You're doing capacity planning (you think kube would be a great pick to integrate with spot fleets because of its ability to schedule and move workloads to a new instance when one goes down? have fun setting it up, hope you like ops work.). You're doing auto-scaling (and that ASG? its not going to know about your pod resource requests, so you either need some very smart manual coordination between the two, or you need to set up cluster-autoscaler). You're setting up cluster metrics (definitely need metrics-server. not heapster, that was last year, metrics-server is this year. but how to visualize? do i host grafana in the cluster? then i need to worry about authn. cloudwatch really isn't made for these kinds of things... maybe I'll just give datadog a few thousand bucks.) Crap, 1.16 is out already? They only support 9 months of releases with security updates?! I feel like I just upgraded my nodes! Oh well, time to lose a day replicating this update across all of my environments.

DigitalOcean is pretty similar to this (it does provision instances, but the tooling beyond that is barebones). Google Cloud/GKE is "more managed" in a few sense; the cloud dashboard provides some great management capabilities out-of-the-box, such that you may not need to reach for something like Datadog, and the autoscaler works really well without a lot of tinkering. There are still underlying instances, so you're worrying about ingress protection, OS hardening, OS upgrades, etc... but its not as bad as AWS. Not by a long shot.

Infrastructure Wars with Sheng Liang

Using a Kubernetes based Cluster for Various Services with auto HTTPS

Configuration management with Kubernetes

Write your own Kubernetes

Everything I know about Kubernetes I learned from a cluster of Raspberry Pis

The Focusing Illusion of Developer Productivity

Azure Functions Private Site Access

Java Application Optimization on Kubernetes on the Example of a Spring Boot Microservice

The Open Application Model from Alibaba's Perspective

Dhall for Kubernetes

Effective Management of APIs

Comparing Kubernetes CNI Providers: Flannel, Calico, Canal, and Weave

Understanding Modern Cloud Architecture on AWS

terraform modules

High-density Multi-tenant Bare-metal Cloud

AWS the main services

zero-downtime release

aws tagging best practices

good and bad monitoring

Managing the Risk of Cascading Failure

Using strongly-typed entity IDs to avoid primitive obsession. so answer. But how to implement this in JPA? Perhaps with a fake "composite" id class that only has one component? IdClass. How to create and handle composite primary key in JPA.

Openstreetmap

Why Use OpenStreetMap Instead of Google Maps? hn

https://operations.osmfoundation.org/policies/tiles/

https://wiki.openstreetmap.org/wiki/Slippy_Map

Slippy Map is, in general, a term referring to modern web maps which let you zoom and pan around (the map slips around when you drag the mouse).

https://www.mapbox.com/help/how-web-apps-work/

https://wiki.openstreetmap.org/wiki/Browsing https://wiki.openstreetmap.org/wiki/Main_Page https://wiki.openstreetmap.org/wiki/Tiles

square bitmap graphics displayed in a grid arrangement to show a map

https://switch2osm.org/ https://blog.openstreetmap.org/2018/06/20/switch2osm/

the switch2osm website, with up to date information on running your own OSM based services

Apart from very limited testing purposes, you should not use the tiles supplied by OpenStreetMap.org itself. OpenStreetMap is a volunteer-run non-profit body and cannot supply tiles for large-scale commercial use. Rather, you should use a third party provider that makes tiles from OSM data, or generate your own.

Serving your own maps is a fairly intensive task. Depending on the size of the area you’re interested in serving and the traffic you expect the system requirements will vary. In general, requirements will range from 10-20GB of storage, 4GB of memory, and a modern dual-core processor for a city-sized region to 300GB+ of fast storage, 24GB of memory, and a quad-core processor for the entire planet.

We would recommend that you begin with extracts of OpenStreetMap data – for example, a city, county or small country – rather than spending a week importing the whole world (planet.osm) and then having to restart because of a configuration mistake!

Geofabrik

Mapnik

Meteogalicia

RSS, GeoRSS, Podcast and JSON

cargo.txt

Save RSS and Atom!

Solving JVM Performance Problems with Profilers: Wallclock vs CPU Time Edition

Modern SAT solvers: fast, neat and underused (part 1 of N) HN

The Ultimate JSON Library: JSON.simple vs GSON vs Jackson vs JSONP

The Usefulness of Abstracting Over Time

How to Improve the Performance of a Java Application

Host your blog on DigitalOcean with Docker, Nginx and Let’s Encrypt. hn.

automated formatting

gitops.

Kubernetes: The Surprisingly Affordable Platform for Personal Projects. lobsters.

SonarJS: Detect runtime exceptions in JavaScript

Jackson JSON Views.

Subtyping vs. Parametrization for a Complex Domain.

Introduction to Linux interfaces for virtual networking

Command-line links https://lobste.rs/s/5azafh/how_i_m_still_not_using_guis_2019_guide https://lucasfcosta.com/2019/02/10/terminal-guide-2019.html http://ballingt.com/rich-terminal-applications-2/ Richer command line interfaces https://lobste.rs/s/hqui1o/richer_command_line_interfaces https://lobste.rs/s/ycvcsw/hard_part_becoming_command_line_wizard https://www.johndcook.com/blog/2019/02/18/command-line-wizard/

kubernetes cluster networking

SAT / SMT by example

Kubernetes Borg/Omega history topic

3: Annotations. Borg's Job type had a single notes field. Like the DNS TXT record, that proved insufficient. For example, layers of client libraries and tools wanted to attach additional information.

Don’t read your data from a straw

podman instead of docker

Expressing Business Flows using an F# DSL

AWS costs

Gallery of Processor Cache Effects

F# 4.6

modern SAT solvers are underused

implementing a draft mode for posts

Tracking the weather with Python and Prometheus

s3 & cloudflare

aws costs

kube complexity

Monitoring and Observability with USE and RED.

Introducing Traffic Director: Google's Service Mesh Control Plane.

PCI compliance faq. PCI. A guide to PCI compliance. pci requirements. hn. another story.

How I draw figures for my mathematical lecture notes using Inkscape

robust user interfaces with state machines

Streaming Java CompletableFutures in Completion Order.

On lists, cache, algorithms, and microarchitecture

I/O Is Faster Than CPU – Let’s Partition Resources and Eliminate OS Abstractions hn

Data Structures for Range Minimum Queries in Multidimensional Arrays. Segment Tree | Set 2 (Range Minimum Query). Assignment 1: Range Minimum Queries. Multidimensional segment trees can do range queries and updates in logarithmic time. A Simple Linear-Space Data Structure for Constant-Time Range Minimum Query?. CP-algorithms - sparse table. segment trees.

Automated Refactoring of a U.S. Department of Defense Mainframe to AWS

SRE stuff. more. more.

hoverfly tutorial

Kotlin coroutines android

Self-Hosting Your Own Cloud – Part 2: SMB File Server with Automated Backups using Rsync/Rclone

kythe and semantic.

The configuration complexity clock. dhall. jsonnet. json not a good conf language. just use a programming language. safety guarantees. At what point does a config file become a programming language?

naming convention

Effective problem solving using SAT solvers

Once the problems have been encoded into Boolean logic, solutions can be found (or shown to not exist) automatically, without the need to implement any search algorithm.

The Trouble with Memory

Computer Architecture – ETH Zürich – Fall 2019

debugging stories. awesome LD_PRELOAD.

Hedgehog for state machine testing

The builder pattern https://blog.ploeh.dk/2017/08/21/generalised-test-data-builder/ Test Data Builders in C# https://blog.ploeh.dk/2017/08/15/test-data-builders-in-c/ https://blog.ploeh.dk/2020/02/10/builder-isomorphisms/

Roadmap to becoming a React developer in 2018 hn

Using Redux actions

Make your PWA work offline I

How to visually design state in JavaScript

Redux or ES6?

Offline-capable desktop applications with Angular and Electron (Offlinefähige Desktopanwendungen mit Angular und Electron)

Redux vs. The React Context API

The Terrible Performance Cost of CORS Request on the Single-Page Application. hn.

removing jquery from GitHub. blog.

Creating a Drag-and-Drop File Uploader with React & TypeScript

reactive timed popup

Introducing Hooks. HN. tweet.

React Today and Tomorrow and 90% Cleaner React.

The React hooks proposal shows how to express several existing concepts that are currently bulky (React component declaration, state, context, lifecycle) using only functions
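
A minimal sketch of that idea, assuming the hooks API as eventually released; the `Counter` component and its behaviour are made up for illustration:

```typescript
// State and a lifecycle-style side effect expressed with plain functions only,
// no class declaration. Written with React.createElement so no JSX is needed.
import React, { useEffect, useState } from "react";

export function Counter() {
  const [count, setCount] = useState(0);        // component state via a hook
  useEffect(() => {                             // runs after render, lifecycle-style
    document.title = `Clicked ${count} times`;
  }, [count]);
  return React.createElement(
    "button",
    { onClick: () => setCount((c) => c + 1) },
    `Clicked ${count} times`
  );
}
```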

React Component Patterns by Michael Chan

Vue 3.0. Contains info about using JSX without webpack. gist.

if you are referring to React when mentioning webpack / transpilation, I'd like to mention that you don't need them for React either. It can be reduced to a simple script tag as well.

Could you paste the simple script tag that makes React work without transpilation? I always thought you needed Webpack (or an equivalent) to transpile JSX into a vanilla js function the browser can interpret.

No, I meant simply adding a tag and using react directly, in any context. That's how it was originally designed, btw. It wasn't meant only for SPAs when it was conceived, but as an addon to existing websites. The other comment gives a perfect example and there are a few tutorials (although I do agree not very mainstream) that teach React without JSX/Webpack/Babel/etc.

There's also a babel script that you can drop into a script tag and will transpile stuff in the browser for you if you want JSX. Not recommended for large projects in production (but then I'd be using webpack or similar for large vue projects too), but it works pretty well (and surprisingly fast) for quick experiments.
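
A minimal sketch of what the comments above describe: React used without JSX, so no transpilation of the component code is needed. With the UMD <script> builds on the page the same calls work against the React and ReactDOM globals; module imports are used here only so the snippet stands on its own, and the "root" element id and greeting are made up. This assumes React 18's createRoot API.

```typescript
import React from "react";
import { createRoot } from "react-dom/client";

function Greeting(props: { name: string }) {
  // No JSX: plain React.createElement calls the browser can run as-is.
  return React.createElement("h1", null, `Hello, ${props.name}`);
}

const container = document.getElementById("root");
if (container) {
  createRoot(container).render(React.createElement(Greeting, { name: "world" }));
}
```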

React.Component vs React.createClass.

React.createClass versus extends React.Component.

Two ways to do the same thing. Almost. React traditionally provided the React.createClass method to create component classes, and released a small syntax sugar update to allow for better use with ES6 modules by extends React.Component, which extends the Component class instead of calling createClass.

For the React changes, we now create a class called “Contacts” and extend from React.Component instead of accessing React.createClass directly, which uses less React boilerplate and more JavaScript. This is an important change to note, given the further changes this syntax swap brings.
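
A minimal sketch of the two styles side by side. Assumption: in current React the createClass factory lives in the separate create-react-class package (with its @types package for TypeScript); the "Contacts" name follows the quote above.

```typescript
import React from "react";
import createReactClass from "create-react-class"; // legacy home of createClass

// Old style: a factory call with an object spec.
const ContactsLegacy = createReactClass({
  render() {
    return React.createElement("div", null, "Contacts");
  },
});

// ES6 style: extend React.Component; less React boilerplate, more plain JavaScript.
class Contacts extends React.Component {
  render() {
    return React.createElement("div", null, "Contacts");
  }
}

export { Contacts, ContactsLegacy };
```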

ECMAScript 6 modules: the final syntax.

The default export is actually just a named export with the special name default.

In current JavaScript module systems, you have to execute the code in order to find out what the imports and exports are. That is the main reason why ECMAScript 6 breaks with those systems: by building the module system into the language, you can syntactically enforce a static module structure. Let’s first examine what that means and then what benefits it brings.
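
A tiny sketch of the first point: the default export is just a named export whose name is default, so both import forms shown in the comments bind the same function. "./math" is a hypothetical module path.

```typescript
// math.ts — one default export and one named export.
export default function add(a: number, b: number): number {
  return a + b;
}
export const ZERO = 0;

// elsewhere — these two imports are equivalent ways to bind the default export:
//   import add from "./math";
//   import { default as add } from "./math";
```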

Making Sense of React Hooks.

Why mixins are broken

If several components used this mixin to subscribe to a data source, a nice way to avoid repetition is to use a pattern called “higher-order components”. It can sound intimidating so we will take a closer look at how this pattern naturally emerges from the component model.
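
A minimal sketch of that higher-order-component pattern: a function wraps a component and wires up the subscription a mixin would otherwise have provided. `DataSource` and `withSubscription` are illustrative names, not a real API.

```typescript
import React from "react";

// An illustrative data source: read the current value and subscribe to changes.
type DataSource<T> = {
  get(): T;
  subscribe(listener: () => void): () => void; // returns an unsubscribe function
};

// The HOC: instead of mixing subscription logic into each component, wrap it.
function withSubscription<T>(
  Wrapped: React.ComponentType<{ data: T }>,
  source: DataSource<T>
) {
  return class extends React.Component<object, { data: T }> {
    state = { data: source.get() };
    private unsubscribe?: () => void;

    componentDidMount() {
      this.unsubscribe = source.subscribe(() =>
        this.setState({ data: source.get() })
      );
    }
    componentWillUnmount() {
      this.unsubscribe?.();
    }
    render() {
      return React.createElement(Wrapped, { data: this.state.data });
    }
  };
}

export { withSubscription };
export type { DataSource };
```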

Mixins Are Dead. Long Live Composition.

render props, mixins, hocs...

5 common practices that you can stop doing in React.

How Does setState Know What to Do?. HN.

Why Do React Hooks Rely on Call Order?. HN.

decluttering a React application. reddit.

what is the shadow DOM?. reddit

Understanding JavaScript Modules As A TypeScript User.

Ask HN: Go-to web stack today?

Making SetInterval Declarative with React Hooks more

Spring Framework 5 (Boot/Cloud) + React?.

new es2018 features

hooks example

scheduling in react

useReducer

useEffect

Architecting UIs for Change

TypeScript for Enterprise Developers

react unpopular opinions

Dilemmas With React Hooks - Part 2: Persistence And Memoization. hn.

web components

biggest lies about react hooks

Comparing JVM alternatives to JavaScript. hn.

binding action creators - doesn't make much sense with hooks

Deeply Understanding JavaScript Async and Await with Examples

hooks tip

React articles from Google

Typescript & React: Manipulating Prop Types

RxJS: A Better Way to Write Front-end Applications. RxJS behaviour subjects

Typescript 3.5

react from vue

pitfalls adopting react hooks

unnecessary rerenders

React testings vs. end-to-end testing

You Probably Don't Need Derived State

The modern PWA worksheet

React and PureScript

reusable componentes using React

adopting typescript at scale

improving your react with typescript ADTs

React + Redux + Typescript

smells in react apps

Fantastic Front-End Performance Tricks

pseudo-elements https://css-tricks.com/a-little-reminder-that-pseudo-elements-are-children-kinda/

A Simple, Understandable Production Ready Frontend Project Setup

Facebook GraphQL interview

State Management for React Using Context and Hooks

algebraic effects

theming with react and sass

Programming the Cloud with TypeScript

People keep asking if Hooks can replace Redux

Using React Hooks to Wrap Connectors to Live Data Sources

Loading States in React Components Using TypeScript’s Discriminated Unions

using typescript like a pro

react interview

testing react with jest and enzyme

using Typescript with React

JavaScript: The Modern Parts

The metaphysics of Javascript. Deconstructing Web frameworks for a more resilient code base

react hooks pitfalls

Using React Hooks & Context to Avoid Adding Redux Too Early

frustrations with React hooks

Thinking in React Hooks

Chart.js

aha moment with react hooks

Building a commentary sidebar in React

build your own React

react unit testing

React Table is a “headless” UI library

One of the many problems with this data fetching approach is that the cache is too local

useState with useReducer

thinking in react hooks

newbie confusion. difficult things

TS tricks for React

My browser does what?

classes vs hooks

styled components

The Many Jobs of JS Build Tools

testing react applications

advanced PWAs

CSS options poll

Replacing Redux with observables and React Hooks

Things I wish I knew about state management when I started writing React apps

SSR menagerie

How to Scale a React Component

Create dynamic reducers by passing values or functions in as an argument

Persisting React State in LocalStorage

a possible approach to leveraging remoteData in React with hooks and TypeScript

es6 import for side effects meaning

Import an entire module for side effects only, without importing anything. This runs the module's global code, but doesn't actually import any values.

ES6 Module Gotchas

If you will have side-effects, separate them and load them in a module with short syntax.
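
A minimal sketch of both notes above: the side-effect-only import uses the short form and is kept in one obvious place. "./polyfills" and "./math" are hypothetical module paths.

```typescript
// Runs the module's top-level code (e.g. installing polyfills) without binding
// anything into this scope.
import "./polyfills";

// By contrast, a normal import binds values and should be free of surprise effects.
import { ZERO } from "./math";

console.log(ZERO);
```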

ES modules: A cartoon deep-dive

The final step is filling in these boxes in memory. The JS engine does this by executing the top-level code — the code that is outside of functions.

Besides just filling in these boxes in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

This is one reason to have the module map. The module map caches the module by canonical URL so that there is only one module record for each module. That ensures each module is only executed once. Just as with instantiation, this is done as a depth first post-order traversal.

testing React apps

useReducer > useState

typescript without typescript

manage html dom with vanilla javascript

react mental models

method references & bind. Can you bind 'this' in an arrow function?.

the magic of static workflows

The Many Jobs of JS Build Tools

source maps from top to bottom

when does react re-render?

things to know about react

How We Reduced Our React App’s Load Time by 60%

React-query and swr

  • Should I cache data on the client for a certain period?
  • Should I load fresh data when the tab is refocused, or the network reconnects?
  • Should I retry failed HTTP calls?
  • Should I return cached data, then fetch fresh data behind the scenes?
  • Should I handle server cache separately from app state?
  • Should I avoid refetching recently fetched data?
  • Should I prefetch data the user is likely to want?
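
Libraries like react-query answer most of these questions declaratively per query. A minimal sketch, assuming React Query v4's object-style API; `fetchTodos` and the `/api/todos` endpoint are made up:

```typescript
import { useQuery } from "@tanstack/react-query";

type Todo = { id: number; title: string };

async function fetchTodos(): Promise<Todo[]> {
  const res = await fetch("/api/todos");
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Server-cache concerns live here, separate from app state.
export function useTodos() {
  return useQuery({
    queryKey: ["todos"],
    queryFn: fetchTodos,
    staleTime: 60_000,           // treat data as fresh for a minute (skip refetching)
    refetchOnWindowFocus: true,  // refresh when the tab is refocused
    refetchOnReconnect: true,    // ...and when the network reconnects
    retry: 3,                    // retry failed HTTP calls
  });
}
```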

Common mistakes writing React components with hooks .

A React implementation of Spectrum, Adobe’s design system

render props are not dead. Exploring Render Props Vs. React Hooks In 2020. Using custom hooks in place of "render props"

modern forms in react

cancelling promises with hooks https://reactjs.org/docs/hooks-state.html https://juliangaramendy.dev/use-promise-subscription/ https://medium.com/@rajeshnaroth/writing-a-react-hook-to-cancel-promises-when-a-component-unmounts-526efabf251f https://dev.to/rodw1995/cancel-your-promises-when-a-component-unmounts-gkl https://www.reddit.com/r/reactjs/comments/blhj2b/how_do_i_cancelignore_previously_running_promises/ https://itnext.io/introduction-to-abortable-async-functions-for-react-with-hooks-768bc72c0a2b https://codesandbox.io/s/useeffect-react-hooks-cancel-promise-h6dcw Lemoncode/react-hooks-by-example#4 https://stackoverflow.com/questions/49906437/how-to-cancel-a-fetch-on-componentwillunmount https://github.com/microsoft/PowerBI-JavaScript/wiki/Bootstrap-For-Better-Performance https://github.com/microsoft/PowerBI-client-react#powerbi-client-react

https://react-query.tanstack.com/

Shamir's secret sharing hn

Commitment scheme

Coin flipping by telephone a protocol for solving impossible problems slides

Zero-knowledge proof software tweet

Create a VPN-Secured VPC With Packer and Terraform.

How to run database integration tests 20 times faster.

In-memory databases such as H2, HSQLDB, and Derby are great to speed up integration tests. Although most database queries can be run against these in-memory databases, many enterprise systems make use of complex native queries which can only be tested against an actual production-like relational database.

Stuff about testing persistence repositories

https://softwareengineering.stackexchange.com/questions/301479/are-database-integration-tests-bad https://softwareengineering.stackexchange.com/questions/185326/why-do-i-need-unit-tests-for-testing-repository-methods https://softwareengineering.stackexchange.com/questions/348661/should-i-use-a-layer-between-service-and-repository-for-a-clean-architecture-s https://softwareengineering.stackexchange.com/questions/111193/hooking-up-a-business-layer-and-repository-using-unit-of-work-pattern?rq=1 https://softwareengineering.stackexchange.com/questions/294561/repository-pattern-with-service-layer-too-much-separation?rq=1 https://softwareengineering.stackexchange.com/questions/282033/how-do-you-scale-your-integration-testing/283067#283067 https://softwareengineering.stackexchange.com/a/283067/76774 fake repositories. https://www.baeldung.com/spring-boot-testing https://grokonez.com/testing/datajpatest-with-spring-boot

By default, @DataJpaTest will configure an in-memory embedded database, scan for @Entity classes and configure Spring Data JPA repositories. It is also transactional and rollback at the end of each test.

https://blog.philipphauer.de/dont-use-in-memory-databases-tests-h2/ https://www.baeldung.com/spring-testing-separate-data-source https://www.baeldung.com/spring-jpa-test-in-memory-database https://medium.com/@joeclever/integration-testing-multiple-datasources-in-spring-boot-and-spring-data-with-spock-f88e1428ce9f https://medium.com/@harittweets/how-to-connect-to-h2-database-during-development-testing-using-spring-boot-44bbb287570 https://vladmihalcea.com/how-to-run-database-integration-tests-20-times-faster/ https://memorynotfound.com/unit-test-jpa-junit-in-memory-h2-database/ https://stackoverflow.com/questions/42943447/spring-boot-integration-test-with-h2-inmemory-database

In-memory database tests with Querydsl

Fortunately the EntityQueries abstraction is very easy to implement using POJO in-memory collections.

Test Doubles — Fakes, Mocks and Stubs

Fakes are objects that have working implementations, but not the same as the production one. Usually they take some shortcut and have a simplified version of the production code.

An example of this shortcut, can be an in-memory implementation of Data Access Object or Repository. This fake implementation will not engage database, but will use a simple collection to store data. This allows us to do integration test of services without starting up a database and performing time consuming requests.
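
A minimal sketch of such a fake: a working but simplified repository backed by a Map instead of a database. `User`, `UserRepository` and `InMemoryUserRepository` are illustrative names.

```typescript
interface User {
  id: string;
  name: string;
}

interface UserRepository {
  save(user: User): Promise<void>;
  findById(id: string): Promise<User | undefined>;
}

// The fake: a real, working implementation that takes the in-memory shortcut,
// so services can be tested without starting a database.
class InMemoryUserRepository implements UserRepository {
  private readonly byId = new Map<string, User>();

  async save(user: User): Promise<void> {
    this.byId.set(user.id, { ...user });
  }
  async findById(id: string): Promise<User | undefined> {
    return this.byId.get(id);
  }
}

export { InMemoryUserRepository };
export type { User, UserRepository };
```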

Testing Real Repositories. When unit testing, do you have to use a database to test CRUD operations?. Data Access Component Testing Redux.

The most important thing to keep in mind is to avoid the temptation to create a General Fixture with some 'representative data' and attempt to reuse this across all tests. Instead, you should fill in data as part of each test and clean it up after.

https://news.ycombinator.com/item?id=18740246 Turning GraphQL diagrams to mock back end

farewell to fsync. lobsters.

verified fakes

From interaction-based to state-based testing. mocks for commands, stubs for queries. State vs Interaction Based Testing. An example of interaction-based testing in C#.

property-based testing. Building on developers' intuitions to create effective property-based tests.

don't write tests. Find the best properties for Property Based Testing. Introduction to Property Based Testing. Building on developers' intuitions

testing in production. Testing in Production, the safe way.

JUnit 5: The Next Step in Automated Testing.

about consumer-driven contracts

Nicolas Frankel on application security, integration testing, Kotlin and more

Hypothesis for web developers. tweet.

80% CODE COVERAGE IS NOT ENOUGH.

Integrated versus Manual Shrinking

[State machine testing - LambdaJam 2018](https://twitter.com/Jose_A_Alonso/status/1129325840662224902).

Testing Java Microservices: From Development to Production

nines

cloud native observability

test data generation with faker

docker for integration tests

most unit testing is a waste

storybook and react-testing

How to Specify it! A Guide to Writing Properties of Pure Functions

This is the most obvious approach to writing properties—to replicate the implementation in the test code—and it is deeply unsatisfying.

Real World Scenario Testing using Azure DevOps and automated UI tests

test telemetry. event log

state machine testing with Hedgehog. scala

Hedgehog vs Quickcheck

GOTO 2019 • Millisecond Full Stack Acceptance Tests • Aslak Hellesøy

runtime monitoring

Better Integration Tests for Performance Monitoring

Thoughts on efficient enterprise testing (1/6). 2.

tests that touch files

Testing in Production: the hard parts

testing sql

We have a ton of unit tests covering expected API/parser/renderer input/output pairs, but most functionality is covered in ~1000 integration tests running ~20k SQL queries against each supported RDBMS, mostly in-memory or Docker, sometimes VMWare run database instances.

Automation testing is not working disagree

Testing Microservices, the sane way

Conventional wisdom says you need a comprehensive set of regression tests to go green before you release code

Minimizing real-time prediction serving latency in machine learning https://cloud.google.com/solutions/machine-learning/minimizing-predictive-serving-latency-in-machine-learning

You have too many entities (high cardinality), which makes it challenging to precompute predictions in a limited amount of time. An example is forecasting daily sales by item when you have hundreds of thousands or millions of items. In that case, you can use a hybrid approach, where you precompute predictions for the top N entities, such as for the most active customers or the most viewed products. You can then use the model directly for online prediction for the rest of the long-tail entities.

https://liqixu.github.io/papers/needletail-hilda.pdf Optimally Leveraging Density and Locality for Exploratory Browsing and Sampling

We would need B+Trees on every single attribute or combination of attributes.

druid

At the time, we were handling approximately 100 millions events per day, and some of our reports were taking 30 seconds to generate. We currently handle billions of events per day, and the reporting takes less than 1 second most of the time.

https://hevodata.com/blog/druid-vs-redshift-data-warehouse/ https://towardsdatascience.com/introduction-to-druid-4bf285b92b5a

dynamo db

With NoSQL, it is best practice to precalculate aggregates values out of band, and store them back into the table as a single item for quick retrieval.

There are many data enrichment use cases that would fit this model.

Why you should use a relational database instead of NoSQL for your IoT application

Do we need pre-computed aggregates?

When querying time series data, resolution refers to the number of data points for a given time range. The highest resolution would provide every available data point for a time range. So if I want a query to use the highest resolution and if there are 100 data points, then the query result should include every one of those 100 points.

As the number of data points increases, providing results at higher resolutions becomes less effective. For instance, increasing the resolution to the point where a graph in the UI includes 1 million points is probably no more effective than if the graph included only 10,000 or even 1,000 data points. The higher resolution could degrade user experience as rendering time increases. Latency on server response time is also likely to increase.

Pre-computed aggregation is the process of continually downsampling a time series and storing the lower resolution data for future analysis or processing. Pre-computed aggregates are often combined with data expiration/retention policies to address the aforementioned storage problem. Higher resolution data is stored for shorter periods of time than lower resolution data. Pre-computed aggregation can also alleviate the CPU utilization and latency problems. Instead of downsampling 1 million data points, we can query the pre-computed aggregated data points and perform downsampling on 10,000 data points.
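
A minimal sketch of that downsampling step: bucket raw points into fixed windows and keep one aggregated value per window, which can then be stored as the pre-computed, lower-resolution series. The names and the choice of an average aggregate are illustrative.

```typescript
type Sample = { t: number; v: number }; // t = timestamp in milliseconds

// Reduce a high-resolution series to one averaged point per window.
function downsampleAvg(samples: Sample[], windowMs: number): Sample[] {
  const buckets = new Map<number, { sum: number; n: number }>();
  for (const { t, v } of samples) {
    const key = Math.floor(t / windowMs) * windowMs;  // start of the window
    const b = buckets.get(key) ?? { sum: 0, n: 0 };
    b.sum += v;
    b.n += 1;
    buckets.set(key, b);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([t, { sum, n }]) => ({ t, v: sum / n }));
}

export { downsampleAvg };
```

Querying a long range then means reading the stored low-resolution points rather than re-aggregating millions of raw samples on every request.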

time series databases to watch

I have used TimescaleDB for several purposes. As it is built on top of Postgres, all the existing tools, libraries and processes work out of the box. This is a huge advantage if you are operating Postgres anyway: your existing backup tools will work, as does your user management.

We ingest a lot of time series (IoT) data and use Postgres for other data, so Timescale works quite well for us. One thing Timescale treats as a second-class citizen, though, is updates to existing data points, which are 1-2 orders of magnitude slower than inserts. Granted, this is also the case for all other TSDB solutions out there, which are for obvious reasons optimized for inserts and for reads aggregated along the time dimension. Still, it would be amazing if you could add to the already existing differentiation by allowing fast updates for cases like ours, where we don't store events relating to a singular point in time but rather time-spans, so new incoming data points might be "merged" into existing time-spans.

Narrow-table Model

In this model, each metric/tag-set combination is considered an individual "time series" containing a sequence of time/value pairs.

Using our example above, this approach would result in 9 different "time series", each of which is defined by a unique set of tags.

The number of such time series scales with the cross-product of the cardinality of each tag, i.e., (# names) × (# device ids) × (# location ids) × (device types). Some time-series databases struggle as cardinality increases, ultimately limiting the number of device types and devices you can store in a single database.

TimescaleDB supports narrow models and does not suffer from the same cardinality limitations as other time-series databases do. A narrow model makes sense if you collect each metric independently. It allows you to add new metrics as you go by adding a new tag without requiring a formal schema change.

TimescaleDB easily supports wide-table models. Queries across multiple metrics are easier in this model, since they do not require JOINs. Also, ingest is faster since only one timestamp is written for multiple metrics.

Of course, this is not a new format: it's what one would commonly find within a relational database.
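
A small sketch of the two row shapes described above, written as TypeScript types purely for illustration (the metric and tag names are made up):

```typescript
// Narrow model: one row per metric / tag-set / timestamp, a single value per row.
type NarrowRow = {
  time: Date;
  metric: string;                 // e.g. "temperature"
  tags: Record<string, string>;   // e.g. { deviceId: "d1", location: "berlin" }
  value: number;
};

// Wide model: one row per timestamp / tag-set, one column per metric
// (what a conventional relational table usually looks like).
type WideRow = {
  time: Date;
  deviceId: string;
  location: string;
  temperature: number;
  humidity: number;
  cpuLoad: number;
};

export type { NarrowRow, WideRow };
```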

Relational Database schema design for metric storage good SO answer

Schema Design for Time Series Data

For time series, you should generally use tall and narrow tables. This is for two reasons: Storing one event per row makes it easier to run queries against your data.

Timeseries: How long can the elephant remember?

Wide narrow data

BigQuery Best Practices For High Performance ETL

Continuous Queries in InfluxDB – Part I. more. Under the hood with Continuous Queries – Part II. downsampling and retention. Resolution 1: Downsample to get your database in shape.

Queries returning aggregate, summary, and computed data are frequently used in application development. For example, if you’re building an online dashboard application to report metrics, you probably need to show summary data. These summary queries are generally expensive to compute since they have to process large amounts of data, and running them over and over again just wouldn’t scale. Now, if you could pre-compute and store the aggregates query results so that they are ready when you need them, it would significantly speed up summary queries in your dashboard application, without overloading your database. Enter InfluxDB’s continuous queries feature!

Series cardinality is the number of unique database, measurement, and tag set combinations in an InfluxDB instance. We talk about it quite a bit because extremely high series cardinality can kill your InfluxDB process.

Using Data Transformations for Low-latency Time Series Analysis

While a row can have an arbitrary number of fields, we encourage users to define their table schema as narrow tables, such as the OpenTSDB [8] table format: {metric, tags, time, value}.

Is the EAV model still a decent way to store misc model data?

JSON columns. The performance issues were fixed, so there's no reason for EAV anymore.

Re-architecting Slack’s Workspace Preferences: How to Move to an EAV Model to Support Scalability

Failed Solution II: Pre-compute the World in NoSQL

In short, we took all of our data and pre-computed aggregates for every combination of dimensions. At query time we need only locate the specific pre-computed aggregate and return it: an O(1) key-value lookup. This made things fast and worked wonderfully when we had a six-dimension beta data set. But when we added five more dimensions – giving us 11 dimensions total – the time to pre-compute all aggregates became unmanageably large (such that we never waited the more than 24 hours required to see it finish).

So we decided to limit the depth that we aggregated to. By only pre-computing aggregates of five dimensions or less, we were able to limit some of the exponential expansion of the data. The data became manageable again, meaning it only took about 4 hours on 15 machines to compute the expansion of a 500k beta rows into the full multi-billion entry output data set.

What distinguishes the time series workload?

With time series databases, it’s common to keep high precision data around for a short period of time. This data is aggregated and downsampled into longer term trend data. This means that for every data point that goes into the database, it will have to be deleted after its period of time is up. This kind of data lifecycle management is difficult for application developers to implement on top of regular databases. They must devise schemes for cheaply evicting large sets of data and constantly summarizing that data at scale. With a Time Series Database, this functionality is provided out of the box.

How To Resample and Interpolate Your Time Series Data With Python. more

Downsampling: Where you decrease the frequency of the samples, such as from days to months.

Downsampling reduces the number of samples in the data. During this reduction, we are able to apply aggregations over data points.

on time series

Very often TS data is used to generate charts. This is an artifact of the human brain being spectacularly good at interpreting a visual representation of a relationship between streams of numbers while nearly incapable of making sense of data in tabular form. When plotting, no matter how much data is being examined, the end result is limited to however many pixels are available on the display. Even plotting aside, most any use of time series data is in an aggregated form.

Implementing Multidimensional Data Warehouses into NoSQL

mondrian aggregate tables

Time Series Aggregate Store. Read Time Series Data for Multiple Property Types of a Thing Applying M4 Algorithm

The Time Series Aggregate Store stores the pre-calculated aggregates for the time series data of Things

With this service, you can read the time series data for multiple property types of the specified thing by applying the M4 algorithm.

M4: A Visualization-Oriented Time Series Data Aggregation

M4 Aggregation. M4 is a composite value-preserving aggregation (see Section 4.1) that groups a time series relation into w equidistant time spans, such that each group exactly corresponds to a pixel column in the visualization. For each group, M4 then computes the aggregates min(v), max(v), min(t), and max(t) – hence the name M4
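
A rough pandas sketch of that grouping step, not the paper's reference implementation: assuming a DataFrame with a numeric time column t and a value column v, each of the w pixel-wide spans keeps the rows holding min(t), max(t), min(v) and max(v).

```python
# Keep, per pixel-column group, the rows carrying the four M4 aggregates.
import pandas as pd

def m4(df: pd.DataFrame, t0: float, t1: float, w: int) -> pd.DataFrame:
    groups = ((df["t"] - t0) * w / (t1 - t0)).astype(int).clip(0, w - 1)
    keep = set()
    for _, g in df.groupby(groups):
        keep.update([g["t"].idxmin(), g["t"].idxmax(),
                     g["v"].idxmin(), g["v"].idxmax()])
    return df.loc[sorted(keep)]
```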

A Review of Aggregation Algorithms for the Internet of Things. Data aggregation mechanisms in the Internet of things. Efficiently Validating Aggregated IoT Data Integrity. Comparison of Data Aggregation Techniques in Internet of Things (IoT)

FAQs for Big Data & Analytics on Timeseries within SAP IoT Application Enablement (Leonardo Foundation)

Storing time-series data, relational or non?. Is there a powerful database system for time series data? [closed]. storing massive ordered time series data in bigtable derivatives. Database and large Timeseries - Downsampling - OpenTSDB InfluxDB Google DataFlow

typical evaluations are hard to formulate in SQL and slow in execution. E.g. find the maximum value with timestamp per 15 minutes for all measurements during the last month.
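
The 15-minute "maximum value with its timestamp" query above, sketched with pandas instead of SQL; the DatetimeIndex and the 'value' column name are assumptions for illustration.

```python
# For each 15-minute bucket, return the row holding the maximum value.
import pandas as pd

def max_with_timestamp(df: pd.DataFrame) -> pd.DataFrame:
    """df has a DatetimeIndex and a 'value' column."""
    idx_of_max = (df["value"]
                  .resample("15min")
                  .agg(lambda s: s.idxmax() if len(s) else pd.NaT))
    return df.loc[idx_of_max.dropna()]
```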

The need arises because we can query a vast amount of data, for instance a year; if the DB downsamples at query time rather than reading pre-computed results, it may take a very long time.

As well, downsampling needs to be "updated" whenever "delayed" data points are added.

For time series databases, period aggregation (aka downsampling, averaging, summarization, etc.) is one of the standard use cases. They are all pretty good at it, at least the basics - avg, min, max, percentiles, first/last, etc. opentsdb for instance reads raw data and returns aggregates and then queues these aggregates for re-use. This is how it worked last time I checked.

Time-series data: Why (and how) to use a relational database instead of NoSQL

time aggregation - DB2

Time aggregation is the aggregation of all data points for a single resource over a specified period (the granularity). Data aggregations in Resource Time Series reports are of the time aggregation type.

The result of the aggregation is one data point that reflects a statistical view of the collected and aggregated data points. For example, average, minimum, maximum, sum, or count. Typically, multiple aggregated data points are presented in a report for a given reporting period.

Benchmarking Time Series Databases with IoTDB-Benchmark for IoT Scenarios

What time series database can support high cardinality?

Procella: unifying serving and analytical data at YouTube

"For real-time tables, the user can also specify how to age-out, down-sample or compact the data"

Influxdb use case

Datadog

Why You Should NOT be Using an RDBMS for Time-Stamped Data

Redis and Grafana for real-time analytics

GOTO 2019 • Temporal Modelling • Mathias Verraes

high cardinality data and stuff

Influxdb on Kubernetes - should I be doing this?

Apache Druid vs. Time-Series Databases

from batch to streaming to both

How to Get Started Using CrateDB and Grafana to Visualize Time-Series Data

things we learned about sums

MetricsDB: TimeSeries Database for storing metrics at Twitter

clickhouse

ClickHouse: New Open Source Columnar Database

clickhouse

interview about clickhouse

Raspberry Pi IoT: Sensors, InfluxDB, MQTT, and Grafana

How Netflix uses Druid

Handling Real-Time Updates in ClickHouse

Mutable data is generally unwelcome in OLAP databases.

Under the pressure of GDPR requirements, the ClickHouse team delivered UPDATEs and DELETEs in 2018.

ClickHouse as an alternative to Elasticsearch for log storage and analysis

grafana datasources

grafana datasources

Timescaledb

Zabbix, Time Series Data and TimescaleDB. hn.

By clever sharding, you can work around the performance issues somewhat but it'll never be as efficient as an OLAP column store like ClickHouse or MemSQL:

Timestamps and metric values compress very nicely using delta-of-delta encoding.

Compression dramatically improves scan performance.

Aligning data by columns means much faster aggregation. A typical time series query does min/max/avg aggregations by timestamp. You can load data straight from disk into memory, use SSE/AVX instructions and only the small subset of data you aggregate on will have to be read from disk.
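
A toy illustration in Python of the delta-of-delta point above, not any particular database's encoder: near-regular timestamps collapse into long runs of zeros, which is what makes the subsequent compression so effective.

```python
# Delta-of-delta encoding of timestamps (Gorilla-style TSDB compression idea).
def delta_of_delta(timestamps):
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0], deltas[0] if deltas else 0] + \
           [b - a for a, b in zip(deltas, deltas[1:])]

print(delta_of_delta([1000, 1010, 1020, 1030, 1041]))
# [1000, 10, 0, 0, 1]  -> mostly zeros for near-regular sampling
```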

PG Partition Manager

timescale multi cloud

Building a distributed time-series database on PostgreSQL

TimescaleDB adds native compression for any PostgreSQL type

Reading this post is frustrating. What they are describing is where column store databases were 20 years ago. Perhaps at some point the folks at TimescaleDB will read Daniel Abadi’s 2008 paper, which describes the key elements of how all modern column stores work: http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

The key takeaway is that columnar compression only accounts for a small minority of the speed up that you get for scan-oriented workloads; the real big win comes when you implement a block-oriented query processor and pipelined execution. Of course you can’t do this by building inside the Postgres codebase, which is why every good column store is built more or less from scratch.

Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

Multi-node TimescaleDB is now free

ListenBrainz moves to TimescaleDB

TimescaleDB vs. Amazon Timestream

graphite

whisper

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.

opentsdb

opentsdb downsampling. Rollup And Pre-Aggregates. aggregators. Understanding Metrics and Time Series. Rollup And Pre-Aggregates IMPORTANT.

Downsampling (or in signal processing, decimation) is the process of reducing the sampling rate, or resolution, of data. For example, let's say a temperature sensor is sending data to an OpenTSDB system every second. If a user queries for data over an hour time span, they would receive 3,600 data points, something that could be graphed fairly easily. However now if the user asks for a full week of data they'll receive 604,800 data points and suddenly the graph may become pretty messy. Using a downsampler, multiple data points within a time range for a single time series are aggregated together with a mathematical function into a single value at an aligned timestamp. This way we can reduce the number of values from say, 604,800 to 168.

When storing rollups, it's best to avoid functions such as average, median or deviation. When performing further downsampling or grouping aggregations, such values become meaningless. Instead it's much better to always store the sum and count from which, at least, the average can be computed at query time. For more information, see the section below.
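
A tiny Python example of why that advice holds; the bucket values are made up. Averaging pre-computed averages weights every bucket equally, regardless of how many raw points it contained, while sum and count let the correct average be recovered at query time.

```python
buckets = [
    {"sum": 10.0, "count": 1},     # one sample, value 10
    {"sum": 300.0, "count": 100},  # a hundred samples around 3
]

avg_of_avgs = sum(b["sum"] / b["count"] for b in buckets) / len(buckets)
true_avg = sum(b["sum"] for b in buckets) / sum(b["count"] for b in buckets)

print(avg_of_avgs)  # 6.5   -- wrong
print(true_avg)     # ~3.07 -- correct, because sum and count were kept
```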

OpenTSDB was designed to efficiently combine multiple, distinct time series during query execution. The reason for this is that when users are looking at their data, most often they start at a high level asking questions like "what is my total throughput by data center?" or "what is the current power consumption by region?". After looking at these high level values, one or more may stick out so users drill-down into more granular data sets like "what is the throughput by host in my LAX data center?". We want to make it easy to answer those high level questions but still allow for drilling down for greater detail.

But how do you merge multiple individual time series into a single series of data? Aggregation functions provide the means of mathematically merging the different time series into one. Filters are used to group results by tags and aggregations are then applied to each group. Aggregations are similar to SQL's GROUP BY clause where the user selects a pre-defined aggregation function to merge multiple records into a single result. However in TSDs, a set of records is aggregated per timestamp and group.

This document focuses on how aggregators are used in a group by context, i.e. when merging multiple time series into one. Additionally, aggregators can be used to downsample time series (i.e. return a lower resolution set of results). For more information, see Downsampling.

While rollups help with wide time span queries, you can still run into query performance issues with small ranges if the metric has high cardinality (i.e. the unique number of time series for the given metric). In the example above, we have 4 web servers. But let's say that we have 10,000 servers. Fetching the sum or average of interface traffic may be fairly slow. If users are often fetching the group by (or some think of it as the spatial aggregate) of large sets like this then it makes sense to store the aggregate and query that instead, fetching much less data.

Notice that these time series have dropped the tags for host and interface. That's because, during aggregation, multiple, different values of the host and interface have been wrapped up into this new series so it no longer makes sense to have them as tags. Also note that we injected the new _aggregate tag in the stored data. Queries can now access this data by specifying an _aggregate value.

While pre-aggregates certainly help with high-cardinality metrics, users may still want to ask for wide time spans but run into slow queries. Thankfully you can roll up a pre-aggregate in the same way as raw data. Just generate the pre-aggregate, then roll it up using the information above.

One method that is commonly used by other time series databases is to read the data out of the database after some delay, calculate the pre-aggs and rollups, then write them. This is the easiest way of solving the problem and works well at small scales. However, there are still a number of issues.

How to Handle the Influx of Data

timescale cloud

bitemporal data

https://martinfowler.com/bliki/DataLake.html

It is important that all data put in the lake should have a clear provenance in place and time. Every data item should have a clear trace to what system it came from and when the data was produced. The data lake thus contains a historical record. This might come from feeding Domain Events into the lake, a natural fit with Event Sourced systems. But it could also come from systems doing a regular dump of current state into the lake - an approach that's valuable when the source system doesn't have any temporal capabilities but you want a temporal analysis of its data. A consequence of this is that data put into the lake is immutable, an observation once stated cannot be removed (although it may be refuted later), you should also expect ContradictoryObservations.

https://martinfowler.com/bliki/ContradictoryObservations.html https://blog.bi-geek.com/arquitectura-bi-introduccion-al-data-lake/ https://tdwi.org/articles/2017/12/04/arch-all-data-time-and-the-data-lake.aspx

Business state data tagged with business and DBMS start and end timestamps is called bitemporal data. This structure is one of the most useful ways that relational database designers record and manage time-related data. Several databases, including IBM DB2 and Teradata, have included internal support for bitemporal data since early this decade.

Bitemporality is at the heart of data warehouse consistency and enables operational systems to manage the creation of state data from time series. As relational databases are the basis of both operational systems and data warehouses, extensive design and development effort has been expended over the decades to handle time properly in this environment.

“The lake's one-dimensional time series approach can give rise to significant implementation challenges in more complex data warehouse use cases.” https://tdwi.org/articles/2017/12/04/arch-all-data-time-and-the-data-lake.aspx

https://www.elsevier.com/books/bitemporal-data/johnston/978-0-12-408067-6 https://www.dataversity.net/bitemporal-data-modeling-learn-history/ https://martinfowler.com/eaaDev/timeNarrative.html https://en.wikipedia.org/wiki/Temporal_database

Bi-Temporal

A bi-temporal database has two axes of time.

valid time, and transaction time (or decision time).

https://www.marklogic.com/blog/bitemporal/ https://www.sciencedirect.com/topics/computer-science/bitemporal-data https://www.sciencedirect.com/science/article/pii/B9780123750419000029 https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0058476.html

A bitemporal table is a table that combines the historical tracking of a system-period temporal table with the time-specific data storage capabilities of an application-period temporal table. Use bitemporal tables to keep user-based period information as well as system-based historical information.

Bitemporal tables behave as a combination of system-period temporal tables and application-period temporal tables. All the restrictions that apply to system-period temporal tables and application temporal tables also apply to bitemporal tables.
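
A minimal, illustrative sketch of the two time axes in plain Python; the field names are invented, and real systems such as DB2 express this with period columns and SQL syntax rather than application code.

```python
# Each row carries a valid-time interval (when the fact was true in the
# business domain) and a transaction-time interval (when the database
# believed it). Intervals are half-open; date.max means "still current".
from dataclasses import dataclass
from datetime import date

@dataclass
class Row:
    customer: str
    address: str
    valid_from: date
    valid_to: date
    tx_from: date
    tx_to: date

def as_of(rows, valid_at: date, known_at: date):
    """What did we believe at `known_at` about the state at `valid_at`?"""
    return [r for r in rows
            if r.valid_from <= valid_at < r.valid_to
            and r.tx_from <= known_at < r.tx_to]
```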

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0058481.html https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/r0052344.html

http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ Let’s Stop Ascribing Meaning to Code Points.

Breaking Our Latin-1 Assumptions http://manishearth.github.io/blog/2017/01/15/breaking-our-latin-1-assumptions/

UAX #29: Unicode Text Segmentation https://unicode.org/reports/tr29/

https://github.com/unicode-rs/unicode-segmentation

Iterators which split strings on Grapheme Cluster or Word boundaries, according to the Unicode Standard Annex #29 rules.

http://site.icu-project.org/home

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

http://userguide.icu-project.org/

http://userguide.icu-project.org/boundaryanalysis

http://userguide.icu-project.org/boundaryanalysis#TOC-Character-Boundary

https://stackoverflow.com/questions/40878804/how-to-count-grapheme-clusters-or-perceived-emoji-characters-in-java

https://engineering.linecorp.com/en/blog/the-7-ways-of-counting-characters/

https://softwareengineering.stackexchange.com/questions/13207/string-class-based-on-graphemes

https://news.ycombinator.com/item?id=13832831 Emoji.length == 2

This is most definitely not a solved problem, because graphemes (visual symbols) are a poor way to deal with unicode in the real world. Pretty much all systems either deal with the length in bytes (if they're old-style C), in code units / byte pairs (if they're UTF-16 based, like windows, java and javascript), or in unicode code points (if they're UTF-8 based, like every proper system should be). Dealing with the length in visual symbols is actually pretty much impossible in practice because databases won't let you define field lengths in graphemes.

The way things compose: bytes combine into code points (unicode numbers), and code points combine into graphemes (visual symbols). In UTF-16 for legacy compatibility reasons with UCS-2, code points decompose into code units (byte pairs), and high code points, which need a lot of bits to represent their number, need two code units (4 bytes) instead of one.

Java and JavaScript are UTF-16 based, so they measure length in code units and not code points. An emoji code point can be a low or high number depending on when it was added. Low numbers can be stored in two bytes, high numbers need four bytes. So an emoji can have length 1 or 2 in UTF-16. However, when moving to the database it will typically be stored in UTF-8, and the field length will be code points, not code units. So, that emoji will have a length of 1 regardless of whether it is low or high. You don't notice this as a problem because app-level field length checks will return a bigger number than what the database perceives, so no field length limits are exceeded.

There isn't any such thing as "characters" in code. In documentation when they say "characters" usually they mean bytes, code units or code points. Almost never do they mean graphemes, which is intuitively what people think they mean. The bottom line is two-fold: (A) always understand what is meant in documentation by "length in characters", because it almost never means the intuitive thing, and (B) don't try to use graphemes as your unit of length, it won't work in practice.
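
The different "lengths" discussed above are easy to see from Python, where len() counts code points and the code-unit and byte counts fall out of encoding; grapheme counting is only hinted at here because it needs an external library.

```python
s = "e\u0301"                               # 'é' as base letter + combining accent
print(len(s))                                # 2 code points
print(len(s.encode("utf-8")))                # 3 bytes
print(len(s.encode("utf-16-le")) // 2)       # 2 UTF-16 code units

emoji = "\U0001F4A9"                         # a code point outside the BMP
print(len(emoji))                            # 1 code point
print(len(emoji.encode("utf-16-le")) // 2)   # 2 code units (a surrogate pair)
```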

https://wiki.sei.cmu.edu/confluence/display/java/STR01-J.+Do+not+assume+that+a+Java+char+fully+represents+a+Unicode+code+point

The char data type is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of Unicode code points is now U+0000 to U+10FFFF. The set of characters from U+0000 to U+FFFF is called the basic multilingual plane (BMP), and characters whose code points are greater than U+FFFF are called supplementary characters. Such characters are generally rare, but some are used, for example, as part of Chinese and Japanese personal names. To support supplementary characters without changing the char primitive data type and causing incompatibility with previous Java programs, supplementary characters are defined by a pair of Unicode code units called surrogates. According to the Java API [API 2014] class Character documentation (Unicode Character Representations):

The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).

A char value, therefore, represents BMP code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. The lower (least significant) 21 bits of int are used to represent Unicode code points, and the upper (most significant) 11 bits must be zero. Similar to UTF-8 (see STR00-J. Don't form strings containing partial characters from variable-width encodings), UTF-16 is a variable-width encoding. Because the UTF-16 representation is also used in char arrays and in the String and StringBuffer classes, care must be taken when manipulating string data in Java. In particular, do not write code that assumes that a value of the primitive type char (or a Character object) fully represents a Unicode code point. Conformance with this requirement typically requires using methods that accept a Unicode code point as an int value and avoiding methods that accept a Unicode code unit as a char value because these latter methods cannot support supplementary characters.

https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-67/icu4c/readme.html

https://unicode-org.github.io/icu-docs/#/icu4c/

https://bollu.github.io/mathemagic/declarative/index.html https://news.ycombinator.com/item?id=23231361

https://apps.timwhitlock.info/unicode/inspect?s=e%CC%81 unicode inspector

é <- this one is two codepoints

é <- this one isn’t

I thought it would be useful to share my personal list of scripts that break our Latin-1 assumptions. This is a list I mentally check against whenever I am attempting to reason about text. I check if I’m making any assumptions that break in these scripts. Most of these concepts are independent of Unicode; so any program would have to deal with this regardless of encoding.

I again recommend going through eevee’s post, since it covers many related issues. Awesome-Unicode also has a lot of random tidbits about Unicode.

https://en.wikipedia.org/wiki/Arabic_alphabet#Table_of_basic_letters

https://www.gutenberg.org/catalog/

https://en.wikipedia.org/wiki/Devanagari_(Unicode_block)

https://stackoverflow.com/questions/6805311/combining-devanagari-characters

http://unicode.org/faq/char_combmark.html

http://unicode.org/faq/

http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf

https://en.wikipedia.org/wiki/Unicode_block

  1. Grapheme clusters: how many of what end users might consider "characters". In this example, the Devanagari syllable "ni" must be composed using a base character "na" (न) followed by a combining vowel for the "i" sound ( ि), although end users see and think of the combination of the two "नि" as a single unit of text. In this sense, the example string can be thought of as containing 4 “characters” as end users see them. A default grapheme cluster is specified in UAX #29, Unicode Text Segmentation, as well as in UTS #18, Unicode Regular Expressions.

The choice of which count to use and when depends on the use of the value, as well as the tradeoffs between efficiency and comprehension. For example, Java, Windows, and ICU use UTF-16 code unit counts for low-level string operations, but also supply higher level APIs for counting bytes, characters, or denoting boundaries between grapheme clusters, when circumstances require them. An application might use these to, say, limit user input based on a number of "screen positions" using the user-perceived "character" (grapheme cluster) count. Or the application might have an internal limit based on storage allocation in a database field counted in bytes. This approach allows for efficient low-level processing, with allowance for higher-level usage. However, for a very high-level application, such as word-processing macros, grapheme clusters alone may be sufficient.
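
For counting grapheme clusters from Python, the standard library offers nothing, but the third-party `regex` module implements the \X extended-grapheme-cluster match from UAX #29; a small sketch assuming that module is installed:

```python
import regex  # pip install regex

def grapheme_count(s: str) -> int:
    """Count extended grapheme clusters (user-perceived characters)."""
    return len(regex.findall(r"\X", s))

print(grapheme_count("नि"))       # 1 perceived character, built from 2 code points
print(grapheme_count("e\u0301"))  # 1, even though len() reports 2
```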

https://emojipedia.org/

https://blog.emojipedia.org/what-the-2021-unicode-delay-means-for-emoji/

https://emojipedia.org/emoji-zwj-sequence/

https://emojipedia.org/eye-in-speech-bubble/

https://emojipedia.org/emoji-flag-sequence/

https://blog.emojipedia.org/emoji-zwj-sequences-three-letters-many-possibilities/

https://en.wikipedia.org/wiki/Unicode_block

A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Each block is generally, but not always, meant to include all the glyphs used by one or more specific languages, or in some general application area such as mathematics, surveying, decorative typesetting, social forums, etc.

Unicode blocks are identified by unique names, which use only ASCII characters and are usually descriptive of the nature of the symbols, in English; such as "Tibetan" or "Supplemental Arrows-A". (When comparing block names, one is supposed to equate uppercase with lowercase letters, and ignore any whitespace, hyphens, and underbars; so the last name is equivalent to "supplemental_arrows__a" and "SUPPLEMENTALARROWSA".)[1]

Blocks are pairwise disjoint, that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in the hexadecimal notation, the starting (smallest) point is U+xxx0 and the ending (largest) point is U+yyyF, where xxx and yyy are three or more hexadecimal digits. (These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 columns labeled with the last hexadecimal digit of the code point.[1]) The size of a block may range from the minimum of 16 to a maximum of 65,536 code points.

Every assigned code point has a glyph property called "Block", whose value is a character string naming the unique block that owns that point.[2] However, a block may also contain unassigned code points, usually reserved for future additions of characters that "logically" should belong to that block. Code points not belonging to any of the named blocks, e.g. in the unassigned planes 3–13, have the value block="No_block".[1]

https://unicode.org/emoji/charts/full-emoji-list.html unicode emojis, grapheme clusters

https://stackoverflow.com/questions/40878804/how-to-count-grapheme-clusters-or-perceived-emoji-characters-in-java

https://users.rust-lang.org/t/how-to-iterate-over-emojis-grapheme-clusters/14254

https://users.rust-lang.org/t/how-to-iterate-over-emojis-grapheme-clusters/14254/4

https://hsivonen.fi/string-length/ <- this is awesome!!!!!

https://blog.jonnew.com/posts/poo-dot-length-equals-two but, is that true?

https://stackoverflow.com/questions/54369513/how-to-count-the-correct-length-of-a-string-with-emojis-in-javascript

For example, the character encoding scheme ASCII comprises 128 code points in the range 0x00 to 0x7F, Extended ASCII comprises 256 code points in the range 0x00 to 0xFF, and Unicode comprises 1,114,112 code points in the range 0x000000 to 0x10FFFF. The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.

Why is Java’s primitive “char” designed to correspond to 1 code unit of UTF-16 instead of 1 grapheme or 1 code point? Because when Java was first designed, Unicode’s entire code point range was defined in 16 bits.

The concept of “encoding every character in 16 bits” was something that the original designers of Unicode were proud enough to include in their design principles. (Not long after Java was announced, Unicode was expanded beyond 16 bits. As of Unicode 7.0, it is defined up to U+10FFFF, or 17*65536=1,114,112 code points.) Meanwhile, MySQL's and Oracle’s “utf8” charset is closer to CESU-8 than it is to UTF-8, possibly requiring more space. When encoding in UTF-8, the charsets “AL32UTF8” (Oracle) or “utf8mb4” (MySQL) must be used. Swift, one of the most recent programming languages, is defined so that a character type is expressed as 1 grapheme.

https://en.wikipedia.org/wiki/Plane_(Unicode)

Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 308 blocks defined in Unicode 13.0 cover 26% of the possible code point space, and range in size from a minimum of 16 code points (fifteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.[4]

https://www.w3.org/International/articles/definitions-characters/

UTF-32

From Wikipedia, the free encyclopedia


UTF-32 (32-bit Unicode Transformation Format) is a fixed-length encoding used to encode Unicode code points that uses exactly 32 bits (four bytes) per code point (but a number of leading bits must be zero as there are far fewer than 2^32 Unicode code points). UTF-32 is a fixed-length encoding, in contrast to all other Unicode transformation formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numerical value.

The main advantage of UTF-32 is that the Unicode code points are directly indexed. Finding the Nth code point in a sequence of code points is a constant time operation. In contrast, a variable-length code requires sequential access to find the Nth code point in a sequence. This makes UTF-32 a simple replacement in code that uses integers that are incremented by one to examine each location in a string, as was commonly done for ASCII.

The main disadvantage of UTF-32 is that it is space-inefficient, using four bytes per code point, including 11 bits that are always zero. Characters beyond the BMP are relatively rare in most texts, and can typically be ignored for sizing estimates. This makes UTF-32 close to twice the size of UTF-16. It can be up to four times the size of UTF-8 depending on how many of the characters are in the ASCII subset.

Though a fixed number of bytes per code point seems convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16 (both of which can search backwards for the point to truncate by looking at 2–4 code units at most).

It is extremely rare that code wishes to find the Nth code point without earlier examining the code points 0 to N–1. For instance, XML parsing cannot do anything with a character without first looking at all preceding characters.[4] So an integer index that is incremented by 1 for each character can be replaced with an integer offset, measured in code units and incremented by the number of code units as each character is examined. This removes the perceived speed advantages of UTF-32.
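
A quick size comparison of the three encodings described above for a mostly-ASCII string, using Python and the BOM-less encodings so the byte counts reflect only the code units:

```python
s = "hello, naïve café"   # 17 code points, two of them outside ASCII
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(s.encode(enc)), "bytes")
# utf-8      19 bytes  (the two accented letters take 2 bytes each)
# utf-16-le  34 bytes
# utf-32-le  68 bytes  -- four bytes per code point, as described above
```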

Each hexadecimal digit represents four binary digits, also known as a nibble, which is half a byte. For example, a single byte can have values ranging from 00000000 to 11111111 in binary form, which can be conveniently represented as 00 to FF in hexadecimal.

In the Unicode standard, a plane is a continuous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes".[1] The very last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 13.0, seven of the planes have assigned code points (characters), and five are named.

The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word.[2] UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and can encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes.[3]

The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium which is written like this: U+0639. This magic number is called a code point. The U+ means “Unicode” and the numbers are hexadecimal. U+0639 is the Arabic letter Ain. The English letter A would be U+0041. You can find them all using the charmap utility on Windows 2000/XP or visiting the Unicode web site.

There is no real limit on the number of letters that Unicode can define and in fact they have gone beyond 65,536 so not every unicode letter can really be squeezed into two bytes, but that was a myth anyway.

Well, technically, yes, I do believe it could, and, in fact, early implementors wanted to be able to store their Unicode code points in high-endian or low-endian mode, whichever their particular CPU was fastest at, and lo, it was evening and it was morning and there were already two ways to store Unicode. So the people were forced to come up with the bizarre convention of storing a FE FF at the beginning of every Unicode string; this is called a Unicode Byte Order Mark and if you are swapping your high and low bytes it will look like a FF FE and the person reading your string will know that they have to swap every other byte. Phew. Not every Unicode string in the wild has a byte order mark at the beginning.

Almost every stupid “my website looks like gibberish” or “she can’t read my emails when I use accents” problem comes down to one naive programmer who didn’t understand the simple fact that if you don’t tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.

https://en.wikipedia.org/wiki/Universal_Character_Set_characters

The UCS uses surrogates to address characters outside the initial Basic Multilingual Plane without resorting to more than 16 bit byte representations. There are 1024 "high" surrogates (D800–DBFF) and 1024 "low" surrogates (DC00–DFFF). By combining a pair of surrogates, the remaining characters in all the other planes can be addressed (1024 × 1024 = 1048576 code points in the other 16 planes). In UTF-16, they must always appear in pairs, as a high surrogate followed by a low surrogate, thus using 32 bits to denote one code point.
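
The surrogate arithmetic spelled out as a small Python sketch, following the UTF-16 encoding form rather than any particular library:

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Encode a supplementary-plane code point as (high, low) surrogates."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000                     # 20 bits remain
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogate_pair(high: int, low: int) -> int:
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

hi, lo = to_surrogate_pair(0x1F4A9)      # U+1F4A9
print(hex(hi), hex(lo))                   # 0xd83d 0xdca9
assert from_surrogate_pair(hi, lo) == 0x1F4A9
```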

http://unicode.org/faq/char_combmark.html

https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode

https://en.wikipedia.org/wiki/Unicode_equivalence

Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.

Unicode provides two such notions, canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo.

The standard also defines a text normalization procedure, called Unicode normalization, that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normalization form or normal form of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one fully composed (where multiple code points are replaced by single points whenever possible), and one fully decomposed (where single points are split into multiple ones).
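
The composed and decomposed forms, and the normalization that makes them compare equal, are available from Python's standard unicodedata module:

```python
import unicodedata

composed = "\u00F1"       # 'ñ' as a single code point
decomposed = "n\u0303"    # 'n' + combining tilde

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```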

https://hackage.haskell.org/package/text-utf8

https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets/

The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.

https://en.wikipedia.org/wiki/UTF-16#U+10000_to_U+10FFFF

https://en.wikipedia.org/wiki/Category:Unicode_formatting_code_points

All characters satisfying a given condition, using properties defined in the Unicode Character Database [UCD]: https://unicode.org/reports/tr41/tr41-26.html#UCD

https://unicode.org/reports/tr29/

https://www.gutenberg.org/catalog/

https://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html

Roughly speaking, NFC is the short form, fully composed, like U+1F85, and NFD is the long form, fully decomposed, in some well-defined order, like U+03B1 U+0314 U+0301 U+0345. (These are the two non-lossy normal forms.)

The rules for grapheme clusters can be easily converted into a regular expression, as in Table 1b, Combining Character Sequences and Grapheme Clusters. It must be evaluated starting at a known boundary (such as the start of the text), and it will determine the next boundary position. The resulting regular expression can also be used to generate fast, deterministic finite-state machines that will recognize all the same boundaries that the rules do.

https://hackage.haskell.org/package/text

Currently the text library uses UTF-16 as its internal representation which is neither a fixed-width nor always the most dense representation for Unicode text. We're currently investigating the feasibility of changing Text's internal representation to UTF-8 and if you need such a Text type right now you might be interested in using the spin-off packages text-utf8 and text-short.
