rclark/email-that-isnt-spam

## email-that-isnt-spam
You've got 1M points in your system. Here are the places where that's going to bottleneck you:

- Asking Geoserver for WMS or WFS results is going to mean pulling those 1M points from the database and then running them through a pipeline that results in either a) a large XML document (WFS) or b) an image (WMS). In both cases, query results will be held in memory on the server until the processing is complete.

- There is probably much less processing involved in building the XML doc than there is to render the image. However, the next step in the process is sending the result to the client. The XML doc will be huge (either in the "shortened" content model form, or the enormous flat form you mention). It will take forever to get across the wire, and you'll probably start timing out in various parts of the pipeline.

- The data arrives at the client's web browser. In the case of the WMS request, you're fine, because it's a little file and only a minimal amount of processing is required to draw that image on the map. In the WFS case, though, you're pretty much screwed. OpenLayers does not work in any "streaming" fashion -- it will hold the entire data document in memory. Then it will read the document and build SVG elements on the page for each point. It has to do that a million times, plus now your DOM has millions of elements in it. The browser struggles to render all that SVG. Every time you "pan" the map there's some JavaScript processing adjusting point locations and the browser has to render them again.

- This is probably why Matt hung Safari trying to view the whole result set. I would guess that the 1M point data package was transferred to Matt's machine, but when Safari received it, it tried to render the XML in a "friendly" way: some syntax highlighting, some collapsible elements, that kind of thing. And so Safari failed. The same thing happens in Chrome when I look at enormous XML docs.

So what can you do?

1) You cannot allow clients to make requests for all 1M points through WFS. Even if your server handles the processing, the client will get bored waiting, and their application will crash when the response gets there. Geoserver allows you to set a limit on the number of records that can be returned in a WFS response. The response document will describe how to paginate to get more results, if the client needs them.

But really, you want to minimize WFS requests. Make them extremely targeted. Perhaps you only invoke WFS requests to return information about a single feature that a user clicks on. If you're building tables or plotting graphs, make WFS requests that only ask for the fields that you want to show in the graph -- especially important would be to tell it NOT to send you the geometry information.
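To make "extremely targeted" concrete, here is a minimal sketch of building a WFS 1.1.0 GetFeature URL that asks for a capped number of records and only the named fields, so the geometry is left out. `buildWfsUrl`, the endpoint path, the layer name `wells:samples`, and the field names are all hypothetical; `typeName`, `propertyName`, `maxFeatures`, and `featureID` are standard WFS 1.1.0 key-value parameters, and the JSON `outputFormat` is an assumption about how your Geoserver is configured.

```javascript
// Sketch: build a narrowly scoped WFS 1.1.0 GetFeature URL.
// The endpoint, layer name, and field names passed in are hypothetical.
function buildWfsUrl(baseUrl, options) {
  var params = {
    service: 'WFS',
    version: '1.1.0',
    request: 'GetFeature',
    typeName: options.typeName,
    outputFormat: 'application/json' // assumption: JSON output is enabled
  };
  // Ask for one specific feature, e.g. the one the user clicked on.
  if (options.featureId) { params.featureID = options.featureId; }
  // List only the attribute fields you need; leaving the geometry
  // column out of this list keeps it out of the request.
  if (options.fields) { params.propertyName = options.fields.join(','); }
  // Cap the number of records returned.
  if (options.maxFeatures) { params.maxFeatures = String(options.maxFeatures); }

  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return baseUrl + '?' + query;
}

// Example: two attribute columns for a graph, at most 500 records.
var graphUrl = buildWfsUrl('http://my-server.com/geoserver/wfs', {
  typeName: 'wells:samples',
  fields: ['depth', 'temperature'],
  maxFeatures: 500
});
```

The same function covers the "user clicked one feature" case by passing `featureId` instead of `maxFeatures`.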

2) All map-based views should be image-based. This is a problem that has a solution -- people periodically make maps of the locations of millions of tweets. OpenStreetMap itself is millions and millions of vertexes, and you can use them in fluid maps because they're transferred as images.

With a WMS, by default, images are rendered by the server on the fly. As you've realized, that is a prohibitively costly operation for millions of points. The images have to be generated first, before anyone asks for them. Once that happens, there's no server processing time, there's no waiting for data to transfer over the wire, and you can map to your heart's content.

GeoWebCache gets you part of the way there. The trick is that most maps these days use a specific Mercator projection, and in that projection the world has been segmented into a pre-defined grid of "tiles". The maps you use on the internet only allow you to zoom to a set of pre-defined zoom levels that line up with that tile grid. At zoom level 0 there's 1 tile for the world. Zoom level 1 >> 4 tiles, 2 >> 16 tiles, etc.
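The tile grid arithmetic can be sketched in a few lines. This assumes the common "slippy map" scheme (Web Mercator, 2^z columns and rows at zoom z, row 0 at the top); `tileCount` and `lonLatToTile` are hypothetical helper names, not part of OpenLayers or GeoWebCache.

```javascript
// Number of tiles covering the world at a zoom level: 1, 4, 16, ...
function tileCount(zoom) {
  return Math.pow(4, zoom);
}

// Which tile contains a longitude/latitude (degrees) at a zoom level.
// Assumes the XYZ scheme: x grows eastward, y grows southward from the top.
function lonLatToTile(lon, lat, zoom) {
  var n = Math.pow(2, zoom); // tiles per row and per column
  var latRad = lat * Math.PI / 180;
  var x = Math.floor((lon + 180) / 360 * n);
  var y = Math.floor(
    (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n
  );
  // Clamp so points on the far edges still land inside the grid.
  return {
    z: zoom,
    x: Math.min(Math.max(x, 0), n - 1),
    y: Math.min(Math.max(y, 0), n - 1)
  };
}
```

However many points you have, the number of images a client needs stays bounded by how many tiles fit on the screen at the current zoom level.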

These tiles have predefined boundaries, and that's the catch: GeoWebCache will send an image to the client from the tile cache only if the bbox of the WMS request aligns perfectly with the boundaries of a tile. It has been a while since I've written an OpenLayers app, but I recall that there was a way to ask a WMS for a single image or for tiled images -- you'd have to make sure you're not asking for single images. You'll have to make sure that the requests to the WMS align with tile boundaries.
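As a rough illustration of what "aligns with the boundaries of a tile" means, the sketch below computes a tile's bbox in Web Mercator meters and checks whether a requested bbox sits on the grid. It assumes the EPSG:3857 grid with row 0 at the top. The helper names and the tolerance are my own, and this is not GeoWebCache's actual matching logic, which depends on the gridset configured on the server.

```javascript
// Half the width of the Web Mercator world, in meters (EPSG:3857).
var ORIGIN = 20037508.342789244;

// Bounding box [minX, minY, maxX, maxY] of tile (z, x, y), XYZ scheme.
function tileBounds(z, x, y) {
  var size = 2 * ORIGIN / Math.pow(2, z); // tile edge length in meters
  var minX = -ORIGIN + x * size;
  var maxY = ORIGIN - y * size;
  return [minX, maxY - size, minX + size, maxY];
}

// True if bbox matches exactly one tile of zoom level z, within a tolerance
// expressed as a fraction of the tile size (hypothetical default).
function isTileAligned(bbox, z, tolerance) {
  var size = 2 * ORIGIN / Math.pow(2, z);
  var tol = tolerance === undefined ? 1e-6 : tolerance;
  var nearInteger = function (v) {
    return Math.abs(v - Math.round(v)) < tol;
  };
  var col = (bbox[0] + ORIGIN) / size; // fractional column of the left edge
  var row = (ORIGIN - bbox[3]) / size; // fractional row of the top edge
  return nearInteger(col) && nearInteger(row) &&
    Math.abs((bbox[2] - bbox[0]) - size) < tol * size &&
    Math.abs((bbox[3] - bbox[1]) - size) < tol * size;
}
```

A single-image WMS request uses whatever bbox the map viewport happens to have, so it will almost never pass a check like this; a tiled request is built from the grid and will.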

The approach that I've been moving towards for a lot of maps is to use another app called TileMill. In TileMill on my desktop, I make a connection to my datasource (the PostgreSQL database in this case). Then I write some CSS-ish code that describes how I want to symbolize the data. Then, TileMill can generate all the tiles for whatever span of zoom levels I want. These tiles are just .png files in a set of folders that I can upload to my server and serve with nothing but nginx or apache -- they are just static files. I can make sure that my server sets appropriate caching headers to further speed my client application.

In OpenLayers, you would connect to those tiles with an `OpenLayers.Layer.XYZ`, where you enter a URL template like:

    http://my-server.com/tiles/${z}/${y}/${x}.png

... where /1/1/1.png is the path to one of those tiles that TileMill generated. This means there's no WMS load on Geoserver.
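OpenLayers does the placeholder substitution for you, but a stand-alone sketch shows how little is involved. `fillTileUrl` is a hypothetical helper, not an OpenLayers function. Note that the template is a plain quoted string, not a JavaScript template literal.

```javascript
// Replace ${x}, ${y}, and ${z} in a tile URL template with tile indices.
function fillTileUrl(template, tile) {
  return template.replace(/\$\{([xyz])\}/g, function (match, key) {
    return String(tile[key]);
  });
}

// Single quotes keep JavaScript from interpreting ${...} itself.
var template = 'http://my-server.com/tiles/${z}/${y}/${x}.png';
var url = fillTileUrl(template, { z: 1, x: 1, y: 1 });
// url is 'http://my-server.com/tiles/1/1/1.png'
```

The order of the placeholders has to match the folder layout your tiles were actually exported with, so check that before wiring up the layer.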

Then when I want to build interactions with the data, I make really specific WFS requests to Geoserver.

We can also talk about point clustering if you'd like to pursue that. It can ease the burden on the browser of visualizing tons of SVG points, but there are hitches, and for millions of points you would actually have to build clusters on the server-side. I've recently done some work to allow this.
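To give a flavor of what clustering does, here is a minimal sketch of one simple technique, grid binning: points that fall in the same square cell collapse into one cluster placed at their centroid. This is not the server-side work mentioned above, just an illustration. `clusterPoints` and the `{x, y}` point shape are hypothetical, and coordinates are assumed to be in projected map units.

```javascript
// Group points into square grid cells and return one cluster per
// non-empty cell, positioned at the centroid of its points.
function clusterPoints(points, cellSize) {
  var cells = {};
  points.forEach(function (p) {
    // Points sharing a cell share a key.
    var key = Math.floor(p.x / cellSize) + ':' + Math.floor(p.y / cellSize);
    var cell = cells[key] || (cells[key] = { count: 0, sumX: 0, sumY: 0 });
    cell.count += 1;
    cell.sumX += p.x;
    cell.sumY += p.y;
  });
  return Object.keys(cells).map(function (key) {
    var c = cells[key];
    return { x: c.sumX / c.count, y: c.sumY / c.count, count: c.count };
  });
}
```

The output size is bounded by the number of cells on screen rather than the number of points, which is what makes it drawable. With a million input points you'd want to run something like this next to the database, not in the browser.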

You can also consider (if you wanna change things pretty dramatically) OpenLayers 3, which is in alpha but includes the ability to potentially render millions of points in a Canvas or WebGL layer (instead of SVG): http://ol3js.org/en/master/examples/ten-thousand-points.html

Okay, that's enough writing for now. I've asked Cathy for a login so that I can take a look at the site and better understand how you're interacting with Geoserver. FYI I got a 500 error from Tomcat when I tried to log in with Google. I hope you haven't already engineered all these things and I'm just spinning my wheels...

Thanks,
Ryan