@Chucooleg
Last active February 9, 2018 09:33
Question Response

Data Characteristics

To properly frame the encoding problem, I think we can start with the data characteristics of architectural elements. In digital models, each architectural element is represented by geometry with certain points, dimensions and a type (such as NURBS or mesh). In construction documents, each element is tagged with its functional identifiers (such as a door id), along with spreadsheets that associate more elaborate information (such as fire resistance). In more advanced BIM software, many of these features are linked together through data structures. Before we jump into building any kind of predictive or inference model, we need to acknowledge that our data is dual-natured:

  • Numeric properties : dimensions (length, height, width), material properties (light transparencies, reflectiveness, thermal conductivity)

To model with data like this, we face the curse of dimensionality. With every new architectural element invented, new material features may be introduced, and some features are not applicable to older elements in our existing catalog. This means that as time progresses, our data representation becomes sparser and sparser, and every pair of data points looks far apart in high-dimensional space. For most tasks, some kind of feature identification, selection, or dimensionality reduction technique is necessary.

  • Functional properties : toilet, facade, ceiling, window, wall, roof, floor, escalator, door, balcony, fireplace, elevator, ramps, stair, corridor

To model with data like this, we have to deal with the discreteness of the values. There can be as many types of architectural elements as there are words; just among windows, we have several hundred types of panes. When modeling continuous variables, we can make reasonable assumptions about families of smooth distributions and reasonable "guesses" about where the most likely target value lies. In contrast, when modeling a large number of discrete values, our smooth classes of functions don't work well: any change in the input variables may have a huge impact on the value of the predictions.

In the following, I am going to run some thought experiments on different types of encoding, and discuss the options by the kinds of tasks they enable.

Modeling continuous variables

Focusing on the numeric properties of architectural elements, we may encode each element with as many numeric features as necessary. For a particular facade system, we may gather a spreadsheet of architectural elements from the various construction schedules:

| id | features (H, W, L, orientation, fire rating, elastic modulus, ...) |
| --- | --- |
| gate_1 | (8', 6', 5", 270, 3hr, 209 GPa, ...) |
| perforated_panel_5 | (8', 4', 1", 270, 0.5hr, 30 GPa, ...) |
| tempered_glass_pane_2 | (4', 4', 0.25", 270, 0.3hr, 50 GPa, ...) |
| ... | ... |

One-Dimensional Search Problem

It turns out that with this (id, vector) data representation, we can already build models for some simple tasks.

Suppose an architect is designing a multi-layer wall system. He/she realizes that the current design iteration is not meeting a set of specific insulation, lighting and moisture-control requirements for an exhibition gallery. At the same time, there exist other constraints such as the maximum facade load and the maximum thickness allowed for all layers combined. The architect needs to search for a substitute for the current rainscreen layer. To automate this search process, we first perform simple arithmetic on the current design iteration's numeric properties, such as unit weight, thickness, clarity, and % void, to arrive at requirements for the "perfect rainscreen". We may then normalize these numeric values and algorithmically search for the k closest matches in our database.
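A minimal sketch of this k-closest-match search, assuming a tiny hypothetical catalog (all element names and feature values below are illustrative, not from a real product database):

```python
import math

# Hypothetical mini-catalog of rainscreen candidates:
# id -> (unit weight psf, thickness in, clarity %, void %)
CATALOG = {
    "perforated_panel_5":    (4.0, 1.00, 0.0, 30.0),
    "tempered_glass_pane_2": (3.2, 0.25, 88.0, 0.0),
    "terracotta_baguette_1": (6.5, 2.00, 0.0, 45.0),
    "mesh_screen_3":         (1.1, 0.10, 10.0, 70.0),
}

def k_closest(target, catalog, k=2):
    """Min-max normalize every feature column (so no single unit dominates
    the distance), then return the k element ids nearest the target."""
    cols = list(zip(*catalog.values()))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    scale = lambda vec: [(v - l) / s for v, l, s in zip(vec, lo, span)]
    t = scale(target)
    dist = {eid: math.dist(t, scale(vec)) for eid, vec in catalog.items()}
    return sorted(dist, key=dist.get)[:k]

# Requirements for the "perfect rainscreen" derived from the design constraints:
matches = k_closest((3.0, 0.5, 80.0, 5.0), CATALOG)
```

With more features, the same normalize-then-rank scheme scales directly; swapping Euclidean distance for a weighted distance would let the architect prioritize, say, thickness over clarity.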

Two-Dimensional Detection Problem

Since the scope of architectural design is not limited to assembling wall sections, I next look at a two-dimensional facade problem. It is common for architects to give aesthetics top priority: some kind of graphic pattern tends to be drawn before a structural framework is inserted. There is a back-and-forth process to synchronize the two ends, but there is always a good amount of retrofitting. The architects may be adding studs behind more complex geometries, switching out solid panels for glass panes, or deleting some glass panes for better ventilation. If the building is huge and the tiles are relatively small, this can be a very painful task.

To automate this detection process, we should first recognize that most graphic patterns are tiled, so we can think of the tiles as if they were image pixels. Our input features then become an x-by-y matrix. Each tile has certain properties such as thickness, weight and clarity, much as each image pixel has several color channels. So, just as image inputs to convolutional neural nets are x-by-y-by-m matrices (where m is the number of color channels), we can reshape our data into x-by-y-by-m matrices (where m is the number of facade material properties). Since convolutional neural nets excel at detection problems, such as keypoint detection, we can adapt them to detect facade coordinates where more openings, transparency, or structural support are necessary.
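The channel-stacking step above can be sketched as follows, assuming one 2D grid per material property (the property names and values are made up for illustration):

```python
def stack_properties(grids):
    """Stack m property grids of shape (x, y) into one (x, y, m) nested list,
    mirroring how color channels are stacked in a CNN image input."""
    x, y = len(grids[0]), len(grids[0][0])
    # All grids must cover the same facade tiling.
    assert all(len(g) == x and all(len(row) == y for row in g) for g in grids)
    return [[[g[i][j] for g in grids] for j in range(y)] for i in range(x)]

thickness = [[1.0, 0.25], [1.0, 1.0]]   # inches, per tile
clarity   = [[0.0, 0.9 ], [0.0, 0.0]]   # light transmission fraction, per tile
tensor = stack_properties([thickness, clarity])
# tensor[i][j] is the m-length property vector for tile (i, j)
```

From here the tensor would be fed to a standard convolutional architecture, with the detection targets (e.g. "needs stud", "needs opening") as per-tile labels.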

Of course, I acknowledge that this is a rather quick-and-dirty approach. Facades can have very complex 3D profiles that affect their performance, and the geometries in these profiles would require far too many data points to encode; a 2D tiled representation may not work for some Frank Gehry buildings. However, quick analyses like this can save some back-and-forth between architects and their consultants, or provide initial design validation before physical models are built in the workshop.

Modeling discrete variables

In the above section I assumed that architectural elements can be described by a set of continuous values. To the models, a door and a wall are both encoded as "boxes" with different dimensions and material properties. However, the fact that the door is functionally a door and the wall is functionally a wall is actually useful. For example, we never put a door next to another door next to another door until the entire room is wrapped in doors. We design and enclose our space in specific ways to serve specific uses. To train a machine that can aid functional design, we need to frame the meaning of architectural design carefully and feed the machine examples in a format it can digest.

As I mentioned at the beginning, there are as many functional identifiers for architectural elements as there are words. In fact, I find many parallels between how we design architecture and how we write text. Even before computational methods were applied to natural language processing, there was already a natural parallel between the way humans understand and construct language and architecture:

  • Syntactic properties in language have counterparts in architecture. For instance, a door is typically used next to a wall, a ceiling and a floor; it is seldom designed next to another door leading to the same room. Similarly, we should be able to tag architectural elements as if they were nouns, adjectives, verbs, etc.

  • The recursive structure in language also exists in architecture. As we unfold building envelopes, we may encounter a sequence like rain screen -- gate -- shear wall -- front door -- partitions -- inner door... We should be able to tag groups of elements as if they were noun phrases, prepositional phrases or verb phrases, the way part-of-speech (PoS) and phrase tagging work in NLP. Some architectural equivalents of such tags may be sidewalk, entrance lobby, circulation, rooms...

  • Syntactic and semantic ambiguities are also commonly encountered in the two domains. For example, the word "permit" can be part of "parking permit" or the verb "permit"; likewise, an architectural opening may be a front entrance, an open window, or a skylight.

  • High levels of abstraction develop as our minds and bodies move through a piece of text or architecture. For example, as you are reading this paragraph, you probably don't remember all the previous words, but you understand the gist and remember some key phrases. Similarly, as we experience the ventilation, thermal comfort or openness of a piece of architecture, we don't need to remember the exact sequence of walls and doors and windows. We simply develop a spatial experience, and we may remember only certain architectural features such as a big skylight, a generous head room, or a spiraling circulation pathway that leads to the current room. This is the kind of intuition behind recurrent neural nets.

As such, we should be able to explain architectural design to the machine as if we were explaining natural language to it. In natural language processing, we use words as our atomic units. We feed a sequence of words from a corpus to the machine; we teach it to probabilistically predict the next word from the context (training a language model), and then to perform higher-level tasks on top of these language models. Similarly in architectural design, we use architectural elements as our atomic units. We feed a sequence of architectural elements to the machine and teach it to probabilistically predict the next architectural element conditional on the previous elements and other information such as building type and style. If we manage to unfold a building into a sequence according to circulation, sunlight path, or envelope coordinates, we may have a data type like the following, where each element in the sequence comes from a large vocabulary.

| building, level | sequence |
| --- | --- |
| De Young Museum, Lobby | ["door_double", "glass_tempered", "glass_tempered", "glass_tempered", "copper_perforated", "combined_copper_concrete", "combined_copper_concrete", "elevator", "fire_escape_door_steel", "dry_wall", "dry_wall"] |
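Whatever sequence model we pick, the first step is mapping this large vocabulary of elements to integer ids. A minimal sketch, with an illustrative one-building corpus and a standard unknown-element token for types never seen in training:

```python
def build_vocab(sequences):
    """Assign a stable integer id to every element type seen in the corpus;
    id 0 is reserved for unseen element types."""
    vocab = {"<unk>": 0}
    for seq in sequences:
        for elem in seq:
            vocab.setdefault(elem, len(vocab))
    return vocab

def encode(seq, vocab):
    """Map a sequence of element names to integer ids, falling back to <unk>."""
    return [vocab.get(e, vocab["<unk>"]) for e in seq]

corpus = [["door_double", "glass_tempered", "glass_tempered", "copper_perforated"]]
vocab = build_vocab(corpus)
ids = encode(["door_double", "skylight"], vocab)  # "skylight" is unseen
```

The `<unk>` fallback matters more here than in NLP, since new element types appear whenever a new product or detail is invented.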

Honestly, I have some doubts about how effectively architecture can be expressed as a sequence. There are earlier architects such as Peter Eisenman who framed architecture as a language in a more pedagogical way, and it is very common for precedents in architecture to be studied through sequences (such as the sequence along a pathway), so at the very minimum this is not a strange way for architects to think. Practically, the challenge is that architectural objects are inherently three-dimensional, so there are many ways to unfold a building; some may be more successful than others, depending on the nature of our tasks. To further this discussion and test how far I can go, I am going to assume that there exist meaningful and productive ways to unfold architecture into sequences for our modeling.

Borrowing from the general development of natural language processing over the past couple of decades, I would assert that some of the key lessons transfer to how we may develop probabilistic models for architectural design. Here, my objective is to develop a proper data representation of architectural elements for downstream tasks. If we have a way to encode architectural elements as if they were words, we can perform an array of tasks that are already possible in NLP:

  • detection (~ sentiment analysis): we can detect whether a part of the building has a design failure (such as fire-escape pathways that are too difficult to follow), similar to how a certain key phrase in an Amazon customer review may characterize it as positive, negative or neutral.

  • "architectural modeling" (~language modeling): given a sequence of architectural elements already modeled in digital space, the machine can suggest the most likely architectural element for the architectural designer to place. This is similar to how we are able to guess "I lived in France, so I can speak ____ "

  • parsing (~ part-of-speech tagging): upon importing a digital architectural model, the digital geometries come with tags such as door, window, ceiling or dry wall, but their higher-level functionality is unknown. For instance, we want to know whether a certain combination of these elements encloses a room, a corridor, a lobby or a fire-escape pathway. We may be able to tag the geometry using part-of-speech tagging techniques from NLP. Being able to do so can help architects meet certain design requirements; for instance, they may be missing a fire-escape connection for a meeting room on a certain floor.

  • translation/summarization/simplification: sometimes a digital architectural model may be over-modeled and the geometries unnecessarily complex. If architects only need the "gist" of the design for shadow studies in another, more primitive piece of software, a significant amount of extraction work has to be done manually, so we need some way to automate this extraction. In NLP, sequence-to-sequence models have already been adapted for translation and simplification tasks (for example, there have been 5-6 good papers on training machines to simplify Wikipedia into Simple Wikipedia for kids and English learners; the models are trained on sentence-aligned examples). In addition, some other algorithms may let us understand which elements are deleted or substituted in that simplification process.

Of course, getting to these complex tasks is still a long shot. But we can start by training embeddings for architectural elements, as if they were word embeddings. Having these in place would allow us to tackle more downstream tasks by adapting advanced NLP techniques.

Corpus

Before jumping into embeddings, I also want to discuss briefly constructing and using an architectural corpus. Ideally, we would collect a corpus of buildings, just as we would construct a corpus of text documents. Some corpora are more domain- or topic-specific, and some writing styles are more similar to each other (especially if written by the same author); both characteristics of text corpora transfer to architectural archives for obvious reasons. However, the construction of our "architectural corpus" is much more difficult, because buildings in the wild are archived in all different formats. Some only have a few plans, some have elaborate drawings, some have 3D mesh models, and some have nicely tagged 3D models that are synced with their drawings. In most of my previous architectural case studies, there were many mismatches between the drawings and models collected from various sources. Text is much simpler because in most cases we only need the most updated version of the text; some tokenization and canonicalization is necessary, but it is far more doable than for architecture. It would be nice to get a sense of the building data that has been archived in BIM so far.

Embeddings

Researchers have gone about training word embeddings in several ways. Below, I follow the timeline along which these techniques were invented, gauge their feasibility with architectural elements instead of words, and evaluate whether their NLP pitfalls apply to architectural modeling.

N-grams

Constructing N-grams for architectural elements would be similar to capturing elevations through a fixed window of N atomic units. For example, a 5-gram from a facade may be encoded as ["plaster_wall", "portrait_window", "fireplace_unit", "portrait_window", "dry_wall"]. 5-grams worked reasonably well in NLP for some years before going out of style, and may work reasonably well for simple architectural modeling tasks. For instance, an architect may be modeling the interior of a high-rise with many floor plans, each with a large perimeter. A simple predictive model can quickly suggest a reasonable model block for the next unit.

The drawbacks of N-grams in NLP apply just as much to architectural modeling. The higher N is, the more useful the predictive model; however, the number of possible N-grams grows exponentially with N while the count of any specific N-gram we observe drastically decreases. Smoothing techniques can be applied (such as using interpolation to partially back off to lower-order N-grams), but with very sparse data, the smoothing can compromise predictive power.
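The bigram case of this predict-and-back-off scheme can be sketched as follows. The toy corpus and element names are illustrative, and a real model would tune the interpolation weight on held-out data:

```python
from collections import Counter, defaultdict

def train(sequences):
    """Count unigrams and bigrams over unfolded element sequences."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for seq in sequences:
        unigrams.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def prob(cur, prev, unigrams, bigrams, lam=0.7):
    """Interpolate bigram and unigram estimates, so unseen pairs
    still receive probability mass from the unigram back-off."""
    uni = unigrams[cur] / sum(unigrams.values())
    ctx = sum(bigrams[prev].values())
    bi = bigrams[prev][cur] / ctx if ctx else 0.0
    return lam * bi + (1 - lam) * uni

def predict_next(prev, unigrams, bigrams):
    """Suggest the most likely next element given the previous one."""
    return max(unigrams, key=lambda w: prob(w, prev, unigrams, bigrams))

corpus = [["dry_wall", "door_single", "dry_wall", "portrait_window", "dry_wall"],
          ["dry_wall", "portrait_window", "dry_wall", "door_single", "dry_wall"]]
uni, bi = train(corpus)
```

Extending `train` to 5-grams is mechanical, but, as noted above, the counts thin out quickly and the interpolation weights start doing most of the work.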

WordNet

It is plausible to construct a knowledge base for all architectural elements through user engagement. Users may build new architectural elements and insert them into a master hierarchy of synsets (in WordNet, similar words are assigned to the same synset, and a word can appear in several synsets). Existing online and offline catalogs, even CAD and Revit libraries, already contain pieces of this master hierarchy, so it may be a matter of merging them and encoding the hierarchy like the WordNet corpus in the NLTK package. This data structure would let users look up architectural elements very intuitively, but its encoding doesn't give the machine an easy way to compute similarity, and hence to build advanced predictive models. Also, the master hierarchy may easily become obsolete as the way architects design buildings evolves organically.
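A toy version of such a synset structure (every synset name and member below is invented for illustration) shows both the intuitive lookup and, implicitly, the limitation: membership is binary, with no notion of distance between elements:

```python
# Hypothetical master hierarchy fragment: synset name -> member elements.
# Like a word, one element ("skylight") can live in several synsets.
SYNSETS = {
    "opening.glazed":   {"skylight", "portrait_window", "curtain_wall_pane"},
    "roof.penetration": {"skylight", "roof_hatch", "chimney"},
    "door.swing":       {"door_single", "door_double"},
}

def synsets_of(element):
    """Return every synset an element belongs to, sorted for stable output."""
    return sorted(name for name, members in SYNSETS.items() if element in members)

def share_synset(a, b):
    """Crude similarity signal: do two elements co-occur in any synset?"""
    return bool(set(synsets_of(a)) & set(synsets_of(b)))
```

Anything finer than `share_synset` (e.g. "how similar are a skylight and a roof hatch?") requires graph-distance heuristics rather than a learned, continuous similarity, which is the gap the later embedding methods address.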

Brown Clustering

The Brown clustering algorithm could be useful if we want to avoid statically assigning architectural elements to a rigid hierarchy of classes. With this algorithm we can learn classes of architectural elements in an unsupervised manner using statistical methods, and relearn the classes with more up-to-date corpora as time passes. The classes won't exactly reflect the categories we see in printed catalogs, but will reflect how architects actually use the elements, in a probabilistic sense. This may work about as well as N-grams and provide some notion of similarity between architectural elements. But partial membership of architectural elements is difficult: a skylight should be a partial member of "roof" and a partial member of "window". Multiple relationships between architectural elements are also not captured well. The algorithm may derive a hypothetical class ("skylight" | "rollup_garage_door" | "sliding_door" | "rollup_sunshades"), but "rollup_garage_door" will not work next to the sequence ["ceramic_tile_wall", "standing_shower_unit", "wall_mount_toilet_with_plumbing"] while the other three elements in the same class can. In other words, this system can fail to capture the more subtle relationships between atomic architectural elements.

Co-occurrence matrix and SVD

To enable the evaluation of nuanced relationships between architectural elements, we want to obtain a distributed representation for them. Perhaps Latent-Semantic-Analysis-like approaches can work. In NLP, we may fill in the co-occurrence matrix by sliding the co-occurrence window one sentence at a time; for buildings, we can try sliding this window one floor at a time, or one vertical strip at a time. It may work well to capture facade design ideas from architects like MVRDV (WOZOCO, Mirador Building, DNB Headquarters) and suggest them for future designs. I think the major downside of this approach, as in NLP, is that its training mechanism is not very compatible with the training mechanism of neural nets, which we may want to use extensively for downstream tasks. Once we obtain an embedded representation for our architectural elements, we feed it into the feed-forward layers as input; as the neural net trains, backpropagation cannot propagate through the embedded inputs themselves. In other words, we cannot keep training our embeddings for more building-type-specific applications.

Word2Vec

Word2Vec can solve the last problem I mentioned, because methods like CBOW and Skip-gram use a predictive mechanism to train the embeddings. This allows backpropagation in downstream tasks to propagate through the embedding parameters and keep training them to adapt to the current task. This may seem like the perfect solution for us. However, besides the technical problem of expensive softmax estimations, the scale of data available to us may not make this kind of training sensible. Note that widely used word embeddings in NLP have been trained on corpora ranging from Wikipedia-scale (on the order of 1B tokens) up to web-scale (on the order of 42B tokens). I have a hard time imagining us having data at that kind of scale and availability to train reliable embeddings...
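For concreteness, here is what the data fed to Skip-gram would look like: each element is paired with its neighbors within a window, and the model learns embeddings by predicting one from the other. The sequence below echoes the envelope-unfolding example from earlier; the pair generation is standard, while the actual training (softmax or negative sampling) would sit on top of it:

```python
def skipgram_pairs(seq, window=2):
    """Generate (target, context) pairs: for each element, pair it with
    every neighbor at distance <= window in the unfolded sequence."""
    pairs = []
    for i, target in enumerate(seq):
        for j in range(max(0, i - window), min(len(seq), i + window + 1)):
            if j != i:
                pairs.append((target, seq[j]))
    return pairs

pairs = skipgram_pairs(["rain_screen", "gate", "shear_wall", "front_door"],
                       window=1)
```

The scale problem shows up immediately here: each building contributes only as many pairs as it has elements times the window size, so a reliable embedding space would need an enormous archive of unfolded buildings.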

This is the farthest I have gone with this fun thought experiment. Please feel free to let me know what you think!
