Navigation Menu

Skip to content

Instantly share code, notes, and snippets.

@sancarn
Last active January 27, 2023 10:25
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sancarn/7708c382660e994db4bbe55584ff8bc7 to your computer and use it in GitHub Desktop.
Save sancarn/7708c382660e994db4bbe55584ff8bc7 to your computer and use it in GitHub Desktop.

Formula tracer

Have you ever been told to audit or understand a large formula application? Have you ever took on the challenge, and after a few hours of searching you still have no idea how the outputs relate to the inputs?

Recently I had this same issue while trying to debug issues in a spreadsheet owned by a non-profit organisation. This spreadsheet had huge tables, each table column containing formulas with one of the 7k+ relationships used in the spreadsheet. To be honest, I'm astonished Excel is capable of dealing with this number of relationships in memory with little drop in performance...

This tool can be used to create a list of every relationship in a spreadsheet, which can ultimately be boiled down into a graph showing how each sheet interacts with eachother.

![relationshis

With this tool, you'll be able to make the same kinds of graphs to help you with your own auditing needs.

How it works

This application loops over every formula of every worksheet of the current workbook and tokenises them into a set of "syntactic constructs". These constructs could include Booleans, Strings, Function identifiers, Cell References and more. Where a cell or range reference is identified (either directly, or through some other syntax) this is picked up by the algorithm and reported to the output.

Features

  • Analyses every formula, of every sheet, of the current workbook.
  • Traces through name definitions and even defines relationships in formulas assigned as names.
  • Includes support for tracing table syntax.
  • Includes a utility to detect tables of data/formulae and can handle these differently if required. (See bDetectTables and iMaxRows constants)
  • Can deal with protected sheets (see UnprotectPass constant)
  • Has verbosity options (see bBIG constant)

Using the output for debugging

Imagine you have an issue in Sheet1!B3, currently it's returning an error, but why?

Let's run our tool and have a look. Look in your output.csv

CellWorksheetName CellRefersToWorksheetName Cell CellRefersTo RefersInFormula
Sheet1 Sheet2 [Book1]Sheet1!B3 [Book1]Sheet2!B3 '=Sheet2!B3
Sheet2 Sheet3 [Book1]Sheet2!B3 [Book1]Sheet3!B3:C7 '=VLOOKUP(C3,Sheet3!$B$3:$C$7,2,0)
Sheet3 VOID

So looks like [Book1]Sheet1!B3 refers to [Book1]Sheet2!B3 which contains a vlookup formula '=VLOOKUP(C3,Sheet3!$B$3:$C$7,2,0)... Could it be that C3 can't be found in Sheet3!$B$3:$C$7? Yes, yes that's exactly what happened.

Hopefully you can see that this tool doesn't help you understand the issue, but it does help you understand where to look to find the issue.

Limitations

  • It can not view the formula on protected sheets. These need to be unprotected to use the tool, or you can insert UnprotectPass in the config at the top of the module.
  • It can not inspect the consequence of UDFs.
  • Does not track relationships which can only be determined at runtime.
    • Relationships between offset references are not tracked. E.G. If the name Apples is specified, a user might user Offset(Apples,1,2) which doesn't relate directly to apples, but relates to another cell offset of the current cell. A relationship to Apples will be created, rather than to the offset cell. (This would have pros and cons.)
    • Relationships via the result of indirect() method are not tracked. Indirect(A1) will only bind a relationship to A1 not Indirect(...) result.
Attribute VB_Name = "modTraceFormulae"
Const sRegex As String = "(?<Boolean>true|false)|(?<String>""[^""]*?"")|(?<Function>\w+\()|(?:(?:'(?:\[(?<WorkbookName>.*?)\])?(?<SheetName>.*?)'|(?<SheetName2>\w+))!)?(?:(?<CellRef>(?:\$?[A-Za-z]+\$?\d+(?::\$?[A-Za-z]+\$?\d+)?|\$?[A-Za-z]+:\$?[A-Za-z]+|\$?\d+:\$?\d+|(?<ListObject>\[(?:\w+|@\[.*?\])\])))|(?<Error>#\w+[!?]))|(?<NameOrTable>[A-Za-z][A-Za-z0-9_£]+)(?<TableFields>\[(?:\[.*?\]:\[.*?\])\]|\[#?\w*?\])?"
Const bBIG As Boolean = False
Const bDetectTables As Boolean = True '- likely less verbose but also smaller and faster
Const iMaxRows As Long = 300
Const UnprotectPass as string = ""
Const OUTPUT_DEST as string = "C:\Output.csv"
'Test set:
Sub SelectSelection()
Dim cellRefs As New Collection
Dim subarea As Range, cell As Range
For Each cell In Selection.Cells
If Not cell.EntireRow.Hidden Then
cellRefs.Add cell.Value
End If
Next
Dim cellRef As Variant, bFlag As Boolean
bFlag = False
For Each cellRef In cellRefs
If Not bFlag Then
Range(cellRef).Parent.Parent.Activate
Range(cellRef).Parent.Activate
Range(cellRef).Select
bFlag = True
Else
Application.Union(Range(cellRef), Selection).Select
End If
Next
End Sub
Function TraceFormula() As Object
'Create object to consume tree
Dim oRet As Object
Set oRet = CreateObject("Scripting.Dictionary")
Set oRet("CellDict") = CreateObject("Scripting.Dictionary")
Dim rx As stdRegex
Set rx = stdRegex.Create(sRegex, "ig")
Dim rangeDict As Object
Set rangeDict = GetNameToRefs()
Dim ff As Long: ff = FreeFile()
Open OUTPUT_DEST For Output As #ff
Print #ff, Serialize("CellWorksheetName") & ","; Serialize("CellRefersToWorksheetName") & "," & Serialize("Cell") & "," & Serialize("CellRefersTo") & "," & Serialize("RefersInFormula")
'Loop over all worksheets
Dim ws As Worksheet
For Each ws In Worksheets
Debug.Print "Processing " & ws.name
'Activate sheet
Dim visibility As XlSheetVisibility
visibility = ws.Visible
ws.Visible = xlSheetVisible
ws.Activate
'If is protected, attempt to unprotect
Dim bWasProtected As Boolean
bWasProtected = False
If ws.ProtectContents Then
ws.Unprotect UnprotectPass
bWasProtected = True
End If
'Loop over all formulae ranges:
Dim rForumalae As Range
Set rFormulae = getFormula(ws)
'If formulae present
If Not rFormulae Is Nothing Then
'Loop over all subareas of formulae
Dim subarea As Range
For Each subarea In rFormulae.Areas
Debug.Print "Subarea """ & subarea.Address & """ processed of """ & ws.name & """"
'Ensure subarea trimmed down to 300 rows
If subarea.Rows.CountLarge > iMaxRows Then Set subarea = subarea.Resize(iMaxRows)
'Bulk extract formula if possible
Dim vFormulae As Variant
If subarea.Rows.CountLarge = 1 And subarea.Columns.CountLarge = 1 Then
ReDim vFormulae(1 To 1, 1 To 1)
vFormulae(1, 1) = subarea.Formula
Else
vFormulae = subarea.Formula
End If
Dim i As Long, j As Long
Dim ub1 As Long: ub1 = UBound(vFormulae, 1)
Dim ub2 As Long: ub2 = UBound(vFormulae, 2)
For i = 1 To ub1
If bDetectTables And ((ub1 > 10 And ub2 > 1) Or ub1 > 60) Then
'When i = 5 skip to ub1-5, in this way we ensure tables are summarised in 10 cells of data
If i = 5 Then i = ub1 - 5
End If
For j = 1 To ub2
'Loop over all cells of subareas
Dim cell As Range
Set cell = subarea.Cells(i, j)
'Get the address of this cell
Dim sAddress As String: sAddress = cell.Address(False, False, External:=True)
'If not already existing, create a set of references for this cell address
If Not oRet("CellDict").exists(sAddress) Then Set oRet("CellDict")(sAddress) = stdArray.Create()
'Use regex to match all other cell references in the formula i.e. Lex the formula
Dim results As stdEnumerator: Set results = stdEnumerator.CreateFromIEnumVariant( rx.MatchAll(vFormulae(i, j)))
'Loop over all tokens/cell references and bind these to the parent cell
Dim iRel As Long
For iRel = 1 To results.Length
'Get matching regex item
Dim oMatch As Object
Set oMatch = results.item(iRel)
'Ensure oMatch didn't match String, Function or Boolean
If IsEmpty(oMatch("Error")) And IsEmpty(oMatch("String")) And IsEmpty(oMatch("Function")) And IsEmpty(oMatch("Boolean")) Then
'Interprate link address for reference text
'Here we have to use some local information to understand "is the cell in a list object" as this is important for determining relationship due to [@[...]] amd [...] formula syntax
'So first determine if we are in a ListObject (table)
Dim arrRanges As stdArray
Dim rLO As Range
If cell.ListObject Is Nothing Then
'We're not in a list object, so there should be no special syntax
'Check to see if name defined in rangeDict
If rangeDict.exists(oMatch(0)) Then
Set arrRanges = rangeDict(oMatch(0))
Else
If isValidRange(oMatch(0)) Then
Set arrRanges = stdArray.Create(Range(oMatch(0)))
Else
'Invalid range... error?
End If
End If
Else
'If token matches [@[*]], then this is a ListObject/Table cell reference
'If token matches [*], then this is a ListObject/table column reference
'Otherwise this is a normal address
Dim sColumnName As String
If oMatch(0) Like "[[]@[[]*[]][]]" Then
'Is like [@[*]], therefore get column name
sColumnName = Left(Mid(oMatch(0), 4), Len(oMatch(0)) - 5)
'Intersect range with cell row to identify cell reference
Set rLO = Application.Intersect(cell.EntireRow, cell.ListObject.ListColumns(sColumnName).Range)
Set arrRanges = stdArray.Create(rLO)
ElseIf oMatch(0) Like "[[]*[]]" Then
'Is like [*], therefore get column name
sColumnName = Left(Mid(oMatch(0), 2), Len(oMatch(0)) - 2)
'Get range of column
Set rLO = cell.ListObject.ListColumns(sColumnName).Range
Set arrRanges = stdArray.Create(rLO)
Else
'Normal token, Create address for binding
'Check to see if name defined in rangeDict
If rangeDict.exists(oMatch(0)) Then
Set arrRanges = rangeDict(oMatch(0))
Else
If isValidRange(oMatch(0)) Then
Set arrRanges = stdArray.Create(Range(oMatch(0)))
Else
'Invalid range... error?
End If
End If
End If
End If
Dim irLO As Long
For irLO = 1 To arrRanges.Length
Set rLO = arrRanges.item(irLO)
Dim rResultCell As Range, sLinkAddress As String
'Make sure ranges like L:L or O:O are bound to the usedrange of the range's parent sheet
Dim rIntersectRegion As Range
Set rIntersectRegion = Application.Intersect(rLO, rLO.Parent.UsedRange)
If Not rIntersectRegion Is Nothing Then Set rLO = rIntersectRegion
'Limit to 300 rows (mostly arbitrary)
If rLO.Rows.CountLarge > iMaxRows Then Set rLO = rLO.Resize(iMaxRows)
If Not bBIG Then
'SMALL - Produces smaller output and is faster
'Ensure relationship is cross-sheet
If Not rLO.Parent Is ws Then
'Get link address
sLinkAddress = rLO.Address(False, False, External:=True)
'Ensure no duplicates and print
If Not oRet("CellDict")(sAddress).includes(sLinkAddress) Then
Print #ff, Serialize(ws.name) & "," & Serialize(rLO.Parent.name) & "," & Serialize(sAddress) & "," & Serialize(sLinkAddress) & "," & Serialize("'" & cell.Formula)
oRet("CellDict")(sAddress).Push sLinkAddress
End If
End If
Else
'BIG - Produces huge output and is slow
'Loop over all cells in rLO
If bDetectTables Then
If rLO.Rows.CountLarge > 50 Then
'Loop over first 5 rows of result
For Each rResultCell In rLO.Resize(5).Cells
'Get result cell address
sLinkAddress = rResultCell.Address(False, False, External:=True)
'Print binding and bind to CellDict
If Not oRet("CellDict")(sAddress).includes(sLinkAddress) Then
Print #ff, Serialize(ws.name) & "," & Serialize(rLO.Parent.name) & "," & Serialize(sAddress) & "," & Serialize(sLinkAddress) & "," & Serialize("'" & cell.Formula)
oRet("CellDict")(sAddress).Push sLinkAddress
End If
Next
'Loop over last 5 rows of result
For Each rResultCell In rLO.Resize(5).Offset(rLO.Rows.CountLarge - 5).Cells
'Get result cell address
sLinkAddress = rResultCell.Address(False, False, External:=True)
'Print binding and bind to CellDict
If Not oRet("CellDict")(sAddress).includes(sLinkAddress) Then
Print #ff, Serialize(ws.name) & "," & Serialize(rLO.Parent.name) & "," & Serialize(sAddress) & "," & Serialize(sLinkAddress) & "," & Serialize("'" & cell.Formula)
oRet("CellDict")(sAddress).Push sLinkAddress
End If
Next
Else
'Loop over all rows and columns of result
For Each rResultCell In rLO.Cells
'Get result cell address
sLinkAddress = rResultCell.Address(False, False, External:=True)
'Print binding and bind to CellDict
If Not oRet("CellDict")(sAddress).includes(sLinkAddress) Then
Print #ff, Serialize(ws.name) & "," & Serialize(rLO.Parent.name) & "," & Serialize(sAddress) & "," & Serialize(sLinkAddress) & "," & Serialize("'" & cell.Formula)
oRet("CellDict")(sAddress).Push sLinkAddress
End If
Next
End If
Else
'Loop over all rows and columns of result
For Each rResultCell In rLO.Cells
'Get result cell address
sLinkAddress = rResultCell.Address(False, False, External:=True)
'Print binding and bind to CellDict
If Not oRet("CellDict")(sAddress).includes(sLinkAddress) Then
Print #ff, Serialize(ws.name) & "," & Serialize(rLO.Parent.name) & "," & Serialize(sAddress) & "," & Serialize(sLinkAddress) & "," & Serialize("'" & cell.Formula)
oRet("CellDict")(sAddress).Push sLinkAddress
End If
Next
End If
End If
Next irLO
'For now just skip non-ranges
NotARange:
End If
Next
Next
DoEvents
Next
Next
Else
Print #ff, Serialize(ws.name) & "," & Serialize("VOID") & "," & Serialize("") & "," & Serialize("") & "," & Serialize("")
End If
'Reprotect...
If bWasProtected Then
ws.Protect UnprotectPass
End If
'Return visibility to existing visibility
ws.Visible = visibility
DoEvents
Next
Close #ff
MsgBox "Formulae trace complete!", vbInformation
'Return trace formula object
Set TraceFormula = oRet
End Function
Function GetNameToRefs() As Object
Dim lookup As Object: Set lookup = CreateObject("Scripting.dictionary")
Dim rx As stdRegex: Set rx = stdRegex.Create(sRegex)
Dim n As name
For Each n In ThisWorkbook.Names
'Create a new array of references
Set lookup(n.name) = stdArray.Create()
'Match references with regex
Dim matches As stdEnumerator
Set matches = stdEnumerator.CreateFromIEnumVariant(rx.MatchAll(n.RefersTo))
'Loop over all references and add them
For i = 1 To matches.Length
'Get reference from matches
Dim o As Object
Set o = matches.item(i)
'Ensure reference is acceptable
If IsEmpty(o("Error")) And IsEmpty(o("String")) And IsEmpty(o("Function")) And IsEmpty(o("Boolean")) Then
'Ensure reference isn't duplicate
If Not lookup(n.name).includes(o(0)) Then
'Push reference to array
Call lookup(n.name).Push(o(0))
End If
End If
Next
Next
'Loop through keys to ensure values aren't names:
Dim keys As Variant, iKey As Long
keys = lookup.keys()
For iKey = LBound(keys) To UBound(keys)
'Get key
Dim sName As String: sName = keys(iKey)
Dim refs As stdArray: Set refs = lookup(sName)
Dim iRef As Long
For iRef = 1 To refs.Length
If iRef > refs.Length Then Exit For
'Get a reference for the given name
Dim ref As String: ref = refs.item(iRef)
If rx.Match(ref).exists("NameOrTable") Then
If lookup.exists(ref) Then
'Remove ith reference to name
Call lookup(sName).Remove(iRef)
'Add references of name to refs
Set lookup(sName) = refs.Concat(lookup(ref))
End If
End If
Next
Next
'Ensure all ranges valid
For iKey = LBound(keys) To UBound(keys)
Set refs = lookup(keys(iKey))
If refs.Length > 0 Then
For iRef = refs.Length To 1 Step -1
If isValidRange(refs.item(iRef)) Then
Call refs.PutItem(iRef, Range(refs.item(iRef)))
Else
Call refs.Remove(iRef)
End If
Next
Else
Call lookup.Remove(keys(iKey))
End If
Next
Set GetNameToRefs = lookup
End Function
Function isValidRange(ByVal sRange As String) As Boolean
On Error GoTo Invalid
isValidRange = Not (Range(sRange) Is Nothing)
Exit Function
Invalid:
isValidRange = False
End Function
Function Serialize(v As Variant) As String
Dim s As String
Select Case VarType(v)
Case VbVarType.vbString
s = """" & Replace(v, """", """""") & """"
s = Replace(s, "£", "£")
Case Else
s = v
End Select
Serialize = s
End Function
Function getFormula(ws As Worksheet) As Range
On Error GoTo NoFormula
If ws.ProtectContents Then
ws.Unprotect "SS"
bFlag = True
End If
Set getFormula = ws.UsedRange.SpecialCells(xlCellTypeFormulas)
If bFlag Then ws.Protect "SS"
NoFormula:
End Function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment