Software Architecture

I’ve held many roles in my career, but software architecture is the one I enjoy most, and I’m pretty good at it.

My recipe for architecting a solution is this. First I study the current processes; I often need to feel the pain of working with them daily, so I work, learn, and discover areas for improvement. Then I envision a new architecture that organically fits every piece together, allows simpler processes, and reuses as many building blocks as possible. Finally I design a migration path: a chain of reachable steps to perform one at a time, without disrupting production.

Software architectures don’t last long without alterations, not even the best ones, because companies grow along many dimensions. It’s only a matter of time before those changes go beyond what the architect foresaw, and technical debt ensues.

Software engineers and companies alike need to understand that embracing change (agile, anyone?) includes architecting and re-architecting as often as needed.

Web Duplicates


A logic test from a job selection process:

Web search engines A and B each crawl a random subset of the same size of the Web. Some of the pages crawled are duplicates – exact textual copies of each other at different URLs. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page. The two random subsets have the same size before duplicate elimination. If 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index, what fraction of the Web consists of pages that do not have a duplicate?

Solution

Let:

  • C_X the number of elements of X
    • X = { 1, 2, 3, 2 } -> C_X = 4
  • D_X the number of duplicate pairs in X (two copies of the same page)
    • X = { 1, 2, 3, 2 } -> D_X = 1
  • N_X the number of elements of X without a duplicate in X
    • X = { 1, 2, 3, 2 } -> N_X = 2
  • XY the union of X and Y; D_AB denotes only the cross pairs (pages crawled by A whose copy was crawled by B), and N_AB the pages with no duplicate anywhere
  • X1 the set X before eliminating duplicates
  • X2 the set X after eliminating duplicates
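The counting functions above can be sketched in Python (the names c_x, d_x, n_x are mine, chosen to mirror the notation; crawls are modelled as lists where a repeated value means two copies of the same page):

```python
from collections import Counter

def c_x(x):
    """C_X: the number of elements of the multiset X."""
    return len(x)

def d_x(x):
    """D_X: the number of duplicate pairs in X (values appearing twice)."""
    return sum(1 for count in Counter(x).values() if count == 2)

def n_x(x):
    """N_X: the number of elements of X without a duplicate in X."""
    return sum(1 for count in Counter(x).values() if count == 1)

x = [1, 2, 3, 2]
print(c_x(x), d_x(x), n_x(x))  # 4 1 2
```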


Data:

  1. Web search engines A and B each crawl a random subset of the same size of the Web
    • C_A1 + C_B1 = C_W (reading this as: the two crawls together cover the whole Web, with no URL crawled by both)
  2. The two random subsets have the same size before duplicate elimination
    • C_A1 = C_B1
  3. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B
    • D_A1 = D_B1
  4. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies
    • C_A1 = 2 * D_A1 + D_AB + N_A1
    • C_B1 = 2 * D_B1 + D_AB + N_B1
  5. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page
    • C_A2 = C_A1
    • C_B2 = C_B1 – D_B1
  6. 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index
    • D_AB = 45/100 * C_A2 (A’s indexed URLs whose copy is found in B’s index)
    • D_AB = 50/100 * C_B2 (B’s indexed URLs whose copy is found in A’s index)
  7. N_AB / C_W = ?


Example (without taking into account Data:6):

  • A1 = 1 1 3 3 5 5 7 9 11 13 15

  • B1 = 2 2 4 4 6 6 7 9 10 12 14

  • A2 = A1

  • B2 = 2 4 6 7 9 10 12 14
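Plugging the example into the definitions, a sketch (repeated values are duplicate pairs within a crawl; a value present once in each crawl is a cross pair):

```python
from collections import Counter

a1 = [1, 1, 3, 3, 5, 5, 7, 9, 11, 13, 15]
b1 = [2, 2, 4, 4, 6, 6, 7, 9, 10, 12, 14]
ca, cb = Counter(a1), Counter(b1)

# Duplicate pairs within each crawl
d_a1 = sum(1 for c in ca.values() if c == 2)  # 3 (the pairs 1, 3, 5)
d_b1 = sum(1 for c in cb.values() if c == 2)  # 3 (the pairs 2, 4, 6)

# Cross pairs: one copy in A, the other in B
d_ab = sum(1 for v in ca if ca[v] == 1 and v in cb)  # 2 (7 and 9)

# Pages of A with no duplicate anywhere
n_a1 = sum(1 for v in ca if ca[v] == 1 and v not in cb)  # 3 (11, 13, 15)
assert len(a1) == 2 * d_a1 + d_ab + n_a1  # Data:4 holds

# B2: B after duplicate elimination
b2 = sorted(set(b1))
print(d_a1, d_b1, d_ab, len(b2))  # 3 3 2 8
```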


Solution:

  1. N_AB = N_A1 + N_B1 — by definition.

  2. N_A1 = C_A1 – 2 * D_A1 – D_AB — because of Data:4.

  3. N_B1 = C_B1 – 2 * D_B1 – D_AB — because of Data:4.

  4. N_B1 = C_A1 – 2 * D_A1 – D_AB — because of Solution:3 + Data:2 + Data:3

  5. N_AB = 2 * C_A1 – 4 * D_A1 – 2 * D_AB — because of Solution:1 + Solution:2 + Solution:4

  6. C_B2 = C_A1 – D_A1 — because of Data:5 + Data:2 + Data:3

  7. 45/100 * C_A2 = 50/100 * C_B2 — because of Data:6

  8. 45/100 * C_A1 = 50/100 * ( C_B1 – D_B1 ) — because of Solution:7 + Data:5

  9. 45/100 * C_A1 = 50/100 * ( C_A1 – D_A1 ) — because of Data:2 + Data:3

  10. D_A1 = 10/100 * C_A1 — because of Solution:9

  11. N_AB = 200/100 * C_A1 – 4 * 10/100 * C_A1 – 2 * 45/100 * C_A1 — because of Solution:5 + Solution:10 + Data:6 + Data:5

  12. N_AB = 70/100 * C_A1 — because of Solution:11

  13. N_AB = 70/100 * 50/100 * C_W — because of Solution:12 + Data:1 + Data:2

  14. N_AB / C_W = 35% — because of Solution:13
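The derivation can be checked numerically. A sketch: pick a convenient size for A’s crawl, fill in the quantities the derivation fixes, and measure the fraction directly:

```python
# Pick C_A1 = 100, so C_W = 200 (Data:1 and Data:2).
c_a1 = 100
d_a1 = d_b1 = 10 * c_a1 // 100   # Solution:10
d_ab = 45 * c_a1 // 100          # Data:6, with C_A2 = C_A1 (Data:5)
n_a1 = c_a1 - 2 * d_a1 - d_ab    # Data:4 -> 35
n_b1 = n_a1                      # symmetry (Data:2 + Data:3)

# Cross-check the second percentage of Data:6
c_b2 = c_a1 - d_b1               # Data:5 -> 90
assert d_ab == 50 * c_b2 // 100  # 45 == 45

fraction = (n_a1 + n_b1) / (2 * c_a1)  # N_AB / C_W
print(fraction)  # 0.35
```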

Slices of pizza


A logic test from a job selection process:

Cut a pizza with exactly 8 straight cuts. How many slices (at most) can you get?

Solution

  1. With 0 cuts I get 1 slice, i.e. the whole pizza.

  2. With 1 cut I get 2 slices, wherever I cut the pizza through.

  3. With 2 cuts I get 4 slices, if the second cut crosses the first one at any point (inside the pizza).

  4. With 3 cuts I can make a straight cut that crosses all of the previous cuts at different points (inside the pizza).

    • The new cut crosses the N previous cuts at N distinct points (inside the pizza), which divide the new cut into N + 1 segments.

    • Each segment runs through exactly one existing slice and splits it in two, so a cut that crosses N previous cuts adds N + 1 slices.

    • Here I’m crossing all the N = 2 cuts from the previous step, so the new cut traverses N + 1 == 3 of the X = 4 existing slices.

    Thus I now get X – (N + 1) + 2 * (N + 1) slices == X + N + 1 == 4 + 3 == 7 slices.

  5. With 4 cuts, I get 7 + 4 == 11 slices.

  6. With 5 cuts, I get 11 + 5 == 16 slices.

  7. With 6 cuts, I get 16 + 6 == 22 slices.

  8. With 7 cuts, I get 22 + 7 == 29 slices.

  9. With 8 cuts, I get 29 + 8 == 37 slices.
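The step-by-step count above follows the recurrence slices(N) == slices(N – 1) + N, whose closed form is 1 + N * (N + 1) / 2 (the “lazy caterer” sequence). A quick sketch:

```python
def max_slices(cuts):
    """Maximum number of pizza slices with the given number of straight
    cuts: the n-th cut crosses all n - 1 previous cuts, adding n slices."""
    slices = 1  # 0 cuts: the whole pizza
    for n in range(1, cuts + 1):
        slices += n
    return slices

for n in range(9):
    print(n, max_slices(n))  # ends with: 8 37

assert max_slices(8) == 1 + 8 * 9 // 2 == 37  # matches the closed form
```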