Web Duplicates


A logic test from a job selection process:

Web search engines A and B each crawl a random subset of the same size of the Web. Some of the pages crawled are duplicates – exact textual copies of each other at different URLs. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page. The two random subsets have the same size before duplicate elimination. If, 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index, what fraction of the Web consists of pages that do not have a duplicate?

Solution

Be:

  • C_X the number of elements of X
    • X = { 1, 2, 3, 2 } -> C_X = 4
  • D_X the number of elements of X with a duplicate in X:
    • X = { 1, 2, 3, 2 } -> D_X = 1
  • N_X the number of elements of X without a duplicate in X
    • X = { 1, 2, 3, 2 } -> N_X = 2
  • XY the union of X and Y

  • X1 the set X before eliminating duplicates

  • X2 the set X after eliminating duplicates


Data:

  1. Web search engines A and B each crawl a random subset of the same size of the Web
    • C_A1 + C_B1 = C_W
  2. The two random subsets have the same size before duplicate elimination
    • C_A1 = C_B1
  3. Assume that duplicates are distributed uniformly amongst the pages crawled by A and B
    • D_A1 = D_B1
  4. Further, assume that a duplicate is a page that has exactly two copies – no pages have more than two copies
    • C_A1 = 2 * D_A1 + D_AB + N_A1
    • C_B1 = 2 * D_B1 + D_AB + N_B1
  5. A indexes pages without duplicate elimination whereas B indexes only one copy of each duplicate page
    • C_A2 = C_A1
    • C_B2 = C_B1 – D_B1
  6. 45% of A’s indexed URLs are present in B’s index, while 50% of B’s indexed URLs are present in A’s index
    • D_AB (in B2) = 45/100 * C_A2
    • D_AB (in A2) = 50/100 * C_B2
  7. N_AB / C_W = ?


Example (without taking into account Data:6):

  • A1 = 1 1 3 3 5 5 7 9 11 13 15

  • B1 = 2 2 4 4 6 6 7 9 10 12 14

  • A2 = A1

  • B2 = 2 4 6 7 9 10 12 14


Solution:

  1. N_AB = N_A1 + N_B1 — by definition.

  2. N_A1 = C_A1 – 2 * D_A1 – D_AB — because of Data:4.

  3. N_B1 = C_B1 – 2 * D_B1 – D_AB — because of Data:4.

  4. N_B1 = C_A1 – 2 * D_A1 – D_AB — because of Solution:3 + Data:2 + Data:3

  5. N_AB = 2 * C_A1 – 4 * D_A1 – 2 * D_AB — because of Solution:1 + Solution:2 + Solution:4

  6. C_B2 = C_A1 – D_A1 — because of Data:5 + Data:2 + Data:3

  7. 45/100 * C_A2 = 50/100 * C_B2 — because of Data:6

  8. 45/100 * C_A1 = 50/100 * ( C_B1 – D_B1 ) — because of Solution:7 + Data:5

  9. 45/100 * C_A1 = 50/100 * ( C_A1 – D_A1 ) — because of Data:2 + Data:3

  10. D_A1 = 10/100 * C_A1 — because of Solution:9

  11. N_AB = 200/100 * C_A1 – 4 * 10/100 * C_A1 – 2 * 45/100 * C_A1 — because of Solution:5 + Solution:10 + Data:6 + Data:5

  12. N_AB = 70/100 * C_A1 — because of Solution:11

  13. N_AB = 70/100 * 50/100 * C_W — because of Solution:12 + Data:1 + Data:2

  14. N_AB / C_W === 35% — because of Solution:13

Slices of pizza


A logic test from a job selection process:

Cut slices of a pizza with exactly 8 straight cuts. How many (at most) slices can you get?

Solution

  1. With 0 cuts I get 1 slice, i.e. the whole pizza.

  2. With 1 cut I get 2 slices, wherever I cut the pizza through.

  3. With 2 cuts I get 4 slices, if the second cut crosses the first one at any point (inside the pizza).

  4. With 3 cuts I can make a straight cut that crosses all of the previous cuts at different points (inside the pizza).

    • Each cut defines two slices (one on each side)

    • but N cuts define N + 1 slices

      because N – 1 slices are shared between each cut and the next, thus 2N – (N – 1) == N + 1.

    • I’m crossing all the N = 2 cuts from the previous step, which define N + 1 == 3 slices of the X = 4 slices.

    Thus I now get X – (N + 1) + 2 * (N + 1) slices == X + N + 1 == 4 + 3 == 7 slices.

  5. With 4 cuts, I get 7 + 4 == 11 slices.

  6. With 5 cuts, I get 11 + 5 == 16 slices.

  7. With 6 cuts, I get 16 + 6 == 22 slices.

  8. With 7 cuts, I get 22 + 7 == 29 slices.

  9. With 8 cuts, I get 29 + 8 == 37 slices.

12 Parties


A logic test from a job selection process:

A nation’s people can vote for members of parliament from 12 parties. One voter must cast only one vote for one representative. If a party doesn’t get more than 5% of votes, then it won’t get any chairs in the parliament. How many chairs can get (at most) the party which collects 25% of votes?

Solution

  1. If V1 == 25%, then Sum(Vi, i=2..12) == 75%

  2. The best for P1 would be that the other parties get 5% of votes, so that they lower the 75% but don’t get any chairs.

    • 5% * 11 = 55% < 75%
      11 other parties cannot get 5% each because there are no parties left to absorb the remaining 20% of votes.

    • 5% * 10 = 50% -> V2 = 75% – 50% == 25%
      10 other parties can get 5% each instead because the remaining party can get the remaining 25% of votes.

  3. Only P1 and P2 will share the parliament, and each of them with 50% of the chairs.