2014 Round Two

I had a hard time with this round, despite using nothing fancier than lists, arrays, and trees.

Data Packing

I first solved this with C, using a sort of greedy algorithm. I started from the biggest remaining file, and searched for the largest remaining file that could be paired with it; if none existed then we write the big file to a disc by itself. I stored the number of files of each size in an array, which I used to handle several cases at a time.

Later I realized we can do something simpler: pair the biggest remaining file with the smallest remaining file if their combined size fits on a single disc, otherwise we put the large file on a disc by itself. [Exercise: why does this also work?]

This algorithm works nicely with a doubly linked list. We start a pointer at either end then step them towards each other until they cross paths.

We approximate this in Haskell with two lists: the original list, and its reverse. A troublesome case arises when all remaining files have the same size. In a real doubly linked list, we can just wait until one pointer reaches the other, but our code instead manipulates the lengths of the lists to calculate r, the number of remaining files.

Then if two of them can fit on a single disc, we need r/2 discs when r is even, and (r+1)/2 when r is odd. We calculate this with the unorthodox uncurry (+) $ divMod r 2 to flex our functional programming muscles. Otherwise, if we can only fit one file per disc, we clearly need r discs.

import Jam
import Data.List

main = jam $ do
  [_, x] <- getints
  ns <- sort <$> getints
  let
    f aas@(a:as) bbs@(b:bs) acc
      | a > b = acc
      | a == b = acc + let r = length aas - (length ns - length bbs) in
        if 2 * a > x then r else uncurry (+) $ divMod r 2
      | a + b <= x = f as bs (acc + 1)
      | otherwise  = f (a:as) bs (acc + 1)
  return $ show $ f ns (reverse ns) 0

Up and Down

We can brute force the small case: we print the minimum number of inversions over all permutations that result in an up-and-down sequence.

Brute force is often the best first move: we can code it quickly to solve the small case, and score some points. It takes longer to design and code a clever algorithm, and even if we think of one immediately, it is good to have a simple brute force program to compare against to catch bugs.

import Jam
import Data.List

main = jam $ do
  [n] <- getints
  as <- getints
  return $ show $ minimum [invs p | p <- permutations [0..n-1],
    isUD $ map (as!!) p]

isUD (x:y:xs) | x < y     = isUD (y:xs)
              | otherwise = isD (y:xs)
isUD _ = True

isD (x:y:xs)  | x > y     = isD (y:xs)
              | otherwise = False
isD _ = True

invs []     = 0
invs (x:xs) = length (filter (x>) xs) + invs xs

Admittedly, this program cuts it close on slower machines. If we happen to be competing with an older computer, we may be better off with C.

How about the large case? Our first instinct may be to figure out what to do with the maximum and work our way down the other numbers. However, there are some cases where it’s unclear where we should move the maximum number.

The trick is to consider the opposite: start from the smallest number and work our way up. The smallest number must wind up on the extreme left or right. We choose the closer extreme. Then we can ignore this number and recurse on the remainder of the list.

But this means when working on a given number i, we ignore all numbers less than i, so we can compute the answer without actually moving anything. For each number i, count the number of element larger than i on its right and on its left, then add the minimum of the two to a running sum. We use an array instead of a list for constant time lookups.

import Jam
import Data.Array

main = jam $ do
  [n] <- getints
  a <- listArray (1, n) <$> getints
  return $ show $ sum [min (f [1..i]) (f [i..n]) | i <- [1..n],
    let f = length . filter (\j -> a!i < a!j)]

A reminder to read problems carefully: the first time around, my eye skipped over "adjacent" when reading the problem, despite the word being bold and italicized. The examples work either way. As a result, I wasted time coding a solution to the wrong problem.

Don’t Break The Nile

This is a maximum flow problem, but thankfully one with enough constraints that we can use a simple algorithm: we start at the south of the river, at the leftmost unblocked square. Then we try to reach the north side using the left-hand rule for solving mazes (allowing our path to intersect itself).

If we make it, we increment a counter, and mark all the squares we traveled as blocked, otherwise we leave everything alone. We repeat this for every unblocked square on the south side, going left to right. The final value of the counter is our answer.

Why does this work? Here’s a rough explanation.

Suppose we’re wrong, that is, there is some way of obtaining more flow in the river. Then consider the path P that starts from the leftmost square X that is not the start of a path in our solution. Somehow our algorithm failed to find it, which could only happen if P conflicts with a path Q starting from some square Y to the left of X. (The left-hand rule always finds a path across the river if one exists.)

By definition, the correct solution has some path Q' also starting from Y. Since we use the left-hand rule, inductively, each step of Q' must be a step of Q, or strictly to the right of a step of Q. This implies each step of P must be strictly to the right of each step of Q, which means our algorithm would have found some path starting from X. This is a contradiction.

It’s easier to code a left-hand maze solver if we proceed one tile at a time. The small input case is small enough that we can store the map of the river in a two-dimensional array:

import Jam
import Data.Array
import Data.List
import Data.Maybe

main = jam $ do
  [w, h, b] <- getints
  bs <- map (\[x0, y0, x1, y1] -> ((x0, y0), (x1, y1))) <$> getintsn b
  -- Place 4 buildings to simulate a border around the river.
  return $ solve w h $ bs
    ++ [((0, y), (w-1, y)) | y <- [-1, h]]
    ++ [((x, 0), (x, h-1)) | x <- [-1, w]]

solve :: Int -> Int -> [((Int, Int), (Int, Int))] -> String
solve w h bs = show $ let
  nile = listArray ((-1, -1), (w, h)) (repeat '.')
    // [(i, 'X') | bnds <- bs, i <- range bnds]
  -- Finds paths from (x, 0) for x <- [n..w-1].
  --   a: records squares blocked by buildings or paths found so far.
  -- acc: the number of paths found so far.
  f n a acc = if w == n then acc
    -- Prints a pretty picture of the river:
    --   unlines $ "" : [[a!(x,y) | x <- [-1..w]]| y <- [h,h-1..(-1)]]
   else let
    -- Follow the left-hand rule to find a path.
    start = (n, 0)
    try dir path@((x, y):_) = if y == h - 1 then Just path else let
      d = find (\(dx, dy) -> a!(x+dx, y+dy) == '.') $ take 4 $ iterate cw dir
      in if isNothing d then Nothing else let
        Just (dx, dy) = d
        i = (x + dx, y + dy)
        in if i == start then Nothing else try (ccw (dx, dy)) (i : path)
    -- We start from (n, 0) facing left. (Our left hand is touching the
    -- south side of the river.)
    p = try (-1, 0) [start]
    in if a!start == '.' && isJust p then
      -- If the start square is free and we find a path, then record the path
      -- on our array and increment the count of paths found so far.
      f (n + 1) (a // map (\x -> (x, head $ show n)) (fromJust p)) (acc + 1)
    else f (n + 1) a acc
  in f 0 nile 0

ccw (x, 0) = (0, x)
ccw (0, y) = (-y, 0)
cw (x, 0) = (0, -x)
cw (0, y) = (y, 0)

The much longer river in the large input means stepping one tile at a time is infeasible. Instead, we must compute entire line segments at a time. Given a direction and a building our left hand is touching, we try every building in turn and see if it intersects before we reach the corner of the touched building.

We need a couple of optimizations. Firstly, instead of marking every path on the map as a new obstacle, we only remember the last path found: since we’re working left to right using the left-hand rule, the only old path we can touch is the previously found path. Secondly, we coalesce consecutive path segments going in the same direction into one long path (cull).

We would benefit from sorting the coordinates in a tree, and removing buildings touched by previous paths, but it turns out our code is adequate.

import Jam
import Data.Array
import Data.List

main = jam $ do
  [w, h, b] <- getints
  bs <- map (\[x0, y0, x1, y1] -> ((x0, y0), (x1, y1))) <$> getintsn b
  -- Place 4 buildings to simulate a border around the river.
  return $ solve w h $ bs
    ++ [((0, y), (w-1, y)) | y <- [-1, h]]
    ++ [((x, 0), (x, h-1)) | x <- [-1, w]]

solve :: Int -> Int -> [((Int, Int), (Int, Int))] -> String

ccw (x, 0) = (0, x)
ccw (0, y) = (-y, 0)
cw (x, 0) = (0, -x)
cw (0, y) = (y, 0)

isLoop (_, d, (_, y)) = y == 0 && d == (-1, 0)
isOpen i bs = not $ or [inRange bnds i | bnds <- bs]

solve w h bs = let
  walk box@((a0, b0), (a1, b1)) d path@((x, y):_) bs = if y == h - 1
    then path else let
    next@(box1, d1, sq1) = case d of
      (-1, 0) -> foldl' (\a@(_, _, (x', _)) b@((x0, y0), (x1, y1)) ->
        if inRange ((x', y0), (x, y1)) (x1, y) then (b, cw d, (x1 + 1, y))
        else a) (box, ccw d, (a0 - 1, y)) bs
      (1, 0) -> foldl' (\a@(_, _, (x', _)) b@((x0, y0), (x1, y1)) ->
        if inRange ((x, y0), (x', y1)) (x0, y) then (b, cw d, (x0 - 1, y))
        else a) (box, ccw d, (a1 + 1, y)) bs
      (0, -1) -> foldl' (\a@(_, _, (_, y')) b@((x0, y0), (x1, y1)) ->
        if inRange ((x0, y'), (x1, y)) (x, y1) then (b, cw d, (x, y1 + 1))
        else a) (box, ccw d, (x, b0 - 1)) bs
      (0, 1) -> foldl' (\a@(_, _, (_, y')) b@((x0, y0), (x1, y1)) ->
        if inRange ((x0, y), (x1, y')) (x, y0) then (b, cw d, (x, y0 - 1))
        else a) (box, ccw d, (x, b1 + 1)) bs
    in if isLoop next then [] else walk box1 d1 (sq1:path) bs

  f n acc lst = if n == w then acc else let
    bs1 = lst ++ bs

    cull (a0:a1:a2:as) = let eqf f = f a0 == f a1 && f a1 == f a2
      in if eqf fst || eqf snd then cull (a0:a2:as) else a0 : cull (a1:a2:as)
    cull xs = xs

    ps = cull $ walk ((0, -1), (w-1, -1)) (-1, 0) [(n, 0)] bs1

    lst1 = zipWith (\(a, b) (c, d)
      -> ((min a c, min b d), (max a c, max b d))) ps (tail ps)
    found = isOpen (n, 0) bs1 && not (null ps)
    in f (n + 1) (acc + fromEnum found) (if found then lst1 else lst)

  in show $ f 0 0 []

The first time around, I made a mistake that would have been catastrophic in a contest: I thought I could make one pass over the path and remove adjacent entries with identical coordinates, then make a second pass to remove adjacent entries facing identical directions. However, this fails in some cases, and is anyway more complicated then our above cull function.

I also cheated in the sense that I had to try an initial version of my program on the large input before realizing more optimizations were needed. I should be generating my own large test cases to do this.

Trie Sharding

The small input is amenable to brute force. Our first sub-problem is to generate all partitions of a set into N pieces. The number of such partitions is given by Stirling numbers of the second kind, and we enumerate all partitions using the same ideas underpinning a well-known recurrence for these numbers.

Namely, to partition M elements into N sets, we can either put the Mth element in a singleton set and partition the remaining M - 1 elements in to N - 1 sets, or partition the remaining M - 1 elements in to N sets and add the Mth element to one of them.

For each partition, we reuse the ins function from our second solution to File-Fix-It to insert each of the N subsets into a tree, and count the number of created nodes. We then find the maximum of these numbers, along with the number of times the maximum occurred.

For convenience, we place a newline character in the root node, which is safe because it never appears in an input string.

import Jam
import Data.List (foldl')
import Data.Tree
import Data.Tree.Zipper

main = jam $ do
  [m, n] <- getints
  ms <- getsn m
  let
    (a, b) = maxTally $ map (sum . map
      (fst . foldl ins (1, fromTree $ Node '\n' []))) $ g ms n
  return $ show a ++ " " ++ show (b * product [1..n])

-- Returns maximum element of a list along with the number of times it appears.
--   maxTally [1, 2, 3, 2, 3] = (3, 2)
maxTally (x:xs) = foldl' (\(m, n) x -> case compare m x of
  GT -> (m, n)
  EQ -> (m, n + 1)
  LT  -> (x, 1)) (x, 1) xs

-- Generates all possible splits of a list into a given number of pieces.
--   g "abc" 2 = [["a","bc"],["ab","c"],["b","ac"]]
g     [] 0 = [[]]
g      _ 0 = []
g     [] _ = []
g (m:ms) n = map ([m]:) (g ms (n-1)) ++ concatMap (aug m) (g ms n)

--  aug 'a' ["bc", "d"] = [["abc","d"],["bc","ad"]]
aug m [xs] = [[m:xs]]
aug m (xs:xss) = ((m:xs):xss) : map (xs:) (aug m xss)

ins (n, tp) [] = (n, root tp)
ins (n, tp) (a:s) = let
  g Nothing = ins (n + 1, insert (Node a []) $ children tp) s
  g (Just tp) = if label tp == a then ins (n, tp) s else g (next tp)
  in g $ firstChild tp

We’ll need a much cleverer approach for the large input.

The worst-case number of nodes is relatively easy. If X strings share a certain prefix, then in the worst case, those strings must be sprayed across as many servers as possible: if X is less than N, they must be assigned to X different servers; otherwise, they are assigned them to N different servers. [Suppose not, that is, suppose in some worst-case scenario, there exists a server with at least two strings with the prefix, as well as a server which completely lacks such strings. Then what happens when we move one of those strings from one server to the other?]

Thus we insert all the strings into a single trie, and compute the worst-case number of nodes with a straightforward recursion (worst). There’s one subtlety: we must distinguish between, for example, a tree containing just "aaa", and a tree containing both "aa" and "aaa".

We do so by artificially appending a newline character to end of each input string. Since this character never appears within a string, it gives us a one-to-one correspondence between the input strings and the leaf nodes of the trie.

The product of the upper bounds (100, 100, 1000) gives us a crude upper bound for the worst-case number of nodes, so we know we can avoid modular arithmetic for now.

Finding the number of worst-case scenarios is the hardest part of this problem.

Consider an internal node P. Suppose one of its children Q is the root of a subtree with at least N leaf nodes. From above, we know that the leaves of this subtree must be sprayed across all N servers somehow or other, which implies that the node P must be created on all N servers.

Then suppose another child of P, say R, has r leaf nodes where r < N. We must assign the r leaf nodes to distinct servers so that R is duplicated as much as possible, but there are no other conditions because P is already everywhere. Thus if tally(x) is the number of worst-case assignments for the trie rooted at the node x, and exts(x) is the number of leaf nodes descended from the node x, then:

tally(P) = product [tally(Q) | Q <- children(P), exts(Q) >= N]
   * product [fall N exts(Q) | Q <- children(P), exts(Q) < N]

where fall is the falling factorial.

What if every child of P has fewer than N descendent leaf nodes? Since we know M is at least as large as N, and since we only need to call tally() on nodes with at least N descendent leaf nodes, we may assume P has at least N descendent leaf nodes spread among its children.

Our final sub-problem, then, is to count the number of ways we can distribute these leaf nodes among the N servers so that firstly, each server contains at least one such leaf node, and secondly, for every child Q of P, every descendent leaf node of Q lies on a distinct server.

Let xs = [x0, …​, xk] be the numbers of descendent leaf nodes of the children of P, and let m be their maximum (m = maximum xs). Let M be one of possibly several children of P with m leaf nodes. Let f(i) be the number of ways to distribute these leaf nodes to satisfy the two conditions on i servers, that is, each of the i servers has at least one leaf node, and the leaf nodes belonging to the same child of P are on distinct servers.

Trivially, f(i) = 0 for i < m, because there’s too few servers to distribute the leaf nodes of M so they end up in different places.

We can compute f(m) easily: for each child Q of P, we just count the number of ways its descendent leaf nodes can wind up on different servers:

f m = product [fall m x | x <- xs]

This is because distributing the leaf nodes of M to distinct servers automatically ensures each server has at least one leaf node.

More generally, we can compute f(m + k) recursively. As before, we start by counting the number of ways the descendent leaf nodes of each child of P can wind up on different servers:

product [fall (m + k) x | x <- xs]

Then we subtract the cases where some servers miss out on at least one leaf node. Suppose we know f(i) for i < m + k. Then for each such i, we subtract the number of ways of picking i servers from the m + k multiplied by f(i), that is, the number of ways we can distribute the leaf nodes among these i servers so that the above conditions are satisfied:

f (m + k) = product [fall (m + k) x | x <- xs]
  - sum [ch (m + k) i * f i | i <- [m..m+k-1]

where ch is the n-choose-k function.

There’s a few other details. We precompute n choose k and the kth falling factorial power of n for n and k between 0 and 100, and store them in arrays. Our arithmetic operations are performed modulo the number specified in the problem.

import Jam
import Data.Array
import Data.List (foldl', foldl1')
import Data.Tree
import Data.Tree.Zipper

mo = (`mod` 1000000007)
momul x y = mo (x * y)

main = jam $ do
  [m, n] <- getints
  ms <- getsn m
  return $ solve ms n

solve :: [String] -> Int -> String
solve ms n = let
  c = exts . toTree . snd $ foldl ins (1, fromTree $ Node '\n' []) $
    map (++ "\n") ms
  in show (worst n c) ++ " " ++ show (tally n c)

-- Given a tree, returns a tree of the same shape where each node contains the
-- number of external nodes descended from that node.
-- In the context of a trie, this is the number of strings with a given prefix.
exts (Node _ []) = Node 1 []
exts (Node _ xs) = let ns = map exts xs in Node (sum $ map rootLabel ns) ns

worst :: Int -> Tree Int -> Int
worst _ (Node _ []) = 0
worst n (Node x xs) = min n x + sum (map (worst n) xs)

fallA = array ((0, 0), (100, 100)) ([((n, 0), 1) | n <- [0..100]]
  ++ [((n, k), mo (n * fallA!(n - 1, k - 1))) | n <- [0..100], k <- [1..100]])

fall n k = fallA!(n, k)

chA = array ((0, 0), (100, 100)) ([((n, 0), 1) | n <- [0..100]]
  ++ [((n, k), mo (chA!(n - 1, k - 1)
   + if k > n - 1 then 0 else chA!(n - 1, k))) | n <- [0..100], k <- [1..100]])

ch n k = chA!(n, k)

tally :: Int -> Tree Int -> Int
tally 1 _ = 1
tally n (Node _ xs) = let
  kids = map rootLabel xs
  m = maximum kids
  a = [mo $ foldl1' momul [fall (m+k) i | i <- kids] - sum (
    zipWith momul [ch (m+k) j | j <- [k,k-1..1]] a ) | k <- [0..n - m]]
  in if m >= n then foldl1' momul [if rootLabel x >= n then
    tally n x else fall n (rootLabel x) | x <- xs]
  else mo $ Prelude.last a

ins (n, tp) [] = (n, root tp)
ins (n, tp) (a:s) = let
  g Nothing = ins (n + 1, insert (Node a []) $ children tp) s
  g (Just tp) = if label tp == a then ins (n, tp) s else g (next tp)
  in g $ firstChild tp

I made several mistakes that would probably have killed my chances in a real contest. I missed the subtlety with, for example, inputs containing "aa" and "aaa", but luckily this problem was exposed when comparing my second solution against the brute force solution.

More serious was my first attempt at computing tally. I had an inefficient recursion that in some sense was recomputing the same quantities over and over.


Ben Lynn blynn@cs.stanford.edu 💡