# Ocaml find ## 99 Problems (solved) in OCaml

Many of the solutions below have been written by Victor Nicollet. Please contribute more solutions or improve the existing ones.

This section is inspired by Ninety-Nine Lisp Problems which in turn was based on “Prolog problem list”. For each of these questions, some simple tests are shown—they may also serve to make the question clearer if needed. To work on these problems, we recommend you first install OCaml or use it inside your browser. The source of the following problems is available on GitHub.

### Working with lists

#### 3. Find the K'th element of a list. (easy)

REMARK: OCaml has which numbers elements from and raises an exception if the index is out of bounds.

#### 4. Find the number of elements of a list. (easy)

OCaml standard library has but we ask that you reimplement it. Bonus for a tail recursive solution.

#### 5. Reverse a list. (easy)

OCaml standard library has but we ask that you reimplement it.

#### 6. Find out whether a list is a palindrome. (easy)

HINT: a palindrome is its own reverse.

#### 10. Run-length encoding of a list. (easy)

An alternative solution, which is shorter but requires more memory, is to use the pack function declared above:

Here is an example:

#### 11. Modified run-length encoding. (easy)

Modify the result of the previous problem in such a way that if an element has no duplicates it is simply copied into the result list. Only elements with duplicates are transferred as (N E) lists.

Since OCaml lists are homogeneous, one needs to define a type to hold both single elements and sub-lists.

#### 12. Decode a run-length encoded list. (medium)

Given a run-length code list generated as specified in the previous problem, construct its uncompressed version.

#### 13. Run-length encoding of a list (direct solution). (medium)

Implement the so-called run-length encoding data compression method directly. I.e. don't explicitly create the sublists containing the duplicates, as in problem "Pack consecutive duplicates of list elements into sublists", but only count them. As in problem "Modified run-length encoding", simplify the result list by replacing the singleton lists (1 X) by X.

#### 14. Duplicate the elements of a list. (easy)

Remark: this function is not tail recursive. Can you modify it so it becomes so?

#### 15. Replicate the elements of a list a given number of times. (medium)

Note that is needed only because we want to be tail recursive.

#### 17. Split a list into two parts; the length of the first part is given. (easy)

If the length of the first part is longer than the entire list, then the first part is the list and the second part is empty.

Given two indices, and , the slice is the list containing the elements between the 'th and 'th element of the original list (both limits included). Start counting the elements with 0 (this is the way the module numbers elements).

This solution has a drawback, namely that the function is not tail recursive so it may exhaust the stack when given a very long list. You may also notice that the structure of and is similar and you may want to abstract their common skeleton in a single function. Here is a solution.

#### 20. Remove the K'th element from a list. (easy)

The first element of the list is numbered 0, the second 1,...

#### 21. Insert an element at a given position into a list. (easy)

Start counting list elements with 0. If the position is larger or equal to the length of the list, insert the element at the end. (The behavior is unspecified if the position is negative.)

#### 22. Create a list containing all integers within a given range. (easy)

If first argument is greater than second, produce a list in decreasing order.

A tail recursive implementation:

#### 23. Extract a given number of randomly selected elements from a list. (medium)

The selected items shall be returned in a list. We use the module but do not initialize it with for reproducibility.

#### 24. Lotto: Draw N different random numbers from the set 1..M. (easy)

The selected numbers shall be returned in a list.

#### 26. Generate the combinations of K distinct objects chosen from the N elements of a list. (medium)

In how many ways can a committee of 3 be chosen from a group of 12 people? We all know that there are C(12,3) = 220 possibilities (C(N,K) denotes the well-known binomial coefficients). For pure mathematicians, this result may be great. But we want to really generate all the possibilities in a list.

#### 27. Group the elements of a set into disjoint subsets. (medium)

1. In how many ways can a group of 9 people work in 3 disjoint subgroups of 2, 3 and 4 persons? Write a function that generates all the possibilities and returns them in a list.
2. Generalize the above function in a way that we can specify a list of group sizes and the function will return a list of groups.

#### 28. Sorting a list of lists according to length of sublists. (medium)

1. We suppose that a list contains elements that are lists themselves. The objective is to sort the elements of this list according to their length. E.g. short lists first, longer lists later, or vice versa.

2. Again, we suppose that a list contains elements that are lists themselves. But this time the objective is to sort the elements of this list according to their length frequency; i.e., in the default, where sorting is done ascendingly, lists with rare lengths are placed first, others with a more frequent length come later.

### Arithmetic

#### 31. Determine whether a given integer number is prime. (medium)

Recall that divides iff . This is a naive solution. See the Sieve of Eratosthenes for a more clever one.

#### 32. Determine the greatest common divisor of two positive integer numbers. (medium)

Use Euclid's algorithm.

#### 33. Determine whether two positive integer numbers are coprime. (easy)

Two numbers are coprime if their greatest common divisor equals 1.

#### 34. Calculate Euler's totient function φ(m). (medium)

Euler's so-called totient function φ(m) is defined as the number of positive integers r (1 ≤ r < m) that are coprime to m. We let φ(1) = 1.

Find out what the value of φ(m) is if m is a prime number. Euler's totient function plays an important role in one of the most widely used public key cryptography methods (RSA). In this exercise you should use the most primitive method to calculate this function (there are smarter ways that we shall discuss later).

#### 35. Determine the prime factors of a given positive integer. (medium)

Construct a flat list containing the prime factors in ascending order.

#### 36. Determine the prime factors of a given positive integer (2). (medium)

Construct a list containing the prime factors and their multiplicity. Hint: The problem is similar to problem Run-length encoding of a list (direct solution).

#### 37. Calculate Euler's totient function φ(m) (improved). (medium)

See problem "Calculate Euler's totient function φ(m)" for the definition of Euler's totient function. If the list of the prime factors of a number m is known in the form of the previous problem then the function phi(m) can be efficiently calculated as follows: Let be the list of prime factors (and their multiplicities) of a given number m. Then φ(m) can be calculated with the following formula:

φ(m) = (p1 - 1) × p1m1 - 1 × (p2 - 1) × p2m2 - 1 × (p3 - 1) × p3m3 - 1 × ⋯

#### 38. Compare the two methods of calculating Euler's totient function. (easy)

Use the solutions of problems "Calculate Euler's totient function φ(m)" and "Calculate Euler's totient function φ(m) (improved)" to compare the algorithms. Take the number of logical inferences as a measure for efficiency. Try to calculate φ(10090) as an example.

#### 39. A list of prime numbers. (easy)

Given a range of integers by its lower and upper limit, construct a list of all prime numbers in that range.

#### 40. Goldbach's conjecture. (medium)

Goldbach's conjecture says that every positive even number greater than 2 is the sum of two prime numbers. Example: 28 = 5 + 23. It is one of the most famous facts in number theory that has not been proved to be correct in the general case. It has been numerically confirmed up to very large numbers. Write a function to find the two prime numbers that sum up to a given even integer.

#### 41. A list of Goldbach compositions. (medium)

Given a range of integers by its lower and upper limit, print a list of all even numbers and their Goldbach composition.

In most cases, if an even number is written as the sum of two prime numbers, one of them is very small. Very rarely, the primes are both bigger than say 50. Try to find out how many such cases there are in the range 2..3000.

### Logic and Codes

Let us define a small "language" for boolean expressions containing variables:

A logical expression in two variables can then be written in prefix notation. For example, is written:

#### 46 & 47. Truth tables for logical expressions (2 variables). (medium)

Define a function, which returns the truth table of a given logical expression in two variables (specified as arguments). The return value must be a list of triples containing .

#### 48. Truth tables for logical expressions. (medium)

Generalize the previous problem in such a way that the logical expression may contain any number of logical variables. Define in a way that returns the truth table for the expression , which contains the logical variables enumerated in .

#### 49. Gray code. (medium)

An n-bit Gray code is a sequence of n-bit strings constructed according to certain rules. For example,

Find out the construction rules and write a function with the following specification: returns the -bit Gray code.

#### 50. Huffman code (hard)

First of all, consult a good book on discrete mathematics or algorithms for a detailed description of Huffman codes (you can start with the Wikipedia page)!

We consider a set of symbols with their frequencies. For example, if the alphabet is ,..., (represented as the positions 0,...5) and respective frequencies are 45, 13, 12, 16, 9, 5:

Our objective is to construct the Huffman code word for all symbols . In our example, the result could be (or ). The task shall be performed by the function defined as follows: returns the Huffman code table for the frequency table ### Binary Trees

A binary tree is either empty or it is composed of a root element and two successors, which are binary trees themselves.

In OCaml, one can define a new type that carries an arbitrary value of type (thus is polymorphic) at each node.

An example of tree carrying data is:

In OCaml, the strict type discipline guarantees that, if you get a value of type , then it must have been created with the two constructors and .

#### 55. Construct completely balanced binary trees. (medium)

In a completely balanced binary tree, the following property holds for every node: The number of nodes in its left subtree and the number of nodes in its right subtree are almost equal, which means their difference is not greater than one.

Write a function to construct completely balanced binary trees for a given number of nodes. The function should generate all solutions via backtracking. Put the letter as information into all nodes of the tree.

#### 56. Symmetric binary trees. (medium)

Let us call a binary tree symmetric if you can draw a vertical line through the root node and then the right subtree is the mirror image of the left subtree. Write a function to check whether a given binary tree is symmetric.

Hint: Write a function first to check whether one tree is the mirror image of another. We are only interested in the structure, not in the contents of the nodes.

#### 57. Binary search trees (dictionaries). (medium)

Construct a binary search tree from a list of integer numbers.

Then use this function to test the solution of the previous problem.

Apply the generate-and-test paradigm to construct all symmetric, completely balanced binary trees with a given number of nodes.

How many such trees are there with 57 nodes? Investigate about how many solutions there are for a given number of nodes? What if the number is even? Write an appropriate function.

#### 59. Construct height-balanced binary trees. (medium)

In a height-balanced binary tree, the following property holds for every node: The height of its left subtree and the height of its right subtree are almost equal, which means their difference is not greater than one.

Write a function to construct height-balanced binary trees for a given height. The function should generate all solutions via backtracking. Put the letter as information into all nodes of the tree.

The function is defined in the solution of Construct completely balanced binary trees.

#### 60. Construct height-balanced binary trees with a given number of nodes. (medium)

Consider a height-balanced binary tree of height . What is the maximum number of nodes it can contain? Clearly, max_nodes = 2 - 1.

However, what is the minimum number min_nodes? This question is more difficult. Try to find a recursive statement and turn it into a function defined as follows: returns the minimum number of nodes in a height-balanced binary tree of height .

The following solution comes directly from translating the question.

It is not the more efficient one however. One should use the last two values as the state to avoid the double recursion.

It is not difficult to show that = Fh+2‌ - 1, where (Fn) is the Fibonacci sequence.

On the other hand, we might ask: what are the minimum (resp. maximum) height H a height-balanced binary tree with N nodes can have? (resp. ) returns the minimum (resp. maximum) height of a height-balanced binary tree with nodes.

Inverting the formula max_nodes = 2 - 1, one directly find that Hₘᵢₙ(n) = ⌈log₂(n+1)⌉ which is readily implemented:

Let us give a proof that the formula for Hₘᵢₙ is valid. First, if h = n, there exists a height-balanced tree of height h with n nodes. Thus 2ʰ - 1 = ≥ n i.e., h ≥ log₂(n+1). To establish equality for Hₘᵢₙ(n), one has to show that, for any n, there exists a height-balanced tree with height Hₘᵢₙ(n). This is due to the relation Hₘᵢₙ(n) = 1 + Hₘᵢₙ(n/2) where n/2 is the integer division. For n odd, this is readily proved — so one can build a tree with a top node and two sub-trees with n/2 nodes of height Hₘᵢₙ(n) - 1. For n even, the same proof works if one first remarks that, in that case, ⌈log₂(n+2)⌉ = ⌈log₂(n+1)⌉ — use log₂(n+1) ≤ h ∈ ℕ ⇔ 2ʰ ≥ n + 1 and the fact that 2ʰ is even for that. This allows to have a sub-tree with n/2 nodes. For the other sub-tree with n/2-1 nodes, one has to establish that Hₘᵢₙ(n/2-1) ≥ Hₘᵢₙ(n) - 2 which is easy because, if h = Hₘᵢₙ(n/2-1), then h+2 ≥ log₂(2n) ≥ log₂(n+1).

The above function is not the best one however. Indeed, not every 64 bits integer can be represented exactly as a floating point number. Here is one that only uses integer operations:

This algorithm is still not the fastest however. See for example the Hacker's Delight, section 5-3 (and 11-4).

Following the same idea as above, if h = n, then one easily deduces that h ≤ n < (h+1). This yields the following code:

Of course, since is computed recursively, there is no need to recompute everything to go from to :

Now, we can attack the main problem: construct all the height-balanced binary trees with a given number of nodes. returns a list of all height-balanced binary tree with nodes.

First, we define some convenience functions that folds a function on the range ... i.e., it computes . You can think it as performing the assignment for except that there is no mutable variable in the code.

When constructing trees, there is an obvious symmetry: if one swaps the left and right sub-trees of a balanced tree, we still have a balanced tree. The following function returns all trees in together with their permutation.

Finally we generate all trees recursively, using a priori the bounds computed above. It could be further optimized but our aim is to straightforwardly express the idea.

Find out how many height-balanced trees exist for .

#### 61. Count the leaves of a binary tree. (easy)

A leaf is a node with no successors. Write a function to count them.

#### 61A. Collect the leaves of a binary tree in a list. (easy)

A leaf is a node with no successors. Write a function to collect them in a list.

#### 62. Collect the internal nodes of a binary tree in a list. (easy)

An internal node of a binary tree has either one or two non-empty successors. Write a function to collect them in a list.

#### 62B. Collect the nodes at a given level in a list. (easy)

A node of a binary tree is at level N if the path from the root to the node has length N-1. The root node is at level 1. Write a function to collect all nodes of the tree at level in a list.

Using it is easy to construct a function which creates the level-order sequence of the nodes. However, there are more efficient ways to do that.

#### 63. Construct a complete binary tree. (medium)

A complete binary tree with height H is defined as follows: The levels 1,2,3,...,H-1 contain the maximum number of nodes (i.e 2i-1 at the level i, note that we start counting the levels from 1 at the root). In level H, which may contain less than the maximum possible number of nodes, all the nodes are "left-adjusted". This means that in a levelorder tree traversal all internal nodes come first, the leaves come second, and empty successors (the nil's which are not really nodes!) come last.

Particularly, complete binary trees are used as data structures (or addressing schemes) for heaps.

We can assign an address number to each node in a complete binary tree by enumerating the nodes in levelorder, starting at the root with number 1. In doing so, we realize that for every node X with address A the following property holds: The address of X's left and right successors are 2*A and 2*A+1, respectively, supposed the successors do exist. This fact can be used to elegantly construct a complete binary tree structure. Write a function with the following specification: returns iff is a complete binary tree with nodes.

#### 64. Layout a binary tree (1). (medium)

As a preparation for drawing the tree, a layout algorithm is required to determine the position of each node in a rectangular grid. Several layout methods are conceivable, one of them is shown in the illustration. In this layout strategy, the position of a node v is obtained by the following two rules:

• x(v) is equal to the position of the node v in the inorder sequence;
• y(v) is equal to the depth of the node v in the tree.

In order to store the position of the nodes, we will enrich the value at each node with the position .

The tree pictured above is

#### 65. Layout a binary tree (2). (medium) An alternative layout method is depicted in this illustration. Find out the rules and write the corresponding OCaml function.

Hint: On a given level, the horizontal distance between neighbouring nodes is constant.

The tree shown is

#### 66. Layout a binary tree (3). (hard) Yet another layout strategy is shown in the above illustration. The method yields a very compact layout while maintaining a certain symmetry in every node. Find out the rules and write the corresponding predicate.

Hint: Consider the horizontal distance between a node and its successor nodes. How tight can you pack together two subtrees to construct the combined binary tree? This is a difficult problem. Don't give up too early!

In order to pack the tree tightly, the layout function will return in addition to the layout of the tree the left and right profiles of the tree, that is lists of offsets relative to the position of the root node of the tree.

Which layout do you like most?

#### 67. A string representation of binary trees. (medium) Somebody represents binary trees as strings of the following type (see example): .

• Write an OCaml function which generates this string representation, if the tree is given as usual (as or term). Then write a function which does this inverse; i.e. given the string representation, construct the tree in the usual form. Finally, combine the two predicates in a single function which can be used in both directions.
• Write the same predicate using difference lists and a single predicate which does the conversion between a tree and a difference list in both directions.

For simplicity, suppose the information in the nodes is a single letter and there are no spaces in the string.

A simple solution is:

One can also use a buffer to allocate a lot less memory:

For the reverse conversion, we assume that the string is well formed and do not deal with error reporting.

#### 68. Preorder and inorder sequences of binary trees. (medium)

We consider binary trees with nodes that are identified by single lower-case letters, as in the example of the previous problem.

1. Write functions and that construct the preorder and inorder sequence of a given binary tree, respectively. The results should be atoms, e.g. 'abdecfg' for the preorder sequence of the example in the previous problem.
2. Can you use from problem part 1 in the reverse direction; i.e. given a preorder sequence, construct a corresponding tree? If not, make the necessary arrangements.
3. If both the preorder sequence and the inorder sequence of the nodes of a binary tree are given, then the tree is determined unambiguously. Write a function that does the job.
4. Solve problems 1 to 3 using difference lists. Cool! Use the function (defined in problem “Compare the two methods of calculating Euler's totient function.”) to compare the solutions.

What happens if the same character appears in more than one node. Try for instance .

We use lists to represent the result. Note that and can be made more efficient by avoiding list concatenations.

Sours: https://ocaml.org/learn/tutorials/99problems.html

## Module List

module List: ..

List operations.

Some functions are flagged as not tail-recursive. A tail-recursive function uses constant stack space, while a non-tail-recursive function uses stack space proportional to the length of its list argument, which can be a problem with very long lists. When the function takes several list arguments, an approximate formula giving stack usage (in some unspecified constant unit) is shown in parentheses.

The above considerations can usually be ignored if your lists are not longer than about 10000 elements.

The labeled version of this module can be used as described in the module.

type t = =

An alias for the type of lists.

val length :

Return the length (number of elements) of the given list.

val compare_lengths :

Compare the lengths of two lists. is equivalent to , except that the computation stops after reaching the end of the shortest list.

val compare_length_with :

Compare the length of a list to an integer. is equivalent to , except that the computation stops after at most iterations on the list.

val cons :

is

• Since 4.03.0 (4.05.0 in ListLabels)
val hd :

Return the first element of the given list.

• Raises if the list is empty.
val tl :

Return the given list without its first element.

• Raises if the list is empty.
val nth :

Return the -th element of the given list. The first element (head of the list) is at position 0.

• Raises
• if the list is too short.
• if is negative.
val nth_opt :

Return the -th element of the given list. The first element (head of the list) is at position 0. Return if the list is too short.

• Since 4.05
• Raises if is negative.
val rev : val init :

is , evaluated left to right.

• Since 4.06.0
• Raises if .
val append :

Concatenate two lists. Same function as the infix operator . Not tail-recursive (length of the first argument). The operator is not tail-recursive either.

val rev_append :

reverses and concatenates it with . This is equivalent to , but is tail-recursive and more efficient.

val concat :

Concatenate a list of lists. The elements of the argument are all concatenated together (in the same order) to give the result. Not tail-recursive (length of the argument + length of the longest sub-list).

val flatten :

Same as . Not tail-recursive (length of the argument + length of the longest sub-list).

### Comparison

val equal :

holds when the two input lists have the same length, and for each pair of elements , at the same position we have .

Note: the function may be called even if the lists have different length. If you know your equality function is costly, you may want to check first.

val compare :

performs a lexicographic comparison of the two input lists, using the same interface as :

• is smaller than (negative result) if is smaller than , or if they are equal (0 result) and is smaller than
• the empty list is strictly smaller than non-empty lists

Note: the function will be called even if the lists have different lengths.

### Iterators

val iter :

applies function in turn to . It is equivalent to .

val iteri :

Same as , but the function is applied to the index of the element as first argument (counting from 0), and the element itself as second argument.

val map :

applies function to , and builds the list with the results returned by . Not tail-recursive.

val mapi :

Same as , but the function is applied to the index of the element as first argument (counting from 0), and the element itself as second argument. Not tail-recursive.

val rev_map :

gives the same result as , but is tail-recursive and more efficient.

val filter_map :

applies to every element of , filters out the elements and returns the list of the arguments of the elements.

val concat_map :

gives the same result as . Tail-recursive.

val fold_left_map :

is a combination of and that threads an accumulator through calls to .

val fold_left :

is .

val fold_right :

is . Not tail-recursive.

### Iterators on two lists

val iter2 :

calls in turn .

• Raises if the two lists are determined to have different lengths.
val map2 :

is .

• Raises if the two lists are determined to have different lengths. Not tail-recursive.
val rev_map2 :

gives the same result as , but is tail-recursive and more efficient.

val fold_left2 :

is .

• Raises if the two lists are determined to have different lengths.
val fold_right2 :

is .

• Raises if the two lists are determined to have different lengths. Not tail-recursive.

### List scanning

val for_all :

checks if all elements of the list satisfy the predicate . That is, it returns for a non-empty list and if the list is empty.

val exists :

checks if at least one element of the list satisfies the predicate . That is, it returns for a non-empty list and if the list is empty.

val for_all2 :
• Raises if the two lists are determined to have different lengths.
val exists2 :
• Raises if the two lists are determined to have different lengths.
val mem :

is true if and only if is equal to an element of .

val memq :

Same as , but uses physical equality instead of structural equality to compare list elements.

### List searching

val find :

returns the first element of the list that satisfies the predicate .

• Raises if there is no value that satisfies in the list .
val find_opt :

returns the first element of the list that satisfies the predicate . Returns if there is no value that satisfies in the list .

val find_map :

applies to the elements of in order, and returns the first result of the form , or if none exist.

val filter :

returns all the elements of the list that satisfy the predicate . The order of the elements in the input list is preserved.

val find_all : val filteri :

Same as , but the predicate is applied to the index of the element as first argument (counting from 0), and the element itself as second argument.

val partition :

returns a pair of lists , where is the list of all the elements of that satisfy the predicate , and is the list of all the elements of that do not satisfy . The order of the elements in the input list is preserved.

val partition_map :

returns a pair of lists such that, for each element of the input list :

• if is , then is in , and
• if is , then is in .

The output elements are included in and in the same relative order as the corresponding input elements in .

In particular, is equivalent to .

### Association lists

val assoc :

returns the value associated with key in the list of pairs . That is, if is the leftmost binding of in list .

• Raises if there is no value associated with in the list .
val assoc_opt :

returns the value associated with key in the list of pairs . That is, if is the leftmost binding of in list . Returns if there is no value associated with in the list .

val assq :

Same as , but uses physical equality instead of structural equality to compare keys.

val assq_opt :

Same as , but uses physical equality instead of structural equality to compare keys.

val mem_assoc :

Same as , but simply return if a binding exists, and if no bindings exist for the given key.

val mem_assq :

Same as , but uses physical equality instead of structural equality to compare keys.

val remove_assoc :

returns the list of pairs without the first pair with key , if any. Not tail-recursive.

val remove_assq :

Same as , but uses physical equality instead of structural equality to compare keys. Not tail-recursive.

### Lists of pairs

val split :

Transform a list of pairs into a pair of lists: is . Not tail-recursive.

val combine :

Transform a pair of lists into a list of pairs: is .

• Raises if the two lists have different lengths. Not tail-recursive.

### Sorting

val sort :

Sort a list in increasing order according to a comparison function. The comparison function must return 0 if its arguments compare as equal, a positive integer if the first is greater, and a negative integer if the first is smaller (see Array.sort for a complete specification). For example, is a suitable comparison function. The resulting list is sorted in increasing order. is guaranteed to run in constant heap space (in addition to the size of the result list) and logarithmic stack space.

The current implementation uses Merge Sort. It runs in constant heap space and logarithmic stack space.

val stable_sort :

Same as , but the sorting algorithm is guaranteed to be stable (i.e. elements that compare equal are kept in their original order).

The current implementation uses Merge Sort. It runs in constant heap space and logarithmic stack space.

val fast_sort : val sort_uniq :

Same as , but also remove duplicates.

• Since 4.02.0 (4.03.0 in ListLabels)
val merge :

Merge two lists: Assuming that and are sorted according to the comparison function , will return a sorted list containing all the elements of and . If several elements compare equal, the elements of will be before the elements of . Not tail-recursive (sum of the lengths of the arguments).

### Lists and Sequences

val to_seq : val of_seq :

Create a list from a sequence.

Sours: https://ocaml.org/api/List.html

## Real World OCaml

We’ve so far experienced OCaml largely through the toplevel. As you move from exercises to real-world programs, you’ll need to leave the toplevel behind and start building programs from files. Files are more than just a convenient way to store and manage your code; in OCaml, they also correspond to modules, which act as boundaries that divide your program into conceptual units.

In this chapter, we’ll show you how to build an OCaml program from a collection of files, as well as the basics of working with modules and module signatures.

### Single-File Programs

We’ll start with an example: a utility that reads lines from , computes a frequency count of the lines, and prints out the ten most frequent lines. We’ll start with a simple implementation, which we’ll save as the file freq.ml.

This implementation will use two functions from the module, which provides utility functions for interacting with association lists, i.e., lists of key/value pairs. In particular, we use the function , which looks up a key in an association list; and , which adds a new binding to an association list, as shown here:

Note that doesn’t modify the original list, but instead allocates a new list with the requisite key/value pair added.

Now we can write .

The function reads in lines from , constructing from those lines an association list with the frequencies of each line. It does this by invoking (similar to the function described in Chapter 3, Lists And Patterns), which reads through the lines one by one, calling the provided function for each line to update the accumulator. That accumulator is initialized to the empty list.

With defined, we then call the function to build the association list, sort that list by frequency in descending order, grab the first 10 elements off the list, and then iterate over those 10 elements and print them to the screen. These operations are tied together using the operator described in Chapter 2, Variables And Functions.

##### Where Is ?

Unlike programs in C, Java or C#, programs in OCaml don’t have a unique function. When an OCaml program is evaluated, all the statements in the implementation files are evaluated in the order in which they were linked together. These implementation files can contain arbitrary expressions, not just function definitions. In this example, the declaration starting with plays the role of the function, kicking off the processing. But really the entire file is evaluated at startup, and so in some sense the full codebase is one big function.

The idiom of writing may seem a bit odd, but it has a purpose. The binding here is a pattern-match to a value of type , which is there to ensure that the expression on the righthand side returns , as is common for functions that operate primarily by side effect.

If we weren’t using or any other external libraries, we could build the executable like this:

But as you can see, it fails because it can’t find and . We need a somewhat more complex invocation to get them linked in:

This uses , a tool which itself invokes other parts of the OCaml toolchain (in this case, ) with the appropriate flags to link in particular libraries and packages. Here, is asking to link in the library; asks ocamlfind to link in the packages as is necessary for building an executable.

While this works well enough for a one-file project, more complicated projects require a tool to orchestrate the build. One good tool for this task is . To invoke , you need to have a file that specifies the details of the build.

With that in place, we can invoke as follows.

We can run the resulting executable, , from the command line. Executables built with will be left in the directory, from which they can be invoked. The specific invocation below will count the words that come up in the file itself.

Conveniently, allows us to combine the building and running an executable into a single operation, which we can do using .

##### Bytecode Versus Native Code

OCaml ships with two compilers: the native code compiler and the bytecode compiler. Programs compiled with are interpreted by a virtual machine, while programs compiled with are compiled to machine code to be run on a specific operating system and processor architecture. With , targets ending with are built as bytecode executables, and those ending with are built as native code.

Aside from performance, executables generated by the two compilers have nearly identical behavior. There are a few things to be aware of. First, the bytecode compiler can be used on more architectures, and has some tools that are not available for native code. For example, the OCaml debugger only works with bytecode (although , the GNU Debugger, works with some limitations on OCaml native-code applications). The bytecode compiler is also quicker than the native-code compiler. In addition, in order to run a bytecode executable, you typically need to have OCaml installed on the system in question. That’s not strictly required, though, since you can build a bytecode executable with an embedded runtime, using the compiler flag.

As a general matter, production executables should usually be built using the native-code compiler, but it sometimes makes sense to use bytecode for development builds. And, of course, bytecode makes sense when targeting a platform not supported by the native-code compiler. We’ll cover both compilers in more detail in Chapter 25, The Compiler Backend: Byte Code And Native Code.

### Multifile Programs and Modules

Source files in OCaml are tied into the module system, with each file compiling down into a module whose name is derived from the name of the file. We’ve encountered modules before, such as when we used functions like and from the module. At its simplest, you can think of a module as a collection of definitions that are stored within a namespace.

Let’s consider how we can use modules to refactor the implementation of . Remember that the variable contains an association list representing the counts of the lines seen so far. But updating an association list takes time linear in the length of the list, meaning that the time complexity of processing a file is quadratic in the number of distinct lines in the file.

We can fix this problem by replacing association lists with a more efficient data structure. To do that, we’ll first factor out the key functionality into a separate module with an explicit interface. We can consider alternative (and more efficient) implementations once we have a clear interface to program against.

We’ll start by creating a file, , that contains the logic for maintaining the association list used to represent the frequency counts. The key function, called , bumps the frequency count of a given line by one.

The file counter.ml will be compiled into a module named , where the name of the module is derived automatically from the filename. The module name is capitalized even if the file is not. Indeed, module names are always capitalized.

We can now rewrite to use .

The resulting code can still be built with , which will discover dependencies and realize that needs to be compiled.

### Signatures and Abstract Types

While we’ve pushed some of the logic to the module, the code in can still depend on the details of the implementation of . Indeed, if you look at the definition of , you’ll see that it depends on the fact that the empty set of frequency counts is represented as an empty list. We’d like to prevent this kind of dependency, so we can change the implementation of without needing to change client code like that in .

The implementation details of a module can be hidden by attaching an interface. (Note that in the context of OCaml, the terms interface, signature, and module type are all used interchangeably.) A module defined by a file can be constrained by a signature placed in a file called .

For , we’ll start by writing down an interface that describes what’s currently available in , without hiding anything. declarations are used to specify values in a signature. The syntax of a declaration is as follows:

Using this syntax, we can write the signature of as follows.

Note that will detect the presence of the file automatically and include it in the build.

To hide the fact that frequency counts are represented as association lists, we’ll need to make the type of frequency counts abstract. A type is abstract if its name is exposed in the interface, but its definition is not. Here’s an abstract interface for :

Note that we needed to add and to , since otherwise there would be no way to create a or get data out of one.

We also used this opportunity to document the module. The file is the place where you specify your module’s interface, and as such is a natural place to put documentation. We started our comments with a double asterisk to cause them to be picked up by the tool when generating API documentation. We’ll discuss more in Chapter 24, The Compiler Frontend Parsing And Type Checking.

Here’s a rewrite of to match the new :

If we now try to compile , we’ll get the following error:

This is because depends on the fact that frequency counts are represented as association lists, a fact that we’ve just hidden. We just need to fix to use instead of and to use to convert the completed counts to an association list. The resulting implementation is shown below.

With this implementation, the build now succeeds!

Now we can turn to optimizing the implementation of . Here’s an alternate and far more efficient implementation, based on ’s data structure.

There’s some unfamiliar syntax in the above example, in particular the use of to generate an empty map. Here, we’re making use of a more advanced feature of the language (specifically, first-class modules, which we’ll get to in later chapters). The use of these features for the Map data-structure in particular is covered in Chapter 13, Maps And Hash Tables.

### Concrete Types in Signatures

In our frequency-count example, the module had an abstract type for representing a collection of frequency counts. Sometimes, you’ll want to make a type in your interface concrete, by including the type definition in the interface.

For example, imagine we wanted to add a function to for returning the line with the median frequency count. If the number of lines is even, then there is no precise median, and the function would return the lines before and after the median instead. We’ll use a custom type to represent the fact that there are two possible return values. Here’s a possible implementation:

In the above, we use to throw an exception for the case of the empty list. We’ll discuss exceptions more in Chapter 7, Error Handling. Note also that the function simply returns the first element of any two-tuple.

Now, to expose this usefully in the interface, we need to expose both the function and the type with its definition. Note that values (of which functions are an example) and types have distinct namespaces, so there’s no name clash here. Adding the following two lines to does the trick.

The decision of whether a given type should be abstract or concrete is an important one. Abstract types give you more control over how values are created and accessed, and make it easier to enforce invariants beyond what is enforced by the type itself; concrete types let you expose more detail and structure to client code in a lightweight way. The right choice depends very much on the context.

### Nested Modules

Up until now, we’ve only considered modules that correspond to files, like . But modules (and module signatures) can be nested inside other modules. As a simple example, consider a program that needs to deal with multiple identifiers like usernames and hostnames. If you just represent these as strings, then it becomes easy to confuse one with the other.

A better approach is to mint new abstract types for each identifier, where those types are under the covers just implemented as strings. That way, the type system will prevent you from confusing a username with a hostname, and if you do need to convert, you can do so using explicit conversions to and from the string type.

Here’s how you might create such an abstract type, within a submodule:

Note that the and functions above are implemented simply as the identity function, which means they have no runtime effect. They are there purely as part of the discipline that they enforce on the code through the type system. We also chose to put in an equality function, so you can check if two usernames match. In a real application, we might want more functionality, like the ability to hash and compare usernames, but we’ve kept this example purposefully simple.

The basic structure of a module declaration like this is:

We could have written this slightly differently, by giving the signature its own top-level declaration, making it possible to create multiple distinct types with the same underlying implementation in a lightweight way:

The preceding code has a bug: it compares the username in one session to the host in the other session, when it should be comparing the usernames in both cases. Because of how we defined our types, however, the compiler will flag this bug for us.

This is a trivial example, but confusing different kinds of identifiers is a very real source of bugs, and the approach of minting abstract types for different classes of identifiers is an effective way of avoiding such issues.

### Opening Modules

Most of the time, you refer to values and types within a module by using the module name as an explicit qualifier. For example, you write to refer to the function in the module. Sometimes, though, you want to be able to refer to the contents of a module without this explicit qualification. That’s what the statement is for.

We’ve encountered already, specifically where we’ve written to get access to the standard definitions in the library. In general, opening a module adds the contents of that module to the environment that the compiler looks at to find the definition of various identifiers. Here’s an example:

is essential when you want to modify your environment for a standard library like , but it’s generally good style to keep the opening of modules to a minimum. Opening a module is basically a trade-off between terseness and explicitness—the more modules you open, the fewer module qualifications you need, and the harder it is to look at an identifier and figure out where it comes from.

Here’s some general advice on how to deal with s:

• Opening modules at the toplevel of a module should be done quite sparingly, and generally only with modules that have been specifically designed to be opened, like or .

• If you do need to do an open, it’s better to do a local open. There are two syntaxes for local opens. For example, you can write:

Here, and the infix operators are the ones from the module.

There’s another, even more lightweight syntax for local s, which is particularly useful for small expressions:

• An alternative to local s that makes your code terser without giving up on explicitness is to locally rebind the name of a module. So, when using the type, instead of writing:

you could write:

Because the module name only exists for a short scope, it’s easy to read and remember what stands for. Rebinding modules to very short names at the top level of your module is usually a mistake.

### Including Modules

While opening a module affects the environment used to search for identifiers, including a module is a way of adding new identifiers to a module proper. Consider the following simple module for representing a range of integer values:

We can use the directive to create a new, extended version of the module:

The difference between and is that we’ve done more than change how identifiers are searched for: we’ve changed what’s in the module. If we’d used , we’d have gotten a quite different result:

To consider a more realistic example, imagine you wanted to build an extended version of the module, where you’ve added some functionality not present in the module as distributed in . That’s a job for .

Now, how do we write an interface for this new module? It turns out that works on signatures as well, so we can pull essentially the same trick to write our . The only issue is that we need to get our hands on the signature for the module. This can be done using , which computes a signature from a module:

Note that the order of declarations in the does not need to match the order of declarations in the . The order of declarations in the mostly matters insofar as it affects which values are shadowed. If we wanted to replace a function in with a new function of the same name, the declaration of that function in the would have to come after the declaration.

We can now use as a replacement for . If we want to use in preference to in our project, we can create a file of common definitions:

And if we then put after at the top of each file in our project, then references to will automatically go to instead.

### Common Errors with Modules

When OCaml compiles a program with an and an , it will complain if it detects a mismatch between the two. Here are some of the common errors you’ll run into.

### Type Mismatches

The simplest kind of error is where the type specified in the signature does not match the type in the implementation of the module. As an example, if we replace the declaration in by swapping the types of the first two arguments:

and we try to compile, we’ll get the following error.

### Missing Definitions

We might decide that we want a new function in for pulling out the frequency count of a given string. We could add that to the by adding the following line.

Now if we try to compile without actually adding the implementation, we’ll get this error.

A missing type definition will lead to a similar error.

### Type Definition Mismatches

Type definitions that show up in an need to match up with corresponding definitions in the . Consider again the example of the type . The order of the declaration of variants matters to the OCaml compiler, so the definition of in the implementation listing those options in a different order:

will lead to a compilation error.

Order is similarly important to other type declarations, including the order in which record fields are declared and the order of arguments (including labeled and optional arguments) to a function.

### Cyclic Dependencies

In most cases, OCaml doesn’t allow cyclic dependencies, i.e., a collection of definitions that all refer to one another. If you want to create such definitions, you typically have to mark them specially. For example, when defining a set of mutually recursive values (like the definition of and in Chapter 2, Recursive Functions), you need to define them using rather than ordinary .

The same is true at the module level. By default, cyclic dependencies between modules are not allowed, and cyclic dependencies among files are never allowed. Recursive modules are possible but are a rare case, and we won’t discuss them further here.

The simplest example of a forbidden circular reference is a module referring to its own module name. So, if we tried to add a reference to from within .

we’ll see this error when we try to build:

The problem manifests in a different way if we create cyclic references between files. We could create such a situation by adding a reference to from , e.g., by adding the following line.

In this case, will notice the error and complain explicitly about the cycle:

### Designing with Modules

The module system is a key part of how an OCaml program is structured. As such, we’ll close this chapter with some advice on how to think about designing that structure effectively.

### Expose Concrete Types Rarely

When designing an , one choice that you need to make is whether to expose the concrete definition of your types or leave them abstract. Most of the time, abstraction is the right choice, for two reasons: it enhances the flexibility of your design, and it makes it possible to enforce invariants on the use of your module.

Abstraction enhances flexibility by restricting how users can interact with your types, thus reducing the ways in which users can depend on the details of your implementation. If you expose types explicitly, then users can depend on any and every detail of the types you choose. If they’re abstract, then only the specific operations you want to expose are available. This means that you can freely change the implementation without affecting clients, as long as you preserve the semantics of those operations.

In a similar way, abstraction allows you to enforce invariants on your types. If your types are exposed, then users of the module can create new instances of that type (or if mutable, modify existing instances) in any way allowed by the underlying type. That may violate a desired invariant i.e., a property about your type that is always supposed to be true. Abstract types allow you to protect invariants by making sure that you only expose functions that preserves your invariants.

Despite these benefits, there is a trade-off here. In particular, exposing types concretely makes it possible to use pattern-matching with those types, which as we saw in Lists And Patterns is a powerful and important tool. You should generally only expose the concrete implementation of your types when there’s significant value in the ability to pattern match, and when the invariants that you care about are already enforced by the data type itself.

### Design for the Call Site

When writing an interface, you should think not just about how easy it is to understand the interface for someone who reads your carefully documented file, but more importantly, you want the call to be as obvious as possible for someone who is reading it at the call site.

The reason for this is that most of the time, people interacting with your API will be doing so by reading and modifying code that uses the API, not by reading the interface definition. By making your API as obvious as possible from that perspective, you simplify the lives of your users.

There are many ways of improving readability at the call site. One example is labeled arguments (discussed in Chapter 2, Labeled Arguments), which act as documentation that is available at the call site.

You can also improve readability simply by choosing good names for your functions, variant tags and record fields. Good names aren’t always long, to be clear. If you wanted to write an anonymous function for doubling a number: , a short variable name like is best. A good rule of thumb is that names that have a small scope should be short, whereas names that have a large scope, like the name of a function in a module interface, should be longer and more explicit.

There is of course a tradeoff here, in that making your APIs more explicit tends to make them more verbose as well. Another useful rule of thumb is that more rarely used names should be longer and more explicit, since the cost of verbosity goes down and the benefit of explicitness goes up the less often a name is used.

### Create Uniform Interfaces

Designing the interface of a module is a task that should not be thought of in isolation. The interfaces that appear in your codebase should play together harmoniously. Part of achieving that is standardizing aspects of those interfaces.

, and other libraries from the same family have been designed with a uniform set of standards in mind around the design of module interfaces. Here are some of the guidelines that they use.

• A module for (almost) every type. You should mint a module for almost every type in your program, and the primary type of a given module should be called .

• Put first. If you have a module whose primary type is , the functions in that take a value of type should take it as their first argument.

• Functions that routinely throw an exception should end in . Otherwise, errors should be signaled by returning an or an (both of which are discussed in Chapter 7, Error Handling ).

There are also standards in Base about what the type signature for specific functions should be. For example, the signature for is always essentially the same, no matter what the underlying type it is applied to. This kind of function-by-function API uniformity is achieved through the use of signature includes, which allow for different modules to share components of their interface. This approach is described in Chapter 9, Using Multiple Interfaces.

Base’s standards may or may not fit your projects, but you can improve the usability of your codebase by finding some consistent set of standards to apply.

### Interfaces before implementations

OCaml’s concise and flexible type language enables a type-oriented approach to software design. Such an approach involves thinking through and writing out the types you’re going to use before embarking on the implementation itself.

This is a good approach both when working in the core language, where you would write your type definitions before writing the logic of your computations, as well as at the module level, where you would write a first draft of your before working on the .

Of course, the design process goes in both directions. You’ll often find yourself going back and modifying your types in response to things you learn by working on the implementation. But types and signatures provide a lightweight tool for constructing a skeleton of your design in a way that helps clarify your goals and intent, before you spend a lot of time and effort fleshing it out.

Sours: https://dev.realworldocaml.org/files-modules-and-programs.html
Trees with Map and Fold - OCaml Programming - Chapter 4 Video 7

## Lists

A list is an ordered sequence of elements. All elements of a list in OCaml must be the same type. Lists are built into the language and have a special syntax. Here is a list of three integers:

Note semicolons separate the elements, not commas. The empty list is written . The type of this list of integers is .

A list, if it is not empty, has a head (the first element) and a tail (the list consisting of the rest of the elements). In our example, the head is the integer while the tail is the list . An empty list has neither a head nor a tail. Here are some more lists:

Notice the type of the empty list is (its element type is not known). Notice also the type of the last list - or a list of lists of integers.

There are two built-in operators on lists. The or cons operator, adds one element to the front of a list. The or append operator combines two lists:

### Functions on lists

We can write functions which operate over lists by pattern matching:

Consider a function to find the length of a list:

This function operates not just on lists of integers, but on any kind of list.

Why is this? Because in the pattern the head of the list is not inspected, so its type cannot be relevant. Such a function is called polymorphic. Here is another polymorphic function, our own version of the operator for appending:

Notice that the memory for the second list is shared, but the first list is effectively copied.

### Higher order functions on lists

We might wish to apply a function to each element in a list, yielding a new one. We shall write a function which is given another function as its argument - such a function is called "higher-order":

Notice the type of the function in parentheses as part of the whole type. This function, given a function of type and a list of s, will build a list of s. Sometimes and might be the same type, of course. Here are two examples showing the function in use:

### The standard library List module

The standard library List module contains a wide range of useful utility functions, including pre-written versions of many of the functions we have written in this tutorial. A version of the module with labeled functions is available as part of StdLabels.

In the List module documentation, functions which can raise an exception are marked. Such exceptions are usually the result of lists which are empty (and therefore have neither a head nor a tail) or lists of mismatched length.

### Maps and iterators

We have already written a function from scratch, and it is no surprise that one is included in the List module. There is also a variant for two lists:

In addition, we have an imperative analog to , called . It takes an imperative function of type and an and applies the function to each element in turn. A suitable function might be :

There is a variant for two lists too:

Notice that and will fail if the lists are of unequal length:

### List scanning

The useful function checks whether a given element is a member of a list by scanning its contents:

There are more elaborate scanning functions: imagine we wish to check to see if all elements of a list are even, or if any element is even. We could either write functions to go over each element of the list, keeping a boolean check, or use and other functions already known to us:

This is rather clumsy, though. The standard library provides two useful functions and for this common problem:

So you can see how the standard library has evolved into its present state: pieces of frequently-used code are turned into useful general functions.

### List searching

The function returns the first element of a list matching a given predicate (a predicate is a testing function which returns either true or false when given an element). It raises an exception if such an element is not found:

The function again takes a predicate and tests it against each element in the list, but this time returns the list of all elements which test true:

If we wish to know also which elements did not test true, we can use which returns a pair of lists: the first being the list of elements for which the predicate is true, the second those for which it is false.

Note that the documentation for and tells us that the order of the input is preserved in the output. Where this is not stated it the documentation, it cannot be assumed.

### Association lists

Association lists are a simple (and simplistic) way of implementing the dictionary data structure: that is to say, a group of keys each with an associated value. For large dictionaries, for efficiency, we would use the standard library's Map or Hashtbl modules. But these functions from the List module are useful for lists which are generally small, and have other advantages: since they are just lists of pairs, they can be built and modified easily. They are also easily printed in the toplevel.

When using association lists, and for other purposes, it is sometimes useful to be able to make a list of pairs from a pair of lists and vice versa. The module provides the functions and for this purpose:

### Sorting lists

The function , given a comparison function of type (zero if equal, negative if first smaller, positive if second smaller) and an input list of type , returns the list sorted according to the comparison function. Typically, we use the built-in comparison function which can compare any two values of like type (with the exception of functions which are incomparable).

(The function reverses the argument order of a binary function.)

### Folds

There are two interestingly-named functions in the List module, and . Their job is to combine the elements of a list together, using a given function, accumulating an answer which is then returned. The answer returned depends upon the function given, the elements of the list, and the initial value of the accumulator supplied. So you can imagine these are very general functions. Let's explore first.

In this example, we supply the addition function and an initial accumulator value of 0:

The result is the sum of the elements in the list. Now let's use OCaml's built-in function which returns the larger of two given integers in place of our addition function. We use , the smallest possible integer, as our initial accumulator

The largest number in the list is found. Let's look at the type of the function:

The function is of type where is the accumulator and is the type of each element of the list. The next argument is the initial accumulator, which must be of type , and then finally the input list of type . The result is the final value of the accumulator, so it must have type . Of course, in both of our examples, and are the same as one another. But this is not always so.

Consider the following definition of which uses ( considers the elements from the left, from the right):

In this example, the initial accumulator is the second list, and each element of the first is consed to it in turn. You can see the order of arguments to fold right is a little different:

The function comes first, then the list of elements, then the initial accumulator value. We can use to define our usual function:

But care is needed. If we try that with , which turns a list of lists into a list by concatenating the lists together, we produce this:

Unfortunately, the order of evaluation here is such that larger and larger items are passed to the operator as its first argument, and so the function has a worse running time than . You can read more about the time and space efficiency of lists and other common OCaml data structures in the comparison of standard containers.

Here are some more redefinitions of familiar functions in terms of or . Can you work out how they operate?

### Lists and tail recursion

Our function builds up an intermediate expression of a size proportional to its input list:

For long lists, this may overflow the stack (be too large for the computer to handle). The solution is to write our function with an accumulating argument, like this:

This function now uses a constant amount of space on the stack:

We call such a function tail-recursive. We may write a wrapper function so that the initial accumulator value is supplied automatically:

Or, we can do it all in one function:

In the standard library documentation, functions which are not tail-recursive are marked.

Sours: https://ocaml.org/learn/tutorials/lists.html

## Find ocaml

.

OCaml EXPLAINED in UNDER 1 MINUTE

.

### Now discussing:

.

391 392 393 394 395