Sunday, March 06, 2016

Compiler construction resources


However, those who want to be serious about compilers should consider acquiring these books: 
  • Compilers: Principles, Techniques, and Tools. Either the 1st or 2nd edition. I got the 1st edition from AbeBooks; the basics are still the same today as they were 30 years ago (tokenizer, parser, syntax-directed translation, intermediate code generation). 
  • Engineering a Compiler, Second Edition. This one is a good book too, and it covers Recursive Descent Parsers in more detail, which in practice is what many compilers use today (e.g., GCC for C++). It is a simple technique, and you can easily write an RDP by hand; a minimal sketch follows this list.
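
As a taste of the technique, here is a minimal recursive descent parser sketch for arithmetic expressions (the grammar and code are my own illustration, not taken from either book):

import re

# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'

def tokenize(src):
    return re.findall(r"\d+|[()+\-*/]", src)

def parse_expr(toks, i=0):
    value, i = parse_term(toks, i)
    while i < len(toks) and toks[i] in "+-":
        op = toks[i]
        rhs, i = parse_term(toks, i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, i

def parse_term(toks, i):
    value, i = parse_factor(toks, i)
    while i < len(toks) and toks[i] in "*/":
        op = toks[i]
        rhs, i = parse_factor(toks, i + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, i

def parse_factor(toks, i):
    if toks[i] == "(":
        value, i = parse_expr(toks, i + 1)
        return value, i + 1  # skip the closing ')'
    return int(toks[i]), i + 1

print(parse_expr(tokenize("2*(3+4)-5")))  # (9, 9): value 9, all 9 tokens consumed

Each nonterminal of the grammar maps to one function, which is exactly why these parsers are so easy to write by hand.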


Monday, September 14, 2015

tl;dr: Python 2 vs Python 3

$ cat main.py
print (u"Hello " + b"World!");

$ python2 main.py
Hello World!

$ python3 main.py
Traceback (most recent call last):
  File "main.py", line 1, in <module>
    print (u"Hello " + b"World!");
TypeError: Can't convert 'bytes' object to str implicitly
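
Under Python 3 the mixing has to be made explicit; for example, decoding the bytes (assuming they hold UTF-8/ASCII text) makes an equivalent line work:

print(u"Hello " + b"World!".decode("utf-8"))  # Hello World!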

Sunday, March 29, 2015

Learning to program: choosing a language

If you want to learn to program and you need to choose a language, pick one that lets you learn the basic concepts of programming: variables, memory usage, control structures, flow of execution, expressions, abstraction, modular programming and code reuse, recursion, data structures, algorithms, efficiency, generic programming. Pointers are a topic everyone should know and study, at least to be aware of their existence and the reason they exist.

Monday, January 19, 2015

12V UPS for the Raspberry Pi

Raspberry Pi UPS

I needed a way to keep my Raspberry Pi running even through power outages. My circuit is based on this other regulator by MRE, with the difference that I built it with what I had in my toolbox. The parts are:

  • 12V 7Ah gel battery
  • 12V float battery charger, 1A
  • 5V switching regulator (a little gem!)
  • Raspberry Pi
  • Relays, op-amps, resistors, a transistor, etc.
The idea of the circuit is very simple: switch between the 12V from a transformer and the 12V from a gel battery at the moment the transformer loses power. For this we use the MRE circuit, which uses an op-amp in comparator mode against a zener diode to open or close the relays.

The first important modification to the MRE circuit is to replace the zener diode with an LM336Z-5.0. Unlike a zener, the LM336Z is an integrated circuit designed to be used as a voltage reference, for example in a measuring instrument.

This IC is the same one recommended by XQ2FOD in his solar power regulator circuit (very interesting and worth reading), and those who read carefully will see that both the MRE circuit and XQ2FOD's work on the same principle: an op-amp as a voltage comparator.

My first circuit did not have a voltage comparator; instead, the 12V from the transformer was connected directly to the relays. The problem with this is that the relays need much less current than the 5V regulator, so they stayed closed for about a second too long and the Raspberry Pi rebooted.

What the voltage comparator does, then, is release the relays before the voltage drops below what the 5V regulator needs; this is adjusted manually with the 100k preset R1. I tested it by intermittently cutting the 12V from the transformer, and the Raspberry Pi never rebooted.

Another difference from the MRE circuit is that mine uses two relays instead of one. The first relay switches between the 12V from the 220V-to-12V transformer and the battery, and the second one connects the battery charger to the battery while there is 220V on the line. The second relay exists so that the charger is not connected to the battery at the moment the 220V mains goes down.

The final important component is the 5V switching regulator. I used the RECOM R-78B5.0-1.0, which has a conversion efficiency of 97%, so unlike the classic 7805 linear regulator it does not need a heat sink: at 12V and 1A the 7805 has an efficiency of roughly 45%.
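
As a rough back-of-the-envelope check (assumed operating point: 12V in, 5V out, 1A load; the exact numbers depend on the actual load), the difference in dissipated heat can be estimated like this:

# Back-of-the-envelope estimate (assumed: 12V in, 5V out, 1A load)
V_IN, V_OUT, I_LOAD = 12.0, 5.0, 1.0

# Linear regulator (7805): the whole voltage drop turns into heat
p_out = V_OUT * I_LOAD                        # 5 W delivered to the load
p_linear_loss = (V_IN - V_OUT) * I_LOAD       # 7 W of heat -> heat sink required
eff_linear = p_out / (p_out + p_linear_loss)  # ~0.42

# Switching regulator at ~97% efficiency (datasheet figure for the R-78B5.0-1.0)
eff_switch = 0.97
p_switch_loss = p_out / eff_switch - p_out    # ~0.15 W of heat -> no heat sink

print(f"7805: {eff_linear:.0%} efficient, {p_linear_loss:.1f} W of heat")
print(f"R-78B5.0: {eff_switch:.0%} efficient, {p_switch_loss:.2f} W of heat")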


Thursday, January 08, 2015

oggdec lame pipe: converting ogg to mp3

If this command shows "Warning: unsupported audio format":
oggdec -o - file.ogg | lame - file.mp3
Try using raw mode:
oggdec -R -o - file.ogg | lame -r - file.mp3
Or, if the generated mp3 file is pure noise, swap the bytes with -x:
oggdec -R -o - file.ogg | lame -rx - file.mp3
To convert a batch of multiple ogg files:
for x in *.ogg; do oggdec -R -o - "$x" | lame -rxh - "$x.mp3"; done


Thursday, June 26, 2014

Eclipse Luna on Debian Wheezy Crash


The crash is related to a version mismatch between GTK2, GTK3 and Debian's GLIBC. The relevant bug is here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=430736

To force the use of GTK2 on Eclipse Luna you can try:
$ export SWT_GTK3=0
For a more permanent solution you can set the GTK version in the eclipse.ini file:
--launcher.GTK_version
2
This option should be inserted before the --launcher.appendVmargs option.
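
The relevant fragment of eclipse.ini would then look roughly like this (the rest of the file stays as it is; other options will surround these lines):
--launcher.GTK_version
2
--launcher.appendVmargs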

And now you should be able to use Eclipse Luna on Wheezy.

Sunday, May 04, 2014

Substring matching in Python (a run-off between naive, Boyer-Moore, and Suffix Array)

A few days ago I found this very interesting problem: given a list of strings L, write a function that returns the elements of L which contain some substring S.

For example, given L=["Casa", "Perro", "Gato", "Onomatopeya", "internacionalizacion", "Om nom nom"] and S="nom", we want the result of find(L, S) = ['Onomatopeya', 'Om nom nom'].

Naïve Version

In Python this sounds simple enough, and we can write:
def find1(L, S):
    return [x for x in L if S in x]
However, for a big enough L and S we can see that the runtime of this function depends not only on the size of L but also on the size of S. That is, the runtime complexity of find1 in big-O notation is O(n.m.s), where:
  • n=len(L)
  • m=max([len(x) for x in L])
  • s=len(S)
The plot of the runtime of find1 for a fixed L, as we increase the size of S, looks like this:


Boyer-Moore-Horspool

It turns out that since version 2.5, Python's "in" operator for strings is implemented internally using a modified version of the Boyer-Moore algorithm for substring searching. The details are here.
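
For reference, here is a minimal sketch of the Horspool variant of that idea (my own illustration, not CPython's actual implementation, which adds a few more tricks):

def horspool_find(text, pattern):
    # Boyer-Moore-Horspool sketch: first index of pattern in text, or -1.
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # How far we can safely shift when a given character of text is aligned
    # with the last position of the pattern; unknown characters allow a full shift.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        i += shift.get(text[i + m - 1], m)
    return -1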

We can take advantage of this detail by creating a temporary structure for faster lookups. We pre-process L so we can make fast queries.

The idea is to construct a big string W by concatenating all the elements of L, using a special character as a separator, one that is present neither in S nor in any element of L. For example:
L = ["Casa", "Perro", "Gato", ...]
W = "Casa\nPerro\nGato\n..."
Then, whether a substring S is present in any of the elements of L can be answered by just writing: 
S in W
This allows us to answer whether a substring is present or not. To actually construct the resulting list of elements of L which contain S, we need another helper structure. We build T, a list of integers that, for every element of L, holds the starting index of that element in W. Continuing with the example:
T = [0, 5, 11, 16, ... ]
This means the first element, "Casa", starts at index 0 in W; the second element, "Perro", starts at index 5 in W, and so on. This structure lets us map any index in W back to the element of L it belongs to, by doing a binary search on T.
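
For reference, a minimal sketch of this pre-processing step (W and T are the module-level globals used by find2 below; "\n" is assumed to appear neither in S nor in any element of L):

# Build W and the offset table T from L
SEP = "\n"
W = SEP.join(L) + SEP
T = []
offset = 0
for e in L:
    T.append(offset)
    offset += len(e) + len(SEP)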

The runtime complexity of constructing this intermediate index is linear in the total length of the strings in L (i.e., O(n.m)), with memory usage of the same order.

Our new find function should then:
  • find the first position p of S in W,
  • determine which element of L that position falls in, by doing a binary search on T,
  • continue searching for the next occurrence of S in W.
Since an element of L can contain the same substring S multiple times, we resume each search at the start of the next word in W rather than at p+1.

In code, the find2 function looks like:
import bisect  # W and T are the module-level structures built above

def find2(L, S):
    # Using the native Boyer-Moore implementation behind str.find / the "in" operator
    R = []
    i = W.find(S)
    while i != -1:
        # Map the match position in W back to an element of L through T
        p = bisect.bisect_right(T, i) - 1
        e = L[p]
        #assert S in e
        R.append(e)
        # Resume the search right after the current element of L
        i = W.find(S, T[p] + len(e))
    return R
The runtime complexity of this new find function is O(n.m). We still need to take the length of each element of L into account, since the BMH algorithm is (mostly) linear in the length of W. 

The runtime plot for find1 vs. find2 looks like the following graphic. Again, L is fixed and we increase the size of S:



Suffix Array

There is a third way to solve this problem: constructing a suffix array.

This amazing data structure offers O(log N) suffix lookups (in string comparisons), where N is the length of the string. Incidentally, it also lets us look up substrings, since we just search until we find a suffix in the SA that has S as a prefix.

Again, we need to construct an intermediate index, and again it is very simple: sort all the possible suffixes of W. The trick is in how we do it: we do not keep every suffix as a string, just a list of the starting positions of every suffix, and we sort this list by the text of each suffix.

In code, the construction of the SA table is really simple:
# Suffix Array Table: suffix start positions, sorted by the suffix text
# (truncated to 100 characters to bound the size of the sort keys)
SL = list(range(len(W)))
SL.sort(key=lambda x: W[x:x+100])
The runtime complexity for constructing this intermediate index is O(n.log n), with O(n) memory usage.

To find a specific suffix we binary search the SA table, using the values in SL to determine where in W each suffix starts.

Since a substring may appear many times across many elements of L, we may have many suffixes starting with S. The good news is that, since the list of suffixes is sorted, all of these suffixes will sit one after another in the SL table. But since we are doing a binary search over the list of suffixes, we can't be sure where the middle pointer will land within this contiguous run of suffixes that all start with S. 

Therefore, when we find the position of a matching suffix we should walk back a little, to make sure we start at the first suffix of the run that starts with S.

In code, our new find3 function looks like:
def find3(L, S):
    # Suffix array
    start = 0
    end = len(SL)
    while start < end:
        mid = start + (end - start) // 2
        pa = SL_key_fn(W, SL[mid], len(S))  # compare only the first len(S) chars of the suffix
        pb = SL_key_fn(S, 0, len(S))
        if pa < pb:
            start = mid + 1
        elif pb < pa:
            end = mid
        else:
            # A word may contain the same S multiple times
            R = set()
            while mid > 0 and W.startswith(S, SL[mid]):
                mid = mid - 1
            if not W.startswith(S, SL[mid]):
                mid = mid + 1
            while mid < len(SL) and W.startswith(S, SL[mid]):
                p = bisect.bisect_right(T, SL[mid]) - 1
                e = L[p]
                assert S in e
                R.add(p)
                mid = mid + 1
            return [L[i] for i in R]
    return []
The SL_key_fn function was a failed experiment to improve lookup performance. Today the function is simply:
def SL_key_fn(data, x, llen):
    return data[x:x+llen]
Which is the same as the key used to sort the SA table.
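
As a quick sanity check on the original example (assuming W, T and SL have been built from this L as shown above), the three functions agree:

L = ["Casa", "Perro", "Gato", "Onomatopeya", "internacionalizacion", "Om nom nom"]
S = "nom"
print(sorted(find1(L, S)))  # ['Om nom nom', 'Onomatopeya']
print(sorted(find2(L, S)))  # ['Om nom nom', 'Onomatopeya']
print(sorted(find3(L, S)))  # ['Om nom nom', 'Onomatopeya']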

The runtime complexity of the find3 function is O(log(n.m)), and the plot for the three functions looks like this:



Drawbacks

This SA implementation in Python uses a lot of temporary memory while sorting the table, since the sort key materializes a string of up to 100 characters for every position of W. On my laptop, sorting an L of 150k elements takes 2.4GB of RAM. There's been some discussion about this memory issue on this blog post and in this Stack Overflow question.

Special thanks

The Python Argentina community is a great place to look for help with all your Spanish-language Python programming needs.