Horrible tags on Stack Overflow

Every once in a while I stumble about a tag on Stack Overflow which

  • has no useful meaning; or
  • is a marker for questions which are generally off-topic or not constructive.

Sometimes I try to open a post on Meta Stack Overflow about these to collect some help to clean them up. But as each user has only a limited amount of close-votes (and you need five votes from different users (or one from a moderator) to close a question), one can do that only every so often, these questions would take votes away from each other.

So here I’ll try to have a list of these tags so I don’t forget.

  • [future]: A lot of these are questions about future events. Prophesies are not on-topic on SO.. The on-topic meaning would be the concept used in concurrent programming.
  • [protected]: About quite some different things, like the protected keyword in some programming languages, protected memory, protected Twitter accounts, password-protected files, protecting video downloads, … . I’m not sure how to sensibly retag this. (The tag also has no tag wiki.)
  • [alternative]: This looks like another tag which contains many non-constructive questions.

  • [documents]

  • [word] vs. [msword]

I’ll add more later as I stumble about them.

The standard answer to embedding source code listings in a LaTeX document is the listings package. It works in pure LaTeX, by parsing the source code and highlighting some parts of them.

An alternative I just found on Stack Overflow is the minted package, created by Konrad Rudolph. It uses the Pygments library to do the actual parsing and highlighting.

I did not yet try the package, but I will and then update this post for my experiences.

An article about how to model hierarchical data in a relational database (specifically MySQL, but likely also applying to other relational DBMS).

Originally, this article was on the MySQL developement site, and linked heavily as http://dev.mysql.com/tech-resources/articles/hierarchical-data.html. But now Oracle (which owns MySQL after buying Sun, which itself bought MySQL some years ago) seems to have reorganized its site, with the effect that this link goes only to an Oracle search page.

This was remarked by some anonymous Stack Overflow user, who suggested the following edit to an answer linking to this page:

— UPDATE: The link is dead. Thanks Oracle. —

I saw this edit suggestion by browsing through the list of suggested edits, and thought this article should not have been completely gone away.

Googleing the URL did not really result in finding the current location of the article (only in some pages linking to the original one). A good idea in such cases is the Wayback Machine of the Internet Archive. It stores previous versions of lots of pages - this one, too (last accessed in January 2010).

If I only wanted to read the old article, this would be enough - but no, I in fact want to correct the link, and it is not a good idea to link to the Wayback machine. But having the article itself, it was easy to copy a sampe sentence to Google, which finally found me the new location of the article (on the web site of its original author).

Thus I edited this answer to include the new link (and the title of the article).

A short google search for the original URL on stackoverflow alone shows that the link was quite popular. So I started an editing spree, starting to replace each occurrence of this link with the new one. I always try to correct other issues with the post, as well, or course.

Have a look at my activity log to see how many edits I had to do for this (most entries with revised, starting at 16:15 UTC today).

An example on how not to ask questions and give answers on Stack Overflow.

Ein Vorschlag, um eine deutsche Fragen-und-Antworten-Site im Stack-Exchange-Netzwerk zu erstellen. Das wäre damit eine deutsche Version von Stack Overflow.

Zur Zeit befindet sich der Vorschlag in der Definitions-Phase: Es werden Fragen gesammelt, die on-topic bzw. off topic sind (und jeweils dafür gestimmt, zu welcher dieser Kategorien sie gehören - oder »bad example«).

Danach kommt die Commitment-Phase - hier müssen sich mindestens 200 Leute finden, die zusagen, sich in der beta-Phase am Beleben des Projektes zu beteiligen, davon mindestens 100, die bereits in anderen Stack-Exchange-Sites aktiv sind (in mindestens einer mindestens 200 Reputation haben). Außerdem müssen diese zusammen eine gewisse Reputation gesammelt haben.

Darauf folgt dann eine private Beta-Phase, in der die Committer anfangen, die Site mit Fragen und Antworten zu füllen, Tags (Kategorien) anlegen, Moderatoren wählen, einen geeigneten Namen finden (»Programming (in German)« ist zwar gut für die Definition auf Area 51, aber nicht als endgültiger Name), etc.

Ist das erfolgreich, kommt die öffentliche Beta. Hier kommt es darauf an, Öffentlichkeit zu gewinnen (mehr Frager und Antworter, mehr Besucher, …).

Wenn die öffentliche Beta-Phase gut läuft, wird die Site offiziell Mitglied im Stack-Exchange-Netwerk, eventuell mit ihrem eigenen Domain-Namen. Falls nicht, wird sie wieder gelöscht.

Falls ihr Interesse daran habt, registriert euch dort als “Follower” und tragt bei zur Definition der Site. Und später als “Committer”, natürlich.


Etwas Hintergrund.

Area 51 ist die Site im Stack-Exchange-Netzwerk, welche sich mit der Erstellung neuer Sites beschäftigt.

Andere interessante Vorschläge, die zur Zeit diskutiert werden, sind diese:

  • Cryptography (auf Englisch, über Kryptographie) hat Commitment-Phase vor kurzem abgeschlossen, demnächst wird die private Beta starten. (Man kann noch Committer werden, denke ich.)
  • Planned and Constructed Languages (über Konstruierte bzw. Plansprachen, wie z.B. Esperanto) hat Anfang Juni die Commitment-Phase begonnen … und stagniert jetzt bei knapp über 30 Committern (ich bin Nummer 32).

Diese Sites sind zur Zeit in der Beta-Phase:

  • German Language and Usage: Über deutsche Sprache und ihre (korrekte?) Verwendung. Der erste einer Reihe von ähnlichen Vorschlägen für diverse Sprachen, am 31. Mai von private beta zu public beta übergegangen. Fragen und Antworten können sowohl auf Deutsch als auch auf Englisch erfolgen (und werden teilweise dann von der Community in die jeweils andere Sprache übersetzt).
  • Sceptics zu wissenschaftlichem Skeptizismus (auf Englisch). Hier kann man Fragen zu allem möglichen stellen (bevorzugt zu öffentlich getätigten Aussagen), und die Antworten untersuchen den Wahrheitsgehalt, hinterlegt mit Referenzen zu wissenschaftlichen Untersuchungen.
  • IT Security zur Sicherheit in der Informationstechnik.
  • User Experience zu Nutzerschnittstellen u.ä.

Schon eine Weile aus dem Beta-Status heraus sind etwa diese (alle auf Englisch):

  • Mathematics zu Mathematik in allen Formen
  • TeX zu TeX, LaTeX und verwandten Programmen
  • Ask Ubuntu für Fragen zu Ubuntu (einer der populärsten Linux-Distributionen)

Schon vor der Einführung des Area-51-Prozesses gab es diese Sites (bisher auch die größten im Netzwerk):

On my current campaign against plz (inspired by this question on meta.stackoverflow), I found this nice answer which demonstrates that SimpleDateFormat is not thread-safe.

Big numbers selfmade: Part 14/14: Missing bits

This is the end of this series, for now.

We have now a class which is able to represent natural numbers as large as are fitting into the memory, can convert numbers from/to all radixes lower than RADIX, can add and multiply and compare numbers, can divide by any numbers smaller than RADIX (and get the remainder).

What is missing, to be useful like BigInteger:

  • Subtraction!
  • (maybe) extension to negative numbers by addition of a sign bit.
  • Minimum, maximum (this should be easy)
  • Exponentiation (e.g. pow(int))
  • Access to individual digits (like flipBit, setBit, clearBit, testBit in BigInteger), digit-wise operations (I’m not sure if and, or, xor make sense here, or if there are any analogues).
  • Conversion to Java’s primitive numbers, e.g. implementing Number.
  • Division and remainder by larger divisors (e.g. more than one digit) and everything based on this, as modular addition/multiplication, greatest common divisor, modular exponentiation
  • Creating random numbers, prime tests.

If you want to contribute, you can fork my github repository, change the class and send a pull request.

Just a reminder: If you liked this and are already active on Stack overflow, consider upvoting my answer to the original question.


0: introduction, 1: number representation, 2: conversion from decimal format, 3: decimal formatting, 4: addition, 5: multiplication, 6: factorials, 7: comparison, 8: normalizing, 9: equals/hashCode, 10: converting between arbitrary radix systems, 11: converting from arbitrary radix, 12: division by small numbers, 13: conversion to arbitrary radix, 14: missing bits - Original question - Full code

Lua’s string patterns versus regular expressions

Major programming languages often include support for regular expressions - either directly in the language (like in Perl - which even coined the word “Perl compatible regular expression” for a family of RE languages) or as a part of the standard library (like Java’s java.util.regex package). Other languages have one or even multiple regex packages available as add-on libraries.

In theory, a regular expression is some way of describing a regular language, i.e. a language which can be recognized by a finite automaton. (In practice, some regular expression implementations include features which can make the recognized language non-regular, like back references.) Important for regular expressions are:

  • literals (i.e. an expression matching exactly one specific input character, like a in most RE implementations)
  • repetition (the * or + operator) of arbitrary expressions
  • alternatives | between arbitrary expressions
  • grouping (( and )) is used to make the syntax unambiguous while still allowing all these constructs to live together.

Most other features (like character classes) can be combined from these. Often the groups can also be used for capturing parts of the result, and/or for reusing these results later in the same regular expression (which in fact makes the expression non-regular, as we need more than a finite automaton to implement these).

Here is an example expression: \b[0-9]*\.?\b[0-9]?(?!]). It matches a word boundary, zero or more digits, one or no dot and another word boundary, zero or one digit - but not when followed by a closing bracket. (It is said to match “decimal numbers” without following ], but it doesn’t really match all of them, and it also matches the empty string at most word boundaries (when not followed by ]).

Lua is a scripting language with focus on easy embedding in other programs. For this reason, it is quite small - smaller than most regular expression engines. Thus, naturally if does not include such an engine.

It has its own replacement feature, named simply patterns in the manual.

These are three-leveled:

  1. Character classes,

    • literal ones (a matches a, %. matches .),
    • some pre-build ones (%l for any letter, . for any single character),
    • sets ([a-z] represents any single one of the letters a to z).
  2. pattern items:

    • a character class, optionally followed by one of the repetition modifiers ?, *, +, -
    • a reused captured item (%3 matches the 3rd captured string)
    • a balanced group (%b() matches a substring starting with (, ending with ) and balanced parenthesis between them)
  3. patterns: A sequence of

    • pattern items
    • (-)-enclosed patterns (these represent captures which can be used later in the pattern)
    • empty captures () (they capture the current index into the string instead of a substring)

Note that there are:

  • no alternatives except for single characters/classes
  • no repetitions except for single characters classes

This makes it impossible to recognize the languages of some quite simple regular expressions:

  • ab|ba (either ab or ba - this is even a finite language)
  • (ab)* (a string consisting of any number of ab, but no aa nor bb.)

For our example language, we could create those patterns:

  • %d+%.?%d* matches all decimal numbers with at least one digit before the dot (or without a dot),
  • %d*%d.?%d+ matches all decimal numbers with at least one digit after the dot (or without a dot).
  • %.%d+ matches decimal numbers without a digit before the dot.

Regular expressions would allow to compose them to match what we want, but Lua patterns don’t (no alternatives). Thus the alternative has to be implemented externally, using Lua code.

On the other hand, the %b feature allows recognizing languages which are not regular (if we ignore that this is most probably implemented with a finite-size int counter), like this simple one:

  • %b() - any ()-balanced string (with any other character interspersed in it).

With pure RE (in the theoretical sence), this is impossible, since we need at least a stack automaton (or a counting one) for this. Perl’s RE implementation (in version 5.14) allows recursion, so here it would look like \(([^()]*(?R))*[^()]*\) (the (?R) part recurses to the main pattern). Java does not support this.

Of course, often you will need to restrict the contents of your ()-balanced string a bit more than only “must be balanced and can contain anything else”, and then the Lua patterns won’t help you. Write a real parser (or use Perl’s recursive feature).

Likewise, the %1 feature (use captured group started by first () is not possible in pure RE, but most real life RE implementations have it, in the form \1 in PCRE (i.e. both Perl and Java).

Another feature used in the example regular expression above is the \b zero-width assertion. It matches at a border between word characters (letters, digits and similar characters) and non-word-characters (like space, punctuation, …).

Lua seems to have nothing like this, if reading only the manual. But when I pointed this out in my answer, jpjacobs mentioned the frontier pattern in a comment.

This is a generalization of the regular expression \b: %b takes a set (in []) as argument, and matches where there is a transition from not in set to in set. For example %f[%.%d] matches at the boundary between not part of a decimal digit to part of a decimal digit.

With this, I could create the expression %f[%.%d]%d*%.?%d*%f[^%.%d%]], which matches all decimal numbers which are preceded by something that is neither digit nor dot (or by nothing), and followed by something that is neither ] nor digit nor dot (or by nothing). It also matches the single dot, though - there seems to be no way to allow both .1 and 1. without allowing either the single dot or strings with two dots like .1..

Big numbers selfmade: Part 10/14: converting between arbitrary radix systems

This is a bit of theory for our next steps. I posted this first as an answer to a new question (also by frodosamoa) How Can I Convert Very Large Decimal Numbers to Binary In Java on Stack Overflow.


The conversion between different number systems in principle is a repeated “division, remainder, multiply, add” operation. Let’s look at an example:

We want to convert 123 from decimal to a base 3 number. What do we do?

  1. Take the remainder modulo 3 - prepend this digit to the result.
  2. Divide by 3.
  3. If the number is bigger than 0, continue with this number at step 1

So it looks like this:

  • 123 % 3 == 0. ⇒ The last digit is 0.
  • 123 / 3 == 41.
  • 41 % 3 == 2 ⇒ The second last digit is 2.
  • 41 / 3 == 13
  • 13 % 3 == 1 ⇒ The third digit is 1.
  • 13 / 3 == 4
  • 4 % 3 == 1 ⇒ The fourth digit is 1 again.
  • 4 / 3 == 1
  • 1 % 3 == 1 ⇒ The fifth digit is 1.

So, we have 11120 as the result.

The problem is that for this you need to have already some kind of division by 3 in decimal format, which is usually not the case if you don’t implement your number in a decimal-based format.

But it works for converting from your internal number format to any external format.


So, let’s look at how we would do the inverse calculation, from 11120 (base 3) to its decimal equivalent. (Base 3 is here the placeholder for an arbitrary radix, Base 10 the placeholder for your internal radix.) In principle, this number can be written as this:

1 * 3^4 + 1 * 3^3 + 1*3^2 + 2*3^1 + 0*3^0

A better way (faster to calculate) is this:

((((1 * 3) + 1 )*3 + 1 )*3 + 2)*3 + 0
    1
        3
             4
                12
                    13
                        39
                            41
                              123
                                  123

(This is known as Horner scheme, usually used for calculating values of polynomials.)

You can implement this in the number scheme you are implementing, if you know how to represent the input radix (and the digits) in your target system.


In the next parts we will apply this to our DecimalBigInt class.


0: introduction, 1: number representation, 2: conversion from decimal format, 3: decimal formatting, 4: addition, 5: multiplication, 6: factorials, 7: comparison, 8: normalizing, 9: equals/hashCode, 10: converting between arbitrary radix systems, 11: converting from arbitrary radix, 12: division by small numbers, 13: conversion to arbitrary radix, 14: missing bits - Original question - Full code

Big numbers selfmade: Part 1/14: Number Representation

First, we need a representation of the number, based on the data types Java gives us.

As the asker thought the decimal conversion is the most complicated part, let’s stay in a decimal based mode. For efficiency, we’ll store not single decimal digits, but work in radix (or base) 1 000 000 000 = 10^9 < 2^30 (This is one billion in US English, in German it is eine Milliarde, while eine Billion is 10^12 here. As the exact value does not really matter for most of the calculations here, I’ll most often simply speak about radix). This nicely fits in a Java int (up to 2^31 or 2^32), and the product of two such digits fits nicely in a Java long.

final static int RADIX = 1000000000;
final static int RADIX_DECIMAL_DIGITS = 9;

Then the digits-array:

private int[] digits;

Do we store the digits in little or big endian format, i.e. the most significant digits last or first? It does not really matter, so we decide on big endian since this is how humans want to read it. (We will later see that little endian format would have made some of the implementations prettier. If you do this again, try it with little endian.)

For now we concentrate on non-negative values - later we can add a sign bit for negative numbers.

For testing purposes, we add a constructor which allows initializing from such a int[].

/**
 * creates a DecimalBigInt based on an array of digits.
 * @param digits a list of digits, each between 0 (inclusive)
 *    and {@link #RADIX} (exclusive).
 * @throws IllegalArgumentException if any digit is out of range.
 */
public DecimalBigInt(int... digits) {
    for(int digit : digits) {
        if(digit < 0 ||  RADIX <= digit) {
            throw new IllegalArgumentException("digit " + digit +
                                               " out of range!");
        }
    }
    this.digits = digits.clone();
}

As a added bonus, this constructor is also usable for a single int (if smaller than RADIX), and even for no int (which we’ll interpret as 0). So, we now can do this:

DecimalBigInt d = new DecimalBigInt(7, 5, 2, 12345);
System.out.println(d);

This gives us de.fencing_game.paul.examples.DecimalBigInt@6af62373, not so useful. So, we add a toString() method:

/**
 * A simple string view for debugging purposes.
 * (Will be replaced later with a real decimal conversion.)
 */
public String toString() {
    return "Big" + Arrays.toString(digits);
}

The output is now Big[7, 5, 2, 12345], which is more useful for testing, isn’t it?


0: introduction, 1: number representation, 2: conversion from decimal format, 3: decimal formatting, 4: addition, 5: multiplication, 6: factorials, 7: comparison, 8: normalizing, 9: equals/hashCode, 10: converting between arbitrary radix systems, 11: converting from arbitrary radix, 12: division by small numbers, 13: conversion to arbitrary radix, 14: missing bits - Original question - Full code