A concordance (concordância) finds all the sentences containing the search word or phrase and lists them in a standard format: in this case, in order of occurrence in the corpus, with the search phrase emboldened. (Other programs allow more control over the layout of the output.)
Extract from concordance of SEMPRE (14.1.2002)
É uma das mais antigas discotecas do Algarve, situada em Albufeira, que continua a manter os traços decorativos e as clientelas de sempre. É um pouco a versão de uma espécie de «outro lado» da n
E razão desta escolha é, obviamente, a progressão demente da Frente Nacional, que prospera sempre a apontar o imigrante como bode expiatório e simultaneamente como a fonte de todos males do povo francês.
Carrington fez sempre questão de salientar que as hipóteses de sucesso do cessar-fogo dependem sobretudo dos beligerantes .-
Com argumentos economicistas e de operacionalidade, o Executivo de Cavaco Silva sempre se escusou a concretizar o SIED, cujas competências foram, entretanto, transferidas para o SIM (Serviços de Informações Militares) , por via de um polémico acto administrativo do Governo, que assim chamava a si matérias da exclusiva competência da AR.
(To achieve an output of more than one sentence, it is necessary to
put punctuation marks in your search string.)
If you search for a number of different words simultaneously,
a word list will tell you exactly which words the search will find
(before you ask for all the details of every word!). Using
"casa." as your search string will give every word made up of "casa"
and one other letter, i.e. "casai", "casal", "casam",
"casar", "casas".
This output will be ordered by frequency
unless you indicate that it should be alphabetical.
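A word list of this kind can be sketched in Python. This is a minimal illustration, not the AC/DC software: it assumes each search string has to match a whole token (approximated with Python's `re.fullmatch`), and the token list and the helper `word_list` are hypothetical. It also shows the difference between "casa." (exactly one further character) and "casa.*" (any number of further characters, including none).

```python
import re
from collections import Counter

# Hypothetical sample tokens; a real corpus would supply millions of these.
TOKENS = ["casa", "casas", "casal", "casa", "casamento", "casar", "casas", "caso"]


def word_list(pattern, tokens, alphabetical=False):
    """Count the distinct tokens that match the whole pattern.

    By default the list is ordered by descending frequency (ties keep the
    order in which the words were first seen); alphabetical=True sorts by word.
    """
    counts = Counter(t for t in tokens if re.fullmatch(pattern, t))
    if alphabetical:
        return sorted(counts.items())
    return counts.most_common()


print(word_list("casa.", TOKENS))   # [('casas', 2), ('casal', 1), ('casar', 1)]
print(word_list("casa.*", TOKENS))  # also includes 'casa' and 'casamento'
print(word_list("casa.", TOKENS, alphabetical=True))  # [('casal', 1), ('casar', 1), ('casas', 2)]
```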
words - sequences of words - sets of words - alternatives - exclusions - phrases - punctuation
The text-searching program is a device for locating strings, i.e. sequences of characters, in the corpus. It does not 'know' anything about Portuguese grammar or orthography, and very little about punctuation, so it has to be tricked into finding what the user really wants. A number of searching devices make this less difficult, though each of them carries a risk of finding sequences other than the ones being searched for. In the examples that follow, each search string is given first, followed by the forms it finds:
1. Words and parts of words. Typing in a single string will find all cases of that 'word', i.e. that string preceded and followed by a space or by a major punctuation mark (. , ! « » : ; etc.).
mas finds , mas «mas mas,
Não finds Não. Não! «Não Não» «Não»
To search for a sequence of words, enclose each word in double quotation marks:
"não" "para" finds all phrases in which não is immediately followed by para (but not phrases with just não or just para)
"não para" will get no result, as no single word has that form, and não para (without quotation marks) will be rejected as obviously two words.
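The way quoted strings line up with consecutive words can be sketched as follows, assuming each quoted string is a regular expression that must match one whole token (approximated with `re.fullmatch`). The sentence and the helper `find_sequences` are invented for illustration.

```python
import re


def find_sequences(patterns, tokens):
    """Return the start index of every run of consecutive tokens in which
    token k fully matches patterns[k] (one quoted string per token)."""
    n = len(patterns)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(re.fullmatch(p, tokens[i + k]) for k, p in enumerate(patterns))
    ]


# Invented, pre-tokenised sentence (punctuation marks are separate tokens).
tokens = "Ele não para de falar , mas não vai para casa .".split()

print(find_sequences(["não", "para"], tokens))  # [1]
print(find_sequences(["não para"], tokens))     # [] because no single token is "não para"
```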
2. Sets of words
Various dummy characters and wild cards allow you to search for more
than one word at a time.
The full stop "." stands for any single character (usually a letter):
so
"." will find all one-letter words
"..." will find all three-letter words.
If you want to use the full stop literally, i.e. to indicate the character ".", you must precede it with "\" (the "escape character"). This applies to all characters with a special meaning in search strings, including "?" and "*".
".*" (known as a wildcard character) indicates any number of letters (including zero), and is used to find all words beginning or ending with a given sequence of characters:
cama.* finds cama, camas, camarins, camarária, camada, camarada, camaradagem, camarata etc.
.*ama.* finds Camara, camarada, Camarata, Samara, viamarense etc.
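The behaviour of ".", "\." and ".*" can be checked with Python's `re` module, on the assumption that the corpus search treats these characters the same way and matches the whole token. The token list is invented. Note that an unescaped "." also matches a one-character punctuation token, which is one reason the escape character matters.

```python
import re

# Invented tokens; "." and "..." stand for punctuation tokens in a corpus.
tokens = ["a", "e", "mas", "sim", ".", "...", "cama", "camas", "camada", "Camara", "Samara"]


def hits(pattern):
    """Tokens that the pattern matches in full (a stand-in for corpus matching)."""
    return [t for t in tokens if re.fullmatch(pattern, t)]


print(hits("."))        # ['a', 'e', '.']  the punctuation token is one character too
print(hits("..."))      # ['mas', 'sim', '...']
print(hits(r"\."))      # ['.']  escaped: only the literal full stop
print(hits("cama.*"))   # ['cama', 'camas', 'camada']  matching is case-sensitive
print(hits(".*ama.*"))  # ['cama', 'camas', 'camada', 'Camara', 'Samara']
```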
To instruct the program to look for any one of two or more characters at a given point in the string, place the chosen characters between square brackets:
[Tt]udo or [tT]udo will find Tudo and tudo
Paul[oa] will find Paulo and Paula
[Ff]al[oae] will find Falo, Fala, Fale, falo, fala, fale
To instruct the program to look for any one of two or more sequences of letters, separate the sequences with the vertical line |:
Tudo|tudo will find Tudo and tudo
sim|não will find sim and não
"que|não" "[oa]" will find all phrases in which que or não is followed by o or a (a singular article or pronoun).
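Square brackets and the vertical line can be tried out in the same way. This sketch assumes Python's `re.fullmatch` behaves like the corpus search for these patterns; the word lists and the helpers `hits` and `find_sequences` are hypothetical.

```python
import re

WORDS = ["Tudo", "tudo", "Paulo", "Paula", "Paulino", "fala", "Fale", "fali"]


def hits(pattern, words=WORDS):
    """Words matched in full by the pattern."""
    return [w for w in words if re.fullmatch(pattern, w)]


def find_sequences(patterns, tokens):
    """Start indices where each pattern fully matches consecutive tokens."""
    n = len(patterns)
    return [i for i in range(len(tokens) - n + 1)
            if all(re.fullmatch(p, tokens[i + k]) for k, p in enumerate(patterns))]


print(hits("[Tt]udo"))      # ['Tudo', 'tudo']
print(hits("Tudo|tudo"))    # ['Tudo', 'tudo']
print(hits("Paul[oa]"))     # ['Paulo', 'Paula']
print(hits("[Ff]al[oae]"))  # ['fala', 'Fale']

# "que|não" "[oa]": [oa] must be the whole token, so "os" is not found.
tokens = "não o vi , que a casa era tua e não os quero".split()
print(find_sequences(["que|não", "[oa]"], tokens))  # [0, 4]
```

Because the whole token must match, "[oa]" finds o and a but not os or as.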
The question mark "?" is used to indicate an optional character (i.e. zero or one occurrences of the preceding character):
"quem?" will find que and quem
The asterisk "*" indicates any number of occurrences (including zero) of the preceding character. It is mainly used together with the full stop, but may be used with any character:
"1*" will find the numbers 1, 11, 111, 1111
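A quick check of "?" and "*", again using `re.fullmatch` as a stand-in for whole-token matching (an assumption about how the corpus search behaves). Both apply only to the single character immediately before them, so "os?" finds o and os, while "o?" finds o but never os. The word lists are invented.

```python
import re


def hits(pattern, words):
    """Words matched in full by the pattern."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(hits("quem?", ["qu", "que", "quem", "quemm"]))  # ['que', 'quem']
print(hits("1*", ["1", "11", "111", "12", "2"]))       # ['1', '11', '111']
print(hits("os?", ["o", "os", "oss", "a"]))            # ['o', 'os']
print(hits("o?", ["o", "os"]))                          # ['o']  "?" affects only the "o"
```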
These devices can be combined in more complex formulae:
"[oa]s" ".*r" will find sequences of a plural object pronoun followed by an infinitive, e.g. ... os fazer ... etc.
"[Hh].*" "d.*" will find examples of the haver de construction (with a lot of other things too...)
To limit the number of 'hits', you can exclude words from the search sequence by putting a word or formula in the frame [word!="..."].
For example, to search for the adverb sempre while excluding the conjunction sempre que, enter
"[Ss]empre" [word!="que"]
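The exclusion frame can be modelled by marking each position in the query as required or excluded. The `match_query` helper below is hypothetical: a sketch assuming whole-token regular-expression matching, not a description of how the corpus software is implemented. The sentence is invented.

```python
import re


def match_query(query, tokens):
    """query is a list of (pattern, excluded) pairs.

    excluded=False: the token must match the pattern   ("que")
    excluded=True:  the token must NOT match it        ([word!="que"])
    Returns the start index of each matching run of tokens.
    """
    def ok(tok, pattern, excluded):
        matched = re.fullmatch(pattern, tok) is not None
        return matched != excluded

    n = len(query)
    return [i for i in range(len(tokens) - n + 1)
            if all(ok(tokens[i + k], p, ex) for k, (p, ex) in enumerate(query))]


tokens = "Sempre diz que sim , sempre que pode , e sempre .".split()
# Equivalent of: "[Ss]empre" [word!="que"]
print(match_query([("[Ss]empre", False), ("que", True)], tokens))  # [0, 10]
```

Punctuation marks count as tokens here, so a sempre followed by a full stop is still found.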
To search for phrases in which words are separated by other words, use
[] (i.e. square brackets with no text inside) to indicate any single word or punctuation mark, or
[]{0,5} (the same, followed by curly brackets containing the minimum and maximum number of words allowed in that position).
e.g. "não" []{1,4} "nunca|nada" yields phrases like
Lusa pedindo anonimato, «não ajuda em nada a dissuasão dos assaltos
esclarece: não houve nada com o Porto
a mesma convicção: que não; nada se tinha passado
«Tudo calmo». «Nada; não se passou nada».
a algumas empresas, que não «vão ter nada a ver» com os actuais empregados da Ce
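A gap such as []{1,4} can be sketched as "skip between 1 and 4 tokens of any kind, punctuation included". The helper `gap_matches` is hypothetical and keeps only the shortest span for each starting point; the corpus software offers its own matching strategies, which this sketch does not reproduce. The sentence is one of the concordance lines above, split into tokens by hand.

```python
import re


def gap_matches(first, last, tokens, min_gap, max_gap):
    """Spans (start, end) for: first []{min_gap,max_gap} last.

    [] matches any token, punctuation included. For each start position
    only the shortest span is kept (a simplification).
    """
    spans = []
    for i, tok in enumerate(tokens):
        if not re.fullmatch(first, tok):
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1
            if j >= len(tokens):
                break
            if re.fullmatch(last, tokens[j]):
                spans.append((i, j))
                break
    return spans


tokens = "a mesma convicção : que não ; nada se tinha passado".split()
for start, end in gap_matches("não", "nunca|nada", tokens, 1, 4):
    print(" ".join(tokens[start:end + 1]))  # não ; nada
```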
6. Punctuation
Punctuation marks can be searched for like words. To find the end of a sentence, use
"!|\?|\." i.e. exclamation mark, question mark or full stop (the escape character "\" preceding the question mark and the full stop ensures that they are interpreted as literal characters rather than with the special meanings described in section 2).
To find the end of a clause, use:
"!|\?|\.|,|;|:"
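To find most verbs with enclitic pronouns, search for
".*-(l|lh|m|n|s).s?" (this finds forms ending in -lo, -la, -lhe, -lhes, -me, -nos, -se etc., but not -te or -vos, and it may also pick up a few hyphenated words that are not verbs)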
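Square brackets define a set of single characters, not a list of alternatives: written as ".*-[l|lh|m|n|s].s?", the pattern treats "lh" as the separate options l and h (and even allows the bar | itself), so forms ending in -lhe or -lhes are missed. The sketch below compares that bracketed form with a version using parentheses for the alternatives, and also checks the sentence-end pattern, assuming the corpus search behaves like Python's `re` here. The verb forms are invented examples. Even the grouped form misses -te and -vos.

```python
import re

SENTENCE_END = r"!|\?|\."
BRACKETED = r".*-[l|lh|m|n|s].s?"   # [...] = ONE character from the set l | h m n s
GROUPED = r".*-(l|lh|m|n|s).s?"     # (...|...) = one of the alternatives l, lh, m, n, s


def hits(pattern, words):
    """Words matched in full by the pattern."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(hits(SENTENCE_END, ["Sim", "!", "?", ".", ",", "..."]))  # ['!', '?', '.']

forms = ["fazê-lo", "deu-me", "vê-las", "disse-lhe", "dá-lhes", "chamou-te"]
print(hits(BRACKETED, forms))  # ['fazê-lo', 'deu-me', 'vê-las']
print(hits(GROUPED, forms))    # ['fazê-lo', 'deu-me', 'vê-las', 'disse-lhe', 'dá-lhes']
```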
To find the word "Porém" at the beginning of a sentence, together with the whole of the preceding sentence, use:
[] "\." "Porém"
1. Headwords
To search for all forms of a noun or verb without providing a complete list, you can use the "lema" tag which searches for lexical words or lemmas. Nouns are searched for by their singular form; adjectives, possessives, relative and interrogative pronouns by their masculine singular form; personal pronouns by their subject form; and verbs by their bare infinitive form.
[lema="livro"] will find livro, livros
[lema="bom"] will find bom, bons, boa, boas
[lema="ele"] will find ele, ela, o, a, lhe, but not eles, elas, os, as, lhes.
[lema="cantar"] will find all forms of the verb cantar
[lema="precisar"] "de|dos?|das?" will find most cases of the construction precisar de
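A lemma search compares the query with the lemma attached to each token, while a plain quoted string is compared with the word itself. In this hypothetical sketch each token is a (word, lemma) pair and the helpers are invented; the lemmas given for contractions such as dos are an assumption, since the real annotation may differ.

```python
import re

# Hypothetical annotation: each token is (word, lemma).
tokens = [
    ("Eu", "eu"), ("preciso", "precisar"), ("de", "de"), ("ajuda", "ajuda"),
    ("e", "e"), ("precisamos", "precisar"), ("dos", "de"), ("livros", "livro"),
    ("que", "que"), ("ele", "ele"), ("precisa", "precisar"), ("muito", "muito"),
]


def lemma_hits(lemma):
    """Words whose lemma is matched in full, like [lema="..."]."""
    return [w for w, lem in tokens if re.fullmatch(lemma, lem)]


def precisar_de(tokens):
    """[lema="precisar"] "de|dos?|das?": a lemma test, then a word test."""
    return [
        (tokens[i][0], tokens[i + 1][0])
        for i in range(len(tokens) - 1)
        if re.fullmatch("precisar", tokens[i][1])
        and re.fullmatch("de|dos?|das?", tokens[i + 1][0])
    ]


print(lemma_hits("precisar"))  # ['preciso', 'precisamos', 'precisa']
print(precisar_de(tokens))     # [('preciso', 'de'), ('precisamos', 'dos')]
```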
All of the corpora are stored in an "annotated" form, with grammatical information attached to each word. The most useful of these are the "part of speech" or "pos" markers, which can be searched for independently or as part of the specification of a set of words.
The main Part of Speech tags are:
N Noun
V Verb
PERS personal pronoun
PRP Preposition
ADV adverb
ADJ Adjective
DET Determiner (articles, demonstratives)
K Conjunction
NUM Numeral
Subclasses are indicated by additional tags attached to the main tag by the underscore "_", for example:
ADJ_n adjective also used as noun
ADV_rel relative adverb
DET_arti indefinite article
DET_artd definite article
To get a better idea of the POS tags used, select "Distribuição
da categoria gramatical (PoS)" in the Resultados box of
AC/DC.
To search for parts of speech, include [pos="..."] in the search formula.
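[pos="N.*"] will find all nouns. (The search formula needs to use the wildcard ".*" to allow for subcategories, unless these are specifically included.) Because the numeral tag NUM also begins with N, "N.*" finds numerals as well; [pos="N|N_.*"] restricts the search to N and its subclasses.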
[pos="DET.*"] [pos="N.*"] will find all sequences of determiner plus noun
"os?" [pos="N.*"] will find all sequences of o or os followed by nouns
[word="os?" & pos="DET.*"] will find all cases of o or os which are determiners (rather than pronouns)
[word="[Cc]ompra" & pos="N.*"] will find all cases of the noun compra, but no forms of the verb comprar.
Note that you have to use the word="..." format whenever you wish to
combine a string search and a category search.
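Combining word and pos conditions can be sketched with tokens stored as (word, tag) pairs. The tag names come from the list above, but the tags assigned to these example words are invented, and `find` is a hypothetical helper that assumes whole-token regular-expression matching. The third query also shows that "N.*" matches NUM as well as N, since both begin with N.

```python
import re

# Hypothetical (word, pos) pairs using tag names from the list above.
TOKENS = [
    ("Eu", "PERS"), ("os", "PERS"), ("vi", "V"),
    ("e", "K"), ("os", "DET_artd"), ("dois", "NUM"), ("livros", "N"), ("ficaram", "V"),
    ("A", "DET_artd"), ("compra", "N"), ("foi", "V"), ("boa", "ADJ"),
    ("Ele", "PERS"), ("compra", "V"), ("pão", "N"),
]


def token_ok(token, word=None, pos=None):
    """One [...] frame: every attribute given must match in full (CQP's &)."""
    w, p = token
    return (word is None or re.fullmatch(word, w) is not None) and (
        pos is None or re.fullmatch(pos, p) is not None
    )


def find(query, tokens=TOKENS):
    """query is a list of frames, e.g. [{"pos": "DET.*"}, {"pos": "N.*"}]."""
    n = len(query)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(token_ok(tokens[i + k], **frame) for k, frame in enumerate(query))
    ]


print(find([{"word": "os?", "pos": "DET.*"}]))      # [4]  determiner os, not the pronoun
print(find([{"word": "[Cc]ompra", "pos": "N.*"}]))  # [9]  noun compra, not the verb
print(find([{"pos": "DET.*"}, {"pos": "N.*"}]))     # [4, 8]  "os dois" counts: NUM matches N.*
print(find([{"pos": "DET.*"}, {"pos": "N|N_.*"}]))  # [8]
```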
In addition to the pos tag, the corpora use several other indications of grammatical classification, which can be used in conjunction with other search categories:
deriv searches for the words derived from a lexical base
temcagr identifies tense and mood values for verb forms and case for pronouns
pessnum identifies the number of nouns and adjectives, and the person and number for pronouns and verb forms
gen identifies the gender of nouns, adjectives and pronouns
func indicates the grammatical function of words and phrases
For detailed information on the use of these tags, consult the AC/DC website.