Advanced Linguateca

Searching for constructions


1.  Log into Linguateca and search for the verb "haver". 

How many different constructions does it enter into?

How might you search for just one or other of them?


2.  Exclusions

To limit the number of hits, you can exclude words from the search sequence, by including a word or formula in the frame [word!=...].

e.g. To search for the adverb sempre excluding the conjunction sempre que, enter

"[Ss]empre" [word!="que"]

Try excluding haver de from the haver set.

Try excluding há... que phrases.
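
The effect of an exclusion can be sketched outside the corpus interface. The Python fragment below is only an illustration, not part of Linguateca: it assumes that each quoted pattern must match a whole token, and the sample sentence and the function name find_sempre_not_que are invented for the demonstration.

```python
import re

def find_sempre_not_que(tokens):
    """Return positions where sempre/Sempre is followed by a token other than 'que'.

    Mimics the query  "[Ss]empre" [word!="que"]  on a plain list of tokens.
    """
    hits = []
    # Stop one token early: the query needs a following token to test.
    for i in range(len(tokens) - 1):
        is_sempre = re.fullmatch(r"[Ss]empre", tokens[i]) is not None
        next_is_que = re.fullmatch(r"que", tokens[i + 1]) is not None
        if is_sempre and not next_is_que:
            hits.append(i)
    return hits

# Invented sample, already split into tokens.
sample = "Sempre que chove , ele fica sempre em casa .".split()
print(find_sempre_not_que(sample))  # [6]
```

Only the second sempre (position 6) is reported; the first is rejected because the next token is que.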

3. Phrases

To search for phrases containing two or more words, use

[] (i.e. square brackets with no text inside) to indicate any single word or punctuation mark, or

[] {0,5} (the same, followed by curly brackets containing two digits, to indicate the minimum and maximum number of words allowed in that position).
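
What the curly brackets do can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the query "há" [] {0,5} "que" is a hypothetical example, the sample sentence is invented, and the function match_with_gap reports every possible match, whereas the corpus interface may report only one match per starting point.

```python
def match_with_gap(tokens, first, last, min_gap=0, max_gap=5):
    """Return (start, end) index pairs where `first` is followed by `last`
    with between min_gap and max_gap arbitrary tokens in between."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Try every allowed number of intervening tokens.
        for gap in range(min_gap, max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens) and tokens[j] == last:
                spans.append((i, j))
    return spans

# Invented sample, already split into tokens.
sample = "há muito tempo que não o vejo".split()
print(match_with_gap(sample, "há", "que"))             # [(0, 3)]
print(match_with_gap(sample, "há", "que", max_gap=1))  # []
```

With the default limits the two words between há and que are allowed; with a maximum of one intervening word the phrase is no longer found.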

4. Punctuation

Punctuation marks can be searched for like words. To find the end of a sentence, use

"!|\?|\." i.e. exclamation mark, question mark or full stop (the escape character "\" preceding the question mark and the full stop ensures that they are interpreted as literal punctuation marks and not as special search characters).

To find the end of a clause, use:

"!|\?|\.|,|;|:"

To find the word "Porém" at the beginning of a sentence, with the whole of the preceding sentence, use:

[] "\." "Porém"

How could you use this to find enclitic pronouns (i.e. pronouns added to the end of the verb form)?
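
The role of the escape character in the punctuation patterns of this section can be checked with a short Python sketch. It is an approximation only: it assumes that a pattern has to match a whole token, and the token list and the helper is_match are invented for the illustration.

```python
import re

SENTENCE_END = r"!|\?|\."
CLAUSE_END = r"!|\?|\.|,|;|:"

def is_match(pattern, token):
    """True if the pattern matches the whole token."""
    return re.fullmatch(pattern, token) is not None

# Invented tokens: words and punctuation marks mixed.
tokens = ["Porém", ",", "chegou", "?", "a", "."]

sentence_ends = [t for t in tokens if is_match(SENTENCE_END, t)]
clause_ends = [t for t in tokens if is_match(CLAUSE_END, t)]
print(sentence_ends)  # ['?', '.']
print(clause_ends)    # [',', '?', '.']

# Without the backslash the full stop stands for any single character,
# so the one-letter word "a" is matched as well.
unescaped = [t for t in tokens if is_match(r"!|.", t)]
print(unescaped)      # [',', '?', 'a', '.']
```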

5. Searching for parts of speech

To search for parts of speech, include [pos="..."] in the search formula. (All of the corpora are stored in an "annotated" form, with grammatical information attached to each word. The most useful of these annotations are the "part of speech" or "pos" markers, which can be searched for independently or as part of the specification of a set of words.)

The main Part of Speech tags are:

N       Noun
V       Verb
PERS    Personal pronoun
PRP     Preposition
ADV     Adverb
ADJ     Adjective
DET     Determiner (articles, demonstratives)
KS      Subordinating conjunction
KC      Coordinating conjunction
NUM     Numeral

Subclasses of the main categories are indicated by additional tags attached to the main tag by the underscore symbol "_".

N_prop      proper noun (also PROP)
PERS_obj    object pronoun
PERS_refl   reflexive pronoun
V_n         nominal form of verb
V_fmc       finite form of verb
ADJ_n       adjective also used as noun
ADV_rel     relative adverb
DET_arti    indefinite article
DET_artd    definite article

To get a better idea of the POS tags used, select "Distribuição da categoria gramatical (PoS)" in the Resultados box of AC/DC.

[pos="N.*"] will find all nouns. (The search formula needs to use the wildcard character ".*", to allow for subcategories, unless these are specifically included).

[pos="DET.*"] [pos="N.*"] will find all sequences of determiner plus noun

"os?" [pos="N.*"] will find all sequences of o or os followed by nouns

[word="os?" & pos="DET.*"] will find all cases of o or os which are determiners (rather than pronouns)
[word="[cC]ompra" & pos="N.*"] will find all cases of the singular noun compra but no forms of the verb comprar. (Compare the effects of using [lema="compra"].)
Note that you have to use the word=".." format whenever you wish to combine a letter search and a category search. 
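
How a combined word and category condition selects tokens can be sketched in Python. This is an illustration under assumptions, not a description of how the corpus software works internally: each pattern is taken to match the whole word form or the whole tag, and the tagged sentence and the helper token_matches are invented.

```python
import re

# Invented (word form, pos tag) pairs for "Ela compra a compra em Lisboa".
tagged = [
    ("Ela", "PERS"), ("compra", "V"), ("a", "DET_artd"),
    ("compra", "N"), ("em", "PRP"), ("Lisboa", "N_prop"),
]

def token_matches(token, word=None, pos=None):
    """True if every condition that is given matches the whole value."""
    form, tag = token
    if word is not None and re.fullmatch(word, form) is None:
        return False
    if pos is not None and re.fullmatch(pos, tag) is None:
        return False
    return True

# pos="N" misses the subcategory N_prop; pos="N.*" includes it.
exact_n = [form for form, tag in tagged if token_matches((form, tag), pos="N")]
any_n = [form for form, tag in tagged if token_matches((form, tag), pos="N.*")]
print(exact_n)  # ['compra']
print(any_n)    # ['compra', 'Lisboa']

# Word and category combined: only the noun compra (position 3) qualifies.
compra_noun = [i for i, tok in enumerate(tagged)
               if token_matches(tok, word="[cC]ompra", pos="N.*")]
print(compra_noun)  # [3]
```

One caveat follows from the same whole-tag reading: the pattern N.* would also match the tag NUM from the list above, so it is worth checking the results for numerals.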

Challenges
    Find verbs taking "de" as their preposition
    Find when you can omit the article before possessives

6.  Searching for verb forms  

The temcagr tag gives a full classification of verb forms:

PR_IND presente do indicativo
INF infinitivo
GER gerúndio
IMPF_IND imperfeito do indicativo
PCP particípio passado
IMPF_SUBJ imperfeito do conjuntivo
FUT_IND futuro do indicativo
PR_SUBJ presente do conjuntivo
FUT_SUBJ futuro do conjuntivo
COND condicional
MQP_IND mais-que-perfeito simples
PS_IND perfeito do indicativo
PS/MQP_IND perfeito ou mais-que-perfeito

Use this to find specific forms of difficult verbs.

7.  Getting distribution information about slots in a phrase

The Distribuição command will give information about the first item in your search string.  To select any other position for analysis,  place the "@" sign before it.

e.g.
[lema="dizer"] "que" [pos="V.*"]   will analyse forms of dizer
[lema="dizer"] "que" @[pos="V.*"]  will analyse the following verbs
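
The difference between the two formulas can be pictured as counting one slot or another in each hit. The Python sketch below is only an analogy: the three hits are invented, and the function distribution with its target parameter stands in for the position marked by "@".

```python
from collections import Counter

def distribution(hits, target=0):
    """Count the word forms found in one slot of each hit.

    target=0 corresponds to the default (first item of the search string);
    any other value plays the role of the "@" marker.
    """
    return Counter(hit[target] for hit in hits)

# Invented hits for  [lema="dizer"] "que" [pos="V.*"]
hits = [
    ("disse", "que", "era"),
    ("diz", "que", "vai"),
    ("disse", "que", "vai"),
]

print(distribution(hits))            # Counter({'disse': 2, 'diz': 1})
print(distribution(hits, target=2))  # Counter({'vai': 2, 'era': 1})
```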

Use this to refine your searches in 5.

SRP
1.12.09