Hi Dick,

On Thu, Aug 23, 2018 at 6:26 PM, Richard Hudson <[log in to unmask]> wrote:

Hello again Linas. Yes, I'm sure you're right about neural and symbolic networks being two sides of the same coin, if by that you mean they're like a phonological and grammatical analysis of the same sentence:

Actually no. They're much closer to each other, than that. The metaphor would be that, with neural nets, someone took webster's dictionary, and ran it through some lossy compression algorithm: what you get out is an incomprehensible collection of bytes that somehow seem to "know" what's in the dictionary, including rudimentary syntax and semantics, but you cannot understand how it works, because its been run through a mixmaster. In the paper I attached, I am trying to explain what that mixmaster is actually doing, and how we can do something similar, but better, and preserve the overt syntactic structure, and keep it readable, directly accessible.

Or perhaps this metaphor: neural nets take the dictionary, shred it, but precisely, so that entries and definitions are not destroyed. They then weave it back together into a carpet. Weave is the wrong word, though: more like making felt out of wool. You can vaguely discern that it was once a dictionary, and both the syntactic and semantic knowledge is still encoded in there; but you just can't read it.

I've started reading your draft paper, which looks potentially interesting but which I can't really understand because of my lack of relevant expertise.

OK. If you have questions, ask.

When a child learns their language,

Well, yes, of course this is an old idea. Psychologists have estimated that children hear millions of sentences (well, utterances) by the age of two -- thousands a day -- so don't underestimate the size of that corpus. And its coupled to visual and tactile feedback, which is hard to emulate with computers.

twhich makes lots of passes through a corpus, looking for cooccurrence patterns between gradually expanding n-grams: first in 2-grams, then (building on what's been learned from 2-grams) in 3-grams, and so on?

Yes, this is Deniz Yuret's PhD thesis from about 1998.

And in selecting cooccurrences you could start with individual lexemes (girl + run) before gradually grouping them into larger sets (girl+run & boy+run > girl/boy+run).

Yes, this is exactly what we are doing. For some unclear reason, it seems that no one has done this before, even though it appears to be a blatantly obvious idea. Beats me. So far, results look good. Sometimes surprising (apparently, houses are like cities, at least when you use text from Project Gutenberg.)

Incidentally, I've been doing quite a lot of work recently with the wonderful Google n-gram data, and have been pleasantly surprised to find that it provides an option of looking for dependencies; so they've got the text from millions of books annotated in terms of dependencies. Wow!

Yes. Well, I think I know how they did that -- I believe they did it with supervised training. I'm trying to describe how to do it with unsupervised training. And a big part of what I'm saying is that N-grams already contain "most" of the syntactic structure in them, just not explicitly.

Neural nets take N-grams as input, and run a mixmaster on them such that the result does NOT wreck the syntactic structure, but does encode it in an opaque, incomprehensible way. My claim is that, by taking Yuret's work, and your idea of "grouping them into larger sets", that one can accomplish the same thing that neural nets do, but this time, making the lexemes visible, overt, instead of a blended mass.

--linas

Best wishes, Dick

Second: And's remark of symbolic vs. neural-net approaches. -- These are traditionally painted as being at odds with one-another, unreconcilable. I'm strating to get the inkling that this is not really true; that they are more like two different faces of the same coin. The attached paper, still in draft form, sketches how the word2vec neural-net (gradient descent) algorithms (CBOW and SkipGram) are not all that different from a DG model of language. I wasn't attempting to formally show this; I'm actually trying to do something different, but the similarity of the two kind-of falls out "naturally", if you look at it the right way.

The result is not really "new"; there's a related result, from 10 or 20 years ago, that states that matrix factorization if clustering are "the same thing", So, matrix factorization is NN-like in that you have some big pile of impenetrable numbers, whereas with clustering, you compare items pair-wise, in a "symbolic"-flavored way (dependency grammars are built on a foundation of pairwise relationships).

So my inkling is that NN-style methods, and symbolic methods can be reconciled, in a more general setting: in two steps: HPSG and DG are freely convertible, and DG and some NN models are convertible or at least comparable. Any assistance in firming this up greatly appreciated. If I'm insanely wrong, a delicately-worded explanation of exactly how would also be appreciated.

-- Linas

On Sat, Aug 18, 2018 at 8:22 AM, And Rosta <[log in to unmask]> wrote:

On Thu, 16 Aug 2018 at 13:00, Richard Hudson <[log in to unmask]> wrote:

You may like to know that I've just uploaded (to http://dickhudson.com/papers/#2018) two chapter drafts that I've written for handbooks about other theories:

HPSG and Dependency Grammar (arguing that HPSG would be better without the PS)

Arguments against the universality of determiners and Det (arguing that even English doesn't have determiners, and offering a semantic analysis of determiners which could be used in typological comparisons)

They're both in second (and I hope final) draft, but please feel free to share comments on this list.

Here are some comments on the general thrust of the HPSG/DG paper. The paper is of course a very cogent summary of the established WG position, so the comments pertain to some of the ideas.

1. Although the question of the basic combinatorial mechanisms of syntax is important, it is far less fundamental to the overall character of the theory than certain other questions:

A. Is there a set of rules constituting a structuralist langue? Or is there just a mass of memories of usage, in which a neural network finds patterns (but not necessarily any fundamental order and coherence)? Most linguisticians (including WG & HPSG) go for the first option, but there's a growing branch of cognitive linguistics favouring the second.

B. What is the relation between syntactic structure, phonological structure and semantic structure? Are they just different sorts of information on a single structure (as in my limited understanding of Minimalism)? Are all three structures wholly distinct from each other, and related by correspondence rules, as in Jackendoff's Parallel Architecture and its concomitant Simpler Syntax? Are there just Phonology=Syntax on the one hand, and Semantics on the other, as seems to underlie traditional and naive conceptions of syntax as a structure imposed on phonological words? Or is there Phonology on the one hand and Syntax=Semantics on the other (as in my view, or as in my limited understanding of Nanosyntax)?

2. PS--DS is not a simple dichotomy.

i. Are the paper's four criteria really the key criteria?

a. Containment: "if X is directly related to Y then X must contain Y or Y must contain X". But what does "contain" mean? Is it anything more than a metaphor, whose tenor is "be superordinate to" and whose vehicle is based on box notation? Or is it based on the idea that nodes are sets? (Even if the latter, is the idea that nodes are sets really so inimical to DG? I don't see any obvious advantages or disadvantages to holding that in DG, nodes ('words') are sets, provided that the sets have properties beyond their members.)

b. Continuity. A set-based conception of PS doesn't entail continuity. (But it makes it easy to state.)

c. Asymmetry. "in DS, but not in PS, a direct relation between two items must be asymmetrical, with one depending on the other (the head of the relation)." Given (a), Containment, isn't PS asymmetrical too? That is, the mother--daughter relation is asymmetrical. The sister--sister relation is not asymmetrical, but that goes for DG too -- two dependents of the same parent are not necessarily in an asymmetrical relationship.

d. Functions. "DS, but not PS, recognises subtypes of dependency, viz the traditional grammatical functions (e.g. ‘subject’) as distinct relations." I agree it's easy to model in DS. You can model it, more clunkily, in PS provided you have unary or binary branching and either exocentric phrases or else extra nodes like 'little v'. On the other hand, Subject and Complement arguably exhaust the subtypes of dependency that are actually needed, and X-bar PS has its similar Specifier--Complement distinction.

ii. endocentricity vs exocentricity. Depending on how you view it, DS is either radically endocentric, since all the properties of the head are properties of the phrase, or radically exocentric, since none of the properties of the mother are properties of the daughter(s).

iii. Arity of branching. It strikes me that, say, the WG structure for "typical French house" is closer to strictly binary branching PS than either is to a model in which a mother node can have more than two daughter nodes. Relatedly, if dependencies can be dependents (as in an analysis I at one time advocated, and as discussed in an interesting section in Kahane & Osborne's introduction to their translation of Tesniere's Elements of Syntax where K & O give their different interpretations of a part of T's system, K's interpretation being that T held that a dependency can be a dependent) then the result is in effect a binary branching phrase. Related to this, the paper says that in "What? Him hungry?", "him" is subject of "hungry"; by the same reasoning, "Him" would be subject of "the" in "What? Him the likely winner?" and "What? Him the colour of beetroot?", but it seems to me that the referent of the entire phrase does not isa the referent of "the" plus its complement, and hence that the entire phrase is either the subject dependency or an exocentric small clause.

3. Sec 4.1 talks about the taxonomy--partonomy distinction. I don't say the distinction is trivial, but the trees have the same (or sameish?) graph-theoretic properties, and how much difference is it going to make to the analyses -- or at least the tree structure -- one comes up with for a given construction? The difference is fundamental to the basic mechanisms of syntax, but not necessarily to choosing a broad area of theory-space, where more important considerations take priority.

4a. Coordination is not symmetric: the relationship between conjunct and conjunction is asymmetric, and within WG I long ago argued that the conjunction is the head of the coordination.

4b. I don't see how treating conjuncts as mere lists of words serves to build the compositional meaning of the coordination.

4c. Apparent cases of conjuncts such that not all parts of the conjunct belong to the same phrase (-- are not a 'catena' in the Osborne--Gross sense) are indeed a problem, but not an insoluble one (-- in some cases, being a conjunct does diagnose phrasehood; in other RNR-type cases (like example 9) it can be demonstrated that there's ellipsis-like shenanigans going on).

5. Mutual dependency. It is true that DS, in arrow notation, but not PS, allows for mutual dependency. But in "who lives nearby", "lives" is nothing like a complement of "who". If it were, then "whose mother's brother lives nearby" should instead be "who lives nearby's mother's brother". In, say, "I wonder which person lives nearby", the true complement of "which" will always precede the pseudocomplement. Rather, what constructions like this show us is that the superordinate--subordinate relationship is distinct from the arrowtail--arrowhead relationship. In "who lives nearby", "who" is superordinate and arrowhead and "lives" is subordinate and arrowtail, whereas in "She lives nearby", "she" is subordinate and arrowhead and "lives" is superordinate and arrowtail. The obvious way to diagram this in DS would be to use a stemma in which the branches bear arrowheads and arrowtails: sometimes the branches would point upwards and sometimes downwards. So we don't want a model that allows mutual dependency.

To be more precise, when there is a complement dependency, the arrowhead is always subordinate. It's only in the case of the subject/extractee relation that you get the dissociability of subordination and arrow direction.

--And.

To unsubscribe from the WORDGRAMMAR list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=WORDGRAMMAR&A=1

--

cassette tapes - analog TV - film cameras - you

To unsubscribe from the WORDGRAMMAR list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=WORDGRAMMAR&A=1

-- Richard Hudson (dickhudson.com)

Virus-free. www.avg.com
To unsubscribe from the WORDGRAMMAR list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=WORDGRAMMAR&A=1

cassette tapes - analog TV - film cameras - you