Showing papers by "Jakob Uszkoreit" published in 2012

Open Access

Proceedings Article

Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

Oscar Täckström¹, Ryan McDonald², Jakob Uszkoreit²

Institutions: Uppsala University¹, Google²

03 Jun 2012

TL;DR: It is shown that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%.

Abstract: It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
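
The "relative error" reductions quoted in the abstract can be made concrete with a small sketch. The accuracy values below are hypothetical and are not results from the paper; the function name is chosen only for illustration.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the new system removes."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0:
        raise ValueError("baseline makes no errors, so there is nothing to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical accuracies, not taken from the paper:
# a parser improving from 60% to 65% accuracy removes 5 of its 40 error points.
reduction = relative_error_reduction(0.60, 0.65)
print(f"{reduction:.1%}")
```

Under this definition, a modest absolute gain can correspond to a much larger relative error reduction when the baseline error is small.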

268 citations

Patent

Virtual participant-based real-time translation and transcription system for audio and video teleconferences

Jakob Uszkoreit¹, Ashish Venugopal¹, Johan Schalkwyk¹, Joshua James Estelle¹

Institutions: Google¹

30 Apr 2012

TL;DR: This patent describes a teleconferencing system that uses a virtual participant processor to translate the language content of the teleconference into each participant's spoken language without additional user inputs.

Abstract: The present disclosure describes a teleconferencing system that may use a virtual participant processor to translate language content of the teleconference into each participant's spoken language without additional user inputs. The virtual participant processor may connect to the teleconference as the other participants do. All text or audio data that was previously exchanged directly between the participants may now be intercepted by the virtual participant processor. Upon obtaining a partial or complete language recognition result or making a language preference determination, the virtual participant processor may call a translation engine appropriate for each of the participants. The virtual participant processor may send the resulting translation to a teleconference management processor. The teleconference management processor may deliver the respective translated text or audio data to the appropriate participant.
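
The flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the patented design: `Participant`, `VirtualParticipant`, `on_utterance`, and `toy_translate` are all hypothetical names, the translation engine is a stand-in that only tags the text, and delivery by the teleconference management processor is modeled simply as the returned mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Participant:
    name: str
    language: str  # preferred language code, e.g. "en"


# Hypothetical translation-engine signature: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real translation engine; it only tags the text."""
    return text if source == target else f"[{source}->{target}] {text}"


class VirtualParticipant:
    """Joins the call like any other participant and intercepts utterances."""

    def __init__(self, participants: List[Participant], translate: Translator):
        self.participants = {p.name: p for p in participants}
        self.translate = translate

    def on_utterance(self, speaker_name: str, text: str) -> Dict[str, str]:
        """Return one translated message per listener, keyed by listener name."""
        speaker = self.participants[speaker_name]
        return {
            listener.name: self.translate(text, speaker.language, listener.language)
            for listener in self.participants.values()
            if listener.name != speaker_name
        }


call = VirtualParticipant(
    [Participant("Ana", "en"), Participant("Bea", "de"), Participant("Cy", "en")],
    toy_translate,
)
outgoing = call.on_utterance("Ana", "hello")
print(outgoing)
```

Keeping the translation engine behind a plain callable mirrors the abstract's point that a different engine may be called for each participant.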

21 citations

Proceedings Article

A Feature-Rich Constituent Context Model for Grammar Induction

Dave Golland¹, John DeNero², Jakob Uszkoreit²

Institutions: University of California, Berkeley¹, Google²

08 Jul 2012

TL;DR: LLCCM, a log-linear variant of the constituent context model (CCM), retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, it outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.
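
For readers unfamiliar with the metric and baseline named in the abstract, here is a minimal sketch of unlabeled bracketing F1 and a right-branching baseline. It assumes brackets are represented as half-open `(start, end)` token spans and computes the score for a single sentence; published evaluations differ in details such as whether trivial spans are counted and how scores are aggregated over a corpus. The gold tree is a made-up example.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # half-open token span (start, end)


def right_branching_spans(n: int) -> Set[Span]:
    """Brackets of a fully right-branching tree over n tokens, e.g. (a (b (c d)))."""
    return {(i, n) for i in range(n - 1)}


def bracketing_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Harmonic mean of bracket precision and recall for one sentence."""
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up gold tree ((a b) (c d)) over four tokens.
gold = {(0, 4), (0, 2), (2, 4)}
baseline = right_branching_spans(4)  # {(0, 4), (1, 4), (2, 4)}
score = bracketing_f1(baseline, gold)
print(round(score, 3))
```

Here the baseline shares two of its three brackets with the gold tree, so precision, recall, and F1 all come out to two thirds.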

15 citations