Current CSLU Research Projects
CSLU conducts a wide range of research projects, including
projects focused on core speech processing and natural language
processing algorithms (Technology Research Projects) and projects
focused on biomedical applications (Biomedical Research Projects),
specifically on creation of diagnostic, remedial, and assistive methods
for neurodevelopmental and neurodegenerative disorders and diseases.
TECHNOLOGY RESEARCH PROJECTS
BIOMEDICAL RESEARCH PROJECTS
- Autism Projects
- Other Neurodevelopmental Disorders
- Neurodegenerative Disorders
I. TECHNOLOGY RESEARCH PROJECTS
[Richard Sproat and Julia Hirschberg [Columbia U.], PI's; Owen Rambow
[Columbia U.], co-PI; NSF]. The researchers are developing new
theoretical models and
technology to automatically convert descriptive text into 3D scenes
representing the text's meaning. They do this via the Scenario-Based
Lexical Knowledge Resource (SBLR), a resource they are creating from
existing sources (PropBank, WordNet, FrameNet) and from automated
mining of Wikipedia and other un-annotated text. In addition to
predicate-argument structure and semantic roles, the SBLR includes
necessary roles, typical role fillers, contextual elements, and
activity poses, which together enable analysis of input sentences at a deep
level and assembly of appropriate elements from libraries of 3D objects
to depict the fuller scene implied by a sentence. For example, "Terry
ate breakfast" does not tell us where (kitchen, dining room,
restaurant) or what he ate (cereal, doughnut, or rice, umeboshi, and
natto). These elements must be supplied from knowledge about typical
role fillers appropriate for the information that is specified in the
input. Note that the SBLR has a component that varies by cultural
context.
Textually-generated 3D scenes will have a profound,
paradigm-shifting effect on human-computer interaction, giving people
unskilled in graphical design the ability to directly express
intentions and constraints in natural language -- bypassing standard
low-level direct-manipulation techniques. This research will open up
the world of 3D scene creation to a much larger group of people and a
much wider set of applications. In particular, the research will target
middle-school age students who need to improve their communicative
skills, including those whose first language is not English or who have
learning difficulties: a field study in a New York after-school program
will test whether use of the system can improve literacy skills. The
technology also has the potential for interesting a more diverse
population in computer science at an early age, as interactions with
K-12 teachers have indicated.
- Efficient hidden
structure annotation via structural multiple-sequence alignments
[Brian Roark,
PI; NSF]. The focus of this project is to develop finite-state
syntactic processing models for natural language that use features
encoding global structural constraints derived through multiple
sequence alignment (MSA) techniques, to significantly improve accuracy
without expensive context-free inference. MSAs are widely used in
computational biology for building finite-state models that capture
long-distance dependencies in sequences (e.g., in RNA secondary
structure). Given a large set of functionally aligned sequences in MSA
format, finite-state models can be constructed that allow for the
efficient alignment of new sequences with the given MSA. In natural
language processing (NLP), only very rarely have MSA techniques been
used, and then to characterize phonetic or semantic similarity. This
project is exploring the definition of a purely syntactic functional
alignment between semantically unrelated strings from the same
language, to define a structural MSA for constructing finite-state
syntactic models. The project has two specific aims. The first aim is
to develop natural language sequence processing algorithms and models
that can: a) define sequence alignments with respect to syntactic
function; b) build structural MSAs based on defined functional
alignments; c) derive finite-state models to efficiently align new
sequences with the built MSA; and d) extract features from an alignment
with the MSA for improved sequence modeling. The second aim is to
empirically validate this approach within a number of large-scale text
processing applications in multiple domains and languages. The
resulting algorithms are expected to provide improved finite-state
natural language models that will contribute to the state-of-the-art in
critical text processing applications.
- Discriminative
Syntactic Language Modeling:
Automatic Feature Selection and Efficient Annotation
[Brian Roark,
PI; NSF].
The focus of this NSF-funded project is on the effective
use of parser-derived and tagger-derived features within discriminative
approaches to language modeling for automatic speech recognition.
Discriminative language modeling approaches provide a tremendous amount
of flexibility in defining features, but the size of the potential
parser-derived feature space requires efficient feature annotation and
selection algorithms. The project has four specific aims. The first aim
is to develop a set of efficient, general, and scalable syntactic
feature selection algorithms for use with various kinds of annotation
and several parameter estimation techniques. The second aim is to
develop general tree and grammar transformation algorithms designed to
preserve selected feature annotations yet lead to faster parsing or
even tagging approximations to parsing. The third aim is to evaluate a
broad range of feature selection and grammar transformation approaches
on a large vocabulary continuous speech recognition (LVCSR) task,
namely Switchboard. The final aim is to design and package the
algorithms to straightforwardly support future research into other
applications, such as machine translation (MT); and into other
languages, such as Chinese and Arabic. The algorithms developed as a
part of this project are expected to contribute to improvements in
LVCSR accuracy and applications that rely upon this technology. The
algorithms are being packaged into a publicly available software
library, enabling researchers working in many application areas --
including LVCSR and MT -- and various languages to investigate best
practices in syntactic language modeling for their specific task,
without having to hand-select and evaluate feature sets.
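A discriminative language model of this kind can be illustrated with a perceptron that reranks an n-best list from a recognizer. The following is only a minimal sketch under stated assumptions: word bigrams stand in for the parser-derived and tagger-derived features, the hypotheses and baseline scores are invented for illustration, and all function names are hypothetical rather than part of the project's software library.

```python
from collections import Counter

def features(hypothesis):
    """Bigram counts; a stand-in for parser- or tagger-derived features."""
    words = ["<s>"] + hypothesis.split() + ["</s>"]
    return Counter(zip(words, words[1:]))

def score(weights, text, baseline):
    """Baseline recognizer score plus the weighted feature counts."""
    return baseline + sum(weights.get(f, 0.0) * n for f, n in features(text).items())

def rerank(weights, hypotheses):
    """Index of the highest-scoring (text, baseline_score) pair."""
    return max(range(len(hypotheses)), key=lambda i: score(weights, *hypotheses[i]))

def train_perceptron(nbest_lists, epochs=5):
    """Perceptron updates: reward the oracle's features, penalize the wrong guess's."""
    weights = {}
    for _ in range(epochs):
        for hypotheses, oracle in nbest_lists:
            guess = rerank(weights, hypotheses)
            if guess != oracle:
                for f, n in features(hypotheses[oracle][0]).items():
                    weights[f] = weights.get(f, 0.0) + n
                for f, n in features(hypotheses[guess][0]).items():
                    weights[f] = weights.get(f, 0.0) - n
    return weights

# Invented toy n-best list: the baseline prefers the wrong hypothesis (index 1).
nbest = [([("recognize speech", -2.0), ("wreck a nice beach", -1.0)], 0)]
weights = train_perceptron(nbest)
best_index = rerank(weights, nbest[0][0])
```

Because every distinct feature gets its own weight, the feature space grows quickly once syntactic features are added, which is why the project emphasizes automatic feature selection.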
- Learning
Mixed-Initiative Dialogue Strategies
[Peter Heeman, PI; NSF].
This research project aims to enable next-generation dialogue systems to
collaborate with a user without the limitations of
system-initiative interaction, in order to solve complex tasks in an
optimal manner. The research develops reinforcement learning (RL)
strategies to learn dialogue policies that are mixed-initiative. The
specific aims are to (a) extend RL to mixed-initiative dialogue
interaction; (b) allow the system policy to adapt to different user
types, such as people with poor memory, or poor problem-solving skills;
and (c) simultaneously learn the policy for the simulated user.
This approach will allow more advanced dialogue
systems to be deployed, such as assisting the elderly so they can live
independently longer, and helping provide health care information to
rural areas. The proposed research project will result in a toolkit
that will allow a wide range of users to easily develop dialogue
policies. The toolkit will (a) allow students to be effectively trained
in this area, (b) lower the barrier for other researchers to contribute
to the field, and (c) help transfer this new technology to industry.
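As an illustration of learning a dialogue policy with RL, the sketch below applies tabular Q-learning to a toy slot-filling task. Everything in it is an assumption made for illustration: the two actions, the two user types, the hand-written simulated user, the cost of one unit per turn, and the simplification that the user type is directly observable. It does not cover aim (c), learning the simulated user's policy.

```python
import random

ACTIONS = ["ask_one_slot", "open_prompt"]   # system-initiative vs. handing over initiative
USER_TYPES = ["expert", "novice"]
SLOTS_NEEDED = 2

def simulated_user(user_type, filled, action):
    """Hypothetical user model: number of slots filled after the turn."""
    if action == "ask_one_slot":
        return filled + 1          # every user answers a direct question
    if user_type == "expert":
        return SLOTS_NEEDED        # expert takes the initiative and supplies everything
    return filled                  # novice does not know what to say

def learn_policy(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(u, f, a): 0.0 for u in USER_TYPES for f in range(SLOTS_NEEDED) for a in ACTIONS}
    for _ in range(episodes):
        user_type, filled = rng.choice(USER_TYPES), 0
        for _turn in range(20):    # safety cap on dialogue length
            if filled >= SLOTS_NEEDED:
                break
            if rng.random() < epsilon:   # explore
                action = rng.choice(ACTIONS)
            else:                        # exploit the current estimate
                action = max(ACTIONS, key=lambda a: q[(user_type, filled, a)])
            new_filled = simulated_user(user_type, filled, action)
            if new_filled >= SLOTS_NEEDED:
                future = 0.0
            else:
                future = max(q[(user_type, new_filled, a)] for a in ACTIONS)
            target = -1.0 + gamma * future   # each turn costs one unit
            q[(user_type, filled, action)] += alpha * (target - q[(user_type, filled, action)])
            filled = new_filled
    return {(u, f): max(ACTIONS, key=lambda a: q[(u, f, a)])
            for u in USER_TYPES for f in range(SLOTS_NEEDED)}

policy = learn_policy()
```

In this toy setting the learned policy opens with a free prompt for the expert user and with a direct question for the novice, which is the kind of adaptation to user type that aim (b) describes.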
- High-Quality
Compression, Enhancement, and Personalization of Text-to-Speech Voices
[Alexander Kain, PI; Todd Leen, co-PI; NSF]. The vast variability of
the human speech signal
remains a central challenge for Text-to-Speech (TTS) systems. The
objective of this research is to develop TTS technologies that focus on
elimination of concatenation errors, and accurate speech modifications
in the areas of coarticulation, degree of articulation, prosodic
effects, and speaker characteristics. The investigators are exploring
an asynchronous interpolation model (AIM), which promises to provide
for high-quality and flexible TTS. The core idea of AIM is to represent
a short region of speech as a composition of several types of features
called streams. Each stream is computed by asynchronous interpolation
of basis vectors. Each basis vector is associated with a
particular phoneme, allophone, or more specialized unit. Thus, the
speech region is described by the varying degrees of influence of
several types of preceding and following acoustic features. Using AIM,
the investigators are also developing methods to optimally compress the
acoustic inventories of TTS systems, given a size or a quality
constraint, and to adapt the system to a new voice, given a few
training samples. The system being researched forms a hybrid between
traditional concatenative and formant-based synthesis, having
advantages of both, resulting in a high-quality, optimized TTS system
with voice adaptation capabilities. TTS has generally recognized
societal benefits for universal access, education, and information
access by voice. Our research will make it possible, for example, to
build personalized TTS systems for individuals with speech disorders
who can only intermittently produce normal speech sounds.
- Multi-Threaded
Dialogues For Real-Time
Applications
[Peter Heeman, PI; NSF].
The goal of this NSF-funded project is to create a speech interface that
supports a user in interacting with multiple real-time devices at the
same time, where the interaction with each device is a separate
dialogue thread. The first aim is to show, using a human-computer
study, that the simple way to implement a speech interface for managing
multiple threads is not effective. The second aim is to run a
human-human study to show that people can inherently manage multiple
dialogue threads, and to determine what conventions they use. The third
aim is to build a speech interface that implements the conventions that
were found.
The main impact of this work is the development of a
model that accounts for how people deal with multi-threaded dialogues.
This model will be demonstrated in an implemented speech interface.
This work will create a technology that will be useful in interacting
with the pervasive electronic devices that we can expect to see in the
future.
- Small
Footprint Speech Synthesis
This NSF Small Business Technology Transfer Phase I project is led by
Alexander Kain at BioSpeech Inc., a CSLU startup, and Jan van Santen.
The project aims to develop and implement a new algorithm in the
area of text-to-speech synthesis (TTS) that will lead to (i) dramatic
decreases in disk and memory requirements at a given speech quality
level and (ii) minimization of the amount of voice recordings needed to
create a new synthetic voice. Most current TTS systems operate by
concatenating segments of recorded speech (acoustic units). A
challenge for TTS is coarticulation: the dependency of the acoustic
manifestations of a phoneme on its neighbors. Current TTS systems use
multi-phone acoustic units such as diphones, which preserve
coarticulatory patterns naturally present in speech. However, this
approach requires a large amount of recordings and generates systems
with large footprints. BioSpeech proposes a uniphone approach that
addresses coarticulation processes with an explicit model. The method
uses complex spectral vectors (basis vectors) representing brief
segments of speech inside single phonemes, and decomposes these into
two components: a formant vector and a spectral balance vector. To
generate speech, the formant and spectral balance vectors derived from
the basis vectors corresponding to successive phonemes are subjected to
separate--and hence generally asynchronous--interpolation operations
using time varying weights; the formant and spectral balance vector
trajectories thus created are re-combined to create a trajectory in
complex spectral space; finally, this trajectory is converted into
output speech with the inverse Fourier transform. Asynchronicity is
necessitated by the quasi-independence of articulators underlying
different spectral features (e.g., frication, formant frequencies). The
proposed work has implications for other speech technologies, including
Automatic Speech Recognition (ASR). Current ASR technologies address
coarticulation by using multi-phone units, typically triphones. English
has over 70,000 triphones, so training requires a large amount of
recordings. The proposed model could dramatically reduce the amount of
recordings required for system training. Second, TTS has generally
recognized societal benefits for
universal access, education, and information access by voice. For
example, TTS-based augmentative devices are available for individuals
who have lost their voice; and reading machines for the blind have been
available for several decades. Third, the approach will make
higher-quality TTS more available for smaller devices. For example,
voice-based caller ID on low-end mobile telephones is currently not
possible due to memory limitations. Fourth, it enables voice adaptation
with a minimum of recordings. This will enable building personalized
TTS systems for individuals with speech disorders who can only
intermittently produce normal speech sounds or for individuals who are
about to undergo surgery that will irreversibly alter their speech. The
method proffered by BioSpeech only requires recordings of valid samples
of each of (fewer than 50) phonemes instead of each of (2000 or more)
diphones.
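The asynchronous interpolation step described above can be sketched as follows. This is a minimal illustration, not BioSpeech's algorithm: the linear ramp weights, the transition windows, the vector values, and the function names are all assumptions, and the recombination into complex spectral space and the inverse Fourier transform are omitted.

```python
def lerp(a, b, w):
    """Linear interpolation between vectors a and b; w=0 gives a, w=1 gives b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]

def ramp(t, start, end):
    """Time-varying weight: 0 before start, 1 after end, linear in between."""
    return min(1.0, max(0.0, (t - start) / (end - start)))

def interpolate_streams(prev, nxt, times, formant_window, balance_window):
    """Interpolate each stream with its own weight function, hence asynchronously."""
    trajectory = []
    for t in times:
        wf = ramp(t, *formant_window)    # weight for the formant stream
        wb = ramp(t, *balance_window)    # weight for the spectral balance stream
        trajectory.append({
            "formant": lerp(prev["formant"], nxt["formant"], wf),
            "balance": lerp(prev["balance"], nxt["balance"], wb),
        })
    return trajectory

# Invented basis-vector components for two successive phonemes.
prev_phoneme = {"formant": [700.0, 1200.0], "balance": [0.0]}
next_phoneme = {"formant": [300.0, 2300.0], "balance": [1.0]}
times = [0.0, 0.25, 0.5, 0.75, 1.0]

# The formant transition runs early, the spectral balance transition runs late.
traj = interpolate_streams(prev_phoneme, next_phoneme, times,
                           formant_window=(0.0, 0.5), balance_window=(0.5, 1.0))
```

At the second time step the formant stream is already halfway to the next phoneme while the spectral balance stream has not yet started to move, which is the asynchrony the model relies on.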
- Objective
Methods for Predicting and Optimizing Synthetic Speech Quality
[Jan van Santen, PI]. This NSF-funded project focuses on how humans
perceive acoustic
discontinuities in speech. Current text-to-speech synthesis ("TTS")
technology operates by retrieving intervals of stored digitized
speech ("units") from a database and splicing ("concatenating")
them to form the output utterance. Unavoidably, there are acoustic
discontinuities at the time points where the successive speech
intervals meet. An unsolved problem is how to predict from the
quantitative, acoustic properties of two to-be-concatenated units
whether humans will hear a discontinuity. This is of immediate
relevance for TTS systems that select units at run time from a large
speech corpus. During selection, the systems search through the space
of all possible sequences of units that can be used for the utterance
and select the sequence that has the lowest overall objective cost
measure, such as the Euclidean distance between the final frame and
initial frame of two units. However, research has already shown that
this method and related methods do not predict well whether humans will
hear a discontinuity. The current research, by being explicitly focused
on perceptually optimized objective cost measures, will directly
contribute to the perceptual accuracy of cost measures and hence to
synthesis quality.
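The baseline selection procedure described above, which the project seeks to improve on, can be sketched as a dynamic-programming search over candidate units. The sketch assumes that each unit is a list of acoustic frames, that the only cost is the Euclidean distance between the final frame of one unit and the initial frame of the next, and that the frame values are invented; the function names are hypothetical.

```python
import math

def join_cost(left_unit, right_unit):
    """Euclidean distance between the last frame of one unit and the first of the next."""
    return math.dist(left_unit[-1], right_unit[0])

def select_units(candidates):
    """Choose one candidate per slot so that the summed join cost is lowest."""
    # best[j] holds (cost, path) for the cheapest sequence ending in candidate j.
    best = [(0.0, [j]) for j in range(len(candidates[0]))]
    for slot in range(1, len(candidates)):
        new_best = []
        for j, unit in enumerate(candidates[slot]):
            cost, path = min(
                (prev_cost + join_cost(candidates[slot - 1][prev_path[-1]], unit), prev_path)
                for prev_cost, prev_path in best
            )
            new_best.append((cost, path + [j]))
        best = new_best
    return min(best)

# Three slots with two candidate units each; every unit has two 2-D frames.
candidates = [
    [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (3.0, 3.0)]],
    [[(1.0, 1.5), (2.0, 2.0)], [(3.0, 3.0), (5.0, 5.0)]],
    [[(2.0, 2.0), (0.0, 0.0)], [(9.0, 9.0), (0.0, 0.0)]],
]
total_cost, chosen = select_units(candidates)
```

A perceptually optimized cost measure would replace `join_cost` while leaving the search itself unchanged.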
- Prosody
Generation for Child Oriented Speech Synthesis
[Jan van Santen, PI]. This NSF-funded project [joint with Alan Black at
Carnegie Mellon University and Richard Sproat at the University of
Illinois at Urbana-Champaign / now at CSLU] focuses
on innovative algorithms for generating highly expressive synthetic
speech. Generating expressive speech involves three hard research
problems. (i) Computation of abstract tags that specify, e.g., which
words need emphasis, and phrasing (e.g., where to pause). (ii) Based on
these tags, the system has to compute a fundamental frequency contour.
(iii) Severe modification of the stored speech fragments
("acoustic units") to obtain these contours. The central goal of the
project is to address these research problems, and create a TTS system
that will make the next generation of TTS-based language
remediation systems viable.
- Creating
the Next Generation of Intelligent Animated Conversational Agents
The goal of this NSF-funded project [Jan van Santen, co-PI; Ron Cole (PI)
at the University of Colorado and Javier Movellan, co-PI, at the
University of California at San Diego] is to
improve reading achievement of children with reading problems by
designing computer-based interactive reading tutors that incorporate
new speech and language technologies. The reading tutors will help
English- and Spanish-speaking children learn to read by providing
classroom teachers and reading specialists with tools to instruct and
exercise the set of auditory, visual and linguistic skills needed to
read: speech discrimination, speech production, phonological awareness,
sound-to-letter mappings, vocabulary, fluency and comprehension. The
tutors will be designed, tested and refined in collaboration with
reading specialists and instructional designers, and tested with
children in special education programs in elementary schools in Boulder,
Colorado.
II. BIOMEDICAL RESEARCH PROJECTS
AUTISM
- Expressive
and Receptive Prosody in Autism
This NIH-supported project, led by Jan van Santen and Lois Black in
collaboration with Rhea Paul and Fred Volkmar at Yale's Child Study
Center and Larry Shriberg at the University of Wisconsin's Waisman Center,
focuses on automated technologies for assessment of prosodic ability in
autism. Autism Spectrum Disorders (ASD) form a group of
neuropsychiatric conditions whose core behavioral features include
impairments in reciprocal social interaction and in communication, as
well as repetitive, stereotyped, or restricted interests and behaviors. The
importance of prosodic deficits in the adaptive communicative
competence of speakers with ASD, as well as for a fuller understanding
of the social disabilities central to these disorders, is generally
recognized; yet current studies are few in number and have significant
methodological limitations. The objective of the proposed project is to
detail prosodic deficits in young speakers with ASD through a series of
experiments that address these disabilities and related areas of
function. Key features of the project include: 1) the application of
innovative technology. The study will apply computer-based speech and
language technologies for quantifying expressive prosody, for computing
dialogue structure, and for generating acoustically controlled speech
stimuli for measuring receptive prosody; moreover, all experiments will
be delivered via computer to ensure consistency of stimuli and accuracy
of recording responses; 2) broad coverage of the dimensions of prosody.
All three functions of prosody (grammatical, pragmatic, and affective)
will be addressed; expressive and receptive tasks are included; and
both contextualized tasks (dialogue, story comprehension and memory)
and decontextualized tasks (e.g., vocal affect recognition) will be
used; 3) inclusion of neuropsychological assessment and classification
methodologies to address within-group heterogeneity and obtain a
detailed characterization of the groups; 4) inclusion of two comparison
groups: children with typical development and those with Developmental
Language Disorder; 5) inclusion of an experimental treatment program to
enhance the prosodic abilities of speakers with ASD. A student
fellowship for this project is supported by Autism
Speaks. The software architecture is designed and implemented
by Senior Programmer Jacques de Villiers.
- Expressive crossmodal affect integration in
autism
[Lois Black, PI; Jan van Santen, Alexander Kain, Esther Klabbers,
Zak Shafran, Investigators; NIH]. Children with autism spectrum disorder (ASD) have
often been observed to express affect either weakly, only in one
modality at a time (e.g., choice of words) or in multiple modalities
but not in a coordinated fashion. These difficulties in crossmodal
integration of affect expression may have roots in certain global
characteristics of brain structure in autism, specifically atypical
interconnectivity between brain areas. Poor crossmodal integration of
affect expression may also play a critical role in communication
difficulties that are well documented in ASD. Not understanding how,
e.g., facial expression can be used to modify the interpretation of
words undermines social reciprocity. Impairment in crossmodal
integration of affect is thus a potentially powerful explanatory
concept in ASD.
The study will provide much needed data on expressive
crossmodal integration impairment in ASD and its association with
receptive crossmodal integration impairment, using innovative
technologies to create stimuli for a judgmental procedure that makes
possible independent assessment of the individual modalities. These
technologies are critical because human observers are not able to
selectively filter out modalities. In addition, the vocal measures and
the audiovisual database lay the essential groundwork for the next
step: creation of audiovisual analysis methods for automated assessment
of expressive crossmodal integration.
These methods will be applied to
audio-visual recordings of a structured play situation; the child will
participate in this play situation twice, once with an examiner and
once with a caregiver. This procedure for measuring expressive
crossmodal integration will be complemented by a procedure for
measuring crossmodal integration of affect processing using dynamic
talking-face stimuli in which the audio and video stream are recombined
(preserving perfect synchrony of the facial and vocal channels) to
create stimuli with congruent vs. incongruent affect expression. Both
procedures will be applied to three groups: children with ASD, children
with Developmental Language Disorder (DLD), and typically developing
children; ages will be six to ten.
Our study would be the first to
perform a comprehensive analysis of crossmodal integration of affect
expression in ASD. If the study confirms the existence of these
impairments in ASD, and provides a detailed picture of these
impairments, this could (i) guide brain studies to specifically target
areas responsible for affect expression; (ii) provide a deeper
understanding of impairments in social reciprocity; and (iii) help
design remedial programs for intensive training of under-used or
uncoordinated expressive modalities. The study thus contributes to
etiology, diagnosis, and treatment.
- Automatic detection of atypical patterns in
crossmodal affect
[Jan van Santen,
PI; Lois Black, Alexander Kain, and Zak Shafran, Co-PI's; NSF].
The expression of affect in face-to-face situations requires the ability to
generate a complex, coordinated, crossmodal affective signal, in
gesture, facial expression, vocal prosody, and language content
modalities. This ability is compromised in certain neurological
disorders (e.g., Parkinson's disease, autism spectrum disorder
(ASD)).
Our long-term goal is to build interactive, agent-based systems for
remediation of poor affect communication and diagnosis of the
underlying neurological disorders based on analysis of affective
signals. A requirement for such systems is technology to detect
atypical patterns in affective signals. Our immediate-term research
objective is to develop this technology. Specific aims are (i) to
collect and annotate audio-visual data in a play situation designed for
eliciting affect from children with typical development (TD) and
children with ASD; and (ii) to develop algorithms for the analysis of
affective incongruity and evaluate their TD vs. ASD differentiation
ability.
- A Computerized Interactive Game for Remediation
of Prosody in Children with Autism
[Lois Black; Autism
Speaks]. The proposed
project focuses on computer-assisted remediation of expressive and
receptive prosody in children with autism spectrum disorders (ASD).
Prosody refers to loudness, pitch, timing, melody, and other aspects of
speech that illuminate the different meanings of what is verbally
communicated. Prosody plays a critical role in an individual's
communicative competence and social emotional reciprocity. The ability
to appropriately understand and express prosody may, in fact, be an
integral part of the theory of mind deficits considered central to ASD,
and play a role in the reported lack of ability to make inferences
about others' intentions, desires, and feelings. Yet, prosody is an
under-explored feature of ASD, both in diagnosis and intervention. The
goal of the proposed study is to create a new computer-assisted prosody
remediation program and evaluate its efficacy in improving expressive
and receptive prosody in children with ASD who have known prosody
impairments. The program consists of an interactive, computerized
"drama book" that contains a collection of videotaped social scenarios,
each consisting of a series of interrelated scenes dramatically enacted
by child and adult actors. A scenario will open with one scene, and the
next scenes will occur based on what and how the child with ASD,
speaking on behalf of an on-screen targeted child, communicates to the
other characters. Therapist-assisted, the child with ASD will be able
to practice different things to say -- and so experience how both what is
said and how it is said can change the course of events.
- Detection of Autism
in Infants
[Jan van Santen, Lois Black,
Co-PI's; the OHSU Foundation]. The project goal is to
develop, demonstrate, and validate an automated, objective system for
detecting early warning signs of autism in infants. The approach
is non-invasive and uses an in-home system comprising low-cost,
off-the-shelf equipment in the form of microphones, video cameras, and
accelerometers. Data generated by the system
are transmitted via internet protocol to a central processing facility
where innovative algorithms -- which will be the core contribution of
the proposed study -- extract diagnostic profiles. Unlike current
diagnosis and detection of autism, which rely on behavioral
assessment and subjective clinical judgment along with parent
questionnaires, these diagnostic profiles are objective and based on
sophisticated computer analysis of voice and movement patterns and
hence are expected to be more reliable, accurate, and information-rich.
The prototype system exemplifies an exciting new
telemedicine model that may be applicable to a broad range of
neurodevelopmental disorders in addition to autism (e.g., ADHD, child
bipolar disorder, ...) as well as neurodegenerative disorders
(e.g., Parkinson's, Alzheimer's, ALS, ...). By replacing
expensive direct clinical
observation with automated data collection, and by providing the
experts with highly informative and accurate diagnostic profiles,
significant cost savings and simultaneous increases in diagnostic
accuracy and accessibility can be expected.
Equally exciting about this project is the wealth of
data and the powerful algorithms it will create, which will provide
leverage for several future research studies on autism that in turn
will lead to new generations of methods for diagnosis and intervention.
- In Your Own Voice: Personal Augmentative and
Alternative Communication Voices for Minimally Verbal Children with
Autism Spectrum Disorders
[Jan van Santen, Lois Black, PI's; Alexander Kain, Esther Klabbers,
Investigators; Nancy Lurie Marks Family Foundation]. Many children with
autism who have limited
verbal abilities use Augmentative and Alternative Communication (AAC)
devices to help them communicate with others. Often, these devices
produce speech output. Necessarily, the voice of such a system does not
resemble in any way the voice of the child who uses the system. This
project is for children who have at least some speech capability, such
as saying a few isolated words. The investigators will develop
technology that performs a voice transplant of the child's natural
voice onto the AAC device, so that the device's voice will sound like
the child. The investigators hypothesize that an AAC device with a
personalized voice that mimics the child's voice will psychologically
reinforce powerful motivational factors and a sense of ownership for
communication so that the frequency and richness of AAC use, and its
acceptance by family members and friends, will be enhanced. In
addition, as a tool for improving a child's speech capabilities, a
system that speaks with a voice similar to the child's own voice is
likely to be more effective than a system that speaks with a default
synthetic voice because the computer provides a model that is closer to
the child's speech and hence is easier to emulate by the child. To
create the system, the investigators will build on the most recent
voice transformation, speech synthesis, and other speech technologies
that have been developed by our group.
- Automated
Measurement of Dialogue Structure in
Autism
[Brian Roark, Lois Black, Jan van Santen; Autism Speaks]. This
project seeks to bring the power of machine-based sensing and
computation to improve the study of speech patterns in individuals with
autism. By combining technologies stemming from natural language
processing methods and prosodic analysis methods, the study expects to
find aspects of speech that could be used as clinical markers. Current
manual measures of narrative coherence are not only difficult and
extremely time-consuming to obtain; it is also unclear whether a
human coder can detect the statistical degree of semantic
similarity and organization that a machine can. This research will
analyze recordings
being collected from two narrative recall tests that have the potential
to uncover a wider range of speech differences between ASD and others.
The hope is that this will clinically define children with ASD relative
to typically developing children and differentiate ASD from other
groups who also have communication impairments, i.e., children with
developmental language delay (DLD), as well as differentiate speech
characteristics or markers that might better discriminate subtypes
within the ASD umbrella (e.g., HFA vs. Asperger's). We expect that
speech and language technologies will not only make critical diagnostic
speech features easier to document but also may actually uncover
distinguishing speech features in autism and autistic subtypes that
have previously gone undetected.
- ERP Based Communication Device for Nonverbal
Children on the Autism Spectrum
[Deniz Erdogmus, Lois Black, PI's; Brian Roark, Jan van Santen,
Investigators; Nancy Lurie Marks Family Foundation]. Children with
Autism Spectrum Disorders
(ASD) exhibit varying levels of communication abilities. In this
project, the investigators will address the communication needs of the
subset that: 1) lack expressive speech and language; 2) lack the ability to
operate a keyboard, pointing device, or other typical assistive
interface; and 3) are assumed to have adequate cognition, literacy, and
receptive language understanding. This research aims to develop a
communication system for such children. Resulting technology could also
benefit other children and adults with adequate cognition but limited
communication options. The investigators will develop an assistive
communication facilitation device referred to as the RSVP Keyboard. It
unites three technologies: 1) Rapid serial visual presentation (RSVP,
with individually adjustable presentation rates) of
letters/words/phrases; 2) a yes/no intent detection mechanism based on
detecting evoked-response potentials (ERP) in the brain to determine
which target letter or letters the child wants to convey; 3) a
dynamic sequencing optimization procedure, based on a statistical
language model, that computes which letter needs to be presented next
to take advantage of regularities in language. The system will operate by
showing the sequence of candidate letters on the screen as well as
previously typed text, such that words and phrases are formed naturally
by adding selected letters. The first goal is to test the viability of
the basic concept of facilitated communication through the RSVP
Keyboard System. Upon demonstration of feasibility through neuroimaging
and statistical analysis of brain responses to RSVP stimuli sequences,
the investigators will evaluate performances of typically developing
children and nonverbal children with ASD in three interactive cognitive
tasks.
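The interplay between the language model and the yes/no ERP detector can be illustrated with a small sketch. This is not the investigators' implementation: the letter prior, the detector's hit and false-alarm rates, and all function names are hypothetical, chosen only to show how a prior over next letters could set the presentation order and how one ERP decision could update that prior by Bayes' rule.

```python
def presentation_order(prior):
    """Return letters sorted so the most probable next letter is shown first."""
    return sorted(prior, key=lambda letter: (-prior[letter], letter))


def update_posterior(prior, shown, erp_detected, hit_rate=0.8, false_alarm=0.1):
    """Bayes update after showing one letter and reading a yes/no ERP decision.

    hit_rate    = P(detector says yes | shown letter is the target)      (assumed)
    false_alarm = P(detector says yes | shown letter is not the target)  (assumed)
    """
    posterior = {}
    for letter, p in prior.items():
        # Probability of a "yes" decision if `letter` were the intended target.
        p_yes = hit_rate if letter == shown else false_alarm
        likelihood = p_yes if erp_detected else 1.0 - p_yes
        posterior[letter] = p * likelihood
    total = sum(posterior.values())
    return {letter: p / total for letter, p in posterior.items()}


# Hypothetical language-model prior for the letter that follows "TH".
prior = {"E": 0.6, "A": 0.2, "I": 0.1, "O": 0.1}
order = presentation_order(prior)
posterior = update_posterior(prior, shown="E", erp_detected=True)
```

In this toy setting a single positive ERP decision on "E" raises its probability from 0.6 to about 0.92; a real system would presumably accumulate evidence over repeated presentations before committing to a letter.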
- Differentiating between Autism Spectrum Disorder
and Developmental Language Disorders via Story Recall Analysis
[ Brian Roark, PI,
Medical Research Foundation of Oregon]. The analysis of elicited spoken
language samples plays a key role in the diagnosis of a wide
range of linguistic and cognitive impairments, from developmental
impairments, such as Developmental Language Disorders (DLD) or Autism
Spectrum Disorder (ASD), to degenerative cognitive impairments, such as
dementia. Perhaps the most popular means of eliciting such
a sample is through a narrative recall task, where the subject is told
a story of sufficient length to preclude verbatim recall, and then
asked, either immediately or after some delay, to retell the story they
have been told. Most clinical uses of such tests involve a very
simple scoring mechanism, in which the recall of specific items in the
story is noted by the administering clinician (as the story is being
re-told), and summary scores are calculated based on the number of
these recalled items. The resulting summary score fails to
capture much of the potentially relevant information available in the
spoken language sample, e.g., grammatical complexity, pause frequency,
or the ordering of recalled items. The long-term objective of the
proposed work is to identify multiple complex markers, derived
from open and cued responses to narrative recall tasks, for
differentiating between: (1) children broadly diagnosed with ASD; (2)
children broadly diagnosed with DLD; and (3) normally developing
children. In the proposed study, narrative retellings produced by
a relatively limited number of children will be analyzed for the
feasibility of automatically extracting markers from the spoken
language samples to effectively discriminate between the three groups.
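As a rough illustration of what the summary score misses, the sketch below computes the conventional item count alongside one additional marker, the ordering of recalled items. The story items, the retellings, and the pairwise order measure are all made up for the example; the project description does not say which markers will actually be extracted.

```python
def summary_score(story_items, recalled):
    """Conventional score: number of distinct story items that were recalled."""
    return len(set(story_items) & set(recalled))


def order_agreement(story_items, recalled):
    """Fraction of recalled item pairs retold in the same order as the story.

    Returns None when fewer than two story items were recalled.
    """
    position = {item: i for i, item in enumerate(story_items)}
    seen = set()
    ranks = []
    for item in recalled:
        if item in position and item not in seen:
            seen.add(item)
            ranks.append(position[item])
    pairs = [(a, b) for i, a in enumerate(ranks) for b in ranks[i + 1:]]
    if not pairs:
        return None
    return sum(a < b for a, b in pairs) / len(pairs)


# Hypothetical story elements and two retellings with the same summary score.
story = ["dog", "park", "ball", "rain", "home"]
retelling_a = ["dog", "park", "ball", "home"]
retelling_b = ["home", "ball", "park", "dog"]
```

Both retellings recall four of the five items and so receive the same summary score, yet the first preserves the story order completely and the second reverses it.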
OTHER
NEURODEVELOPMENTAL DISORDERS
- Investigating the Diagnostic Utility of
Spontaneous Measures of Language
[ Amy Costanza-Smith (Child
Development and Rehabilitation Center, OHSU), mentored by Lois Black and Jan van Santen;
Medical Research Foundation of Oregon]. A language disorder is an
impairment in communication characterized by poor grammar, poor
vocabulary and/or poor social use of language. Language disorders
affect nearly 4 million school-age children and put them at risk for
further learning disabilities. These disorders are typically diagnosed
via standardized assessments (psychometric tests) that bear little
resemblance to real-life communication. Collections of real-life
communication, called spontaneous language samples, are also used to
describe language abilities. Language samples are rarely used in
diagnosis because of the time it takes to transcribe and analyze
them, even though they provide a rich context in which to assess
a child’s language and often give more accurate information than
standardized assessments.
The purpose of this project is to determine the
diagnostic ability of spontaneous language measures (e.g. vocabulary,
grammar) to differentiate between children with
language disorders and typically developing children. It is
hypothesized that the real-life richness of spontaneous language
samples will provide adequate information to diagnose language
disorders. The results of this project will move toward the ultimate
goal of developing new markers of language disorder, capitalizing on
recent advances in technology to develop automated scoring procedures.
These results have broader implications for the study of human
communication disorders including adult onset disorders such as
aphasia, dementia, and Parkinson’s disease.
This project uses data currently being collected in a larger NIH-funded
project on autism and developmental language disorders in
children.
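To make "spontaneous language measures" concrete, here is a minimal sketch of three measures commonly computed from a transcribed sample: mean length of utterance in words, number of different words, and type-token ratio. Whether the project uses these particular measures is an assumption; the function name and the sample utterances are illustrative only.

```python
def language_sample_measures(utterances):
    """Compute simple vocabulary and length measures from transcribed utterances."""
    tokenized = [u.lower().split() for u in utterances if u.strip()]
    all_words = [w for words in tokenized for w in words]
    if not all_words:
        raise ValueError("the sample contains no words")
    return {
        # Mean length of utterance, counted in words rather than morphemes.
        "mlu_words": len(all_words) / len(tokenized),
        # Vocabulary diversity: distinct word forms in the sample.
        "different_words": len(set(all_words)),
        "type_token_ratio": len(set(all_words)) / len(all_words),
    }


# Hypothetical three-utterance sample.
sample = ["the dog runs", "the dog is big", "he runs"]
measures = language_sample_measures(sample)
```

Counting such measures by hand is what makes language samples slow to use; once a transcript exists, they reduce to a few lines of computation.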
- Computer assisted disfluency counts for stuttered
speech - Phase I
[ Peter Heeman, NIH; joint
with BioSpeech Inc.].
Stuttering is a communication disorder
characterized by disfluencies that are frequent and disruptive to
communication. Clinicians extensively use disfluency counts to decide
whether a client should be treated, to assess treatment progress, and
to document treatment outcomes. Clinicians often do disfluency counts
in real-time as a speaker is talking. However, these are not very
specific, and cannot be re-examined. Clinicians can also use a verbatim
transcription approach, in which they first transcribe exactly what was
said, and then mark up the transcript with disfluency codes. This
method allows more detailed and accurate counts to be obtained. The
long term objective of this project is to build a computer tool that
will assist clinicians in performing disfluency counts, both real-time
and transcript based. The tool will allow both richer use of these
counts, and, in the case of transcript-based counts, much less effort
to create the counts. In fact, the amount of time should be reduced
enough to enable transcript-based counts to be used in clinical
practice.
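The transcript-based approach lends itself to simple automation once the transcript is marked up. The sketch below assumes a hypothetical coding scheme in which each disfluency is tagged inline with a bracketed code ([REP] for a repetition, [PRO] for a prolongation); the codes, the function name, and the choice to report a rate per 100 transcribed words are assumptions, not the tool's actual format.

```python
import re
from collections import Counter

# Matches inline codes such as [REP] or [PRO] (hypothetical coding scheme).
CODE_PATTERN = re.compile(r"\[([A-Z]+)\]")


def disfluency_counts(marked_transcript):
    """Return per-code counts and disfluencies per 100 transcribed words."""
    codes = CODE_PATTERN.findall(marked_transcript)
    words = CODE_PATTERN.sub(" ", marked_transcript).split()
    counts = Counter(codes)
    rate = 100.0 * len(codes) / len(words) if words else 0.0
    return counts, rate


transcript = "I [REP] I want [PRO] some [REP] some more water"
counts, per_100_words = disfluency_counts(transcript)
```

Because the counts are derived from a stored transcript rather than tallied live, they can be recomputed, broken down by type, and re-examined later.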
The goal of this Phase I-STTR is to demonstrate the
feasibility of a computer tool to assist users in performing both
real-time and transcript-based disfluency counts. For real-time
counts, we will show that the tool achieves the same reliability and
user acceptance as the pencil-and-paper method. We will also
investigate whether the real-time counts can be re-examined and
corrected (unlike the pencil-and-paper counts). For the
transcript-based counts, we will show that the tool, for at least
read-speech samples, allows the counts to be done substantially faster
and with better reliability than current approaches. This will be
achieved by incorporating an Automatic Speech Recognizer (ASR) that
will use the story text to assist in transcribing the speech; and by
incorporating a powerful user interface that allows the clinician to
easily review and correct the automatic transcription. In Phase II, we
will demonstrate the increased utility of disfluency counts due to them
being stored in a computer file and time-aligned to the audio signal.
We will extend the tool so that it can compare disfluency counts across
multiple audio files. This will help clinicians better see the impact
of their treatment over a period of time on the client's disfluency
patterns. We will also demonstrate that the tool can assist clinicians
with transcript-based counts of spontaneous speech. Again, we will
incorporate an ASR to assist in the transcription process, and we will
show that the tool allows transcript-based counts to be performed in
substantially less time than current approaches. Furthermore, we will
have several clinicians use the tool with clients over a period of
several months. This will demonstrate both the usefulness and
practicality of the overall tool, and allow us to determine how to
improve and augment it to best suit clinical needs.
- Novel Computerized
Behavioral Assessment Methods
for Attention Deficit Hyperactivity Disorder.
This internally funded exploratory project, conducted by Lois Black,
Holly Jimison, Leeza Maron, Misha Pavel, and Jan van Santen (PI),
focuses on building a computerized assessment system that has
the following features:
- A clear understanding of which neuropsychological
functions are measured.
- Interactivity (the computer adapts its behavior
instantly to the subjects' responses, thereby being able to operate
at a level of optimal sensitivity).
- Instantaneous and timed measurement of a range of
behavioral responses including the force dynamics of button pushing and
eye movements.
- Mathematical modeling of the underlying cognitive
processes in order to derive "purer" measures of the
neuropsychological functions.
- A more motivating and shorter assessment process.
- Pilot Study for Word Recognition of Children
with Speech Delay
John-Paul Hosom, PI,
Medical Research Foundation of Oregon. Children with speech delay
of unknown origin (hereafter referred to as "speech delay") are
characterized by a number of language problems, including reduced
vocabulary size, atypical grammar, and highly unintelligible speech.
The long-term objective of the proposed research is to enable children
with speech delay to communicate more effectively. This proposal
presents only the first step in realizing this long-term objective. In
this first step, speech data from a limited number of children with
speech delay will be analyzed to evaluate the feasibility of
automatically identifying acoustic features in the speech signal that
may be used to identify intended phonemes. The hypothesis of the
proposed research is that there are correlations between intended
phonemes and certain acoustic features of children with speech delay,
when the intended phoneme is not the same as the phoneme actually
spoken. Such correlations could then be used to assist in the automatic
word recognition of an intended utterance.
NEURODEGENERATIVE
DISORDERS
- Quantitative
Modeling of Segmental Timing in Dysarthria
[Jan van Santen and Kris Tjaden [University at Buffalo], PI's; NIH].
Quantitative, acoustic
models of segmental timing in spoken English, such as have been
developed for text-to-speech synthesis (TTS), acknowledge that segment
durations in connected speech reflect the combined influence of
systematic factors as well as nonsystematic or random factors.
Systematic variability in segment durations reflects factors such as
context, stress, speaking style or register, and cognitive load.
Segment durations also reflect within-speaker variability, termed
"random variability", that cannot be attributed to any of these
systematic factors. An individual talker's speech duration
patterns therefore can be mathematically characterized in terms of the
magnitude of the effects of each systematic factor (e.g., amount of
lengthening associated with word stress), as well as in terms of the
relative and absolute amounts of systematic and random
variability. Importantly, this powerful modeling framework can be
applied to meaningful sentence productions, and is capable of isolating
the effects of individual systematic factors without requiring the use
of artificial speech materials. This approach to quantitatively
modeling segmental timing in TTS has further proven crucial for
successfully synthesizing intelligible, natural-sounding speech.
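The split between systematic and random variability can be sketched with a deliberately simple stand-in for the TTS duration models mentioned above. Here each segment duration is labeled with a single context (for example, stressed or unstressed), the context means play the role of the systematic effects, and the scatter around those means is treated as random variability. The data, the labels, and the function name are hypothetical, and real timing models handle many interacting factors.

```python
from collections import defaultdict
from statistics import fmean


def variance_decomposition(observations):
    """Split duration variance into systematic and random parts.

    observations: list of (context, duration_ms) pairs, where `context` is a
    hashable label for the systematic factors (e.g. "stressed").
    Returns (systematic, random) population variances; their sum equals the
    total variance of the durations.
    """
    by_context = defaultdict(list)
    for context, duration in observations:
        by_context[context].append(duration)

    n = len(observations)
    grand_mean = fmean([d for _, d in observations])

    # Variance explained by differences between context means.
    systematic = sum(
        len(ds) * (fmean(ds) - grand_mean) ** 2 for ds in by_context.values()
    ) / n
    # Residual variance of durations around their own context mean.
    random_part = sum(
        (d - fmean(ds)) ** 2 for ds in by_context.values() for d in ds
    ) / n
    return systematic, random_part


# Hypothetical vowel durations in milliseconds.
data = [("stressed", 120), ("stressed", 130),
        ("unstressed", 70), ("unstressed", 80)]
systematic, random_part = variance_decomposition(data)
```

For this toy data almost all of the variance (625 of 650 ms²) is systematic; a speaker whose durations scatter widely within each context would show a larger random share.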
Given the importance of this modeling framework for
generating high quality speech synthesis, it is surprising that similar
modeling efforts have not been applied to dysarthria as a means of
understanding the source of reduced intelligibility and naturalness in
this speech disorder. Aberrancies in the temporal patterning of
speech are ubiquitous in most persons with dysarthria, and the
contribution of speech duration variables to intelligibility and
naturalness is suggested in a variety of studies. The approach
used in many existing studies is to document whether speech durations
in dysarthria are - on average - atypically short, long or variable as
compared to normal speech. The TTS modeling framework described
above, however, goes beyond this type of simple description to identify
the relative contribution of specific systematic factors influencing
segment durations for an individual speaker as well as the combined
relative and absolute contributions of systematic and random factors to
segmental timing for that individual. The TTS modeling framework
further allows model parameters for an individual speaker to be
manipulated via speech synthesis to determine the impact on
intelligibility and naturalness. The proposed exploratory project
seeks to apply such a quantitative modeling framework to segment
durations in sentences produced by speakers with a variety of
neurological diagnoses and dysarthrias. The perceptual relevance
of model parameters will be further studied via speech resynthesis to
determine their impact on judgments of intelligibility and naturalness.
- Measuring Spoken
Language Variability in Elderly Individuals
[Brian Roark, PI; John-Paul Hosom,
Susan Kemper [University of Kansas], Diane Howieson,
co-PI's; NSF]. The focus of this project is to develop techniques to
objectively (automatically) measure spoken language variability and
change in aging. Many of the most effective methods for cognitive
assessment are mediated by observed behavior, particularly spoken
language production. These include clinical instruments, e.g., the Mini
Mental Status Examination (MMSE), but also less formal assessments
involving interviews or dialogs with physicians or even friends and
family. Behavioral changes noted through these spoken language
interactions could indicate pathological changes associated with a
disorder; or the changes may be transient, due to missing medication or
depression at the time of assessment. Alternatively, the observed
behavior may be simply due to normal change in spoken language due to
aging, or even within the range of natural behavioral variation.
Understanding normal versus pathological language change with age
requires the collection and annotation of repeated samples from both
healthy and impaired individuals. This project has three specific aims:
1) to collect and transcribe longitudinal spoken language sample data
elicited in multiple ways from diverse elderly adults; 2) to develop
algorithms for automatically extracting features from these spoken
language samples; and 3) to characterize the variability of feature
values across samples of the same individual, and the utility of
feature values and even feature variances for discriminating between
subject groups. A particular challenge being addressed by this research
is to achieve high-quality, efficient automatic annotation of discourse
structure for the spoken language samples. The resulting methods are
expected to directly contribute to important behavioral assessment
applications.
- New Methods to
assess social, cognitive, and physical function in older persons
[ Thomas Glass, PI
[Johns Hopkins University]; Zak
Shafran, Investigator; NIH]. The aim of this 4-year project is to
develop and test a new personal monitoring device to measure physical,
social and cognitive functioning in continuous time and in real life
settings. The proposed device, called the LIFEmeter, combines four
technologies: accelerometry (motion sensors), digital audio recording
(for capturing speech), automatic speech recognition (for fast
efficient analysis of speech), and location identification (to explore
environmental influences on function). This light-weight, compact, and
wearable device will be tested and validated in three phases of data
collection. We will also construct the first automatic speech
recognition (ASR) system designed to transcribe and analyze the natural
speech of older adults. New metrics and methods will be developed to
analyze complex time-embedded data on functioning. The proposed system
will overcome biases and limitations found in current self-report
techniques. Our team combines expertise from the MIT Media Lab (sensors
and wearable computing), the JHU Center for Language and Speech
Processing (ASR) and the Bloomberg School of Public Health, which is
uniquely able to deliver an innovative approach to measuring complex
function with potentially broad applicability.
The proposed research builds on a previous 5-year
cohort study (The Baltimore Memory Study, AG19604) of
community-dwelling older adults. Existing data from this study allow us
to compare and validate measures of function obtained from our new
device against a range of self-reported and clinically-measured
outcomes. We will also validate our instrument against the most widely
used accelerometer (called ActiGraph). Data gathered using our new
device will allow us to study the impact of the built and social
environment on functioning with improved precision. A key goal will
also be to create and disseminate tools that allow other investigators
to adopt, refine and test this new approach.
- Spoken Language
Markers for Social Engagement
[ Zak Shafran, PI; Roybal
Pilot Grant]. Health, quality of life and treatment outcomes in older
adults have been shown to be influenced by their level of social
engagement in personal relationships and activities (both positive and
negative) with family members, peers, community members, local
institutions, and, at the broadest level, society. Because of the heavy
reliance on the cognitive and memory function of the subject, current
measures of social engagement, which are based on self-report, suffer
from inaccuracies. Further, they do not provide fine-level information
necessary to design interventions and treatments. While advances in
sensor technology are being exploited to augment self-report based
assessment of physical abilities of older adults, such advances have
not been realized in the assessment of social engagement due to
inherent difficulties in characterizing an individual’s network of
social support. This network is multi-faceted and is mediated through
several different types of communications, including emails, financial
transactions, and conversations with a wide range of persons, including
family members, friends, medical personnel, and business associates. Of
these types of communication, adult humans rely on conversations for
most social interactions. With conversations as a source of data
reflecting social engagement, advanced speech and language technology
now gives us the capability to characterize these interactions.
Our long-term goal is to design a computational
framework to measure social engagement that accounts for variations in
size, type and nature of an individual’s social network using
conversations as our data source.
Our research objective in this proposal is to create
algorithms that detect spoken language markers in an older adult’s
everyday conversations that are indicative of an individual’s social
engagement. The three specific aims of this proposal are:
1. to determine the feasibility and acceptability of collecting
conversational speech to assess social engagement of older adults,
2. to design algorithms to detect spoken-language markers of social
engagement from conversations, and
3. to identify the spoken-language markers of social engagement.
- Making
Dysarthric Speech Intelligible
[Jan van Santen, PI; John-Paul Hosom, Melanie Fried-Oken,
co-PI]. This NSF-funded project [joint with the Child Development
and Rehabilitation Center at
the Oregon Health & Science University] will develop new
algorithms that will enable dysarthric individuals to be more easily
understood. Currently available devices are essentially spectral
filters and amplifiers that enhance certain parts of the spectrum.
While these can help certain types of dysarthria, many dysarthric
persons suffer from speech problems that require forms of speech
modification that are much more profound and complex such as: irregular
sub-glottal pressure, resulting in loudness bursts that can be
difficult to adjust to; absence, or poor control, of voicing;
systematic mispronunciation of certain phoneme groups, resulting in
certain sounds becoming indistinguishable or unrecognizable; variable
mispronunciation; and poor prosody (pitch control, timing, and
loudness). For these difficult problems, new approaches are needed that
do not merely filter the speech signal but analyze it at acoustic,
articulatory, phonetic, and linguistic levels.
- Automatic spoken language analysis for detecting
cognitive impairment
[Brian Roark,
PI]. Clinical research into Alzheimer's disease (AD) and the mild
cognitive impairment (MCI) that precedes its full onset is
increasingly focused on early diagnosis and treatment that can delay or
even prevent full onset of AD. Effective diagnosis requires
differentiating between changes in cognitive and linguistic abilities
that occur during normal aging and those that are due to impairment.
Both manual linguistic analyses of spoken language samples and orally
administered clinical exams are effective but costly methods for
discriminating between healthy and MCI subjects. For widespread testing
of the growing elderly population for markers of MCI, automation of
testing procedures will be required.
The objective of the NIH-Roybal-funded project will
be to develop statistical speech and language analysis techniques to
automatically extract features from spoken language samples recorded
during clinical examinations. Healthy and MCI elderly subjects of
on-going studies at the Layton Center of OHSU take full
neuropsychological examinations annually for life. We will request
their permission to record and analyze these sessions, which include
several tests of particular interest, including a delayed story recall
test and a picture description task. We will transcribe the words and
annotate syntactic structure for selected tests, and develop algorithms
for automatically deriving features from the spoken language samples.
These automatically-derived speech- and language-based features will
then be used to build classifiers for discriminating between healthy
and MCI subjects. In addition to test automation, the statistical
speech and language processing techniques will provide two benefits of
primary importance: inclusion of approximations to previously
researched manually-derived features; and the use of unexplored
features derived from statistical characteristics of the samples, such
as a number of entropy-based features.
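One simple example of an entropy-based feature is the Shannon entropy of the word distribution in a transcript. The sketch below is an illustration only: the project description does not specify which entropy features are used, and the function name and whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def word_entropy(transcript):
    """Shannon entropy (in bits) of the unigram word distribution."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A sample that repeats one word has zero entropy; two equally frequent
# words give exactly one bit.
low = word_entropy("the the the the")
one_bit = word_entropy("cat dog cat dog")
```

A feature like this requires only a word-level transcript, which is what makes it attractive for automated screening compared with manually derived linguistic measures.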
- Voice Transformation
for Dysarthria - Phase I
[ Jan van Santen,
PI; Alexander Kain,
co-PI; NIH]. A large percentage of the more than 2.5 million
adult Americans with significant disability due to chronic neurological
impairment present with dysarthria or speech
impairment as one of their disabling conditions. There are no cures for
speech impairments. Dysarthric individuals report losses to employment,
educational opportunities, social integration, and quality of life.
Individuals are taught strategies that compensate for their
impairments, but the isolation caused by communication impairment is
pervasive. The project goal is to develop a system that uses a wearable
computer to transform speech compromised by dysarthria into
easier-to-understand and more natural-sounding speech, and will thereby
enable dysarthric individuals to communicate more effectively by
telephone or in face-to-face contexts.
Software will be developed in a collaborative
project with BioSpeech Inc.,
supported by the NIH,
that transforms speech compromised by dysarthria into
easier-to-understand and more natural-sounding speech. The software
will reside on laptop computers, with microphone input and amplified
speaker or line output. Such software and hardware solutions will
assist individuals with dysarthria to better communicate by voice,
whether face-to-face or by telephone; it will also help these
individuals when interacting with voice controlled services and
devices, which are increasingly popular. The system operates in
"Interpreter Mode", meaning that output will take place after a brief
processing delay once the speaker has completed an utterance. The
software is based on a multi-step formant re-synthesis process: (i)
Robust extraction of formant, energy, spectral balance, and pitch
trajectories from input speech; (ii) Modification of extracted
trajectories by imposition of smoothness and shape based constraints,
and by bringing these trajectories in closer proximity to trajectories
of normal speech; (iii) Conversion of the trajectories into a speech
signal by formant synthesis. Results obtained with a prototype,
personal computer based system show that this process is robust,
enhances intelligibility, and completely eliminates "vocal fry", i.e.,
distortions caused by irregularities in the temporal pattern of the
vocal folds.
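Step (ii) of the re-synthesis process can be sketched on a single formant track. The example assumes a running-median filter as the smoothness constraint and a fixed-weight interpolation toward a reference trajectory as the way of moving closer to normal speech; both choices, the weight, and the numbers are hypothetical and are not taken from the project's software.

```python
from statistics import median


def median_smooth(trajectory, window=3):
    """Running median; removes isolated spikes while keeping the overall shape."""
    half = window // 2
    return [
        median(trajectory[max(0, i - half): i + half + 1])
        for i in range(len(trajectory))
    ]


def pull_toward(trajectory, reference, weight=0.5):
    """Move each frame part of the way toward a reference trajectory."""
    return [(1 - weight) * x + weight * r for x, r in zip(trajectory, reference)]


# Hypothetical first-formant track in Hz with one tracking spike,
# and a flat reference standing in for a typical speaker.
f1 = [500, 510, 900, 520, 530]
typical = [520, 520, 520, 520, 520]

smoothed = median_smooth(f1)
modified = pull_toward(smoothed, typical, weight=0.5)
```

The modified trajectory would then be handed to a formant synthesizer in step (iii), which is outside the scope of this sketch.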
In Phase I, the core algorithms performing these steps
will be improved and extended, and the software will be ported to a
pocketable computer; the resulting system will be evaluated on multiple
speakers and listeners; and feedback will be obtained from potential
users and their partners about desired features, usability, and
functionality. In Phase II, acceptable processing delays will be
achieved using known methods for optimizing memory and processing
speed; further enhancement capabilities will be added, and the system
will be evaluated. The currently targeted product will be the first in
a family of speech enhancement products with continually expanding
functionality, by capitalizing on ongoing algorithmic and hardware
improvements. The use of standard hardware and software platforms, which
in turn are compatible with a wide range of headsets and wearable
amplified speakers or telephones, puts this software in a strong
competitive position.
- User Adaptation of AAC Device Voices - Phase I
[Jan van Santen,
PI; Esther Klabbers,
co-PI, NIH; joint with BioSpeech
Inc.]. A wide range of individuals cannot communicate by voice.
Voice enabled Augmentative and Alternative Communication (AAC) devices,
also known as Speech Generating Devices (SGDs),
are often the only channel available by which these individuals can
communicate. While many voice enabled AAC devices are currently
available, they lack the important ability to generate customized
speech that mimics aspects of the user's past or intermittently
available speech. Modern "concatenative" speech synthesis technology
can mimic a given speaker's voice, by excising speech fragments from a
recorded speech data base ("acoustic inventory") and recombining these
into output speech using sophisticated algorithms. It requires,
however, a large amount of recordings and a high degree of consistency
of pronunciation of the speaker. Many AAC users cannot meet these
requirements because they already have lost the capability to speak or
they cannot speak with adequate consistency of pronunciation. A new
type of technology, voice transformation (VT) technology, is available
that can transform speech spoken by a "source" speaker into speech that
is perceived as spoken by a specific "target" speaker. To tune the
transformation system, parallel "training recordings" of the same text
are needed from the source and target speakers. The amount of training
recordings is far less than what is needed for a high-quality acoustic
inventory.
In this joint project with BioSpeech Inc., supported by the NIH,
we propose to use VT in combination with speech synthesis to convert
the synthesis system's acoustic inventory into an acoustic inventory
that mimics the target speaker's voice. The training recordings can
consist of old home videos, or fragmented recordings produced during
periods of intact speech, provided that they contain at least one
sample of each phoneme. In Phase I, we will develop and evaluate a VT
based synthesis system. The project will use high-quality and
home-video quality recordings from male and female adults and children
to create limited acoustic inventories (adequate to generate a specific
set of test sentences) and VT training recordings. Perceptual
experiments will be conducted to evaluate voice quality and perceived
speaker identity. Phase II will focus on developing complete acoustic
inventories for several canonical speakers that will be selected to
cover a range of speaker characteristics, and on producing portable,
user-friendly software.
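The requirement that training recordings contain at least one sample of each phoneme is easy to check once the recordings have phoneme-level transcriptions. The sketch below assumes such transcriptions already exist; the tiny inventory and the function name are hypothetical.

```python
def missing_phonemes(inventory, recordings):
    """Return the phonemes in `inventory` that no recording contains.

    recordings: iterable of phoneme sequences, one per recording.
    """
    covered = {phone for phones in recordings for phone in phones}
    return sorted(set(inventory) - covered)


# Hypothetical mini-inventory and two transcribed fragments.
inventory = ["aa", "iy", "k", "s", "t"]
recordings = [["k", "aa", "t"], ["s", "aa", "t"]]
gaps = missing_phonemes(inventory, recordings)
```

An empty result would indicate that the available fragments are sufficient, under this requirement, to serve as VT training material.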
- Automated Test of Word Recognition - Phase II
[Robert Margolis, University of Minnesota, PI; John-Paul Hosom,
Investigator]. Over 5 million word
recognition tests are administered annually by audiologists in the
United States with an associated cost of more than $100 million. These
tests are currently performed manually by highly trained audiologists.
This NIH-funded project describes the Phase II development of automated
clinical speech recognition tests using clinical test recordings and an
automated speech recognition system to score the subjects' responses. A
method for automatically interpreting the test scores will also be
evaluated. The objectives are to increase the accuracy and efficiency
of these clinical tests, substantially reduce the cost, and provide an
objective, automatic, evidence-based method for interpreting the
results. The automated speech recognition test in combination with the
automated pure tone audiogram (currently an STTR Phase II project) will
perform diagnostic testing of a majority of audiology patients, freeing
the audiologists' time for activities that require their training and
skill. Contemporary changes in training and reimbursement patterns
create a high demand for automated clinical procedures. The automated
procedures are implemented on existing commercial audiometers with a
personal computer that controls the audiometer delivery and routing of
stimuli. Phase I results were obtained with automatic speech
recognizers that were trained on a limited number of subjects (n=9).
Estimates of the agreement between human and machine scoring ranged
from 82-93%. Additional refinements with benefits that are predictable
from prior experience will increase recognizer performance to a level
that equals or exceeds human-human agreement and provide the basis for
efficient and accurate clinical tests. In Phase II, an automatic speech
recognition threshold test will be compared to the manual method used
in routine clinical practice. Two different recognizer scoring
strategies will be developed, one that requires more test time but is
independent of individual speaker differences and is easily adaptable
to other languages, and one that requires less time but may not be
applicable to all patients. A pilot study will test the method on a
Spanish-language speech-recognition test.
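The agreement figures quoted above can be read as the percentage of test words on which the human scorer and the recognizer gave the same correct/incorrect judgment. The sketch below computes that percentage under this assumed definition; the score lists are invented and the function name is hypothetical.

```python
def percent_agreement(human_scores, machine_scores):
    """Percentage of items on which two scorers gave the same judgment."""
    if len(human_scores) != len(machine_scores):
        raise ValueError("score lists must have the same length")
    if not human_scores:
        raise ValueError("no items to compare")
    matches = sum(h == m for h, m in zip(human_scores, machine_scores))
    return 100.0 * matches / len(human_scores)


# Hypothetical correct/incorrect judgments for ten test words.
human = [True, True, False, True, False, True, True, True, False, True]
machine = [True, True, False, True, True, True, True, True, False, True]
agreement = percent_agreement(human, machine)
```

The same function applied to two human scorers gives the human-human agreement that the recognizer is expected to equal or exceed.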
- Speech Supplemented Word Prediction Program -
Phase II
[ Thomas Jakobs, InvoTek,
PI; John-Paul Hosom,
Investigator]. Commercial speech recognition software offers many people
with physical limitations an important computer access method. While
this access method is reasonably reliable for people with typical
speech, people with motor speech disorders (dysarthria) are presently
not able to use this technology reliably. The purpose of this
NIH-funded research is to provide these people with a unique
assistive-device access method that utilizes their speech. We will
accomplish this by continuing to develop a Speech Supplemented Word
Prediction Program (SSWPP) that enables people with dysarthria to use
their speech capabilities to interact with personal computers, with an
emphasis on assisted writing. The central element of the SSWPP is
custom speech-recognition software used in conjunction with word
prediction. The feasibility results for the SSWPP developed during
Phase I are exciting. The average keystroke savings achieved by people
with dysarthria on typical sentences was 68%. Commercially available
word prediction programs achieved no better than 47% keystroke savings
on the same text. Phase II design activities include improving the
speech recognition engine, developing an optimized microphone
interface, integrating the SSWPP into Microsoft Word, and developing a
speech-to-text display for use in face-to-face communication. People
with disabilities will evaluate the new SSWPP. The Speech Supplemented
Word Prediction Program is a tool for people with disabilities who also
have difficult-to-understand speech. This tool enables these people to
use their speech to reduce the amount of work required to enter text
into a computer and to communicate verbally more effectively.
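Keystroke savings is usually defined as the percentage of keystrokes avoided relative to typing every character of the text. The sketch below uses that common definition, which the project summary does not spell out; the sentence and the keystroke count are invented so that the result matches the 68% figure.

```python
def keystroke_savings(text, keystrokes_used):
    """Percentage of keystrokes saved relative to typing every character."""
    if not text:
        raise ValueError("text must not be empty")
    return 100.0 * (len(text) - keystrokes_used) / len(text)


# Hypothetical example: a 25-character phrase entered with 8 keystrokes.
phrase = "the quick brown fox jumps"
savings = keystroke_savings(phrase, keystrokes_used=8)
```

Under this definition, 68% savings means a user presses roughly one key for every three characters that appear on screen.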
- Automated voice-based cognitive assessment and
spoken language-based markers for neurodegenerative diseases and Alzheimer's Disease Cooperative Study:
Home-Based Assessments
This project (Tamara Hayes, PI; John-Paul Hosom, Investigator),
funded under a new program of Intel's Digital Health Group called the Behavioral
Assessment and Intervention Commons, is aimed at initiating and
accelerating research into behavioral markers of disease, such as
changes in walking, speech and performance on computer games, that
eventually translate into health-related products and services. CSLU is
developing voice enabled automated assessment "kiosk" based versions of
standard neurocognitive tasks (e.g., digit span) and speech and
language based markers for neurodegenerative diseases. Kiosk
development is also supported by the Alzheimer's Disease
Cooperative Study (ADCS; NIH) program ( Jeff
Kaye, OHSU PI; John-Paul
Hosom, Investigator). The software architecture is designed and
implemented by Senior Programmer Jacques de Villiers.
|