Hello and welcome
to This is My Architecture.
My name is Walid 
and today I am with
Martin from Sanofi
and Arnaud from Teamwork.
Hello Martin, tell us
about your role at Sanofi.
Hello Walid, I work 
as a Solutions Architect
in the ITS CMO department,
Chief Medical Officer,
on the Real World Evidence
part of the Darwin ecosystem.
Hello Arnaud!
Hello, I'm a Data Architect
at Teamwork.
And we are AWS partners.
Martin, tell us about your business need
and why this solution.
As part of the Darwin ecosystem,
and working on real world data,
Real World Evidence data,
our data scientists needed
to run about 30% to 40%
of their analyses
on fairly large datasets,
as well as smaller ones,
with environments and tools centered
around RStudio and SageMaker.
They also needed NLP and translation
services for their analyses.
So, you need an ecosystem
for data scientists on AWS.
Yes, a secure ecosystem,
where we could also implement
data traceability.
All right.
Arnaud, can you give us
a little more detail
on the technical side?
Yes absolutely, we have built
the solution entirely on AWS,
with all the services available to us,
and we built a static website
using S3's static website hosting feature.
And to reduce costs, we used
a CloudFront distribution,
which also holds the certificate.
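[Editor's note: to make this static layer concrete, here is a minimal boto3 sketch, assuming a hypothetical bucket name, region, and ACM certificate ARN, that enables S3 static website hosting and fronts it with a CloudFront distribution carrying the certificate. It illustrates the pattern described, not Sanofi's actual configuration.]

```python
import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

BUCKET = "darwin-portal-static"  # hypothetical bucket name

# Serve the portal front end from S3's static website hosting.
s3.put_bucket_website(
    Bucket=BUCKET,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Put a CloudFront distribution in front of the website endpoint; the
# distribution also carries the TLS certificate (ACM certificates for
# CloudFront must live in us-east-1).
cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": "darwin-portal-initial",
        "Comment": "Static portal distribution",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "portal-s3-origin",
                "DomainName": f"{BUCKET}.s3-website-eu-west-1.amazonaws.com",
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "http-only",
                },
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "portal-s3-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "MinTTL": 0,
        },
        "ViewerCertificate": {
            "ACMCertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/EXAMPLE",
            "SSLSupportMethod": "sni-only",
            "MinimumProtocolVersion": "TLSv1.2_2021",
        },
    },
)
```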
We authenticated our users
with Cognito and Sanofi SSO.
Alongside this static part,
we also have a dynamic part
with data traceability,
which we implemented
with API Gateway and DynamoDB.
So, there is a web portal
that users access:
a static part with authentication
via identity federation on Cognito,
with identities from your Sanofi SSO,
and a dynamic part
that uses API Gateway.
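[Editor's note: a minimal sketch of the authentication wiring described here, assuming a hypothetical user pool ID and SSO metadata URL: the corporate SAML identity provider is federated into the Cognito user pool, so portal users sign in with their existing Sanofi credentials.]

```python
import boto3

cognito_idp = boto3.client("cognito-idp")

USER_POOL_ID = "eu-west-1_EXAMPLE"                          # hypothetical pool ID
SSO_METADATA_URL = "https://sso.example.com/saml/metadata"  # hypothetical SSO endpoint

# Federate the corporate SAML IdP into the Cognito user pool that
# protects the portal, mapping the SAML email claim onto Cognito's.
cognito_idp.create_identity_provider(
    UserPoolId=USER_POOL_ID,
    ProviderName="CorporateSSO",
    ProviderType="SAML",
    ProviderDetails={"MetadataURL": SSO_METADATA_URL},
    AttributeMapping={"email": "email"},
)
```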
What is the use of DynamoDB here?
DynamoDB is used
to store user metadata,
but also every action performed
on the Data Analytics portal.
So, there is strong traceability
of users in DynamoDB.
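[Editor's note: a sketch of what this traceability write path could look like, assuming a hypothetical table name, key schema, and field names: a Lambda behind API Gateway reads the caller's Cognito identity from the request context and stores one DynamoDB item per portal action.]

```python
import json
import time

import boto3

# Hypothetical audit table, keyed on user_id (partition key) and
# timestamp (sort key) so a user's history reads back in order.
table = boto3.resource("dynamodb").Table("portal-audit-trail")

def handler(event, context):
    """API Gateway proxy handler: record one portal action per request."""
    # Cognito user pool authorizers expose the token claims here.
    claims = event["requestContext"]["authorizer"]["claims"]
    body = json.loads(event.get("body") or "{}")

    table.put_item(Item={
        "user_id": claims["sub"],
        "timestamp": str(int(time.time() * 1000)),
        "action": body.get("action", "unknown"),
        "resource": body.get("resource", ""),
    })
    return {"statusCode": 200, "body": json.dumps({"recorded": True})}
```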
Absolutely!
All right.
So, this is the portal section.
What exactly do users
have access to afterwards?
Via this portal, users can
provision products
that we developed with Martin.
We mainly have two products,
SageMaker and RStudio.
And we have developed
these products using Service Catalog.
Indeed, Service Catalog allows us
to design specific products
via CloudFormation.
The SageMaker notebooks
are encrypted
and use a 100 GB volume to store data.
Each user has
their own SageMaker notebook
or RStudio instance,
which gives them access
only to the data
they are entitled to.
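[Editor's note: from the portal's backend, provisioning one of these products can come down to a single Service Catalog call, as sketched below; the product ID, artifact ID, and parameter names are illustrative assumptions, not the actual template's interface.]

```python
import boto3

servicecatalog = boto3.client("servicecatalog")

# Launch the "encrypted SageMaker notebook" product for one user.
response = servicecatalog.provision_product(
    ProductId="prod-EXAMPLE",             # hypothetical product ID
    ProvisioningArtifactId="pa-EXAMPLE",  # hypothetical template version
    ProvisionedProductName="notebook-jdoe",
    ProvisioningParameters=[
        # Parameters such a CloudFormation template might expose:
        {"Key": "UserName", "Value": "jdoe"},
        {"Key": "KmsKeyId", "Value": "arn:aws:kms:eu-west-1:123456789012:key/EXAMPLE"},
        {"Key": "VolumeSizeInGB", "Value": "100"},  # the 100 GB data volume
    ],
)
print(response["RecordDetail"]["Status"])
```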
Excellent.
So, what I understand here
is that you use Service Catalog
to have a portfolio of products
to offer to your users directly
from the portal.
With Service Catalog, you
of course use CloudFormation,
which lets you repeat
deployments for these users,
with dedicated environments
and an emphasis on security
and the partitioning of users.
Absolutely.
So, what I see here is
the Service Catalog part.
Over here I see other Machine
Learning services on AWS.
What are they for
and how do you use them on your portal?
Through the portal we developed,
we used two services.
The first one is Translate, because there was
a need for mass translation
of raw files.
For this, we used the SDK
directly via Lambdas
to translate from a language
automatically detected
via Comprehend to a target language
chosen by the user.
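[Editor's note: a minimal sketch of that translation path, following the flow described: Comprehend detects the dominant language and Translate converts the text to the target language chosen by the user. Chunking large raw files to stay under Translate's per-call size limit is omitted for brevity.]

```python
import boto3

comprehend = boto3.client("comprehend")
translate = boto3.client("translate")

def translate_document(text: str, target_lang: str) -> str:
    """Detect the source language, then translate to the user's choice."""
    detected = comprehend.detect_dominant_language(Text=text)
    source_lang = detected["Languages"][0]["LanguageCode"]

    result = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )
    return result["TranslatedText"]
```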
The second service
is Comprehend Medical,
as we need to identify data
within the raw files.
For this, we also developed Lambdas
that use the Comprehend Medical SDK
to extract this metadata.
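[Editor's note: a comparable sketch for the Comprehend Medical side, using the DetectEntitiesV2 API; the helper name and the fields kept from the response are illustrative.]

```python
import boto3

comprehend_medical = boto3.client("comprehendmedical")

def extract_medical_entities(text: str) -> list[dict]:
    """Identify medical entities (conditions, medications, PHI, ...) in raw text."""
    response = comprehend_medical.detect_entities_v2(Text=text)
    return [
        {"text": e["Text"], "category": e["Category"], "score": e["Score"]}
        for e in response["Entities"]
    ]
```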
Excellent! So in this case,
you use AI services
offered and pre-trained by AWS,
Translate and Comprehend Medical,
directly for your data scientists.
Excellent, and here we see a Data Lake!
What is it used for?
And how do you integrate it into your platform?
Today, the critical point
of this platform
is that the data is very sensitive,
so the data has
to remain in the environment.
We allow users to extract data
from our Data Lake
into their home directory
in the S3 bucket.
It is then processed
by the different systems,
whether it is RStudio, SageMaker,
Translate, or Comprehend NLP.
Well, the idea is that the data
used in our Darwin environment,
our current ecosystem,
must remain in the environment:
it is not extractable
and cannot be downloaded
to workstations or anywhere else.
It's segregated. We use
the Data Lake and the environment
we've implemented for that purpose.
Alright.
So, the Data Lake holds the data,
but with an extra emphasis
on data segregation,
data encryption,
data control, and auditing.
Yes, today we use DynamoDB,
as Arnaud explained,
to trace each piece of data
used in the environment as well,
and each user has
their own home directory,
where they can put their own data
and share it with other users,
because each piece of data is
accessible to only one user
according to predefined rights.
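[Editor's note: one common way to enforce this kind of per-user segregation is an IAM policy that scopes each identity to its own "home" prefix in the bucket. The sketch below uses a hypothetical bucket name and the aws:userid policy variable; it is an assumption about the shape of such a policy, not the exact one used here.]

```python
import json

# Each identity may only list and touch objects under its own
# home/<user> prefix; other users' data stays out of reach.
HOME_PREFIX_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListOwnHomeOnly",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::darwin-datalake-users",  # hypothetical bucket
            "Condition": {"StringLike": {"s3:prefix": "home/${aws:userid}/*"}},
        },
        {
            "Sid": "ReadWriteOwnHomeOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::darwin-datalake-users/home/${aws:userid}/*",
        },
    ],
}

print(json.dumps(HOME_PREFIX_POLICY, indent=2))
```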
That's great.
And what is the feedback
from users with your portal?
We've been in production
since the beginning of August,
I'd say, Arnaud.
- Exactly.
We were targeting
5 to 10 users, beta testers,
and after 2 to 3 weeks,
we quickly went up to 30-40 users.
And we're still going up.
So today, with Teamwork,
we're going to set up other services
based on analytical tools
or the Data Lake side,
such as Lake Formation,
but so far it's very, very positive.
Thank you, Martin. Thank you, Arnaud.
Thank you, Walid.
Thank you for watching
This is My Architecture.
